A suite to write and run distributed experiments across multiple network nodes.
python3 -m venv venv
source venv/bin/activate
pip install --upgrade https://github.com/mjasny/distexprunner/archive/master.zip
At this stage you can already try the functionality by running locally the following commands in different shells. (e.g. using tmux
)
Start one server instance. If you need multiple on the same machine you need to specify a different port with --port
. In general only one instance is needed because it is capable to run multiple commands and even experiments in parallel without interfering with eachother.
distexp-server -vv
Now a client is ready to connect to the servers and execute experiments.
python client.py -vv examples/
The folder parameter (examples/
) is the search path for new experiments. It scans recursively all .py
files for experiments which are registered with @reg_exp(...)
. You can add multiple folders and also refer directly to a .py
file if you want to run a subset of all experiments.
The order of the arguments is used as an execution order. This might be useful to do compilation jobs beforehand: python client.py -vv compile.py scaleout.py
.
import config
from distexprunner import *
server_list = ServerList(
Server('node01', '127.0.0.1', config.SERVER_PORT, optional_field=True),
)
@reg_exp(servers=server_list)
def ls(servers):
servers[0].run_cmd('ls').wait()
@reg_exp(servers=server_list)
def kill_yes(servers):
for s in servers[lambda s: s.optional_field==True]:
yes_cmd = s.run_cmd('yes > /dev/null')
sleep(3) # not time.sleep()!
yes_cmd.kill()
Experiments are functions decorated with @reg_exp(...)
and are grouped in .py
. The order in which they appear in the file is the same in which they are executed. The function name is used as the experiment name, for parameter grids a suffix is added.
The decorator takes the following arguments:
- servers => ServerList (required)
- params => ParameterGrid (optional)
- max_restarts => int (optional, default unlimited=0)
The ServerList
needs to contain all Servers
which are needed for the experiment. Upon execution it is supplied to the function, servers can be selected via the []
operator using an int-index, the server id or a lambda filter predicate.
A Server
has two mandatory construction arguments:
- id => str (required)
- ip => str (required, can also be a hostname)
- port => int (optional, default 20000)
- **kwargs => dict *(optional, additional attributes)
Before an experiment is run a connection to all Server
in the ServerList
is made and at the end the connection is terminated, which kills all still running on the Server. It is recommended to use a config.py
for configuration parameters shared across different experiment files.
Inside the experiment function commands can be executed on servers using: cmd = server.run_cmd(...)
.
It takes the following arguments:
- cmd => str (required)
- stdout => Console/File (optional, can be a single object or list which is called for each line)
- stderr => Console/File (optional, can be a single object or list which is called for each line)
- env => dict (optional, adds environment variables)
It returns on object cmd = run_cmd(...)
with the following callable methods:
- cmd.wait(block=True) => int
Is a by default blocking call which waits until the spawned process on the server finishes and returns the returncode. If block=False
it can return None
if the process is still running. If the process already finished the returncode is returned immediately.
- cmd.kill() => int
Kills the running process and returns a returncode.
- cmd.stdin(close=False) => None
Feeds a string into stdin of the running command. \n
is needed at the end to simulate an ENTER keypress. If close=True
then the stdin to the process is closed.
If the experiment()
function returns before running commands are terminated they are killed. So it is advised to use .wait()
calls on running commands.
The experiment function can return Action.RESTART
in case some returncodes of previous commands are !=0
to restart the current experiment indefinetly or max_restarts
times.
ParameterGrid
can be used to do parameterized grid executions. The experiment is called for the kartesian product (using itertools.product
) of all parameters (e.g. a
and b
). These named arguments are then given to the experiment function upon runtime. The parameters are also used to add a unique suffix the the experiment name, e.g.: grid__a=4_b=4_to_file=False
.
import config
from distexprunner import *
server_list = ServerList(
Server('node01', '127.0.0.1', config.SERVER_PORT, optional_field=True),
)
parameter_grid = ParameterGrid(
a=range(1, 5),
b=[2, 4],
to_file=[True, False]
)
@reg_exp(servers=server_list, params=parameter_grid)
def grid(servers, a, b, to_file):
for s in servers:
stdout = File('grid.log', append=True)
if not to_file:
stdout = [stdout, Console(fmt=f'{s.id}: %s')]
s.run_cmd(f'echo {a} {b}', stdout=stdout).wait()
This generates the following set of experiments:
experiments = [
grid__a=1_b=2_to_file=True, grid__a=1_b=2_to_file=False,
grid__a=1_b=4_to_file=True, grid__a=1_b=4_to_file=False,
grid__a=2_b=2_to_file=True, grid__a=2_b=2_to_file=False,
grid__a=2_b=4_to_file=True, grid__a=2_b=4_to_file=False,
grid__a=3_b=2_to_file=True, grid__a=3_b=2_to_file=False,
grid__a=3_b=4_to_file=True, grid__a=3_b=4_to_file=False,
grid__a=4_b=2_to_file=True, grid__a=4_b=2_to_file=False,
grid__a=4_b=4_to_file=True, grid__a=4_b=4_to_file=False
]
Examples can be found in examples/.
usage: client.py [-h] [-v] [--resume] [--slack-webhook SLACK_WEBHOOK] experiment [experiment ...]
Distributed Experiment Runner Client Instance
positional arguments:
experiment path to experiments, folders are searched recursively, order is important
optional arguments:
-h, --help show this help message and exit
-v, --verbose -v WARN -vv INFO -vvv DEBUG
--resume Resume execution of experiments from last run
--slack-webhook SLACK_WEBHOOK
Notify to slack when execution finishes
--no-progress Hide progressbar
--log LOG Log into file
experiment
Used to search for experiments--resume
In case of interruption, only runs experiments which are not present in file.distexprunner
--slack-webhook
if supplied a notification is sent to the channel after all experiments are run (see: https://api.slack.com/tutorials/slack-apps-hello-world)--progress
Displays a progressbar, but needs to completely disable logging output. Therefore use in conjuncition with--log
--log
Appends all logging output in addition to stderr into a file.
usage: server.py [-h] [-v] [-ip IP] [-p PORT] [-rf] [-mi MAX_IDLE]
Distributed Experiment Runner Server
optional arguments:
-h, --help show this help message and exit
-v, --verbose -v WARN -vv INFO -vvv DEBUG
-ip IP, --ip IP Listening ip
-p PORT, --port PORT Listening port
-rf, --run-forever Disable auto termination of server
-mi MAX_IDLE, --max-idle MAX_IDLE
Maximum idle time before auto termination (in seconds). Default 1 hour.
-o LOG, --log LOG Log into file
python3 -m venv venv
source venv/bin/activate
pip install -e .
pip install --upgrade autopep8
autopep8 --in-place --recursive .