- Documentation: http://jobarchitect.readthedocs.io
- GitHub: https://github.com/JIC-CSB/jobarchitect
- PyPI: https://pypi.python.org/pypi/jobarchitect
- Free software: MIT License
This tool is intended to automate the generation of scripts to run analyses on data sets. To use it, you will need a data set that has been created (or annotated) with dtool. It aims to help by:
- Removing the need to know where specific data items are stored in a data set
- Providing a means to split an analysis into several chunks (file based parallelization)
- Providing a framework for seamlessly running an analysis inside a container
This project has two main components. The first is a command line tool named sketchjob, intended to be used by the end user. It is used to generate scripts defining jobs to be run. The second (_analyse_by_ids) is a command line tool that is used by the scripts generated by sketchjob. The end user is not meant to make use of this second script directly.
To install the jobarchitect package:
$ cd jobarchitect
$ python setup.py install
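Since the package is published on PyPI, installing with pip should also work (a reasonably recent pip is assumed; the setup.py route above is the one documented here):
$ pip install jobarchitect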
The jobarchitect tool only works with "smart" tools. A "smart" tool is a tool that understands dtoolcore datasets, has no positional command line arguments, and supports the named arguments --dataset-path, --identifier and --output-directory. The tool should only process the dataset item specified by the identifier and write all output to the specified output directory.
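To make this concrete, below is a minimal sketch of what a "smart" tool could look like. It is illustrative only: the argument names follow the convention above, but the output file name and the trivial "processing" step are made up, and the dtoolcore calls needed to resolve a data item from its identifier are left as a placeholder comment because they depend on the dtoolcore version in use.

#!/usr/bin/env python
"""Sketch of a "smart" tool: no positional arguments, three named arguments."""
import argparse
import os


def main():
    parser = argparse.ArgumentParser(description="Example smart tool (illustrative)")
    # The three named arguments that every smart tool must support.
    parser.add_argument("--dataset-path", required=True)
    parser.add_argument("--identifier", required=True)
    parser.add_argument("--output-directory", required=True)
    args = parser.parse_args()

    # Placeholder: look up the data item for args.identifier in the dtoolcore
    # dataset at args.dataset_path (the exact call depends on the dtoolcore API).

    # Process only that single item and write all output into the
    # specified output directory.
    output_path = os.path.join(args.output_directory, args.identifier + ".txt")
    with open(output_path, "w") as fh:
        fh.write("processed item {}\n".format(args.identifier))


if __name__ == "__main__":
    main()

A tool following this pattern can then be passed to sketchjob as in the examples below.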
A dtool dataset can be created using dtool. Below is an example:
$ dtool new dataset
project_name [project_name]:
dataset_name [dataset_name]: example_dataset
...
$ echo "My example data" > example_dataset/data/my_file.txt
$ datatool manifest update example_dataset/
Create an output directory:
$ mkdir output
Then you can generate analysis run scripts with:
$ sketchjob my_smart_tool.py example_dataset output/
#!/bin/bash
_analyse_by_ids \
  --tool_path=my_smart_tool.py \
  --input_dataset_path=example_dataset/ \
  --output_root=output/ \
  290d3f1a902c452ce1c184ed793b1d6b83b59164
Try the script with:
$ sketchjob my_smart_tool.py example_dataset output/ > run.sh
$ bash run.sh
$ cat output/first_image.png
290d3f1a902c452ce1c184ed793b1d6b83b59164 /private/var/folders/hn/crprzwh12kj95plc9jjtxmq82nl2v3/T/tmp_pTfc6/stg02d730c7-17a2-4d06-a017-e59e14cb8885/first_image.png
The unix command split is a good way to divide the single large script produced by sketchjob, which concatenates many command invocations, into individual files. For example:
$ split -n 60 many_slurm_scripts.slurm all_slurm_scripts/submit_segment
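If, as the file names suggest, the resulting pieces are SLURM submission scripts, they could then be submitted individually. This is a hedged sketch assuming each piece is a self-contained submission script and that sbatch is available on the cluster:
$ for script in all_slurm_scripts/submit_segment*; do sbatch "$script"; done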
For the tests to pass, you will need to build an example Docker image, which you do with the provided script:
$ bash build_docker_image.sh
By inspecting the script and the associated Dockerfile, you can get an idea of how to build Docker images that can be used with the jobarchitect Docker backend, e.g.:
$ sketchjob scripts/my_smart_tool.py ~/junk/cotyledon_images ~/junk/output --backend=docker --image-name=jicscicomp/jobarchitect
#!/bin/bash
IMAGE_NAME=jicscicomp/jobarchitect
docker run \
  --rm \
  -v /Users/olssont/junk/cotyledon_images:/input_dataset:ro \
  -v /Users/olssont/junk/output:/output \
  -v /Users/olssont/sandbox/scripts:/scripts:ro \
  $IMAGE_NAME \
  _analyse_by_ids \
    --tool_path=/scripts/my_smart_tool.py \
    --input_dataset_path=/input_dataset \
    --output_root=/output \
    290d3f1a902c452ce1c184ed793b1d6b83b59164 09648d19e11f0b20e5473594fc278afbede3c9a4
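The essential requirement on such an image is that it provides the _analyse_by_ids entry point (i.e. has jobarchitect installed) along with whatever the smart tool itself needs (for example dtoolcore), since the generated script only mounts the dataset, the output directory and the scripts directory into the container. The sketch below illustrates this; it is not the project's build script (the real Dockerfile lives alongside build_docker_image.sh in the repository), and the base image and package list are assumptions:

#!/bin/bash
# Illustrative sketch only: base image and packages are assumptions.
cat > Dockerfile <<'EOF'
FROM python:2.7
# The image needs jobarchitect (for the _analyse_by_ids entry point)
# and the dependencies of the smart tool, e.g. dtoolcore.
RUN pip install dtoolcore jobarchitect
EOF
docker build -t jicscicomp/jobarchitect .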