- Documentation: http://jobarchitect.readthedocs.io
- GitHub: https://github.com/JIC-CSB/jobarchitect
- PyPI: https://pypi.python.org/pypi/jobarchitect
- Free software: MIT License
This tool is intended to automate the generation of scripts to run analyses on data sets. To use it, you will need a data set that has been created (or annotated) with dtool. It aims to help by:
- Removing the need to know where specific data items are stored in a data set
- Providing a means to split an analysis into several chunks (file based parallelization)
- Providing a framework for seamlessly running an analysis inside a container
This project has two main components. The first is a command line tool named sketchjob, intended to be used by the end user. It is used to generate scripts defining jobs to be run. The second (_analyse_by_ids) is a command line tool that is used by the scripts generated by sketchjob. The end user is not meant to make use of this second script directly.
To install the jobarchitect package:
$ cd jobarchitect
$ python setup.py install
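Since the package is published on PyPI, installing with pip should also work (a reasonably recent pip is assumed; the setup.py route above is the one documented here):
$ pip install jobarchitect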
The jobarchitect tool only works with "smart" tools. A "smart" tool is a tool that understands dtoolcore datasets, has no positional command line arguments, and supports the named arguments --dataset-path, --identifier and --output-directory. The tool should only process the dataset item specified by the identifier and write all output to the specified output directory.
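To make this concrete, below is a minimal sketch of what a "smart" tool could look like. It is illustrative only: the argument names follow the convention above, but the output file name and the trivial "processing" step are made up, and the dtoolcore calls needed to resolve a data item from its identifier are left as a placeholder comment because they depend on the dtoolcore version in use.

#!/usr/bin/env python
"""Sketch of a "smart" tool: no positional arguments, three named arguments."""
import argparse
import os


def main():
    parser = argparse.ArgumentParser(description="Example smart tool (illustrative)")
    # The three named arguments that every smart tool must support.
    parser.add_argument("--dataset-path", required=True)
    parser.add_argument("--identifier", required=True)
    parser.add_argument("--output-directory", required=True)
    args = parser.parse_args()

    # Placeholder: look up the data item for args.identifier in the dtoolcore
    # dataset at args.dataset_path (the exact call depends on the dtoolcore API).

    # Process only that single item and write all output into the
    # specified output directory.
    output_path = os.path.join(args.output_directory, args.identifier + ".txt")
    with open(output_path, "w") as fh:
        fh.write("processed item {}\n".format(args.identifier))


if __name__ == "__main__":
    main()

A tool following this pattern can then be passed to sketchjob as in the examples below.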
A dtool dataset can be created using dtool. Below is an example:
$ dtool new dataset
project_name [project_name]:
dataset_name [dataset_name]: example_dataset
...
$ echo "My example data" > example_dataset/data/my_file.txt
$ datatool manifest update example_dataset/
Create an output directory:
$ mkdir output
Then you can generate analysis run scripts with:
$ sketchjob my_smart_tool.py example_dataset output/
#!/bin/bash
_analyse_by_ids \
  --tool_path=my_smart_tool.py \
  --input_dataset_path=example_dataset/ \
  --output_root=output/ \
  290d3f1a902c452ce1c184ed793b1d6b83b59164
Try the script with:
$ sketchjob my_smart_tool.py example_dataset output/ > run.sh
$ bash run.sh
$ cat output/first_image.png
290d3f1a902c452ce1c184ed793b1d6b83b59164 /private/var/folders/hn/crprzwh12kj95plc9jjtxmq82nl2v3/T/tmp_pTfc6/stg02d730c7-17a2-4d06-a017-e59e14cb8885/first_image.png
The unix command split is a good way to divide the single large script produced by sketchjob, which concatenates many command invocations, into individual files. For example:
$ split -n 60 many_slurm_scripts.slurm all_slurm_scripts/submit_segment
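If, as the file names suggest, the resulting pieces are SLURM submission scripts, they could then be submitted individually. This is a hedged sketch assuming each piece is a self-contained submission script and that sbatch is available on the cluster:
$ for script in all_slurm_scripts/submit_segment*; do sbatch "$script"; done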
For the tests to pass, you will need to build an example Docker image, which you do with the provided script:
$ bash build_docker_image.sh
By inspecting the script and the associated Dockerfile, you can get an idea of how to build Docker images that can be used with the jobarchitect Docker backend, e.g.:
$ sketchjob scripts/my_smart_tool.py ~/junk/cotyledon_images ~/junk/output --backend=docker --image-name=jicscicomp/jobarchitect
#!/bin/bash
IMAGE_NAME=jicscicomp/jobarchitect
docker run \
  --rm \
  -v /Users/olssont/junk/cotyledon_images:/input_dataset:ro \
  -v /Users/olssont/junk/output:/output \
  -v /Users/olssont/sandbox/scripts:/scripts:ro \
  $IMAGE_NAME \
  _analyse_by_ids \
    --tool_path=/scripts/my_smart_tool.py \
    --input_dataset_path=/input_dataset \
    --output_root=/output \
    290d3f1a902c452ce1c184ed793b1d6b83b59164 09648d19e11f0b20e5473594fc278afbede3c9a4
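The essential requirement on such an image is that it provides the _analyse_by_ids entry point (i.e. has jobarchitect installed) along with whatever the smart tool itself needs (for example dtoolcore), since the generated script only mounts the dataset, the output directory and the scripts directory into the container. The sketch below illustrates this; it is not the project's build script (the real Dockerfile lives alongside build_docker_image.sh in the repository), and the base image and package list are assumptions:

#!/bin/bash
# Illustrative sketch only: base image and packages are assumptions.
cat > Dockerfile <<'EOF'
FROM python:2.7
# The image needs jobarchitect (for the _analyse_by_ids entry point)
# and the dependencies of the smart tool, e.g. dtoolcore.
RUN pip install dtoolcore jobarchitect
EOF
docker build -t jicscicomp/jobarchitect .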