wip! workflows as programs (nextstrain run)
tsibley committed Mar 5, 2025
1 parent 6143c85 commit e4ff937
Showing 27 changed files with 1,968 additions and 118 deletions.
10 changes: 7 additions & 3 deletions doc/commands/index.rst
@@ -14,7 +14,7 @@ nextstrain

.. code-block:: none

-    usage: nextstrain [-h] {build,view,deploy,remote,shell,update,setup,check-setup,login,logout,whoami,version,init-shell,authorization,debugger} ...
+    usage: nextstrain [-h] {run,build,view,deploy,remote,shell,update,setup,check-setup,login,logout,whoami,version,init-shell,authorization,debugger} ...

Nextstrain command-line interface (CLI)
@@ -41,6 +41,10 @@ commands



+.. option:: run
+
+Run pathogen workflow. See :doc:`/commands/run`.
+
.. option:: build

Run pathogen build. See :doc:`/commands/build`.
@@ -63,11 +67,11 @@ commands

.. option:: update

-Update a runtime. See :doc:`/commands/update`.
+Update a pathogen or runtime. See :doc:`/commands/update`.

.. option:: setup

-Set up a runtime. See :doc:`/commands/setup`.
+Set up a pathogen or runtime. See :doc:`/commands/setup`.

.. option:: check-setup

259 changes: 259 additions & 0 deletions doc/commands/run.rst
@@ -0,0 +1,259 @@
.. default-role:: literal

.. role:: command-reference(ref)

.. program:: nextstrain run

.. _nextstrain run:

==============
nextstrain run
==============

.. code-block:: none

    usage: nextstrain run [options] <pathogen-name>[@<version>] <workflow-name> <analysis-directory> [<target> [<target> [...]]]
           nextstrain run --help

Runs a pathogen workflow in a Nextstrain runtime with config and input from an
analysis directory and outputs written to that same directory.

This command focuses on the routine running of existing pathogen workflows
(mainly provided by Nextstrain) using your own configuration, data, and other
supported customizations. Pathogens are initially set up using `nextstrain
setup` and can be updated over time as desired using `nextstrain update`.
Multiple versions of a pathogen may be set up and run independently without
conflict, allowing for comparisons of output across versions. The same
pathogen workflow may also be concurrently run multiple times with separate
analysis directories (i.e. different configs, input data, etc.) without
conflict, allowing for independent outputs and analyses.

Compared to `nextstrain build`, this command is a higher-level interface to
running pathogen workflows that does not require knowledge of Git or management
of pathogen repositories and source code. For now, the `nextstrain build`
command remains more suitable for active authorship and development of
workflows.

All Nextstrain runtimes are supported. For AWS Batch, all runs will detach
after submission and `nextstrain build` must be used to further monitor or
manage the run and download results after completion.

positional arguments
====================



.. option:: <pathogen-name>[@<version>]

The name (and optionally, version) of a previously set up pathogen.
See :command-reference:`nextstrain setup`. If no version is
specified, then the default version (if any) will be used.

Required.
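
For illustration, the ``<pathogen-name>[@<version>]`` argument can be split and
resolved against a default version roughly as follows. This is a sketch, not
the CLI's own code: ``resolve_pathogen`` is hypothetical, and the ``defaults``
mapping is an assumed stand-in for whatever ``nextstrain setup --set-default``
records.

```python
from __future__ import annotations

from typing import Mapping, Optional, Tuple


def resolve_pathogen(spec: str, defaults: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Split "<name>[@<version>]", falling back to a default version.

    `defaults` is a hypothetical mapping of pathogen name -> default version.
    """
    name, at, version = spec.partition("@")
    if not name:
        raise ValueError(f"missing pathogen name in {spec!r}")
    if at and not version:
        raise ValueError(f"empty version in {spec!r}")
    # Without an explicit version, use the default if one exists (else None).
    return name, (version if at else defaults.get(name))
```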

.. option:: <workflow-name>

The name of a workflow for the given pathogen, typically
``ingest``, ``phylogenetic``, or ``nextclade``.

Available workflows may vary per pathogen (and possibly between
pathogen versions). Some pathogens may provide multiple variants or
base configurations of a top-level workflow, as in
``phylogenetic/mpxv`` and ``phylogenetic/hmpxv1``. Refer to the
pathogen's own documentation for valid workflow names.

Workflow names conventionally correspond directly to directory
paths in the pathogen source, but this may not always be the case.

Required.
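
A minimal sketch of the naming convention described above, assuming a workflow
name is a relative directory path inside a local copy of the pathogen source.
``workflow_directory`` and its arguments are hypothetical; a given pathogen may
map workflow names to directories differently.

```python
from pathlib import Path


def workflow_directory(pathogen_source: Path, workflow_name: str) -> Path:
    """Resolve a workflow name such as "phylogenetic/mpxv" to a directory.

    Follows the naming convention only; a pathogen may map names differently.
    """
    root = pathogen_source.resolve()
    candidate = (root / workflow_name).resolve()
    # Reject names like "../other" or "." that do not point inside the source.
    if root not in candidate.parents:
        raise ValueError(f"invalid workflow name: {workflow_name!r}")
    if not candidate.is_dir():
        raise FileNotFoundError(f"no directory for workflow {workflow_name!r}")
    return candidate
```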

.. option:: <analysis-directory>

The path to your analysis directory. The workflow uses this as its
working directory for all local inputs and outputs, including
config files, input data files, resulting output data files, log
files, etc.

We recommend keeping your config files and static input files (e.g.
reference sequences, inclusion/exclusion lists, annotations, etc.)
in a version control system, such as Git, so you can keep track of
changes over time and recover previous versions. When using
version control, dynamic inputs (e.g. downloaded input files) and
outputs (e.g. resulting data files, log files, etc.) should
generally be marked as ignored/excluded from tracking, such as via
:file:`.gitignore` for Git.

An empty directory will be automatically created if the given path
does not exist but its parent directory does.

Required.
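
The directory-creation rule above (create the final directory, but never its
missing parents) corresponds to Python's ``Path.mkdir`` with ``parents=False``.
The sketch below relies on that; ``prepare_analysis_directory`` is a
hypothetical helper, not part of Nextstrain CLI.

```python
from pathlib import Path


def prepare_analysis_directory(path: Path) -> Path:
    """Create `path` if needed, but only when its parent already exists."""
    try:
        # parents=False makes a missing parent an error instead of silently
        # creating a whole chain of directories (e.g. from a mistyped path).
        # exist_ok=True accepts an already-existing directory as-is.
        path.mkdir(parents=False, exist_ok=True)
    except FileNotFoundError:
        raise FileNotFoundError(f"parent of {path} does not exist") from None
    return path
```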

.. option:: <target>

One or more workflow targets. A target is either a file path
(relative to :option:`<analysis-directory>`) produced by the
workflow or the name of a workflow rule or step.

Available targets will vary per pathogen (and between versions of
pathogens). Refer to the pathogen's own documentation for valid
targets.

Optional.

options
=======



.. option:: --force

Force a rerun of the whole workflow even if everything seems up-to-date.

.. option:: --cpus <count>

Number of CPUs/cores/threads/jobs to utilize at once. Limits containerized (Docker, AWS Batch) workflow runs to this amount. Informs Snakemake's resource scheduler when applicable. Informs the AWS Batch instance size selection. By default, no constraints are placed on how many CPUs are used by a workflow run; workflow runs may use all that are available if they're able to.

.. option:: --memory <quantity>

Amount of memory to make available to the workflow run. Units of b, kb, mb, gb, kib, mib, gib are supported. Limits containerized (Docker, AWS Batch) workflow runs to this amount. Informs Snakemake's resource scheduler when applicable. Informs the AWS Batch instance size selection.
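
A sketch of how a ``<quantity>`` such as ``8GiB`` might be converted to bytes,
and then to the whole mebibytes taken by ``--aws-batch-memory``. It assumes
decimal multiples for ``kb``/``mb``/``gb``, binary multiples for
``kib``/``mib``/``gib``, and case-insensitive units; the CLI's actual parser
may differ. ``parse_memory`` and ``to_mebibytes`` are hypothetical names.

```python
import re

# Assumed unit semantics: decimal kb/mb/gb and binary kib/mib/gib.
UNITS = {
    "b": 1,
    "kb": 1000, "mb": 1000**2, "gb": 1000**3,
    "kib": 1024, "mib": 1024**2, "gib": 1024**3,
}


def parse_memory(quantity: str) -> int:
    """Parse a quantity like "8GiB" or "500 mb" into bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s*", quantity.lower())
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"unrecognized memory quantity: {quantity!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit])


def to_mebibytes(n_bytes: int) -> int:
    """Whole MiB, rounded down, as taken by --aws-batch-memory."""
    return n_bytes // 1024**2
```

Under these assumptions, ``1gb`` and ``1gib`` differ by about 7%: the former
is 953 MiB after rounding down, the latter exactly 1024 MiB.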

.. option:: --exclude-from-upload <pattern>

Exclude files matching ``<pattern>`` from being uploaded as part of
the remote build. Shell-style advanced globbing is supported, but
be sure to escape wildcards or quote the whole pattern so your
shell doesn't expand them. May be passed more than once.
Currently only supported when also using :option:`--aws-batch`.
Default is to upload the entire pathogen build directory (except
for some ancillary files which are always excluded).

Note that files excluded from upload may still be downloaded from
the remote build, e.g. if they're created by it, and if downloaded
will overwrite the local files. When attaching to the build, use
:option:`nextstrain build --no-download` to avoid downloading any
files or :option:`nextstrain build --exclude-from-download` to
avoid downloading specific files.

Besides basic glob features like single-part wildcards (``*``),
character classes (``[…]``), and brace expansion (``{…, …}``),
several advanced globbing features are also supported: multi-part
wildcards (``**``), extended globbing (``@(…)``, ``+(…)``, etc.),
and negation (``!…``).

Patterns should be relative to the build directory.
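
To make the difference between single-part (``*``) and multi-part (``**``)
wildcards concrete, here is a simplified matcher that translates a pattern into
a regular expression and tests it against paths relative to the build
directory. It is an approximation, not the CLI's implementation: it omits brace
expansion, extended globbing, and negation, and ``glob_to_regex`` and
``is_excluded`` are hypothetical names.

```python
from __future__ import annotations

import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a basic glob into a regex for relative POSIX paths.

    Handles "*", "?", "[...]" and "**" only; brace expansion, extended
    globbing and negation are left out of this sketch.
    """
    out: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within a single path component
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        elif pattern[i] == "[" and (end := pattern.find("]", i + 2)) != -1:
            body = pattern[i + 1 : end].replace("\\", "\\\\")
            if body.startswith("!"):
                body = "^" + body[1:]  # glob "[!x]" is regex "[^x]"
            out.append(f"[{body}]")
            i = end + 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def is_excluded(path: str, patterns: list[str]) -> bool:
    """True if the relative POSIX `path` matches any of the patterns."""
    return any(glob_to_regex(p).match(path) for p in patterns)
```

With these rules, ``*.fasta`` matches only top-level FASTA files, while
``**/*.fasta`` matches them at any depth.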




.. option:: --help, -h

Show a brief help message of common options and exit

.. option:: --help-all

Show a full help message of all options and exit

runtime selection options
=========================

Select the Nextstrain runtime to use, if the
default is not suitable.

.. option:: --docker

Run commands inside a container image using Docker. (default)

.. option:: --conda

Run commands with access to a fully-managed Conda environment.

.. option:: --singularity

Run commands inside a container image using Singularity.

.. option:: --ambient

Run commands in the ambient environment, outside of any container image or managed environment.

.. option:: --aws-batch

Run commands remotely on AWS Batch inside the Nextstrain container image.

runtime options
===============

Options shared by all runtimes.

.. option:: --env <name>[=<value>]

Set the environment variable ``<name>`` to the value in the current environment (i.e. pass it thru) or to the given ``<value>``. May be specified more than once. Overrides any variables of the same name set via :option:`--envdir`. When this option or :option:`--envdir` is given, the default behaviour of automatically passing thru several "well-known" variables is disabled. The "well-known" variables are ``AUGUR_RECURSION_LIMIT``, ``AUGUR_MINIFY_JSON``, ``AWS_ACCESS_KEY_ID``, ``AWS_SECRET_ACCESS_KEY``, ``AWS_SESSION_TOKEN``, ``ID3C_URL``, ``ID3C_USERNAME``, ``ID3C_PASSWORD``, ``RETHINK_HOST``, and ``RETHINK_AUTH_KEY``. Pass those variables explicitly via :option:`--env` or :option:`--envdir` if you need them in combination with other variables.
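
The precedence rules for :option:`--env` can be summarized in code. In this
sketch, ``runtime_env`` and ``parse_env_option`` are hypothetical;
``envdir_vars`` is ``None`` when no :option:`--envdir` was given, and skipping
a bare ``<name>`` that is unset in the current environment is an assumption.

```python
from __future__ import annotations

from typing import Mapping, Optional

# The "well-known" variables that are passed thru only by default.
WELL_KNOWN = (
    "AUGUR_RECURSION_LIMIT", "AUGUR_MINIFY_JSON",
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN",
    "ID3C_URL", "ID3C_USERNAME", "ID3C_PASSWORD",
    "RETHINK_HOST", "RETHINK_AUTH_KEY",
)


def parse_env_option(arg: str, environ: Mapping[str, str]) -> tuple[str, Optional[str]]:
    """`NAME=value` sets a value; a bare `NAME` passes thru the current value."""
    name, eq, value = arg.partition("=")
    return name, (value if eq else environ.get(name))


def runtime_env(
    env_args: list[str],
    envdir_vars: Mapping[str, str] | None,
    environ: Mapping[str, str],
) -> dict[str, str]:
    if not env_args and envdir_vars is None:
        # Neither --env nor --envdir given: pass thru well-known variables.
        return {k: environ[k] for k in WELL_KNOWN if k in environ}
    result = dict(envdir_vars or {})
    for arg in env_args:
        name, value = parse_env_option(arg, environ)
        if value is not None:  # assumption: unset pass-thru names are skipped
            result[name] = value  # --env overrides --envdir
    return result
```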

.. option:: --envdir <path>

Set environment variables from the envdir at ``<path>``. May be specified more than once. An envdir is a directory containing files describing environment variables. Each filename is used as the variable name. The first line of the contents of each file is used as the variable value. When this option or :option:`--env` is given, the default behaviour of automatically passing thru several "well-known" variables is disabled. Envdirs may also be specified by setting ``NEXTSTRAIN_RUNTIME_ENVDIRS`` in the environment to a ``:``-separated list of paths. See the description of :option:`--env` for more details.
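
A sketch of reading envdirs as described above, plus collecting extra envdirs
from ``NEXTSTRAIN_RUNTIME_ENVDIRS``. The helpers are hypothetical. Three
behaviours here are assumptions rather than documented facts: an empty file
yields an empty value, paths from the environment variable come after those
from :option:`--envdir`, and later envdirs override earlier ones.

```python
from __future__ import annotations

from pathlib import Path
from typing import Mapping


def read_envdir(path: Path) -> dict[str, str]:
    """Map each file name in `path` to the first line of its contents."""
    env: dict[str, str] = {}
    for entry in sorted(path.iterdir()):
        if entry.is_file():
            lines = entry.read_text().splitlines()
            env[entry.name] = lines[0] if lines else ""  # assumed for empty files
    return env


def envdir_paths(cli_paths: list[Path], environ: Mapping[str, str]) -> list[Path]:
    """--envdir paths followed by any ":"-separated NEXTSTRAIN_RUNTIME_ENVDIRS."""
    extra = environ.get("NEXTSTRAIN_RUNTIME_ENVDIRS", "")
    return [*cli_paths, *(Path(p) for p in extra.split(":") if p)]


def merge_envdirs(paths: list[Path]) -> dict[str, str]:
    merged: dict[str, str] = {}
    for path in paths:
        merged.update(read_envdir(path))  # assumption: later envdirs win
    return merged
```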

development options
===================

These should generally be unnecessary unless you're developing Nextstrain.

.. option:: --image <image>

Container image name to use for the Nextstrain runtime (default: nextstrain/base for Docker and AWS Batch, docker://nextstrain/base for Singularity)

.. option:: --exec <prog>

Program to run inside the runtime

development options for --docker
================================



.. option:: --augur <dir>

Replace the image's copy of augur with a local copy

.. option:: --auspice <dir>

Replace the image's copy of auspice with a local copy

.. option:: --fauna <dir>

Replace the image's copy of fauna with a local copy

.. option:: --sacra <dir>

Replace the image's copy of sacra with a local copy

.. option:: --docker-arg ...

Additional arguments to pass to `docker run`

development options for --aws-batch
===================================

See <https://docs.nextstrain.org/projects/cli/page/aws-batch>
for more information.

.. option:: --aws-batch-job <name>

Name of the AWS Batch job definition to use

.. option:: --aws-batch-queue <name>

Name of the AWS Batch job queue to use

.. option:: --aws-batch-s3-bucket <name>

Name of the AWS S3 bucket to use as shared storage

.. option:: --aws-batch-cpus <count>

Number of vCPUs to request for job

.. option:: --aws-batch-memory <mebibytes>

Amount of memory in MiB to request for job

41 changes: 32 additions & 9 deletions doc/commands/setup.rst
@@ -12,15 +12,24 @@ nextstrain setup

.. code-block:: none

-    usage: nextstrain setup [-h] [--dry-run] [--force] [--set-default] <runtime>
+    usage: nextstrain setup [--dry-run] [--force] [--set-default] <pathogen-name>[@<version>[=<url>]]
+           nextstrain setup [--dry-run] [--force] [--set-default] <runtime-name>
+           nextstrain setup --help

-Sets up a Nextstrain runtime for use with `nextstrain build`, `nextstrain
-view`, etc.
+Sets up a Nextstrain pathogen for use with `nextstrain run` or a Nextstrain
+runtime for use with `nextstrain run`, `nextstrain build`, `nextstrain view`,
+etc.

-Only the Conda runtime currently supports automated set up, but this command
-may still be used with other runtimes to check an existing (manual) setup and
-set the runtime as the default on success.
+For pathogens, set up involves downloading a specific version of the pathogen's
+Nextstrain workflows. By convention, this download is from Nextstrain's
+repositories. More than one version of the same pathogen may be set up and
+used independently. This can be useful for comparing analyses across workflow
+versions. A default version can be set.
+
+For runtimes, only the Conda runtime currently supports fully-automated set up,
+but this command may still be used with other runtimes to check an existing
+(manual) setup and set the runtime as the default on success.

Exits with an error code if automated set up fails or if setup checks fail.

@@ -29,9 +38,23 @@ positional arguments



-.. option:: <runtime>
+.. option:: <pathogen>|<runtime>

+The Nextstrain pathogen or runtime to set up.
+
+A pathogen is usually the plain name of a Nextstrain-maintained
+pathogen (e.g. ``measles``), optionally with an ``@<version>``
+specifier (e.g. ``measles@v42``). If ``<version>`` is specified in
+this case, it must be a tag name (i.e. a release name), development
+branch name, or a development commit id.
+
+A pathogen may also be fully-specified as ``<name>@<version>=<url>``
+where ``<name>`` and ``<version>`` in this case are (mostly)
+arbitrary and ``<url>`` points to a ZIP file containing the
+pathogen workflow contents.
+
+A runtime is one of {docker, conda, singularity, ambient, aws-batch}.
+
-The Nextstrain runtime to set up. One of {docker, conda, singularity, ambient, aws-batch}.
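
A sketch of how the argument forms for ``nextstrain setup`` might be told
apart. It assumes runtime names are checked before pathogen names and that
``=<url>`` is only valid after ``@<version>``, as the usage line suggests;
``parse_setup_argument`` and ``PathogenSpec`` are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple, Optional

RUNTIMES = frozenset({"docker", "conda", "singularity", "ambient", "aws-batch"})


class PathogenSpec(NamedTuple):
    name: str
    version: Optional[str]
    url: Optional[str]


def parse_setup_argument(arg: str) -> str | PathogenSpec:
    """Classify an argument as a runtime name or a pathogen specifier."""
    if arg in RUNTIMES:  # assumption: runtime names are checked first
        return arg
    name, at, rest = arg.partition("@")
    if not name or "=" in name:
        raise ValueError(f"expected <name>[@<version>[=<url>]], got {arg!r}")
    # Split at the first "=" so a URL may itself contain "=" or "@".
    version, eq, url = rest.partition("=")
    if at and not version:
        raise ValueError(f"empty version in {arg!r}")
    if eq and not url:
        raise ValueError(f"empty URL in {arg!r}")
    return PathogenSpec(name, version or None, url or None)
```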

options
=======
@@ -52,5 +75,5 @@ options

.. option:: --set-default

-Use the runtime as the default if set up is successful.
+Use this pathogen version or runtime as the default if set up is successful.
