This repository has been archived by the owner on Jul 18, 2024. It is now read-only.

zincware/dask4dvc

Note

The combination of Dask, Distributed, and the task of implementing DVC experiments made this project very convoluted. It will no longer be maintained; check out https://github.com/zincware/paraffin for a simpler version instead.

Dask4DVC - Distributed Node Execution

DVC provides tools for building and executing the computational graph locally through various methods. The dask4dvc package combines Dask Distributed with DVC to make DVC easier to use with HPC workload managers such as Slurm.

The dask4dvc repro command runs the DVC graph in parallel where possible. Currently, dask4dvc run will not run stages per experiment sequentially.

⚠️ This is an experimental package and is not affiliated in any way with Iterative or DVC.

Usage

Dask4DVC provides a CLI similar to DVC.

  • dvc repro becomes dask4dvc repro.
  • dvc queue start becomes dask4dvc run.

You can follow the progress using dask4dvc <cmd> --dashboard.
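The command mapping above can be sketched as a small lookup, which may be useful when wrapping the CLI in a script. This is only an illustration: the helper name `to_dask4dvc` and the `EQUIVALENTS` table are hypothetical and not part of dask4dvc. Only the two command pairs and the `--dashboard` flag come from this README.

```python
import shlex

# Command pairs listed in this README: DVC command -> dask4dvc command.
EQUIVALENTS = {
    ("dvc", "repro"): ("dask4dvc", "repro"),
    ("dvc", "queue", "start"): ("dask4dvc", "run"),
}


def to_dask4dvc(dvc_command: str, dashboard: bool = False) -> list[str]:
    """Translate a DVC command line into its dask4dvc argument list.

    Hypothetical helper: it only knows the two pairs above and
    deliberately does not forward any further DVC options.
    """
    key = tuple(shlex.split(dvc_command))
    if key not in EQUIVALENTS:
        raise ValueError(f"no dask4dvc equivalent known for: {dvc_command!r}")
    args = list(EQUIVALENTS[key])
    if dashboard:
        # Flag mentioned in the README for following the progress.
        args.append("--dashboard")
    return args
```

The returned list can be passed to `subprocess.run` in an environment where dask4dvc is installed.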

SLURM Cluster

You can use dask4dvc with a Slurm cluster. This requires a running Dask scheduler:

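from dask_jobqueue import SLURMCluster

# Each Slurm job provides one Dask worker with 1 core, 128 GB of memory and 1 GPU.
cluster = SLURMCluster(
    cores=1,
    memory="128GB",
    queue="gpu",
    processes=1,
    walltime="8:00:00",
    job_cpu=1,
    # Extra #SBATCH directives; newer dask_jobqueue releases name this
    # argument job_extra_directives.
    job_extra=["-N 1", "--cpus-per-task=1", "--tasks-per-node=64", "--gres=gpu:1"],
    # Fixed scheduler port so that dask4dvc can connect to it.
    scheduler_options={"port": 31415},
)
# Scale the number of workers up and down with the workload.
cluster.adapt()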

With this setup, you can then run dask4dvc repro --address 127.0.0.1:31415, where 31415 is the example port configured above.
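Because this command depends on a scheduler already listening on the given address, it can help to check the port before starting. The following is a minimal sketch using only the standard library. The helper names `scheduler_reachable` and `repro_command` are hypothetical, and the check only confirms that something accepts TCP connections on the port, not that it is a Dask scheduler.

```python
import socket


def scheduler_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def repro_command(host: str = "127.0.0.1", port: int = 31415) -> list[str]:
    """Build the argument list for `dask4dvc repro --address host:port`."""
    return ["dask4dvc", "repro", "--address", f"{host}:{port}"]


# Example use (requires dask4dvc to be installed, so it is not run here):
#   import subprocess
#   if scheduler_reachable("127.0.0.1", 31415):
#       subprocess.run(repro_command(), check=True)
```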

You can also use config files with dask4dvc repro --config myconfig.yaml. All dask.distributed Clusters should be supported.

default:
  SGECluster:
    queue: regular
    cores: 10
    memory: 16 GB
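One plausible reading of this layout is that the top-level key names a profile, the next key names a cluster class, and the innermost mapping holds its keyword arguments. The sketch below illustrates that reading only. It is an assumption about the structure, not a description of how dask4dvc parses the file, and the helper `split_cluster_config` is hypothetical.

```python
# The YAML above, written as the dict a YAML loader would produce.
config = {
    "default": {
        "SGECluster": {"queue": "regular", "cores": 10, "memory": "16 GB"},
    }
}


def split_cluster_config(config: dict, profile: str = "default") -> tuple[str, dict]:
    """Return (cluster_class_name, keyword_arguments) for one profile.

    Assumes each profile contains exactly one cluster class entry.
    """
    entries = config[profile]
    if len(entries) != 1:
        raise ValueError(f"expected one cluster class in profile {profile!r}")
    ((name, kwargs),) = entries.items()
    return name, dict(kwargs)


name, kwargs = split_cluster_config(config)

# With dask_jobqueue installed, the equivalent manual setup would be roughly:
#   import dask_jobqueue
#   cluster = getattr(dask_jobqueue, name)(**kwargs)
```

Under this reading, the example config corresponds to `SGECluster(queue="regular", cores=10, memory="16 GB")`.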

dask4dvc repro