gtauzin/kedro-dagster

Kedro-Dagster

Important: This package is under active development but is not yet ready for production.

What is Kedro-Dagster?

The Kedro-Dagster plugin enables seamless integration between Kedro, a framework for creating reproducible and maintainable data science code, and Dagster, a data orchestrator for machine learning and data pipelines. This plugin makes use of Dagster's orchestration capabilities to automate and monitor Kedro pipelines effectively.

What are the features of Kedro-Dagster?

  • Dataset Translation: Converts Kedro datasets into Dagster assets and IO managers, facilitating smooth data handling between the two frameworks.
  • Pipeline Translation: Transforms Kedro pipelines into Dagster jobs, enabling their execution and scheduling.
  • Configuration-Driven Execution and Automation: Utilizes Kedro's configuration to specify job executors and define schedules, allowing for flexible and dynamic pipeline management.
  • Hook Support: Preserves Kedro hooks within the Dagster context, ensuring that custom behaviors and plugins are maintained during pipeline execution.
  • Logger Integration: Integrates Kedro's logging with Dagster's logging system, providing unified and comprehensive logging across both platforms.

How to install Kedro-Dagster?

Install the Kedro-Dagster plugin using pip:

pip install kedro-dagster

How to get started with Kedro-Dagster?

  1. Initialize the Plugin in Your Kedro Project:

    Navigate to your Kedro project directory and install the plugin:

    pip install kedro-dagster
  2. Generate Dagster Definitions and Configuration:

    Use the following command to generate a definitions.py file, where all translated Kedro objects are available as Dagster objects, and a dagster.yml configuration file:

    kedro dagster init --env <ENV_NAME>
  3. Configure Jobs, Executors, and Schedules:

    Define your job executors and schedules in the dagster.yml configuration file located in your Kedro project's conf/<ENV_NAME> directory. This file allows you to filter Kedro pipelines and assign specific executors and schedules to them.

    # conf/base/dagster.yml
    schedules:
      my_job_schedule:
        cron_schedule: "0 0 * * *"
    executors:
      my_executor:
        retries: 3
    jobs:
      my_job:
        pipeline:
          pipeline_name: __default__
        executor: my_executor
        schedule: my_job_schedule
    
  4. Launch the Dagster UI:

    Start the Dagster UI to monitor and manage your pipelines using the following command:

    kedro dagster dev
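
In the dagster.yml example from step 3, each job refers to an executor and a schedule by name, so a typo in one of those names is an easy mistake to make. The following is a minimal sketch of a sanity check for such references. It is not part of Kedro-Dagster: the helper `check_job_references` is hypothetical, and it assumes the file has already been loaded into a plain dictionary (for example with PyYAML's `yaml.safe_load`) with the same three top-level keys as the example.

```python
from typing import Any


def check_job_references(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a dagster.yml-like mapping.

    Hypothetical helper for illustration only; the plugin may validate
    its configuration differently.
    """
    executors = config.get("executors") or {}
    schedules = config.get("schedules") or {}
    problems = []
    for job_name, job in (config.get("jobs") or {}).items():
        # A job may omit either key; only check names that are present.
        executor = job.get("executor")
        if executor is not None and executor not in executors:
            problems.append(f"job '{job_name}': unknown executor '{executor}'")
        schedule = job.get("schedule")
        if schedule is not None and schedule not in schedules:
            problems.append(f"job '{job_name}': unknown schedule '{schedule}'")
    return problems


# Same content as the conf/base/dagster.yml example, written as a dict.
example_config = {
    # "0 0 * * *" is the cron expression for "every day at midnight".
    "schedules": {"my_job_schedule": {"cron_schedule": "0 0 * * *"}},
    "executors": {"my_executor": {"retries": 3}},
    "jobs": {
        "my_job": {
            "pipeline": {"pipeline_name": "__default__"},
            "executor": "my_executor",
            "schedule": "my_job_schedule",
        }
    },
}

print(check_job_references(example_config))
```

An empty list means every job points at an executor and a schedule that are defined in the same file. Running a check like this before `kedro dagster dev` is one possible way to catch naming mistakes early.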

How do I use Kedro-Dagster?

The Kedro-Dagster documentation will be available soon. Stay tuned!

Can I contribute?

Yes! We welcome all kinds of contributions. Check out our [guide to contributing](https://github.com/kedro-org/kedro/wiki/Contribute-to-Kedro).

Where can I learn more?

There is a growing community around the Kedro project and we encourage you to become part of it. You can ask and answer technical questions on the Kedro Slack and bookmark the Linen archive of past discussions. For questions related specifically to Kedro-Dagster, you can also open a discussion.

License

This project is licensed under the terms of the Apache 2.0 License.

Acknowledgements

This plugin is inspired by existing Kedro plugins such as the official Kedro plugins, kedro-kubeflow, and kedro-mlflow.