Description
I'm having issues getting the --group-in-memory flag to actually group nodes.
Context
I'm running kedro v0.19.11 and kedro-airflow v0.9.2, and trying to deploy a simple 2-node test pipeline to our internal Airflow. Using the --group-in-memory flag doesn't seem to be doing anything.
Steps to Reproduce
I have a simple test pipeline with 2 nodes. One fetches a file from a server, converts it to a DataFrame, and outputs as a MemoryDataset. The other node uses that DataFrame, does a simple group by with some stats, and dumps that out to a CSV.
I run kedro airflow create --target-dir=dags/ --env-airflow --group-in-memory to convert the pipeline into an Airflow DAG.
I should note that this is just a simple test to see if I can get kedro working with our Airflow deployment, so the nodes are just simple code snippets for testing purposes.
Expected Result
This could totally just be me misunderstanding the flag, but I expected those 2 nodes to be combined into one task in the DAG (since the output of the first node and the input to the second node are the same MemoryDataset).
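To make that expectation concrete, here is a minimal stdlib-only sketch of the grouping behaviour described above. It is not kedro-airflow's implementation: `Node`, `group_nodes`, and the node and dataset names (`fetch_file`, `summarise`, `raw_df`, `stats_csv`) are made up for illustration, and it assumes that any dataset without a catalog entry lives only in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a pipeline node (not kedro's Node class)."""

    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def group_nodes(nodes, persisted):
    """Group nodes that are linked by datasets missing from `persisted`.

    `persisted` is the set of dataset names that have a catalog entry.
    Every other dataset is assumed to exist only in memory.
    """
    # Union-find over node names: each node starts in its own group.
    parent = {n.name: n.name for n in nodes}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    # Map each dataset to the node that produces it.
    producer = {out: n.name for n in nodes for out in n.outputs}

    # Merge a consumer with its producer when the shared dataset is in-memory.
    for n in nodes:
        for ds in n.inputs:
            if ds in producer and ds not in persisted:
                parent[find(n.name)] = find(producer[ds])

    groups = {}
    for n in nodes:
        groups.setdefault(find(n.name), []).append(n.name)
    return sorted(groups.values())


nodes = [
    Node("fetch_file", inputs=(), outputs=("raw_df",)),
    Node("summarise", inputs=("raw_df",), outputs=("stats_csv",)),
]

# raw_df has no catalog entry, so both nodes should end up in one group.
grouped = group_nodes(nodes, persisted={"stats_csv"})
# If raw_df were persisted too, the nodes would stay separate.
ungrouped = group_nodes(nodes, persisted={"stats_csv", "raw_df"})
print(grouped)
print(ungrouped)
```

With the in-memory `raw_df` the two nodes collapse into a single group, which is what I expected to see as a single Airflow task.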
Actual Result
With or without the --group-in-memory flag, the resulting DAG file always has 2 tasks.
Your Environment
Include as many relevant details about the environment in which you experienced the bug:
Kedro version used (pip show kedro or kedro -V): v0.19.11
Kedro plugin and kedro plugin version used (pip show kedro-airflow): kedro-airflow v0.9.2
Python version used (python -V): 3.11.9
Operating system and version: Windows 10 Enterprise
I think I might have found the issue? If you look at the kedro_airflow.grouping._is_memory_dataset() function, it always returns False if the dataset name is not in the catalog (there's no check if the dataset is a MemoryDataset).
I think it should be something along the lines of:
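A minimal sketch of the kind of check meant here, not the plugin's actual code. `MemoryDataset` and `CSVDataset` below are bare stand-in classes so the snippet runs without kedro installed, the catalog is a plain dict, and `is_memory_dataset` is a hypothetical name. The assumption is that a dataset counts as in-memory either when it has no catalog entry at all or when its entry is a `MemoryDataset`.

```python
class MemoryDataset:
    """Stand-in for kedro.io.MemoryDataset so the sketch runs without kedro."""


class CSVDataset:
    """Stand-in for any persisted dataset type."""


def is_memory_dataset(catalog: dict, dataset_name: str) -> bool:
    """Return True if the dataset would only ever live in memory."""
    # Assumption: datasets with no catalog entry default to in-memory.
    if dataset_name not in catalog:
        return True
    # Datasets declared explicitly as MemoryDataset also count.
    return isinstance(catalog[dataset_name], MemoryDataset)


catalog = {"raw_df": MemoryDataset(), "stats_csv": CSVDataset()}
checks = {
    name: is_memory_dataset(catalog, name)
    for name in ("raw_df", "stats_csv", "unregistered")
}
print(checks)
```

The point is only that both cases (missing from the catalog, or explicitly a `MemoryDataset`) would need to return True for the grouping to kick in.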