Plugin developers and advanced users face limitations due to the absence of public methods for modifying catalog datasets and injecting dynamic behaviour or configuration parameters on the fly during pipeline execution. Although these limitations are intentional, enforced by not providing corresponding public APIs, users bypass them by using private APIs.
We propose to:

- Rethink the concept of keeping DataCatalog immutable.
- Explore the feasibility of providing a public API for modifying the catalog datasets and configuration parameters, enabling users to adapt the pipeline's behaviour in response to changing runtime requirements or environmental conditions.
Users need the ability to view and modify information within the Data Catalog dynamically during pipeline execution. This includes injecting dynamic data or swapping dataset implementations to accommodate varying runtime requirements.
Plugin developers are interested in checking a dataset's type and injecting dynamic behaviour based on it: they want to determine whether a dataset belongs to a certain class or type and then modify its parameters or behaviour accordingly, for example configuring it based on their environment or integration needs.
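As an illustration of this pattern, a plugin hook might branch on dataset type and adjust configuration accordingly. This is a plain-Python sketch: the dataset classes and the hook function here are hypothetical stand-ins, not Kedro's actual API.

```python
class CSVDataset:
    """Hypothetical file-based dataset."""
    def __init__(self, filepath):
        self.filepath = filepath


class SQLDataset:
    """Hypothetical database-backed dataset."""
    def __init__(self, table, credentials=None):
        self.table = table
        self.credentials = credentials


def after_catalog_created(catalog):
    """Hypothetical hook: inject environment-specific config by dataset type."""
    for name, dataset in catalog.items():
        if isinstance(dataset, SQLDataset):
            # Only SQL-backed datasets get credentials swapped in.
            dataset.credentials = {"env": "staging"}


catalog = {"cars": CSVDataset("cars.csv"), "orders": SQLDataset("orders")}
after_catalog_created(catalog)
```

This is exactly the kind of type-based mutation that currently requires reaching into private attributes when done against the real catalog.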
There's general agreement that we don't necessarily want to make all mutations of the catalog easy (such as arbitrary injection of datasets in the middle of the lifecycle), but there may be more ways we can open up the collection of datasets just before the catalog is first instantiated for the rest of the run.
For interactive use, on the other hand, building the DataCatalog in an imperative way seems unnecessary, and there are other possibilities we can offer: see #3612 (comment)
In the new catalog, KedroDataCatalog, we implemented a dict-like interface and removed _FrozenDatasets, along with property-style access to datasets.
The new catalog is partially mutable: it supports a setter that allows adding new datasets or replacing existing ones.
The team also decided not to make the catalog fully mutable. The datasets property remains private so as not to encourage users to configure the catalog by modifying the datasets dictionary directly. For the same reason, KedroDataCatalog will not support all dictionary-specific methods, such as pop(), popitem(), or deletion by key (del).
It is also possible to modify existing datasets in place, since the get() method returns a reference to the dataset object, but we do not recommend this and encourage users to be careful: such changes might affect the pipeline run and lead to unexpected results, as the framework does not track these kinds of changes and does not synchronize them.
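The reference-returning behaviour, and why the framework cannot observe such edits, can be shown with an ordinary mapping of dataset objects (a plain-Python sketch with a hypothetical Dataset class, not Kedro code):

```python
class Dataset:
    """Hypothetical dataset with a single filepath attribute."""
    def __init__(self, filepath):
        self.filepath = filepath


datasets = {"cars": Dataset("data/cars.csv")}

# get() returns a reference to the stored object, not a copy...
ref = datasets.get("cars")

# ...so mutating it changes the object inside the mapping, while the
# mapping itself is never touched and nothing can "notice" the change.
ref.filepath = "s3://bucket/cars.csv"
```

Any later consumer reading `datasets["cars"].filepath` now silently sees the new value, which is precisely why in-place mutation of catalog datasets is discouraged.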
Relates to #2728
Context
https://github.com/Galileo-Galilei/kedro-mlflow/blob/64b8e94e1dafa02d979e7753dab9b9dfd4d7341c/kedro_mlflow/framework/hooks/mlflow_hook.py#L145
https://github.com/getindata/kedro-azureml/blob/d5c2011c7ed7fdc03235bf2bd6701f1901d1139c/kedro_azureml/hooks.py#L20