Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix catalog._datasets access for KedroDataCatalog #4438

Merged
merged 12 commits into from
Jan 27, 2025

Conversation

ElenaKhaustova
Copy link
Contributor

@ElenaKhaustova ElenaKhaustova commented Jan 23, 2025

Description

Solves #4436

Development notes

After we implemented lazy dataset loading for KedroDataCatalog, we differed internal attributes _datasets and _lazy_datasets and started tracking them separately. At the same time, for old DataCatalog, we extensively used the internal _datasets attribute across the codebase, which caused a compatibility issue as the _datasets attribute stored a different set of datasets for old (all datasets) and new catalogs (all materialized datasets).

This behaviour didn't affect kedro run as we did a warm-up before the run, so needed datasets were materialized and there was no discrepancy.

We fixed that at the KedroDataCatalog level and left TODOs for when we switch to the new catalog completely:

  1. Added __datasets property for KedroDataCatalog, so we could still differ materialized and lazy datasets at the catalog level
  2. Made KedroDataCatalog.datatest return both materialized and lazy datasets
  3. Updated __getattribute__ for KedroDataCatalog, so if someone calls KedroDataCatalog._datatest they get the same result as KedroDataCatalog.datatest. We could add just _datasets property but for that, we needed to modify CatalogProtocol which we preferred to avoid.
  4. Added a test to check that when having materialized and lazy datasets both KedroDataCatalog.datatest and KedroDataCatalog._datatest return the same.

Note: These changes are temporary and needed for compatibility with the old catalog. They will be removed when migrating to the new catalog; corresponding TODOs were left.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

@ElenaKhaustova ElenaKhaustova marked this pull request as ready for review January 23, 2025 11:59
@ElenaKhaustova ElenaKhaustova mentioned this pull request Jan 24, 2025
4 tasks
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a test to check that kedro catalog list now works?

Otherwise LGTM 👍

kedro/io/kedro_data_catalog.py Show resolved Hide resolved
@ElenaKhaustova
Copy link
Contributor Author

@merelcht

Perhaps add a test to check that kedro catalog list now works?

I think we might need to tackle this for all commands together. Now all catalog CLI commands unit tests only run on the old catalog.

I tried it, but because of the way catalog CLI tests are organized, I couldn't do it in a reasonable time. I think we need to create a separate ticket to address that, so this one doesn't block the release in case it takes time.

@merelcht
Copy link
Member

I tried it, but because of the way catalog CLI tests are organized, I couldn't do it in a reasonable time. I think we need to create a separate ticket to address that, so this one doesn't block the release in case it takes time.

Sounds good. Can you create the ticket?

@ElenaKhaustova
Copy link
Contributor Author

Sounds good. Can you create the ticket?

#4442

Copy link
Contributor

@lrcouto lrcouto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ElenaKhaustova ElenaKhaustova enabled auto-merge (squash) January 27, 2025 18:48
@ElenaKhaustova ElenaKhaustova merged commit a098829 into main Jan 27, 2025
42 checks passed
@ElenaKhaustova ElenaKhaustova deleted the fix/4436-catalog-list branch January 27, 2025 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants