Fix `catalog._datasets` access for `KedroDataCatalog` #4438
Signed-off-by: Elena Khaustova <[email protected]>
Perhaps add a test to check that `kedro catalog list` now works?
Otherwise LGTM 👍
I think we might need to tackle this for all commands together. Right now, the unit tests for all catalog CLI commands run only against the old catalog. I tried to change that, but because of the way the catalog CLI tests are organized, I couldn't do it in a reasonable time. I think we need to create a separate ticket to address that, so this one doesn't block the release in case it takes time.
Sounds good. Can you create the ticket?
LGTM!
Description
Solves #4436
Development notes
After we implemented lazy dataset loading for `KedroDataCatalog`, we separated the internal attributes `_datasets` and `_lazy_datasets` and started tracking them independently. At the same time, for the old `DataCatalog`, we used the internal `_datasets` attribute extensively across the codebase. This caused a compatibility issue, because `_datasets` stored a different set of datasets for the old catalog (all datasets) and the new catalog (only materialized datasets).

This behaviour didn't affect `kedro run`, because we did a warm-up before the run, so the needed datasets were materialized and there was no discrepancy.

We fixed that at the `KedroDataCatalog` level and left TODOs for when we switch to the new catalog completely:

- `__datasets` property for `KedroDataCatalog`, so we can still distinguish materialized and lazy datasets at the catalog level.
- `KedroDataCatalog.datasets` returns both materialized and lazy datasets.
- `__getattribute__` for `KedroDataCatalog`, so if someone calls `KedroDataCatalog._datasets`, they get the same result as from `KedroDataCatalog.datasets`. We could have added just a `_datasets` property, but that would have required modifying `CatalogProtocol`, which we preferred to avoid.
- `KedroDataCatalog.datasets` and `KedroDataCatalog._datasets` return the same result.

Note: These changes are temporary and needed for compatibility with the old catalog. They will be removed when migrating to the new catalog; corresponding TODOs were left.
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a `Signed-off-by` line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
`RELEASE.md` file