Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it possible to pass another sources dict to DataSource #4003

Merged
merged 2 commits into from
Jan 7, 2025

Conversation

e-belfer
Copy link
Member

@e-belfer e-belfer commented Jan 7, 2025

Overview

Addresses a lightweight version of catalyst-cooperative/pudl-archiver#499 in concert with catalyst-cooperative/pudl-archiver#506.

What problem does this address?

In the pudl-archiver repository, we want to make it possible to archive datasets that will never end up in the PUDL repository. Eventually, we'll want to fully separate these two repositories, which raises bigger questions about where the metadata should be defined, etc. But for now, we want to make it possible to add a dictionary of NON_PUDL_SOURCES to the pudl-archiver repository, to be called in a method similar to from_pudl_metadata in frictionless.py.

What did you change?

This method imports the DataSource.from_id() method from PUDL, which currently hardcodes SOURCES into the method. In order to make it possible to provide an alternative dictionary of source metadata into this method, I have added a sources parameter to this method which defaults to SOURCES, but can take a different SOURCE dictionary.

Documentation

Make sure to update relevant aspects of the documentation.

Tasks

Preview Give feedback

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

Preview Give feedback

@e-belfer e-belfer self-assigned this Jan 7, 2025
@e-belfer e-belfer added datapkg Frictionless data package input, output, metadata, manipulation metadata Anything having to do with the content, formatting, or storage of metadata. Mostly datapackages. labels Jan 7, 2025
@@ -1035,20 +1035,22 @@ def to_rst(
sys.stdout.write(rendered)

@classmethod
def from_field_namespace(cls, x: str) -> list["DataSource"]:
def from_field_namespace(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't get called in pudl-archiver, so I could revert this change. Just figured for consistency it'd make sense to have the same behavior as below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think consistency here is good

@@ -1285,6 +1287,7 @@ class Resource(PudlMeta):
"pudl",
"nrelatb",
"vcerare",
"phmsagas",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! This crossed over from another branch, I'll remove this.

Copy link
Member

@zschira zschira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a reasonable/simple solution to avoid cluttering our main metadata collection

@e-belfer e-belfer marked this pull request as ready for review January 7, 2025 18:01
@e-belfer e-belfer enabled auto-merge January 7, 2025 18:08
@e-belfer e-belfer added this pull request to the merge queue Jan 7, 2025
Merged via the queue into main with commit db823c1 Jan 7, 2025
17 checks passed
@e-belfer e-belfer deleted the generalize-sources-for-data-source branch January 7, 2025 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datapkg Frictionless data package input, output, metadata, manipulation metadata Anything having to do with the content, formatting, or storage of metadata. Mostly datapackages.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants