Add new archiver for EPA eGRID #549
I do think we want those two methodology PDFs to get added to their respective years, even if we're hard-coding the file names for now (they don't seem to follow a clear year pattern). It doesn't look like the NH3 data etc. are getting included in the 2021 file when I run this right now.
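Since the file names don't follow a clear year pattern, one minimal way to hard-code them for now is a per-year lookup table. This is a sketch under assumptions: only the 2020 PM memo URL comes from this thread, and the `ExtraFile` type, the `EXTRA_METHOD_FILES` table, the `extra_files_for_year` function and the local filename are hypothetical names for illustration.

```python
from typing import NamedTuple


class ExtraFile(NamedTuple):
    """One hard-coded file to add to a given year's ZIP (hypothetical structure)."""

    url: str
    filename: str


# Hypothetical lookup table. Only the 2020 PM memo URL comes from this PR thread;
# other years would get entries as their methodology PDF URLs are identified.
EXTRA_METHOD_FILES: dict[int, list[ExtraFile]] = {
    2020: [
        ExtraFile(
            url=(
                "https://www.epa.gov/system/files/documents/2022-12/"
                "eGRID2020%20DRAFT%20PM%20Memo.pdf"
            ),
            filename="epaegrid-2020-pm-methodology.pdf",
        ),
    ],
}


def extra_files_for_year(year: int) -> list[ExtraFile]:
    """Return a copy of the hard-coded extra files for ``year`` (empty if none)."""
    return list(EXTRA_METHOD_FILES.get(year, []))
```

Keeping the table separate from the download loop means adding another year's PDF is a one-line data change rather than a new code branch.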
Co-authored-by: E. Belfer <[email protected]>
…ver into epaegrid
```python
async def _download_add_unlink(self, link: str, filename: str, zip_path: str):
    """Download the file, add it to a ZIP file in the archive, and unlink it.

    Little helper function because we use this same pattern several times
    for this dataset within :meth:`get_year_resource`: the data is spread
    across several pages or follows bespoke patterns.
    """
    download_path = self.download_directory / filename
    await self.download_file(link, download_path)
    # Close the file handle before unlinking so the file can be deleted cleanly
    with download_path.open("rb") as blob:
        self.add_to_archive(
            zip_path=zip_path,
            filename=filename,
            blob=blob,
        )
    # Don't want to leave multiple files on disk, so delete
    # immediately after they're safely stored in the ZIP
    download_path.unlink()
```
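To try the same download, add-to-ZIP, unlink sequence outside the archiver class, here's a standalone sketch using only the standard library. It assumes a caller-supplied async `download` callable standing in for `self.download_file`, and it writes into the ZIP with `zipfile` directly rather than through `add_to_archive`; the function name `download_add_unlink` and the `Downloader` alias are hypothetical.

```python
import zipfile
from collections.abc import Awaitable, Callable
from pathlib import Path

# Hypothetical downloader type: takes a URL and a destination path and writes
# the file there (stands in for the archiver's own download_file method).
Downloader = Callable[[str, Path], Awaitable[None]]


async def download_add_unlink(
    download: Downloader,
    download_dir: Path,
    link: str,
    filename: str,
    zip_path: Path,
) -> None:
    """Download ``link``, append it to ``zip_path`` as ``filename``, then delete it."""
    download_path = download_dir / filename
    await download(link, download_path)
    # Append mode keeps files already stored in this year's ZIP (and creates
    # the ZIP if it doesn't exist yet).
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(download_path, arcname=filename)
    # Only delete the local copy once it is safely stored in the ZIP.
    download_path.unlink()
```

Passing the downloader in as a parameter keeps the helper easy to exercise with a fake that just writes bytes to disk.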
Because adding the bespoke step that fetches the PM methodology PDFs introduced a third copy of this pattern, I pulled it up into its own little helper function.
Non-blocking: Do you want to just add this in `helpers.py`? Seems like some version of this also wound up in #534, and maybe it just wants to be a common shared function.
Okay, I moved it! I assume you meant I should add this to the base class in `classes.py`.
Okay @e-belfer, if these look good to you, I will publish, then add the DOIs, add this to the script docs, and move this module into the `epa` directory.
- 2020 data is still missing a PM methodology PDF: https://www.epa.gov/system/files/documents/2022-12/eGRID2020%20DRAFT%20PM%20Memo.pdf
- 2021 is still missing the NH3 and VOC data
Otherwise, everything looks as expected.
Good catches! I added the methodology PDF manually and fixed the pattern to catch the 2021 other-emissions files (NH3, VOC). New draft archives here:
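A tolerant regex is one plausible way to catch files like the 2021 other-emissions workbook even when upstream naming varies (spaces vs. underscores, an optional "draft" marker). This is a sketch only: the `other_emissions_pattern` function and the naming scheme it assumes are hypothetical, not the actual eGRID links or the pattern used in this PR.

```python
import re

# Separators commonly seen in web file names: space, hyphen, underscore,
# or a URL-encoded space.
_SEP = r"(?:[-_ ]|%20)"


def other_emissions_pattern(year: int) -> re.Pattern[str]:
    """Build a case-insensitive pattern for a year's "other emissions" workbook.

    Hypothetical naming scheme: ``eGRID<year>``, an optional ``draft`` marker,
    then ``other emissions``, ending in ``.xlsx``.
    """
    return re.compile(
        rf"egrid{year}{_SEP}(?:draft{_SEP})?other{_SEP}emissions?.*\.xlsx$",
        re.IGNORECASE,
    )
```

Making the `draft` marker optional means the same pattern would keep matching if a later release drops that label.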
Some non-blocking questions, but everything looks ok to me!
```python
pm_combo_years = [2018, 2019, 2020, 2021]
if year in pm_combo_years:
    url = "https://www.epa.gov/system/files/documents/2024-06/egrid-draft-pm-emissions.xlsx"
    filename = f"epaegrid-{year}-pm-emissions.xlsx"
```
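The same year check wrapped in a function makes it easy to see which years pick up the combined PM workbook and what each copy is called. The function name `pm_emissions_resource` and the constant names are hypothetical; the URL, the year list and the filename pattern are copied from the snippet in this PR.

```python
from typing import Optional

# Copied from the PR snippet: these years all point at the same PM emissions workbook.
PM_COMBO_YEARS = frozenset({2018, 2019, 2020, 2021})
PM_COMBO_URL = (
    "https://www.epa.gov/system/files/documents/2024-06/egrid-draft-pm-emissions.xlsx"
)


def pm_emissions_resource(year: int) -> Optional[tuple[str, str]]:
    """Return ``(url, filename)`` for the shared PM workbook, or None if not covered.

    Each covered year's ZIP gets its own copy under a year-specific name that
    leaves out the upstream "draft" label.
    """
    if year not in PM_COMBO_YEARS:
        return None
    return PM_COMBO_URL, f"epaegrid-{year}-pm-emissions.xlsx"
```

Using a `frozenset` instead of a list is only a style choice for membership checks; with four years either works.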
Non-blocking: Do we want to keep `draft` in this filename?
For some reason, many of the PM files have `draft` in their names, so I decided not to systematically keep it in these filenames.
Overview
Closes #517.
What problem does this address?
What did you change in this PR?
Current working sandbox deposition:
https://sandbox.zenodo.org/uploads/159880
https://zenodo.org/uploads/14765659
Question:
Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list
Tasks