Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESA_EHST_XMM: retrieve metadata and data from program, TAP: download in streaming, XMM: updated RMF matrices #2910

Merged
merged 10 commits into from
Jan 25, 2024

Conversation

jespinosaar
Copy link
Contributor

@jespinosaar jespinosaar commented Dec 29, 2023

Dear Astroquery team,

This PR aims to improve two modules:

  • TAP: downloads are now executed in streaming mode. This will allow us to download large files without a huge memory consumption.
  • eHST Module: we have included two new methods to retrieve observations and data associated to a specific program.
  • XMM Module: RMF matrices updated to v21.

Thanks a lot in advance for your support!

Kind regards.
Javi
cc @esdc-esac-esa-int

@pep8speaks
Copy link

pep8speaks commented Dec 29, 2023

Hello @jespinosaar! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-01-25 08:10:54 UTC

Copy link

codecov bot commented Dec 29, 2023

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (a725c7a) 67.23% compared to head (f9fea1c) 67.33%.
Report is 38 commits behind head on main.

❗ Current head f9fea1c differs from pull request most recent head 2d7474d. Consider uploading reports for the commit 2d7474d to get more accurate results

Files Patch % Lines
astroquery/utils/tap/core.py 16.66% 5 Missing ⚠️
astroquery/esa/hubble/core.py 93.54% 2 Missing ⚠️
astroquery/esa/xmm_newton/core.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2910      +/-   ##
==========================================
+ Coverage   67.23%   67.33%   +0.09%     
==========================================
  Files         235      237       +2     
  Lines       18155    18163       +8     
==========================================
+ Hits        12207    12230      +23     
+ Misses       5948     5933      -15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jespinosaar
Copy link
Contributor Author

jespinosaar commented Dec 29, 2023

Great! it seems all the issues are now solved. Waiting for your comments.

@jespinosaar jespinosaar changed the title ESA_EHST: retrieve metadata and data from program, TAP: download in streaming ESA_EHST_XMM: retrieve metadata and data from program, TAP: download in streaming, XMM: updated RMF matrices Jan 4, 2024
Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jespinosaar!

I left a few comments, but most of them are meant to be a discussion starter rather than a blocker for merging, so please do share your insights. My overall motivation is to eventually converge towards using the same method names throughout the modules for the very similar functionalities, and to avoid over-complicating the API with too many methods that are in fact just calling one other method without much else. But each archive is different enogh that this desire sometimes doesn't make sense.

@@ -63,7 +64,7 @@ def __init__(self, *, tap_handler=None, show_messages=True):
self.get_status_messages()

def download_product(self, observation_id, *, calibration_level=None,
filename=None, verbose=False, product_type=None):
filename=None, folder=os.getcwd(), verbose=False, product_type=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit odd, I would rather default to None that means current directory (I know I'm contradicting myself as normally I advocate for being explicit about defaults in the signature)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have replaced os.getcwd with None. For the output_folders, I created a new method so, in case folder is None, only the filename is returned, so it is downloaded in the current directory. If it is provided, then using os.path we get the absolute path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid the change hasn't been committed for this method

CHANGES.rst Outdated
@@ -56,6 +56,11 @@ esa.hubble
- Status and maintenance messages from eHST TAP when the module is instantiated. get_status_messages method to retrieve them. [#2597]
- New methods to download single files ``download_file`` and download FITS associated to an observation ``download_fits_files``. [#2797]
- New function to retrieve all the files associated to an observation. [#2797]
- New methods to retrieve metadata and files associated to a proposal. [#2910]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mention the name of the new methods here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New method names included.

@@ -215,6 +219,85 @@ def get_observation_type(self, observation_id):
raise ValueError("Invalid Observation ID")
return obs_type

def get_observations_from_program(self, program, *, output_file=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether there is indeed any need for this convenience method at the price of complicating the API, as all it does is to pass on the parameters to query_criteria (maybe a well written example in the docs would be be sufficient)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have created this method based on feedback from users. They wanted a specific, clearly defined function to retrieve metadata and files from a proposal. This is why we have included it. Is it ok with you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, it sounds good.


Returns
-------
None. It downloads the artifact indicated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see doesn't, it returns a Table

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it returns a table. Copy/paste problem! It has been fixed now.

return self.query_criteria(proposal=program, output_file=output_file,
output_format=output_format, async_job=False, verbose=verbose)

def download_files_from_program(self, program, *, folder=os.getcwd(), calibration_level=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using a more generic name, such as download_files or download_products and making program optional? (e.g. allowing the download of any results of query_criteria, as I see the only difference in the signature is that this mandates program to be specified, while it's optional for query_criteria)

Copy link
Contributor Author

@jespinosaar jespinosaar Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before, users wanted an explicit method. If it is not ok with you, please let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think it's better to try not to have too many methods that only differ minimally and use more parameters if needed (good for removing code duplications, etc.). However, it's always difficult to say where the line is especially if there is already a user request for a functionality.
So it's all good for now.

@@ -239,7 +322,7 @@ def _get_product_filename(self, product_type, filename):
else:
return f"{filename}.zip"

def get_artifact(self, artifact_id, *, filename=None, verbose=False):
def get_artifact(self, artifact_id, *, filename=None, folder=os.getcwd(), verbose=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here (and everywhere else below, too, maybe use None as default that defaults to current dir

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed and replaced with None.

@@ -39,6 +39,21 @@
__all__ = ['Jwst', 'JwstClass']


def _check_rename_to_gz(filename):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether it would make sense to have some esa utilities somewhere that lets some shared functions to be reused rather than repeated? (certainly not required for this PR, I'm just wondering about it). It could live either under esa/utils or utils/esa. I suppose this, and some of the test mock classes can be dumped in there for sure, but maybe there is more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this from ESA HST and JWST modules and created a new one under esa/utils. It is included in my latest commits.

docs/esa/hubble/hubble.rst Outdated Show resolved Hide resolved
@jespinosaar
Copy link
Contributor Author

Many thanks for your inputs @bsipocz!

I have implemented all the changes and answered all the comments. The objective of the new methods is to offer to our users a explicit method to download data from a program. It is true this can be achieved using other methods, but we wanted to make it easier for users to find it (and they even requested it). But please, let me know if you don't agree with this approach and we will iterate.

Please let me know if further changes are required.

Kind regards,
Javi

P.D. happy new year!

@jespinosaar
Copy link
Contributor Author

Hi @bsipocz , is it ok with you if we keep the new methods? Thanks!

@jespinosaar
Copy link
Contributor Author

Hi all, we need to plan if more effort is required on this PR. Could you please let us know if more changes are required? Thanks in advance!

@bsipocz
Copy link
Member

bsipocz commented Jan 25, 2024

Hi all, we need to plan if more effort is required on this PR.

Sorry for being slow on coming back for the review, it wasn't held up on purpose.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good now with one super minor oversight.

Thanks!

@@ -63,7 +64,7 @@ def __init__(self, *, tap_handler=None, show_messages=True):
self.get_status_messages()

def download_product(self, observation_id, *, calibration_level=None,
filename=None, verbose=False, product_type=None):
filename=None, folder=os.getcwd(), verbose=False, product_type=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid the change hasn't been committed for this method

@jespinosaar
Copy link
Contributor Author

This looks good now with one super minor oversight.

Thanks!

Great! I have fixed it and updated the PR (there was another remote test that was failing, due to changes in our DB).

@jespinosaar
Copy link
Contributor Author

Hi all, we need to plan if more effort is required on this PR.

Sorry for being slow on coming back for the review, it wasn't held up on purpose.

Please do not worry! I was just planning our next sprint. And thanks for looking into this.

@bsipocz bsipocz merged commit 110cb83 into astropy:main Jan 25, 2024
7 of 8 checks passed
@jespinosaar
Copy link
Contributor Author

As always, many thanks @bsipocz !!!

@jespinosaar jespinosaar deleted the ESA_ehst-proposal_features branch January 25, 2024 08:23
@jespinosaar
Copy link
Contributor Author

Will a pre-released version be generated?

@bsipocz
Copy link
Member

bsipocz commented Jan 25, 2024

Yeap, I can do that. And I really need to wrap up a few things and get 0.4.7 out of the door and then switch to a better versioning with more frequent releases.

@bsipocz
Copy link
Member

bsipocz commented Jan 25, 2024

https://pypi.org/project/astroquery/0.4.7.dev9064/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants