Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESO: decompressing .fits.Z files is broken #1818

Closed
saimn opened this issue Sep 10, 2020 · 11 comments
Closed

ESO: decompressing .fits.Z files is broken #1818

saimn opened this issue Sep 10, 2020 · 11 comments

Comments

@saimn
Copy link
Contributor

saimn commented Sep 10, 2020

Folowing #1613, and while looking at reading .fits.Z with astropy (astropy/astropy#10714) I was curious to see how astroquery managed those files.

@bsipocz bsipocz added the bug label Sep 10, 2020
@almicol
Copy link

almicol commented Feb 3, 2022

Dear astroquery maintainers, I'd like to know if you have any advise on this issue, as we at ESO are getting several tickets of users with exactly the same problem. Is there any recommendation that I should give to our users? Many thanks in advance!

@keflavich
Copy link
Contributor

@almicol any chance you could provide a minimum working example for us to play with? We can try to find a solution, but I don't have an easy way to reproduce this right now.

@almicol
Copy link

almicol commented Feb 3, 2022

Hi Adam, I cannot reproduce the problem myself; I'll ask some users if they can provide me, or you directly, an example.

@almicol
Copy link

almicol commented Feb 7, 2022

A user (Nicolas Buchschacher) provided this interesting feedback:
`- The first files are downloaded correctly (.fits.Z).

  • After a certain amount of time (~2h) even if the script is still downloading frames, the system logs out automatically (Nicolas really means that the cookie expires, it is not a logout)- The script continues to think it is downloading files, but the content of the .fits.Z files are an html page with the login form.
  • When the script tries to uncompress the files, they are not zip files but html pages (96Ko per file).`

My suggestion would be to change the astroquery.eso module to use token (OIDC) authentication instead. An example of that could be found here:
http://archive.eso.org/programmatic/HOWTO/jupyter/authentication_and_authorisation/
which uses the eso_programmatic.py published here:
http://archive.eso.org/programmatic/HOWTO/jupyter/authentication_and_authorisation/eso_programmatic.py
A token expires after 8 hours, plus one could tell exactly when by examining the token itself.

Most importantly: ESO will decommission the request handler during the course of 2022; the download routine used by astroquery.eso interfaces with the request handler, hence that will soon break entirely.
So, I'd like to ask you the change the download part using the examples in the mentioned notebook, which includes also a programmatic way (via DataLink) to query for the calibration reference files (calSelector).

The only thing not yet available within the new ESO programmatic layer is the ability to query the instrument-specific tables. I hope to have them available in TAP during the year. Once that is done, the entire astroquery.eso module could be re-written (and simplified!) using TAP ADQL queries, DataLink, OIDC, and downloading files directly using the access_url provided in the TAP responses. Would you consider? I'd be happy to help answering any question you might have on this.

Many thanks!

@keflavich
Copy link
Contributor

@almicol we would love to use the more modern interface, but we need some help. Is there any chance ESO could contribute the updated code?

@bsipocz
Copy link
Member

bsipocz commented Feb 7, 2022

@almicol - I can reiterate what Adam said above. it's very unlikely that any of us on the maintainer team will have the bandwidth to reimplement the module. However contributions from ESO would be very much appreciated, and we could provide code review for PRs that are making the change.

@almicol
Copy link

almicol commented Feb 9, 2022

@keflavich @bsipocz Many thanks to both. Indeed, it would be best if we do that. We will probably schedule this activity for the second half of the year, surely before decommissioning the current system. I guess we will interact with you when the time comes. Thanks!

@almicol
Copy link

almicol commented Feb 9, 2022

Unitl we have a new astroquery.eso module in place, would you be able to fix the current issue for our users?
It is due to long requests that end up exceeding the 2 hours limit.

I reproduced that by putting a sleep(10800) in the
for i, fileLink in enumerate(fileLinks, 1):
loop of the retrieve_data method inside eso/core.py.
I also added some debugging print statements that showed me, after exceeding the expiration time,
that indeed the file downloaded is no longer the gzipped data file, but instead is the login html page.
Here the output:


DEBUG: Content-Type=[text/html;charset=UTF-8]
DEBUG:          url=[https://www.eso.org:443/sso/login?service=https%3A%2F%2Fdataportal.eso.org%2FdataPortal%2Flogin%2Fcas]

followed by the obvious error message that ... "OSError: Not a gzipped file"

The def _download_file clearly fails to catch this, as it is trying to match the following:

            if (resp.headers['Content-Type'] == 'text/html;charset=UTF-8' and resp.url.startswith('https://www.eso.org/sso/login')):
                if trials == 1:
                    log.warning("Session expired, trying to re-authenticate")
                    self.login()
                    trials += 1
                else:
                    raise LoginError("Could not authenticate")
            else:
                break

3 things:
the url.startswith is too strictly defined, as one never knows what the internal ESO redirects could return (currently they return also the port number :443); I guess best would be to match just only "sso/login"

the Content-Type match is also too strictly defined as there could be a blank separator (currently not present, but that is not guaranteed in the future); btw if the url matches "sso/login" there is no need to check for the content type;

the trials == 1 would work only the first time the 2h limit is reached, and not if the list of datasets is so long, or the files so heavy, or the network so bad, that the download time exceeds the 4 hours, or the 6 hours, etc.

It would be very nice if you could have a look at this. Many thanks!

@keflavich
Copy link
Contributor

@almicol I have a bit of a hard time reading that; could you edit it to put the code blocks into triple-backticks so I can see which parts are (user-entered) code and which are not?

We could certainly soften some of the type checking - again, a PR would be helpful!

@almicol
Copy link

almicol commented Feb 9, 2022

@keflavich I hope is more readable now

@bsipocz
Copy link
Member

bsipocz commented Dec 11, 2023

Addressed by #2681

@bsipocz bsipocz closed this as completed Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants