ESO: decompressing .fits.Z files is broken #1818

saimn · 2020-09-10T09:23:28Z

Following #1613, and while looking at reading .fits.Z with astropy (astropy/astropy#10714), I was curious to see how astroquery managed those files.

Before #1613 (remove call to system-wide gunzip. Use python version instead.), decompression was done by the system gzip, which knows how to decompress .Z files.
But now, Python's gzip cannot decompress .Z files, so this is broken. Testing with a file from #1580 (Get error when trying to download data using astroquery.eso) raises a `BadGzipFile: Not a gzipped file (b'\x1f\x9d')` exception.
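For illustration, the difference shows up in the first two bytes of the file. The sketch below is not astroquery code; `sniff_compression` is a hypothetical helper, and the magic numbers are the well-known gzip and Unix `compress` signatures. It also shows that Python's `gzip` module rejects a `.Z` header.

```python
import gzip

# Leading magic bytes of the two formats (well-known values).
GZIP_MAGIC = b"\x1f\x8b"      # gzip
COMPRESS_MAGIC = b"\x1f\x9d"  # Unix compress (LZW), the usual .Z format


def sniff_compression(header: bytes) -> str:
    """Classify data by its first two bytes (hypothetical helper)."""
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header.startswith(COMPRESS_MAGIC):
        return "compress"
    return "unknown"


# A real gzip stream is recognised and readable by Python's gzip module.
gz_bytes = gzip.compress(b"SIMPLE  =                    T")
print(sniff_compression(gz_bytes))

# A .Z header is not gzip, so gzip.decompress refuses it.
z_header = COMPRESS_MAGIC + b"\x90"
print(sniff_compression(z_header))
try:
    gzip.decompress(z_header)
except OSError as exc:  # gzip.BadGzipFile is a subclass of OSError
    print(type(exc).__name__, exc)
```

Anything classified as "unknown" (an HTML page, for example) is neither format, which is worth checking before attempting decompression.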

almicol · 2022-02-03T16:23:28Z

Dear astroquery maintainers, I'd like to know if you have any advice on this issue, as we at ESO are getting several tickets from users with exactly the same problem. Is there any recommendation that I should give to our users? Many thanks in advance!

keflavich · 2022-02-03T16:31:32Z

@almicol any chance you could provide a minimum working example for us to play with? We can try to find a solution, but I don't have an easy way to reproduce this right now.

almicol · 2022-02-03T16:50:55Z

Hi Adam, I cannot reproduce the problem myself; I'll ask some users if they can provide me, or you directly, an example.

almicol · 2022-02-07T11:39:29Z

A user (Nicolas Buchschacher) provided this interesting feedback:
- The first files are downloaded correctly (.fits.Z).
- After a certain amount of time (~2h), even if the script is still downloading frames, the system logs out automatically (Nicolas really means that the cookie expires; it is not a logout).
- The script continues to think it is downloading files, but the content of the .fits.Z files is an HTML page with the login form.
- When the script tries to uncompress the files, they are not zip files but HTML pages (96 kB per file).

My suggestion would be to change the astroquery.eso module to use token (OIDC) authentication instead. An example of that could be found here:
http://archive.eso.org/programmatic/HOWTO/jupyter/authentication_and_authorisation/
which uses the eso_programmatic.py published here:
http://archive.eso.org/programmatic/HOWTO/jupyter/authentication_and_authorisation/eso_programmatic.py
A token expires after 8 hours, plus one could tell exactly when by examining the token itself.
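As a rough sketch of "examining the token itself": assuming the token is a JWT carrying the standard `exp` claim (as OIDC ID tokens do; this thread does not confirm the ESO token layout), the expiry can be read without any extra library. `token_expiry` and the demo token below are made up for illustration, and the signature is not verified.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(token: str) -> datetime:
    """Return the time stored in a JWT's `exp` claim (signature NOT verified)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc)


def _b64url(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo only)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


# Made-up, unsigned token used purely for demonstration.
demo = ".".join([_b64url({"alg": "none"}), _b64url({"exp": 1644422400}), ""])
print(token_expiry(demo).isoformat())
```

A long-running download could compare this time with the current time and request a fresh token shortly before it expires.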

Most importantly: ESO will decommission the request handler during the course of 2022; the download routine used by astroquery.eso interfaces with the request handler, hence that will soon break entirely.
So, I'd like to ask you to change the download part using the examples in the mentioned notebook, which also includes a programmatic way (via DataLink) to query for the calibration reference files (calSelector).

The only thing not yet available within the new ESO programmatic layer is the ability to query the instrument-specific tables. I hope to have them available in TAP during the year. Once that is done, the entire astroquery.eso module could be re-written (and simplified!) using TAP ADQL queries, DataLink, OIDC, and downloading files directly using the access_url provided in the TAP responses. Would you consider? I'd be happy to help answering any question you might have on this.

Many thanks!

keflavich · 2022-02-07T12:45:35Z

@almicol we would love to use the more modern interface, but we need some help. Is there any chance ESO could contribute the updated code?

bsipocz · 2022-02-07T21:03:12Z

@almicol - I can reiterate what Adam said above. It's very unlikely that any of us on the maintainer team will have the bandwidth to reimplement the module. However, contributions from ESO would be very much appreciated, and we could provide code review for PRs that are making the change.

almicol · 2022-02-09T16:09:07Z

@keflavich @bsipocz Many thanks to both. Indeed, it would be best if we do that. We will probably schedule this activity for the second half of the year, surely before decommissioning the current system. I guess we will interact with you when the time comes. Thanks!

almicol · 2022-02-09T16:39:37Z

Until we have a new astroquery.eso module in place, would you be able to fix the current issue for our users?
It is due to long requests that end up exceeding the 2 hours limit.

I reproduced that by putting a `sleep(10800)` in the `for i, fileLink in enumerate(fileLinks, 1):` loop of the `retrieve_data` method inside eso/core.py.
I also added some debugging print statements that showed me that, after the expiration time is exceeded, the downloaded file is indeed no longer the gzipped data file, but the login HTML page instead.
Here is the output:


DEBUG: Content-Type=[text/html;charset=UTF-8]
DEBUG:          url=[https://www.eso.org:443/sso/login?service=https%3A%2F%2Fdataportal.eso.org%2FdataPortal%2Flogin%2Fcas]

followed by the obvious error message that ... "OSError: Not a gzipped file"

The def _download_file clearly fails to catch this, as it is trying to match the following:

            if (resp.headers['Content-Type'] == 'text/html;charset=UTF-8' and resp.url.startswith('https://www.eso.org/sso/login')):
                if trials == 1:
                    log.warning("Session expired, trying to re-authenticate")
                    self.login()
                    trials += 1
                else:
                    raise LoginError("Could not authenticate")
            else:
                break

Three things:
1. The url.startswith check is too strict, as one never knows what the internal ESO redirects could return (currently they also return the port number :443); I guess it would be best to match only "sso/login".
2. The Content-Type match is also too strict, as there could be a blank separator (currently not present, but that is not guaranteed in the future); by the way, if the url matches "sso/login" there is no need to check the content type.
3. The trials == 1 check would work only the first time the 2h limit is reached, and not if the list of datasets is so long, or the files so heavy, or the network so bad, that the download time exceeds 4 hours, or 6 hours, etc.
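The three points above could be sketched roughly as follows. This is not the astroquery implementation: `is_login_redirect`, `fetch_with_reauth`, the fake session, and the example.org URL are all hypothetical, and `RuntimeError` stands in for `LoginError` to keep the snippet self-contained. The idea is to match only "sso/login" in the final URL, skip the Content-Type test, and count re-logins per download so that every later expiry can be handled too.

```python
from types import SimpleNamespace


def is_login_redirect(url: str) -> bool:
    """True if the final URL looks like the SSO login page (port-agnostic)."""
    return "sso/login" in url


def fetch_with_reauth(fetch, login, max_logins=2):
    """Call fetch(); whenever the login page comes back, log in and retry.

    `fetch` returns an object with a `.url` attribute (such as a requests
    Response) and `login` re-authenticates; both are supplied by the caller.
    The counter is local, so each download gets a fresh re-login budget.
    """
    logins = 0
    while True:
        resp = fetch()
        if not is_login_redirect(resp.url):
            return resp
        if logins >= max_logins:
            raise RuntimeError("Could not authenticate")
        login()
        logins += 1


# Fake session whose cookie has expired, for demonstration only.
state = {"logged_in": False, "logins": 0}


def fake_fetch():
    if state["logged_in"]:
        return SimpleNamespace(url="https://example.org/file.fits.Z")
    return SimpleNamespace(url="https://www.eso.org:443/sso/login?service=demo")


def fake_login():
    state["logged_in"] = True
    state["logins"] += 1


resp = fetch_with_reauth(fake_fetch, fake_login)
print(resp.url, state["logins"])
```

Bounding the number of re-logins per download keeps a genuinely bad password from looping forever while still allowing one re-authentication after each 2h expiry.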

It would be very nice if you could have a look at this. Many thanks!

keflavich · 2022-02-09T17:06:31Z

@almicol I have a bit of a hard time reading that; could you edit it to put the code blocks into triple-backticks so I can see which parts are (user-entered) code and which are not?

We could certainly soften some of the type checking - again, a PR would be helpful!

almicol · 2022-02-09T17:15:14Z

@keflavich I hope it is more readable now.

bsipocz · 2023-12-11T18:09:33Z

Addressed by #2681

bsipocz added the bug label Sep 10, 2020

thespacedoctor mentioned this issue Feb 14, 2022

Astropy isssue reading of fits.Z compressed fles thespacedoctor/soxspipe#92

Closed

bsipocz added the eso label Feb 24, 2022

szampier mentioned this issue Mar 10, 2023

Refactor ESO authentication and download #2681

Merged

bsipocz closed this as completed Dec 11, 2023
