How to get the latest date available for the dataset? #80

Open
ghost opened this issue Jul 31, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@ghost

ghost commented Jul 31, 2023

Is your feature request related to a problem? Please describe.

When attempting to automatically download data (e.g., ERA5) using the cdsapi, I consistently find it necessary to implement a try...except block to handle potential program failures when the data exceeds the latest available date. Nevertheless, I can only retrieve this information from error logging. Is it feasible to provide an interface that allows me to determine when to stop the download process?

Describe the solution you'd like

Implement an API that reports the latest available date for the dataset.

Describe alternatives you've considered

No response

Additional context

No response

Organisation

No response

@ghost ghost added the enhancement New feature or request label Jul 31, 2023
@WeatherGod

This is similar (in spirit) to my request in #78. I have figured out a partial solution using the requests library:

import requests
r = requests.get("https://cds.climate.copernicus.eu/api/v2.ui/resources/reanalysis-era5-single-levels-monthly-means")
print(r.json()['update_date'])

where reanalysis-era5-single-levels-monthly-means was the dataset I was using.

@WeatherGod

Other fields in that JSON object might provide a date range, since "update_date" is specifically when the dataset was last updated rather than the latest date in the dataset. Being able to access all of this information (and more!) from within the cdsapi would be very valuable, I think.

@WeatherGod

Just came across this gem in an exception traceback (the exception is actually from the server and reported as part of an error message). I had accidentally requested a date in the future for a dataset.

...
2023-08-15 12:34:00,473 ERROR     File "/home/cds/cdsservices/services/mars/preprocess_request.py", line 172, in implement_embargo
2023-08-15 12:34:00,473 ERROR       f"{embargo_datetime.strftime(embargo_error_time_format)}", ""
2023-08-15 12:34:00,473 ERROR   cdsinf.exceptions.BadRequestException: None of the data you have requested is available yet, please revise the period requested. The latest date available for this dataset is: 2023-08-10 16:00

So, it is definitely theoretically possible to retrieve...
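
Until a proper interface exists, the date can be pulled out of that message text. This is a minimal sketch, assuming the wording stays exactly as in the log above; the pattern and the helper name parse_latest_available are illustrative and not part of cdsapi. Whether the text appears in the raised exception or only in the log output is something to verify for your setup.

```python
import re
from datetime import datetime
from typing import Optional

# Pattern based on the server message quoted above. The wording is not a
# documented contract, so this may break if the server text changes.
_LATEST_DATE_RE = re.compile(
    r"latest date available for this dataset is:\s*"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2})"
)


def parse_latest_available(message: str) -> Optional[datetime]:
    """Return the datetime named in the error message, or None if absent."""
    match = _LATEST_DATE_RE.search(message)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M")
```

In a try...except block this could be called as parse_latest_available(str(exc)), falling back to some other strategy when it returns None.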

@luabida

luabida commented Feb 19, 2024

This is similar (in spirit) to my request in #78. I have figured out a partial solution using the requests library:

import requests
r = requests.get("https://cds.climate.copernicus.eu/api/v2.ui/resources/reanalysis-era5-single-levels-monthly-means")
print(r.json()['update_date'])

where reanalysis-era5-single-levels-monthly-means was my dataset I was using.

.

This can't be used for every dataset, though: the reanalysis-era5-single-levels dataset returns today's date, but the actual update date is 2024-02-13.

import requests
r = requests.get("https://cds.climate.copernicus.eu/api/v2.ui/resources/reanalysis-era5-single-levels")
print(r.json()['update_date'])
2024-02-19


@zqianem

zqianem commented Feb 19, 2024

This can't be used for every dataset, though: the reanalysis-era5-single-levels dataset returns today's date, but the actual update date is 2024-02-13.

Update date in this case means the date new files are added, not the date those new files are for. From the overview tab:

ERA5 is updated daily with a latency of about 5 days.

@luabida

luabida commented Feb 19, 2024

ERA5 is updated daily with a latency of about 5 days.

I've been using 6 days, but it would be helpful to have a way of getting this last available date programmatically.
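
The fixed-latency workaround can be sketched as below. The 6-day default is just the figure mentioned in this thread, not a guarantee from the service, and latest_safe_date is a hypothetical helper name.

```python
from datetime import date, timedelta
from typing import Optional


def latest_safe_date(latency_days: int = 6, today: Optional[date] = None) -> date:
    """Return the most recent date assumed to be available.

    latency_days is an assumption taken from this thread; the real
    latency can vary from day to day and between datasets.
    """
    if today is None:
        today = date.today()
    return today - timedelta(days=latency_days)
```

For example, latest_safe_date(6, date(2024, 2, 19)) gives 2024-02-13, which matches the update date reported above.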

@abreufilho

Request https://cds.climate.copernicus.eu/api/v2.ui/resources/reanalysis-era5-land and check response.json()["structured_data"]["temporalCoverage"] for the dataset's date range.
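
A rough sketch for splitting that field into dates, assuming it follows the schema.org convention of an ISO 8601 "start/end" string. The exact format returned by this endpoint is not confirmed in this thread, and parse_temporal_coverage is a hypothetical helper name.

```python
from datetime import date
from typing import Tuple


def parse_temporal_coverage(value: str) -> Tuple[date, date]:
    """Split a 'start/end' string into (start, end) dates.

    Assumes both parts begin with YYYY-MM-DD; any time component is
    ignored. Raises ValueError if the string has a different shape.
    """
    start_text, end_text = value.split("/")
    return date.fromisoformat(start_text[:10]), date.fromisoformat(end_text[:10])
```

The second element of the returned tuple would then be the candidate for the latest available date.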

@AliWaseem607

Update:

This can be done by going to the data page of the required dataset and following the "STAC" link under the Standard Metadata heading on the right-hand side of the page.

On the STAC page there is a "source" button near the top right-hand side of the screen.

There you will find the target URL to use for the requests.get call.

After this, the following code can be used to obtain a list with the [start_date, end_date] information for the dataset. Note that while the end date has an hour:minutes:seconds value of 00:00:00, the data appears to be available until the end of that day.

import requests

pressure_level_address = "https://cds.climate.copernicus.eu/api/catalogue/v1/collections/reanalysis-era5-pressure-levels"

r = requests.get(pressure_level_address)
time_interval = r.json()["extent"]["temporal"]["interval"][0]
end_time = time_interval[1]
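
To work with end_time as a datetime rather than a string, something like the following may help. It is a sketch that assumes the interval values are RFC 3339 strings (as the STAC specification describes) and that an open-ended interval is reported as null. The helper names are hypothetical, and the 23:00 adjustment assumes hourly data, based only on the end-of-day observation above.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def collection_end(collection: Dict[str, Any]) -> Optional[datetime]:
    """Return the end of the first temporal interval, or None if open-ended."""
    end_text = collection["extent"]["temporal"]["interval"][0][1]
    if end_text is None:
        return None
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(end_text.replace("Z", "+00:00"))


def last_hour_of_day(end: datetime) -> datetime:
    """Move a 00:00:00 end timestamp to 23:00 of the same day (hourly data assumed)."""
    return end.replace(hour=23, minute=0, second=0, microsecond=0)
```

With the request above, collection_end(r.json()) would return the parsed end of the interval.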

I hope that helps anyone coming to this thread later on!
