This repository provides various utility scripts for downloading Daymet datasets [1, 2]. Originally, this project was intended to prepare Daymet surface data for basins of the CAMELS-US dataset [3]. However, the scripts provided in this repo can be used for any other geospatial objects with polygon geometry.
Daymet data contain gridded estimates of daily weather and climatology parameters on a 1 km x 1 km raster for North America, Hawaii, and Puerto Rico. Daymet Version 3 and Version 4 data are provided by ORNL DAAC and can be accessed via ORNL DAAC's Thematic Real-time Environmental Distributed Data Services (THREDDS).
The Daymet PyProcessing package can be installed with pip. To install the latest version of the package, run:
python3 -m pip install git+https://github.com/SebaDro/daymet-pyprocessing.git
It is also possible to clone this repository to your local machine and install the package from your local copy:
python3 -m pip install -e .
For development, you can use either Conda or pip to install all required dependencies. For this purpose, the repository comes with an environment.yml and a requirements.txt, respectively.
Note that this project depends on GeoPandas, whose installation may not pull in all required dependencies on some operating systems. In that case, you'll find installation instructions in the GeoPandas documentation.
To run the download_daymet script, you have to provide a configuration file that controls the download process. You'll find example config files inside ./config, which you can use as a starting point. The download script supports two modes: download for multiple areas based on a geo file, and download for a fixed bounding box.
Prepare a config file as stated above and run the download_daymet script with the path to the config file as the only positional argument:
download_daymet ./config/download-config.yml
The script will download Daymet datasets via NetCDF Subset Service (NCSS) for each geospatial object present in the provided geo file and indicated by the ids in the config file. To do so, the bounding box of each geospatial object as well as the specified variable and timeframe will be used as request parameters.
This mode takes the polygonal geometries for different basins or other geospatial features from a geo file and downloads Daymet data for each of the geometries based on its bounding box. Daymet files will be downloaded for a certain variable.
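The following is a minimal sketch of how a bounding box can be derived from a polygon and turned into NCSS-style request parameters. It is not the package's implementation: the helper names `polygon_bbox` and `ncss_query`, the sample coordinates, and the time values are assumptions made for illustration. The parameter names (`var`, `north`, `south`, `east`, `west`, `time_start`, `time_end`, `accept`) are the ones commonly used by THREDDS NCSS grid requests.

```python
from urllib.parse import urlencode


def polygon_bbox(coords):
    """Return (min_lon, min_lat, max_lon, max_lat) for a ring of (lon, lat) pairs."""
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    return min(lons), min(lats), max(lons), max(lats)


def ncss_query(bbox, variable, start, end):
    """Build an NCSS-style query string from a bounding box, variable and timeframe."""
    min_lon, min_lat, max_lon, max_lat = bbox
    params = {
        "var": variable,
        "north": max_lat,
        "west": min_lon,
        "east": max_lon,
        "south": min_lat,
        "time_start": start,
        "time_end": end,
        "accept": "netcdf",
    }
    return urlencode(params)


# Hypothetical polygon ring in WGS84 (lon, lat); values are illustrative only
basin = [(-73.75, 40.90), (-73.70, 40.92), (-73.72, 40.95), (-73.75, 40.90)]
bbox = polygon_bbox(basin)
query = ncss_query(bbox, "prcp", "2015-01-01T12:00:00Z", "2015-12-31T12:00:00Z")
print(bbox)
print(query)
```

Because only the bounding box is sent with the request, the downloaded raster generally covers more cells than the polygon itself.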
Config parameter | Description |
---|---|
loggingConfig | Path to a logging configuration file. This must be a YAML file according to the Python logging dictionary schema. |
geo.file | Path to a file that contains geospatial data. The file must be in a data format that can be read by GeoPandas and should contain polygon geometries with WGS84 coordinates, which will be used for requesting Daymet data. |
geo.idCol | Name of the column that contains unique identifiers for the geospatial objects. |
geo.ids | IDs of the geospatial objects used for requesting Daymet data. If None, all geospatial objects from the geo.file will be considered. |
readTimeout | Sets a read timeout for the download. |
singleFileStorage | If true, the downloaded yearly Daymet datasets will be concatenated along the time dimension and stored within a single file for each geospatial object. If false, the downloaded yearly Daymet datasets will be stored within separate files for each object and year. |
timeFrame | startTime and endTime in UTC time for requesting Daymet data. |
outputDir | Path to the output directory. Downloaded datasets will be stored here. |
variable | Data variable that should be included in the downloaded Daymet datasets. |
version | Version of the Daymet dataset to be downloaded. |
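
As a rough illustration of how the parameters in the table fit together, the sketch below represents a config as a nested Python dictionary and resolves dotted names such as `geo.idCol`. It assumes that dotted names correspond to nested keys in the config file. All values (paths, timeout, version string, time format) are placeholders, and the `lookup` helper is hypothetical and not part of the package.

```python
from functools import reduce

# Hypothetical config mirroring the parameters in the table; all values are placeholders
config = {
    "loggingConfig": "./config/logging.yml",
    "geo": {"file": "./data/basins.shp", "idCol": "id", "ids": None},
    "readTimeout": 120,
    "singleFileStorage": True,
    "timeFrame": {
        "startTime": "2015-01-01T12:00:00",
        "endTime": "2016-12-31T12:00:00",
    },
    "outputDir": "./output",
    "variable": "prcp",
    "version": "v4",
}


def lookup(cfg, dotted_key):
    """Resolve a dotted parameter name such as 'geo.idCol' in a nested dict."""
    return reduce(lambda node, key: node[key], dotted_key.split("."), cfg)


print(lookup(config, "geo.idCol"))
print(lookup(config, "timeFrame.startTime"))
```

The example config files inside ./config remain the authoritative reference for the actual structure and value formats.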
This mode takes a bbox parameter that is used directly to download Daymet files for the specified variable.
Config parameter | Description |
---|---|
loggingConfig | Path to a logging configuration file. This must be a YAML file according to the Python logging dictionary schema. |
bbox | A static bbox used for downloading Daymet files. Format: [minLon, minLat, maxLon, maxLat] (e.g. [-73.73, 40.93, -73.72, 40.94]) |
readTimeout | Sets a read timeout for the download. |
singleFileStorage | If true, the downloaded yearly Daymet datasets will be concatenated along the time dimension and stored within a single file for each geospatial object. If false, the downloaded yearly Daymet datasets will be stored within separate files for each object and year. |
timeFrame | startTime and endTime in UTC time for requesting Daymet data. |
outputDir | Path to the output directory. Downloaded datasets will be stored here. |
variable | Data variable that should be included in the downloaded Daymet datasets. |
version | Version of the Daymet dataset to be downloaded. |
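
Since the order of the bbox values is easy to mix up, a small sanity check before starting a download can help. The function below is a hypothetical helper, not part of the package; it only checks the `[minLon, minLat, maxLon, maxLat]` order and the valid WGS84 ranges.

```python
def validate_bbox(bbox):
    """Check a [minLon, minLat, maxLon, maxLat] list and return it as a tuple of floats."""
    if len(bbox) != 4:
        raise ValueError("bbox must contain exactly four values")
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox)
    if not (-180.0 <= min_lon < max_lon <= 180.0):
        raise ValueError("longitudes must satisfy -180 <= minLon < maxLon <= 180")
    if not (-90.0 <= min_lat < max_lat <= 90.0):
        raise ValueError("latitudes must satisfy -90 <= minLat < maxLat <= 90")
    return min_lon, min_lat, max_lon, max_lat


# The example bbox from the table passes the check
print(validate_bbox([-73.73, 40.93, -73.72, 40.94]))
```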
This repo also comes with some processing routines. So far, it supports combining, clipping and aggregating Daymet NetCDF files. You control the processing of Daymet data via the process_daymet script by providing a configuration file. You'll find several example files inside ./config, which you can use as a starting point.
Prepare a config file as stated above and run the process_daymet script with the operation that should be applied to the Daymet files, followed by the path to the config file:
process_daymet {operation} ./config/processing-config.yml
The combine operation discovers multiple Daymet NetCDF files that have been downloaded with the download.py script and merges those files that refer to the same basin. NetCDF files with the same basin ID as file name prefix are treated as related files and merged.
In order to discover all relevant files, folder structure and file naming must follow the conventions mentioned below:
{data_dir}/{variable}/{id}/{id}_daymet_v4_daily_na_{variable}_*.nc
{data_dir}/{variable}/{id}/{id}_daymet_v3_{variable}_*_na.nc4
These patterns follow the naming style for single downloaded files as a result of the download.py script.
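To make the naming conventions concrete, here is a minimal sketch of how files matching these patterns could be discovered and grouped by ID using `pathlib`. It is not the package's actual discovery code: the `discover` function, the version keys `v3`/`v4`, the sample ID, and the assumption that the wildcard part is a year are all illustrative.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

# Glob patterns per version, relative to the data directory (from the conventions above)
PATTERNS = {
    "v4": "{variable}/{id}/{id}_daymet_v4_daily_na_{variable}_*.nc",
    "v3": "{variable}/{id}/{id}_daymet_v3_{variable}_*_na.nc4",
}


def discover(data_dir, version, variables, ids):
    """Map each ID to the sorted list of matching files across all variables."""
    found = defaultdict(list)
    for variable in variables:
        for id_ in ids:
            pattern = PATTERNS[version].format(variable=variable, id=id_)
            found[id_].extend(sorted(Path(data_dir).glob(pattern)))
    return dict(found)


# Build a throwaway directory with two empty files to demonstrate the lookup
with tempfile.TemporaryDirectory() as tmp:
    for year in (2015, 2016):  # assumption: the wildcard part is a year
        path = Path(tmp, "prcp", "01013500", f"01013500_daymet_v4_daily_na_prcp_{year}.nc")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    files = discover(tmp, "v4", ["prcp"], ["01013500"])
    names = [p.name for p in files["01013500"]]
    print(names)
```

Files that do not follow the folder structure or the file name pattern are simply not matched, which is why the conventions matter for the combine operation.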
Config parameter | Description |
---|---|
dataDir | Path of the data directory which contains the Daymet NetCDF files. Only files which are stored according to a certain folder structure (you'll find an example below) within this directory will be considered for processing. |
loggingConfig | Path to a logging configuration file. This must be a YAML file according to the Python logging dictionary schema. |
outputDir | Path to the output directory. Processing results will be stored here. |
ids | Identifiers used to determine which Daymet files should be considered for processing. Leave empty if all Daymet files inside the dataDir should be considered. |
outputFormat | Format for storing the results. Supported: netcdf, zarr |
version | Version of the Daymet datasets. |
operationParameters.variables | Only a subset of the available Daymet datasets containing these variables will be considered for processing. |
The clip operation clips Daymet data for given polygonal geometries stored in a geo file.
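Conceptually, clipping keeps only the grid cells that fall inside a polygon. The pure-Python sketch below illustrates the idea with a ray-casting point-in-polygon test on cell centres. It is not how the package implements clipping: the function names, the choice to mask outside cells with NaN, and the toy grid are assumptions for illustration.

```python
import math


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the closed ring of (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip(grid, xs, ys, ring):
    """Return a copy of grid[y][x] with cells whose centre lies outside the ring set to NaN."""
    return [
        [v if point_in_polygon(x, y, ring) else math.nan for x, v in zip(xs, row)]
        for y, row in zip(ys, grid)
    ]


# Toy 2 x 3 grid with cell-centre coordinates and a square polygon covering two columns
xs = [0.5, 1.5, 2.5]
ys = [0.5, 1.5]
grid = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)]
clipped = clip(grid, xs, ys, square)
print(clipped)
```

In this toy example, the third column lies outside the square and is masked, while the first two columns keep their values.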
In order to discover all relevant files, folder structure and file naming must follow the conventions mentioned below:
{data_dir}/{id}_daymet_v4_daily_na.nc
{data_dir}/{id}_daymet_v3_na.nc4
These patterns follow the naming style for stored results of the combine operation.
Config parameter | Description |
---|---|
dataDir | Path of the data directory which contains the Daymet NetCDF files. Only files which are stored according to a certain folder structure (you'll find an example below) within this directory will be considered for processing. |
loggingConfig | Path to a logging configuration file. This must be a YAML file according to the Python logging dictionary schema. |
outputDir | Path to the output directory. Processing results will be stored here. |
ids | Identifiers used to determine which Daymet files should be considered for processing. Leave empty if all Daymet files inside the dataDir should be considered. |
outputFormat | Format for storing the results. Supported: netcdf, zarr |
version | Version of the Daymet datasets. |
operationParameters.geomPath | Path to the file that contains polygonal geometries. |
operationParameters.idCol | Name of the ID column within the geo file. |
The aggregate operation calculates the mean, min or max of Daymet data across the combined 'x' and 'y' dimensions.
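The reduction can be pictured as collapsing every time step's 2D grid into a single value. Below is a minimal pure-Python sketch, assuming data laid out as `data[time][y][x]` and assuming that NaN cells (for example, cells masked out by a previous clip) are skipped. The `aggregate` function and the sample values are illustrative and not the package's code.

```python
import math

# One reducer per aggregation mode named in this section
REDUCERS = {
    "mean": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
}


def aggregate(data, mode):
    """Reduce data[time][y][x] to one value per time step, skipping NaN cells."""
    reducer = REDUCERS[mode]
    result = []
    for time_slice in data:
        values = [v for row in time_slice for v in row if not math.isnan(v)]
        # A time step without any valid cell yields NaN
        result.append(reducer(values) if values else math.nan)
    return result


# Two time steps on a 2 x 2 grid; one cell of the first step is masked
data = [
    [[1.0, 2.0], [3.0, math.nan]],
    [[4.0, 6.0], [8.0, 10.0]],
]
for mode in ("mean", "min", "max"):
    print(mode, aggregate(data, mode))
```

The result is a time series with one value per time step for each aggregated dataset.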
In order to discover all relevant files, folder structure and file naming must follow the conventions mentioned below:
{data_dir}/{id}_daymet_v4_daily_na.nc
{data_dir}/{id}_daymet_v3_na.nc4
These patterns follow the naming style for stored results of the combine operation.
Config parameter | Description |
---|---|
dataDir | Path of the data directory which contains the Daymet NetCDF files. Only files which are stored according to a certain folder structure (you'll find an example below) within this directory will be considered for processing. |
loggingConfig | Path to a logging configuration file. This must be a YAML file according to the Python logging dictionary schema. |
outputDir | Path to the output directory. Processing results will be stored here. |
ids | Identifiers used to determine which Daymet files should be considered for processing. Leave empty if all Daymet files inside the dataDir should be considered. |
outputFormat | Format for storing the results. Supported: netcdf, zarr |
version | Version of the Daymet datasets. |
operationParameters.aggregationMode | Defines which aggregation operation should be performed. Supported: mean, min, max |
[1] Thornton, P.E., M.M. Thornton, B.W. Mayer, Y. Wei, R. Devarakonda, R.S. Vose, and R.B. Cook. 2016. Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 3. ORNL DAAC, Oak Ridge, Tennessee, USA. https://doi.org/10.3334/ORNLDAAC/1328
[2] Thornton, M.M., R. Shrestha, Y. Wei, P.E. Thornton, S. Kao, and B.E. Wilson. 2020. Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 4. ORNL DAAC, Oak Ridge, Tennessee, USA. https://doi.org/10.3334/ORNLDAAC/1840
[3] Newman, A., Sampson, K., Clark, M. P., Bock, A., Viger, R. J., Blodgett, D. (2014). A large-sample watershed-scale hydrometeorological dataset for the contiguous USA. Boulder, CO: UCAR/NCAR. https://dx.doi.org/10.5065/D6MW2F4D