This package provides a set of checks that can be run on individual data files or on groups of data files. Each check relates to an aspect of the time representation within the data set/file.
For example, there is a check to make sure the representation of time in the file name matches the time variable inside the file. Another check ensures that the time axis across multiple files is contiguous.
The full list of checks managed by this package is provided below.
The single file tests are:
- check_file_name_time_format
Checks that the time part of the file name contains two components in the form "start"-"end", separated by "-". Checks that each time component matches a known regular expression of the form YYYY[mm[DD[HH[MM]]]], and that the "start" and "end" times have the same structure.
Where [] indicates that the enclosed elements are optional and
- YYYY: is a four digit integer year
- mm: is a two digit integer month
- DD: is a two digit integer day
- HH: is a two digit integer hour
- MM: is a two digit integer minute
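The format rule above can be sketched with a single regular expression. This is a minimal illustration, not the package's own implementation: the name file_name_time_format_ok is hypothetical, and the function is assumed to receive only the "start-end" part of the file name, already separated from the rest.

```python
import re

# Assumed pattern for one time component: YYYY[mm[DD[HH[MM]]]], digits only.
TIME_COMPONENT = re.compile(
    r"[0-9]{4}(?:[0-9]{2}(?:[0-9]{2}(?:[0-9]{2}(?:[0-9]{2})?)?)?)?"
)


def file_name_time_format_ok(time_range):
    """Return True if time_range looks like 'start-end' with matching structure."""
    parts = time_range.split("-")
    if len(parts) != 2:
        return False
    start, end = parts
    if not (TIME_COMPONENT.fullmatch(start) and TIME_COMPONENT.fullmatch(end)):
        return False
    # Same structure means the same optional elements are present in both.
    return len(start) == len(end)
```

Comparing lengths is enough to compare structure here, because every optional element is exactly two digits wide.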
- check_file_name_matches_time_var
Checks that the file name of a dataset (file) matches the time range found in the file. The time component in the file name is extracted by index. The checks are done within the specified tolerance level.
The check asks whether the time axis start and end times given in the file name are the same as those in the file, to within a given tolerance. For example, a daily file with a start date given as 18500101 should have a datetime element of <1850-01-01: 00:00:00>. However, a monthly file with the start time element 18500101 may have a time stamp in the file at the mid-month point, e.g. <1850-01-15: 00:00:00>, so a tolerance can be specified for this check.
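The tolerance comparison can be illustrated as follows. This is a sketch under stated assumptions: within_tolerance is a hypothetical helper, the tolerance is assumed to be expressed in days, and the standard library datetime is used for brevity even though it only covers the proleptic Gregorian calendar, whereas the package itself relies on netcdftime calendars.

```python
from datetime import datetime, timedelta


def within_tolerance(name_time, file_time, tolerance_days):
    """Return True if the two times differ by at most tolerance_days."""
    return abs(file_time - name_time) <= timedelta(days=tolerance_days)


# Start time element taken from the file name, as in the example above.
name_start = datetime.strptime("18500101", "%Y%m%d")

# A daily file is expected to start exactly on the named date.
daily_stamp = datetime(1850, 1, 1)

# A monthly file may carry a mid-month time stamp instead.
monthly_stamp = datetime(1850, 1, 15)

daily_ok = within_tolerance(name_start, daily_stamp, 0)
monthly_strict = within_tolerance(name_start, monthly_stamp, 0)
monthly_relaxed = within_tolerance(name_start, monthly_stamp, 16)
```

With a zero tolerance the daily file passes and the monthly file fails; allowing roughly half a month lets the mid-month stamp pass as well.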
- check_time_format_matches_frequency
Checks for consistency between the time frequency and the format of the time.
- cmor_table = "yr", "Oyr": yearly data of the form: yyyy, test has length 4
- cmor_table = "Amon", "Lmon", "Omon", "LImon", "OImon", "cfMon": monthly data of the form: yyyyMM, test has length 6
- cmor_table = "monClim": monthly climatology data of the form: yyyyMM, test has length 6
- cmor_table = "day", "cfDay": daily data of the form: yyyyMMdd, test has length 8
- cmor_table = "6hrLev", "6hrPLev": 6 hourly data of the form: yyyyMMddhh, test has length 10
- cmor_table = "3hr", "": 3 hourly data of the form: yyyyMMddhhmm, test has length 12
- cmor_table = "subhr": sub-hourly data of the form: yyyyMMddhhmm, test has length 12
- NOT IMPLEMENTED YET: 'aero': 0,'cfOff': 0, 'cfSites': 0, 'fx': 0
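The table above amounts to a lookup from CMOR table name to the expected length of each time component. The sketch below is illustrative only: EXPECTED_LENGTH and time_format_matches_frequency are hypothetical names, only the tables named in the list are included, and returning None for any other table is an assumption about how unimplemented tables might be treated.

```python
# Expected number of digits in a time component, keyed by CMOR table name.
EXPECTED_LENGTH = {
    "yr": 4, "Oyr": 4,
    "Amon": 6, "Lmon": 6, "Omon": 6, "LImon": 6, "OImon": 6, "cfMon": 6,
    "monClim": 6,
    "day": 8, "cfDay": 8,
    "6hrLev": 10, "6hrPLev": 10,
    "3hr": 12,
    "subhr": 12,
}


def time_format_matches_frequency(cmor_table, time_component):
    """Return True/False for known tables, or None if the table is not covered."""
    expected = EXPECTED_LENGTH.get(cmor_table)
    if expected is None:
        # Tables such as 'aero', 'cfOff', 'cfSites' and 'fx' are not handled.
        return None
    return len(time_component) == expected
```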
- check_valid_temporal_element
Checks whether the temporal elements are within the valid ranges:
- years, yyyy: a four digit integer > 0000
- months, MM: a two digit integer between 01 and 12
- days, dd: a two digit integer between 01-31
- hours, hh: a two digit integer between 00-23
- minutes, mm: a two digit integer between 00-59
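The range rules above can be sketched as a slice-and-compare routine. The function name valid_temporal_elements is hypothetical, and, following the list as written, days are only checked against 01-31 regardless of month or calendar.

```python
import re

# (slice start, slice end, lowest valid value, highest valid value)
ELEMENT_RANGES = [
    (4, 6, 1, 12),    # months
    (6, 8, 1, 31),    # days (not checked against the month length)
    (8, 10, 0, 23),   # hours
    (10, 12, 0, 59),  # minutes
]


def valid_temporal_elements(time_component):
    """Return True if every element present in the string is within range."""
    if not re.fullmatch(r"[0-9]{4}(?:[0-9]{2}){0,4}", time_component):
        return False
    if int(time_component[0:4]) <= 0:
        return False
    for start, end, low, high in ELEMENT_RANGES:
        text = time_component[start:end]
        # An empty slice means the optional element is absent.
        if text and not (low <= int(text) <= high):
            return False
    return True
```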
- check_regular_time_axis_increments
Checks that the time axis increments for a given file are regularly spaced throughout the time series. Since it is common to place the timestamp of monthly data at the middle of the month, monthly CMIP5 data may be irregularly spaced when using any of the following calendars: 'gregorian', 'proleptic_gregorian', 'julian', 'noleap', '365_day', 'standard'. For these calendars, valid time axis increments are 29.5, 30.5 and 31 days. For most other frequencies in CMIP5 the time increments should be regular.
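A rough sketch of this rule follows. It assumes the time axis is already available as a list of numbers in days, that monthly data is flagged with frequency="mon", and that a small absolute tolerance is acceptable when comparing floats. The names regular_increments, frequency and tol are hypothetical; the calendar names and valid increments are taken from the description above.

```python
IRREGULAR_MONTHLY_CALENDARS = {
    "gregorian", "proleptic_gregorian", "julian",
    "noleap", "365_day", "standard",
}
VALID_MONTHLY_INCREMENTS = (29.5, 30.5, 31.0)


def regular_increments(times, frequency="day", calendar="standard", tol=1e-6):
    """Return True if the increments between consecutive times are acceptable."""
    deltas = [later - earlier for earlier, later in zip(times, times[1:])]
    if not deltas:
        return True  # Zero or one time step: nothing to compare.
    if frequency == "mon" and calendar in IRREGULAR_MONTHLY_CALENDARS:
        # Each increment must equal one of the listed monthly values.
        return all(
            any(abs(delta - valid) <= tol for valid in VALID_MONTHLY_INCREMENTS)
            for delta in deltas
        )
    # Otherwise every increment must equal the first one.
    return all(abs(delta - deltas[0]) <= tol for delta in deltas)
```

For example, mid-month stamps in a 'noleap' calendar for January to April fall on days 15.5, 45.0, 74.5 and 105.0, giving increments of 29.5, 29.5 and 30.5 days, all of which are accepted.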
run_file_timechecks.py is a wrapper to file_time_checks.py which takes 1:n .nc files and then calls the time checks in the following order:
- Check the format of the filename is correct
- Check that the file name temporal elements are valid
- Check that the frequency in the name matches the frequency in the file
- Check that the temporal elements in the filename match the time range found in the file.
- Check that the time axis increments are regularly spaced
Error codes are as follows:
- T1.000: [file_extension]
- T1.001: [check_file_name_time_format]
- T1.002: [check_valid_temporal_element]
- T1.003: [time_format_matches_frequency]
- T1.004: [file_name_matches_time_var]
- T1.005: [regular_time_axis_increments]
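The ordering and error codes above could be wired together roughly as follows. This is a hypothetical sketch rather than the logic of run_file_timechecks.py: run_checks and the checks mapping are illustrative names, each check is assumed to be a callable that takes a file path and returns True on success, and collecting all failures instead of stopping at the first one is an assumption.

```python
# Error codes paired with check labels, in the order the checks are run.
CHECK_ORDER = [
    ("T1.000", "file_extension"),
    ("T1.001", "check_file_name_time_format"),
    ("T1.002", "check_valid_temporal_element"),
    ("T1.003", "time_format_matches_frequency"),
    ("T1.004", "file_name_matches_time_var"),
    ("T1.005", "regular_time_axis_increments"),
]


def run_checks(path, checks):
    """Run the supplied checks in order and return the codes of those that fail."""
    failed = []
    for code, label in CHECK_ORDER:
        check = checks.get(label)
        if check is not None and not check(path):
            failed.append(code)
    return failed


# Example with one real check and one stand-in that always fails.
example_checks = {
    "file_extension": lambda path: path.endswith(".nc"),
    "check_file_name_time_format": lambda path: False,
}
example_failures = run_checks("tas_185001-185912.txt", example_checks)
```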
Given a variable level dataset (i.e. a directory of a timeseries):
- Continuity - a: Are there any identified gaps in the timeseries? For each "end", the following "start" is the next timestep in the series, depending on the temporal resolution. Check against filename only, or timestamp?
- Continuity - b: Are there any identified overlaps in the timeseries? For each "end", the following "start" is the next timestep in the series, depending on the temporal resolution. Check against filename only, or timestamp?
- Completeness: start to end is continuous.
- Completeness: For a given experiment, is the timeseries complete, i.e. for a list of files there are no gaps or overlaps between files?
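As an illustration of the gap and overlap questions, the sketch below works on file names only and is restricted to monthly data, where "next timestep" simply means "next month". It is not the package's implementation: classify_joins and month_index are hypothetical names, and the input is assumed to be a list of (start, end) pairs in YYYYmm form already extracted from the file names.

```python
def month_index(yyyymm):
    """Convert a YYYYmm string to a running month count."""
    return int(yyyymm[:4]) * 12 + int(yyyymm[4:6]) - 1


def classify_joins(ranges):
    """Label each join between consecutive files as contiguous, gap or overlap."""
    ordered = sorted(ranges, key=lambda pair: month_index(pair[0]))
    labels = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        step = month_index(next_start) - month_index(prev_end)
        if step == 1:
            labels.append("contiguous")
        elif step > 1:
            labels.append("gap")
        else:
            labels.append("overlap")
    return labels
```

A series is complete in the sense described above when every join is labelled "contiguous".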
The library currently supports all the calendars supported by the netcdftime library. It therefore supports all the CMIP calendars.
Only AD dates are supported; dates BC are not.
The package is managed in a GitHub repository here:
https://github.com/cedadev/time-checks
To install the package and its dependencies:
git clone https://github.com/cedadev/time-checks
cd time-checks
virtualenv venv
echo ". venv/bin/activate" > setup_env.sh
echo "export PYTHONPATH=." >> setup_env.sh
. setup_env.sh
python setup.py install
A beta release of this code is also available:
wget https://github.com/cedadev/time-checks/archive/time_check_0-6.tar.gz
pip install time_check_0-6.tar.gz
Note, however, that this code is still under active development as of Nov 2017; new releases are anticipated in the near future.
To run all tests:
py.test [-v] [-k]
Use the -v (verbose) option to get full details of all the tests.
Use the -k <specific test> option to run just one test, selected by name.