-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ObsOperator: proposal for a generic observation operator diagnostic #2636
Comments
This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days it will be closed. You can add the "never stale" tag to prevent the issue from closing this issue. |
@mbertolacci, you may have better luck getting a response by reaching out to GEOS-Chem working groups or specific individuals by email. |
Hi @mbertolacci, apologies for the late reply. This feature would be very helpful for easily adding new instruments into our methane inversion work (IMI and others). @hannahnesser and @pennelise have been working on a flexible observation operator package for postprocessing GEOS-Chem data. They may have some thoughts on implementation. |
Hi @mbertolacci, Yes, @pennelise and I have a complete, working version of this for satellite observations. You can find the code and documentation here. I've implemented it in my (highly modified) branch of the IMI for my OCO work, though it hasn't been generally incorporated for the IMI. All that would need to be done for this to be done is (1) for someone to write a TROPOMI parser and add it to the parsers.py file and (2) for the IMI itself to be updated to use the operator. I've been meaning to do (1) for ages. If someone can volunteer to do (2), I can prioritize (1). Also, GOOPy doesn't yet have an ObsPack operator. This would also be very easy to implement and again I can prioritize it if someone offers to implement it within the IMI. (You can see our list of things we need to add, including an ObsPack capability, to GOOPy under the issues tab.) |
I'll just add a quick note that CHEEREIO supports a variety of observation operators across TCCON, TROPOMI, OMI, ObsPack, IASI, and CrIS. The GC adjoint has its own suite of observation operators, as does the JEDI initiative for a universal data assimilation platform. Little details in implementation on the DA side have so far prevented things from being compatible (i.e. the adjoint and JEDI don't output GC mapped to observations or vice-versa, but rather related metrics) so unfortunately there is tons of duplicated effort in this space. More directly to your idea, I have plans to release a version of CHEEREIO without assimilation that allows users to run GC and output whatever observational diagnostics using existing CHEEREIO operators, but this would have more overhead than the operator you propose @mbertolacci because CHEEREIO involves a lot of bonus code that would be superfluous in this case. I'll also note, following @laestrada , that the diagnostic you propose would be very useful in the ease of creating future operators. We've already greatly benefited from the satellite diagnostic for CrIS work. |
Thanks everyone for popping up with some comments! Hi @laestrada and @hannahnesser, GOOpy looks like it has very similar goals! The difference, if I understand GOOpy correctly, is that ObsOperator runs inline with GEOS-Chem rather than post-processing SpeciesConc or LevelEdge files. We needed that because it's infeasible in many cases for us to save model output at sufficiently fine resolutions. I have code (written in R) similar to GOOpy for satellite and obspack-style observations but felt it would be good to unify them in a single format. (Edited to add) The other difference (and this differs from ObsPack and other existing satellite operators in GEOS-Chem) is there's no consideration of parsing the observation input files in order to get the sampling scheme - that is a left as a preprocessing step for the user. One could provide scripts to do it for common formats. I think that's both a plus and a minus - it simplifies the Fortran code considerably, it gives a user flexibility in how to exactly implement the sampling, and (like GOOpy) it lets multiple input types be sampled into a common output format. On the other hand it's more work for the user. Hi @drewpendergrass, that's good to know about CHEEREIO, it sounds like there's some overlap. ObsOperator of course has no consideration of adjoint code at present. If anyone wants to try the code I'm happy to assist with that. A few notes on where this code is at:
I think after 1. and 2. the module should have a reasonably stable input and output file format. |
Your name
Michael Bertolacci
Your affiliation
The University of Western Australia
Please provide a clear and concise description of your question or discussion topic.
For a flux inversion project I'm involved in I've been working on a generic observation operator diagnostic for GEOS-Chem which I've tentatively termed ObsOperator.
I'm curious about the views of the GEOS-Chem community on the value of such a diagnostic, and what features it ought to have beyond what is already implemented.
Per offline discussions with @aschuh and @yantosca, I am tagging the following people who may be interested:
@msulprizio @lizziel @kbowman77 @dbajones @ltmurray @drewpendergrass @laestrada @djxjacob @randallvmartin @djvaron @nicholasbalasus @eastjames
The code is in a fork of GEOS-Chem. Most of it is here:
https://github.com/mbertolacci/geos-chem/tree/feature/obsoperator-diagnostic/ObsOperator
The basic idea is to return a (potentially weighted) average of the GEOS-Chem species concentration field. This could be a vertical average, box average, time average, or any combination thereof.
This is sufficient to do what the existing ObsPack diagnostic does. Due to the vertical averaging, it can also implement part of the sampling for OCO-2 or TCCON. I give examples of these operators below in a draft specification.
The rest of this is a draft specification, as currently implemented by the code above.
ObsOperator diagnostic
Overview
This ObsOperator diagnostic for GEOS-Chem is a flexible way to define and run observation operators over the GEOS-Chem species concentration field. The user specifies "observation operators" in YAML files, which are then used to calculate a set of output values for each hour of the run.
A single observation operator is defined as an operation on the GEOS-Chem species concentration field that produces a single number. A YAML file or files is used to define the set of observation operators for a given run, with a format described later in this document.
Every observation operator can apply to one or more species, and is the composition of a time operator, a horizontal operator, and a vertical operator. Each operator either samples a single value, or computed a (potentially weighted) average over the a set of values (conceptually, however, a point value is a single-value average with a weight of 1). The resulting calculated value is thus equal to
where
s
is the species index. The vertical indices can be defined directly as levels, by altitude, or by pressure. The weights can be fixed, or can be pressure weighted (and thus dynamically updated).Once all time points have been processed, the output values are written to a CSV file.
geoschem_config.yml entry
The entry in the
geoschem_config.yml
file is as follows:The
input_file
andoutput_file
are expanded to include the year, month, and day if they containYYYY
,MM
, orDD
, as per usual.YAML file format examples
The YAML file contains a list of observation operators under the entries key as follows:
Examples of entries are below, and full specification of the format is given later.
ObsPack-style time averaging
The ObsPack diagnostic implements either point or time-averaged sampling at a specific location and altitude. For example, suppose the ObsPack entry is:
The corresponding YAML entry could be:
OCO-2 retrieval vertical averaging
For OCO-2, the observation operator for a retrieval at a specific time and location has the following form:
where
xco2_apriori
is a given scalar,h
is a vector of weights (a combination of the averaging kernel and the pressure weights),vco2_model
is a vector of concentrations from GEOS-Chem at certain pressure levels, andvco2_apriori
is the apriori vector of concentrations.The ObsOperator diagnostic can compute
h^T vco2_model
; the user would then add the constant offsetxco2_apriori - h^T vco2_apriori
to this to get the final sampled value.A corresponding YAML entry could be:
(Here the weights have been multiplied by 1e6 to convert the output concentration from v/v to ppm.)
YAML file format specification
The YAML file contains a list of observation operators under the entries key as follows:
An individual observation operator is defined as a dictionary with the following keys:
If only one species is specified, it can be given as a single string instead of a list.
Each of the operators is described below
Time operator
The time operator can apply to a single point in time or a range of times.
A point in time is defined as follows:
where
YYYYMMDD
is the year, month, and day, andHHMM
is the hour and minute, both as integers.A range of times is defined as follows:
The weight can be either normalized or equal. Normalized are the default, and mean that the weights are normalized to sum to 1 (thus we are averaging the values over the time range). Equal means that the weights are equal to one for all time points (thus we are summing the values over the time range).
Horizontal operator
The horizontal operator can apply to a single point or a box.
A point is defined as follows:
Degrees are degrees north for latitude (-90 to 90), and degrees east for longitude (-180 to 180). The grid index depends on the choice of horizontal grid.
A box is defined as follows:
The weight can be either normalized area, area, normalized, or equal. Normalized means that the weights sum up to 1. Area means that the weights are proportional to area (or equal for weights = "area" itself). Equal means that the weights are equal to one for all grid points.
Vertical operator
The vertical operator can apply to a single point, a range of points, or a specific set of points.
A point vertical operator is defined as follows:
Pressure is in hPa. Altitude is in meters. Pressure level is the integer pressure level index, which depends on the choice of vertical grid.
A range of points is defined as follows:
As usual, normalized means that the weights sum up to 1. Pressure means that the weights are proportional to the pressure difference across the sampled vertical levels. Equal means that the weights are equal to one for all vertical levels.
The final vertical operator is a set of points, and is defined as follows:
Here, the weight must be specified for each value.
Output format
The output is a CSV file with the following columns:
id
: The id of the observation operatorspecies
: The species name of the observation operatorvalue
: The final accumulated value of the observation operator for the given speciesThe text was updated successfully, but these errors were encountered: