
Diviner Data Processing


Paul Hayne, JPL -- June 11, 2015

Adapted for this Wiki by K.-Michael Aye, LASP -- August 21, 2018

Logging in

For access to the Diviner team data system, you have two options:

1. Use the web query (requires login credentials)

2. Use the command line interface and "pipes" tools

Both the UCLA and JPL servers should work in exactly the same way:

  • UCLA: luna{i}.diviner.ucla.edu (where i = 1 or 2)
  • JPL: cabeus.jpl.nasa.gov

The command-line option is trickier to learn, but much more powerful than the web query.

To log in to one of the file servers, simply ssh using the "-Y" flag to forward your display (X11 graphics):

phayne@localhost:~$ ssh -Y cabeus

or more explicitly,

phayne@localhost:~$ ssh -Y [email protected]

If you are off-lab, you must be connected to the VPN. If you cannot get onto the JPL VPN, just connect to UCLA instead:

phayne@localhost:~$ ssh -Y [email protected]

Using divdata

The program divdata is at the core of the data processing pipeline.

A basic help message is displayed by simply typing "divdata" with no options:

phayne@cabeus:data$ divdata
Using new divbase

Quick info:

divdata      (No arguments, prints this usage statement)

divdata [type=datatype] fields   (to just print out selectable fields)

----

Piping data into other pipes commands:

divdata [type=datatype] [noindex] daterange=BEGIN,END [clat=MIN,MAX] [c=MIN,MAX]
            [FIELD=MIN,MAX FIELD=MIN,MAX ...] | PIPES_COMMANDS ...

    BEGIN and END for daterange can be the following format:
        YYYYMM     - A month, gets you all the days in that month.
        YYYYMMDD   - A day,   gets you all the hours in that day.
        YYYYMMDDHH - An hour, gets you all the minutes in that hour.
    If BEGIN and END are equal, e.g. 200907, you can just use: daterange=200907
    Multiple daterange=BEGIN,END arguments specify disjoint times.

    Other fields (except for 'c', only one instance of each may be allowed):

    clat=MIN,MAX  - Center latitude of observation, greatly improves performance

    c=MIN,MAX     - Channel number, greatly improves performance
                    You can specify multiple arguments for this, e.g.:
                        c=1,1 c=5,6 c=8,9
                    but don't mix inclusive (MIN<=MAX) with exclusive (MIN>MAX)

    FIELD=MIN,MAX - Any other FIELD in the dataset,
                        moderately improves performance

    type=DATATYPE - Output data format.  Default is 'div38'.

    noindex - Do not use indexing to match data constraints.
              Significantly SLOWS DOWN your data access.
              Use only for debugging, a sanity check to make
              sure you are getting all your data.   Using this
              option *should* not alter your results except in
              terms of speed.  Let us know otherwise.

    nodel - Do not delete the catfile this program creates.

    debug=N - Debug level where N is one of:
                0 - Normal, only high level messages.
                1 - Detailed
                2 - Extra detailed.

              All debugging messages are printed to standard error.

A note on constraints:

    When specifying a MIN,MAX, if MIN<=MAX, you get all the data
    between MIN and MAX, inclusively.   If MIN>MAX, you get all
    the data OUTSIDE of the inclusive MIN,MAX range.   Examples:

    clat=-70,50 - All latitudes between -70 and 50, inclusively.
    clat=50,-70 - (-90,-70.0000000001) + (50.0000000001,90)

    c=3,3 - Channel 3 only.
    c=3,5 - Channels 3,4,5
    c=5,3 - Channels 1,2,6,7,8,9

Here is the basic pipeline concept behind the Diviner data processing system:

The "pipes" routines were developed in the 1990's, but they are still the most efficient and flexible way to manipulate the Diviner data! You can picture data "stream" flowing through pipes. Each of these pipes manipulates the data in some way (e.g. coordinate projection, unit conversion, averaging, histograms, etc). Pipes can be linked end-on-end to apply multiple processing steps in one command line. More information on pipes can be found here:

http://luna1.diviner.ucla.edu/~marks/pipes.html

Below are several examples of using divdata to access the database, apply constraints, and output the results in a useful format.

Example 1: Diurnal temperature cycle at a specific location

The following query applies constraints on the date, latitude, longitude, channel number, observation mode, and emission angle, in order to pull out local time and brightness temperature (Tb) for all channel-7 data within a 1°x1° box:

phayne@cabeus:data$ divdata daterange=20090601,20100630 clat=0,1 clon=0,1 c=7,7 af=0,199 cemis=0,10 \
    | pextract extract=cloctime,tb \
    | pprint > mydata_c7.txt

Let's look at each step in this command:

  1. First, divdata pulls out all the data within the specified range of June 1, 2009 to June 30, 2010. Then, divdata restricts the data to the 1°x1° box in latitude and longitude, pulls out only channel 7 data acquired in normal mapping mode ("af=0,199") with emission angle <= 10° (eliminates spacecraft rolls), and passes the stream to pextract.

  2. Next, pextract stops the flow of all data fields except local time and brightness temperature, which it pipes to pprint.

  3. Finally, pprint converts the binary data stream into ASCII text, and then we write the results to the file mydata_c7.txt.

If everything goes smoothly, you will see this message on the screen:

Latitude disks used: 10
Channels selected : 7
Files matching your date range: 8481
Files skipped due to indexing : 7651
------------------
Files used: 830
Executing: .catfile.2015_06_11-10:00:26.10535.53729390 | extract_c38
     | pcons des=/u/marks/luner/c38/src/div38.des clat=0,1 clon=0,1 af=0,199 cemis=0,10
pcons: 66808 records written, 0.020309632 GB

The output data look like this:

phayne@cabeus:data$ head mydata_c7.txt
17.246389389 234.947006226
17.245559692 232.264999390
17.244720459 234.897003174
17.243890762 237.087005615
17.243330002 232.552993774
17.242500305 236.507003784
17.241670609 235.634994507
17.240829468 234.841995239
17.239999771 233.440994263
17.239170074 230.014007568

In this case, we just have the two columns we selected with pextract: cloctime and tb. Below is an example of how you might plot this up in Matlab. Here, I'm using my own binning function "bin1d" in order to see the mean values and standard deviation at each local time. For comparison, I also plotted the radiative equilibrium temperature, which matches nicely during mid-day.
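For reference, the radiative equilibrium temperature computed in the Matlab session below simply balances absorbed sunlight against emitted thermal radiation:

$$ T_\mathrm{radeq} = \left[ \frac{(1 - A)\, F_0 \cos h}{\epsilon\, \sigma} \right]^{1/4} $$

where A is the albedo, F_0 the solar irradiance, h the hour angle, ε the emissivity, and σ the Stefan-Boltzmann constant; this is exactly what the T_radeq line in the session evaluates.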

Matlab session:

>> data = dlmread('mydata_c7.txt');
>> plot(data(:,1),data(:,2),'.k','markersize',1)
>> set(gca,'fontsize',14,'color','none','box','on','fontweight','normal','tickdir','in','xminortick','on','yminortick','on')
>> xlabel('Local Time')
>> ylabel('Brightness Temperature (K)')
>> text(2,375,'C7','fontsize',14)
>> help bin1d

 function [xbins, ymean, ystd, ymin, ymax, n] = bin1d(x, y, xbins)
 -------------------------------------------------------------------
 Bin (x,y) data into bins centered at points in the array xbins. Returns
 the mean, standard deviation, minimum, maximum, and number of points in
 each bin.

>> ltbins = [1.0,2.8,4.5,6.5,8.2,10,12,13.8,15.5,17.2,19.5,21.2,23];  % I just eyeballed these bin centers
>> [loctime,tbmean,tbstd] = bin1d(data(:,1),data(:,2),ltbins);
>> figure(1), hold on
>> errorbar(loctime,tbmean,tbstd,tbstd,'sr','markerfacecolor','r')
>> lday = 6.0:0.1:18.0;
>> albedo = 0.1;
>> sigma = 5.67e-8;       % Stefan-Boltzmann constant
>> emissivity = 0.95;
>> hourangle = (2*pi/24)*(lday-12);
>> F0 = 1361;             % solar irradiance (W m^-2)
>> F = F0.*cos(hourangle);
>> T_radeq = ( (1-albedo)/(emissivity*sigma) * F ).^(1/4);
>> plot(lday,T_radeq,'b','linew',1)

{width="6.0in" height="4.722916666666666in"}

Example 2: Using pread and descriptor files

Often we end up running very similar data queries multiple times, with slight modifications towards the end of the query. For example, we might want to do the above query for each Diviner channel. Instead of running the whole query 9x, we can do the following:

phayne@cabeus:data$ divdata daterange=200906,201006 clat=0,1 clon=0,1 af=0,199 cemis=0,10 \
    | pextract extract=c,cloctime,tb nodes > mydata.bin

Now the data for all nine channels are in the binary file 'mydata.bin'. We can read this file back in using the pipes tools along with a "descriptor file" (in the example below, pcons applies the descriptor and a channel constraint):

phayne@cabeus:data$ pcons des=mydescriptor.des c=1,1 < mydata.bin \
    | pprint > mydata_c1.txt

This reads in the binary file we created above and restricts the data stream to channel 1 only. The descriptor file must match the format of the binary file, for example:

phayne@cabeus:data$ cat mydescriptor.des
'Example descriptor file'
'c' 'channel number (1-9)'
'cloctime' 'local solar time'
'tb' 'brightness temperature'

Now that we have the binary file, it's straightforward to extract all the other channels at a later time (using a for-loop in this case):

phayne@cabeus:data$ for chnum in `seq 2 9`; do
    pcons des=mydescriptor.des c=${chnum},${chnum} < mydata.bin | pprint > mydata_c${chnum}.txt
    echo "processed chan. ${chnum}"
done

Switching to Matlab, here is what the data from all seven channels look like:

{width="4.36124343832021in" height="2.8928576115485565in"}

Example 3: Making maps with pbin3d

There's a pipes tool called pbin3d, which we use to "bin" the data in up to three dimensions. Most often, we treat the first two dimensions as spatial coordinates (e.g. lon/lat) and the third dimension as either time or a dummy variable. In the example below, the third dimension is a dummy variable: tb is binned into a single bin spanning its full range (zrange=0,400 with deltaz=400).

phayne@cabeus:data$ divdata daterange=20090601,20100630 clat=20,25 clon=-50,-45 c=7,7 cloctime=20,22 af=0,199 cemis=0,10 \
    | pextract extract=clon,clat,tb \
    | pbin3d x=clon y=clat z=tb t=tb xrange=-50,-45 yrange=20,25 zrange=0,400 deltax=0.02 deltay=0.02 deltaz=400 \
    > mybinneddata.txt

Plotting it up in Matlab:

>> data = dlmread('mybinneddata.txt');
>> lon = reshape_pbin3d(data,1);
>> lat = reshape_pbin3d(data,2);
>> tb = reshape_pbin3d(data,4);
>> tb(tb<=0) = nan;
>> pcolor(lon,lat,tb), shading flat, axis equal tight
>> cb = colorbar;
>> set(gca,'fontsize',14,'color','none','box','on','fontweight','normal','tickdir','in','xminortick','on','yminortick','on')
>> xlabel('East Longitude')
>> ylabel('Latitude (deg.)')
>> ylabel('Latitude')
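Here reshape_pbin3d is another of the author's helpers (not shown). Purely as a sketch of what it might do, and assuming that pbin3d writes one ASCII row per bin with the x and y bin centers in the first two columns followed by the binned values (this column layout is an assumption), it could look something like:

    function grid = reshape_pbin3d(data, col)
    % Rough sketch only -- the author's actual reshape_pbin3d may differ.
    % Assumes one row per (x,y) bin, x in column 1, y in column 2, and that
    % the rows cover a complete regular grid with y varying fastest.
    nx = numel(unique(data(:,1)));        % number of distinct x (longitude) bins
    ny = numel(unique(data(:,2)));        % number of distinct y (latitude) bins
    grid = reshape(data(:,col), ny, nx);  % 2D array suitable for pcolor
    end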

{width="3.982646544181977in" height="3.4in"}

This map is fairly sparse (we're only looking at a 2-hr local time window), so I ran the same query, this time using the full mission (200906 -- 201506):

{width="4.0in" height="3.4148151793525807in"}

Here I did the same query on channel 8, and then made a difference map (rocky areas have large ∆Tb):

{width="4.0in" height="3.6333333333333333in"}