Use of "most rapidly varying" #530
Comments
Hi Chris, Thanks for describing this so clearly! Would a new entry in 1.3 terminology be sufficient? e.g. something like most rapidly varying dimension: <definition>, and then make sure we use that exact phrase elsewhere in the text:
I notice that in UGRID (which is also CF, now) there is sometimes the option to specify which dimension is the most rapidly varying, e.g.
If I understand correctly, given that in CDL/netCDF the most rapidly varying dimension is the last one, this description is misleading. It implies that the face index is always the slowest varying dimension, but that it can be in either position. Not so, right? |
Dear Chris and David Thanks for addressing this issue. I agree with defining "most rapidly varying dimension" in 1.3 (David's suggestion) and I agree also with saying "last" in the text (Chris's suggestion). In addition, I suggest we should clarify "last" in the text. That is, I think we should say what we mean in more than one way, consistently each time. That's redundancy in the text, but ought to help with clarity so long as we maintain consistency. My proposal for the three cases is:
In 1.3, we could insert a definition like this: most rapidly varying dimension: The dimension of a multidimensional variable which differs by unity (modulo dimension size) for elements that are adjacent in storage. When netCDF is represented in CDL, the most rapidly varying dimension is the last one e.g. How's that? Best wishes Jonathan |
lovely (and clear), from my perspective. |
This looks good to me, thanks! One nit:
Wouldn't that be (time, vertical, latitude, longitude) in CDL order? So it's a bit confusing to have it in the opposite order in the text. I know I follow an example before I carefully read the text! Was COARDS originally written with Fortran in mind? |
Those words have not changed, @ChrisBarker-NOAA, but I agree that it would be logical to put it in CDL order - good point. I don't know what software environment the authors of COARDS had in mind! Is this OK:
Since we would not be quoting COARDS verbatim, I have rephrased it, in the hope (though not the certainty) of making it clearer. |
Thanks! I think that's better, yes. |
Hi, This is looking good to me, thanks. A couple of questions:
|
Dear @davidhassell
Best wishes Jonathan |
I like @davidhassell's wording :-) |
More than three weeks have passed with no further comment. I have prepared PR #535 to implement these changes, as I drafted, with the subsequent changes by @ChrisBarker-NOAA and @davidhassell. Please could someone check and merge. Thanks. |
I hate to be the fly in everyone's drink, but the current definition is faulty. In combination with some changes in the wording, may I propose an update to the text as follows: most rapidly varying dimension: The dimension of a multidimensional variable for which elements are adjacent in storage. When a netCDF file is represented in CDL, the most rapidly varying dimension is the last one listed, e.g. |
Thanks @pvanlaake: good catch! For me and everyone else, the text proposed in #535 is: """ So the change is that C and Fortran don't "also call it", and which is column-major and which is row-major are swapped. Also the text "C and Python NumPy use the same order as C" is reworded. But I think maybe the sentence was supposed to be: "C and Python NumPy use the same order as CDL ...", which I think is worth saying. Small note: NumPy uses C-order by default, but also supports Fortran-order -- though that is probably a technicality too detailed to get into here. Is it close enough to put this discussion in the PR for final editing? |
I proposed a few more little tweaks:
On the issue of supporting either ordering scheme: the conventions dropped the COARDS requirement on dimension ordering, the implication of which is that a CF-compliant reader should be able to manage both arrangements. Both Python and R support array permutation so no issues there (for those two programming ecosystems), just so long as one analyses the dimension ordering. |
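To illustrate what "analysing the dimension ordering" could look like, here is a minimal pure-Python sketch, not taken from any CF tool. The helper names (`shape_of`, `get`, `build`, `permute`) and the tiny nested-list "variable" are hypothetical; in practice one would more likely reach for `numpy.transpose` in Python or `aperm` in R.

```python
def shape_of(data, ndim):
    """Shape of a regular nested list with `ndim` dimensions."""
    shape = []
    for _ in range(ndim):
        shape.append(len(data))
        data = data[0]
    return tuple(shape)


def get(data, index):
    """Element of a nested list at a multi-dimensional index."""
    for i in index:
        data = data[i]
    return data


def build(shape, fn, prefix=()):
    """Nested list of the given shape, with elements fn(index)."""
    if not shape:
        return fn(prefix)
    return [build(shape[1:], fn, prefix + (i,)) for i in range(shape[0])]


def permute(data, dims, new_dims):
    """Reorder `data` from dimension order `dims` to `new_dims`."""
    shape = shape_of(data, len(dims))
    # For each target axis, find which source axis carries that dimension.
    axes = [dims.index(name) for name in new_dims]
    new_shape = tuple(shape[a] for a in axes)

    def source_value(new_index):
        old_index = [0] * len(dims)
        for target_axis, source_axis in enumerate(axes):
            old_index[source_axis] = new_index[target_axis]
        return get(data, old_index)

    return build(new_shape, source_value)


# A variable stored as (lon, lat), which COARDS would not have allowed.
data = [[0, 1], [2, 3], [4, 5]]
reordered = permute(data, ("lon", "lat"), ("lat", "lon"))
print(reordered)
```

The reader looks up dimensions by name rather than by position, so either arrangement in the file ends up in the order the application expects.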
How about "when a netCDF dataset is represented in CDL"? NetCDF is really a data model rather than a file format, and doesn't have to be represented in a file, e.g. netCDF-Zarr represents the dataset as a set of nested directories. |
Please could someone open a new defect issue and attach a new PR to it for this. It would be confusing to have the same issue listed twice in the revision history. Thanks. |
Defect issue opened as #583. PR will follow as soon as any further comments and suggestions have been received. |
closing in favor of: #583. |
Clarify use of "most rapidly varying" dimension.
In (at least) three places in CF, we refer to the "most rapidly varying" dimension (and are thinking of adding a fourth, in the cell definition, discussed in #163).
I'm enough of a computer geek to know what this means, though I'm not (wasn't) sure quite how it applied to CF.
e.g. ""most rapidly varying" index to mean the one which varies by 1 for the addresses of adjacent locations in storage, i.e. the first index in Fortran, the last in C and CDL"
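To make that quoted wording concrete, here is a small pure-Python sketch of how a multi-dimensional index maps to a storage offset in the two conventions. The function names `c_offset` and `fortran_offset` and the (2, 3) shape are made up for illustration; nothing here comes from the netCDF library itself.

```python
def c_offset(index, shape):
    """Storage offset of `index` in row-major order (C and CDL)."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def fortran_offset(index, shape):
    """Storage offset of `index` in column-major order (Fortran)."""
    # Column-major is row-major with both index and shape reversed.
    return c_offset(tuple(reversed(index)), tuple(reversed(shape)))


shape = (2, 3)  # e.g. (lat, lon) as listed in CDL

# In C/CDL order, stepping the LAST index by 1 reaches the adjacent location.
assert c_offset((0, 1), shape) - c_offset((0, 0), shape) == 1
# Stepping the first index instead jumps over a whole row of 3 elements.
assert c_offset((1, 0), shape) - c_offset((0, 0), shape) == 3

# In Fortran order, it is the FIRST index that reaches the adjacent location.
assert fortran_offset((1, 0), shape) - fortran_offset((0, 0), shape) == 1
assert fortran_offset((0, 1), shape) - fortran_offset((0, 0), shape) == 2
```

The same block of storage is therefore described as `(lat, lon)` in CDL and as `(lon, lat)` in a Fortran declaration, which is why talking about "first" or "last" without naming the convention can mislead.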
If, in fact, it's always the last in CDL (and in netCDF itself), then I think this language could not only be confusing to folks less familiar with the intricacies of array storage, but could also send people down the wrong track if they are, e.g., writing a file with Fortran, and might think that "most rapidly varying" means the first index, as it is in Fortran.
The three places I found "rapidly varying"
in 1.5 COARDS: "...COARDS restricts the axis (equivalently dimension) ordering to be longitude, latitude, vertical, and time (with longitude being the most rapidly varying dimension)."
in 2.2, in the discussion of strings: "... a variable of type string with n dimensions, or as a variable of type char with n+1 dimensions where the last (most rapidly varying)..."
in 7.1 in cell boundaries: "The additional dimension should be the most rapidly varying one"
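As a rough illustration of what "last" means in the 2.2 and 7.1 cases, here is a pure-Python sketch using nested lists as stand-ins for netCDF variables. The names (`to_char_array`, `lat_bnds`), the space padding, and the values are assumptions made for the example, not anything prescribed by CF.

```python
def to_char_array(strings, strlen):
    """Represent n strings as an (n, strlen) array of single characters.

    Padding with spaces is an assumption made for this sketch.
    """
    return [list(s.ljust(strlen)) for s in strings]


def flatten(rows):
    """Flatten a 2-D nested list in row-major (CDL) storage order."""
    return [item for row in rows for item in row]


# 2.2: a 1-D string variable becomes a 2-D char variable whose last
# (most rapidly varying) dimension is the string length.
chars = to_char_array(["lat", "lon"], 4)
stored = flatten(chars)
# The characters of each string sit next to each other in storage.
assert "".join(stored[0:4]) == "lat "
assert "".join(stored[4:8]) == "lon "

# 7.1: cell bounds shaped (lat, 2), with the extra dimension last.
lat_bnds = [[-90.0, -30.0], [-30.0, 30.0], [30.0, 90.0]]
stored_bnds = flatten(lat_bnds)
# The two bounds of the second cell are adjacent in storage.
assert stored_bnds[2:4] == [-30.0, 30.0]
```

In both cases the added dimension is written last in CDL, so the values that belong together (the characters of one string, the two bounds of one cell) are contiguous.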
Now that I've written this all out -- maybe the only thing to do is adjust the text in 7.1, which is being worked on right now in #163 (PR #521).
However, maybe it would be good to put in the spec somewhere that "the most rapidly varying" dimension is always the last in a netCDF file? I'm sure that's defined in the netCDF spec itself, but having it in CF could be helpful.
NOTE: there may be other places to look at in the doc -- I only found these three by searching "rapidly varying".
Moderator
TBA