-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
grid cells with a varying number of cell bounds #163
Comments
It sounds reasonable, but UGRID appears to specify Do you know how widespread the use of the |
@martinjuckes: No I don't know how widespread it is. Can others help out here? I would note that UGRID defines lots of "index" arrays, where perhaps an integer missing value makes it easier for software to handle the complicated relationships among grid cells that UGRID tries to describe. I'm not sure we need CF to be consistent with UGRID. |
@taylor13 My understanding in the past has been that storing I, personally, dislike the idea of repeating trailing values. It makes it more difficult to recognize that the vertices are not, in fact, valid vertices. Using the fill value is unambiguous and doesn't require you to construct the N-D vertex set before determining that some of the vertices are degenerate and should be discarded. |
I agree this alternative approach is really just a clever trick that would allow most software needing to work with vertices to operate without modification. Is there a case where the software would care that is was connecting two points that were co-located? The current approach with _FillValue used for unneeded vertices would require software to identify these special non-vertices and would complicate some software. I agree that _FillValue, rather than missing_value is the better choice if we want to go with the current approach. |
@taylor13 I guess it depends on the software. I don't work with GIS or other systems that use polygons as input enough to know what the "usual" case is, nor what the impact of degenerate vertices is. From a coding standpoint, pruning empty vertices is trivial when _FillValue is used. Pruning them with repeated values is more complicated. If there are, in fact, lots of packages out there that expect and properly handle degenerate vertices, then perhaps we should allow that option, but only if those packages also read and properly use netCDF content with no additional code required. I will point out that we regularly adopt the attitude that CF does things "the CF way" and that software needs to adapt to CF, rather than the other way around. |
I've checked the OGC standard which is used, apparently, by most GIS software, and they appear to have another approach: repeating the first value to indicate a closed polygon (i.e. I agree with @JimBiardCics that using the In response to the question from @taylor13 about UGRID: I think that having a conflict with UGRID could cause problems. There is a proposal to integrate UGRID into the CF conventions (#153). The two use-cases could possibly be kept apart (since the UGRID rules appear to apply only when |
@martinjuckes The repetition of the first value to indicate a closed polygon doesn't have anything to do with the issue at hand. I can see your point about interpreting the presence of a missing_value attribute as indicating the need for more careful handling of the bounds variable, but this has the minor downside of adding another semantic to the missing_value attribute. It crosses my mind to wonder how all of this may interact with the use case where there is a missing measurement that leads to missing auxiliary coordinate values and bounds vertices that are associated with that measurement. We need to tread carefully. We could address the potential overhead issue by declaring a new attribute which indicates whether or not the number of vertices in the bounds variable is variable or fixed. I agree that the issue is deeper than a question of redundant plotting of points or zero-length lines. It may be important for a software package to know the sided-ness of each bounding polygon. |
@JimBiardCics : I think the GIS standard is relevant because it shows that there are other approaches to presenting a list of polygons with varying numbers of vertices. We still don't know how wide-spread the usage described by @taylor13 is, but I'm now confident that it does not extend to GIS (a negative result, but still a small step forward). I agree that a range of problems can be avoided or mitigated by insisting on or recommending the use of a new attribute to declare when there are variable numbers of vertices. I also agree with the point that there is potential conflict with the broader, pre-existing meaning of either |
@martinjuckes The repeated first vertex policy doesn't address the question of varying numbers of vertices in a fixed array. The GIS approach differentiates an open polyline from a closed polygon, which is something we aren't concerned with in bounds variables. I don't think we can assume that the data grid is always known a priori. It's certainly true for models, but it may well not hold for data acquisition scenarios. |
@JimBiardCics : I hope we are agreed that differentiating a triangle from a rectangle is the main topic here, with a side discussion on how it is done in other communities and standards, and a few other interesting tangents as usual. Do you have a particular data acquisition scenario in mind? We are discussing grid cells in a regular grid, so I believe there is an assumption of some regularity in the structure. |
@martinjuckes, I don't have a particular use case in mind. I think we shouldn't assume that the only use case is a regular grid that is known beforehand. There's no particular reason to assume it, is there? |
@JimBiardCics : I'm not sure. We have to deal with use cases we have. On the other hand, it is explicitly permitted to have missing values in auxiliary coordinates, and these auxiliary coordinates may have bounds attached, so I suppose there is an argument that it is implicit that the bounds could have missing values. Still. I'm reluctant to propose a change to the convention based on a hypothetical use case. |
@martinjuckes, I think it's a great idea to add language to the bounds section in CF that explicitly describes how to handle varying number of vertices. The way to do it is already defined. It is not explicitly stated in relation to bounds vertices, but there is already a clear designator for unused variable elements—
and
My further argument against using In summary, UGRID got it right, and is following CF with the mechanism they are using. Let's absolutely add clear language about how to use |
@JimBiardCics , OK, I would be happy to accept the status quo, with some clarification. I think I misunderstood your interpretation of the distinction between That would make sense, but I think it goes beyond what is specified in CF or NUG. Older versions of the NUG had precisely the reverse interpretation: e.g. Version 2.3 of the NUG:
I know this is ancient, but it shows that people have had a different interpretation. The current NUG retreats from this and simply says that |
@martinjuckes, it's interesting to see how different people read the same text and get different things from it. (And it doesn't help when the text changes over time!) Based on the version(s) of the NUG that I am familiar with, I've always worked on the assumption that |
@JimBiardCics very true. It is certainly true that |
@martinjuckes I think it's possible to do what you are talking about with a combination of Given that, I don't think it's particularly necessary. I think an "empty" cell is a sufficient marker. |
@taylor13 asked: " Is there a case where the software would care that is was connecting two points that were co-located?" Absolutely. If you are simply drawing it, then duplicated points are a non-issue. But if you are doing computations on the polygon: interpolating, computing fluxes over cell bounds, putting them into a spatial data structure for searching, any number of things -- degenerate cells are a real problem. On the other hand, at least some plotting software (e.g. matplotlib) is perfectly happy with missing values :-) And with floating point issues, I'd really hate to have to test equality to know a quad was really a triangle. BTW: while refering to GIS standards makes sense in some cases, they actually are really bad at representing grids, and shared vertices in general, so we should not assume they have it right for CF use cases. |
Hi all, I think we all ran out of steam on this issue 3 years ago, but without too much effort we might come to a consensus on how to proceed. From my reading (which should be checked) of the discussion I propose that we provide the following guidelines: For grids constructed from cells that do not all have the same number of sides (e.g., some rectangular cells and some triangular cells), the We can improve on the language, but please say if you disagree with this approach. Also, I have not really examined whether this suggestion is consistent with ugrid, although Martin has provided a summary above. thanks, |
Hi Karl, Sounds good to me. This suggestion is indeed consistent with UGRID, provided we also clarify that the unneeded indices must occur last in the bounds dimension. Thanks, |
Thanks for returning to this, Karl @taylor13. I agree with your proposal. |
No objections and some support have been voiced. I propose the following slightly modified text (to indicate the unneeded indices must occur last): "For grids constructed from cells that do not all have the same number of sides (e.g., some rectangular cells and some triangular cells), the cell_bounds must be dimensioned to accommodate the maximum number of cell vertices. For cells with fewer than the maximum number of vertices, the unneeded elements should appear as the last elements in cell_bounds and should be assigned the _FillValue." Please feel free to suggest edits if this can be improved. |
This all looks good -- but does this text belong in the CF doc? or in the UGRID spec, where it already says: """ NOTE; in practice I've usually seen -1 used for the "missing node" value -- should we mention that as a common, but not required, practice? maybe not clear enough (and we need to edit the doc ti change "we propose" to something more definite. In |
@ChrisBarker-NOAA. I think it belongs in the CF doc, because this can arise for CF metadata which isn't UGRID. That's how this issue began. Of course, it's desirable, and good, if the same convention is used for UGRID and non-UGRID in CF. @taylor13. Thanks. I think the amended text is fine. Would you like help with doing a PR? |
Dear Karl @taylor13 Three weeks have passed with no objections, and enough support has been given, so this change is accepted. We need a pull request to implement it. Karl, would you like to do this, with help as required? Best wishes and thanks Jonathan |
Dear all In this issue we agreed a proposal of Karl's three months ago and since then has been waiting for someone to write a PR. We decided on the text to add, but I don't think we decided where to put it. It seems to me that it should be appended to the first paragraph of sect 7.1. However, looking at the context, I feel we should also insert another couple of sentences to explain how many vertices one might expect there to be. That is not otherwise made clear until further down, when we describe some cases explicitly. I suggest that we introduce a new term, because I think it helps with the new sentences, in the terminology section 1.3:
The first paragraph of 7.1 is currently
In the above, I propose we change should to must (the two bold words), because these are requirements (as the conformance document indicates). I propose we append the following to this paragraph:
In the above, the three bold sentences are what I've added. The other bold bits are insertions or substitutions in Karl's text. In your final sentence, Karl, there are two "should"s. Are they recommendations or requirements? In either case, we could add them to the conformance document. In the conformance document, we have
I think that's unaffected by the new convention, but I propose to append the folllowing to it:
I have prepared PR 521 to implement the above changes. I realise this goes a bit beyond what we agreed, but I hope not to the extent that we should revisit the decision. I believe my additions and changes are clarification to the proposed text. Best wishes Jonathan |
Thanks, Jonathan, for initiating the pull request. Regarding
I suppose if we say "must", that would be less backward compatible. For future datasets it sure would be good for everyone to handle this as we've suggested, so maybe we should change this to "must", even though older datasets may be non-compliant. What do others think? |
Hi Jonathan @JonathanGregory, You suggested the following text:
Is there a reason not to specify that the size of the vertex dimension for an n-dimensional coordinate variable has to be Later you suggest
I have a similar question as above on specifying a size of |
Dear Ethan I wondered also about repeating the phrase to be explicit, but it sounded cumbersome! We do have the conformance document as well, just in case there is any doubt. What do you think of
That has even less repetition, and to me it seems unambiguous. If "it" is omitted, you can't worry about what "it" refers to! It's like a function definition, supplying only the arguments on the second call. At present, the convention for boundary variables only describes one- and two-dimensional (in a spatial sense) cells. Two-dimensional boundaries are faces, which must have more than two vertices. There's a later part of the text in sect 7.1 which I've worried about before:
Since it talks about traversing the vertices clockwise, I think it must still be concerned with faces. For instance, it's not talking about the eight vertices of a cube which isn't aligned with the coordinate axes, where each of the vertices needs its own 3D coordinates. Multidimensional cells have two-dimensional faces. In a 3D grid of tilted cube cells, each cell has three indices I suspect that this part of the text is flawed, and we should remove it if so. For bounds in more than two dimensions, this convention isn't inadequate. UGRID could be used. But perhaps I have misunderstood the idea? If I'm right, it should be dealt with in a separate issue. The only connection with the present issue is in the text I inserted. Perhaps boundary variables should be disallowed for auxiliary coordinate variables of more than two dimensions, and the statement above should instead be
Best wishes Jonathan |
Dear all We got stuck with this five weeks ago. Ethan suggested replacing "its" at one point to be explicit. Instead, I've adopted the same word order as in the conformance document, which avoids this difficulty, and I think it's good to be consistent anyway. I have updated the PR with this text now appearing in section 7.1 on "Cell boundaries":
You may notice that I've changed the last two sentences to allow the vertex dimension to be greater than the maximum number of cell vertices. For instance, if you have a grid with triangles and squares, you could make the vertex dimension 6 instead of 4, in case you want to add hexagons later. The alternative is to say it should be the present maximum, but that would imply either recommending or requiring boundary variables to be resized if you took a subset of the cells which didn't include any with the maximum number of vertices, for example. That strikes me as unnecessary. If we can cope with unused elements at the end of the vertex dimension, it doesn't matter how many there are. It's up to the data-writer how much they care about saving space, I would say. I don't think this point was discussed before, and I hope it's not controversial. What do you think? If the above paragraph is acceptable, we can merge the PR to achieve what Karl proposed in this issue. Is it OK with you, Ethan @ethanrd and Karl @taylor13? Ethan asked another question, about whether we could say coordinate variables of n dimensions should generally have boundary variables of n+1 dimensions. As I mentioned before, I suspect that the convention of boundary variables of dimension greater than two is flawed. This is a different issue from the one Karl raised, which I don't want to delay. I will therefore open another issue to address it. Best wishes Jonathan |
Hello,
I think this ought to a "must appear as the last elements", otherwise this text is at odds with the UGRID text, and that part of the UGRID text is part of CF.
UGRID is also uncertain here (as concluded by issue ugrid-conventions/ugrid-conventions#53), which was why we excluded UGRID 3-d volumes from incorporation into CF. |
OK, @davidhassell. Karl thought "must" might be the correct choice, and you have provided a reason. If this convention hasn't been allowed before, I think that's OK even for our generous interpretation of backward compatibility. I have changed that sentence from "should" to "must". It must be
I have updated the PR. Is it OK, Karl @taylor13 and @ethanrd?
Whether or not UGRID can handle it, the convention is unclear, I think. I have opened #527 about this. |
@JonathanGregory - The PR looks good to me. Thanks. |
All looks good to me, but a related question: "The vertex dimension must be the most rapidly varying one" I'm pretty sure similar language is present elsewhere in CF, but this seems odd to me -- Fortran and C use different data storage conventions, so that the "most rapidly varying one" is different. But is it always the same in netCDF (i.e. the last)? -- if so, shouldn't we just say that? If not, how is a less savvy user (or a compliance checker) to know what the most rapidly varying one is? If this is well documented somewhere, great, but I see the term three places in CF, and none of them define what it means, though there is this: "... a variable of type string with n dimensions, or as a variable of type char with n+1 dimensions where the last (most rapidly varying)..." which implies that the last is indeed always the most rapidly varying -- so maybe we should use that language everywhere? |
Dear @ChrisBarker-NOAA Thanks. I understand "most rapidly varying" index to mean the one which varies by 1 for the addresses of adjacent locations in storage, i.e. the first index in Fortran, the last in C and CDL. The phrase was probably used in order not to trip people up e.g. by saying "last" if they weren't aware that CDL is like C. But I can see that it could be confusing, and we should maybe change or define it, or both. It would be very helpful if you or someone else could open a Best wishes Jonathan |
Looks good to me, Jonathan. Thanks! We could anticipate the outcome of the issue just raised by Chris by changing
to read something like
but I leave that up to you. |
wait: "The vertex dimension must be the most rapidly varying one (i.e., the first variable dimension Isn't it the last when coding in C? ( I know I have to think carefully about it each time ...) -- anyway, that's not relevant anyway, as regardless of how you code it, it should always be the same in the netcdf, yes? I've opened a defect issue (#530), as Jonathan suggested, though having looked more, there's maybe little to do. For this topic, I suggest:
That would be consistent with the text describing character arrays. |
Yes, of course. Brain fog set in early today. And your wording is correct. |
LGTM. |
Three weeks have passed with no further concerns. Enough support was expressed according to the rules for this to be accepted. Therefore I will merge the PR. |
Pull request 521
There was some discussion beginning in 2017 (thread begins here) and continuing in 2018 (beginning here) about how to handle the following:
Consider a (flattened) grid consisting of both triangular and rectangular grid cells with n total grid cells. In this case the cell_bounds for longitude should be dimensioned [n, 4] because there are a maximum of 4 cell vertices.
My question is: for the triangular grid cells, what value should be assigned to the 4th vertex (corner)? Should it be set to
missing_value
?There seemed to be no strong objection to defining both the missing_value and
_FillValue
as attributes of the bounds variable and assigning the same value to both, which would also then occupy the 4th element of the triangular grid cell vertices.As far as I know, no formal action was taken on this, but for CMIP6 we did recommend that netCDF files containing CMIP6 output be written consistent with the above guidance.
Subsequently, I have learned that another strategy has been adopted by some "remapping" codes (i.e., "regridding packages"). They expect the unneeded vertices of a "bounds" variable be filled with the same value as the last needed value. So, for example, if a triangular grid cell has vertices located a, b, and c, the bounds variable (dimensioned 4 to accommodate other quadralateral grid cells) would be filled with the values [a, b, c, c], the c being duplicated to occupy the unneeded 4th element.
I think this approach is superior to filling unused vertices with
missing_value
or_FillValue
. If adopted, we could easily explain it in a sentence.The text was updated successfully, but these errors were encountered: