cdutil.averager could do a sanity check of the delta of the dimension bounds #235

Open
jypeter opened this issue Mar 21, 2018 · 5 comments

@jypeter
Member

jypeter commented Mar 21, 2018

@dnadeau4 One of our interns was a victim of the following side effect. The time bounds were not correctly defined in a file she was using: both bounds of each time step had the same value as the time step itself.

 time_counter = 86400, 172800, 259200, 345600, 432000, 518400, 604800,

 time_counter_bounds =
  86400, 86400,
  172800, 172800,
  259200, 259200,
  345600, 345600,
  432000, 432000, 

The correct values should have been

 time_counter = t0, t1, ...
 time_counter_bounds =
  t0 - delta/2, t0 + delta/2,
  t1 - delta/2, t1 + delta/2,

but we had

 time_counter_bounds =
    t0, t0,
    t1, t1, 

This led to crazy (big) values when averaging over the time axis, probably due to a division by zero or something similar

>>> import cdms2, cdutil, genutil
>>> f = cdms2.open('/home/scratch01/jypeter/time_counter_bounds_pb.nc')
>>> U = f('U')
>>> genutil.minmax(U)
(-31.189350128173828, 23.517915725708008)
>>> U_avg = cdutil.averager(U, axis='t')
>>> genutil.minmax(U_avg)
(1.7976931348623157e+308, -1.7976931348623157e+308)
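
One possible explanation for this result, as a minimal NumPy sketch and not the actual cdutil code path: assuming the averager weights each time step by the width of its bounds, zero-width bounds give weights that sum to zero, and a masked division by that sum masks every value. The toy data and all variable names below are illustrative only.

```python
import numpy as np

# Degenerate bounds as in the file above: lower == upper for every step.
bounds = np.array([[86400.0, 86400.0],
                   [172800.0, 172800.0],
                   [259200.0, 259200.0]])

# Toy data: 3 time steps x 2 grid points (made-up values).
data = np.ma.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])

# Assumption: the weights are the widths of the time cells.
weights = bounds[:, 1] - bounds[:, 0]          # all zeros here

weighted_sum = (data * weights[:, np.newaxis]).sum(axis=0)
weight_total = weights.sum()                   # 0.0

# np.ma.divide masks the result wherever the divisor is zero.
average = np.ma.divide(weighted_sum, weight_total)
print(average.count())   # number of unmasked values
```

In this toy case no unmasked value survives the division, so a min/max computed on the result has nothing meaningful to report.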

Let me know if you need a sample data file to reproduce this bug (or side effect)

@doutriaux1 doutriaux1 added this to the 3.1 milestone Mar 22, 2018
@doutriaux1
Contributor

@jypeter this is really a user error. I guess we could add a check for empty bounds, but I'm not sure that's really the way to go. I recommend you use cdutil.times.setTimeBoundsMonthly(data) or similar.

@dnadeau4 what do you think?
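
For illustration, a check for such bounds could look like the sketch below. It uses plain NumPy on an (n, 2) bounds array and is not part of cdutil; the function name `find_zero_width_cells` and the tolerance `rtol` are hypothetical.

```python
import numpy as np

def find_zero_width_cells(bounds, rtol=1e-9):
    """Return indices of cells whose bounds have (near) zero width.

    `bounds` is an (n, 2) array of lower/upper cell bounds. A cell is flagged
    when its width is negligible compared with the typical spacing between
    cell centres (hypothetical tolerance `rtol`).
    """
    bounds = np.asarray(bounds, dtype=float)
    widths = np.abs(bounds[:, 1] - bounds[:, 0])
    centres = bounds.mean(axis=1)
    # Typical spacing between consecutive cells; fall back to 1.0 for one cell.
    spacing = np.median(np.abs(np.diff(centres))) if len(centres) > 1 else 1.0
    return np.flatnonzero(widths <= rtol * spacing)

# Bounds equal to the time step itself, as in the problem file.
bad = find_zero_width_cells([[86400, 86400], [172800, 172800], [259200, 259200]])
# Bounds centred on the same time steps, one day wide.
good = find_zero_width_cells([[43200, 129600], [129600, 216000], [216000, 302400]])

if bad.size:
    print("warning: %d time steps have zero-width bounds" % bad.size)
```

Whether such a check should raise an error, warn, or silently regenerate the bounds is a design choice this sketch leaves open.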

@jypeter
Member Author

jypeter commented Mar 22, 2018

I agree that it's a user (or rather metadata) error, and that the user could call cdutil.times.setTimeBounds* if they knew that the problem was due to this. It would actually be easier for the user if there were no time bounds at all, that is, if they were missing rather than incorrect (what you call empty).

The trouble here is that, after the averaging above, the user sees that all the data is masked, which makes it hard for a casual user to understand what has happened.

>>> genutil.minmax(U_avg)
(1.7976931348623157e+308, -1.7976931348623157e+308)
>>> U_avg.shape
(1, 90, 180)
>>> import MV2
>>> MV2.count(U_avg)
0
>>> U_avg
variable_5
masked_array(data =
 [[[-- -- -- ..., -- -- --]
  [-- -- -- ..., -- -- --]
  [-- -- -- ..., -- -- --]
  ...,
  [-- -- -- ..., -- -- --]
  [-- -- -- ..., -- -- --]
  [-- -- -- ..., -- -- --]]],
             mask =
 [[[ True  True  True ...,  True  True  True]
  [ True  True  True ...,  True  True  True]
  [ True  True  True ...,  True  True  True]
  ...,
  [ True  True  True ...,  True  True  True]
  [ True  True  True ...,  True  True  True]
  [ True  True  True ...,  True  True  True]]],
       fill_value = 1e+20)

Hmmm, the values reported by genutil.minmax are also a bit strange and misleading, and I have created a genutil issue about that (CDAT/genutil#20)

>>> genutil.minmax(U_avg)
(1.7976931348623157e+308, -1.7976931348623157e+308)
>>> U_avg.min()
masked
>>> U_avg.max()
masked

Anyway, you can download the input data if you have time to take a closer look.
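
For reference, the workaround discussed above (replacing the degenerate bounds with t - delta/2, t + delta/2) can be sketched with plain NumPy. This is a generic illustration that assumes evenly spaced time steps, not the implementation of cdutil.times.setTimeBounds*; the helper name `midpoint_bounds` is hypothetical, and attaching the array back to the time axis is left out.

```python
import numpy as np

def midpoint_bounds(times):
    """Build (n, 2) bounds t - delta/2, t + delta/2 for evenly spaced times."""
    times = np.asarray(times, dtype=float)
    if times.size < 2:
        raise ValueError("need at least two time steps to infer delta")
    deltas = np.diff(times)
    if not np.allclose(deltas, deltas[0]):
        raise ValueError("time steps are not evenly spaced")
    half = deltas[0] / 2.0
    return np.column_stack((times - half, times + half))

# Time values from the problem file (seconds, one step per day).
time_counter = [86400, 172800, 259200, 345600, 432000, 518400, 604800]
new_bounds = midpoint_bounds(time_counter)
print(new_bounds[0])   # bounds of the first time step
```

With these values every cell is 86400 seconds wide, so width-based weights would no longer sum to zero.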

@doutriaux1 doutriaux1 modified the milestone: 3.1 Mar 29, 2018
@dnadeau4
Contributor

That is a bug in genutil: it should not return any values, and should instead report that all the data are masked.

@doutriaux1 are you in charge of genutil? I can see that you wrote code for it in 2010.

@dnadeau4
Contributor

@doutriaux1 I will look into this today or tomorrow.

@jypeter
Member Author

jypeter commented Aug 28, 2020


3 participants