Skip to content
This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

fix: histogram only on non-nan values #15

Merged
merged 8 commits into from
Aug 19, 2024

Conversation

sahahner
Copy link
Member

@sahahner sahahner commented Aug 9, 2024

Fix issue #14

@sahahner sahahner requested a review from gabrieloks August 9, 2024 10:27
@sahahner sahahner linked an issue Aug 9, 2024 that may be closed by this pull request
@sahahner sahahner changed the base branch from develop to first-release August 9, 2024 10:39
@gmertes gmertes marked this pull request as draft August 9, 2024 11:22
@gmertes
Copy link
Member

gmertes commented Aug 9, 2024

Putting this one on hold until after #9

Base automatically changed from first-release to develop August 9, 2024 14:30
@gmertes gmertes marked this pull request as ready for review August 9, 2024 14:45
@gmertes
Copy link
Member

gmertes commented Aug 9, 2024

Somehow there are no conflicts 😱

Sara, please update the changelog and then should be good to go.

@FussyDuck
Copy link

FussyDuck commented Aug 12, 2024

CLA assistant check
All committers have signed the CLA.

@sahahner sahahner requested review from anaprietonem and removed request for gabrieloks August 12, 2024 10:27
anaprietonem
anaprietonem previously approved these changes Aug 12, 2024
Copy link
Contributor

@anaprietonem anaprietonem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks to me! As a minor thing I would add a comment to explain that because of the postprocessors being applied, then the data might contains NaNs

@sahahner
Copy link
Member Author

In addition to fixing the issue, I noticed that the histogram plot containing histograms for the true and predicted variable does not fix the range of the bins. This leads to bins of different sizes and at different positions. Therefore, the histograms do not allow a fair comparison of the two variables if the values lie in different ranges.
The last commit incorporates a change, which forces the two variables to be binned into the same bins.

The effect on the histogram plotted after few epochs, when the differences are still high, is clearly visible:
Before: with bins covering smaller ranges on the x-axis, densities of true data go up to 3500
image
After: density up to 8000 using the same bins for both
image

I also changed the name of the label "Truth (ERA5)" to "Truth (data)"

@anaprietonem, can you please have a look at the additional changes?

@anaprietonem
Copy link
Contributor

In addition to fixing the issue, I noticed that ...
So If I am reading correctly what you are doing is passing the range to the histogram so when plotting both the histogram for the predicted and truth data that uses the same range, right?

I also changed the name of the label "Truth (ERA5)" to "Truth (data)"
About this, so I assume you'd like to remove ERA5 since we might use other training datasets? If so maybe we should change it also in the spectra plot where there is another label like that. What do you think about using 'Predicted' or 'AI-predicted' rather than anemoi?

@sahahner
Copy link
Member Author

So If I am reading correctly what you are doing is passing the range to the histogram so when plotting both the histogram for the predicted and truth data that uses the same range, right?

Yes.

@sahahner
Copy link
Member Author

About this, so I assume you'd like to remove ERA5 since we might use other training datasets? If so maybe we should change it also in the spectra plot where there is another label like that. What do you think about using 'Predicted' or 'AI-predicted' rather than anemoi?

That is a very good proposal. I changed it in the other plot as well.

Yes, ERA5 is not necessarily used by everyone. I also want to point out that this is not the meteorological truth, but what our model considers as true data.

@anaprietonem anaprietonem self-requested a review August 13, 2024 14:09
anaprietonem
anaprietonem previously approved these changes Aug 13, 2024
Copy link
Contributor

@anaprietonem anaprietonem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes applied to the histogram function looks good to me!

@sahahner sahahner merged commit 0436daf into develop Aug 19, 2024
108 checks passed
@sahahner sahahner deleted the 14-histogram-plot-for-variables-cotaining-nans branch August 19, 2024 12:55
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Histogram plot for variables cotaining NaNs
4 participants