-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat : implement under/overflow function #413
Conversation
Thanks a lot! I'll take a look tomorrow. |
Alright. Can you split this into two PRs, I think the |
For tests you can take a look at the ones implemented here https://github.com/scikit-hep/mplhep/blob/master/tests/test_basic.py |
Hi, thanks for working on this! The "default" option above draws a lot of attention on the outmost bins, but does not explain what the red color corresponds to. In my opinion there is no special visualization needed if flow content is not included: this is the only case where the plot actually does what it says on the axis. The "show" and "sum" cases on the other hand are not doing this, they suggest the bin contents have a value of the observable within the range indicated by the bin edges (which is not necessarily true). I think it would be helpful to denote somehow that those are special. One way in which this could be accomplished is by breaking the axis in the middle of the flow (or sum) bin. In principle this would also require the axis labels to be updated accordingly. This might not generalize too well, and I don't know how to translate that to 2d. |
Hi @alexander-held , thanks for the suggestion,
Indeed this is the "hint" option, where highlight the bins if under/overflow contents exist
To have break axis, do you mean the style like this?https://matplotlib.org/3.1.0/gallery/subplots_axes_and_figures/broken_axis.html |
Thanks for clarifying, I misunderstood this then. If the default behavior is that the plots do not change at all then that sounds reasonable to me. For the broken axis: yes, I was thinking of something like that. To be clear here: I don't think this is crucially needed, it was just meant as an idea. I know that in practice there are many plots that use e.g. the "sum" approach. This is not specific to the changes here, it is a more general comment. Having the bins end at a clearly defined value on the axis that does not actually correspond to the cutoff for the bin content is what I find personally misleading. It is even more misleading in the examples above where the axis continues to the left and the right of the histogram, suggesting additional empty bins (so really suggesting that all events are in [5, 15]). I am not opposed to this implementation here though, users are still free to use what they want of course. |
Thanks a lot @Ming-Yan for adding the pics. I am +1 on the broken axis (particularly for the "hint" and "show" options).
I am not sure I agree that ignoring the flow bin does actually "do what it says" because for most plots the xlims don't match the extent of the histogram, but include some edge whitespace, which would suggest there's nothing there, rather than "no info here" which could be well shown with the broken axis. |
Agreed on the issue of xlim not matching the range of the histogram. I guess I'd always set the limits so that this cannot happen in practice. |
Thanks for the discussion, I would try to implemented broken axis for "hint" and "show" option |
+1 for Broken axis to show flow bins, this seems like a really nice way to communicate it visually |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I think this is definitely in the right direction. The added test baseline png is not updated and the existing test baseline pngs should not be changed in this PR (there's no visual difference, it's just that they're not bit by bit exact copies, so doing git commit -a
is adding them on.
For the viz in 1D I think we don't need the markers on the top axis and I think it would look better if the "broken axis" markers didn't go through the bin edges, but were contained "within" the bin.
I think if the gap value in "show"
is set to nan it should also help in the 2D plot, where right now it looks like a 0 value bin, as opposed to a "missing" one.
src/mplhep/plot.py
Outdated
@@ -168,9 +171,42 @@ def histplot( | |||
) | |||
|
|||
# Cast to plottables | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still present?
Thanks @andrzejnovak for reviewing the updates. The suggests implemented in the new commit with adding a few more tests |
637677e
to
359928a
Compare
- add additional space for "show" option, change the tick labels - using slash for "hint", "show" options
- add additional space for "show" option, change the tick labels - using slash for "hint", "show" options
- plot: remove None option, put warning for no flow bins histogram - plot: add flow bins option ROOT histogram - test : add two-side,one-side and no flow bins
Conflicts: src/mplhep/plot.py tests/baseline/test_hist2dplot_flow.png tests/baseline/test_histplot_flow.png tests/test_basic.py
- plot/test: remove None options - test: add dedicated test for uproot/hist with various flow scheme - test: add type check (numpy/hist with flow=False) [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
src/mplhep/plot.py
Outdated
for i, h in enumerate(hists): | ||
value, variance = h.values(), h.variances() | ||
if not isinstance(h, Hist.hist.Hist) and not "uproot.models.TH" in str(type(h)): | ||
print(f"Warning: {type(h)} is not allowed to get flow bins") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, is this triggered every time you pass just an array as a hist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems you have this solved for 2D with and flow is not None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be triggered every time, otherwise the hist would not filled as flow
set to None
after 1 iteration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really a good practice tho? I don't think I should see this warning if I all I do is call histplot([1,2,3,2,1])
which I think will happen here right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, add and flow is not None
and indeed if we have None
option, we would keep the case plottables.append(Plottable(value, edges=final_bins, variances=variance))
src/mplhep/plot.py
Outdated
trans = mpl.transforms.blended_transform_factory(ax.transData, ax.transAxes) | ||
kwargs = dict( | ||
marker=[(-0.5, -d), (0.5, d)], | ||
markersize=15, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could the marker size be somehow derived in relation to figure size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it possible, I take height to change the size
ax_h = ax.bbox.height
kwargs = dict(
marker=[(-0.5, -d), (0.5, d)],
markersize=ax_h*0.05,
linestyle="none",
color="k",
mec="k",
mew=1,
clip_on=False,
transform=trans,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right for None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hep.histplot(h, ax=axs[0], flow="hint")
hep.histplot(h, ax=axs[1], flow="show")
hep.histplot(h, ax=axs[2], flow=None)
hep.histplot(h, ax=axs[3], flow="sum")
axs[0].set_title("Default(hint)", fontsize=18)
axs[1].set_title("Show", fontsize=18)
axs[3].set_title("Sum", fontsize=18)
axs[2].set_title("None", fontsize=18)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah.. sorry I think it's the confusion of check under/overflow bin where the elif flow is None : continue
should be added. And I don't think it changes the hist object(we didn't modify it anywhere in the code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this, the object should not be different
np.random.seed(0)
entries = np.random.normal(10, 3, 400)
h = hist.new.Reg(20, 5, 15, name="x", underflow=True, overflow=False).Weight().fill(entries)
hc = copy.deepcopy(h)
h.plot()
h.plot(flow='sum')
h.plot(flow='show')
h.plot(flow=None)
plt.close()
h.values(flow=True) == hc.values(flow=True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one-sided flow tests seem to misbehave as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm.. I think the problem is you cannot check the flow bin is underflow or overflow from the stored hist
... You would see there's a flow bins(no h.values(underflow=True)
to check)
Hist(Regular(20, 5, 15, overflow=False, name='x'), storage=Weight()) # Sum: WeightedSum(value=359, variance=359) (WeightedSum(value=377, variance=377) with flow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, would have to check if the object has h.axes[0].traits
and if underflow
and overflow
are set in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok, then i need to add additional check for under/overflow bins for hist
case, I just simply check
underflow = underflow + h.values(flow=True)[0]
overflow = overflow + h.values(flow=True)[-1]
- plot: fix hist under/overflow bins check - plot: tmpfix on hist value/variances - add hist to setup.py
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still couple of explicit hist
checks to be rid of in the code. Plus some flake8 errors https://results.pre-commit.ci/run/github/184555939/1681661620.370AvrXARZSbUqgBkBfvJQ
I am not sure what's tripping up the actions, but the problem happens because a test fails. Checking locally it seems the 2d flow test doesn't have an updated example |
- test: update None for hist2dplot - plot: make copy for hist - update plot
- test: fix cbar min/max - plot: fix vmin/vmax for cbar - plot: fix 2d flow bins check - plot: fix sum option
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci fix:CI
Thanks a lot @Ming-Yan! |
This causes a massive printout in some cases. @HDembinski made the str of a histogram a text plot of a histogram, which makes this warning printed unreadable. A fix would be to use the repr instead. See the PyHEP histogramming Gitter channels for details from the user report. Also, this should be a |
and not h.axes[0].traits.overflow | ||
): | ||
flow = None | ||
print(f"Warning: you don't have flow bins stored in {h}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have been:
warnings.warn(f"You don't have flow bins stored in {h!r}")
and flow is not None | ||
): | ||
print( | ||
f"Warning: {type(h)} is not allowed to get flow bins, flow bin option set to None" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also be a warnings.warn. There's a Ruff check you can enable to disallow all print statements in library code.
Also, I don't understand the default. Calling plot on a flowless histogram should not produce a warning about missing flow bins; you should not have to pass |
Hi @henryiii , apologize for noticing the suggestions late and thanks for the fixes |
workable version for #337
histplot
,hist2dplot
function