Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat : implement under/overflow function #413

Merged
merged 14 commits into from
Apr 18, 2023

Conversation

Ming-Yan
Copy link
Contributor

@Ming-Yan Ming-Yan commented Mar 2, 2023

workable version for #337

  • implemented in histplot,hist2dplot function
  • "hint" : show text if under/overflow exist, also printout the value
  • "show": additional bin with 5 times larger bin width
  • "sum": sum under/overflow bin to first/last bin

@andrzejnovak
Copy link
Member

Thanks a lot! I'll take a look tomorrow.

@andrzejnovak
Copy link
Member

Alright. Can you split this into two PRs, I think the plot_ratio plot is for a separate discussion. For to flow bin options, it would be nice if you could post what they look like and add them as tests.

src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
@andrzejnovak
Copy link
Member

For tests you can take a look at the ones implemented here https://github.com/scikit-hep/mplhep/blob/master/tests/test_basic.py

@Ming-Yan Ming-Yan changed the title feat : add plotting features feat : implement under/overflow function #337 Mar 6, 2023
@Ming-Yan Ming-Yan changed the title feat : implement under/overflow function #337 feat : implement under/overflow function Mar 6, 2023
@Ming-Yan
Copy link
Contributor Author

Ming-Yan commented Mar 6, 2023

Add 1d/2d test plot
test_histplot_flow
test_hist2dplot_flow

@alexander-held
Copy link
Member

Hi, thanks for working on this! The "default" option above draws a lot of attention on the outmost bins, but does not explain what the red color corresponds to. In my opinion there is no special visualization needed if flow content is not included: this is the only case where the plot actually does what it says on the axis.

The "show" and "sum" cases on the other hand are not doing this, they suggest the bin contents have a value of the observable within the range indicated by the bin edges (which is not necessarily true). I think it would be helpful to denote somehow that those are special. One way in which this could be accomplished is by breaking the axis in the middle of the flow (or sum) bin. In principle this would also require the axis labels to be updated accordingly. This might not generalize too well, and I don't know how to translate that to 2d.

@Ming-Yan
Copy link
Contributor Author

Ming-Yan commented Mar 7, 2023

Hi @alexander-held , thanks for the suggestion,

Hi, thanks for working on this! The "default" option above draws a lot of attention on the outmost bins, but does not explain what the red color corresponds to. In my opinion there is no special visualization needed if flow content is not included: this is the only case where the plot actually does what it says on the axis.

Indeed this is the "hint" option, where highlight the bins if under/overflow contents exist

The "show" and "sum" cases on the other hand are not doing this, they suggest the bin contents have a value of the observable within the range indicated by the bin edges (which is not necessarily true). I think it would be helpful to denote somehow that those are special. One way in which this could be accomplished is by breaking the axis in the middle of the flow (or sum) bin. In principle this would also require the axis labels to be updated accordingly. This might not generalize too well, and I don't know how to translate that to 2d.

To have break axis, do you mean the style like this?https://matplotlib.org/3.1.0/gallery/subplots_axes_and_figures/broken_axis.html
I agree with the point that the flow bins value not next to the bin edges, but I think we also see some plots using the current implementation. For the sum option, as it merges to the last/first bin, I think split axis is not required

@alexander-held
Copy link
Member

Indeed this is the "hint" option, where highlight the bins if under/overflow contents exist

Thanks for clarifying, I misunderstood this then. If the default behavior is that the plots do not change at all then that sounds reasonable to me.

For the broken axis: yes, I was thinking of something like that. To be clear here: I don't think this is crucially needed, it was just meant as an idea. I know that in practice there are many plots that use e.g. the "sum" approach. This is not specific to the changes here, it is a more general comment. Having the bins end at a clearly defined value on the axis that does not actually correspond to the cutoff for the bin content is what I find personally misleading. It is even more misleading in the examples above where the axis continues to the left and the right of the histogram, suggesting additional empty bins (so really suggesting that all events are in [5, 15]). I am not opposed to this implementation here though, users are still free to use what they want of course.

@andrzejnovak
Copy link
Member

Thanks a lot @Ming-Yan for adding the pics. I am +1 on the broken axis (particularly for the "hint" and "show" options).

In my opinion there is no special visualization needed if flow content is not included: this is the only case where the plot actually does what it says on the axis.

I am not sure I agree that ignoring the flow bin does actually "do what it says" because for most plots the xlims don't match the extent of the histogram, but include some edge whitespace, which would suggest there's nothing there, rather than "no info here" which could be well shown with the broken axis.

@alexander-held
Copy link
Member

Agreed on the issue of xlim not matching the range of the histogram. I guess I'd always set the limits so that this cannot happen in practice.

@Ming-Yan
Copy link
Contributor Author

Ming-Yan commented Mar 7, 2023

Thanks for the discussion, I would try to implemented broken axis for "hint" and "show" option

@NJManganelli
Copy link

+1 for Broken axis to show flow bins, this seems like a really nice way to communicate it visually

@Ming-Yan
Copy link
Contributor Author

Implemented broken axis signed(not using subplots)
Also edit the axis label for show option

image
image

Copy link
Member

@andrzejnovak andrzejnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I think this is definitely in the right direction. The added test baseline png is not updated and the existing test baseline pngs should not be changed in this PR (there's no visual difference, it's just that they're not bit by bit exact copies, so doing git commit -a is adding them on.

For the viz in 1D I think we don't need the markers on the top axis and I think it would look better if the "broken axis" markers didn't go through the bin edges, but were contained "within" the bin.

I think if the gap value in "show" is set to nan it should also help in the 2D plot, where right now it looks like a 0 value bin, as opposed to a "missing" one.

.pre-commit-config.yaml Outdated Show resolved Hide resolved
@@ -168,9 +171,42 @@ def histplot(
)

# Cast to plottables

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still present?

src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
@Ming-Yan
Copy link
Contributor Author

Thanks @andrzejnovak for reviewing the updates. The suggests implemented in the new commit with adding a few more tests

@Ming-Yan Ming-Yan force-pushed the master branch 2 times, most recently from 637677e to 359928a Compare April 3, 2023 13:54
- add additional space for "show" option, change the tick labels
- using slash for "hint", "show" options
- add additional space for "show" option, change the tick labels
- using slash for "hint", "show" options
Ming-Yan added 2 commits April 8, 2023 16:07
- plot: remove None option, put warning for no flow bins histogram
- plot: add flow bins option ROOT histogram
- test : add two-side,one-side and no flow bins
Conflicts:
	src/mplhep/plot.py
	tests/baseline/test_hist2dplot_flow.png
	tests/baseline/test_histplot_flow.png
	tests/test_basic.py
@Ming-Yan
Copy link
Contributor Author

Ming-Yan commented Apr 8, 2023

Adapt two-side, one-side no overflow bins plot
image

- plot/test: remove None options
- test: add dedicated test for uproot/hist with various flow scheme
- test: add type check (numpy/hist with flow=False)

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
tests/test_basic.py Show resolved Hide resolved
tests/test_basic.py Show resolved Hide resolved
tests/test_basic.py Outdated Show resolved Hide resolved
tests/test_basic.py Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
for i, h in enumerate(hists):
value, variance = h.values(), h.variances()
if not isinstance(h, Hist.hist.Hist) and not "uproot.models.TH" in str(type(h)):
print(f"Warning: {type(h)} is not allowed to get flow bins")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is this triggered every time you pass just an array as a hist?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems you have this solved for 2D with and flow is not None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be triggered every time, otherwise the hist would not filled as flow set to None after 1 iteration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really a good practice tho? I don't think I should see this warning if I all I do is call histplot([1,2,3,2,1]) which I think will happen here right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, add and flow is not None and indeed if we have None option, we would keep the case plottables.append(Plottable(value, edges=final_bins, variances=variance))

src/mplhep/plot.py Show resolved Hide resolved
trans = mpl.transforms.blended_transform_factory(ax.transData, ax.transAxes)
kwargs = dict(
marker=[(-0.5, -d), (0.5, d)],
markersize=15,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could the marker size be somehow derived in relation to figure size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it possible, I take height to change the size

ax_h = ax.bbox.height
        kwargs = dict(
            marker=[(-0.5, -d), (0.5, d)],
            markersize=ax_h*0.05,
            linestyle="none",
            color="k",
            mec="k",
            mew=1,
            clip_on=False,
            transform=trans,
        )

feat : minor fixes

- plot: fix warning for histplot
- plot: rename to conflict with hist
- plot: change slash size
- test: make tests independent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right for None

Copy link
Contributor Author

@Ming-Yan Ming-Yan Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

find a weird bug, not sure where it from, if I swap the order of "sum" and None then I get the correct version, but we do this independently...
    hep.histplot(h, ax=axs[0], flow="hint")
    hep.histplot(h, ax=axs[1], flow="show")
    hep.histplot(h, ax=axs[2], flow=None)
    hep.histplot(h, ax=axs[3], flow="sum")
    axs[0].set_title("Default(hint)", fontsize=18)
    axs[1].set_title("Show", fontsize=18)
    axs[3].set_title("Sum", fontsize=18)
    axs[2].set_title("None", fontsize=18)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I checked out the PR, it seems the code is modifying the input object at some point, that's why it makes a mess
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah.. sorry I think it's the confusion of check under/overflow bin where the elif flow is None : continue should be added. And I don't think it changes the hist object(we didn't modify it anywhere in the code)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider this, the object should not be different
image

np.random.seed(0)
entries = np.random.normal(10, 3, 400)
h = hist.new.Reg(20, 5, 15, name="x", underflow=True, overflow=False).Weight().fill(entries)
hc = copy.deepcopy(h)
h.plot()
h.plot(flow='sum')
h.plot(flow='show')
h.plot(flow=None)
plt.close()
h.values(flow=True) == hc.values(flow=True)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one-sided flow tests seem to misbehave as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm.. I think the problem is you cannot check the flow bin is underflow or overflow from the stored hist... You would see there's a flow bins(no h.values(underflow=True) to check)

Hist(Regular(20, 5, 15, overflow=False, name='x'), storage=Weight()) # Sum: WeightedSum(value=359, variance=359) (WeightedSum(value=377, variance=377) with flow)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, would have to check if the object has h.axes[0].traits and if underflow and overflow are set in there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, then i need to add additional check for under/overflow bins for hist case, I just simply check

underflow = underflow + h.values(flow=True)[0]
overflow = overflow + h.values(flow=True)[-1]

src/mplhep/plot.py Outdated Show resolved Hide resolved
Ming-Yan and others added 3 commits April 14, 2023 15:13
- plot: fix hist under/overflow bins check
- plot: tmpfix on hist value/variances
- add hist to setup.py
Copy link
Member

@andrzejnovak andrzejnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still couple of explicit hist checks to be rid of in the code. Plus some flake8 errors https://results.pre-commit.ci/run/github/184555939/1681661620.370AvrXARZSbUqgBkBfvJQ

src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
src/mplhep/plot.py Outdated Show resolved Hide resolved
@andrzejnovak
Copy link
Member

I am not sure what's tripping up the actions, but the problem happens because a test fails. Checking locally it seems the 2d flow test doesn't have an updated example

- test: update None for hist2dplot
- plot: make copy for hist
- update plot
- test: fix cbar min/max
- plot: fix vmin/vmax for cbar
- plot: fix 2d flow bins check
- plot: fix sum option
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

fix:CI
@andrzejnovak andrzejnovak merged commit d0cf7ce into scikit-hep:master Apr 18, 2023
@andrzejnovak
Copy link
Member

Thanks a lot @Ming-Yan!

@henryiii
Copy link
Member

henryiii commented May 12, 2023

This causes a massive printout in some cases. @HDembinski made the str of a histogram a text plot of a histogram, which makes this warning printed unreadable. A fix would be to use the repr instead.

See the PyHEP histogramming Gitter channels for details from the user report.

Also, this should be a warnings.warn, not a print. That way users have control over it, it gets printed nicely (with color in a notebook), it's testable, etc.

and not h.axes[0].traits.overflow
):
flow = None
print(f"Warning: you don't have flow bins stored in {h}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been:

warnings.warn(f"You don't have flow bins stored in {h!r}")

and flow is not None
):
print(
f"Warning: {type(h)} is not allowed to get flow bins, flow bin option set to None"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be a warnings.warn. There's a Ruff check you can enable to disallow all print statements in library code.

@henryiii
Copy link
Member

Also, I don't understand the default. Calling plot on a flowless histogram should not produce a warning about missing flow bins; you should not have to pass flow=None to get it to shut up. Also, if it takes a text string, it should be possible to specify everything via a text string, and None should be the same as not specifying anything. That's the standard meaning of None, and the reason it's typed as "Optional".

@henryiii henryiii mentioned this pull request May 15, 2023
@Ming-Yan
Copy link
Contributor Author

Ming-Yan commented May 16, 2023

Hi @henryiii , apologize for noticing the suggestions late and thanks for the fixes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants