
Faster continuous/discrete splitting #1286

Merged: 9 commits merged into quantified-uncertainty:develop on Oct 27, 2022

Conversation

@mlochbaum (Contributor)

Rewrite splitContinuousAndDiscreteForMinWeight using comparisons at a distance of minDiscreteWeight - 1. The splitting itself is much faster, but this gives only a small (~5%) overall improvement in our toPointSetDist benchmark, where it's used. It would matter more for mostly-discrete distributions, which we don't benchmark, and will also become more relevant if we're able to speed up KDE and sorting.
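
A minimal sketch of the idea for readers outside the PR (illustrative TypeScript, not the PR's ReScript code; the real implementation finds run ends with an exponential-then-binary search rather than the linear scan shown here, and the names below are mine):

```typescript
// Sketch of the distance-comparison splitting idea. Input must be sorted.
// Values occurring >= minDiscreteWeight times become discrete (value, count)
// pairs; everything else stays continuous.
function splitContinuousAndDiscrete(
  sorted: number[],
  minDiscreteWeight: number
): { continuous: number[]; discrete: Map<number, number> } {
  if (minDiscreteWeight < 2) throw new Error("minDiscreteWeight must be >= 2");
  const minDistance = minDiscreteWeight - 1;
  const continuous: number[] = [];
  const discrete = new Map<number, number>();
  let i = 0;
  while (i < sorted.length) {
    const value = sorted[i];
    // The array is sorted, so a run of minDiscreteWeight equal values starts
    // at i exactly when the element minDistance ahead still equals sorted[i].
    if (i + minDistance >= sorted.length || sorted[i + minDistance] !== value) {
      continuous.push(value); // one comparison per continuous element
      i += 1;
    } else {
      // A long-enough run starts here; find its end. (The PR does this with an
      // exponential-then-binary search, so long runs cost O(log runLength).)
      let j = i + minDistance + 1;
      while (j < sorted.length && sorted[j] === value) j += 1;
      discrete.set(value, j - i);
      i = j;
    }
  }
  return { continuous, discrete };
}
```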

vercel bot commented Oct 19, 2022

@mlochbaum is attempting to deploy a commit to the Quantified Uncertainty Team on Vercel.

A member of the Team first needs to authorize it.

vercel bot commented Oct 20, 2022

The latest updates on your projects:

Name                | Status  | Updated
--------------------|---------|------------------------------
squiggle-components | ✅ Ready | Oct 25, 2022 at 11:32AM (UTC)
squiggle-website    | ✅ Ready | Oct 25, 2022 at 11:32AM (UTC)

@codecov-commenter

Codecov Report

Merging #1286 (418a5fd) into develop (58792a2) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           develop    #1286   +/-   ##
========================================
  Coverage    53.14%   53.14%           
========================================
  Files           18       18           
  Lines          382      382           
  Branches        22       22           
========================================
  Hits           203      203           
  Misses         177      177           
  Partials         2        2           


@berekuk (Collaborator) left a comment

Overall impressions:

  • the algorithm is good and seems optimal and correct (I checked it with pen-and-paper as well as I could)
  • thanks for the comments, and especially for spelling out the (...] boundaries explicitly
  • there's always a risk of off-by-one errors in cases like this, and I don't feel like this is covered enough by tests now, which worries me
  • imperative algorithms in Rescript seem awkward; there's nothing we can do about that now, but we'll have to write more of them as we optimize other parts of the distributions code

There are two cases here:

  1. In the case of entirely or almost-entirely continuous arrays, almost the entire win here comes from the imperative low-level approach. I checked this the last time we had that discussion on Slack; if I inline addData and rewrite the reduce as a loop in my old version, it gives almost the same performance as your new version. Function calls are relatively costly.
  2. In the case of discrete samples with many duplicates, your version is much faster. But it also runs right after we sort the initial array, which is O(N*log(N)), so doing the splitting in O(log(N)) instead of O(N) probably doesn't matter much?

Constant factors do matter, of course, and O(N*log(N)) + O(N) could be 10% slower than O(N*log(N)) + O(log(N)) (when N is in the 1k-10k range), but weighing that against the added complexity of the code is why I'm not as excited about this PR as I could be.
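
(Rough arithmetic behind that figure: with N = 5000, N*log2(N) ≈ 5000 × 12.3 ≈ 61,000 comparisons, so an extra O(N) pass of about 5,000 operations adds roughly 8%, in line with the ~10% estimate.)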

In the future, we should go after higher-order bits of performance wins first. (There are two easy wins I know of, I mentioned them in this slack message; there's also #1089, and also the idea that we shouldn't convert samplesets to pointsets as often as we do now).

I might change my mind if I saw a benchmark for (2); my guess is that the win there is in the 3-5% range for the entire conversion procedure, compared to the version without the binary search.

PS: I don't mind merging this anyway, we probably won't touch this code for a while, and it's a single isolated algorithm that doesn't affect other parts of the codebase. Local code complexity is not that important.

I won't merge this for now to give you time to address my minor comments, and also @OAGr might want to do a review.

Four review comments on packages/squiggle-lang/src/rescript/Utility/E/E_A.res (outdated, resolved)
@mlochbaum (Contributor Author)

Think I've fixed everything, other than the midpoint calculation (see my comment). I agree this wasn't a terribly important change, but I'd done most of the work by the time I figured that out. Except for the part where a discrete run is found, the code is actually simpler, so the added complexity all goes towards the case where we have an asymptotic improvement. And I do get a test failure if I change the binary search to stop at a range of 2 instead of 1, so I think the current level of testing is probably all right.

let value = sortedArray[i.contents]
if value != sortedArray[i.contents + minDistance] {
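  // (no run of minDiscreteWeight equal values can start at i, so value stays continuous)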
  Js.Array2.push(continuous, value)->ignore
  i := i.contents + 1
@OAGr (Contributor) commented Oct 21, 2022

Tiny optimization, but I assume at this point you could push all of the same values up to sortedArray[i.contents + minDistance] to continuous. Not sure if this would actually speed it up though.

@OAGr (Contributor)

Thinking about it more, this is probably not worth doing, it would add too much complexity.

@mlochbaum (Contributor Author)

You can't, since a run might start anywhere in the middle: if you have 5 6 6 6 6 6 and test with minDistance of 3, you see that 5 doesn't start a run but the next 6 actually does. It's possible to skip values if you use a smaller stride. I'll explain this as a top-level comment.

@OAGr (Contributor) commented Oct 21, 2022

I said the same value.

If you have 55588888 and line 334 catches, then you rule out all of the following 5s. So start a second loop and increment until you get to a value that's not a 5. Or do a binary search here if you think that minDistance could be really high.

@OAGr (Contributor)

One annoying thing is that I presume we need to accept some heuristic about what the data will look like. If values were very likely to be unique, conditional on not having minDiscreteWeight duplicates, then the current code is probably close to optimal.

@mlochbaum (Contributor Author)

Oh, got it. It's better to search for values not equal to the last one, rather than those that are equal to the first one. The test in the second loop would run at pretty much the same speed as the main one, and there's a hard-to-predict branch when it stops, so I don't think it would be an improvement with a forwards loop. If you run the loop backwards it's fairly similar to my minDiscreteWeight / 2 version, but splitting it up as minDiscreteWeight - 1 and 1 instead.

// element indices differ by minDistance.
let minDistance = minDiscreteWeight - 1

let len = length(sortedArray)
@OAGr (Contributor) commented Oct 21, 2022

This is a fairly long function that took me a while to wrap my head around. I'd probably be conservative and have things spelled out more, especially at the top level. (Tiny names in tiny functions are fine.)

Some examples include:

  • len -> sortedArrayLen
  • i -> sortedArrayIndex (or sortedArrayI)
  • minDistance -> minDistanceOfSameValue
  • value -> indexValue

@mlochbaum (Contributor Author)

I can change it, but I find that style harder to read, as sortedArrayLen and sortedArrayIndex aren't as obviously different as i and len. There's only one array. Is it really that hard to guess what i indexes into?

@OAGr (Contributor)

There are several indexes (i, i0, base, j, lo, mid). This, at the very least, seems like pretty terse naming to me.

I haven't seen much use of base or lo before. It seems likely that some of this follows lower-level programming conventions I'm not used to.

@mlochbaum (Contributor Author)

Yes, lo, mid, hi is very common for binary searches (using i for hi partly because declaring mutable variables in rescript is so annoying). I use i0 here to mean a saved copy of i; maybe iOrig for i0 and iNext for j would be better. base is the base of the binary search.

@mlochbaum (Contributor Author)

Forgot to mention: for larger values of minDiscreteWeight it's possible to do better by testing at intervals of minDiscreteWeight / 2. Setting m = minDiscreteWeight: if the value at position i isn't the same as the one at i - m/2 or the one at i + m/2, then the maximum length of a run containing i is (i + m/2) - (i - m/2) - 1 (subtracting 1 for the exclusive range). This is less than or equal to m - 1, so i can't be in a run.

Figuring out what happens when you do see equal values is more complicated here because you also have to search backwards to find the start of the run, and you might not even have a long enough run. Given the complexity and that we don't use very high minDiscreteWeight values, I don't think we need to implement this for now.
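
To illustrate just the stride test described above (same hypothetical TypeScript naming as the sketch in the description; the backwards search for the run start, the complicated part, is omitted):

```typescript
// With m = minDiscreteWeight and stride s = floor(m / 2): if sorted[i] differs
// from both sorted[i - s] and sorted[i + s], then the run containing i is
// confined to the open interval (i - s, i + s), which holds at most
// 2*s - 1 <= m - 1 elements, so i cannot belong to a run of weight m.
function cannotBeInLongRun(sorted: number[], i: number, m: number): boolean {
  const s = Math.floor(m / 2);
  // Out-of-bounds neighbors count as "different"; NaN compares unequal to everything.
  const left = i - s >= 0 ? sorted[i - s] : NaN;
  const right = i + s < sorted.length ? sorted[i + s] : NaN;
  return sorted[i] !== left && sorted[i] !== right;
}
```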

@OAGr (Contributor) commented Oct 21, 2022

I'm still finding this fairly hard to read. (I realize this is common, though, with imperative code.)

Maybe at least add a 1-3 sentence explanation of the algorithm at the top?

I don't think this one piece of code is too important, but I would like to set practices earlier rather than later.

@mlochbaum (Contributor Author)

Okay, added a few comments at the top and also cleaned up the existing stuff ("performance-critical" is not so accurate).

@berekuk (Collaborator) commented Oct 22, 2022

There's currently a lint error that should be fixed before merging.

> And I do get a test failure if I change the binary search to stop at a range of 2 instead of 1, so I think the current level of testing is probably all right.

I'm still kinda worried about corner cases. My level of confidence in this code being correct is in the 90-95% range, but not 99%+. My guess is that if there are bugs here, they're related to incremental 2^N jumps being aligned/misaligned with the end of the array, so any specific test might miss them.

Can you look into fast-check for testing this better? I'd suggest generating arrays of tuples of (value, count) with arbitrary low values of count, then expand that to a continuous array, then call a function and check that the result matches the original list of tuples.
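
Something like this, sketched in TypeScript against fast-check's native API rather than rescript-fast-check (splitContinuousAndDiscrete is the hypothetical function from the sketch in the description; the property shown is a weaker conservation check, whereas the suggestion above is to verify the full match against the original tuples):

```typescript
import fc from "fast-check";

const minDiscreteWeight = 2; // illustrative; worth varying in real tests

fc.assert(
  fc.property(
    // arrays of (value, count) tuples with arbitrary low counts
    fc.array(
      fc.tuple(fc.integer({ min: -100, max: 100 }), fc.integer({ min: 1, max: 5 }))
    ),
    (pairs) => {
      // expand the tuples into a sorted sample array
      const sorted = pairs
        .flatMap(([value, count]) => Array(count).fill(value) as number[])
        .sort((a, b) => a - b);
      const { continuous, discrete } = splitContinuousAndDiscrete(sorted, minDiscreteWeight);
      // every sample must land on exactly one side of the split
      const discreteTotal = [...discrete.values()].reduce((a, b) => a + b, 0);
      return continuous.length + discreteTotal === sorted.length;
    }
  )
);
```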

@berekuk (Collaborator) commented Oct 22, 2022

Also, I just noticed that calling this function with minDiscreteWeight = 1 causes an infinite loop. We won't use this value, but it's worth checking for it and throwing an exception.

@mlochbaum (Contributor Author)

fast-check test is working! Had some trouble because rescript-fast-check's integerRange has been broken by changes to fast-check, so it would generate out-of-bounds values for the weight.

The integration with Jest is pretty sloppy: I just call makeTest in the testing function and return true. I definitely didn't want to spend a bunch of time digging into how our test framework is set up if we might be switching to Typescript anyway.

nat(~max=20, ()),
testSegmentsCorrected,
),
)
@OAGr (Contributor)

If we're moving to TS soon, I would worry less about these tests. Much of the complication is in the fact that they use Rescript.

@berekuk (Collaborator)

+1 to Ozzie, I won't review this very carefully, seems too time-consuming. Thanks for figuring this out!

@berekuk (Collaborator) left a comment

Let's merge.

Upd: Vercel doesn't allow me to authorize the deployment there for some reason; I thought formally reviewing would help, since the error message is "A member of the Quantified Uncertainty Team on Vercel is required to review the pull request and authorize this Deployment afterwards.", but it didn't help.

Anyway, I don't expect any user-facing changes, so it seems fine to merge without deploying to Vercel.

@berekuk berekuk merged commit 3dd7e2b into quantified-uncertainty:develop Oct 27, 2022