Faster continuous/discrete splitting #1286
Conversation
@mlochbaum is attempting to deploy a commit to the Quantified Uncertainty Team on Vercel. A member of the Team first needs to authorize it.
Codecov Report
@@           Coverage Diff            @@
##           develop    #1286   +/-   ##
========================================
  Coverage    53.14%   53.14%
========================================
  Files           18       18
  Lines          382      382
  Branches        22       22
========================================
  Hits           203      203
  Misses         177      177
  Partials         2        2
Overall impressions:
- the algorithm is good and seems optimal and correct (I checked it with pen and paper as well as I could)
- thanks for the comments, and especially for spelling out the `(...]` boundaries explicitly; there's always a risk of off-by-one errors in cases like this, and I don't feel like this is covered enough by tests now, which worries me
- imperative algorithms in Rescript seem awkward; there's nothing we can do about that now, but we'll have to write more of them as we optimize other parts of the distributions code
There are two cases here:
- In the case of entirely or almost entirely continuous arrays, almost the entire win is achieved by the imperative low-level approach. I checked this the last time we had that discussion on Slack; if I inline `addData` and rewrite the `reduce` as a loop in my old version, it gives almost the same performance as your new version. Function calls are relatively costly. (See the sketch below.)
- In the case of discrete samples with many duplicates, your version is much faster. But it also comes right after we sort the initial array, which is O(N*log(N)), so doing the splitting in O(log(N)) instead of O(N) probably doesn't matter much?

The constant factor matters, of course, and O(N*log(N)) + O(N) could be ~10% slower than O(N*log(N)) + O(log(N)) when N is in the 1k-10k range (for N = 10,000, N*log2(N) ≈ 133,000 comparisons, so an extra O(N) pass adds roughly 7-8%), but this observation vs. the added complexity of the code is why I'm not as excited about this PR as I could be.
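To make the function-call point concrete, here is a rough sketch in Rescript (hypothetical code, not from either version of the PR) of the same computation written functionally and as a loop; the per-element closure call is what the imperative version avoids:

```rescript
// Functional style: one closure invocation per element.
let sumReduce = (xs: array<float>): float =>
  Js.Array2.reduce(xs, (acc, x) => acc +. x, 0.0)

// Imperative style: same result, no per-element function call.
let sumLoop = (xs: array<float>): float => {
  let acc = ref(0.0)
  for i in 0 to Js.Array2.length(xs) - 1 {
    acc := acc.contents +. xs[i]
  }
  acc.contents
}
```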
In the future, we should go after the higher-order bits of performance wins first. (There are two easy wins I know of, which I mentioned in this Slack message; there's also #1089, and also the idea that we shouldn't convert samplesets to pointsets as often as we do now.)
I might change my mind if I saw a benchmark for case (2); my guess is that the win there is in the 3-5% range for the entire conversion procedure, compared to the version without the binary search.
PS: I don't mind merging this anyway, we probably won't touch this code for a while, and it's a single isolated algorithm that doesn't affect other parts of the codebase. Local code complexity is not that important.
I won't merge this for now to give you time to address my minor comments, and also @OAGr might want to do a review.
Think I've fixed everything, other than the midpoint calculation (see my comment). I agree this wasn't a terribly important change, but I'd done most of the work by the time I figured that out. Except for the part where a discrete run is found, the code is actually simpler, so the added complexity all goes towards the case where we have an asymptotic improvement. And I do get a test failure if I change the binary search to stop at a range of 2 instead of 1, so I think the current level of testing is probably all right.
let value = sortedArray[i.contents]
if value != sortedArray[i.contents + minDistance] {
  Js.Array2.push(continuous, value)->ignore
  i := i.contents + 1
Tiny optimization, but I assume at this point you could push all of the same values, up to `sortedArray[i.contents + minDistance]`, to `continuous`. Not sure if this would actually speed it up though.
Thinking about it more, this is probably not worth doing, it would add too much complexity.
You can't, since a run might start anywhere in the middle: if you have 5 6 6 6 6 6 and test with `minDistance` of 3, you see that 5 doesn't start a run but the next 6 actually does. It's possible to skip values if you use a smaller stride. I'll explain this as a top-level comment.
I said same value. If you have 55588888, and line 334 catches, then you rule all of the next 5s out. So start a second loop and increment until you get to a value that's not a 5 (see the sketch below). Or do a binary search here if you think that `minDistance` could be really high.
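A minimal sketch of that suggestion (hypothetical code, not in the PR; `skipEqual` is an invented helper name): after `sortedArray[i]` fails the distance comparison, advance past every later copy of the same value, since none of them can start a run either.

```rescript
// Hypothetical helper: return the first index after `i` whose value
// differs from sortedArray[i]. In a sorted array, all the skipped
// indices hold the same value and so can't start a discrete run.
let skipEqual = (sortedArray: array<float>, i: int): int => {
  let j = ref(i + 1)
  let len = Js.Array2.length(sortedArray)
  while j.contents < len && sortedArray[j.contents] == sortedArray[i] {
    j := j.contents + 1
  }
  j.contents
}
```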
One annoying thing is that I presume we need to accept some heuristic of what the data would look like. If values were very likely to be unique, conditional on not having minDiscreteWeight duplicates, then the current code is probably close to optimal.
Oh, got it. It's better to search for values not equal to the last one, rather than those that are equal to the first one. The test in the second loop would run at pretty much the same speed as the main one, and there's a hard-to-predict branch when it stops, so I don't think it would be an improvement with a forwards loop. If you run the loop backwards it's fairly similar to my `minDiscreteWeight / 2` version, but splitting it up as `minDiscreteWeight - 1` and `1` instead.
// element indices differ by minDistance.
let minDistance = minDiscreteWeight - 1

let len = length(sortedArray)
This is a fairly long function that took me a while to wrap my head around. I'd probably be conservative and have things spelled out more, especially at the top level. (Tiny names in tiny functions are fine.)
Some examples include:
- `len` -> `sortedArrayLen`
- `i` -> `sortedArrayIndex` (or `sortedArrayI`)
- `minDistance` -> `minDistanceOfSameValue`
- `value` -> `indexValue`
I can change it, but I find that style harder to read, as `sortedArrayLen` and `sortedArrayIndex` aren't as obviously different as `i` and `len`. There's only one array. Is it really that hard to guess what `i` indexes into?
There are several indexes (`i`, `i0`, `base`, `j`, `lo`, `mid`). This, at the very least, seems like pretty terse naming to me.
I haven't seen much use of `base` or `lo` before. It seems likely that some of this follows lower-level programming conventions I'm not used to.
Yes, `lo`, `mid`, `hi` is very common for binary searches (using `i` for `hi` partly because declaring mutable variables in Rescript is so annoying). I use `i0` here to mean a saved copy of `i`; maybe `iOrig` for `i0` and `iNext` for `j` would be better. `base` is the base of the binary search.
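For readers unfamiliar with the convention, here's a minimal standalone sketch (hypothetical code, not the PR's implementation; `runEnd` is an invented name) of the `lo`/`mid`/`hi` pattern: find the index just past the end of a run of equal values in a sorted array.

```rescript
// Invariants: sortedArray[lo.contents] always equals `value`, and
// hi.contents is always past the end of the run. The loop narrows
// the gap between them to 1.
let runEnd = (sortedArray: array<float>, start: int): int => {
  let value = sortedArray[start]
  let lo = ref(start)
  let hi = ref(Js.Array2.length(sortedArray))
  while hi.contents - lo.contents > 1 {
    let mid = (lo.contents + hi.contents) / 2
    if sortedArray[mid] == value {
      lo := mid
    } else {
      hi := mid
    }
  }
  hi.contents
}
```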
Forgot to mention: for longer values of `minDiscreteWeight` it's possible to skip values by testing at a smaller stride, as mentioned above. Figuring out what happens when you do see equal values is more complicated here, because you also have to search backwards to find the start of the run, and you might not even have a long enough run. Given the complexity, and that we don't use very high `minDiscreteWeight` values, I don't think it's worth it.
I'm still finding this fairly hard to read. (I realize this is common, though, with imperative code.) Maybe at least add a 1-3 sentence explanation of the algorithm at the top? I don't think this one piece of code is too important, but I would like to set practices earlier rather than later.
Okay, added a few comments at the top and also cleaned up the existing stuff ("performance-critical" is not so accurate).
There's currently a lint error that should be fixed before merging.
I'm still kinda worried about corner cases. My level of confidence in this code being correct is in the 90-95% range, but not 99%+. My guess is that if there are bugs here, they're related to the incremental 2^N jumps being aligned/misaligned with the end of the array, so any specific test might miss them. Can you look into fast-check for testing this better? I'd suggest generating arrays of (value, count) tuples.
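Such a test might look roughly like the following. This is only a sketch: the arbitrary and assertion names (`assertProperty2`, `array`, `tuple2`, `integerRange`, `nat`) are assumptions about the rescript-fast-check bindings, and `split` / `referenceSplit` are hypothetical stand-ins for the fast implementation and a brute-force O(N) reference.

```rescript
open FastCheck.Arbitrary
open FastCheck.Property.Sync

// Hypothetical predicate: expand (value, count) segments into a sorted
// sample array, then check the fast split against a simple reference.
let testSegmentsCorrected = (segments, weightOffset) => {
  let minDiscreteWeight = weightOffset + 2 // the function requires >= 2
  let samples =
    segments
    ->Js.Array2.map(((v, count)) => Belt.Array.make(count, Belt.Int.toFloat(v)))
    ->Belt.Array.concatMany
    ->Js.Array2.sortInPlaceWith((a, b) => compare(a, b))
  split(samples, minDiscreteWeight) == referenceSplit(samples, minDiscreteWeight)
}

let run = () =>
  assertProperty2(
    array(tuple2(integerRange(-50, 50), integerRange(1, 30))),
    nat(~max=20, ()),
    testSegmentsCorrected,
  )
```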
Also, I just noticed that calling this function with […]
fast-check test is working! Had some trouble because rescript-fast-check's […]. The integration with Jest is pretty sloppy: I just call […].
nat(~max=20, ()),
testSegmentsCorrected,
),
)
If we're moving to TS soon, I would worry less about these tests. Much of the complication is in the fact that they use Rescript.
+1 to Ozzie, I won't review this very carefully, seems too time-consuming. Thanks for figuring this out!
Let's merge.
Upd: Vercel doesn't allow me to authorize the deployment there for some reason; I thought formally reviewing would help, since the error message is "A member of the Quantified Uncertainty Team on Vercel is required to review the pull request and authorize this Deployment afterwards.", but it didn't help.
Anyway, I don't expect any user-facing changes, so it seems fine to merge without deploying to Vercel.
Rewrite `splitContinuousAndDiscreteForMinWeight` using comparisons at a distance of `minDiscreteWeight - 1`. This is much faster, but gives only a small (~5%) overall improvement in our `toPointSetDist` benchmark, where it's used. It would be more important for mostly-discrete distributions, which we don't benchmark, and will also become more relevant if we're able to speed up KDE and sorting.
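For context, the core idea behind the rewrite is roughly the following (a simplified sketch, not the PR's implementation; the real code finds the end of a run with a galloping binary search rather than a linear scan, and `splitSketch` is an invented name):

```rescript
// In a sorted array, a value can only begin a run of at least
// minDiscreteWeight duplicates if the element minDiscreteWeight - 1
// positions ahead equals it, so one comparison rules out most
// continuous values.
let splitSketch = (sortedArray: array<float>, minDiscreteWeight: int) => {
  let minDistance = minDiscreteWeight - 1
  let len = Js.Array2.length(sortedArray)
  let continuous: array<float> = []
  let discrete: array<(float, int)> = [] // (value, count) pairs
  let i = ref(0)
  while i.contents < len {
    let value = sortedArray[i.contents]
    if i.contents + minDistance >= len ||
        value != sortedArray[i.contents + minDistance] {
      // Too few duplicates ahead: treat the value as continuous.
      Js.Array2.push(continuous, value)->ignore
      i := i.contents + 1
    } else {
      // A long enough run: scan past its end and record the count.
      let j = ref(i.contents + minDistance)
      while j.contents < len && sortedArray[j.contents] == value {
        j := j.contents + 1
      }
      Js.Array2.push(discrete, (value, j.contents - i.contents))->ignore
      i := j.contents
    }
  }
  (continuous, discrete)
}
```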