Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster continuous/discrete splitting #1286
Faster continuous/discrete splitting #1286
Changes from 5 commits
b8b3619
418a5fd
fa44ab1
d10660e
7d32faf
436875b
dcc422f
30f64c5
fc4804a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fairly long function that took me a while to wrap my head around. I'd probably be conservative and have things spelled out more, especially at the top level. (Tiny names in tiny functions are fine.)
Some examples include:
len
->sortedArrayLen
i
->sortedArrayIndex
(orsortedArrayI
)minDistance
->minDistanceOfSameValue
value
->indexValue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change it, but I find that style harder to read, as
sortedArrayLen
andsortedArrayIndex
aren't as obviously different asi
andlen
. There's only one array. Is it really that hard to guess whati
indexes into?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several indexes. (i, i0, base, j, lo, mid). This, at very least, seems to have pretty terse naming to me.
I haven't seen much use of
base
,lo
before. It seems likely that some of this follows lower level programming conventions I'm not used to.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,
lo
,mid
,hi
is very common for binary searches (usingi
forhi
partly because declaring mutable variables in rescript is so annoying). I usei0
here to mean a saved copy ofi
; maybeiOrig
fori0
andiNext
forj
would be better.base
is the base of the binary search.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny optimization, but I assume at this point you could push all of the same values up to sortedArray[i.contents + minDistance], to
continuous
. Not sure if this would actually speed it up though.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it more, this is probably not worth doing, it would add too much complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't, since a run might start anywhere in the middle: if you have 5 6 6 6 6 6 and test with
minDistance
of 3, you see that 5 doesn't start a run but the next 6 actually does. It's possible to skip values if you use a smaller stride. I'll explain this as a top-level comment.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I said same value.
If you have 55588888, and line 334 catches, then you rule all of the next 5s out. So start a second loop and increment until you get to a value that's not a 5. Or do a binary search here if you think that
minDistance
could be really high.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One annoying thing is that I presume we need to accept some heuristic of what the data would look like. If values were very likely on being unique, conditional on not having minDiscreteWeight duplicates, then the current code is probably close to optimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, got it. It's better to search for values not equal to the last one, rather than those that are equal to the first one. The test in the second loop would run at pretty much the same speed as the main one, and there's a hard-to-predict branch when it stops, so I don't think it would be an improvement with a forwards loop. If you run the loop backwards it's fairly similar to my
minDiscreteWeight / 2
version, but splitting it up asminDiscreteWeight - 1
and1
instead.