-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
canonical_reordering_sign_euclidean() could be made faster #433
Comments
Even if it turns out that |
I wrote:
Ok, I seem to be up and running, using the sequence of workarounds described here: #430 (comment) I'll be happy to make a PR for this if there's interest. |
I did some tracing and timing. It turns out this function isn't called a lot during multivector multiplications like I thought it would be. Instead, I see that it's called a lot on the first multiplication after the clifford algebra is created. At that time, the entire multiplication table gets created, in _numba_construct_gmt(). So, not a lot of savings, given how this function is currently called. However, I see in issue #3 that there might be plans to change the initialization strategy so that it does not create the entire multiplication table. If/when that happens, I suspect optimization of canonical_reordering_sign_euclidean() may look like more of a win than it does now. |
One more observation... I suspect typical real-life programs use grades 0,1,2, slightly less commonly n-2,n-1,n, and rarely anything in between.... does that sound right? I think my suggested rewrite of this function will be very fast when the operands have grades (i.e. numbers of bits set) 0,1,2, but pretty slow for n-2,n-1,n, when n is relatively large. But, I bet, with a little thought, the slow n-2,n-1,n cases can be transformed into the fast 0,1,2 cases, by taking complements of the bit patterns, or something close to that. And I bet, with some more thought, it's even possible to make it fast when one of the two grades is in {0,1,2} and the other is in {n-2,n-1,n}. |
I thought of another possible implementation.
|
This is really interesting work @donhatch thanks for the suggestions. In this package we do indeed pre-calculate all the sign changes during algebra initialisation. As you have pointed out however it is definitely not the only way to do it, and indeed for very large algebras where one might want to do sparse calculations it might make more sense to do the canonical reordering in the actual multiplication function directly. One of the problems with sparse algebra stuff is that often numerical errors build up in the bits that "should be 0" and so calculations becomes progressively more dense over time anyway unless a bit of care is taken to trim/clean your multivectors. I'd be super keen to work on integrating this stuff into clifford and i'm sure some of the other packages which implement GA would be interested too. I have a few days this week to work on clifford stuff :) Do you happen to be on the bivector discord, could discuss a bit there? (I'm hugohadfield there) |
It looks to me like
canonical_reordering_sign_euclidean()
's running time is probably quadratic in the highest index of any bit set inbitmap_a
(or maybe linear, ifcount_bits_set()
is fast, i.e. if not_utils.DISABLE_JIT
).I think this could be improved to be at most linear in the total number of set bits in the two input bitmasks (often a very small number like 1 or 2, I'm guessing), and in fact even smaller than that-- it can be made linear in the number of bits that end up participating in swaps.
(To be precise: it needs to iterate over only the bits that are set in the bitwise-OR (bitmask_a|bitmask_b) whose indices are strictly higher than the lowest bit set in bitmask_b, and strictly lower than the highest bit set in bitmask_a).
I haven't looked at the call graph very closely, nor have I done any benchmarks, but I suspect this function is called heavily during multiplications of multivectors, so it might be worth optimizing.
For reference, the current implementation is this:
I think the following rewrite would work (sorry, I haven't tested it; I haven't learned how to build and test yet):
Also, I think the
_utils.DISABLE_JIT
implementation of the helper functioncount_set_bits()
could be sped up pretty simply too (it's no longer needed in the above implementation ofcanonical_reordering_sign_euclidean()
, but it's called by other things).That's currently this:
which could be rewritten as this:
The text was updated successfully, but these errors were encountered: