-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Process XOR/ROL optimisations #28
base: master
Are you sure you want to change the base?
Conversation
a9d9d9a
to
c6d5696
Compare
kaitaistruct.py
Outdated
if PY2: | ||
return bytes(bytearray(v ^ key for v in bytearray(data))) | ||
else: | ||
return bytes(v ^ key for v in data) | ||
|
||
@staticmethod | ||
def process_xor_many(data, key): | ||
if key == b'\x00' * len(key): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overhead on every call. I guess unneeded - comparing every byte of one array to each byte of another array ( where each byte is zero) has the same complexity as xoring 2 arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the key is few bytes and data is millions of bytes, then this check has negligible cost. However... you are right. It might be better to use something like not any(key)
which would not read entire key but only up to first non-zero byte.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are also right. So maybe check 2 sizes and take decision what to do based on them?
if len(key) < len(data)//someCoefficientDependingOnTheProbabiltyToMeetZeroAndDistributionOfKeysLengths
?
kaitaistruct.py
Outdated
if PY2: | ||
return bytes(bytearray(a ^ b for a, b in zip(bytearray(data), itertools.cycle(bytearray(key))))) | ||
else: | ||
return bytes(a ^ b for a, b in zip(data, itertools.cycle(key))) | ||
|
||
precomputed_rotations = {(amount,1):[(i << amount) & 0xff | (i >> (-amount & 7)) for i in range(256)] for amount in range(9)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure that memory lookup (especially in interpreted fully object-oriented language, so I guess CPU cache helps little) is faster than bit shift. Could you provide some benchmark results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unable to run a benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also need to withdraw one of the commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unable to run a benchmark.
there is no need to run kaitai benchmark, you can run just an own one
Benchmarks as requested: >>> from timeit import *
>>> timeit('[d[i] for i in range(256)]', 'd=bytearray(range(256))')
12.807483946000048
>>> timeit('[d[i] for i in range(256)]', 'd=bytearray(range(256))')
13.325101311000026
>>> timeit('[d[i] for i in range(256)]', 'd=list(range(256))')
12.117929144000072
>>> timeit('[d[i] for i in range(256)]', 'd=list(range(256))')
12.115412523000032
>>> timeit('[d[i] for i in range(256)]', 'd={i:i for i in range(256)}')
14.979522146999898
>>> timeit('[d[i] for i in range(256)]', 'd={i:i for i in range(256)}')
14.68503412799987
>>> timeit('[(i<<amount)&0xff|(i>>(-amount&7)) for i in range(256)]', 'amount=1')
64.24798052300002
>>> timeit('[(i<<amount)&0xff|(i>>(-amount&7)) for i in range(256)]', 'amount=1')
64.24846534499989 |
All suggestions from @KOLANICH included and resolved. Ready to merge. |
It's a shame that python doesn't have |
I would reach to python-ideas mailing list. The guys there have authoritative knowledge about such things, and they may say that adding those operators might not happen or was already rejected. I will ask them. OTOH, that would not affect us. Since the runtime needs to support older Python runtimes (including 2.7), even if they would add |
I wonder if it is possible to extend python syntax with a native module. Or at least make some builtin functions without much overhead. |
I added some helper code that would allow me to not maintain 2 code paths. |
Finished implementing the rotation, including larger groups. |
return bytes(a ^ b for a, b in zip(data, itertools.cycle(key))) | ||
if len(key) == 1: | ||
return KaitaiStream.process_xor_one(data, ord(key)) | ||
if len(key) <= 64 and key == b'\x00' * len(key): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 64? Why not, for example a quarter of message length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its to make it O(1) time, in case both key and data was megabyte sized. If so then it skips the check and just does the xoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its to make it O(1) time, in case both key and data was megabyte sized. If so then it skips the check and just does the xoring.
Again, IMHO it is waste of resources, I assume xoring to zero string as improbable, in most cases it is waste of resources.
let we have the probability of the zero array is p, N is the bit-length of data, K is the bit-length of key.
the complexity in the case of zero is
O(K)
(in fact 2*K
because we create an array and then compare it)
otherwise
O(K+N)
So the average complexity of zero-checking case (we check key for zero) is
c=K+(1-p)*N
And the average complexity of non-checking case is always N
.
So we need to select by the clause
K+(1-p)*N<N
K<p*N
Now let's model p.
If each bit of a key is randomly chosen from {1, 0} then
p = 2^-K
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please read the reply below and withdraw this one review.
@KOLANICH Please stop right there. What you are doing is a pointless mathematical exercise, not proving anything. You are making huge implicit assumptions that arent justified, and overall doing it wrong. For one, you are assuming that keys follow uniform distribution which I think is wrong. You also seem to be assuming key lengths are also somewhat likely to be long as they are likely to be short. Thats also probably wrong. In my personal experience, programmers tend to use very short keys, few bytes at most. It seems unlikely to find kilobyte-sized keys in real formats, just due to how software deverlopers think and work. Its also somewhat likely that all-zero keys can be found in real usage. Just because the format includes a possibility of xoring some data, it doesnt mean that is actually and actively being used. Programmers often tend to use zero keys when they dont have a particular need to use a different one. A safe default, so to speak. Note that I am not presenting those conjectures to justify my code, but rather to disprove your argument. Your argument resembles this kind of thinking: we have some real-world numbers, and digits are taken from uniform distribution, then few pages of equations... until someone points out that digits in real statistical data follow Benfords law. For two, you are using assymptotic equality to a code that is specifically limited to smallest inputs. That is just plain wrong. Assymptotic equality is BY DEFINITION not applicable to small(est) inputs. Not only that, but you are using assymptotic notation that igrores constants, meaning it doesnt differentiate between N and 2N. Segdwick, and I am sure you are familiar with this guy's work, points out in literally EVERY LECTURE that he gave, that using tilde notation would be more appropriate for that reason. That is, assuming it would be justified to use assymptotic notation in the first place. The justification for the checks you talk about are following, and DO NOT depend on the conjectures I mentioned earlier:
|
Of course it's wrong since xor is usually used not for crypto in file formats, but we can plug there any distribution we like by introducing an attribute giving probability of zero string. We may even specify simple distributions as expressions in KS language (of course it will require more math functions).
I have no assumption on key length in the derivation, except the one it is a nonnegative number. The derivation should work for any length.
I can use any function under |
Half wrong. We wanna chose the implementation that is realistically faster, not the one with smaller theoretic asymptotic complexity. I think you are completely missing the point, and just keep producing more equations. |
Listen, I like occasional math too. In fact I find this slightly nostalgic, because I do have a CS diploma and had this theory at classes too. Aside of the fact that we probably reside in different countries, I would not mind having some calculus just for fun, over some marshmallows at campfire. But this doesnt make your arguments valid. What you said in the quote is true, but it is also irrelevant. I implore you to instead provide some counter argument for mine, if there is one:
|
Just added small check that groupsize is not negative. |
I would prefer to remove the unused implementation before merging, but I left it so you can look at it. |
Added optimisations from kaitai-io/kaitai_struct#411 . |
Please stop adding more and more stuff to a single PR. This highly elevates the chance that it won't be applied. And, no, it's not "optimization", it breaks functionality of reading files which are being appended to during parsing, like live binary logs, packet captures, etc. |
I appologise for putting so much stuff into a single PR. Please consider this PR frozen at this point, and only related to Process XOR/ROL. I advise reading it one commit at a time, and in split view. |
Effectuates:
kaitai-io/kaitai_struct#390
kaitai-io/kaitai_struct#397
I ran the test suite but please confirm it. Also I leave running the benchmarks to you, because I am not able to run those.