Convert xor operations to one liners
Showing 1 changed file with 2 additions and 14 deletions.
Comment on 04289cc:
It seems that this commit resulted in some performance drawbacks. According to benchmark_process_xor, I was getting 32.54340088365 s of parsing time on average before this commit, and 34.45647013185 s on average after it, i.e. ~5% worse than before. However, it might be a good idea to review the whole Python benchmarking setup, as I'm by no means a Python expert, so I might have messed it up :)
Comment on 04289cc:
And, yeah, it also actually breaks the xor_many tests: it looks like the one-liner doesn't do key wrapping when the key is shorter than the data.
Comment on 04289cc:
My bad, I forgot about key wrapping. The performance drawback comes from the bytearray conversions that are necessary for Python 2 (the bytes type in Python 2 is actually an alias for str, so it does not support construction from a list of byte values). Overall I think having pretty one-liners is not worth it in this case, so I will revert this commit.
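For illustration, a minimal sketch of that difference (the byte values here are arbitrary examples, not from the original code):

```python
# Python 3: bytes() accepts an iterable of ints
bytes([0x41, 0x42])             # => b'AB'

# Python 2: bytes is an alias for str, so the same call stringifies the list
bytes([0x41, 0x42])             # => '[65, 66]'

# hence the extra round-trip through bytearray on Python 2:
bytes(bytearray([0x41, 0x42]))  # => 'AB'
```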
Comment on 04289cc:
Just for the record, a correct functional-style one-liner for process_xor_many in Python 3 would look like this:
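A minimal sketch of such a one-liner, assuming the process_xor_many(data, key) signature discussed above; itertools.cycle takes care of the key wrapping:

```python
import itertools

def process_xor_many(data, key):
    # cycle() repeats the key indefinitely, so it wraps when the key
    # is shorter than the data; zip() stops at the end of the data
    return bytes(a ^ b for a, b in zip(data, itertools.cycle(key)))
```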
Comment on 04289cc:
The basic idea is to try to achieve maximum performance in these process_* implementations, so we should keep the fastest one. Python is, unfortunately, so far the slowest of all 4 languages being benchmarked :( I thought that Python 3 would be faster than Python 2, but, alas, it seems that it's actually ~10-15% slower. Maybe bringing in numpy support or something like that would improve performance.
Comment on 04289cc:
Yeah, Python 3 is actually slower than Python 2 in a lot of cases. Including numpy is an interesting option, but I'd like to have it as a soft dependency, e.g. something like:
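A minimal sketch of what such a soft dependency might look like; the numpy path and the pure-Python fallback here are illustrative assumptions, not the exact code from this thread:

```python
import itertools

try:
    import numpy as np  # optional dependency: used only if installed
except ImportError:
    np = None

def process_xor_many(data, key):
    if np is not None:
        # vectorized path: XOR the data against the key repeated to full length
        d = np.frombuffer(data, dtype=np.uint8)
        k = np.resize(np.frombuffer(key, dtype=np.uint8), d.shape)
        return (d ^ k).tobytes()
    # pure-Python fallback with key wrapping
    return bytes(a ^ b for a, b in zip(data, itertools.cycle(key)))
```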
Comment on 04289cc:
It appears that the pure Python 3 version above is actually faster than numpy, unless I'm doing something terribly wrong. The bottleneck seems to be the bytearray construction, which is actually not needed on Python 3.
Comment on 04289cc:
Results of the pure Python 3 version with bytearray conversion (collapsed benchmark output). So, according to this information, the most performant approach is:
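Presumably something along these lines: a sketch that picks the implementation per Python version, so the Python 3 branch skips the bytearray construction while Python 2 keeps it:

```python
import sys
import itertools

if sys.version_info[0] >= 3:
    def process_xor_many(data, key):
        # on Python 3, bytes() accepts an iterable of ints directly
        return bytes(a ^ b for a, b in zip(data, itertools.cycle(key)))
else:
    def process_xor_many(data, key):
        # on Python 2, bytes is str, so go through bytearray on both ends
        return bytes(bytearray(a ^ b for a, b in zip(
            bytearray(data), itertools.cycle(bytearray(key)))))
```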
Comment on 04289cc:
I don't know much about Python, but from what I see here so far, you're doing a specific microbenchmark, and its results do not necessarily apply to the real-world macro situation. The benchmark I've mentioned is obviously not flawless either, but it mimics a more or less plausible real-world scenario, i.e. XORing several kilobytes of buffer with a key and parsing the un-XORed contents afterwards. Can I persuade you to try the various implementations of process_xor_* with that too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GreyCat Thanks a lot for your suggestion; the bottleneck identified by my microbenchmark was indeed insignificant for bigger data sizes. I summarized my tests in #8
Comment on 04289cc:
I would like to point out that the current benchmarking code does not measure average time over several runs (timeit-style) but only runs the code once, which is subject to CPU jitter. Those 5% could be just noise in the wire, so to speak.
Comment on 04289cc:
Of course, one does not run this just once; I ran it several times and aggregated the results. Ideally, we should add some automation to do these calculations for us...
Comment on 04289cc:
oO, you what?... Do you want me to add timeit to the Python benchmark suite? It will increase the run time by a factor of a few, but make the numbers more accurate. Just say so, and I will effectuate it.
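For instance, a timeit-based harness might look roughly like this (the repeat and number values are arbitrary placeholders):

```python
import timeit

# take the best of several repetitions to suppress CPU jitter,
# as the timeit documentation recommends
best = min(timeit.repeat(
    "process_xor_many(data, key)",
    setup="from __main__ import process_xor_many, data, key",
    repeat=5,
    number=100,
))
print("best of 5 x 100 runs: %.6f s" % best)
```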
Comment on 04289cc:
Also, should the benchmarks run on a file or on bytes read into RAM?