Convert xor operations to one liners
dgladkov committed Oct 11, 2016
1 parent 802993e commit 04289cc
Showing 1 changed file, kaitaistruct.py, with 2 additions and 14 deletions.
@@ -217,23 +217,11 @@ def read_strz(self, encoding, term, include_term, consume_term, eos_error):
 
     @staticmethod
     def process_xor_one(data, key):
-        r = bytearray(data)
-        for i in range(len(r)):
-            r[i] ^= key
-        return bytes(r)
+        return bytes(bytearray(v ^ key for v in bytearray(data)))
 
     @staticmethod
     def process_xor_many(data, key):
-        r = bytearray(data)
-        k = bytearray(key)
-        ki = 0
-        kl = len(k)
-        for i in range(len(r)):
-            r[i] ^= k[ki]
-            ki += 1
-            if ki >= kl:
-                ki = 0
-        return bytes(r)
+        return bytes(bytearray(a ^ b for a, b in map(bytearray, zip(data, key))))
 
     @staticmethod
     def process_rotate_left(data, amount, group_size):

14 comments on commit 04289cc

@GreyCat (Member)

It seems that this commit resulted in some performance drawbacks.

According to benchmark_process_xor, before this commit I was getting 32.54340088365 s for parsing on avg; after it, 34.45647013185 s on avg, i.e. that's ~5% worse than it was.

That said, it might be a good idea to double-check the whole Python benchmarking setup: I'm by no means a Python expert, so I might have messed something up :)

@GreyCat (Member)

And, yeah, it also breaks the xor_many tests:

======================================================================
FAIL: test_process_xor4_const (test_process_xor4_const.TestProcessXor4Const)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/spec/python/test_process_xor4_const.py", line 10, in test_process_xor4_const
    self.assertEqual(r.buf, b"foo bar")
AssertionError: b'foo ' != b'foo bar'

======================================================================
FAIL: test_process_xor4_value (test_process_xor4_value.TestProcessXor4Value)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/spec/python/test_process_xor4_value.py", line 10, in test_process_xor4_value
    self.assertEqual(r.buf, b"foo bar")
AssertionError: b'foo ' != b'foo bar'

Looks like it doesn't do key wrapping when the key is shorter than the data.
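
The truncation is easy to reproduce in isolation. A minimal Python 3 sketch (the all-zero key is an assumption chosen so the processed bytes stay readable, since x ^ 0 == x):

data = b'foo bar'   # 7 bytes, as in the failing test
key = b'\x00' * 4   # 4-byte key
out = bytes(bytearray(a ^ b for a, b in map(bytearray, zip(data, key))))
print(out)  # b'foo ' -- zip() stops at the shorter iterable,
            # so the last 3 data bytes are silently dropped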

@dgladkov (Collaborator, Author)

My bad, I forgot about key wrapping. The performance drawback comes from the bytearray conversions that are necessary for Python 2 (the bytes type in Python 2 is actually an alias for str, so it does not support construction from an iterable of integers). Overall I think pretty one-liners are not worth it in this case, so I will revert this commit.
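
For illustration, in a Python 2.7 session:

>>> bytes is str                        # bytes is just an alias for str
True
>>> bytes([102, 111, 111])              # stringifies the list...
'[102, 111, 111]'
>>> bytes(bytearray([102, 111, 111]))   # ...so a bytearray round-trip is needed
'foo'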

@dgladkov (Collaborator, Author)

Just for the record, a correct functional-style one-liner for process_xor_many in Python 3 would look like this:

In [1]: import itertools

In [2]: bytes(a ^ b for a, b in zip(b'long data', itertools.cycle(b'key')))
Out[2]: b'\x07\n\x17\x0cE\x1d\n\x11\x18'

@GreyCat (Member)

The basic idea is to achieve maximum performance in these process_* implementations, so we should keep the fastest one. Python is, unfortunately, so far the slowest of all 4 languages being benchmarked :( I thought that Python 3 would be faster than Python 2, but, alas, it seems to be actually ~10-15% slower. Maybe bringing in numpy support or something like that would improve performance.

@dgladkov (Collaborator, Author)

Yeah, Python 3 is actually slower than Python 2 in a lot of cases. Including numpy is an interesting option, but I'd like to have it as a soft dependency, e.g. something like:

try:
    import numpy
except ImportError:
    numpy = None

if numpy:
    return numpy.bitwise_xor(bytearray(data), bytearray(key)).tobytes()
else:
    # current code
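
If the key can be shorter than the data, the numpy path would also need key wrapping; one possible sketch (process_xor_many_np is a hypothetical name, and numpy.resize repeats the key cyclically up to the data length):

import numpy

def process_xor_many_np(data, key):
    d = numpy.frombuffer(bytearray(data), dtype=numpy.uint8)
    k = numpy.frombuffer(bytearray(key), dtype=numpy.uint8)
    # numpy.resize tiles the key until it covers the whole buffer
    return numpy.bitwise_xor(d, numpy.resize(k, d.shape)).tobytes()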

@dgladkov (Collaborator, Author) commented on 04289cc, Oct 11, 2016

It appears that the pure Python 3 version above is actually faster than numpy, unless I'm doing something terribly wrong. The bottleneck seems to be the bytearray construction, which is not needed on Python 3.

import timeit

print('python + itertools')
print(timeit.timeit(
    "assert bytes(a ^ b for a, b in zip(b'ABA', itertools.cycle(b'BAB'))) == b'\x03\x03\x03'",
    setup='import itertools',
    number=100000)
)
print('numpy')
print(timeit.timeit(
    "assert numpy.bitwise_xor(bytearray(b'ABA'), bytearray(b'BAB')).tobytes() == b'\x03\x03\x03'",
    setup='import numpy',
    number=100000)
)
$ python benchmark.py
python + itertools
0.1668216609996307
numpy
0.33887984700049856

@dgladkov (Collaborator, Author)

Results with a bytearray-converting variant of the pure Python 3 version added to the comparison:

python + itertools
0.16899475500031258
python + itertools + bytearray conversion
0.4171916619998228
numpy
0.3426564929995948

So, according to these numbers, the most performant approach is (see the sketch after this list):

  • Python 2: create as few bytearrays as possible and mutate them in place
  • Python 3: use generators and process bytes directly, without converting to bytearrays
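
Putting both rules together, a version-dispatched implementation could look roughly like this (only a sketch of the idea, not the code that eventually landed):

import sys

if sys.version_info[0] >= 3:
    def process_xor_one(data, key):
        # Python 3: iterating over bytes yields ints, no conversion needed
        return bytes(v ^ key for v in data)
else:
    def process_xor_one(data, key):
        # Python 2: build a single bytearray and mutate it in place
        r = bytearray(data)
        for i in range(len(r)):
            r[i] ^= key
        return bytes(r)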

@GreyCat (Member)

It appears that the pure Python 3 version above is actually faster than numpy, unless I'm doing something terribly wrong.

I don't know much about Python, but from what I see here so far, you're running a specific microbenchmark, and its results do not necessarily apply to the real-world macro situation. The benchmark I mentioned is obviously not flawless either, but it mimics a more or less plausible real-world scenario, i.e. XORing a buffer of several kilobytes with a key and then parsing the un-XORed contents. Can I persuade you to try the various implementations of process_xor_* with that too?
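
For instance, a rough approximation of that scenario in timeit terms (a Python 3 sketch; the 8 KiB buffer, the b'key' key and the iteration count are made-up parameters, not the actual benchmark's):

import timeit

setup = '''
import itertools
data = bytes(bytearray(range(256)) * 32)  # 8 KiB of data
key = b'key'
'''
print(timeit.timeit(
    "bytes(a ^ b for a, b in zip(data, itertools.cycle(key)))",
    setup=setup,
    number=1000))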

@dgladkov (Collaborator, Author)

@GreyCat Thanks a lot for the suggestion: the bottleneck identified by my microbenchmark was indeed insignificant at bigger data sizes. I summarized my tests in #8

@arekbulski (Member) commented on 04289cc, Jan 19, 2018

that's ~5% worse than it was.

I would like to point out that the current benchmarking code does not measure the average time over several runs (timeit-style) but runs only once, which is subject to CPU jitter. Those 5% could be just noise on the wire, so to speak, easily attributable to jitter.
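
For reference, averaging over several runs is straightforward with timeit.repeat (run_benchmark and its module are hypothetical stand-ins for the real benchmark entry point):

import timeit

times = timeit.repeat('run_benchmark()',
                      setup='from benchmark import run_benchmark',
                      repeat=5, number=1)
print(min(times), sum(times) / len(times))  # best and mean of 5 runs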

@GreyCat (Member)

before this commit I was getting 32.54340088365 s for parsing on avg
on avg

Of course, one does not run this just once; I ran it several times and aggregated the results. Ideally, we should add some automation to do these calculations for us...

@arekbulski (Member)

oO, you what?... Do you want me to add timeit to the Python benchmark suite? It would increase the run time by a factor of a few, but make the numbers more accurate. Just say so, and I will effectuate it.

@arekbulski (Member)

Also, should the benchmarks run on a file or on bytes read into RAM?
