Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process XOR/ROL optimisations #21

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
100 changes: 87 additions & 13 deletions kaitai/kaitaistream.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -399,11 +399,15 @@ std::string kaitai::kstream::ensure_fixed_contents(std::string expected) {
}

std::string kaitai::kstream::bytes_strip_right(std::string src, char pad_byte) {
std::size_t new_len = src.length();
std::size_t max_len = src.length();
std::size_t new_len = max_len;

while (new_len > 0 && src[new_len - 1] == pad_byte)
new_len--;

if (new_len == max_len)
return src;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this optimization have a measurable impact?

As I understand it, the src parameter is a copy of the value specified by the caller, and the value returned by this function is similarly a copy of the value specified in the return statement, so in this case, there are already two copies being performed. (If we're lucky, the compiler is optimizing away at least one of them.) Given that, I can't tell just by reading the code whether skipping the call to substr() actually saves anything. (That is, if the compiler has to copy the return value, then the overhead of using substr() to generate the value copied is probably "in the noise", but adding the optimization makes the code more complicated and introduces an extra (arguably unnecessary) branch.

In any case, you should consider passing the argument as a reference to a const value, which would remove one of the two copies. (Alternatively, if we had access to C++-11 semantics, you would want to consider using std::move() to create the return value.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the problem. In about every other language, both static like C# and dynamically typed like Python Ruby, strings are passed around in O(1). Do I understand correctly that in C++ those strings are passed by full copy in O(N)? If so, then I dont know about the optimisations...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ offers all three options:

  • pass by const reference, where the callee receives a reference (what you're calling O(1)) to the caller's object (here a string), but the callee may not modify it;
  • pass by mutable reference, where the callee receives a reference to the caller's object, and is able to modify it;
  • pass by value, where the callee receives a "copy" of the caller's object -- "copy" is in quotes, because the value is produced using the (possibly default-generated) "copy constructor", which, really, could do anything, including creating a reference to the original object.

However, you have to specify the one that you want...and, the most straightforward syntax produces arguably the safest and simplest effect: pass by value.

Copy link
Member Author

@arekbulski arekbulski Apr 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its funny that with all that convoluted answers that you gave me, it makes me think that you actually dont know... which kind of doesnt come as a suprise, I think whoever said "nobodody really understands quantum mechanics" DEFINATELY meant C++!!

I am working on a benchmark, I will test it. Standby.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for the confusion. Which passing mechanism is used is made clear by the function's signature in the declaration/definition.

Pass by "const reference":

std::string kaitai::kstream::bytes_strip_right(const std::string& src, char pad_byte);

The callee and caller share the storage for the object (here, src), but the callee is prevented (at compile-time) from modifying it. However, it looks, inside the function, like the object was declared in the local scope. (I.e., src is not a pointer, and it is accessed with "dot" notation, as src.length().)

Pass by "reference":

std::string kaitai::kstream::bytes_strip_right(std::string& src, char pad_byte);

This is the old-fashioned "pass by pointer" hidden under a bunch of syntactic sugar -- the callee and the caller share the storage for the object, the callee accesses it as though it were declared locally, and the callee has the ability to modify it. This works well for "output parameters" and "in-out parameters", but it is somewhat dangerous, as it's not always apparent to the reader that the caller's value can be modified.

Pass by "value":

std::string kaitai::kstream::bytes_strip_right(std::string src, char pad_byte);

This one does what it says: only the "value" of the object gets passed, and the callee receives a copy-constructed version of the object. By default, that is a full and separate copy of it, which means that the caller is protected from anything that the callee might do, and the callee has full freedom to (ab)use the object as it wishes (once again, the object is accessed as if it were a local declaration); however, there is a performance penalty associated with this if the object is large or complicated to copy.

So, the code as it stands is passing by value, which is arguably the safest, most conservative, least confusing, and probably most expensive approach.

However, that said, it is possible that the compiler can apply some optimizations transparently. The most obvious is for it to allocate the return value in the caller and use that storage to receive the callee's copy, which removes the need for one of the two copy operations. (There are other options, depending on what the caller does with the result and on which version of the C++ compiler is being used.)


return src.substr(0, new_len);
}

Expand All @@ -417,6 +421,9 @@ std::string kaitai::kstream::bytes_terminate(std::string src, char term, bool in
if (include && new_len < max_len)
new_len++;

if (new_len == max_len)
return src;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto my previous comment.


return src.substr(0, new_len);
}

Expand All @@ -425,41 +432,108 @@ std::string kaitai::kstream::bytes_terminate(std::string src, char term, bool in
// ========================================================================

std::string kaitai::kstream::process_xor_one(std::string data, uint8_t key) {
if (key == 0)
return data;

size_t len = data.length();
std::string result(len, ' ');

for (size_t i = 0; i < len; i++)
for (size_t i = 0; i < len; i++) {
result[i] = data[i] ^ key;
}

return result;
}

std::string kaitai::kstream::process_xor_many(std::string data, std::string key) {
size_t len = data.length();
size_t kl = key.length();
size_t keylen = key.length();
if (keylen == 1)
return process_xor_one(data, key[0]);

std::string result(len, ' ');

size_t ki = 0;
size_t k = 0;
for (size_t i = 0; i < len; i++) {
result[i] = data[i] ^ key[ki];
ki++;
if (ki >= kl)
ki = 0;
result[i] = data[i] ^ key[k];
k++;
if (k == keylen)
k = 0;
}

return result;
}

std::string kaitai::kstream::process_rotate_left(std::string data, int amount) {
uint8_t precomputedSingleRotations[8][256];

// NOTE: static block of code, https://stackoverflow.com/a/34321324/2375119
computeSingleRotations {
for (int amount = 1; amount < 8; amount++) {
int anti_amount = 8 - amount;
for (uint8_t i = 0; i < 256; i++) {
precomputedSingleRotations[amount][i] = (uint8_t)((i << amount) | (i >> anti_amount));
}
}
}

std::string kaitai::kstream::process_rotate_left(std::string data, int amount, int groupSize = 1) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think my C++ compiler permits specifying default parameter values in the definition if they are already specified in the declaration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have absolutely no idea about this. C++ syntax is completely outside my area of expertise.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand...but, does the code as is compile?

if (groupSize < 1)
throw std::runtime_error("process_rotate_left: groupSize must be at least 1");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than runtime_error, I think invalid_argument would probably be more appropriate.


amount = mod(amount, groupSize * 8);
if (amount == 0)
return data;

int amount_bytes = amount / 8;
size_t len = data.length();
std::string result(len, ' ');

for (size_t i = 0; i < len; i++) {
uint8_t bits = data[i];
result[i] = (bits << amount) | (bits >> (8 - amount));
if (groupSize == 1) {
uint8_t *translate = &precomputedSingleRotations[amount][0];

for (size_t i = 0; i < len; i++) {
result[i] = translate[data[i]];
}

return result;
}

return result;
if (len % groupSize != 0)
throw std::runtime_error("process_rotate_left: data length must be a multiple of group size");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than runtime_error, I think invalid_argument would probably be more appropriate.


if (amount % 8 == 0) {
size_t indices[groupSize];
for (size_t i = 0; i < groupSize; i++) {
indices[i] = (size_t)((i + amount_bytes) % groupSize);
}

for (size_t i = 0; i < len; i += groupSize) {
for (size_t k = 0; k < groupSize; k++) {
result[i+k] = data[i + indices[k]];
}
}

return result;
}

{
int amount1 = amount % 8;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be tempted to write this as

       int amount1 = amount & 7;

int amount2 = 8 - amount1;
size_t indices1[groupSize];
size_t indices2[groupSize];
for (size_t i = 0; i < groupSize; i++) {
indices1[i] = (size_t)((i + amount_bytes) % groupSize);
indices2[i] = (size_t)((i + 1 + amount_bytes) % groupSize);
}

for (size_t i = 0; i < len; i += groupSize) {
for (size_t k = 0; k < groupSize; k++) {
result[i+k] = (uint8_t)((data[i + indices1[k]] << amount1) | (data[i + indices2[k]] >> amount2));
}
}

return result;
}
}

#ifdef KS_ZLIB
Expand Down
13 changes: 10 additions & 3 deletions kaitai/kaitaistream.h
Original file line number Diff line number Diff line change
Expand Up @@ -172,6 +172,8 @@ class kstream {
/**
* Performs a XOR processing with given data, XORing every byte of input with a single
* given value.
* WARNING: May return same byte array if key is zero.
*
* @param data data to process
* @param key value to XOR with
* @return processed data
Expand All @@ -182,6 +184,8 @@ class kstream {
* Performs a XOR processing with given data, XORing every byte of input with a key
* array, repeating key array many times, if necessary (i.e. if data array is longer
* than key array).
* WARNING: May return same byte array if key is zero.
*
* @param data data to process
* @param key array of bytes to XOR with
* @return processed data
Expand All @@ -190,13 +194,16 @@ class kstream {

/**
* Performs a circular left rotation shift for a given buffer by a given amount of bits,
* using groups of 1 bytes each time. Right circular rotation should be performed
* using this procedure with corrected amount.
* using groups of some bytes each time. Right rotation can be performed by using
* negative amount.
* WARNING: May return same byte array if amount is zero (modulo-wise).
*
* @param data source data to process
* @param amount number of bits to shift by
* @param groupSize number of bytes that make a group
* @return copy of source array with requested shift applied
*/
static std::string process_rotate_left(std::string data, int amount);
static std::string process_rotate_left(std::string data, int amount, int groupSize = 1);

#ifdef KS_ZLIB
/**
Expand Down