-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace ujson #2737
Replace ujson #2737
Conversation
Resolves issue Kinto#2736
Added a unit test for issue Kinto#2736 . Modified the memory storage backend to use the native json library as the added unit test fails with the memory storage if simdjson is employed. simdjson cannot *deserialize* strings representing numbers > 64 bits. On the other hand, it can serialize a > 64-bit number. So, currently we have: Memory Storage - uses native json PostgreSQL Storage - uses simdjson
This reverts commit 8161b72.
This reverts commit bb7bdb9.
…Memory storage backend Resolve Kinto#2736 by replacing ujson which cannot serialize or deserialize numbers > 64 bits. simdjson, a performant json library is used for the PostgreSQL backend while native json is used for the memory storage backend in order to ensure that the unit test demonstrating the issue passes. Note 1: there is no unit test which verifies that this works with PostgreSQL - I manually modified the test configuration setup however to confirm that it does in fact pass.
kinto/core/storage/memory.py
Outdated
@@ -13,7 +13,8 @@ | |||
StorageBase, | |||
exceptions, | |||
) | |||
from kinto.core.utils import COMPARISON, find_nested_value, json | |||
from kinto.core.utils import COMPARISON, find_nested_value | |||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't you import json
from kinto.core.utils
here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, kinto.core.utils redefines json as simdjson, so no. Currently I have:
simdjson for PostgreSQL, and native json for Memory
Not sure whether or not a more performant (than native) json library is needed but that's the rationale. Alternatively it's possible simdjson could be used for both storages but the way records are added via memory storage would need to be modified which is a far more invasive change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usage of json
in the memory backend is rather straightforward. What issues would we have if we would use simdjson
there too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the unit test I introduced will raise an exception here (found in core/storage/memory.py):
obj = json.loads(json.dumps(obj))
The reason is that simdjson cannot deserialize strings representing numbers > 64 bits, as per simdjson/simdjson#167. On the other hand, it can serialize a > 64-bit number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, gotcha.
And then I would suggest that once we merge this, you create an issue to keep track of this so that we can use simdjson everywhere once this is fixed upstream.
(we could also do something like obj = simdjson.loads(json.dumps(obj))
but I don't think it brings a lot of value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. I'll create an issue once it's merged and I've subscribed to the simdjson issue to track progress there.
Funny enough, this is because pysimdjson's |
Thanks very much, good to know. I'm going to investigate other options. |
Replaced pysimdjson with python-rapidjson. As pysimdjson uses stdlib json to serialize json, serialization performance is subpar compared to ujson. With rapidjson, and the selected options used, performance for serialization should be comparable to that of ujson and simdjson, whereas deserialization is slower than that of ujson but better than that of ujson, according to benchmarks found here: https://python-rapidjson.readthedocs.io/en/latest/benchmarks.html For serialization, we do not employ the "number_mode=NM_NATIVE" setting which improves performance, in favour of correctness and in order to resolve Kinto#2677.
kinto/core/utils.py
Outdated
return json.dumps(v, escape_forward_slashes=False) | ||
class json: | ||
def dumps(v, **kw): | ||
return rapidjson.dumps(v, bytes_mode=rapidjson.BM_NONE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropping the kw
will introduce breaking changes... See
kinto/kinto/plugins/quotas/utils.py
Line 6 in b9b4eb0
canonical_json = json.dumps(record, sort_keys=True, separators=(",", ":")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I'm misunderstanding but the file you reference imports stdlib json so it shouldn't be affected. I will make a change however to respect passed in kw, if any; grepping the source I don't see any instances where this happens right now.
import json
def record_size(record):
# We cannot use ultrajson here, since the separator
option is not available.
canonical_json = json.dumps(record, sort_keys=True, separators=(",", ":"))
return len(canonical_json)
@damilgra Are you still motivated to push this to the finish line? |
Sure, anything else I need to do? I'm trying to get a tox environment set up, but if memory serves there were some failing tests unrelated to changes I made. Please let me know if you have any other changes you'd like me to make. |
Fixes #2736