Replace ujson #2737

damilgra · 2021-03-04T16:32:35Z

Resolves issue Kinto#2736

Added a unit test for issue Kinto#2736. Modified the memory storage backend to use the native json library, because the added unit test fails with the memory storage if simdjson is employed: simdjson cannot *deserialize* strings representing numbers > 64 bits. On the other hand, it can serialize a > 64-bit number. So, currently we have:

- Memory Storage: uses native json
- PostgreSQL Storage: uses simdjson

This reverts commit 8161b72.

This reverts commit bb7bdb9.

…Memory storage backend

Resolve Kinto#2736 by replacing ujson, which cannot serialize or deserialize numbers > 64 bits. simdjson, a performant JSON library, is used for the PostgreSQL backend, while native json is used for the memory storage backend to ensure that the unit test demonstrating the issue passes.

Note 1: there is no unit test which verifies that this works with PostgreSQL. However, I manually modified the test configuration setup to confirm that it does in fact pass.

leplatrem · 2021-03-04T16:37:41Z

kinto/core/storage/memory.py

@@ -13,7 +13,8 @@
    StorageBase,
    exceptions,
 )
-from kinto.core.utils import COMPARISON, find_nested_value, json
+from kinto.core.utils import COMPARISON, find_nested_value
+import json


Couldn't you import json from kinto.core.utils here too?

Right now, kinto.core.utils redefines json as simdjson, so no. Currently I have:
simdjson for PostgreSQL, and native json for Memory

I'm not sure whether a JSON library more performant than the native one is needed, but that's the rationale. Alternatively, simdjson could possibly be used for both storages, but the way records are added via memory storage would need to be modified, which is a far more invasive change.

The usage of json in the memory backend is rather straightforward. What issues would we have if we used simdjson there too?

The problem is that the unit test I introduced will raise an exception here (found in core/storage/memory.py):

obj = json.loads(json.dumps(obj))

The reason is that simdjson cannot deserialize strings representing numbers > 64 bits, as per simdjson/simdjson#167. On the other hand, it can serialize a > 64-bit number.
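For illustration, the round-trip in question can be sketched with the standard library alone. This is a minimal sketch, not the backend's actual code: the helper name `normalize` and the sample record are assumptions made up for the example. It only shows that stdlib json preserves an integer larger than 64 bits through `dumps` followed by `loads`.

```python
import json

# One past the unsigned 64-bit range. Python ints are arbitrary precision,
# so the value itself is unremarkable on the Python side.
BIG = 2**64


def normalize(obj):
    """Round-trip through JSON to obtain a plain, detached copy (hypothetical helper)."""
    return json.loads(json.dumps(obj))


# Hypothetical record carrying a > 64-bit integer.
record = {"id": "abc", "age": BIG}
copy = normalize(record)
```

With stdlib json the copy compares equal to the original record, which is the behaviour the added unit test relies on.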

Ok, gotcha.

And then I would suggest that once we merge this, you create an issue to keep track of this so that we can use simdjson everywhere once this is fixed upstream.

(we could also do something like obj = simdjson.loads(json.dumps(obj)), but I don't think it brings a lot of value)

Sure thing. I'll create an issue once it's merged and I've subscribed to the simdjson issue to track progress there.

TkTech · 2021-03-09T18:20:30Z

> The reason is that simdjson cannot deserialize strings representing numbers > 64 bits, as per simdjson/simdjson#167. On the other hand, it can serialize a > 64-bit number.

Funny enough, this is because pysimdjson's dump() and dumps() are just aliases to Python's built in json module. The underlying simdjson library doesn't do serialization at all (yet).

damilgra · 2021-03-10T01:07:56Z

> Funny enough, this is because pysimdjson's dump() and dumps() are just aliases to Python's built in json module. The underlying simdjson library doesn't do serialization at all (yet).

Thanks very much, good to know. I'm going to investigate other options.

Replaced pysimdjson with python-rapidjson. As pysimdjson uses stdlib json to serialize JSON, its serialization performance is subpar compared to ujson. With rapidjson and the selected options, serialization performance should be comparable to that of ujson and simdjson, whereas deserialization is slower than that of simdjson but better than that of ujson, according to the benchmarks found here: https://python-rapidjson.readthedocs.io/en/latest/benchmarks.html

For serialization, we do not employ the "number_mode=NM_NATIVE" setting, which improves performance, in favour of correctness and in order to resolve Kinto#2677.

leplatrem · 2021-04-21T09:43:57Z

kinto/core/utils.py

-    return json.dumps(v, escape_forward_slashes=False)
+class json:
+    def dumps(v, **kw):
+        return rapidjson.dumps(v, bytes_mode=rapidjson.BM_NONE)


Dropping the kw will introduce breaking changes... See

kinto/kinto/plugins/quotas/utils.py

Line 6 in b9b4eb0

canonical_json = json.dumps(record, sort_keys=True, separators=(",", ":"))

Perhaps I'm misunderstanding but the file you reference imports stdlib json so it shouldn't be affected. I will make a change however to respect passed in kw, if any; grepping the source I don't see any instances where this happens right now.

import json

def record_size(record):
    # We cannot use ultrajson here, since the separator option is not available.
    canonical_json = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return len(canonical_json)
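The concern about dropped keyword arguments can be sketched as follows. This is a minimal sketch under stated assumptions, not the code that was merged: the class name `JsonFacade` and its default options are hypothetical, and stdlib json stands in for the faster backend so that the example stays self-contained. The point is only that the wrapper applies its own defaults while still honouring whatever the caller passes.

```python
import json as stdlib_json


class JsonFacade:
    """Hypothetical wrapper: library-wide defaults, overridable by callers."""

    # Assumed default for this sketch: compact separators.
    DUMPS_DEFAULTS = {"separators": (",", ":")}

    @staticmethod
    def dumps(value, **kw):
        # Caller-supplied keywords win over the defaults instead of being dropped.
        options = {**JsonFacade.DUMPS_DEFAULTS, **kw}
        return stdlib_json.dumps(value, **options)

    @staticmethod
    def loads(text, **kw):
        # Forward every keyword untouched.
        return stdlib_json.loads(text, **kw)


compact = JsonFacade.dumps({"b": 1, "a": 2}, sort_keys=True)
spaced = JsonFacade.dumps({"b": 1, "a": 2}, sort_keys=True, separators=(", ", ": "))
```

Here `compact` uses the wrapper's default separators, while `spaced` shows a caller overriding them; both respect `sort_keys`.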

kinto/core/utils.py

leplatrem · 2021-06-07T16:28:05Z

@damilgra Are you still motivated to push this to the finish line?

damilgra · 2021-06-07T17:08:43Z

> @damilgra Are you still motivated to push this to the finish line?

Sure, anything else I need to do? I'm trying to get a tox environment set up, but if memory serves there were some failing tests unrelated to changes I made. Please let me know if you have any other changes you'd like me to make.

damilgra added 6 commits March 2, 2021 17:58

Replace ujson with simdjson

bb7bdb9

Resolves issue Kinto#2736

Revert "Add unit test for Kinto#2736"

5b20602

This reverts commit 8161b72.

Revert "Replace ujson with simdjson"

177f08f

This reverts commit bb7bdb9.

Add CHANGELOG entry for Kinto#2736

fbea624

leplatrem reviewed Mar 4, 2021

Fix linting/isort

af95040

damilgra added 3 commits March 10, 2021 00:17

flake8, isort fixes

5bbe6a8

Remove unused method

9a3f23f

leplatrem reviewed Apr 21, 2021

damilgra added 4 commits April 21, 2021 11:10

Respect kw passed in to JSON load and dump functions

0ce62d3

Merge branch 'master' into replace-ujson

1fa3273

Lint fix

3ac5a66

Lint fix

f60d9b0

leplatrem reviewed Jun 7, 2021

Update utils.py

5f783ce

leplatrem approved these changes Jun 25, 2021

leplatrem merged commit 951dd25 into Kinto:master Jul 2, 2021

slav0nic mentioned this pull request Sep 25, 2021

replacing ujson Kinto/kinto-redis#227

Closed

damilgra commented Mar 4, 2021

leplatrem Mar 4, 2021

damilgra Mar 4, 2021

leplatrem Mar 5, 2021

damilgra Mar 6, 2021

leplatrem Mar 6, 2021 (edited)

damilgra Mar 9, 2021

TkTech commented Mar 9, 2021

damilgra commented Mar 10, 2021

leplatrem Apr 21, 2021

damilgra Apr 21, 2021

leplatrem commented Jun 7, 2021

damilgra commented Jun 7, 2021
