Add defstruct macro #14

Open. Wants to merge 79 commits into base: develop

rutenkolk
Contributor

@rutenkolk rutenkolk commented Oct 13, 2024

[Note: I would consider this a draft PR, as I have not yet added tests (which could unearth some bugs and necessitate appropriate fixes).]

Hi, this PR contains the addition of a defstruct macro. It does the following:

  • It adds serialize and deserialize code to a serde "registry" for the new type (details below)
  • It generates a new type that has the specified members (details below)
  • It adds an implementation for c-layout
  • It adds inline implementations for both deserialize-from and serialize-into
  • It adds an implementation for clojure.pprint/simple-dispatch

serde registry

The "registry" is implemented via the multimethods generate-deserialize and generate-serialize, which produce code to de/serialize the respective types. This removes indirection in the de/serialize code for types that use other types. I think in the original discussion we were on the same page, but each thought the other meant something different. The defstruct macro adds implementations to the multimethods for the newly generated type.
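To illustrate the idea of a code-generating registry, here is a minimal sketch. It is not the code from this PR: `gen-serialize-sketch`, `serialize-inline-sketch`, and the `:sketch/...` type keywords are hypothetical names, a `java.nio.ByteBuffer` stands in for a native memory segment, and offsets are assumed to be compile-time constants.

```clojure
(import 'java.nio.ByteBuffer)

;; Hypothetical registry: each method returns a FORM that writes a value
;; of the given type into the buffer at a constant offset.
(defmulti gen-serialize-sketch
  (fn [type _value-form _buf-sym _offset] type))

(defmethod gen-serialize-sketch :sketch/int
  [_ value-form buf-sym offset]
  `(.putInt ~(vary-meta buf-sym assoc :tag 'java.nio.ByteBuffer)
            ~offset
            (int ~value-form)))

;; A composite type asks the registry for the code of its members, so the
;; expansion contains the member writes directly, with no runtime dispatch.
(defmethod gen-serialize-sketch :sketch/point
  [_ value-form buf-sym offset]
  (let [v (gensym "v")]
    `(let [~v ~value-form]
       ~(gen-serialize-sketch :sketch/int `(:x ~v) buf-sym offset)
       ~(gen-serialize-sketch :sketch/int `(:y ~v) buf-sym (+ offset 4)))))

(defmacro serialize-inline-sketch [type value-form buf-sym offset]
  (gen-serialize-sketch type value-form buf-sym offset))

(def example-buffer
  (let [buf (ByteBuffer/allocate 8)]
    (serialize-inline-sketch :sketch/point {:x 1 :y 2} buf 0)
    buf))
```

Because the `:sketch/point` method splices in the forms produced for `:sketch/int`, the macro expands into two plain `.putInt` calls.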

the generated type

The new type is generated via deftype in the private function generate-struct-record. This is an attempt to strike a middle ground between the two positions of the original discussion, although the result might be a bit odd:

  • The type implements both IPersistentVector and IPersistentMap.
    • The basic idea is: if it is treated like a vector, it behaves like a vector. if it is treated like a map, it behaves like a map.
  • It therefore implements both vector-like methods like nth as well as map-like methods like without (for e.g. dissoc).
  • If there is an overlap in the map/vector interfaces, such as with assoc, it supports both paradigms of indices-as-keys and membernames-as-keys. Practically speaking, if you use something like assoc with a number as a key, it behaves like a vector (and will return a vector), otherwise like a map (and will return a map).
  • One notable exception here is foreach, which can't support both paradigms and is therefore implemented as if the type were a vector. The rationale here is that the value of the type is composed of the actual values of the members, not the associated names of the places of the values. If you map or reduce over an object of this type, you will do so over the values of the members.
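The dual access style can be sketched with a much smaller type than the one described above. This is an illustration only: `Vec2Sketch` is a hypothetical name, and it implements just `clojure.lang.Indexed` and `clojure.lang.ILookup`, not the full IPersistentVector and IPersistentMap surface of the generated type.

```clojure
(deftype Vec2Sketch [x y]
  ;; Vector-like access by position.
  clojure.lang.Indexed
  (nth [_ i]
    (case (int i)
      0 x
      1 y
      (throw (IndexOutOfBoundsException. (str "index " i)))))
  (nth [_ i not-found]
    (case (int i)
      0 x
      1 y
      not-found))
  (count [_] 2)

  ;; Map-like access by member name; numeric keys fall back to nth.
  clojure.lang.ILookup
  (valAt [this k] (.valAt this k nil))
  (valAt [this k not-found]
    (cond
      (integer? k) (nth this k not-found)
      (= k :x) x
      (= k :y) y
      :else not-found)))

(def p (->Vec2Sketch 1.5 2.5))
```

With this, `(nth p 0)`, `(get p 0)`, and `(:x p)` all read the same member.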

with-c-layout

There was one implementation problem. Since padding had to be taken into account to allow for inline serdes, the new code for the macro needed to rely on with-c-layout. The problem here is that with-c-layout is in the layout namespace, which already depends on mem. As a stopgap solution I simply copied the function over as a private function. I would be in favor of actually deprecating the layout namespace. For backwards compatibility, the with-c-layout function in layout could depend on the one in mem. Not only is the layout namespace at this point somewhat anemic, it has also caused me trouble. I'm not sure if it's a bug, but due to with-c-layout being in layout, I ran into the problem that there are now two different :padding keywords, which I found confusing.
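For readers unfamiliar with why padding matters for inline serdes, the following sketch computes member offsets under the typical C layout rule (each member is aligned to its own alignment, and the total size is rounded up to the largest member alignment). It is not coffi's with-c-layout; `align-up`, `c-layout-sketch`, and the field maps are hypothetical.

```clojure
(defn align-up
  "Rounds offset up to the next multiple of alignment."
  [offset alignment]
  (let [r (rem offset alignment)]
    (if (zero? r)
      offset
      (+ offset (- alignment r)))))

(defn c-layout-sketch
  "Takes a seq of {:field kw :size bytes :align bytes} maps and returns the
  byte offset of every field plus the padded total size."
  [fields]
  (let [[offsets end] (reduce (fn [[acc offset] {:keys [field size align]}]
                                (let [start (align-up offset align)]
                                  [(conj acc [field start]) (+ start size)]))
                              [[] 0]
                              fields)
        max-align (apply max 1 (map :align fields))]
    {:offsets offsets
     :size (align-up end max-align)}))

(def example-layout
  (c-layout-sketch [{:field :a :size 1 :align 1}
                    {:field :b :size 4 :align 4}
                    {:field :c :size 1 :align 1}]))
```

Here `:b` lands at offset 4 rather than 1, and the struct occupies 12 bytes rather than 6, which is why generated code needs the padded offsets.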

tests & benchmarks

No tests or benchmarks exist right now. I don't expect the custom type to be slower than defrecord, but I want to test it.
Similarly, I want to add a first set of tests for the de/serialization.

@rutenkolk rutenkolk marked this pull request as draft October 13, 2024 21:19
@IGJoshua
Owner

I absolutely love this. You've done a fantastic job! I look forward to seeing the tests that you add for this, and I'm also thinking ahead to reorganizing some of the existing serde code to use the new generate-serialize and generate-deserialize to add some inline arities to serialize and deserialize when the type argument is a constant. Don't worry about any of that in this PR, it's just something I want to use this for in the future.
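One way such inline arities could look is sketched below, using Clojure's `:inline` function metadata. This is an assumption about a possible design, not a plan taken from coffi: `to-int-sketch`, `gen-convert-sketch`, and `convert-sketch` are hypothetical names, and the "serialization" is reduced to a numeric coercion.

```clojure
;; Runtime path: ordinary multimethod dispatch on the type keyword.
(defmulti to-int-sketch (fn [type _value] type))
(defmethod to-int-sketch :sketch/int [_ v] (int v))

;; Compile-time path: a generator that returns code for a known type.
(defmulti gen-convert-sketch (fn [type _value-form] type))
(defmethod gen-convert-sketch :sketch/int [_ value-form]
  `(int ~value-form))

(defn convert-sketch
  ;; When the type argument is a keyword literal with a registered
  ;; generator, the call site expands to the generated code. Otherwise it
  ;; expands to the multimethod call.
  {:inline (fn [type value-form]
             (if (and (keyword? type)
                      (contains? (methods gen-convert-sketch) type))
               (gen-convert-sketch type value-form)
               `(to-int-sketch ~type ~value-form)))}
  [type value]
  (to-int-sketch type value))
```

A direct call such as `(convert-sketch :sketch/int 3.7)` takes the inlined path, while `(apply convert-sketch [:sketch/int 3.7])` goes through the function body and the multimethod.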

@IGJoshua
Owner

IGJoshua commented Oct 14, 2024

So I like the way you've chosen to introduce the type registry. It integrates well with the existing tools, and provides a way to do inline arities for serialize and deserialize in the future. One hesitation I have at the moment looking over the code is the use of ::mem/array to refer to actual java arrays.

Arrays

Currently in coffi the ::mem/array type serializes anything seqable, which does include arrays, and it deserializes to a vec.
I think that to support the direction you're going here, we should add optional kwargs as options in the array type, with a :raw? true option meaning that it will deserialize to a JVM array and will assume that the argument is an array, and then add some conditionals in to ensure that we have the fast path for array serialization.

I also think that your compromise around using a record-like type with both map and array style accesses is appropriate and well-done. I might personally want to go the other direction with the foreach implementation though, making it act as if it's reducing over a sequence of map entries. Doing it this way allows adding a quick map val into the stack without too much performance overhead, and it avoids the need to figure out a zipmap with the keys and values separately. I don't have too strong an opinion on this one though as long as keys returns the keys in the same order as foreach yields the values.

serde registry

The serde registry as it stands with generate-serialize and generate-deserialize both look pretty good in terms of usage and follow about what I want them to do, but I want to note two things about them that I'm not sure how I feel about right now.

The first is just an observation and not a problem: these functions all generate the equivalent of a serialize-into or deserialize-from call, which I think is appropriate. I'm just thinking about what this might mean in terms of naming if the generate-x functions are going to become a part of the public API of coffi.

The second is that these macros as they stand are unhygienic macro helpers. I think it would be appropriate for the multimethod to take in the symbol which will be used to refer to the segment.
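The difference can be sketched with two small macros. The names `read-slot-sketch`, `read-unhygienic-sketch`, and `read-hygienic-sketch` are hypothetical, and a plain vector stands in for a memory segment.

```clojure
(defn read-slot-sketch [segment offset]
  (nth segment offset))

;; Unhygienic: the expansion refers to a local that must happen to be
;; called `segment` at the call site.
(defmacro read-unhygienic-sketch [offset]
  `(read-slot-sketch ~'segment ~offset))

;; Hygienic variant: the caller passes in the form that names the segment.
(defmacro read-hygienic-sketch [segment-form offset]
  `(read-slot-sketch ~segment-form ~offset))

(def unhygienic-result
  (let [segment [10 20 30]]          ; only works because of this exact name
    (read-unhygienic-sketch 1)))

(def hygienic-result
  (let [seg [10 20 30]]              ; any name or expression works
    (read-hygienic-sketch seg 1)))
```

Renaming the local in the first `let` would break compilation of the unhygienic version, while the second version is unaffected by naming.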

with-c-layout

For the with-c-layout problem, I think there's a couple things to be done. To start with, we can make the private version in coffi.mem use :coffi.layout/padding explicitly which doesn't require the namespace be loaded, which reduces it to just one padding key. Then for the rest, I'm a little undecided about it.
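As a small illustration of the keyword point: a fully qualified keyword is plain data, so it can be written out without requiring or aliasing the namespace it names, whereas `::padding` resolves against whichever namespace the code is read in. The second keyword below is presumably what the copied function ended up using; that is an inference from the discussion, not something checked against the code.

```clojure
;; No (require 'coffi.layout) is needed to write this keyword.
(def layout-padding :coffi.layout/padding)

;; What ::padding would presumably resolve to inside coffi.mem.
(def mem-padding :coffi.mem/padding)

;; Same name, different namespace part, so these are two distinct keys.
(def same-key? (= layout-padding mem-padding))
```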

All the structs being passed over the C ABI will most certainly use the with-c-layout layout. However, the intention behind having the namespace in the first place was to allow easily serializing Clojure maps into e.g. std140 or std430 from the GLSL spec. I just haven't gotten around to implementing those yet, as I wanted to get a defstruct macro and some codegen for an OpenGL bindgen library first.

Specifically though, if we remove the coffi.layout namespace and just assume everything is the c-layout, that will then mean there's no way for a user of the library to reach lower in the abstractions to implement a different layout for their usecase except to re-implement defstruct for their own layout.

@IGJoshua IGJoshua mentioned this pull request Oct 14, 2024
@rutenkolk rutenkolk marked this pull request as ready for review December 29, 2024 16:59
@rutenkolk
Contributor Author

rutenkolk commented Dec 29, 2024

Hi, I've opened this PR for review, since I tested and benchmarked everything.

The macro itself doesn't have a raw? option anymore, but it supports arrays with a :raw? true optional keyword like this: [::mem/array ::whatever-type 3 :raw? true].
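A sketch of how such a type vector with trailing keyword options can be taken apart is shown below. The helper names are hypothetical and this is not the parsing code from the PR; `:coffi.mem/array` is simply the fully qualified spelling of `::mem/array`.

```clojure
(defn parse-array-type-sketch
  "Destructures [tag elem-type count & options] into a map."
  [[_tag elem-type n & {:keys [raw?] :or {raw? false}}]]
  {:elem-type elem-type
   :count n
   :raw? raw?})

(defn finish-ints-sketch
  "Stand-in for the end of deserialization: returns a JVM int array when
  :raw? is true, and a vector otherwise."
  [array-type values]
  (if (:raw? (parse-array-type-sketch array-type))
    (int-array values)
    (vec values)))

(def raw-type [:coffi.mem/array :coffi.mem/int 3 :raw? true])
(def vec-type [:coffi.mem/array :coffi.mem/int 3])
```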

There were a few interesting things that happened along the way. One issue was the inline versions of the write-x functions like write-int. Sometimes reflection would occur, but even setting *warn-on-reflection* to true wouldn't catch it. Since writing these inline versions with proper type hinting is finicky, I added a new with-typehints macro to make it more robust and applied it to all of the functions where it was causing issues. I can write this out by hand if needed, but it's very uncomfortable. In essence, before this change you couldn't pass in either a literal or a form and have it be type-hinted correctly as a primitive local in both cases. Now this works, given that the local is actually of the right primitive type.

the internal with-c-layout function still exists, but it's using the keywords just like the original version in coffi.layout.

The macros are now hygienic, as in: they take in the expression that represents e.g. a segment and don't just use a random symbol name you have to match at the call site.

i have spent some time optimizing serializing and deserializing and benchmarked everything. especially interesting is the story around arrays. some key takeaways:

  • if one has to decide between inlining and unrolling a loop, inlining is faster for raw arrays, but not for vectors
  • with big enough arrays of primitives, it's advantageous for serializing to first create one big array and then copy that array with one call. this is apparently not true for deserializing, which is kind of surprising! A transient vector wins this case.
  • there are tradeoffs: which read / write method is the better option depends on the case, and the code auto-chooses the best one based on my benchmarks. This may need to be tested on multiple platforms, though, and adjusted accordingly.
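The strategies compared in the list above can be sketched as follows. This is not the PR's implementation and makes no performance claim: a `java.nio.ByteBuffer` stands in for a native memory segment, and all function names are hypothetical.

```clojure
(import 'java.nio.ByteBuffer)

(defn write-ints-elementwise!
  "Writes every element with its own call."
  [^ByteBuffer buf ^ints arr]
  (dotimes [i (alength arr)]
    (.putInt buf (* 4 i) (aget arr i)))
  buf)

(defn write-ints-bulk!
  "Copies the whole primitive array with a single call."
  [^ByteBuffer buf ^ints arr]
  (.put (.asIntBuffer buf) arr)
  buf)

(defn read-ints-transient
  "Reads n ints into a vector, building it through a transient."
  [^ByteBuffer buf n]
  (loop [i 0
         acc (transient [])]
    (if (< i n)
      (recur (inc i) (conj! acc (.getInt buf (* 4 i))))
      (persistent! acc))))
```

Both write functions produce the same bytes. Which one is faster for a given size is exactly the kind of question the benchmarks in this comment address.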

[plot]

[plot]

In any case, the "auto" version, which chooses the method, performs as well as or better than the best alternative for arrays and vectors respectively. That being said, raw arrays remain incredibly fast and small vectors (< 16 elements) are still pretty fast.

One thing I noticed while profiling calls to serialize-into and deserialize-from is that with small sizes, a big cost factor of these functions becomes looking up the multimethod:

Screenshot 2024-12-29 at 18 23 56

Owner

@IGJoshua IGJoshua left a comment


I've got a few specific things in the review I'd like resolved or to discuss, and attached here I've got a few patches that I'd like if they were applied to the PR.

0004-Don-t-use-underscore-on-used-args.patch.txt
0003-Remove-duplicate-c-layout-implementation.patch.txt
0002-Fix-warning-about-defstruct-redefinition.patch.txt
0001-Use-a-once-only-impl-rather-than-with-typehints.patch.txt

@rutenkolk rutenkolk changed the base branch from master to develop January 4, 2025 19:06
@rutenkolk
Contributor Author

I think work on this PR is nearing completion. As for the performance, here are some benchmark results:

Serializing a struct with n amount of ::mem/int members:

linear scale:
[plot]

logarithmic scale:
[plot]

Deserializing a struct with n amount of ::mem/int members:

linear scale:
[plot]

logarithmic scale:
[plot]

Serializing a struct with one ::mem/array of ::mem/ints of fixed size n:

all individual ways to serialize, logarithmic scale:
[plot]

comparison defstruct with raw arrays and vectors for arrays vs. defalias, logarithmic scale:
[plot]

Deserializing a struct with one ::mem/array of ::mem/ints of fixed size n:

all individual ways to deserialize, logarithmic scale:
[plot]

comparison defstruct with raw arrays and vectors for arrays vs. defalias, logarithmic scale:
[plot]

In all cases, a significant speedup has been achieved, often more than an order of magnitude, and in special cases more than two. For native arrays, raw Java arrays outperform vectors quite heavily. There may be room for improving this specific codepath, but it is still a noticeable improvement. For some cases, like raw arrays, the runtime is in the realm of only a few nanoseconds, and one of the biggest factors actually becomes the initial dispatch of the multimethod. The actual time the de/serialization takes is therefore probably hard to improve significantly further without changing aspects of how coffi itself operates.

@rutenkolk
Contributor Author

to document a last touch:

I moved with-c-layout to coffi.layout again and load it via load-file in coffi.mem right above the definition of defstruct (the latest possible time, so that it hopefully causes as few problems as possible, should coffi.layout be developed further).

defstruct can still be called from coffi.mem and even after removing the dependency on coffi.layout in mem_test.clj no test fails, so i think this worked out just fine!

register-new-struct-deserialization