Data-systems messaging #306
0xterminator
started this conversation in
General
RFC Goals
The RFC goals are well summarized in the diagram below:
We are investigating a serialization/deserialization mechanism that lets us serialize all required fuel-core data structures (Rust) into a binary vector which is sent over the wire to NATS. The same binary vector is to be read by all sorts of clients, which deserialize it into typed native structures: objects/interfaces/types in JavaScript, structs in Rust, typed structs in Go. We are searching for a mechanism that, on the basis of globally shared schemas, lets clients in different languages derive typed objects ready for use and thereby deserialize the data easily. The opposite must hold as well: a client must be able to serialize data using structures derived from the global schemas and push it to NATS. This way we want to ensure complete convertibility between data sent and received from e.g. Rust / TypeScript / Go without compromising its integrity or causing breaking changes in the current fuel-core functionality.
Overview
In this RFC we examine two very fast and commonly used serialization/deserialization formats, Borsh and Protobuf, in order to determine whether either of them is suitable for implementing a global messaging system within fuel based on shared schemas. We need to consider the following points before evaluating any possible solution:
Suitability
Our restriction is that we are bound to the types defined in `fuel-core`, as these are the main data types that the blockchain relies on and uses internally. All `fuel-core` data types implement `serde::Serialize` and `serde::Deserialize`, which makes them easily convertible from and into JSON on demand. Having said that, we have found JSON to be quite slow for transfer over the wire, hence we are exploring the capabilities of `borsh` vs. `protobuf` in this RFC. Whichever of the two is chosen needs to suit the morphology of the data types we have in fuel and its ecosystem, meaning that its serializers/deserializers must be compatible with those types and able to handle them without much effort.
Libraries availability
There are official libraries for each of the two investigated algorithms:
Borsh
Protobuf
Based on the research conducted, these are the most comprehensive and best-maintained libraries available. Further research in the RFC will focus on them.
Complexity
It is important to note that we are seeking an optimal solution in which data mapping between `fuel-core` structures is achieved with minimum effort, can be automated as much as possible, and always produces deterministic output.
Speed and optimizations
The choice between borsh and protobuf should also reflect the performance of serializing/deserializing any data retrieved from fuel, as well as CPU and memory load. These figures are to be benchmarked if needed. Another important factor is that the serialized data needs to be in a format to which standard compression algorithms can easily be applied if it is to be used for intensive messaging.
Borsh
Description
Borsh JS is an implementation of the Borsh binary serialization format for JavaScript and TypeScript projects.
Borsh stands for Binary Object Representation Serializer for Hashing. It is meant to be used in security-critical projects as it prioritizes consistency, safety, speed, and comes with a strict specification.
Borsh with Rust
Borsh with Rust works by implementing the `BorshSerialize` and `BorshDeserialize` traits on a data structure.
The output is a binary vector into which the data is converted.
Borsh can also generate a schema that is shipped together with the serialized u8 vector. A schema makes decoding less error-prone than schema-less borsh decoding, because without one the serializer and deserializer could interpret the values of an object differently depending on the platform/language where each step takes place. Schemas increase the size of the serialized output, but in exchange encoding and decoding with them is usually safer and allows some flexibility if, e.g., members are later added to or removed from a given struct.
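To make the schema-less byte layout concrete, here is a dependency-free sketch that encodes and decodes a struct by hand following the Borsh rules (fields in declaration order, little-endian integers, strings as a u32 length followed by UTF-8 bytes). The `Person` fields and the helper names are assumptions made for illustration; in real code the `borsh` derives generate the equivalent logic.

```rust
/// Hypothetical example type (not a fuel-core type).
#[derive(Debug, PartialEq)]
pub struct Person {
    pub name: String,
    pub age: u8,
    pub balance: u64,
}

/// Encode following the Borsh layout: fields in declaration order,
/// integers little-endian, strings as u32 length + UTF-8 bytes.
pub fn encode_person(p: &Person) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(p.name.len() as u32).to_le_bytes());
    out.extend_from_slice(p.name.as_bytes());
    out.push(p.age);
    out.extend_from_slice(&p.balance.to_le_bytes());
    out
}

/// Decode the layout produced by `encode_person`, rejecting truncated
/// input and trailing bytes.
pub fn decode_person(bytes: &[u8]) -> Result<Person, String> {
    let mut pos = 0usize;
    let len = read_u32(bytes, &mut pos)? as usize;
    let name = String::from_utf8(take(bytes, &mut pos, len)?.to_vec())
        .map_err(|e| e.to_string())?;
    let age = take(bytes, &mut pos, 1)?[0];
    let mut buf = [0u8; 8];
    buf.copy_from_slice(take(bytes, &mut pos, 8)?);
    let balance = u64::from_le_bytes(buf);
    if pos != bytes.len() {
        return Err("not all bytes read".to_string());
    }
    Ok(Person { name, age, balance })
}

/// Read a little-endian u32 and advance the cursor.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Result<u32, String> {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(take(bytes, pos, 4)?);
    Ok(u32::from_le_bytes(buf))
}

/// Return the next `n` bytes and advance the cursor.
fn take<'a>(bytes: &'a [u8], pos: &mut usize, n: usize) -> Result<&'a [u8], String> {
    let end = pos.checked_add(n).ok_or("length overflow")?;
    let slice = bytes.get(*pos..end).ok_or("unexpected end of input")?;
    *pos = end;
    Ok(slice)
}
```

Because nothing in the bytes names the fields or their types, a reader has to know the exact struct definition in advance, which is the gap a shared schema is meant to close.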
Here is an example with schema:
Going further, schemas can be extracted as such:
and converted to `Vec<u8>` as such:
which gives us the possibility to persist schemas in binary format in files. Analogously, schemas could be read back from files and used when needed. The byte representation of a schema is of little use on its own, however, as the Rust library does not make direct use of it. Unfortunately there is no human-readable representation of a borsh schema, as reflected in this ticket. In general, if a payload is serialized with a schema, it can only be deserialized with that schema. If it is serialized with a schema but an attempt is made to deserialize it without one, it will fail.
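The mismatch between the two decoding modes can be illustrated with a small std-only sketch. The "schema" here is a hypothetical stand-in (just a declaration name written as a Borsh string), and placing it in front of the value is an assumption of the sketch; the real `BorshSchemaContainer` is much richer. The point is only that a reader expecting a bare value cannot make sense of a payload that also carries schema bytes, and vice versa.

```rust
/// Hypothetical stand-in for a schema: only a declaration name.
const SCHEMA: &str = "Height";

/// Plain encoding: the little-endian value and nothing else.
pub fn to_vec(value: u32) -> Vec<u8> {
    value.to_le_bytes().to_vec()
}

/// Schema-carrying encoding: schema bytes first (assumed order), then the value.
pub fn to_vec_with_schema(value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(SCHEMA.len() as u32).to_le_bytes());
    out.extend_from_slice(SCHEMA.as_bytes());
    out.extend_from_slice(&value.to_le_bytes());
    out
}

/// Plain decoding: exactly four bytes are expected; anything else is an error.
pub fn from_slice(bytes: &[u8]) -> Result<u32, String> {
    if bytes.len() != 4 {
        return Err(format!("expected 4 bytes, got {}", bytes.len()));
    }
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    Ok(u32::from_le_bytes(buf))
}

/// Schema-aware decoding: verify the schema header, then decode the value.
pub fn from_slice_with_schema(bytes: &[u8]) -> Result<u32, String> {
    let header_len = 4 + SCHEMA.len();
    if bytes.len() < header_len {
        return Err("truncated schema".to_string());
    }
    let mut len_buf = [0u8; 4];
    len_buf.copy_from_slice(&bytes[..4]);
    let len = u32::from_le_bytes(len_buf) as usize;
    if len != SCHEMA.len() || &bytes[4..header_len] != SCHEMA.as_bytes() {
        return Err("schema mismatch".to_string());
    }
    from_slice(&bytes[header_len..])
}
```

Both matched pairs round-trip, while either mixed pair returns an error, so sender and receiver have to agree up front on whether schemas travel with the payload.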
There are tools, however, such as borsh-schema-utils, that allow converting a schema to a JSON file by recursively extracting data from the `BorshSchemaContainer`.
The latter would result in the following JSON:
In theory that data reflects the structure of the schema and could be used by e.g. TypeScript tooling to derive the necessary interfaces or types. Of course, this is subject to us being able to implement the `BorshDeserialize`, `BorshSchema` and `BorshSerialize` traits on our Rust structures in `fuel-core`.
Borsh with Typescript
Borsh with typescript takes a slightly different approach to what we have with Rust. Let us take the Person structure again:
As we see, with typescript we always need a schema to perform the serialization/deserialization procedures. The schema describes the morphology of the data with its types. However, the schema that typescript needs is quite different from what we generated above with Rust:
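The idea of a decoder that is driven entirely by a runtime schema, as the TypeScript library requires, can be sketched in Rust without any dependencies. The `Kind` and `Value` enums and the `(name, kind)` schema shape are assumptions chosen for illustration and do not mirror the actual borsh-js schema format.

```rust
/// Field kinds supported by this sketch (a small subset of Borsh).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    U8,
    U32,
    Str,
}

/// A decoded, dynamically typed field value.
#[derive(Debug, PartialEq)]
pub enum Value {
    U8(u8),
    U32(u32),
    Str(String),
}

/// Decode `bytes` according to a runtime schema: an ordered list of
/// (field name, kind) pairs. The bytes alone carry no names or types.
pub fn decode_with_schema(
    schema: &[(&str, Kind)],
    bytes: &[u8],
) -> Result<Vec<(String, Value)>, String> {
    let mut pos = 0usize;
    let mut fields = Vec::new();
    for &(name, kind) in schema {
        let value = match kind {
            Kind::U8 => Value::U8(take(bytes, &mut pos, 1)?[0]),
            Kind::U32 => Value::U32(read_u32(bytes, &mut pos)?),
            Kind::Str => {
                let len = read_u32(bytes, &mut pos)? as usize;
                let raw = take(bytes, &mut pos, len)?;
                Value::Str(String::from_utf8(raw.to_vec()).map_err(|e| e.to_string())?)
            }
        };
        fields.push((name.to_string(), value));
    }
    if pos != bytes.len() {
        return Err("trailing bytes".to_string());
    }
    Ok(fields)
}

/// Read a little-endian u32 and advance the cursor.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Result<u32, String> {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(take(bytes, pos, 4)?);
    Ok(u32::from_le_bytes(buf))
}

/// Return the next `n` bytes and advance the cursor.
fn take<'a>(bytes: &'a [u8], pos: &mut usize, n: usize) -> Result<&'a [u8], String> {
    let end = pos.checked_add(n).ok_or("length overflow")?;
    let slice = bytes.get(*pos..end).ok_or("unexpected end of input")?;
    *pos = end;
    Ok(slice)
}
```

A client written this way is only as correct as the schema it is handed, which is why the Rust-side and TypeScript-side schema descriptions would have to be kept in sync.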
Discussion
Now, having explored how both algorithms and their libraries work, one can conclude that:
To ensure compatibility between Rust and TypeScript when a payload (`Vec<u8>`/`Uint8Array`) is received over the wire, the following prerequisites must be met:
`fuel-core` data structures need to implement `BorshSchema`, `BorshSerialize` and `BorshDeserialize` so that we can fetch their schemas when needed.
For example:
The flow could easily be summarized as follows:
or the diagram here:
This requires 4 steps, which could be bundled in some form of pipeline or packaging system. These steps could easily be achieved provided 1. is doable.
Complications around borsh and fuel-core
After further exploration, borsh seems to have difficulties with some commonly used Rust structures in `fuel-core`. For example, the following mock code poses complications:
The error we are getting is:
recursive type <mocktypes::BlockHeader as BorshSchema>::add_definitions_recursively::BlockHeaderV1 has infinite size
The error we are encountering is due to the recursive nature of the BlockHeader enum. The Borsh serialization/deserialization library requires that the size of the data structure be known at compile time, but the recursive definition of BlockHeader makes this impossible. Since the entirety of our structures in fuel-core is designed around nested enums, making this work would require applying indirection techniques such as `Box` or `Rc` within fuel-core to help borsh serialize properly. Borsh is currently unable to behave in a polymorphic way, meaning a message of type:
cannot be serialized to an equivalent borsh schema. The same was also verified with TypeScript.
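The kind of indirection meant here can be shown with the general Rust mechanism. The sketch below uses hypothetical, simplified stand-ins rather than the real fuel-core definitions, and it does not reproduce the code the Borsh derive generates; it only shows how a `Box` gives a self-referential type a known size.

```rust
/// Hypothetical, simplified stand-ins; not the real fuel-core definitions.
#[derive(Debug, PartialEq)]
pub enum Header {
    V1(HeaderV1),
}

#[derive(Debug, PartialEq)]
pub struct HeaderV1 {
    pub height: u32,
    // Without the `Box`, `Header` would contain itself by value and rustc
    // would reject it with E0072 ("recursive type has infinite size").
    pub parent: Option<Box<Header>>,
}

/// Walk the boxed chain and count how many headers it contains.
pub fn chain_len(header: &Header) -> usize {
    let mut len = 1;
    let mut current = header;
    while let Header::V1(HeaderV1 { parent: Some(next), .. }) = current {
        len += 1;
        // Reborrow through the Box to get a plain `&Header`.
        current = &**next;
    }
    len
}
```

Introducing such a `Box` changes the public shape of the type, which is exactly the sort of change to fuel-core that this RFC would prefer to avoid.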
Another thing to keep an eye on is type differences (e.g. link), which might lead to substantial differences in the serialization.
Conclusions
We saw that if a borsh-serialized message is to be sent over the wire to e.g. a TypeScript client, the client would need to obtain the schemas in order to deserialize the received type, which requires the cumbersome process described in steps 1-4 above. In addition, the morphology of the Rust fuel-core data structures and borsh seem to have interoperability issues by design. There are ways around these complications, but they might mean introducing substantial changes to fuel-core, which is undesirable.
Protobuf
Description
Protocol Buffers (protobuf) is a method developed by Google for serializing structured data, making it easier to share data across different platforms and languages. It’s an efficient, language-neutral, and platform-neutral format that’s widely used for defining and exchanging structured information in applications and services.
Key Aspects of Protobuf:
Protobuf converts structured data into a compact binary format that can be transmitted over a network or stored, and then deserialized back into its original form. The binary format is more efficient than text-based formats like JSON or XML.
Protobuf is designed to be language-neutral. You define your data structure in a .proto file, which is then compiled into code for various programming languages (e.g., Java, Python, C++, Go). This makes protobuf an excellent choice for communication between services written in different languages.
With Protobuf, data structures are defined in a schema file (a .proto file) where you specify the data types, field names, and field numbers. For example, here’s a simple .proto file:
The schema ensures that data is structured consistently, and field numbers help protobuf maintain backward compatibility as the schema evolves over time.
Protobuf messages are typically smaller and faster to encode/decode than JSON or XML. This efficiency is useful in scenarios where bandwidth and speed are critical, such as IoT devices, mobile applications, and real-time systems.
Overall, Protocol Buffers offer a compact, fast, and language-agnostic way to handle data serialization, making them well-suited for microservices, general messaging, and real-time systems where efficiency is a priority.
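To show what the field numbers do on the wire, here is a hand-rolled sketch of the protobuf encoding for two field kinds (varint and length-delimited), plus a reader that skips fields it does not know. The function names are made up for this illustration, and only wire types 0 and 2 are handled; generated code from a real protobuf library covers the full format.

```rust
/// Append `value` as a base-128 varint (7 payload bits per byte, MSB = "more").
pub fn put_varint(out: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

// Each field starts with a key: (field_number << 3) | wire_type.
const WIRE_VARINT: u64 = 0;
const WIRE_LEN: u64 = 2;

/// Write an unsigned integer field.
pub fn put_uint(out: &mut Vec<u8>, field: u64, value: u64) {
    put_varint(out, (field << 3) | WIRE_VARINT);
    put_varint(out, value);
}

/// Write a string field (length-delimited).
pub fn put_string(out: &mut Vec<u8>, field: u64, value: &str) {
    put_varint(out, (field << 3) | WIRE_LEN);
    put_varint(out, value.len() as u64);
    out.extend_from_slice(value.as_bytes());
}

/// Read one varint, advancing `pos`; `None` on truncated or overlong input.
pub fn get_varint(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Return the value of the varint field numbered `wanted`, skipping all others.
/// Skipping unknown field numbers is what lets old readers accept newer messages.
pub fn find_uint(bytes: &[u8], wanted: u64) -> Option<u64> {
    let mut pos = 0;
    while pos < bytes.len() {
        let key = get_varint(bytes, &mut pos)?;
        match key & 7 {
            0 => {
                let v = get_varint(bytes, &mut pos)?;
                if key >> 3 == wanted {
                    return Some(v);
                }
            }
            2 => {
                let len = get_varint(bytes, &mut pos)? as usize;
                pos = pos.checked_add(len)?;
                if pos > bytes.len() {
                    return None;
                }
            }
            _ => return None, // other wire types are out of scope here
        }
    }
    None
}
```

Writing the string "Al" as field 1 and the number 150 as field 2 yields the seven bytes `0a 02 41 6c 10 96 01`; a field 9 added later by a newer writer is simply skipped by `find_uint`.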
Approach
A good starting point is to build on the `serde::Serialize` and `serde::Deserialize` implementations that most of the fuel-core types already have. Unfortunately there is currently no meaningful way or tool to map serde_json schemas to protobuf structures with types and tags. There are, however, tools that might be helpful in the process:
A direct mapping from `enum Transaction` to `message Transaction` won't work, as Transaction is polymorphic and can take different variants. Having an example JSON for each of them and obtaining an analogous structure in protobuf is quite cumbersome and might not lead to automatic, fully consistent data generation with a single protobuf schema.
Even with some self-implemented parsers, as suggested here, a fully comprehensive mapping won't be achieved easily, as fuel data is quite densely packed in terms of its Rust representation, which makes a protobuf normalization quite difficult.
With `syntax = "proto3"`, Protobuf offers a schema language with some quite interesting features in addition to the standard scalar types, such as:
`bytes` data type - equivalent to a `Vec<u8>` in Rust or `Uint8Array` in TypeScript
`google.protobuf.Timestamp` data type for a timestamp
`google.protobuf.Duration` data type for a time duration
`google.protobuf.Int64Value` wrapper data type for i64 values
`google.protobuf.Any` polymorphic data type that can represent any message together with its type name
Here is the full list
The most important of these for our use case is probably the `oneof` construct, which allows complex polymorphic messages whose variants carry different payloads:
This is an important capability, as it lets us mimic the nature of the fuel-core structures in a 1:1 manner. Protobuf has always had enum types, but these are simple in terms of schema: each variant maps to a constant index.
For example:
With version 3 complex data messages could now easily be mapped from:
to:
The advantage we get from protobuf here is that we can easily match on each variant when we receive a message, and that level of branching can be nested as deeply as `Integer.MAX_VALUE` - link.
Version 3 message members are also all optional, which gives us enormous flexibility and mimics quite well some of the optional fuel-core data. If a data member is missing, Rust yields `None`, whereas TypeScript yields `null | undefined` on the interface side.
As we see, thanks to `oneof`, we can match on the received variant in both Rust and TypeScript.
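On the Rust side, matching on a `oneof` could look roughly like the sketch below. The types are written by hand to approximate what a protobuf code generator emits for a message containing a `oneof` (an optional enum with one data-carrying variant per alternative); the message names, fields and the `describe` helper are hypothetical and are not the fuel-core definitions.

```rust
/// Hypothetical payload messages, one per `oneof` alternative.
#[derive(Debug, Clone, PartialEq)]
pub struct Script {
    pub gas_limit: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Create {
    pub salt: Vec<u8>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Mint {
    pub amount: u64,
}

/// The `oneof` itself: exactly one alternative is set at a time.
#[derive(Debug, Clone, PartialEq)]
pub enum TransactionKind {
    Script(Script),
    Create(Create),
    Mint(Mint),
}

/// The wrapping message; the `oneof` may also be absent on the wire.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Transaction {
    pub kind: Option<TransactionKind>,
}

/// Branch on whichever alternative the sender populated.
pub fn describe(tx: &Transaction) -> String {
    match &tx.kind {
        Some(TransactionKind::Script(s)) => format!("script with gas limit {}", s.gas_limit),
        Some(TransactionKind::Create(c)) => format!("create with {}-byte salt", c.salt.len()),
        Some(TransactionKind::Mint(m)) => format!("mint of {}", m.amount),
        None => "no variant set".to_string(),
    }
}
```

The compiler checks the match for exhaustiveness, so adding an alternative to the schema surfaces every place in the Rust code that still needs to handle it.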
With these built-in features, the default optionality of members and the additional complex types imported under `google.protobuf`, one could in fact represent the fuel-core morphology quite precisely, in a 1:1 manner, with protobuf messages.
Realization
The compatibility outlined above allows us to go forward and create the following process:
1. Define all our .proto schema definitions in a single crate (library) following the morphology of the `fuel-core` and `fuel-vm` data structures. In the diagram below I have depicted this as `fuel-protos`.
2. Both crates, `fuel-core` and `fuel-vm`, would import the `fuel-protos` crate, generate Rust protobuf definitions from it and export them. The way to achieve this is to add a build.rs to each crate or sub-crate inside `fuel-core` and `fuel-vm`. This triggers the `prost-build` process, which generates the Rust message structures on each build of these crates.
3. Define, in each crate, the corresponding From and TryFrom conversions for the native structures that are mapped to protobuf. The idea is to keep types and traits in the same place in order not to violate the orphan rule.
4. Introduce a new crate under fuel-core, e.g. `fuel-message-types`, which collects and re-exports the Rust proto structures produced in 2. along with all traits needed for converting between native fuel-core types and wire message types. This `messaging` crate is to be used as a dependency by our data-messaging system for easy in-and-out conversion between wire and native Rust data types.
5. Finally, the `fuel-protos` crate could also be published as an npm package to be imported by our data-systems-ts-sdk and fuel-ts-sdk if needed. The protobuf generator will automatically create fully typed TypeScript interfaces and definitions for direct use when consuming or sending messages. Other clients such as `fuel-go` could also easily benefit from the .proto schemas and make direct use of data structures that are 100% compatible with Rust and TypeScript.
The steps above are all depicted on the diagram below:
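The From/TryFrom conversions between native and wire types, together with the kind of round-trip check a CI job could run, might be sketched as follows. `Header`, `HeaderMsg`, `ConversionError` and `round_trips` are hypothetical names; the wire struct is written by hand under the assumption that the generator exposes each field as an `Option`, and it is not output of `prost-build`.

```rust
use std::convert::TryFrom;

/// Native type (hypothetical stand-in for a fuel-core structure).
#[derive(Debug, Clone, PartialEq)]
pub struct Header {
    pub height: u32,
    pub da_height: u64,
}

/// Wire type, assuming every field is optional on the wire.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct HeaderMsg {
    pub height: Option<u32>,
    pub da_height: Option<u64>,
}

/// Native -> wire always succeeds.
impl From<&Header> for HeaderMsg {
    fn from(h: &Header) -> Self {
        HeaderMsg {
            height: Some(h.height),
            da_height: Some(h.da_height),
        }
    }
}

/// Reasons a wire message cannot become a native value.
#[derive(Debug, PartialEq)]
pub enum ConversionError {
    MissingField(&'static str),
}

/// Wire -> native can fail, because the wire format allows absent fields.
impl TryFrom<HeaderMsg> for Header {
    type Error = ConversionError;

    fn try_from(msg: HeaderMsg) -> Result<Self, Self::Error> {
        Ok(Header {
            height: msg.height.ok_or(ConversionError::MissingField("height"))?,
            da_height: msg
                .da_height
                .ok_or(ConversionError::MissingField("da_height"))?,
        })
    }
}

/// Round-trip check of the kind a CI job could run for every mapped type.
pub fn round_trips(native: &Header) -> bool {
    Header::try_from(HeaderMsg::from(native)) == Ok(native.clone())
}
```

Keeping the fallible direction as `TryFrom` makes missing or malformed wire data an explicit error at the boundary instead of a silent default inside fuel-core.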
Conclusions
We have explored a very feasible architecture in which we make full use of proto3's features and create a centralized repository of schemas that can be shared by ts-clients, go-clients etc. and used as a single source of truth. The conversion between wire types and native Rust types should be easy to achieve and can be checked for correctness against the native types on every CI/CD run. I feel we should proceed with the protobuf approach, which will give us a sustainable and flexible solution for the entire networking of all fuel systems.