We should maximize uploaded metadata size #63

Closed
Tracked by #91
PopcornPaws opened this issue Nov 24, 2022 · 6 comments · Fixed by #93

@PopcornPaws
Contributor

Description

To prevent malicious uploads of large datasets, we should sanitize the input size of dynamic types (vectors). Even though we can limit the number of requirements/identities in the frontend, anybody can send transactions from a custom CLI client and thus submit extremely long binary vectors when invoking create guild, etc.

Solution

It might be a good idea to serialize the data on-chain instead of sending already-serialized data. That way we can define an upper bound on the number of requirements/identities and check it on-chain.
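
For illustration, a minimal sketch of this idea, assuming a hypothetical structured Requirement type and a MAX_REQUIREMENTS bound (all names are placeholders, not the actual pallet API):

use parity_scale_codec::Encode;

// hypothetical stand-in for a structured requirement
#[derive(Encode)]
pub struct Requirement(pub Vec<u8>);

const MAX_REQUIREMENTS: usize = 10;

pub fn create_guild(requirements: Vec<Requirement>) -> Result<(), &'static str> {
    // the bound can be enforced because the input arrives structured...
    if requirements.len() > MAX_REQUIREMENTS {
        return Err("too many requirements");
    }
    // ...and only then is it serialized (here with SCALE) for storage
    let _stored: Vec<u8> = requirements.encode();
    Ok(())
}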

@PopcornPaws
Contributor Author

We should also sanitize all inputs on-chain where possible, e.g. a padded GuildName, etc.

@PopcornPaws
Contributor Author

Generic data vs sanity checks

There's a trade-off here that needs thought:

  • we want to store generic data on-chain
  • we also want to perform some sanity checks on the uploaded data

The easiest way to be as generic as possible with on-chain stored data is to upload everything encoded as a byte vector Vec<u8>. The problem is that the pallets and the chain don't have any idea about the underlying structure/type of the data, so the sanity checks we can perform at the pallet level are severely limited. Thus, if someone calls the pallet from a faulty frontend, or from their own CLI client without performing any sanity checks, they can easily upload invalid data that can no longer be used (unless they have the option to override it).

If we allow uploading structured data, it is first encoded when the transaction is submitted and then decoded in the pallet, introducing a potentially expensive "deserialization" step on-chain. On the flip side, we gain the ability to perform sanity checks on the uploaded data (checking for duplicates, asserting that the number of roles in a guild does not exceed the allowed maximum, etc.). The other problem with this approach is that the uploaded data may contain a vector of different types that implement a common trait:

requirements: Vec<Box<dyn Requirement>>,

where the Vec contains Box pointers to various blobs on the heap. I'm not entirely sure whether something like this can be handled by the Encode/Decode traits used by Substrate to serialize/deserialize data going in and out of the chain (probably not). Thus, we would need to define the various requirements as enum variants that may wrap different types. However, this reduces the generality of the upcoming requirement engine.
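
The enum approach could look something like this (a sketch, not a final design; the variants are illustrative placeholders, but SCALE's derive macros do handle enums like this):

use parity_scale_codec::{Decode, Encode};

// hypothetical requirement variants; each wraps a different payload type
#[derive(Encode, Decode)]
pub enum Requirement {
    // no external data needed
    Free,
    // a list of 20-byte (Ethereum-style) addresses
    Allowlist(Vec<[u8; 20]>),
    // token balance must be at least `min_amount`
    MinBalance { token: [u8; 20], min_amount: u128 },
}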

Maybe we could utilize erased-serde somehow, but this would probably only help when we store everything serialized on-chain.
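
For reference, a minimal sketch of what erased-serde buys us: a heterogeneous Vec<Box<dyn Trait>> that remains serializable, because Box<dyn erased_serde::Serialize> itself implements serde::Serialize. The types and the serde/JSON pairing are illustrative only; Substrate uses SCALE, not serde:

use erased_serde::Serialize as ErasedSerialize;

// hypothetical requirement types sharing an object-safe serialize trait
#[derive(serde::Serialize)]
struct MinBalance { amount: u128 }

#[derive(serde::Serialize)]
struct Allowlist { addresses: Vec<String> }

fn main() -> Result<(), serde_json::Error> {
    let requirements: Vec<Box<dyn ErasedSerialize>> = vec![
        Box::new(MinBalance { amount: 100 }),
        Box::new(Allowlist { addresses: vec!["0xabc".into()] }),
    ];
    // the whole heterogeneous vector can be serialized in one go
    let json = serde_json::to_string(&requirements)?;
    println!("{json}");
    Ok(())
}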

@PopcornPaws
Contributor Author

Another important aspect to take into account is #43: if we store only serialized guild metadata on-chain, it would be very painful to implement functions that allow guild owners to update roles. If we store data in a structured way instead, allowing small updates by guild owners would be easier and we could add more sanity checks.

@PopcornPaws
Contributor Author

PopcornPaws commented Jan 13, 2023

Add as much structure as possible and serialize only the individual requirements, which can be submitted in an SpVec.

  • max roles/guild: ~10
  • max requirements/role: ~8-10 (but we should check how the runtime behaves with e.g. 100 roles)

We should also add an add_role(guild_name, role_name, role_metadata) method that adds new role(s) to existing guilds. Additionally, we could do the same for adding more requirements to a given role (add_requirement(guild_name, role_name, requirements: Vec<u8>)).
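
A rough sketch of such a call in FRAME style, assuming hypothetical Guilds and RoleMetadata storage maps, an Error enum, and a MaxRolesPerGuild config constant; none of these names are final:

#[pallet::call]
impl<T: Config> Pallet<T> {
    /// Adds a new role to an existing guild; callable only by the guild owner.
    #[pallet::weight(10_000)]
    pub fn add_role(
        origin: OriginFor<T>,
        guild_name: [u8; 32],
        role_name: [u8; 32],
        role_metadata: Vec<u8>,
    ) -> DispatchResult {
        let signer = ensure_signed(origin)?;
        let mut guild = Guilds::<T>::get(guild_name).ok_or(Error::<T>::GuildDoesNotExist)?;
        ensure!(guild.owner == signer, Error::<T>::NotGuildOwner);
        // sanity check: enforce the role count bound discussed above
        ensure!(
            (guild.roles.len() as u32) < T::MaxRolesPerGuild::get(),
            Error::<T>::TooManyRoles
        );
        guild.roles.push(role_name);
        Guilds::<T>::insert(guild_name, guild);
        // serialized role metadata is stored keyed by (guild, role)
        RoleMetadata::<T>::insert((guild_name, role_name), role_metadata);
        Ok(())
    }
}

An add_requirement call could follow the same pattern with a MaxRequirementsPerRole bound.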

@PopcornPaws
Contributor Author

Add an active: bool flag to guilds and roles that denotes whether a user can join a given role. This should be modifiable only by the guild owner.
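
In storage this could look something like the following (field layout hypothetical):

// hypothetical storage types carrying the proposed flag
pub struct Guild<AccountId> {
    pub owner: AccountId,
    // when false, no role in this guild can be joined
    pub active: bool,
}

pub struct Role {
    // when false, this specific role cannot be joined
    pub active: bool,
    // serialized requirement data
    pub requirements: Vec<u8>,
}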

@PopcornPaws
Contributor Author

PopcornPaws commented Jan 18, 2023

Intermezzo on storage

Storing data on-chain, under consensus, is not very efficient and may incur large costs on the runtime. Thankfully, Substrate has the option to consistently store large amounts of data in the off-chain local storage of the nodes using offchain indexing. The great thing about offchain indexing is that the runtime can write transaction data to the local storage without invoking any offchain worker. Off-chain data is guaranteed to be consistent among nodes running with the --enable-offchain-indexing flag. We can still store data hashes on-chain that can be used to verify the data stored in off-chain storage. This allows validators with limited storage to join the network (they can run their nodes without the --enable-offchain-indexing flag, so they won't have all data in their local storage, but they can still help verify data integrity).

Implementing this requires some thought, because the off-chain storage only supports writing and clearing data, so the runtime cannot read from it. Furthermore, writing to the off-chain storage from the runtime is only possible through a limited API, i.e. setting a key-value pair.
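
The relevant runtime-side API is sp_io::offchain_index; a minimal sketch, with the key scheme being a made-up example:

// writes serialized guild metadata to the nodes' off-chain storage;
// only has an effect on nodes started with --enable-offchain-indexing
fn index_guild_metadata(guild_id: &[u8], serialized: &[u8]) {
    // hypothetical key scheme: a fixed prefix plus the guild id
    let mut key = b"guild::metadata::".to_vec();
    key.extend_from_slice(guild_id);
    // set (or sp_io::offchain_index::clear) is all the runtime can do;
    // it can never read the value back
    sp_io::offchain_index::set(&key, serialized);
}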

Intermezzo on requirement types

Static requirements

Free and Allowlist requirements are unique in the sense that they don't require external data retrieval for verification, unlike balances, the number of followers on a platform, etc. Therefore, it might make sense to handle them separately, i.e. introduce a distinction between static requirements that don't need external verification data (and are thus verifiable on-chain) and dynamic requirements that do require oracle access to off-chain verification data.

This distinction would allow the following requirement setups:

  • Free (standalone, cannot be bundled with anything else)
  • Allowlist (standalone allowlist that's not bundled with anything else)
  • AllowlistWithDynamicRequirements (an Allowlist with an AND/OR connection to an arbitrary set of dynamic requirements that require oracle access)
  • DynamicRequirements (an arbitrary set of dynamic requirements that require verification data retrieved by an oracle)

In either case, Free and Allowlist would be evaluated on-chain, and only the dynamic requirements would be delegated to oracles. Allowlists would have an optional negation flag that, if enabled, turns the allowlist into a "disallowlist". A possible shape for these variants is sketched below.

Dynamic requirements

As for dynamic requirements, they have the following structure:

  • they need to describe the source of the verification data required to evaluate them
  • they need to provide (either implicitly or explicitly) parameters to retrieve the data
  • they need to have a Relation defined whose satisfaction can be checked against the retrieved verification data

A Relation can be thought of as something like:

pub enum Relation<T> {
    EqualTo(T),
    GreaterThan(T),
    GreaterOrEqualTo(T),
    LessThan(T),
    LessOrEqualTo(T),
    Between(core::ops::Range<T>),
}

impl<T: PartialOrd> Relation<T> {
    /// Checks whether the retrieved verification data satisfies this relation.
    pub fn assert(&self, verification_data: &T) -> bool {
        match self {
            Self::EqualTo(expected) => verification_data == expected,
            Self::GreaterThan(bound) => verification_data > bound,
            Self::GreaterOrEqualTo(bound) => verification_data >= bound,
            Self::LessThan(bound) => verification_data < bound,
            Self::LessOrEqualTo(bound) => verification_data <= bound,
            Self::Between(range) => range.contains(verification_data),
        }
    }
}

Thus, a dynamic requirement is satisfied if the respective Relation is asserted to be true on the retrieved verification data.
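
For example (using the generic assert above), a "balance of at least 100" requirement would be checked as:

// the oracle retrieves the balance as the verification data
let relation = Relation::GreaterOrEqualTo(100u128);
assert!(relation.assert(&250u128)); // satisfied
assert!(!relation.assert(&99u128)); // not satisfied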

Allowlists in storage

Allowlists are special in that they can be much larger than other requirements: they may contain 50k+ addresses which, at 20 bytes per Ethereum address, is already 1 MB of raw data at 50k entries. It is important to note that the Wasm runtime has an upper limit of 32 MB of allocated memory. In testing, the runtime accepted an allowlist with 100k entries but rejected one with 200k entries.

Nevertheless, if we can at least store allowlists in the off-chain storage as a merkle tree and store only their hash on-chain, we wouldn't have to worry about bloating consensus data. However, on-chain allowlist verification would then be impossible (because the runtime cannot read from the off-chain storage), and modifying the allowlist would be tricky. If we store the merkle root of the allowlist's merkle tree on-chain, however, the user could submit a merkle proof of membership, and we could verify it on-chain against the root alone.
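
A minimal sketch of such an on-chain membership check, assuming a blake2-256 tree with sorted sibling pairs (so the proof needs no left/right position bits); the hashing scheme is an assumption:

// verifies that `leaf` is a member of the allowlist committed to by `root`
fn verify_membership(root: [u8; 32], leaf: [u8; 32], proof: &[[u8; 32]]) -> bool {
    let mut acc = leaf;
    for sibling in proof {
        // hash each pair in sorted order so the verifier is position-agnostic
        let (lo, hi) = if acc <= *sibling { (acc, *sibling) } else { (*sibling, acc) };
        let mut data = [0u8; 64];
        data[..32].copy_from_slice(&lo);
        data[32..].copy_from_slice(&hi);
        acc = sp_io::hashing::blake2_256(&data);
    }
    acc == root
}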
