We should maximize uploaded metadata size #63

Closed
Tracked by #91
PopcornPaws opened this issue Nov 24, 2022 · 6 comments · Fixed by #93

@PopcornPaws
Contributor

Description

To prevent malicious uploads of large datasets, we should sanitize the input size of dynamic types (vectors). Even though we can limit the number of requirements/identities in the frontend, anybody can send transactions from a custom CLI client and thus submit extremely long binary vectors when invoking create guild, etc.

Solution

It might be a good idea to serialize the data on-chain instead of sending already-serialized data. That way we can define an upper bound on the number of requirements/identities and check it on-chain.
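
For illustration, a minimal sketch of this idea, assuming a hypothetical structured Requirement type and a MAX_REQUIREMENTS bound (all names are placeholders, not the actual pallet API):

use parity_scale_codec::Encode;

// hypothetical stand-in for a structured requirement
#[derive(Encode)]
pub struct Requirement(pub Vec<u8>);

const MAX_REQUIREMENTS: usize = 10;

pub fn create_guild(requirements: Vec<Requirement>) -> Result<(), &'static str> {
    // the bound can be enforced because the input arrives structured...
    if requirements.len() > MAX_REQUIREMENTS {
        return Err("too many requirements");
    }
    // ...and only then is it serialized (here with SCALE) for storage
    let _stored: Vec<u8> = requirements.encode();
    Ok(())
}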

@PopcornPaws
Contributor Author

We should also sanitize all inputs on-chain where possible, e.g. a padded GuildName, etc.

@PopcornPaws
Contributor Author

Generic data vs sanity checks

There's a trade-off here that needs thought:

  • we want to store generic data on-chain
  • we also want to perform some sanity checks on the uploaded data

The easiest way to be as generic as possible with on-chain stored data is to upload everything encoded as a byte vector Vec<u8>. The problem is that the pallets and the chain don't have any idea about the underlying structure/type of the data, so the sanity checks we can perform at the pallet level are severely limited. Thus, if someone calls the pallet from a faulty frontend, or from their own CLI client without performing any sanity checks, they can easily upload invalid data that can no longer be used (unless they have the option to override it).

If we allow uploading structured data, it is first encoded when the transaction is submitted and then decoded in the pallet, introducing a potentially expensive "deserialization" step on-chain. On the flip side, we gain the ability to perform sanity checks on the uploaded data (checking for duplicates, asserting that the number of roles in a guild does not exceed the allowed maximum, etc.). The other problem with this approach is that the uploaded data may contain a vector of different types that implement a common trait:

requirements: Vec<Box<dyn Requirement>>,

where the Vec contains Box pointers to various blobs on the heap. I'm not entirely sure whether something like this can be handled by the Encode/Decode traits used by Substrate to serialize/deserialize data going in and out of the chain (probably not). Thus, we would need to define the various requirements as enum variants that may wrap different types. However, this reduces the generality of the upcoming requirement engine.
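
The enum approach could look something like this (a sketch, not a final design; the variants are illustrative placeholders, but SCALE's derive macros do handle enums like this):

use parity_scale_codec::{Decode, Encode};

// hypothetical requirement variants; each wraps a different payload type
#[derive(Encode, Decode)]
pub enum Requirement {
    // no external data needed
    Free,
    // a list of 20-byte (Ethereum-style) addresses
    Allowlist(Vec<[u8; 20]>),
    // token balance must be at least `min_amount`
    MinBalance { token: [u8; 20], min_amount: u128 },
}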

Maybe we could utilize erased-serde somehow, but this would probably only help when we store everything serialized on-chain.
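
For reference, a minimal sketch of what erased-serde buys us: a heterogeneous Vec<Box<dyn Trait>> that remains serializable, because Box<dyn erased_serde::Serialize> itself implements serde::Serialize. The types and the serde/JSON pairing are illustrative only; Substrate uses SCALE, not serde:

use erased_serde::Serialize as ErasedSerialize;

// hypothetical requirement types sharing an object-safe serialize trait
#[derive(serde::Serialize)]
struct MinBalance { amount: u128 }

#[derive(serde::Serialize)]
struct Allowlist { addresses: Vec<String> }

fn main() -> Result<(), serde_json::Error> {
    let requirements: Vec<Box<dyn ErasedSerialize>> = vec![
        Box::new(MinBalance { amount: 100 }),
        Box::new(Allowlist { addresses: vec!["0xabc".into()] }),
    ];
    // the whole heterogeneous vector can be serialized in one go
    let json = serde_json::to_string(&requirements)?;
    println!("{json}");
    Ok(())
}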

@PopcornPaws
Contributor Author

Another important aspect to take into account is #43: if we store only serialized guild metadata on-chain, it would be very painful to implement functions that allow guild owners to update roles. If we store data in a structured way instead, allowing small updates by guild owners would be easier and we could add more sanity checks.

@PopcornPaws
Contributor Author

PopcornPaws commented Jan 13, 2023

Add as much structure as possible and serialize only the individual requirements, which can be submitted in an SpVec.

  • max roles/guild: ~10
  • max requirements/role: ~8-10 (but we should check how the runtime behaves with e.g. 100 roles)

We should also add an add_role(guild_name, role_name, role_metadata) method that adds new role(s) to existing guilds. Additionally, we could do the same for adding more requirements to a given role (add_requirement(guild_name, role_name, requirements: Vec<u8>)).
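
A rough sketch of such a call in FRAME style, assuming hypothetical Guilds and RoleMetadata storage maps, an Error enum, and a MaxRolesPerGuild config constant; none of these names are final:

#[pallet::call]
impl<T: Config> Pallet<T> {
    /// Adds a new role to an existing guild; callable only by the guild owner.
    #[pallet::weight(10_000)]
    pub fn add_role(
        origin: OriginFor<T>,
        guild_name: [u8; 32],
        role_name: [u8; 32],
        role_metadata: Vec<u8>,
    ) -> DispatchResult {
        let signer = ensure_signed(origin)?;
        let mut guild = Guilds::<T>::get(guild_name).ok_or(Error::<T>::GuildDoesNotExist)?;
        ensure!(guild.owner == signer, Error::<T>::NotGuildOwner);
        // sanity check: enforce the role count bound discussed above
        ensure!(
            (guild.roles.len() as u32) < T::MaxRolesPerGuild::get(),
            Error::<T>::TooManyRoles
        );
        guild.roles.push(role_name);
        Guilds::<T>::insert(guild_name, guild);
        // serialized role metadata is stored keyed by (guild, role)
        RoleMetadata::<T>::insert((guild_name, role_name), role_metadata);
        Ok(())
    }
}

An add_requirement call could follow the same pattern with a MaxRequirementsPerRole bound.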

@PopcornPaws
Contributor Author

Add an active: bool flag to guilds and roles that denotes whether a user can join a given role. This should be modifiable only by the guild owner.
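
In storage this could look something like the following (field layout hypothetical):

// hypothetical storage types carrying the proposed flag
pub struct Guild<AccountId> {
    pub owner: AccountId,
    // when false, no role in this guild can be joined
    pub active: bool,
}

pub struct Role {
    // when false, this specific role cannot be joined
    pub active: bool,
    // serialized requirement data
    pub requirements: Vec<u8>,
}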

@PopcornPaws
Contributor Author

PopcornPaws commented Jan 18, 2023

Intermezzo on storage

Storing data on-chain, under consensus, is not very efficient and may incur large costs on the runtime. Thankfully, Substrate has the option to consistently store large amounts of data in the off-chain local storage of the nodes using offchain indexing. The great thing about offchain indexing is that the runtime can write transaction data to the local storage without invoking any offchain worker. Off-chain data is guaranteed to be consistent among nodes running with the --enable-offchain-indexing flag. We can still store data hashes on-chain that can be used to verify the data stored in off-chain storage. This allows validators with limited storage to join the network (they can run their nodes without the --enable-offchain-indexing flag, so they won't have all data in their local storage, but they can still help verify data integrity).

Implementing this requires some thought, because the off-chain storage only supports writing and clearing data, so the runtime cannot read from it. Furthermore, writing to the off-chain storage from the runtime is only possible through a limited API, i.e. setting a key-value pair.
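
The relevant runtime-side API is sp_io::offchain_index; a minimal sketch, with the key scheme being a made-up example:

// writes serialized guild metadata to the nodes' off-chain storage;
// only has an effect on nodes started with --enable-offchain-indexing
fn index_guild_metadata(guild_id: &[u8], serialized: &[u8]) {
    // hypothetical key scheme: a fixed prefix plus the guild id
    let mut key = b"guild::metadata::".to_vec();
    key.extend_from_slice(guild_id);
    // set (or sp_io::offchain_index::clear) is all the runtime can do;
    // it can never read the value back
    sp_io::offchain_index::set(&key, serialized);
}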

Intermezzo on requirement types

Static requirements

Free and Allowlist requirements are unique in the sense that they don't require external data retrieval for verification, unlike balances, the number of followers on a platform, etc. Therefore, it might make sense to handle them separately, i.e. introduce a distinction between static requirements that don't need external verification data (and are thus verifiable on-chain) and dynamic requirements that do require oracle access to off-chain verification data.

This distinction would allow the following requirement setups:

  • Free (standalone, cannot be bundled with anything else)
  • Allowlist (standalone allowlist that's not bundled with anything else)
  • AllowlistWithDynamicRequirements (an Allowlist with an AND/OR connection to an arbitrary set of dynamic requirements that require oracle access)
  • DynamicRequirements (an arbitrary set of dynamic requirements that require verification data retrieved by an oracle)

In either case, Free and Allowlist would be evaluated on-chain, and only the dynamic requirements would be delegated to oracles. Allowlists would have an optional negation flag that, if enabled, turns the allowlist into a "disallowlist". A possible shape for these variants is sketched below.

Dynamic requirements

As for dynamic requirements, they have the following structure:

  • they need to describe the source of the verification data required to evaluate them
  • they need to provide (either implicitly or explicitly) parameters to retrieve the data
  • they need to have a Relation defined whose satisfaction can be checked against the retrieved verification data

A Relation can be thought of as something like:

pub enum Relation<T> {
    EqualTo(T),
    GreaterThan(T),
    GreaterOrEqualTo(T),
    LessThan(T),
    LessOrEqualTo(T),
    Between(core::ops::Range<T>),
}

impl<T: PartialOrd> Relation<T> {
    /// Checks whether the retrieved verification data satisfies this relation.
    pub fn assert(&self, verification_data: &T) -> bool {
        match self {
            Self::EqualTo(expected) => verification_data == expected,
            Self::GreaterThan(bound) => verification_data > bound,
            Self::GreaterOrEqualTo(bound) => verification_data >= bound,
            Self::LessThan(bound) => verification_data < bound,
            Self::LessOrEqualTo(bound) => verification_data <= bound,
            Self::Between(range) => range.contains(verification_data),
        }
    }
}

Thus, a dynamic requirement is satisfied if the respective Relation is asserted to be true on the retrieved verification data.
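
For example (using the generic assert above), a "balance of at least 100" requirement would be checked as:

// the oracle retrieves the balance as the verification data
let relation = Relation::GreaterOrEqualTo(100u128);
assert!(relation.assert(&250u128)); // satisfied
assert!(!relation.assert(&99u128)); // not satisfied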

Allowlists in storage

Allowlists are special in that they can be much larger than other requirements: they may contain 50k+ addresses which, at 20 bytes per Ethereum address, is already 1 MB of raw data at 50k entries. It is important to note that the Wasm runtime has an upper limit of 32 MB of allocated memory. In testing, the runtime accepted an allowlist with 100k entries but rejected one with 200k entries.

Nevertheless, if we can at least store allowlists in the off-chain storage as a merkle tree and store only their hash on-chain, we wouldn't have to worry about bloating consensus data. However, on-chain allowlist verification would then be impossible (because the runtime cannot read from the off-chain storage), and modifying the allowlist would be tricky. If we store the merkle root of the allowlist's merkle tree on-chain, however, the user could submit a merkle proof of membership, and we could verify it on-chain against the root alone.
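
A minimal sketch of such an on-chain membership check, assuming a blake2-256 tree with sorted sibling pairs (so the proof needs no left/right position bits); the hashing scheme is an assumption:

// verifies that `leaf` is a member of the allowlist committed to by `root`
fn verify_membership(root: [u8; 32], leaf: [u8; 32], proof: &[[u8; 32]]) -> bool {
    let mut acc = leaf;
    for sibling in proof {
        // hash each pair in sorted order so the verifier is position-agnostic
        let (lo, hi) = if acc <= *sibling { (acc, *sibling) } else { (*sibling, acc) };
        let mut data = [0u8; 64];
        data[..32].copy_from_slice(&lo);
        data[32..].copy_from_slice(&hi);
        acc = sp_io::hashing::blake2_256(&data);
    }
    acc == root
}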
