Skip to content
This repository has been archived by the owner on May 11, 2023. It is now read-only.

Docs: Create Sparse Merkle Tree test spec #66

Merged
merged 20 commits into from
Mar 3, 2022

Conversation

bvrooman
Copy link
Contributor

@bvrooman bvrooman commented Feb 14, 2022

Steps toward an automated test suite:

  1. Define the test spec (this PR)
  2. Codify test spec in a machine readable format: Feature Request: Machine readable test suite for SMT testing #67
  3. Read machine readable test spec and generate test suite: Feature Request: Implement a test execution layer to automate the machine readable test suite #68

@bvrooman bvrooman added the documentation Improvements or additions to documentation label Feb 14, 2022
@bvrooman bvrooman self-assigned this Feb 14, 2022
@bvrooman bvrooman marked this pull request as ready for review February 16, 2022 04:55
@Voxelot
Copy link
Member

Voxelot commented Feb 16, 2022

This kind of works, although I was hoping we'd have something more along the lines of a machine readable format e.g. toml or json that the library could execute against.

@bvrooman
Copy link
Contributor Author

bvrooman commented Feb 17, 2022

This kind of works, although I was hoping we'd have something more along the lines of a machine readable format e.g. toml or json that the library could execute against.

I considered a DSL approach for this. I think that such an approach could be built alongside this spec.

Regardless of the ultimate testing strategy implemented by an SMT library, we would need some sort of specification document, since any sort of machine readable test spec still has to be manually compiled. While the Celestia SMT currently provides the expected hashes, it is not the producer of this spec - it is a consumer. It will need to implement this spec too. Ultimately, that could mean testing against the machine readable version. Maybe in the future, the machine readable test spec can be produced automatically, too.

But there are limitations of a DSL approach.

  • A simple declarative format, like JSON can work, but it will become unwieldy when tests require a large number of steps. A 100-step test would require 100 objects defining each step.
  • Translating leaf keys and leaf data into binary inputs requires specification anyway. What is the data type/size of numeric input? Is the binary big endian or little endian? Is the text input UTF-8 or another format?
  • The client must interpret the DSL and inputs, and map them to function calls, which could be brittle if the language/notation ever need updating.

We can agree to avoid large tests or any ambiguity in the data formats (e.g. UTF-8 string inputs only), but ultimately the limitations of this test language become limitations in the usefulness of the test suite altogether. All this considered, is this approach superior to implementing the test suite by hand?

I would say that it's possible to go with a machine readable format, but I think we should have a design discussion around that first. I would argue this shouldn't gate our own SMT implementation in the meantime.

@Voxelot
Copy link
Member

Voxelot commented Feb 22, 2022

I agree that we shouldn't gate on the automated harness as long we we have this spec manually implemented for now. Please create an issue for the machine readable spec if you haven't already and we can discuss that in more detail later.

@bvrooman
Copy link
Contributor Author

bvrooman commented Feb 22, 2022

I agree that we shouldn't gate on the automated harness as long we we have this spec manually implemented for now. Please create an issue for the machine readable spec if you haven't already and we can discuss that in more detail later.

I wrote up two tickets for the automated harness:

  1. Feature Request: Machine readable test suite for SMT testing #67
  2. Feature Request: Implement a test execution layer to automate the machine readable test suite #68

I haven't fully implemented this spec as unit tests in the SMT PR yet (as you saw, a couple tests are missing or deviate in some way). I would like to make sure we have a consensus on this document before I codify it in the library. A few tests are affected by the Celestia spec discrepancy, so there's some ambiguity there. As you suggest, I may just write that those tests are still WIP until the issue is resolved.

Copy link
Contributor

@adlerjohn adlerjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Shouldn't keys be 32 bytes?

@bvrooman
Copy link
Contributor Author

bvrooman commented Feb 24, 2022

Hmm. Shouldn't keys be 32 bytes?

Key input can be any length - upon calling update(&key, &data), the key input gets hashed into 32 bytes (and similarly, the data input gets hashed into 32 bytes). These hashed keys and data of 32 bytes are what become leaves and nodes. The intuition is that update(..) can be called with any key, as long as the key can be represented as an array of bytes, and we don't impose the internal requirement of hashing keys on the end user.

@bvrooman
Copy link
Contributor Author

bvrooman commented Feb 24, 2022

I am changing empty root values in this document to the zero sum (000...) in the meantime, until we can come to a conclusion on the issue in Celestia. I am going with the zero sum for now since that is currently what both code implementations produce, so we are at least compliant (possibly wrong, but at least compliant!).

@Voxelot
Copy link
Member

Voxelot commented Feb 24, 2022

Should we also make a repo of the Go code used to generate these roots from celestia's impl as a future reference in case we want to update these tests? Or just put them in a local subdir somewhere in the docs of this repo?

Voxelot
Voxelot previously approved these changes Feb 24, 2022
@bvrooman
Copy link
Contributor Author

Should we also make a repo of the Go code used to generate these roots from celestia's impl as a future reference in case we want to update these tests? Or just put them in a local subdir somewhere in the docs of this repo?

I would suggest that we fork Celestia's SMT repo. In the fork, I can put up the tests I wrote to generate the roots for this document. We could then add or modify the tests in the fork and record the outputs of those tests in this document. I imagine with a few iterations, we could get a high degree of confidence in our test coverage.

@Voxelot
Copy link
Member

Voxelot commented Feb 25, 2022

That makes sense, from the fork it would be easy to PR the tests back to the official celestia SMT repo once we're happy with the coverage.

@adlerjohn
Copy link
Contributor

I'm okay with forking the Go repo only if it's used purely to upstream the changes. We don't want to maintain a Go implementation of the SMT. With that said, do we really need to fork the repo, or can @bvrooman just contribute the change upstream immediately?

@adlerjohn
Copy link
Contributor

the key input gets hashed into 32 bytes

Wait no, the key is not hashed. Or rather, should not be. See celestiaorg/smt#40. Note that the SMT spec does not require hashed keys. That's left to the application.

@bvrooman
Copy link
Contributor Author

bvrooman commented Feb 28, 2022

the key input gets hashed into 32 bytes

Wait no, the key is not hashed. Or rather, should not be. See celestiaorg/smt#40. Note that the SMT spec does not require hashed keys. That's left to the application.

I see! On the Fuel side, I can modify the SMT interface to take in a Bytes32. In order for the tests to line up with the Celestia results, I can continue to hash the keys via sha256, and pass in the resulting 32 bytes. I see that there is already a PR in Celestia to move towards raw keys.

For the test vectors in this spec, I will move the hashing from the library side to the client side to produce 32 byte keys. If we specify that the test inputs are generated by using Sha256 on an input to create the 32 byte keys, we can ensure continuity of keys between implementations.

@bvrooman
Copy link
Contributor Author

bvrooman commented Feb 28, 2022

I'm okay with forking the Go repo only if it's used purely to upstream the changes. We don't want to maintain a Go implementation of the SMT. With that said, do we really need to fork the repo, or can @bvrooman just contribute the change upstream immediately?

The fork may not be necessary - it was just an idea to stage changes coming from us before committing to Celestia. I anticipate that the test spec will undergo at least couple of iterations. But I'm also happy to just commit any changes directly to Celestia.

It looks like the migration to using raw keys definitely affects the test cases I used to generate the roots for this document. I can put the current test suite up in a Celestia PR, but it may be worth waiting for that change to go in before I commit it.

@adlerjohn
Copy link
Contributor

The Celestia SMT is going slow because it's not on the required path for now. I would say ignore adding anything to that repo for the time being.

@bvrooman bvrooman merged commit 0a7d91c into master Mar 3, 2022
@bvrooman bvrooman deleted the bvrooman-docs-sparse-merkle-tree-test-spec branch March 3, 2022 20:00
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Improvements or additions to documentation
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants