Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Domains: Move trace root capture from runtime to client for efficiency #3338

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

vedhavyas
Copy link
Member

@vedhavyas vedhavyas commented Jan 8, 2025

We have been capturing the trace roots in the Domain runtime before executing the next trasaction. This became problematic during our benchmark tests where calculating storage root is essentially time taking process since it does the calculate the intermediate roots from full set of changes from beginning instead of from the previous root. More on this here

This PR essentially moves capturing the trace roots by creating triebackend over uncommiteed changes to ensure the diff set is minimal during the state root calculation.

I have also took the opportunity to clean up a TODO in the Fraud proof where post state root was returned from the Domain runtime. This can be calculated on verifier from the overlay changes.
Update: unfortunately, I found an interesting deviation that needs more research. Will provide a new PR when that TODO can be fixed and it does not relates to this PR anyway

Code contributor checklist:

Copy link
Member

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to check something that could impact performance (and correctness). Please feel free to ignore my review if it’s not ready yet!

Ok(Some(v))
} else {
Ok(self.backend.get(key, prefix)?)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we missing a “tombstone” check here?

If a key is present in the backend, but deleted in the overlay, this method returns Some(old) - the old value from the backend. But the correct value would be None, because the key has been deleted in the overlay, and will be deleted in the backend once consolidated.

(If we never delete any keys, or we always “delete” by overwriting with another value, then there’s no bug here. But I just wanted to check.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question.

We are not missing any tombstone checks here. If the key is deleted, it will have a default value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the key is deleted, it will have a default value.

Just to be sure you are not talking about ValueQuery here, right? because ValueQuery/OptionQuery just define returning T::default() or None if the key-value does no exist in the state, it doesn't mean actually storing a T::default() in the state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to confirm my understanding here: if a key is deleted in the overlay, the overlay will return Some(default), and we’ll never get to the backend returning Some(old)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If the key is written(new/overwrite/delete) during the execution, then the memory DB will have the key present. If the value was not written to, then overlay memory db will not have it and we fallback to supporting backend

As a quick note: this is not new code and has been in the in the codebase for quite sometime powering our invalid state transaition Fraud proof :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I noticed the code had been moved 🙂

So if the key is deleted in the overlay, it will be present and set to what value?
(It can’t be None, because then we’d ignore the deletion, and return the outdated value from the backend.)

Or should we be using contains() in the if condition, and get() to retrieve the value (which can be None for keys deleted in the overlay)?

That would reliably find deleted keys, if they hadn’t already been purged.

Either way, do we have a test for this behaviour?

I couldn’t find it documented clearly in the TrieBackendStorage API reference, but they do say to use contains() (and not get()) to check for existence:
https://docs.rs/memory-db/0.32.0/memory_db/struct.MemoryDB.html#

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn’t find it documented clearly in the TrieBackendStorage API reference, but they do say to use contains() (and not get()) to check for existence:

Ideally does not matter since get returns an optional value.

we do have implicit tests for this through XDM and pallet domains(not sure yet.)
I did test locally sometime back creating a pallet specifically meant to check the overlay deletion and addition. These never made it. You can actually test this by creating a few storage types and update and check them during the block hooks, initialize and finalize.
This will clearly shows if the deletion works

@vedhavyas vedhavyas force-pushed the storage_root_changes branch from 52d6c2e to d2a9284 Compare January 9, 2025 11:50
@vedhavyas vedhavyas force-pushed the storage_root_changes branch from d2a9284 to fa4fce9 Compare January 9, 2025 11:51
@vedhavyas vedhavyas marked this pull request as ready for review January 9, 2025 11:51
@vedhavyas vedhavyas requested a review from NingLin-P as a code owner January 9, 2025 11:51
@vedhavyas
Copy link
Member Author

Benchmark results until this point

Tx in block 1000 2000 4000 5000 6000 8000
Domain block execution in MS 468 1354 4595 6667 9298 17059
Domain block import in MS 73 399 1947 15703 28678 62564
Total Time in MS 543 1755 6545 22385 37991 79623

@vedhavyas
Copy link
Member Author

Benchmark results until this point

Tx in block 1000 2000 4000 5000 6000 8000
Domain block execution in MS 501 1237 4174 6387 9564 15478
Domain block import in MS 8 43 197 370 766 794
Total Time in MS 512 1320 4374 6766 10333 16285

Copy link
Member

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, but I just want to double-check a few things. It would be great to have another review, too!

@@ -28,9 +28,6 @@ sp_api::decl_runtime_apis! {
tx_range: &U256,
) -> bool;

/// Returns the intermediate storage roots in an encoded form.
fn intermediate_roots() -> Vec<[u8; 32]>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to bump the API version when removing a method?

I think this means old client code will refuse to run with the new runtime due to a missing function, but that’s what we want, right? We want users to upgrade their nodes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do client upgrades first before runtime releases. So all the clients will use client side intermediate roots and once runtime is upgraded the function completely disappears.

I still like to check running the client with new change on taurus to ensure both give same roots. I have verified these locally but verifying on taurus is never a bad thing :)

domains/client/block-builder/src/custom_api.rs Outdated Show resolved Hide resolved
domains/pallets/executive/src/lib.rs Show resolved Hide resolved
Copy link
Member Author

@vedhavyas vedhavyas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great questions @teor2345 !
I trying to still optimise further and will spend more time today on this. In mean time feel free to review and run these changes locally if you prefer :)

@@ -28,9 +28,6 @@ sp_api::decl_runtime_apis! {
tx_range: &U256,
) -> bool;

/// Returns the intermediate storage roots in an encoded form.
fn intermediate_roots() -> Vec<[u8; 32]>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do client upgrades first before runtime releases. So all the clients will use client side intermediate roots and once runtime is upgraded the function completely disappears.

I still like to check running the client with new change on taurus to ensure both give same roots. I have verified these locally but verifying on taurus is never a bad thing :)

domains/client/block-builder/src/custom_api.rs Outdated Show resolved Hide resolved
@vedhavyas
Copy link
Member Author

This is good a review now.
I do not see further optimisations but if you find anything I have missed, let me know

Copy link
Member

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to keep going on about deleted keys, but the only docs I could find suggested using a different method to check for existence.

Also it seems like the new timer calls could impact our benchmarks on some platforms and architectures.

Ok(Some(v))
} else {
Ok(self.backend.get(key, prefix)?)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I noticed the code had been moved 🙂

So if the key is deleted in the overlay, it will be present and set to what value?
(It can’t be None, because then we’d ignore the deletion, and return the outdated value from the backend.)

Or should we be using contains() in the if condition, and get() to retrieve the value (which can be None for keys deleted in the overlay)?

That would reliably find deleted keys, if they hadn’t already been purged.

Either way, do we have a test for this behaviour?

I couldn’t find it documented clearly in the TrieBackendStorage API reference, but they do say to use contains() (and not get()) to check for existence:
https://docs.rs/memory-db/0.32.0/memory_db/struct.MemoryDB.html#

domains/client/block-builder/src/custom_api.rs Outdated Show resolved Hide resolved
domains/client/block-builder/src/lib.rs Outdated Show resolved Hide resolved
teor2345
teor2345 previously approved these changes Jan 15, 2025
Copy link
Member

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me, as long as we're sure about layers and deletion

@vedhavyas
Copy link
Member Author

as long as we're sure about layers and deletion

Fair. Added a test pallet that ensures this check and fails on any incorrect changes going forward @teor2345

@vedhavyas vedhavyas requested a review from teor2345 January 17, 2025 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants