Fix second segment offset in object retrieval, other minor cleanups #3367

Open
teor2345 wants to merge 15 commits into main from evm-obj-minor-cleanups
Conversation

@teor2345 (Member) commented Feb 4, 2025

This PR fixes a bug in the offset of the second segment downloaded during object reconstruction (commit 30b8167). This bug has been in the code since we added incremental segment creation, and removed the segment item count.

It also fixes a test hang in test_domain_tx_propagate.

This PR also contains multiple small cleanups based on these reviews:

And the reviewed cleanups from PR #3362, with changes from these reviews:

Closes #3370.
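For context, here is a minimal sketch of the prefix calculation behind the offset fix (illustrative names, not the PR's actual code; the real diff appears further down in the review thread):

use parity_scale_codec::{Compact, CompactLen};

/// Bytes that precede the first item's data in an encoded segment.
fn segment_prefix_len(item_count: usize) -> usize {
    // Pre-fix assumption: enum variant byte plus a compact-encoded item count.
    let _stale = 1 + Compact::<u64>::compact_len(&(item_count as u64));
    // Since incremental segment creation, segments carry no item count, so the
    // prefix is just the single Segment::V0 variant byte. Using the stale value
    // shifted the offset of the second segment during object reconstruction.
    1
}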

Code contributor checklist:

@teor2345 added the bug ("Something isn't working") and refactor labels Feb 4, 2025
@teor2345 self-assigned this Feb 4, 2025
@teor2345 enabled auto-merge February 4, 2025 23:56
NingLin-P previously approved these changes Feb 5, 2025

@NingLin-P (Member) left a comment:

Makes sense to me, thanks!

vedhavyas previously approved these changes Feb 5, 2025

@vedhavyas (Member) left a comment:

Makes sense. Left a question I'm not sure about.

-let mut progress = 1 + Compact::compact_len(&(items.len() as u64));
+// Unconditional progress is enum variant, always 1 byte in SCALE encoding.
+// (Segments do not have an item count, to make incremental writing easier.)
+let mut progress = 1;
Member:

Are we sure about the item count not being part of the count? I might have mostly missed the PR.

Member:

The archiver (in order to be able to encode incrementally) purposefully doesn't encode the number of elements:

fn encode_to<O: Output + ?Sized>(&self, dest: &mut O) {
    match self {
        Segment::V0 { items } => {
            dest.push_byte(0);
            for item in items {
                item.encode_to(dest);
            }
        }
    }
}
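As a side note, here is a hedged sketch of the decoder-side consequence: since there is no item count, a decoder reads items until the input is exhausted, which is what makes incremental writing possible. This is illustrative code, not the crate's actual Decode impl.

use parity_scale_codec::{Decode, Error, Input};

// Illustrative only: read SCALE-encoded values until the input is exhausted,
// instead of reading a length prefix first.
fn decode_until_end<T: Decode, I: Input>(input: &mut I) -> Result<Vec<T>, Error> {
    let mut items = Vec::new();
    // remaining_len() can be None for streaming inputs; a real decoder would
    // need another end-of-input signal in that case.
    while input.remaining_len()?.unwrap_or(0) > 0 {
        items.push(T::decode(input)?);
    }
    Ok(items)
}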

Member Author (teor2345):

Yep, that was commit 96ed749 in PR #1200.

Here’s a potential test case, which I'll look into today:

2025-02-05T13:46:19.238562Z ERROR subspace_gateway::commands::http::server: Failed to fetch objects hashes=[ca34db057f1305579611948965ce644d2b686b4034f5cf110707df8efb95f28a] err=InvalidDataHash { data_hash: af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262, data_size: 0, mapping: GlobalObject { hash: ca34db057f1305579611948965ce644d2b686b4034f5cf110707df8efb95f28a, piece_index: PieceIndex(113406), offset: 963327 } }

It's on mainnet, pieceIndex=113406 and offset=963327

nazar-pc previously approved these changes Feb 5, 2025

@nazar-pc (Member) left a comment:

No blockers, makes sense to me

     mapping: GlobalObject,
 ) -> Result<Vec<Piece>, Error> {
-    download_pieces(piece_indexes, &self.piece_getter)
+    download_pieces(piece_indexes.clone(), &self.piece_getter)
Member:

It is unfortunate that we have to clone it just to log it in the error case. I was using Arc<[PieceIndex]> in some places for this reason.

Member Author (teor2345):

I've made this change just in piece_fetcher.rs, and pushed it to this PR.

I tried doing it for all the PieceGetter trait impls, but that made it less ergonomic (commit 0f4bd8d). Do you think we should merge it?
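For illustration, a minimal sketch of the Arc<[PieceIndex]> idea mentioned above (the types and the error variant here are stand-ins, not the actual subspace API): cloning an Arc is a cheap reference-count bump, so the indexes stay available for error reporting without copying the whole list.

use std::sync::Arc;

#[derive(Debug, Clone, Copy)]
struct PieceIndex(u64);
struct Piece;

#[derive(Debug)]
enum Error {
    // The shared slice can be embedded in the error without a deep copy.
    PieceFetch { piece_indexes: Arc<[PieceIndex]> },
}

async fn download_pieces(piece_indexes: Arc<[PieceIndex]>) -> Result<Vec<Piece>, Error> {
    // ... fetch pieces here; on failure, move the Arc into the error ...
    Err(Error::PieceFetch { piece_indexes })
}

async fn fetch(piece_indexes: Vec<PieceIndex>) -> Result<Vec<Piece>, Error> {
    let piece_indexes: Arc<[PieceIndex]> = piece_indexes.into();
    // Arc::clone here copies a pointer, not the index list itself.
    download_pieces(Arc::clone(&piece_indexes)).await
}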

@teor2345 dismissed stale reviews from nazar-pc, vedhavyas, and NingLin-P via c0aa066 February 6, 2025 03:09
@teor2345 force-pushed the evm-obj-minor-cleanups branch from f70a549 to c0aa066 February 6, 2025 03:09
nazar-pc previously approved these changes Feb 6, 2025

@teor2345 (Member Author) left a comment:

I've made the suggested changes.

This PR ran into an intermittent test failure, so I opened ticket #3370, and added timeouts to that test to diagnose why it's failing (commit c0aa066).

@teor2345 requested a review from vedhavyas February 6, 2025 03:14
@teor2345 (Member Author) left a comment:

Windows failed again in the same test, so I increased the test timeout just in case that's the cause. (I don't think it is, but I just want to be extra sure.)

@nazar-pc (Member) commented Feb 6, 2025

Tests are running under nextest, meaning a lot of them run in parallel, and some tests (notably archiving) are heavily multi-threaded, so it is possible for multiple tests to overlap and take a few minutes to run even though they wouldn't take that long individually. Because domain tests essentially run multiple nodes, they are even heavier, and it is not surprising if they take way more than 5 minutes in CI.

Check the logs of previous successful runs of that test to get a rough estimate of how long it actually takes.

@teor2345 (Member Author) commented Feb 6, 2025

> Tests are running under nextest, meaning a lot of them run in parallel, and some tests (notably archiving) are heavily multi-threaded, so it is possible for multiple tests to overlap and take a few minutes to run even though they wouldn't take that long individually. Because domain tests essentially run multiple nodes, they are even heavier, and it is not surprising if they take way more than 5 minutes in CI.
>
> Check the logs of previous successful runs of that test to get a rough estimate of how long it actually takes.

Yes, I know 🙂

That test usually takes 30-180 seconds, so 300 seconds should be plenty. (And if it doubles in time maybe it’s worth investigating.)
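For illustration, a per-test timeout along these lines could look like the following sketch (placeholder future and test body, not the actual test code; the 300-second figure mirrors the estimate above):

use std::time::Duration;
use tokio::time::timeout;

// Placeholder for the real wait inside test_domain_tx_propagate.
async fn wait_for_tx_propagation() {
    tokio::time::sleep(Duration::from_millis(10)).await;
}

#[tokio::test]
async fn test_domain_tx_propagate_with_timeout() {
    // A hang now fails fast with a clear message instead of stalling the CI job.
    timeout(Duration::from_secs(300), wait_for_tx_propagation())
        .await
        .expect("transaction propagation timed out");
}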

@teor2345 (Member Author) commented Feb 6, 2025

Looks like there’s also a real bug here:

thread 'tokio-runtime-worker' panicked at C:\actions-runner\_work\subspace\subspace\domains\client\domain-operator\src\domain_worker.rs:133:13:
all branches are disabled and there is no else branch

https://github.com/autonomys/subspace/actions/runs/13171813519/job/36763423877?pr=3367#step:11:5588

Seems like it only happens on shutdown though. I’ll have a look tomorrow.
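For reference, a minimal sketch of what this panic means and one way to avoid it (the real domain_worker.rs select loop is more involved; the stream names here are illustrative): tokio::select! panics with this message when every branch is disabled, for example when all the streams it polls have ended during shutdown, unless an else branch is provided.

use futures::{Stream, StreamExt};

async fn worker_loop(
    mut imported_blocks: impl Stream<Item = u64> + Unpin,
    mut finalized_blocks: impl Stream<Item = u64> + Unpin,
) {
    loop {
        tokio::select! {
            Some(number) = imported_blocks.next() => {
                // Handle an imported block notification.
                let _ = number;
            }
            Some(number) = finalized_blocks.next() => {
                // Handle a finalized block notification.
                let _ = number;
            }
            // Once both streams return None, every branch above is disabled;
            // without this else branch, select! panics instead of exiting.
            else => break,
        }
    }
}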

@teor2345 (Member Author) left a comment:

This is ready for another review.

I added commits to:

  • fix the shutdown panic in the test
  • un-ban the peer to make the flaky test_domain_tx_propagate test pass (in every test failure I've seen, the peer is banned)

I also tested that the object retrieval fix works. It takes ages because of #3318, but it works.

Labels
bug ("Something isn't working"), refactor
Development

Successfully merging this pull request may close these issues.

test_domain_tx_propagate sometimes hangs because EVM full node is banned
4 participants