Fix second segment offset in object retrieval, other minor cleanups #3367
Conversation
…g,domain_runtime_info}
Makes sense to me, thanks!
Makes sense. Left a question I'm not sure about.
-let mut progress = 1 + Compact::compact_len(&(items.len() as u64));
+// Unconditional progress is enum variant, always 1 byte in SCALE encoding.
+// (Segments do not have an item count, to make incremental writing easier.)
+let mut progress = 1;
Are we sure the item count isn't supposed to be part of the encoding? I might have missed the PR that changed this.
The Archiver purposefully doesn't encode the number of elements, so that it can encode incrementally:
subspace/crates/subspace-archiving/src/archiver.rs
Lines 60 to 70 in f5e8b43

fn encode_to<O: Output + ?Sized>(&self, dest: &mut O) {
    match self {
        Segment::V0 { items } => {
            dest.push_byte(0);
            for item in items {
                item.encode_to(dest);
            }
        }
    }
}
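For context, standard SCALE encoding of a plain `Vec<T>` does prepend a compact length, and that prefix is exactly the term the removed `Compact::compact_len` call accounted for. A minimal sketch with parity-scale-codec (standalone, not the archiver code):

```rust
use parity_scale_codec::{Compact, Encode};

fn main() {
    let items: Vec<u8> = vec![1, 2, 3];
    // Standard SCALE: Vec<T> encodes as Compact(len) ++ item encodings.
    assert_eq!(Compact(3u32).encode(), vec![12]); // 12 == Compact(3)
    assert_eq!(items.encode(), vec![12, 1, 2, 3]);

    // The hand-written Segment encoding above instead writes only the
    // 1-byte enum variant, so more items can be appended incrementally.
    let mut segment_bytes = vec![0u8]; // Segment::V0 discriminant
    for item in &items {
        item.encode_to(&mut segment_bytes);
    }
    assert_eq!(segment_bytes, vec![0, 1, 2, 3]);
}
```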
Yep, that was commit 96ed749 in PR #1200.
Here’s a potential test case, which I'll look into today:
2025-02-05T13:46:19.238562Z ERROR subspace_gateway::commands::http::server: Failed to fetch objects hashes=[ca34db057f1305579611948965ce644d2b686b4034f5cf110707df8efb95f28a] err=InvalidDataHash { data_hash: af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262, data_size: 0, mapping: GlobalObject { hash: ca34db057f1305579611948965ce644d2b686b4034f5cf110707df8efb95f28a, piece_index: PieceIndex(113406), offset: 963327 } }
It's on mainnet, pieceIndex=113406 and offset=963327
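Notably, that `data_hash` is the BLAKE3 hash of empty input, which matches `data_size: 0`: the fetcher read zero bytes at the bad offset. A quick sanity check (a standalone sketch, not gateway code):

```rust
// BLAKE3 of zero bytes is exactly the data_hash reported in the log above.
fn main() {
    let empty_hash = blake3::hash(b"");
    assert_eq!(
        empty_hash.to_hex().as_str(),
        "af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262"
    );
}
```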
No blockers, makes sense to me
     mapping: GlobalObject,
 ) -> Result<Vec<Piece>, Error> {
-    download_pieces(piece_indexes, &self.piece_getter)
+    download_pieces(piece_indexes.clone(), &self.piece_getter)
It is unfortunate that we have to clone it just to log it in the error case. I was using Arc<[PieceIndex]>
in some places for this reason.
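A minimal sketch of that pattern, with a hypothetical `PieceIndex` stand-in and a dummy `download` function rather than the real subspace types:

```rust
use std::sync::Arc;

// Hypothetical stand-in; the real PieceIndex lives in subspace-core-primitives.
#[derive(Debug, Clone, Copy)]
struct PieceIndex(u64);

fn download(_indexes: &Arc<[PieceIndex]>) -> Result<(), ()> {
    Err(()) // pretend the download failed
}

fn fetch_with_logging(piece_indexes: Arc<[PieceIndex]>) {
    // Cloning an Arc<[T]> only bumps a reference count; no buffer is copied,
    // so keeping a handle around purely for error logging is essentially free.
    let for_error = Arc::clone(&piece_indexes);
    if download(&piece_indexes).is_err() {
        eprintln!("failed to fetch pieces: {for_error:?}");
    }
}

fn main() {
    let indexes: Arc<[PieceIndex]> = Arc::from(vec![PieceIndex(113406)]);
    fetch_with_logging(indexes);
}
```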
I've made this change just in piece_fetcher.rs, and pushed it to this PR.
I tried doing it for all the PieceGetter trait impls, but that made it less ergonomic (commit 0f4bd8d). Do you think we should merge it?
Branch updated: f70a549 to c0aa066.
Windows failed again in the same test, so I increased the test timeout just in case that's the cause. (I don't think it is, but I want to be extra sure.)
Tests run under nextest, meaning many of them run in parallel, and some tests (notably archiving) are heavily multi-threaded. It is therefore possible for multiple tests to overlap and take a few minutes each, even though they wouldn't take that long individually. Domain tests essentially run multiple nodes, so they are even heavier, and it is not surprising if they take well over 5 minutes in CI. Check the logs of previous successful runs of that test to get a rough estimate of how long it actually takes.
Yes, I know 🙂 That test usually takes 30-180 seconds, so 300 seconds should be plenty. (And if it doubles in time, maybe it's worth investigating.)
Looks like there's also a real bug here:
https://github.com/autonomys/subspace/actions/runs/13171813519/job/36763423877?pr=3367#step:11:5588
Seems like it only happens on shutdown though. I'll have a look tomorrow.
This is ready for another review.
I added commits to:
- fix the shutdown panic in the test
- un-ban the peer to make the flaky test_domain_tx_propagate test pass (in every test failure I've seen, the peer is banned)
I also tested that the object retrieval fix works. It takes ages because of #3318, but it works.
This PR fixes a bug in the offset of the second segment downloaded during object reconstruction (commit 30b8167). The bug has been in the code since we added incremental segment creation and removed the segment item count.
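The arithmetic behind the fix, as a hedged sketch (hypothetical function, not the actual gateway code): the second segment's items start right after the 1-byte enum variant, but the old offset calculation still skipped a compact-encoded item count that no longer exists in the encoding.

```rust
use parity_scale_codec::{Compact, CompactLen};

// Hypothetical illustration of the offset bug: how many prefix bytes to skip
// before the first item of a segment.
fn segment_prefix_len(assumed_item_count: Option<u64>) -> usize {
    let variant_byte = 1; // Segment::V0 discriminant, always 1 byte in SCALE
    match assumed_item_count {
        // Buggy assumption: also skip a compact-encoded item count.
        Some(count) => variant_byte + Compact::<u64>::compact_len(&count),
        // Correct since incremental archiving: only the variant byte.
        None => variant_byte,
    }
}

fn main() {
    // Assuming e.g. 100 items, the old calculation skipped 2 extra bytes...
    assert_eq!(segment_prefix_len(Some(100)), 3);
    // ...while the correct prefix is just the variant byte.
    assert_eq!(segment_prefix_len(None), 1);
}
```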
It also fixes a test hang in test_domain_tx_propagate.
This PR also contains multiple small cleanups based on these reviews:
And the reviewed cleanups from PR #3362, with changes from these reviews:
Closes #3370.
Code contributor checklist: