Stop downloading entire segments when retrieving objects #3362

Open · wants to merge 8 commits into base: main
Conversation

teor2345 (Member) commented:

This PR fixes two performance edge cases in object retrieval:

  • downloading a segment to work out how much padding the segment has
  • downloading a segment to remove the parent segment header at the start of that segment

Instead, it:

  • tries all the possible padding lengths against the object hash
  • manually decodes and discards the segment version variant, parent segment header, and the start of the block continuation containing the continuing object
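
The padding search in the first bullet can be sketched roughly as follows. This is a hypothetical helper, not the PR's actual code: the real implementation checks candidates against the object's hash in the node's hashing scheme, which is stood in for here by std's `DefaultHasher`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the real object hash function (hypothetical; the actual code
// does not use DefaultHasher).
fn object_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Try each possible padding length in turn, returning the prefix whose hash
/// matches the expected object hash, if any. This avoids downloading the
/// segment just to work out how much padding it has.
fn strip_padding<'a>(data: &'a [u8], max_padding: usize, expected: u64) -> Option<&'a [u8]> {
    (0..=max_padding.min(data.len()))
        .map(|padding| &data[..data.len() - padding])
        .find(|candidate| object_hash(candidate) == expected)
}
```

The cost is at most `max_padding + 1` hash computations over data already in hand, instead of a segment download.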

Other Changes

This PR also fixes a miscalculation of segment offsets, which had gone unchanged (and untested) since 2023, when the segment format was rewritten to remove the encoded number of segment items.

It also does some minor refactors, and adds some utility methods on pieces and segments.

TODO

  • Write tests for each object reconstruction edge case (in this PR, or a future PR)

Code contributor checklist:

@teor2345 teor2345 added bug Something isn't working improvement it is already working, but can be better labels Jan 31, 2025
@teor2345 teor2345 self-assigned this Jan 31, 2025
@teor2345 teor2345 requested a review from nazar-pc as a code owner January 31, 2025 02:22

@vedhavyas vedhavyas left a comment


Makes sense. Will look through again once the remaining tests are added.

@@ -13,7 +13,7 @@ use tracing::{debug, trace};
// This code was copied and modified from subspace_service::sync_from_dsn::download_and_reconstruct_blocks():
// <https://github.com/autonomys/subspace/blob/d71ca47e45e1b53cd2e472413caa23472a91cd74/crates/subspace-service/src/sync_from_dsn/import_blocks.rs#L236-L322>
pub async fn download_pieces<PG>(
piece_indexes: &Vec<PieceIndex>,
piece_indexes: &[PieceIndex],

if we want to clone anyway, wouldn't it be better to send the copy here instead?
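
As context for this review thread: taking `&[PieceIndex]` instead of `&Vec<PieceIndex>` lets callers pass any contiguous storage without first building a `Vec`. A minimal illustration, with `u64` standing in for `PieceIndex` (the specific function is hypothetical, only the parameter type matters):

```rust
// A slice parameter accepts a Vec, an array, or a subslice with no extra
// allocation, while `&Vec<T>` would force callers to hold a Vec specifically.
fn first_and_last(piece_indexes: &[u64]) -> Option<(u64, u64)> {
    Some((*piece_indexes.first()?, *piece_indexes.last()?))
}
```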

pub fn decode_variant_and_length<I: Input>(
input: &mut I,
) -> Result<(u8, Option<u64>), parity_scale_codec::Error> {
let variant = input

why not just call the decode_variant fn here?
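
One shape the reviewer's suggestion could take, sketched over a plain byte slice rather than the crate's `Input` trait (a hypothetical simplification: only the 1- and 2-byte SCALE compact modes are shown, and errors are collapsed into `Option`):

```rust
// Read a single byte: the enum variant tag.
fn decode_variant(input: &mut &[u8]) -> Option<u8> {
    let (&first, rest) = input.split_first()?;
    *input = rest;
    Some(first)
}

// Decode a SCALE compact length, reusing decode_variant for byte reads.
// The low 2 bits of the first byte select the mode.
fn decode_compact_len(input: &mut &[u8]) -> Option<u64> {
    let first = decode_variant(input)?;
    match first & 0b11 {
        0 => Some((first >> 2) as u64), // single-byte mode
        1 => {
            // two-byte mode: little-endian u16, value is the top 14 bits
            let second = decode_variant(input)?;
            let raw = ((second as u64) << 8) | (first as u64);
            Some(raw >> 2)
        }
        _ => None, // 4-byte and big-integer modes elided in this sketch
    }
}

// Mirror of the PR's decode_variant_and_length signature, built by calling
// the smaller helpers, as the review comment suggests.
fn decode_variant_and_length(input: &mut &[u8]) -> Option<(u8, Option<u64>)> {
    let variant = decode_variant(input)?;
    let len = decode_compact_len(input)?;
    Some((variant, Some(len)))
}
```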

// All of the above, plus the partial byte count for the partial progress variant.
pub const MAX_SEGMENT_HEADER_SIZE: usize = MIN_SEGMENT_HEADER_SIZE + 4;

/// The maximum object length the implementation in this module can reliably handle.

handle -> handled

/// Currently objects are limited by the largest block size in any domain, which is 5 MB.
/// But this implementation supports the maximum length of the 4 byte scale encoding.
pub const MAX_SUPPORTED_OBJECT_LENGTH: usize = 1024 * 1024 * 1024 - 1;
/// Currently objects are limited by the largest block size in any domain, which is 5 MB. But this

not sure if "domain" is the right word here, since we already use it in a different context (for domains)
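
For reference, the constant's value in the diff above follows from the 4-byte SCALE compact encoding: 32 bits minus the 2-bit mode tag leaves a 30-bit payload, so the largest encodable length is 2^30 - 1:

```rust
// 4-byte SCALE compact mode: 32 bits total, 2 spent on the mode tag,
// leaving 30 payload bits, hence a maximum value of 2^30 - 1.
const MAX_SUPPORTED_OBJECT_LENGTH: usize = 1024 * 1024 * 1024 - 1;
```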
