Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(clp-s): Record log-order at compression time. #584
feat(clp-s): Record log-order at compression time. #584
Changes from 3 commits
d9db6e1
91d5fc7
04857b0
efd9218
1958026
d01dbc7
072f6e2
58e4a0c
bdbaed2
2d1b76f
e69a34a
56c54b8
e641ab9
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the order column optional since it may affect the compression ratio a lot for some datasets? And if we want to record log order, we can call
add_internal_field
only once?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we make it optional I'd like to make it the default + force its use it in the package. I talked to @kirkrodrigues about it and the general consensus we reached is that allowing people to generate archives that don't record order now will cause problems for us down the line.
We could also do some brief experiments to see if we can reduce the overhead (e.g.by trying delta encoding or using 32 bit field size) if that would help convince you. If those kinds of optimizations work we can consider adding them in a later PR.
For
add_internal_field
I'm calling it that we so that we don't need to tie it to every different place where we flush an archive. If you want though I could put it in a lambda and call it in every one of those places.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree to set it as the default in the package.
For each archive, we can add the internal subtree and the node once, storing the node ID. After that, we just call add_value. Could you clarify what different places you're referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly line 506 where
split_archive
is called.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this change to call
add_metadata_field
once before the ingestion loop and after every invocation ofsplit_archive
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also added a
--disable-log-order
option for the compression flow.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling and documentation to the new method.
The method implementation is correct but could be more robust with error handling and documentation.
Consider applying these improvements:
📝 Committable suggestion