Dead Page Reference Count Bug Fix #181
base: main
@@ -39,6 +39,8 @@ struct PageToRecycle {
  //
  i32 depth;

  slot_offset_type offset_as_unique_identifier;
This is a bit verbose... maybe we can use the same convention as `PageAllocator`, which calls this `user_slot`?
See: https://github.com/mathworks/llfs/blob/main/src/llfs/page_allocator_events.hpp#L44
src/llfs/page_allocator_state.hpp
Outdated
@@ -242,7 +242,7 @@ class PageAllocatorState : public PageAllocatorStateNoLock

   // Returns the new ref count that will result from applying the delta to the passed obj.
   //
-  i32 calculate_new_ref_count(const PackedPageRefCount& delta) const;
+  i32 calculate_new_ref_count(const PackedPageRefCount& delta, const u32 index) const;
I'd suggest adding `index` to a `BATT_DEBUG_INFO` somewhere in the call stack that leads to this function. That way it will be emitted if a check fails (which is the purpose of this arg, as I understand it), but with less code change.
src/llfs/page_recycler.hpp
Outdated
  struct MetricsExported {
    CountMetric<u32> page_id_deletion_reissue{0};
  };
  static MetricsExported& metrics_export();
Suggest looking into adding this new metric counter to the per-instance (PageRecycler) metrics.
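One way to read this suggestion, as a minimal sketch only: the counter moves into the per-instance metrics struct, next to `page_drop_error_count`. The `CountMetric` template below is a simplified stand-in written for this example, not the real library type, and the `metrics()` accessor and the `_count` suffix on the new field are assumptions.

```cpp
#include <atomic>
#include <cstdint>

// Simplified stand-in for the real CountMetric<T>; only what this sketch needs.
template <typename T>
class CountMetric {
 public:
  explicit CountMetric(T init = 0) : value_{init} {}

  // Atomically adds `delta` to the counter.
  void add(T delta) { value_.fetch_add(delta); }

  // Returns the current counter value.
  T load() const { return value_.load(); }

 private:
  std::atomic<T> value_;
};

// Hypothetical, stripped-down PageRecycler showing only the metrics layout.
class PageRecycler {
 public:
  struct Metrics {
    CountMetric<std::uint64_t> page_drop_error_count{0};

    // New counter kept per instance rather than in a separate static struct.
    CountMetric<std::uint32_t> page_id_deletion_reissue_count{0};
  };

  Metrics& metrics() { return this->metrics_; }

 private:
  Metrics metrics_;
};
```

With this layout each recycler instance counts its own reissues, so two recyclers in the same process do not share a counter.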
src/llfs/page_recycler.hpp
Outdated
@@ -55,13 +55,21 @@ class PageRecycler
    CountMetric<u64> page_drop_error_count{0};
  };

  struct MetricsExported {
    CountMetric<u32> page_id_deletion_reissue{0};
Naming convention: `CountMetric<T>`-type metrics should end in `_count`.
src/llfs/page_recycler.hpp
Outdated
@@ -55,13 +55,21 @@ class PageRecycler
    CountMetric<u64> page_drop_error_count{0};
  };

  struct MetricsExported {
Maybe better to say `GlobalMetrics` instead (assuming we keep this additional scope of metrics); exported vs not exported isn't really the distinction being made here, since many per-object metric sets are exported via the global registry.
  //+++++++++++-+-+--+----- --- -- - - - -

  static PageCount default_max_buffered_page_count(const PageRecyclerOptions& options);

  static u64 calculate_log_size(const PageRecyclerOptions& options,
                                Optional<PageCount> max_buffered_page_count = None);

  static u64 calculate_log_size_no_padding(const PageRecyclerOptions& options,
Would you please add doc comments for this new function? Thanks!
@@ -105,12 +113,13 @@ class PageRecycler
  // necessarily flushed (see `await_flush`).
  //
  StatusOr<slot_offset_type> recycle_pages(const Slice<const PageId>& page_ids,
Consider breaking out the `depth=0` and `depth>0` cases, since they have different requirements in terms of valid args. But in any case, please fix the naming of the `offset`/`unique_offset` param to be consistent between the two, and add doc comments to describe the rules wrt. how these are called.
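A rough sketch of what such a split might look like, under stated assumptions: the class and method names (`RecyclerApiSketch`, `recycle_pages_depth_0`, `recycle_pages_depth_n`), the `QueuedPage` record, and the rule that only depth-0 requests carry a caller-supplied `user_slot` are all hypothetical and chosen for illustration. `std::optional` stands in for `StatusOr`, and the returned queue size stands in for a slot offset.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-ins for the LLFS types (assumed to be 64-bit integers in this sketch).
using PageId = std::uint64_t;
using slot_offset_type = std::uint64_t;

// Hypothetical record of one queued page.
struct QueuedPage {
  PageId page_id;
  std::int32_t depth;
  std::optional<slot_offset_type> user_slot;  // assumed to be set only for depth == 0
};

class RecyclerApiSketch {
 public:
  // depth == 0: called from outside the recycler; the caller must supply the
  // slot offset that identifies this request. Returns the queue size so far.
  std::optional<std::size_t> recycle_pages_depth_0(const std::vector<PageId>& page_ids,
                                                   slot_offset_type user_slot) {
    return this->enqueue(page_ids, /*depth=*/0, user_slot);
  }

  // depth > 0: no caller-supplied slot offset. Returns std::nullopt if
  // `depth` is not strictly positive.
  std::optional<std::size_t> recycle_pages_depth_n(const std::vector<PageId>& page_ids,
                                                   std::int32_t depth) {
    if (depth <= 0) {
      return std::nullopt;
    }
    return this->enqueue(page_ids, depth, std::nullopt);
  }

 private:
  std::size_t enqueue(const std::vector<PageId>& page_ids, std::int32_t depth,
                      std::optional<slot_offset_type> user_slot) {
    for (PageId page_id : page_ids) {
      this->queue_.push_back(QueuedPage{page_id, depth, user_slot});
    }
    return this->queue_.size();
  }

  std::vector<QueuedPage> queue_;
};
```

Separate entry points let each signature accept only the arguments that are valid for its case, so the calling rules can be stated in one doc comment per function.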
This MR releases the fix for a ref count bug in the page recycler's dead-page recycling flow. The issue crops up when the volume trimmer and page recycler retry a dead-page recycling task for a page after recovery. The solution is to track largest_slot_offset in the recycler and check it whenever a new dead-page recycling request arrives from the volume trimmer. Any request whose slot_offset is lower than or equal to largest_slot_offset is ignored.
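The check described above can be illustrated with a small stand-alone sketch. It is not the PageRecycler code: `DeadPageRequestFilter`, `accept`, and `ignored_count` are hypothetical names, `slot_offset_type` is assumed to be a 64-bit unsigned integer, and the plain `<=` comparison ignores any wrap-around handling the real slot offsets may need.

```cpp
#include <cstdint>

// Stand-in for the LLFS slot offset type (assumption for this sketch).
using slot_offset_type = std::uint64_t;

// Hypothetical filter that drops replayed dead-page recycling requests.
class DeadPageRequestFilter {
 public:
  // Returns true if the request is new and should be processed. Returns false
  // if its offset is lower than or equal to the largest offset seen so far,
  // which marks it as a retry of a request that was already handled.
  bool accept(slot_offset_type request_offset) {
    if (this->has_seen_request_ && request_offset <= this->largest_slot_offset_) {
      ++this->ignored_count_;
      return false;
    }
    this->largest_slot_offset_ = request_offset;
    this->has_seen_request_ = true;
    return true;
  }

  // Number of requests ignored as duplicates so far.
  std::uint64_t ignored_count() const { return this->ignored_count_; }

 private:
  bool has_seen_request_ = false;
  slot_offset_type largest_slot_offset_ = 0;
  std::uint64_t ignored_count_ = 0;
};
```

For the filter to survive a restart, the largest offset would have to be restored during recovery; that part is outside this sketch.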
The MR also adds a histogram plot showing where the bug is hit across the entire seed range. This is done through an existing page ref count test.
#13