[no-release-notes] go: remotestorage: chunk_store.go: Clean up ChunkCache. #8861
base: main
Conversation
When DoltChunkStore was implemented, it needed to do write buffering in order to read its own writes and to flush table files to the upstream. At some point, read caching and caching of HasMany results were added onto the write buffer, creating confusion and despair. This change separates those use cases back out.
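For orientation, here is a condensed sketch of the split, pieced together from the diff hunks quoted in this conversation. The method sets are abbreviated, and the GetAllForWrite return type and the import paths are assumptions rather than code copied from the PR:

```go
package remotestorage

import (
	"github.com/dolthub/dolt/go/store/hash"
	"github.com/dolthub/dolt/go/store/nbs"
)

// ChunkCache is the read-side cache: observed/fetched chunks and HasMany results.
type ChunkCache interface {
	// Put puts a slice of chunks into the cache.
	Put(c []nbs.ToChunker) error
}

// WriteBuffer holds written chunks until they are flushed to the upstream.
type WriteBuffer interface {
	Put(nbs.CompressedChunk) error
	// Assumed return type; the discussion below only says this
	// "returns compressed chunks".
	GetAllForWrite() map[hash.Hash]nbs.CompressedChunk
	WriteCompleted(success bool)
	// Read-your-own-writes support on the Get and HasMany paths.
	AddPendingChunks(h hash.HashSet, res map[hash.Hash]nbs.ToChunker)
	RemovePresentChunks(h hash.HashSet)
}
```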
LGTM, the interfaces seem semantically weird but the in-place separation is nice
type ChunkCache interface {
	// Put puts a slice of chunks into the cache. Error returned if the cache capacity has been exceeded.
	Put(c []nbs.ToChunker) error
	// Insert some observed / fetched chunks into the cached. These
Suggested change:
- // Insert some observed / fetched chunks into the cached. These
+ // Insert some observed / fetched chunks into the cache. These
	has, err := lru.New2Q[hash.Hash, struct{}](maxHasCapacity)
	if err != nil {
		panic(err)
	}
	return &mapChunkCache{
		&sync.Mutex{},
we don't seem to be using the mutex in any of the methods below, intentional?
Good catch! It's no longer necessary...
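A minimal sketch of what dropping the mutex could look like, assuming the hashicorp/golang-lru v2 package used in the diff above; the field names, the value type of the chunk cache, and the capacity constants are illustrative, not the PR's actual code. The 2Q caches do their own locking, which is why no replacement synchronization is needed in the methods:

```go
package remotestorage

import (
	lru "github.com/hashicorp/golang-lru/v2"

	"github.com/dolthub/dolt/go/store/hash"
	"github.com/dolthub/dolt/go/store/nbs"
)

// Illustrative capacities; the real constants live elsewhere in the package.
const (
	maxChunkCapacity = 32 * 1024
	maxHasCapacity   = 64 * 1024
)

// mapChunkCache without the standalone mutex: lru.TwoQueueCache is safe
// for concurrent use, so the methods need no extra locking.
type mapChunkCache struct {
	chunks *lru.TwoQueueCache[hash.Hash, nbs.ToChunker]
	has    *lru.TwoQueueCache[hash.Hash, struct{}]
}

func newMapChunkCache() *mapChunkCache {
	chunks, err := lru.New2Q[hash.Hash, nbs.ToChunker](maxChunkCapacity)
	if err != nil {
		panic(err)
	}
	has, err := lru.New2Q[hash.Hash, struct{}](maxHasCapacity)
	if err != nil {
		panic(err)
	}
	return &mapChunkCache{chunks: chunks, has: has}
}
```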
I'm pretty sure every comment I made was to clarify code comments after I read the code. So feel free to ship after you pick and choose what suggestions you care about!
)

type WriteBuffer interface {
	Put(nbs.CompressedChunk) error
Can you add a comment to the effect of "Using CompressedChunks rather than ToChunker because these objects are only destined for Noms table files"?
Alternately, everything you are putting in the cache was a Chunk immediately before putting it in the cache. Are you trying to save memory by doing it up front?
What it really comes down to is that it's a little odd to have the AddPendingChunks method take a ToChunker. I can see why each method makes sense in its own context. Anyway, I think there is a clarifying comment to be made here about why you use one in one case and not in the other.
I added a comment about what |Put| does. |Put|'s raison d'etre is to implement |GetAllForWrite|, which is returning compressed chunks. The reality is that this interface is for writing to remotestorage remotes...it is only used by the implementation of DoltChunkStore and it's just working with data structures that make sense for it. There's no reason to use ToChunker in the Put here...it needs things it can actually write to storage files, and we can't write ToChunkers to storage files. (An argument could be made that it could just take a Chunk. That seems fine to me, but it further separates the relationship between Put and GetAllForWrite for a reader. Certainly storing them as compressed chunks has benefits compared to not (memory utilization of the cache itself, CPU overhead if you have to GetAllForWrite multiple times, etc.))
	// Returns the current set of written chunks. After this
	// returns, concurrent calls to other methods may block until
	// |WriteCompleted is called. Calls to |GetAllForWrite| must
	// be bracketed by a call to |WriteCompleted|
"bracketed" is confusing. Just say calls to GetAllForWrite must be followed by a call to WriteCompleted
	}
}

func (b *mapWriteBuffer) RemovePresentChunks(absent hash.HashSet) {
absent is an odd name for this argument. Maybe just hashes?
	// Called after a call to |GetAllForWrite|, this records
	// success or failure of the write operation. If the write
	// operation was successful, then the written chunks are now
	// in the upstream, and so they can be cleared. Otherwise, the
Suggested change:
- // in the upstream, and so they can be cleared. Otherwise, the
+ // in the upstream and they are cleared from the buffer. Otherwise, the
	assert.Panics(t, func() {
		cache.WriteCompleted(false)
	})
	cache.AddPendingChunks(make(hash.HashSet), make(map[hash.Hash]nbs.ToChunker))
Don't really think these last two calls add anything to the test.
	WriteCompleted(success bool)

	// ChunkStore clients expect to read their own writes before a commit.
	// On the get path, remotestorage should add pending chunks to its result
Not clear initially that res is the result set. Would be clearer with:

Suggested change:
- // On the get path, remotestorage should add pending chunks to its result
+ // On the get path, remotestorage should add pending chunks to |res|
	// set. On the HasMany path, remotestorage should remove present chunks
	// from its absent set on the HasMany response.
	AddPendingChunks(h hash.HashSet, res map[hash.Hash]nbs.ToChunker)
	RemovePresentChunks(h hash.HashSet)
Add a comment indicating that RemovePresentChunks will update its input set, and that whatever remains in the set after the call was not removed.
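A hypothetical sketch of the two paths the interface comment describes; remoteGet and remoteHasMany are placeholders for the actual RPCs, and WriteBuffer is the interface from this PR:

```go
package remotestorage

import (
	"context"

	"github.com/dolthub/dolt/go/store/hash"
	"github.com/dolthub/dolt/go/store/nbs"
)

// Get path: merge pending (unflushed) writes into the fetched results so
// clients can read their own writes before a commit.
func getWithPending(ctx context.Context, wb WriteBuffer, hashes hash.HashSet,
	remoteGet func(context.Context, hash.HashSet) map[hash.Hash]nbs.ToChunker) map[hash.Hash]nbs.ToChunker {
	res := remoteGet(ctx, hashes)
	wb.AddPendingChunks(hashes, res)
	return res
}

// HasMany path: chunks still sitting in the write buffer are not absent.
// RemovePresentChunks mutates |absent| in place; whatever remains
// afterwards is genuinely absent both upstream and in the buffer.
func hasManyWithPending(ctx context.Context, wb WriteBuffer, hashes hash.HashSet,
	remoteHasMany func(context.Context, hash.HashSet) hash.HashSet) hash.HashSet {
	absent := remoteHasMany(ctx, hashes)
	wb.RemovePresentChunks(absent)
	return absent
}
```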
@@ -882,6 +864,9 @@ func (dcs *DoltChunkStore) Commit(ctx context.Context, current, last hash.Hash)
		return false, NewRpcError(err, "Commit", dcs.host, req)
	}

	// We only delete the chunks that we wrote to the remote from
	// our write buffer if our commit was successful.
	success = resp.Success
Deceptive pattern. Maybe when you declare the variable, state that it's required by the deferred call? Without looking at the context carefully, I'd think success = resp.Success is a no-op.
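One way to act on this suggestion, sketched under assumptions: the writeBuffer field, the commitUpstream helper, and the response shape are placeholders rather than the real Commit body. The point is the comment on the declaration, which ties success to the deferred call so the late assignment no longer reads as a no-op:

```go
func (dcs *DoltChunkStore) Commit(ctx context.Context, current, last hash.Hash) (bool, error) {
	// success is read by the deferred WriteCompleted call below; the
	// assignment after the RPC returns is what reports the outcome,
	// so it is not a no-op.
	success := false
	defer func() {
		dcs.writeBuffer.WriteCompleted(success)
	}()

	resp, err := dcs.commitUpstream(ctx, current, last) // placeholder for the real RPC
	if err != nil {
		return false, err
	}
	// Only drop buffered chunks if the remote accepted the commit.
	success = resp.Success
	return resp.Success, nil
}
```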