
[no-release-notes] go: remotestorage: chunk_store.go: Clean up ChunkCache. #8861

Open

reltuk wants to merge 8 commits into main

Conversation

@reltuk (Contributor) commented on Feb 13, 2025:

When DoltChunkStore was implemented, it needed to do write buffering in order to read its own writes and to flush table files to the upstream. At some point, read caching and caching of HasMany results were added onto the write buffer, creating confusion and despair. This change separates the use cases back out.

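For readers skimming the diff, here is a rough sketch of the resulting split, reconstructed from the excerpts quoted in the review comments below; the method sets and the GetAllForWrite return type are approximations, not the exact merged code.

```go
package remotestorage

import (
	"github.com/dolthub/dolt/go/store/hash"
	"github.com/dolthub/dolt/go/store/nbs"
)

// Read-side cache: holds chunks and HasMany results observed while
// talking to the remote.
type ChunkCache interface {
	// Put puts a slice of chunks into the cache.
	Put(c []nbs.ToChunker) error
	// InsertHas records hashes known to be present upstream
	// (hypothetical name standing in for the HasMany-result caching).
	InsertHas(h hash.HashSet)
}

// Write-side buffer: holds chunks written through the ChunkStore until
// they are flushed to the upstream as table files.
type WriteBuffer interface {
	Put(nbs.CompressedChunk) error
	GetAllForWrite() map[hash.Hash]nbs.CompressedChunk
	WriteCompleted(success bool)
	AddPendingChunks(h hash.HashSet, res map[hash.Hash]nbs.ToChunker)
	RemovePresentChunks(h hash.HashSet)
}
```
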
@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result  total
5eea56b   ok      5937457

version   total_tests
5eea56b   5937457

correctness_percentage: 100.0

@reltuk marked this pull request as ready for review February 14, 2025 20:33
@reltuk changed the title from "go: remotestorage: chunk_store.go: Clean up ChunkCache." to "[no-release-notes] go: remotestorage: chunk_store.go: Clean up ChunkCache." Feb 14, 2025
@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result  total
449318a   ok      5937457

version   total_tests
449318a   5937457

correctness_percentage: 100.0

@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result  total
fc10069   ok      5937457

version   total_tests
fc10069   5937457

correctness_percentage: 100.0

@macneale4 self-requested a review February 19, 2025 22:21
@max-hoffman (Contributor) left a comment:

LGTM, the interfaces seem semantically weird but the in-place separation is nice

type ChunkCache interface {
	// Put puts a slice of chunks into the cache. Error returned if the cache capacity has been exceeded.
	Put(c []nbs.ToChunker) error
	// Insert some observed / fetched chunks into the cached. These
Contributor:

Suggested change:
- // Insert some observed / fetched chunks into the cached. These
+ // Insert some observed / fetched chunks into the cache. These

	has, err := lru.New2Q[hash.Hash, struct{}](maxHasCapacity)
	if err != nil {
		panic(err)
	}
	return &mapChunkCache{
		&sync.Mutex{},
Contributor:

we don't seem to be using the mutex in any of the methods below, intentional?

Contributor (Author):

Good catch! It's no longer necessary...
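
For context, the 2Q cache from hashicorp/golang-lru is safe for concurrent use on its own, which is presumably why the wrapping mutex became unnecessary. A minimal sketch of the simplified construction, with an illustrative capacity constant and field name:

```go
package remotestorage

import (
	lru "github.com/hashicorp/golang-lru/v2"

	"github.com/dolthub/dolt/go/store/hash"
)

const maxHasCapacity = 100_000 // illustrative value, not the real constant

// Illustrative shape only: lru.TwoQueueCache does its own locking, so
// no separate sync.Mutex is needed around it.
type mapChunkCache struct {
	has *lru.TwoQueueCache[hash.Hash, struct{}]
}

func newMapChunkCache() *mapChunkCache {
	has, err := lru.New2Q[hash.Hash, struct{}](maxHasCapacity)
	if err != nil {
		panic(err)
	}
	return &mapChunkCache{has: has}
}
```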

@macneale4 (Contributor) left a comment:

I'm pretty sure every comment I made was to clarify code comments after I read the code. So feel free to ship after you pick and choose what suggestions you care about!

)

type WriteBuffer interface {
Put(nbs.CompressedChunk) error
Contributor:

Can you add a comment to the effect of "Using CompressedChunks rather than ToChunker because these objects are only destined for Noms table files"

Alternatively, everything you are putting in the cache was a Chunk immediately before putting it in the cache. Are you trying to save memory by doing it up front?

What it really comes down to is that it's a little odd to have the AddPendingChunks method take a ToChunker. I can see why each method makes sense in its own context. Anyway, I think a clarifying comment is warranted here about why you use one in one case and not in the other.

Contributor (Author):

I added a comment about what |Put| does. |Put|'s raison d'être is to implement |GetAllForWrite|, which returns compressed chunks. The reality is that this interface is for writing to remotestorage remotes... it is only used by the implementation of DoltChunkStore, and it's just working with data structures that make sense for it. There's no reason to use ToChunker in the Put here... it needs things it can actually write to storage files, and we can't write ToChunkers to storage files. (An argument could be made that it could just take a Chunk. That seems fine to me, but it further obscures, for a reader, the relationship between Put and GetAllForWrite. Certainly storing them as compressed chunks has benefits compared to storing them uncompressed: memory utilization of the cache itself, CPU overhead if you have to GetAllForWrite multiple times, etc.)
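
A hedged sketch of the shape being described here, assuming the buffer is just a mutex-guarded map keyed by the chunk's Hash() (the real mapWriteBuffer also has to coordinate with in-flight flushes, per the GetAllForWrite discussion below):

```go
package remotestorage

import (
	"sync"

	"github.com/dolthub/dolt/go/store/hash"
	"github.com/dolthub/dolt/go/store/nbs"
)

// Illustrative only: Put stores chunks already in their compressed,
// table-file-ready form, so GetAllForWrite can hand them straight to
// the upload path without re-compressing.
type mapWriteBuffer struct {
	mu     sync.Mutex
	chunks map[hash.Hash]nbs.CompressedChunk
}

func (b *mapWriteBuffer) Put(cc nbs.CompressedChunk) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.chunks[cc.Hash()] = cc
	return nil
}
```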

	// Returns the current set of written chunks. After this
	// returns, concurrent calls to other methods may block until
	// |WriteCompleted| is called. Calls to |GetAllForWrite| must
	// be bracketed by a call to |WriteCompleted|.
Contributor:

"bracketed" is confusing. Just say calls to GetAllForWrite must be followed by a call to WriteCompleted

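For example, under that contract a flush pairs the two calls roughly like this (a sketch only: flushToUpstream is a hypothetical helper standing in for the real table-file upload, and the WriteBuffer value would come from the DoltChunkStore):

```go
package remotestorage

import "context"

// Sketch of the GetAllForWrite / WriteCompleted pairing; flushToUpstream
// is a hypothetical stand-in for the real table-file upload.
func flushWrites(ctx context.Context, wb WriteBuffer) error {
	chunks := wb.GetAllForWrite()
	success := false
	// Every GetAllForWrite must be followed by a WriteCompleted call,
	// reporting whether the write actually reached the upstream.
	defer func() {
		wb.WriteCompleted(success)
	}()
	if err := flushToUpstream(ctx, chunks); err != nil {
		return err
	}
	success = true
	return nil
}
```
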
}
}

func (b *mapWriteBuffer) RemovePresentChunks(absent hash.HashSet) {
Contributor:

absent is an odd name for this argument. maybe just hashes

// Called after a call to |GetAllForWrite|, this records
// success or failure of the write operation. If the write
// operation was successful, then the written chunks are now
// in the upstream, and so they can be cleared. Otherwise, the
Contributor:

Suggested change:
- // in the upstream, and so they can be cleared. Otherwise, the
+ // in the upstream and they are cleared from the buffer. Otherwise, the

	assert.Panics(t, func() {
		cache.WriteCompleted(false)
	})
	cache.AddPendingChunks(make(hash.HashSet), make(map[hash.Hash]nbs.ToChunker))
Contributor:

I don't really think these last two calls add anything to the test.

WriteCompleted(success bool)

// ChunkStore clients expect to read their own writes before a commit.
// On the get path, remotestorage should add pending chunks to its result
Contributor:

Not clear initially that res is the result set. Would be clearer with:

Suggested change:
- // On the get path, remotestorage should add pending chunks to its result
+ // On the get path, remotestorage should add pending chunks to |res|

// set. On the HasMany path, remotestorage should remove present chunks
// from its absent set on the HasMany response.
AddPendingChunks(h hash.HashSet, res map[hash.Hash]nbs.ToChunker)
RemovePresentChunks(h hash.HashSet)
Contributor:

Add a comment indicating that RemovePresentChunks updates its input set in place, and that whatever remains in the set after the call is what was not removed.
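
A hedged sketch of the behavior that comment would document, continuing the illustrative mapWriteBuffer from the earlier sketch: the caller's set is mutated in place, and whatever survives the call is still absent from both the remote and the buffer.

```go
// Illustrative only: drop from |hashes| anything the buffer currently
// holds as a pending write. The input set is modified in place; hashes
// left in it after the call were not removed.
func (b *mapWriteBuffer) RemovePresentChunks(hashes hash.HashSet) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for h := range hashes {
		if _, ok := b.chunks[h]; ok {
			hashes.Remove(h)
		}
	}
}
```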

@@ -882,6 +864,9 @@ func (dcs *DoltChunkStore) Commit(ctx context.Context, current, last hash.Hash)
return false, NewRpcError(err, "Commit", dcs.host, req)
}

// We only delete the chunks that we wrote to the remote from
// our write buffer if our commit was successful.
success = resp.Success
Contributor:

A deceptive pattern. Maybe when you declare the variable, state that it's required by the deferred call? Without looking at the context carefully, I'd think success = resp.Success is a no-op.

@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result  total
7dadc7c   ok      5937457

version   total_tests
7dadc7c   5937457

correctness_percentage: 100.0

@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result  total
d586d2a   ok      5937457

version   total_tests
d586d2a   5937457

correctness_percentage: 100.0
