
[no-release-notes] go: remotestorage: chunk_store.go: Clean up ChunkCache. #8861

Open

reltuk wants to merge 8 commits into main

Conversation

@reltuk (Contributor) commented on Feb 13, 2025:

When DoltChunkStore was implemented, it needed to do write buffering in order to read its own writes and to flush table files to the upstream. At some point, read caching and caching of HasMany results were added onto the write buffer, creating confusion and despair. This change separates the use cases back out.

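For readers skimming the diff, here is a rough sketch of the resulting split, reconstructed from the excerpts quoted in the review comments below; the method sets and the GetAllForWrite return type are approximations, not the exact merged code.

```go
package remotestorage

import (
	"github.com/dolthub/dolt/go/store/hash"
	"github.com/dolthub/dolt/go/store/nbs"
)

// Read-side cache: holds chunks and HasMany results observed while
// talking to the remote.
type ChunkCache interface {
	// Put puts a slice of chunks into the cache.
	Put(c []nbs.ToChunker) error
	// InsertHas records hashes known to be present upstream
	// (hypothetical name standing in for the HasMany-result caching).
	InsertHas(h hash.HashSet)
}

// Write-side buffer: holds chunks written through the ChunkStore until
// they are flushed to the upstream as table files.
type WriteBuffer interface {
	Put(nbs.CompressedChunk) error
	GetAllForWrite() map[hash.Hash]nbs.CompressedChunk
	WriteCompleted(success bool)
	AddPendingChunks(h hash.HashSet, res map[hash.Hash]nbs.ToChunker)
	RemovePresentChunks(h hash.HashSet)
}
```
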
@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result  total
5eea56b   ok      5937457

version   total_tests
5eea56b   5937457

correctness_percentage: 100.0

@reltuk marked this pull request as ready for review February 14, 2025 20:33
@reltuk changed the title from "go: remotestorage: chunk_store.go: Clean up ChunkCache." to "[no-release-notes] go: remotestorage: chunk_store.go: Clean up ChunkCache." Feb 14, 2025
@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result  total
449318a   ok      5937457

version   total_tests
449318a   5937457

correctness_percentage: 100.0

@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result  total
fc10069   ok      5937457

version   total_tests
fc10069   5937457

correctness_percentage: 100.0

@macneale4 self-requested a review February 19, 2025 22:21
@max-hoffman (Contributor) left a comment:

LGTM, the interfaces seem semantically weird but the in-place separation is nice

type ChunkCache interface {
	// Put puts a slice of chunks into the cache. Error returned if the cache capacity has been exceeded.
	Put(c []nbs.ToChunker) error
	// Insert some observed / fetched chunks into the cached. These
Contributor:

Suggested change:
- // Insert some observed / fetched chunks into the cached. These
+ // Insert some observed / fetched chunks into the cache. These

	has, err := lru.New2Q[hash.Hash, struct{}](maxHasCapacity)
	if err != nil {
		panic(err)
	}
	return &mapChunkCache{
		&sync.Mutex{},
Contributor:

we don't seem to be using the mutex in any of the methods below, intentional?

Contributor (Author):

Good catch! It's no longer necessary...
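
For context, the 2Q cache from hashicorp/golang-lru is safe for concurrent use on its own, which is presumably why the wrapping mutex became unnecessary. A minimal sketch of the simplified construction, with an illustrative capacity constant and field name:

```go
package remotestorage

import (
	lru "github.com/hashicorp/golang-lru/v2"

	"github.com/dolthub/dolt/go/store/hash"
)

const maxHasCapacity = 100_000 // illustrative value, not the real constant

// Illustrative shape only: lru.TwoQueueCache does its own locking, so
// no separate sync.Mutex is needed around it.
type mapChunkCache struct {
	has *lru.TwoQueueCache[hash.Hash, struct{}]
}

func newMapChunkCache() *mapChunkCache {
	has, err := lru.New2Q[hash.Hash, struct{}](maxHasCapacity)
	if err != nil {
		panic(err)
	}
	return &mapChunkCache{has: has}
}
```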

@macneale4 (Contributor) left a comment:

I'm pretty sure every comment I made was to clarify code comments after I read the code. So feel free to ship after you pick and choose what suggestions you care about!

)

type WriteBuffer interface {
Put(nbs.CompressedChunk) error
Contributor:

Can you add a comment to the effect of "Using CompressedChunks rather than ToChunker because these objects are only destined for Noms table files"

Alternatively, everything you are putting in the cache was a Chunk immediately before putting it in the cache. Are you trying to save memory by doing it up front?

What it really comes down to is that it's a little odd to have the AddPendingChunks method take a ToChunker. I can see why each method makes sense in its own context. Anyway, I think a clarifying comment is warranted here about why you use one in one case and not in the other.

Contributor (Author):

I added a comment about what |Put| does. |Put|'s raison d'être is to implement |GetAllForWrite|, which returns compressed chunks. The reality is that this interface is for writing to remotestorage remotes... it is only used by the implementation of DoltChunkStore, and it's just working with data structures that make sense for it. There's no reason to use ToChunker in the Put here... it needs things it can actually write to storage files, and we can't write ToChunkers to storage files. (An argument could be made that it could just take a Chunk. That seems fine to me, but it further obscures, for a reader, the relationship between Put and GetAllForWrite. Certainly storing them as compressed chunks has benefits compared to storing them uncompressed: memory utilization of the cache itself, CPU overhead if you have to GetAllForWrite multiple times, etc.)
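
A hedged sketch of the shape being described here, assuming the buffer is just a mutex-guarded map keyed by the chunk's Hash() (the real mapWriteBuffer also has to coordinate with in-flight flushes, per the GetAllForWrite discussion below):

```go
package remotestorage

import (
	"sync"

	"github.com/dolthub/dolt/go/store/hash"
	"github.com/dolthub/dolt/go/store/nbs"
)

// Illustrative only: Put stores chunks already in their compressed,
// table-file-ready form, so GetAllForWrite can hand them straight to
// the upload path without re-compressing.
type mapWriteBuffer struct {
	mu     sync.Mutex
	chunks map[hash.Hash]nbs.CompressedChunk
}

func (b *mapWriteBuffer) Put(cc nbs.CompressedChunk) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.chunks[cc.Hash()] = cc
	return nil
}
```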

	// Returns the current set of written chunks. After this
	// returns, concurrent calls to other methods may block until
	// |WriteCompleted| is called. Calls to |GetAllForWrite| must
	// be bracketed by a call to |WriteCompleted|.
Contributor:

"bracketed" is confusing. Just say calls to GetAllForWrite must be followed by a call to WriteCompleted

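For example, under that contract a flush pairs the two calls roughly like this (a sketch only: flushToUpstream is a hypothetical helper standing in for the real table-file upload, and the WriteBuffer value would come from the DoltChunkStore):

```go
package remotestorage

import "context"

// Sketch of the GetAllForWrite / WriteCompleted pairing; flushToUpstream
// is a hypothetical stand-in for the real table-file upload.
func flushWrites(ctx context.Context, wb WriteBuffer) error {
	chunks := wb.GetAllForWrite()
	success := false
	// Every GetAllForWrite must be followed by a WriteCompleted call,
	// reporting whether the write actually reached the upstream.
	defer func() {
		wb.WriteCompleted(success)
	}()
	if err := flushToUpstream(ctx, chunks); err != nil {
		return err
	}
	success = true
	return nil
}
```
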
}
}

func (b *mapWriteBuffer) RemovePresentChunks(absent hash.HashSet) {
Contributor:

absent is an odd name for this argument. maybe just hashes

// Called after a call to |GetAllForWrite|, this records
// success or failure of the write operation. If the write
// operation was successful, then the written chunks are now
// in the upstream, and so they can be cleared. Otherwise, the
Contributor:

Suggested change:
- // in the upstream, and so they can be cleared. Otherwise, the
+ // in the upstream and they are cleared from the buffer. Otherwise, the

	assert.Panics(t, func() {
		cache.WriteCompleted(false)
	})
	cache.AddPendingChunks(make(hash.HashSet), make(map[hash.Hash]nbs.ToChunker))
Contributor:

I don't really think these last two calls add anything to the test.

WriteCompleted(success bool)

// ChunkStore clients expect to read their own writes before a commit.
// On the get path, remotestorage should add pending chunks to its result
Contributor:

Not clear initially that res is the result set. Would be clearer with:

Suggested change:
- // On the get path, remotestorage should add pending chunks to its result
+ // On the get path, remotestorage should add pending chunks to |res|

// set. On the HasMany path, remotestorage should remove present chunks
// from its absent set on the HasMany response.
AddPendingChunks(h hash.HashSet, res map[hash.Hash]nbs.ToChunker)
RemovePresentChunks(h hash.HashSet)
Contributor:

Add a comment indicating that RemovePresentChunks updates its input set in place, and that whatever remains in the set after the call is what was not removed.
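
A hedged sketch of the behavior that comment would document, continuing the illustrative mapWriteBuffer from the earlier sketch: the caller's set is mutated in place, and whatever survives the call is still absent from both the remote and the buffer.

```go
// Illustrative only: drop from |hashes| anything the buffer currently
// holds as a pending write. The input set is modified in place; hashes
// left in it after the call were not removed.
func (b *mapWriteBuffer) RemovePresentChunks(hashes hash.HashSet) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for h := range hashes {
		if _, ok := b.chunks[h]; ok {
			hashes.Remove(h)
		}
	}
}
```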

@@ -882,6 +864,9 @@ func (dcs *DoltChunkStore) Commit(ctx context.Context, current, last hash.Hash)
return false, NewRpcError(err, "Commit", dcs.host, req)
}

// We only delete the chunks that we wrote to the remote from
// our write buffer if our commit was successful.
success = resp.Success
Contributor:

A deceptive pattern. Maybe when you declare the variable, state that it's required by the deferred call? Without looking at the context carefully, I'd think success = resp.Success is a no-op.

@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result  total
7dadc7c   ok      5937457

version   total_tests
7dadc7c   5937457

correctness_percentage: 100.0

@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result  total
d586d2a   ok      5937457

version   total_tests
d586d2a   5937457

correctness_percentage: 100.0
