Periodically clean up cached bundles directory #5976
base: master
✅ Deploy Preview for oasisprotocol-oasis-core canceled.
Force-pushed from 8682e72 to 6e5f668.
Force-pushed from 9ef3941 to 0869adc.
Force-pushed from c8ded6d to f3f52e3.
Didn't dive too deeply into the overall logic, but it looks good based on an initial look! I left a couple of minor comments on the code.
Force-pushed from f3f52e3 to 4ed1a32.
I think we will either have to 1. stop copying bundles configured via the legacy path, or 2. block at init time for the cleanup. With the current design, even if you remove the bundle from the config (bundle path), it was previously copied as part of … Unless we do cleanup before that (we don't, as we would block further with cleanup?), you cannot know after that (…
Update: this actually has a further implication: I have confirmed rn the …
Update of update: …
Force-pushed from dbddfa5 to 8bfe9f6.
Force-pushed from 64a7aaa to 622d0bd.
I can confirm, though, that if I delete bundles too early (e.g. when I receive a new runtime descriptor), as was the case here, the runtime was actually suspended, so the test failed. This is good.
Update: …
Force-pushed from 9cc236e to 3d4b25d.
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@            Coverage Diff            @@
##           master    #5976   +/-   ##
========================================
  Coverage   64.80%   64.80%
========================================
  Files         632      632
  Lines       64864    64995     +131
========================================
+ Hits        42033    42123      +90
- Misses      17870    17907      +37
- Partials     4961     4965       +4
Force-pushed from 3d4b25d to cba543b.
go/runtime/registry/registry.go (outdated)
r.RLock()
rt := r.activeDescriptor
r.RUnlock()
if rt != nil {
Happy path first, so flatten the code.
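A minimal sketch of the suggested "happy path first" shape, assuming a simplified registry: the descriptor is snapshotted under the read lock, and the nil case returns early instead of wrapping the rest of the body in `if rt != nil { … }`. The `registry`, `descriptor`, and `cleanStale` names are hypothetical stand-ins, not the actual oasis-core types.

```go
package main

import (
	"fmt"
	"sync"
)

// descriptor is a hypothetical stand-in for the active runtime descriptor.
type descriptor struct {
	Version uint64
}

// registry is a hypothetical, simplified runtime registry.
type registry struct {
	sync.RWMutex
	activeDescriptor *descriptor
}

// cleanStale returns the active version to clean against, or false when there
// is nothing to do yet. The nil check exits early, so the main logic is not
// nested inside an if block.
func (r *registry) cleanStale() (uint64, bool) {
	// Snapshot the descriptor under the read lock, then release it.
	r.RLock()
	rt := r.activeDescriptor
	r.RUnlock()

	if rt == nil {
		// No active descriptor yet: nothing to clean.
		return 0, false
	}

	// The happy path continues at the top indentation level.
	return rt.Version, true
}

func main() {
	r := &registry{}
	fmt.Println(r.cleanStale())
	r.activeDescriptor = &descriptor{Version: 3}
	fmt.Println(r.cleanStale())
}
```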
go/runtime/bundle/registry.go (outdated)
return
default:
if v.Less(active) {
r.logger.Info("Removing bundle with version lower then active",
I would first log that we are removing all bundles with a version lower than active, and then, in the for loop, log every bundle that is removed. This way, you can see that we tried to remove bundles even when nothing needed to be done.
I don't think this is desirable, since this function is called every time the epoch changes. I would prefer to log only when we do an actual removal? Anyway, let's see how this changes once it is rebased on top of the manager.
Epochs are every 1h. You can log almost whatever you want as Info. This way you can be sure that background tasks are triggered, even though they do nothing when nothing is upgraded.
go/runtime/bundle/registry.go (outdated)
}

if err := os.RemoveAll(explDir); err != nil {
r.logger.Error("failed to remove stale exploded (regular) bundle dir",
These parameters are strange: `runtime_version`, `ExplodedDataDir`. The first one is not consistent with `version` from above, and the other does not have the same format.
Force-pushed from cba543b to 5197ed3.
If you fetch the current epoch, read the active version for that epoch, and delete all previous versions, the bundles should not be deleted too early.
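The ordering suggested above (fetch the current epoch, resolve the version active at that epoch, then delete only strictly older versions) can be sketched like this. The `deployment` type with a `ValidFrom` epoch and the `activeAt`/`staleVersions` helpers are hypothetical simplifications, not the oasis-core registry API. The property that matters is that neither the active version nor a version scheduled for a future epoch is ever selected.

```go
package main

import (
	"fmt"
	"sort"
)

// deployment is a hypothetical simplification of a runtime deployment:
// Version becomes active at epoch ValidFrom.
type deployment struct {
	Version   uint64
	ValidFrom uint64
}

// activeAt returns the version active at epoch: the deployment with the
// largest ValidFrom that is not in the future.
func activeAt(deps []deployment, epoch uint64) (uint64, bool) {
	var (
		best  deployment
		found bool
	)
	for _, d := range deps {
		if d.ValidFrom <= epoch && (!found || d.ValidFrom > best.ValidFrom) {
			best, found = d, true
		}
	}
	return best.Version, found
}

// staleVersions returns the cached versions strictly lower than the version
// active at the given epoch, sorted ascending. The active version and
// upcoming versions are kept.
func staleVersions(cached []uint64, deps []deployment, epoch uint64) []uint64 {
	active, ok := activeAt(deps, epoch)
	if !ok {
		return nil // nothing active yet, so nothing is safe to delete
	}
	var stale []uint64
	for _, v := range cached {
		if v < active {
			stale = append(stale, v)
		}
	}
	sort.Slice(stale, func(i, j int) bool { return stale[i] < stale[j] })
	return stale
}

func main() {
	deps := []deployment{{Version: 1, ValidFrom: 0}, {Version: 2, ValidFrom: 10}, {Version: 3, ValidFrom: 20}}
	// At epoch 15 version 2 is active and version 3 is still upcoming.
	fmt.Println(staleVersions([]uint64{1, 2, 3}, deps, 15))
}
```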
This function is needed so that we don't have to expose the `bundle.manifestName` constant (an internal detail) for the e2e test.
Force-pushed from 5197ed3 to c405417.
// TODO should you also remove dirs for manifests successfully removed,
// even if error?
dirs, err := m.store.RemoveManifests(runtimeID, active)
TODO
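One possible answer to the TODO, sketched under assumptions: `RemoveManifests` returns the directories of the manifests it did remove even when it fails partway, and the caller removes those directories before propagating the error, so no manifest is dropped while its directory is left behind on disk. The `store` type, its `failOn` field (used only to simulate a failure), and the simplified signature without `runtimeID` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a hypothetical in-memory manifest store: version -> bundle dir.
// failOn simulates a removal failure for one version (0 means no failure).
type store struct {
	dirs   map[uint64]string
	failOn uint64
}

// RemoveManifests removes manifests with version < active, in ascending
// order. On failure it still returns the dirs of the manifests it already
// removed, so the caller can clean them up.
func (s *store) RemoveManifests(active uint64) ([]string, error) {
	versions := make([]uint64, 0, len(s.dirs))
	for v := range s.dirs {
		if v < active {
			versions = append(versions, v)
		}
	}
	sort.Slice(versions, func(i, j int) bool { return versions[i] < versions[j] })

	var removed []string
	for _, v := range versions {
		if v == s.failOn {
			return removed, fmt.Errorf("failed to remove manifest for version %d", v)
		}
		removed = append(removed, s.dirs[v])
		delete(s.dirs, v)
	}
	return removed, nil
}

// cleanStale removes the dirs of every manifest that was removed, even when
// RemoveManifests failed partway, and only then reports the error.
func cleanStale(s *store, active uint64, removeDir func(dir string)) error {
	dirs, err := s.RemoveManifests(active)
	for _, dir := range dirs {
		removeDir(dir)
	}
	return err
}

func main() {
	s := &store{dirs: map[uint64]string{1: "d1", 2: "d2", 3: "d3"}, failOn: 2}
	err := cleanStale(s, 3, func(dir string) { fmt.Println("removing", dir) })
	fmt.Println("error:", err)
}
```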
go/runtime/bundle/manager.go (outdated)
for _, dir := range dirs {
m.removeBundle(dir)
}
There is potential for a race condition here inside `cleanStaleBundles`:
Given that `bundle.Registry` is accessible inside `runtime.Registry`, there can be another thread calling `AddManifests`. We don't do it rn, but the code allows it, so someone may extend it later?
In case of an unfortunate interleaving you may**:
- Explode bundles first.
- Delete manifest hashes (`m.store.RemoveManifests`), part of `cleanStaleBundles`, so the discovery thread.
- Add manifest hashes (`AddManifest`), e.g. a call inside the runtime registry thread.
- Remove bundles via `m.removeBundleDir`, part of `cleanStaleBundles`, so the discovery thread.
Effectively, you deleted bundles your registry believes it has.
** (note this is not the scenario at init)
- In theory this should not happen, as you should not be adding bundles with a version lower than active?
- Maybe we want to add a comment, or even make `store` more defensive and reject bundles lower than active?
- What are the alternatives? We could lock/make this atomic, but then this has to become part of `bundle.registry`, meaning this would break the abstraction we wanted: `discovery` is responsible for exploded directories (on-disk representation) plus fetching and cleaning, whilst the registry holds the in-memory parts...
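The two mitigations floated above can be sketched together, under assumptions: a hypothetical `manifestStore` (not the actual `bundle` store) that rejects manifests older than the active version and removes stale manifests and their directories under one lock, so `AddManifest` cannot interleave between the two steps.

```go
package main

import (
	"fmt"
	"sync"
)

// manifestStore is a hypothetical store combining both mitigations: it
// rejects manifests older than the active version, and it removes manifests
// and their directories under a single lock.
type manifestStore struct {
	mu     sync.Mutex
	active uint64            // highest active version seen by cleanup
	dirs   map[uint64]string // version -> exploded bundle dir
}

func newManifestStore() *manifestStore {
	return &manifestStore{dirs: make(map[uint64]string)}
}

// AddManifest registers a bundle, refusing versions lower than active.
func (s *manifestStore) AddManifest(version uint64, dir string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if version < s.active {
		return fmt.Errorf("version %d is lower than active version %d", version, s.active)
	}
	s.dirs[version] = dir
	return nil
}

// CleanStale removes all manifests below active and calls removeDir for each
// of their directories while still holding the lock.
func (s *manifestStore) CleanStale(active uint64, removeDir func(dir string)) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if active > s.active {
		s.active = active
	}
	for v, dir := range s.dirs {
		if v < s.active {
			delete(s.dirs, v)
			removeDir(dir)
		}
	}
}

func main() {
	s := newManifestStore()
	_ = s.AddManifest(1, "d1")
	_ = s.AddManifest(2, "d2")
	s.CleanStale(2, func(dir string) { fmt.Println("removing", dir) })
	fmt.Println(s.AddManifest(1, "d1")) // rejected: lower than active
}
```

This makes the trade-off from the last bullet concrete: the lock closes the window, but on-disk cleanup moves into the store (the abstraction concern), and `AddManifest` blocks for as long as the directory removals take.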
Force-pushed from c405417 to 3b4eb79.
Correct. I was just confirming that my e2e test fails when clean-up is not implemented, as you wrote above. This was happening here: #5976 (comment), when I initially misunderstood the registry updates and thus deleted things too early. :)
Freshly rebased on top of #6003. Should be ready for a second round of reviews. :)
What
… Furthermore, we fix the current bug (-> done as part of go/runtime/bundle: Cleanup bundles on startup #6003, master) of not being able to remove a runtime from the configuration (see Periodically clean up cached bundles directory #5976 (comment)).
Why
Save on disk usage / ease maintenance.
How
Regular and detached exploded bundles that are no longer present in the config are removed during discovery startup. This way we are not blocking initialization (-> done here: go/runtime/bundle: Cleanup bundles on startup #6003).
How to test
e2e