[Feature Request] Time-based Refresh #261
That's an interesting use case, thanks for the suggestion. I think the only quirk to figure out will be how to evolve the code-generated entry types. We try to keep the number of fields to the minimum the configuration requires, to reduce memory overhead. Basically the issues are:
If that all works out, we can brainstorm the |
I found an idea that can partly resolve my problem: https://cache2k.org/docs/latest/user-guide.html#refresh-ahead |
but this will only refresh once if there is no query after the refresh period, which means it only doubles the expiry time... if |
Yes, that might work. If I understand correctly, it uses a I think we can add this feature fairly easily, but I don't know when I'll have the time to work on it. So I think in the short-term you should choose whatever is best for you and not wait on me. |
@denghongcai just saw the discussion here and the reference to cache2k. Some remarks: I see expiry and refresh as two sides of the same coin. A value is considered fresh (enough) for some period of time; after that it expires or gets refreshed. Historically, in cache2k, the expiry policy was named refresh policy, because it's very common that we use the automatic refresh in our applications. Besides, expiry and eviction are quite negative words; refresh is more positive. A typical way to deal with the cache invalidation problem is to set a low To understand my point better, maybe I need to note that I think of the term "expiry" differently than it may be used in Guava, Caffeine or EHCache. In these caches, historically, expiry means "eviction based on time". In my usage, expiry just means the value is not valid any more; whether the entry gets evicted at all is a separate thing. In cache2k an entry may expire but still be held by the cache, for example to reuse the data for a conditional reload like "if-modified-since". IMHO, with the additional features in caches, it's more crucial to keep the naming and semantics separate. That's my take on it. @denghongcai wrote:
Right now the refresh-ahead feature in cache2k only refreshes automatically once. If no access happens after the second refresh, the value expires after the expiry duration. That's a heuristic that turns out to work well for most use cases, but not for all. I have been thinking for quite a long time about how to improve that; see some first discussion here: cache2k/cache2k#34. Please give some more hints about your goals. How often, or for how long, should a value be refreshed without being accessed? @ben-manes wrote:
No, entries disappear if unused. See above. @ben-manes wrote:
Here is a major difference. In cache2k the refresh is always triggered at the expiry time, to have a fresh value when next used. There is only a small time gap, which is the load time, when stale data might be served. In Caffeine, if you set the refresh time to the CDN max age and a higher expire time, the cache may return stale data that is much older. To avoid this, you would set the refresh interval to a smaller value than max age and the expiry to max age. Caffeine does not refresh entries which have no activity, to avoid unnecessary system load. That's a fair reason. I decided not to do that with the big picture in mind. The basic concept of a cache is that it contains entries that are used often. If the system load caused by refreshing becomes a problem, the cache size could be reduced. If a lot of entries should be cached, but only a select few hot entries should be refreshed, a smaller cache with refreshing enabled could be stacked on a huge cache. The real problem I see is predictable behavior. I don't think that users really do an in-depth study of the semantics. What is probably happening is that the refresh time in Caffeine is configured as the time span the value is considered fresh enough. In the CDN case this means it is set identical to max-age. But the refresh time parameter gives no guarantee at all: the age of the returned data can be rather arbitrary and depends on activity. On the other hand, triggering the refresh upon activity has the advantage of adding more randomness. With things always happening at constant intervals, it's more likely that resource spikes occur. |
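To make the staleness difference concrete, here is a small deterministic model. It is a sketch under strong assumptions, not Caffeine or cache2k code: loads complete instantly, time is an abstract tick counter, and the refresh-at-expiry variant assumes the entry keeps being reloaded every refresh ticks starting at tick 0 (i.e. it stays hot). The class and method names are made up for illustration.

```java
import java.util.List;

// Hypothetical model, not Caffeine or cache2k code. Assumes loads complete
// instantly and time is measured in abstract ticks.
class RefreshAgeModel {

    // Refresh-on-access: a read older than `refresh` still returns the old value
    // and only then triggers a reload; an entry older than `expire` is reloaded synchronously.
    static long[] refreshOnAccess(List<Long> accessTimes, long refresh, long expire) {
        long[] ages = new long[accessTimes.size()];
        Long writeTime = null; // null = nothing cached yet
        for (int i = 0; i < accessTimes.size(); i++) {
            long t = accessTimes.get(i);
            if (writeTime == null || t - writeTime >= expire) {
                writeTime = t;            // miss or expired: caller waits for a fresh load
                ages[i] = 0;
            } else {
                ages[i] = t - writeTime;  // served as-is, possibly far older than `refresh`
                if (ages[i] >= refresh) {
                    writeTime = t;        // background reload, visible to later reads
                }
            }
        }
        return ages;
    }

    // Refresh-at-expiry: assumes reloads at ticks 0, refresh, 2*refresh, ...
    // regardless of reads, so a read never sees data older than `refresh`.
    static long[] refreshAtExpiry(List<Long> accessTimes, long refresh) {
        long[] ages = new long[accessTimes.size()];
        for (int i = 0; i < accessTimes.size(); i++) {
            ages[i] = accessTimes.get(i) % refresh;
        }
        return ages;
    }
}
```

With reads at ticks 0, 30 and 590, refresh = 60 (the max-age) and expire = 600, refreshOnAccess reports ages 0, 30 and 590, while refreshAtExpiry reports 0, 30 and 50. The model ignores load time and the system load that constant reloading costs, which is exactly the trade-off discussed above.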
@cruftex Thanks for your kind remarks. In my scenario, my cache has to behave like a CDN. It means a cache entry will be specified a The problem I have met is that, when system QPS is high and requests are getting the same cache entry, if the cache entry has just expired at that moment, all requests stall on the cache loading operation at the same time and, after loading finishes, resume processing at the same time. Such a situation leads to higher CPU usage and increases JVM memory pressure or causes a full GC. So I want a mechanism to ensure the system has predictable performance. get -> entry which satisfied Only when an entry has not been accessed for a long time (the expire time) should it be removed from the cache. |
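One way to avoid the stall described above is "stale-while-revalidate": once a value exists, readers keep getting it while a single background reload runs. The sketch below is hypothetical (not a Caffeine or cache2k API); it uses ConcurrentHashMap.computeIfAbsent so a cold miss starts only one load that all concurrent callers share, and a per-entry AtomicBoolean so only one reload is scheduled per stale value. Eviction and expiry are left out to keep it short.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch: reads never wait for a reload once a value exists.
// Only the first load per key is waited on, and it is shared by all callers.
class StaleWhileRevalidateCache<K, V> {

    private static final class Holder<V> {
        final CompletableFuture<V> future;                 // the load, possibly still running
        final long loadedAt;                               // clock value when the load started
        final AtomicBoolean refreshing = new AtomicBoolean();

        Holder(CompletableFuture<V> future, long loadedAt) {
            this.future = future;
            this.loadedAt = loadedAt;
        }
    }

    private final ConcurrentHashMap<K, Holder<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final Executor executor;
    private final LongSupplier clock;
    private final long refreshAfter;

    StaleWhileRevalidateCache(Function<K, V> loader, Executor executor,
                              LongSupplier clock, long refreshAfter) {
        this.loader = loader;
        this.executor = executor;
        this.clock = clock;
        this.refreshAfter = refreshAfter;
    }

    V get(K key) {
        // computeIfAbsent creates at most one Holder per key, so a cold miss
        // starts one load and every concurrent caller joins the same future.
        Holder<V> h = map.computeIfAbsent(key, k -> new Holder<>(
                CompletableFuture.supplyAsync(() -> loader.apply(k), executor),
                clock.getAsLong()));
        V value;
        try {
            value = h.future.join();
        } catch (CompletionException e) {
            map.remove(key, h);   // don't keep a failed load around
            throw e;
        }
        // Stale: exactly one caller wins the CAS and schedules a reload;
        // everyone (including that caller) still gets the old value right away.
        if (clock.getAsLong() - h.loadedAt >= refreshAfter
                && h.refreshing.compareAndSet(false, true)) {
            CompletableFuture.supplyAsync(() -> loader.apply(key), executor)
                    .whenComplete((fresh, error) -> {
                        if (error != null) {
                            h.refreshing.set(false);  // let a later read retry
                        } else {
                            map.replace(key, h, new Holder<>(
                                    CompletableFuture.completedFuture(fresh), clock.getAsLong()));
                        }
                    });
        }
        return value;
    }
}
```

With a direct executor (Runnable::run) and a manual clock, the read that notices staleness still returns the old value, and the reloaded value shows up on the next read. With a real thread pool, its size also bounds how many reloads run at once.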
Thanks Jens, it’s interesting to hear how you approached things and where my understanding of how cache2k’s mechanisms work is incorrect. |
@denghongcai, @ben-manes thanks for the interesting discussion. @denghongcai
Is that correct? That's interesting, since my initial idea of the shortcoming of cache2k's approach was exactly the opposite: it might be necessary to refresh more than once even if there is no activity. About your CDN use case: there is a lot of content that uses versioned URLs and high Solutions I can think of, a little bit more concretely:
Variant 2 gives a lot of flexibility to adapt to specific needs. After some more thought, I think it's reasonable to make the L1 cache not much smaller than L2, or even the same size. Since both caches store references to the same data, the only overhead is the internal data structures of the cache. So the decision on the L1 cache sizing is the trade-off between the additional overhead and the reload from L2 in case the entry gets evicted from L1. BTW: We use cache2k a lot for proxies, that's why there is the BTW 2: Variant 2 also makes sense because it might be useful to implement L2 via a distributed cache in your CDN use case. Looking at the standard refresh-ahead behavior of Caffeine and cache2k again, my takeaway is that Caffeine's behavior could lead to unwanted stale data reads, while cache2k's behavior leads to unnecessary resource usage, which could be quite massive if not limited via the thread pool. Let's see, maybe I'll come up with some better general semantics here. |
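A minimal, single-threaded sketch of the stacking idea: a small LRU L1 in front of a larger L2, where an L1 miss or eviction is served from L2 rather than from the origin. The L2 is modeled as a plain Function standing in for a big local cache or a distributed one; the class name and the l2Reads counter are hypothetical, and refreshing inside L1 is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, not thread-safe: a small LRU L1 stacked on a larger L2.
class TwoLevelCache<K, V> {
    private final Map<K, V> l1;
    private final Function<K, V> l2;  // stand-in for a large local or distributed cache
    int l2Reads = 0;                  // how often L1 had to fall back to L2

    TwoLevelCache(int l1Capacity, Function<K, V> l2) {
        this.l2 = l2;
        // accessOrder = true plus removeEldestEntry yields a simple LRU bound on L1.
        this.l1 = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > l1Capacity;
            }
        };
    }

    V get(K key) {
        V value = l1.get(key);
        if (value == null) {
            l2Reads++;
            value = l2.apply(key);   // miss or evicted from L1: reload from L2, not the origin
            l1.put(key, value);      // L1 holds a reference to the same object as L2
        }
        return value;
    }
}
```

Reading a, b, a, c with an L1 capacity of 2 evicts b (the least recently used), so the next read of b goes back to L2. A larger L1 only costs map overhead; a smaller one costs more L2 reads.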
@cruftex Thanks for your advice. I'm currently using the L1/L2 approach, just like a CDN's L1/L2. The L1 cache size is much smaller than L2, and use Cache2K
|
@denghongcai Sounds good!
You could start an async load job that is simply a I realize that an async version of the |
Hi @ben-manes, based on this thread, the cache can't refresh automatically when the key has expired or been evicted, right? May I request this feature? Or could you give me some guidance on which files I should modify if I want to create a PR for this feature? Thanks a lot for your great library |
The current refresh capability is that when an entry is accessed and its age exceeds the refresh time, it is asynchronously reloaded. If the entry is not accessed and exceeds the expiration interval, it is evicted. However, the refresh interval is global, and the ask would be to let it be customized per entry. If you are describing proactive refreshing, e.g. maintaining a known hot set by reloading based on an interval, then I'm not certain that proactive refresh would be a desirable feature for the cache itself. The existing scheme has the benefit that stale entries are refreshed without incurring user-facing latencies, while allowing inactive ones to be removed. If some entries could be proactively refreshed but others not, that's a bit of a complicated scheme to grasp. It could be done by the cache using some decider interface and a thread, but I'm not certain why the cache would be better than custom logic. So I'm not entirely sure what you'd change. But if you want to explore, the code is complex and located in BoundedLocalCache. To avoid our own scheduling thread, we'd want to make use of JDK9's shared thread by way of |
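For comparison, here is roughly what proactive refresh with a per-entry "decider" could look like as custom logic outside the cache. This is a hypothetical, single-threaded sketch, not a Caffeine feature or API: refreshInterval plays the role of the decider, expireAfterAccess drops idle entries instead of refreshing them forever, and tick(now) stands in for whatever scheduling thread would drive it (for example a ScheduledExecutorService).

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

// Hypothetical sketch, not a Caffeine API. Single-threaded; tick(now) would be
// called periodically by a scheduler in a real setup.
class ProactiveRefreshCache<K, V> {
    private static final class Slot<V> {
        V value;
        long nextRefresh;  // when the decider says this value is due
        long lastAccess;   // used to drop entries nobody reads anymore
    }

    private final Map<K, Slot<V>> map = new HashMap<>();
    private final Function<K, V> loader;
    private final ToLongFunction<V> refreshInterval;  // per-entry "decider"
    private final long expireAfterAccess;

    ProactiveRefreshCache(Function<K, V> loader, ToLongFunction<V> refreshInterval,
                          long expireAfterAccess) {
        this.loader = loader;
        this.refreshInterval = refreshInterval;
        this.expireAfterAccess = expireAfterAccess;
    }

    V get(K key, long now) {
        Slot<V> slot = map.get(key);
        if (slot == null) {           // only a cold miss makes the caller wait
            slot = new Slot<>();
            load(key, slot, now);
            map.put(key, slot);
        }
        slot.lastAccess = now;
        return slot.value;
    }

    // Refresh entries that are due; evict those idle for expireAfterAccess or longer.
    void tick(long now) {
        Iterator<Map.Entry<K, Slot<V>>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Slot<V>> e = it.next();
            Slot<V> slot = e.getValue();
            if (now - slot.lastAccess >= expireAfterAccess) {
                it.remove();
            } else if (now >= slot.nextRefresh) {
                load(e.getKey(), slot, now);
            }
        }
    }

    int size() {
        return map.size();
    }

    private void load(K key, Slot<V> slot, long now) {
        slot.value = loader.apply(key);
        slot.nextRefresh = now + refreshInterval.applyAsLong(slot.value);
    }
}
```

The decider receives the loaded value, so it could derive the interval from data carried with it; entries nobody reads are removed rather than reloaded, which addresses the concern about refreshing inactive entries.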
Consolidating into #504 |
Time-based refresh, just like time-based expiration.
My scenario is to use Caffeine to simulate a CDN cache: a key-value pair must expire after 10 minutes, but may be refreshed according to the s-maxage of the CDN source site's response, so the refresh time varies.
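For this scenario, the per-entry refresh interval could be derived from the origin's Cache-Control header. The sketch below is a hypothetical helper (not part of Caffeine): it extracts s-maxage with a simple regular expression rather than a full RFC 9111 parser, falls back to a caller-supplied default when the directive is missing, and, as an assumption, caps the interval at the 10-minute expiry so a refresh is never scheduled after the entry would have expired.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Caffeine API. Uses a simple regex instead of a full
// Cache-Control parser (e.g. it ignores quoted-string edge cases).
final class SMaxAge {
    // Matches the s-maxage directive at the start or after a comma, case-insensitively.
    private static final Pattern S_MAXAGE = Pattern.compile(
            "(?:^|,)\\s*s-maxage\\s*=\\s*\"?(\\d+)\"?", Pattern.CASE_INSENSITIVE);

    // Assumption from the scenario: every entry expires after 10 minutes.
    static final Duration EXPIRE = Duration.ofMinutes(10);

    private SMaxAge() {}

    static Optional<Duration> parse(String cacheControl) {
        if (cacheControl == null) {
            return Optional.empty();
        }
        Matcher m = S_MAXAGE.matcher(cacheControl);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Duration.ofSeconds(Long.parseLong(m.group(1))));
        } catch (NumberFormatException e) {
            return Optional.empty();  // value too large for a long
        }
    }

    // Per-entry refresh interval: s-maxage if present, else the fallback,
    // capped at EXPIRE so the refresh is not scheduled after expiry.
    static Duration refreshInterval(String cacheControl, Duration fallback) {
        Duration interval = parse(cacheControl).orElse(fallback);
        return interval.compareTo(EXPIRE) < 0 ? interval : EXPIRE;
    }
}
```

For example, refreshInterval("public, s-maxage=60", Duration.ofMinutes(1)) yields one minute, while s-maxage=3600 is capped at ten minutes.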