[Feature] Add support for async OAuth token refreshes #1135
What changes are proposed in this pull request?
This PR aims to eliminate long-tail latency caused by OAuth token refreshes in scenarios where a single client is responsible for relatively high (e.g. > 1 QPS), continuous outbound traffic. The feature is disabled by default, which arguably makes this PR a functional no-op.
Specifically, the PR introduces a new token cache that attempts to always keep its token fresh by asynchronously refreshing it before it expires. We distinguish three token states:
- fresh: The token is valid and is not close to its expiration.
- stale: The token is valid but will expire soon.
- expired: The token has expired and cannot be used.
Each time a request tries to access the token, we do the following, depending on the token's state:
- fresh: return the current token;
- stale: trigger an asynchronous refresh and return the current token;
- expired: make a blocking refresh call to update the token and return it.
In particular, asynchronous refreshes use a lock to guarantee that there can only be one pending refresh at a given time.
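The access logic described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this PR: the names `AsyncTokenCache`, `Token`, `State`, and the `refresh` and `clock` parameters are all hypothetical, and the PR's actual implementation may differ in structure and language.

```python
import threading
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class State(Enum):
    FRESH = "fresh"
    STALE = "stale"
    EXPIRED = "expired"


@dataclass(frozen=True)
class Token:
    value: str
    expiry: float  # absolute expiry time, in seconds, on the cache's clock


class AsyncTokenCache:
    """Hypothetical cache illustrating the fresh/stale/expired logic."""

    def __init__(self, refresh: Callable[[], Token],
                 stale_duration: float = 180.0,
                 clock: Callable[[], float] = time.monotonic):
        self._refresh = refresh                 # assumed to fetch a new token
        self._stale_duration = stale_duration
        self._clock = clock
        self._lock = threading.Lock()           # guards _token and _refreshing
        self._blocking_lock = threading.Lock()  # serializes blocking refreshes
        self._token: Optional[Token] = None
        self._refreshing = False
        self._worker: Optional[threading.Thread] = None

    def _state(self, token: Optional[Token]) -> State:
        if token is None:
            return State.EXPIRED
        remaining = token.expiry - self._clock()
        if remaining <= 0:
            return State.EXPIRED
        if remaining <= self._stale_duration:
            return State.STALE
        return State.FRESH

    def token(self) -> Token:
        with self._lock:
            token = self._token
            state = self._state(token)
            # Stale: start a background refresh unless one is already pending.
            if state is State.STALE and not self._refreshing:
                self._refreshing = True
                self._worker = threading.Thread(
                    target=self._async_refresh, daemon=True)
                self._worker.start()
        if state is not State.EXPIRED:
            return token  # fresh or stale: the current token is still valid
        return self._blocking_refresh()

    def _blocking_refresh(self) -> Token:
        with self._blocking_lock:
            with self._lock:
                token = self._token
                if self._state(token) is not State.EXPIRED:
                    return token  # another caller refreshed while we waited
            token = self._refresh()
            with self._lock:
                self._token = token
            return token

    def _async_refresh(self) -> None:
        try:
            token = self._refresh()
            with self._lock:
                self._token = token
        except Exception:
            pass  # keep the current token; a later call retries or blocks
        finally:
            with self._lock:
                self._refreshing = False
```

In this sketch the "one pending refresh" guarantee comes from the `_refreshing` flag, which is only read and written while holding the lock. Swallowing errors from the background refresh is one possible design choice, made here so that a failed asynchronous attempt never breaks a request that still holds a valid token.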
The performance of the algorithm depends on the length of the stale and fresh periods. On the one hand, the stale period must be long enough to prevent tokens from entering the expired state. On the other hand, a long stale period reduces the length of the fresh period, thus increasing the refresh frequency.
Right now, the stale period is configured to 3 minutes by default (i.e. 5% of the expected token lifespan of 1 hour). This value might change in the future to ensure that the default behavior achieves the best performance for the majority of users.
For reviewers: this PR only uses the new cache in control-plane auth flows; I plan to send a follow-up PR to enable asynchronous refresh in data-plane flows once this one has been merged.
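To make the trade-off between the stale and fresh periods concrete, here is a small back-of-the-envelope helper. It is only a sketch: `refresh_schedule` is a hypothetical function, and it assumes continuous traffic, so that a refresh is triggered as soon as a token becomes stale and each token is effectively used for one fresh period.

```python
def refresh_schedule(lifespan_s: float, stale_s: float) -> dict:
    """Estimate refresh behavior for a given token lifespan and stale period.

    Assumes a refresh starts as soon as the token turns stale, so the time
    between refreshes is approximately the fresh period.
    """
    if not 0 < stale_s < lifespan_s:
        raise ValueError("stale period must be positive and shorter than the lifespan")
    fresh_s = lifespan_s - stale_s
    return {
        "fresh_s": fresh_s,                        # time spent in the fresh state
        "stale_fraction": stale_s / lifespan_s,    # share of the lifespan that is stale
        "refreshes_per_day": 86_400 / fresh_s,     # approximate refresh frequency
    }


default = refresh_schedule(lifespan_s=3600, stale_s=180)
long_stale = refresh_schedule(lifespan_s=3600, stale_s=1800)
```

Under that assumption, the defaults from the description (1 hour lifespan, 3 minute stale period) give a 57 minute fresh period and roughly 25 refreshes per day, whereas a 30 minute stale period would give 48 refreshes per day.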
How is this tested?
Complete test coverage with a focus on various concurrency scenarios.