Auth: OIDC login flow fails in an LXD cluster if different members handle different stages of the flow #13644
Comments
@markylaing when you get a moment I'd appreciate your thoughts on how we could approach this one. Could we encode (& encrypt) the necessary info in the cookie/return URL of the flow such that it's available to the other cluster member?
I think @mas-who's suggestion of encoding it in the state parameter sounds reasonable. That said, we could also recommend a load-balancer configuration in our documentation to enable e.g. sticky sessions.
Sounds good, thanks!
I've been thinking a little more on this. A better approach might be to use the same cookie encryption mechanism as with browser cookies. We can use the session ID in the state parameter as it is unencrypted. I'm not sure this will work with the …
We can't use the session ID. The cookie block and hash keys are for the relying party and need to persist, otherwise we would be performing OIDC discovery on every request. As we can't use the session ID, there would be no way to provide a salt when deriving keys from the cluster private key. We would need some cluster-wide mechanism of rotating a shared salt for this to work.
Another idea for this:

- Set up a background task in LXD to generate a UUID or token and store it in the database. The UUID should be overwritten on a configurable rotation, and only the dqlite leader should perform this task.
- The leader should send an internal request to other members indicating that the key has been updated.
- The UUID should be passed into the OIDC verifier. The verifier can derive a private key from the cluster private key using the UUID as a shared salt, and use this for the hash and block keys of the relying-party cookie configuration (a rough sketch of this derivation follows below).
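A minimal sketch of what that derivation could look like, assuming Go and `golang.org/x/crypto/hkdf`; the package, function name, key sizes, and info string are illustrative assumptions rather than LXD's actual implementation:

```go
// Sketch only: derive the relying-party cookie keys from the cluster private
// key, using the cluster-wide rotating UUID as the HKDF salt. Names, key
// sizes, and the info string are assumptions, not LXD's real code.
package oidc

import (
	"crypto/sha512"
	"io"

	"golang.org/x/crypto/hkdf"
)

// deriveCookieKeys returns a 64-byte hash key and a 32-byte block key
// (AES-256) suitable for gorilla/securecookie, derived with HKDF-SHA512.
func deriveCookieKeys(clusterPrivateKey []byte, saltUUID string) (hashKey, blockKey []byte, err error) {
	kdf := hkdf.New(sha512.New, clusterPrivateKey, []byte(saltUUID), []byte("lxd-oidc-relying-party-cookies"))

	hashKey = make([]byte, 64)
	if _, err := io.ReadFull(kdf, hashKey); err != nil {
		return nil, nil, err
	}

	blockKey = make([]byte, 32)
	if _, err := io.ReadFull(kdf, blockKey); err != nil {
		return nil, nil, err
	}

	return hashKey, blockKey, nil
}
```

Since every member holds the same cluster private key and reads the same UUID from the database, they all derive identical keys, so any member can decode relying-party cookies and state set by another member.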
This sounds like a good approach. I see two open issues:
A possible solution is to try to decode with both the old and new salt for some grace period during/after the salt update. Additionally, do not send unreliable "salt was updated" messages between cluster nodes, but reload the UUID from the database every n minutes. Make the grace period 2*n minutes.
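For illustration only, a sketch of that grace-period fallback using gorilla/securecookie and the hypothetical `deriveCookieKeys` helper from the earlier sketch (the reply below ends up preferring not to accept the old salt at all):

```go
// Sketch only: during the grace period, accept values encoded under either
// the new or the previous salt by trying key pairs derived from both.
package oidc

import "github.com/gorilla/securecookie"

// decodeWithGracePeriod tries the keys derived from the new salt first and
// falls back to the keys derived from the old salt. deriveCookieKeys is the
// hypothetical helper from the previous sketch.
func decodeWithGracePeriod(clusterPrivateKey []byte, newSalt, oldSalt, name, value string, dst interface{}) error {
	var codecs []securecookie.Codec

	for _, salt := range []string{newSalt, oldSalt} {
		hashKey, blockKey, err := deriveCookieKeys(clusterPrivateKey, salt)
		if err != nil {
			return err
		}
		codecs = append(codecs, securecookie.New(hashKey, blockKey))
	}

	// DecodeMulti succeeds if any of the provided codecs can decode the value.
	return securecookie.DecodeMulti(name, value, dst, codecs...)
}
```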
For the first point, we already rotate cookie encryption keys for the OIDC verifier every 5 minutes and haven't encountered this issue. I'd say the configurable duration should be no less than 5 minutes, and we should not accept the old salt (more secure). The worst case is that someone is redirected back to the login page to start again.

For the second point, we send internal updates this way in a lot of cases (updating the identity cache is one). This way we can ensure the new UUID is written and replicated before the other members refresh it. If a member is offline, it will load the new UUID when it comes back online (as is currently done for the identity cache). One case we haven't encountered yet is a network blip causing the notification to fail when a member is actually online. In that case we may need to implement an exponential backoff, but since other mechanisms also rely on this, we'll implement it in a general way in a separate PR (if it becomes an issue).
We could even include it in the heartbeat payload if needed, which is sent regularly to all members from the leader.
I'm not a security expert, but the session ID is used as a salt when deriving a secure key from the cluster private key using HKDF. An actor must have both the cluster private key and the session ID to derive a key which can be used to decrypt the …
Never mind, after looking into it again I realize that I misunderstood how the code works.
Issue description
The OIDC login will fail in an LXD cluster if different members handle different stages of the flow. This is likely to occur if load balancing is deployed in front of the cluster.

When a user initiates the login flow, the request reaches `/oidc/login` and a `state` parameter is set in the redirect request to the IdP auth endpoint. Upon successful user authentication, the IdP makes a request to the `/oidc/callback` endpoint, where the `state` parameter needs to be matched (data can be extracted from the `state` parameter). When load balancing is active, the cluster member that receives the `/oidc/login` request is likely to be different from the cluster member that receives the `/oidc/callback` request, causing `state` to be mismatched, which in turn causes the login flow to fail.

Some potential ideas on how to resolve this issue:

- … the `/oidc/callback` endpoint.
- Encode the cluster member address in the `state` parameter so that we can extract this information in the `/oidc/callback` endpoint. If the extracted member address is not the same as the current member address, then we can forward the request to the correct member based on the extracted info (see the sketch after this list).
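A rough sketch of the second idea in Go, with entirely hypothetical names (`wrapState`, `callbackHandler`, the `|` separator); a real implementation would also need to use the cluster's internal TLS configuration when forwarding between members:

```go
// Sketch only: encode the member address into the state parameter on
// /oidc/login and, on /oidc/callback, forward the request to that member if
// it is not the local one. All names here are hypothetical.
package oidc

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// stateSeparator is assumed never to appear in the random state value itself
// (true for UUID or base64url encoded states).
const stateSeparator = "|"

// wrapState appends the local member address to the state value generated
// during /oidc/login.
func wrapState(state, localAddress string) string {
	return state + stateSeparator + url.QueryEscape(localAddress)
}

// callbackHandler extracts the member address from the state parameter and,
// if it differs from the local member, proxies the request to the member
// that started the flow so that its stored state can be matched.
func callbackHandler(localAddress string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		state := r.URL.Query().Get("state")
		parts := strings.SplitN(state, stateSeparator, 2)
		if len(parts) == 2 {
			member, err := url.QueryUnescape(parts[1])
			if err == nil && member != localAddress {
				// A real implementation would configure the proxy transport
				// with the cluster's internal TLS certificates.
				target := &url.URL{Scheme: "https", Host: member}
				httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
				return
			}
		}

		// The flow started on this member (or the state carries no address).
		next.ServeHTTP(w, r)
	})
}
```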
Steps to reproduce
Test with an LXD cluster:

… `/oidc/callback` of lxd-node-2 I get an error message: "failed to get state: securecookie: the value is not valid"