Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auth: OIDC login flow fails in an LXD cluster if different members handle different stages of the flow #13644

Open
mas-who opened this issue Jun 21, 2024 · 11 comments · May be fixed by #15011
Open
Labels
Bug Confirmed to be a bug
Milestone

Comments

@mas-who
Copy link

mas-who commented Jun 21, 2024

Issue description

The OIDC login will fail in a LXD cluster if different members handle different stages of the flow. This is likely to occur if load balancing is deployed in front of the cluster.

When a user initiates the login flow, the request will reach /oidc/login and a state parameter will be set in the redirect request to idp auth endpoint. Upon successful user authentication, the idp will make a request to the /oidc/callback endpoint and the state parameter will need to be matched (data can be extracted from the state parameter). Since in the scenario where load balancing is active, it is likely that the cluster member that receives the /oidc/login request would be different to the cluster member that receives the /oidc/callback request, causing state to be mismatched which in turn will cause the login flow to fail.

Some potential ideas on how to resolve this issue:

  1. Store the auth state in dqlite so that we can always read out this data when processing a request to the /oidc/callback endpoint.
  2. Embed the member address in the state parameter so that we can extract this information in the /oidc/callback endpoint. If the extracted member address is not the same as the current member address, then we can forward the request to the correct member based on the extracted info.

Steps to reproduce

Test with LXD cluster:

  1. Started oidc flow with a proxy in front, proxy pointing to lxd-node-1
  2. While on the octa login page, change proxy to point to lxd-node-2 of the same cluster
  3. continue login process
  4. on hitting the /oidc/callback of lxd-node-2 I get an error message: "failed to get state: securecookie: the value is not valid"
@tomponline tomponline added the Bug Confirmed to be a bug label Jun 21, 2024
@tomponline tomponline added this to the lxd-6.2 milestone Jun 21, 2024
@tomponline
Copy link
Member

@markylaing when you get a moment I'd appreciate your thoughts on how we could approach this one.

Could we encode (& encrypt) the necessary info in the cookied/return URL of the flow such that its available to the other cluster member?

@tomponline tomponline changed the title The OIDC login flow fails in an LXD cluster if different members handle different stages of the flow Auth: OIDC login flow fails in an LXD cluster if different members handle different stages of the flow Jun 21, 2024
@markylaing
Copy link
Contributor

@markylaing when you get a moment I'd appreciate your thoughts on how we could approach this one.

Could we encode (& encrypt) the necessary info in the cookied/return URL of the flow such that its available to the other cluster member?

I think @mas-who's suggestion of encoding it in the state parameter sounds reasonable. That said, we could also recommend a load-balancer configuration in our documentation to enable e.g. sticky sessions .

@tomponline
Copy link
Member

I think @mas-who's suggestion of encoding it in the state parameter sounds reasonable.

Sounds good thanks!

@markylaing
Copy link
Contributor

I've been thinking a little more on this.

A better approach might be to use the same cookie encryption mechanism as with browser cookies. We can use the session ID in the state parameter as it is unencrypted. I'm not sure this will work with the httphelper package provided by the zitadel OIDC package though.

@markylaing
Copy link
Contributor

I've been thinking a little more on this.

A better approach might be to use the same cookie encryption mechanism as with browser cookies. We can use the session ID in the state parameter as it is unencrypted. I'm not sure this will work with the httphelper package provided by the zitadel OIDC package though.

We can't use the session ID. The cookie block and hash keys are for the relying party and need to persist, otherwise we would be performing OIDC discovery on every request.

As we can't use the session ID, there would be no way to provide a salt when deriving keys from the cluster private key. We would need some cluster-wide mechanism of rotating a shared salt for this to work.

@markylaing
Copy link
Contributor

Another idea for this:

Set up a background task in LXD to generate a UUID or token and store it in the database. The UUID should be overwritten on a configurable rotation. Only the dqlite leader should perform this task. The leader should send an internal request to other members indicating that the key has been updated. The UUID should be passed into the OIDC verifier, the verifier can derive a private key from the cluster private key using the UUID as a shared salt, and use this for the hash and block keys for the relying party cookie configuration.

@edlerd
Copy link
Contributor

edlerd commented Jul 12, 2024

Another idea for this:

Set up a background task in LXD to generate a UUID or token and store it in the database. The UUID should be overwritten on a configurable rotation. Only the dqlite leader should perform this task. The leader should send an internal request to other members indicating that the key has been updated. The UUID should be passed into the OIDC verifier, the verifier can derive a private key from the cluster private key using the UUID as a shared salt, and use this for the hash and block keys for the relying party cookie configuration.

This sounds like a good approach. I see two open issues:

  • On updating the salt, we need to take care of users that started the login flow with the old salt and come back to a node with the new salt.
  • We need to ensure every member updated to the new salt. This can be a problem in case some node is offline for a short time. During being offline, it can not be notified of the new salt. It then comes back online and happily serves users with the old salt.

A possible solution is to try to decode with both old and new salt for some grace period during/after the salt update. Additionally, do not send unreliable "salt was updated" messages between cluster nodes, but reload the UUID from database every n minutes. Make the grace period 2*n minutes.

@markylaing
Copy link
Contributor

For the first point, we already rotate cookie encryption keys for the OIDC verifier every 5 minutes and haven't encountered this issue. I'd say the configurable duration should be not less than 5 minutes and we should not accept the old salt (more secure). The worst case scenario is that someone is redirected back to the login page to start again.

For the second point, we send internal updates this way in a lot of cases (updating the identity cache is one). This way we can ensure the new UUID is written and replicated before the other members refresh it. If a member is offline, it will load the new UUID when it comes back online (currently done for the identity cache). One case we haven't encountered yet is a network blip causing the notification to fail when a member is actually online. In this case we may need to implement an exponential backoff, but since other mechanisms also rely on this, we'll implement this in a general way in a separate PR (if it becomes an issue).

@tomponline
Copy link
Member

We could even include it in the heartbeat payload if needed, which is sent regularly to all members from the leader.

@markylaing
Copy link
Contributor

question: why is it safe to use the same nonce + key pair multiple times? AFAICT the securecookies lib is using AES CTR, which has the weakness that if you have 2 different ciphertexts that were produced using the same nonce/key pair you can deduce the pair. If I understand correctly, the refresh token and id token cookies are issued using the same pair. This means that it should be easy to forge cookies for a known sessionID. Am I missing something?

Relevant issue gorilla/securecookie#66

I'm not a security expert but the session ID is used as a salt when deriving a secure key from the cluster private key using HKDF. An actor must have both the cluster private key and the session ID to derive a key which can be used to decrypt the oidc_identity and oidc_refresh cookies.

@nsklikas
Copy link

Never mind, after looking into it again I realize that I misunderstood how the code works.

@tomponline tomponline modified the milestones: lxd-6.2, lxd-6.3 Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Confirmed to be a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants