Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix random 403 responses #25290

Merged

Conversation

OndroMih
Copy link
Contributor

Fixes #25141.

The securityContext is stateful, cannot be shared among requests. RealmAdapter stores the principal from request to it. We need to create a copy of the default context for each request instead of using the shared default context.

This is a quick fix but I can imagine a better and nicer fix. For example, keep returning defaultSecurityContext and instead let RealmAdapter to create a new security context if a user is signed in, and then store the principal to it. This would keep the same defaultSecurityContext for anonymous user.

Another option is to save the sessionPrincipal into a ThreadLocal. Although the SecurityContext is already stored in a thread local, the default context is shared in multiple thread locals. Storing sessionPrincipal in another thread local would keep it only for the current thread even if the security context is shared. However, I don't like adding another thread local.

Maybe @arjantijms , @dmatej , @avpinchuk , @pzygielo, @hs536 you have some idea to improve it? Or it's OK to go with my simple solution?

About tests, I'm not sure if adding tests makes sense. The issue is random. Sometimes doesn't happen even after 1000 requests, especially on a slower machine. So the test would not be reliable and might not fail each time even if the behavior is broken.

The securityContext is stateful, cannot be shared among requests. 
RealmAdapter stores the principal from request to it. We need to create 
a copy of the default context for each request instead of 
using the shared default context.
@dmatej
Copy link
Contributor

dmatej commented Dec 17, 2024

12:25:15       [exec] Caused by: java.lang.NullPointerException
12:25:15       [exec] 	at com.sun.s1peqe.security.ssl.converter.ejb.ConverterBean.myCallerPrincipal(ConverterBean.java:38)

@hs536
Copy link
Contributor

hs536 commented Dec 18, 2024

Hmm... that's odd behavior... when I read this, it feels more like a JVM bug than a GlassFish one.
it appears the default context should be immutable across requests, so regenerating it should be approached cautiously.

@OndroMih
Copy link
Contributor Author

Hi @hs536,

Hmm... that's odd behavior... when I read this, it feels more like a JVM bug than a GlassFish one.
it appears the default context should be immutable across requests, so regenerating it should be approached cautiously.

It's not a bug in JVM. The problem happens because one request sets the sessionPrincipal and then deletes it after the request processing. The security context is the default one. Although it's stored in a thread local, it's always the same object and thus it's shared by all requests. So if multiple requests run at the same time, they access their thread local variable, which gives them the shared default security context and manipulate the sessionPrincipal in it. If one of the requests is too short and clears the context, another request may see empty sessionPrincipal when it's evaluating the permissions and returns 403.

My fix avoids this by creating a new copy of the default context for each thread. This seems to break some expectations.

Another option is to save the sessionPrincipal into another ThreadLocal inside the security context. Then each thread would manipulate its own sessionPrincipal variable, even if they shared the same security context. It feels a bit hacky, I don't like storing values in thread locals, but the current solution seems to be even more hacky, because on one hand it expects the security context is immutable (reusing the default context for many threads), on another hand, its mutable because threads modify the sessionPrincipal.

@hs536
Copy link
Contributor

hs536 commented Dec 19, 2024

My apologies, I misunderstood. If the problem is that a specific thread is deleting (or modifying) the shared defaultSecurityContext's sessionPrincipal (and this modification is allowed), wouldn't removing static from defaultSecurityContext resolve the issue...?

private static SecurityContext defaultSecurityContext = generateDefaultSecurityContext();

@OndroMih
Copy link
Contributor Author

wouldn't removing static from defaultSecurityContext resolve the issue...?

I effectively did this with the current solution in this PR - a new security context is created in a thread local if it doesn't exist, instead of using the one in the static variable. It's not possible to just remove the static keyword, because the variable is access from static methods.

However, I think the null pointer now is related to this code:

return this == defaultSecurityContext ? getDefaultCallerPrincipal() : initiator;

It checks whether the context is the same as the defaultSecurityContext context, which it isn't anymore, it's just a copy of it. And then it returns initiator, which is most probably null in this case.

I'll rework the solution to store sessionPrincipal into a thread-local so that it's different for each thread, while the context itself is the same for all threads, if it's the default one. That should keep the logic and still fix the issue with the race condition and random 403 responses.

This way the behavior is not modified for the default security context, which will remain a singleton. 
The session principal is stored in a thread local so that each request has its own value even if they 
share the same default security context.
@arjantijms
Copy link
Contributor

Maybe the field serverGeneratedSecurityContext should be in thread-local storage too?

As mentioned in the description; having all these thread locals is kinda confusing when reading the code. The security context itself is in a thread local, but it's individual fields are too.

Probably we need to take a good look at this entire setup later again.

@OndroMih
Copy link
Contributor Author

Yes, I would review it later. Now, it’s clear what’s happening and my fix is simple and reliable.

With my solution, we have a thread-local in another thread local, maily because we need to store a mutable value in a structure which was designed to be immutable and sharable between multiple threads. Although securityContext is in thread local, the default security context is stored in multiple thread locals and this shared by mutliple threads. It’s OK if the structure is immutable. But after we added mutable state, it resulted in race conditions. If we make each security context unique and change rhe way it’s compared to the default context, then we can remove the nested thread local for sessionPrincipal.

@dmatej dmatej added this to the 7.0.21 milestone Dec 25, 2024
@dmatej dmatej added the bug Something isn't working label Dec 25, 2024
@dmatej dmatej requested a review from a team December 25, 2024 12:44
Based on review suggestion
@OndroMih OndroMih requested a review from a team December 30, 2024 10:22
@arjantijms
Copy link
Contributor

@arjantijms arjantijms merged commit eda0e4d into eclipse-ee4j:master Dec 31, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sometimes 403 Forbidden server response when websocket message call ajax update with identity store
5 participants