Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Argon2id hashed passwords in userdb #2102

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

zungur
Copy link
Collaborator

@zungur zungur commented Jul 3, 2024

This (darft) PR assumes that all the passwords are hashed in the DB
Will provide a script for this later on

  • User Argon2id for password hashing
  • Cache hash comparison result of least recent auths
  • Register new users with hashed passwords in userdb
  • Hash existing passwords on first authentication
  • Remove password output to stdout/stderr
  • New error message when username is not registered
  • Order of auth errors are preserved (i.e., wrong pwd > blocked > pending)
  • New unit tests

@zungur zungur requested a review from ppigazzini July 3, 2024 09:11
@zungur zungur force-pushed the hashed_pwds_argon2id branch from 1644410 to 85c00fc Compare July 3, 2024 09:27
@zungur zungur force-pushed the hashed_pwds_argon2id branch from d2f2818 to ab2608a Compare July 3, 2024 09:30
@peregrineshahin
Copy link
Contributor

This is certainly not how to do it, and it's not even better than nothing.. this solution is unheard of in the industry.. because of two things.. first you have no clear way how to invalidate the hash for users that change passwords.. second the idea of sending the username and the password over the network with each request is bad.. this is why a cookie based management or even a session based is preferred, because the problem lies from the worker side it doesn't have a token or a cookie.. I suggest you look up how to implement a cookie-session based solution instead, you can either cheat how we do it for views and web pages or change our system to use oAuth.

@vdbergh
Copy link
Contributor

vdbergh commented Jul 4, 2024

Api tokens/cookies etc.. are not enough. One needs a challenge response system so that the username/password never need to be sent over the channel.

But it seems to me that then you may as well use TLS with client side certificates (with the server being the CA).

EDIT: IUC then OAUTH2 is challenge response based.

@vdbergh
Copy link
Contributor

vdbergh commented Jul 4, 2024

I was thinking that a token based system might be used to solve the curious case of Sylvain27.

A token would contain the worker name. We could then disable the bad worker by invalidating the corresponding token. If the user circumvents this by changing the worker name then we could revoke their right to request a new token. This means that their old workers could continue to run but it would not be possible to start new ones.

@peregrineshahin
Copy link
Contributor

peregrineshahin commented Jul 4, 2024

As this PR discusses an account login/block and not a particular worker block,
two easy solutions solve the problem without needing to cache the password hashes or even send the credentials again and using only HTTP and without oAuth2 if you wish for simplicity:
1- A stateless signed cookie issued from the server just as what the pyramid framework method we use, this cookie is used in every subsequent request, and invalidating it is done by forgetting the cookie/session in the response, and thus user has no cookie in the next subsequent request and such a "session" is in a way invalidated

    from pyramid.session import SignedCookieSessionFactory
    session_factory = SignedCookieSessionFactory("fishtest")
    config = Configurator(
        settings=settings,
        session_factory=session_factory,
        root_factory="fishtest.models.RootFactory",
    )

    def check_blocked_user(event):
        request = event.request
        if request.authenticated_userid is not None:
            auth_user_id = request.authenticated_userid
            if is_user_blocked(auth_user_id, request.userdb):
                session = request.session
                headers = forget(request)
                session.invalidate()
                raise HTTPFound(location=request.route_url("tests"), headers=headers)

    config.add_subscriber(check_blocked_user, NewRequest) # logout the user immediatly

2- A server-side session management
now if you are afraid that the user might reuse an old issued by-us-cookie, then sessions are considered more secure than cookies because the variables themselves are kept on the server:
You can use the previous method while also maintaining session storage in the server such that even a signed cookie or a random token will not work anymore if the session is invalidated on the server with delete sessions[session_id].

@peregrineshahin
Copy link
Contributor

worker-side code

import requests
import json

def login():
    url = 'http://0.0.0.0:6543/login'
    response = requests.post(url)
    
    if response.status_code == 200:
        print("Logged in successfully")
        return response.cookies
    else:
        print("Failed to log in:", response.status_code)
        return None

def send_request(cookies):
    url = 'http://0.0.0.0:6543/hello'
    payload = {'name': 'Worker'}
    headers = {'Content-Type': 'application/json'}
    
    response = requests.post(url, data=json.dumps(payload), headers=headers, cookies=cookies)
    
    if response.status_code == 200:
        print("Response from server:", response.json())
    elif response.status_code == 401:
        print("Unauthorized: Invalid session")
    else:
        print("Failed to get a response from server:", response.status_code)

def logout(cookies):
    url = 'http://0.0.0.0:6543/logout'
    response = requests.post(url, cookies=cookies)
    
    if response.status_code == 200:
        print("Logged out successfully")
    else:
        print("Failed to log out:", response.status_code)

if __name__ == '__main__':
    cookies = login()
    if cookies:
        send_request(cookies)
        logout(cookies)
        send_request(cookies)  # This should fail with unauthorized

@vdbergh
Copy link
Contributor

vdbergh commented Jul 4, 2024

Are you using a challenge response system? Otherwise the cookie serves the same role as the password (for an observer of the insecure channel)...

@peregrineshahin
Copy link
Contributor

Are you using a challenge response system? Otherwise the cookie serves the same role as the password (for an observer of the insecure channel)...

No that's a bit complicated for a PR that wants to cache hashed passwords....

@vdbergh
Copy link
Contributor

vdbergh commented Jul 4, 2024

To me it is not clear why a naive token based system (without challenge response) is more secure than just sending username password with every request.

I am interested in hearing arguments about this.

@peregrineshahin
Copy link
Contributor

peregrineshahin commented Jul 4, 2024

The question is not only about security it's about efficiency.
It depends on what do you mean by a naive token system, but I'm going to present a simple argument,
generally if you don't know the password you cannot change it, that's one big difference.
so if you don't have any session self-invalidating measures on the server that expire such a token without any human-will and expire such session (which you should and are the other benefits) then at the very least the account owner or the server can force the invalidation of the session in case an attack is reported.
If you log in at a big platform and someone has hijacked your session in someway then you will always have the option of immediately kicking them out by changing the password and thus invalidating other sessions.

@vondele
Copy link
Member

vondele commented Jul 4, 2024

I hope we agree that having hashed passwords in the db is a step forward from the current setup? I personally think that what we have now is sub-standard (even though understandable from the legacy point of view).

It is also clear we need something that is performant enough to do a few 100 authorized API calls per second on a single core.

I wouldn't mind moving forward from the status-quo and improving gradually afterward. If we can do most api calls with short-lived tokens handled in a standard way in the (near) future, that's even better. Nevertheless, I assume we would need something like the current PR anyway for the login part of #2102 (comment) ?

@zungur
Copy link
Collaborator Author

zungur commented Jul 4, 2024

Agreed on the gradual progress.

I am working on a hashlib.scrypt based alternative, which uses a standard library (@ppigazzini prefers this way as well) and is potentially more lightweight. I can also have a third one with hashlib.pbkdf2_hmac.

@zungur zungur marked this pull request as draft July 4, 2024 20:35
@zungur
Copy link
Collaborator Author

zungur commented Jul 5, 2024

Some tests using argon2id-cffi and hashlib.scrypt on a stronger hardware, without LRU caching, each with 5 different OWASP recommended parameters. Measurements are from built-in time and tracemalloc packages:

$ cat /proc/cpuinfo | grep "model name" | head -n1
model name	: AMD Ryzen 5 5600 6-Core Processor
hash_password_argon2id (1000 concurrent):
time/memory/parallelism: 1/47104/1 duration: 39.817909577 ms/it
time/memory/parallelism: 2/19456/1 duration: 14.991945146 ms/it
time/memory/parallelism: 3/12288/1 duration: 13.601312185 ms/it
time/memory/parallelism: 4/9216/1 duration: 13.152313787 ms/it
time/memory/parallelism: 5/7168/1 duration: 12.642153341 ms/it
Total allocated size: 7.6 KiB

check_password_argon2id (1000 concurrent):
time/memory/parallelism: 1/47104/1 duration: 39.813287391 ms/it
time/memory/parallelism: 2/19456/1 duration: 14.750067388 ms/it
time/memory/parallelism: 3/12288/1 duration: 13.407013378 ms/it
time/memory/parallelism: 4/9216/1 duration: 13.148232526 ms/it
time/memory/parallelism: 5/7168/1 duration: 12.663382961 ms/it
Total allocated size: 3.0 KiB

hash_password_scrypt (1000 concurrent):
mem cost/block size/parallelism: 131072/8/1 duration: 306.420782012 ms/it
mem cost/block size/parallelism: 65536/8/2 duration: 270.386434516 ms/it
mem cost/block size/parallelism: 32768/8/3 duration: 187.657989876 ms/it
mem cost/block size/parallelism: 16384/8/5 duration: 140.391250444 ms/it
mem cost/block size/parallelism: 8192/8/10 duration: 138.487072246 ms/it
Total allocated size: 2.5 KiB

check_password_scrypt (1000 concurrent):
mem cost/block size/parallelism: 131072/8/1 duration: 309.913226842 ms/it
mem cost/block size/parallelism: 65536/8/2 duration: 274.852901883 ms/it
mem cost/block size/parallelism: 32768/8/3 duration: 188.79400044 ms/it
mem cost/block size/parallelism: 16384/8/5 duration: 140.181087759 ms/it
mem cost/block size/parallelism: 8192/8/10 duration: 138.019926974 ms/it
Total allocated size: 0.7 KiB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants