Single model mode #223

Merged
merged 5 commits into collabora:main from single-model-mode on Jun 7, 2024

Conversation

peldszus
Contributor

I added a mode in which all client connections share the same single model instance, instead of instantiating a new model for each connection. This only applies if a custom model has been specified at server start (e.g. a TensorRT model or a custom faster_whisper model).

For this, a new option has been added, defaulting to false, so that the current behaviour is not changed.
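As a hedged sketch of how such an opt-in option could be wired up (the flag name `--single_model` and its help text are assumptions, not necessarily what this PR adds):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--single_model",
    action="store_true",  # defaults to False, so current behaviour is kept
    help="Share one model instance across all client connections "
         "(only applies when a custom model is given at server start).",
)
args = parser.parse_args()
```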

This partially resolves #109, but only for custom models. It does not apply to a faster_whisper backend that dynamically loads standard models based on the client request.

A thread lock is used to make model prediction thread-safe. But this also means that a connection has to wait if another connection is currently predicting.
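A minimal sketch of the approach, using illustrative names (`load_model`, `model.transcribe`) rather than the repository's actual API: the first connection loads the shared model, and a lock serializes all predictions.

```python
import threading

class TranscriptionServerSketch:
    """Illustrative only: one model shared by all client connections."""

    def __init__(self, load_model):
        self._load_model = load_model   # callable that builds the model
        self._model = None              # the single shared instance
        self._lock = threading.Lock()   # guards loading and prediction

    def handle_client(self, audio_chunks):
        # The first connection loads the model; later ones reuse it.
        with self._lock:
            if self._model is None:
                self._model = self._load_model()
        for chunk in audio_chunks:
            # Predictions are serialized: a connection waits here while
            # another connection's transcription is in progress.
            with self._lock:
                yield self._model.transcribe(chunk)
```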

Motivation

I use a large-v3 TensorRT model. It would take 5 seconds to load for every new client connection. With the single model option, this is reduced to under 1 second. Also, I only want to hold the model in VRAM once.

@makaveli10
Collaborator

@peldszus Thanks for putting this together.
I think it would be useful to make single model mode the default?

@peldszus
Contributor Author

peldszus commented Jun 4, 2024

I think it would be useful to make single model mode the default?

It probably depends on your intended audience, but I agree.

For private users with no GPU or just a single one, single model mode is probably the best option.

In production environments with higher throughput, perhaps with multiple GPUs on one node, there are different scaling and optimization routes to take, but they could involve single model mode as well.

@peldszus
Contributor Author

peldszus commented Jun 4, 2024

@makaveli10 If this works for you, I can adjust the argument parser and the readme.

peldszus force-pushed the single-model-mode branch from 2933189 to ab17c4d on June 5, 2024 07:47
peldszus force-pushed the single-model-mode branch from b7e68ab to 1407731 on June 5, 2024 08:34
@peldszus
Contributor Author

peldszus commented Jun 5, 2024

@makaveli10 Have a look now; I changed the option's default and updated the readme accordingly.

@makaveli10
Collaborator

Looks good to me. I would also add an option to use a single model when not using a custom model, but I guess that's for a future release, since it is a bit more complicated: it means maintaining a dict of instantiated models keyed by model size and clearing entries when no client is using that model size anymore.
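A rough sketch of that future idea, with assumed names (`ModelCacheSketch`, `load_model`) that are not the repository's API: keep one model per model size, count active clients per size, and drop a model once its last client disconnects.

```python
import threading
from collections import defaultdict

class ModelCacheSketch:
    """Illustrative only: one model per model size, freed when unused."""

    def __init__(self, load_model):
        self._load_model = load_model       # callable: model_size -> model
        self._lock = threading.Lock()
        self._models = {}                   # model_size -> model instance
        self._refcounts = defaultdict(int)  # model_size -> active clients

    def acquire(self, model_size):
        # Called when a client connects: reuse or instantiate the model.
        with self._lock:
            if model_size not in self._models:
                self._models[model_size] = self._load_model(model_size)
            self._refcounts[model_size] += 1
            return self._models[model_size]

    def release(self, model_size):
        # Called when a client disconnects: drop the model once unused.
        with self._lock:
            self._refcounts[model_size] -= 1
            if self._refcounts[model_size] <= 0:
                del self._refcounts[model_size]
                self._models.pop(model_size, None)  # frees it for GC/VRAM
```

Whether eviction should happen immediately or after a grace period (to avoid reloading when clients reconnect quickly) would be a design choice for that future release.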

makaveli10 merged commit 5b9bc2b into collabora:main on Jun 7, 2024
12 checks passed

Successfully merging this pull request may close these issues.

Individual model loaded for each client