Single model mode #223
Conversation
- Use a thread lock around the model in single model mode
@peldszus Thanks for putting this together.
It probably depends on your intended audience, but I agree. For private users with no GPU, or just one, single model mode is probably the best option. In production environments with higher throughput, maybe with multiple GPUs on one node, there are different scaling/optimization routes to take, but they could involve single model mode as well.
@makaveli10 If this is fine for you, I can adjust the argument parser and the readme.
Force-pushed from 2933189 to ab17c4d
Force-pushed from b7e68ab to 1407731
@makaveli10 Have a look now; I updated the option's default and the readme accordingly.
Looks good to me. I would also add an option to use a single model when not using a custom model, but I guess that is for a future release, because it is a bit more complicated: it would mean maintaining a dict of instantiated models keyed by model size and clearing an entry once no client is using that model size anymore (see the sketch below).
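A rough sketch of what such a reference-counted cache could look like; the `ModelCache` class, the `model_factory` callable, and the method names are hypothetical illustrations, not part of this PR:

```python
import threading

class ModelCache:
    """Hypothetical cache of instantiated models keyed by model size.

    Each entry is reference-counted; the model for a given size is
    released once no client connection is using that size anymore.
    """

    def __init__(self, model_factory):
        # model_factory(model_size) -> model instance (assumed callable)
        self.model_factory = model_factory
        self.lock = threading.Lock()
        self.models = {}     # model_size -> model instance
        self.refcounts = {}  # model_size -> number of active clients

    def acquire(self, model_size):
        # Called when a client connects and requests a given model size.
        with self.lock:
            if model_size not in self.models:
                self.models[model_size] = self.model_factory(model_size)
                self.refcounts[model_size] = 0
            self.refcounts[model_size] += 1
            return self.models[model_size]

    def release(self, model_size):
        # Called when a client disconnects; frees the model once unused.
        with self.lock:
            self.refcounts[model_size] -= 1
            if self.refcounts[model_size] <= 0:
                del self.models[model_size]
                del self.refcounts[model_size]
```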
I added a mode in which all client connections use the same single model, instead of instantiating a new model for each connection. This only applies if a custom model has been specified at server start (i.e. a TensorRT model or a custom faster-whisper model).
For this, a new option has been added, defaulting to false, so that the current behaviour is not changed.
This partially resolves #109, but only for custom models. It does not apply to the faster-whisper backend, which dynamically loads standard models based on the client request.
A thread lock is used to make model prediction thread-safe. This also means that a connection has to wait if another connection is currently predicting (see the sketch below).
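A minimal, self-contained illustration of this pattern, assuming a single shared model guarded by a lock; `DummyModel`, `handle_client_audio`, and the method names are placeholders, not the actual WhisperLive code:

```python
import threading

class DummyModel:
    """Placeholder standing in for the real transcription model (hypothetical)."""
    def transcribe(self, audio_chunk):
        return f"transcription of {len(audio_chunk)} samples"

# One shared model instance, loaded once at server start, guarded by a lock
# so that only one client connection runs inference at a time.
shared_model = DummyModel()
model_lock = threading.Lock()

def handle_client_audio(audio_chunk):
    # Every connection uses the same `shared_model`; the lock serializes
    # access, so a connection may block while another one is predicting.
    with model_lock:
        return shared_model.transcribe(audio_chunk)

# Example usage from two "connections" sharing the same model:
print(handle_client_audio([0.0] * 16000))
print(handle_client_audio([0.0] * 32000))
```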
Motivation
I use a large-v3 TensorRT model. It takes about 5 seconds to load for every new client connection; with the single model option, this is reduced to under 1 second. Also, I only want to have the model in VRAM once.