Docker compose tensorrt #177
Conversation
Hi, is it possible to build the docker image with CUDA_ARCH=52-virtual? I get:
688.3 ptxas /tmp/tmpxft_00000f2a_00000000-6_customAllReduceKernels.ptx, line 2273; error : Feature 'f16 arithemetic and compare instructions' requires .target sm_53 or higher
Hello @jooni22, from the error it seems that it's not possible to build on this GPU!
Hi, I was trying to get WhisperLive running using the TensorRT backend. I saw your comment in #164, which brought me here. If you don't mind, I'd like to bring up a question and two smaller comments here in the MR.
# convert small multilingual model
bash scripts/build_whisper_tensorrt.sh /root/TensorRT-LLM-examples small
# For e.g. 3090 RTX cuda architecture is 86-real
CUDA_ARCH=86-real docker compose build
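(Side note, not part of the diff: if you're unsure which value to pass for CUDA_ARCH, a quick way to look up the compute capability of the installed GPU is sketched below. This assumes a reasonably recent NVIDIA driver whose nvidia-smi supports the compute_cap query field.)

# Prints e.g. "8.6" for an RTX 3090, which maps to CUDA_ARCH=86-real.
# Assumes a recent nvidia-smi that supports the compute_cap field.
nvidia-smi --query-gpu=compute_cap --format=csv,noheader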
I could successfully build an image with these instructions.
It took 50 min on a 12-core workstation. Most of the time, 38 min, was spent compiling in build-trt-llm.sh.
I am wondering why it is necessary to compile TensorRT-LLM from scratch. What is the advantage over simply installing the pre-compiled wheel? Do the 0.7.1 wheels not support newer archs?
Btw: I appreciate your efforts to clean up the image. In the end, the image is still 38.4 GB, though, but I guess it would have been even bigger without those efforts. :)
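For reference, a minimal sketch of what the wheel-based route could look like; the version pin and index URL are assumptions based on the 0.7.1 wheels published on NVIDIA's package index, and whether that wheel supports the target arch is exactly the open question here.

# Hypothetical alternative to compiling TensorRT-LLM from source.
pip install tensorrt_llm==0.7.1 --extra-index-url https://pypi.nvidia.com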
The pre-compiled wheel doesn't work on all archs, but it's been a while since we tested. Thanks for pointing that out, looking into that next.
--backend tensorrt \
--trt_model_path "path/to/whisper_trt/from/build/step" \
--trt_multilingual
MODEL_SIZE=small.en BACKEND=tensorrt docker compose up
Unfortunately, I couldn't get any of the model sizes running like that. I always got Invalid model name:
$ MODEL_SIZE=small.en BACKEND=tensorrt docker compose up
WARN[0000] /.../WhisperLive/docker-compose.yml: `version` is obsolete
[+] Running 0/1
⠹ Container whisperlive-whisperlive-tensorrt-1 Recreated 0.3s
Attaching to whisperlive-tensorrt-1
whisperlive-tensorrt-1 | MODEL_SIZE is set to: small.en
whisperlive-tensorrt-1 | BACKEND is set to: tensorrt
whisperlive-tensorrt-1 | Running build-models.sh...
whisperlive-tensorrt-1 | whisper_small_en directory does not exist or is empty. Building whisper
whisperlive-tensorrt-1 | Installing requirements for Whisper TensorRT-LLM ...
whisperlive-tensorrt-1 | Invalid model name: whisper_small_en
whisperlive-tensorrt-1 exited with code 1
Ah! download_and_build_model() needs to get passed model_name and output_dir as arguments. Currently only the output_dir is passed, and it is then treated as a model_name in the case structure.
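A minimal sketch of the suggested fix; the function and variable names come from the comment above, but the call site and the argument order are assumptions about how build-models.sh is structured.

# Hypothetical excerpt of build-models.sh.
model_name="small.en"
output_dir="whisper_small_en"
# Pass both arguments so the case structure matches on the model name,
# not on the output directory:
download_and_build_model "$model_name" "$output_dir"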
#!/bin/bash -e

apt-get update && apt-get -y install git git-lfs
git clone --depth=1 -b cuda12.2 https://github.com/makaveli10/TensorRT-LLM.git
If I see this right, the difference between your fork+branch and the original TensorRT-LLM repo is only the addition of smaller model parameters to examples/whisper/build.py.
If using an official TensorRT-LLM wheel could be an option (see comment above), then the smaller model options could be added as a simple diff on the example files, without the need for a fork.
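A rough sketch of that idea, assuming the official wheel works; the release tag and the patch file name and contents are hypothetical.

# Clone only the upstream examples and apply a small, hypothetical patch
# that adds the small/small.en build parameters to the whisper example.
git clone --depth=1 -b v0.7.1 https://github.com/NVIDIA/TensorRT-LLM.git
patch TensorRT-LLM/examples/whisper/build.py < add_small_models.patch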
echo "Running build script for $model_name with output directory $output_dir" | ||
python3 build.py --output_dir "$output_dir" --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name "$model_name" | ||
echo "Running TensorRT-LLM build script for $model_name with output directory $output_dir" | ||
python3 build.py --output_dir "$output_dir" --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name "$model_name" > /dev/null 2>&1 |
I'd propose not to redirect stdout/stderr to /dev/null.
This actually hides errors that might be useful to know about when one is starting the service.
This is especially important because the script continues even if one of its commands errored. I'd thus also recommend adding set -e at the beginning of this and the run-whisperlive.sh script.
An example I just ran into: some CUDA error during the TRT build of the engine. After the hidden error, the next echo told me that the engine was built, but it wasn't.
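A minimal sketch of the suggested hardening, reusing the build invocation quoted from the diff above; keeping the output visible is one option, teeing it to a log file would be another.

#!/bin/bash
set -e  # abort on the first failing command instead of silently continuing

echo "Running TensorRT-LLM build script for $model_name with output directory $output_dir"
# Keep stdout/stderr visible instead of redirecting them to /dev/null,
# so a failed engine build is reported rather than masked.
python3 build.py --output_dir "$output_dir" --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name "$model_name"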
PS: I have a working TensorRT branch. It is not perfect, but it works and the image is not too big. It does not have the compile-model-at-start option yet, though. I was considering opening an MR, if you like. Or you could just have a look and I can share my learnings.
Feel free to open a PR or share the branch, whatever works best for you. I looked into the pip install and it works, so I've already started looking into updating the setup.
Sure, here you go: #221
Should close #164