Docker compose tensorrt #177
Conversation
Hi, is it possible to build the docker image with CUDA_ARCH=52-virtual? I get:
688.3 ptxas /tmp/tmpxft_00000f2a_00000000-6_customAllReduceKernels.ptx, line 2273; error : Feature 'f16 arithemetic and compare instructions' requires .target sm_53 or higher
Hello @jooni22, from the error it seems that it's not possible to build on this GPU!
Hi, I was trying to get WhisperLive running using the TensorRT backend. I saw your comment in #164, which brought me here. If you don't mind, I'd like to bring up a question and two smaller comments here in the MR.
# convert small multilingual model
bash scripts/build_whisper_tensorrt.sh /root/TensorRT-LLM-examples small
# For e.g. 3090 RTX cuda architecture is 86-real
CUDA_ARCH=86-real docker compose build
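(Side note, not part of the diff: if you're unsure which value to pass for CUDA_ARCH, a quick way to look up the compute capability of the installed GPU is sketched below. This assumes a reasonably recent NVIDIA driver whose nvidia-smi supports the compute_cap query field.)

# Prints e.g. "8.6" for an RTX 3090, which maps to CUDA_ARCH=86-real.
# Assumes a recent nvidia-smi that supports the compute_cap field.
nvidia-smi --query-gpu=compute_cap --format=csv,noheader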
I could successfully build an image with these instructions.
It took 50 min on a 12-core workstation. Most of the time, 38 min, was spent compiling in build-trt-llm.sh.
I am wondering why it is necessary to compile TensorRT-LLM from scratch. What is the advantage over simply installing the pre-compiled wheel? Do the 0.7.1 wheels not support newer archs?
Btw: I appreciate your efforts to clean up the image. In the end, the image is still 38.4 GB, though, but I guess it would have been even bigger without those efforts. :)
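For reference, a minimal sketch of what the wheel-based route could look like; the version pin and index URL are assumptions based on the 0.7.1 wheels published on NVIDIA's package index, and whether that wheel supports the target arch is exactly the open question here.

# Hypothetical alternative to compiling TensorRT-LLM from source.
pip install tensorrt_llm==0.7.1 --extra-index-url https://pypi.nvidia.com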
The pre-compiled wheel doesn't work on all archs, but it's been a while since we tested. Thanks for pointing that out, looking into that next.
--backend tensorrt \
--trt_model_path "path/to/whisper_trt/from/build/step" \
--trt_multilingual
MODEL_SIZE=small.en BACKEND=tensorrt docker compose up
Unfortunately, I couldn't get any of the model sizes running like that. I always got Invalid model name:
$ MODEL_SIZE=small.en BACKEND=tensorrt docker compose up
WARN[0000] /.../WhisperLive/docker-compose.yml: `version` is obsolete
[+] Running 0/1
⠹ Container whisperlive-whisperlive-tensorrt-1 Recreated 0.3s
Attaching to whisperlive-tensorrt-1
whisperlive-tensorrt-1 | MODEL_SIZE is set to: small.en
whisperlive-tensorrt-1 | BACKEND is set to: tensorrt
whisperlive-tensorrt-1 | Running build-models.sh...
whisperlive-tensorrt-1 | whisper_small_en directory does not exist or is empty. Building whisper
whisperlive-tensorrt-1 | Installing requirements for Whisper TensorRT-LLM ...
whisperlive-tensorrt-1 | Invalid model name: whisper_small_en
whisperlive-tensorrt-1 exited with code 1
Ah! download_and_build_model() needs to get passed model_name and output_dir as arguments. Currently only the output_dir is passed, and it is then treated as a model_name in the case structure.
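A minimal sketch of the suggested fix; the function and variable names come from the comment above, but the call site and the argument order are assumptions about how build-models.sh is structured.

# Hypothetical excerpt of build-models.sh.
model_name="small.en"
output_dir="whisper_small_en"
# Pass both arguments so the case structure matches on the model name,
# not on the output directory:
download_and_build_model "$model_name" "$output_dir"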
#!/bin/bash -e

apt-get update && apt-get -y install git git-lfs
git clone --depth=1 -b cuda12.2 https://github.com/makaveli10/TensorRT-LLM.git
If I see this right, the difference between your fork+branch and the original TensorRT-LLM repo is only the addition of smaller model parameters to examples/whisper/build.py.
If using an official TensorRT-LLM wheel could be an option (see comment above), then the smaller model options could be added as a simple diff on the example files, without the need for a fork.
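A rough sketch of that idea, assuming the official wheel works; the release tag and the patch file name and contents are hypothetical.

# Clone only the upstream examples and apply a small, hypothetical patch
# that adds the small/small.en build parameters to the whisper example.
git clone --depth=1 -b v0.7.1 https://github.com/NVIDIA/TensorRT-LLM.git
patch TensorRT-LLM/examples/whisper/build.py < add_small_models.patch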
echo "Running build script for $model_name with output directory $output_dir" | ||
python3 build.py --output_dir "$output_dir" --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name "$model_name" | ||
echo "Running TensorRT-LLM build script for $model_name with output directory $output_dir" | ||
python3 build.py --output_dir "$output_dir" --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name "$model_name" > /dev/null 2>&1 |
I'd propose not to redirect stdout/stderr to /dev/null.
This actually hides errors that might be useful to know about when one is starting the service.
This is especially important because the script continues even if one of its commands errored. I'd thus also recommend adding set -e at the beginning of this and the run-whisperlive.sh script.
An example I just ran into: some CUDA error during the TRT build of the engine. After the hidden error, the next echo told me that the engine was built, but it wasn't.
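A minimal sketch of the suggested hardening, reusing the build invocation quoted from the diff above; keeping the output visible is one option, teeing it to a log file would be another.

#!/bin/bash
set -e  # abort on the first failing command instead of silently continuing

echo "Running TensorRT-LLM build script for $model_name with output directory $output_dir"
# Keep stdout/stderr visible instead of redirecting them to /dev/null,
# so a failed engine build is reported rather than masked.
python3 build.py --output_dir "$output_dir" --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name "$model_name"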
PS: I have a working TensorRT branch. It is not perfect, but it works and the image is not too big. It does not have the compile-model-at-start option yet, though. I was considering opening an MR, if you like. Or you could just have a look and I can share my learnings.
Feel free to open a PR or share the branch, whatever works best for you. I looked into the pip install and it works, so I've already started looking into updating the setup.
Sure, here you go: #221
Should close #164