Docker compose tensorrt #177

Closed · wants to merge 6 commits into from

Changes from all commits
12 changes: 11 additions & 1 deletion README.md
@@ -96,7 +96,17 @@ client(hls_url="http://as-hls-ww-live.akamaized.net/pool_904/live/ww/bbc_1xtra/b
docker run -it --gpus all -p 9090:9090 ghcr.io/collabora/whisperlive-gpu:latest
```

- TensorRT. Follow [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) in order to set up docker and use the TensorRT backend. We provide a pre-built docker image which has TensorRT-LLM built and ready to use.
- TensorRT. Follow [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md).
```bash
mkdir docker/scratch-space
cp docker/scripts/build-whisper-tensorrt.sh docker/scratch-space
cp docker/scripts/run-whisperlive.sh docker/scratch-space

# e.g. for an RTX 3090 the CUDA architecture is 86-real
CUDA_ARCH=86-real docker compose build

MODEL_SIZE=small.en BACKEND=tensorrt docker compose up
```

- CPU
```bash
docker run -it -p 9090:9090 ghcr.io/collabora/whisperlive-cpu:latest
```
55 changes: 9 additions & 46 deletions TensorRT_whisper.md
@@ -12,56 +12,19 @@
git clone https://github.com/collabora/WhisperLive.git
cd WhisperLive
```

- Pull the TensorRT-LLM docker image which we prebuilt for the WhisperLive TensorRT backend.
```bash
docker pull ghcr.io/collabora/whisperbot-base:latest
```

- Build the docker image for your GPU architecture. By default the image is built for the 4090, i.e. `CUDA_ARCH=89-real;90-real`.
```bash
mkdir docker/scratch-space
cp docker/scripts/build-whisper-tensorrt.sh docker/scratch-space
cp docker/scripts/run-whisperlive.sh docker/scratch-space
```

- Next, we run the docker image and mount the WhisperLive repo to the container's `/home` directory.
```bash
docker run -it --gpus all --shm-size=8g \
--ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-p 9090:9090 -v /path/to/WhisperLive:/home/WhisperLive \
ghcr.io/collabora/whisperbot-base:latest
```

- Make sure to test the installation.
```bash
# export ENV=${ENV:-/etc/shinit_v2}
# source $ENV
python -c "import torch; import tensorrt; import tensorrt_llm"
```
**NOTE**: Uncomment and update library paths if imports fail.

## Whisper TensorRT Engine
- We build `small.en` and `small` multilingual TensorRT engines. The script logs the path of the directory with the Whisper TensorRT engine. We need the model_path to run the server.
```bash
# convert small.en
bash scripts/build_whisper_tensorrt.sh /root/TensorRT-LLM-examples small.en

# convert small multilingual model
bash scripts/build_whisper_tensorrt.sh /root/TensorRT-LLM-examples small

# e.g. for an RTX 3090 the CUDA architecture is 86-real
CUDA_ARCH=86-real docker compose build
```

**Contributor:**
> I could successfully build an image with these instructions.
>
> It took 50 min on a 12-core workstation. Most of the time, 38 min, was spent in build-trt-llm.sh, compiling.
>
> I am wondering why it is necessary to compile TensorRT-LLM from scratch. What is the advantage over simply installing the pre-compiled wheel? Do the 0.7.1 wheels not support newer archs?
>
> Btw: I appreciate your efforts to clean up the image. In the end the image is still 38.4 GB, though I guess it would have been even bigger without those efforts. :)

**Collaborator (Author):**
> The pre-compiled wheel doesn't work on all archs, but it's been a while since we tested. Thanks for pointing that out; looking into that next.

## Run WhisperLive Server with TensorRT Backend
We run the container with docker compose, which builds the TensorRT engine for the specified model if it doesn't already exist in the mounted volume `docker/scratch-space`. Optionally, if you want to run the `faster_whisper` backend, use `BACKEND=faster_whisper`.
```bash
cd /home/WhisperLive

# Install requirements
apt update && bash scripts/setup.sh
pip install -r requirements/server.txt

# Required to create mel spectrogram
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz

# Run English-only model
python3 run_server.py --port 9090 \
    --backend tensorrt \
    --trt_model_path "path/to/whisper_trt/from/build/step"

# Run multilingual model
python3 run_server.py --port 9090 \
    --backend tensorrt \
    --trt_model_path "path/to/whisper_trt/from/build/step" \
    --trt_multilingual

MODEL_SIZE=small.en BACKEND=tensorrt docker compose up
```
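As mentioned above, the same compose service can run the `faster_whisper` backend instead; a minimal sketch:

```bash
# select the faster_whisper backend via the BACKEND environment variable
MODEL_SIZE=small.en BACKEND=faster_whisper docker compose up
```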
**Contributor:**
> Unfortunately, I couldn't get any of the model sizes running like that. I always got "Invalid model name":
>
>     $ MODEL_SIZE=small.en BACKEND=tensorrt docker compose up
>     WARN[0000] /.../WhisperLive/docker-compose.yml: `version` is obsolete
>     [+] Running 0/1
>      ⠹ Container whisperlive-whisperlive-tensorrt-1  Recreated  0.3s
>     Attaching to whisperlive-tensorrt-1
>     whisperlive-tensorrt-1  | MODEL_SIZE is set to: small.en
>     whisperlive-tensorrt-1  | BACKEND is set to: tensorrt
>     whisperlive-tensorrt-1  | Running build-models.sh...
>     whisperlive-tensorrt-1  | whisper_small_en directory does not exist or is empty. Building whisper
>     whisperlive-tensorrt-1  | Installing requirements for Whisper TensorRT-LLM ...
>     whisperlive-tensorrt-1  | Invalid model name: whisper_small_en
>     whisperlive-tensorrt-1 exited with code 1

**Contributor (@peldszus, Apr 19, 2024):**
> Ah! `download_and_build_model()` needs to get passed `model_name` and `output_dir` as the arguments. Currently only the `output_dir` is passed and then treated as a `model_name` in the case structure.
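A sketch of the fix the reviewer describes — passing both values explicitly (illustrative; the merged fix may differ):

```bash
download_and_build_model() {
    local model_name="$1"   # e.g. small.en
    local output_dir="$2"   # e.g. whisper_small_en
    # ... download $model_name, then build the engine into $output_dir ...
}

download_and_build_model "$model_name" "$output_dir"
```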
26 changes: 26 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,26 @@
version: '3.8'

services:
  whisperlive-tensorrt:
    build:
      context: docker
      dockerfile: Dockerfile.tensorrt
      args:
        CUDA_ARCH: ${CUDA_ARCH:-89-real;90-real}
    image: whisperlive-tensorrt:latest
    volumes:
      - type: bind
        source: ./docker/scratch-space
        target: /root/scratch-space
    environment:
      VERBOSE: ${VERBOSE:-false}
      MODEL_SIZE: ${MODEL_SIZE:-small.en}
      BACKEND: ${BACKEND:-tensorrt}
    ports:
      - "8000:9090"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: ["gpu"]
    entrypoint: ["/bin/bash", "-c", "/root/scratch-space/run-whisperlive.sh $$MODEL_SIZE $$BACKEND"]
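For reference, a typical build-and-run sequence against this compose file (values are illustrative; the defaults are the ones declared above):

```bash
# defaults if unset: CUDA_ARCH=89-real;90-real, MODEL_SIZE=small.en, BACKEND=tensorrt
CUDA_ARCH=86-real docker compose build
MODEL_SIZE=small.en BACKEND=tensorrt docker compose up

# note the port mapping: the server listens on 9090 inside the container
# but is exposed on the host at port 8000
```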
20 changes: 20 additions & 0 deletions docker/Dockerfile.tensorrt
@@ -0,0 +1,20 @@
ARG BASE_IMAGE=nvcr.io/nvidia/cuda
ARG BASE_TAG=12.2.2-devel-ubuntu22.04

FROM ${BASE_IMAGE}:${BASE_TAG} as base
ARG CUDA_ARCH
ENV CUDA_ARCH=${CUDA_ARCH}

FROM base as devel
WORKDIR /root
COPY scripts/install-deps.sh /root
RUN bash install-deps.sh && rm install-deps.sh
COPY scripts/build-trt-llm.sh /root
RUN bash build-trt-llm.sh && rm build-trt-llm.sh

FROM devel as release
WORKDIR /root/
COPY scripts/install-trt-llm.sh /root
RUN bash install-trt-llm.sh && rm install-trt-llm.sh
COPY scripts/setup-whisperlive.sh /root/
RUN ./setup-whisperlive.sh
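The image can also be built directly, without compose — a sketch mirroring the compose file's build section (run from the repo root):

```bash
# context is the docker/ directory; CUDA_ARCH is forwarded as a build arg
docker build -f docker/Dockerfile.tensorrt \
    --build-arg CUDA_ARCH="86-real" \
    -t whisperlive-tensorrt:latest \
    docker
```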
9 changes: 9 additions & 0 deletions docker/scripts/build-trt-llm.sh
@@ -0,0 +1,9 @@
#!/bin/bash -e

export ENV=${ENV:-/etc/shinit_v2}
source $ENV

CUDA_ARCH="${CUDA_ARCH:-89-real;90-real}"

cd /root/TensorRT-LLM
python3 scripts/build_wheel.py --clean --cuda_architectures "$CUDA_ARCH" --trt_root /usr/local/tensorrt
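To choose a `CUDA_ARCH` value for your GPU, you can query its compute capability with nvidia-smi (e.g. 8.6 maps to `86-real`):

```bash
# prints e.g. "8.6" for an RTX 3090; drop the dot and append -real
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```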
23 changes: 16 additions & 7 deletions scripts/build_whisper_tensorrt.sh → docker/scripts/build-whisper-tensorrt.sh
100644 → 100755
@@ -48,19 +48,21 @@ download_and_build_model() {
# wget --directory-prefix=assets "$model_url"
# echo "Download completed: ${model_name}.pt"
    if [ ! -f "assets/${model_name}.pt" ]; then
        wget --directory-prefix=assets "$model_url"
        wget --directory-prefix=assets "$model_url" > /dev/null 2>&1
        echo "Download completed: ${model_name}.pt"
    else
        echo "${model_name}.pt already exists in assets directory."
    fi

    local output_dir="whisper_${model_name//./_}"
    echo "$output_dir"
    echo "Running build script for $model_name with output directory $output_dir"
    python3 build.py --output_dir "$output_dir" --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name "$model_name"
    echo "Running TensorRT-LLM build script for $model_name with output directory $output_dir"
    python3 build.py --output_dir "$output_dir" --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name "$model_name" > /dev/null 2>&1
**Contributor:**
> I'd propose not to dump the stdout/stderr to /dev/null.
>
> This is actually hiding errors that might be useful to know about when one is starting the service.
>
> This is especially important when the script continues even if one of its commands errored. I'd thus also recommend adding `set -e` at the beginning of this and the run-whisperlive.sh script.
>
> An example I just ran into: some CUDA error during the trt-build of the engine. After the hidden error, the next echo told me that the engine was built, but it wasn't.
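A minimal sketch of the reviewer's suggestion applied to this script (placement illustrative):

```bash
#!/bin/bash
set -e   # abort on the first failing command instead of continuing past errors

# keep the build output visible instead of redirecting it to /dev/null
python3 build.py --output_dir "$output_dir" \
    --use_gpt_attention_plugin --use_gemm_plugin \
    --use_bert_attention_plugin --model_name "$model_name"
```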

echo "Whisper $model_name TensorRT engine built."
echo "========================================="
echo "Model is located at: $(pwd)/$output_dir"
mkdir -p /root/scratch-space/models
cp -r $output_dir /root/scratch-space/models

}

if [ "$#" -lt 1 ]; then
@@ -70,8 +72,15 @@
fi

tensorrt_examples_dir="$1"
model_name="${2:-small.en}"
output_dir="whisper_${model_name//./_}"

cd $1/whisper
pip install --no-deps -r requirements.txt
if [ ! -d "/root/scratch-space/models/$output_dir" ] || [ -z "$(ls -A /root/scratch-space/models/$output_dir)" ]; then
    echo "$output_dir directory does not exist or is empty. Building whisper"
    cd $1/whisper
    echo "Installing requirements for Whisper TensorRT-LLM ..."
    pip install --no-deps -r requirements.txt > /dev/null 2>&1
    download_and_build_model "$output_dir"
else
    echo "$output_dir directory exists and is not empty. Skipping build-whisper..."
fi

download_and_build_model "$model_name"
54 changes: 54 additions & 0 deletions docker/scripts/install-deps.sh
@@ -0,0 +1,54 @@
#!/bin/bash -e

apt-get update && apt-get -y install git git-lfs
git clone --depth=1 -b cuda12.2 https://github.com/makaveli10/TensorRT-LLM.git
**Contributor:**
> If I see this right, the difference between your fork+branch and the original TensorRT-LLM repo is only the addition of smaller model parameters to examples/whisper/build.py.
>
> If using an official TensorRT-LLM wheel could be an option (see comment above), then the smaller model options could be added as a simple diff on the example files, without the need to use a fork.
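If the official wheel route works for the target archs, the install might look like this (a sketch; the version pin and index URL are assumptions):

```bash
# install a pre-built TensorRT-LLM wheel from NVIDIA's package index
# instead of compiling the fork from source
pip3 install tensorrt_llm==0.7.1 --extra-index-url https://pypi.nvidia.com
```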

cd TensorRT-LLM
git checkout main
git submodule update --init --recursive
git lfs install
git lfs pull

# do not reinstall CUDA (our base image provides the same exact versions)
patch -p1 <<EOF
diff --git a/docker/common/install_tensorrt.sh b/docker/common/install_tensorrt.sh
index 2dcb0a6..3a27e03 100644
--- a/docker/common/install_tensorrt.sh
+++ b/docker/common/install_tensorrt.sh
@@ -35,19 +35,7 @@ install_ubuntu_requirements() {
dpkg -i cuda-keyring_1.0-1_all.deb

apt-get update
- if [[ $(apt list --installed | grep libcudnn8) ]]; then
- apt-get remove --purge -y libcudnn8*
- fi
- if [[ $(apt list --installed | grep libnccl) ]]; then
- apt-get remove --purge -y --allow-change-held-packages libnccl*
- fi
- if [[ $(apt list --installed | grep libcublas) ]]; then
- apt-get remove --purge -y --allow-change-held-packages libcublas*
- fi
- CUBLAS_CUDA_VERSION=$(echo $CUDA_VER | sed 's/\./-/g')
apt-get install -y --no-install-recommends libcudnn8=${CUDNN_VER} libcudnn8-dev=${CUDNN_VER}
- apt-get install -y --no-install-recommends libnccl2=${NCCL_VER} libnccl-dev=${NCCL_VER}
- apt-get install -y --no-install-recommends libcublas-${CUBLAS_CUDA_VERSION}=${CUBLAS_VER} libcublas-dev-${CUBLAS_CUDA_VERSION}=${CUBLAS_VER}
apt-get clean
rm -rf /var/lib/apt/lists/*
}
EOF

cd docker/common/
export BASH_ENV=${BASH_ENV:-/etc/bash.bashrc}
export ENV=${ENV:-/etc/shinit_v2}
bash install_base.sh
bash install_cmake.sh
source $ENV
bash install_ccache.sh
# later on TensorRT-LLM will force reinstall this version anyways
pip3 install --extra-index-url https://download.pytorch.org/whl/cu121 torch==2.1.0
bash install_tensorrt.sh
bash install_polygraphy.sh
source $ENV

cd /root/TensorRT-LLM/docker/common/
bash install_mpi4py.sh
source $ENV
10 changes: 10 additions & 0 deletions docker/scripts/install-trt-llm.sh
@@ -0,0 +1,10 @@
#!/bin/bash -e

cd /root/TensorRT-LLM
pip install build/tensorrt_llm-0.7.1-cp310-cp310-linux_x86_64.whl
mv examples ../TensorRT-LLM-examples
cd ..

rm -rf TensorRT-LLM
# we don't need static libraries and they take a lot of space
(cd /usr && find . -name '*static.a' | grep -v cudart_static | xargs rm -f)
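After installing the wheel, a quick sanity check (the same test the old readme used) confirms the stack imports cleanly:

```bash
python3 -c "import torch; import tensorrt; import tensorrt_llm"
```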
28 changes: 28 additions & 0 deletions docker/scripts/run-whisperlive.sh
@@ -0,0 +1,28 @@
#!/bin/bash -e
echo "MODEL_SIZE is set to: $MODEL_SIZE"
echo "BACKEND is set to: $BACKEND"

test -f /etc/shinit_v2 && source /etc/shinit_v2

echo "Running build-models.sh..."
cd /root/scratch-space/
./build-whisper-tensorrt.sh /root/TensorRT-LLM-examples/ $MODEL_SIZE

whisper_model_trt="whisper_${MODEL_SIZE//./_}"

echo "$whisper_model_trt"

cd /root/WhisperLive

if [ "$BACKEND" == "tensorrt" ]; then
if [[ $MODEL_SIZE == *".en" ]]; then
exec python3 run_server.py -p 9090 -b $BACKEND \
-trt /root/scratch-space/models/"$whisper_model_trt"
else
exec python3 run_server.py -p 9090 -b $BACKEND \
-trt /root/scratch-space/models/"$whisper_model_trt" \
-m
fi
else
exec python3 run_server.py
fi
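For reference, a manual invocation inside the container might look like this (`MODEL_SIZE` and `BACKEND` are read from the environment, matching what docker-compose.yml sets):

```bash
MODEL_SIZE=small.en BACKEND=tensorrt /root/scratch-space/run-whisperlive.sh
```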
17 changes: 17 additions & 0 deletions docker/scripts/setup-whisperlive.sh
@@ -0,0 +1,17 @@
#!/bin/bash -e

## Clone this repo and install requirements
[ -d "WhisperLive" ] || git clone https://github.com/collabora/WhisperLive.git

cd WhisperLive
apt update
apt-get install portaudio19-dev ffmpeg wget -y

## Install all the other dependencies normally
pip install -r requirements/server.txt

mkdir -p /root/.cache/whisper-live/
curl -L -o /root/.cache/whisper-live/silero_vad.onnx https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx

# the sound filter definitions
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz