We used GH200 systems on the JLSE testbeds at ALCF, and we use Apptainer containers to set up TensorRT-LLM.
- Build the container:

  ```bash
  source build-container.sh
  ```

  This script builds an Apptainer image `trt-llm-gh200.sif` from the `trt-llm-gh200.def` definition file in the same directory.
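For reference, here is a minimal sketch of what `build-container.sh` does, assuming it simply wraps the standard Apptainer build command (the real script may also set cache paths or fakeroot options):

```bash
#!/bin/bash
# Sketch of build-container.sh (assumed contents; the actual script may differ).
# Builds the SIF image from the definition file in the current directory.
apptainer build trt-llm-gh200.sif trt-llm-gh200.def
```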
- You will need the `power_utils.py` file, used for power metric collection, in the same directory as `benchmark_power.py`.
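As a rough illustration of the kind of sampling `power_utils.py` performs, GPU power draw can be polled with `nvidia-smi`. This is only a sketch of the idea, not the contents of the actual file, which may use NVML bindings and a different output format:

```bash
# Hypothetical example: log GPU power draw once per second to a CSV file.
# power_utils.py's real implementation may differ.
nvidia-smi --query-gpu=timestamp,power.draw --format=csv,noheader -l 1 >> power_log.csv
```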
- Clone the TensorRT-LLM repo and check out the tested release:

  ```bash
  git clone https://github.com/NVIDIA/TensorRT-LLM.git
  cd TensorRT-LLM
  git checkout v0.12.0
  ```
- Replace or copy the files `run_power.py`, `run_precision_bench.py`, `utils.py`, and `run.py` from this directory into the cloned TensorRT-LLM directory, as sketched below.
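Assuming the repo was cloned into `TensorRT-LLM/` as above, the copy step looks like the following (the exact destination depends on where the upstream `run.py` lives in the repo):

```bash
# Copy the benchmark drivers into the cloned repo, overwriting the stock run.py.
cp run_power.py run_precision_bench.py utils.py run.py TensorRT-LLM/
```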
- Use the provided shell script `run-container-throughput.sh` in this directory to launch the container; it runs `run-throughput-bench.sh`, which invokes `run.py` for various configurations of input length, output length, and batch size (see the sweep sketch at the end of this README):

  ```bash
  source run-container-throughput.sh
  ```
- Use the provided shell script `run-container-power.sh` in this directory to launch the container; it runs `run-power-bench.sh`, which invokes `run_power.py` for the same configurations:

  ```bash
  source run-container-power.sh
  ```
- Use the provided shell script `run-container-precision.sh` in this directory to launch the container; it runs `run-precision-bench.sh`, which invokes `run_precision.py` for the same configurations:

  ```bash
  source run-container-precision.sh
  ```
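All three `run-*-bench.sh` scripts follow the same pattern: sweep input length, output length, and batch size, invoking the corresponding Python driver for each combination. A simplified sketch, with illustrative (not actual) flag names and sweep values:

```bash
#!/bin/bash
# Illustrative sweep; the real bench scripts and the run.py argument
# names/values are assumptions and may differ.
for in_len in 128 512 1024; do
  for out_len in 128 512 1024; do
    for bs in 1 8 16 32; do
      python run.py --input-len "$in_len" --output-len "$out_len" --batch-size "$bs"
    done
  done
done
```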