
Benchmarking LLMs on Nvidia GH200 using TensorRT-LLM

We used GH200 systems on the JLSE testbeds at ALCF, and Apptainer containers to set up TensorRT-LLM.

First-time Setup

  1. Build a container
$ source build-container.sh

This script builds an Apptainer image, trt-llm-gh200.sif, from the trt-llm-gh200.def definition file in the same directory.
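For reference, a minimal sketch of what build-container.sh might contain; the actual script may add site-specific options (proxy settings, cache locations, etc.):

    #!/bin/bash
    # Hypothetical sketch of build-container.sh: builds the Apptainer image
    # from the definition file in the same directory.
    set -euo pipefail
    apptainer build trt-llm-gh200.sif trt-llm-gh200.def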

Run Benchmarks

  1. You will need the power_utils.py file, which handles power-metric collection, in the same directory as benchmark_power.py (see the sketch after this list for the idea behind it).
  2. Clone the TensorRT-LLM repo and check out v0.12.0:
    git clone https://github.com/NVIDIA/TensorRT-LLM.git
    cd TensorRT-LLM
    git checkout v0.12.0
  3. Copy (or replace) run_power.py, run_precision_bench.py, utils.py, and run.py from this directory into the cloned TensorRT-LLM directory.
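For intuition only: GPU power draw on the GH200 can be sampled in the background with nvidia-smi while a workload runs. power_utils.py may instead use NVML or other counters, so consult the file itself.

    # Illustrative only: sample GPU power draw once per second to a CSV
    # while a benchmark runs, then stop the sampler.
    nvidia-smi --query-gpu=timestamp,power.draw --format=csv,noheader -l 1 > power_log.csv &
    SAMPLER_PID=$!

    # ... run the benchmark workload here ...

    kill "$SAMPLER_PID"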

Collect Throughput Metric

  • Use the provided shell script run-container-throughput.sh in this directory. It launches the container and runs run-throughput-bench.sh, which invokes run.py across various input lengths, output lengths, and batch sizes (see the sketch below).
    source run-container-throughput.sh
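The exact sweep lives in run-throughput-bench.sh; the pattern is roughly the nested loop below. The run.py flag names here are hypothetical; check the copied run.py for its real arguments.

    #!/bin/bash
    # Hypothetical sketch of run-throughput-bench.sh: sweep batch sizes and
    # input/output lengths, invoking run.py inside the container each time.
    for BATCH in 1 8 32; do
      for IN_LEN in 128 1024; do
        for OUT_LEN in 128 1024; do
          apptainer exec --nv trt-llm-gh200.sif \
            python run.py --batch_size "$BATCH" \
                          --input_len "$IN_LEN" \
                          --output_len "$OUT_LEN"
        done
      done
    done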

Collect Power Metric

  • Use the provided shell script run-container-power.sh in this directory. It launches the container and runs run-power-bench.sh, which invokes run_power.py across various input lengths, output lengths, and batch sizes.
    source run-container-power.sh
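run-power-bench.sh follows the same sweep pattern but calls run_power.py, which is assumed to start and stop power sampling (via power_utils.py) around each run. One illustrative iteration, with hypothetical flags:

    # Illustrative single iteration of the power sweep.
    apptainer exec --nv trt-llm-gh200.sif \
      python run_power.py --batch_size 8 --input_len 1024 --output_len 128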

Collect Precision Metric

  • Use the provided shell script run-container-precision.sh in this directory. It launches the container and runs run-precision-bench.sh, which invokes run_precision.py across various input lengths, output lengths, and batch sizes.
    source run-container-precision.sh
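run-precision-bench.sh adds a precision dimension on top of the same shape sweep. An illustrative shape, assuming a --dtype-style flag (hypothetical; the real run_precision.py arguments may differ):

    # Illustrative only: sweep precisions in addition to shapes.
    for DTYPE in float16 fp8; do
      apptainer exec --nv trt-llm-gh200.sif \
        python run_precision.py --dtype "$DTYPE" \
                                --batch_size 8 --input_len 1024 --output_len 128
    done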