Description
When deploying an ONNX model with the Triton Inference Server's ONNX Runtime backend, CPU inference is noticeably slower than running the same model directly through the ONNX Runtime Python API. The discrepancy is observed under identical conditions: the same hardware, model, and input data.
Triton Information
TRITON_VERSION <= 24.09
To Reproduce
model used:
Triton server (ONNX runtime)
config.pbtxt
Python clients
Triton client
results:
473 ms ± 87.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
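The Triton client snippet itself is not reproduced above; the following is a minimal sketch of how such a timed request might look with the tritonclient HTTP API. The model name my_onnx_model, the input name input, and the input shape are placeholders, not taken from the issue.

```python
import time

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder model/input names and shape -- adjust to the actual model.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

# Time a single inference round trip to the server.
start = time.perf_counter()
response = client.infer(model_name="my_onnx_model", inputs=[infer_input])
elapsed = time.perf_counter() - start
print(f"Triton inference: {elapsed * 1000:.1f} ms")
```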
ONNX Runtime
results:
159 ms ± 23.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
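For comparison, a sketch of the equivalent direct ONNX Runtime measurement; the model path and input shape are again placeholders.

```python
import time

import numpy as np
import onnxruntime as ort

# Load the same ONNX model directly, restricted to the CPU provider.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
input_name = session.get_inputs()[0].name

# Time a single in-process inference call.
start = time.perf_counter()
outputs = session.run(None, {input_name: batch})
elapsed = time.perf_counter() - start
print(f"ONNX Runtime inference: {elapsed * 1000:.1f} ms")
```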
Comparing the performance of the two approaches with a single run is not really a fair comparison. Why not set up a benchmark with a larger number of samples and more than one client?
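One way to set up such a benchmark is to drive the server with several concurrent clients, each sending many requests, and aggregate the latencies. A sketch using the same placeholder model and input names is below; Triton's perf_analyzer tool can also generate this kind of load and report latency and throughput statistics.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tritonclient.http as httpclient

URL = "localhost:8000"
MODEL = "my_onnx_model"   # placeholder model name
N_CLIENTS = 4             # number of concurrent clients
N_REQUESTS = 50           # requests per client


def one_client(_):
    # Each worker uses its own client connection.
    client = httpclient.InferenceServerClient(url=URL)
    batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
    latencies = []
    for _ in range(N_REQUESTS):
        infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
        infer_input.set_data_from_numpy(batch)
        start = time.perf_counter()
        client.infer(MODEL, inputs=[infer_input])
        latencies.append(time.perf_counter() - start)
    return latencies


with ThreadPoolExecutor(max_workers=N_CLIENTS) as pool:
    all_latencies = [t for lats in pool.map(one_client, range(N_CLIENTS)) for t in lats]

print(f"mean latency: {1000 * sum(all_latencies) / len(all_latencies):.1f} ms "
      f"over {len(all_latencies)} requests from {N_CLIENTS} clients")
```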