Sending additional data to triton inference server #7507

Open
vrvrv opened this issue Jan 3, 2025 · 2 comments

vrvrv commented Jan 3, 2025

Thank you litellm!

Since version 1.55.8, which supports the generate endpoint for Triton, I have been trying to integrate it. I am running the vllm backend on Triton Inference Server and connecting to it through the litellm API endpoint.

However, there is one limitation I would like to ask about. Compared with the request data accepted by the officially supported vllm_backend on Triton, litellm transmits only a minimal payload. Currently only text_input, parameters, and stream are sent, and any additional options can be passed only by updating parameters. Other kinds of data, such as extra top-level fields, are difficult to send. I hope the structure of data_for_triton can be changed so that the payload is easier to modify.

    data_for_triton: Dict[str, Any] = {
        "text_input": prompt_factory(model=model, messages=messages),
        "parameters": {
            "max_tokens": int(optional_params.get("max_tokens", 2000)),
            "bad_words": [""],
            "stop_words": [""],
        },
        "stream": bool(stream),
    }
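
One possible shape for the requested change, as a minimal sketch only: a helper that builds the same three keys and then merges caller-supplied fields on top. The name build_triton_payload and its extra_fields argument are hypothetical and are not part of litellm; exclude_input_in_output is used purely as an example of an extra top-level field, and the bad_words and stop_words defaults are left out for brevity.

```python
from typing import Any, Dict, Optional


def build_triton_payload(
    prompt: str,
    optional_params: Dict[str, Any],
    stream: bool = False,
    extra_fields: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper (not litellm code): build a generate-endpoint payload."""
    params = dict(optional_params)  # copy so the caller's dict is not mutated
    max_tokens = int(params.pop("max_tokens", 2000))

    payload: Dict[str, Any] = {
        "text_input": prompt,
        "parameters": {"max_tokens": max_tokens, **params},
        "stream": bool(stream),
    }

    # Merge caller-supplied fields: a nested "parameters" dict is merged into
    # the existing one, while any other key is set at the top level.
    for key, value in (extra_fields or {}).items():
        if key == "parameters" and isinstance(value, dict):
            payload["parameters"].update(value)
        else:
            payload[key] = value
    return payload


payload = build_triton_payload(
    "Hello",
    {"max_tokens": 64, "temperature": 0.2},
    extra_fields={"exclude_input_in_output": True},  # example field only
)
```

With something along these lines, callers could add top-level request fields without patching the provider code, and the default payload would stay unchanged when extra_fields is omitted.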

@krrishdholakia
Contributor

Hey @vrvrv, is that the right spec in Triton's code?

I don't see parameters in the vllm_backend, but I do see sampling_parameters.

vrvrv commented Jan 3, 2025

@krrishdholakia in the vllm_backend, parameters is optional.
The backend falls back to the request's parameters when sampling_parameters is not provided:
https://github.com/triton-inference-server/vllm_backend/blob/d061556955b89d538ec53f32c532e2041ce2fc89/src/model.py#L569-L575
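
The fallback described here can be paraphrased in plain Python. This is an illustrative sketch, not the backend's actual code: resolve_sampling_params is a hypothetical name, and it assumes both sources arrive as JSON strings.

```python
import json
from typing import Any, Dict, Optional


def resolve_sampling_params(
    sampling_parameters: Optional[str],
    request_parameters: Optional[str],
) -> Dict[str, Any]:
    """Prefer the sampling_parameters input; otherwise use the request parameters."""
    # Assumption: both values are JSON-encoded strings, or None when absent.
    if sampling_parameters is not None:
        raw = sampling_parameters
    else:
        raw = request_parameters
    return json.loads(raw) if raw else {}


# sampling_parameters wins when it is present.
both = resolve_sampling_params('{"temperature": 0.1}', '{"max_tokens": 5}')
# Without it, the request-level parameters are used instead.
fallback = resolve_sampling_params(None, '{"max_tokens": 5}')
```

Under this reading, a client that sends only parameters, as litellm does, would still have those values picked up by the backend.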
