Since version 1.55.8, which added support for Triton's generate endpoint, I have attempted to integrate it. (I am using the vllm backend on the Triton Inference Server and connecting it to the litellm API endpoint.)
However, there is one aspect I find lacking and would like to ask about. Compared to the request data accepted by the officially supported vllm_backend on Triton (link), only a minimal amount of data is transmitted. Currently only text_input, parameters, and stream are sent, and additional parameters can only be passed by updating parameters; it is difficult to send other kinds of data. I hope the structure of data_for_triton can be improved to make the request data easier to modify.
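To illustrate the kind of flexibility requested, here is a minimal sketch of a payload builder that accepts arbitrary extra top-level fields in addition to the three currently sent. Note that build_triton_generate_payload, its parameter names, and the extra_fields key shown are hypothetical examples, not litellm's actual implementation:

```python
# Hypothetical sketch (not litellm's actual data_for_triton code): building a
# Triton generate-endpoint payload that accepts arbitrary extra top-level
# fields instead of only text_input, stream, and parameters.
from typing import Any, Dict, Optional


def build_triton_generate_payload(
    prompt: str,
    stream: bool = False,
    sampling_params: Optional[Dict[str, Any]] = None,
    extra_fields: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the JSON body for Triton's generate endpoint.

    ``extra_fields`` lets callers pass additional backend-specific keys
    without changing this function or nesting everything under parameters.
    """
    payload: Dict[str, Any] = {
        "text_input": prompt,
        "stream": stream,
        "parameters": dict(sampling_params or {}),
    }
    # Merge any caller-supplied top-level fields; caller-provided keys win.
    payload.update(extra_fields or {})
    return payload
```

With a structure like this, a caller could pass vllm_backend-specific fields as ordinary top-level keys, e.g. build_triton_generate_payload("Hello", sampling_params={"max_tokens": 16}, extra_fields={"exclude_input_in_output": True}).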
Thank you litellm!
litellm/litellm/llms/triton/completion/transformation.py
Lines 168 to 175 in b928052