Since version 1.55.8, which added support for Triton's generate endpoint, I have attempted to integrate it. (I am using the vllm backend on the Triton Inference Server and connecting it to the litellm API endpoint.)
However, there is one aspect I find lacking and would like to ask about. Compared to the request data accepted by the officially supported vllm_backend on Triton (link), only a minimal amount of data is transmitted. Currently only text_input, parameters, and stream are sent, and additional parameters can only be passed by updating parameters; it is difficult to send other kinds of data. I hope the structure of data_for_triton can be improved to make the request data easier to modify.
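To illustrate the kind of flexibility requested, here is a minimal sketch of a payload builder that accepts arbitrary extra top-level fields in addition to the three currently sent. Note that build_triton_generate_payload, its parameter names, and the extra_fields key shown are hypothetical examples, not litellm's actual implementation:

```python
# Hypothetical sketch (not litellm's actual data_for_triton code): building a
# Triton generate-endpoint payload that accepts arbitrary extra top-level
# fields instead of only text_input, stream, and parameters.
from typing import Any, Dict, Optional


def build_triton_generate_payload(
    prompt: str,
    stream: bool = False,
    sampling_params: Optional[Dict[str, Any]] = None,
    extra_fields: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the JSON body for Triton's generate endpoint.

    ``extra_fields`` lets callers pass additional backend-specific keys
    without changing this function or nesting everything under parameters.
    """
    payload: Dict[str, Any] = {
        "text_input": prompt,
        "stream": stream,
        "parameters": dict(sampling_params or {}),
    }
    # Merge any caller-supplied top-level fields; caller-provided keys win.
    payload.update(extra_fields or {})
    return payload
```

With a structure like this, a caller could pass vllm_backend-specific fields as ordinary top-level keys, e.g. build_triton_generate_payload("Hello", sampling_params={"max_tokens": 16}, extra_fields={"exclude_input_in_output": True}).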
Thank you litellm!
litellm/litellm/llms/triton/completion/transformation.py
Lines 168 to 175 in b928052