🎅 I WISH LITELLM HAD... #361

krrishdholakia · 2023-09-13T19:40:55Z

This is a ticket to track a wishlist of items you wish LiteLLM had.

COMMENT BELOW 👇

With your request 🔥 - if we have any questions, we'll follow up in comments / via DMs

Respond with ❤️ to any request you would also like to see

P.S.: Come say hi 👋 on the Discord

krrishdholakia · 2023-09-13T19:44:04Z

[LiteLLM Client] Add new models via UI

Thinking aloud it seems intuitive that you'd be able to add new models / remap completion calls to different models via UI. Unsure on real problem though.

krrishdholakia · 2023-09-13T19:46:30Z

User / API Access Management

Different users have access to different models. It'd be helpful if there was a way to maybe leverage the BudgetManager to gate access. E.g. GPT-4 is expensive, i don't want to expose that to my free users but i do want my paid users to be able to use it.

krrishdholakia · 2023-09-13T19:48:57Z

cc: @yujonglee @WilliamEspegren @zakhar-kogan @ishaan-jaff @PhucTranThanh feel free to add any requests / ideas here.

ishaan-jaff · 2023-09-13T19:49:49Z

[Spend Dashboard] View analytics for spend per llm and per user

This allows me to see what my most expensive llms are and what users are using litellm heavily

ishaan-jaff · 2023-09-13T19:51:34Z

Auto select the best LLM for a given task

If it's a simple task like responding to "hello" litlellm should auto-select a cheaper but faster llm like j2-light

Pipboyguy · 2023-09-13T21:43:33Z

Integration with NLP Cloud

krrishdholakia · 2023-09-13T22:04:01Z

That's awesome @Pipboyguy - dm'ing on linkedin to learn more!

krrishdholakia · 2023-09-14T17:56:09Z

@ishaan-jaff check out this truncate param in the cohere api

This looks super interesting. Similar to your token trimmer. If the prompt exceeds context window, trim in a particular manner.

I would maybe only run trimming on user/assistant messages. Not touch the system prompt (works for RAG scenarios as well).

haseeb-heaven · 2023-09-17T00:00:25Z

Option to use Inference API so we can use any model from Hugging Face 🤗

krrishdholakia · 2023-09-17T00:20:03Z

@haseeb-heaven you can already do this -

litellm/litellm/llms/huggingface_restapi.py

Line 53 in a63784d

completion_url = f"https://api-inference.huggingface.co/models/{model}"

from litellm import completion 
response = completion(model="huggingface/gpt2", messages=[{"role": "user", "content": "Hey, how's it going?"}])
print(response)

haseeb-heaven · 2023-09-17T00:30:12Z

@haseeb-heaven you can already do this -

litellm/litellm/llms/huggingface_restapi.py

Line 53 in a63784d

completion_url = f"https://api-inference.huggingface.co/models/{model}"
from litellm import completion 
response = completion(model="huggingface/gpt2", messages=[{"role": "user", "content": "Hey, how's it going?"}])
print(response) 

Wow great thanks its working. Nice feature

smig23 · 2023-09-18T02:39:52Z

Support for inferencing using models hosted on Petals swarms (https://github.com/bigscience-workshop/petals), both public and private.

ishaan-jaff · 2023-09-18T16:11:27Z

@smig23 what are you trying to use petals for ? We found it to be quite unstable and it would not consistently pass our tests

shauryr · 2023-09-18T17:28:54Z

finetuning wrapper for openai, huggingface etc.

krrishdholakia · 2023-09-18T18:37:02Z

@shauryr i created an issue to track this - feel free to add any missing details here

smig23 · 2023-09-18T18:57:48Z

@smig23 what are you trying to use petals for ? We found it to be quite unstable and it would not consistently pass our tests

Specifically for my aims, I'm running a private swarm as a experiment with a view to implementing with in private organization, who have idle GPU resources, but it's distributed. The initial target would be inferencing and if litellm was able to be the abstraction layer, it would allow flexibility to go another direction with hosting in the future.

ranjancse26 · 2023-09-19T05:02:17Z

I wish the litellm to have a direct support for finetuning the model. Based on the below blog post, I understand that in order to fine tune, one needs to have a specific understanding on the LLM provider and then follow their instructions or library for fine tuning the model. Why not the LiteLLM do all the abstraction and handle the fine-tuning aspects as well?

https://docs.litellm.ai/docs/tutorials/finetuned_chat_gpt
https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset

ranjancse26 · 2023-09-19T07:31:45Z

I wish LiteLLM has a support for open-source embeddings like sentence-transformers, hkunlp/instructor-large etc.

Sorry, based on the below documentation, it seems there's only support for the Open AI embedding.

https://docs.litellm.ai/docs/embedding/supported_embedding

ranjancse26 · 2023-09-19T09:21:00Z

I wish LiteLLM has the integration to cerebrium platform. Please check the below link for the prebuilt-models.

https://docs.cerebrium.ai/cerebrium/prebuilt-models

ishaan-jaff · 2023-09-19T16:19:28Z

@ranjancse26 what models on cerebrium do you want to use with LiteLLM ?

ranjancse26 · 2023-09-19T16:30:20Z

@ishaan-jaff The cerebrium has got a lot of pre-built model. The focus should be on consuming the open-source models first ex: Lama 2, GPT4All, Falcon, FlanT5 etc. I am mentioning this as a first step. However, it's a good idea to have the Litellm take care of the internal communication with the custom-built models too. In-turn based on the API which the cerebrium is exposing.

ishaan-jaff · 2023-09-19T18:44:22Z

@smig23 We've added support for petals to LiteLLM https://docs.litellm.ai/docs/providers/petals

ranjancse26 · 2023-09-21T00:25:23Z

I wish Litellm has a built-in support for the majority of the provider operations than targeting the text generation alone. Consider an example of Cohere, the below one allows users to have conversations with a Large Language Model (LLM) from Cohere.

https://docs.cohere.com/reference/post_chat

ranjancse26 · 2023-09-21T00:32:02Z

I wish Litellm has a ton of support and examples for users to develop apps with RAG pattern. It's kind of mandatory to go with the standard best practices and we all wish to have the same support.

ranjancse26 · 2023-09-21T00:36:39Z

I wish Litellm has use-case driven examples for beginners. Keeping in mind of the day-to-day use-cases, it's a good idea to come up with a great sample which covers the following aspects.

Text classification
Text summarization
Text translation
Text generation
Code generation

ranjancse26 · 2023-09-21T00:39:56Z

I wish Litellm to support for various known or popular vector db's. Here are couple of them to begin with.

Pinecone
Qdrant
Weaviate
Milvus
DuckDB
Sqlite

ranjancse26 · 2023-09-21T00:49:23Z

I wish Litellm has a built-in support for performing the web-scrapping or to get the real-time data using known provider like serpapi. It will be helpful for users to build the custom AI models or integrate with the LLMs for performing the retrieval augmented based generation.

https://serpapi.com/blog/llms-vs-serpapi/#serpapi-google-local-results-parser
https://colab.research.google.com/drive/1Q9VvVzjZJja7_y2Ls8qBkE_NApbLiqly?usp=sharing

lazariv · 2025-01-27T07:41:30Z

I wish LiteLLM could support custom provider for /embeddings endpoint.

It actually does. Just define your embeddings model as any other completion model, and set "mode": "embedding" in the model_info

yuyuma · 2025-01-29T21:30:33Z

We use litellm with langfuse as our success callback. We proxy through litellm both chat completion models and embedding models, and currently everything is being logged through langfuse.

I wish there was a config.yaml flag where we can disable certain models from triggering the success callback. Perhaps it's a flag in the individual model configurations themselves, or perhaps it's defined as an additional success_callback model white/black list. In our case, we love being able to observe chat completion observations but have absolutely no need for embedding observations to be passed through to langfuse.

rcarmo · 2025-01-30T17:44:15Z

SQLite support. It's a tremendously useful tool to use locally, but having to run Postgres together with it on a laptop is a pain.

V4G4X · 2025-02-02T14:07:07Z

I wish I could stream R1's reasoning tokens with OpenRouter, i.e see what it's thinking before it sends the output.
See here and here(aider).

boosh · 2025-02-03T14:43:23Z

Usable docs

krrishdholakia · 2025-02-03T15:01:20Z

Usable docs

Hey @boosh any specific improvements we can make on docs?

I wish I could stream R1's reasoning tokens with OpenRouter, i.e see what it's thinking before it sends the output.

Hey @V4G4X replied on ticket 👍

we love being able to observe chat completion observations but have absolutely no need for embedding observations to be passed through to langfuse

is this solved if we just add a supported call types for langfuse (similar to caching) - supported_call_types: ["completion", "text_completion"] @yuyuma

boosh · 2025-02-03T15:12:55Z

It's not clear what's enterprise and not tbh. Some pages suggest the router and proxy are others don't.

krrishdholakia · 2025-02-03T15:16:12Z

Hey @boosh

we have the parent list here - https://docs.litellm.ai/docs/proxy/enterprise

Can we do a 20min feedback call to see how we can make things clearer? Here's my calendly, if that's helpful - https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat

yuyuma · 2025-02-03T15:38:45Z

Usable docs

Hey @boosh any specific improvements we can make on docs?

I wish I could stream R1's reasoning tokens with OpenRouter, i.e see what it's thinking before it sends the output.

Hey @V4G4X replied on ticket 👍

we love being able to observe chat completion observations but have absolutely no need for embedding observations to be passed through to langfuse

is this solved if we just add a supported call types for langfuse (similar to caching) - supported_call_types: ["completion", "text_completion"] @yuyuma

@krrishdholakia having a supported call types would work amazing!

achpalaman · 2025-02-06T22:09:09Z

@krrishdholakia Here are a couple of asks:

There are 2 kinds of structured output modes in OpenAI. JSON mode and Structured Output mode.
Structured Output mode supports only a subset of schema (and offers stronger guarantees.)
If you pass a Pydantic class as response_format, it implicitly selects Structured Output mode. Would be great to have a flag to select which one we need.
Being able to turn on/off JSON Validation per call, rather than it being a global litellm setting.
Support passing in Pydantic classes for "tools" (similar to how it is currently supported for response_format.)

krrishdholakia · 2025-02-06T22:40:05Z

Hey @achpalaman

There are 2 kinds of structured output modes in OpenAI. JSON mode and Structured Output mode.
Structured Output mode supports only a subset of schema (and offers stronger guarantees.)
If you pass a Pydantic class as response_format, it implicitly selects Structured Output mode. Would be great to have a flag to select which one we need

not sure i follow this, can you share mock code for this

johnson-liang · 2025-02-07T11:21:47Z

I wish input_callback, success_callback and failure_callback can be set via environment variables for built-in callbacks.

I am using tools like Aider which uses litellm Python SDK under the hood, and would like to use Langfuse to monitor its LLM use and cost. It would be nice that I can use environment variable to set success_callback and failure_callback to 'langfuse'.

justicel · 2025-02-11T22:46:45Z

It would be amazing to have a way to have preferred pricing by time of day, to prefer a given model/model provider so say, we could make it very expensive to run queries during the day against our local model run on vllm, 9-5pm as the resources are limited for hardware.

krrishdholakia · 2025-02-11T22:55:08Z

@justicel have you considered request prioritization?

you could just ensure higher priority requests run on vllm during a given time - https://docs.litellm.ai/docs/scheduler

justicel · 2025-02-11T23:03:21Z

@krrishdholakia Hmm Good question, but looking over the documentation, currently how does the prioritization work there? I'm not seeing anything in the docs that match the ability to base the priority on time. It seems more the feature is meant to be used for priority queuing which is a bit different than preferring a given model/model group during time of day. But maybe I'm wrong?

krrishdholakia · 2025-02-11T23:28:42Z

preferring a given model/model group during time of day

you're right. how would you want to prefer 1 model / model group over another? happy to continue this in a separate ticket / discussion @justicel

justicel · 2025-02-12T18:17:02Z

@krrishdholakia Yes. I'll open a ticket for this!

elabbarw · 2025-02-13T07:14:02Z

It would be good to have an option to set a budget per model or have a budget group and pull in a number of models under it (e.g. image generation models)

krrishdholakia · 2025-02-13T07:21:42Z

Hey @elabbarw both are already possible

set budget per model - https://docs.litellm.ai/docs/proxy/users#-virtual-key-model-specific
create budget group with model-specific budgets - https://docs.litellm.ai/docs/proxy/rate_limit_tiers,

bukit-kronik · 2025-02-14T11:29:56Z

Openrouter web search

https://openrouter.ai/docs/features/web-search

pass plugins

{
    "plugins": [
    {
      "id": "web",
      "max_results": 3,
      "search_prompt": "Some relevant web results:"
    }
  ]
}

https://docs.litellm.ai/docs/providers/openrouter#passing-openrouter-params---transforms-models-route

lazariv · 2025-02-14T11:34:57Z

I wish the SSO option were included in the open-source tier. At our university, we use litellm to provide our fellow researchers with access to self-hosted LLMs, and SSO would simplify the user management a lot.

krrishdholakia · 2025-02-14T15:49:26Z

Hi @bukit-kronik the plugins value should just work - we pass any non-openai param straight to provider in the request body - https://docs.litellm.ai/docs/completion/provider_specific_params

import os
from litellm import completion
os.environ["OPENROUTER_API_KEY"] = ""

response = completion(
            model="openrouter/google/palm-2-chat-bison",
            messages=messages,
            plugins=[{ "id": "web" }]
        )

does that solve your problem?

bukit-kronik · 2025-02-14T18:27:23Z

It works in normal python script, my bad!

OrionCodeDev · 2025-02-15T08:44:30Z

I wish LiteLLM stop generating api keys when you login to UI

krrishdholakia · 2025-02-15T14:19:53Z

Hey @OrionCodeDev we're trying to move away from this. If you have specific suggestions for how we can migrate to session tokens, it'd be helpful to hear!

hung-phan · 2025-02-19T08:05:29Z

I wish we update the redisvl integration #5412. I see we still use 0.0.7 and the integration is outdated

AndreyStream · 2025-02-25T04:09:53Z

I wish langchain-community ChatLiteLLM will support open ai structured output.
we are using ChatLiteLLM as an adapter between our langchain based codebase and litellm sdk, and it doesnt support the new open ai structured output mode and defaults to tool use.

krrishdholakia · 2025-02-25T04:57:54Z

open ai structured output mode

what do you mean?

OpenAI structured output is supported and the langchain - litellm adapter just uses the version of litellm installed locally - https://docs.litellm.ai/docs/completion/json_mode

AndreyStream · 2025-02-25T21:24:54Z

@krrishdholakia got it mixed up, we are using ChatLiteLLMRouter and define the router from configuration, so no litellm proxy.
just making sure, im talking about strict structured output and not the older json mode.

krrishdholakia pinned this issue Sep 13, 2023

krrishdholakia changed the title ~~LiteLLM Wishlist~~ 🎅 I WISH LITELLM ADDED... Sep 14, 2023

krrishdholakia changed the title ~~🎅 I WISH LITELLM ADDED...~~ 🎅 I WISH LITELLM HAD... Sep 14, 2023

yuyuma mentioned this issue Mar 2, 2025

[Feature]: Add supported call types for langfuse #8936

Open

🎅 I WISH LITELLM HAD... #361

🎅 I WISH LITELLM HAD... #361

Comments

krrishdholakia commented Sep 13, 2023 • edited Loading

COMMENT BELOW 👇

With your request 🔥 - if we have any questions, we'll follow up in comments / via DMs

krrishdholakia commented Sep 13, 2023

krrishdholakia commented Sep 13, 2023

krrishdholakia commented Sep 13, 2023 • edited Loading

ishaan-jaff commented Sep 13, 2023 • edited Loading

ishaan-jaff commented Sep 13, 2023

Pipboyguy commented Sep 13, 2023

krrishdholakia commented Sep 13, 2023

krrishdholakia commented Sep 14, 2023 • edited Loading

haseeb-heaven commented Sep 17, 2023

krrishdholakia commented Sep 17, 2023 • edited Loading

haseeb-heaven commented Sep 17, 2023

smig23 commented Sep 18, 2023

ishaan-jaff commented Sep 18, 2023

shauryr commented Sep 18, 2023

krrishdholakia commented Sep 18, 2023

smig23 commented Sep 18, 2023

ranjancse26 commented Sep 19, 2023

ranjancse26 commented Sep 19, 2023

ranjancse26 commented Sep 19, 2023

ishaan-jaff commented Sep 19, 2023

ranjancse26 commented Sep 19, 2023

ishaan-jaff commented Sep 19, 2023

ranjancse26 commented Sep 21, 2023

ranjancse26 commented Sep 21, 2023

ranjancse26 commented Sep 21, 2023

ranjancse26 commented Sep 21, 2023

ranjancse26 commented Sep 21, 2023 • edited Loading

lazariv commented Jan 27, 2025

yuyuma commented Jan 29, 2025

rcarmo commented Jan 30, 2025

V4G4X commented Feb 2, 2025

boosh commented Feb 3, 2025

krrishdholakia commented Feb 3, 2025 • edited Loading

boosh commented Feb 3, 2025

krrishdholakia commented Feb 3, 2025

yuyuma commented Feb 3, 2025

achpalaman commented Feb 6, 2025

krrishdholakia commented Feb 6, 2025

johnson-liang commented Feb 7, 2025 • edited Loading

justicel commented Feb 11, 2025

krrishdholakia commented Feb 11, 2025

justicel commented Feb 11, 2025

krrishdholakia commented Feb 11, 2025

justicel commented Feb 12, 2025

elabbarw commented Feb 13, 2025 • edited Loading

krrishdholakia commented Feb 13, 2025

bukit-kronik commented Feb 14, 2025

lazariv commented Feb 14, 2025 • edited Loading

krrishdholakia commented Feb 14, 2025

bukit-kronik commented Feb 14, 2025 • edited Loading

OrionCodeDev commented Feb 15, 2025

krrishdholakia commented Feb 15, 2025

hung-phan commented Feb 19, 2025

AndreyStream commented Feb 25, 2025

krrishdholakia commented Feb 25, 2025

AndreyStream commented Feb 25, 2025

krrishdholakia commented Sep 13, 2023 •

edited

Loading

krrishdholakia commented Sep 13, 2023 •

edited

Loading

ishaan-jaff commented Sep 13, 2023 •

edited

Loading

krrishdholakia commented Sep 14, 2023 •

edited

Loading

krrishdholakia commented Sep 17, 2023 •

edited

Loading

ranjancse26 commented Sep 21, 2023 •

edited

Loading

krrishdholakia commented Feb 3, 2025 •

edited

Loading

johnson-liang commented Feb 7, 2025 •

edited

Loading

elabbarw commented Feb 13, 2025 •

edited

Loading

lazariv commented Feb 14, 2025 •

edited

Loading

bukit-kronik commented Feb 14, 2025 •

edited

Loading