[Feature Improvement] Add support for Ollama, Gemini, and Claude, with Web UI #73
base: main
Conversation
hyperparamter -> hyperparameter
Add instructions for Windows users to install MiKTeX
…tch-1 Update README.md
The inference enhancement can adapt to any service that supports the OpenAI SDK, for fast integration.
hi @whats2000! Appreciate your efforts putting together alternative LLM backends for this project. I tried using your code with my local Ollama instance but I'm getting error 422 (Unprocessable Entity) from the web UI. Not sure if I'm missing anything? I configured the API
@AlexTzk I think I need more information on how to reproduce your error. What is your operating system? I tested the script on Ubuntu 20 under WSL2. Can you try a test script that calls the provider directly?

```python
# Test script for Ollama
print(OpenaiProvider.get_response(
    api_key="ollama",
    model_name="deepseek-r1:32b",
    user_prompt="What is the meaning of life?",
    system_prompt="You are a philosopher seeking the meaning of life.",
    base_url="http://localhost:11434/v1/"
))
```

And I get a response with Ollama.
@whats2000 Thank you for your reply. Your test function does work:
Then I set `base_url` within config.py:
I launch Gradio with python config_gradio.py, set the OpenAI API key to ollama, and specify the same model, deepseek-r1:32b. I get error 422. I am running Ollama version 0.5.7 on another machine at 10.0.0.99:11434; it's a Docker container with an Ubuntu base, I believe. The AgentLab code and your PR are running on an Ubuntu Server 24.04 LTS with Python 3.12.9.
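For reference, the config change being described might look like this in config.py. This is a hedged sketch: the variable name `OLLAMA_API_BASE_URL` appears in the PR summary, and the host/port is the remote Ollama instance mentioned in this thread; the exact form of the setting in the PR may differ.

```python
# Sketch of the config.py setting discussed above (illustrative).
# Points the OpenAI-compatible client at a remote Ollama instance.
OLLAMA_API_BASE_URL = "http://10.0.0.99:11434/v1/"
```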
Seems like this is due to Gradio from your image (it fails to launch the terminal). Did you see any output in the terminal that points out why the terminal fails to launch? Also, I use XTerm on Linux. Feel free to check the code at
@AlexTzk I added the version of the
@whats2000 It's working now! I believe my issue was caused by XTerm not being able to launch, since there is no GUI. Thank you for your help. A couple of notes: I now have a different exception about max_tries being exceeded during the literature review, but that is not connected to your PR; I will try to fix that now.
I just updated the layout to look more balanced. Do you think it looks better?
@whats2000 Looks great! Love how you split it across both sides. The debugging feature is extremely helpful, many thanks for that. Running an experiment now to test if the max_tries exception is being thrown again, but as soon as I'm done with that I will test this again! Great work 👍
The bug is that the `model_backbone` attribute in LaboratoryWorkflow is used as both a `dict` and a `str`, which makes some agents fail to find the model, fall back to default_model, and cause inference errors.
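The dict-or-str ambiguity described above can be handled with a small resolver that accepts either shape. This is a hedged sketch, not the PR's actual fix: the function name, agent names, and the default model below are illustrative.

```python
# Sketch: resolve a per-agent model whether model_backbone is a
# dict (per-agent mapping) or a plain string (one model for all).
def resolve_model(model_backbone, agent_name, default_model="gpt-4o-mini"):
    """Return the model name one agent should use."""
    if isinstance(model_backbone, dict):
        # Per-agent mapping; fall back to the default if the agent is absent.
        return model_backbone.get(agent_name, default_model)
    if isinstance(model_backbone, str) and model_backbone:
        # Single model shared by every agent.
        return model_backbone
    return default_model
```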
@AlexTzk I just fixed several bugs that I discovered in the original project. Did this fix it for you? I found that the
Now it only throws if all of the keys are missing.
@whats2000 I still got the max_tries exception with deepseek-r1:32b; the second test was with qwen2.5-coder:32b-instruct-q5_K_M via Gradio, but that seemed to have crashed as well. No message in the web UI, but I presume it was the same max_tries exception during the literature review. I am now trying to run a smaller model, qwen2.5-coder:14b-instruct-q4_1, launched from the terminal rather than the web UI; I want to see if it's the same error.
Is it going to be merged soon?
@whats2000 looks good! The only problem with the Ollama max_tokens argument is that it doesn't seem to get passed every time. Maybe it's my hardware, but the way I got around it was to create my own Modelfile and model in Ollama, then specify max_tokens and other settings there. It's still failing during the literature review, but it does get further now and it seems to follow the structure. My problem is probably related to the amount of VRAM being too low, 32 GB...
I think there might be some bug in Ollama's compatibility with the OpenAI SDK? Not quite sure about that. I guess we need someone who can load a bigger model to test it out. Or maybe we need to reduce the complexity of the prompt (although that might hurt performance). I have tried that on another project to make it at least usable.
Thanks for this! I've tested Ollama and Gemini and both work for me, but there are some issues with Anthropic. Requesting model "claude-3-5-sonnet" gives
and "claude-3-5-sonnet-20241022" gives
Around line 113 in inference.py, I believe this `.` should be a `-`. Same around line 171:
Fixing that and requesting "claude-3-5-sonnet" gives this error:
So it seems like Anthropic wants you to specify the exact Sonnet you want, because loosening the string match and asking for a specific model like claude-3-5-sonnet-20241022 will at least successfully run some inference.
Though this runs some inference, you get other issues:
Adding claude-3-5-sonnet-20241022 to the costmap-in and costmap-out dictionaries doesn't fix the cost approximation error. Will update if I find anything else, but that's all I've had time to look into for now.
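The dotted-versus-dashed model-name problem above could be patched with a small normalizer. This is a hedged sketch, not the PR's code: the dated Sonnet ID comes from this thread, while the function and mapping names are illustrative.

```python
# Sketch: Anthropic's API expects dashed, dated model IDs, so map the
# dotted spelling and any known undated alias to an exact dated ID.
KNOWN_CLAUDE_IDS = {
    # Undated alias -> exact dated ID mentioned in this thread.
    "claude-3-5-sonnet": "claude-3-5-sonnet-20241022",
}

def normalize_claude_model(name):
    """Turn e.g. 'claude-3.5-sonnet' into a dated Anthropic model ID."""
    name = name.replace(".", "-")  # claude-3.5-sonnet -> claude-3-5-sonnet
    return KNOWN_CLAUDE_IDS.get(name, name)  # pin a dated ID when known
```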
@MohamadZeina which model did you run with Ollama, and did it get past literature_review?
@MohamadZeina Thanks for your help, I will take a look at it. But sadly I do not own an API key, so I will need some help after I try to patch it!
@MohamadZeina I have made a patch for Claude's issue. If it fixes your issue, please let me know!
Thank you for the great work @whats2000! I will try with 32b, but I hope my 3080 does not cry!
I think I need some rework on the prompt for the smaller model 💀
Thanks @whats2000. Your changes allow the literature review to run, but then plan formulation fails if you don't provide a temperature.
In ai_lab_repo.py, the literature review explicitly provides a temperature (0.8), but none of the other steps do. Maybe the cleanest way around this is to treat the Anthropic provider like the OpenAI provider in providers.py, i.e. don't provide a temperature when there isn't one.
This fixes the temperature issue, but now there's an issue if the context gets longer than the Anthropic max (200,000 tokens). Working on a fix. Is it possible / would you like me to commit these Anthropic fixes directly?
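The "don't provide a temperature when there isn't one" approach suggested above can be sketched as a kwargs builder that only includes the key when the caller supplies a value. Function and parameter names are illustrative, not the PR's actual code.

```python
# Sketch: build provider request kwargs, omitting "temperature"
# entirely when none was given so the provider uses its own default.
def build_request_kwargs(model, max_tokens, temperature=None):
    kwargs = {"model": model, "max_tokens": max_tokens}
    if temperature is not None:
        kwargs["temperature"] = temperature
    return kwargs
```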
I am working on it! I think we need to clip the context for Claude, as I don't think that is handled by the SDK.
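A minimal sketch of the context clipping being discussed, assuming a crude characters-per-token heuristic instead of a real tokenizer and keeping the most recent text; all names, the heuristic, and the keep-the-tail policy are illustrative, not the eventual fix.

```python
# Sketch: clip an oversized prompt to stay under Anthropic's
# ~200k-token context window, using ~4 characters per token.
MAX_CONTEXT_TOKENS = 200_000
CHARS_PER_TOKEN = 4  # rough heuristic, not a tokenizer

def clip_context(text, max_tokens=MAX_CONTEXT_TOKENS):
    """Return text truncated to roughly fit the model's context window."""
    budget = max_tokens * CHARS_PER_TOKEN
    if len(text) <= budget:
        return text
    return text[-budget:]  # keep the most recent part of the context
```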
@MohamadZeina I have a question: what step triggers the max token issue? And how did you set the configuration (like an oversized review or something)? It might be hard for me to patch the 200k-length issue (as I cannot run Claude, sorry). Feel free to make a patch for it!
Thanks for incorporating that temperature fix. Apologies, but I've lost the console output that produced that error with too much context. From memory it was an early step, maybe literature review or plan formulation. It was Claude Sonnet, but the compute settings were all set to the lowest possible: --num-papers-lit-review 1 --mlesolver-max-steps 1 --papersolver-max-steps 1. I have run Haiku a few times with no issues, and I'm running Sonnet again now and haven't hit the issue. Will share what I find if I run into it again, or if I implement a fix.
Apologies @AlexTzk, I didn't see this initially. I have only played with the deepseek-r1 distills: all of them up to 8B, and 70B but not for long. They all get stuck on the literature review; I think they're struggling to follow the instruction to add papers to the review. I can't get past it, even if I reduce the required number of papers to 1, prompt it more heavily to add papers, and increase the Ollama max context. Would love to hear if people have luck with other small models.
I added a beta version of the web UI, a Vite React app served through a Flask API. Use the
@whats2000 awesome work, I will test the web UI sometime this week and provide feedback. @MohamadZeina I managed to get past lit_review by using the qwen2.5:32b model on Ollama with a num_ctx window of 100,000 tokens. I created my own model from a Modelfile. Another unforeseen problem comes up during the subsequent tasks, more specifically when it gets to running_experiments: it takes a long time to reply as it's using a mixture of RAM and VRAM, about 50/50, and this causes the code to time out. Creating a different model with a 16k token window does get around this, but you have to interrupt the current research, delete your custom model, create another custom model under the same name with a different context window, and restart the research. I was thinking that implementing a memory_class might be a worthwhile endeavour...
Great to hear that!
Summary
o1-mini
config.py
What I have checked
gemini-2.0-flash
Reference Issue able to solve
Setting `OLLAMA_API_BASE_URL` in config.py and setting the OpenAI API key to ollama can connect to any service compatible with the OpenAI SDK.
UI Example
Gradio
Launch with python config_gradio.py:

React Flask App (Beta)
Launch with app.py:

Note
Working on adding some monitors and some dialog visualization.
Test Production Paper
SAMAug with MRANet.pdf
Review.txt