
[Feature Improvement] Add support for Ollama, Gemini, and Claude, with Web UI #73

Open · wants to merge 33 commits into base: main

Conversation


whats2000 commented Feb 8, 2025

Summary

  • Added support for Ollama as a new agent provider.
  • Integrated Gemini as a new agent provider.
  • Integrated Claude as a new agent provider.
  • Implemented a Gradio-based UI for easier configuration of agent settings.
  • Fixed a bug where the configuration for models other than OpenAI failed to load from a checkpoint and fell back to o1-mini.
  • Fixed a misspelling in the agent_models phase name, "report refinement", which should be "paper refinement"; I think this comes from the original code.
  • Centralized the remaining configuration in config.py.

What I have checked

  • A full end-to-end run on Linux with gemini-2.0-flash
  • Tested with the Gemini provider
  • Tested with the Ollama provider

Reference issues this PR is able to solve

UI Example

Gradio

Launch with config_gradio.py
image

React Flask App (Beta)

Launch with app.py
image

Note

Working on adding monitoring and dialog visualization.

Test Production Paper

SAMAug with MRANet.pdf
Review.txt

@whats2000
Author

The inference enhancement can adapt to any service that supports the OpenAI SDK, allowing fast integration.
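
For anyone wiring up another OpenAI-compatible backend, here is a minimal sketch of what that adaptation looks like with the openai>=1.0 client (the base URL and model name are just example values, not taken from this PR):

# Minimal sketch: any service exposing an OpenAI-compatible API can be used
# by pointing the official openai client at its /v1 endpoint.
from openai import OpenAI

# Example values: Ollama's OpenAI-compatible endpoint and a locally pulled model.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="deepseek-r1:32b",  # any model served by the backend
    messages=[
        {"role": "system", "content": "You are a helpful research assistant."},
        {"role": "user", "content": "Summarize the goal of Agent Laboratory."},
    ],
)
print(response.choices[0].message.content)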

@AlexTzk

AlexTzk commented Feb 9, 2025

hi @whats2000 ! Appreciate your efforts putting together alternative LLM backends for this project.

Tried using your code with my local Ollama instance but I'm getting error 422 (Unprocessable Entity) from the web UI. Not sure if I'm missing anything?

I configured the API as http://OLLAMA_IP:11434 and also tried http://OLLAMA_IP:11434/api/generate and http://OLLAMA_IP:11434/v1, but got the same result.
Then I tried to modify your function and tailor it more for Ollama:


import requests
import openai


class OpenaiProvider:
    @staticmethod
    def get_response(
        api_key: str,
        model_name: str,
        user_prompt: str,
        system_prompt: str,
        temperature: float | None = None,
        base_url: str | None = None,
    ) -> str:
        if api_key == "ollama":
            # Call Ollama's native /api/generate endpoint directly
            # (note: system_prompt and temperature are not passed in this path).
            url = "http://10.0.0.99:11434/api/generate"
            headers = {"Content-Type": "application/json"}
            payload = {
                "model": model_name,
                "prompt": user_prompt,
                "stream": False
            }
            response = requests.post(url, json=payload, headers=headers)
            if response.status_code == 200:
                return response.json().get("response", "No response received.")
            else:
                return f"Error: {response.status_code} - {response.text}"

        # Otherwise fall through to the original OpenAI code path
        openai.api_key = api_key


I'm launching with `python config_gradio.py`.

Do you mind sharing your working setup details? 

@whats2000
Author

whats2000 commented Feb 9, 2025

@AlexTzk I think I need more information to reproduce your error. What is your operating system? I tested the script on Ubuntu 20 under WSL2. Can you try a test script that calls provider.py directly? Also, make sure your Ollama version is the latest (the problem might be caused by an outdated version).

# Test script for Ollama
print(OpenaiProvider.get_response(
    api_key="ollama",
    model_name="deepseek-r1:32b",
    user_prompt="What is the meaning of life?",
    system_prompt="You are a philosopher seeking the meaning of life.",
    base_url="http://localhost:11434/v1/"
))

And I get a response with Ollama 0.5.7

C:\Users\user\.conda\envs\NatureLanguageAnalyze\python.exe C:\Users\user\Documents\GitHub\AgentLaboratory\provider.py 
<think>
Okay, so I just came across this user who is playing the role of a philosopher searching for the meaning of life. They asked me, "What is the meaning of life?" and my initial response was kind of an exploration of various philosophical perspectives—existentialism, humanism, stoicism, spirituality, and nihilism. Now, they’re asking me to think through this as if I'm just starting out on this journey, maybe a bit confused or overwhelmed.

// A lot of output I just skip it

Process finished with exit code 0

@AlexTzk

AlexTzk commented Feb 10, 2025

@whats2000 Thank you for your reply.

Your test function does work:

<think>
Okay, so I'm trying to figure out what the meaning of life is. Hmm, where do I even start with this? I've heard people talk about it in different contexts—philosophy, religion, science—and everyone seems to have a different take. Maybe I should break it down into smaller parts. 
[.......]
</think>
[more output]

Then I set the base URL within config.py:

OLLAMA_API_BASE_URL = "http://10.0.0.99:11434/v1/"

I launch Gradio with python config_gradio.py, set the OpenAI API key to Ollama, and specify the same model, deepseek-r1:32b. I get error 422:
image

I am running Ollama version 0.5.7 on another machine at 10.0.0.99:11434; that's a Docker container with an Ubuntu base, I believe.

The AgentLab code and your PR are running on Ubuntu Server 24.04 LTS with Python 3.12.9.

@whats2000
Author

whats2000 commented Feb 10, 2025

From your image, it seems this is caused by Gradio failing to launch the terminal. Did you see any output in the terminal that points out why it failed to launch? Also, I use XTerm on Linux. Feel free to check the code in config_gradio.py.

@whats2000
Author

@AlexTzk I added a version pin for Gradio; could you try uninstalling gradio and reinstalling from requirements.txt? I also fixed the missing torchvision and torchaudio dependencies.

@AlexTzk

AlexTzk commented Feb 10, 2025

@whats2000 It's working now! I believe my issue was caused by XTerm not being able to launch because the machine has no GUI. Thank you for your help.

A couple of notes:
With Gradio 4.44.1, the pinned MarkupSafe version cannot be used because Gradio requires a lower one, and Pillow also needs downgrading. I let pip decide which version to use for MarkupSafe and downgraded Pillow to 10.4.0. Seems to be working fine.

I now have a different exception about max_tries exceeded during literature review, but that is not connected to your PR; I will try to fix it now.

@whats2000
Author

I just updated the layout to look more balanced; do you think it looks better?
I also made it output the generated command directly in the status area for easier debugging!
Hope this helps you @AlexTzk

image

@AlexTzk

AlexTzk commented Feb 10, 2025

@whats2000 Looks great! Love how you split it across both sides. The debugging feature is extremely helpful, many thanks for that.

Running an experiment now to test whether the max_tries exception is thrown again, but as soon as I'm done with that I will test this again! Great work 👍

The bug is that the `model_backbone` attribute in LaboratoryWorkflow is used as both a `dict` and a `str`, which makes some agents fail to find their model, fall back to default_model, and hit inference errors.
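
One way to resolve this kind of mismatch is to normalize the attribute so agents always receive a dict; a rough sketch (the helper and phase names below are hypothetical, not the actual patch):

# Hypothetical sketch: accept either a single model name or a per-phase mapping
# and always hand agents a dict keyed by phase.
def normalize_model_backbone(model_backbone, phases, default_model="o1-mini"):
    if isinstance(model_backbone, str):
        # A bare string means "use this model for every phase".
        return {phase: model_backbone for phase in phases}
    if isinstance(model_backbone, dict):
        # Fill in any phase that was not configured explicitly.
        return {phase: model_backbone.get(phase, default_model) for phase in phases}
    raise TypeError(f"Unsupported model_backbone type: {type(model_backbone)!r}")

# Example with made-up phase names:
phases = ["literature review", "plan formulation", "paper refinement"]
print(normalize_model_backbone("gemini-2.0-flash", phases))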
@whats2000
Author

whats2000 commented Feb 10, 2025

@AlexTzk I just fixed several bugs that I discovered in the original project. Did that fix things for you? I found that deepseek-r1 is not very good at producing the command-format prompt, which makes it fail to invoke the task and can result in the max_tries exception. The model's technical report shows it is not primarily trained for tool usage (which affects structured-output command performance). I also tested qwen2.5:32b, and it does seem to produce structured output. I will test qwen2.5-coder:32b to see if the performance is better.

@AlexTzk

AlexTzk commented Feb 11, 2025

@whats2000 I still got the max_tries exception with deepseek-r1:32b; the second test was with qwen2.5-coder:32b-instruct-q5_K_M via Gradio, but that seemed to crash as well. No message in the web UI, but I presume it was the same max_tries exception during literature review.

I am now trying a smaller model, qwen2.5-coder:14b-instruct-q4_1, launched from the terminal rather than the web UI, to see if it's the same error.

@nullnuller

Is it going to be merged soon?

@AlexTzk

AlexTzk commented Feb 18, 2025

@whats2000 looks good! The only problem with the Ollama max_tokens argument is that it doesn't seem to get passed every time. Maybe it's my hardware, but the way I worked around it was to create my own Modelfile and model in Ollama, then specify max_tokens and the other settings in there.

It's still failing during literature review, but it gets further now and seems to follow the structure. My problem is probably that my amount of VRAM is too low, 32GB...
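
For reference, the Modelfile workaround mentioned above looks roughly like this (the base model and values are examples; num_predict is Ollama's counterpart to max_tokens):

# Modelfile (example values)
FROM qwen2.5:32b

# Bake the context window and generation limit into a custom model
PARAMETER num_ctx 16384
PARAMETER num_predict 4096

# Build it with: ollama create qwen2.5-custom -f Modelfile
# then point the agent config at the model name "qwen2.5-custom".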

@whats2000
Author

whats2000 commented Feb 19, 2025

I think there might be a bug in Ollama's OpenAI SDK compatibility? I'm not quite sure about that. I guess we need someone who can load a bigger model to test it out. Or maybe we need to reduce the complexity of the prompt (although that might hurt performance). I have tried that in other projects to make it at least usable.

@MohamadZeina

Thanks for this! I've tested Ollama and Gemini and both work for me, but there are some issues with Anthropic.

Requesting model "claude-3-5-sonnet" gives

Inference Exception: Model claude-3-5-sonnet not found

and "claude-3-5-sonnet-20241022" gives

Inference Exception: Model claude-3-5-sonnet-20241022 not found

Around line 113 in inference.py, I believe this `.` should be a `-`:
elif model_str == "claude-3.5-sonnet" or model_str == "claude-3-5-haiku":
Should be
elif model_str == "claude-3-5-sonnet" or model_str == "claude-3-5-haiku":

Same around line 171:

                    if model_str in [
                        "o1-preview", "o1-mini", "o1",
                        "claude-3.5-sonnet", "claude-3-5-haiku",
                        "gemini-2.0-flash", "gemini-2.0-flash-lite"
                    ]:

Fixing that and requesting "claude-3-5-sonnet" gives this error:

Inference Exception: Error code: 404 - {'type': 'error', 'error': {'type': 'not_found_error', 'message': 'model: claude-3-5-sonnet'}}

So it seems Anthropic wants you to specify the exact Sonnet version, because loosening the string match and asking for a specific model like claude-3-5-sonnet-20241022 will at least successfully run some inference:

elif "claude-3-5-sonnet" in model_str or "claude-3-5-haiku" in model_str:

Though this runs some inference, you get other issues:

Cost approximation throws an error: 'Could not automatically map claude-3-5-sonnet-20241022 to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.'

Adding claude-3-5-sonnet-20241022 to the costmap in and costmap out dictionaries doesn't fix the cost approximation error.
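
Not part of this PR, but the usual workaround the error message points at is an explicit tiktoken fallback for model names it doesn't recognize, roughly:

import tiktoken

def get_encoding_for(model_name: str):
    # tiktoken only knows OpenAI model names; fall back to a fixed encoding
    # so cost approximation for Claude models doesn't crash (counts are approximate).
    try:
        return tiktoken.encoding_for_model(model_name)
    except KeyError:
        return tiktoken.get_encoding("cl100k_base")

enc = get_encoding_for("claude-3-5-sonnet-20241022")
print(len(enc.encode("rough token count for cost approximation")))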

Will update if I find anything else, but that’s all I’ve had time to look into for now.

@AlexTzk

AlexTzk commented Feb 19, 2025

@MohamadZeina which model did you run with ollama and did it get past literature_review?

@whats2000
Author

whats2000 commented Feb 20, 2025

@MohamadZeina Thanks for your help, I will take a look at it. But sadly I do not have an API key, so I will need some help testing after I try to patch it!

@whats2000
Author

@MohamadZeina I have made a patch for Claude's issue. If it fixes your issue, please let me know!

@npandiyan

Thank you for the great work, @whats2000!
I have tried the framework through the Gradio app and it's sweet.
I then tried it with qwen2.5:7b, and as @AlexTzk mentioned, I faced the same issue with unexpected breaks in the literature review process.
Funnily enough, I spammed (copy-pasted) the system command description for the literature review process into the notes under the config, and somehow this helped it get through the literature review phase in a few attempts.

I will look to try with 32b, but hope my 3080 does not cry!

@whats2000
Author

whats2000 commented Feb 21, 2025

I think I need to rework the prompt for smaller models 💀

@MohamadZeina

Thanks @whats2000. Your changes allow the literature review to run, but then plan formulation fails if you don't provide a temperature.

Inference Exception: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'temperature: Input should be a valid number'}}

In ai_lab_repo.py, literature review explicitly provides a temp (0.8), but none of the other steps do.

Maybe the cleanest way around this is to treat the Anthropic provider like the OpenAI provider in providers.py, i.e. don't pass a temperature when there isn't one:

        # Anthropic rejects temperature=None, so only pass it when it's set.
        if temperature is None:
            message = client.messages.create(
                model=model_name,
                system=system_prompt,
                messages=[{"role": "user", "content": user_prompt}],
                max_tokens=8192 if 'sonnet' in model_name else 4096,
            )
        else:
            message = client.messages.create(
                model=model_name,
                system=system_prompt,
                messages=[{"role": "user", "content": user_prompt}],
                max_tokens=8192 if 'sonnet' in model_name else 4096,
                temperature=temperature,
            )

This fixes the temperature issue, but now there's an issue if the context gets longer than the Anthropic maximum (200,000 tokens). Working on a fix. Is it possible, and would you like me, to commit these Anthropic fixes directly?

@whats2000
Author

whats2000 commented Feb 21, 2025

I am working on it! I think we need to clip the context for Claude ourselves, as I don't think the SDK provides that.
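
A rough sketch of the kind of clipping I mean (the 4-characters-per-token heuristic and the limit are assumptions, not measured values):

# Heuristic clip so a prompt stays under Claude's ~200k-token context window.
# Anthropic counts real tokens; 4 characters per token is only an approximation.
def clip_context(text: str, max_tokens: int = 190_000, chars_per_token: int = 4) -> str:
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    # Keep the most recent part of the context, which usually matters most here.
    return text[-max_chars:]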

@whats2000
Author

@MohamadZeina I have a question: which step triggers the max-token issue? And how did you set the configuration (an oversized review or something like that)? It might be hard for me to patch the 200k length issue, as I cannot run Claude, sorry. Feel free to make a patch for it!

@MohamadZeina

Thanks for incorporating that temperature fix.

Apologies but I've lost the console output that produced that error with too much context. From memory it was an early step, maybe literature review or plan formulation.

It was Claude Sonnet, but the compute settings were all set to the lowest possible: --num-papers-lit-review 1 --mlesolver-max-steps 1 --papersolver-max-steps 1

Have run Haiku a few times with no issues, and running Sonnet again now and haven't had the issue. Will share what I find if I run into it again, or if I implement a fix

@MohamadZeina

@MohamadZeina which model did you run with ollama and did it get past literature_review?

Apologies @AlexTzk, I didn't see this initially. I have only played with the deepseek-r1 distills: all of them up to 8B, and 70B but not for long. They all get stuck on the literature review; I think they're struggling to follow the instruction to add papers to the review. I can't get past it, even if I reduce the required number of papers to 1, prompt it more heavily to add papers, and increase the Ollama max context. Would love to hear if people have luck with other small models.

@whats2000
Author

whats2000 commented Feb 22, 2025

I added a beta version of the web UI: a Vite React app served through a Flask API. Use app.py to launch the WebUI; you can check the project here. Feel free to give me some feedback. I am working on i18n and on adding visualization of the process, cost, etc.

Here is the UI; I tried to make it look like Gradio.
image

whats2000 changed the title from "[Feature Improvement] Add support for Ollama, Gemini, and Claude, with Gradio UI configuration" to "[Feature Improvement] Add support for Ollama, Gemini, and Claude, with Web UI" on Feb 22, 2025
@AlexTzk

AlexTzk commented Feb 24, 2025

@whats2000 awesome work, I will test the webui sometime this week and provide feedback.

@MohamadZeina I managed to get past lit_review by using the qwen2.5:32b model on Ollama with a num_ctx window of 100,000 tokens. I created my own model from a Modelfile. Another unforeseen problem comes up during the subsequent tasks, specifically when it gets to running_experiments: it takes a long time to reply because it's using a mixture of RAM and VRAM (about 50/50), and this causes the code to time out. Creating a different model with a 16k token window gets around this, but you have to interrupt the current research, delete your custom model, create another custom model under the same name with a different context window, and restart the research.

I was thinking that implementing a memory class might be a worthwhile endeavour...
Or, specifically for literature_review, rather than doing everything in one go we could split it into subtasks (a rough sketch follows the list):

  • Find all relevant papers and store their IDs in a file
  • Go through each paper with the LLM, restarting after each one is reviewed so the context window doesn't run out; remove IDs after review
  • Store relevant content from the papers in a different file that gets appended to after each review
  • Compile the full literature_review and move to the next step
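
Very roughly, and with every helper name below being hypothetical rather than existing code, that pipeline could look like:

import json
from pathlib import Path

# Hypothetical sketch of the split literature_review pipeline described above.
# find_relevant_papers and review_single_paper are placeholders, not real functions.
def run_literature_review(topic: str, workdir: Path) -> str:
    ids_file = workdir / "paper_ids.json"
    notes_file = workdir / "review_notes.txt"

    # 1. Find all relevant papers once and persist their IDs.
    if not ids_file.exists():
        ids_file.write_text(json.dumps(find_relevant_papers(topic)))

    # 2. Review one paper per LLM session so the context window never fills up.
    remaining = json.loads(ids_file.read_text())
    while remaining:
        paper_id = remaining.pop(0)
        summary = review_single_paper(paper_id)  # fresh LLM context each time
        with notes_file.open("a") as f:          # 3. append notes after each review
            f.write(summary + "\n")
        ids_file.write_text(json.dumps(remaining))  # drop the reviewed ID

    # 4. Compile the full review for the next phase.
    return notes_file.read_text()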

@whats2000
Author

Great to hear that!
