
[Feature Improvement] Add support for Ollama, Gemini, and Claude, with Web UI #73

Open · wants to merge 33 commits into base: main

Conversation


whats2000 commented Feb 8, 2025

Summary

  • Added support for Ollama as a new agent provider.
  • Integrated Gemini as a new agent provider.
  • Integrated Claude as a new agent provider.
  • Implemented a Gradio-based UI for easier configuration of agent settings.
  • Fixed a bug where the configuration for models other than OpenAI failed to load from a checkpoint and fell back to o1-mini.
  • Fixed a misspelling in the agent_models phase name, "report refinement", which should be "paper refinement"; I think this comes from the original code.
  • Centralized the remaining configuration in config.py.

What I have checked

  • A full end-to-end run on Linux with gemini-2.0-flash
  • Tested with the Gemini provider
  • Tested with the Ollama provider

Reference issues this PR is able to solve

UI Example

Gradio

Launch with config_gradio.py
image

React Flask App (Beta)

Launch with app.py
image

Note

Working on adding monitoring and dialog visualization.

Test Production Paper

SAMAug with MRANet.pdf
Review.txt

@whats2000
Author

The inference enhancement can adapt to any service that supports the OpenAI SDK, allowing fast integration.
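
For anyone wiring up another OpenAI-compatible backend, here is a minimal sketch of what that adaptation looks like with the openai>=1.0 client (the base URL and model name are just example values, not taken from this PR):

# Minimal sketch: any service exposing an OpenAI-compatible API can be used
# by pointing the official openai client at its /v1 endpoint.
from openai import OpenAI

# Example values: Ollama's OpenAI-compatible endpoint and a locally pulled model.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="deepseek-r1:32b",  # any model served by the backend
    messages=[
        {"role": "system", "content": "You are a helpful research assistant."},
        {"role": "user", "content": "Summarize the goal of Agent Laboratory."},
    ],
)
print(response.choices[0].message.content)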

@AlexTzk

AlexTzk commented Feb 9, 2025

hi @whats2000 ! Appreciate your efforts putting together alternative LLM backends for this project.

Tried using your code with my local Ollama instance but I'm getting error 422 (Unprocessable Entity) from the web UI. Not sure if I'm missing anything?

I configured the API as http://OLLAMA_IP:11434 and also tried http://OLLAMA_IP:11434/api/generate and http://OLLAMA_IP:11434/v1, but got the same result.
Then I tried to modify your function and tailor it more for Ollama:


import requests
import openai


class OpenaiProvider:
    @staticmethod
    def get_response(
        api_key: str,
        model_name: str,
        user_prompt: str,
        system_prompt: str,
        temperature: float | None = None,
        base_url: str | None = None,
    ) -> str:
        if api_key == "ollama":
            # Call Ollama's native /api/generate endpoint directly
            # (note: system_prompt and temperature are not passed in this path).
            url = "http://10.0.0.99:11434/api/generate"
            headers = {"Content-Type": "application/json"}
            payload = {
                "model": model_name,
                "prompt": user_prompt,
                "stream": False
            }
            response = requests.post(url, json=payload, headers=headers)
            if response.status_code == 200:
                return response.json().get("response", "No response received.")
            else:
                return f"Error: {response.status_code} - {response.text}"

        # Otherwise fall through to the original OpenAI code path
        openai.api_key = api_key


I'm launching with `python config_gradio.py`.

Do you mind sharing your working setup details? 

@whats2000
Author

whats2000 commented Feb 9, 2025

@AlexTzk I think I need more information to reproduce your error. What is your operating system? I tested the script on Ubuntu 20 under WSL2. Can you try a test script that calls provider.py directly? Also, make sure your Ollama version is the latest (the problem might be caused by an outdated version).

# Test script for Ollama
print(OpenaiProvider.get_response(
    api_key="ollama",
    model_name="deepseek-r1:32b",
    user_prompt="What is the meaning of life?",
    system_prompt="You are a philosopher seeking the meaning of life.",
    base_url="http://localhost:11434/v1/"
))

And I get a response with Ollama 0.5.7

C:\Users\user\.conda\envs\NatureLanguageAnalyze\python.exe C:\Users\user\Documents\GitHub\AgentLaboratory\provider.py 
<think>
Okay, so I just came across this user who is playing the role of a philosopher searching for the meaning of life. They asked me, "What is the meaning of life?" and my initial response was kind of an exploration of various philosophical perspectives—existentialism, humanism, stoicism, spirituality, and nihilism. Now, they’re asking me to think through this as if I'm just starting out on this journey, maybe a bit confused or overwhelmed.

// A lot of output I just skip it

Process finished with exit code 0

@AlexTzk

AlexTzk commented Feb 10, 2025

@whats2000 Thank you for your reply.

Your test function does work:

<think>
Okay, so I'm trying to figure out what the meaning of life is. Hmm, where do I even start with this? I've heard people talk about it in different contexts—philosophy, religion, science—and everyone seems to have a different take. Maybe I should break it down into smaller parts. 
[.......]
</think>
[more output]

Then I set the base URL within config.py:

OLLAMA_API_BASE_URL = "http://10.0.0.99:11434/v1/"

I launch Gradio with python config_gradio.py, set the OpenAI API key to Ollama, and specify the same model, deepseek-r1:32b. I get error 422:
image

I am running Ollama version 0.5.7 on another machine at 10.0.0.99:11434; that's a Docker container with an Ubuntu base, I believe.

The AgentLab code and your PR are running on Ubuntu Server 24.04 LTS with Python 3.12.9.

@whats2000
Author

whats2000 commented Feb 10, 2025

From your image, it seems this is caused by Gradio failing to launch the terminal. Did you see any output in the terminal that points out why it failed to launch? Also, I use XTerm on Linux. Feel free to check the code in config_gradio.py.

@whats2000
Author

@AlexTzk I added a version pin for Gradio; could you try uninstalling gradio and reinstalling from requirements.txt? I also fixed the missing torchvision and torchaudio dependencies.

@AlexTzk

AlexTzk commented Feb 10, 2025

@whats2000 It's working now! I believe my issue was caused by XTerm not being able to launch because the machine has no GUI. Thank you for your help.

A couple of notes:
With Gradio 4.44.1, the pinned MarkupSafe version cannot be used because Gradio requires a lower one, and Pillow also needs downgrading. I let pip decide which version to use for MarkupSafe and downgraded Pillow to 10.4.0. Seems to be working fine.

I now have a different exception about max_tries exceeded during literature review, but that is not connected to your PR; I will try to fix it now.

@whats2000
Author

I just updated the layout to look more balanced; do you think it looks better?
I also made it output the generated command directly in the status area for easier debugging!
Hope this helps you @AlexTzk

image

@AlexTzk

AlexTzk commented Feb 10, 2025

@whats2000 Looks great! Love how you split it across both sides. The debugging feature is extremely helpful, many thanks for that.

Running an experiment now to test whether the max_tries exception is thrown again, but as soon as I'm done with that I will test this again! Great work 👍

The bug is that the `model_backbone` attribute in LaboratoryWorkflow is used as both a `dict` and a `str`, which makes some agents fail to find their model, fall back to default_model, and hit inference errors.
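
One way to resolve this kind of mismatch is to normalize the attribute so agents always receive a dict; a rough sketch (the helper and phase names below are hypothetical, not the actual patch):

# Hypothetical sketch: accept either a single model name or a per-phase mapping
# and always hand agents a dict keyed by phase.
def normalize_model_backbone(model_backbone, phases, default_model="o1-mini"):
    if isinstance(model_backbone, str):
        # A bare string means "use this model for every phase".
        return {phase: model_backbone for phase in phases}
    if isinstance(model_backbone, dict):
        # Fill in any phase that was not configured explicitly.
        return {phase: model_backbone.get(phase, default_model) for phase in phases}
    raise TypeError(f"Unsupported model_backbone type: {type(model_backbone)!r}")

# Example with made-up phase names:
phases = ["literature review", "plan formulation", "paper refinement"]
print(normalize_model_backbone("gemini-2.0-flash", phases))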
@whats2000
Author

whats2000 commented Feb 10, 2025

@AlexTzk I just fixed several bugs that I discovered in the original project. Did that fix things for you? I found that deepseek-r1 is not very good at producing the command-format prompt, which makes it fail to invoke the task and can result in the max_tries exception. The model's technical report shows it is not primarily trained for tool usage (which affects structured-output command performance). I also tested qwen2.5:32b, and it does seem to produce structured output. I will test qwen2.5-coder:32b to see if the performance is better.

@AlexTzk

AlexTzk commented Feb 11, 2025

@whats2000 I still got the max_tries exception with deepseek-r1:32b; the second test was with qwen2.5-coder:32b-instruct-q5_K_M via Gradio, but that seemed to crash as well. No message in the web UI, but I presume it was the same max_tries exception during literature review.

I am now trying a smaller model, qwen2.5-coder:14b-instruct-q4_1, launched from the terminal rather than the web UI, to see if it's the same error.

@nullnuller

Is it going to be merged soon?

@AlexTzk

AlexTzk commented Feb 18, 2025

@whats2000 looks good! The only problem with the Ollama max_tokens argument is that it doesn't seem to get passed every time. Maybe it's my hardware, but the way I worked around it was to create my own Modelfile and model in Ollama, then specify max_tokens and the other settings in there.

It's still failing during literature review, but it gets further now and seems to follow the structure. My problem is probably that my amount of VRAM is too low, 32GB...
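
For reference, the Modelfile workaround mentioned above looks roughly like this (the base model and values are examples; num_predict is Ollama's counterpart to max_tokens):

# Modelfile (example values)
FROM qwen2.5:32b

# Bake the context window and generation limit into a custom model
PARAMETER num_ctx 16384
PARAMETER num_predict 4096

# Build it with: ollama create qwen2.5-custom -f Modelfile
# then point the agent config at the model name "qwen2.5-custom".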

@whats2000
Author

whats2000 commented Feb 19, 2025

I think there might be a bug in Ollama's OpenAI SDK compatibility? I'm not quite sure about that. I guess we need someone who can load a bigger model to test it out. Or maybe we need to reduce the complexity of the prompt (although that might hurt performance). I have tried that in other projects to make it at least usable.

@MohamadZeina

Thanks for this! I've tested Ollama and Gemini and both work for me, but there are some issues with Anthropic.

Requesting model "claude-3-5-sonnet" gives

Inference Exception: Model claude-3-5-sonnet not found

and "claude-3-5-sonnet-20241022" gives

Inference Exception: Model claude-3-5-sonnet-20241022 not found

Around line 113 in inference.py, I believe this `.` should be a `-`:
elif model_str == "claude-3.5-sonnet" or model_str == "claude-3-5-haiku":
Should be
elif model_str == "claude-3-5-sonnet" or model_str == "claude-3-5-haiku":

Same around line 171:

                    if model_str in [
                        "o1-preview", "o1-mini", "o1",
                        "claude-3.5-sonnet", "claude-3-5-haiku",
                        "gemini-2.0-flash", "gemini-2.0-flash-lite"
                    ]:

Fixing that and requesting "claude-3-5-sonnet" gives this error:

Inference Exception: Error code: 404 - {'type': 'error', 'error': {'type': 'not_found_error', 'message': 'model: claude-3-5-sonnet'}}

So it seems Anthropic wants you to specify the exact Sonnet version, because loosening the string match and asking for a specific model like claude-3-5-sonnet-20241022 will at least successfully run some inference:

elif "claude-3-5-sonnet" in model_str or "claude-3-5-haiku" in model_str:

Though this runs some inference, you get other issues:

Cost approximation throws an error: 'Could not automatically map claude-3-5-sonnet-20241022 to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.'

Adding claude-3-5-sonnet-20241022 to the costmap in and costmap out dictionaries doesn't fix the cost approximation error.
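
Not part of this PR, but the usual workaround the error message points at is an explicit tiktoken fallback for model names it doesn't recognize, roughly:

import tiktoken

def get_encoding_for(model_name: str):
    # tiktoken only knows OpenAI model names; fall back to a fixed encoding
    # so cost approximation for Claude models doesn't crash (counts are approximate).
    try:
        return tiktoken.encoding_for_model(model_name)
    except KeyError:
        return tiktoken.get_encoding("cl100k_base")

enc = get_encoding_for("claude-3-5-sonnet-20241022")
print(len(enc.encode("rough token count for cost approximation")))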

Will update if I find anything else, but that’s all I’ve had time to look into for now.

@AlexTzk

AlexTzk commented Feb 19, 2025

@MohamadZeina which model did you run with ollama and did it get past literature_review?

@whats2000
Author

whats2000 commented Feb 20, 2025

@MohamadZeina Thanks for your help, I will take a look at it. But sadly I do not have an API key, so I will need some help testing after I try to patch it!

@whats2000
Author

@MohamadZeina I have made a patch for Claude's issue. If it fixes your issue, please let me know!

@npandiyan

Thank you for the great work, @whats2000!
I have tried the framework through the Gradio app and it's sweet.
I then tried it with qwen2.5:7b, and as @AlexTzk mentioned, I faced the same issue with unexpected breaks in the literature review process.
Funnily enough, I spammed (copy-pasted) the system command description for the literature review process into the notes under the config, and somehow this helped it get through the literature review phase in a few attempts.

I will look to try with 32b, but hope my 3080 does not cry!

@whats2000
Author

whats2000 commented Feb 21, 2025

I think I need to rework the prompt for smaller models 💀

@MohamadZeina

Thanks @whats2000. Your changes allow the literature review to run, but then plan formulation fails if you don't provide a temperature.

Inference Exception: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'temperature: Input should be a valid number'}}

In ai_lab_repo.py, literature review explicitly provides a temp (0.8), but none of the other steps do.

Maybe the cleanest way around this is to treat the Anthropic provider like the OpenAI provider in providers.py, i.e. don't pass a temperature when there isn't one:

        # Anthropic rejects temperature=None, so only pass it when it's set.
        if temperature is None:
            message = client.messages.create(
                model=model_name,
                system=system_prompt,
                messages=[{"role": "user", "content": user_prompt}],
                max_tokens=8192 if 'sonnet' in model_name else 4096,
            )
        else:
            message = client.messages.create(
                model=model_name,
                system=system_prompt,
                messages=[{"role": "user", "content": user_prompt}],
                max_tokens=8192 if 'sonnet' in model_name else 4096,
                temperature=temperature,
            )

This fixes the temperature issue, but now there's an issue if the context gets longer than the Anthropic maximum (200,000 tokens). Working on a fix. Is it possible, and would you like me, to commit these Anthropic fixes directly?

@whats2000
Author

whats2000 commented Feb 21, 2025

I am working on it! I think we need to clip the context for Claude ourselves, as I don't think the SDK provides that.
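
A rough sketch of the kind of clipping I mean (the 4-characters-per-token heuristic and the limit are assumptions, not measured values):

# Heuristic clip so a prompt stays under Claude's ~200k-token context window.
# Anthropic counts real tokens; 4 characters per token is only an approximation.
def clip_context(text: str, max_tokens: int = 190_000, chars_per_token: int = 4) -> str:
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    # Keep the most recent part of the context, which usually matters most here.
    return text[-max_chars:]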

@whats2000
Author

@MohamadZeina I have a question: which step triggers the max-token issue? And how did you set the configuration (an oversized review or something like that)? It might be hard for me to patch the 200k length issue, as I cannot run Claude, sorry. Feel free to make a patch for it!

@MohamadZeina

Thanks for incorporating that temperature fix.

Apologies but I've lost the console output that produced that error with too much context. From memory it was an early step, maybe literature review or plan formulation.

It was Claude Sonnet, but the compute settings were all set to the lowest possible: --num-papers-lit-review 1 --mlesolver-max-steps 1 --papersolver-max-steps 1

Have run Haiku a few times with no issues, and running Sonnet again now and haven't had the issue. Will share what I find if I run into it again, or if I implement a fix

@MohamadZeina

@MohamadZeina which model did you run with ollama and did it get past literature_review?

Apologies @AlexTzk, I didn't see this initially. I have only played with the deepseek-r1 distills: all of them up to 8B, and 70B but not for long. They all get stuck on the literature review; I think they're struggling to follow the instruction to add papers to the review. I can't get past it, even if I reduce the required number of papers to 1, prompt it more heavily to add papers, and increase the Ollama max context. Would love to hear if people have luck with other small models.

@whats2000
Author

whats2000 commented Feb 22, 2025

I added a beta version of the web UI: a Vite React app served through a Flask API. Use app.py to launch the WebUI; you can check the project here. Feel free to give me some feedback. I am working on i18n and on adding visualization of the process, cost, etc.

Here is the UI; I tried to make it look like Gradio.
image

whats2000 changed the title from "[Feature Improvement] Add support for Ollama, Gemini, and Claude, with Gradio UI configuration" to "[Feature Improvement] Add support for Ollama, Gemini, and Claude, with Web UI" on Feb 22, 2025
@AlexTzk

AlexTzk commented Feb 24, 2025

@whats2000 awesome work, I will test the webui sometime this week and provide feedback.

@MohamadZeina I managed to get past lit_review by using the qwen2.5:32b model on Ollama with a num_ctx window of 100,000 tokens. I created my own model from a Modelfile. Another unforeseen problem comes up during the subsequent tasks, specifically when it gets to running_experiments: it takes a long time to reply because it's using a mixture of RAM and VRAM (about 50/50), and this causes the code to time out. Creating a different model with a 16k token window gets around this, but you have to interrupt the current research, delete your custom model, create another custom model under the same name with a different context window, and restart the research.

I was thinking that implementing a memory class might be a worthwhile endeavour...
Or, specifically for literature_review, rather than doing everything in one go we could split it into subtasks (a rough sketch follows the list):

  • Find all relevant papers and store their IDs in a file
  • Go through each paper with the LLM, restarting after each one is reviewed so the context window doesn't run out; remove IDs after review
  • Store relevant content from the papers in a different file that gets appended to after each review
  • Compile the full literature_review and move to the next step
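
Very roughly, and with every helper name below being hypothetical rather than existing code, that pipeline could look like:

import json
from pathlib import Path

# Hypothetical sketch of the split literature_review pipeline described above.
# find_relevant_papers and review_single_paper are placeholders, not real functions.
def run_literature_review(topic: str, workdir: Path) -> str:
    ids_file = workdir / "paper_ids.json"
    notes_file = workdir / "review_notes.txt"

    # 1. Find all relevant papers once and persist their IDs.
    if not ids_file.exists():
        ids_file.write_text(json.dumps(find_relevant_papers(topic)))

    # 2. Review one paper per LLM session so the context window never fills up.
    remaining = json.loads(ids_file.read_text())
    while remaining:
        paper_id = remaining.pop(0)
        summary = review_single_paper(paper_id)  # fresh LLM context each time
        with notes_file.open("a") as f:          # 3. append notes after each review
            f.write(summary + "\n")
        ids_file.write_text(json.dumps(remaining))  # drop the reviewed ID

    # 4. Compile the full review for the next phase.
    return notes_file.read_text()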

@whats2000
Author

Great to hear that!
