
Supabase crawler and coder - update crawl4ai to 0.5.0 - New Agent guide #43

Open · bigsk1 wants to merge 19 commits into main
Conversation

bigsk1 commented Mar 6, 2025

Added

  • New crawler for Supabase docs
  • New Supabase coder agent
  • Option to select crawl4ai or requests in the UI (see the sketch after this list)
  • Slider to select the number of URLs to crawl
  • Button to check crawl status
  • Test crawl option to verify results before committing them to the database
  • Crawl4ai v0.5.0 with Playwright and deps (the previously pinned version isn't even linked from their docs site anymore)
  • docs folder with clear details for creating and adding new crawler and coder agents, examples included
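
Here's a rough sketch of how the crawl-method option and URL slider could be wired up in Streamlit. It's illustrative only: the widgets are standard Streamlit, but run_crawl4ai_crawl and run_requests_crawl are placeholder names, not the actual functions in streamlit_ui.py.

```python
import streamlit as st

def run_crawl4ai_crawl(limit: int) -> None:
    """Placeholder for the crawl4ai-based crawl implemented in the PR."""
    st.info(f"Would crawl up to {limit} URLs with crawl4ai")

def run_requests_crawl(limit: int) -> None:
    """Placeholder for the simple requests-based crawl."""
    st.info(f"Would crawl up to {limit} URLs with requests")

crawl_method = st.radio(
    "Crawl method",
    options=["crawl4ai", "requests"],
    help="crawl4ai uses a headless browser; requests does plain HTTP fetches.",
)
max_urls = st.slider("Number of URLs to crawl", min_value=10, max_value=500, value=50)

if st.button("Start crawl"):
    if crawl_method == "crawl4ai":
        run_crawl4ai_crawl(max_urls)
    else:
        run_requests_crawl(max_urls)
```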

Updated

  • Dockerfile in root
  • streamlit_ui.py to handle the new crawler and coder in the Documentation tab
  • Graph service and MCP to handle the new agent
  • Title and summary for embeddings now use gpt-4o-mini, as I had issues with Anthropic not supporting response_format. PRIMARY_MODEL could be added back as an option if needed; the embeddings already seemed to be forced through OpenAI anyway. (A sketch follows this list.)
  • Prompts for the Pydantic and Supabase agents when using MCP so responses don't get cut short (steps like 1. create README, 2. make requirements.txt, etc. can always be added back if you want them)
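
For the title/summary switch, here's a minimal sketch of the idea, assuming the OpenAI Python SDK; the actual prompt and parsing in the PR may differ.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_title_and_summary(chunk: str, url: str) -> dict:
    """Ask gpt-4o-mini for a title and summary; response_format forces valid
    JSON, which is the feature Anthropic models don't support."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": 'Return JSON with keys "title" and "summary".'},
            {"role": "user", "content": f"URL: {url}\n\nContent:\n{chunk[:2000]}"},
        ],
    )
    return json.loads(response.choices[0].message.content)
```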

Removed

  • Unused weather agent - no example shown in .env.example
Archon/
├── archon/
│   ├── archon_graph.py        # Core agent workflow and orchestration
│   ├── supabase_coder.py      # Supabase agent implementation
│   ├── pydantic_ai_coder.py   # Pydantic AI agent implementation
│   ├── crawl_supabase_docs.py # Crawler for Supabase documentation
│   └── crawl_pydantic_ai_docs.py # Crawler for Pydantic AI documentation
├── streamlit_ui.py            # Main UI implementation
└── utils/
    └── utils.py               # Shared utility functions

[screenshot: pull-documentation]

[screenshot: pull-agent]

Crawl4ai providing a super clean scrape:

[screenshot]

bigsk1 added 16 commits March 3, 2025 18:25
…finished; refreshing updates in Streamlit is a pain
…o have a simple mode for crawling with requests; data in Supabase looks good, and test mode works with both options. Dockerfile updated for crawl4ai 0.5.0 with added Playwright deps and supported headless browsing
… being too large; MCP service working, both crawlers working, but a large number of URLs (250+) triggers errors that still need handling
- Fix import conflict with list_documentation_pages_helper using aliases
- Enhance agent type detection with expanded keyword list
- Ensure correct coder selection based on query content
- Add explicit source filters for documentation retrieval
- Improve logging for better debugging and traceability
- Fix Supabase agent not being properly selected for relevant queries
coleam00 (Owner) commented Mar 6, 2025

Woah there is a lot here - thank you so much for all of this! Over the next few days I'll be sure to review this in detail!

coleam00 added the enhancement label Mar 6, 2025
bigsk1 (Author) commented Mar 7, 2025

@coleam00 sounds good, feel free to make any adjustments as needed.

bigsk1 added 3 commits March 7, 2025 02:15
…tic_ai_docs" in both the documentation string and the actual database query; it was incorrectly using pydantic_docs instead of pydantic_ai_docs. The clear button actually clears the database now
coleam00 (Owner) commented Mar 9, 2025

After looking over the code, I have to say I'm impressed with the level of detail here! There are a couple of hesitations I have, though:

  1. Archon is meant to build other AI agents, and the example you gave isn't actually for building an agent; Supabase as a primary coder agent wouldn't be for building agents either. Archon isn't meant to be a general coding assistant, since we already have too many of those! It would make sense to include the Supabase docs to help the agent write tools that work with Supabase, but not much more, IMO. This setup could be used to include documentation for other agent frameworks, though!

  2. I think as we add more and more documentation sources, we'll have to make this specialized agent creation more dynamic. Instead of a default reasoner, a Supabase reasoner, and then more reasoners for more frameworks, I would like it to just be a single reasoner with dynamic access to the right prompt, tools, and documentation (a rough sketch of what that could look like follows).
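
A hypothetical sketch of what that single dynamic reasoner could look like; all names here are illustrative, none of them come from this PR:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class FrameworkProfile:
    """Everything the one reasoner needs for a given documentation source."""
    system_prompt: str
    doc_source: str                   # the `source` filter used for RAG lookups
    keywords: list[str]
    tools: list[Callable] = field(default_factory=list)

REGISTRY: dict[str, FrameworkProfile] = {
    "pydantic_ai": FrameworkProfile(
        system_prompt="You build agents with Pydantic AI.",
        doc_source="pydantic_ai_docs",
        keywords=["pydantic", "agent"],
    ),
    "supabase": FrameworkProfile(
        system_prompt="You write Supabase tools for agents.",
        doc_source="supabase_docs",
        keywords=["supabase", "postgres", "rpc"],
    ),
}

def resolve_profile(query: str) -> FrameworkProfile:
    """Pick the profile whose keywords best match the query; fall back to default."""
    scores = {name: sum(kw in query.lower() for kw in p.keywords)
              for name, p in REGISTRY.items()}
    best = max(scores, key=scores.get)
    return REGISTRY[best if scores[best] > 0 else "pydantic_ai"]
```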

coleam00 added the question label Mar 9, 2025
bigsk1 (Author) commented Mar 10, 2025

I had some of the same concerns after using it for a while and was thinking about a modular system; I actually started making a dynamic crawler option, see here: https://github.com/bigsk1/Archon/tree/crawler-template

There is a crawler_template.py: all you do is copy and rename this file. It contains the bulk of what is needed for a crawler, e.g. for the Supabase docs or any other resource.

The base_crawler.py holds the functions shared by all crawlers (a rough sketch of the idea follows).
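
Roughly, the idea is an abstract base class holding the shared crawl loop, which each copied template subclasses. A hypothetical sketch; the actual class and method names in base_crawler.py may differ:

```python
from abc import ABC, abstractmethod

class BaseCrawler(ABC):
    """Shared crawl loop; subclasses only say where their URLs come from."""

    def __init__(self, source_name: str):
        self.source_name = source_name  # tag stored alongside the embeddings

    @abstractmethod
    def get_urls(self, limit: int) -> list[str]:
        """Return the documentation URLs to crawl."""

    def crawl(self, limit: int = 50) -> None:
        for url in self.get_urls(limit):
            self.process(url)

    def process(self, url: str) -> None:
        """Fetch, chunk, embed, and store the page under self.source_name."""
        ...
```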

The crawler_registry.py auto-discovers it; you just add the new crawler's details:

defaults = [
    {
        "name": "supabase_docs",
        "module_path": "archon.crawl_supabase_docs",
        "display_name": "Supabase Docs",
        "keywords": ["supabase", "postgres", "postgresql", "rpc", "edge function",
                     "storage", "auth", "realtime", "subscription"],
        "description": "Supabase documentation for building applications with Supabase",
    },
    # ... additional crawler entries
]

There is also a ui_helpers.py, a helper module for creating doc tabs and UI components automatically in Streamlit (sketched below).
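
Something like this, as a hypothetical sketch; the real ui_helpers.py likely differs, and the registry entries here match the `defaults` shape shown above:

```python
import streamlit as st

def render_doc_tabs(registry: list[dict]) -> None:
    """Create one Documentation tab per registered crawler."""
    tabs = st.tabs([entry["display_name"] for entry in registry])
    for tab, entry in zip(tabs, registry):
        with tab:
            st.caption(entry["description"])
            st.button("Start crawl", key=f"crawl_{entry['name']}")
            st.button("Clear database", key=f"clear_{entry['name']}")
```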

Read about the Crawler Registry Guide here: https://github.com/bigsk1/Archon/blob/crawler-template/docs/CRAWLER_REGISTRY_GUIDE.md

So that would be one idea for a dynamic, modular design to get crawlers and embeddings into Supabase for an AI agent to then draw on as knowledge. Currently it has some bugs I ran out of time to chase down. I'm sure there is a better way to do this; ideally you just paste a name, sitemap, or URL into the UI and, bam, you've got a new crawler with new UI tabs and all the existing functions and features to crawl, clear, delete, etc.

As for the coder agents, it makes sense to have one general coder agent that makes other agents. Using Streamlit for this is a little tough; going down this rabbit hole and thinking about all of it led me to build this the other day:
https://github.com/bigsk1/supa-crawl-chat

Maybe you can take a few ideas and improve and extend as you see fit.
