
Supabase crawler and coder - update crawl4ai to 0.5.0 - New Agent guide #43

Open · bigsk1 wants to merge 19 commits into main
Conversation

bigsk1 commented Mar 6, 2025

Added

  • New crawler for Supabase docs
  • New Supabase coder agent
  • Option to select crawl4ai or requests in the UI (see the sketch after this list)
  • Slider to select the number of URLs to crawl
  • Button to check crawl status
  • Test crawl option to verify results before committing them to the database
  • Crawl4ai v0.5.0 with Playwright and deps (the previously pinned version isn't even linked from their docs site anymore)
  • docs folder with clear details for creating and adding new crawler and coder agents, examples included
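
Here's a rough sketch of how the crawl-method option and URL slider could be wired up in Streamlit. It's illustrative only: the widgets are standard Streamlit, but run_crawl4ai_crawl and run_requests_crawl are placeholder names, not the actual functions in streamlit_ui.py.

```python
import streamlit as st

def run_crawl4ai_crawl(limit: int) -> None:
    """Placeholder for the crawl4ai-based crawl implemented in the PR."""
    st.info(f"Would crawl up to {limit} URLs with crawl4ai")

def run_requests_crawl(limit: int) -> None:
    """Placeholder for the simple requests-based crawl."""
    st.info(f"Would crawl up to {limit} URLs with requests")

crawl_method = st.radio(
    "Crawl method",
    options=["crawl4ai", "requests"],
    help="crawl4ai uses a headless browser; requests does plain HTTP fetches.",
)
max_urls = st.slider("Number of URLs to crawl", min_value=10, max_value=500, value=50)

if st.button("Start crawl"):
    if crawl_method == "crawl4ai":
        run_crawl4ai_crawl(max_urls)
    else:
        run_requests_crawl(max_urls)
```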

Updated

  • Dockerfile in root
  • streamlit_ui.py to handle the new crawler and coder in the Documentation tab
  • Graph service and MCP to handle the new agent
  • Title and summary for embeddings now use gpt-4o-mini, as I had issues with Anthropic not supporting response_format. PRIMARY_MODEL could be added back as an option if needed; the embeddings already seemed to be forced through OpenAI anyway. (A sketch follows this list.)
  • Prompts for the Pydantic and Supabase agents when using MCP so responses don't get cut short (steps like 1. create README, 2. make requirements.txt, etc. can always be added back if you want them)
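
For the title/summary switch, here's a minimal sketch of the idea, assuming the OpenAI Python SDK; the actual prompt and parsing in the PR may differ.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_title_and_summary(chunk: str, url: str) -> dict:
    """Ask gpt-4o-mini for a title and summary; response_format forces valid
    JSON, which is the feature Anthropic models don't support."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": 'Return JSON with keys "title" and "summary".'},
            {"role": "user", "content": f"URL: {url}\n\nContent:\n{chunk[:2000]}"},
        ],
    )
    return json.loads(response.choices[0].message.content)
```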

Removed

  • Unused weather agent - no example shown in .env.example
Archon/
├── archon/
│   ├── archon_graph.py        # Core agent workflow and orchestration
│   ├── supabase_coder.py      # Supabase agent implementation
│   ├── pydantic_ai_coder.py   # Pydantic AI agent implementation
│   ├── crawl_supabase_docs.py # Crawler for Supabase documentation
│   └── crawl_pydantic_ai_docs.py # Crawler for Pydantic AI documentation
├── streamlit_ui.py            # Main UI implementation
└── utils/
    └── utils.py               # Shared utility functions

[screenshot: pull-documentation]

[screenshot: pull-agent]

Crawl4ai providing a super clean scrape:

[screenshot]

bigsk1 added 16 commits March 3, 2025 18:25
…finished; refreshing updates in Streamlit is a pain
…o have a simple mode for crawling with requests; data in Supabase looks good, and test mode works with both options. Dockerfile updated for crawl4ai 0.5.0 with added Playwright deps and supported headless browsing
… being too large; MCP service working, both crawlers working, but a large number of URLs (250+) triggers errors that still need handling
- Fix import conflict with list_documentation_pages_helper using aliases
- Enhance agent type detection with expanded keyword list
- Ensure correct coder selection based on query content
- Add explicit source filters for documentation retrieval
- Improve logging for better debugging and traceability
- Fix Supabase agent not being properly selected for relevant queries
coleam00 (Owner) commented Mar 6, 2025

Woah there is a lot here - thank you so much for all of this! Over the next few days I'll be sure to review this in detail!

coleam00 added the enhancement label Mar 6, 2025
bigsk1 (Author) commented Mar 7, 2025

@coleam00 sounds good, feel free to make any adjustments as needed.

bigsk1 added 3 commits March 7, 2025 02:15
…tic_ai_docs" in both the documentation string and the actual database query; it was incorrectly using pydantic_docs instead of pydantic_ai_docs. The clear button actually clears the database now
coleam00 (Owner) commented Mar 9, 2025

After looking over the code, I have to say I'm impressed with the level of detail here! There are a couple of hesitations I have, though:

  1. Archon is meant to build other AI agents, and the example you gave isn't actually for building an agent; Supabase as a primary coder agent wouldn't be for building agents either. Archon isn't meant to be a general coding assistant, since we already have too many of those! It would make sense to include the Supabase docs to help the agent write tools that work with Supabase, but not much more, IMO. This setup could be used to include documentation for other agent frameworks, though!

  2. I think as we add more and more documentation sources, we'll have to make this specialized agent creation more dynamic. Instead of a default reasoner, a Supabase reasoner, and then more reasoners for more frameworks, I would like it to just be a single reasoner with dynamic access to the right prompt, tools, and documentation (a rough sketch of what that could look like follows).
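
A hypothetical sketch of what that single dynamic reasoner could look like; all names here are illustrative, none of them come from this PR:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class FrameworkProfile:
    """Everything the one reasoner needs for a given documentation source."""
    system_prompt: str
    doc_source: str                   # the `source` filter used for RAG lookups
    keywords: list[str]
    tools: list[Callable] = field(default_factory=list)

REGISTRY: dict[str, FrameworkProfile] = {
    "pydantic_ai": FrameworkProfile(
        system_prompt="You build agents with Pydantic AI.",
        doc_source="pydantic_ai_docs",
        keywords=["pydantic", "agent"],
    ),
    "supabase": FrameworkProfile(
        system_prompt="You write Supabase tools for agents.",
        doc_source="supabase_docs",
        keywords=["supabase", "postgres", "rpc"],
    ),
}

def resolve_profile(query: str) -> FrameworkProfile:
    """Pick the profile whose keywords best match the query; fall back to default."""
    scores = {name: sum(kw in query.lower() for kw in p.keywords)
              for name, p in REGISTRY.items()}
    best = max(scores, key=scores.get)
    return REGISTRY[best if scores[best] > 0 else "pydantic_ai"]
```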

coleam00 added the question label Mar 9, 2025
bigsk1 (Author) commented Mar 10, 2025

I had some of the same concerns after using it for a while and was thinking about a modular system; I actually started making a dynamic crawler option, see here: https://github.com/bigsk1/Archon/tree/crawler-template

There is a crawler_template.py: all you do is copy and rename this file. It contains the bulk of what is needed for a crawler, e.g. for the Supabase docs or any other resource.

The base_crawler.py holds the functions shared by all crawlers (a rough sketch of the idea follows).
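
Roughly, the idea is an abstract base class holding the shared crawl loop, which each copied template subclasses. A hypothetical sketch; the actual class and method names in base_crawler.py may differ:

```python
from abc import ABC, abstractmethod

class BaseCrawler(ABC):
    """Shared crawl loop; subclasses only say where their URLs come from."""

    def __init__(self, source_name: str):
        self.source_name = source_name  # tag stored alongside the embeddings

    @abstractmethod
    def get_urls(self, limit: int) -> list[str]:
        """Return the documentation URLs to crawl."""

    def crawl(self, limit: int = 50) -> None:
        for url in self.get_urls(limit):
            self.process(url)

    def process(self, url: str) -> None:
        """Fetch, chunk, embed, and store the page under self.source_name."""
        ...
```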

The crawler_registry.py auto-discovers it; you just add the new crawler's details:

defaults = [
    {
        "name": "supabase_docs",
        "module_path": "archon.crawl_supabase_docs",
        "display_name": "Supabase Docs",
        "keywords": ["supabase", "postgres", "postgresql", "rpc", "edge function",
                     "storage", "auth", "realtime", "subscription"],
        "description": "Supabase documentation for building applications with Supabase",
    },
    # ... additional crawler entries
]

There is also a ui_helpers.py, a helper module for creating doc tabs and UI components automatically in Streamlit (sketched below).
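
Something like this, as a hypothetical sketch; the real ui_helpers.py likely differs, and the registry entries here match the `defaults` shape shown above:

```python
import streamlit as st

def render_doc_tabs(registry: list[dict]) -> None:
    """Create one Documentation tab per registered crawler."""
    tabs = st.tabs([entry["display_name"] for entry in registry])
    for tab, entry in zip(tabs, registry):
        with tab:
            st.caption(entry["description"])
            st.button("Start crawl", key=f"crawl_{entry['name']}")
            st.button("Clear database", key=f"clear_{entry['name']}")
```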

Read about the Crawler Registry Guide here: https://github.com/bigsk1/Archon/blob/crawler-template/docs/CRAWLER_REGISTRY_GUIDE.md

So that would be one idea for a dynamic, modular design to get crawlers and embeddings into Supabase for an AI agent to then draw on as knowledge. Currently it has some bugs I ran out of time to chase down. I'm sure there is a better way to do this; ideally you just paste a name, sitemap, or URL into the UI and, bam, you've got a new crawler with new UI tabs and all the existing functions and features to crawl, clear, delete, etc.

As for the coder agents, it makes sense to have one general coder agent that makes other agents. Using Streamlit for this is a little tough; going down this rabbit hole and thinking about all of it led me to build this the other day:
https://github.com/bigsk1/supa-crawl-chat

Maybe you can take a few ideas and improve and extend as you see fit.
