Supabase crawler and coder - update crawl4ai to 0.5.0 - New Agent guide #43
base: main
Conversation
…finished; refreshing updates in Streamlit is a pain
…pdates, added module-level lock
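For context, a module-level lock in Python usually looks like the sketch below; the names `shared_state` and `_state_lock` are illustrative assumptions, not the PR's actual identifiers.

```python
import threading

# Module-level state shared by every caller that imports this module
# (hypothetical example state, not the PR's actual data).
shared_state: dict = {}
_state_lock = threading.Lock()

def update_state(changes: dict) -> None:
    """Serialize writes so concurrent callers don't interleave updates."""
    with _state_lock:
        shared_state.update(changes)
```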
…o have a simple mode for crawling with requests; data checked in Supabase looks good, and test mode works with both options. Dockerfile updated for crawl4ai 0.5.0, adding Playwright deps and supporting headless browsing
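A "simple mode" fetch with requests is typically just a thin wrapper like the following; the function name and headers are illustrative assumptions, not the PR's actual implementation.

```python
import requests

def simple_fetch(url: str, timeout: float = 30.0) -> str | None:
    """Fetch a page without a headless browser; return HTML or None on failure."""
    try:
        resp = requests.get(url, timeout=timeout,
                            headers={"User-Agent": "archon-crawler"})
        resp.raise_for_status()
        return resp.text
    except requests.RequestException:
        return None
```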
… to match the Supabase tab.
… being too large; MCP service working, both crawlers working, but with a large number of URLs (250+) an error occurs; need to handle errors
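One common way to keep a 250+ URL crawl from failing wholesale is bounded concurrency with per-URL error capture; this is a hedged sketch, not the fix shipped here, and `fetch_one` stands in for whichever crawl function is in use.

```python
import asyncio
from typing import Awaitable, Callable

async def crawl_all(urls: list[str],
                    fetch_one: Callable[[str], Awaitable[None]],
                    max_concurrent: int = 10) -> dict[str, Exception]:
    """Crawl with bounded concurrency; collect per-URL failures instead of raising."""
    sem = asyncio.Semaphore(max_concurrent)
    failures: dict[str, Exception] = {}

    async def worker(url: str) -> None:
        async with sem:
            try:
                await fetch_one(url)
            except Exception as exc:  # record and keep going; don't abort the run
                failures[url] = exc

    await asyncio.gather(*(worker(u) for u in urls))
    return failures
```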
…d archon_graph.py and graph_service.py
…h exceeded" error
- Fix import conflict with list_documentation_pages_helper using aliases
- Enhance agent type detection with expanded keyword list
- Ensure correct coder selection based on query content
- Add explicit source filters for documentation retrieval
- Improve logging for better debugging and traceability
- Fix Supabase agent not being properly selected for relevant queries
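The aliasing and keyword-based agent detection described above might look roughly like this; the module paths, keyword set, and `detect_agent_type` helper are assumptions for illustration, not the PR's exact code.

```python
# Aliasing at import time avoids the helper-name collision described above
# (module paths are assumptions based on the file names in this PR).
from archon.crawl_pydantic_ai_docs import (
    list_documentation_pages_helper as list_pydantic_pages,
)
from archon.crawl_supabase_docs import (
    list_documentation_pages_helper as list_supabase_pages,
)

SUPABASE_KEYWORDS = {"supabase", "postgres", "rpc", "edge function", "auth", "realtime"}

def detect_agent_type(query: str) -> str:
    """Route to the Supabase coder when the query mentions Supabase concepts."""
    lowered = query.lower()
    if any(kw in lowered for kw in SUPABASE_KEYWORDS):
        return "supabase_coder"
    return "pydantic_ai_coder"
```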
Woah there is a lot here - thank you so much for all of this! Over the next few days I'll be sure to review this in detail!

@coleam00 sounds good, feel free to make any adjustments as needed.
…accept a timeout parameter
…uld be replaced depending on what you want.
…tic_ai_docs" in both the documentation string and the actual database query, was incorrectly call pydantic_docs not pydantic_ai_docs, clear button actually clears database now
After looking over the code, I have to say I'm impressed with the level of detail here! There are a couple of hesitations I have, though:
I had some of the same concerns after using it for a while, and was thinking about a modular system. I actually started making a dynamic crawler option, see here: https://github.com/bigsk1/Archon/tree/crawler-template

There is a crawler_template.py; all you do is copy and rename this file, and it contains the bulk of what is needed for a crawler, e.g. Supabase docs or any other resource. The base_crawler.py holds the shared functions of all crawlers. The crawler_registry.py auto-discovers it; you just modify and add the new crawler details:

```python
defaults = [
    {
        "name": "supabase_docs",
        "module_path": "archon.crawl_supabase_docs",
        "display_name": "Supabase Docs",
        "keywords": ["supabase", "postgres", "postgresql", "rpc", "edge function",
                     "storage", "auth", "realtime", "subscription"],
        "description": "Supabase documentation for building applications with Supabase"
    },
]
```

There is also a ui_helpers.py, a helper module for creating doc tabs and UI components automatically in Streamlit. Read the Crawler Registry Guide here: https://github.com/bigsk1/Archon/blob/crawler-template/docs/CRAWLER_REGISTRY_GUIDE.md

So that would be an idea for a dynamic modular design to get crawlers and embeddings into Supabase for an AI agent to then have access to this knowledge. Currently it has some bugs; I ran out of time to mess with it. I'm sure there is a better way to do this; ideally you just paste a name, sitemap, or URL into the UI and bam, you've got a new crawler with new UI tabs and all the existing functions and features to crawl, clear, delete, etc.

As far as the coder agents, it makes sense to have a general coder agent to make other agents. Using Streamlit is a little tough; going down this rabbit hole and thinking about all this led me to build this the other day. Maybe you can take a few ideas and improve and extend as you see fit.
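As a hedged illustration of the registry idea described above (not the actual crawler_registry.py), a keyword lookup over registered entries could be as simple as:

```python
import importlib

# A trimmed-down registry entry in the same shape as the defaults above.
REGISTRY = [
    {
        "name": "supabase_docs",
        "module_path": "archon.crawl_supabase_docs",
        "keywords": ["supabase", "postgres", "rpc", "edge function"],
    },
]

def find_crawler(query: str):
    """Return the crawler module whose keywords match the query, if any."""
    lowered = query.lower()
    for entry in REGISTRY:
        if any(kw in lowered for kw in entry["keywords"]):
            return importlib.import_module(entry["module_path"])
    return None
```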
Added / Updated: `PRIMARY_MODEL` is back as an option if needed; embeddings were already being enforced by OpenAI, it seemed. Removed … Crawl4ai is proving to be a super clean scraper.
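Reading `PRIMARY_MODEL` back in as an optional override is typically a one-liner; the default model name below is an assumption, not something pinned by this PR.

```python
import os

# Fall back to a default when PRIMARY_MODEL is not set in the environment
# (the default shown here is an assumption, not what this PR pins).
primary_model = os.getenv("PRIMARY_MODEL", "gpt-4o-mini")
```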