Moa #2628

Draft pull request: 25 commits to be merged into base branch main.

@gembancud gembancud commented Dec 14, 2024

Add Mixture of Architects (MOA) Feature

Overview

This PR introduces a powerful new feature called "Mixture of Architects" (MOA) - a collaborative AI architecture where multiple LLM "architects" work together to solve programming tasks. Each architect maintains its own conversation thread while being able to see and respond to other architects' proposals, enabling true multi-agent collaboration.

Key Features

Multiple Architect Collaboration

  • Support for multiple LLM architects working together
  • Each architect identified by NATO phonetic name (alpha, bravo, charlie, etc.)
  • First architect (alpha) is always the main model
  • Each architect sees all other architects' proposals, enabling true collaborative discussion
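
The naming scheme above could be sketched as follows. This is a minimal illustration, not the PR's actual code: `assign_architect_names` and the model strings are hypothetical, and the only behavior taken from the description is that architects get NATO phonetic names in order and the main model is always alpha.

```python
# Hypothetical helper: the PR text does not show how names are assigned.
NATO_NAMES = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot"]


def assign_architect_names(main_model, other_models):
    """Map NATO names to models; the main model always comes first (alpha)."""
    models = [main_model, *other_models]
    if len(models) > len(NATO_NAMES):
        raise ValueError("more architects than available names")
    # zip stops at the shorter sequence, so unused names are simply dropped.
    return dict(zip(NATO_NAMES, models))


# Placeholder model names, used only for illustration.
architects = assign_architect_names("main-model", ["second-model", "third-model"])
for name, model in architects.items():
    print(name, "->", model)
```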

Discussion Flow

The discussion proceeds in rounds, with each round following this pattern:

  1. User submits a query/request
  2. Architects respond sequentially:
    • Each architect sees:
      • Original user query
      • All previous architects' proposals (XML fenced)
    • Each architect provides:
      • Their analysis/instructions
      • Their own proposal (in XML fence)
    • Can reference, support, critique or object to other architects' proposals
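
One round of this flow might look like the sketch below. It assumes a hypothetical `ask(name, view)` callable that sends the assembled context to an architect's model and returns its proposal text; the function names are illustrative and not taken from the PR. The tag layout follows the description: the user query in `<user_message>`, and each earlier proposal wrapped in `<architect>` and `<proposal>`.

```python
def build_architect_view(user_query, previous_proposals):
    """Assemble what one architect sees: the query plus all earlier proposals."""
    parts = [f"<user_message>\n{user_query}\n</user_message>"]
    for name, proposal in previous_proposals:
        parts.append(
            f"<architect name='{name.upper()}'>\n"
            f"<proposal>\n{proposal}\n</proposal>\n"
            f"</architect>"
        )
    return "\n\n".join(parts)


def run_round(user_query, architect_names, ask):
    """Query architects sequentially; each one sees the proposals before it."""
    proposals = []
    for name in architect_names:
        view = build_architect_view(user_query, proposals)
        # `ask` is a stand-in for the real model call (an assumption here).
        proposals.append((name, ask(name, view)))
    return proposals


# Stub model call: reports how many earlier proposals it was shown.
def fake_ask(name, view):
    return f"{name} saw {view.count('<architect')} earlier proposal(s)"


for name, text in run_round("Improve error handling", ["alpha", "bravo", "charlie"], fake_ask):
    print(text)
```

With the stub above, alpha is shown no earlier proposals, bravo one, and charlie two, which matches the sequential pattern described in the list.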

Commands

Users can interact with MOA using three main commands:

  1. /discuss <message> (or just type normally) - Start/continue a discussion round
  2. /code <message> - Move to implementation phase
  3. /drop <architect-name> - Remove an architect from the discussion
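
As a rough illustration of how these three inputs could be told apart, here is a small dispatcher sketch. `parse_moa_input` is a hypothetical name and this is not how aider's command handling is implemented; the only behavior taken from the description is that plain text is treated like /discuss.

```python
def parse_moa_input(line):
    """Return (command, argument); plain text is treated as /discuss."""
    text = line.strip()
    for command in ("discuss", "code", "drop"):
        prefix = "/" + command
        # Require an exact match or a following space so "/dropx" is not "/drop".
        if text == prefix or text.startswith(prefix + " "):
            return command, text[len(prefix):].strip()
    return "discuss", text


print(parse_moa_input("/code implement Alpha's plan"))
print(parse_moa_input("/drop bravo"))
print(parse_moa_input("What about logging?"))
```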

Implementation Phase

When moving to implementation (/code), the entire discussion history is compiled chronologically with full context. The editor coder then decides how to implement the changes based on:

  • The full discussion history
  • The final user message
  • Their own analysis of the proposals
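
A chronological compilation of this kind could be sketched as below. The data layout (a list of rounds, each holding the user query and the architects' responses in order) and the function name `compile_discussion` are assumptions made for illustration; the PR description only states that the full history is handed to the editor coder together with the final user message.

```python
def compile_discussion(rounds, final_message):
    """Flatten rounds into one chronological, XML-fenced transcript.

    rounds: list of (user_query, [(architect_name, response), ...])
    """
    parts = []
    for user_query, responses in rounds:
        parts.append(f"<user_message>\n{user_query}\n</user_message>")
        for name, response in responses:
            parts.append(f"<architect name='{name.upper()}'>\n{response}\n</architect>")
    # The message given with /code comes last, so it reads as the final word.
    parts.append(f"<user_message>\n{final_message}\n</user_message>")
    return "\n\n".join(parts)


history = [("Improve error handling", [("alpha", "Use ErrorResponse"), ("bravo", "Add logging")])]
print(compile_discussion(history, "Implement both proposals"))
```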

Technical Implementation

Key Components

  • MixtureOfArchitectsCoder: Main class implementing the MOA functionality
  • ArchitectAgent: Class representing individual architects
  • XML fencing for clear message boundaries:
    • <user_message> - Contains user queries
    • <proposal> - Contains an architect's specific proposal
    • <architect name='NAME'> - Contains full architect responses
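
To show why the fences are convenient, here is a minimal sketch of pulling the `<proposal>` blocks out of an architect's response with a regular expression. It is an illustration under assumptions, not the PR's parser: `extract_proposals` is a hypothetical name, and a regex is only adequate as long as proposals are not nested.

```python
import re

# Non-greedy match so several proposals in one response stay separate.
PROPOSAL_RE = re.compile(r"<proposal>\s*(.*?)\s*</proposal>", re.DOTALL)


def extract_proposals(response):
    """Return the text of every <proposal> block in an architect response."""
    return PROPOSAL_RE.findall(response)


response = (
    "ALPHA: I suggest a structured approach.\n"
    "<proposal>\n"
    "1. Create a base ErrorResponse class\n"
    "2. Implement standardized try-catch blocks\n"
    "</proposal>"
)
print(extract_proposals(response))
```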

Collaborative Design

  • All architects see all proposals
  • Architects can directly reference and critique each other's proposals
  • No formal consensus mechanism - the editor coder makes implementation decisions
  • User guides the final implementation through their /code message

Benefits

  1. Enhanced Problem Solving: Multiple perspectives and approaches to each programming task
  2. Better Decision Making: Architects can critique and improve upon each other's proposals
  3. Flexible Architecture: Easy to add or remove architects during discussion
  4. Clear Communication: Structured XML format ensures clear boundaries between different architects' inputs
  5. User Control: Users can guide the discussion and choose when to move to implementation

Example Usage & Message Flow

Visual Flow Representation

User Query 1: "Let's improve the error handling in our API endpoints"
└── Alpha's View:
    User Query 1
    └── Alpha Response + Proposal
        └── Bravo's Proposal
            └── Charlie's Proposal
                └── User Query 2
                    └── Alpha Response + Proposal
                        └── ...

└── Bravo's View:
    User Query 1 + Alpha's Proposal
    └── Bravo Response + Proposal
        └── Charlie's Proposal
            └── User Query 2 + Alpha's Proposal
                └── ...

└── Charlie's View:
    User Query 1 + Alpha's Proposal + Bravo's Proposal
    └── Charlie Response + Proposal
        └── User Query 2 + Alpha's Proposal + Bravo's Proposal
            └── ...

Detailed Example with Clear Message Flow

🗣️ User Initiates

User: Let's improve the error handling in our API endpoints

👤 ALPHA's Turn

What Alpha Sees:

<user_message>
Let's improve the error handling in our API endpoints
</user_message>

Alpha's Response:

ALPHA: I suggest we implement a structured approach to error handling.
<proposal>
1. Create a base ErrorResponse class:
   class ErrorResponse {
     code: string;
     message: string;
     details?: object;
     timestamp: Date;
   }
2. Implement standardized try-catch blocks
3. Add error codes and messages mapping
</proposal>

👤 BRAVO's Turn

What Bravo Sees:

<user_message>
Let's improve the error handling in our API endpoints
</user_message>

<architect name='ALPHA'>
<proposal>
1. Create a base ErrorResponse class:
   class ErrorResponse {
     code: string;
     message: string;
     details?: object;
     timestamp: Date;
   }
2. Implement standardized try-catch blocks
3. Add error codes and messages mapping
</proposal>
</architect>

Bravo's Response:

BRAVO: Building on Alpha's proposal, we should consider logging and monitoring.
<proposal>
4. Add structured logging in catch blocks:
   - Use Winston or Pino for structured JSON logging
   - Include request context (requestId, user, endpoint)
5. Implement error tracking metrics:
   - Count by error type
   - Track error rates and patterns
6. Set up monitoring dashboards for:
   - Error frequency
   - Response times
   - Error patterns over time
</proposal>

👤 CHARLIE's Turn

What Charlie Sees:

<user_message>
Let's improve the error handling in our API endpoints
</user_message>

<architect name='ALPHA'>
[Alpha's proposal]
</architect>

<architect name='BRAVO'>
[Bravo's proposal]
</architect>

Charlie's Response:

CHARLIE: Good suggestions for internal handling. Let's enhance the API client experience too.
<proposal>
7. Add detailed error documentation:
   - OpenAPI/Swagger error schemas
   - Error code catalog with solutions
8. Implement retry mechanisms:
   - Exponential backoff for 5xx errors
   - Retry-After header support
9. Add rate limiting handling:
   - Clear 429 responses
   - Rate limit headers
   - Client-side rate tracking
</proposal>

Key Points About the Flow

  1. Sequential Processing:

    • Each architect sees all previous proposals
    • Later architects can build on or critique earlier suggestions
    • Full context is maintained throughout the discussion
  2. XML Structure:

    • <user_message> wraps user inputs
    • <architect name='NAME'> wraps each architect's full response
    • <proposal> wraps specific proposals within responses
  3. Context Accumulation:

    • Alpha sees only the user's message
    • Bravo sees user's message + Alpha's proposal
    • Charlie sees user's message + Alpha's + Bravo's proposals
  4. Implementation Phase:

    • /code command triggers the editor
    • Editor receives complete discussion history
    • Makes informed decisions based on all architects' input

Testing

  • Tested with various model combinations
  • Verified XML parsing and message handling
  • Tested command processing and architect management
  • Validated implementation phase with different types of code changes

Future Enhancements

  1. Add support for architect voting/consensus mechanisms
  2. Implement architect specialization (e.g., security expert, performance expert)
  3. Add ability to save/load architect configurations
  4. Enhance discussion visualization

Breaking Changes

None. This is a new feature that doesn't affect existing functionality.

Dependencies

No new dependencies required.


This PR represents a significant enhancement to aider's capabilities, enabling more sophisticated and collaborative code generation and modification. The Mixture of Architects approach provides a unique way to leverage multiple LLMs for better code quality and more thorough problem solving.

Please contact me on Discord for discussion :)

upnp

@CLAassistant

CLAassistant commented Dec 14, 2024

CLA assistant check
All committers have signed the CLA.

@LuciferMornens

Ngl this is a hell of a PR.

I hope @paul-gauthier accepts it.

@jerzydziewierz

key question -- is this any good? @gembancud can you provide any kind of evaluations, or performance metrics?
any reference tasks that were not solved by a single architect but were solved by MoA ?

@gembancud
Author

key question -- is this any good? @gembancud can you provide any kind of evaluations, or performance metrics? any reference tasks that were not solved by a single architect but were solved by MoA ?

This is with Sonnet and Gpt-4o together.

- dirname: 2024-12-18-11-12-24--trial_run9
  test_cases: 133
  model: openrouter/anthropic/claude-3.5-sonnet:beta, openai/gpt-4o
  edit_format: diff
  commit_hash: 49eb1d2-dirty
  pass_rate_1: 65.4
  pass_rate_2: 82.7
  percent_cases_well_formed: 100.0
  error_outputs: 8
  num_malformed_responses: 0
  num_with_malformed_responses: 0
  user_asks: 7
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 1
  exhausted_context_windows: 0
  test_timeouts: 0
  command: aider --model openrouter/anthropic/claude-3.5-sonnet:beta, openai/gpt-4o
  date: 2024-12-18
  versions: 0.68.1.dev
  seconds_per_case: 58.4
  total_cost: 5.1431
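
For a rough sense of scale, the figures in this report can be turned into a total run time and a per-case cost. The sketch below only does arithmetic on the numbers listed above; it assumes the cases ran one after another and that `total_cost` is in US dollars, neither of which the report states.

```python
# Figures copied from the benchmark report above.
test_cases = 133
seconds_per_case = 58.4
total_cost = 5.1431  # assumed to be US dollars

# Assumes sequential execution; parallel runs would finish sooner.
total_minutes = test_cases * seconds_per_case / 60
cost_per_case = total_cost / test_cases

print(f"total run time: {total_minutes:.1f} minutes")
print(f"cost per case: {cost_per_case:.3f}")
```

Under those assumptions the run takes a bit over two hours in total and costs roughly four cents per test case.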

I'll look into chaining o1 and 1206 as well, as soon as my rate limits relax a bit!

@jerzydziewierz

@gembancud so I understand that it is not currently any better than the Sonnet alone.

@paul-gauthier I recommend rejecting this because having to wait approx. 1 minute per question is fundamentally incompatible with the original vision of Aider as a user-interactive system.

@gembancud it's a great exercise, but at this time I would recommend that you keep this private; we should avoid overloading Aider with too many features of purely academic merit.

@aj47

aj47 commented Dec 20, 2024

/drop doesn't seem to be working for me

@aj47

aj47 commented Dec 20, 2024

no autocomplete suggestion for /discuss

@gembancud
Author

/drop doesn't seem to be working for me

My bad, I accidentally overrode /drop. It should be working now; removing an architect in MOA is now done with /ignore instead.

My deepest apologies, I have been a bad contributor for not uploading instructions for using this technique. I will do so soon. I've made prompt changes, and I pretty much use it as a daily driver now.
I have 3 architects working for me: sonnet35, gpt4o, and gemini 1206. Having 3 refinement steps from 3 different models before you trigger code implementation means the task is put under the lens multiple times. It is definitely still not perfect, but it covers a much larger scope than /architect. By the time the last model gives its instructions, everything is neatly refined. Error handling, type validation, and design concerns all get addressed as side effects, rather than the architect just straight up doing your task.

@paul-gauthier
Collaborator

Thanks for your interest in aider and for preparing this PR.

This is a very large PR that radically alters how aider would function. It seems unlikely that I could merge it without a pretty strong set of objective, quantitative evidence that it provides significant benefits.

Have you been benchmarking this approach?

@gembancud
Author

gembancud commented Dec 21, 2024

Thank you for the attention! :)

I was recently dismayed by jerzydziewierz's remarks, coupled with the fact that the expensive benchmark tests I ran didn't breach the saturation mark. Unfortunately, I have not had the time or the financial confidence to test beyond the code editing benchmark. More of my benchmarks are in a Discord thread that is easy to find in the showcases channel.

You may notice that the commits here are continuous. That reflects my tweaking it nearly every day, as I use it as my main driver for development going forward. I do think the code quality here is much better compared to /architect, but that is pretty much anecdotal evidence, and I may be biased as the author.

I am on the lookout for QoL suggestions though, to make it easier for everyone to try it out, as I think that's a much more organic path to adoption if people look beyond the benchmarks. I know that's partly true because the lmsys leaderboard does not have sonnet 3.5 at the top, yet a lot of people in our circles advocate for it largely from personal experience. In that regard, I would prefer getting feedback on moa that way as well.

But if my message does not answer the base need for quantitative evidence, then I'm fine with postponing the fight until the next release of benchmarks with reduced saturation. On a minor note, moa is much more impressive in the code quality aspect, which I think is disregarded in unit testing benchmarks. If quantifiable, it would be something like chatbot arena.

@jerzydziewierz

Dear @gembancud ,

sorry if you feel insulted,

My personal experience with these auto coding tools is that they very quickly fall into the trap of under-specification: that is, the problem shifts into eliciting what the user even wants and needs in the first place, rather than providing the solution.

Hence, Aider has been envisioned as a coder-interactive tool rather than an auto-chatbot arena with some agentic effects on the source code.

As @paul-gauthier said, Aider is by now relatively mature as a tool, and it is unlikely that such a major change of direction will simply be accepted into the main repo as-is.

May I suggest that you can fork aider (if licence permits) and develop your vision there, and when you can demonstrate to a few people that your approach is superior, the word of mouth will surely spread.

As to benchmarking, you indeed do not need to demonstrate superiority on any of the big official benchmarks that may cost hundreds of dollars to run.

Just demonstrate it nicely on one or a few examples specifically tailored to the strength of what you are proposing here.

I honestly wish you best of luck with your passion project.

@VatsaPatel

Hi @gembancud, I am happy to sponsor any benchmarking cost that you may need :)

7 participants