📚 READ: Architecture Overview & Roadmap #277

Open
4 of 14 tasks
mmirman opened this issue Sep 18, 2023 · 4 comments
Labels
discussion no bounty just discuss documentation Improvements or additions to documentation feat/enhancement New feature or request HIGH-PRIORITY

Comments

@mmirman
Contributor

mmirman commented Sep 18, 2023

High-Level Architecture Overview

LLM-VM Architecture

  • Implicit Agents 🔧🕵️: The Anarchy LLM-VM can be set up to use external tools through our agents, such as REBEL, just by supplying tool descriptions!

  • Inference Optimization 🚄: The Anarchy LLM-VM is optimized from the agent level all the way to assembly on known LLM architectures to get the most bang for your buck. With state-of-the-art batching, sparse inference and quantization, distillation, and multi-level colocation, we aim to provide the fastest framework available.

  • Task Auto-Optimization 🚅: The Anarchy LLM-VM will analyze your use cases for repetitive tasks, where it can activate student-teacher distillation to train a super-efficient small model from a larger, more general model without losing accuracy. It can also take advantage of data-synthesis techniques to improve results.

  • Library Callable 📚: We provide a library that can be used from any Python codebase directly.

  • HTTP Endpoints 🕸️: We provide an HTTP standalone server to handle completion requests.

  • Live Data Augmentation 📊: You will be able to provide a live-updating data set, and the Anarchy LLM-VM will fine-tune your models or work with a vector DB to provide up-to-date information with citations.

  • Web Playground 🛝: You will be able to run the Anarchy LLM-VM and test its outputs from the browser.

  • Load-Balancing and Orchestration ⚖️: If you have multiple LLMs or providers you'd like to utilize, you will be able to hand them to the Anarchy LLM-VM, which will automatically figure out which to work with and when, in order to optimize your uptime or your costs.

  • Output Templating 🤵: You can ensure that the LLM only outputs data in specific formats and fills in variables from a template, using regular expressions, LMQL, or OpenAI's template language.

  • Persistent Stateful Memory 📝: The Anarchy LLM-VM can remember a user's conversation history and react accordingly.

  • Smart Batching 🗞️: Handle multiple calls at the same time from different levels of the LLM-VM.

  • Speculative Preemptive Sampling 🔮: Use a small LLM to predict the outputs of a larger LLM, and fall back to the large one only when the small model's samples start getting bad.

  • Token Streaming 🚰: Get a hook for a constantly updating supply of tokens!

  • Streamed Backtracking 🔙: Didn't like one output? Look at others! Efficiently.
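
To make the Speculative Preemptive Sampling bullet concrete, here is a minimal sketch of one common reading of it: greedy draft-and-verify decoding. It is not the LLM-VM implementation, and the list above does not say how the fallback decision is actually made. The names `draft_model`, `target_model`, and `speculative_decode` are hypothetical, and the "models" are toy functions that map a token prefix to the next token.

```python
from typing import Callable, List, Tuple

# Assumption: a "model" is any function from a token prefix to the next token.
Model = Callable[[List[str]], str]

CYCLE = ["a", "b", "c", "d", "e"]


def target_model(prefix: List[str]) -> str:
    """Toy stand-in for the large model: cycles through CYCLE by position."""
    return CYCLE[len(prefix) % len(CYCLE)]


def draft_model(prefix: List[str]) -> str:
    """Toy stand-in for the small model: agrees with the target most of the time."""
    if len(prefix) % 5 == 3:
        return "?"  # deliberate mistake so the fallback path is exercised
    return CYCLE[len(prefix) % len(CYCLE)]


def greedy_decode(model: Model, prompt: List[str], max_new: int) -> List[str]:
    """Baseline: ask one model for every token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(
    draft: Model, target: Model, prompt: List[str], max_new: int, k: int = 4
) -> Tuple[List[str], int]:
    """Return (new tokens, number of tokens accepted from the draft model)."""
    out = list(prompt)
    limit = len(prompt) + max_new
    accepted = 0
    while len(out) < limit:
        # 1. The small model proposes up to k tokens.
        proposal: List[str] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The large model checks each proposed position. A real system
        #    would do this in one batched forward pass, not a Python loop.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # 3. First disagreement: keep the agreed prefix and fall back
                #    to the large model's token for this position.
                out.extend(proposal[:i])
                out.append(expected)
                accepted += i
                break
        else:
            out.extend(proposal)
            accepted += len(proposal)
    return out[len(prompt):], accepted


tokens, from_draft = speculative_decode(draft_model, target_model, ["a"], 12)
print(tokens, from_draft)
```

With greedy decoding, the result is the same as running the large model alone; the potential saving is that the large model verifies several positions per call instead of producing one token per call.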

Projected features

@mmirman
Contributor Author

mmirman commented Sep 18, 2023

Note: Some of these have existing issues attached to them! This needs cleanup and organization (this is high-priority).

@Avr-Arnold
Contributor

I think some features build on others. For example, Output Templating can depend on Persistent Stateful Memory, because memory can remember the template. I have used LangChain, and it does its templating by injecting the template into the user's input as a header on every call. Having memory can eliminate this. Backtracking can also be extended to the persistent state, not just to outputs. So I believe Persistent Stateful Memory should come before Output Templating, and perhaps at the same time as backtracking.
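
A rough sketch of the difference described here, under stated assumptions: `header_injection_payload` and `StatefulSession` are hypothetical names, not LangChain or LLM-VM APIs, and no model is called. It only compares what the caller has to send in each style.

```python
from dataclasses import dataclass, field
from typing import List

TEMPLATE = "Reply as JSON with the keys 'answer' and 'source'."


def header_injection_payload(user_input: str) -> str:
    """Stateless style: the caller prepends the template on every call."""
    return f"{TEMPLATE}\n{user_input}"


@dataclass
class StatefulSession:
    """Hypothetical session that remembers the template and the history."""

    template: str
    history: List[str] = field(default_factory=list)

    def payload(self, user_input: str) -> str:
        """What the caller sends: only the new input."""
        self.history.append(user_input)
        return user_input

    def model_context(self) -> str:
        """What the model would see: the stored template plus the history."""
        return "\n".join([self.template, *self.history])


def total_chars(payloads: List[str]) -> int:
    return sum(len(p) for p in payloads)


questions = ["What is LLM-VM?", "Who maintains it?", "Is it open source?"]

stateless = [header_injection_payload(q) for q in questions]
session = StatefulSession(TEMPLATE)
stateful = [session.payload(q) for q in questions]

print(total_chars(stateless), total_chars(stateful))
print(session.model_context())
```

In the stateful version the template is stored once and appears once in the model's context, which is the dependency argued for above: templating becomes cheaper to express once memory exists.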

@mmirman
Contributor Author

mmirman commented Oct 3, 2023

the way they do their templating is by injecting it into the users input as a header for every call

We want strong guarantees from our templating: for example, the ability to constrain sampling with a regex.
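
As an illustration of what sampling constrained by a regex could look like, here is a minimal sketch. It is an assumption about the general technique, not the LLM-VM implementation: at each step, tokens that would break the pattern are masked out before the best-scoring token is picked. Python's `re` has no partial-match mode, so the pattern `[0-9]{2}:[0-9]{2}` is hand-translated into a small state machine. `advance`, `constrained_decode`, `VOCAB`, and `fake_score` are hypothetical names, and `fake_score` stands in for real model logits.

```python
import re
from typing import Callable, List, Optional

PATTERN = r"[0-9]{2}:[0-9]{2}"

# Hand-written state machine for PATTERN: the character class expected at
# each position. State i means "i characters matched so far".
STEPS = ["digit", "digit", "colon", "digit", "digit"]
ACCEPT = len(STEPS)

VOCAB = ["hello", "12", ":", "7", ":3", "0"]


def advance(state: int, token: str) -> Optional[int]:
    """Return the state after consuming token, or None if it breaks the pattern."""
    if not token:
        return None  # an empty token would never make progress
    for ch in token:
        if state >= ACCEPT:
            return None  # token runs past the end of the pattern
        want = STEPS[state]
        ok = ch in "0123456789" if want == "digit" else ch == ":"
        if not ok:
            return None
        state += 1
    return state


def fake_score(prefix: str, token: str) -> float:
    """Toy stand-in for model logits; it prefers the invalid token 'hello'."""
    scores = {"hello": 5.0, "12": 3.0, ":": 2.5, "7": 2.0, ":3": 1.0, "0": 0.5}
    return scores[token]


def constrained_decode(score: Callable[[str, str], float], vocab: List[str]) -> str:
    """Greedy decoding where only pattern-preserving tokens are eligible."""
    text, state = "", 0
    while state < ACCEPT:
        allowed = [(t, advance(state, t)) for t in vocab]
        allowed = [(t, s) for t, s in allowed if s is not None]
        if not allowed:
            raise ValueError("no token in the vocabulary can continue the pattern")
        # Mask: choose the best token among the allowed ones only.
        token, state = max(allowed, key=lambda ts: score(text, ts[0]))
        text += token
    return text


result = constrained_decode(fake_score, VOCAB)
print(result, bool(re.fullmatch(PATTERN, result)))
```

The guarantee comes from the mask, not from the prompt: however strongly the model prefers an out-of-format token, that token is never eligible. This is the contrast with header injection, where the format is only requested.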

@VictorOdede
Contributor

Updated Short Term LLM-VM Roadmap

LLM-VM (1)
