# Project Plan

## Final Product

- Press to talk
- Transcribe speech to text using pywhispercpp
  - Use pywhispercpp for Whisper.cpp integration
  - Support Apple silicon chips (M1, M2, ...)
  - Support CUDA for GPU acceleration
  - Support real-time transcription via WebSocket
- Use LangChain to parse and clean up the transcribed text
  - E.g. "Ehh what is the emm wheather like in SF? no, Salt Lake City" -> "What is the weather like in Salt Lake City?"
  - Support multiple LLM providers
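
Before an LLM is wired in, the cleanup step can be approximated with a plain filler-word filter. This is a minimal sketch under stated assumptions: the function name and filler list are illustrative, and the real pipeline would delegate rewording (e.g. keeping only the speaker's final correction) to LangChain.

```python
import re

# Filler tokens to drop before the transcript reaches the LLM cleanup step.
# Illustrative list; extend it as real transcripts come in.
FILLERS = {"ehh", "emm", "uh", "um", "uhm"}

def strip_fillers(transcript: str) -> str:
    """Remove standalone filler words and collapse leftover whitespace."""
    words = [w for w in transcript.split()
             if w.lower().strip(",.?!") not in FILLERS]
    return re.sub(r"\s+", " ", " ".join(words)).strip()
```

This only removes noise; it does not fix spelling or resolve corrections, which is why the LLM pass is still needed.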

## Milestones

- Speech to text setup
  - Install pywhispercpp with CoreML support for Apple Silicon
  - Basic transcription test
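
The basic transcription test can be sketched as below, assuming pywhispercpp's `Model` API; the model name and WAV path are placeholders, and the library import is deferred so the stitching helper stays testable without a downloaded model.

```python
def join_segments(texts) -> str:
    """Stitch per-segment texts into one transcript string."""
    return " ".join(t.strip() for t in texts if t.strip())

def transcribe(wav_path: str, model_name: str = "base.en") -> str:
    """Run Whisper.cpp via pywhispercpp and return the full transcript.

    Deferred import: pywhispercpp (and its model download) is only
    needed when transcription actually runs.
    """
    from pywhispercpp.model import Model  # assumes pywhispercpp is installed
    model = Model(model_name)
    segments = model.transcribe(wav_path)
    return join_segments(s.text for s in segments)
```
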
- Basic git commit hook that checks whether the code is formatted
  - Format the codebase
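
The hook can be a short script installed at `.git/hooks/pre-commit`. A sketch assuming `black` as the formatter (the tool choice is an assumption; `--check` is black's "report, don't rewrite" flag, and other formatters may need different flags):

```python
#!/usr/bin/env python3
"""Pre-commit hook: abort the commit if the code is not formatted."""
import subprocess
import sys

def format_check_cmd(tool: str = "black") -> list[str]:
    """Build the check-only command (flags shown are black's)."""
    return [tool, "--check", "."]

if __name__ == "__main__":
    # A non-zero exit code aborts the commit; git shows the formatter output.
    result = subprocess.run(format_check_cmd())
    sys.exit(result.returncode)
```
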
- Voice Processing Server
  - FastAPI server setup
  - Audio upload endpoint
  - Streaming audio support
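
The upload endpoint can be sketched as below, assuming FastAPI; the route path and the transcription hand-off are placeholders, and the FastAPI import is deferred into `create_app` so the content-type check is testable on its own.

```python
ALLOWED_AUDIO = {"audio/wav", "audio/x-wav", "audio/mpeg", "audio/webm"}

def is_supported_audio(content_type) -> bool:
    """Accept only audio uploads the transcriber can decode."""
    return content_type in ALLOWED_AUDIO

def create_app():
    """Build the FastAPI app; deferred import keeps this module import-safe."""
    from fastapi import FastAPI, HTTPException, UploadFile

    app = FastAPI()

    @app.post("/transcribe")
    async def transcribe_endpoint(file: UploadFile):
        if not is_supported_audio(file.content_type):
            raise HTTPException(status_code=415, detail="Unsupported audio type")
        audio_bytes = await file.read()
        # Placeholder: hand audio_bytes to the pywhispercpp-backed transcriber.
        return {"size": len(audio_bytes)}

    return app
```

Streaming support would later replace the one-shot upload with a WebSocket route, as listed under the final product.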
- LangChain Integration
  - Test OpenAI API key loading
  - Chain configuration
  - Text processing pipeline
  - Response formatting
  - Support other LLMs (DeepSeek, Gemini, ...)
  - Local LLM support
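
The chain configuration can be sketched with LangChain's prompt-pipe composition. This assumes `langchain-openai` is installed and `OPENAI_API_KEY` is set; the system prompt wording and model name are illustrative, and imports are deferred so the prompt text stands alone.

```python
CLEANUP_INSTRUCTIONS = (
    "Rewrite the user's raw speech transcript as a single clear sentence. "
    "Drop filler words, keep only the final correction when the speaker "
    "changes their mind, and fix obvious transcription errors."
)

def build_cleanup_chain(model: str = "gpt-4o-mini"):
    """Assemble prompt | llm; deferred imports keep this module import-safe."""
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI

    prompt = ChatPromptTemplate.from_messages(
        [("system", CLEANUP_INSTRUCTIONS), ("human", "{transcript}")]
    )
    return prompt | ChatOpenAI(model=model)

# Usage (requires an API key):
# chain = build_cleanup_chain()
# chain.invoke({"transcript": "Ehh what is the emm wheather like in SF? no, Salt Lake City"})
```

Swapping the `ChatOpenAI` constructor is the extension point for other providers (DeepSeek, Gemini) and for local LLMs.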
- Press to talk
  - Key listener
  - Capture a hotkey regardless of the current application
  - Put the final result in the system clipboard
  - Show an icon while voice control is active
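
A press-to-talk sketch, assuming `pynput` for the global hotkey and `pyperclip` for the clipboard (both library choices are assumptions); the combo string follows pynput's `GlobalHotKeys` syntax, where modifiers are wrapped in angle brackets.

```python
def hotkey_spec(*keys: str) -> str:
    """Build a pynput GlobalHotKeys combo, wrapping modifiers in angle brackets."""
    modifiers = {"ctrl", "alt", "shift", "cmd"}
    return "+".join(f"<{k}>" if k in modifiers else k for k in keys)

def make_hotkey_listener(on_activate, hotkey: str = "<ctrl>+<alt>+v"):
    """Return a system-wide pynput listener that fires on_activate."""
    from pynput import keyboard  # deferred: needs a desktop session to run
    return keyboard.GlobalHotKeys({hotkey: on_activate})

def deliver_result(text: str) -> str:
    """Put the cleaned transcript on the system clipboard and return it."""
    import pyperclip
    pyperclip.copy(text)
    return text
```
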
- Command line interface
  - Add a command line interface using click
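
A sketch of the entry point using click; the option names and defaults are illustrative, and the click import is deferred so the defaults stay inspectable without click installed.

```python
# Illustrative defaults shared between the CLI and any config file.
DEFAULTS = {"model": "base.en", "hotkey": "<ctrl>+<alt>+v"}

def build_cli():
    """Construct the click command; import is deferred."""
    import click

    @click.command()
    @click.option("--model", default=DEFAULTS["model"], show_default=True,
                  help="Whisper model name")
    @click.option("--hotkey", default=DEFAULTS["hotkey"], show_default=True,
                  help="Press-to-talk hotkey")
    def cli(model: str, hotkey: str) -> None:
        """Start voice control with the given model and hotkey."""
        click.echo(f"Starting with model={model}, hotkey={hotkey}")
        # Placeholder: wire up the listener, transcriber, and cleanup chain.

    return cli

if __name__ == "__main__":
    build_cli()()
```
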
- Web UI
  - Streamlit UI
  - Visualize the input audio, transcription, and output text
  - Visualize transcription history
  - Prompt config
  - LangChain config
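
A Streamlit sketch for the visualization pieces; the `st.*` calls are Streamlit's public API, the page layout is an assumption, and the history formatter is kept pure so it can be exercised without a running app.

```python
def history_rows(history):
    """Flatten (transcript, cleaned) pairs into table rows for display."""
    return [{"#": i + 1, "transcript": raw, "cleaned": out}
            for i, (raw, out) in enumerate(history)]

def render(history):
    """Render the transcription UI; deferred import keeps the helper testable."""
    import streamlit as st

    st.title("Voice Control")
    audio = st.file_uploader("Input audio", type=["wav", "mp3"])
    if audio is not None:
        st.audio(audio)
        # Placeholder: run transcription + LangChain cleanup on the upload.
    st.subheader("Transcription history")
    st.table(history_rows(history))
```
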
- Context Management
  - System prompt configuration
  - Chat history persistence
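
Chat history persistence can start as a JSON file on disk. A minimal stdlib sketch; the file location and record shape (a list of role/content dicts) are assumptions.

```python
import json
from pathlib import Path

def save_history(messages: list, path: Path) -> None:
    """Write the chat history as pretty-printed JSON."""
    path.write_text(json.dumps(messages, indent=2), encoding="utf-8")

def load_history(path: Path) -> list:
    """Read history back, tolerating a missing file on first run."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))
```
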
- Documentation
  - API documentation
  - Usage examples and guides