# Project Plan

## Final Product

- Press to talk
- Transcribe speech to text using pywhispercpp
  - Use pywhispercpp for Whisper.cpp integration
  - Support Apple silicon chips (M1, M2, ...)
  - Support CUDA for GPU acceleration
  - Support real-time transcription via WebSocket
- Use LangChain to parse and clean up the transcribed text
  - E.g. "Ehh what is the emm wheather like in SF? no, Salt Lake City" -> "What is the weather like in Salt Lake City?"
  - Support multiple LLM providers
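
Before an LLM is wired in, the cleanup step can be approximated with a plain filler-word filter. This is a minimal sketch under stated assumptions: the function name and filler list are illustrative, and the real pipeline would delegate rewording (e.g. keeping only the speaker's final correction) to LangChain.

```python
import re

# Filler tokens to drop before the transcript reaches the LLM cleanup step.
# Illustrative list; extend it as real transcripts come in.
FILLERS = {"ehh", "emm", "uh", "um", "uhm"}

def strip_fillers(transcript: str) -> str:
    """Remove standalone filler words and collapse leftover whitespace."""
    words = [w for w in transcript.split()
             if w.lower().strip(",.?!") not in FILLERS]
    return re.sub(r"\s+", " ", " ".join(words)).strip()
```

This only removes noise; it does not fix spelling or resolve corrections, which is why the LLM pass is still needed.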

## Milestones

- Speech to text setup
  - Install pywhispercpp with CoreML support for Apple Silicon
  - Basic transcription test
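
The basic transcription test can be sketched as below, assuming pywhispercpp's `Model` API; the model name and WAV path are placeholders, and the library import is deferred so the stitching helper stays testable without a downloaded model.

```python
def join_segments(texts) -> str:
    """Stitch per-segment texts into one transcript string."""
    return " ".join(t.strip() for t in texts if t.strip())

def transcribe(wav_path: str, model_name: str = "base.en") -> str:
    """Run Whisper.cpp via pywhispercpp and return the full transcript.

    Deferred import: pywhispercpp (and its model download) is only
    needed when transcription actually runs.
    """
    from pywhispercpp.model import Model  # assumes pywhispercpp is installed
    model = Model(model_name)
    segments = model.transcribe(wav_path)
    return join_segments(s.text for s in segments)
```
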
- Basic git commit hook that checks whether the code is formatted
  - Format the codebase
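
The hook can be a short script installed at `.git/hooks/pre-commit`. A sketch assuming `black` as the formatter (the tool choice is an assumption; `--check` is black's "report, don't rewrite" flag, and other formatters may need different flags):

```python
#!/usr/bin/env python3
"""Pre-commit hook: abort the commit if the code is not formatted."""
import subprocess
import sys

def format_check_cmd(tool: str = "black") -> list[str]:
    """Build the check-only command (flags shown are black's)."""
    return [tool, "--check", "."]

if __name__ == "__main__":
    # A non-zero exit code aborts the commit; git shows the formatter output.
    result = subprocess.run(format_check_cmd())
    sys.exit(result.returncode)
```
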
- Voice Processing Server
  - FastAPI server setup
  - Audio upload endpoint
  - Streaming audio support
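
The upload endpoint can be sketched as below, assuming FastAPI; the route path and the transcription hand-off are placeholders, and the FastAPI import is deferred into `create_app` so the content-type check is testable on its own.

```python
ALLOWED_AUDIO = {"audio/wav", "audio/x-wav", "audio/mpeg", "audio/webm"}

def is_supported_audio(content_type) -> bool:
    """Accept only audio uploads the transcriber can decode."""
    return content_type in ALLOWED_AUDIO

def create_app():
    """Build the FastAPI app; deferred import keeps this module import-safe."""
    from fastapi import FastAPI, HTTPException, UploadFile

    app = FastAPI()

    @app.post("/transcribe")
    async def transcribe_endpoint(file: UploadFile):
        if not is_supported_audio(file.content_type):
            raise HTTPException(status_code=415, detail="Unsupported audio type")
        audio_bytes = await file.read()
        # Placeholder: hand audio_bytes to the pywhispercpp-backed transcriber.
        return {"size": len(audio_bytes)}

    return app
```

Streaming support would later replace the one-shot upload with a WebSocket route, as listed under the final product.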
- LangChain Integration
  - Test OpenAI API key loading
  - Chain configuration
  - Text processing pipeline
  - Response formatting
  - Support other LLMs (DeepSeek, Gemini, ...)
  - Local LLM support
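
The chain configuration can be sketched with LangChain's prompt-pipe composition. This assumes `langchain-openai` is installed and `OPENAI_API_KEY` is set; the system prompt wording and model name are illustrative, and imports are deferred so the prompt text stands alone.

```python
CLEANUP_INSTRUCTIONS = (
    "Rewrite the user's raw speech transcript as a single clear sentence. "
    "Drop filler words, keep only the final correction when the speaker "
    "changes their mind, and fix obvious transcription errors."
)

def build_cleanup_chain(model: str = "gpt-4o-mini"):
    """Assemble prompt | llm; deferred imports keep this module import-safe."""
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI

    prompt = ChatPromptTemplate.from_messages(
        [("system", CLEANUP_INSTRUCTIONS), ("human", "{transcript}")]
    )
    return prompt | ChatOpenAI(model=model)

# Usage (requires an API key):
# chain = build_cleanup_chain()
# chain.invoke({"transcript": "Ehh what is the emm wheather like in SF? no, Salt Lake City"})
```

Swapping the `ChatOpenAI` constructor is the extension point for other providers (DeepSeek, Gemini) and for local LLMs.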
- Press to talk
  - Key listener
  - Capture a hotkey regardless of the current application
  - Put the final result in the system clipboard
  - Show an icon while voice control is active
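
A press-to-talk sketch, assuming `pynput` for the global hotkey and `pyperclip` for the clipboard (both library choices are assumptions); the combo string follows pynput's `GlobalHotKeys` syntax, where modifiers are wrapped in angle brackets.

```python
def hotkey_spec(*keys: str) -> str:
    """Build a pynput GlobalHotKeys combo, wrapping modifiers in angle brackets."""
    modifiers = {"ctrl", "alt", "shift", "cmd"}
    return "+".join(f"<{k}>" if k in modifiers else k for k in keys)

def make_hotkey_listener(on_activate, hotkey: str = "<ctrl>+<alt>+v"):
    """Return a system-wide pynput listener that fires on_activate."""
    from pynput import keyboard  # deferred: needs a desktop session to run
    return keyboard.GlobalHotKeys({hotkey: on_activate})

def deliver_result(text: str) -> str:
    """Put the cleaned transcript on the system clipboard and return it."""
    import pyperclip
    pyperclip.copy(text)
    return text
```
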
- Command line interface
  - Add a command line interface using click
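
A sketch of the entry point using click; the option names and defaults are illustrative, and the click import is deferred so the defaults stay inspectable without click installed.

```python
# Illustrative defaults shared between the CLI and any config file.
DEFAULTS = {"model": "base.en", "hotkey": "<ctrl>+<alt>+v"}

def build_cli():
    """Construct the click command; import is deferred."""
    import click

    @click.command()
    @click.option("--model", default=DEFAULTS["model"], show_default=True,
                  help="Whisper model name")
    @click.option("--hotkey", default=DEFAULTS["hotkey"], show_default=True,
                  help="Press-to-talk hotkey")
    def cli(model: str, hotkey: str) -> None:
        """Start voice control with the given model and hotkey."""
        click.echo(f"Starting with model={model}, hotkey={hotkey}")
        # Placeholder: wire up the listener, transcriber, and cleanup chain.

    return cli

if __name__ == "__main__":
    build_cli()()
```
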
- Web UI
  - Streamlit UI
  - Visualize the input audio, transcription, and output text
  - Visualize transcription history
  - Prompt config
  - LangChain config
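
A Streamlit sketch for the visualization pieces; the `st.*` calls are Streamlit's public API, the page layout is an assumption, and the history formatter is kept pure so it can be exercised without a running app.

```python
def history_rows(history):
    """Flatten (transcript, cleaned) pairs into table rows for display."""
    return [{"#": i + 1, "transcript": raw, "cleaned": out}
            for i, (raw, out) in enumerate(history)]

def render(history):
    """Render the transcription UI; deferred import keeps the helper testable."""
    import streamlit as st

    st.title("Voice Control")
    audio = st.file_uploader("Input audio", type=["wav", "mp3"])
    if audio is not None:
        st.audio(audio)
        # Placeholder: run transcription + LangChain cleanup on the upload.
    st.subheader("Transcription history")
    st.table(history_rows(history))
```
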
- Context Management
  - System prompt configuration
  - Chat history persistence
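
Chat history persistence can start as a JSON file on disk. A minimal stdlib sketch; the file location and record shape (a list of role/content dicts) are assumptions.

```python
import json
from pathlib import Path

def save_history(messages: list, path: Path) -> None:
    """Write the chat history as pretty-printed JSON."""
    path.write_text(json.dumps(messages, indent=2), encoding="utf-8")

def load_history(path: Path) -> list:
    """Read history back, tolerating a missing file on first run."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))
```
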
- Documentation
  - API documentation
  - Usage examples and guides