- Press to talk
- Transcribe speech to text using pywhispercpp (Python bindings for whisper.cpp)
- Support Apple Silicon chips (M1, M2, ...)
- Support CUDA for GPU acceleration
- Support real-time transcription via WebSocket
- Use LangChain to parse the text and clean up the text
- E.g. "Ehh what is the emm weather like in SF? no, Salt Lake City" -> "What is the weather like in Salt Lake City?"
- Support multiple LLM providers
- Speech to text setup
- Install pywhispercpp with CoreML support for Apple Silicon
- Basic transcription test
- Basic git commit hook to check if the code is formatted
- Format the code
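One way to wire up the formatting check is a pre-commit hook. This sketch assumes black as the formatter (the formatter isn't named above, so swap in yours):

```shell
# Install a minimal pre-commit hook that rejects unformatted code
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Fails the commit when black would reformat anything
black --check . || {
    echo "Code is not formatted; run 'black .' and re-stage."
    exit 1
}
EOF
chmod +x .git/hooks/pre-commit
```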
- Voice Processing Server
- FastAPI server setup
- Audio upload endpoint
- Streaming audio support
- LangChain Integration
- Test OpenAI API Key loading
- Chain configuration
- Text processing pipeline
- Response formatting
- Support other LLMs (DeepSeek, Gemini, ...)
- Local LLM support
- Press to talk
- Key listener
- Capture a hot key regardless of the current application
- Put the final result in the system clipboard
- Show an icon when voice control is active
- Command line interface
- Add a command line interface using click
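A click-based entry point could start as the sketch below; the option names and the echoed placeholder are assumptions, with the real command calling the transcription pipeline:

```python
import click


@click.command()
@click.option("--model", default="base.en", show_default=True,
              help="Whisper model to load")
@click.argument("audio", type=click.Path())
def main(model, audio):
    """Transcribe AUDIO and print the cleaned-up text."""
    # Placeholder wiring; the real command would run transcribe + clean-up here
    click.echo(f"Transcribing {audio} with model {model}")
```

The command would be exposed via a standard `if __name__ == "__main__"` guard or a console-script entry point in `pyproject.toml`.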
- Web UI
- Streamlit UI
- Visualize the input audio, transcription, and output text
- Visualize transcription history
- Prompt config
- LangChain config
- Context Management
- System prompt configuration
- Chat history persistence
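One simple persistence scheme is a JSON file of role/content turns; the file name and schema below are assumptions, not a decided format:

```python
import json
from pathlib import Path

HISTORY_FILE = Path("chat_history.json")  # hypothetical location


def load_history(path: Path = HISTORY_FILE) -> list:
    # Returns past {"role", "content"} turns, or an empty list on first run
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return []


def append_turn(role: str, content: str, path: Path = HISTORY_FILE) -> None:
    # Persist each turn so context survives restarts
    history = load_history(path)
    history.append({"role": role, "content": content})
    path.write_text(json.dumps(history, indent=2), encoding="utf-8")
```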
- Documentation
- API Documentation
- Usage examples and guides