
Whisper Chain

Whisper Chain Logo

Overview

Typing is boring, so let's use voice to speed up your workflow. This project combines:

  • Real-time speech recognition using Whisper.cpp
  • Transcription cleanup using LangChain (sketched after this list)
  • Global hotkey support for voice control
  • Automatic clipboard integration for the cleaned transcription
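
The cleanup step is what turns raw, rambling speech into paste-ready text. A minimal sketch of that idea with LangChain is below; the prompt wording and the gpt-4o-mini model name are illustrative choices, not the package's actual defaults.

# Sketch of the transcription-cleanup step; prompt and model are illustrative.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "Clean up this raw speech transcription: fix punctuation and "
               "casing, drop filler words, and return only the cleaned text."),
    ("human", "{transcription}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

print(chain.invoke({"transcription": "um so lets meet at uh three pm tomorrow"}))
# -> something like: "Let's meet at 3 PM tomorrow."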

Requirements

  • Python 3.8+
  • OpenAI API Key
  • For macOS:
    • ffmpeg (for audio processing)
    • portaudio (for audio capture)

Installation

  1. Install system dependencies (macOS):
# Install ffmpeg and portaudio using Homebrew
brew install ffmpeg portaudio
  2. Install the project:
pip install whisperchain

Configuration

WhisperChain will look for configuration in the following locations:

  1. Environment variables
  2. .env file in the current directory
  3. ~/.whisperchain/.env file

On first run, if no configuration is found, you will be prompted to enter your OpenAI API key. The key will be saved in ~/.whisperchain/.env for future use.

You can also manually set your OpenAI API key in any of these ways:

# Option 1: Environment variable
export OPENAI_API_KEY=your-api-key-here

# Option 2: Create .env file in current directory
echo "OPENAI_API_KEY=your-api-key-here" > .env

# Option 3: Create global config
mkdir -p ~/.whisperchain
echo "OPENAI_API_KEY=your-api-key-here" > ~/.whisperchain/.env

Usage

  1. Start the application:
# Run with default settings
whisperchain

# Run with custom configuration
whisperchain --config config.json

# Override specific settings
whisperchain --port 8080 --hotkey "<ctrl>+<alt>+t" --model "large" --debug
  2. Use the global hotkey (<ctrl>+<alt>+r by default, <ctrl>+<option>+r on macOS; see the sketch after this list):
    • Press and hold to start recording
    • Speak your text
    • Release to stop recording
    • The cleaned transcription will be copied to your clipboard automatically
    • Paste it (Ctrl+V, or Cmd+V on macOS) wherever you need it
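
The <ctrl>+<alt>+r syntax is the format used by pynput's hotkey parser, so, assuming pynput underneath, the press-and-hold behavior could look roughly like this sketch; the recording helpers are placeholders, not the package's actual functions.

# Sketch of press-and-hold hotkey handling with pynput; helpers are placeholders.
from pynput import keyboard

COMBO = keyboard.HotKey.parse("<ctrl>+<alt>+r")  # same syntax as --hotkey
pressed = set()
recording = False


def start_recording():
    """Placeholder: begin capturing microphone audio."""


def stop_recording():
    """Placeholder: stop capturing and hand the audio off for transcription."""


def on_press(key):
    global recording
    key = listener.canonical(key)  # normalize e.g. ctrl_l/ctrl_r to ctrl
    if key in COMBO:
        pressed.add(key)
        if len(pressed) == len(COMBO) and not recording:
            recording = True
            start_recording()


def on_release(key):
    global recording
    pressed.discard(listener.canonical(key))
    if recording and len(pressed) < len(COMBO):
        recording = False
        stop_recording()


listener = keyboard.Listener(on_press=on_press, on_release=on_release)
listener.start()
listener.join()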

Development

Streamlit UI

streamlit run src/whisperchain/ui/streamlit_app.py

If the Streamlit UI gets stuck or hits an error, you can kill whatever process is holding its default port (8501):

lsof -ti :8501 | xargs kill -9

Running Tests

Install test dependencies:

pip install -e ".[test]"

Run tests:

pytest tests/

Run tests with microphone input:

# Run specific microphone test
TEST_WITH_MIC=1 pytest tests/test_stream_client.py -v -k test_stream_client_with_real_mic

# Run all tests including microphone test
TEST_WITH_MIC=1 pytest tests/
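
The microphone test is opt-in. A common way to gate a test on an environment variable like this (a sketch of the pattern, not necessarily the project's exact test code) is pytest's skipif marker:

# Sketch of gating a hardware-dependent test on an environment variable.
import os

import pytest


@pytest.mark.skipif(
    os.environ.get("TEST_WITH_MIC") != "1",
    reason="needs a real microphone; set TEST_WITH_MIC=1 to enable",
)
def test_stream_client_with_real_mic():
    # Record a short clip from the default input device and assert that a
    # non-empty transcription comes back (details depend on the client API).
    ...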

Building the project

python -m build
pip install .

Publishing to PyPI

python -m build
twine upload --repository pypi dist/*

License

See the LICENSE file for details.

Acknowledgments

Architecture

graph TB
    subgraph "Client Options"
        K[Key Listener]
        A[Audio Stream]
        C[Clipboard]
    end

    subgraph "Streamlit Web UI :8501"
        WebP[Prompt]
        WebH[History]
    end

    subgraph "FastAPI Server :8000"
        WS[WebSocket /stream]
        W[Whisper Model]
        LC[LangChain Processor]
        H[History]
    end

    K -->|"Hot Key"| A
    A -->|"Audio Stream"| WS
    WS --> W
    W --> LC
    WebP --> LC
    LC --> C
    LC --> H
    H --> WebH
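
The diagram centers on a WebSocket /stream endpoint on the FastAPI server that feeds audio into the Whisper model and then into the LangChain processor. A minimal sketch of what such an endpoint could look like is below; the transcribe/clean_up helpers are placeholders and the end-of-recording handling is a simplification, not the project's actual protocol.

# Sketch of a FastAPI WebSocket endpoint for streamed audio (not the real server code).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


def transcribe(audio: bytes) -> str:
    """Placeholder for the Whisper model step."""
    return ""


def clean_up(text: str) -> str:
    """Placeholder for the LangChain processor step."""
    return text


@app.websocket("/stream")
async def stream(websocket: WebSocket) -> None:
    await websocket.accept()
    buffer = bytearray()
    try:
        while True:
            # The client streams raw audio chunks while the hotkey is held.
            buffer.extend(await websocket.receive_bytes())
    except WebSocketDisconnect:
        # For this sketch, a disconnect marks the end of the recording.
        pass
    if buffer:
        cleaned = clean_up(transcribe(bytes(buffer)))
        print(cleaned)  # the real server returns this to the client and the history

To match the diagram, the app would be served on port 8000, e.g. uvicorn whisperchain_sketch:app --port 8000 (the module name here is hypothetical).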