
Private-ASR

This project is a modified version of the open-source FunClip project, adding ASR (Automatic Speech Recognition), speaker diarization, SRT subtitle editing, and LLM-based summarization capabilities. It uses Gradio as the user interface, providing an interactive and easy-to-use platform.

简体中文 / English

This project is based on the open-source FunClip project and integrates automatic speech recognition (ASR), speaker diarization, SRT subtitle editing, and LLM-based summarization. It uses Gradio to provide an intuitive, easy-to-use user interface.


Update: Added support for GPU inference for both Docker and local deployment. For Docker GPU deployment, check this.


📜 Credits

This project builds upon the open-source FunClip by Alibaba DAMO Academy. I extended the functionality to include:

  • ASR Summarization using LLMs (OpenAI GPT, custom API).
  • Dynamic SRT Replacement with speaker mapping.
  • Deployment-ready Docker setup for production environments.

🎯 Features

  1. Automatic Speech Recognition (ASR):

    • Supports video and audio inputs.
    • Outputs text and SRT subtitles.
  2. Speaker Diarization (SD):

    • Identifies and differentiates speakers in multi-speaker audio/video.
  3. SRT Subtitle Editing:

    • Replace speaker identifiers with user-defined names (see the sketch after this list).
  4. LLM Summarization:

    • Summarize ASR results using GPT-based models.
    • Allows custom API configurations.
  5. Deployment Options:

    • Lightweight Docker container for production.
    • Python environment for development/testing.
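
As an illustration of the SRT editing feature, the sketch below maps generic speaker labels in an SRT transcript to user-defined names. The spk0/spk1 label format and the replace_speakers helper are assumptions for this example, not the actual implementation in this repository:

import re

# Hypothetical helper: map generic speaker labels (e.g. "spk0") to real names.
# The label format is an assumption, not a FunClip/Private-ASR internal.
def replace_speakers(srt_text: str, name_map: dict) -> str:
    """Replace speaker identifiers in an SRT transcript with user-defined names."""
    for label, name in name_map.items():
        srt_text = re.sub(rf"\b{re.escape(label)}\b", name, srt_text)
    return srt_text

srt = """1
00:00:01,000 --> 00:00:03,000
spk0: Hello, and welcome to the meeting.

2
00:00:03,500 --> 00:00:05,000
spk1: Thanks, glad to be here."""

print(replace_speakers(srt, {"spk0": "Alice", "spk1": "Bob"}))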

🛠 Requirements

System (2 Ways to Deploy)

  • Docker (for containerized deployment)
  • Python 3.9+ (for manual deployment)

Dependencies

See the requirements.txt file


🚀 Deployment

1. Docker Deployment

Build the Docker Image

Run the following command to build the Docker image:

docker build -t audio-processor:latest .

Deploy with Docker Compose

Use the following docker-compose.yml file for deployment:

version: '3.8'

services:
  audio-processor:
    image: audio-processor:latest  # The image you built
    container_name: audio-processor
    ports:
      - "7860:7860"
    volumes:
      - ./.env:/app/.env  # Map the .env file
    working_dir: /app
    restart: unless-stopped

Run the deployment:

docker-compose up -d

The Gradio interface will be available at:
http://localhost:7860


2. Python Deployment

Setup Environment

  1. Clone the repository:

    git clone https://github.com/MotorBottle/Audio-Processor.git
    cd audio-processor
  2. Install dependencies:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install --no-cache-dir -r requirements.txt
  3. Ensure FFmpeg is installed (on macOS, use brew):

    sudo apt-get update
    sudo apt-get install -y ffmpeg

Run the Application

Use the following command:

python funclip/launch.py --listen

The Gradio interface will be available at:
http://localhost:7860

Default username: motor

Default password: admin


⚙️ Environment Configuration

All credentials and API configurations can be stored in a .env file.

Example .env file:

USERNAME=motor
PASSWORD=admin
OPENAI_API_KEY=your_openai_key
OPENAI_API_BASE=https://your-custom-api.com
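
A minimal sketch of how these variables could be read at startup, assuming the python-dotenv package is installed (the variable names match the example above; the loading code itself is illustrative, not the application's actual startup logic):

import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

# Load KEY=value pairs from .env into the process environment.
load_dotenv()

username = os.getenv("USERNAME", "motor")
password = os.getenv("PASSWORD", "admin")
openai_api_key = os.getenv("OPENAI_API_KEY")
openai_api_base = os.getenv("OPENAI_API_BASE")  # optional custom OpenAI-compatible endpoint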

🎥 Usage

  1. Upload audio or video files.
  2. Perform ASR recognition or speaker diarization.
  3. Edit speaker names in the generated SRT subtitles.
  4. Use the LLM summarization feature to analyze and summarize the ASR text (see the sketch below).
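
For step 4, the snippet below sketches how an ASR transcript could be summarized through an OpenAI-compatible endpoint using the settings from the .env file. It assumes the openai Python package (v1.x); the model name and prompt are placeholders, not the exact call made by the app:

import os

from openai import OpenAI  # assumes openai>=1.0

# Reuse the API settings from the .env file; base_url allows a custom
# OpenAI-compatible endpoint. Model name and prompt are placeholders.
client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url=os.getenv("OPENAI_API_BASE"),
)

def summarize(transcript: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Summarize the following transcript, noting each speaker's main points."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content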

🔗 Contributions & License

This project is released under the MIT License. Contributions are welcome!

For the original FunClip repository, visit:
FunClip on GitHub

