This project is a modified version of the open-source FunClip project, extended with Automatic Speech Recognition (ASR), speaker diarization, SRT subtitle editing, and LLM-based summarization. It uses Gradio as the user interface, providing an interactive and easy-to-use platform.
Update: Added support for GPU inference for both Docker and local deployment. For Docker GPU deployment, check this.
This project builds upon the open-source FunClip project by Alibaba DAMO Academy. I extended the functionality to include:
- ASR Summarization using LLMs (OpenAI GPT, custom API).
- Dynamic SRT Replacement with speaker mapping.
- Production-ready deployment using Docker.
- Automatic Speech Recognition (ASR):
  - Supports video and audio inputs.
  - Outputs text and SRT subtitles.
- Speaker Diarization (SD):
  - Identifies and differentiates speakers in multi-speaker audio/video.
- SRT Subtitle Editing:
  - Replace speaker identifiers with user-defined names.
- LLM Summarization:
  - Summarize ASR results using GPT-based models.
  - Allows custom API configurations.
- Deployment Options:
  - Lightweight Docker container for production.
  - Python environment for development/testing.

Requirements:
- Docker (for containerized deployment)
- Python 3.9+ (for manual deployment)
- See the requirements.txt file
Run the following command to build the Docker image:
docker build -t audio-processor:latest .
Use the following docker-compose.yml file for deployment:
version: '3.8'
services:
  audio-processor:
    image: audio-processor:latest      # The image you built
    container_name: audio-processor
    ports:
      - "7860:7860"
    volumes:
      - ./.env:/app/.env               # Map the .env file
    working_dir: /app
    restart: unless-stopped
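If you want GPU inference inside the container (see the update note above), the service can additionally request NVIDIA GPUs through the Compose deploy section. This is only a sketch; it assumes the image was built with GPU-enabled dependencies and that the NVIDIA Container Toolkit is installed on the host:

services:
  audio-processor:
    # ...same settings as above...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]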
Run the deployment:
docker-compose up -d
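To verify the container after it starts, or to stop it again, the standard Compose commands apply:

docker-compose logs -f audio-processor   # follow the application logs
docker-compose down                      # stop and remove the container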
The Gradio interface will be available at:
http://localhost:7860
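If you prefer not to use docker-compose, a plain docker run with flags mirroring the compose file above works as well (same image name, port, and .env path assumed):

docker run -d \
  --name audio-processor \
  -p 7860:7860 \
  -v "$(pwd)/.env:/app/.env" \
  -w /app \
  --restart unless-stopped \
  audio-processor:latest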
To deploy manually with Python instead of Docker:
- Clone the repository:
  git clone https://github.com/MotorBottle/Audio-Processor.git
  cd Audio-Processor
- Install dependencies:
  python3 -m venv .venv
  source .venv/bin/activate
  pip install --no-cache-dir -r requirements.txt
- Ensure FFmpeg is installed (on macOS, use brew install ffmpeg):
  sudo apt-get update
  sudo apt-get install -y ffmpeg
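You can confirm FFmpeg is available on your PATH with:

ffmpeg -version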
Start the application with the following command:
python funclip/launch.py --listen
The Gradio interface will be available at:
http://localhost:7860
Default username: motor
Default password: admin
All credentials and API configurations can be stored in a .env file.

Example .env file:
USERNAME=motor
PASSWORD=admin
OPENAI_API_KEY=your_openai_key
OPENAI_API_BASE=https://your-custom-api.com
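For reference, these values can be read at startup with the python-dotenv package. This is only an illustrative sketch using the variable names from the example above; the actual loading code in funclip/launch.py may differ:

# Hypothetical sketch of loading .env values with python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from .env into the process environment

username = os.getenv("USERNAME", "motor")        # Gradio login user
password = os.getenv("PASSWORD", "admin")        # Gradio login password
api_key = os.getenv("OPENAI_API_KEY")            # LLM credentials
api_base = os.getenv("OPENAI_API_BASE")          # custom API endpoint (optional)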
- Upload audio or video files.
- Run ASR or speaker diarization.
- Edit speaker names in the generated SRT subtitles.
- Use the LLM Summarization feature to analyze and summarize the ASR text.
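Under the hood, the summarization step amounts to sending the ASR text to an OpenAI-compatible chat endpoint. Below is a rough, hypothetical sketch using the official openai Python client with the key and base URL from the .env example; the project's actual prompt and call site may differ, and the model name is a placeholder:

# Hypothetical sketch of summarizing ASR output via an OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url=os.getenv("OPENAI_API_BASE"),  # custom endpoint, if configured
)

asr_text = "spk0: Hello everyone ... spk1: Thanks for joining ..."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model your endpoint provides
    messages=[
        {"role": "system", "content": "Summarize the following transcript."},
        {"role": "user", "content": asr_text},
    ],
)
print(response.choices[0].message.content)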
This project is released under the MIT License. Contributions are welcome!
For the original FunClip repository, visit FunClip on GitHub.