Harmony.AI's Speech Engine is a high-performance inference engine for open-source Speech AI.
Its goal is to serve as the backbone for cost-efficient local and self-hosted AI speech services, supporting a variety of speech-related model architectures for Text-To-Speech, Speech-To-Text and Voice Conversion.
- OpenAI-Style APIs for Text-To-Speech, Speech-To-Text, Voice Conversion and Speech Embedding (see the example below).
- Multi-Model & Parallel Model Processing Support (See Models).
- Toolchain-based request processing & internal re-routing between models working together (See Routing).
- GPU & CPU Inference
- Huggingface Integration
- Interactive UI
- Docker Support
For instructions on how to set up and run Harmony Speech Engine using Docker, please refer to the Docker Setup Guide.
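As a quick illustration of the OpenAI-style API, here is a minimal sketch that points the official `openai` Python client at a local engine instance (using the default port from the setup guide below). The model and voice names are placeholders rather than confirmed identifiers; check the Swagger UI of your running instance for the actual values.

```python
# Minimal sketch: calling the engine's OpenAI-style TTS route with the
# official openai client. Model and voice names below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:12080/v1",  # local Harmony Speech Engine
    api_key="unused",                      # assumption: no key required for a local instance
)

response = client.audio.speech.create(
    model="harmonyspeech-v1",  # placeholder model name
    voice="default",           # placeholder voice name
    input="Hello from Harmony Speech Engine!",
)

with open("hello.wav", "wb") as f:
    f.write(response.content)
```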
Instructions for setting up the base environment
We recommend using a package manager like miniconda. The current version is tested with Python 3.12:
```
conda create -n hse python=3.12
conda activate hse
```
| System | GPU | Command |
|---|---|---|
| Linux/WSL | NVIDIA | `pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121` |
| Linux/WSL | CPU only | `pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cpu` |
| Linux | AMD | `pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/rocm6.1` |
| macOS + MPS | Any | `pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1` |
| Windows | NVIDIA | `pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121` |
| Windows | CPU only | `pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1` |
The up-to-date commands can be found here: https://pytorch.org/get-started/locally/.
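After installing PyTorch, a quick sanity check confirms that the build matches your hardware:

```python
# Verify the installed PyTorch build and the available accelerators.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())         # NVIDIA (and ROCm) builds
print("MPS available:", torch.backends.mps.is_available())  # Apple Silicon
```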
1. Clone the repository and install the Python dependencies:
```
git clone https://github.com/harmony-ai-solutions/harmony-speech-engine
cd harmony-speech-engine
pip install -r requirements.txt
```
For the frontend UI, Node.js is required as well:
```
conda install conda-forge::nodejs
cd frontend
npm install
```
2. Check that `config.yml` is correctly set up for the models you want to use (See Models).
3. Start the engine:
```
python harmonyspeech/endpoints/cli.py run --host 0.0.0.0 --port 12080
```
Once started, you can access the API via http://127.0.0.1:12080. Swagger and ReDoc are also available for interactive documentation.
For more details about the API & generating OpenAPI clients, please check our API Documentation.
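For example, a Speech-To-Text request might look like the sketch below. The route follows the shape of the OpenAI transcription API; the exact route, field names and model identifier are assumptions, so consult the Swagger UI of your instance for the actual schema.

```python
# Hypothetical transcription request against a running engine instance.
import requests

with open("sample.wav", "rb") as audio_file:
    response = requests.post(
        "http://127.0.0.1:12080/v1/audio/transcriptions",  # assumed OpenAI-style route
        files={"file": ("sample.wav", audio_file, "audio/wav")},
        data={"model": "faster-whisper"},  # placeholder model name
    )

response.raise_for_status()
print(response.json())
```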
4. To start the frontend UI in development mode:
```
cd frontend
npm run dev
```
This will start a Vite server accessible via http://localhost:5173/.
5. If you encounter any bugs, please create a GitHub issue describing the problems you're facing. We're happy to help. :-)
The goal of this engine is to provide a reliable and easy-to-maintain service for deploying open-source AI speech technologies. Each of these technologies has different setup requirements and preconditions, so this project aims to unify those requirements in a way that lets the technologies work together seamlessly.
Aside from providing a runtime for these technologies behind a unified service API, the Harmony Speech Engine also allows for recombining different technologies on the fly to reduce processing duration and latency. For example, you can generate speech using a TTS integration and then apply additional filtering using voice conversion (TBD).
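From a client's perspective, such a recombination is conceptually equivalent to chaining two requests, as in the sketch below; in practice the engine can perform this re-routing internally. All routes and model names here are placeholders, not the engine's confirmed API surface.

```python
# Conceptual sketch of a TTS -> voice conversion chain; routes and model
# names are placeholders.
import requests

BASE = "http://127.0.0.1:12080/v1"

# Step 1: synthesize speech with a TTS model.
tts = requests.post(
    f"{BASE}/audio/speech",  # assumed OpenAI-style route
    json={"model": "melotts", "voice": "default", "input": "Hello!"},
)
tts.raise_for_status()

# Step 2: run the generated audio through a voice-conversion model.
vc = requests.post(
    f"{BASE}/audio/voice-conversion",  # hypothetical route
    files={"file": ("speech.wav", tts.content, "audio/wav")},
    data={"model": "openvoice-v2-converter"},  # placeholder model name
)
vc.raise_for_status()

with open("converted.wav", "wb") as out:
    out.write(vc.content)
```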
The rough idea for this project is to become something like vLLM / Aphrodite Engine for AI speech inference. Significant parts of the codebase have been forked from Aphrodite Engine, with a couple of modifications to allow for the intended speech-related use cases. Support and ideas for improving this project are very welcome.
Differences from the forked Aphrodite Engine
- Per-request processing instead of token-sequence batching
- Support for loading and executing multiple models in parallel (see the client-side sketch after this list)
- No distributed execution of single models (i.e. sharding), to reduce complexity; might be added later.
- No general quantization; if quantization is supported, it will be part of the individual model config.
- No Neuron device type support (I don't have the means to test it properly; feel free to add support for it if you like)
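Because requests are processed independently and several models can be loaded at once, a client can fan requests out to different models concurrently. The sketch below illustrates this from the client side; the route and model names are placeholders.

```python
# Client-side illustration of parallel model processing: two different
# models on the same engine instance, queried concurrently.
import concurrent.futures
import requests

BASE = "http://127.0.0.1:12080/v1"

def synthesize(model: str, text: str) -> bytes:
    # Placeholder route and parameters; check the Swagger UI for the real schema.
    r = requests.post(
        f"{BASE}/audio/speech",
        json={"model": model, "voice": "default", "input": text},
    )
    r.raise_for_status()
    return r.content

with concurrent.futures.ThreadPoolExecutor() as pool:
    futures = [
        pool.submit(synthesize, "openvoice-v2", "Hello there!"),       # placeholder model
        pool.submit(synthesize, "xtts-v2", "Bonjour tout le monde!"),  # placeholder model
    ]
    for fut in concurrent.futures.as_completed(futures):
        print(len(fut.result()), "bytes of audio received")
```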
The following technologies and features are planned to be supported. This list may change over time as new models and frameworks are developed.
- Zero-Shot Voice Conversion
  - NaturalSpeech3 Voice Converter
  - OpenVoice V1 Tone Converter
  - OpenVoice V2 Tone Converter
- Multi-Shot Voice Conversion
  - StyleTTS 2 Voice Converter
  - RVC (Retrieval-based Voice Conversion)
- Multi-Shot Voice Cloning
  - StyleTTS 2
- Zero-Shot Voice Cloning
  - Harmony Speech V1 (English)
  - OpenVoice V1 (English / Chinese + basic emotions)
  - OpenVoice V2 (English, Spanish, French, Chinese, Japanese and Korean)
  - Vall-E-X (Multilingual)
  - XTTS V2 (Multilingual)
- Generic High-Quality Single-Speaker TTS
  - OpenVoice V1 (English / Chinese + basic emotions)
  - OpenVoice V2 / MeloTTS (English, Spanish, French, Chinese, Japanese and Korean)
- Generic Multi-Speaker TTS
  - EmotiVoice (English / Chinese + basic emotions for a wide range of speakers)
- Adaptive Voice Cloning
  - Basic Overlays (TTS + Voice Conversion)
  - Embedding Vector Matching (Convenience Feature)
- Automatic Speech Recognition & Language Detection
  - Faster-Whisper / Faster-Distil-Whisper
  - Silero VAD
- Documentation
  - Secondary Features:
    - Pre-/Post-Processing
    - Output Formats
  - API Documentation
  - Internal Re-Routing mechanics
- Secondary Features:
  - Models and Features
    - StyleTTS 2 (TTS & Voice Conversion)
    - XTTS V2
    - Vall-E-X
    - EmotiVoice
    - Silero VAD
    - Input Pre-Processing
    - Post-Processing / Overlays
    - Output Format Customization
    - Batching behaviour
    - TTS-Streaming
    - More comprehensive approach to internal Re-Routing of Requests
  - Testing & Operation
    - Unit Testing & Test mocking for Key Components
    - API Integration Tests
    - Input Audio File Format Support
    - Compatibility Testing for all APIs and Models
Project Harmony.AI emerged from the idea of enabling AI-driven characters and humans to live together seamlessly. Since it became obvious that many of the technologies required for achieving this goal do not exist yet or are still very experimental, the long-term vision of Project Harmony is to establish the full set of technologies that help minimize biological and technological barriers in Human-to-AI interaction.
We want to counter today's tendency to centralize AI development in the hands of big corporations. We push for maximum transparency in our own development efforts and aim for our software to be accessible and usable in the most democratic ways possible.
Therefore, for all our current and future software offerings, we will continually and carefully evaluate whether we can safely open-source them, in part or even completely, as long as this appears non-harmful to achieving the project's main goal.
Harmony Speech Engine is distributed under the AGPLv3 License, because a lot of the code in the `harmonyspeech` module has been borrowed from Aphrodite Engine.
Everyone can use this software as part of their own projects without any restrictions from our side, apart from the restrictions derived from the nature of the license.
Official Website of Project Harmony.AI
Feel free to join our Discord server and/or subscribe to our Patreon - even $1 helps us drive this project forward.
Contact us directly via: [email protected]