-
-
Notifications
You must be signed in to change notification settings - Fork 191
Home
Welcome
This is Open-LLM-VTuber, an application that allows you to talk to (and interrupt) any LLM by voice (hands-free) locally with a Live2D talking face.
⚠️ This project is in its early stages and is currently under active development. Features are unstable, code is messy, and breaking changes will occur. The main goal of this stage is to build a minimum viable prototype using technologies that are easy to integrate.
⚠️ If you want to run this program on a server and access it remotely on your laptop, the microphone on the front end will only launch in a secure context (a.k.a. https or localhost). See MDN Web Doc. Therefore, you might want to configure https with a reverse proxy or launch the front end locally and connect to the server via websocket (untested). Open thestatic/index.html
with your browser and set the ws URL on the page.
Have ffmpeg
installed on your computer.
Python version >= 3.10, < 3.13 (there are currently dependencies installation issues in Python 3.13. If you encounter that, just go 3.12 or lower and it should work)
All of the settings are in conf.yaml
. You can (and probably will) do many things there, and there are also comments in that file explaining what those settings mean.
- Clone the repo
- [optional] Create a virtual environment like conda or venv for this project
- Install basic dependencies with
pip install -r requirements.txt
- Setup the LLM
- Setup your desired ASR (Automatic Speech Recognition)
- Setup your desired TTS (Text to Speech)
- Run it
Find a good spot on your computer and clone the repository or download the latest release.
git clone https://github.com/t41372/Open-LLM-VTuber
Nice. now go to GitHub and star this project if you haven't done so or you'll &&Eujehruedjhnoeire4939#pE$
It is optional, yet I highly recommend you to create a virtual environment for this project.
This project was developed using Python 3.10.13
. Python 3.11 is tested. Some other versions will probably work, too, but they are untested.
A Python virtual environment (venv) is a folder that contains the Python interpreter, third-party libraries, and other scripts. Venvs are isolated from other virtual environments, so changes to dependencies don't affect other virtual environments or system-wide libraries.
-- dataquest
The reason why I highly recommend you use a virtual environment for this project is that this will make your life a ton easier. This project uses a lot of dependencies, and dependency conflicts happen very often. Using a virtual environment to isolate them saves your hair.
If you don't know what conda is, we can use venv
, which is built into Python and is pretty nice.
# create a virtual environment
python -m venv open-llm-vtuber
To activate the virtual environment, run the following command:
On Windows
open-llm-vtuber\Scripts\activate
On macOS/Linux
source open-llm-vtuber/bin/activate
If you know what conda is, then you know what to do. Here is the command I personally use. If you don't know what conda is, I recommend you use venv
.
# create a conda environment in the project directory
conda create -p ./.conda python="3.10.4"
# activate this environment
conda activate ./.conda
Run the following in the root directory of this project to install the dependencies.
pip install -r requirements.txt # Run this in the project directory
You need to have Ollama or any other OpenAI-API-Compatible backend ready and running. You can use llama.cpp, vLLM, LM Studio, groq, OpenAI, and so much more.
If you want to use long-term memory with MemGPT, you will set MemGPT as your LLM backend instead of the ones mentioned above. Check out MemGPT section for more information (it's not very easy unless you already know how to run MemGPT, so I recommend you start with ollama or other OpenAI-Compatible LLM backends instead).
Prepare an LLM you like and have a running LLM inference server like ollama.
In conf.yaml
file, under the option ollama
, you can edit the configuration for all OpenAI Compatible LLM inference backend.
Here is the setting in conf.yaml
# ============== LLM Backend Settings ===================
# Provider of LLM. Choose either "ollama" or "memgpt" (or "fakellm for debug purposes")
# "ollama" for any OpenAI Compatible backend. "memgpt" requires setup
LLM_PROVIDER: "ollama"
# Ollama & OpenAI Compatible inference backend
ollama:
BASE_URL: "http://localhost:11434/v1"
LLM_API_KEY: "somethingelse"
ORGANIZATION_ID: "org_eternity"
PROJECT_ID: "project_glass"
## LLM name
MODEL: "llama3.1:latest"
# system prompt is at the very end of this file
VERBOSE: False
If you don't use LLM_API_KEY
, ORGANIZATION_ID
, and PROJECT_ID
, just leave them as it is.
If you are so excited right now that you want to try this project without voice interactions, you can set Don't change the LIVE2D
, VOICE_INPUT_ON
, and TTS_ON
to False
in the conf.yaml
to talk with the LLM by typing with no voice nor Live2D. Remember to turn them back on later on.LIVE2D
, VOICE_INPUT_ON
, and TTS_ON
options. These options were designed for CLI mode, which will be removed in the next major version v1.0.0
. The LIVE2D
options was deprecated and made useless since v0.2.0
, and after the release of v0.4.0
, users can now directly interact with text in the browser, which makes the VOICE_INPUT_ON
options useless. Using the web frontend with these options may lead to unpredictable outcomes. Luckily, those options along with the CLI mode will be removed in the next major version v1.0.0
, and the whole documentation will be rewritten, so yeah, there should be less confusion in the future.
This project supports many different speech recognition models and providers. Check out the ASR section for installation instructions.
In general, here are the steps to set up speech recognition:
- Install the dependencies
- Edit the configurations of the ASR you use in
conf.yaml
. You can usually change the language or model there if supported. - Set
ASR_MODEL
to the ASR of your choice.
As of writing, this project supports the following ASR:
-
FunASR, which support SenseVoiceSmall and some other models. (
LocalCurrently requires an internet connection for loading. Compute locally) - Faster-Whisper (Local)
- Whisper-CPP using the python binding pywhispercpp (Local, mac GPU acceleration can be configured)
- Whisper (local)
- Azure Speech Recognition (API Key required)
If you don't care it connects to the internet on launch (will be fixed in the future), I recommend FunASR with SenseVoiceSmall. It's very fast and the accuracy is pretty good.
If you want something that works offline, I recommend Faster-Whisper if you have an Nvidia GPU, and Whisper-CPP with coreML accleration if you are using macOS.
You can also use Azure Speech Recognition if you happen to have the API key.
⚠️ If you want to run this application (the server) inside a container or on a remote machine and access the webui with local device, you need to turnMIC_IN_BROWSER
to True in theconf.yaml
. There are more things you need to consider, and it's at the top of this page.
Check out TTS section for instruction of setting up the TTS you want.
In general, here are the steps to set up a text-to-speech service:
- Install the dependencies
- Edit the configurations of the TTS you use in
conf.yaml
. You can usually change the language or speakers there if supported. - Set
TTS_MODEL
to the TTS of your choice.
Here are some supported TTS as of writing:
- py3-tts (Local, it uses your system's default TTS engine)
- bark (Local, very resource-consuming)
- CosyVoice (Local, very resource-consuming)
- MeloTTS (local, fast)
- Edge TTS (online, no API key required)
- Azure Text-to-Speech (online, API Key required)
For now, if you are using live2D and everything we mentioned above, here are the steps to run the program:
- Run
server.py
- Open
localhost:12393
with your browser (default but you can change it in conf.yaml) -
Run(no longer needed)main.py
- Talk to the LLM once the Live2D model is loaded.
If you just want to talk and don't want the Live2D and browser that kind of stuff, you can just run the main.py
for cli mode.
Some related settings in conf.yaml
you might be interested:
- Turn off the live2D (and the web UI so you don't need the server.py) at
LIVE2D
- Turn off Speech Recognition and start typing in the terminal at
VOICE_INPUT_ON
- Let the mic listen in the browser instead of the terminal at
MIC_IN_BROWSER
- Turn off TTS at
TTS_ON
- Get TTS speaks everything at once at
SAY_SENTENCE_SEPARATELY
- Change/Edit persona prompt at
PERSONA_CHOICE
andDEFAULT_PERSONA_PROMPT_IN_YAML
- Change the host and port that the server is listening to at
HOST
andPORT
- and
VERBOSE
Some models will be downloaded during your first launch, which may take a while.