xtts-v2-truss

adding updated version of xtts v2 (#375 )

Nov 5, 2024

ce272e9 · Nov 5, 2024

Name		Name	Last commit message	Last commit date
parent directory ..
model		model
README.md		README.md
config.yaml		config.yaml

README.md

XTTS V2 Truss

This repository packages TTS as a Truss.

TTS is a generative audio model for text-to-speech generation. This model takes in text and a speaker's voice as input and converts the text to speech in the voice of the speaker.

Deploying XTTS

First, clone this repository:

git clone https://github.com/basetenlabs/truss-examples/
cd xtts-v2-truss

Before deployment:

Make sure you have a Baseten account and API key.
Install the latest version of Truss: pip install --upgrade truss

With xtts-v2-truss as your working directory, you can deploy the model with:

truss push

Paste your Baseten API key if prompted.

For more information, see Truss documentation.

Invoking the model

Here are the following inputs for the model:

text: The text that needs to be converted into speech
speaker_voice: A short audio clip of a voice in the format of a base64 string
language: Abbreviation of the supported languages for TTS

Here is an example of how to invoke this model:

import base64
import sys

def wav_to_base64(file_path):
  with open(file_path, "rb") as wav_file:
    binary_data = wav_file.read()
    base64_data = base64.b64encode(binary_data)
    base64_string = base64_data.decode("utf-8")
    return base64_string

def base64_to_wav(base64_string, output_file_path):
  binary_data = base64.b64decode(base64_string)
  with open(output_file_path, "wb") as wav_file:
    wav_file.write(binary_data)

voice = wav_to_base64("/path/to/wav/file/samuel_jackson_voice.wav")
text = "Listen up, people. Life's a wild ride, and sometimes you gotta grab it by the horns and steer it where you want to go. You can't just sit around waiting for things to happen – you gotta make 'em happen. Yeah, it's gonna get tough, but that's when you dig deep, find that inner badass, and come out swinging. Remember, success ain't handed to you on a silver platter; you gotta snatch it like it owes you money. So, lace up your boots, square those shoulders, and let the world know that you're here to play, and you're playing for keeps"
data = {"text": text, "speaker_voice": voice, "language": "en"}
res = requests.post("https://model-<model-id>.api.baseten.co/development/predict", headers=headers, json=data)
res = res.json()
output = base64_to_wav(res.get('output'), "test_output.wav")

The output of the model is a base64 string as well, so you can convert it to a wav file using the base64_to_wav function.

Here is the input file for Samuel Jackson's voice:

Here is the output file for the text to speech:

In this README, the audio files are in the MP4 format because github does not allow the WAV format. However, for the truss itself using the WAV format is recommended.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Files

xtts-v2-truss

xtts-v2-truss

README.md

XTTS V2 Truss

Deploying XTTS

Invoking the model

Files

xtts-v2-truss

Directory actions

More options

Directory actions

More options

Latest commit

History

xtts-v2-truss

Folders and files

parent directory

README.md

XTTS V2 Truss

Deploying XTTS

Invoking the model