How do I get an embedding and use it in different parts of code? #3812
Egor-oop asked this question in General Q&A
-
I've been building a simple project with the xtts-v2 model. While building it I noticed one thing: when I use

```python
outputs = self.tts_model.synthesize(
    text=sen,
    config=self.tts_config,
    speaker_id=speaker_name,
    voice_dirs=self.voice_dir,
    d_vector=speaker_embedding,
    speaker_wav=speaker_wav,
    language=language_name,
    **kwargs,
)
```

the speaker embedding is computed internally via

```python
(gpt_cond_latent, speaker_embedding) = self.get_conditioning_latents(
    audio_path=ref_audio_path,
    gpt_cond_len=gpt_cond_len,
    gpt_cond_chunk_len=gpt_cond_chunk_len,
    max_ref_length=max_ref_len,
    sound_norm_refs=sound_norm_refs,
)
```

I'd like to get the embedding once and reuse it in different parts of my code. How can I implement my idea?
Answered by Egor-oop on Jul 4, 2024
-
My question got resolved. Here is an example: the conditioning latents are computed once and then passed to every `inference` call.

```python
import torch
import torchaudio
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Path to the downloaded XTTS-v2 model files
xtts_path = '/Users/egorgulido/Library/Application Support/tts/tts_models--multilingual--multi-dataset--xtts_v2'

print("Loading model...")
config = XttsConfig()
config.load_json(xtts_path + '/config.json')
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=xtts_path, use_deepspeed=False)
# model.cuda()  # uncomment if a GPU is available

# Compute the conditioning latents once from a reference recording...
print("Computing speaker latents...")
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=["audio/yegor_en.wav"])

# ...and reuse them for as many inference calls as you like.
print("Inference...")
print('First')
out = model.inference(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
    temperature=0.7,
)
torchaudio.save("output.wav", torch.tensor(out["wav"]).unsqueeze(0), 24000)

print('Second')
out = model.inference(
    "It doesn't bother me! But what's your name?",
    "en",
    gpt_cond_latent,
    speaker_embedding,
    temperature=0.7,
)
torchaudio.save("output2.wav", torch.tensor(out["wav"]).unsqueeze(0), 24000)
```