How do I get an embedding and use it in different parts of code? #3812
Egor-oop asked this question in General Q&A
-
I've been building a simple project with the xtts-v2 model. While building it I noticed one thing: when I use

```python
outputs = self.tts_model.synthesize(
    text=sen,
    config=self.tts_config,
    speaker_id=speaker_name,
    voice_dirs=self.voice_dir,
    d_vector=speaker_embedding,
    speaker_wav=speaker_wav,
    language=language_name,
    **kwargs,
)
```

the speaker embedding is computed internally via

```python
(gpt_cond_latent, speaker_embedding) = self.get_conditioning_latents(
    audio_path=ref_audio_path,
    gpt_cond_len=gpt_cond_len,
    gpt_cond_chunk_len=gpt_cond_chunk_len,
    max_ref_length=max_ref_len,
    sound_norm_refs=sound_norm_refs,
)
```

I'd like to get the embedding once and reuse it in different parts of my code. How can I implement my idea?
Answered by Egor-oop on Jul 4, 2024
-
My question got resolved. Here is an example: the conditioning latents are computed once and then passed to every `inference` call.

```python
import torch
import torchaudio
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Path to the downloaded XTTS-v2 model files
xtts_path = '/Users/egorgulido/Library/Application Support/tts/tts_models--multilingual--multi-dataset--xtts_v2'

print("Loading model...")
config = XttsConfig()
config.load_json(xtts_path + '/config.json')
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=xtts_path, use_deepspeed=False)
# model.cuda()  # uncomment if a GPU is available

# Compute the conditioning latents once from a reference recording...
print("Computing speaker latents...")
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=["audio/yegor_en.wav"])

# ...and reuse them for as many inference calls as you like.
print("Inference...")
print('First')
out = model.inference(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
    temperature=0.7,
)
torchaudio.save("output.wav", torch.tensor(out["wav"]).unsqueeze(0), 24000)

print('Second')
out = model.inference(
    "It doesn't bother me! But what's your name?",
    "en",
    gpt_cond_latent,
    speaker_embedding,
    temperature=0.7,
)
torchaudio.save("output2.wav", torch.tensor(out["wav"]).unsqueeze(0), 24000)
```