print("Inference...")
t0 = time.time()
chunks = model.inference_stream(
"It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
"en",
gpt_cond_latent,
speaker_embedding
)
wav_chuncks = []
for i, chunk in enumerate(chunks):
if i == 0:
print(f"Time to first chunck: {time.time() - t0}")
print(f"Received chunk {i} of audio length {chunk.shape[-1]}")
wav_chuncks.append(chunk)
wav = torch.cat(wav_chuncks, dim=0)
torchaudio.save("xtts_streaming.wav", wav.squeeze().unsqueeze(0).cpu(), 24000)
Expected behavior
It should output generated audio
Logs
/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py:138: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:818: UserWarning: `return_dict_in_generate` is NOT set to `True`, but `output_hidden_states` is. When `return_dict_in_generate` is not `True`, `output_hidden_states` is ignored.
warnings.warn(
Traceback (most recent call last):
File "/Users/zhz/Desktop/paradigm/conversation_playground/xtts/xtts_streaming.py", line 31, in<module>fori, chunkin enumerate(chunks):
File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
response = gen.send(None)
File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 652, in inference_stream
gpt_generator = self.gpt.get_generator(
File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt.py", line 603, in get_generator
return self.gpt_inference.generate_stream(
File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py", line 186, in generate
model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/transformers/generation/utils.py", line 585, in _prepare_attention_mask_for_generation
pad_token_id = generation_config._pad_token_tensor
AttributeError: 'int' object has no attribute '_pad_token_tensor'
Describe the bug
I tried the streaming code from https://docs.coqui.ai/en/latest/models/xtts.html#streaming-manually with use_deepspeed=False on CPU and got the following error: AttributeError: 'int' object has no attribute '_pad_token_tensor'
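The traceback under Logs shows TTS's bundled stream_generator calling transformers' _prepare_attention_mask_for_generation, which in the installed transformers expects a GenerationConfig object but apparently receives the plain integer pad_token_id, hence the AttributeError. As a rough pre-flight check one can flag that suspected mismatch before running the script; this is a minimal sketch, and the 4.42.0 cutoff is an assumption about when the transformers signature changed, not a verified release number:

# Hypothetical pre-flight check: warn when the installed transformers likely expects a
# GenerationConfig where XTTS's stream_generator still passes an int pad_token_id.
# The 4.42.0 threshold is an assumption, not a verified release number.
from importlib.metadata import version
from packaging.version import Version

installed = Version(version("transformers"))
if installed >= Version("4.42.0"):
    print(
        f"transformers {installed} may be incompatible with TTS's bundled "
        "stream_generator (newer _prepare_attention_mask_for_generation signature)."
    )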
To Reproduce
import os
import time
import torch
import torchaudio
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
print("Loading model...")
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", use_deepspeed=False)
#model.cuda()
print("Computing speaker latents...")
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=["reference.wav"])
print("Inference...")
t0 = time.time()
chunks = model.inference_stream(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
)
wav_chunks = []
for i, chunk in enumerate(chunks):
    if i == 0:
        print(f"Time to first chunk: {time.time() - t0}")
    print(f"Received chunk {i} of audio length {chunk.shape[-1]}")
    wav_chunks.append(chunk)
# Concatenate the streamed chunks and save the full waveform at the model's 24 kHz rate.
wav = torch.cat(wav_chunks, dim=0)
torchaudio.save("xtts_streaming.wav", wav.squeeze().unsqueeze(0).cpu(), 24000)
Expected behavior
It should stream the generated audio and save it to xtts_streaming.wav.
Logs
/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py:138: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:818: UserWarning: `return_dict_in_generate` is NOT set to `True`, but `output_hidden_states` is. When `return_dict_in_generate` is not `True`, `output_hidden_states` is ignored.
  warnings.warn(
Traceback (most recent call last):
  File "/Users/zhz/Desktop/paradigm/conversation_playground/xtts/xtts_streaming.py", line 31, in <module>
    for i, chunk in enumerate(chunks):
  File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
    response = gen.send(None)
  File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 652, in inference_stream
    gpt_generator = self.gpt.get_generator(
  File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt.py", line 603, in get_generator
    return self.gpt_inference.generate_stream(
  File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py", line 186, in generate
    model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
  File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/transformers/generation/utils.py", line 585, in _prepare_attention_mask_for_generation
    pad_token_id = generation_config._pad_token_tensor
AttributeError: 'int' object has no attribute '_pad_token_tensor'
Environment
Additional context
No response
The text was updated successfully, but these errors were encountered:
@eginhard Thanks! Can you share the installation steps? I'm on Ubuntu 22.04 and when I run the streaming script I receive the error: cannot import name 'log' from 'torch.distributed.elastic.agent.server.api'
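Regarding the error above: cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' is raised at import time, before any XTTS code runs, and typically points at a DeepSpeed build that targets an older torch which still exposed that symbol; that attribution is an assumption and worth verifying against the installed versions, e.g.:

# Diagnostic only: report the torch/deepspeed pairing behind the failing import.
# The DeepSpeed attribution is an assumption, not confirmed from this issue.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("torch", "deepspeed", "TTS"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")

import torch.distributed.elastic.agent.server.api as elastic_api
print("log symbol present:", hasattr(elastic_api, "log"))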