
[Bug] Streaming inference does not work #4118

Open
1640675651 opened this issue Dec 31, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@1640675651

Describe the bug

Tried the streaming code at https://docs.coqui.ai/en/latest/models/xtts.html#streaming-manually with use_deepspeed=False on CPU. It fails with: AttributeError: 'int' object has no attribute '_pad_token_tensor'
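
For context, a quick first check (a minimal sketch, assuming the failure comes from a version mismatch between TTS 0.22.0 and a newer transformers release; the _pad_token_tensor attribute only appears in recent transformers versions):

import transformers

# TTS 0.22.0's streaming generator targets an older transformers
# generation API (an assumption based on the traceback below), so the
# installed version is the first thing to check.
print(transformers.__version__)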

To Reproduce

import os
import time
import torch
import torchaudio
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

print("Loading model...")
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", use_deepspeed=False)
#model.cuda()

print("Computing speaker latents...")
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=["reference.wav"])

print("Inference...")
t0 = time.time()
chunks = model.inference_stream(
    "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
)

wav_chunks = []
for i, chunk in enumerate(chunks):
    if i == 0:
        print(f"Time to first chunk: {time.time() - t0}")
    print(f"Received chunk {i} of audio length {chunk.shape[-1]}")
    wav_chunks.append(chunk)
wav = torch.cat(wav_chunks, dim=0)
torchaudio.save("xtts_streaming.wav", wav.squeeze().unsqueeze(0).cpu(), 24000)

Expected behavior

It should stream the generated audio chunks and save the full waveform to xtts_streaming.wav.

Logs

/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py:138: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:818: UserWarning: `return_dict_in_generate` is NOT set to `True`, but `output_hidden_states` is. When `return_dict_in_generate` is not `True`, `output_hidden_states` is ignored.
  warnings.warn(
Traceback (most recent call last):
  File "/Users/zhz/Desktop/paradigm/conversation_playground/xtts/xtts_streaming.py", line 31, in <module>
    for i, chunk in enumerate(chunks):
  File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
    response = gen.send(None)
  File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 652, in inference_stream
    gpt_generator = self.gpt.get_generator(
  File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/TTS/tts/layers/xtts/gpt.py", line 603, in get_generator
    return self.gpt_inference.generate_stream(
  File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/TTS/tts/layers/xtts/stream_generator.py", line 186, in generate
    model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
  File "/Users/zhz/miniconda3/envs/xtts/lib/python3.10/site-packages/transformers/generation/utils.py", line 585, in _prepare_attention_mask_for_generation
    pad_token_id = generation_config._pad_token_tensor
AttributeError: 'int' object has no attribute '_pad_token_tensor'
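
The traceback points to a signature mismatch: TTS's stream_generator passes a bare int pad_token_id where recent transformers expects a GenerationConfig object, whose attributes (such as _pad_token_tensor) are then read internally. A minimal sketch of the difference, as an assumption based on the traceback rather than a verified reading of transformers internals:

from transformers import GenerationConfig

# Recent transformers reads special tokens as attributes of a
# GenerationConfig object rather than as plain ints.
gen_cfg = GenerationConfig(pad_token_id=0)
print(gen_cfg.pad_token_id)  # attribute access on a config object works

# When an int is passed where the config object is expected, the internal
# attribute access (roughly generation_config._pad_token_tensor) fails
# with the AttributeError shown above.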

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": null
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.5.1",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Darwin",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "arm",
        "python": "3.10.16",
        "version": "Darwin Kernel Version 24.1.0: Thu Oct 10 21:06:23 PDT 2024; root:xnu-11215.41.3~3/RELEASE_ARM64_T8132"
    }
}

Additional context

No response

@1640675651 1640675651 added the bug Something isn't working label Dec 31, 2024
@sinangokce
Copy link

I'm facing the exact same issue.

@eginhard
Copy link
Contributor

You can use our fork (via pip install coqui-tts); it works with recent versions of transformers. This repo is no longer updated.
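
A quick way to confirm which package is in use after switching (a sketch; the fork keeps the TTS import namespace per its docs, and its releases are versioned above 0.22.0):

import TTS

# After `pip install coqui-tts`, this should print a version newer than
# 0.22.0 (an assumption about the fork's versioning).
print(TTS.__version__)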

@sinangokce
Copy link

sinangokce commented Dec 31, 2024

@eginhard Thanks! Can you share the installation steps? I'm on Ubuntu 22.04, and when I run the streaming script I get this error: cannot import name 'log' from 'torch.distributed.elastic.agent.server.api'

@sinangokce
Copy link

I found out that deepspeed 0.10.3, the version documented at https://coqui-tts.readthedocs.io/en/latest/models/xtts.html, caused this issue. Installing version 0.14.4, as proposed in huggingface/alignment-handbook#180, solved it.
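
For reference, once a compatible deepspeed is installed, streaming can be enabled through the same flag used in the repro above. A sketch assuming deepspeed 0.14.4 and a CUDA GPU (deepspeed does not apply to the CPU-only setup in the original report):

# Same setup as the repro script, with deepspeed acceleration enabled.
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", use_deepspeed=True)
model.cuda()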
