
[Bug] #4124

Open
AimoneAndex opened this issue Jan 4, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@AimoneAndex

Describe the bug

XTTS v2 inference never runs on MPS: the `conv1d` in the HiFi-GAN speaker encoder raises `NotImplementedError` (output channels > 65536 not supported on the MPS device).

To Reproduce

wav = tts.tts(text="Hello world!", speaker_wav="input/001.wav", language="en")


Expected behavior

Inference runs on the MPS device, without needing a CPU fallback.

Logs

wav = tts.tts(text="Hello world!", speaker_wav="input/001.wav", language="en")
 > Text splitted to sentences.
['Hello world!']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/api.py", line 276, in tts
    wav = self.synthesizer.tts(
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 386, in tts
    outputs = self.tts_model.synthesize(
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 419, in synthesize
    return self.full_inference(text, speaker_wav, language, **settings)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 480, in full_inference
    (gpt_cond_latent, speaker_embedding) = self.get_conditioning_latents(
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 365, in get_conditioning_latents
    speaker_embedding = self.get_speaker_embedding(audio, load_sr)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 320, in get_speaker_embedding
    self.hifigan_decoder.speaker_encoder.forward(audio_16k.to(self.device), l2_norm=True)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/layers/xtts/hifigan_decoder.py", line 538, in forward
    x = self.torch_spec(x)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/nn/modules/container.py", line 250, in forward
    input = module(input)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/layers/xtts/hifigan_decoder.py", line 418, in forward
    return torch.nn.functional.conv1d(x, self.filter).squeeze(1)
NotImplementedError: Output channels > 65536 not supported at the MPS device. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
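The final error names a temporary workaround. A minimal sketch in plain Python (no Coqui-specific code); note the variable must be set before `torch` is imported, or it has no effect:

```python
import os

# Workaround named in the error message above: let PyTorch fall back to
# the CPU for operators not implemented on MPS. Set this before
# `import torch` (e.g. at the very top of the script, or exported in the
# shell). The affected ops will run slower than native MPS.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])  # → 1
```

The equivalent shell form is `export PYTORCH_ENABLE_MPS_FALLBACK=1` before launching Python.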

Environment

MacBook Pro with macOS 14 and Apple M3 Pro

Additional context

No response

@AimoneAndex AimoneAndex added the bug Something isn't working label Jan 4, 2025
@eginhard
Contributor

eginhard commented Jan 6, 2025

Can you try with our fork (pip install coqui-tts)? It might be fixed with more recent transformers versions.

@LudWittg

LudWittg commented Jan 8, 2025

MPS seems to work with the fork.

Test Environment:

MacBook Pro with macOS 15.2 and Apple M2 Pro
PyTorch 2.7.0.dev20250108

@AimoneAndex
Author

MPS seems to work with the fork.

Test Environment:

MacBook Pro with macOS 15.2 and Apple M2 Pro
PyTorch 2.7.0.dev20250108

It shows:

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Using model: xtts
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/api.py", line 74, in __init__
    self.load_tts_model_by_name(model_name, gpu)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/api.py", line 177, in load_tts_model_by_name
    self.synthesizer = Synthesizer(
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 109, in __init__
    self._load_tts_from_dir(model_dir, use_cuda)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 164, in _load_tts_from_dir
    self.tts_model.load_checkpoint(config, checkpoint_dir=model_dir, eval=True)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 773, in load_checkpoint
    checkpoint = self.get_compatible_checkpoint_state_dict(model_path)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 714, in get_compatible_checkpoint_state_dict
    checkpoint = load_fsspec(model_path, map_location=torch.device("cpu"))["model"]
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/utils/io.py", line 54, in load_fsspec
    return torch.load(f, map_location=map_location, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/serialization.py", line 1488, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
(1) In PyTorch 2.6, we changed the default value of the weights_only argument in torch.load from False to True. Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
(2) Alternatively, to load with weights_only=True please check the recommended steps in the following error message.
WeightsUnpickler error: Unsupported global: GLOBAL TTS.tts.configs.xtts_config.XttsConfig was not an allowed global by default. Please use torch.serialization.add_safe_globals([XttsConfig]) or the torch.serialization.safe_globals([XttsConfig]) context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.

wav = tts.tts(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'tts' is not defined. Did you mean: 'TTS'?

Do you know how to solve it? Thanks! (The second `NameError` just follows from the first failure: `tts` was never assigned because the constructor raised.)

@AimoneAndex
Author

Can you try with our fork (pip install coqui-tts)? It might be fixed with more recent transformers versions.

OK, I'd like to give it a try. Does it support MPS?

@LudWittg

LudWittg commented Jan 8, 2025


Switching to the fork should solve all these problems.

@LudWittg

LudWittg commented Jan 8, 2025

Can you try with our fork (pip install coqui-tts)? It might be fixed with more recent transformers versions.

OK, I'd like to give it a try. Does it support MPS?

It supports MPS (I've tested it).
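As a sketch, device selection on Apple Silicon can follow the usual PyTorch pattern; the commented lines reuse the model name and call from this thread and assume the coqui-tts fork is installed:

```python
import torch

# Prefer the MPS backend on Apple Silicon when available, else fall back
# to the CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# With the coqui-tts fork installed, the original snippet is unchanged:
#   from TTS.api import TTS
#   tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
#   wav = tts.tts(text="Hello world!", speaker_wav="input/001.wav",
#                 language="en")
print(device)
```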

@AimoneAndex
Author

OK, thank you! I'll give it a try.
