
[Bug] #4124

Open
AimoneAndex opened this issue Jan 4, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@AimoneAndex

Describe the bug

XTTS v2 inference never runs on MPS: the `conv1d` in the HiFi-GAN speaker encoder raises `NotImplementedError` (output channels > 65536 not supported on the MPS device).

To Reproduce

wav = tts.tts(text="Hello world!", speaker_wav="input/001.wav", language="en")


Expected behavior

Inference runs on the MPS device, without needing a CPU fallback.

Logs

wav = tts.tts(text="Hello world!", speaker_wav="input/001.wav", language="en")
 > Text splitted to sentences.
['Hello world!']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/api.py", line 276, in tts
    wav = self.synthesizer.tts(
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 386, in tts
    outputs = self.tts_model.synthesize(
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 419, in synthesize
    return self.full_inference(text, speaker_wav, language, **settings)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 480, in full_inference
    (gpt_cond_latent, speaker_embedding) = self.get_conditioning_latents(
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 365, in get_conditioning_latents
    speaker_embedding = self.get_speaker_embedding(audio, load_sr)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 320, in get_speaker_embedding
    self.hifigan_decoder.speaker_encoder.forward(audio_16k.to(self.device), l2_norm=True)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/layers/xtts/hifigan_decoder.py", line 538, in forward
    x = self.torch_spec(x)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/nn/modules/container.py", line 250, in forward
    input = module(input)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/layers/xtts/hifigan_decoder.py", line 418, in forward
    return torch.nn.functional.conv1d(x, self.filter).squeeze(1)
NotImplementedError: Output channels > 65536 not supported at the MPS device. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
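The final error names a temporary workaround. A minimal sketch in plain Python (no Coqui-specific code); note the variable must be set before `torch` is imported, or it has no effect:

```python
import os

# Workaround named in the error message above: let PyTorch fall back to
# the CPU for operators not implemented on MPS. Set this before
# `import torch` (e.g. at the very top of the script, or exported in the
# shell). The affected ops will run slower than native MPS.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])  # → 1
```

The equivalent shell form is `export PYTORCH_ENABLE_MPS_FALLBACK=1` before launching Python.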

Environment

MacBook Pro with macOS 14 and Apple M3 Pro

Additional context

No response

@AimoneAndex AimoneAndex added the bug Something isn't working label Jan 4, 2025
@eginhard
Contributor

eginhard commented Jan 6, 2025

Can you try with our fork (pip install coqui-tts)? It might be fixed with more recent transformers versions.

@LudWittg

LudWittg commented Jan 8, 2025

MPS seems to work with the fork.

Test Environment:

MacBook Pro with macOS 15.2 and Apple M2 Pro
PyTorch 2.7.0.dev20250108

@AimoneAndex
Author

MPS seems to work with the fork.

Test Environment:

MacBook Pro with macOS 15.2 and Apple M2 Pro
PyTorch 2.7.0.dev20250108

It shows:

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Using model: xtts
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/api.py", line 74, in __init__
    self.load_tts_model_by_name(model_name, gpu)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/api.py", line 177, in load_tts_model_by_name
    self.synthesizer = Synthesizer(
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 109, in __init__
    self._load_tts_from_dir(model_dir, use_cuda)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 164, in _load_tts_from_dir
    self.tts_model.load_checkpoint(config, checkpoint_dir=model_dir, eval=True)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 773, in load_checkpoint
    checkpoint = self.get_compatible_checkpoint_state_dict(model_path)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 714, in get_compatible_checkpoint_state_dict
    checkpoint = load_fsspec(model_path, map_location=torch.device("cpu"))["model"]
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/TTS/utils/io.py", line 54, in load_fsspec
    return torch.load(f, map_location=map_location, **kwargs)
  File "/opt/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/serialization.py", line 1488, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
(1) In PyTorch 2.6, we changed the default value of the weights_only argument in torch.load from False to True. Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
(2) Alternatively, to load with weights_only=True please check the recommended steps in the following error message.
WeightsUnpickler error: Unsupported global: GLOBAL TTS.tts.configs.xtts_config.XttsConfig was not an allowed global by default. Please use torch.serialization.add_safe_globals([XttsConfig]) or the torch.serialization.safe_globals([XttsConfig]) context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.

wav = tts.tts(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'tts' is not defined. Did you mean: 'TTS'?

Do you know how to solve it? Thanks! (The second `NameError` just follows from the first failure: `tts` was never assigned because the constructor raised.)

@AimoneAndex
Author

Can you try with our fork (pip install coqui-tts)? It might be fixed with more recent transformers versions.

OK, I'd like to give it a try. Does it support MPS?

@LudWittg

LudWittg commented Jan 8, 2025


Switching to the fork should solve all these problems.

@LudWittg

LudWittg commented Jan 8, 2025

Can you try with our fork (pip install coqui-tts)? It might be fixed with more recent transformers versions.

OK, I'd like to give it a try. Does it support MPS?

It supports MPS (I've tested it).
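As a sketch, device selection on Apple Silicon can follow the usual PyTorch pattern; the commented lines reuse the model name and call from this thread and assume the coqui-tts fork is installed:

```python
import torch

# Prefer the MPS backend on Apple Silicon when available, else fall back
# to the CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# With the coqui-tts fork installed, the original snippet is unchanged:
#   from TTS.api import TTS
#   tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
#   wav = tts.tts(text="Hello world!", speaker_wav="input/001.wav",
#                 language="en")
print(device)
```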

@AimoneAndex
Author

OK, thank you! I'll give it a try.
