YourTTS voice conversion for mortals #1084
jreus asked this question in General Q&A
-
Hey @jreus |
-
The structure is generated without predetermining the device |
-
Dear coqui/TTS team, thank you for your fantastic work on this project. :-) My expertise with TTS and deep learning is rather modest, so please excuse me if I am asking ignorant questions ...
I'm working on creating custom-voice TTS models using coqui. Until now, it has seemed like the best approach is to create a custom labeled dataset (text + speech snippets) of around ~1 hour, and then fine-tune a model pre-trained on a voice similar to the one you are aiming to create (maybe I'm wrong about this).
I really like the prosody quality of Tacotron2 with GST, so that has been the model I have studied the most. But now, with the release of coqui 0.5.0, we have the pre-trained multi-speaker/multi-lingual YourTTS model, which can apparently be fine-tuned with much less training data and even do zero-shot voice conversion. Amazing!
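For concreteness, this is the kind of zero-shot cloning I mean. Newer TTS releases have a high-level wrapper that makes it a few lines (I'm not certain this exact TTS.api interface exists in 0.5.0, and the reference-clip path below is just a placeholder):

```python
from TTS.api import TTS

# Load the released multilingual YourTTS checkpoint (downloaded on first use).
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

# Zero-shot cloning: condition the synthesis on a short clip of the target voice.
tts.tts_to_file(
    text="This sentence should come out in the reference speaker's voice.",
    speaker_wav="reference_voice.wav",  # placeholder: a short clip of the target speaker
    language="en",
    file_path="cloned_output.wav",
)
```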
However, I am having great difficulty replicating the workflow from the YourTTS Colab demos in the latest coqui 0.5.0.
I've tried my best to get something similar working in a basic script using the pre-trained YourTTS model, yet I keep getting the mysterious error below. Even after looking into TTS/tts/models/vits.py, I'm still struggling to understand why it complains that this is not a multi-speaker model. Any thoughts/ideas on how to proceed?
Here's my script... which hopefully can serve as a reference if I can get it working.
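In outline it does something like the following (a minimal sketch rather than the full file; the file paths are placeholders and the Synthesizer keyword arguments are my own reading of TTS/utils/synthesizer.py, so they may themselves be part of the problem):

```python
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer

# Download the released YourTTS checkpoint and config (cached after the first run).
manager = ModelManager()
model_path, config_path, model_item = manager.download_model(
    "tts_models/multilingual/multi-dataset/your_tts"
)

# Build a synthesizer around the multi-speaker / multi-lingual checkpoint.
# NOTE: perhaps tts_speakers_file / tts_languages_file also need to be passed
# explicitly here -- this is one of the things I'm unsure about.
synthesizer = Synthesizer(
    tts_checkpoint=model_path,
    tts_config_path=config_path,
    use_cuda=False,
)

# Zero-shot synthesis conditioned on a reference clip of the target voice.
wav = synthesizer.tts(
    text="Testing voice cloning with YourTTS.",
    speaker_wav="my_target_voice.wav",  # placeholder reference clip
    language_name="en",
)
synthesizer.save_wav(wav, "yourtts_test.wav")
```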