Questions about training time, training set sizes and expectations #3791

yosuba · 2024-06-14T06:18:26Z

Hi there, I'm relatively inexperienced with coding and with TTS in general, and I've been trudging my way through this trying to get somewhere. My main goal is a custom model that produces audio that sounds like the speaker in my sample files.

I have about 800 WAV files, each 1 to 5 seconds long, transcribed in a corresponding metadata file for training. I trained a WaveRNN vocoder for ~700 epochs (14 hours on a GTX 1080), but whenever I tested the .pth on some text, the output was just static noise.
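
For a rough sense of scale, 800 clips of 1 to 5 seconds each comes to somewhere between about 13 and 67 minutes of audio. A minimal sketch for measuring the actual total, assuming the clips are plain PCM WAV files; the `wavs` directory name and the `total_wav_seconds` helper are hypothetical and not part of any TTS library:

```python
import wave
from pathlib import Path


def total_wav_seconds(wav_dir):
    """Sum the durations (in seconds) of all .wav files directly inside wav_dir."""
    total = 0.0
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # duration = number of frames / frames per second
            total += wav.getnframes() / wav.getframerate()
    return total


if __name__ == "__main__":
    seconds = total_wav_seconds("wavs")  # hypothetical dataset directory
    print(f"{seconds / 60:.1f} minutes of audio")
```

This only uses the standard-library `wave` module, so it will fail on compressed or non-PCM files.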

Is this result expected after 14 hours of training? How many epochs should one expect to train before getting intelligible output?
Is the dataset too small to get anywhere with this?
Do I have to train both a spectrogram model and a vocoder on the same dataset?
When I test my .pth, why does it take several minutes to synthesize audio (a few words of dialog) while one of the built-in models is almost instantaneous?

Thanks!

yosuba · 2024-06-18T05:09:38Z

Just to update on this:
I've trained a Univnet vocoder on the same dataset for 1000 epochs and tested it with the ljspeech Tacotron2-DDC, Glow-TTS and Speedy-speech spectrogram models, with results similar to my original post. Both vocoders output noise when synthesizing speech from text, regardless of which spectrogram model is chosen.

One sounds like static, the other is more of a grating, high-pitched sound:
https://www.dropbox.com/scl/fi/3zuzkrgr7w7bgfymdwwmq/Tacotron-2-DDC-WaveRNN-Test.wav?rlkey=uqs5staz5dt66wz80dnwdh7ni&dl=0
https://www.dropbox.com/scl/fi/58jslbf5kz23uk65f18rn/GlowTTS-Univnet-Vocoder-Test.wav?rlkey=cu5oac3ejvlc384zgcb00drxm&dl=0
