I am able to start training, but right before the first epoch I see this log:
DataLoader initialization
| > Tokenizer:
| > add_blank: True
| > use_eos_bos: False
| > use_phonemes: False
| > Number of instances : 979
| > Preprocessing samples
| > Max text length: 94
| > Min text length: 15
| > Avg text length: 50.786795048143055
|
| > Max audio length: 223959.0
| > Min audio length: 46143.0
| > Avg audio length: 144398.36176066025
| > Num. instances discarded samples: 252
| > Batch group size: 0.
My first thought was to check the min/max permitted audio sample lengths in my config.json file.
My desired minimum length is 16,000 samples (about 1 second), but after editing it in config.json it does not seem to have any effect, since the log still prints Min audio length: 46143.0.
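For reference, this is roughly what I changed (a minimal sketch; I'm assuming the `min_audio_len`/`max_audio_len` field names from the TTS base config, and the max value here is just a placeholder):

```json
{
    "min_audio_len": 16000,
    "max_audio_len": 500000
}
```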
What is even stranger is that 46143.0 is about 2 seconds, and I only have 77 audio samples shorter than 2 seconds, so I'm not sure how the loader ends up discarding 252 samples.
Any insights are very welcome!
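To make the numbers concrete, here is a quick sanity check on the log values, assuming the reported lengths are raw sample counts and a 22050 Hz sampling rate (both assumptions on my part; the log does not say):

```python
SAMPLE_RATE = 22050  # assumed; check "sample_rate" in config.json


def samples_to_seconds(n_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Convert a raw sample count to a duration in seconds."""
    return n_samples / sample_rate


# Values from the log above:
print(samples_to_seconds(46143))   # ~2.09 s  (reported minimum)
print(samples_to_seconds(223959))  # ~10.16 s (reported maximum)
```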
To Reproduce
To reproduce the error you need to install coqui-ai TTS and set up a custom dataset for finetuning (LJSpeech format).
After that, run:
python train_tts.py --config_path tts_models--pt--cv--vits/config.json --restore_path tts_models--pt--cv--vits/model_file.pth.tar
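For context, by "LJSpeech format" I mean a metadata.csv with pipe-separated fields next to a wavs/ directory, where each line is file_id|raw_text|normalized_text and the audio lives at wavs/<file_id>.wav (the filenames below are made up for illustration):

```
audio_0001|Transcription of the first clip.|Transcription of the first clip.
audio_0002|Transcription of the second clip.|Transcription of the second clip.
```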
Expected behavior
No response
Logs
No response
Environment
TTS 0.20.6
torch gpu 2.1.0
cuda 12.0
nvidia driver 527.41
Additional context
No response
What is the audio format that you are using for training?
I think the issue is that the current audio length computation only supports 16-bit WAV files. If the audio is in a different format, it computes the length incorrectly. I fixed this issue in the PR #3092
@erogol I think we should merge #3092 as soon as possible to avoid issues like this one.
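I'm not certain this is exactly how the failing code path measures length, but a file-size heuristic that hard-codes 2 bytes per sample reproduces the symptom: for a 32-bit WAV it reports roughly twice the real length. A sketch (the helper names here are made up):

```python
import os
import struct
import tempfile
import wave


def estimated_len_16bit(path):
    # Naive size-based estimate: assumes 16-bit mono PCM (2 bytes per
    # sample) and ignores the WAV header entirely.
    return os.path.getsize(path) // 2


def actual_len(path):
    # Ground truth: read the frame count from the WAV header.
    with wave.open(path, "rb") as f:
        return f.getnframes()


# Write a 1-second, 22050 Hz mono WAV with 32-bit (4-byte) samples.
path = os.path.join(tempfile.mkdtemp(), "test.wav")
with wave.open(path, "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(4)  # 4 bytes/sample -> 32-bit PCM
    f.setframerate(22050)
    f.writeframes(struct.pack("<22050i", *([0] * 22050)))

print(actual_len(path))           # 22050 frames (1 second)
print(estimated_len_16bit(path))  # about twice the real length
```

A filter on `min_audio_len`/`max_audio_len` fed by the inflated estimates would then discard samples that are actually within range, which matches what the reporter sees.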
Describe the bug
Hello, I'm finetuning the VITS model on my dataset of 979 audio clips.
I have followed this tutorial (https://tts.readthedocs.io/en/latest/formatting_your_dataset.html) on how to format my dataset, and it looks just fine.