I trained several TTS models here (Tacotron2-DDC, GlowTTS) and HiFi-GAN, but all of them generate noise without speech #2397
ahmedalbahnasawi started this conversation in General
I'm dealing with a very clean non-English dataset. I mapped the text to phonemes using my G2P model and also managed to write my own formatter. Samples from my metadata.csv file:
```
001_003_001|fT TIWVDUC T TIWFVC|fT TIWVDUC T TIWFVC
001_004_001|VGSCeC RDQVC b bFUC|VGSCeC RDQVC b bFUC
```
Each letter represents a phoneme in my characters list, which contains upper- and lower-case English characters, and space is the word separator. I was able to get nice audios from https://github.com/TensorSpeech/TensorFlowTTS but struggled to generate audio files longer than 11 seconds.
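For reference, this is roughly the shape of the formatter I wrote (the function name, the `wavs/` folder, and the speaker name are placeholders; the returned dict keys follow the convention of the built-in formatters in `TTS.tts.datasets.formatters`):

```python
import os

# Minimal sketch of a custom formatter for the metadata.csv layout above.
# Each line is "<utt_id>|<raw text>|<phonemized text>".
def my_formatter(root_path, meta_file, **kwargs):
    items = []
    with open(os.path.join(root_path, meta_file), "r", encoding="utf-8") as f:
        for line in f:
            cols = line.strip().split("|")
            wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")  # placeholder wav layout
            text = cols[2]  # use the phonemized column as the training text
            items.append(
                {"text": text, "audio_file": wav_file, "speaker_name": "speaker_0", "root_path": root_path}
            )
    return items
```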
Many people recommend Tacotron2-DDC, which can naturally align an input of 1 minute and 49 seconds.
When I train the model, it is not able to generate anything, only a loud electric-noise wav.
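For context, this is roughly how I enable the DDC variant (field names assumed from the `Tacotron2Config` dataclass in `TTS.tts.configs.tacotron2_config`; all values are placeholders):

```python
from TTS.tts.configs.tacotron2_config import Tacotron2Config

# Sketch of the Tacotron2-DDC part of my training config.
config = Tacotron2Config(
    double_decoder_consistency=True,  # enables the DDC coarse decoder
    ddc_r=6,                          # reduction factor of the coarse decoder
    r=2,                              # reduction factor of the fine decoder
    use_phonemes=False,               # text is already phonemized by my G2P model
    text_cleaner="basic_cleaners",
    output_path="output/",            # placeholder
)
```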
During inference I debugged the workflow and found that all mel-spectrogram values are negative.
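For comparison, a quick sanity check on a ground-truth mel extracted with librosa (the wav path and STFT parameters are placeholders) also gives an all-negative range, since the dB conversion is taken relative to the peak:

```python
import librosa
import numpy as np

# Extract a ground-truth mel spectrogram and print its value range.
y, sr = librosa.load("wavs/001_003_001.wav", sr=22050)           # placeholder path
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=80)  # placeholder STFT params
mel_db = librosa.power_to_db(mel, ref=np.max)                    # dB relative to the peak => values <= 0
print(mel_db.min(), mel_db.max())
```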
I'm pretty sure there is nothing wrong with the HiFi-GAN training, as it does not deal with text.
Do you think I have to separate each letter with a space, use '-' as the word separator, and append the 'eos' token '#' to every sentence? For example (a sketch of the corresponding character config follows the example):
```
001_003_001|f T - T I W V D U C - T - T I W F V C|f T - T I W V D U C - T - T I W F V C #
001_004_001|V G S C e C - R D Q V C - b - b F U C|V G S C e C - R D Q V C - b - b F U C #
```
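This is a sketch of the character config I would pair with that layout (assuming the `CharactersConfig` dataclass from `TTS.tts.configs.shared_configs`; the pad/bos symbols are placeholders):

```python
from TTS.tts.configs.shared_configs import CharactersConfig

# Sketch of a character set for the space-separated layout above.
characters_config = CharactersConfig(
    pad="<PAD>",  # placeholder padding symbol
    bos="<BOS>",  # placeholder begin-of-sentence symbol
    eos="#",      # '#' appended to every sentence as the eos token
    characters="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",  # my phoneme symbols
    punctuations="-",  # '-' acting as the word separator
)
```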
Many Thanks