Please only report an issue here if you're using the fork's code; the original package hasn't been updated in over a year now. Can you try again with the fork and provide enough details to reproduce (training recipe, config, environment)?
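For reference, a quick way to collect the requested environment details (this assumes the fork still exposes `TTS.__version__` the way upstream Coqui TTS does):

```python
# Minimal environment report; `TTS.__version__` is assumed to exist as in upstream Coqui TTS.
import platform
import sys

import torch
import TTS

print("Python :", sys.version.split()[0])
print("OS     :", platform.platform())
print("PyTorch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("TTS    :", TTS.__version__)
print("GPU    :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```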
Describe the bug
When training a multi-speaker VITS model, the following error occurs after epoch 2:
```
....
--> TIME: 2025-01-18 05:51:44 -- STEP: 1141/1144 -- GLOBAL_STEP: 228275
| > loss_disc: 2.55525541305542 (2.3107323108484175)
| > loss_disc_real_0: 0.07618683576583862 (0.1314041224111602)
| > loss_disc_real_1: 0.2294345200061798 (0.18530447144423717)
| > loss_disc_real_2: 0.14496399462223053 (0.21591699230472428)
| > loss_disc_real_3: 0.2127830535173416 (0.2248999754539924)
| > loss_disc_real_4: 0.17633846402168274 (0.2232837798131024)
| > loss_disc_real_5: 0.1818462312221527 (0.21921113943134243)
| > loss_0: 2.55525541305542 (2.3107323108484175)
| > grad_norm_0: tensor(19.7260, device='cuda:0') (tensor(25.7604, device='cuda:0'))
| > loss_gen: 2.512892723083496 (2.6231823042753417)
| > loss_kl: 2.0453853607177734 (2.0651548769680894)
| > loss_feat: 7.6552958488464355 (8.288802446972165)
| > loss_mel: 22.356651306152344 (22.661884306189645)
| > loss_duration: 1.5326560735702515 (1.469221028108329)
| > amp_scaler: 128.0 (160.98159509202455)
| > loss_1: 36.10288619995117 (37.10824498536824)
| > grad_norm_1: tensor(128.5965, device='cuda:0') (tensor(290.4952, device='cuda:0'))
| > current_lr_0: 0.0001997002061640866
| > current_lr_1: 0.0001997002061640866
| > step_time: 3.5176 (2.388011707745551)
| > loader_time: 0.021 (0.007880230309355165)
Evaluation:
...
....
--> STEP: 21
| > loss_disc: 2.4334442615509033 (2.429446754001436)
| > loss_disc_real_0: 0.12144245952367783 (0.13465816066378644)
| > loss_disc_real_1: 0.16504763066768646 (0.20793592716966355)
| > loss_disc_real_2: 0.2902933359146118 (0.2909027777966999)
| > loss_disc_real_3: 0.25102439522743225 (0.24391093992051624)
| > loss_disc_real_4: 0.3473176658153534 (0.24767487815448216)
| > loss_disc_real_5: 0.28220587968826294 (0.26010643371513914)
| > loss_0: 2.4334442615509033 (2.429446754001436)
| > loss_gen: 2.770388603210449 (2.59090789159139)
| > loss_kl: 1.5279167890548706 (2.3010178974696567)
| > loss_feat: 9.003446578979492 (7.517528613408406)
| > loss_mel: 25.329435348510742 (22.98664746965681)
| > loss_duration: 1.6732535362243652 (1.4985651345480056)
| > loss_1: 40.304439544677734 (36.894666853405184)
--> STEP: 22
| > loss_disc: 2.23518705368042 (2.420616767623208)
| > loss_disc_real_0: 0.07352343201637268 (0.13187930936163125)
| > loss_disc_real_1: 0.22937196493148804 (0.20891029252247376)
| > loss_disc_real_2: 0.3216579854488373 (0.29230074178088794)
| > loss_disc_real_3: 0.22647826373577118 (0.24311854554848236)
| > loss_disc_real_4: 0.3893234431743622 (0.25411344929174945)
| > loss_disc_real_5: 0.24885831773281097 (0.25959515571594244)
| > loss_0: 2.23518705368042 (2.420616767623208)
| > loss_gen: 3.2849197387695312 (2.622453884644942)
| > loss_kl: 1.5750868320465088 (2.2680210308595137)
| > loss_feat: 10.77985668182373 (7.6658162528818306)
| > loss_mel: 24.645959854125977 (23.062070759859953)
| > loss_duration: 1.6656222343444824 (1.506158639084209)
| > loss_1: 41.951446533203125 (37.12452047521418)
| > Synthesizing test sentences.
Input text cannot be None
! Run is kept in /home/dev/workspace/TTS/recipes/ljspeech/multi_speaker/speaker_multispeaker_fromscratch-January-17-2025_08+17PM-dbf1a08a
Traceback (most recent call last):
File "/home/dev/anaconda3/envs/tts2/lib/python3.9/site-packages/trainer/trainer.py", line 1833, in fit
self._fit()
File "/home/dev/anaconda3/envs/tts2/lib/python3.9/site-packages/trainer/trainer.py", line 1789, in _fit
self.test_run()
File "/home/dev/anaconda3/envs/tts2/lib/python3.9/site-packages/trainer/trainer.py", line 1698, in test_run
test_outputs = self.model.test_run(self.training_assets)
File "/home/dev/anaconda3/envs/tts2/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/dev/workspace/TTS/TTS/tts/models/vits.py", line 1442, in test_run
wav, alignment, _, _ = synthesis(
File "/home/dev/workspace/TTS/TTS/tts/utils/synthesis.py", line 221, in synthesis
outputs = run_model_torch(
File "/home/dev/workspace/TTS/TTS/tts/utils/synthesis.py", line 53, in run_model_torch
outputs = _func(
File "/home/dev/anaconda3/envs/tts2/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/dev/workspace/TTS/TTS/tts/models/vits.py", line 1150, in inference
attn = generate_path(w_ceil.squeeze(1), attn_mask.squeeze(1).transpose(1, 2))
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
```
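For reference, "expected to be in range of [-2, 1]" means the tensor reaching `transpose(1, 2)` at `vits.py:1150` has only two dimensions, one axis fewer than the usual `[batch, 1, T_spec, T_text]` attention mask. A standalone sketch of the same failure follows; the shapes are illustrative only and not taken from this run:

```python
import torch

# Expected case: attn_mask shaped [batch, 1, T_spec, T_text]
attn_mask = torch.ones(1, 1, 50, 12, dtype=torch.bool)
print(attn_mask.squeeze(1).transpose(1, 2).shape)  # torch.Size([1, 12, 50]) -> works

# If an axis is lost upstream (plausibly because the test sentence is empty/None,
# as the "Input text cannot be None" warning suggests), the same call fails.
# The shape below is illustrative, not taken from the actual model.
attn_mask = torch.ones(1, 1, 12, dtype=torch.bool)
try:
    attn_mask.squeeze(1).transpose(1, 2)
except IndexError as exc:
    print(exc)  # Dimension out of range (expected to be in range of [-2, 1], but got 2)
```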
To Reproduce
Run the multi-speaker VITS recipe with 14 speakers.
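Since the crash happens while synthesizing test sentences, the `test_sentences` entries in the recipe config are worth checking first. A minimal sketch of the relevant part of a multi-speaker VITS config follows; the names and sentences are illustrative, not the reporter's actual recipe, and the exact per-entry format should be verified against the fork's recipes:

```python
# Illustrative only - not the reporter's recipe. An empty or None entry in
# `test_sentences` is a plausible trigger for the "Input text cannot be None"
# warning printed just before the IndexError.
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import VitsArgs

vits_args = VitsArgs(use_speaker_embedding=True)  # multi-speaker via learned speaker embeddings

config = VitsConfig(
    model_args=vits_args,
    run_name="vits_multispeaker_14spk",  # hypothetical run name
    test_sentences=[
        # each entry is a non-empty list: [text] or [text, speaker_name, ...]
        ["It took me quite a long time to develop a voice."],
        ["Be a voice, not an echo.", "speaker_01"],
    ],
)
```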
Expected behavior
No response
Logs
Environment
Additional context
No response