[Bug] Input text cannot be None in attn = generate_path #266

Open
cod3r0k opened this issue Jan 18, 2025 · 1 comment
Labels
bug Something isn't working question Further information is requested VITS Anything related to VITS/YourTTS/Fairseq models

Comments

cod3r0k commented Jan 18, 2025

Describe the bug

When I train a multi-speaker VITS model, the following error occurs after epoch 2:

`
....
--> TIME: 2025-01-18 05:51:44 -- STEP: 1141/1144 -- GLOBAL_STEP: 228275
| > loss_disc: 2.55525541305542 (2.3107323108484175)
| > loss_disc_real_0: 0.07618683576583862 (0.1314041224111602)
| > loss_disc_real_1: 0.2294345200061798 (0.18530447144423717)
| > loss_disc_real_2: 0.14496399462223053 (0.21591699230472428)
| > loss_disc_real_3: 0.2127830535173416 (0.2248999754539924)
| > loss_disc_real_4: 0.17633846402168274 (0.2232837798131024)
| > loss_disc_real_5: 0.1818462312221527 (0.21921113943134243)
| > loss_0: 2.55525541305542 (2.3107323108484175)
| > grad_norm_0: tensor(19.7260, device='cuda:0') (tensor(25.7604, device='cuda:0'))
| > loss_gen: 2.512892723083496 (2.6231823042753417)
| > loss_kl: 2.0453853607177734 (2.0651548769680894)
| > loss_feat: 7.6552958488464355 (8.288802446972165)
| > loss_mel: 22.356651306152344 (22.661884306189645)
| > loss_duration: 1.5326560735702515 (1.469221028108329)
| > amp_scaler: 128.0 (160.98159509202455)
| > loss_1: 36.10288619995117 (37.10824498536824)
| > grad_norm_1: tensor(128.5965, device='cuda:0') (tensor(290.4952, device='cuda:0'))
| > current_lr_0: 0.0001997002061640866
| > current_lr_1: 0.0001997002061640866
| > step_time: 3.5176 (2.388011707745551)
| > loader_time: 0.021 (0.007880230309355165)

Evaluation:
...
....
--> STEP: 21
| > loss_disc: 2.4334442615509033 (2.429446754001436)
| > loss_disc_real_0: 0.12144245952367783 (0.13465816066378644)
| > loss_disc_real_1: 0.16504763066768646 (0.20793592716966355)
| > loss_disc_real_2: 0.2902933359146118 (0.2909027777966999)
| > loss_disc_real_3: 0.25102439522743225 (0.24391093992051624)
| > loss_disc_real_4: 0.3473176658153534 (0.24767487815448216)
| > loss_disc_real_5: 0.28220587968826294 (0.26010643371513914)
| > loss_0: 2.4334442615509033 (2.429446754001436)
| > loss_gen: 2.770388603210449 (2.59090789159139)
| > loss_kl: 1.5279167890548706 (2.3010178974696567)
| > loss_feat: 9.003446578979492 (7.517528613408406)
| > loss_mel: 25.329435348510742 (22.98664746965681)
| > loss_duration: 1.6732535362243652 (1.4985651345480056)
| > loss_1: 40.304439544677734 (36.894666853405184)

--> STEP: 22
| > loss_disc: 2.23518705368042 (2.420616767623208)
| > loss_disc_real_0: 0.07352343201637268 (0.13187930936163125)
| > loss_disc_real_1: 0.22937196493148804 (0.20891029252247376)
| > loss_disc_real_2: 0.3216579854488373 (0.29230074178088794)
| > loss_disc_real_3: 0.22647826373577118 (0.24311854554848236)
| > loss_disc_real_4: 0.3893234431743622 (0.25411344929174945)
| > loss_disc_real_5: 0.24885831773281097 (0.25959515571594244)
| > loss_0: 2.23518705368042 (2.420616767623208)
| > loss_gen: 3.2849197387695312 (2.622453884644942)
| > loss_kl: 1.5750868320465088 (2.2680210308595137)
| > loss_feat: 10.77985668182373 (7.6658162528818306)
| > loss_mel: 24.645959854125977 (23.062070759859953)
| > loss_duration: 1.6656222343444824 (1.506158639084209)
| > loss_1: 41.951446533203125 (37.12452047521418)

| > Synthesizing test sentences.
Input text cannot be None
! Run is kept in /home/dev/workspace/TTS/recipes/ljspeech/multi_speaker/speaker_multispeaker_fromscratch-January-17-2025_08+17PM-dbf1a08a
Traceback (most recent call last):
  File "/home/dev/anaconda3/envs/tts2/lib/python3.9/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/home/dev/anaconda3/envs/tts2/lib/python3.9/site-packages/trainer/trainer.py", line 1789, in _fit
    self.test_run()
  File "/home/dev/anaconda3/envs/tts2/lib/python3.9/site-packages/trainer/trainer.py", line 1698, in test_run
    test_outputs = self.model.test_run(self.training_assets)
  File "/home/dev/anaconda3/envs/tts2/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/dev/workspace/TTS/TTS/tts/models/vits.py", line 1442, in test_run
    wav, alignment, _, _ = synthesis(
  File "/home/dev/workspace/TTS/TTS/tts/utils/synthesis.py", line 221, in synthesis
    outputs = run_model_torch(
  File "/home/dev/workspace/TTS/TTS/tts/utils/synthesis.py", line 53, in run_model_torch
    outputs = _func(
  File "/home/dev/anaconda3/envs/tts2/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/dev/workspace/TTS/TTS/tts/models/vits.py", line 1150, in inference
    attn = generate_path(w_ceil.squeeze(1), attn_mask.squeeze(1).transpose(1, 2))
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)`
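
The valid range `[-2, 1]` in the message implies that the tensor reaching `transpose(1, 2)` had only two dimensions. One way this can happen is a 3-D mask whose dimension 1 has size 1, which `squeeze(1)` then removes. The sketch below mimics that shape arithmetic in plain Python; the helper names and example shapes are hypothetical stand-ins for the PyTorch tensor methods, and the actual shapes in this run are not known from the report.

```python
def squeeze_shape(shape, dim):
    """Shape after tensor.squeeze(dim): the axis is dropped only if its size is 1."""
    shape = tuple(shape)
    dim = dim % len(shape)  # normalise negative indices
    if shape[dim] == 1:
        return shape[:dim] + shape[dim + 1:]
    return shape


def transpose_shape(shape, dim0, dim1):
    """Shape after tensor.transpose(dim0, dim1), with a PyTorch-style range check."""
    ndim = len(shape)
    for d in (dim0, dim1):
        if not -ndim <= d < ndim:
            raise IndexError(
                "Dimension out of range (expected to be in range of "
                f"[{-ndim}, {ndim - 1}], but got {d})"
            )
    out = list(shape)
    out[dim0], out[dim1] = out[dim1], out[dim0]
    return tuple(out)


# Hypothetical mask shapes, for illustration only.
healthy = (1, 40, 12)    # dimension 1 has size 40, so squeeze(1) is a no-op
degenerate = (1, 1, 12)  # dimension 1 has size 1, so squeeze(1) drops it

print(transpose_shape(squeeze_shape(healthy, 1), 1, 2))  # (1, 12, 40)

try:
    transpose_shape(squeeze_shape(degenerate, 1), 1, 2)
except IndexError as err:
    print(err)  # same wording as the error in the traceback
```

If this is what happened, the underlying question is why that dimension collapsed to size 1 during test-sentence synthesis, which the report alone does not answer.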

To Reproduce

Train a multi-speaker VITS model with 14 speakers.
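
Because "Input text cannot be None" is printed right before the traceback, it may be worth ruling out a test sentence with missing or blank text before rerunning. A minimal sketch, assuming each test sentence is either a string or a list whose first item is the text; that layout and the helper name `find_empty_test_sentences` are assumptions, not something the report confirms.

```python
def find_empty_test_sentences(test_sentences):
    """Return the indices of entries whose text is missing or blank.

    Assumes each entry is a string, or a list/tuple whose first item is the text.
    """
    bad = []
    for idx, entry in enumerate(test_sentences):
        if isinstance(entry, (list, tuple)):
            text = entry[0] if entry else None
        else:
            text = entry
        if not isinstance(text, str) or not text.strip():
            bad.append(idx)
    return bad


# Hypothetical entries; the speaker names are placeholders.
sentences = [
    "Hello there.",
    ["It took me a while.", "speaker_a"],
    ["", "speaker_b"],
    [None, "speaker_c"],
]
print(find_empty_test_sentences(sentences))  # [2, 3]
```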

Expected behavior

No response

Logs

Environment

git+https://github.com/coqui-ai/TTS.git@dbf1a08a0d4e47fdad6172e433eeb34bc6b13b4e#egg=TTS

Additional context

No response

eginhard (Member) commented

Please only report an issue here if you're using the fork's code; the original package hasn't been updated in over a year now. Can you try again with the fork and provide enough details to reproduce (training recipe, config, environment)?
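
For the environment part of such a report, a short script can gather the interpreter, OS, and package versions in one go. This is only a sketch using the standard library; the package names passed in are assumptions and should be replaced with whatever is actually installed.

```python
import platform
import sys
from importlib import metadata


def collect_env_report(packages):
    """Return a dict with the Python version, platform, and package versions."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Package names below are assumptions; adjust them to the environment in use.
for key, value in collect_env_report(["coqui-tts", "TTS", "torch", "trainer"]).items():
    print(f"{key}: {value}")
```

Pasting this output alongside the training recipe and config gives maintainers a concrete starting point for reproducing the failure.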
