
cantonese model missing word #744

Open
4 tasks done
fengping1 opened this issue Jan 23, 2025 · 3 comments
Labels
help wanted Extra attention is needed

Comments

@fengping1

Checks

  • This template is only for usage issues encountered.
  • I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
  • I have searched for existing issues, including closed ones, and couldn't find a solution.
  • I confirm that I am using English to submit this report in order to facilitate communication.

Environment Details

4 × Tesla V100 32 GB

Steps to Reproduce

  1. Clone the code
  2. Process the data
  3. Run training
  4. Run inference

✔️ Expected Behavior

The generated audio contains all the words of the input text.

❌ Actual Behavior

I’m training a Cantonese F5-TTS model on 650 hours of Common Voice Cantonese data. The zero-shot audio quality is quite good, but the model always drops a few words in every sentence. I’ve checked both the data and the inference code. I’m training on 4 V100 GPUs and have continued from 150k steps to 300k steps (100 epochs), but the problem has not improved. Does anyone know what the cause might be? Should I keep training for more steps or epochs, or should I stop? Due to limited resources, continuing could take a few more days.
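The report does not say how the missing words were measured. As an illustration only (not part of the original issue), one way to check whether later checkpoints actually improve is to transcribe the generated audio with a Cantonese-capable ASR model and count character deletions against the target text. The sketch below assumes the transcript is already available as a string; `align_counts` is a hypothetical helper, and the ASR step is not shown.

```python
def align_counts(reference: str, hypothesis: str) -> dict[str, float]:
    """Hypothetical helper: character-level Levenshtein alignment that counts
    substitutions, deletions (dropped words/chars) and insertions."""
    # Whitespace is ignored because Cantonese text is compared char by char.
    ref = [ch for ch in reference if not ch.isspace()]
    hyp = [ch for ch in hypothesis if not ch.isspace()]
    n, m = len(ref), len(hyp)
    # dp[i][j] holds (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, sub, dele, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (cost, sub, dele, ins)  # match, no cost
            else:
                diag = (cost + 1, sub + 1, dele, ins)  # substitution
            cost, sub, dele, ins = dp[i - 1][j]
            up = (cost + 1, sub, dele + 1, ins)  # reference char missing in output
            cost, sub, dele, ins = dp[i][j - 1]
            left = (cost + 1, sub, dele, ins + 1)  # extra char in output
            dp[i][j] = min(diag, up, left)  # lowest total cost wins
    cost, sub, dele, ins = dp[n][m]
    return {
        "substitutions": sub,
        "deletions": dele,
        "insertions": ins,
        "cer": cost / max(n, 1),  # character error rate
    }


if __name__ == "__main__":
    # The hypothesis is missing one character of the reference.
    print(align_counts("我今日好開心", "我今日開心"))
```

Tracking the `deletions` count separately from substitutions over a fixed set of test sentences would show whether the dropped-word problem is actually shrinking between checkpoints, rather than relying on listening alone.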

@fengping1 fengping1 added the help wanted Extra attention is needed label Jan 23, 2025
@ZhikangNiu
Collaborator

Maybe it needs more steps.

@fengping1
Author

Thanks, I'll wait longer.

@Alykasym

Using the "Sample" batch size type instead of the "Frame" type fixed the missing-word issue for me. The "Frame" type calculation behaves in mysterious ways in the current code: the batch size is not adjusted dynamically, and in my case it ignored half of the dataset.
Try training with the "Sample" batch size type and "Max Samples" set to zero (0).
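The comment above does not show the batching code, so the following is a hypothetical sketch, not F5-TTS's actual sampler. It assumes a greedy frame-budget sampler over length-sorted clips, in which a clip longer than the per-batch frame budget can never be placed in a batch and is silently skipped, whereas a fixed sample-count batcher uses every clip. The names `frame_batches`, `sample_batches`, `coverage`, and the convention that `max_samples=0` means "no cap" are assumptions for illustration.

```python
def frame_batches(frame_lens: list[int], max_frames: int, max_samples: int = 0) -> list[list[int]]:
    """Hypothetical greedy frame-budget batching over length-sorted clips.
    Assumption: max_samples == 0 means no cap on clips per batch."""
    order = sorted(range(len(frame_lens)), key=lambda i: frame_lens[i])
    batches, batch, batch_frames = [], [], 0
    for idx in order:
        n = frame_lens[idx]
        fits = batch_frames + n <= max_frames
        room = max_samples == 0 or len(batch) < max_samples
        if fits and room:
            batch.append(idx)
            batch_frames += n
            continue
        if batch:
            batches.append(batch)  # close the current batch
        if n <= max_frames:
            batch, batch_frames = [idx], n  # start a new batch with this clip
        else:
            batch, batch_frames = [], 0  # clip exceeds the budget: dropped
    if batch:
        batches.append(batch)
    return batches


def sample_batches(num_samples: int, batch_size: int) -> list[list[int]]:
    """Fixed number of clips per batch; every clip is used."""
    return [list(range(i, min(i + batch_size, num_samples)))
            for i in range(0, num_samples, batch_size)]


def coverage(batches: list[list[int]], num_samples: int) -> float:
    """Fraction of clips that appear in at least one batch."""
    used = {i for b in batches for i in b}
    return len(used) / num_samples


if __name__ == "__main__":
    lens = [100, 200, 300, 900, 1200, 1500]  # made-up clip lengths in frames
    fb = frame_batches(lens, max_frames=1000)
    print(fb, coverage(fb, len(lens)))
    sb = sample_batches(len(lens), batch_size=2)
    print(sb, coverage(sb, len(lens)))
```

Running `coverage` on your own clip lengths (in mel frames) with your configured frame budget is a quick way to check whether a large share of the dataset is being skipped. Whether this is the mechanism behind the half-ignored dataset described above is not established by the thread.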


3 participants