
Question about multilingual support in F5-TTS models #745

Closed · 4 tasks done
ThomasLWang opened this issue Jan 24, 2025 · 1 comment
Labels
question Further information is requested

Comments

@ThomasLWang

Checks

  • This template is only for questions, not feature requests or bug reports.
  • I have thoroughly reviewed the project documentation and read the related paper(s).
  • I have searched existing issues, including closed ones, and found no similar questions.
  • I confirm that I am using English to submit this report in order to facilitate communication.

Question details

Hello,

I noticed in SHARED.md that there are separate models for different languages like German, French, and Japanese, and each model only generates one specific language.

If I want to build a translation model (e.g., supporting translation between 10 languages), is it possible to combine multiple languages into a single model? For example, can we create one model that supports multiple languages such as Chinese, English, French, German, Spanish, and Arabic?

Additionally, if combining multiple languages into one model is feasible, how should the training be approached? Should the training data for all languages simply be combined and used to train the model together, or are there any other specific techniques or considerations for training such a multilingual model?

Looking forward to your insights. Thank you!

@ThomasLWang ThomasLWang added the question Further information is requested label Jan 24, 2025
@Alykasym

Not sure about "translation", but you can create a multilingual model fairly easily. The base model already supports both English and Chinese.

Should the training data for all languages simply be combined and used to train the model together, or are there any other specific techniques or considerations for training such a multilingual model?

You can just merge all the languages you want into one dataset and train on it; there is no need to add language labels. The model will learn to differentiate between the languages based on their characters and grammatical patterns. Once training is finished, at inference time you can write a sentence in any language you trained on and it will generate speech in that language. A rough sketch of the merging step is below.
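For illustration, a minimal merging sketch in Python, assuming each per-language dataset is a plain metadata file of `audio_path|text` lines (the exact layout should follow whatever the F5-TTS data-prep scripts expect for your setup; all paths here are hypothetical):

```python
from pathlib import Path

# Hypothetical per-language metadata files, each listing `audio_path|text`
# pairs. Adjust the format to match the project's data-prep scripts.
LANG_METADATA = {
    "en": Path("data/en/metadata.csv"),
    "es": Path("data/es/metadata.csv"),
    "ja": Path("data/ja/metadata.csv"),
}

merged = Path("data/multilingual/metadata.csv")
merged.parent.mkdir(parents=True, exist_ok=True)

with merged.open("w", encoding="utf-8") as out:
    for lang, meta in LANG_METADATA.items():
        # No language tag is written: the model is expected to infer the
        # language from the text itself (script and grammatical patterns).
        for line in meta.read_text(encoding="utf-8").splitlines():
            if line.strip():
                out.write(line + "\n")
```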

P.S.: If you train a multilingual model (for example, English + Spanish + Japanese), don't expect it to imitate reference audio in a different language. For example, if you want to generate Spanish speech and you give English audio as the reference, it will generate Spanish speech with a heavy English accent.
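As a concrete illustration of that caveat, inference via the project's Python API (following the usage shown in the F5-TTS README) looks roughly like this; the file paths and reference text are placeholder assumptions, and the key point is keeping the reference audio in the same language as `gen_text`:

```python
from f5_tts.api import F5TTS

# Loads the default checkpoint; point this at your own multilingual
# checkpoint once training is done (constructor options may vary by version).
tts = F5TTS()

# Hypothetical files: a Spanish reference clip plus its transcript, used to
# generate Spanish text. Swapping in an English reference here would yield
# Spanish speech with a heavy English accent, as noted above.
wav, sr, spect = tts.infer(
    ref_file="ref_spanish_speaker.wav",
    ref_text="Texto exacto hablado en el clip de referencia.",
    gen_text="Hola, esto es una prueba del modelo multilingüe.",
    file_wave="out_es.wav",
)
```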

@SWivid SWivid closed this as completed Jan 29, 2025