
Question about multilingual support in F5-TTS models #745

Closed · 4 tasks done
ThomasLWang opened this issue Jan 24, 2025 · 1 comment
Labels
question Further information is requested

Comments

@ThomasLWang

Checks

  • This template is only for questions, not feature requests or bug reports.
  • I have thoroughly reviewed the project documentation and read the related paper(s).
  • I have searched existing issues, including closed ones, and found no similar questions.
  • I confirm that I am using English to submit this report in order to facilitate communication.

Question details

Hello,

I noticed in SHARED.md that there are separate models for different languages like German, French, and Japanese, and each model only generates one specific language.

If I want to build a translation model (e.g., supporting translation between 10 languages), is it possible to combine multiple languages into a single model? For example, can we create one model that supports multiple languages such as Chinese, English, French, German, Spanish, and Arabic?

Additionally, if combining multiple languages into one model is feasible, how should the training be approached? Should the training data for all languages simply be combined and used to train the model together, or are there any other specific techniques or considerations for training such a multilingual model?

Looking forward to your insights. Thank you!

@ThomasLWang ThomasLWang added the question Further information is requested label Jan 24, 2025
@Alykasym

Not sure about "translation", but you can create a multilingual model fairly easily. The base model already supports both English and Chinese.

Should the training data for all languages simply be combined and used to train the model together, or are there any other specific techniques or considerations for training such a multilingual model?

You can just merge all the languages you want into one dataset and train on it; there is no need to add language labels. The model will learn to differentiate between the languages based on their characters and grammatical patterns. Once training is finished, at inference time you can write a sentence in any language you trained on and it will generate speech in that language. A rough sketch of the merging step is below.
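For illustration, a minimal merging sketch in Python, assuming each per-language dataset is a plain metadata file of `audio_path|text` lines (the exact layout should follow whatever the F5-TTS data-prep scripts expect for your setup; all paths here are hypothetical):

```python
from pathlib import Path

# Hypothetical per-language metadata files, each listing `audio_path|text`
# pairs. Adjust the format to match the project's data-prep scripts.
LANG_METADATA = {
    "en": Path("data/en/metadata.csv"),
    "es": Path("data/es/metadata.csv"),
    "ja": Path("data/ja/metadata.csv"),
}

merged = Path("data/multilingual/metadata.csv")
merged.parent.mkdir(parents=True, exist_ok=True)

with merged.open("w", encoding="utf-8") as out:
    for lang, meta in LANG_METADATA.items():
        # No language tag is written: the model is expected to infer the
        # language from the text itself (script and grammatical patterns).
        for line in meta.read_text(encoding="utf-8").splitlines():
            if line.strip():
                out.write(line + "\n")
```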

P.S.: If you train a multilingual model (for example, English + Spanish + Japanese), don't expect it to imitate reference audio in a different language. For example, if you want to generate Spanish speech and you give English audio as the reference, it will generate Spanish speech with a heavy English accent.
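As a concrete illustration of that caveat, inference via the project's Python API (following the usage shown in the F5-TTS README) looks roughly like this; the file paths and reference text are placeholder assumptions, and the key point is keeping the reference audio in the same language as `gen_text`:

```python
from f5_tts.api import F5TTS

# Loads the default checkpoint; point this at your own multilingual
# checkpoint once training is done (constructor options may vary by version).
tts = F5TTS()

# Hypothetical files: a Spanish reference clip plus its transcript, used to
# generate Spanish text. Swapping in an English reference here would yield
# Spanish speech with a heavy English accent, as noted above.
wav, sr, spect = tts.infer(
    ref_file="ref_spanish_speaker.wav",
    ref_text="Texto exacto hablado en el clip de referencia.",
    gen_text="Hola, esto es una prueba del modelo multilingüe.",
    file_wave="out_es.wav",
)
```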

@SWivid SWivid closed this as completed Jan 29, 2025