Mimi structure and utilization #221

Open
1 task done
RinRin-32 opened this issue Feb 7, 2025 · 0 comments
Labels
question Further information is requested

Comments

@RinRin-32

Due diligence

  • I have done my due diligence in trying to find the answer myself.

Topic

The paper

Question

To my understanding, Mimi encodes audio into semantic-acoustic tokens with a single tokenizer. I'd like to ask whether Mimi has a discrete (finite) set of tokens, what the size of the codebook is, and whether it can be used in a TTS application. My main idea at the moment is to train a transformer-based model to generate semantic-acoustic tokens from input text, speaker, speaker attributes, etc.

RinRin-32 added the question label Feb 7, 2025