I have done my due diligence in trying to find the answer myself.
Topic
The paper
Question
To my understanding, Mimi encodes audio into semantic-acoustic tokens with a single tokenizer. I'd like to ask whether Mimi uses a finite, discrete token vocabulary, what the size of the codebook is, and whether it can be used in a TTS application. My current idea is to train a transformer-based model to generate semantic-acoustic tokens from input text, speaker identity, speaker attributes, etc.
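To make the idea more concrete, here is a rough sketch of how I imagine the tokens could be fed to a transformer, assuming the tokenizer emits several parallel codebook indices per audio frame. The values `NUM_CODEBOOKS` and `CODEBOOK_SIZE` and both helper functions are hypothetical placeholders for illustration, not Mimi's actual configuration or API; the real numbers are exactly what I am asking about.

```python
from typing import List

# Hypothetical values for illustration only; NOT Mimi's actual configuration.
NUM_CODEBOOKS = 4
CODEBOOK_SIZE = 1024


def flatten_codes(codes: List[List[int]]) -> List[int]:
    """Interleave per-codebook sequences (shape [K][T]) into one sequence.

    Each codebook's indices are shifted by k * CODEBOOK_SIZE so that the
    K codebooks occupy disjoint ranges of a single flat vocabulary.
    """
    num_frames = len(codes[0])
    flat = []
    for t in range(num_frames):
        for k in range(len(codes)):
            flat.append(k * CODEBOOK_SIZE + codes[k][t])
    return flat


def unflatten_codes(flat: List[int], num_codebooks: int = NUM_CODEBOOKS) -> List[List[int]]:
    """Invert flatten_codes: recover the [K][T] layout from the flat sequence."""
    codes: List[List[int]] = [[] for _ in range(num_codebooks)]
    for i, token in enumerate(flat):
        k = i % num_codebooks
        codes[k].append(token - k * CODEBOOK_SIZE)
    return codes


# Two frames, four codebooks each.
example = [[1, 2], [3, 4], [5, 6], [7, 8]]
flat = flatten_codes(example)
restored = unflatten_codes(flat)
```

Under these assumptions, a text-conditioned transformer would predict the flat sequence autoregressively over a vocabulary of `NUM_CODEBOOKS * CODEBOOK_SIZE` entries, and the unflattened codes would then go to the decoder to produce audio. Flattening is only one option; it multiplies the sequence length by the number of codebooks.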