Mimi structure and utilization #221

RinRin-32 · 2025-02-07T01:58:57Z

Due diligence

I have done my due diligence in trying to find the answer myself.

Topic

The paper

Question

To my understanding, Mimi encodes audio into semantic-acoustic tokens via a single tokenizer. I'd like to ask whether Mimi uses a finite, discrete set of tokens, what the size of the codebook is, and whether it can be used in a TTS application. My current idea is to train a transformer-based model to generate semantic-acoustic tokens from input text, speaker identity, speaker attributes, etc.
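A minimal sketch of the kind of token layout such a text-to-token model could be trained on, assuming the tokenizer emits a grid of integer codes (one row per codebook, one column per audio frame). The codebook count and codebook size below are placeholders chosen for illustration, not Mimi's actual values, and `flatten_codes` / `unflatten_codes` are hypothetical helpers, not part of any Mimi or Moshi API.

```python
from typing import List

# Placeholder values for illustration only; NOT Mimi's real configuration.
NUM_CODEBOOKS = 2
CODEBOOK_SIZE = 4


def flatten_codes(codes: List[List[int]], codebook_size: int) -> List[int]:
    """Interleave a (num_codebooks x num_frames) code grid frame by frame.

    A code from codebook k is shifted by k * codebook_size so that ids from
    different codebooks never collide in the shared vocabulary.
    """
    num_frames = len(codes[0])
    flat: List[int] = []
    for t in range(num_frames):
        for k, row in enumerate(codes):
            code = row[t]
            if not 0 <= code < codebook_size:
                raise ValueError(f"code {code} outside [0, {codebook_size})")
            flat.append(k * codebook_size + code)
    return flat


def unflatten_codes(flat: List[int], num_codebooks: int,
                    codebook_size: int) -> List[List[int]]:
    """Invert flatten_codes: recover the per-codebook rows."""
    if len(flat) % num_codebooks != 0:
        raise ValueError("sequence length must be a multiple of num_codebooks")
    codes: List[List[int]] = [[] for _ in range(num_codebooks)]
    for i, token in enumerate(flat):
        k = i % num_codebooks
        codes[k].append(token - k * codebook_size)
    return codes


if __name__ == "__main__":
    # Toy grid: 2 codebooks x 3 frames.
    grid = [[0, 3, 1],
            [2, 2, 0]]
    sequence = flatten_codes(grid, CODEBOOK_SIZE)
    print(sequence)                       # [0, 6, 3, 6, 1, 4]
    print(NUM_CODEBOOKS * CODEBOOK_SIZE)  # shared vocabulary size: 8
    assert unflatten_codes(sequence, NUM_CODEBOOKS, CODEBOOK_SIZE) == grid
```

Under these assumptions, a transformer conditioned on text and speaker features would predict ids from a vocabulary of `NUM_CODEBOOKS * CODEBOOK_SIZE` entries, and the predicted sequence would be unflattened before being passed to the audio decoder. Flattening is only one possible layout; it multiplies the sequence length by the number of codebooks.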

RinRin-32 added the "question" (Further information is requested) label on Feb 7, 2025
