Due diligence
I have done my due diligence in trying to find the answer myself.
Topic
The PyTorch implementation
Question
Hello!
First of all, congrats! I've been doing some research about open-source speech-to-speech models and yours is by far the most natural one – I'm really excited to see your upcoming developments!
My question is about the high latency I'm experiencing when I start the server with `python -m moshi.server` on a GCP VM instance with an L4 GPU. In the README.md, you state that Moshi achieves a theoretical latency of 160ms (80ms for the frame size of Mimi + 80ms of acoustic delay), with a practical overall latency as low as 200ms on an **L4 GPU**.
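For reference, here is the budget I understand those numbers to imply (a sketch; the headroom line is my own arithmetic on the README's figures, not a number the README states):

```python
# Latency budget implied by the README figures.
mimi_frame_ms = 80       # frame size of Mimi
acoustic_delay_ms = 80   # acoustic delay
theoretical_ms = mimi_frame_ms + acoustic_delay_ms  # 160 ms
practical_ms = 200       # "as low as 200 ms on an L4 GPU"
headroom_ms = practical_ms - theoretical_ms         # ~40 ms for compute/network

print(theoretical_ms, headroom_ms)  # 160 40
```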
As you can see in the image below, I'm experiencing latencies of up to 11s. The latency starts to increase as the conversation progresses, and I reached the 11s mark at about 1min 42s into the conversation.
Do you know what I'm doing wrong?
Note: I'm still a noob in these topics, but very excited and eager to learn!
Thank you in advance!
That's unexpected; our web infra moshi.chat is running on an L4 and has been all good. I suppose the model is properly running on the GPU, as otherwise the stats would be even worse. Any chance there is something else running on the server? You may want to try out `scripts/moshi_benchmark.py` to get some stats for the case where everything takes place locally - this will help determine whether the issue is the model not running in real time vs. some network hiccups.
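In case it helps, here is a minimal sketch of the kind of real-time check the benchmark does, with a hypothetical `step()` callable standing in for generating one 80ms frame (the actual measurement lives in `scripts/moshi_benchmark.py`):

```python
import time

import torch

FRAME_MS = 80.0  # Mimi frame size from the README


def time_frames(step, n_frames=100):
    """Time step() per frame; step is a hypothetical stand-in for
    generating one 80 ms audio frame."""
    times_ms = []
    for _ in range(n_frames):
        t0 = time.perf_counter()
        step()
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for GPU work so timings are honest
        times_ms.append((time.perf_counter() - t0) * 1000)
    return times_ms


# If max(time_frames(step)) stays below FRAME_MS, generation is real-time,
# and latency that grows over a conversation points at buffering on the
# network path rather than at the model.
```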
It throws the error below when I run `python3 scripts/moshi_benchmark.py`. I need to check why that is happening.
```
loading mimi
mimi loaded
loading moshi
Traceback (most recent call last):
  File "/home/brunovaz/speech-lms/moshi/scripts/moshi_benchmark.py", line 66, in <module>
    lm = loaders.get_moshi_lm(args.moshi_weight, args.device)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brunovaz/speech-lms/venv/lib/python3.11/site-packages/moshi/models/loaders.py", line 285, in get_moshi_lm
    model = LMModel(
            ^^^^^^^^
TypeError: moshi.models.lm.LMModel() argument after ** must be a mapping, not str
```
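Digging a little, that `TypeError` is raised by Python itself whenever `**` is applied to something that isn't a mapping, so the kwargs that `get_moshi_lm` passes to `LMModel` are presumably arriving as a raw string (maybe an unparsed config, or a version mismatch between the installed `moshi` package and the repo's scripts). A minimal sketch reproducing the same failure mode:

```python
def build(**kwargs):  # stands in for the LMModel(**...) call in loaders.py
    return kwargs

build(**{"dim": 512})   # fine: a dict is a mapping
build(**"config.json")  # TypeError: argument after ** must be a mapping, not str
```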
Nonetheless, if I use the Rust backend with `cargo run --features cuda --bin moshi-backend -r -- --config moshi-backend/config.json standalone`, the latencies on the L4 GPU are around 300ms – 500ms.