[Bug][Feature Request] Can't quantize vits models and ONNX quantization working slower than non-quantized version. #2991
Replies: 7 comments
-
Hello @mllopartbsc
-
Hi @Nanayeb34, I've just updated the post; I hope it makes sense now. Also, I think the reason the dynamically quantized model runs slower than the non-quantized version is that dynamic quantization adds overhead when the model contains many CNN layers. As for PyTorch's quantization, I still don't know the source of the error. Kind regards.
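One detail that supports this explanation: PyTorch's dynamic quantization only replaces the layer types you list (by default `nn.Linear` and the RNN family), so the convolutions that dominate a VITS decoder stay in float32. A minimal sketch with a toy conv-heavy model (hypothetical stand-in, not the actual VITS code):

```python
import torch
import torch.nn as nn

# Toy conv-heavy network standing in for the VITS decoder (hypothetical example).
model = nn.Sequential(
    nn.Conv1d(16, 32, kernel_size=3),
    nn.ReLU(),
    nn.Conv1d(32, 32, kernel_size=3),
    nn.ReLU(),
    nn.Linear(32, 10),
)

# Dynamic quantization only swaps out the layer types listed here;
# the Conv1d layers are left untouched in float32.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(type(quantized[0]))  # still a plain Conv1d
print(type(quantized[4]))  # a dynamically quantized Linear
```

So on a model where most of the compute is convolutional, the quantized Linear layers save little, while the per-call quantize/dequantize of activations adds overhead.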
-
@mllopartbsc I think I may have made some progress. Apparently, the quantization is failing because we can't make a deepcopy of the model parameters. I created a copy of my custom fine-tuned model and made
This succeeded and I was able to quantize the model. However, running inference with it produces the following error.
It seems the quantization is not being done properly, because the model state_dict keys are empty. You can find my code here and model here. The config file is here
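The "can't deepcopy the model parameters" failure can be reproduced in isolation: PyTorch refuses to deepcopy any tensor that was *computed* from a parameter (a non-leaf tensor with a `grad_fn`), and `quantize_dynamic` deepcopies the model internally. A minimal sketch with a toy module (not the actual VITS code) showing the error and the copy-plus-detach workaround described above:

```python
import copy
import torch
import torch.nn as nn

# Toy module: a tensor attribute computed from a parameter is a non-leaf
# tensor, and PyTorch cannot deepcopy non-leaf tensors.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Parameter(torch.randn(4, 4))
        self.weight = self.base * 2.0  # non-leaf: carries a grad_fn

m = Toy()
try:
    copy.deepcopy(m)
except RuntimeError as e:
    print("deepcopy failed:", e)

# Workaround matching the copy-and-detach approach: detach the offending
# tensor so it becomes a graph leaf, then the deepcopy succeeds.
m.weight = m.weight.detach()
copy.deepcopy(m)
```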
-
Hi @Nanayeb34, thank you for your efforts. Regarding this inference error, what would be the next steps toward a solution, and how could I help? I'll try reproducing your quantization to see what I can find. Do you know why the state_dict keys are not being stored properly? Also, the link you sent is not working. Kind regards.
-
Sorry about that, @mllopartbsc. I messed up the links; I have updated them now, so they should work. As for the state_dict not being stored, I am at a loss as to why. I tried something different by quantizing the default vits model that Coqui provides using your approach above, and the state_dict keys were missing as well.
-
Hi @Nanayeb34, from the code you provided, where exactly do you export the non-ONNX dynamically quantized model? And where do you import it back and load it as a checkpoint for inference?
-
Hi @mllopartbsc, this is not related to your thread, but for the first approach you posted: how can I save the audio that vits.inference_onnx returns as a WAV file?
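A sketch of one way to do this, assuming `vits.inference_onnx` returns a float NumPy array in [-1, 1] (a placeholder array is used below; the sample rate should come from the model's config.json):

```python
import wave

import numpy as np

# Assumption: vits.inference_onnx returns a float numpy array in [-1, 1].
# wav = vits.inference_onnx(text_inputs)
wav = np.random.uniform(-1, 1, 22050).astype(np.float32)  # placeholder audio

sample_rate = 22050  # take this from the model's config.json
pcm = (np.clip(np.squeeze(wav), -1.0, 1.0) * 32767).astype(np.int16)

with wave.open("output.wav", "wb") as f:
    f.setnchannels(1)        # mono
    f.setsampwidth(2)        # 16-bit PCM
    f.setframerate(sample_rate)
    f.writeframes(pcm.tobytes())
```

This uses only the standard-library `wave` module; `scipy.io.wavfile.write` or Coqui's own audio helpers would work just as well.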
-
Describe the bug
Hi everyone. I was trying to quantize a vits model, and I tried two approaches. On the one hand, I converted the model to ONNX and then applied dynamic quantization to it. On the other hand, I tried using PyTorch's dynamic quantization function on the model's checkpoint. The first method delivered inference runtimes more than twice as slow as the non-quantized model, while the second method failed with this error:
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
I was wondering if anyone has been able to successfully quantize a vits model and achieved better performance results for inference.
Kind Regards.
To Reproduce
This first code tries to quantize a VITS model that had previously been converted to ONNX.
In order to run this script you'll need to download the config.json and model_file.pth files from this link, save them in your preferred location, and change the paths.
You'll see that the quantized version runs much slower than the non-quantized version.
Here's the non-quantized script of the same model for comparison.
This third script tries to quantize a VITS model using PyTorch's dynamic quantization function.
When running this script, I get this error:
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
I already tried changing vits.py in order to detach all tensors in the state dict.
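A likely source of the deepcopy error in VITS specifically is weight normalization: VITS layers are wrapped in `weight_norm`, which replaces `weight` with a tensor computed from `weight_g`/`weight_v`, i.e. a non-leaf tensor that PyTorch cannot deepcopy (and `quantize_dynamic` deepcopies the model internally). A sketch on a single toy layer, not the actual VITS code:

```python
import copy
import torch
import torch.nn as nn
from torch.nn.utils import remove_weight_norm, weight_norm

# A weight-normed layer carries a computed (non-leaf) `weight` tensor.
layer = weight_norm(nn.Conv1d(4, 4, kernel_size=3))

try:
    copy.deepcopy(layer)
except RuntimeError as e:
    print("deepcopy fails:", e)

# Folding weight norm back into a plain leaf parameter fixes the deepcopy.
remove_weight_norm(layer)
copy.deepcopy(layer)  # now succeeds
```

If this is indeed the cause, calling the model's `remove_weight_norm`-style cleanup before quantizing (rather than detaching tensors one by one in vits.py) may be the simpler fix.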
Expected behavior
No response
Logs
No response
Environment
Additional context
These are the specifications of the system I'm running it on.