Candle is about 10x slower at evaluating this model on the CPU. I have provided a demonstration repository with all the code needed to reproduce.
Output of a typical run of python main.py:
Took 0.12951040267944336 seconds to evaluate
Output of a typical run of target/release/candle_issue_demo:
Took 1.016947847 seconds to evaluate Tensor[dims 1, 1536; f32]
This is unfortunate because loading the model from Rust is much faster than loading it from Python, and it would be nice to avoid the need for a server process when running feature extraction on demand.
I tried to keep the gist of the code the same between these, but the Rust version contains two necessary alterations:
The imagenet code from the examples crate is pasted into a module (it probably should be available within the candle_transformers crate, but this is an incredibly minor issue)
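For context, this is a minimal sketch of the kind of timing harness the demo binary uses; the model constructor is omitted and the placeholder op below is a stand-in, since the actual code (mmap'ed weight loading, imagenet preprocessing, forward pass) lives in the linked repository:

```rust
// Minimal sketch of a timing harness like the one in candle_issue_demo.
// The commented-out model call and the placeholder op are stand-ins for the
// repository's actual code.
use std::time::Instant;

use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;

    // Stand-in for the preprocessed imagenet image used by the demo.
    let image = Tensor::zeros((1, 3, 224, 224), DType::F32, &device)?;

    let start = Instant::now();
    // let features = model.forward(&image)?;   // the call being benchmarked
    let features = image.sum_keepdim(1)?;       // placeholder so the sketch compiles
    println!(
        "Took {} seconds to evaluate {:?}",
        start.elapsed().as_secs_f64(),
        features
    );
    Ok(())
}
```

The Python script presumably wraps its forward pass the same way, so the two reported numbers should be directly comparable.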
Just to give a few more timings from my Ryzen 9 7950X (32GB memory), running the inference multiple times:
The candle code in the repo runs in 0.33s per iteration. It's weird that it's so much faster than on your box.
When activating the mkl feature in all candle crates (see the Cargo.toml sketch below), the runtime goes down to 0.14s per iteration.
The pytorch version takes ~0.11s per iteration.
Not sure why there is so much of a discrepancy between your box and mine. Also note that the weights are mmap'ed, so the first iteration might be slower as the weights might only be copied from disk to memory at that point, though in practice I don't see much of a difference between iterations on my side.
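For reference, enabling mkl for every candle crate the demo depends on would look roughly like this in its Cargo.toml; the crate list and version numbers here are assumptions about the demo, not its actual manifest:

```toml
# Hypothetical Cargo.toml excerpt for the demo crate; versions are illustrative.
# Following the suggestion above, the "mkl" feature is enabled on each candle crate.
[dependencies]
candle-core = { version = "0.3", features = ["mkl"] }
candle-nn = { version = "0.3", features = ["mkl"] }
candle-transformers = { version = "0.3", features = ["mkl"] }
```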
My system specs:
CPU: Ryzen 9 5950X
RAM: 64GB