
DINOv2 model slow CPU evaluation #2682

Open
liamwhite opened this issue Dec 27, 2024 · 1 comment
liamwhite commented Dec 27, 2024

Candle is about 10x slower than PyTorch at evaluating this model on the CPU. I have provided a demonstration repository with all the code needed to reproduce the issue.

Output of a typical run of python main.py:

Took 0.12951040267944336 seconds to evaluate

Output of a typical run of target/release/candle_issue_demo:

Took 1.016947847 seconds to evaluate Tensor[dims 1, 1536; f32]
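For context, the timing on the Rust side presumably amounts to wrapping the forward pass with std::time::Instant; a minimal sketch of such a harness (the input shape and the forward closure are placeholders, not the demo repository's actual model code):

```rust
use std::time::Instant;

use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;

    // Placeholder 518x518 RGB input; the demo repository instead builds this with
    // the imagenet preprocessing code pasted from the candle examples crate.
    let image = Tensor::zeros((1, 3, 518, 518), DType::F32, &device)?;

    // Stand-in for the DINOv2 forward pass; the real demo runs the model built
    // from the facebook safetensors weights here.
    let forward = |x: &Tensor| -> Result<Tensor> { x.sum_all() };

    let start = Instant::now();
    let features = forward(&image)?;
    println!(
        "Took {} seconds to evaluate {features:?}",
        start.elapsed().as_secs_f64()
    );
    Ok(())
}
```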

This is unfortunate because loading the model from Rust is much faster than loading it from Python, and it would be nice to avoid the need for a server process when running feature extraction on demand.

I tried to keep the gist of the code the same between these, but the Rust version contains two necessary alterations:

  1. The imagenet code from the examples crate is pasted into a module (it probably should be available within the candle_transformers crate, but this is an incredibly minor issue)
  2. The dinov2 code is not designed for the facebook safetensors model, which uses different parameter names; the most significant difference is that qkv is split into separate query, key, and value tensors. This was addressed by pasting the dinov2 module from DinoV2 & Depth Anything V2: Bigger Models #2288 (c9ed473); a rough sketch of one way to handle the split weights follows this list.
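The pasted module handles this internally, but to illustrate the mismatch: when a checkpoint stores query/key/value as separate tensors, one way to recover the fused qkv projection that candle's original dinov2 code expects is to load the three weights and concatenate them. This is a sketch under assumed parameter names ("attn.q.weight" etc.), not the actual code from #2288:

```rust
use candle_core::{Result, Tensor};
use candle_nn::{Linear, VarBuilder};

/// Builds a fused qkv projection from a checkpoint that stores the attention
/// weights as separate query/key/value tensors. The parameter names below are
/// illustrative, not the actual names in the facebook safetensors file.
fn fused_qkv(vb: VarBuilder, dim: usize) -> Result<Linear> {
    let q_w = vb.get((dim, dim), "attn.q.weight")?;
    let k_w = vb.get((dim, dim), "attn.k.weight")?;
    let v_w = vb.get((dim, dim), "attn.v.weight")?;
    let q_b = vb.get(dim, "attn.q.bias")?;
    let k_b = vb.get(dim, "attn.k.bias")?;
    let v_b = vb.get(dim, "attn.v.bias")?;
    // Stack the three projections so a single matmul produces q, k and v.
    let weight = Tensor::cat(&[&q_w, &k_w, &v_w], 0)?;
    let bias = Tensor::cat(&[&q_b, &k_b, &v_b], 0)?;
    Ok(Linear::new(weight, Some(bias)))
}
```

Whether the projections are fused or kept separate is mostly a naming/layout choice; the numerics are the same either way.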

My system specs:
CPU: Ryzen 9 5950X
RAM: 64GB

@LaurentMazare
Collaborator

Just to give a few more timings with my Ryzen 9 7950X (32GB memory), running the inference multiple times:

  • The candle code in the repo runs in 0.33s per iteration. It's weird that it's so much faster than on your box.
  • When activating the mkl feature in all candle crates (see the Cargo.toml sketch after this list), runtime goes down to 0.14s per iteration.
  • The pytorch version takes ~0.11s per iteration.

Not sure why there is so much of a discrepancy between your box and mine. Also note that the weights are mmap'ed, so the first iteration might be slower since the weights might only be copied from disk to memory at that point, though in practice I don't see much of a difference between iterations on my side.
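For anyone reproducing this: activating mkl means enabling the feature on each candle dependency in the demo's Cargo.toml, roughly like the sketch below (version numbers are illustrative; depending on the candle version, the binary crate may also need a dependency on intel-mkl-src for linking):

```toml
[dependencies]
candle-core = { version = "0.8", features = ["mkl"] }
candle-nn = { version = "0.8", features = ["mkl"] }
candle-transformers = { version = "0.8", features = ["mkl"] }
```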
