While testing potential improvements to `IQ1_S_R4` quantization, I ran into NaNs while running a DeepSeek-Lite perplexity calculation. I had done a `grep -r` on a folder with many big files while the calculation was running, and suddenly got a NaN PPL. I repeated the calculation without doing anything else at the same time and the NaN did not happen. I then ran with 32 threads on a 16-core system and was able to reliably get a NaN at some random chunk. This means there is a race.

The race was most likely introduced in #79 (avoid repeating already done quantizations of activations). I honestly do not understand why there could be a race, and even less why it would only happen for DeepSeek-Lite quantized with `IQ1_S_R4`. I have done countless runs since #79 and never observed anything suspicious. Either way, this PR reverts #79. After the revert, there are no NaNs no matter how busy I make the system while running DeepSeek-Lite inference. Hopefully this will also fix the NaNs @saood06 gets with `IQ1_S_R4`-quantized DeepSeek-R1 (see the discussion in #185).
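
For what it's worth, a race of this kind does not need anything exotic: an "already quantized" flag that is checked and set without synchronization is enough. The sketch below is purely hypothetical (it is *not* the code added in #79, and the names `ActivationCache` / `quantize_if_needed` are made up); it only illustrates how two threads can both enter the quantization path, or how one thread can observe the flag as set before the quantized data behind it is fully written:

```cpp
// Hypothetical sketch, NOT the code from #79: a shared activation cache
// guarded only by a plain bool "already quantized" flag.
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

struct ActivationCache {
    bool quantized = false;        // plain bool: no happens-before guarantee
    std::vector<int8_t> qdata;     // filled by whichever thread quantizes first
};

static void quantize_if_needed(ActivationCache & c, const std::vector<float> & x) {
    if (!c.quantized) {            // both threads can pass this check, or ...
        c.qdata.assign(x.size(), 0);
        for (size_t i = 0; i < x.size(); ++i) {
            c.qdata[i] = (int8_t)(x[i] * 127.0f);
        }
        c.quantized = true;        // ... a reader can see the flag as true
    }                              // before the qdata writes become visible
    // Consumer path: may read partially written / concurrently resized qdata.
    long sum = 0;
    for (int8_t q : c.qdata) sum += q;
    std::printf("sum = %ld\n", sum);
}

int main() {
    ActivationCache cache;
    std::vector<float> x(4096, 0.5f);
    std::thread t1(quantize_if_needed, std::ref(cache), std::cref(x));
    std::thread t2(quantize_if_needed, std::ref(cache), std::cref(x));
    t1.join();
    t2.join();
}
```

Whether anything like this is what actually goes wrong after #79 is still unclear; the revert simply removes the shared state altogether.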