
llama : fix defrag logic #11707

Open: ggerganov wants to merge 2 commits into master from gg/llama-fix-defrag

Conversation

ggerganov (Owner):
While working on #11213 I realized that we are currently running many unnecessary defrag graphs because of incorrect KV cache fragmentation logic: the cache padding triggers the fragmentation threshold for small contexts even when there is no fragmentation at all.
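The problem is visible in how the fragmentation estimate is computed. Below is a minimal sketch of the before/after check, with `used`, `n`, and `padding` standing in for the actual `kv_self` fields, and the specific constants chosen for illustration; this is a sketch of the idea, not the exact patch:

```cpp
#include <algorithm>
#include <cstdint>

// Sketch of the defrag trigger. `used` is the number of occupied KV cells,
// `n` is the current cache extent, which is rounded up to a multiple of
// `padding` (e.g. 32, or 256 with flash attention enabled).
static bool should_defrag(uint32_t used, uint32_t n, uint32_t padding, float thold) {
    // Old logic: the padding cells at the end of the cache count as "holes",
    // so a small context (say used = 40, n = 256) looks ~84% fragmented
    // even though it contains no holes at all:
    //
    //   const float fragmentation = n >= 128 ? 1.0f - float(used)/float(n) : 0.0f;

    // Fixed logic: count the padding towards the used cells and skip small
    // contexts entirely, so only real holes contribute to the estimate:
    const float fragmentation = n >= 2048
        ? std::max(0.0f, 1.0f - float(used + padding)/float(n))
        : 0.0f;

    return fragmentation > thold;
}
```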

```sh
./scripts/compare-commits.sh master gg/llama-fix-defrag \
    -m models/llama-3.1-8b-instruct/ggml-model-q4_0.gguf \
    -m models/llama-3.1-8b-instruct/ggml-model-q8_0.gguf \
    -m models/llama-3.1-8b-instruct/ggml-model-f16.gguf \
    -m models/qwen2.5-3b-coder/ggml-model-q4_0.gguf \
    -m models/qwen2.5-3b-coder/ggml-model-q8_0.gguf \
    -m models/qwen2.5-3b-coder/ggml-model-f16.gguf \
    -fa 1
```
| Model         | Test  | t/s master | t/s gg/llama-fix-defrag | Speedup |
| ------------- | ----- | ---------- | ----------------------- | ------- |
| llama 8B F16  | pp512 | 1458.51    | 1458.18                 | 1.00    |
| llama 8B F16  | tg128 | 38.82      | 39.19                   | 1.01    |
| llama 8B Q4_0 | pp512 | 1324.28    | 1323.85                 | 1.00    |
| llama 8B Q4_0 | tg128 | 99.55      | 101.37                  | 1.02    |
| llama 8B Q8_0 | pp512 | 1298.42    | 1298.34                 | 1.00    |
| llama 8B Q8_0 | tg128 | 66.23      | 66.99                   | 1.01    |
| qwen2 3B F16  | pp512 | 3226.49    | 3226.91                 | 1.00    |
| qwen2 3B F16  | tg128 | 71.26      | 72.44                   | 1.02    |
| qwen2 3B Q4_0 | pp512 | 2927.50    | 2925.14                 | 1.00    |
| qwen2 3B Q4_0 | tg128 | 138.02     | 142.55                  | 1.03    |
| qwen2 3B Q8_0 | pp512 | 2880.21    | 2878.93                 | 1.00    |
| qwen2 3B Q8_0 | tg128 | 108.89     | 112.35                  | 1.03    |

master has the following patch applied, so that the benchmarks exercise the defrag path on both branches:

```diff
diff --git a/examples/llama-bench/llama-bench.cpp b/examples/llama-bench/llama-bench.cpp
index 4ac19ca86..8e9f90f27 100644
--- a/examples/llama-bench/llama-bench.cpp
+++ b/examples/llama-bench/llama-bench.cpp
@@ -753,6 +753,7 @@ struct cmd_params_instance {
         cparams.offload_kqv = !no_kv_offload;
         cparams.flash_attn  = flash_attn;
         cparams.embeddings  = embeddings;
+        cparams.defrag_thold = 0.1f;

         return cparams;
     }
```
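For context, `defrag_thold` is the context parameter that enables the automatic defrag pass (a negative value, the default, disables it), which is why the patch above has to force it on for llama-bench. A minimal sketch of how an application would set it when creating a context, assuming the pre-deprecation llama.h API of this era and a placeholder model path:

```cpp
#include "llama.h"

int main() {
    // load a model (path is a placeholder)
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (!model) {
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    // defrag_thold < 0 disables automatic defrag (the default);
    // 0.1f means: defrag when the estimated KV cache fragmentation exceeds 10%
    cparams.defrag_thold = 0.1f;

    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // ... run decoding as usual; defrag is now triggered automatically ...

    llama_free(ctx);
    llama_free_model(model);
    return 0;
}
```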
