From d7d0574378196d349ee2f9c4ea4ff50318a15534 Mon Sep 17 00:00:00 2001
From: Bar David
Date: Wed, 27 Apr 2022 09:44:12 +0300
Subject: [PATCH] adding some roadmap to dedupe

Signed-off-by: Bar David
---
 DEDUPE-TODO | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/DEDUPE-TODO b/DEDUPE-TODO
index 4b0bfd1d62..29b940b5b9 100644
--- a/DEDUPE-TODO
+++ b/DEDUPE-TODO
@@ -14,3 +14,22 @@
 The storage subsystem usually identifies the similar buffers using
 locality-sensitive hashing or other methods.
 
+- Varying compression ratios on a single job.
+  We could accept a list of 2-tuples in the form
+  [(probability, compression_ratio), ...] such that compression ratios
+  are generated according to their configured probability.
+
+- Rework verification with dedupe and compression.
+
+- Reduce the memory required to manage the dedupe_working_set.
+  Currently we maintain a seed (12-16 bytes) per page in the
+  working set, so large files waste a lot of memory.
+  Either leverage disk space for that, or recalculate the seeds during
+  the buffer generation phase.
+
+- Dedupe hot spots.
+  Maintain different probabilities within the dedupe_working_set so that, when
+  generating dedupe buffers, we choose the seeds non-uniformly in order to
+  better simulate real-world use cases.
+
+- Add examples of fio jobs utilizing deduplication and/or compression.
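
Note on the "varying compression ratios" item: below is a minimal sketch of the proposed weighted selection, assuming percentage-based probabilities. The names comp_ratio_entry and pick_compress_pct are hypothetical, not existing fio code.

#include <stddef.h>

/* One hypothetical (probability, compression_ratio) entry.
 * Probabilities are in percent and are expected to sum to 100. */
struct comp_ratio_entry {
	unsigned int prob_pct;       /* selection probability, in percent */
	unsigned int compress_pct;   /* compression ratio to apply to the buffer */
};

/*
 * Pick a compression ratio for the next buffer by walking the cumulative
 * probability distribution. 'r' is a uniform random value in [0, 100).
 */
static unsigned int pick_compress_pct(const struct comp_ratio_entry *entries,
				       size_t nr, unsigned int r)
{
	unsigned int cumulative = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		cumulative += entries[i].prob_pct;
		if (r < cumulative)
			return entries[i].compress_pct;
	}

	/* Fall back to the last entry if the probabilities don't sum to 100. */
	return nr ? entries[nr - 1].compress_pct : 0;
}

With an input such as [(20, 80), (80, 20)], roughly 20% of buffers would be generated at an 80% compression ratio and the remaining 80% at a 20% ratio.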
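
Note on the "reduce memory" item: recalculating seeds instead of storing them only requires that a page's seed be a deterministic function of a job-level base seed and the page index. A minimal sketch, using a splitmix64-style mix for illustration; this is not fio's actual seed derivation.

#include <stdint.h>

/*
 * Illustrative only: derive the seed for page 'index' of the dedupe working
 * set from a single job-level base seed instead of storing one seed per page.
 */
static uint64_t working_set_seed(uint64_t base_seed, uint64_t index)
{
	uint64_t z = base_seed + index * 0x9e3779b97f4a7c15ULL;

	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return z ^ (z >> 31);
}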
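
Note on the "dedupe hot spots" item: a minimal sketch of a non-uniform index pick with a tunable skew parameter. pick_hot_seed_index is hypothetical; fio's existing zipf/pareto generators for random offsets could equally well drive the seed choice.

#include <math.h>
#include <stddef.h>

/*
 * Illustrative only: map a uniform random value u in [0, 1) to an index in
 * [0, nr) with a simple power-law skew, so low indexes ("hot" seeds) are
 * picked much more often than high ones. theta = 0 degenerates to a uniform
 * choice; larger theta concentrates more picks on the first entries.
 */
static size_t pick_hot_seed_index(double u, size_t nr, double theta)
{
	double skewed = pow(u, 1.0 + theta);
	size_t idx = (size_t)(skewed * nr);

	return idx < nr ? idx : nr - 1;
}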