This project provides a C++ implementation of a Binary Fuse Filter (BFF) for key-value maps. This data structure allows for efficient storage and retrieval of values associated with keys, offering a balance between space efficiency and query speed. Unlike traditional hash tables, it enables value reconstruction during queries, eliminating the need to store values explicitly.
The BFF-for-KV-Map library offers:
- Creation: Constructs a BFF from a set of keys and their corresponding values. It employs a randomized construction algorithm to ensure a high probability of successful filter creation.
- Serialization: Serializes the filter into a byte array for storage or transmission.
- Deserialization: Reconstructs a BFF from its serialized byte representation.
- Recovery: Retrieves the value associated with a given key. The value is reconstructed from the filter's internal state, not directly retrieved from storage.
- Metrics: Provides methods to obtain the bits-per-entry and serialized size of the filter.
Using this implementation of Binary Fuse Filter for KV Maps, on AWS EC2 instance m8g.large
, it takes
Operation | Time Taken |
---|---|
Construct Filter over KV Map with 1M keys | 1.44s |
Recover value from this filter, corresponding to a queried key | 43.3ns |
___ | ___ |
Construct Filter over KV Map with 10M keys | 23.6s |
Recover value from this filter, corresponding to a queried key | 150ns |
See detailed benchmark results in bench_result_on_Linux_6.8.0-1018-aws_aarch64_with_g++_13.json.
Include the necessary header files in your C++ code:
#include "binary_fuse_filter/filter_for_kv_map.hpp"
#include "binary_fuse_filter/utils.hpp"
The library uses the following key structure:
struct bff_kv_map_utils::bff_key_t {
std::array<uint64_t, 4> words{};
};
You need to represent your keys using this structure.
Construct a BFF from your key-value pairs:
#include <random>
#include <array>
// ... (Your key-value data) ...
std::array<uint8_t, 32> seed{};
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> distrib(0, 255);
for (int n = 0; n < 32; ++n) {
seed[n] = distrib(gen);
}
std::vector<bff_kv_map_utils::bff_key_t> keys = { /* ... your keys ... */ };
std::vector<uint32_t> values = { /* ... your values ... */ };
uint64_t plaintext_modulo = 1024; // Choose an appropriate value. Must be >= 256.
uint64_t label = 12345; // Choose an arbitrary label.
bff_kv_map::bff_for_kv_map_t bff(seed, keys, values, plaintext_modulo, label);
Retrieve a value using its key:
bff_kv_map_utils::bff_key_t query_key = { /* ... your query key ... */ };
uint32_t recovered_value = bff.recover(query_key);
Serialize the BFF to a byte array:
const size_t serialized_size = bff.serialized_num_bytes();
std::vector<uint8_t> serialized_bff(serialized_size, 0);
bff.serialize(serialized_bff);
Deserialize the BFF from the byte array:
bff_kv_map::bff_for_kv_map_t deserialized_bff(serialized_bff);
Note
I maintain an example program @ bff_for_kv_map.cpp, demonstrating usage of the Binary Fuse Filter for Key-Value Maps. You can build and run that program by issuing
make example -j
Number of keys: 100000
Plaintext modulo: 1024
Bits per entry: 11
Serialized size: 475204 bytes
All values recovered correctly !
This project uses a Makefile for building. Make sure you have a C++20 compiler (like g++ or clang++), Google Benchmark, and Google Test installed.
Note
You can run $ make
to show a help output, covering all available Make commands.
- Run Tests: To run the tests, use the following commands:
make test -j # Runs tests in release mode.
make debug_asan_test -j # Runs tests in debug mode with AddressSanitizer (detects memory errors). Requires AddressSanitizer to be properly configured in your compiler.
make release_asan_test -j # Runs tests in release mode with AddressSanitizer.
make debug_ubsan_test -j # Runs tests in debug mode with UndefinedBehaviorSanitizer (detects undefined behavior). Requires UndefinedBehaviorSanitizer to be properly configured in your compiler.
make release_ubsan_test -j # Runs tests in release mode with UndefinedBehaviorSanitizer.
- Run Benchmarks: To run the benchmarks, use these commands:
make benchmark # Runs benchmarks without detailed CPU cycle counting.
make perf # Runs benchmarks with CPU cycle counting (requires `libpfm4` to be installed: `sudo apt-get install libpfm4`).
- C++20 compiler (g++, clang++)
- Google Benchmark, see this
- Google Test, see this
- (Optional for
make perf
)libpfm4
- The random seed is crucial for filter construction. Using a cryptographically secure random number generator is recommended for production environments.
- Error handling is included to catch issues like non-unique keys and invalid parameter values.
This README provides a basic overview. Refer to the source code for detailed implementation specifics and advanced usage options.