diff --git a/Architecture.md b/Architecture.md deleted file mode 100644 index ff2b24a..0000000 --- a/Architecture.md +++ /dev/null @@ -1,41 +0,0 @@ -# Architecture - -## Storage Engine Public APIs - -- `put(key, value)`: store a key-value pair in the LSM tree. -- `delete(key)`: remove a key and its corresponding value. -- `get(key)`: get the value corresponding to a key. -- `scan(range)`: get a range of key-value pairs. - -Internal APIs -- `flush()`: ensure all the operations before sync are persisted to the disk. - -## Data Flow - -### Write Flow - -1. Write the key-value pair to WAL (write ahead log), so that it can be recovered if storage engine crashes. -2. Write the key-value pair to memtable. After writing to WAL and MemTable completes, we can notify user that the write operation is completed. -3. When a memtable is full, flush it to the disk as an SST file in the background. -4. Compact files into lower levels to maintain a good shape for the LSM Tree, so that the read amplification is low. - -### Read Flow - -1. Probe all the memtables from latest to oldest. -2. If the key is not found, we will then search the entire LSM tree containing SSTs to find the data. - ---- - -## LSM Tree features - -1. Data are immutable on persistent storage, which means that it is easier to offload the background tasks (compaction) to remote servers. It is also feasible to directly store and serve data from cloud-native storage systems like S3. -2. An LSM tree can balance between read, write and space amplification by changing the compaction algorithm. The data structure itself is super versatile and can be optimized for different workloads. - -## LSM Tree vs B-Tree - -In RB-Tree and B-Tree, all values are overwritten at it's original memory or disk space when we update the value corresponding to the the key. -But in LSM Tree, all write operations, i.e., insert, update, delete, are performed in somewhere else. 
-This operations will be batched into SST (sorted string table) files and can be written to the disk. -Once written to the disk, the file will not be changed. -These operations are applied lazily on disk with a special task called **compaction**. -The compaction will merge multiple SST files and remove unused data. \ No newline at end of file diff --git a/Cargo.toml b/Cargo.toml index ee3cfb9..f05870d 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -2,7 +2,7 @@ name = "lsmdb" description = "lsmdb is an efficient storage engine that implements the Log-Structured Merge Tree (LSM-Tree) data structure, designed specifically for handling key-value pairs." authors = ["Nrishinghananda Roy "] -version = "0.3.0" +version = "0.4.0" categories = ["data-structures", "database-implementations"] keywords = ["database", "key-value-store", "key-value", "kv", "db"] repository = "https://github.com/roynrishingha/lsmdb" @@ -18,5 +18,8 @@ rustdoc-args = ["--document-private-items"] [lib] path = "src/lib.rs" +[dependencies] +bit-vec = "0.6.3" +chrono = "0.4.26" + [dev-dependencies] -rand = "0.8" diff --git a/README.md b/README.md index a3b5370..e461394 100644 --- a/README.md +++ b/README.md @@ -1,79 +1,699 @@ -# LSMDB +# lsmdb [github](https://github.com/roynrishingha/lsmdb) [crates.io](https://crates.io/crates/lsmdb) [docs.rs](https://docs.rs/lsmdb) -lsmdb is an efficient storage engine that implements the Log-Structured Merge Tree (LSM-Tree) data structure, designed specifically for handling key-value pairs. +`lsmdb` is an LSM (Log-Structured Merge) Tree storage engine: a disk-based engine designed for efficient read and write operations. +It utilizes a combination of in-memory and on-disk data structures to provide fast data retrieval and durability.
-## Components of LSM Tree +## Public API -The components of the LSM Tree Storage Engine -### MemTable +### `StorageEngine` -In a LSM Tree storage engine, the MemTable is an in-memory data structure that holds recently updated key-value pairs. It is implemented as a sorted table-like structure, typically using a skip list or a sorted array, for efficient key lookup and range queries. +The `StorageEngine` struct represents the main component of the LSM Tree storage engine. It consists of the following fields: -The MemTable serves as the first level of data storage in the LSM Tree and is responsible for handling write operations. When a write operation (such as a key-value pair insertion or update) is performed, the corresponding key-value pair is added to the MemTable. The MemTable allows fast write operations since it resides in memory, which provides low-latency access. +- `memtable`: An instance of the `MemTable` struct that serves as an in-memory table for storing key-value pairs. It provides efficient write operations. +- `wal`: An instance of the `WriteAheadLog` struct that handles write-ahead logging. It ensures durability by persistently storing write operations before they are applied to the memtable and SSTables. +- `sstables`: A vector of `SSTable` instances, which are on-disk sorted string tables storing key-value pairs. The SSTables are organized in levels, where each level contains larger and more compacted tables. +- `dir`: An instance of the `DirPath` struct that holds the directory paths for the root directory, write-ahead log directory, and SSTable directory. -However, since the MemTable is in memory, its capacity is limited. As the MemTable fills up, its contents need to be periodically flushed to disk to make space for new updates. This process is typically triggered based on a certain size threshold or number of updates. 
+The `StorageEngine` struct provides methods for interacting with the storage engine: -#### Role of Memory Table +- `new`: Creates a new instance of the `StorageEngine` struct. It initializes the memtable, write-ahead log, and SSTables. +- `put`: Inserts a new key-value pair into the storage engine. It writes the key-value entry to the memtable and the write-ahead log. If the memtable reaches its capacity, it is flushed to an SSTable. +- `get`: Retrieves the value associated with a given key from the storage engine. It first searches in the memtable, which has the most recent data. If the key is not found in the memtable, it searches in the SSTables, starting from the newest levels and moving to the older ones. +- `remove`: Removes a key-value pair from the storage engine. It first checks if the key exists in the memtable. If not, it searches for the key in the SSTables and removes it from there. The removal operation is also logged in the write-ahead log for durability. +- `update`: Updates the value associated with a given key in the storage engine. It first removes the existing key-value pair using the `remove` method and then inserts the updated pair using the `put` method. +- `clear`: Clears the storage engine by deleting the memtable and write-ahead log. It creates a new instance of the storage engine, ready to be used again. -The `MemTable` is a crucial component in the implementation of an LSM Tree-based storage engine. It represents an in-memory table that holds key-value entries before they are written to disk. +### DirPath +The `DirPath` struct represents the directory paths used by the storage engine. It consists of the following fields: -The main purpose of the `MemTable` is to provide efficient write operations by buffering the incoming key-value pairs in memory before flushing them to disk. By keeping the data in memory, write operations can be performed at a much higher speed compared to directly writing to disk. 
+- `root`: A `PathBuf` representing the root directory path, which serves as the parent directory for the write-ahead log and SSTable directories. +- `wal`: A `PathBuf` representing the write-ahead log directory path, where the write-ahead log file is stored. +- `sst`: A `PathBuf` representing the SSTable directory path, where the SSTable files are stored. -Here are the main functions and responsibilities of the `MemTable`: +The `DirPath` struct provides methods for building and retrieving the directory paths. -1. **Insertion of key-value entries**: The `MemTable` allows for the insertion of key-value entries. When a new key-value pair is added to the `MemTable`, it is typically appended to an in-memory data structure, such as a sorted array or a skip list. The structure is designed to provide fast insertion and lookup operations. +### SizeUnit -2. **Lookup of key-value entries**: The `MemTable` provides efficient lookup operations for retrieving the value associated with a given key. It searches for the key within the in-memory data structure to find the corresponding value. If the key is not found in the `MemTable`, it means the key is not present or has been deleted. +The `SizeUnit` enum represents the unit of measurement for capacity and size. It includes the following variants: -3. **Deletion of key-value entries**: The `MemTable` supports the deletion of key-value entries. When a deletion operation is performed, a special marker or tombstone is added to indicate that the key is deleted. This allows for efficient handling of deletions and ensures that the correct value is returned when performing lookups. +- `Bytes`: Represents the byte unit. +- `Kilobytes`: Represents the kilobyte unit. +- `Megabytes`: Represents the megabyte unit. +- `Gigabytes`: Represents the gigabyte unit. -4. **Flush to disk**: Periodically, when the `MemTable` reaches a certain size or based on a predefined policy, it is necessary to flush the contents of the `MemTable` to disk. 
This process involves writing the key-value pairs (including tombstones) to a disk-based storage structure, such as an SSTable (Sorted String Table), which provides efficient read operations. +The `SizeUnit` enum provides a method `to_bytes` for converting a given value to bytes based on the selected unit. -It's important to note that the `MemTable` is a volatile data structure that resides in memory, and its contents are not durable. To ensure durability and crash recovery, the data in the `MemTable` is periodically flushed to disk and also logged in a write-ahead log (WAL). +### Helper Functions +The code includes several helper functions: -Overall, the `MemTable` plays a crucial role in the LSM Tree storage engine by providing fast write operations and buffering data in memory before flushing it to disk for persistence. +- `with_capacity`: A helper function that creates a new instance of the `StorageEngine` struct with a specified capacity for the memtable. +- `with_capacity_and_rate`: A helper function + that creates a new instance of the `StorageEngine` struct with a specified capacity for the memtable and a compaction rate for the SSTables. +- `flush_memtable`: A helper function that flushes the contents of the memtable to an SSTable. It creates a new SSTable file and writes the key-value pairs from the memtable into it. After flushing, the memtable is cleared. +- `recover_memtable`: A helper function that recovers the contents of the memtable from the write-ahead log during initialization. It reads the logged write operations from the write-ahead log and applies them to the memtable. -### Write-Ahead Log (WAL) +These helper functions assist in initializing the storage engine, flushing the memtable to an SSTable when it reaches its capacity, and recovering the memtable from the write-ahead log during initialization, ensuring durability and maintaining data consistency. 
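The `SizeUnit` conversion used for the capacity and size calculations above can be sketched as follows. This is a hypothetical illustration assuming binary (1024-based) units, not necessarily the crate's exact implementation:

```rs
// Hypothetical sketch of the SizeUnit enum and its to_bytes helper.
// Assumes binary (1024-based) units; the actual crate may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SizeUnit {
    Bytes,
    Kilobytes,
    Megabytes,
    Gigabytes,
}

impl SizeUnit {
    /// Convert a value expressed in this unit into bytes.
    pub fn to_bytes(&self, value: usize) -> usize {
        match *self {
            SizeUnit::Bytes => value,
            SizeUnit::Kilobytes => value * 1024,
            SizeUnit::Megabytes => value * 1024 * 1024,
            SizeUnit::Gigabytes => value * 1024 * 1024 * 1024,
        }
    }
}

fn main() {
    // Under these assumptions, a 1 GB default memtable capacity is:
    assert_eq!(SizeUnit::Gigabytes.to_bytes(1), 1_073_741_824);
    assert_eq!(SizeUnit::Kilobytes.to_bytes(2), 2048);
}
```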
-The Write-Ahead Log (WAL) is a persistent storage mechanism used in conjunction with the MemTable. It is an append-only log file that records all the write operations before they are applied to the MemTable. The WAL ensures durability by persistently storing all modifications to the database. +--- -When a write operation is performed, the corresponding key-value pair modification is first written to the WAL. This ensures that the modification is durably stored on disk before being applied to the MemTable. Once the write is confirmed in the WAL, the modification is applied to the MemTable in memory. +## MemTable -The WAL provides crash recovery capabilities for the LSM Tree. In the event of a system crash or restart, the LSM Tree can replay the write operations recorded in the WAL to restore the MemTable to its last consistent state. This guarantees that no data is lost or corrupted during system failures. +The `MemTable` (short for memory table) is an in-memory data structure that stores recently written data before it is flushed to disk. It serves as a write buffer and provides fast write operations. -Additionally, the WAL can also improve the performance of read operations. By storing the modifications sequentially in the WAL, disk writes can be performed more efficiently, reducing the impact of random disk access and improving overall throughput. +### Dependencies -#### Role of WAL +The implementation requires the following dependencies to be imported from the standard library: -The WAL (Write-Ahead Log) is a fundamental component of many database systems, including those that use the LSM Tree storage engine. It is a mechanism used to ensure durability and crash recovery by providing a reliable log of write operations before they are applied to the main data storage. +- **`std::collections::BTreeMap`**: A balanced binary search tree implementation that stores key-value pairs in sorted order. 
+- **`std::io`**: Provides input/output functionality, including error handling. +- **`std::sync::{Arc, Mutex}`**: Provides thread-safe shared ownership(`Arc`) and mutual exclusion (`Mutex`) for concurrent access to data. -Here's how the WAL works and what it does: +### Constants -1. **Logging write operations**: Whenever a write operation (insertion, update, deletion) is performed on the database, the WAL captures the details of the operation in a sequential log file. This log entry includes the necessary information to reconstruct the write operation, such as the affected data item, the type of operation, and any associated metadata. +The implementation defines the following constants: -2. **Write ordering**: The WAL enforces a "write-ahead" policy, meaning that the log entry must be written to disk before the corresponding data modification is applied to the main data storage. This ordering ensures that the log contains a reliable record of all changes before they are committed. +#### `DEFAULT_MEMTABLE_CAPACITY` -3. **Durability guarantee**: By writing the log entry to disk before the actual data modification, the WAL provides durability guarantees. Even if the system crashes or experiences a power failure, the log entries are preserved on disk. Upon recovery, the system can use the log to replay and apply the previously logged operations to restore the database to a consistent state. +Represents the default maximum size of the MemTable. By default, it is set to 1 gigabyte (1GB). +```rs +pub(crate) static DEFAULT_MEMTABLE_CAPACITY: usize = SizeUnit::Gigabytes.to_bytes(1); +``` -4. **Crash recovery**: During system recovery after a crash, the WAL is consulted to bring the database back to a consistent state. The system replays the log entries that were not yet applied to the main data storage before the crash. This process ensures that all the write operations that were logged but not yet persisted are correctly applied to the database. 
+#### `DEFAULT_FALSE_POSITIVE_RATE` -5. **Log compaction**: Over time, the WAL log file can grow in size, which may impact the performance of the system. To address this, log compaction techniques can be applied to periodically consolidate and remove unnecessary log entries. This process involves creating a new log file by compacting the existing log and discarding the obsolete log entries. +Represents the default false positive rate for the Bloom filter used in the `MemTable`. By default, it is set to 0.0001 (0.01%). -The use of a WAL provides several benefits for database systems, including improved durability, crash recovery, and efficient write performance. It ensures that modifications to the database are reliably captured and persisted before being applied to the main data storage, allowing for consistent and recoverable operations even in the event of failures. +```rs +pub(crate) static DEFAULT_FALSE_POSITIVE_RATE: f64 = 0.0001; +``` -### SSTables (Sorted String Tables) +### Structure -SSTables are disk-based storage structures that store sorted key-value pairs. When the MemTable is flushed to disk, it is typically written as an SSTable file. Multiple SSTables may exist on disk, each representing a previous MemTable state. SSTables are immutable, which simplifies compaction and allows for efficient read operations. +The **`MemTable`** structure represents the in-memory data structure and contains the following fields: -### Compaction +```rs +pub(crate) struct MemTable { + entries: Arc<Mutex<BTreeMap<Vec<u8>, Vec<u8>>>>, + entry_count: usize, + size: usize, + capacity: usize, + bloom_filter: BloomFilter, + size_unit: SizeUnit, + false_positive_rate: f64, +} +``` -Compaction is the process of merging and organizing multiple SSTables to improve read performance and reclaim disk space. It involves merging overlapping key ranges from different SSTables and eliminating duplicate keys.
Compaction reduces the number of disk seeks required for read operations and helps maintain an efficient data structure. +#### `entries` -### Bloom Filters +The `entries` field is of type `Arc<Mutex<BTreeMap<Vec<u8>, Vec<u8>>>>`. It holds the key-value pairs of the `MemTable` in sorted order using a `BTreeMap`. The `Arc` (Atomic Reference Counting) and `Mutex` types allow for concurrent access and modification of the `entries` data structure from multiple threads, ensuring thread safety. -Bloom filters are probabilistic data structures that provide efficient and fast membership tests. They are used to reduce the number of disk accesses during read operations by quickly determining whether a specific key is present in an SSTable. Bloom filters can help improve read performance by reducing unnecessary disk reads. +#### `entry_count` + +The `entry_count` field is of type `usize` and represents the number of key-value entries currently stored in the `MemTable`. + +#### `size` + +The `size` field is of type `usize` and represents the current size of the `MemTable` in bytes. It is updated whenever a new key-value pair is added or removed. + +#### `capacity` + +The `capacity` field is of type `usize` and represents the maximum allowed size for the `MemTable` in bytes. It is used to enforce size limits and trigger flush operations when the `MemTable` exceeds this capacity. + +#### `bloom_filter` + +The `bloom_filter` field is of type `BloomFilter` and is used to probabilistically determine whether a `key` may exist in the `MemTable` without accessing the `entries` map. It helps improve performance by reducing unnecessary lookups in the map. + +#### `size_unit` + +The `size_unit` field is of type `SizeUnit` and represents the unit of measurement used for `capacity` and `size` calculations. It allows for flexibility in specifying the capacity and size of the `MemTable` in different units (e.g., bytes, kilobytes, megabytes, etc.).
+ +#### `false_positive_rate` + +The `false_positive_rate` field is of type `f64` and represents the desired false positive rate for the Bloom filter. It determines the trade-off between memory usage and the accuracy of the Bloom filter. + +### Constructor Methods + +#### `new` + +```rs +pub(crate) fn new() -> Self +``` + +The `new` method creates a new `MemTable` instance with the default capacity. It internally calls the `with_capacity_and_rate` method, passing the default capacity and false positive rate. + +#### `with_capacity_and_rate` + +```rs +pub(crate) fn with_capacity_and_rate( + size_unit: SizeUnit, + capacity: usize, + false_positive_rate: f64, +) -> Self +``` + +The `with_capacity_and_rate` method creates a new `MemTable` with the specified capacity, size unit, and false positive rate. It initializes the `entries` field as an empty `BTreeMap`, sets the `entry_count` and `size` to zero, and creates a new `BloomFilter` with the given capacity and false positive rate. The capacity is converted to bytes based on the specified size unit. + +### Public Methods + +#### `set` + +```rs +pub(crate) fn set(&mut self, key: Vec<u8>, value: Vec<u8>) -> io::Result<()> +``` + +The `set` method inserts a new key-value pair into the `MemTable`. It first acquires a lock on the `entries` field to ensure thread safety. If the key is not present in the `BloomFilter`, it adds the key-value pair to the `entries` map, updates the `entry_count` and `size`, and sets the key in the `BloomFilter`. If the key already exists, an `AlreadyExists` error is returned. + +#### `get` + +```rs +pub(crate) fn get(&self, key: Vec<u8>) -> io::Result<Option<Vec<u8>>> +``` + +The `get` method retrieves the value associated with a given key from the `MemTable`. It first checks if the key is present in the `BloomFilter`. If it is, it acquires a lock on the `entries` field and returns the associated value. If the key is not present in the `BloomFilter`, it returns `None`.
+ +#### `remove` + +```rs +pub(crate) fn remove(&mut self, key: Vec<u8>) -> io::Result<Option<(Vec<u8>, Vec<u8>)>> +``` + +The `remove` method removes a key-value pair from the `MemTable` based on a given key. It first checks if the key is present in the `BloomFilter`. If it is, it acquires a lock on the `entries` field and removes the key-value pair from the `entries` map. It updates the `entry_count` and `size` accordingly and returns the removed key-value pair as a tuple. If the key is not present in the `BloomFilter`, it returns `None`. + +#### `clear` + +```rs +pub(crate) fn clear(&mut self) -> io::Result<()> +``` + +The `clear` method removes all key-value entries from the `MemTable`. It acquires a lock on the `entries` field, clears the `entries` map, and sets the `entry_count` and `size` fields to zero. + +#### `entries` + +```rs +pub(crate) fn entries(&self) -> io::Result<Vec<(Vec<u8>, Vec<u8>)>> +``` + +The `entries` method returns a vector of all key-value pairs in the `MemTable`. It acquires a lock on the `entries` field and iterates over the key-value pairs in the `entries` map. It clones each key-value pair and collects them into a vector, which is then returned. + +### Internal Methods + +#### `capacity` + +```rs +pub(crate) fn capacity(&self) -> usize +``` + +The `capacity` method returns the capacity of the `MemTable` in bytes. + +#### `size` + +```rs +pub(crate) fn size(&self) -> usize +``` + +The `size` method returns the current size of the `MemTable` in the specified size unit. It divides the internal `size` by the number of bytes in one unit of the specified size unit. + +#### `false_positive_rate` + +```rs +pub(crate) fn false_positive_rate(&self) -> f64 +``` + +The `false_positive_rate` method returns the false positive rate of the `MemTable`. + +#### `size_unit` + +```rs +pub(crate) fn size_unit(&self) -> SizeUnit +``` + +The `size_unit` method returns the size unit used by the `MemTable`.
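Putting the pieces above together, the memtable's insert/lookup flow with size accounting can be sketched as a simplified, self-contained example. It is illustrative only: the `MiniMemTable` name is invented here, and the Bloom filter, the `Arc<Mutex<...>>` wrapper, and the WAL interaction are omitted:

```rs
use std::collections::BTreeMap;
use std::io;

// Simplified sketch of the MemTable behaviour described above:
// a sorted map plus size accounting. Not the crate's actual code.
struct MiniMemTable {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
    size: usize,
    capacity: usize,
}

impl MiniMemTable {
    fn new(capacity: usize) -> Self {
        Self { entries: BTreeMap::new(), size: 0, capacity }
    }

    /// Insert a key-value pair; err with AlreadyExists if the key is present.
    fn set(&mut self, key: Vec<u8>, value: Vec<u8>) -> io::Result<()> {
        if self.entries.contains_key(&key) {
            return Err(io::Error::new(io::ErrorKind::AlreadyExists, "key exists"));
        }
        self.size += key.len() + value.len();
        self.entries.insert(key, value);
        Ok(())
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.entries.get(key).cloned()
    }

    /// A full memtable would be flushed to an SSTable.
    fn is_full(&self) -> bool {
        self.size >= self.capacity
    }
}

fn main() {
    let mut mt = MiniMemTable::new(1024);
    mt.set(b"key1".to_vec(), b"value1".to_vec()).unwrap();
    assert_eq!(mt.get(b"key1"), Some(b"value1".to_vec()));
    assert!(mt.set(b"key1".to_vec(), b"x".to_vec()).is_err());
    assert!(!mt.is_full());
}
```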
+ +### Error Handling + +All the methods that involve acquiring a lock on the `entries` field use the `io::Error` type to handle potential errors when obtaining the lock. If an error occurs during the locking process, an `io::Error` instance is created with a corresponding error message. + +### Thread Safety + +The `MemTable` implementation ensures thread safety by using an `Arc<Mutex<BTreeMap<Vec<u8>, Vec<u8>>>>` for storing the key-value entries. The `Arc` allows multiple ownership of the `entries` map across threads, and the `Mutex` ensures exclusive access to the map during modification operations, preventing data races. + +The locking mechanism employed by the `Mutex` guarantees that only one thread can access the `entries` map at a time, so no thread ever observes a partially updated map. + +### Test Suite + +A test suite is provided to ensure the correctness and functionality of the `MemTable`. +It includes tests for creating an empty `MemTable`, setting and getting key-value pairs, and removing key-value entries. + +--- + +## Write-Ahead Log (WAL) + +The Sequential Write-Ahead Log (WAL) is a crucial component of the LSM Tree storage engine. +It provides durability and atomicity guarantees by logging write operations before they are applied to the main data structure. + +When a write operation is received, the key-value pair is first appended to the WAL. +In the event of a crash or system failure, the WAL can be replayed to recover the data modifications and bring the MemTable back to a consistent state. + +### Dependencies + +The implementation requires the following dependencies to be imported from the standard library: + +- **`std::fs`**: Provides file system-related operations. +- **`std::io`**: Provides input/output functionality, including error handling. +- **`std::path::PathBuf`**: Represents file system paths. +- **`std::sync::{Arc, Mutex}`**: Provides thread-safe shared ownership and synchronization.
+ +### WriteAheadLog Structure + +The `WriteAheadLog` structure represents the write-ahead log (WAL) and contains the following field: + +```rs +struct WriteAheadLog { + log_file: Arc<Mutex<File>>, +} +``` + +#### log_file + +The `log_file` field is of type `Arc<Mutex<File>>`. It represents the WAL file and provides concurrent access and modification through the use of an `Arc` (Atomic Reference Counting) and `Mutex`. + +### Log File Structure Diagram + +The `log_file` is structured as follows: + +```sh ++-------------------+ +| Entry Length | (4 bytes) ++-------------------+ +| Entry Kind | (1 byte) ++-------------------+ +| Key Length | (4 bytes) ++-------------------+ +| Value Length | (4 bytes) ++-------------------+ +| Key | (variable) +| | +| | ++-------------------+ +| Value | (variable) +| | +| | ++-------------------+ +| Entry Length | (4 bytes) ++-------------------+ +| Entry Kind | (1 byte) ++-------------------+ +| Key Length | (4 bytes) ++-------------------+ +| Value Length | (4 bytes) ++-------------------+ +| Key | (variable) +| | +| | ++-------------------+ +| Value | (variable) +| | +| | ++-------------------+ +``` + +- **Entry Length**: A 4-byte field representing the total length of the entry in bytes. +- **Entry Kind**: A 1-byte field indicating the type of entry (Insert or Remove). +- **Key Length**: A 4-byte field representing the length of the key in bytes. +- **Value Length**: A 4-byte field representing the length of the value in bytes. +- **Key**: The actual key data, which can vary in size. +- **Value**: The actual value data, which can also vary in size. + +Each entry is written sequentially into the `log_file` using the `write_all` method, ensuring that the entries are stored contiguously. New entries are appended to the end of the `log_file` after the existing entries. + +### Constants + +A constant named `WAL_FILE_NAME` is defined, representing the name of the WAL file.
+ +```rs +static WAL_FILE_NAME: &str = "lsmdb_wal.bin"; +``` + +### `EntryKind` + +```rs +enum EntryKind { + Insert = 1, + Remove = 2, +} +``` + +The `EntryKind` enum represents the kind of entry stored in the WAL. It has two variants: `Insert` and `Remove`. Each variant is associated with an integer value used for serialization. + +### `WriteAheadLogEntry` + +```rs +struct WriteAheadLogEntry { + entry_kind: EntryKind, + key: Vec<u8>, + value: Vec<u8>, +} +``` + +The `WriteAheadLogEntry` represents a single entry in the Write-Ahead Log. It contains the following fields: + +- **`entry_kind`**: An enumeration (`EntryKind`) representing the type of the entry (insert or remove). +- **`key`**: A vector of bytes (`Vec<u8>`) representing the key associated with the entry. +- **`value`**: A vector of bytes (`Vec<u8>`) representing the value associated with the entry. + +### `WriteAheadLogEntry` Methods + +#### `new` + +```rs +fn new(entry_kind: EntryKind, key: Vec<u8>, value: Vec<u8>) -> Self +``` + +The `new` method creates a new instance of the `WriteAheadLogEntry` struct. It takes the `entry_kind`, `key`, and `value` as parameters and initializes the corresponding fields. + +#### `serialize` + +```rs +fn serialize(&self) -> Vec<u8> +``` + +The `serialize` method serializes the `WriteAheadLogEntry` into a vector of bytes. +It calculates the length of the entry, then serializes the length, entry kind, key length, value length, key, and value into the vector. The serialized data is returned. + +#### `deserialize` + +```rs +fn deserialize(serialized_data: &[u8]) -> io::Result<Self> +``` + +This method deserializes a `WriteAheadLogEntry` from the provided serialized data. +It performs validation checks on the length and structure of the serialized data and returns an `io::Result` containing the deserialized entry if successful.
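The entry layout described above can be illustrated with a standalone round-trip sketch. The helper names and the little-endian byte order are assumptions made for illustration; the crate's actual `serialize`/`deserialize` may differ in detail:

```rs
use std::io;

// Sketch of the on-disk entry layout described above:
// [entry_len: u32][kind: u8][key_len: u32][value_len: u32][key][value].
// Little-endian encoding is an assumption; the crate may differ.
fn serialize_entry(kind: u8, key: &[u8], value: &[u8]) -> Vec<u8> {
    let total = 4 + 1 + 4 + 4 + key.len() + value.len();
    let mut buf = Vec::with_capacity(total);
    buf.extend_from_slice(&(total as u32).to_le_bytes());
    buf.push(kind);
    buf.extend_from_slice(&(key.len() as u32).to_le_bytes());
    buf.extend_from_slice(&(value.len() as u32).to_le_bytes());
    buf.extend_from_slice(key);
    buf.extend_from_slice(value);
    buf
}

fn deserialize_entry(data: &[u8]) -> io::Result<(u8, Vec<u8>, Vec<u8>)> {
    use std::convert::TryInto;
    let err = || io::Error::new(io::ErrorKind::InvalidData, "corrupt entry");
    // The fixed header is 13 bytes: 4 + 1 + 4 + 4.
    if data.len() < 13 {
        return Err(err());
    }
    let total = u32::from_le_bytes(data[0..4].try_into().unwrap()) as usize;
    let kind = data[4];
    let key_len = u32::from_le_bytes(data[5..9].try_into().unwrap()) as usize;
    let value_len = u32::from_le_bytes(data[9..13].try_into().unwrap()) as usize;
    // Validate that the declared lengths are consistent with the buffer.
    if total != 13 + key_len + value_len || data.len() < total {
        return Err(err());
    }
    let key = data[13..13 + key_len].to_vec();
    let value = data[13 + key_len..total].to_vec();
    Ok((kind, key, value))
}

fn main() {
    let buf = serialize_entry(1, b"key", b"value");
    let (kind, key, value) = deserialize_entry(&buf).unwrap();
    assert_eq!(kind, 1);
    assert_eq!(key, b"key".to_vec());
    assert_eq!(value, b"value".to_vec());
}
```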
+ +### `WriteAheadLog` Methods + +#### `new` + +```rs +fn new(directory_path: &PathBuf) -> io::Result<Self> +``` + +The `new` method is a constructor function that creates a new `WriteAheadLog` instance. +It takes a `directory_path` parameter as a `PathBuf` representing the directory path where the WAL file will be stored. + +If the directory doesn't exist, it creates it. It then opens the log file with read, append, and create options, and initializes the `log_file` field. + +#### `append` + +```rs +fn append(&mut self, entry_kind: EntryKind, key: Vec<u8>, value: Vec<u8>) -> io::Result<()> +``` + +The `append` method appends a new entry to the Write-Ahead Log. +It takes an `entry_kind` parameter of type `EntryKind`, a `key` parameter of type `Vec<u8>`, and a `value` parameter of type `Vec<u8>`. The method acquires a lock on the `log_file` to ensure mutual exclusion when writing to the file. + +It creates a `WriteAheadLogEntry` with the provided parameters, serializes it, and writes the serialized data to the log file. + +Finally, it flushes the log file to ensure the data is persisted. If the operation succeeds, `Ok(())` is returned; otherwise, an `io::Error` instance is created and returned. + +#### `recover` + +```rs +fn recover(&mut self) -> io::Result<Vec<WriteAheadLogEntry>> +``` + +The `recover` method reads and recovers the entries from the Write-Ahead Log. The method acquires a lock on the `log_file` to ensure exclusive access during the recovery process. + +It reads the log file and deserializes the data into a vector of `WriteAheadLogEntry` instances. +It continues reading and deserializing until the end of the log file is reached. The recovered entries are returned as a vector. + +#### `clear` + +```rs +fn clear(&mut self) -> io::Result<()> +``` + +The `clear` method clears the contents of the WAL file. It acquires a lock on the `log_file` to ensure exclusive access when truncating and seeking the file.
+The method sets the length of the file to `0` using the `set_len` method, effectively truncating it. Then, it seeks to the start of the file using `seek` with `SeekFrom::Start(0)` to reset the file pointer. +If the operation succeeds, `Ok(())` is returned; otherwise, an `io::Error` instance is created and returned. + +### Thread Safety + +The `WriteAheadLog` implementation ensures thread safety by using an `Arc<Mutex<File>>` for the `log_file` field. The `Arc` allows multiple ownership of the WAL file across threads, and the `Mutex` ensures exclusive access to the file during write, recovery, and clear operations, preventing data races. + +The locking mechanism employed by the `Mutex` guarantees that only one thread can access the WAL file at a time, so concurrent operations can never interleave partway through an entry. + +### Testing + +The module includes a test function that verifies the correctness of the serialization and deserialization process. + +It creates a sample `WriteAheadLogEntry`, serializes it, and then deserializes the serialized data back into an entry. It compares the deserialized entry with the original entry to ensure consistency. + +--- + +## Bloom Filter + +The Bloom Filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It provides a fast and memory-efficient way to check for set membership, but it introduces a small probability of false positives. + +The Bloom Filter implementation is provided as a Rust module and consists of a struct called `BloomFilter`. It uses a `BitVec` to represent the array of bits that make up the filter. The number of hash functions used by the Bloom Filter is configurable, and it keeps track of the number of elements inserted into the filter. + +### Dependencies + +- **`std::collections::hash_map::DefaultHasher`**: Provides the default hasher implementation used for calculating hash values of keys. +- **`std::hash::{Hash, Hasher}`**: Defines the `Hash` trait used for hashing keys.
+- **`std::sync::{Arc, Mutex}`**: Provides thread-safe shared ownership (`Arc`) and mutual exclusion (`Mutex`) for concurrent access to data.
+- **`bit_vec::BitVec`**: Implements a bit vector data structure used to store the Bloom filter's bit array.
+
+### Structure
+
+The `BloomFilter` struct represents the Bloom Filter data structure and contains the following fields:
+
+```rs
+pub(crate) struct BloomFilter {
+    bits: Arc<Mutex<BitVec>>,
+    num_hashes: usize,
+    num_elements: AtomicUsize,
+}
+```
+
+#### `bits`
+
+An `Arc<Mutex<BitVec>>` representing the array of bits used to store the Bloom filter.
+
+#### `num_hashes`
+
+The number of hash functions used by the Bloom filter.
+
+#### `num_elements`
+
+An `AtomicUsize` representing the number of elements inserted into the Bloom filter.
+
+### Constructor Methods
+
+#### `new`
+
+```rs
+fn new(num_elements: usize, false_positive_rate: f64) -> Self
+```
+
+The `new` method creates a new `BloomFilter` with the specified number of elements and false positive rate. It initializes the Bloom filter's bit array, calculates the number of hash functions, and sets the initial number of elements to zero.
+
+### Public Methods
+
+#### `set`
+
+```rs
+fn set<T: Hash>(&mut self, key: &T)
+```
+
+The `set` method inserts a key into the Bloom filter. It calculates the hash value for the key using multiple hash functions and sets the corresponding bits in the bit array to true. It also increments the element count.
+
+#### `contains`
+
+```rs
+fn contains<T: Hash>(&self, key: &T) -> bool
+```
+
+The `contains` method checks if a key is possibly present in the Bloom filter.
+It calculates the hash value for the key using multiple hash functions and checks the corresponding bits in the bit array.
+If any of the bits are false, it indicates that the key is definitely not present, and the method returns false.
+If all bits are true, the method returns true, indicating that the key is possibly present.
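To make the `set`/`contains` flow concrete, here is a minimal self-contained sketch. It is a simplification under stated assumptions: a plain `Vec<bool>` stands in for `bit_vec::BitVec`, the `Arc<Mutex<…>>` wrapper and element counter are omitted, and the names are illustrative rather than the crate's actual API.

```rs
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-in for the crate's `BloomFilter` (no locking, `Vec<bool>` bits).
struct SimpleBloom {
    bits: Vec<bool>,
    num_hashes: usize,
}

impl SimpleBloom {
    fn new(num_bits: usize, num_hashes: usize) -> Self {
        SimpleBloom { bits: vec![false; num_bits], num_hashes }
    }

    // Derive one of `num_hashes` hash functions by mixing a seed into the hasher.
    fn index<T: Hash>(&self, key: &T, seed: usize) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.bits.len()
    }

    /// Insert a key: set the bit chosen by each hash function.
    fn set<T: Hash>(&mut self, key: &T) {
        for seed in 0..self.num_hashes {
            let i = self.index(key, seed);
            self.bits[i] = true;
        }
    }

    /// Membership test: definitely absent if any bit is unset, possibly present otherwise.
    fn contains<T: Hash>(&self, key: &T) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(key, seed)])
    }
}
```

Note that `contains` can return `true` for a key that was never inserted (a false positive), but never `false` for a key that was.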
+
+#### `num_elements`
+
+```rs
+fn num_elements(&self) -> usize
+```
+
+This method returns the current number of elements inserted into the Bloom filter.
+
+### Internal Methods
+
+#### `calculate_hash`
+
+```rs
+fn calculate_hash<T: Hash>(&self, key: &T, seed: usize) -> u64
+```
+
+This function calculates a hash value for a given key and seed. It hashes the key with `DefaultHasher` and mixes in the seed value to derive multiple independent hash functions from a single hasher.
+
+#### `calculate_num_bits`
+
+```rs
+fn calculate_num_bits(num_elements: usize, false_positive_rate: f64) -> usize
+```
+
+This function calculates the optimal number of bits for the Bloom filter based on the desired false positive rate `p` and the expected number of elements `n`, using the standard estimate `m = -(n * ln p) / (ln 2)^2`.
+
+#### `calculate_num_hashes`
+
+```rs
+fn calculate_num_hashes(num_bits: usize, num_elements: usize) -> usize
+```
+
+This function calculates the optimal number of hash functions for the Bloom filter based on the number of bits `m` and the expected number of elements `n`, using the standard estimate `k = (m / n) * ln 2`.
+
+### Usage within MemTable Architecture
+
+In the MemTable architecture pattern, the Bloom Filter is utilized as a component within the MemTable data structure.
+The MemTable serves as an in-memory structure that holds recently updated key-value pairs before they are flushed to disk.
+The Bloom Filter is used to optimize data retrieval by quickly determining if a key is likely to be present in the MemTable, thus reducing the need for disk accesses.
+
+When a key-value pair is inserted into the MemTable, the key is also inserted into the Bloom Filter using the `set` method.
+During data retrieval, the Bloom Filter's `contains` method is first used to check if the key is possibly present in the MemTable.
+If the Bloom Filter indicates that the key is possibly present, further disk accesses can be performed to retrieve the corresponding value.
+If the Bloom Filter indicates that the key is definitely not present, disk accesses can be avoided, improving overall performance. + +It is important to note that the Bloom Filter is a probabilistic data structure and can produce false positives, indicating that a key is possibly present even when it is not. +Therefore, if the Bloom Filter indicates that a key is possibly present, a subsequent disk access is still required to validate the actual presence of the key and retrieve the correct value. + +--- + +## SSTable (Sorted String Table) + +An SSTable, or Sorted String Table, is an immutable on-disk data structure that stores key-value pairs in a sorted order. +It serves as the persistent storage layer for the LSM Tree-based engine. +SSTables are typically stored as multiple files, each containing a sorted range of key-value pairs. + +When the MemTable reaches a certain threshold size, it is "flushed" to disk as a new SSTable file. +The MemTable is atomically replaced with an empty one, allowing new write operations to continue. This process is known as a "memtable flush." 
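The flush step described above can be sketched as follows. This is a simplification under assumed types: the real engine writes the entries to an SSTable file on disk, whereas here the "SSTable" is just a sorted vector of key-value pairs.

```rs
use std::collections::BTreeMap;

/// Drain a full memtable into a sorted list of key-value pairs (standing in
/// for an SSTable file) and hand back a fresh, empty memtable.
fn flush_memtable(
    memtable: BTreeMap<Vec<u8>, Vec<u8>>,
) -> (Vec<(Vec<u8>, Vec<u8>)>, BTreeMap<Vec<u8>, Vec<u8>>) {
    // BTreeMap iterates in key order, so the flushed output is already sorted —
    // exactly the invariant an SSTable needs.
    let sstable: Vec<(Vec<u8>, Vec<u8>)> = memtable.into_iter().collect();
    (sstable, BTreeMap::new())
}
```

Consuming the old memtable by value mirrors the atomic swap: writers continue against the new empty table while the drained, sorted contents are persisted in the background.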
+
+```txt
++-----------------------+
+|        SSTable        |
++-----------------------+
+| - file_path           | (PathBuf)
+| - blocks              | (Vec<Block>)
+| - created_at          | (DateTime<Utc>)
++-----------------------+
+| + new(dir: PathBuf)   | -> SSTable
+| + set(key, value)     | -> Result<(), io::Error>
+| + get(key)            | -> Option<Vec<u8>>
+| + remove(key)         | -> Result<(), io::Error>
++-----------------------+
+
++-------------------------+
+|          Block          |
++-------------------------+
+| - data                  | (Vec<u8>)
+| - index                 | (HashMap<Vec<u8>, usize>)
+| - entry_count           | (usize)
++-------------------------+
+| + new()                 | -> Block
+| + is_full(size)         | -> bool
+| + set_entry(key, value) | -> Result<(), io::Error>
+| + remove_entry(key)     | -> bool
+| + get_value(key)        | -> Option<Vec<u8>>
+| + entry_count()         | -> usize
++-------------------------+
+```
+
+The `SSTable` struct represents the Sorted String Table and contains the following fields:
+
+- `file_path`: Stores the path of the SSTable file (`PathBuf`).
+- `blocks`: Represents the collection of blocks that hold the data (`Vec<Block>`).
+- `created_at`: Indicates the creation timestamp of the SSTable (`DateTime<Utc>`).
+
+The `SSTable` struct provides the following methods:
+
+- `new(dir: PathBuf) -> SSTable`: Creates a new instance of the `SSTable` struct given a directory path and initializes its fields. Returns the created `SSTable`.
+
+- `set(key: Vec<u8>, value: Vec<u8>) -> Result<(), io::Error>`: Sets an entry with the provided key and value in the `SSTable`. It internally manages the blocks and their capacity to store entries. Returns `Result<(), io::Error>` indicating success or failure.
+
+- `get(key: Vec<u8>) -> Option<Vec<u8>>`: Retrieves the value associated with the provided key from the `SSTable`. It iterates over the blocks to find the key-value pair. Returns `Option<Vec<u8>>` with the value if found, or `None` if the key is not present.
+
+- `remove(key: Vec<u8>) -> Result<(), io::Error>`: Removes the entry with the provided key from the `SSTable`. It iterates over the blocks in reverse order to delete from the most recent block first.
Returns `Result<(), io::Error>` indicating success or failure.
+
+The `Block` struct represents an individual block within the SSTable and contains the following fields:
+
+- `data`: Stores the data entries within the block (`Vec<u8>`).
+- `index`: Maintains an index for efficient key-based lookups (`HashMap<Vec<u8>, usize>`).
+- `entry_count`: Tracks the number of entries in the block (`usize`).
+
+The `Block` struct provides the following methods:
+
+- `new() -> Block`: Creates a new instance of the `Block` struct and initializes its fields. Returns the created `Block`.
+
+- `is_full(entry_size: usize) -> bool`: Checks if the block is full given the size of an entry. It compares the combined size of the existing data and the new entry size with the predefined block size. Returns `true` if the block is full, `false` otherwise.
+
+- `set_entry(key: &[u8], value: &[u8]) -> Result<(), io::Error>`: Sets an entry with the provided key and value in the block. It calculates the entry size, checks if the block has enough capacity, and adds the entry to the block's data and index. Returns `Result<(), io::Error>` indicating success or failure.
+
+- `remove_entry(key: &[u8]) -> bool`: Removes the entry with the provided key from the block. It searches for the key in the index, clears the entry in the data vector, and updates the entry count. Returns `true` if the entry was found and removed, `false` otherwise.
+
+- `get_value(key: &[u8]) -> Option<Vec<u8>>`: Retrieves the value associated with the provided key from the block. It looks up the key in the index, extracts the value bytes from the data vector, and returns them as a new `Vec<u8>`. Returns `Option<Vec<u8>>` with the value if found, or `None` if the key is not present.
+
+- `entry_count() -> usize`: Returns the number of entries in the block.
+
+Together, the `SSTable` and `Block` form the basic components of the SSTable implementation, providing efficient storage and retrieval of key-value pairs with support for adding and removing entries.
+
+The `SSTable` manages multiple `Block` instances to store the data, and the `Block` handles individual block-level operations and indexing.
+
+A diagram illustrating how key-value pairs are stored inside a Block:
+
+```txt
++----------------------------------+
+|              Block               |
++----------------------------------+
+| - data: Vec<u8>                  | // Data entries within the block
+| - index: HashMap<Vec<u8>, usize> | // Index for key-based lookups
+| - entry_count: usize             | // Number of entries in the block
++----------------------------------+
+|            Block Data            |
+|  +------------------------+      |
+|  | Entry 1                |      |
+|  | +-------------------+  |      |
+|  | | Length Prefix     |  |      |
+|  | | (4 bytes, little- |  |      |
+|  | | endian format)    |  |      |
+|  | +-------------------+  |      |
+|  | | Key               |  |      |
+|  | | (variable length) |  |      |
+|  | +-------------------+  |      |
+|  | | Value             |  |      |
+|  | | (variable length) |  |      |
+|  | +-------------------+  |      |
+|  +------------------------+      |
+|  | Entry 2                |      |
+|  | ...                    |      |
+|  +------------------------+      |
++----------------------------------+
+```
+
+In the diagram:
+- The `Block` struct represents an individual block within the SSTable.
+- The `data` field of the `Block` is a vector (`Vec<u8>`) that stores the data entries.
+- The `index` field is a `HashMap` that maintains the index for efficient key-based lookups.
+- The `entry_count` field keeps track of the number of entries in the block.
+
+Each entry within the block consists of three parts:
+1. Length Prefix: A 4-byte length prefix in little-endian format, indicating the length of the value.
+2. Key: Variable-length key bytes.
+3. Value: Variable-length value bytes.
+
+The block's data vector (`data`) stores these entries sequentially. Each entry follows the format mentioned above, and they are concatenated one after another within the data vector. The index hashmap (`index`) maintains references to the keys and their corresponding offsets within the data vector.
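The entry layout can be sketched with two small helpers. These are hedged illustrations of the format described above, not the crate's actual `Block` API; because the index stores the key alongside the offset, the key's length is known at lookup time and only the value needs a length prefix.

```rs
use std::collections::HashMap;

/// Append one entry (4-byte little-endian value-length prefix, then key, then
/// value) to `data`, recording the entry's start offset for the key in `index`.
fn append_entry(
    data: &mut Vec<u8>,
    index: &mut HashMap<Vec<u8>, usize>,
    key: &[u8],
    value: &[u8],
) {
    let offset = data.len();
    data.extend_from_slice(&(value.len() as u32).to_le_bytes());
    data.extend_from_slice(key);
    data.extend_from_slice(value);
    index.insert(key.to_vec(), offset);
}

/// Look the key up in the index, then decode the value from the data vector:
/// read the 4-byte value length, skip over the key bytes, copy out the value.
fn get_value(data: &[u8], index: &HashMap<Vec<u8>, usize>, key: &[u8]) -> Option<Vec<u8>> {
    let &offset = index.get(key)?;
    let len = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap()) as usize;
    let value_start = offset + 4 + key.len();
    Some(data[value_start..value_start + len].to_vec())
}
```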
+
+---
+
+Author: Nrishinghananda Roy
diff --git a/notes.md b/notes.md
new file mode 100644
index 0000000..26a0f1e
--- /dev/null
+++ b/notes.md
@@ -0,0 +1,92 @@
+# NOTES
+
+## Why use `Vec<u8>` for `key` and `value`?
+
+1. **Lower memory overhead**: Byte arrays typically have lower memory overhead compared to strings, as they store raw binary data without additional metadata.
+2. **Efficient serialization**: Byte arrays are well-suited for serialization and deserialization operations, making them efficient for storing and retrieving data from disk or network.
+3. **Flexibility**: Byte arrays can represent arbitrary binary data, allowing you to store non-textual or structured data efficiently.
+
+## `&[u8]` or `Vec<u8>`?
+
+`&[u8]` and `Vec<u8>` are two different types used for representing sequences of bytes (`u8` values) but with different ownership and lifetimes.
+Both types can be converted to each other using methods like `as_slice()` to convert a `Vec<u8>` to a `&[u8]` or `to_vec()` to convert a `&[u8]` to a `Vec<u8>`.
+
+### Differences
+
+1. `&[u8]` (byte slice):
+   - A byte slice is an immutable view into a contiguous sequence of bytes.
+   - It represents a borrowed reference to an existing byte sequence and does not own the data.
+   - The length of the slice is fixed and cannot be changed.
+   - Byte slices are often used as function parameters or return types to efficiently pass or return sequences of bytes without copying the data.
+
+2. `Vec<u8>` (byte vector):
+   - A byte vector is a growable, mutable container for a sequence of bytes.
+   - It owns the underlying byte data and can dynamically resize and modify the content.
+   - `Vec<u8>` provides additional methods beyond what a byte slice offers, such as pushing, popping, and modifying elements.
+   - Since it owns the data, a `Vec<u8>` has a specific lifetime and is deallocated when it goes out of scope.
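A small illustration of the guideline above — accept `&[u8]` as a parameter so callers can pass either borrowed or owned bytes, and return `Vec<u8>` when the function produces new owned data (the function itself is a made-up example):

```rs
/// Takes a borrowed byte slice and returns a new, independently owned vector.
fn double_each(bytes: &[u8]) -> Vec<u8> {
    // Accepting `&[u8]` lets callers pass a `Vec<u8>` (via `as_slice()` or
    // deref coercion) or a fixed-size array, with no copy on the way in.
    bytes.iter().map(|b| b.wrapping_mul(2)).collect()
}
```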
+
+
+In summary, `&[u8]` is a borrowed reference to an existing byte sequence and is useful for working with byte data without ownership or modifying capabilities. `Vec<u8>` is a mutable container that owns the byte data and provides additional operations for dynamic modification and ownership of byte sequences.
+
+## Mutex or RwLock?
+
+Using either `RwLock` or `Mutex` for protecting the `MemTable` entries in the LSM Tree implementation has its own pros and cons.
+
+### Pros of using `RwLock`:
+
+1. **Concurrent Reads**: `RwLock` allows multiple readers to acquire a read lock simultaneously, enabling concurrent read operations. This can improve performance in scenarios where there are frequent read operations compared to write operations.
+2. **Lower Contention**: Since multiple threads can read the entries concurrently, it reduces contention among threads and improves overall throughput.
+3. **Thread Safety**: `RwLock` guarantees thread safety by enforcing exclusive write access but allowing concurrent read access.
+
+### Cons of using `RwLock`:
+
+1. **Exclusive Write Access**: While `RwLock` allows multiple threads to read concurrently, it only allows a single thread to acquire the write lock at a time. This can lead to reduced performance if there are frequent write operations or if the write operations take a significant amount of time.
+2. **Potential Deadlocks**: It's important to be cautious when using `RwLock` to avoid potential deadlocks. If a thread holds a read lock and tries to acquire a write lock or vice versa, it can result in a deadlock situation.
+
+### Pros of using `Mutex`:
+
+1. **Simplicity**: `Mutex` is a straightforward synchronization primitive, making it easier to reason about and use correctly.
+2. **Exclusive Access**: `Mutex` ensures exclusive access to the entries, which can be beneficial if the write operations require strong consistency guarantees.
+3.
**Avoiding Deadlocks**: Since `Mutex` allows only one thread to hold the lock at a time, the possibility of deadlocks due to lock contention is reduced.
+
+### Cons of using `Mutex`:
+
+1. **Limited Concurrency**: `Mutex` allows only one thread to hold the lock at a time, which means concurrent reads are not possible. This can impact performance if there are frequent read operations or if there are multiple threads that primarily perform reads.
+2. **Potential Contention**: Since `Mutex` allows only one thread to hold the lock at a time, other threads attempting to acquire the lock may experience contention, leading to reduced throughput in high concurrency scenarios.
+
+## Different types of WAL implementation
+
+The Write-Ahead Log can be implemented as a file-based log or a sequential write-ahead log. Here are the pros and cons of both:
+
+### File-Based Log:
+
+#### Pros:
+
+- **Durability**: Writing to a file ensures that the logged data is persisted even in the event of system failures or crashes. The data remains intact and can be recovered during system startup or crash recovery.
+- **Simplicity**: Implementing the WAL as a file-based log is often straightforward and easier to understand compared to other approaches.
+- **Flexibility**: The file-based log can be easily managed, rotated, truncated, and archived. This allows for efficient space utilization and log management.
+
+#### Cons:
+
+- **Disk I/O**: Writing to a file involves disk I/O operations, which can introduce latency and impact performance, especially in high-write scenarios.
+- **Synchronization Overhead**: Ensuring the durability of each write operation may require additional synchronization mechanisms such as `fsync` or flushing the file system cache, which can impact performance.
+- **Fragmentation**: Over time, the log file may become fragmented due to various operations like rotation, truncation, and compaction. This fragmentation can impact read and write performance.
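For concreteness, here is a minimal sketch of appending one record to a log. The record framing (two little-endian length prefixes, then key, then value) is an assumption for illustration, not the crate's actual WAL format; writing against the `Write` trait means the same code works for a real file or an in-memory buffer.

```rs
use std::io::{self, Write};

/// Append one length-prefixed record (key then value) to any writer —
/// a log file in the real engine, anything implementing `Write` here.
fn append_record<W: Write>(log: &mut W, key: &[u8], value: &[u8]) -> io::Result<()> {
    log.write_all(&(key.len() as u32).to_le_bytes())?;
    log.write_all(&(value.len() as u32).to_le_bytes())?;
    log.write_all(key)?;
    log.write_all(value)?;
    // Durability point: a real WAL would follow this with an fsync
    // (`File::sync_all`) before acknowledging the write.
    log.flush()
}
```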
+ +### Sequential Write-Ahead Log: + +#### Pros: + +- **Performance**: Sequential writes are generally faster compared to random disk writes. Writing data in sequential order minimizes seek time and maximizes disk throughput, resulting in improved performance. +- **Reduced Disk I/O**: Sequential write-ahead logging reduces disk I/O operations, as multiple write operations can be batched together and written to disk in a single sequential write operation. +- **Atomicity**: Sequential write-ahead logging ensures atomicity by guaranteeing that all the logged operations are either fully written or not written at all. This helps maintain consistency and reliability. + +#### Cons: + +- **Complexity**: Implementing a sequential write-ahead log may require more complex logic compared to a file-based log. Proper management of the log's sequential order and handling batched writes is essential. +- **Limited Flexibility**: Sequential write-ahead logging may have limitations on log management operations like rotation, truncation, or compaction, as maintaining the sequential order of writes is crucial. +- **Increased Memory Usage**: Maintaining an in-memory buffer to collect and batch write operations before flushing them to disk may require additional memory resources. + +**Going through both options, I want to implement Write-Ahead Log as Sequential Write-Ahead Log.** + +--- \ No newline at end of file diff --git a/scripts/run_concurrent_tests.sh b/scripts/run_concurrent_tests.sh deleted file mode 100755 index 51d4994..0000000 --- a/scripts/run_concurrent_tests.sh +++ /dev/null @@ -1,64 +0,0 @@ -#!/bin/bash - -# Number of test runs -num_runs=10 - -# Function to run `cargo test` and capture the output and execution time -run_test() { - output=$(cargo test --quiet 2>&1) - if [[ $output == *"test result: FAILED"* ]]; then - # Extract the name of the failed test - failed_tests=$(echo "$output" | awk '/test .* ... 
FAILED/ { print $2 }')
-
-    # Split the failed test names into an array
-    IFS=$'\n' read -r -d '' -a failed_test_names <<<"$failed_tests"
-
-    # Increment the failure count for each failed test
-    for failed_test in "${failed_test_names[@]}"; do
-      for ((i = 0; i < num_runs; i++)); do
-        if [[ "${failed_test_names[i]}" == "$failed_test" ]]; then
-          ((test_failures[i]++))
-          break
-        fi
-      done
-    done
-  fi
-}
-
-# Initialize arrays
-failed_test_names=()
-test_failures=()
-
-# Run `cargo test` for the specified number of runs
-for ((i = 0; i < num_runs; i++)); do
-  failed_test_names[i]=""
-  test_failures[i]=0
-
-  run_test &
-done
-
-# Wait for all background tasks to complete
-wait
-
-# Print the test summary
-echo "Test Summary:"
-echo "--------------------------------"
-total_tests=$((num_runs * num_runs))
-total_failures=0
-for ((i = 0; i < num_runs; i++)); do
-  failure_count=${test_failures[i]}
-  success_count=$((num_runs - failure_count))
-
-  if [[ "${failed_test_names[i]}" != "" ]]; then
-    echo "${failed_test_names[i]}: $success_count successes, $failure_count failures"
-    ((total_failures += failure_count))
-  fi
-
-  ((total_tests += num_runs))
-done
-
-successes=$((total_tests - total_failures))
-echo "--------------------------------"
-echo "Total Tests: $total_tests"
-echo "Successes: $successes"
-echo "Failures: $total_failures"
diff --git a/src/api.rs b/src/api.rs
new file mode 100644
index 0000000..3864e3b
--- /dev/null
+++ b/src/api.rs
@@ -0,0 +1,324 @@
+use crate::{
+    memtable::{MemTable, DEFAULT_FALSE_POSITIVE_RATE, DEFAULT_MEMTABLE_CAPACITY},
+    sst::SSTable,
+    write_ahead_log::{EntryKind, WriteAheadLog, WriteAheadLogEntry, WAL_FILE_NAME},
+};
+use std::{fs, io, path::PathBuf};
+
+pub struct StorageEngine {
+    pub memtable: MemTable,
+    pub wal: WriteAheadLog,
+    pub sstables: Vec<SSTable>,
+    pub dir: DirPath,
+}
+
+pub struct DirPath {
+    pub root: PathBuf,
+    pub wal: PathBuf,
+    pub sst: PathBuf,
+}
+
+/// Represents the unit of measurement for capacity and size.
+#[derive(Clone)]
+pub enum SizeUnit {
+    Bytes,
+    Kilobytes,
+    Megabytes,
+    Gigabytes,
+}
+
+impl StorageEngine {
+    /// Creates a new instance of the `StorageEngine` struct.
+    ///
+    /// It initializes the memtable, write-ahead log, and SSTables.
+    ///
+    /// # Arguments
+    ///
+    /// * `dir` - A string representing the directory path where the database files will be stored.
+    ///
+    /// # Returns
+    ///
+    /// A Result containing the `StorageEngine` instance if successful, or an `io::Error` if an error occurred.
+    pub fn new(dir: &str) -> io::Result<Self> {
+        StorageEngine::with_capacity(dir, SizeUnit::Bytes, DEFAULT_MEMTABLE_CAPACITY)
+    }
+
+    /// Inserts a new key-value pair into the storage engine.
+    ///
+    /// It writes the key-value entry to the memtable and the write-ahead log. If the memtable reaches its capacity, it is flushed to an SSTable.
+    ///
+    /// # Arguments
+    ///
+    /// * `key` - A string representing the key.
+    /// * `value` - A string representing the value.
+    ///
+    /// # Returns
+    ///
+    /// A Result indicating success or an `io::Error` if an error occurred.
+    pub fn put(&mut self, key: &str, value: &str) -> io::Result<()> {
+        // Convert the key and value into Vec<u8> from the given &str.
+        let key = key.as_bytes().to_vec();
+        let value = value.as_bytes().to_vec();
+
+        // Write the key-value entry to the sequential WAL and mark it as an insert entry.
+        self.wal
+            .append(EntryKind::Insert, key.clone(), value.clone())?;
+
+        // Check if the MemTable has reached its capacity or size threshold for flushing to SSTable.
+        if self.memtable.size() >= self.memtable.capacity() {
+            // Get the current capacity.
+            let capacity = self.memtable.capacity();
+
+            // Get the current size_unit.
+            let size_unit = self.memtable.size_unit();
+
+            // Get the current false_positive_rate.
+            let false_positive_rate = self.memtable.false_positive_rate();
+
+            // Flush MemTable to SSTable.
+            self.flush_memtable()?;
+
+            // Create a new empty MemTable.
+            self.memtable =
+                MemTable::with_capacity_and_rate(size_unit, capacity, false_positive_rate);
+        }
+
+        // Write the key-value into the MemTable entries by calling the `set` method of MemTable.
+        self.memtable.set(key, value)?;
+
+        // Return Ok(()) if everything goes well.
+        Ok(())
+    }
+
+    /// Retrieves the value associated with a given key from the storage engine.
+    ///
+    /// It first searches in the memtable, which has the most recent data. If the key is not found in the memtable, it searches in the SSTables, starting from the newest levels and moving to the older ones.
+    ///
+    /// # Arguments
+    ///
+    /// * `key` - A string representing the key to search for.
+    ///
+    /// # Returns
+    ///
+    /// A Result containing an Option:
+    /// - `Some(value)` if the key is found and the associated value is returned.
+    /// - `None` if the key is not found.
+    ///
+    /// An `io::Error` is returned if an error occurred.
+    pub fn get(&self, key: &str) -> io::Result<Option<String>> {
+        // Convert the key into Vec<u8> from the given &str.
+        let key = key.as_bytes().to_vec();
+
+        // Search in the MemTable first.
+        if let Some(value) = self.memtable.get(key.clone())? {
+            return Ok(Some(String::from_utf8_lossy(&value).to_string()));
+        }
+
+        // Search in the SSTables.
+        for sstable in &self.sstables {
+            if let Some(value) = sstable.get(key.clone()) {
+                return Ok(Some(String::from_utf8_lossy(&value).to_string()));
+            }
+        }
+
+        // No value found for the key.
+        Ok(None)
+    }
+
+    /// Removes a key-value pair from the storage engine.
+    ///
+    /// It first checks if the key exists in the memtable. If not, it searches for the key in the SSTables and removes it from there. The removal operation is also logged in the write-ahead log for durability.
+    ///
+    /// # Arguments
+    ///
+    /// * `key` - A string representing the key to remove.
+    ///
+    /// # Returns
+    ///
+    /// A Result indicating success.
+    pub fn remove(&mut self, key: &str) -> io::Result<()> {
+        // Convert the key into Vec<u8> from the given &str.
+ let key = key.as_bytes().to_vec(); + + // Check if the key exists in the MemTable. + if let Some((_res_key, value)) = self.memtable.remove(key.clone())? { + // Remove the entry from the MemTable and add a remove log into WAL. + self.wal.append(EntryKind::Remove, key, value)?; + } else { + // If the key is not found in the MemTable, search for it in the SSTable. + for sstable in &mut self.sstables { + if let Some(value) = sstable.get(key.clone()) { + // If the key is found in an SSTable, remove it from the SSTable and add a remove log into WAL. + sstable.remove(key.clone())?; + self.wal.append(EntryKind::Remove, key, value)?; + + // Exit the loop after removing the key from one SSTable + break; + } + } + } + + Ok(()) + } + + pub fn update(&mut self, key: &str, value: &str) -> io::Result<()> { + // Call remove method defined in StorageEngine. + self.remove(key)?; + + // Call set method defined in StorageEngine. + self.put(key, value)?; + + // return Ok(()) if the update is successfull. + Ok(()) + } + + pub fn clear(mut self) -> io::Result { + // Get the current capacity. + let capacity = self.memtable.capacity(); + + // Get the current size_unit. + let size_unit = self.memtable.size_unit(); + + // Get the current false_positive_rate. + let false_positive_rate = self.memtable.false_positive_rate(); + + // Delete the memtable by calling the `clear` method defined in MemTable. + self.memtable.clear()?; + + // Delete the wal by calling the `clear` method defined in WriteAheadLog. + self.wal.clear()?; + + // Call the build method of StorageEngine and return a new instance. 
+        StorageEngine::with_capacity_and_rate(
+            self.dir.get_dir(),
+            size_unit,
+            capacity,
+            false_positive_rate,
+        )
+    }
+
+    pub(crate) fn with_capacity(
+        dir: &str,
+        size_unit: SizeUnit,
+        capacity: usize,
+    ) -> io::Result<Self> {
+        Self::with_capacity_and_rate(dir, size_unit, capacity, DEFAULT_FALSE_POSITIVE_RATE)
+    }
+
+    pub fn with_capacity_and_rate(
+        dir: &str,
+        size_unit: SizeUnit,
+        capacity: usize,
+        false_positive_rate: f64,
+    ) -> io::Result<Self> {
+        let dir_path = DirPath::build(dir);
+
+        // The WAL file path.
+        let wal_file_path = dir_path.wal.join(WAL_FILE_NAME);
+
+        // Check if the WAL file exists and has contents.
+        let wal_exists = wal_file_path.exists();
+        let wal_empty = wal_exists && fs::metadata(&wal_file_path)?.len() == 0;
+
+        if wal_empty {
+            // The WAL file is empty: create a new WAL and MemTable.
+            let memtable =
+                MemTable::with_capacity_and_rate(size_unit, capacity, false_positive_rate);
+
+            let wal = WriteAheadLog::new(&dir_path.wal)?;
+            let sstables = Vec::new();
+
+            Ok(Self {
+                memtable,
+                wal,
+                sstables,
+                dir: dir_path,
+            })
+        } else {
+            // The WAL file has logs: recover the MemTable from the WAL.
+            let mut wal = WriteAheadLog::new(&dir_path.wal)?;
+
+            // TODO: load existing SSTables from disk here instead of starting
+            // with an empty list; a fresh empty list is only correct for a new database.
+            let sstables = Vec::new();
+
+            let entries = wal.recover()?;
+
+            let memtable =
+                StorageEngine::recover_memtable(entries, size_unit, capacity, false_positive_rate)?;
+
+            Ok(Self {
+                memtable,
+                wal,
+                sstables,
+                dir: dir_path,
+            })
+        }
+    }
+
+    fn flush_memtable(&mut self) -> io::Result<()> {
+        // Create a new SSTable.
+        let mut sstable = SSTable::new(self.dir.sst.clone());
+
+        // Iterate over the entries in the MemTable and write them to the SSTable.
+        for (key, value) in self.memtable.entries()? {
+            sstable.set(key.clone(), value.clone())?;
+        }
+
+        // Clear the MemTable after flushing its contents to the SSTable.
+        self.memtable.clear()?;
+
+        Ok(())
+    }
+
+    fn recover_memtable(
+        entries: Vec<WriteAheadLogEntry>,
+        size_unit: SizeUnit,
+        capacity: usize,
+        false_positive_rate: f64,
+    ) -> io::Result<MemTable> {
+        let mut memtable =
+            MemTable::with_capacity_and_rate(size_unit, capacity, false_positive_rate);
+
+        // Iterate over the WAL entries and replay them in order.
+        for entry in entries {
+            match entry.entry_kind {
+                EntryKind::Insert => {
+                    memtable.set(entry.key, entry.value)?;
+                }
+                EntryKind::Remove => {
+                    memtable.remove(entry.key)?;
+                }
+            }
+        }
+
+        Ok(memtable)
+    }
+}
+
+impl DirPath {
+    fn build(directory_path: &str) -> Self {
+        let root = PathBuf::from(directory_path);
+        let wal = root.join("wal");
+        let sst = root.join("sst");
+
+        Self { root, wal, sst }
+    }
+
+    fn get_dir(&self) -> &str {
+        self.root
+            .to_str()
+            .expect("Failed to convert path to string.")
+    }
+}
+
+impl SizeUnit {
+    pub(crate) const fn to_bytes(&self, value: usize) -> usize {
+        match self {
+            Self::Bytes => value,
+            Self::Kilobytes => value * 1024,
+            Self::Megabytes => value * 1024 * 1024,
+            Self::Gigabytes => value * 1024 * 1024 * 1024,
+        }
+    }
+}
diff --git a/src/engine.rs b/src/engine.rs
deleted file mode 100644
index 73bdff6..0000000
--- a/src/engine.rs
+++ /dev/null
@@ -1,164 +0,0 @@
-use crate::{helper::generate_timestamp, mem_table::MemTable, wal::Wal};
-use std::{fs, path::PathBuf};
-
-/// StorageEngine represents a storage engine that combines a memory table and a write-ahead log (WAL)
-/// for data storage and retrieval.
-pub struct StorageEngine {
-    pub dir_path: PathBuf,
-    pub mem_table: MemTable,
-    pub wal: Wal,
-}
-
-impl StorageEngine {
-    /// Constructs a new instance of the StorageEngine.
-    ///
-    /// # Arguments
-    ///
-    /// * `path` - The directory path where the storage files are located.
-    ///
-    /// # Returns
-    ///
-    /// A new instance of the StorageEngine.
-    ///
-    /// # Panics
-    ///
-    /// This method will panic if loading the WAL and MemTable from the given directory path fails.
- pub fn new(path: &str) -> Self { - let dir_path = PathBuf::from(path); - - if !dir_path.exists() { - // Create the directory if it doesn't exist - fs::create_dir_all(&dir_path).expect("Failed to create directory for StorageEngine"); - } - - let (wal, mem_table) = - Wal::load_wal_from_dir(&dir_path).expect("Failed to load from given directory path"); - - StorageEngine { - dir_path, - mem_table, - wal, - } - } - - /// Retrieves a value from the storage engine based on the provided key. - /// - /// # Arguments - /// - /// * `key` - The key used for retrieval. - /// - /// # Returns - /// - /// An optional StorageEngineEntry containing the key, value, and timestamp if the key exists - /// in the storage engine; otherwise, None. - pub fn get(&self, key: &[u8]) -> Option { - if let Some(mem_table_entry) = self.mem_table.get(key) { - return mem_table_entry - .value - .as_ref() - .map(|value| StorageEngineEntry { - key: mem_table_entry.key.clone(), - value: value.clone(), - timestamp: mem_table_entry.timestamp, - }); - } - None - } - - /// Sets a key-value pair in the storage engine. - /// - /// # Arguments - /// - /// * `key` - The key to set. - /// * `value` - The value associated with the key. - /// - /// # Returns - /// - /// A Result indicating success (Ok(1)) or failure (Err(0)). - pub fn set(&mut self, key: &[u8], value: &[u8]) -> Result { - let timestamp = generate_timestamp(); - - let wal_res = self.wal.set(key, value, timestamp); - if wal_res.is_err() { - return Err(0); - } - - if self.wal.flush().is_err() { - return Err(0); - } - - self.mem_table.set(key, value, timestamp); - - Ok(1) - } - - /// Deletes a key-value pair from the storage engine. - /// - /// # Arguments - /// - /// * `key` - The key to delete. - /// - /// # Returns - /// - /// A Result indicating success (Ok(1)) or failure (Err(0)). 
- pub fn delete(&mut self, key: &[u8]) -> Result { - let timestamp = generate_timestamp(); - - let wal_res = self.wal.delete(key, timestamp); - - if wal_res.is_err() { - return Err(0); - } - - if self.wal.flush().is_err() { - return Err(0); - } - - self.mem_table.delete(key, timestamp); - - Ok(1) - } -} - -/// Represents an entry in the storage engine, containing the key, value, and timestamp. -#[derive(Debug, Clone)] -pub struct StorageEngineEntry { - pub key: Vec, - pub value: Vec, - pub timestamp: u128, -} - -impl StorageEngineEntry { - /// Retrieves the key of the storage engine entry. - /// - /// # Returns - /// - /// A reference to the key. - pub fn key(&self) -> &[u8] { - &self.key - } - - /// Retrieves the value of the storage engine entry. - /// - /// # Returns - /// - /// A reference to the value. - pub fn value(&self) -> &[u8] { - &self.value - } - - /// Retrieves the timestamp of the storage engine entry. - /// - /// # Returns - /// - /// The timestamp. - pub fn timestamp(&self) -> u128 { - self.timestamp - } -} - -impl PartialEq for StorageEngineEntry { - fn eq(&self, other: &Self) -> bool { - self.key == other.key && self.value == other.value && self.timestamp == other.timestamp - } -} diff --git a/src/helper.rs b/src/helper.rs deleted file mode 100644 index a8ffc3d..0000000 --- a/src/helper.rs +++ /dev/null @@ -1,26 +0,0 @@ -use std::{ - fs::read_dir, - path::{Path, PathBuf}, - time::{SystemTime, UNIX_EPOCH}, -}; - -/// Generates current time as micro-seconds -pub fn generate_timestamp() -> u128 { - SystemTime::now() - .duration_since(UNIX_EPOCH) - .expect("failed to generate timestamp") - .as_micros() -} - -/// Gets the set of files with a given extension from a given directory as a Vector of path buffers -pub fn get_files_with_ext(dir: &Path, ext: &str) -> Vec { - let mut files = Vec::new(); - for file in read_dir(dir).expect("No directory found") { - let path = file.unwrap().path(); - if path.extension().unwrap() == ext { - files.push(path); - } - 
} - - files -} diff --git a/src/lib.rs b/src/lib.rs index 50655bd..608473f 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -1,86 +1,680 @@ -//! LSM Tree Storage Engine Library +//! # `lsmdb` //! -//! This library provides a LSM Tree storage engine implementation for efficient data storage and retrieval. It is designed to support data-intensive applications that require fast and reliable storage capabilities. +//! `lsmdb` is an efficient storage engine that implements the Log-Structured Merge Tree (LSM-Tree) data structure, designed specifically for handling key-value pairs. //! -//! # Modules +//! ## Public API //! -//! - `engine`: Entry point to the storage engine. Contains the main functionality for storing and retrieving data. -//! - `helper`: Utility functions and helper methods used by the storage engine. -//! - `mem_table`: In-memory table implementation for fast data writes and reads. -//! - `wal`: Write-Ahead Log (WAL) module for durability and crash recovery. +//! The Public API of `lsmdb` crate. //! -//! # Usage +//! ### `StorageEngine` //! -//! Add this library as a dependency in your `Cargo.toml`: +//! The `StorageEngine` struct represents the main component of the LSM Tree storage engine. It consists of the following fields: //! -//! ```toml -//! [dependencies] -//! lsmdb = "0.1.0" +//! - `memtable`: An instance of the `MemTable` struct that serves as an in-memory table for storing key-value pairs. It provides efficient write operations. +//! - `wal`: An instance of the `WriteAheadLog` struct that handles write-ahead logging. It ensures durability by persistently storing write operations before they are applied to the memtable and SSTables. +//! - `sstables`: A vector of `SSTable` instances, which are on-disk sorted string tables storing key-value pairs. The SSTables are organized in levels, where each level contains larger and more compacted tables. +//! 
- `dir`: An instance of the `DirPath` struct that holds the directory paths for the root directory, write-ahead log directory, and SSTable directory. +//! +//! The `StorageEngine` struct provides methods for interacting with the storage engine: +//! +//! - `new`: Creates a new instance of the `StorageEngine` struct. It initializes the memtable, write-ahead log, and SSTables. +//! - `put`: Inserts a new key-value pair into the storage engine. It writes the key-value entry to the memtable and the write-ahead log. If the memtable reaches its capacity, it is flushed to an SSTable. +//! - `get`: Retrieves the value associated with a given key from the storage engine. It first searches in the memtable, which has the most recent data. If the key is not found in the memtable, it searches in the SSTables, starting from the newest levels and moving to the older ones. +//! - `remove`: Removes a key-value pair from the storage engine. It first checks if the key exists in the memtable. If not, it searches for the key in the SSTables and removes it from there. The removal operation is also logged in the write-ahead log for durability. +//! - `update`: Updates the value associated with a given key in the storage engine. It first removes the existing key-value pair using the `remove` method and then inserts the updated pair using the `put` method. +//! - `clear`: Clears the storage engine by deleting the memtable and write-ahead log. It creates a new instance of the storage engine, ready to be used again. +//! +//! ### DirPath +//! The `DirPath` struct represents the directory paths used by the storage engine. It consists of the following fields: +//! +//! - `root`: A `PathBuf` representing the root directory path, which serves as the parent directory for the write-ahead log and SSTable directories. +//! - `wal`: A `PathBuf` representing the write-ahead log directory path, where the write-ahead log file is stored. +//! 
- `sst`: A `PathBuf` representing the SSTable directory path, where the SSTable files are stored.
+//!
+//! The `DirPath` struct provides methods for building and retrieving the directory paths.
+//!
+//! ### SizeUnit
+//!
+//! The `SizeUnit` enum represents the unit of measurement for capacity and size. It includes the following variants:
+//!
+//! - `Bytes`: Represents the byte unit.
+//! - `Kilobytes`: Represents the kilobyte unit.
+//! - `Megabytes`: Represents the megabyte unit.
+//! - `Gigabytes`: Represents the gigabyte unit.
+//!
+//! The `SizeUnit` enum provides a method `to_bytes` for converting a given value to bytes based on the selected unit.
+//!
+//! ### Helper Functions
+//!
+//! The code includes several helper functions:
+//!
+//! - `with_capacity`: A helper function that creates a new instance of the `StorageEngine` struct with a specified capacity for the memtable.
+//! - `with_capacity_and_rate`: A helper function that creates a new instance of the `StorageEngine` struct with a specified capacity for the memtable and a compaction rate for the SSTables.
+//! - `flush_memtable`: A helper function that flushes the contents of the memtable to an SSTable. It creates a new SSTable file and writes the key-value pairs from the memtable into it. After flushing, the memtable is cleared.
+//! - `recover_memtable`: A helper function that recovers the contents of the memtable from the write-ahead log during initialization. It reads the logged write operations from the write-ahead log and applies them to the memtable.
+//!
+//! These helper functions assist in initializing the storage engine, flushing the memtable to an SSTable when it reaches its capacity, and recovering the memtable from the write-ahead log during initialization, ensuring durability and maintaining data consistency.
+//!
+//! ---
+//!
+//! ## MemTable
+//!
+//!
The `MemTable` (short for memory table) is an in-memory data structure that stores recently written data before it is flushed to disk. It serves as a write buffer and provides fast write operations.
+//!
+//! ### Dependencies
+//!
+//! The implementation requires the following dependencies to be imported from the standard library:
+//!
+//! - **`std::collections::BTreeMap`**: A balanced tree (B-tree) map implementation that stores key-value pairs in sorted order.
+//! - **`std::io`**: Provides input/output functionality, including error handling.
+//! - **`std::sync::{Arc, Mutex}`**: Provides thread-safe shared ownership (`Arc`) and mutual exclusion (`Mutex`) for concurrent access to data.
+//!
+//! ### Constants
+//!
+//! The implementation defines the following constants:
+//!
+//! #### `DEFAULT_MEMTABLE_CAPACITY`
+//!
+//! Represents the default maximum size of the MemTable. By default, it is set to 1 gigabyte (1GB).
+//! ```rs
+//! pub(crate) static DEFAULT_MEMTABLE_CAPACITY: usize = SizeUnit::Gigabytes.to_bytes(1);
+//! ```
+//!
+//! #### `DEFAULT_FALSE_POSITIVE_RATE`
+//!
+//! Represents the default false positive rate for the Bloom filter used in the `MemTable`. By default, it is set to 0.0001 (0.01%).
+//!
+//! ```rs
+//! pub(crate) static DEFAULT_FALSE_POSITIVE_RATE: f64 = 0.0001;
+//! ```
+//!
+//! ### Structure
+//!
+//! The **`MemTable`** structure represents the in-memory data structure and contains the following fields:
+//!
+//! ```rs
+//! pub(crate) struct MemTable {
+//!     entries: Arc<Mutex<BTreeMap<Vec<u8>, Vec<u8>>>>,
+//!     entry_count: usize,
+//!     size: usize,
+//!     capacity: usize,
+//!     bloom_filter: BloomFilter,
+//!     size_unit: SizeUnit,
+//!     false_positive_rate: f64,
+//! }
+//! ```
+//!
+//! #### `entries`
+//!
+//! The `entries` field is of type `Arc<Mutex<BTreeMap<Vec<u8>, Vec<u8>>>>`. It holds the key-value pairs of the `MemTable` in sorted order using a `BTreeMap`.
The `Arc` (Atomic Reference Counting) and `Mutex` types allow for concurrent access and modification of the `entries` data structure from multiple threads, ensuring thread safety. +//! +//! #### `entry_count` +//! +//! The `entry_count` field is of type `usize` and represents the number of key-value entries currently stored in the `MemTable`. +//! +//! #### `size` +//! +//! The `size` field is of type `usize` and represents the current size of the `MemTable` in bytes. It is updated whenever a new key-value pair is added or removed. +//! +//! #### `capacity` +//! +//! The `capacity` field is of type `usize` and represents the maximum allowed size for the `MemTable` in bytes. It is used to enforce size limits and trigger flush operations when the `MemTable` exceeds this capacity. +//! +//! #### `bloom_filter` +//! +//! The `bloom_filter` field is of type `BloomFilter` and is used to probabilistically determine whether a `key` may exist in the `MemTable` without accessing the `entries` map. It helps improve performance by reducing unnecessary lookups in the map. +//! +//! #### `size_unit` +//! +//! The `size_unit` field is of type `SizeUnit` and represents the unit of measurement used for `capacity` and `size` calculations. It allows for flexibility in specifying the capacity and size of the `MemTable` in different units (e.g., bytes, kilobytes, megabytes, etc.). +//! +//! #### `false_positive_rate` +//! +//! The `false_positive_rate` field is of type `f64` and represents the desired false positive rate for the bloom filter. It determines the trade-off between memory usage and the accuracy of the bloom filter. +//! +//! ### Constructor Methods +//! +//! #### `new` +//! +//! ```rs +//! pub(crate) fn new() -> Self +//! ``` +//! +//! The `new` method creates a new `MemTable` instance with the default capacity. It internally calls the `with_capacity_and_rate` method, passing the default capacity and false positive rate. +//! +//! #### `with_capacity_and_rate` +//! +//! 
```rs
+//! pub(crate) fn with_capacity_and_rate(
+//!     size_unit: SizeUnit,
+//!     capacity: usize,
+//!     false_positive_rate: f64,
+//! ) -> Self
+//! ```
+//!
+//! The `with_capacity_and_rate` method creates a new `MemTable` with the specified capacity, size unit, and false positive rate. It initializes the `entries` field as an empty `BTreeMap`, sets the `entry_count` and `size` to zero, and creates a new `BloomFilter` with the given capacity and false positive rate. The capacity is converted to bytes based on the specified size unit.
+//!
+//! ### Public Methods
+//!
+//! #### `set`
+//!
+//! ```rs
+//! pub(crate) fn set(&mut self, key: Vec<u8>, value: Vec<u8>) -> io::Result<()>
+//! ```
+//!
+//! The `set` method inserts a new key-value pair into the `MemTable`. It first acquires a lock on the `entries` field to ensure thread-safety. If the key is not present in the `BloomFilter`, it adds the key-value pair to the `entries` map, updates the `entry_count` and `size`, and sets the key in the `BloomFilter`. If the key already exists, an `AlreadyExists` error is returned.
+//!
+//! #### `get`
+//!
+//! ```rs
+//! pub(crate) fn get(&self, key: Vec<u8>) -> io::Result<Option<Vec<u8>>>
+//! ```
+//!
+//! The `get` method retrieves the value associated with a given key from the `MemTable`. It first checks if the key is present in the `BloomFilter`. If it is, it acquires a lock on the `entries` field and returns the associated value. If the key is not present in the `BloomFilter`, it returns `None`.
+//!
+//! #### `remove`
+//!
+//! ```rs
+//! pub(crate) fn remove(&mut self, key: Vec<u8>) -> io::Result<Option<(Vec<u8>, Vec<u8>)>>
+//! ```
+//!
+//! The `remove` method removes a key-value pair from the `MemTable` based on a given key. It first checks if the key is present in the `BloomFilter`. If it is, it acquires a lock on the `entries` field and removes the key-value pair from the `entries` map. It updates the `entry_count` and `size` accordingly and returns the removed key-value pair as a tuple.
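The bloom-filter-guarded access pattern shared by `set`, `get`, and `remove` can be sketched in isolation. This is a simplified illustration rather than the crate's code: a `HashSet` stands in for the probabilistic `BloomFilter` (it answers membership exactly instead of approximately), and `size` is approximated as key length plus value length.

```rust
use std::collections::{BTreeMap, HashSet};
use std::io;
use std::sync::{Arc, Mutex};

/// Simplified stand-in for the crate's MemTable (illustration only).
struct MiniMemTable {
    entries: Arc<Mutex<BTreeMap<Vec<u8>, Vec<u8>>>>,
    filter: HashSet<Vec<u8>>, // stand-in for the bloom filter
    size: usize,
}

impl MiniMemTable {
    fn new() -> Self {
        Self {
            entries: Arc::new(Mutex::new(BTreeMap::new())),
            filter: HashSet::new(),
            size: 0,
        }
    }

    /// Reject duplicate keys, then insert under the lock and grow `size`.
    fn set(&mut self, key: Vec<u8>, value: Vec<u8>) -> io::Result<()> {
        if self.filter.contains(&key) {
            return Err(io::Error::new(io::ErrorKind::AlreadyExists, "key exists"));
        }
        let mut map = self
            .entries
            .lock()
            .map_err(|e| io::Error::new(io::ErrorKind::Other, e.to_string()))?;
        self.size += key.len() + value.len();
        self.filter.insert(key.clone());
        map.insert(key, value);
        Ok(())
    }

    /// A negative filter answer means the key is definitely absent,
    /// so the map lookup (and its lock acquisition) is skipped entirely.
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        if !self.filter.contains(key) {
            return None;
        }
        self.entries.lock().ok()?.get(key).cloned()
    }
}

fn main() {
    let mut table = MiniMemTable::new();
    table.set(b"k".to_vec(), b"v".to_vec()).unwrap();
    assert_eq!(table.get(b"k"), Some(b"v".to_vec()));
    assert!(table.set(b"k".to_vec(), b"v2".to_vec()).is_err());
}
```

With a real bloom filter, `contains` may return false positives, so a positive answer still requires the map lookup; only a negative answer lets the lookup be skipped.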
If the key is not present in the `BloomFilter`, it returns `None`.
+//!
+//! #### `clear`
+//!
+//! ```rs
+//! pub(crate) fn clear(&mut self) -> io::Result<()>
+//! ```
+//!
+//! The `clear` method removes all key-value entries from the `MemTable`. It acquires a lock on the `entries` field, clears the `entries` map, and sets the `entry_count` and `size` fields to zero.
+//!
+//! #### `entries`
+//!
+//! ```rs
+//! pub(crate) fn entries(&self) -> io::Result<Vec<(Vec<u8>, Vec<u8>)>>
+//! ```
+//!
+//! The `entries` method returns a vector of all key-value pairs in the `MemTable`. It acquires a lock on the `entries` field and iterates over the key-value pairs in the `entries` map. It clones each key-value pair and collects them into a vector, which is then returned.
+//!
+//! ### Internal Methods
+//!
+//! #### `capacity`
+//!
+//! ```rs
+//! pub(crate) fn capacity(&self) -> usize
+//! ```
+//!
+//! The `capacity` method returns the capacity of the `MemTable` in bytes.
+//!
+//! #### `size`
+//!
+//! ```rs
+//! pub(crate) fn size(&self) -> usize
+//! ```
+//!
+//! The `size` method returns the current size of the `MemTable` in the specified size unit. It divides the internal `size` by the number of bytes in one unit of the specified size unit.
+//!
+//! #### `false_positive_rate`
+//!
+//! ```rs
+//! pub(crate) fn false_positive_rate(&self) -> f64
+//! ```
+//!
+//! The `false_positive_rate` method returns the false positive rate of the `MemTable`.
+//!
+//! #### `size_unit`
+//!
+//! ```rs
+//! pub(crate) fn size_unit(&self) -> SizeUnit
+//! ```
+//!
+//! The `size_unit` method returns the size unit used by the `MemTable`.
+//!
+//! ### Error Handling
+//!
+//! All the methods that involve acquiring a lock on the `entries` field use the `io::Error` type to handle potential errors when obtaining the lock. If an error occurs during the locking process, an `io::Error` instance is created with a corresponding error message.
+//!
+//! ### Thread Safety
+//!
+//!
The `MemTable` implementation ensures thread safety by using an `Arc<Mutex<BTreeMap<Vec<u8>, Vec<u8>>>>` for storing the key-value entries. The `Arc` allows multiple ownership of the `entries` map across threads, and the `Mutex` ensures exclusive access to the map during modification operations, preventing data races.
+//!
+//! The locking mechanism employed by the `Mutex` guarantees that only one thread can access the `entries` map at a time, for reads as well as writes; allowing multiple concurrent readers would require a `RwLock` instead.
+//!
+//! ---
+//!
+//! ## Bloom Filter
+//!
+//! The Bloom Filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It provides a fast and memory-efficient way to check for set membership, but it introduces a small probability of false positives.
+//!
+//! The Bloom Filter implementation is provided as a Rust module and consists of a struct called `BloomFilter`. It uses a `BitVec` to represent the array of bits that make up the filter. The number of hash functions used by the Bloom Filter is configurable, and it keeps track of the number of elements inserted into the filter.
+//!
+//! ### Dependencies
+//!
+//! - **`std::collections::hash_map::DefaultHasher`**: Provides the default hasher implementation used for calculating hash values of keys.
+//! - **`std::hash::{Hash, Hasher}`**: Defines the `Hash` trait used for hashing keys.
+//! - **`std::sync::{Arc, Mutex}`**: Provides thread-safe shared ownership (`Arc`) and mutual exclusion (`Mutex`) for concurrent access to data.
+//! - **`std::sync::atomic::AtomicUsize`**: Provides an atomically updatable counter for the number of inserted elements.
+//! - **`bit_vec::BitVec`**: Implements a bit vector data structure used to store the Bloom filter's bit array.
+//!
+//! ### Structure
+//!
+//! The `BloomFilter` struct represents the Bloom Filter data structure and contains the following fields:
+//!
+//! ```rs
+//! pub(crate) struct BloomFilter {
+//!     bits: Arc<Mutex<BitVec>>,
+//!     num_hashes: usize,
+//!     num_elements: AtomicUsize,
+//! }
+//! ```
+//!
+//! #### `bits`
+//!
+//!
An `Arc<Mutex<BitVec>>` representing the array of bits used to store the Bloom filter.
+//!
+//! #### `num_hashes`
+//!
+//! The number of hash functions used by the Bloom filter.
+//!
+//! #### `num_elements`
+//!
+//! An `AtomicUsize` representing the number of elements inserted into the Bloom filter.
+//!
+//! ### Constructor Methods
+//!
+//! #### `new`
+//!
+//! ```rs
+//! fn new(num_elements: usize, false_positive_rate: f64) -> Self
+//! ```
+//!
+//! The `new` method creates a new `BloomFilter` with the specified number of elements and false positive rate. It initializes the Bloom filter's bit array, calculates the number of hash functions, and sets the initial number of elements to zero.
+//!
+//! ### Public Methods
+//!
+//! #### `set`
+//!
+//! ```rs
+//! fn set<T: Hash>(&mut self, key: &T)
+//! ```
+//!
+//! The `set` method inserts a key into the Bloom filter. It calculates the hash value for the key using multiple hash functions and sets the corresponding bits in the bit array to true. It also increments the element count.
+//!
+//! #### `contains`
+//!
+//! ```rs
+//! fn contains<T: Hash>(&self, key: &T) -> bool
+//! ```
+//!
+//! The `contains` method checks if a key is possibly present in the Bloom filter.
+//! It calculates the hash value for the key using multiple hash functions and checks the corresponding bits in the bit array.
+//! If any of the bits are false, it indicates that the key is definitely not present, and the method returns false.
+//! If all bits are true, the method returns true, indicating that the key is possibly present.
+//!
+//! #### `num_elements`
+//!
+//! ```rs
+//! fn num_elements(&self) -> usize
//! ```
//!
-//! Import the engine module into your Rust code:
+//! This method returns the current number of elements inserted into the Bloom filter.
//!
-//! ```rust
-//! use lsmdb::engine::{StorageEngine, StorageEngineEntry};
+//! ### Internal Methods
+//!
+//! #### `calculate_hash`
+//!
+//! ```rs
+//! fn calculate_hash<T: Hash>(&self, key: &T, seed: usize) -> u64
+//!
``` +//! +//! This function calculates a hash value for a given key and seed. It uses a suitable hash function to hash the key and incorporates the seed value for introducing randomness. +//! +//! #### `calculate_num_bits` +//! +//! ```rs +//! fn calculate_num_bits(num_elements: usize, false_positive_rate: f64) -> usize //! ``` //! -//! # Examples +//! This function calculates the optimal number of bits for the Bloom filter based on the desired false positive rate and the expected number of elements. It uses a formula to estimate the number of bits required. +//! +//! #### `calculate_num_hashes` +//! +//! ```rs +//! fn calculate_num_hashes(num_bits: usize, num_elements: usize) -> usize +//! ``` +//! +//! This function calculates the optimal number of hash functions for the Bloom filter based on the number of bits and the expected number of elements. It uses a formula to estimate the number of hash functions required. +//! +//! --- +//! +//! ## Write-Ahead Log (WAL) +//! +//! The Sequential Write-Ahead Log (WAL) is a crucial component of the LSM Tree storage engine. +//! It provides durability and atomicity guarantees by logging write operations before they are applied to the main data structure. +//! +//! When a write operation is received, the key-value pair is first appended to the WAL. +//! In the event of a crash or system failure, the WAL can be replayed to recover the data modifications and bring the MemTable back to a consistent state. //! -//! Here's an example demonstrating the basic usage of the storage engine: +//! ### Dependencies //! -//! ```rust -//! # use lsmdb::engine::StorageEngine; +//! The implementation requires the following dependencies to be imported from the standard library: //! -//! fn main() { -//! // Create a new storage engine instance -//! let mut engine = StorageEngine::new("./lib_test_dir"); +//! - **`std::fs`**: Provides file system-related operations. +//! 
- **`std::io`**: Provides input/output functionality, including error handling. +//! - **`std::path::PathBuf`**: Represents file system paths. +//! - **`std::sync::{Arc, Mutex}`**: Provides thread-safe shared ownership and synchronization. //! -//! // Write data to the storage engine -//! engine.set(b"key", b"value"); +//! ### WriteAheadLog Structure //! -//! // Read data from the storage engine -//! let value = engine.get(b"key"); -//! println!("Value: {:?}", value); +//! The `WriteAheadLog` structure represents the write-ahead log (WAL) and contains the following field: //! -//!# if let Err(e) = std::fs::remove_dir_all("./lib_test_dir") { -//!# println!("Failed to remove test directory: {:?}", e); -//!# } +//! ```rs +//! struct WriteAheadLog { +//! log_file: Arc>, //! } //! ``` //! -//! # Testing +//! #### log_file +//! +//! The `log_file` field is of type `Arc>`. It represents the WAL file and provides concurrent access and modification through the use of an `Arc` (Atomic Reference Counting) and `Mutex`. +//! +//! ### Log File Structure Diagram +//! +//! The `log_file` is structured as follows: +//! +//! ```sh +//! +-------------------+ +//! | Entry Length | (4 bytes) +//! +-------------------+ +//! | Entry Kind | (1 byte) +//! +-------------------+ +//! | Key Length | (4 bytes) +//! +-------------------+ +//! | Value Length | (4 bytes) +//! +-------------------+ +//! | Key | (variable) +//! | | +//! | | +//! +-------------------+ +//! | Value | (variable) +//! | | +//! | | +//! +-------------------+ +//! | Entry Length | (4 bytes) +//! +-------------------+ +//! | Entry Kind | (1 byte) +//! +-------------------+ +//! | Key Length | (4 bytes) +//! +-------------------+ +//! | Value Length | (4 bytes) +//! +-------------------+ +//! | Key | (variable) +//! | | +//! | | +//! +-------------------+ +//! | Value | (variable) +//! | | +//! | | +//! +-------------------+ +//! ``` +//! +//! 
- **Entry Length**: A 4-byte field representing the total length of the entry in bytes.
+//! - **Entry Kind**: A 1-byte field indicating the type of entry (Insert or Remove).
+//! - **Key Length**: A 4-byte field representing the length of the key in bytes.
+//! - **Value Length**: A 4-byte field representing the length of the value in bytes.
+//! - **Key**: The actual key data, which can vary in size.
+//! - **Value**: The actual value data, which can also vary in size.
+//!
+//! Each entry is written sequentially into the `log_file` using the `write_all` method, ensuring that the entries are stored contiguously. New entries are appended to the end of the `log_file` after the existing entries.
+//!
+//! ### Constants
+//!
+//! A constant named `WAL_FILE_NAME` is defined, representing the name of the WAL file.
+//!
+//! ```rs
+//! static WAL_FILE_NAME: &str = "lsmdb_wal.bin";
+//! ```
+//!
+//! ### `EntryKind`
+//!
+//! ```rs
+//! enum EntryKind {
+//!     Insert = 1,
+//!     Remove = 2,
+//! }
+//! ```
+//!
+//! The `EntryKind` enum represents the kind of entry stored in the WAL. It has two variants: `Insert` and `Remove`. Each variant is associated with an integer value used for serialization.
+//!
+//! ### `WriteAheadLogEntry`
+//!
+//! ```rs
+//! struct WriteAheadLogEntry {
+//!     entry_kind: EntryKind,
+//!     key: Vec<u8>,
+//!     value: Vec<u8>,
+//! }
+//! ```
+//!
+//! The `WriteAheadLogEntry` represents a single entry in the Write-Ahead Log. It contains the following fields:
+//!
+//! - **`entry_kind`**: An enumeration (`EntryKind`) representing the type of the entry (insert or remove).
+//! - **`key`**: A vector of bytes (`Vec<u8>`) representing the key associated with the entry.
+//! - **`value`**: A vector of bytes (`Vec<u8>`) representing the value associated with the entry.
+//!
+//! ### `WriteAheadLogEntry` Methods
+//!
+//! #### `new`
+//!
+//! ```rs
+//! fn new(entry_kind: EntryKind, key: Vec<u8>, value: Vec<u8>) -> Self
+//! ```
+//!
+//!
The `new` method creates a new instance of the `WriteAheadLogEntry` struct. It takes the `entry_kind`, `key`, and `value` as parameters and initializes the corresponding fields.
+//!
+//! #### `serialize`
+//!
+//! ```rs
+//! fn serialize(&self) -> Vec<u8>
+//! ```
+//!
+//! The `serialize` method serializes the `WriteAheadLogEntry` into a vector of bytes.
+//! It calculates the length of the entry, then serializes the length, entry kind, key length, value length, key, and value into the vector. The serialized data is returned.
+//!
+//! #### `deserialize`
+//!
+//! ```rs
+//! fn deserialize(serialized_data: &[u8]) -> io::Result<Self>
+//! ```
+//!
+//! This method deserializes a `WriteAheadLogEntry` from the provided serialized data.
+//! It performs validation checks on the length and structure of the serialized data and returns an `io::Result` containing the deserialized entry if successful.
+//!
+//! ### `WriteAheadLog` Methods
+//!
+//! #### `new`
+//!
+//! ```rs
+//! fn new(directory_path: &PathBuf) -> io::Result<Self>
+//! ```
+//!
+//! The `new` method is a constructor function that creates a new `WriteAheadLog` instance.
+//! It takes a `directory_path` parameter as a `PathBuf` representing the directory path where the WAL file will be stored.
+//!
+//! If the directory doesn't exist, it creates it. It then opens the log file with read, append, and create options, and initializes the `log_file` field.
+//!
+//! #### `append`
+//!
+//! ```rs
+//! fn append(&mut self, entry_kind: EntryKind, key: Vec<u8>, value: Vec<u8>) -> io::Result<()>
+//! ```
+//!
+//! The `append` method appends a new entry to the Write-Ahead Log.
+//! It takes an `entry_kind` parameter of type `EntryKind`, a `key` parameter of type `Vec<u8>`, and a `value` parameter of type `Vec<u8>`. The method acquires a lock on the `log_file` to ensure mutual exclusion when writing to the file.
+//!
+//! It creates a `WriteAheadLogEntry` with the provided parameters, serializes it, and writes the serialized data to the log file.
+//!
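The record format that `serialize` writes (entry length, entry kind, key length, value length, key, value) can be sketched as a standalone function. The little-endian `u32` fields follow the log-file diagram above; the function name and exact byte order are illustrative assumptions, not the crate's API.

```rust
/// Illustrative sketch of the WAL entry layout:
/// [entry length | entry kind | key length | value length | key | value]
fn serialize_entry(kind: u8, key: &[u8], value: &[u8]) -> Vec<u8> {
    // Fixed-size header: 4 (entry len) + 1 (kind) + 4 (key len) + 4 (value len).
    let total = 4 + 1 + 4 + 4 + key.len() + value.len();
    let mut buf = Vec::with_capacity(total);
    buf.extend_from_slice(&(total as u32).to_le_bytes()); // Entry Length
    buf.push(kind); // Entry Kind: 1 = Insert, 2 = Remove
    buf.extend_from_slice(&(key.len() as u32).to_le_bytes()); // Key Length
    buf.extend_from_slice(&(value.len() as u32).to_le_bytes()); // Value Length
    buf.extend_from_slice(key); // Key bytes
    buf.extend_from_slice(value); // Value bytes
    buf
}

fn main() {
    let entry = serialize_entry(1, b"key", b"value");
    // 13-byte header + 3-byte key + 5-byte value = 21 bytes total.
    assert_eq!(entry.len(), 21);
    assert_eq!(u32::from_le_bytes([entry[0], entry[1], entry[2], entry[3]]), 21);
}
```

Because every record carries its own length up front, `recover` can walk the file record by record until it reaches the end, which is what makes sequential replay possible.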
+//! Finally, it flushes the log file to ensure the data is persisted. If the operation succeeds, `Ok(())` is returned; otherwise, an `io::Error` instance is created and returned.
+//!
+//! #### `recover`
+//!
+//! ```rs
+//! fn recover(&mut self) -> io::Result<Vec<WriteAheadLogEntry>>
+//! ```
+//!
+//! The `recover` method reads and recovers the entries from the Write-Ahead Log. The method acquires a lock on the `log_file` to ensure exclusive access during the recovery process.
+//!
+//! It reads the log file and deserializes the data into a vector of `WriteAheadLogEntry` instances.
+//! It continues reading and deserializing until the end of the log file is reached. The recovered entries are returned as a vector.
//!
-//! The library includes test modules to ensure the correctness of its components. Run the tests using `cargo test`:
+//! #### `clear`
//!
-//! ```bash
-//! cargo test
+//! ```rs
+//! fn clear(&mut self) -> io::Result<()>
//! ```
//!
-//! # Contributing
+//! The `clear` method clears the contents of the WAL file. It acquires a lock on the `log_file` to ensure exclusive access when truncating and seeking the file.
+//! The method sets the length of the file to `0` using the `set_len` method, effectively truncating it. Then, it seeks to the start of the file using `seek` with `SeekFrom::Start(0)` to reset the file pointer.
+//! If the operation succeeds, `Ok(())` is returned; otherwise, an `io::Error` instance is created and returned.
//!
-//! Contributions to this lsm tree storage engine library are welcome! If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository: [link-to-repository](https://github.com/roynrishingha/lsmdb)
+//! ### Thread Safety
//!
-//! # License
+//! The `WriteAheadLog` implementation ensures thread safety by using an `Arc<Mutex<File>>` for the `log_file` field.
The `Arc` allows multiple ownership of the WAL file across threads, and the `Mutex` ensures exclusive access to the file during write, recovery, and clear operations, preventing data races.
//!
-//! This library is licensed under the [MIT License](https://opensource.org/licenses/MIT).
+//! The locking mechanism employed by the `Mutex` guarantees that only one thread can access the WAL file at a time, whether writing, recovering, or clearing; allowing multiple concurrent readers would require a `RwLock` instead.
//!
//! ---
-//! Author: Nrishinghananda Roy
-//! Version: 0.1.0
+//!
+//! ## SSTable (Sorted String Table)
+//!
+//! An SSTable, or Sorted String Table, is an immutable on-disk data structure that stores key-value pairs in a sorted order.
+//! It serves as the persistent storage layer for the LSM Tree-based engine.
+//! SSTables are typically stored as multiple files, each containing a sorted range of key-value pairs.
+//!
+//! When the MemTable reaches a certain threshold size, it is "flushed" to disk as a new SSTable file.
+//! The MemTable is atomically replaced with an empty one, allowing new write operations to continue. This process is known as a "memtable flush."
+//!
+//! ```rs
+//! +-----------------------+
+//! |        SSTable        |
+//! +-----------------------+
+//! | - file_path           | (PathBuf)
+//! | - blocks              | (Vec<Block>)
+//! | - created_at          | (DateTime<Utc>)
+//! +-----------------------+
+//! | + new(dir: PathBuf)   | -> SSTable
+//! | + set(key, value)     | -> Result<(), io::Error>
+//! | + get(key)            | -> Option<Vec<u8>>
+//! | + remove(key)         | -> Result<(), io::Error>
+//! +-----------------------+
+//!
+//! +-----------------------+
+//! |         Block         |
+//! +-----------------------+
+//! | - data                | (Vec<u8>)
+//! | - index               | (HashMap<Vec<u8>, usize>)
+//! | - entry_count         | (usize)
+//! +-----------------------+
+//! | + new()               | -> Block
+//! | + is_full(size)       | -> bool
+//! | + set_entry(key, value) | -> Result<(), io::Error>
+//! | + remove_entry(key)   | -> bool
+//! | + get_value(key)      | -> Option<Vec<u8>>
+//! | + entry_count()       | -> usize
+//!
+-----------------------+
//! ```
//!
+//! The `SSTable` struct represents the Sorted String Table and contains the following fields:
+//! - `file_path`: Stores the path of the SSTable file (`PathBuf`).
+//! - `blocks`: Represents a collection of blocks that hold the data (`Vec<Block>`).
+//! - `created_at`: Indicates the creation timestamp of the SSTable (`DateTime<Utc>`).
+//!
+//! The `SSTable` struct provides the following methods:
+//!
+//! - `new(dir: PathBuf) -> SSTable`: Creates a new instance of the `SSTable` struct given a directory path and initializes its fields. Returns the created `SSTable`.
+//!
+//! - `set(key: Vec<u8>, value: Vec<u8>) -> Result<(), io::Error>`: Sets an entry with the provided key and value in the `SSTable`. It internally manages the blocks and their capacity to store entries. Returns `Result<(), io::Error>` indicating success or failure.
+//!
+//! - `get(key: Vec<u8>) -> Option<Vec<u8>>`: Retrieves the value associated with the provided key from the `SSTable`. It iterates over the blocks to find the key-value pair. Returns `Option<Vec<u8>>` with the value if found, or `None` if the key is not present.
+//!
+//! - `remove(key: Vec<u8>) -> Result<(), io::Error>`: Removes the entry with the provided key from the `SSTable`. It iterates over the blocks in reverse order to delete from the most recent block first. Returns `Result<(), io::Error>` indicating success or failure.
+//!
+//! The `Block` struct represents an individual block within the SSTable and contains the following fields:
+//!
+//! - `data`: Stores the data entries within the block (`Vec<u8>`).
+//! - `index`: Maintains an index for efficient key-based lookups (`HashMap<Vec<u8>, usize>`).
+//! - `entry_count`: Tracks the number of entries in the block (`usize`).
+//!
+//! The `Block` struct provides the following methods:
+//!
+//! - `new() -> Block`: Creates a new instance of the `Block` struct and initializes its fields. Returns the created `Block`.
+//!
+//!
- `is_full(entry_size: usize) -> bool`: Checks if the block is full given the size of an entry. It compares the combined size of the existing data and the new entry size with the predefined block size. Returns `true` if the block is full, `false` otherwise.
+//!
+//! - `set_entry(key: &[u8], value: &[u8]) -> Result<(), io::Error>`: Sets an entry with the provided key and value in the block. It calculates the entry size, checks if the block has enough capacity, and adds the entry to the block's data and index. Returns `Result<(), io::Error>` indicating success or failure.
+//!
+//! - `remove_entry(key: &[u8]) -> bool`: Removes the entry with the provided key from the block. It searches for the key in the index, clears the entry in the data vector, and updates the entry count. Returns `true` if the entry was found and removed, `false` otherwise.
+//!
+//! - `get_value(key: &[u8]) -> Option<Vec<u8>>`: Retrieves the value associated with the provided key from the block. It looks up the key in the index, extracts the value bytes from the data vector, and returns them as a new `Vec<u8>`. Returns `Option<Vec<u8>>` with the value if found, or `None` if the key is not present.
+//!
+//! - `entry_count() -> usize`: Returns the number of entries in the block.
+//!
+//! Together, the `SSTable` and `Block` form the basic components of the SSTable implementation, providing efficient storage and retrieval of key-value pairs with support for adding and removing entries.
+//!
+//! The `SSTable` manages multiple `Block` instances to store the data, and the `Block` handles individual block-level operations and indexing.
+//!
+//! A diagram illustrating how key-value pairs are stored inside a Block:
+//!
+//! ```rs
+//! +----------------------------------+
+//! |              Block               |
+//! +----------------------------------+
+//! | - data: Vec<u8>                  | // Data entries within the block
+//! | - index: HashMap<Vec<u8>, usize> | // Index for key-based lookups
+//! | - entry_count: usize             | // Number of entries in the block
+//!
+----------------------------------+ +//! | Block Data | +//! | +------------------------+ | +//! | | Entry 1 | | +//! | | +-------------------+ | | +//! | | | Length Prefix | | | +//! | | | (4 bytes, little- | | | +//! | | | endian format) | | | +//! | | +-------------------+ | | +//! | | | Key | | | +//! | | | (variable length) | | | +//! | | +-------------------+ | | +//! | | | Value | | | +//! | | | (variable length) | | | +//! | | +-------------------+ | | +//! | +------------------------+ | +//! | | Entry 2 | | +//! | | ... | | +//! | +------------------------+ | +//! +----------------------------------+ +//! ``` +//! +//! In the diagram: +//! - The `Block` struct represents an individual block within the SSTable. +//! - The `data` field of the `Block` is a vector (`Vec`) that stores the data entries. +//! - The `index` field is a `HashMap` that maintains the index for efficient key-based lookups. +//! - The `entry_count` field keeps track of the number of entries in the block. +//! +//! Each entry within the block consists of three parts: +//! 1. Length Prefix: A 4-byte length prefix in little-endian format, indicating the length of the value. +//! 2. Key: Variable-length key bytes. +//! 3. Value: Variable-length value bytes. +//! +//! The block's data vector (`data`) stores these entries sequentially. Each entry follows the format mentioned above, and they are concatenated one after another within the data vector. The index hashmap (`index`) maintains references to the keys and their corresponding offsets within the data vector. +//! +//! --- +//! +//! Author: Nrishinghananda Roy +//! -/// Entry point to storage engine -pub mod engine; - -mod helper; -mod mem_table; -mod wal; - -/// Test modules for `mem_table` module. -#[cfg(test)] -mod mem_table_tests; +#![allow(dead_code)] -/// Test modules for `wal` module. 
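The entry layout described above (a 4-byte little-endian length prefix for the value, followed by the key bytes and then the value bytes) can be sketched as a standalone encode/decode round trip. This is an illustrative sketch only; `encode_entry` and `decode_value` are hypothetical helpers, not part of the crate's API.

```rust
// Sketch of the Block entry layout documented above:
// [4-byte little-endian value length][key bytes][value bytes]
// `encode_entry` / `decode_value` are hypothetical names for illustration.

fn encode_entry(key: &[u8], value: &[u8]) -> Vec<u8> {
    let value_len = value.len() as u32;
    // Concatenate the length prefix, key, and value, as Block::set_entry does.
    [&value_len.to_le_bytes()[..], key, value].concat()
}

// Given an entry's offset and its key length, recover the value bytes.
fn decode_value(data: &[u8], offset: usize, key_len: usize) -> Vec<u8> {
    const SIZE_OF_U32: usize = std::mem::size_of::<u32>();
    let len_bytes: [u8; 4] = data[offset..offset + SIZE_OF_U32].try_into().unwrap();
    let value_len = u32::from_le_bytes(len_bytes) as usize;
    let start = offset + SIZE_OF_U32 + key_len;
    data[start..start + value_len].to_vec()
}

fn main() {
    let entry = encode_entry(b"key1", b"value1");
    // 4-byte prefix + 4-byte key + 6-byte value = 14 bytes.
    assert_eq!(entry.len(), 14);
    assert_eq!(decode_value(&entry, 0, b"key1".len()), b"value1".to_vec());
}
```

Note that the value's offset can only be recomputed if the key length is known, which is why the index maps whole keys to entry offsets.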
-#[cfg(test)] -mod wal_tests; +pub mod api; +mod memtable; +mod sst; +mod write_ahead_log; diff --git a/src/mem_table.rs b/src/mem_table.rs deleted file mode 100644 index 38a906c..0000000 --- a/src/mem_table.rs +++ /dev/null @@ -1,127 +0,0 @@ -/// Entry for MemTable -#[derive(Debug, PartialEq)] -pub struct MemTableEntry { - /// key is always there and used to query value - pub key: Vec<u8>, - /// value may or may not exist - pub value: Option<Vec<u8>>, - /// when the entry is created or modified - pub timestamp: u128, - /// state of the entry - pub deleted: bool, -} - -/// Each table consists of multiple entries -/// entries are then stored into WAL: Write Ahead Log -/// to recover MemTable from data loss -pub struct MemTable { - // TODO: This should be using a `SkipList` instead of a Vector - /// All entries are stored in memory before copied into WAL - /// Then it'll be written in SSTable - pub entries: Vec<MemTableEntry>, - /// Size of the MemTable - pub size: usize, -} - -impl MemTable { - /// create a new MemTable with empty values - pub fn new() -> Self { - MemTable { - entries: Vec::new(), - size: 0, - } - } - - /// Set a new entry in MemTable - pub fn set(&mut self, key: &[u8], value: &[u8], timestamp: u128) { - let memtable_entry = new_memtable_entry(key, Some(value), timestamp, false); - - match self.get_index(key) { - // If a Value existed on the deleted record, then add the difference of the new and old Value to the MemTable's size.
- Ok(id) => { - if let Some(stored_value) = self.entries[id].value.as_ref() { - if value.len() < stored_value.len() { - self.size -= stored_value.len() - value.len(); - } else { - self.size += value.len() - stored_value.len() - } - } else { - self.size += value.len(); - } - self.entries[id] = memtable_entry; - } - Err(id) => { - // Increase the size of the MemTable by the Key size, Value size, Timestamp size (16 bytes), Tombstone size (1 byte) - self.size += key.len() + value.len() + 16 + 1; - self.entries.insert(id, memtable_entry) - } - } - } - - /// Get an entry from MemTable - /// - /// If no record with the same key exists in the MemTable, return None. - pub fn get(&self, key: &[u8]) -> Option<&MemTableEntry> { - if let Ok(id) = self.get_index(key) { - return Some(&self.entries[id]); - } - None - } - - /// Delete an entry from MemTable - pub fn delete(&mut self, key: &[u8], timestamp: u128) { - let memtable_entry = new_memtable_entry(key, None, timestamp, true); - - // check if entry exists for the given key - match self.get_index(key) { - Ok(id) => { - // If a Value existed on the deleted record, then subtract the size of the Value from the MemTable. - if let Some(existed_value) = self.entries[id].value.as_ref() { - self.size -= existed_value.len(); - } - - // update stored entry with new entry - self.entries[id] = memtable_entry; - } - Err(id) => { - // Increase the size of the MemTable by the Key size, Timestamp size (16 bytes), Tombstone size (1 byte). - self.size += key.len() + 16 + 1; - - // Insert new entry - self.entries.insert(id, memtable_entry); - } - } - } -} - -// Methods for internal usecases. -// A user will not directly call these methods. 
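The `set` and `delete` paths above hinge on the semantics of `Vec::binary_search_by_key` used by `get_index`: `Ok(idx)` means the key already exists at `idx` (overwrite in place), while `Err(idx)` is the insertion point that keeps the vector sorted. A minimal standalone sketch of that sorted-Vec upsert pattern (not the crate's code; `upsert` is a hypothetical helper):

```rust
// Sorted-Vec lookup/insert pattern used by the deleted MemTable:
//   Ok(i)  -> key found at index i (overwrite in place)
//   Err(i) -> key absent; i is the insertion point that keeps order
fn upsert(entries: &mut Vec<(Vec<u8>, Vec<u8>)>, key: &[u8], value: &[u8]) {
    match entries.binary_search_by_key(&key, |entry| entry.0.as_slice()) {
        Ok(i) => entries[i].1 = value.to_vec(),
        Err(i) => entries.insert(i, (key.to_vec(), value.to_vec())),
    }
}

fn main() {
    let mut entries = Vec::new();
    upsert(&mut entries, b"lime", b"lime smoothie");
    upsert(&mut entries, b"apple", b"apple smoothie");
    upsert(&mut entries, b"lime", b"a sour fruit");
    // Keys stay sorted, and the duplicate key was overwritten, not duplicated.
    assert_eq!(entries[0].0, b"apple".to_vec());
    assert_eq!(entries[1].1, b"a sour fruit".to_vec());
    assert_eq!(entries.len(), 2);
}
```

This is also why every mutation goes through `get_index` first: the Ok/Err distinction drives the size-accounting branches above.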
-impl MemTable { - fn get_index(&self, key: &[u8]) -> Result<usize, usize> { - self.entries - .binary_search_by_key(&key, |entry| entry.key.as_slice()) - } -} - -/// Creates a new MemTableEntry -/// Takes key: &[u8], value: Option<&[u8]>, timestamp: u128, deleted: bool -/// Returns: MemTableEntry -pub fn new_memtable_entry( - key: &[u8], - value: Option<&[u8]>, - timestamp: u128, - deleted: bool, -) -> MemTableEntry { - MemTableEntry { - key: key.to_owned(), - value: value.map(|v| v.to_vec()), - timestamp, - deleted, - } -} - -impl Default for MemTable { - fn default() -> Self { - Self::new() - } -} diff --git a/src/mem_table_tests.rs b/src/mem_table_tests.rs deleted file mode 100644 index e263a48..0000000 --- a/src/mem_table_tests.rs +++ /dev/null @@ -1,200 +0,0 @@ -use crate::mem_table::{new_memtable_entry, MemTable}; - -#[test] -fn create_empty_mem_table() { - let mem_table = MemTable::new(); - assert_eq!(mem_table.size, 0); -} - -#[test] -fn set_single_entry() { - let mut mem_table = MemTable::new(); - - let key = b"key1"; - let value = b"value1"; - - // set the entry - mem_table.set(key, value, 10); - - let expected_entry = new_memtable_entry(key, Some(value), 10, false); - - // query the entry and compare - assert_eq!(mem_table.get(key), Some(&expected_entry)); -} - -#[test] -fn test_delete_entry() { - let mut memtable = MemTable::new(); - let key = b"key"; - let value = b"value"; - - memtable.set(key, value, 20); - memtable.delete(key, 30); - - let expected_entry = new_memtable_entry(key, None, 30, true); - - assert_eq!(memtable.get(key), Some(&expected_entry)); -} - -#[test] -fn test_mem_table_put_start() { - let mut table = MemTable::new(); - table.set(b"Lime", b"Lime Smoothie", 0); // 17 + 16 + 1 - table.set(b"Orange", b"Orange Smoothie", 10); // 21 + 16 + 1 - - table.set(b"Apple", b"Apple Smoothie", 20); // 19 + 16 + 1 - - assert_eq!(table.entries[0].key, b"Apple"); - assert_eq!(table.entries[0].value.as_ref().unwrap(), b"Apple Smoothie"); -
assert_eq!(table.entries[0].timestamp, 20); - assert_eq!(table.entries[0].deleted, false); - assert_eq!(table.entries[1].key, b"Lime"); - assert_eq!(table.entries[1].value.as_ref().unwrap(), b"Lime Smoothie"); - assert_eq!(table.entries[1].timestamp, 0); - assert_eq!(table.entries[1].deleted, false); - assert_eq!(table.entries[2].key, b"Orange"); - assert_eq!(table.entries[2].value.as_ref().unwrap(), b"Orange Smoothie"); - assert_eq!(table.entries[2].timestamp, 10); - assert_eq!(table.entries[2].deleted, false); - - assert_eq!(table.size, 108); -} - -#[test] -fn test_mem_table_put_middle() { - let mut table = MemTable::new(); - table.set(b"Apple", b"Apple Smoothie", 0); - table.set(b"Orange", b"Orange Smoothie", 10); - - table.set(b"Lime", b"Lime Smoothie", 20); - - assert_eq!(table.entries[0].key, b"Apple"); - assert_eq!(table.entries[0].value.as_ref().unwrap(), b"Apple Smoothie"); - assert_eq!(table.entries[0].timestamp, 0); - assert_eq!(table.entries[0].deleted, false); - assert_eq!(table.entries[1].key, b"Lime"); - assert_eq!(table.entries[1].value.as_ref().unwrap(), b"Lime Smoothie"); - assert_eq!(table.entries[1].timestamp, 20); - assert_eq!(table.entries[1].deleted, false); - assert_eq!(table.entries[2].key, b"Orange"); - assert_eq!(table.entries[2].value.as_ref().unwrap(), b"Orange Smoothie"); - assert_eq!(table.entries[2].timestamp, 10); - assert_eq!(table.entries[2].deleted, false); - - assert_eq!(table.size, 108); -} - -#[test] -fn test_mem_table_put_end() { - let mut table = MemTable::new(); - table.set(b"Apple", b"Apple Smoothie", 0); - table.set(b"Lime", b"Lime Smoothie", 10); - - table.set(b"Orange", b"Orange Smoothie", 20); - - assert_eq!(table.entries[0].key, b"Apple"); - assert_eq!(table.entries[0].value.as_ref().unwrap(), b"Apple Smoothie"); - assert_eq!(table.entries[0].timestamp, 0); - assert_eq!(table.entries[0].deleted, false); - assert_eq!(table.entries[1].key, b"Lime"); - assert_eq!(table.entries[1].value.as_ref().unwrap(), b"Lime 
Smoothie"); - assert_eq!(table.entries[1].timestamp, 10); - assert_eq!(table.entries[1].deleted, false); - assert_eq!(table.entries[2].key, b"Orange"); - assert_eq!(table.entries[2].value.as_ref().unwrap(), b"Orange Smoothie"); - assert_eq!(table.entries[2].timestamp, 20); - assert_eq!(table.entries[2].deleted, false); - - assert_eq!(table.size, 108); -} - -#[test] -fn test_mem_table_put_overwrite() { - let mut table = MemTable::new(); - table.set(b"Apple", b"Apple Smoothie", 0); - table.set(b"Lime", b"Lime Smoothie", 10); - table.set(b"Orange", b"Orange Smoothie", 20); - - table.set(b"Lime", b"A sour fruit", 30); - - assert_eq!(table.entries[0].key, b"Apple"); - assert_eq!(table.entries[0].value.as_ref().unwrap(), b"Apple Smoothie"); - assert_eq!(table.entries[0].timestamp, 0); - assert_eq!(table.entries[0].deleted, false); - assert_eq!(table.entries[1].key, b"Lime"); - assert_eq!(table.entries[1].value.as_ref().unwrap(), b"A sour fruit"); - assert_eq!(table.entries[1].timestamp, 30); - assert_eq!(table.entries[1].deleted, false); - assert_eq!(table.entries[2].key, b"Orange"); - assert_eq!(table.entries[2].value.as_ref().unwrap(), b"Orange Smoothie"); - assert_eq!(table.entries[2].timestamp, 20); - assert_eq!(table.entries[2].deleted, false); - - assert_eq!(table.size, 107); -} - -#[test] -fn test_mem_table_get_exists() { - let mut table = MemTable::new(); - table.set(b"Apple", b"Apple Smoothie", 0); - table.set(b"Lime", b"Lime Smoothie", 10); - table.set(b"Orange", b"Orange Smoothie", 20); - - let entry = table.get(b"Orange").unwrap(); - - assert_eq!(entry.key, b"Orange"); - assert_eq!(entry.value.as_ref().unwrap(), b"Orange Smoothie"); - assert_eq!(entry.timestamp, 20); -} - -#[test] -fn test_mem_table_get_not_exists() { - let mut table = MemTable::new(); - table.set(b"Apple", b"Apple Smoothie", 0); - table.set(b"Lime", b"Lime Smoothie", 0); - table.set(b"Orange", b"Orange Smoothie", 0); - - let res = table.get(b"Potato"); - assert_eq!(res.is_some(), false); -} 
- -#[test] -fn test_mem_table_delete_exists() { - let mut table = MemTable::new(); - table.set(b"Apple", b"Apple Smoothie", 0); - - table.delete(b"Apple", 10); - - let res = table.get(b"Apple").unwrap(); - assert_eq!(res.key, b"Apple"); - assert_eq!(res.value, None); - assert_eq!(res.timestamp, 10); - assert_eq!(res.deleted, true); - - assert_eq!(table.entries[0].key, b"Apple"); - assert_eq!(table.entries[0].value, None); - assert_eq!(table.entries[0].timestamp, 10); - assert_eq!(table.entries[0].deleted, true); - - assert_eq!(table.size, 22); -} - -#[test] -fn test_mem_table_delete_empty() { - let mut table = MemTable::new(); - - table.delete(b"Apple", 10); - - let res = table.get(b"Apple").unwrap(); - assert_eq!(res.key, b"Apple"); - assert_eq!(res.value, None); - assert_eq!(res.timestamp, 10); - assert_eq!(res.deleted, true); - - assert_eq!(table.entries[0].key, b"Apple"); - assert_eq!(table.entries[0].value, None); - assert_eq!(table.entries[0].timestamp, 10); - assert_eq!(table.entries[0].deleted, true); - - assert_eq!(table.size, 22); -} diff --git a/src/memtable.rs b/src/memtable.rs new file mode 100644 index 0000000..8980761 --- /dev/null +++ b/src/memtable.rs @@ -0,0 +1,265 @@ +use crate::{api::SizeUnit, sst::BloomFilter}; +use std::{ + collections::BTreeMap, + io, + sync::{Arc, Mutex}, +}; + +/// Setting default capacity to be 1GB. +pub(crate) static DEFAULT_MEMTABLE_CAPACITY: usize = SizeUnit::Gigabytes.to_bytes(1); + +// 0.01% false positive rate. +pub(crate) static DEFAULT_FALSE_POSITIVE_RATE: f64 = 0.0001; + +/// The MemTable is an in-memory data structure that stores recently written data before it is flushed to disk. +pub struct MemTable { + /// Stores key-value pairs in sorted order. + entries: Arc<Mutex<BTreeMap<Vec<u8>, Vec<u8>>>>, + /// The number of key-value entries present in the MemTable. + entry_count: usize, + /// Current size of the MemTable in bytes. + size: usize, + /// The maximum allowed size for the MemTable in bytes.
This is used to enforce size limits and trigger `flush` operations. + capacity: usize, + bloom_filter: BloomFilter, + size_unit: SizeUnit, + false_positive_rate: f64, +} + +impl MemTable { + /// Creates a new MemTable with the default MemTable capacity of 1GB. + pub(crate) fn new() -> Self { + Self::with_capacity_and_rate( + SizeUnit::Bytes, + DEFAULT_MEMTABLE_CAPACITY, + DEFAULT_FALSE_POSITIVE_RATE, + ) + } + + /// Creates a new MemTable with the specified size unit, capacity, and false positive rate. + pub(crate) fn with_capacity_and_rate( + size_unit: SizeUnit, + capacity: usize, + false_positive_rate: f64, + ) -> Self { + assert!(capacity > 0, "Capacity must be greater than zero"); + + let capacity_bytes = size_unit.to_bytes(capacity); + + // Average key-value entry size in bytes. + let avg_entry_size = 100; + let num_elements = capacity_bytes / avg_entry_size; + + let bloom_filter = BloomFilter::new(num_elements, false_positive_rate); + + Self { + entries: Arc::new(Mutex::new(BTreeMap::new())), + entry_count: 0, + size: SizeUnit::Bytes.to_bytes(0), + capacity: capacity_bytes, + bloom_filter, + size_unit: SizeUnit::Bytes, + false_positive_rate, + } + } + + /// Inserts a key-value pair into the MemTable.
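The constructor above sizes the Bloom filter by dividing the byte capacity by an assumed 100-byte average entry. That 100-byte figure is the crate's stated assumption, not a measurement; a standalone sketch of the estimate:

```rust
// Sketch of the Bloom-filter sizing estimate used by with_capacity_and_rate:
// expected element count = capacity in bytes / assumed average entry size.
// The 100-byte average is an assumption carried over from the source.
fn expected_elements(capacity_bytes: usize) -> usize {
    const AVG_ENTRY_SIZE: usize = 100;
    capacity_bytes / AVG_ENTRY_SIZE
}

fn main() {
    // A 1 GiB memtable is provisioned for roughly 10.7 million entries.
    let one_gib: usize = 1024 * 1024 * 1024;
    assert_eq!(expected_elements(one_gib), 10_737_418); // 2^30 / 100, truncated
}
```

If real entries are much smaller than 100 bytes, the filter is under-provisioned and the observed false positive rate rises accordingly.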
+ pub(crate) fn set(&mut self, key: Vec<u8>, value: Vec<u8>) -> io::Result<()> { + let mut entries = self.entries.lock().map_err(|_| { + io::Error::new( + io::ErrorKind::Other, + "Failed to acquire lock on MemTable entries.", + ) + })?; + + if !self.bloom_filter.contains(&key) { + let size_change = key.len() + value.len(); + entries.insert(key.clone(), value); + + self.entry_count += 1; + self.size += size_change; + + self.bloom_filter.set(&key); + + return Ok(()); + } + + Err(io::Error::new( + io::ErrorKind::AlreadyExists, + "key already exists", + )) + } + + /// Get the value of the given key + pub(crate) fn get(&self, key: Vec<u8>) -> io::Result<Option<Vec<u8>>> { + if !self.bloom_filter.contains(&key) { + return Ok(None); + } + + let entries = self.entries.lock().map_err(|_| { + io::Error::new( + io::ErrorKind::Other, + "Failed to acquire lock on MemTable entries.", + ) + })?; + Ok(entries.get(&key).cloned()) + } + + /// Remove the key-value entry of the given key. + pub(crate) fn remove(&mut self, key: Vec<u8>) -> io::Result<Option<(Vec<u8>, Vec<u8>)>> { + if !self.bloom_filter.contains(&key) { + return Ok(None); + } + + let mut entries = self.entries.lock().map_err(|_| { + io::Error::new( + io::ErrorKind::Other, + "Failed to acquire lock on MemTable entries.", + ) + })?; + + if let Some((key, value)) = entries.remove_entry(&key) { + self.entry_count -= 1; + self.size -= key.len() + value.len(); + + Ok(Some((key, value))) + } else { + Ok(None) + } + } + + /// Clears all key-value entries in the MemTable.
+ pub(crate) fn clear(&mut self) -> io::Result<()> { + let mut entries = self.entries.lock().map_err(|_| { + io::Error::new( + io::ErrorKind::Other, + "Failed to acquire lock on MemTable entries.", + ) + })?; + + entries.clear(); + self.entry_count = 0; + self.size = 0; + Ok(()) + } + + pub(crate) fn entries(&self) -> io::Result<Vec<(Vec<u8>, Vec<u8>)>> { + let entries = self.entries.lock().map_err(|_| { + io::Error::new( + io::ErrorKind::Other, + "Failed to acquire lock on MemTable entries.", + ) + })?; + + Ok(entries + .iter() + .map(|(k, v)| (k.clone(), v.clone())) + .collect()) + } + + pub(crate) fn capacity(&self) -> usize { + self.capacity + } + + pub(crate) fn size(&self) -> usize { + self.size / self.size_unit.to_bytes(1) + } + + pub(crate) fn false_positive_rate(&self) -> f64 { + self.false_positive_rate + } + + pub(crate) fn size_unit(&self) -> SizeUnit { + self.size_unit.clone() + } +} + +#[cfg(test)] +mod tests { + use super::*; + + macro_rules! kv { + ($k:expr, $v:expr) => { + ($k.as_bytes().to_vec(), $v.as_bytes().to_vec()) + }; + } + + fn generate_dummy_kv_pairs() -> Vec<(Vec<u8>, Vec<u8>)> { + let pairs = vec![ + kv!("key1", "value1"), + kv!("key2", "value2"), + kv!("key3", "value3"), + ]; + + pairs + } + + #[test] + fn create_empty_memtable() { + let m = MemTable::new(); + assert_eq!(m.size, 0); + assert_eq!(m.entry_count, 0); + } + + #[test] + fn create_single_entry() { + let mut m = MemTable::new(); + + let (key, value) = generate_dummy_kv_pairs() + .pop() + .expect("Failed to get first key-value pair"); + + let _ = m.set(key.clone(), value.clone()); + + assert_eq!(m.entry_count, 1); + assert_eq!(m.size, key.len() + value.len()); + + assert_eq!(m.get(key).unwrap().unwrap(), value); + } + + #[test] + fn set_and_get() { + let mut m = MemTable::new(); + + // setup dummy key-value pairs + let kv = generate_dummy_kv_pairs(); + + // set all dummy key-value pairs to mem-table + let mut size = 0; + kv.iter().for_each(|(key, value)| { + let _ = m.set(key.clone(), value.clone());
+ + size += key.len() + value.len(); + + assert_eq!(m.get(key.clone()).unwrap().unwrap(), value.clone()); + }); + + assert_eq!(m.entry_count, kv.len()); + assert_eq!(m.size, size); + } + + #[test] + fn set_and_remove() { + let mut m = MemTable::new(); + + let kv = generate_dummy_kv_pairs(); + + let mut size = 0; + + // set key-value + kv.iter().for_each(|(key, value)| { + let _ = m.set(key.clone(), value.clone()); + size += key.len() + value.len(); + assert_eq!(m.size, size); + }); + + // remove key-value + kv.iter().for_each(|(key, value)| { + let _ = m.remove(key.clone()); + size -= key.len() + value.len(); + assert_eq!(m.size, size); + }); + + assert_eq!(m.size, 0); + } +} diff --git a/src/sst/bloom_filter.rs b/src/sst/bloom_filter.rs new file mode 100644 index 0000000..91e8f48 --- /dev/null +++ b/src/sst/bloom_filter.rs @@ -0,0 +1,323 @@ +use std::{ + collections::hash_map::DefaultHasher, + hash::{Hash, Hasher}, + sync::{ + atomic::{AtomicUsize, Ordering}, + Arc, Mutex, + }, +}; + +use bit_vec::BitVec; + +/// A Bloom filter is a space-efficient probabilistic data structure used to test +/// whether an element is a member of a set. +pub(crate) struct BloomFilter { + /// The array of bits representing the Bloom filter. + bits: Arc<Mutex<BitVec>>, + /// The number of hash functions used by the Bloom filter. + num_hashes: usize, + /// The number of elements inserted into the Bloom filter. + num_elements: AtomicUsize, +} + +impl BloomFilter { + /// Creates a new Bloom filter with the specified number of elements and false positive rate. + /// + /// # Arguments + /// + /// * `num_elements` - The expected number of elements to be inserted into the Bloom filter. + /// * `false_positive_rate` - The desired false positive rate (e.g., 0.001 for 0.1%). + /// + /// # Panics + /// + /// This function will panic if `num_elements` is zero or if `false_positive_rate` is not within (0, 1).
+ pub(crate) fn new(num_elements: usize, false_positive_rate: f64) -> Self { + assert!( + num_elements > 0, + "Number of elements must be greater than zero" + ); + assert!( + false_positive_rate > 0.0 && false_positive_rate < 1.0, + "False positive rate must be between 0 and 1" + ); + + let num_bits = Self::calculate_num_bits(num_elements, false_positive_rate); + let num_hashes = Self::calculate_num_hashes(num_bits, num_elements); + + let bits = Arc::new(Mutex::new(BitVec::from_elem(num_bits, false))); + + Self { + bits, + num_hashes, + num_elements: AtomicUsize::new(0), + } + } + + /// Inserts a key into the Bloom filter. + /// + /// # Arguments + /// + /// * `key` - The key to be inserted into the Bloom filter. + pub(crate) fn set<T: Hash>(&mut self, key: &T) { + let mut bits = self + .bits + .lock() + .expect("Failed to acquire lock on Bloom Filter bits."); + + for i in 0..self.num_hashes { + let hash = self.calculate_hash(key, i); + + let index = (hash % (bits.len() as u64)) as usize; + + bits.set(index, true); + } + + // Increment the element count. + self.num_elements.fetch_add(1, Ordering::Relaxed); + } + + /// Checks if a key is possibly present in the Bloom filter. + /// + /// # Arguments + /// + /// * `key` - The key to be checked. + /// + /// # Returns + /// + /// * `true` if the key is possibly present in the Bloom filter. + /// * `false` if the key is definitely not present in the Bloom filter. + pub(crate) fn contains<T: Hash>(&self, key: &T) -> bool { + let bits = self + .bits + .lock() + .expect("Failed to acquire lock on Bloom Filter bits."); + + for i in 0..self.num_hashes { + let hash = self.calculate_hash(key, i); + let index = (hash % (bits.len() as u64)) as usize; + + if !bits[index] { + return false; + } + } + // All bits are true, so the key is possibly present. + true + } + + /// Returns the current number of elements inserted into the Bloom filter. + pub(crate) fn num_elements(&self) -> usize { + // Retrieve the element count atomically.
+ self.num_elements.load(Ordering::Relaxed) + } + + // Internal helper functions + + /// Calculates a hash value for a given key and seed. + /// + /// # Arguments + /// + /// * `key` - The key to be hashed. + /// * `seed` - The seed value for incorporating randomness. + /// + /// # Returns + /// + /// The calculated hash value as a `u64`. + fn calculate_hash<T: Hash>(&self, key: &T, seed: usize) -> u64 { + let mut hasher = DefaultHasher::new(); + + key.hash(&mut hasher); + hasher.write_usize(seed); + hasher.finish() + } + + /// Calculates the optimal number of bits for the Bloom filter based on the desired false positive rate and the expected number of elements. + /// + /// # Arguments + /// + /// * `num_elements` - The expected number of elements. + /// * `false_positive_rate` - The desired false positive rate. + /// + /// # Returns + /// + /// The calculated number of bits as a `usize`. + fn calculate_num_bits(num_elements: usize, false_positive_rate: f64) -> usize { + let num_bits_float = + (-((num_elements as f64) * false_positive_rate.ln()) / (2.0_f64.ln().powi(2))).ceil(); + + num_bits_float as usize + } + + /// Calculates the optimal number of hash functions for the Bloom filter based on the number of bits and the expected number of elements. + /// + /// # Arguments + /// + /// * `num_bits` - The number of bits in the Bloom filter. + /// * `num_elements` - The expected number of elements. + /// + /// # Returns + /// + /// The calculated number of hash functions as a `usize`.
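The sizing helpers documented above use the standard Bloom filter formulas: the bit count is m = ceil(-n * ln(p) / (ln 2)^2) and the hash count is k = ceil((m / n) * ln 2). A standalone sketch of those formulas, separate from the crate's private methods:

```rust
// Standard Bloom filter sizing, matching the calculate_num_bits /
// calculate_num_hashes helpers described above:
//   m = ceil( -n * ln(p) / (ln 2)^2 )   -- number of bits
//   k = ceil( (m / n) * ln 2 )          -- number of hash functions
fn num_bits(n: usize, p: f64) -> usize {
    (-(n as f64 * p.ln()) / 2.0_f64.ln().powi(2)).ceil() as usize
}

fn num_hashes(m: usize, n: usize) -> usize {
    ((m as f64 / n as f64) * 2.0_f64.ln()).ceil() as usize
}

fn main() {
    // 10_000 elements at a 1% false-positive rate needs roughly
    // 9.6 bits per element and 7 hash functions.
    let m = num_bits(10_000, 0.01);
    let k = num_hashes(m, 10_000);
    assert_eq!(m, 95851);
    assert_eq!(k, 7);
}
```

Because both results are rounded up, the realized false positive rate is slightly better than the requested `p`, which is what the 10%-tolerance tests below rely on.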
+ fn calculate_num_hashes(num_bits: usize, num_elements: usize) -> usize { + let num_hashes_float = (num_bits as f64 / num_elements as f64) * 2.0_f64.ln(); + + num_hashes_float.ceil() as usize + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_insertion_and_containment() { + let mut bloom = BloomFilter::new(100, 0.001); + + // Insert an element + bloom.set(&"apple"); + + // Check containment of the inserted element + assert!(bloom.contains(&"apple")); + + // Check containment of a non-inserted element + assert!(!bloom.contains(&"banana")); + } + + #[test] + fn test_num_elements() { + let mut bloom = BloomFilter::new(100, 0.01); + + // Insert multiple elements + for i in 0..10 { + bloom.set(&i); + } + + // Check the number of elements + assert_eq!(bloom.num_elements(), 10); + } + + #[test] + fn test_false_positives_high_rate() { + // Number of elements. + let num_elements = 10000; + + // False Positive Rate. + let false_positive_rate = 0.1; + + // Create a Bloom Filter. + let mut bloom = BloomFilter::new(num_elements, false_positive_rate); + + // Insert elements into the Bloom Filter. + for i in 0..num_elements { + bloom.set(&i); + } + + let mut false_positives = 0; + let num_tested_elements = 2000; + + // Test all non-inserted elements for containment. + // Count the number of false positives. + for i in num_elements..num_elements + num_tested_elements { + if bloom.contains(&i) { + false_positives += 1; + } + } + + // Calculate the observed false positive rate. + let observed_false_positive_rate = false_positives as f64 / num_tested_elements as f64; + + // Allow for a small margin (10%) of error due to the probabilistic nature of Bloom filters. 
+ // Maximum Allowed False Positive Rate = False Positive Rate + (False Positive Rate * Tolerance) + let max_allowed_false_positive_rate = false_positive_rate + (false_positive_rate * 0.1); + + assert!( + observed_false_positive_rate <= max_allowed_false_positive_rate, + "Observed false positive rate ({}) is greater than the maximum allowed ({})", + observed_false_positive_rate, + max_allowed_false_positive_rate + ); + } + + #[test] + fn test_false_positives_medium_rate() { + // Number of elements. + let num_elements = 10000; + + // False Positive Rate. + let false_positive_rate = 0.001; + + // Create a Bloom Filter. + let mut bloom = BloomFilter::new(num_elements, false_positive_rate); + + // Insert elements into the Bloom Filter. + for i in 0..num_elements { + bloom.set(&i); + } + + let mut false_positives = 0; + let num_tested_elements = 2000; + + // Test all non-inserted elements for containment. + // Count the number of false positives. + for i in num_elements..num_elements + num_tested_elements { + if bloom.contains(&i) { + false_positives += 1; + } + } + + // Calculate the observed false positive rate. + let observed_false_positive_rate = false_positives as f64 / num_tested_elements as f64; + + // Allow for a small margin (10%) of error due to the probabilistic nature of Bloom filters. + // Maximum Allowed False Positive Rate = False Positive Rate + (False Positive Rate * Tolerance) + let max_allowed_false_positive_rate = false_positive_rate + (false_positive_rate * 0.1); + + assert!( + observed_false_positive_rate <= max_allowed_false_positive_rate, + "Observed false positive rate ({}) is greater than the maximum allowed ({})", + observed_false_positive_rate, + max_allowed_false_positive_rate + ); + } + + #[test] + fn test_false_positives_low_rate() { + // Number of elements. + let num_elements = 10000; + + // False Positive Rate. + let false_positive_rate = 0.000001; + + // Create a Bloom Filter. 
+ let mut bloom = BloomFilter::new(num_elements, false_positive_rate); + + // Insert elements into the Bloom Filter. + for i in 0..num_elements { + bloom.set(&i); + } + + let mut false_positives = 0; + let num_tested_elements = 2000; + + // Test all non-inserted elements for containment. + // Count the number of false positives. + for i in num_elements..num_elements + num_tested_elements { + if bloom.contains(&i) { + false_positives += 1; + } + } + + // Calculate the observed false positive rate. + let observed_false_positive_rate = false_positives as f64 / num_tested_elements as f64; + + // Allow for a small margin (10%) of error due to the probabilistic nature of Bloom filters. + // Maximum Allowed False Positive Rate = False Positive Rate + (False Positive Rate * Tolerance) + let max_allowed_false_positive_rate = false_positive_rate + (false_positive_rate * 0.1); + + assert!( + observed_false_positive_rate <= max_allowed_false_positive_rate, + "Observed false positive rate ({}) is greater than the maximum allowed ({})", + observed_false_positive_rate, + max_allowed_false_positive_rate + ); + } +} diff --git a/src/sst/mod.rs b/src/sst/mod.rs new file mode 100644 index 0000000..05331b3 --- /dev/null +++ b/src/sst/mod.rs @@ -0,0 +1,6 @@ +mod bloom_filter; +mod sst_block; +mod sstable; + +pub(crate) use bloom_filter::BloomFilter; +pub(crate) use sstable::SSTable; diff --git a/src/sst/sst_block.rs b/src/sst/sst_block.rs new file mode 100644 index 0000000..6171242 --- /dev/null +++ b/src/sst/sst_block.rs @@ -0,0 +1,257 @@ +use std::{collections::HashMap, io, sync::Arc}; + +const BLOCK_SIZE: usize = 4 * 1024; // 4KB + +const SIZE_OF_U32: usize = std::mem::size_of::<u32>(); + +pub(crate) struct Block { + data: Vec<u8>, + index: HashMap<Arc<Vec<u8>>, usize>, + entry_count: usize, +} + +impl Block { + /// Creates a new empty Block.
+ pub(crate) fn new() -> Self { + Block { + data: Vec::with_capacity(BLOCK_SIZE), + index: HashMap::new(), + entry_count: 0, + } + } + + /// Checks if the Block is full given the size of an entry. + pub(crate) fn is_full(&self, entry_size: usize) -> bool { + self.data.len() + entry_size > BLOCK_SIZE + } + + /// Sets an entry with the provided key and value in the Block. + /// + /// Returns an `io::Result` indicating success or failure. An error is returned if the Block + /// is already full and cannot accommodate the new entry. + pub(crate) fn set_entry(&mut self, key: &[u8], value: &[u8]) -> io::Result<()> { + // Calculate the total size of the entry, including the key, value, and the size of the length prefix. + let entry_size = key.len() + value.len() + SIZE_OF_U32; + + // Check if the Block is already full and cannot accommodate the new entry. + if self.is_full(entry_size) { + return Err(io::Error::new(io::ErrorKind::OutOfMemory, "Block is full")); + } + + // Convert the length of the value to a little-endian byte array. + let value_len = value.len() as u32; + + // Create the entry by concatenating the length prefix, key, and value. + let entry: Vec<u8> = [&value_len.to_le_bytes(), key, value].concat(); + + // Get the current offset in the data vector and extend it with the new entry. + let offset = self.data.len(); + self.data.extend_from_slice(&entry); + + // Insert the key and its corresponding offset into the index hashmap. + self.index.insert(Arc::new(key.to_owned()), offset); + + // Increment the entry count. + self.entry_count += 1; + + Ok(()) + } + + /// Removes the entry with the provided key from the Block. + /// + /// Returns `true` if an entry was found and removed, `false` otherwise. + pub(crate) fn remove_entry(&mut self, key: &[u8]) -> bool { + // Check if the key exists in the index map. + if let Some(offset) = self.index.remove(&Arc::new(key.to_owned())) { + // Calculate the start and end positions of the entry to be removed.
+ + // The start position is the offset in the data vector. + let start = offset; + + // The end position is calculated as follows: + // - Add the size of the length prefix (SIZE_OF_U32). + // - Add the length of the key. + // - Find the position of the first zero byte in the remaining data, starting from the offset. + // This indicates the end of the entry. + let end = start + + SIZE_OF_U32 + + key.len() + + self.data[start + SIZE_OF_U32..] + .iter() + .position(|&byte| byte == 0) + .unwrap_or(0); + + // Clear the entry in the data vector by filling it with zeros. + for byte in &mut self.data[start..end] { + *byte = 0; + } + + // Decrease the entry count. + self.entry_count -= 1; + + // Return true to indicate that an entry was found and removed. + true + } else { + // Return false to indicate that the entry was not found. + false + } + } + + /// Retrieves the value associated with the provided key from the Block. + /// + /// Returns `Some(value)` if the key is found in the Block, `None` otherwise. + pub(crate) fn get_value(&self, key: &[u8]) -> Option> { + // Check if the key exists in the index. + if let Some(&offset) = self.index.get(&Arc::new(key.to_owned())) { + // Calculate the starting position of the value in the data vector. + let start = offset + SIZE_OF_U32 + key.len(); + + // Extract the bytes representing the length of the value from the data vector. + let value_len_bytes = &self.data[offset..offset + SIZE_OF_U32]; + + // Convert the value length bytes into a u32 value using little-endian byte order. + let value_len = u32::from_le_bytes(value_len_bytes.try_into().unwrap()) as usize; + + // Calculate the ending position of the value in the data vector. + let end = start + value_len; + + // Extract the value bytes from the data vector and return them as a new Vec. + Some(self.data[start..end].to_vec()) + } else { + // The key was not found in the index, return None. + None + } + } + + /// Returns the number of entries in the Block. 
+    pub(crate) fn entry_count(&self) -> usize {
+        self.entry_count
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_new_empty_block_creation() {
+        let block = Block::new();
+        assert_eq!(block.data.len(), 0);
+        assert_eq!(block.data.capacity(), BLOCK_SIZE);
+        assert_eq!(block.index.len(), 0);
+        assert_eq!(block.entry_count, 0);
+    }
+
+    #[test]
+    fn test_is_full() {
+        let block = Block::new();
+        assert!(!block.is_full(10));
+        assert!(block.is_full(BLOCK_SIZE + 1));
+    }
+
+    #[test]
+    fn test_set_entry() {
+        let mut block = Block::new();
+        let key: Vec<u8> = vec![1, 2, 3];
+        let value: Vec<u8> = vec![4, 5, 6];
+
+        let res = block.set_entry(&key, &value);
+        // Check that no IO error occurred.
+        assert!(res.is_ok());
+
+        assert_eq!(block.data.len(), key.len() + value.len() + SIZE_OF_U32);
+        assert_eq!(block.entry_count, 1);
+        assert_eq!(block.index.len(), 1);
+    }
+
+    #[test]
+    fn test_set_and_get_value() {
+        let mut block = Block::new();
+        let key: &[u8] = &[1, 2, 3];
+        let value: &[u8] = &[4, 5, 6];
+
+        // Set the key-value pair
+        let res = block.set_entry(key, value);
+        assert!(res.is_ok());
+
+        // Retrieve the value using the key
+        let retrieved_value = block.get_value(key);
+        assert_eq!(retrieved_value, Some(value.to_vec()));
+    }
+
+    #[test]
+    fn test_set_and_remove_entry() {
+        let mut block = Block::new();
+        let key1: Vec<u8> = vec![1, 2, 3];
+        let value1: Vec<u8> = vec![4, 5, 6];
+        let key2: Vec<u8> = vec![7, 8, 9];
+        let value2: Vec<u8> = vec![10, 11, 12];
+        let key3: Vec<u8> = vec![13, 14, 15];
+        let value3: Vec<u8> = vec![16];
+
+        let _ = block.set_entry(&key1, &value1);
+        assert_eq!(block.entry_count, 1);
+        assert_eq!(block.get_value(&key1), Some(value1));
+
+        let _ = block.set_entry(&key2, &value2);
+        assert_eq!(block.entry_count, 2);
+        assert_eq!(block.get_value(&key2), Some(value2));
+
+        let _ = block.set_entry(&key3, &value3);
+        assert_eq!(block.entry_count, 3);
+        assert_eq!(block.get_value(&key3), Some(value3.clone()));
+
+        let entry_count_before_removal = block.entry_count();
+        let result = block.remove_entry(&key1);
+        assert!(result);
+        assert_eq!(block.entry_count(), entry_count_before_removal - 1);
+        assert_eq!(block.get_value(&key1), None);
+    }
+
+    #[test]
+    fn test_set_entry_full_block() {
+        // Test case to check setting an entry when the block is already full
+        let mut block = Block::new();
+        let key: Vec<u8> = vec![1, 2, 3];
+        let value: Vec<u8> = vec![4, 5, 6];
+
+        // Fill the block to its maximum capacity
+        while !block.is_full(key.len() + value.len() + SIZE_OF_U32) {
+            block.set_entry(&key, &value).unwrap();
+        }
+
+        // Attempt to set a new entry, which should result in an error
+        let res = block.set_entry(&key, &value);
+        assert!(res.is_err());
+        assert_eq!(
+            block.entry_count(),
+            BLOCK_SIZE / (key.len() + value.len() + SIZE_OF_U32)
+        );
+    }
+
+    #[test]
+    fn test_remove_entry_nonexistent_key() {
+        // Test case to check removing a non-existent key
+        let mut block = Block::new();
+        let key1: Vec<u8> = vec![1, 2, 3];
+        let key2: Vec<u8> = vec![4, 5, 6];
+
+        block.set_entry(&key1, &[7, 8, 9]).unwrap();
+
+        // Attempt to remove a non-existent key, which should return false
+        let result = block.remove_entry(&key2);
+        assert!(!result);
+        assert_eq!(block.entry_count(), 1);
+    }
+
+    #[test]
+    fn test_get_value_nonexistent_key() {
+        // Test case to check getting a value for a non-existent key
+        let block = Block::new();
+        let key: Vec<u8> = vec![1, 2, 3];
+
+        // Attempt to get the value for a non-existent key, which should return None
+        let value = block.get_value(&key);
+        assert_eq!(value, None);
+    }
+}
diff --git a/src/sst/sstable.rs b/src/sst/sstable.rs
new file mode 100644
index 0000000..4704746
--- /dev/null
+++ b/src/sst/sstable.rs
@@ -0,0 +1,58 @@
+use super::sst_block::Block;
+use chrono::{DateTime, Utc};
+use std::{io, path::PathBuf};
+
+pub struct SSTable {
+    file_path: PathBuf,
+    blocks: Vec<Block>,
+    created_at: DateTime<Utc>,
+}
+
+impl SSTable {
+    pub(crate) fn new(dir: PathBuf) -> Self {
+        let created_at = Utc::now();
+        let file_name = format!("sstable_{}.dat", created_at.timestamp_millis());
+        let file_path = dir.join(file_name);
+
+        Self {
+            file_path,
+            blocks: Vec::new(),
+            created_at,
+        }
+    }
+
+    pub(crate) fn set(&mut self, key: Vec<u8>, value: Vec<u8>) -> io::Result<()> {
+        // The full entry size includes the 4-byte value-length prefix that
+        // `Block::set_entry` writes, not just the key and value bytes.
+        let entry_size = key.len() + value.len() + 4;
+        if self.blocks.is_empty() || self.blocks.last().unwrap().is_full(entry_size) {
+            let new_block = Block::new();
+            self.blocks.push(new_block);
+        }
+
+        let last_block = self.blocks.last_mut().unwrap();
+        last_block.set_entry(&key, &value)?;
+
+        Ok(())
+    }
+
+    pub(crate) fn get(&self, key: Vec<u8>) -> Option<Vec<u8>> {
+        for block in &self.blocks {
+            if let Some(value) = block.get_value(&key) {
+                return Some(value);
+            }
+        }
+        None
+    }
+
+    pub(crate) fn remove(&mut self, key: Vec<u8>) -> io::Result<()> {
+        // Iterate over the blocks in reverse order to delete from the most recent block first.
+        for block in self.blocks.iter_mut().rev() {
+            if block.remove_entry(&key) {
+                return Ok(());
+            }
+        }
+
+        Err(io::Error::new(
+            io::ErrorKind::NotFound,
+            "Key not found in SSTable",
+        ))
+    }
+}
diff --git a/src/wal.rs b/src/wal.rs
deleted file mode 100644
index c861112..0000000
--- a/src/wal.rs
+++ /dev/null
@@ -1,208 +0,0 @@
-use crate::{
-    helper::{generate_timestamp, get_files_with_ext},
-    mem_table::MemTable,
-};
-use std::{
-    fs::{remove_file, File, OpenOptions},
-    io::{self, prelude::*, BufReader, BufWriter},
-    path::{Path, PathBuf},
-};
-
-/// Write Ahead Log (WAL)
-///
-/// An append-only file that holds the operations performed on the MemTable.
-/// The WAL is intended for recovery of the MemTable when the server is shut down.
-pub struct Wal {
-    pub path: PathBuf,
-    pub file: BufWriter<File>,
-}
-
-/// WALEntry
-pub struct WalEntry {
-    /// key is always present and used to query the value
-    key: Vec<u8>,
-    /// value may or may not exist
-    value: Option<Vec<u8>>,
-    /// when the entry was created or modified
-    timestamp: u128,
-    /// state of the entry
-    deleted: bool,
-}
-
-/// WAL iterator to iterate over the items in a WAL file.
-pub struct WalIterator {
-    reader: BufReader<File>,
-}
-
-impl Wal {
-    /// Creates a new WAL in a given directory.
-    pub fn new(dir_path: &Path) -> std::io::Result<Self> {
-        let now = generate_timestamp();
-        let path = Path::new(dir_path).join(now.to_string() + ".wal");
-        let file = OpenOptions::new().append(true).create(true).open(&path)?;
-
-        let file = BufWriter::new(file);
-
-        Ok(Wal { path, file })
-    }
-
-    /// Creates a WAL from an existing file path.
-    pub fn from_path(path: &Path) -> io::Result<Self> {
-        let file = OpenOptions::new().append(true).create(true).open(path)?;
-        let file = BufWriter::new(file);
-
-        Ok(Wal {
-            path: path.to_owned(),
-            file,
-        })
-    }
-
-    /// Loads the WAL(s) within a directory, returning a new WAL and the recovered MemTable.
-    ///
-    /// If multiple WALs exist in a directory, they are merged by file date.
-    pub fn load_wal_from_dir(dir_path: &Path) -> io::Result<(Self, MemTable)> {
-        // get existing `wal` files and sort them
-        let mut wal_files = get_files_with_ext(dir_path, "wal");
-        wal_files.sort();
-
-        // create a new MemTable
-        let mut new_mem_table = MemTable::new();
-        // create a new WAL in the directory
-        let mut new_wal_dir = Wal::new(dir_path)?;
-
-        for wal_file in wal_files.iter() {
-            if let Ok(wal) = Wal::from_path(wal_file) {
-                for wal_entry in wal.into_iter() {
-                    if wal_entry.deleted {
-                        new_mem_table.delete(wal_entry.key.as_slice(), wal_entry.timestamp);
-                        new_wal_dir.delete(wal_entry.key.as_slice(), wal_entry.timestamp)?;
-                    } else {
-                        new_mem_table.set(
-                            wal_entry.key.as_slice(),
-                            wal_entry.value.as_ref().unwrap().as_slice(),
-                            wal_entry.timestamp,
-                        );
-
-                        new_wal_dir.set(
-                            wal_entry.key.as_slice(),
-                            wal_entry.value.unwrap().as_slice(),
-                            wal_entry.timestamp,
-                        )?;
-                    }
-                }
-            }
-        }
-        new_wal_dir.flush().unwrap();
-        wal_files
-            .into_iter()
-            .for_each(|file| remove_file(file).unwrap());
-
-        Ok((new_wal_dir, new_mem_table))
-    }
-
-    /// Sets a Key-Value pair and the operation is appended to the WAL.
-    pub fn set(&mut self, key: &[u8], value: &[u8], timestamp: u128) -> io::Result<()> {
-        self.file.write_all(&key.len().to_le_bytes())?;
-        self.file.write_all(&(false as u8).to_le_bytes())?;
-        self.file.write_all(&value.len().to_le_bytes())?;
-        self.file.write_all(key)?;
-        self.file.write_all(value)?;
-        self.file.write_all(&timestamp.to_le_bytes())?;
-
-        Ok(())
-    }
-
-    /// Deletes a Key-Value pair and the operation is appended to the WAL.
-    ///
-    /// This is achieved using tombstones.
-    pub fn delete(&mut self, key: &[u8], timestamp: u128) -> io::Result<()> {
-        self.file.write_all(&key.len().to_le_bytes())?;
-        self.file.write_all(&(true as u8).to_le_bytes())?;
-        self.file.write_all(key)?;
-        self.file.write_all(&timestamp.to_le_bytes())?;
-
-        Ok(())
-    }
-
-    /// Flushes the WAL to disk.
-    ///
-    /// This is useful for applying bulk operations and flushing the final result to
-    /// disk. Waiting to flush after the bulk operations have been performed will improve
-    /// write performance substantially.
-    pub fn flush(&mut self) -> io::Result<()> {
-        self.file.flush()
-    }
-}
-
-impl WalIterator {
-    /// Creates a new WALIterator from a path to a WAL file.
-    pub fn new(path: PathBuf) -> io::Result<Self> {
-        let file = OpenOptions::new().read(true).open(path)?;
-        let reader = BufReader::new(file);
-        Ok(WalIterator { reader })
-    }
-}
-
-impl Iterator for WalIterator {
-    type Item = WalEntry;
-
-    /// Gets the next entry in the WAL file.
-    fn next(&mut self) -> Option<Self::Item> {
-        let mut len_buffer = [0; 8];
-        if self.reader.read_exact(&mut len_buffer).is_err() {
-            return None;
-        }
-        let key_len = usize::from_le_bytes(len_buffer);
-
-        let mut bool_buffer = [0; 1];
-        if self.reader.read_exact(&mut bool_buffer).is_err() {
-            return None;
-        }
-        let deleted = bool_buffer[0] != 0;
-
-        let mut key = vec![0; key_len];
-        let mut value = None;
-        if deleted {
-            if self.reader.read_exact(&mut key).is_err() {
-                return None;
-            }
-        } else {
-            if self.reader.read_exact(&mut len_buffer).is_err() {
-                return None;
-            }
-            let value_len = usize::from_le_bytes(len_buffer);
-            if self.reader.read_exact(&mut key).is_err() {
-                return None;
-            }
-            let mut value_buf = vec![0; value_len];
-            if self.reader.read_exact(&mut value_buf).is_err() {
-                return None;
-            }
-            value = Some(value_buf);
-        }
-
-        let mut timestamp_buffer = [0; 16];
-        if self.reader.read_exact(&mut timestamp_buffer).is_err() {
-            return None;
-        }
-        let timestamp = u128::from_le_bytes(timestamp_buffer);
-
-        Some(WalEntry {
-            key,
-            value,
-            timestamp,
-            deleted,
-        })
-    }
-}
-
-impl IntoIterator for Wal {
-    type IntoIter = WalIterator;
-    type Item = WalEntry;
-
-    /// Converts a WAL into a `WALIterator` to iterate over the entries.
-    fn into_iter(self) -> WalIterator {
-        WalIterator::new(self.path).unwrap()
-    }
-}
diff --git a/src/wal_tests.rs b/src/wal_tests.rs
deleted file mode 100644
index 02005ad..0000000
--- a/src/wal_tests.rs
+++ /dev/null
@@ -1,252 +0,0 @@
-use crate::{helper::generate_timestamp, wal::Wal};
-
-use rand::Rng;
-use std::fs::{create_dir, remove_dir_all};
-use std::fs::{metadata, File, OpenOptions};
-use std::io::prelude::*;
-use std::io::BufReader;
-use std::path::PathBuf;
-
-fn check_entry(
-    reader: &mut BufReader<File>,
-    key: &[u8],
-    value: Option<&[u8]>,
-    timestamp: u128,
-    deleted: bool,
-) {
-    let mut len_buffer = [0; 8];
-    reader.read_exact(&mut len_buffer).unwrap();
-    let file_key_len = usize::from_le_bytes(len_buffer);
-    assert_eq!(file_key_len, key.len());
-
-    let mut bool_buffer = [0; 1];
-    reader.read_exact(&mut bool_buffer).unwrap();
-    let file_deleted = bool_buffer[0] != 0;
-    assert_eq!(file_deleted, deleted);
-
-    if deleted {
-        let mut file_key = vec![0; file_key_len];
-        reader.read_exact(&mut file_key).unwrap();
-        assert_eq!(file_key, key);
-    } else {
-        reader.read_exact(&mut len_buffer).unwrap();
-        let file_value_len = usize::from_le_bytes(len_buffer);
-        assert_eq!(file_value_len, value.unwrap().len());
-        let mut file_key = vec![0; file_key_len];
-        reader.read_exact(&mut file_key).unwrap();
-        assert_eq!(file_key, key);
-        let mut file_value = vec![0; file_value_len];
-        reader.read_exact(&mut file_value).unwrap();
-        assert_eq!(file_value, value.unwrap());
-    }
-
-    let mut timestamp_buffer = [0; 16];
-    reader.read_exact(&mut timestamp_buffer).unwrap();
-    let file_timestamp = u128::from_le_bytes(timestamp_buffer);
-    assert_eq!(file_timestamp, timestamp);
-}
-
-#[test]
-fn test_write_one() {
-    let mut rng = rand::thread_rng();
-    let dir = PathBuf::from(format!("./{}/", rng.gen::<u32>()));
-    create_dir(&dir).unwrap();
-
-    let timestamp = generate_timestamp();
-
-    let mut wal = Wal::new(&dir).unwrap();
-    wal.set(b"Lime", b"Lime Smoothie", timestamp).unwrap();
-    wal.flush().unwrap();
-
-    let file = OpenOptions::new().read(true).open(&wal.path).unwrap();
-    let mut reader = BufReader::new(file);
-
-    check_entry(
-        &mut reader,
-        b"Lime",
-        Some(b"Lime Smoothie"),
-        timestamp,
-        false,
-    );
-
-    remove_dir_all(&dir).unwrap();
-}
-
-#[test]
-fn test_write_many() {
-    let mut rng = rand::thread_rng();
-    let dir = PathBuf::from(format!("./{}/", rng.gen::<u32>()));
-    create_dir(&dir).unwrap();
-
-    let timestamp = generate_timestamp();
-
-    let entries: Vec<(&[u8], Option<&[u8]>)> = vec![
-        (b"Apple", Some(b"Apple Smoothie")),
-        (b"Lime", Some(b"Lime Smoothie")),
-        (b"Orange", Some(b"Orange Smoothie")),
-    ];
-
-    let mut wal = Wal::new(&dir).unwrap();
-
-    for e in entries.iter() {
-        wal.set(e.0, e.1.unwrap(), timestamp).unwrap();
-    }
-    wal.flush().unwrap();
-
-    let file = OpenOptions::new().read(true).open(&wal.path).unwrap();
-    let mut reader = BufReader::new(file);
-
-    for e in entries.iter() {
-        check_entry(&mut reader, e.0, e.1, timestamp, false);
-    }
-
-    remove_dir_all(&dir).unwrap();
-}
-
-#[test]
-fn test_write_delete() {
-    let mut rng = rand::thread_rng();
-    let dir = PathBuf::from(format!("./{}/", rng.gen::<u32>()));
-    create_dir(&dir).unwrap();
-
-    let timestamp = generate_timestamp();
-
-    let entries: Vec<(&[u8], Option<&[u8]>)> = vec![
-        (b"Apple", Some(b"Apple Smoothie")),
-        (b"Lime", Some(b"Lime Smoothie")),
-        (b"Orange", Some(b"Orange Smoothie")),
-    ];
-
-    let mut wal = Wal::new(&dir).unwrap();
-
-    for e in entries.iter() {
-        wal.set(e.0, e.1.unwrap(), timestamp).unwrap();
-    }
-    for e in entries.iter() {
-        wal.delete(e.0, timestamp).unwrap();
-    }
-
-    wal.flush().unwrap();
-
-    let file = OpenOptions::new().read(true).open(&wal.path).unwrap();
-    let mut reader = BufReader::new(file);
-
-    for e in entries.iter() {
-        check_entry(&mut reader, e.0, e.1, timestamp, false);
-    }
-    for e in entries.iter() {
-        check_entry(&mut reader, e.0, None, timestamp, true);
-    }
-
-    remove_dir_all(&dir).unwrap();
-}
-
-#[test]
-fn test_read_wal_none() {
-    let mut rng = rand::thread_rng();
-    let dir = PathBuf::from(format!("./{}/", rng.gen::<u32>()));
-    create_dir(&dir).unwrap();
-
-    let (new_wal, new_mem_table) = Wal::load_wal_from_dir(&dir).unwrap();
-    assert_eq!(new_mem_table.entries.len(), 0);
-
-    let m = metadata(new_wal.path).unwrap();
-    assert_eq!(m.len(), 0);
-
-    remove_dir_all(&dir).unwrap();
-}
-
-#[test]
-fn test_read_wal_one() {
-    let mut rng = rand::thread_rng();
-    let dir = PathBuf::from(format!("./{}/", rng.gen::<u32>()));
-    create_dir(&dir).unwrap();
-
-    let entries: Vec<(&[u8], Option<&[u8]>)> = vec![
-        (b"Apple", Some(b"Apple Smoothie")),
-        (b"Lime", Some(b"Lime Smoothie")),
-        (b"Orange", Some(b"Orange Smoothie")),
-    ];
-
-    let mut wal = Wal::new(&dir).unwrap();
-
-    for (i, e) in entries.iter().enumerate() {
-        wal.set(e.0, e.1.unwrap(), i as u128).unwrap();
-    }
-    wal.flush().unwrap();
-
-    let (new_wal, new_mem_table) = Wal::load_wal_from_dir(&dir).unwrap();
-
-    let file = OpenOptions::new().read(true).open(&new_wal.path).unwrap();
-    let mut reader = BufReader::new(file);
-
-    for (i, e) in entries.iter().enumerate() {
-        check_entry(&mut reader, e.0, e.1, i as u128, false);
-
-        let mem_e = new_mem_table.get(e.0).unwrap();
-        assert_eq!(mem_e.key, e.0);
-        assert_eq!(mem_e.value.as_ref().unwrap().as_slice(), e.1.unwrap());
-        assert_eq!(mem_e.timestamp, i as u128);
-    }
-
-    remove_dir_all(&dir).unwrap();
-}
-
-#[test]
-fn test_read_wal_multiple() {
-    let mut rng = rand::thread_rng();
-    let dir = PathBuf::from(format!("./{}/", rng.gen::<u32>()));
-    create_dir(&dir).unwrap();
-
-    let entries_1: Vec<(&[u8], Option<&[u8]>)> = vec![
-        (b"Apple", Some(b"Apple Smoothie")),
-        (b"Lime", Some(b"Lime Smoothie")),
-        (b"Orange", Some(b"Orange Smoothie")),
-    ];
-    let mut wal_1 = Wal::new(&dir).unwrap();
-    for (i, e) in entries_1.iter().enumerate() {
-        wal_1.set(e.0, e.1.unwrap(), i as u128).unwrap();
-    }
-    wal_1.flush().unwrap();
-
-    let entries_2: Vec<(&[u8], Option<&[u8]>)> = vec![
-        (b"Strawberry", Some(b"Strawberry Smoothie")),
-        (b"Blueberry", Some(b"Blueberry Smoothie")),
-        (b"Orange", Some(b"Orange Milkshake")),
-    ];
-    let mut wal_2 = Wal::new(&dir).unwrap();
-    for (i, e) in entries_2.iter().enumerate() {
-        wal_2.set(e.0, e.1.unwrap(), (i + 3) as u128).unwrap();
-    }
-    wal_2.flush().unwrap();
-
-    let (new_wal, new_mem_table) = Wal::load_wal_from_dir(&dir).unwrap();
-
-    let file = OpenOptions::new().read(true).open(&new_wal.path).unwrap();
-    let mut reader = BufReader::new(file);
-
-    for (i, e) in entries_1.iter().enumerate() {
-        check_entry(&mut reader, e.0, e.1, i as u128, false);
-
-        let mem_e = new_mem_table.get(e.0).unwrap();
-        if i != 2 {
-            assert_eq!(mem_e.key, e.0);
-            assert_eq!(mem_e.value.as_ref().unwrap().as_slice(), e.1.unwrap());
-            assert_eq!(mem_e.timestamp, i as u128);
-        } else {
-            assert_eq!(mem_e.key, e.0);
-            assert_ne!(mem_e.value.as_ref().unwrap().as_slice(), e.1.unwrap());
-            assert_ne!(mem_e.timestamp, i as u128);
-        }
-    }
-    for (i, e) in entries_2.iter().enumerate() {
-        check_entry(&mut reader, e.0, e.1, (i + 3) as u128, false);
-
-        let mem_e = new_mem_table.get(e.0).unwrap();
-        assert_eq!(mem_e.key, e.0);
-        assert_eq!(mem_e.value.as_ref().unwrap().as_slice(), e.1.unwrap());
-        assert_eq!(mem_e.timestamp, (i + 3) as u128);
-    }
-
-    remove_dir_all(&dir).unwrap();
-}
diff --git a/src/write_ahead_log.rs b/src/write_ahead_log.rs
new file mode 100644
index 0000000..35e4f4b
--- /dev/null
+++ b/src/write_ahead_log.rs
@@ -0,0 +1,238 @@
+use std::{
+    fs::{self, File, OpenOptions},
+    io::{self, Read, Seek, SeekFrom, Write},
+    path::PathBuf,
+    sync::{Arc, Mutex},
+};
+
+pub(crate) static WAL_FILE_NAME: &str = "lsmdb_wal.bin";
+
+pub struct WriteAheadLog {
+    log_file: Arc<Mutex<File>>,
+}
+
+#[derive(Clone, Copy, PartialEq, Debug)]
+pub(crate) enum EntryKind {
+    Insert = 1,
+    Remove = 2,
+}
+
+#[derive(PartialEq, Debug)]
+pub(crate) struct WriteAheadLogEntry {
+    pub(crate) entry_kind: EntryKind,
+    pub(crate) key: Vec<u8>,
+    pub(crate) value: Vec<u8>,
+}
+
+impl WriteAheadLog {
+    pub(crate) fn new(directory_path: &PathBuf) -> io::Result<Self> {
+        // Convert the directory path to a PathBuf.
+        let dir_path = PathBuf::from(directory_path);
+
+        // Create the directory if it doesn't exist.
+        if !dir_path.exists() {
+            fs::create_dir_all(&dir_path)?;
+        }
+
+        // Generate the file path within the directory.
+        let file_path = dir_path.join(WAL_FILE_NAME);
+
+        let log_file = OpenOptions::new()
+            .read(true)
+            .append(true)
+            .create(true)
+            .open(file_path)?;
+
+        Ok(Self {
+            log_file: Arc::new(Mutex::new(log_file)),
+        })
+    }
+
+    pub(crate) fn append(
+        &mut self,
+        entry_kind: EntryKind,
+        key: Vec<u8>,
+        value: Vec<u8>,
+    ) -> io::Result<()> {
+        let mut log_file = self.log_file.lock().map_err(|poison_error| {
+            io::Error::new(
+                io::ErrorKind::Other,
+                format!("Failed to obtain lock: {:?}", poison_error),
+            )
+        })?;
+
+        let entry = WriteAheadLogEntry::new(entry_kind, key, value);
+
+        let serialized_data = entry.serialize();
+
+        log_file.write_all(&serialized_data)?;
+
+        log_file.flush()?;
+
+        Ok(())
+    }
+
+    pub(crate) fn recover(&mut self) -> io::Result<Vec<WriteAheadLogEntry>> {
+        let mut log_file = self.log_file.lock().map_err(|poison_error| {
+            io::Error::new(
+                io::ErrorKind::Other,
+                format!("Failed to obtain lock: {:?}", poison_error),
+            )
+        })?;
+
+        // Read the whole log once, then split it into entries using each entry's
+        // length prefix. (Calling `read_to_end` inside the loop would consume
+        // every entry on the first iteration and fail to deserialize.)
+        log_file.seek(SeekFrom::Start(0))?;
+        let mut serialized_data = Vec::new();
+        log_file.read_to_end(&mut serialized_data)?;
+
+        let mut entries = Vec::new();
+        let mut pos = 0;
+        while pos < serialized_data.len() {
+            if serialized_data.len() - pos < 4 {
+                return Err(io::Error::new(
+                    io::ErrorKind::InvalidData,
+                    "Truncated WAL entry",
+                ));
+            }
+            let entry_len =
+                u32::from_le_bytes(serialized_data[pos..pos + 4].try_into().unwrap()) as usize;
+            if entry_len < 13 || pos + entry_len > serialized_data.len() {
+                return Err(io::Error::new(
+                    io::ErrorKind::InvalidData,
+                    "Corrupt WAL entry length",
+                ));
+            }
+
+            let entry = WriteAheadLogEntry::deserialize(&serialized_data[pos..pos + entry_len])?;
+            entries.push(entry);
+            pos += entry_len;
+        }
+        Ok(entries)
+    }
+
+    pub(crate) fn clear(&mut self) -> io::Result<()> {
+        let mut log_file = self.log_file.lock().map_err(|poison_error| {
+            io::Error::new(
+                io::ErrorKind::Other,
+                format!("Failed to obtain lock: {:?}", poison_error),
+            )
+        })?;
+
+        log_file.set_len(0)?;
+        log_file.seek(SeekFrom::Start(0))?;
+        Ok(())
+    }
+}
+
+impl WriteAheadLogEntry {
+    pub(crate) fn new(entry_kind: EntryKind, key: Vec<u8>, value: Vec<u8>) -> Self {
+        Self {
+            entry_kind,
+            key,
+            value,
+        }
+    }
+
+    fn serialize(&self) -> Vec<u8> {
+        // Calculate the entry length: length prefix + kind + key length + value length + payloads.
+        let entry_len = 4 + 1 + 4 + 4 + self.key.len() + self.value.len();
+
+        let mut serialized_data = Vec::with_capacity(entry_len);
+
+        // Serialize entry length
+        serialized_data.extend_from_slice(&(entry_len as u32).to_le_bytes());
+
+        // Serialize entry kind
+        serialized_data.push(self.entry_kind as u8);
+
+        // Serialize key length
+        serialized_data.extend_from_slice(&(self.key.len() as u32).to_le_bytes());
+
+        // Serialize value length
+        serialized_data.extend_from_slice(&(self.value.len() as u32).to_le_bytes());
+
+        // Serialize key
+        serialized_data.extend_from_slice(&self.key);
+
+        // Serialize value
+        serialized_data.extend_from_slice(&self.value);
+
+        serialized_data
+    }
+
+    fn deserialize(serialized_data: &[u8]) -> io::Result<Self> {
+        if serialized_data.len() < 13 {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidInput,
+                "Invalid serialized data length",
+            ));
+        }
+
+        let entry_len = u32::from_le_bytes([
+            serialized_data[0],
+            serialized_data[1],
+            serialized_data[2],
+            serialized_data[3],
+        ]) as usize;
+
+        if serialized_data.len() != entry_len {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidInput,
+                "Invalid serialized data length",
+            ));
+        }
+
+        let entry_kind = match serialized_data[4] {
+            1 => EntryKind::Insert,
+            2 => EntryKind::Remove,
+            _ => {
+                return Err(io::Error::new(
+                    io::ErrorKind::InvalidInput,
+                    "Invalid entry kind value",
+                ))
+            }
+        };
+
+        let key_len = u32::from_le_bytes([
+            serialized_data[5],
+            serialized_data[6],
+            serialized_data[7],
+            serialized_data[8],
+        ]) as usize;
+        let value_len = u32::from_le_bytes([
+            serialized_data[9],
+            serialized_data[10],
+            serialized_data[11],
+            serialized_data[12],
+        ]) as usize;
+
+        if serialized_data.len() != 13 + key_len + value_len {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidInput,
+                "Invalid serialized data length",
+            ));
+        }
+
+        let key = serialized_data[13..(13 + key_len)].to_vec();
+        let value = serialized_data[(13 + key_len)..].to_vec();
+
+        Ok(WriteAheadLogEntry::new(entry_kind, key, value))
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_serialize_deserialize() {
+        // Create a sample WriteAheadLogEntry.
+        let entry_kind = EntryKind::Insert;
+        let key = vec![1, 2, 3];
+        let value = vec![4, 5, 6];
+
+        let original_entry = WriteAheadLogEntry::new(entry_kind, key.clone(), value.clone());
+
+        // Serialize the entry.
+        let serialized_data = original_entry.serialize();
+
+        // Verify the serialized length.
+        let expected_entry_len = 4 + 1 + 4 + 4 + key.len() + value.len();
+        assert_eq!(serialized_data.len(), expected_entry_len);
+
+        // Deserialize the serialized data.
+        let deserialized_entry =
+            WriteAheadLogEntry::deserialize(&serialized_data).expect("Failed to deserialize");
+
+        // Verify the deserialized entry matches the original entry.
+        assert_eq!(deserialized_entry, original_entry);
+    }
+}
diff --git a/tests/integration_tests.rs b/tests/integration_tests.rs
deleted file mode 100644
index e4884a2..0000000
--- a/tests/integration_tests.rs
+++ /dev/null
@@ -1,133 +0,0 @@
-use lsmdb::engine::*;
-use std::fs;
-use std::sync::{Arc, Mutex};
-
-#[test]
-fn test_storage_engine() {
-    // Create a new storage engine instance
-    let mut engine = StorageEngine::new("./test_dir");
-
-    // Write data to the storage engine
-    let result = engine.set(b"key1", b"value1");
-    assert_eq!(result, Ok(1));
-
-    // Read data from the storage engine
-    let entry = engine.get(b"key1");
-    assert_eq!(
-        entry.clone(),
-        Some(StorageEngineEntry {
-            key: b"key1".to_vec(),
-            value: b"value1".to_vec(),
-            timestamp: entry.unwrap().timestamp()
-        })
-    );
-
-    // Delete data from the storage engine
-    let result = engine.delete(b"key1");
-    assert_eq!(result, Ok(1));
-
-    // Verify that the data is deleted
-    let entry = engine.get(b"key1");
-    assert_eq!(entry, None);
-
-    // Cleanup: Remove the test directory
-    // Attempt to remove the directory
"./test_dir" - // If an error occurs during the removal, the error will be captured and handled - if let Err(e) = fs::remove_dir_all("./test_dir") { - // Print an error message indicating that the removal of the test directory failed, - // along with the specific error encountered - println!("Failed to remove test directory: {:?}", e); - } -} - -#[test] -fn test_storage_engine_concurrent_writes() { - // Create a new storage engine instance - let engine = Arc::new(Mutex::new(StorageEngine::new("./test_dir"))); - - // Spawn multiple threads to concurrently write data to the storage engine - let num_threads = 10; - let mut handles = vec![]; - - for i in 0..num_threads { - let engine_ref = Arc::clone(&engine); - let key = format!("key{}", i); - let value = format!("value{}", i); - - let handle = std::thread::spawn(move || { - let mut engine = engine_ref.lock().unwrap(); - engine.set(key.as_bytes(), value.as_bytes()) - }); - - handles.push(handle); - } - - // Wait for all threads to complete - for handle in handles { - if let Err(e) = handle.join() { - panic!("Thread panicked: {:?}", e); - } - } - - // Verify that all writes were successful - for i in 0..num_threads { - let key = format!("key{}", i); - let value = format!("value{}", i); - - let engine = engine.lock().unwrap(); - let entry = engine.get(key.as_bytes()).unwrap(); - assert_eq!( - entry, - StorageEngineEntry { - key: key.as_bytes().to_vec(), - value: value.as_bytes().to_vec(), - timestamp: entry.timestamp, - } - ); - } - - // Cleanup: Remove the test directory - // Attempt to remove the directory "./test_dir" - // If an error occurs during the removal, the error will be captured and handled - if let Err(e) = fs::remove_dir_all("./test_dir") { - // Print an error message indicating that the removal of the test directory failed, - // along with the specific error encountered - println!("Failed to remove test directory: {:?}", e); - } -} - -#[test] -fn test_storage_engine_delete() { - // Create a new storage 
engine instance - let mut engine = StorageEngine::new("./test_dir"); - - // Write data to the storage engine - engine.set(b"key1", b"value1").expect("Failed to set data"); - engine.set(b"key2", b"value2").expect("Failed to set data"); - - // Verify that the data exists - let entry1 = engine.get(b"key1").expect("Failed to get data"); - let entry2 = engine.get(b"key2").expect("Failed to get data"); - assert_eq!(entry1.value, b"value1"); - assert_eq!(entry2.value, b"value2"); - - // Delete an entry from the storage engine - engine.delete(b"key1").expect("Failed to delete data"); - - // Verify that the deleted entry is no longer accessible - let deleted_entry = engine.get(b"key1"); - assert!(deleted_entry.is_none()); - - // Verify that the remaining entry still exists - let remaining_entry = engine.get(b"key2").expect("Failed to get data"); - assert_eq!(remaining_entry.value, b"value2"); - - // Cleanup: Remove the test directory - // Attempt to remove the directory "./test_dir" - // If an error occurs during the removal, the error will be captured and handled - if let Err(e) = fs::remove_dir_all("./test_dir") { - // Print an error message indicating that the removal of the test directory failed, - // along with the specific error encountered - println!("Failed to remove test directory: {:?}", e); - } -}
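As an aside for readers of this patch (not part of the diff itself): the on-disk framing that `WriteAheadLogEntry::serialize` in `src/write_ahead_log.rs` writes is `[entry_len: u32][kind: u8][key_len: u32][value_len: u32][key][value]`, all integers little-endian, for a fixed 13-byte header. A standalone sketch of that layout, using a hypothetical `encode` helper that mirrors the patch's serializer:

```rust
// Hypothetical helper mirroring WriteAheadLogEntry::serialize from the patch:
// [entry_len: u32 LE][kind: u8][key_len: u32 LE][value_len: u32 LE][key][value].
fn encode(kind: u8, key: &[u8], value: &[u8]) -> Vec<u8> {
    // 13-byte header (4 + 1 + 4 + 4) followed by the key and value payloads.
    let entry_len = 4 + 1 + 4 + 4 + key.len() + value.len();
    let mut out = Vec::with_capacity(entry_len);
    out.extend_from_slice(&(entry_len as u32).to_le_bytes());
    out.push(kind);
    out.extend_from_slice(&(key.len() as u32).to_le_bytes());
    out.extend_from_slice(&(value.len() as u32).to_le_bytes());
    out.extend_from_slice(key);
    out.extend_from_slice(value);
    out
}

fn main() {
    let buf = encode(1, b"k", b"val"); // kind 1 = Insert in the patch's EntryKind
    // 13 header bytes + 1 key byte + 3 value bytes.
    assert_eq!(buf.len(), 17);
    // The leading length prefix covers the entire entry, which is what lets a
    // reader (like the patch's recover()) walk a log of concatenated entries.
    let prefix = u32::from_le_bytes(buf[0..4].try_into().unwrap()) as usize;
    assert_eq!(prefix, buf.len());
}
```

Because each entry is self-delimiting via its length prefix, recovery can split a concatenated log without any record separators.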