-
Notifications
You must be signed in to change notification settings - Fork 173
What are Distributed Hash Tables ?
OpenDHT implements a Distributed Hash Table (DHT) of the Kademlia kind.
A DHT can be viewed as a dictionary distributed over a network: it provides access to a common shared key->value data-store, distributed over participating nodes with great performance and scalability.
From a user perspective, a DHT essentially provides a map interface, with two main operations: put(key, value)
and get(key)
. Get will retrieve values stored at a certain key while put (often called announce) will store a value on the network. Note that many values can be stored under the same key.
The Kademlia DHT algorithm allows to contact only O(log(n))
nodes for a get operation, n being the number of nodes in the network. This property makes DHTs very scalable as demonstrated, for instance, by the mainline BitTorrent DHT running with tens of millions of nodes.
Every node in the network has a random ID (of 160 bits for OpenDHT). Every value stored on the network has a randomly distributed ID of the same length.
Kademlia DHTs define XOR as the distance operator between node IDs or between keys and node IDs, and use an iterative search algorithm working as follow.
Every node maintains a routing table of known node IDs and their IP. The routing table, a binary tree of node IDs, is divided in buckets of 8 nodes. Only the bucket containing the node' own ID will split to fit more nodes, so that the table of a node will contain a number of nodes proportional to log(N)
, concentrated around the node ID (according to the XOR metric).
To find the nodes globally responsible of storage for a key H, a node S will contact the nodes in his table that are the closest to H according to the XOR metric. Requests include the key H and answers include values for H (if any) and IDs and IPs of new nodes closest to H. Those new closest nodes are then contacted until there is no new (uncontacted) closest node in answers.
The same iterative process is used when a new node enter the network: the node performs the algorithm with its own node ID as H to find neighbors. By doing so, contacted neighbors will then also be aware of the new node and its IP through requests sent to them.
This allows arbitrary nodes to read and write values by sending messages to the same 8 closest nodes to the target hash.