Adrien Béraud edited this page Apr 16, 2018 · 58 revisions

Introduction

OpenDHT offers the following features:

  • Distributed shared key->value data-store.
  • IPv4 and IPv6 support.
  • Storage of arbitrary binary values up to 64 KiB. Keys are 160 bits long.
  • Different values under the same key are distinguished by a key-unique 64-bit ID.
  • Every value also has a "value type". Each value type defines potentially complex storage, editing and expiration policies, allowing, for instance, different expiration times for different values. The set of supported value types is hardcoded and known by every node.
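The limits listed above can be pictured with a toy, single-node store: one 160-bit key maps to several values, each identified by its 64-bit ID, and payloads over 64 KiB are rejected. ToyStore, Key and Blob are hypothetical names, not OpenDHT API. Replacing a value stored with the same ID is an assumption of this sketch, and value types (with their storage and expiration policies) are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, in-memory model of what a single node stores.
// Not OpenDHT code: it only mirrors the limits described above.
using Key = std::array<uint8_t, 20>;  // 160-bit key
using Blob = std::vector<uint8_t>;    // arbitrary binary payload

constexpr std::size_t kMaxValueSize = 64 * 1024;  // 64 KiB

class ToyStore {
public:
    // Stores `data` under `key`, identified by the key-unique `id`.
    // Assumption: a value with the same id replaces the previous one.
    // Payloads larger than 64 KiB are rejected.
    bool put(const Key& key, uint64_t id, Blob data) {
        if (data.size() > kMaxValueSize)
            return false;
        store_[key][id] = std::move(data);
        return true;
    }

    // Number of distinct values (distinct ids) stored under `key`.
    std::size_t count(const Key& key) const {
        auto it = store_.find(key);
        return it == store_.end() ? 0 : it->second.size();
    }

private:
    std::map<Key, std::map<uint64_t, Blob>> store_;
};
```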

Note that OpenDHT is not compatible with the Mainline BitTorrent DHT (which only stores IP addresses).

An optional public-key cryptography layer on top of the DHT allows putting signed or encrypted data on the DHT. Signed values can then be edited only by their owner (as verified cryptographically). Signed values retrieved from the DHT are automatically checked and are only presented to the user if signature verification succeeds.

The identity layer also publishes a (usually self-signed) certificate on the DHT that can be used to encrypt data for other nodes. Encrypted values are always signed, and the signature is part of the encrypted data, so that only the recipient can know who signed the value. For this reason, like other non-signed values, encrypted values can't be edited (because storage nodes can't check the identity of the author).
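The editing rule from the two paragraphs above can be sketched as the check a storage node might run before replacing a stored value. StoredValueView and canEdit are hypothetical names, not OpenDHT API; the sketch assumes signatures were already verified and ignores any other policy a value type may add.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical view of a stored value *as a storage node sees it*.
// For a signed value the owner's public key is visible; for an encrypted
// value the signature (and therefore the owner) is hidden inside the
// ciphertext, and a plain value has no owner at all.
struct StoredValueView {
    std::optional<std::string> visibleOwner;  // e.g. a public-key fingerprint
};

// Sketch of the rule: an update is accepted only if the stored value has a
// visible owner and the update is signed by that same owner. Signature
// verification is assumed to have happened already. Plain and encrypted
// values have no visible owner, so they can never be edited.
bool canEdit(const StoredValueView& stored, const StoredValueView& update) {
    return stored.visibleOwner.has_value()
        && update.visibleOwner.has_value()
        && *stored.visibleOwner == *update.visibleOwner;
}
```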

The OpenDHT API

OpenDHT uses the dht C++ namespace and is composed of a few major classes:

  • InfoHash represents a key or a node ID: a 20-byte (160-bit) bit string. InfoHash instances can be compared with operator ==. Hashes of strings or binary data can be computed with the static method InfoHash::get(); for instance, InfoHash::get("my_key") returns the SHA-1 hash of the string "my_key".
  • Value represents a value potentially stored on the DHT. dht::Value is the result type of get operations and the argument type of put operations. A dht::Value can be easily built from any binary object, for instance using the constructor dht::Value::Value(const std::vector<uint8_t>&) or C-style with dht::Value::Value(const uint8_t* ptr, size_t len).
  • ValueType defines how data is stored on the DHT: preservation time, storage and editing constraints, etc. Every stored Value has an associated value type. Note that a ValueType usually has no impact on data serialization.
  • Value::Filter is a class inheriting from std::function<bool(Value&)>. It lets you define whether a value should be returned to the user. It also defines some useful methods like chain(Value::Filter&&) and chainOr(Value::Filter&&).
  • Query: much like filters, a Query lets you filter values, but it can also select which fields of each value are returned. It essentially mirrors the SQL SELECT and WHERE clauses; in fact, one of its constructors takes an SQL-like formatted string as parameter. The fields on which SELECT and WHERE operations are permitted are listed in Value::Fields, a subset of the fields a Value contains. The most important difference between a query and a filter is that a query is executed by the remote nodes, giving you better control over the network traffic triggered by your use of the library.
  • Dht is the class implementing the actual distributed hash table and providing basic operations. It requires already-open UDP sockets to send packets. When Dht is used alone, the Dht::periodic method must be called regularly, and also whenever a packet is received.
  • SecureDht is a subclass of dht::Dht that exposes the same API and transparently checks signed values (for get and listen operations), decrypts encrypted values it is able to decrypt, and provides additional methods to publish signed or encrypted values.
  • DhtRunner provides a thread-safe interface to SecureDht and manages UDP sockets. DhtRunner is what most applications using OpenDHT should use: a single instance can be safely shared by various components or threads, with networking managed transparently. DhtRunner can either launch a dedicated thread or be integrated into the program's main loop.
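InfoHash::get() is described above as returning the SHA-1 hash of its input. The following standalone sketch computes such a 160-bit digest without OpenDHT, to show what a key actually is; sha1, toHex and Hash160 are illustrative names, not OpenDHT API, and this is not the library's own hashing code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 160-bit digest, the same size as an InfoHash.
using Hash160 = std::array<uint8_t, 20>;

static inline uint32_t rotl(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

// Plain SHA-1 (FIPS 180-4) of a string.
Hash160 sha1(const std::string& input) {
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};

    // Padding: append 0x80, zero bytes, then the 64-bit big-endian bit length.
    std::vector<uint8_t> msg(input.begin(), input.end());
    const uint64_t bitLen = static_cast<uint64_t>(msg.size()) * 8;
    msg.push_back(0x80);
    while (msg.size() % 64 != 56)
        msg.push_back(0);
    for (int i = 7; i >= 0; --i)
        msg.push_back(static_cast<uint8_t>(bitLen >> (8 * i)));

    // Process each 512-bit block.
    for (std::size_t off = 0; off < msg.size(); off += 64) {
        uint32_t w[80];
        for (int i = 0; i < 16; ++i)
            w[i] = (uint32_t(msg[off + 4 * i]) << 24) | (uint32_t(msg[off + 4 * i + 1]) << 16)
                 | (uint32_t(msg[off + 4 * i + 2]) << 8) | uint32_t(msg[off + 4 * i + 3]);
        for (int i = 16; i < 80; ++i)
            w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

        uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 80; ++i) {
            uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
            uint32_t temp = rotl(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rotl(b, 30); b = a; a = temp;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }

    // Serialize the five 32-bit words big-endian.
    Hash160 out{};
    for (int i = 0; i < 5; ++i)
        for (int j = 0; j < 4; ++j)
            out[4 * i + j] = static_cast<uint8_t>(h[i] >> (24 - 8 * j));
    return out;
}

// Lowercase hexadecimal representation of a digest.
std::string toHex(const Hash160& hash) {
    static const char* digits = "0123456789abcdef";
    std::string s;
    for (uint8_t byte : hash) {
        s += digits[byte >> 4];
        s += digits[byte & 0x0F];
    }
    return s;
}
```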

Get/listen operations take a callback argument of type GetCallback, defined as:

std::function<bool(const std::vector<std::shared_ptr<Value>>& values)>

Query operations take a callback argument of type QueryCallback, defined as:

std::function<bool(const std::vector<std::shared_ptr<FieldValueIndex>>& fields)>

Many operations also use an "operation completed" callback DoneCallback, defined as:

std::function<void(bool success)>
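The stop-on-false contract of GetCallback and the one-shot DoneCallback can be illustrated with a small mock search. Value and mockGet are hypothetical stand-ins, not OpenDHT code, and invoking the done callback even after the callback stopped the search is an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for dht::Value and the callback types shown above.
struct Value { std::string data; };
using GetCallback  = std::function<bool(const std::vector<std::shared_ptr<Value>>& values)>;
using DoneCallback = std::function<void(bool success)>;

// Mock of a get() search: results arrive in batches as nodes reply.
// Delivery stops as soon as the callback returns false.
// Assumption: the done callback is still invoked exactly once, at the end.
void mockGet(const std::vector<std::vector<std::shared_ptr<Value>>>& batches,
             GetCallback cb, DoneCallback donecb = {}) {
    for (const auto& batch : batches) {
        if (!cb(batch))
            break;  // the user asked to stop receiving values
    }
    if (donecb)
        donecb(true);
}
```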

dht::Dht

This class provides the core API. Important methods are:

  • Constructor
Dht::Dht(int s, int s6, const InfoHash& id)

The constructor takes already-open IPv4 and IPv6 UDP sockets, used to send packets, and the node ID. At least one open socket must be provided for the Dht instance to be considered running; pass -1 in place of a socket that is not available.

Most apps implementing OpenDHT should use the class DhtRunner that will instantiate Dht, handle networking transparently and provide a thread-safe interface to the dht instance.
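When Dht is used directly rather than through DhtRunner, the application has to open the sockets itself. A minimal POSIX sketch, assuming a Unix-like system; openUdpSocket is a hypothetical helper, and setting IPV6_V6ONLY (so that an IPv4 and an IPv6 socket can share the same port) is a design choice of this sketch, not a requirement stated here.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Opens a UDP socket of the given family (AF_INET or AF_INET6) bound to the
// wildcard address and `port` (0 lets the OS pick a port). Returns -1 on
// failure, which is the value the Dht constructor expects for "no socket".
int openUdpSocket(int family, uint16_t port) {
    int fd = socket(family, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    int rc;
    if (family == AF_INET6) {
        // Keep the IPv6 socket IPv6-only so an IPv4 socket can use the same port.
        int on = 1;
        setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
        sockaddr_in6 sin6;
        std::memset(&sin6, 0, sizeof(sin6));
        sin6.sin6_family = AF_INET6;
        sin6.sin6_addr = in6addr_any;
        sin6.sin6_port = htons(port);
        rc = bind(fd, reinterpret_cast<sockaddr*>(&sin6), sizeof(sin6));
    } else {
        sockaddr_in sin;
        std::memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        sin.sin_port = htons(port);
        rc = bind(fd, reinterpret_cast<sockaddr*>(&sin), sizeof(sin));
    }
    if (rc < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With both sockets opened on the same port, for example int s4 = openUdpSocket(AF_INET, 4222); and int s6 = openUdpSocket(AF_INET6, 4222);, the node could then be constructed with dht::Dht node(s4, s6, dht::InfoHash::get("my_node")); provided at least one of them is not -1. The application remains responsible for reading incoming packets from these sockets and calling Dht::periodic.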

  • Get
void Dht::get(const InfoHash& key, GetCallback cb, DoneCallback donecb={}, Value::Filter f = {}, Query q = {});

Get initiates a search on the network for values associated with the provided key. Results are delivered during the search through the second argument, cb: the callback may be called multiple times, each time new values are found on the network, until the search completes or the callback returns false. An optional "done callback" informs the application of operation completion (success or failure), after which no further callback is called. An optional filter lets the application pre-filter values according to user-defined rules.

Example using Dht::get:

//node is a running instance of dht::Dht
node.get(
    dht::InfoHash::get("some_key"),
    [](const std::vector<std::shared_ptr<dht::Value>>& values) {
        for (const auto& v : values)
            std::cout << "Got value: " << *v << std::endl;
        return true; // keep looking for values
    },
    [](bool success) {
        std::cout << "Get finished with " << (success ? "success" : "failure") << std::endl;
    }
);
  • Query
void Dht::query(const InfoHash& key, QueryCallback cb, DoneCallback done_cb = {}, Query&& q = {});

The query function behaves mostly like get, except for its first callback argument and the Query argument. The main reason this function exists is to use the "SELECT" feature of queries (see the section on filters and queries).

Example using Dht::query:

//node is a running instance of dht::Dht
node.query(
    dht::InfoHash::get("some_key"),
    [](const std::vector<std::shared_ptr<dht::FieldValueIndex>>& fields) {
        for (const auto& i : fields)
            std::cout << "Got index: " << *i << std::endl;
        return true; // keep looking for field value indexes
    },
    [](bool success) {
        std::cout << "Query finished with " << (success ? "success" : "failure") << std::endl;
    }
);
  • Put
void Dht::put(const InfoHash& key, const std::shared_ptr<Value>& value, DoneCallback cb = {});

Put publishes a value on the network. The put operation takes two mandatory arguments: the key (infohash) and the value to store. An optional "done callback" informs the application when the put operation is complete (the value has been successfully announced, or an error occurred). Example using Dht::put:

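const char* my_data = "42 cats";

//node is a running instance of dht::Dht
node.put(
    dht::InfoHash::get("some_key"),
    // put() takes a std::shared_ptr<Value>, built here from raw bytes
    std::make_shared<dht::Value>(reinterpret_cast<const uint8_t*>(my_data), std::strlen(my_data)),
    [](bool success) {
        std::cout << "Put finished with " << (success ? "success" : "failure") << std::endl;
    }
);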
  • Listen
size_t Dht::listen(const InfoHash& key, GetCallback cb, Value::Filter f = {}, Query q = {});

Listen initiates a search on the network for values associated with the provided key and keeps being informed of new values published at that key: the callback cb is called every time there is a new or changed value at key, until cb returns false or the operation is canceled with bool cancelListen(const InfoHash& key, size_t token), where token is the value returned by listen. Calling cancelListen has the same effect as returning false from the callback.

Example using Dht::listen:

auto key = dht::InfoHash::get("some_key");
auto token = node.listen(key,
    [](const std::vector<std::shared_ptr<dht::Value>>& values) {
        for (const auto& v : values)
            std::cout << "Found value: " << *v << std::endl;
        return true; // keep listening
    }
);

// later
node.cancelListen(key, token);

Listen with a type template parameter, for automatic deserialization of values (here with msgpack):

struct Cloud {
    uint32_t altitude;
    double width, height;
    bool rainbow;
    MSGPACK_DEFINE_MAP(altitude, width, height, rainbow);
};
std::vector<Cloud> found_clouds;
std::mutex clouds_mutex; // protects found_clouds

auto key = dht::InfoHash::get("some_key");
auto token = node.listen<Cloud>(key, [&](Cloud&& value) {
        // warning: called from another thread
        std::lock_guard<std::mutex> lock(clouds_mutex);
        found_clouds.emplace_back(std::move(value));
        return true; // keep listening
    }
);

// later
node.cancelListen(key, token);
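The token-based lifecycle of listen() and cancelListen() can be sketched with a small registry for a single key. ListenerRegistry is a hypothetical class, not OpenDHT's implementation: it only shows how a token identifies a listener, and why returning false from the callback and calling cancelListen have the same effect.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, single-key sketch of listen()/cancelListen() bookkeeping.
class ListenerRegistry {
public:
    using Callback = std::function<bool(const std::vector<std::string>& values)>;

    // Registers a listener and returns its token.
    std::size_t listen(Callback cb) {
        std::size_t token = ++lastToken_;
        listeners_.emplace(token, std::move(cb));
        return token;
    }

    // Removes the listener; returns false if the token is unknown.
    bool cancelListen(std::size_t token) {
        return listeners_.erase(token) > 0;
    }

    // Called when new or changed values are published at the key.
    // In this sketch, callbacks must not call cancelListen() re-entrantly.
    void notify(const std::vector<std::string>& values) {
        for (auto it = listeners_.begin(); it != listeners_.end();) {
            if (it->second(values))
                ++it;
            else
                it = listeners_.erase(it);  // returning false acts like cancelListen
        }
    }

    std::size_t size() const { return listeners_.size(); }

private:
    std::size_t lastToken_ = 0;
    std::map<std::size_t, Callback> listeners_;
};
```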

Filters and queries

Filters

Using a filter is pretty straightforward:

dht::Value::Id id = 5;
auto filter = [id](const dht::Value& v) {
    return v.id == id; // keep only the value with this id
};
node.get(
    dht::InfoHash::get("some_key"),
    [](const std::vector<std::shared_ptr<dht::Value>>& values) {
        for (const auto& v : values)
            std::cout << "This value passed through the filter (its id is 5) " << *v << std::endl;
        return true; // keep looking for values
    },
    [](bool success) {
        std::cout << "Get finished with " << (success ? "success" : "failure") << std::endl;
    }, filter
);
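Filters can also be combined. The following sketch shows one way the chain() (logical AND) and chainOr() (logical OR) methods mentioned in the class overview could work. Value and Filter here are minimal stand-ins, not OpenDHT's definitions, and treating an empty filter as "accept everything" is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Minimal stand-in for dht::Value, for illustration only.
struct Value {
    uint64_t id;
    std::string userType;
};

// Sketch of a filter type in the spirit of Value::Filter: a std::function
// extended with chain() (AND) and chainOr() (OR). Not OpenDHT's source.
struct Filter : public std::function<bool(const Value&)> {
    using std::function<bool(const Value&)>::function;  // inherit constructors

    // Both filters must accept the value. An empty filter accepts everything.
    Filter chain(Filter&& other) const {
        if (!*this) return std::move(other);
        if (!other) return *this;
        return Filter([a = *this, b = std::move(other)](const Value& v) { return a(v) && b(v); });
    }

    // Either filter may accept the value. If one side is empty (accepts
    // everything), the combination also accepts everything.
    Filter chainOr(Filter&& other) const {
        if (!*this || !other) return {};
        return Filter([a = *this, b = std::move(other)](const Value& v) { return a(v) || b(v); });
    }
};
```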

As you can see, the Value::Filter class is really flexible. However, this filtering only happens on the local node, after the values have been received. What if you know that the nodes storing the key you're interested in host a large number of values, and you don't want to trigger heavy traffic? Use queries!

Queries

Here is the equivalent of the previous example, using a query:

dht::Where w;
w.id(5); /* the same as Where w("WHERE id=5"); */
node.get(
    dht::InfoHash::get("some_key"),
    [](const std::vector<std::shared_ptr<dht::Value>>& values) {
        for (const auto& v : values)
            std::cout << "This value passed through the remote nodes' filter " << *v << std::endl;
        return true; // keep looking for values
    },
    [](bool success) {
        std::cout << "Get finished with " << (success ? "success" : "failure") << std::endl;
    }, {}, w
);

All available fields are listed below:

Field        Name in query strings
Id           id
ValueType    value_type
OwnerPk      owner_pk
UserType     user_type

Note: in query strings, field names are written in snake_case.

A query can tell whether it is satisfied by another query, i.e. whether the results of the other query contain everything it asks for. For example:

Query q1;
q1.where.id(5); // the whole value with id=5 will be sent

Query q2 {{"SELECT value_type"}};
// q2 is the same as Query q("SELECT value_type WHERE value_type=10,user_type=foo_type");
q2.where.valueType(10).userType("foo_type");

Query q3("SELECT id WHERE id=5"); // only the id field of the value with id=5 will be sent

q1.isSatisfiedBy(q3); // false
q2.isSatisfiedBy(q1); // false
q3.isSatisfiedBy(q1); // true
q2.isSatisfiedBy(q3); // false
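One rule that reproduces the four results above: another query satisfies this one if it selects at least the fields this one needs (SELECT * covers everything) and its WHERE clause is no stricter. The QuerySketch type below is hypothetical and models conditions as string equality pairs; it is not OpenDHT's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical model of a query: SELECT fields (empty means "*", all fields)
// and WHERE conditions as (field, value) equality pairs.
struct QuerySketch {
    std::set<std::string> select;
    std::set<std::pair<std::string, std::string>> where;

    // `other` satisfies this query if its results contain everything we need:
    // 1. it selects at least the fields we select ("*" covers everything), and
    // 2. its WHERE is no stricter than ours (each of its conditions is one of ours).
    bool isSatisfiedBy(const QuerySketch& other) const {
        bool selectOk = other.select.empty()
            || (!select.empty() && std::includes(other.select.begin(), other.select.end(),
                                                 select.begin(), select.end()));
        bool whereOk = std::includes(where.begin(), where.end(),
                                     other.where.begin(), other.where.end());
        return selectOk && whereOk;
    }
};
```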

dht::SecureDht

This class extends dht::Dht, and provides the same API methods (get, put, listen). It adds a public-key cryptography layer on top of the DHT. A user-provided or generated Identity (RSA key pair) will be used for signing and decrypting.

Values returned to the user by ::get and ::listen are checked beforehand and filtered: signed values are dropped if their signature verification fails. Similarly, encrypted values that we can't decrypt are dropped, or provided decrypted to the user if we can.

The user can know if a value was encrypted by checking the recipient field of the Value (which should be our public key ID).

As a layer on top of Dht, SecureDht can also be used for plain values. Methods like get and put will behave the same as Dht for non-encrypted and non-signed values.

Additionally, SecureDht adds a few methods:

  • PutSigned
void putSigned(const InfoHash& hash, const std::shared_ptr<Value>& val, DoneCallback callback);
  • PutEncrypted
void putEncrypted(const InfoHash& hash, const InfoHash& to, std::shared_ptr<Value> val, DoneCallback callback);

dht::DhtRunner

DhtRunner provides thread-safe access to the running DHT instance and exposes all methods from SecureDht. For more information, see the wiki page "Running a node in your program".
