Index is a prototype data indexing and tracking client. It is intended to provide a simple means of interactively investigating indexd deployments. It is built upon a basic REST-like API and demonstrates how a client utility can be built to interact with the index in a meaningful manner.
The prototype client is implemented with requests. This keeps the list of dependencies minimal and allows deployment on a wide range of systems with next to no configuration overhead. That said, installing with pip inside a virtualenv is highly recommended to isolate the installation.
To install the prototype implementation, simply run
pip install .
At present, all configuration options are hard-coded in the prototype. This is subject to change as options are moved to configuration files. Until then, the primary hard-coded setting to keep in mind is the index host and port combination:
HOST = 'localhost'
PORT = 8080
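The retrieve example further down passes `--host` and `--port` on the command line, which suggests these hard-coded values act as defaults that can be overridden. A minimal sketch of that pattern, assuming plain HTTP; `build_parser` and `base_url` are hypothetical helpers, not the client's actual code:

```python
import argparse

# Defaults mirroring the hard-coded values above.
HOST = 'localhost'
PORT = 8080


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: --host/--port override the hard-coded defaults."""
    parser = argparse.ArgumentParser(prog='index')
    parser.add_argument('--host', default=HOST, help='index host')
    parser.add_argument('--port', type=int, default=PORT, help='index port')
    return parser


def base_url(host: str, port: int) -> str:
    """Combine host and port into a base URL (assumes plain HTTP)."""
    return f'http://{host}:{port}'
```

With no arguments, `base_url(args.host, args.port)` yields `http://localhost:8080`; passing `--host indexd.service.consul --port 80` points the same code at a deployed index.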
Records are collections of the information needed to identify a piece of data as uniquely as possible. This is done through hashes and metadata. Each record is assigned a UUIDv4 when it is created, which allows it to be referenced unambiguously among many records. To prevent update conflicts when multiple systems edit the same record, a revision is stored and changed on every update. The revision is an opaque string used only to avoid update conflicts.
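The revision check described above is a form of optimistic concurrency: an update succeeds only if the writer saw the current revision. A minimal in-memory sketch; `Record`, `RevisionConflict`, and generating revisions as UUID strings are illustrative assumptions, not the index's actual implementation:

```python
import uuid
from dataclasses import dataclass, field


class RevisionConflict(Exception):
    """Raised when an update is based on a stale revision."""


@dataclass
class Record:
    # UUIDv4 identifier, assigned once at creation time.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Opaque revision string, regenerated on every update (format is an assumption).
    rev: str = field(default_factory=lambda: str(uuid.uuid4()))
    urls: list = field(default_factory=list)

    def update(self, expected_rev: str, urls: list) -> str:
        """Apply an update only if the caller saw the current revision."""
        if expected_rev != self.rev:
            raise RevisionConflict(f'expected {expected_rev}, current is {self.rev}')
        self.urls = list(urls)
        self.rev = str(uuid.uuid4())  # new revision invalidates other writers' copies
        return self.rev
```

If two systems read the same revision, the first `update` succeeds and changes `rev`; the second raises `RevisionConflict` and must re-read the record before retrying.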
The hashes used by the index are deployment specific, but are intended to be the output of widely known and commonly available hashing algorithms, such as MD5 or SHA1. This is similar to the way torrents are tracked, and provides a mechanism by which data can be retrieved from potentially untrusted sources and then verified against the recorded hashes.
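Verifying downloaded data against a record's hashes can be sketched with the standard `hashlib` module. This assumes the hash keys are algorithm names that `hashlib.new` accepts (true for `md5`, `sha1`, and `sha256` in the record example below); `verify` is a hypothetical helper:

```python
import hashlib


def verify(data: bytes, hashes: dict) -> bool:
    """Return True only if data matches every recorded hash.

    Keys are hashlib algorithm names (e.g. 'md5', 'sha1', 'sha256');
    values are hex digests.
    """
    if not hashes:
        return False  # nothing to verify against
    for algorithm, expected in hashes.items():
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual != expected.lower():
            return False  # any mismatch means the data cannot be trusted
    return True
```

Requiring every listed hash to match, rather than any one, means a collision in a weak algorithm such as MD5 alone is not enough to pass.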
Additional metadata stored in index records includes the size of the data and its type.
Records adhere to the json-schema described in indexd:
An example of one such record:
{
  "id": "119d292f-b786-421e-a8dd-72208e77c269",
  "rev": "dbee8496-5d03-4fbd-9115-6871c4ebf59f",
  "size": 512,
  "hash": {
    "md5": "e2a3a55aa1596f87f502c8ff29d74244",
    "sha1": "cb4e5ba5d30fd4667beba95bf73ea9d76ad3dcd4",
    "sha256": "20b599fa98f5f98e89e128ba6de3b65ff753c662721f368649fb8d7e7d4933b0"
  },
  "type": "object",
  "urls": [
    "s3://endpointurl/bucket/key"
  ]
}
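A record shaped like the example above can be assembled from raw bytes. This is a sketch only: `build_record` is hypothetical, and generating `rev` as a UUID string and fixing `type` to `"object"` are assumptions drawn from the example, not rules stated by indexd:

```python
import hashlib
import uuid


def build_record(data: bytes, urls: list) -> dict:
    """Assemble a record dict mirroring the example's fields."""
    return {
        'id': str(uuid.uuid4()),   # UUIDv4 assigned at creation
        'rev': str(uuid.uuid4()),  # opaque revision (format assumed)
        'size': len(data),         # size in bytes
        'hash': {
            name: hashlib.new(name, data).hexdigest()
            for name in ('md5', 'sha1', 'sha256')
        },
        'type': 'object',
        'urls': list(urls),
    }
```

For 512 bytes of data, `build_record(data, ['s3://endpointurl/bucket/key'])` produces the same keys and `size` as the example, with digests computed from the actual bytes.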
All queries to the index are made through HTTP using JSON data payloads. This gives a simple means of interaction that is easily accessible to any number of languages.
These queries are handled via requests and wrapped into the index client.
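A single retrieve query can be sketched as below. It uses the standard-library `urllib` in place of requests so it runs without third-party packages; in the client itself, `requests.get` would play the same role. The `/index/<did>` path and both helper names are assumptions for illustration:

```python
import json
import urllib.request

HOST = 'localhost'
PORT = 8080


def build_retrieve_request(did: str, host: str = HOST, port: int = PORT) -> urllib.request.Request:
    """Build (but do not send) a GET request for one record.

    The '/index/<did>' route is an assumption about the indexd API.
    """
    url = f'http://{host}:{port}/index/{did}'
    return urllib.request.Request(url, headers={'Accept': 'application/json'}, method='GET')


def retrieve(did: str, host: str = HOST, port: int = PORT, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON payload into a dict."""
    with urllib.request.urlopen(build_retrieve_request(did, host, port), timeout=timeout) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the URL logic testable without a running index.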
TODO
TODO
> ./bin/index --host 'indexd.service.consul' --port 80 retrieve 00000073-27e1-4dcd-bfdc-e458c31feec2 | jq '.did,.created_date'
"00000073-27e1-4dcd-bfdc-e458c31feec2"
"2021-12-14T01:47:28.566542"
TODO
TODO
pytest_indexd: a plugin with fixtures for indexd/indexclient-related tests.
We use pre-commit to set up pre-commit hooks for this repo, and detect-secrets to search for secrets being committed into the repo.
To install the pre-commit hook, run
pre-commit install
To update the .secrets.baseline file run
detect-secrets scan --baseline .secrets.baseline
The .secrets.baseline file contains all the strings that were caught by detect-secrets; the strings themselves are not stored in plain text. Audit the baseline to view the secrets:
detect-secrets audit .secrets.baseline
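Before auditing, a quick per-file count of flagged entries can show where to look first. This sketch assumes the baseline is JSON whose top-level `results` maps each filename to a list of findings; `summarize_baseline` is a hypothetical helper, not part of detect-secrets:

```python
from collections import Counter


def summarize_baseline(baseline: dict) -> Counter:
    """Count flagged findings per file in a parsed detect-secrets baseline."""
    results = baseline.get('results', {})
    return Counter({name: len(findings) for name, findings in results.items()})
```

For example, `summarize_baseline(json.load(open('.secrets.baseline'))).most_common()` lists files from most to fewest findings.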