Indexes S3 data for Gen3's indexd microservice, fast.
You're going to need a bigger RDS instance.
- Move secrets from environment vars to AWS secret store
First, generate a manifest describing the key ranges used for job submissions. An example manifest looks like this:
{
    "jobs": [
        {
            "start_key": null,
            "end_key": "dg.XXXX/2525cfe8-d233-4d0c-9601-0b69d222b2a5/clinical.json"
        },
        {
            "start_key": "dg.XXXX/2525cfe8-d233-4d0c-9601-0b69d222b2a5/clinical.json",
            "end_key": "dg.XXXX/8a097d37-e4c0-49a2-b433-728521a8cd2a/output.tsv"
        },
        {
            "start_key": "dg.XXXX/8a097d37-e4c0-49a2-b433-728521a8cd2a/output.tsv",
            "end_key": null
        }
    ],
    "opts": {
        "regexs": null,
        "prefix": "dg.XXXX",
        "batch_size": 3
    },
    "obj_count": 41
}
Download the tool used to generate this manifest file.
$ docker pull jacquayj/gen3-s3indexer-manifest
$ docker run jacquayj/gen3-s3indexer-manifest -h
manifest [OPTIONS]

Application Options:
  -r, --regex=       Object keys must match this or be skipped, multiple
                     expressions can be specified
  -p, --prefix=      Limits the response to keys that begin with the specified
                     prefix
  -s, --batch-size=  Batch cluster size (default: 10)

Help Options:
  -h, --help         Show this help message
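The example manifest above pairs a batch_size of 3 with three jobs covering 41 objects, which suggests that --batch-size sets the number of key ranges rather than the number of objects per job. The Python sketch below illustrates that reading. It is an assumption drawn from the example, not the tool's actual implementation; partition_keys and the sample keys are hypothetical, and regex filtering is left out.

```python
import json


def partition_keys(keys, num_jobs, prefix=""):
    """Split the sorted keys that match `prefix` into `num_jobs` contiguous ranges.

    Assumption: each range ends at the key where the next range starts, and the
    first start_key and last end_key are left open (null), as in the example manifest.
    """
    selected = sorted(k for k in keys if k.startswith(prefix))
    total = len(selected)
    if num_jobs < 1 or total < num_jobs:
        raise ValueError("need at least one key per job")

    jobs = []
    start = None  # the first range is open at the start
    for i in range(1, num_jobs + 1):
        if i == num_jobs:
            end = None  # the last range is open at the end
        else:
            # last key that falls into range i
            end = selected[(i * total) // num_jobs - 1]
        jobs.append({"start_key": start, "end_key": end})
        start = end  # the next range starts where this one ended

    return {
        "jobs": jobs,
        "opts": {"regexs": None, "prefix": prefix, "batch_size": num_jobs},
        "obj_count": total,
    }


# Hypothetical bucket listing: 41 matching keys plus one that the prefix skips.
sample_keys = [f"dg.XXXX/obj-{i:02d}" for i in range(41)] + ["other/skip.txt"]
manifest = partition_keys(sample_keys, num_jobs=3, prefix="dg.XXXX")
print(json.dumps(manifest, indent=4))
```

Under this reading, a larger --batch-size means more, smaller key ranges, and therefore more jobs running in parallel against indexd.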
Clone this repo. The manifest you generate needs to be inside the gen3-s3indexer-extramural directory.
$ git clone
$ cd gen3-s3indexer-extramural
Generate the manifest.json
- Save an ENV file containing your AWS S3 configuration; see .env.example for reference.
- Pass in the desired batch cluster size (--batch-size) and any prefixes (--prefix) or regex filters (--regex). Also specify the .env file you created.
$ docker run --env-file=.env jacquayj/gen3-s3indexer-manifest \
--batch-size=3 \
--prefix="dg.XXXX" > manifest.json
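Before building the container, it can help to sanity-check the generated manifest. The Python sketch below checks that the key ranges are contiguous and open-ended at both extremes, as in the example manifest, and looks up the range for a given array index. It assumes that array index N corresponds to jobs[N], which the job command's use of $AWS_BATCH_JOB_ARRAY_INDEX suggests but which is not confirmed here. check_manifest and job_for_index are hypothetical helper names.

```python
import json

# The example manifest from this README, embedded so the sketch runs on its own.
EXAMPLE_MANIFEST = """
{
    "jobs": [
        {"start_key": null,
         "end_key": "dg.XXXX/2525cfe8-d233-4d0c-9601-0b69d222b2a5/clinical.json"},
        {"start_key": "dg.XXXX/2525cfe8-d233-4d0c-9601-0b69d222b2a5/clinical.json",
         "end_key": "dg.XXXX/8a097d37-e4c0-49a2-b433-728521a8cd2a/output.tsv"},
        {"start_key": "dg.XXXX/8a097d37-e4c0-49a2-b433-728521a8cd2a/output.tsv",
         "end_key": null}
    ],
    "opts": {"regexs": null, "prefix": "dg.XXXX", "batch_size": 3},
    "obj_count": 41
}
"""


def check_manifest(manifest):
    """Return a list of problems found in the job key ranges (empty if none)."""
    jobs = manifest.get("jobs") or []
    if not jobs:
        return ["manifest has no jobs"]
    problems = []
    if jobs[0]["start_key"] is not None:
        problems.append("first job should have a null start_key")
    if jobs[-1]["end_key"] is not None:
        problems.append("last job should have a null end_key")
    for i in range(1, len(jobs)):
        # Each range should begin at the key where the previous one ended.
        if jobs[i]["start_key"] != jobs[i - 1]["end_key"]:
            problems.append(f"job {i} does not start where job {i - 1} ends")
    return problems


def job_for_index(manifest, array_index):
    """Assumed mapping: array index N processes manifest['jobs'][N]."""
    return manifest["jobs"][array_index]


manifest = json.loads(EXAMPLE_MANIFEST)
print(check_manifest(manifest))
print(job_for_index(manifest, 1))
```

To check a real file, replace the embedded string with the contents of manifest.json, for example via json.load(open("manifest.json")).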
Then build the job container, which includes the manifest.json you generated in the previous steps (it should exist in the same directory).
$ docker build -t username/my-batch-container .
$ docker push username/my-batch-container
"jobDefinitionName": "gen3-indexer",
"jobDefinitionArn": "arn:aws:batch:us-east-1:098381893833:job-definition/gen3-indexer:17",
"revision": 17,
"status": "ACTIVE",
"type": "container",
"parameters": {},
"containerProperties": {
"image": ",
"vcpus": 4,
"memory": 4000,
"command": [
"/gen3-s3indexer-extramural $AWS_BATCH_JOB_ARRAY_INDEX"
"volumes": [],
"environment": [
"name": "AWS_REGION",
"value": "us-east-1"
"name": "AWS_ACCESS_KEY_ID",
"value": "-redacted-"
"name": "NUM_WORKERS",
"value": "10"
"value": "-redacted-"
"name": "AWS_BUCKET",
"value": "-redacted-"
"name": "INDEXD_PASS",
"value": "-redacted-"
"name": "INDEXD_URL",
"value": "-redacted-"
"value": "[email protected]"
"name": "INDEXD_USER",
"value": "gdcapi"
"name": "JOB_QUEUE_SIZE",
"value": "1000"
"mountPoints": [],
"ulimits": [],
"resourceRequirements": []
"timeout": {
"attemptDurationSeconds": 86400