This repository has been archived by the owner on Jul 21, 2023. It is now read-only.

S3 Usage plugin

This plugin provides metadata on data stored in multiple S3 buckets in AWS. The main purpose is to provide a view on:

  • inefficiently stored data (multiple small files),
  • size of each table/partition,
  • temporary data using S3 space.

The plugin configuration is stored in S3_USAGE_PARAMS as a dictionary with the following parameters:

  • buckets_regexp - regular expression to match the buckets
  • aws_access_key_id and aws_secret_access_key - credentials to be used to communicate with AWS API
  • ttl (optional) - time to live of cached sizes in seconds (default: 24 hours)
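
As a rough illustration, the dictionary could look like the sketch below. The bucket pattern and credential placeholders are assumptions made for the example and are not values from this project; only the key names and the 24-hour default come from the list above.

```python
import re

# Hypothetical example values; replace them with your own.
S3_USAGE_PARAMS = {
    # Regular expression matched against bucket names (assumed pattern).
    "buckets_regexp": r"^example-data-.*$",
    # Credentials used to talk to the AWS API (placeholders, not real keys).
    "aws_access_key_id": "<your access key id>",
    "aws_secret_access_key": "<your secret access key>",
    # Optional: cache lifetime in seconds; 24 hours written out explicitly.
    "ttl": 24 * 60 * 60,
}

# Quick sanity check of the pattern against made-up bucket names.
pattern = re.compile(S3_USAGE_PARAMS["buckets_regexp"])
matching = [
    name
    for name in ["example-data-raw", "example-data-tmp", "other-bucket"]
    if pattern.match(name)
]
```

Checking the pattern against a few known bucket names before deploying helps avoid scanning more buckets than intended.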

From a technical perspective: after the application starts, the plugin starts S3StatsRefreshTask, which looks for buckets matching the provided regular expression. For each matching bucket, all keys are listed and their metadata is stored, aggregated per "directory", in a SQLite database inside the Docker container. The tab view uses a REST API to retrieve the requested data from the SQLite database and presents it as a directory tree and a sunburst diagram.
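
The per-"directory" aggregation described above can be sketched roughly as follows. This is not the plugin's actual code: the function names, the `s3_usage` table layout, and the sample keys are all assumptions for illustration, and the key listing is hard-coded instead of being fetched from the AWS API.

```python
import sqlite3
from collections import defaultdict


def aggregate_by_directory(objects):
    """Sum bytes and file counts for every ancestor "directory" of each key."""
    totals = defaultdict(lambda: [0, 0])  # path -> [bytes, files]
    for key, size in objects:
        parts = key.split("/")[:-1]  # drop the file name
        for depth in range(len(parts) + 1):
            path = "/".join(parts[:depth])  # "" stands for the bucket root
            totals[path][0] += size
            totals[path][1] += 1
    return totals


def store(conn, bucket, totals):
    """Write the aggregated rows into a hypothetical s3_usage table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS s3_usage ("
        "bucket TEXT, path TEXT, bytes INTEGER, files INTEGER, "
        "PRIMARY KEY (bucket, path))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO s3_usage VALUES (?, ?, ?, ?)",
        [(bucket, path, b, f) for path, (b, f) in totals.items()],
    )
    conn.commit()


# Made-up (key, size in bytes) pairs standing in for a bucket listing.
objects = [
    ("warehouse/events/dt=2020-01-01/part-0.parquet", 1000),
    ("warehouse/events/dt=2020-01-01/part-1.parquet", 2000),
    ("warehouse/events/dt=2020-01-02/part-0.parquet", 500),
    ("tmp/scratch.csv", 50),
]

totals = aggregate_by_directory(objects)
conn = sqlite3.connect(":memory:")
store(conn, "example-bucket", totals)
rows = conn.execute(
    "SELECT path, bytes, files FROM s3_usage WHERE path = 'warehouse/events'"
).fetchall()
```

Storing one row per directory rather than per key keeps the database small and lets a tree or sunburst view read sizes at any depth with a single lookup.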

[Image: s3 usage]