Gateway cache warming #19

Closed
vasco-santos opened this issue Feb 28, 2022 · 0 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature

vasco-santos commented Feb 28, 2022

Gateway warming cache proposal

Anecdotal evidence suggests that recent uploads are likely to be requested from a gateway in the near future.

The solution proposed here aims to improve read performance for recently uploaded content by warming the cache immediately after write. Indirectly, it will also help decrease load on public gateways and work around delays in data availability.

High level flow

  1. Write backup upload chunk to S3 bucket (nft.storage/packages/api) ✔️
  2. Lambda function triggers on S3 write (assemble-cars-lambda)
    • verify that the trigger comes from the /raw namespace
    • perform optimistic CAR completeness validations
      • is the CAR file complete? We can rely on S3 metadata
    • if CAR is complete: write to /complete namespace
    • else: does the CAR file have a DagPB root with a known acceptable size?
      • Get the list of CARs in the same directory and verify whether their total size is bigger than the known size
      • Get all CARs in the S3 directory
      • Validate CAR is complete through all the links
      • Join CARs and write to /complete namespace
  3. Lambda function triggers on S3 write (gateway-warm-cache-lambda)
    • verify that the trigger comes from the /complete namespace
    • ask Gateway worker to cache CID
  4. CF Gateway Worker receives a request to look for a CID and cache it (nft.storage/packages/gateway)
    • Asks S3 Gateway for CID content
    • Put response in CF cache
  5. S3 exporter (s3-exporter or s3-gateway)
    • Read requested CID from S3 /complete namespace
    • Unpack CAR, unixfs export it and stream it
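
The branching in steps 2 and 3 can be sketched as a pair of pure functions. This is only an illustration under assumptions: the `raw/` and `complete/` key prefixes, the `CarObject` type, its `complete` flag (standing in for whatever the S3 metadata exposes) and the action names are all hypothetical, and link validation and the actual join are left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical key prefixes standing in for the /raw and /complete namespaces.
RAW_PREFIX = "raw/"
COMPLETE_PREFIX = "complete/"


@dataclass
class CarObject:
    key: str                # S3 object key
    size: int               # object size in bytes
    complete: bool = False  # assumed flag derived from S3 metadata


def assemble_action(trigger: CarObject,
                    directory_cars: Sequence[CarObject],
                    known_root_size: Optional[int]) -> str:
    """Decide what the assemble step should do for one S3 write event.

    directory_cars is every CAR in the same S3 directory, trigger included.
    known_root_size is the size advertised by the DagPB root, if any.
    """
    if not trigger.key.startswith(RAW_PREFIX):
        return "ignore"              # not a /raw namespace trigger
    if trigger.complete:
        return "write_complete"      # CAR is already complete on its own
    if known_root_size is None:
        return "wait"                # no known size to compare against
    total = sum(car.size for car in directory_cars)
    if total > known_root_size:
        # Only a candidate: links must still be validated before joining.
        return "validate_and_join"
    return "wait"                    # more chunks are expected


def should_warm(key: str) -> bool:
    """Step 3 only reacts to writes in the /complete namespace."""
    return key.startswith(COMPLETE_PREFIX)
```

Keeping the decision separate from the S3 and gateway calls makes the optimistic checks easy to unit test before wiring them into the lambdas.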

image

Implementation details

Metrics

  • Successful warm cache requests
  • Length distribution of warm cache content
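
As a rough sketch of how these two metrics could be collected, the snippet below counts successful warm cache requests and buckets their content length. The `WarmCacheMetrics` class and the bucket boundaries are assumptions for illustration; a real deployment would more likely push these values to whatever metrics backend the gateway already uses.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical upper bounds (in bytes) for the length distribution buckets.
BUCKET_BOUNDS = [1_000, 100_000, 10_000_000]


class WarmCacheMetrics:
    def __init__(self) -> None:
        self.successes = 0
        self.length_buckets: Counter = Counter()

    def record_success(self, content_length: int) -> None:
        """Count one successful warm cache request and bucket its length."""
        self.successes += 1
        # Index of the first bound that is >= content_length.
        idx = bisect_left(BUCKET_BOUNDS, content_length)
        if idx < len(BUCKET_BOUNDS):
            label = f"<={BUCKET_BOUNDS[idx]}"
        else:
            label = f">{BUCKET_BOUNDS[-1]}"
        self.length_buckets[label] += 1


metrics = WarmCacheMetrics()
for length in (500, 1_000, 5_000, 20_000_000):
    metrics.record_success(length)
```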

How the future with stream uploads looks

TBD
Notes:

  • With stream uploads, the API will know when an upload is wrapped up
    • (Probably) we will be able to drop the lambda function, and the API Worker can just ask the Gateway Worker to warm the cache when the upload ends. Cache content needs to be added within gateway worker region ID

Alerting system

Users have reported intermittent issues trying to get data from public gateways.

We have been considering the read test lambda as a way of creating alerts, but that would put a lot of extra load on the gateways, which at least for now is not desirable.

TBD
Notes:

This will drop tracked work on:
