# Gateway warming cache proposal
Anecdotal evidence suggests that recently uploaded content is highly likely to be requested from a gateway in the near future.

The solution proposed here aims to improve read performance for recently uploaded content by warming the cache immediately after a write. Indirectly, it will also help decrease load on public gateways and work around delays in data availability.
## High level flow

1. Write backup upload chunk to S3 bucket (`nft.storage/packages/api`) ✔️
2. Lambda function triggers on S3 write (`assemble-cars-lambda`)
   - verify the trigger is for the `/raw` namespace
   - perform optimistic CAR completeness validations:
     - is the CAR file complete? (we can rely on S3 metadata)
     - if the CAR is complete: write it to the `/complete` namespace
     - else: does the CAR file have a DagPB root with a known acceptable size?
       - get the list of CARs in the same directory and verify whether their total size is bigger than the known size:
         - get all CARs in the S3 directory
         - validate that the CAR is complete by following all the links
         - join the CARs and write the result to the `/complete` namespace
3. Lambda function triggers on S3 write (`gateway-warm-cache-lambda`)
   - verify the trigger is for the `/complete` namespace
   - ask the Gateway Worker to cache the CID
4. CF Gateway Worker receives a request to look up a CID and cache it (`nft.storage/packages/gateway`)
   - ask the S3 Gateway for the CID content
   - put the response in the CF cache
5. S3 exporter (`s3-exporter` or `s3-gateway`)
   - read the requested CID from the S3 `/complete` namespace
   - unpack the CAR, unixfs-export it and stream it
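The assembly decision in step 2 could be sketched as a pure function. This is only an illustration: the function name `decideAssembly` and the metadata shape (`structure`, `rootDagPbSize`) are assumptions, not the actual S3 metadata schema or lambda code.

```javascript
// Hypothetical sketch of the assemble-cars-lambda decision logic.
// The metadata shape ({ structure, rootDagPbSize }) is an assumption,
// not the real S3 metadata schema used by nft.storage.

/**
 * Decide what to do with a CAR chunk written to the /raw namespace.
 * @param {{ structure?: string, rootDagPbSize?: number }} meta - S3 object metadata
 * @param {number[]} siblingCarSizes - sizes (bytes) of all CARs in the same directory
 * @returns {'write-complete' | 'join-and-write' | 'wait'}
 */
function decideAssembly (meta, siblingCarSizes) {
  // Optimistic check: S3 metadata may already flag the CAR as complete.
  if (meta.structure === 'Complete') return 'write-complete'

  // Otherwise, only proceed if the DagPB root advertises a known size.
  if (typeof meta.rootDagPbSize !== 'number') return 'wait'

  // If the chunks in the directory add up to at least the known size,
  // the full DAG has likely arrived and the CARs can be joined.
  const total = siblingCarSizes.reduce((a, b) => a + b, 0)
  return total >= meta.rootDagPbSize ? 'join-and-write' : 'wait'
}
```

The actual lambda would still need to follow all the links before joining; this sketch only captures the optimistic size-based gate.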
## Implementation details

### `assemble-cars-lambda`

- We need to look into AWS Lambda memory limitations to decide on an "acceptable" size config

### `gateway-warm-cache-lambda`

- Only ask to warm the cache for content smaller than 512MB
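A minimal guard for that threshold might look like this (hypothetical helper; only the 512MB limit comes from this proposal):

```javascript
// Hypothetical guard used by gateway-warm-cache-lambda before asking the
// Gateway Worker to cache a CID. Only the 512MB threshold comes from the
// proposal; the function name and shape are illustrative.
const MAX_WARM_CACHE_SIZE = 512 * 1024 * 1024 // 512MB

function shouldWarmCache (contentByteLength) {
  return contentByteLength < MAX_WARM_CACHE_SIZE
}
```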
### Gateway Worker

- Route `/cache/:cid`
- Protected route! Only our internal services should be able to call it!
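Inside the Gateway Worker, the protected route could be guarded along these lines. This is a sketch under assumptions: the `x-internal-token` header name and the `CACHE_SECRET` binding are invented for illustration, not the real gateway implementation.

```javascript
// Hypothetical auth guard for the internal-only /cache/:cid route.
// The x-internal-token header and CACHE_SECRET binding are assumptions
// for illustration, not the real gateway implementation.
function isAuthorizedCacheRequest (headers, env) {
  const token = headers['x-internal-token']
  return Boolean(token) && token === env.CACHE_SECRET
}

// Match /cache/:cid, returning the CID or null for any other path.
function matchCacheRoute (pathname) {
  const match = /^\/cache\/([^/]+)$/.exec(pathname)
  return match ? match[1] : null
}
```

A request would only be served when both checks pass; everything else falls through to the normal gateway routes.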
## Metrics

TBD

## How future with stream uploads looks like

- With stream uploads, we will be able to know in the API when uploads are wrapped up
- (Probably) we will be able to drop the lambda function and the API Worker can just ask the Gateway Worker to warm the cache when an upload ends -- cached content needs to be added within the gateway worker region ID
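In that future, the API Worker's call at upload completion could be as simple as a POST to the cache route. A sketch under assumptions: the endpoint shape, the `x-internal-token` header, and the function name are illustrative, not the actual API.

```javascript
// Hypothetical request the API Worker would send to the Gateway Worker
// once an upload is wrapped up. Endpoint shape and header are assumptions.
function buildWarmCacheRequest (gatewayBaseUrl, cid, secret) {
  return {
    url: new URL(`/cache/${cid}`, gatewayBaseUrl).toString(),
    method: 'POST',
    headers: { 'x-internal-token': secret }
  }
}
```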
## Alerting system

Users have reported intermittent issues when trying to get data from public gateways.

We have been considering the read test lambda as a way of creating alerts, but that would result in a lot of extra load on the gateways, which, at least for now, is not desirable.
TBD
Notes:
This will drop tracked work on: