
Consider splitting MVTs into layers in storage #58

Open
pnorman opened this issue Jan 2, 2025 · 0 comments


@pnorman
Owner

pnorman commented Jan 2, 2025

A vector tile is just a concatenation of vector tile layers, so some systems store the layers individually.
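The concatenation works because, in the Mapbox Vector Tile protobuf schema, `Tile.layers` is a repeated field (field 3), and protobuf parsers treat concatenated encodings as one message with the repeated entries appended. The sketch below is illustrative only: the varint and field helpers are hand-rolled just far enough for this demonstration (real code would use a protobuf library), and the layer names `water` and `roads` are made up.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def field_bytes(field: int, payload: bytes) -> bytes:
    """A length-delimited field (wire type 2)."""
    return encode_varint((field << 3) | 2) + encode_varint(len(payload)) + payload


def field_varint(field: int, value: int) -> bytes:
    """A varint field (wire type 0)."""
    return encode_varint(field << 3) + encode_varint(value)


def single_layer_tile(name: str) -> bytes:
    """A Tile holding one featureless Layer: name (field 1) and version 2 (field 15)."""
    layer = field_bytes(1, name.encode("utf-8")) + field_varint(15, 2)
    return field_bytes(3, layer)  # Tile.layers is field 3


def layer_name(layer: bytes) -> str:
    """Scan a Layer message for its name, skipping other varint/bytes fields."""
    pos = 0
    while pos < len(layer):
        key, pos = decode_varint(layer, pos)
        field, wire_type = key >> 3, key & 7
        if wire_type == 0:
            _, pos = decode_varint(layer, pos)
        elif wire_type == 2:
            length, pos = decode_varint(layer, pos)
            if field == 1:
                return layer[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    raise ValueError("layer has no name")


def layer_names(tile: bytes) -> list[str]:
    """Walk a Tile's top-level fields and return each layer's name, in order."""
    names, pos = [], 0
    while pos < len(tile):
        key, pos = decode_varint(tile, pos)
        if (key & 7) != 2:
            raise ValueError("sketch only handles length-delimited Tile fields")
        length, pos = decode_varint(tile, pos)
        payload, pos = tile[pos:pos + length], pos + length
        if key >> 3 == 3:
            names.append(layer_name(payload))
    return names


if __name__ == "__main__":
    # Two separately stored single-layer tiles, concatenated byte-for-byte.
    combined = single_layer_tile("water") + single_layer_tile("roads")
    print(layer_names(combined))  # ['water', 'roads']
```

Because nothing in the encoding ties layers together, storing each layer's bytes separately and concatenating them at serve time yields the same tile as rendering all layers at once.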

Now that I've got some experience with generating tiles on a minutely updating planet, I can look at the issues with this option and whether it makes sense.

This only makes sense with either per-layer generation or filtering of output layers. The latter is not planned for tilekiln to keep caching simple, but the former is possible. Right now a change in any layer requires re-rendering all layers.

For per-layer generation to be useful, it has to be possible to render only some layers of a tile, which means tiles need to be able to exist partially.
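The bookkeeping this implies can be sketched as follows, assuming (hypothetically) that change processing can report which tiles each layer dirtied. Only the affected layers are re-rendered, so a tile can end up with some layers fresh and others untouched or not yet rendered. `layers_to_render`, `render_counts`, and the layer names are illustrative, not tilekiln code.

```python
from collections import defaultdict

# A tile address: (zoom, x, y).
TileKey = tuple[int, int, int]


def layers_to_render(dirty: dict[str, set[TileKey]]) -> dict[TileKey, set[str]]:
    """Invert per-layer dirty-tile sets into, for each tile, the layers to re-render."""
    plan: dict[TileKey, set[str]] = defaultdict(set)
    for layer, tiles in dirty.items():
        for tile in tiles:
            plan[tile].add(layer)
    return dict(plan)


def render_counts(dirty: dict[str, set[TileKey]], all_layers: list[str]) -> tuple[int, int]:
    """Return (layer renders with per-layer generation,
    layer renders when every layer of each dirty tile is re-rendered)."""
    plan = layers_to_render(dirty)
    per_layer = sum(len(layers) for layers in plan.values())
    whole_tile = len(plan) * len(all_layers)
    return per_layer, whole_tile


if __name__ == "__main__":
    dirty = {
        "roads": {(14, 1, 2), (14, 1, 3)},
        "water": {(14, 1, 2)},
    }
    print(render_counts(dirty, ["roads", "water", "buildings"]))  # (3, 6)
```

The saving grows with the number of layers a typical change leaves untouched, which is the case the per-layer option is meant to exploit.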

Storage

Currently each VT has about 50 bytes of storage overhead (18GB in total), plus whatever is needed for indexes. This includes storing the timestamp the tile was generated at.

Storing individual tile layers as tuples

As it's necessary to distinguish between an empty layer (''::bytea) and an ungenerated one (NULL::bytea), this would require a row for each layer on each tile. On shortbread this would add about 200GB of storage. The queries are also a bit awkward. At first it seems that SELECT string_agg(tile, ''::bytea) FROM storage WHERE z=$1 AND x=$2 AND y=$3 would work, but this doesn't work with missing layers: string_agg ignores NULL inputs and a missing row simply contributes nothing, so a partially rendered tile comes back looking complete.
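A minimal Python model of the pitfall, assuming each row is a (layer, data) pair for one tile, with `None` standing in for NULL::bytea and `b""` for an empty layer. `naive_concat` mirrors string_agg's NULL-skipping; `checked_concat` is a hypothetical guard that refuses to assemble a tile unless every expected layer has a non-NULL row.

```python
from typing import Optional

# One (layer, data) pair per stored row for a single tile.
#   data == b""   -> layer rendered but empty (''::bytea)
#   data is None  -> layer not rendered yet (NULL::bytea)
Row = tuple[str, Optional[bytes]]


def naive_concat(rows: list[Row]) -> bytes:
    """Mimics string_agg(tile, ''::bytea): NULL inputs are silently skipped."""
    return b"".join(data for _, data in rows if data is not None)


def checked_concat(rows: list[Row], expected_layers: list[str]) -> Optional[bytes]:
    """Assemble the tile only if every expected layer has a non-NULL row.

    Also fixes the layer order; string_agg would need an ORDER BY inside
    the aggregate call to do the same.
    """
    by_layer = dict(rows)
    if any(by_layer.get(layer) is None for layer in expected_layers):
        return None  # incomplete: a layer row is missing or still NULL
    return b"".join(by_layer[layer] for layer in expected_layers)


if __name__ == "__main__":
    rows = [("water", b"W"), ("roads", None), ("pois", b"")]
    print(naive_concat(rows))                                  # b'W'
    print(checked_concat(rows, ["water", "roads", "pois"]))    # None
```

In SQL, a comparable guard could compare count(tile), which also ignores NULLs, against the expected number of layers (for example in a HAVING clause); that query shape is a suggestion, not something proposed above.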

Storing tiles as tuples with multiple columns

Ignoring generation timestamps, the tables would be defined as
storage (
    zoom smallint,
    x integer,
    y integer,
    layer1 bytea,
    layer2 bytea,
    ...
)

I'm leaning towards this. It avoids the per-tuple overhead of storing each layer as its own row, and an unrendered layer is easy to spot because its column is NULL.
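One convenient property of this layout: in PostgreSQL, bytea concatenation with || returns NULL if either operand is NULL, while an empty layer ('') contributes nothing. So layer1 || layer2 || ... yields the full tile when every layer is rendered and NULL otherwise. The sketch below builds such a query and mirrors its semantics in Python; the column names, the `tile_select_sql` helper, and the psycopg-style `%s` placeholders are assumptions for illustration.

```python
import re
from typing import Optional

# Hypothetical layer columns, in the order they are concatenated.
LAYER_COLUMNS = ["layer1", "layer2", "layer3"]
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def tile_select_sql(columns: list[str]) -> str:
    """Build a SELECT that concatenates the layer columns with ||.

    Column names are interpolated into the SQL text, so only simple,
    trusted identifiers are accepted.
    """
    for col in columns:
        if not _IDENT.fullmatch(col):
            raise ValueError(f"unexpected column name: {col!r}")
    expr = " || ".join(columns)
    return f"SELECT {expr} FROM storage WHERE zoom = %s AND x = %s AND y = %s"


def concat_or_null(row: dict[str, Optional[bytes]], columns: list[str]) -> Optional[bytes]:
    """Python mirror of `layer1 || layer2 || ...`: any NULL makes the result NULL."""
    parts = [row[col] for col in columns]
    if any(part is None for part in parts):
        return None  # at least one layer has not been rendered yet
    return b"".join(parts)


if __name__ == "__main__":
    print(tile_select_sql(LAYER_COLUMNS))
    # SELECT layer1 || layer2 || layer3 FROM storage WHERE zoom = %s AND x = %s AND y = %s
    print(concat_or_null({"layer1": b"A", "layer2": b"", "layer3": b"C"}, LAYER_COLUMNS))  # b'AC'
```

A NULL result can then be treated the same way as a tile that has not been generated at all.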

Timestamps

We use timestamps for cache headers, so they need some consideration. The two options are storing a timestamp for each layer or one for the overall tile. If storing per-layer, the best case is 4 bytes per layer per tile.
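With per-layer timestamps, the tile as a whole last changed when its newest layer did, so a cache header can be derived from the maximum. The sketch below assumes an HTTP Last-Modified header and timezone-aware datetimes; the helper names are hypothetical. For scale, the 50-byte and 18GB figures in the Storage section imply roughly 360 million tiles (assuming decimal gigabytes), so each layer's 4-byte timestamp would add about 1.44GB.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def tile_last_modified(layer_times: dict[str, datetime]) -> str:
    """HTTP-date for the tile: the newest per-layer generation time.

    Expects timezone-aware datetimes; naive ones would be read as local time.
    """
    newest = max(layer_times.values())
    return format_datetime(newest.astimezone(timezone.utc), usegmt=True)


def per_layer_timestamp_bytes(tiles: int, layers: int, bytes_each: int = 4) -> int:
    """Extra storage for one timestamp per layer per tile (4 B is the best case above)."""
    return tiles * layers * bytes_each


if __name__ == "__main__":
    times = {
        "water": datetime(2025, 1, 2, 10, 0, tzinfo=timezone.utc),
        "roads": datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc),
    }
    print(tile_last_modified(times))                   # Thu, 02 Jan 2025 12:00:00 GMT
    print(per_layer_timestamp_bytes(360_000_000, 1))   # 1440000000
```

Storing a single timestamp per tile instead avoids this cost, but it then has to be bumped whenever any one layer is re-rendered.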
