Creating indexes on compressed columns #17

Hakkin · 2022-09-12T16:09:45Z

I tried using this on a database I'm using to store a lot of raw JSON objects, which I then have a view that uses SQLite's JSON support to extract the contents for querying. I originally had a few indexes on JSON expressions (something like CREATE INDEX idx_data_uuid ON data(json_extract(body, '$.data.uuid'))). SQLite's query planner uses this index even when querying from the view, but after compressing the data, there's no more obvious way to create indexes like this since you can't create indexes on views.

I tried creating an index like _data_zstd(json_extract(zstd_decompress_col(body, 1, _body_dict, true), '$.data.uuid')), but SQLite's query planner won't select this when querying from a view. I can make it use the index by replacing my data_view.uuid with the literal json_extract(zstd_decompress_col(body, 1, _body_dict, true), '$.data.uuid') in the SELECT statement, but that kind of defeats the purpose of the view

I see proper index support is on the "Future Work" section, so I assume there may not be any way to do this currently, but I wonder if you have any hints or experience with being able to coerce the query planner into using an index like the one above, since I believe it "should" be possible in theory.

In any case, I hope work on this extension continues, since the results I got were extremely promising. My database compressed to around 6% the original size, and for queries that didn't need those indexes, there was seemingly no loss in performance.

The text was updated successfully, but these errors were encountered:

Hakkin · 2022-09-12T16:40:03Z

While looking around I actually just stumbled on your original blog post for this project which more or less describes this exact situation to a tee, you would think I was copying straight from the blog post.

I see your suggested solution in the blog is to just break out the values you want to index into real columns, which I suppose works well enough, though proper indexing support would definitely be nice. I wonder if you ever manged to get anything else working with GENERATED columns?

phiresky · 2022-09-14T13:14:30Z

Sadly I don't think there's any solution for this right now except storing a copy of the extracted values separately. I'm not sure why SQLite doesn't use the existing index when querying the view even though the expression should be the same. It might be useful to ask this in the SQLite forum, it might be something that's easy to fix for the SQLite devs in an update.

With GENERATED columns the idea is that the whole thing would basically not use a view and backing table, but instead it would just add the _xxxx_dict to the original table, rename the xxxx column to _xxxx_zstd, and add add column xxxx generated always as (zstd_decompress_col(_xxxx_zstd, 1, _xxxx_dict, true). Then an index on json_extract(xxxx, ...) might work, and it should have lower overhead compared to using views.

I haven't tried this yet (should be easy to create an example manually even with the extension in it's current state), but I think there was some logical issue with it that I don't remember right now.

Hakkin · 2022-09-14T15:22:21Z

Just tested it out, can confirm adding (virtual) generated columns to the _zstd table (ALTER TABLE _data_zstd ADD COLUMN uuid TEXT GENERATED ALWAYS AS (json_extract(zstd_decompress_col(body, 1, _body_dict, true), '$.data.uuid')) VIRTUAL), creating an index on that column (CREATE INDEX idx_data_uuid ON _data_zstd(uuid)), modifying the view the extension creates to also SELECT the new row, then modify my own view to SELECT that row instead of doing json_extract on the body row, does work!

It's a little contrived, but it definitely works and allows me to use this more or less as is with a little housework on the schema.
The fact that the generated column is VIRTUAL as opposed to STORED also means there should be no additional size overhead besides the index itself.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Creating indexes on compressed columns #17

Creating indexes on compressed columns #17

Hakkin commented Sep 12, 2022

Hakkin commented Sep 12, 2022 •

edited

Loading

phiresky commented Sep 14, 2022

Hakkin commented Sep 14, 2022

Creating indexes on compressed columns #17

Creating indexes on compressed columns #17

Comments

Hakkin commented Sep 12, 2022

Hakkin commented Sep 12, 2022 • edited Loading

phiresky commented Sep 14, 2022

Hakkin commented Sep 14, 2022

Hakkin commented Sep 12, 2022 •

edited

Loading