-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comparison to compression VFS #9
Comments
Any thoughts on SQLite's official CEROD extension? I know it's commercial but the source is public, so as long as you did an academic study I think that'd be legal? If not, do you know anecdotally why that extension might not be a great choice for comparison? Does it have too much performance penalty? Is read-only requirement too restrictive? |
I didn't know CEROD was source-available. I don't see any obvious documentation about how the compression aspect of CEROD works, but I'll assume it's similar to sqlite_zstd_vfs. First of all, read-only sounds extremely restrictive to me. The main potential disadvantage of the VFS approach is that all data is compressed in the same way. That means that even the often-accessed parts (e.g. B-Tree inner nodes) are compressed which might be bad for performance. It also means there's no way to compress different data differently. For example, with my extension you can compress one table with high compression level, another with low compression level (resulting in higher write performance), and another one not at all. You can also flexibly choose the method for deciding which data is compressed together or at all with an arbitrary SQL expression, which is advantageous for e.g. a hot-set vs archived-data situation. Another advantage of sqlite-zstd is that it can be incrementally enabled without rewriting the rest of the database and without affecting the storage layer (how a database is opened / written to) at all. The main disadvantage is that this extension is more complex than e.g. sqlite_zstd_vfs and not 100% transparent to the application, and that only data can be compressed, not the B-trees / indexes, which can reduce the effectiveness. I'm pretty sure it's possible to construct both examples where my extension is faster and more storage efficient than VFS-based alternatives, as well as where it is worse. But it's hard to say in general which will be the case for a real-world application. |
In my case, it boiled down to VFS applied to a database of mostly ints (no blobs or big varchars). If I had blobs or big varchars, I would use yours :) |
Give your original idea in the blog,there is some overlap with time-series db (or timeseries db extensions, there are a few for postgresql eg https://github.com/timescale/timescaledb) |
Something like https://github.com/mlin/sqlite_zstd_vfs. Pretty different in the tradeoffs and design decisions, but still should be compared to.
The text was updated successfully, but these errors were encountered: