-
Notifications
You must be signed in to change notification settings - Fork 10
Conversation
📦 quay.io/s3gw/s3gw:pr-39134ba44abcb24c6fa7f0b40ac5c8e49737199f-6572130889-1 https://quay.io/repository/s3gw/s3gw?tab=tags&tag=pr-39134ba44abcb24c6fa7f0b40ac5c8e49737199f-6572130889-1 |
(tests are failing because of https://github.com/aquarist-labs/s3gw/issues/756) |
3abc71f
to
dd4c90e
Compare
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
dd4c90e
to
2f1c295
Compare
sqlite_orm v1.8.2 has fnc12/sqlite_orm#1054, which we want to ensure thread safety. Unfortunately it's missing fnc12/sqlite_orm#1169, without which `storage.sync_schema()` breaks (or at least gives us nasty errors) due to incorrect quotes. In order to pick up the latter, I've forked sqlite_orm into the aquarist-labs org, and made a branch "v1.8.2-s3gw", which is upstream v1.8.2 plus a cherry pick of that additional fix. Signed-off-by: Tim Serong <[email protected]>
If we're alrady using an SFStore, don't create a new DBConnRef, just use the one that's already part of the SFStore. Signed-off-by: Tim Serong <[email protected]>
Using `transaction_guard()` instead of `begin_transaction()` ensures the transaction will rollback if anything in this block throws an exception. Signed-off-by: Tim Serong <[email protected]>
2f1c295
to
aa374c0
Compare
Rebased |
Is this failed test related with the chats we've been having about our CI self-hosted runners? Only thing I see is that the log was abruptly interrupted. |
Nope, my bad - when I rebased, I missed changing a couple of instances of |
This ensures we're not making multiple copies of the Storage object. In this commit, I renamed Storage to StorageImpl, so I could do a build and check for compiler errors to ensure I'd caught all the previous uses of Storage throughout the codebase. Signed-off-by: Tim Serong <[email protected]>
aa374c0
to
cfa1f10
Compare
...let's see how that goes now. |
LOL! now it's failing the WAL explosion test, because the WAL didn't explode as much as usual... |
📦 quay.io/s3gw/s3gw:pr-4976c7504f08b89aa74790a73d2b15f40bf92421-6636123673-1 https://quay.io/repository/s3gw/s3gw?tab=tags&tag=pr-4976c7504f08b89aa74790a73d2b15f40bf92421-6636123673-1 |
cfa1f10
to
8cb8807
Compare
How dare! 🤣 |
src/rgw/driver/sfs/sqlite/dbconn.h
Outdated
|
||
public: | ||
sqlite3* first_sqlite_conn; | ||
std::unordered_map<std::thread::id, sqlite3*> all_sqlite_conns; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this private? It has a non trivial access story (pull mutex?)
Do we actually need the thread id here? I can't find any references. Maybe a vector is enough. Still needs synchronized access though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second question first, because it's the easy one :-) The thread id is there because get_sqlite_conn()
uses it to get the sqlite3*
specifically associated with the calling thread. Same mechanism as get_storage()
, just a different map. I found keeping two maps made some of the code easier to read rather than one map with a std::pair<Storage, sqlite*>
or a struct, but either approach would work.
As to can we make it private... The current code inside DBConn actually only uses the main thread's sqlite*
for a couple of things during startup to do with updating metadata. The other use of all_sqlite_conns
is in SFSStatusPage::render()
, where we use it to get db status for all threads. Having it public made that really easy and lightweight, but you're right, it would then need synchronized access, dammit.
I guess the safest thing to do under the circumstances is make that map private (or fold it into the storage_pool
map), and add a method that returns a vector created as necessary from the contents of the map. Something like this (untested):
std::vector<sqlite*> all_sqlite_conns() const {
std::vector<sqlite*> conns;
std::shared_lock lock(storage_pool_mutex);
for (auto p : all_sqlite_conns) {
conns.emplace(p.second)
}
return conns;
}
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all_sqlite_conns sounds good. Right now this is only called by a human via the status page so this doesn't need to e high performance - at the end this is roughly 4k (64 bit * 512 threads) data :).
An alternative would be to store the sqlite* pointers in a vector and the special first one separately (or define it as vector[0]). We'd loose the thread association, but the copy is cheaper. I'd rather not have the status page under the pool mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with the alternative :-)
Currently, `DBConn` keeps an instance of `Storage`, which is created by `sqlite_orm::make_storage()`. That first instance of `Storage` is long-lived (the `DBConn` c'tor calls `storage->open_forever()`) so the sqlite database is open for the entire life of the program, but this first `Storage` object and its associated sqlite database connection pointer are largely not used for anything much after initialization. The exception is the SFS status page, which also uses this connection to report some sqlite statistics. All the other threads (the workers, the garbage collector, ...) call `DBConn::get_storage()`, which returns a copy of the first `Storage` object. These copies don't have `open_forever()` called on them, which means every time they're used for queries we get a pair of calls to `sqlite3_open()` and `sqlite3_close()` at the start end end of the query. These calls don't open the main database file again (it's already open) but they do open and close the WAL. There's a couple of problems with this. One is that the SFS status page only sees the main thread (which is largely boring), and can't track any of the worker threads. The other problem is that something about not keeping the connection open on the worker threads is relatively expensive. If we keep connections open rather than opening and closing with every query, we can get something like a 20x speed increase on read queries, and at least 2x on writes. This new implementation gives one `Storage` object per thread, created on demand as a copy of the first `Storage` object created in the `DBConn` constructor. Fixes: https://github.com/aquarist-labs/s3gw/issues/727 Signed-off-by: Tim Serong <[email protected]>
This adds a separate std::vector<sqlite3*> of all DB connections, and a method to get a copy of it so that the SFS stats page can iterate though it easily to query DB statistics. Signed-off-by: Tim Serong <[email protected]>
Fixes: https://github.com/aquarist-labs/s3gw/issues/726 Signed-off-by: Tim Serong <[email protected]>
Signed-off-by: Tim Serong <[email protected]>
Since adding storage_pool, the WAL doesn't explode quite as much as it used to with 10 threads and 1000 objects each (previously with multiple unpooled connections it'd reliably go over 500MB, but I've seen one unit test where it "only" got to ~450MB, and another where it barely got over 70MB). To try to get it to explode nicely I'm now using std::thread::hardware_concurrency() as the number of threads, and creating 2000 objects. I've also dropped the test value to 300MB to give some more wiggle room. Signed-off-by: Tim Serong <[email protected]>
Signed-off-by: Tim Serong <[email protected]>
8cb8807
to
6c6bde6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@irq0 when you have a chance, please do another pass on this so we can get it merged soon :) |
This is the culmination of the work in #209 but I've cleaned up the commit history to get rid of old experiments and also added some tests. I didn't want to squash the commits in the other PR because some of that is still interesting (to me at least) as a historical record.
Checklist