Skip to content
This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

Closes #660 Store management #731

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

fluffyemily
Copy link
Contributor

@fluffyemily fluffyemily commented Jun 1, 2018

Add Store manager to ensure that we only have one open handle to each database.

This implementation has gone through several iterations. The considerations that formed the current design are as follows:

  1. rusqlite::Connection is Send but not Sync and therefore cannot be stored in a static var. This is why we only store the Conn inside Stores.
  2. Conn is stored as a Weak reference because otherwise every time we return a Store we increase the reference count from one to two, forcing every mutable operation performed on the Conn to clone rather than return a mutable reference, dramatically increasing the number of Conn references proliferating the system. By storing Conn as a Weak, for as long as there is only one Store we can perform mutable operations without increasing the reference count on Conn.
  3. By storing Conn as a Weak reference, if we cease to hold any strong references to the Conn, by not having any active Store's, the Weak will no longer be upgradable, but the entry will remain in the map of open Conns. We therefore must recreate the Conn if a request for a previously opened store is received. We do this in the background without consumers being aware.
  4. The original implementation had a thread_local store of open Rc<rusqlite::Connection>'s mapped by the same key as Conn's. However, due to rusqlite::Connection not being Clone, this meant that Rc::make_mut could not be used, which resulted in the code being unable to get a mutable reference to the rusqlite::Connection if there was more than one reference. As the map retained a strong reference, that ensured that every returned Store had a reference count of at least 2 on it's rusqlite::Connection which resulted in all mutable operations on the Store failing. I therefore made the decision to create a new rusqlite::Connection for every Store returned. This contravenes the original spec, but given the constraints something had to give and I decided on this one.
  5. Writing tests with in memory stores mapped by path ("") with a static Store lead to a problem where the same Conn was shared between tests and the assertions around reference counts on stored Conn's were unreliable. In order to get around this I introduced the concept of the named_in_memory_store which assigns a name to an in memory store ensuring that the same Conn was not shared between tests. This is the requirement that lead to the decision to store keys as String's rather than PathBuf.

The thing I don't like about this is the number of rusqlite::Connections that we create. After chatting with @rnewman I will look into providing a thread_local connection pool and get Stores to take a reference to a connection that it will use to create Storess.

@fluffyemily fluffyemily requested review from grigoryk and ncalexan June 1, 2018 14:04
Copy link
Member

@ncalexan ncalexan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emily, I have mentioned a few things -- largely style changes that I want you to incorporate globally (some EDN preferences and using full sentences for comments throughout).

But then there are more substantive things that we'll need to work through together.

  1. I'd like to see a fixed-up version of the first commit land (splitting Store out of Conn), but I'd like to see it split into store.rs. I have done this locally and may just land it myself, since it's a good refactoring. You can see my tree at

https://github.com/ncalexan/mentat/tree/fluffyemily/store-management/

  1. You can't use canonicalize in the way that you are, because it goes to the filesystem and fails if the file doesn't exist. (That's why your tests are failing in Travis.) You can work around this by canonicalizing after the first successful open, but it's all kinds of complicated. Do we really care about this?

  2. The split between named memory stores and files is awkward as written, and it doesn't do what a consumer familiar with SQLite thinks. That is, two named memory stores do not correspond to the same SQLite in-memory store, since unless you take steps to unify them, SQLite treats in-memory stores as distinct. I'd really prefer to standardize on SQLite's URI notation for stores (and potentially build an API for named in-memory stores that uses that URI notation). Canonicalizing URIs is much simpler than canonicalizing file paths, since there's really only .. to worry about and not symlinks.

Finally, on the meat of the problem: I'm just not sure this expression is going in a good direction. At first I though you wouldn't be able Arc::make_mut your way forward, because make_mut updates or clones the underlying data. But I see you have this Arc<Mutex<...>> thing in place, so in theory the clone should do the right thing. But my test across threads (see my branch) shows that it doesn't work, at least not as written,. I can't convince myself if that's a small bug or a fundamental problem (yet). I will say that it's very hard to reason about the nested Arc and Mutex layers, and the Weak is another layer of indirection that I haven't really understood yet. So I thought I'd give an initial review, asking for some obvious fixes and clean-ups, and point at what I think are some non-obvious problems, so that we can iterate on the final solution. Thanks!

src/errors.rs Outdated
@@ -109,5 +110,20 @@ error_chain! {
description("provided value doesn't match value type")
display("provided value of type {} doesn't match attribute value type {}", provided, expected)
}

StoreNotFound(path: String) {
description("the Store provided does not exist or is not yet open.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-existence (on disk) is pretty different from not being opened. Is there a reason this is one error?

src/stores.rs Outdated
pub fn open_store<T>(path: T) -> Result<Store> where T: AsRef<Path> {
let path_ref = path.as_ref();
let path_string = path_ref.to_string_lossy();
let (name, cannonical) = if path_string.len() > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "canonical" throughout this method.

src/stores.rs Outdated
} else {
(path_string.into(), path_ref.to_path_buf())
};
Stores::singleton().write().unwrap().open(&name, cannonical)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be .write()? throughout? This is just a (special) Result so we don't want to panic on failure.

src/stores.rs Outdated
assert_eq!(1, Arc::strong_count(store1.conn()));
let mut in_progress = store1.begin_transaction().expect("begun");
in_progress.transact(r#"[
{ :db/ident :foo/bar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's usual to tighten EDN, so no spaces before or after curlies (like {:db/ident ...}), and it's unusual to include commas (which are EDN whitespace).

src/stores.rs Outdated
let mut store2 = Stores::connect_store(path).expect("expected a new store");
let mut in_progress = store2.begin_transaction().expect("begun");
in_progress.transact(r#"[
{:foo/bar 15, :foo/baz false, :foo/x [1, 2, 3]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no commas, please. (But the tight curlies look more natural here.)

src/stores.rs Outdated
}

{
// forking an open store leads to a ref count of 2 on the shared conn.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: (throughout) we use full sentences with punctuation for comments. Everywhere, please.

}
}

pub fn singleton() -> &'static RwLock<Stores> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's hide the RwLock<Stores> in something opaque, say:

pub struct StoreManager(RwLock<Stores>)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was reading this code and getting confused until I realized that there's a static implementation and a non-static implementation of basically the same thing. What's the point of exposing the singleton and the static implementation? I'd rather have a non-static implementation and a singleton; it makes testing easier ('cuz you can test against an instance that isn't the singleton).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Access to the singleton should definitely not be public. (Right?)


#[test]
fn test_stores_get_closed_store() {
match Stores::get_named_in_memory_store("test_stores_get_closed_store").expect("Expected a store to be fetched") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can assert!(blah.is_none()) or similar, if you like.

@fluffyemily fluffyemily force-pushed the fluffyemily/store-management branch 6 times, most recently from a164969 to e291159 Compare July 4, 2018 14:07
@fluffyemily fluffyemily requested a review from ncalexan July 4, 2018 14:07
@fluffyemily
Copy link
Contributor Author

@ncalexan This is the version without connection pooling.

Copy link
Member

@ncalexan ncalexan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking much closer to what I expect, but I think there's still work to do around the StoreConnection functionality:

  • either there's work to radically simplify that functionality
  • or there's work to explain to me why this layer of complexity is needed
    But overall, this is looking much better! Have you tried racing lots of threads opening and closing the same stores simultaneously? That's a good test to implement.

@@ -582,7 +582,6 @@ impl Conn {
/// Prepare the provided SQLite handle for use as a Mentat store. Creates tables but
/// _does not_ write the bootstrap schema. This constructor should only be used by
/// consumers that expect to populate raw transaction data themselves.

pub(crate) fn empty(sqlite: &mut rusqlite::Connection) -> Result<Conn> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first commit, rebase the changes so as to not lose the original commit message.

@@ -86,7 +86,7 @@ fn escape_string_for_pragma(s: &str) -> String {
s.replace("'", "''")
}

fn make_connection(uri: &Path, maybe_encryption_key: Option<&str>) -> rusqlite::Result<rusqlite::Connection> {
pub fn make_connection(uri: &Path, maybe_encryption_key: Option<&str>) -> rusqlite::Result<rusqlite::Connection> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thom didn't make this public for a reason. We should be feature-checking sqlcipher in Store and Stores to handle this, I think.

src/errors.rs Outdated
@@ -80,6 +80,15 @@ pub enum MentatError {
#[fail(display = "provided value of type {} doesn't match attribute value type {}", _0, _1)]
ValueTypeMismatch(ValueType, ValueType),

#[fail(display = "Cannot open store {} at path {:?} as it does not match previous store location {:?}", _0, _1, _2)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we differentiating "names" from "paths" at all? If you give the same path two different names, it's a catastrophe, and this error is all about preventing you giving the same name to two different paths.

Let's just let folks give an in-memory URI if they want, and then make it easy for them to create those URIs (which are a little bit of a SQLite detail). Right now we have a split in the API based on whether the store is named that seems odd.

src/errors.rs Outdated
#[fail(display = "The Store at {} does not exist or is not yet open.", _0)]
StoreNotFound(String),

#[fail(display = "The Store at {:?} has active connections and cannot be closed.", _0)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The display and the actual name seem to have diverged. I don't think we should leak something about lock poisoning if we can help it; it's a weird case to expose to high-level consumers.

src/errors.rs Outdated
#[fail(display = "Cannot open store {} at path {:?} as it does not match previous store location {:?}", _0, _1, _2)]
StorePathMismatch(String, PathBuf, PathBuf),

#[fail(display = "The Store at {} does not exist or is not yet open.", _0)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not yet open, when would I want to get a "not found" error rather than opening it?


impl Stores {
// Returns true if there exists an entry for the provided name in the connections map.
// This does not guarentee that the weak reference we hold to the Conn is still valid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This business about the weak reference is outdated now. (Here and below.)

// Creates a new store on an existing connection with a new encrypted rusqlite connection.
// Equivalent to forking an existing store.
#[cfg(feature = "sqlcipher")]
pub fn connect_with_key(&mut self, name: &str, encryption_key: &str) -> Result<Store> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when you try to connect to a store that is encrypted with and without a key?

Can we, in some way, namespace stores, perhaps by asserting that encrypted stores always need to be suffixed .encrypted.mentatdb, and rejecting unencrypted stores with that suffix? Basically, don't be accepting -- be rigid up front to make this problem easier; we can always loosen our restrictions.

@@ -18,7 +18,7 @@ buildscript {
google()
}
dependencies {
classpath 'com.android.tools.build:gradle:3.1.2'
classpath 'com.android.tools.build:gradle:3.1.3'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent commit comment. This is... distressing. But whatever moves us forward, I guess.

@@ -10,6 +10,7 @@
# Specifies the JVM arguments used for the daemon process.
# The setting is particularly useful for tweaking memory settings.
org.gradle.jvmargs=-Xmx1536m
org.gradle.configureondemand=false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary? This sucks, because always configuring all projects is very expensive as sub-projects multiply.

@@ -36,7 +36,7 @@ public Mentat(String dbPath) {
}

/**
* Open a connection to an in-memory Store.
* Open a connection to an anonymous in-memory Store.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very misleading, and is really the problem here -- your store isn't anonymous, it's the "empty string", hence it always open a new store. Why not make this function actually anonymous by randomizing the string and drop all the hand-coded names?

Emily Toop added 4 commits August 29, 2018 11:29
Enable ability to create named in memory stores and in memory stores with shared caches.
Include ability to create encrypted connections.
Update `Store` to take an `Arc<Conn>` so references can be shared.
Update FFI to use `Stores` instead of `Store`.
Add `store_open_named_in_memory_store` to open a named in-memory store over FFI (useful for tests).
… Utilize named in memory store in tests to ensure isolation when running.
@fluffyemily fluffyemily force-pushed the fluffyemily/store-management branch from e291159 to 5ead1d3 Compare August 29, 2018 16:35
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants