Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUST-735 Remove Document as default generic type on Collection and Cursor #323

Merged

Conversation

patrickfreed
Copy link
Contributor

RUST-735

This PR removes Document as the default generic type for Collection and Cursor and updates Database::collection and Database::collection_with_options to return Collection<T>. As part of this, the existing typed helpers became redundant and were removed. These changes were made to ease and promote the use of serde models with the driver instead of regular Documents.

Note: I didn't update the README examples to avoid confusing users of the 1.2.x or 2.0.0-alpha versions. I filed RUST-736 to cover the work for updating that, which we can do right before the release.

Copy link
Contributor Author

@patrickfreed patrickfreed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few non-as-yet-discussed comments/questions about the public API, so tagging in @kmahar as well.

/// Gets a handle to a collection with type `T` specified by `name` of the database. The
/// `Collection` options (e.g. read preference and write concern) will default to those of the
/// `Database`.
///
/// This method does not send or receive anything across the wire to the database, so it can be
/// used repeatedly without incurring any costs from I/O.
pub fn collection_with_type<T>(&self, name: &str) -> Collection<T>
pub fn collection<T>(&self, name: &str) -> Collection<T>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we discussed the possibility of introducing a collection_with_document helper (or something like that) as an ergonomic way to retrieve a Collection<Document>, but after writing a few of these conversions, it seemed simpler / easier to just use the turbofish (::<>) operator on the existing helpers.

e.g.

let collection: Collection<Document> = db.collection("foo"); // explicit type way
let collection = db.collection::<Document>("foo"); // turbofish
let collection = db.collection_with_document("foo"); // helper

To me, the helper didn't really make it any easier to type nor did it actually make it clearer what was happening, so it seemed fine to me to just not have it. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from my outsider perspective it doesn't seem to make a significant difference, but I'd defer to you two on whether collection_with_document is in any way more idiomatic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine by me. The explicit type/turbofish ways both make it clearer that the value returned is generic over Document. I also think a collection_with_document method might be confusing when we have both a document and raw document type.

@@ -213,7 +194,7 @@ impl Database {
&self,
filter: impl Into<Option<Document>>,
options: impl Into<Option<ListCollectionsOptions>>,
) -> Result<Cursor> {
) -> Result<Cursor<Document>> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Swift, we have this return a MongoCursor<CollectionSpecification>, and I think it would be a nice improvement here as well for 2.0. The versioned API will ensure that we won't have to worry about deserialization errors for the foreseeable future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm (though that's probably not surprising since I already agreed to it once...)

if so, would be nice to use a DatabaseSpecification type for list_databases too (assuming listDatabases is also in API version 1?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I filed RUST-740 to cover the both of them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with the swift library, would CollectionSpecification/DatabaseSpecification be structs that model the info returned from the list methods? If so that sounds good to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, exactly.

@@ -153,7 +153,7 @@ where
&self,
pipeline: impl IntoIterator<Item = Document>,
options: impl Into<Option<AggregateOptions>>,
) -> Result<Cursor> {
) -> Result<Cursor<Document>> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider returning a Cursor<O> here (where O is a generic parameter to the aggregate method, not the T from the Collection). Or should we have a separate aggregate_with_output_type or something? I think with aggregation it's more likely that a user might just want to retrieve a document than with a find, since they may mutate the output along the way, so it's a little riskier to have this one be pure generic. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per slack conversation: seems we're all on the same page the T from the collection should not be used since people often do transformations here.

as for whether to require a type be specified in this method (which I was not thinking about the option of in the slack discussion), or just having a separate aggregate_with_type I don't have particularly strong feelings. I agree with your comment that it does seem a bit at odds with the choice to make requiring a type the default for db.collection. OTOH maybe it's less likely that people will bother writing custom Deserialize types for the output types of single methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my original inclination was to just leave this as Document by default and introduce the new separate method, since I imagine it'll be pretty common to use Document at least while prototyping the pipeline. I'm starting to flip back to wanting to maintain consistency though, given how easy it is to just append a ::<Document> to the end of the aggregate call. Especially in the face of complex pipelines, handling the generic type would be a piece of cake.

However

This got me trying it out and I noticed an interesting / frustrating technical challenge to doing it the Cursor<O> way: turbofish cannot be used in conjunction with impl Trait in the argument position (yet, see rust-lang/rust#83701).

e.g.

fn foo<T>(bar: impl ToString) { ... }
foo::<usize>("blah"); // compile error!

yields the following compiler error:

error[E0632]: cannot provide explicit generic arguments when `impl Trait` is used in argument position
  --> src/main.rs:21:11
   |
21 |     foo::<usize>("hello");
   |           ^^^^^ explicit generic argument not allowed

So you'd have to do it the explicit type parameter way:

let x: Cursor<Document> = coll.aggregate(vec![], None).await?

Which could be pretty frustrating.

This will make it tough to implement the one that takes in a generic argument, since we use impl Trait in all our CRUD methods for our options types (e.g. to be able to say options instead of having to say Some(options)). This makes me think we may want to do the builder approach here and have the method be on Cursor actually, something like:

let cursor = coll.aggregate(vec![ ... ], None).await?.with_type::<MyDeserializeType>();

This is pretty ugly admittedly, but I think it's preferable to breaking consistency with how options are provided to the other CRUD methods or requiring users to use explicit type annotations instead of turbofish.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree changing the way we pass options either here or everywhere doesn't seem like a great path forward.

is turbofish common/used rather than type annotations that people would be running into this a lot?

adding the method to cursor seems reasonable though it does look clunky.... it also seems consistent with the clone_with_type behavior that we currently have on Collection to change the generic type. and maybe it would be useful in other use cases as well - e.g. if you provide a projection via FindOptions that changes the shape of your documents as well, and you could call this on the resulting cursor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah none of these options is very ideal to me; with_type seems like it would be more ergonomic for chaining method calls together, since providing a type annotation would require breaking out the call to aggregate into a separate line. Updating our CRUD API to use the builder pattern would get rid of this issue, right?

In any case since we're opting to go the aggregate_with_type route, I think we can defer coming to a conclusion here til we implement that (and make a decision regarding the builder pattern).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think even with the builder pattern for options/sessions we'd still have the same issue unfortunately, since we use impl IntoIterator for the pipeline.

Also, to clarify, I don't think aggregate_with_type method will really be possible, since we'd have to ditch the impl Trait arguments, so I think we kind of have to do the Cursor::with_type approach. That being said, we don't need to implement that functionality now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I forgot we use impl for the pipeline.

@patrickfreed patrickfreed marked this pull request as ready for review April 8, 2021 19:36
@@ -139,7 +139,7 @@ where
let coll = self
.client
.database(ns.db.as_str())
.collection(ns.coll.as_str());
.collection::<Document>(ns.coll.as_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is honestly not really relevant to the PR, but I'm just curious for my own knowledge and future reference for Swift- aren't some cursors not actually tied to particular collections e.g. one resulting from db-level aggregate? how does this work in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually sure either. From testing it out, the namespace appears to be "blah.$cmd.aggregate":

{
	"cursor" : {
		"firstBatch" : [
			{
				"_id" : {
					"id" : UUID("8b95ed9d-7d24-4880-bc4e-483510bd5046"),
					"uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
				},
				"lastUse" : ISODate("2021-04-08T22:53:20.885Z")
			}
		],
		"id" : NumberLong(0),
		"ns" : "blah.$cmd.aggregate"
	},
	"ok" : 1
}

We should probably add some test coverage of this case to ensure it's working properly. Assuming that's a valid namespace to send a killCursors to, it seems like it should be okay though.

@@ -213,7 +194,7 @@ impl Database {
&self,
filter: impl Into<Option<Document>>,
options: impl Into<Option<ListCollectionsOptions>>,
) -> Result<Cursor> {
) -> Result<Cursor<Document>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm (though that's probably not surprising since I already agreed to it once...)

if so, would be nice to use a DatabaseSpecification type for list_databases too (assuming listDatabases is also in API version 1?)

/// Gets a handle to a collection with type `T` specified by `name` of the database. The
/// `Collection` options (e.g. read preference and write concern) will default to those of the
/// `Database`.
///
/// This method does not send or receive anything across the wire to the database, so it can be
/// used repeatedly without incurring any costs from I/O.
pub fn collection_with_type<T>(&self, name: &str) -> Collection<T>
pub fn collection<T>(&self, name: &str) -> Collection<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from my outsider perspective it doesn't seem to make a significant difference, but I'd defer to you two on whether collection_with_document is in any way more idiomatic

@@ -153,7 +153,7 @@ where
&self,
pipeline: impl IntoIterator<Item = Document>,
options: impl Into<Option<AggregateOptions>>,
) -> Result<Cursor> {
) -> Result<Cursor<Document>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per slack conversation: seems we're all on the same page the T from the collection should not be used since people often do transformations here.

as for whether to require a type be specified in this method (which I was not thinking about the option of in the slack discussion), or just having a separate aggregate_with_type I don't have particularly strong feelings. I agree with your comment that it does seem a bit at odds with the choice to make requiring a type the default for db.collection. OTOH maybe it's less likely that people will bother writing custom Deserialize types for the output types of single methods?

Copy link
Contributor

@isabelatkinson isabelatkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, just a few questions. Also wanted to double check that the examples in the README are still good, I believe they should be unaffected but would be worth taking a glance to sanity check if you haven't yet. (It would also probably be a good idea to put those in a test somewhere to make sure they don't become outdated.)

/// Gets a handle to a collection with type `T` specified by `name` of the database. The
/// `Collection` options (e.g. read preference and write concern) will default to those of the
/// `Database`.
///
/// This method does not send or receive anything across the wire to the database, so it can be
/// used repeatedly without incurring any costs from I/O.
pub fn collection_with_type<T>(&self, name: &str) -> Collection<T>
pub fn collection<T>(&self, name: &str) -> Collection<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine by me. The explicit type/turbofish ways both make it clearer that the value returned is generic over Document. I also think a collection_with_document method might be confusing when we have both a document and raw document type.

@@ -213,7 +194,7 @@ impl Database {
&self,
filter: impl Into<Option<Document>>,
options: impl Into<Option<ListCollectionsOptions>>,
) -> Result<Cursor> {
) -> Result<Cursor<Document>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with the swift library, would CollectionSpecification/DatabaseSpecification be structs that model the info returned from the list methods? If so that sounds good to me

@@ -153,7 +153,7 @@ where
&self,
pipeline: impl IntoIterator<Item = Document>,
options: impl Into<Option<AggregateOptions>>,
) -> Result<Cursor> {
) -> Result<Cursor<Document>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah none of these options is very ideal to me; with_type seems like it would be more ergonomic for chaining method calls together, since providing a type annotation would require breaking out the call to aggregate into a separate line. Updating our CRUD API to use the builder pattern would get rid of this issue, right?

In any case since we're opting to go the aggregate_with_type route, I think we can defer coming to a conclusion here til we implement that (and make a decision regarding the builder pattern).

Copy link
Contributor Author

@patrickfreed patrickfreed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally planning on holding off on updating the README until close to the release (RUST-736), but now that we're planning on doing a second alpha I went ahead and updated them. We can still use RUST-736 to track the work for updating the examples to be use serde more prominently. Also, good point on having tests for them--I added a file to our tests folder since the examples only use the public API to ensure the examples compile.

@@ -213,7 +194,7 @@ impl Database {
&self,
filter: impl Into<Option<Document>>,
options: impl Into<Option<ListCollectionsOptions>>,
) -> Result<Cursor> {
) -> Result<Cursor<Document>> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, exactly.

@@ -153,7 +153,7 @@ where
&self,
pipeline: impl IntoIterator<Item = Document>,
options: impl Into<Option<AggregateOptions>>,
) -> Result<Cursor> {
) -> Result<Cursor<Document>> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think even with the builder pattern for options/sessions we'd still have the same issue unfortunately, since we use impl IntoIterator for the pipeline.

Also, to clarify, I don't think aggregate_with_type method will really be possible, since we'd have to ditch the impl Trait arguments, so I think we kind of have to do the Cursor::with_type approach. That being said, we don't need to implement that functionality now.

@@ -153,7 +153,7 @@ where
&self,
pipeline: impl IntoIterator<Item = Document>,
options: impl Into<Option<AggregateOptions>>,
) -> Result<Cursor> {
) -> Result<Cursor<Document>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I forgot we use impl for the pipeline.

@isabelatkinson
Copy link
Contributor

isabelatkinson commented Apr 9, 2021

whoops, just noticed there's a clippy failure in one of the README examples tests (lgtm otherwise though, no need for another round of review on that)

edit: looks like there's also somewhere in the AWS test that needs a type annotation

@patrickfreed patrickfreed merged commit 3e68edf into mongodb:master Apr 9, 2021
@seanpianka
Copy link
Contributor

seanpianka commented Apr 18, 2021

After updating from alpha to alpha.1, I'm now seeing this run-time error with the same code:

failed to update collection: Error { kind: ArgumentError { message: \"update document must have first key starting with \\\'$\" }, labels: [] }" }

Upserting is failing due to this missing key, causing the update_document_check call to fail. However, this was never an issue previously... is there some automatic key insertion for Collection that is no longer being performed?

@patrickfreed
Copy link
Contributor Author

The update_ family of methods are meant to be used with update operator expressions rather than replacement documents, which the replace family of methods are mean to be used with, but we only started validating this in alpha.1 (RUST-718). To fix your issue, update any usages of update_one that use a replacement document to use replace_one instead.

e.g.

coll.update_one(doc! { "x": 2 }, doc! { "x": 3 }, None).await?; // error
coll.replace_one(doc! { "x": 2 }, doc! { "$inc": { "x": 1 } }, None).await?; // error

coll.update_one(doc! { "x": 2 }, doc! { "$inc": { "x": 1 } }, None).await?; // correct
coll.replace_one(doc! { "x": 2 }, doc! { "x": 3 }, None).await?; // correct

If you have any more questions or concerns, please open a ticket on our Jira project rather than continuing the discussion here or on any other PR, since it can be difficult for the team or other users to find the discussion in the future otherwise.

@seanpianka
Copy link
Contributor

Gotcha, I appreciate your detailed reply, thank you @patrickfreed! In the future, I'll open a ticket in the Jira project.

I managed to fix the update_one call via doc! { "$set": replacement_doc }, but replace_one more closely captures my intent. I'll out that, thanks again!

@patrickfreed
Copy link
Contributor Author

No problem, glad you were able to get it working!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants