Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add collectionName/databaseName attributes to MongoDB provider #3322

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

jesmith17
Copy link

Added support for defining the target collectionName and databaseName for the logs to the config file. This allows users to define the values via config instead of only configuring them via the connectionString.

@ppkarwasz ppkarwasz changed the title Main Add collectionName/databaseName attributes to MongoDB provider Dec 25, 2024
@ppkarwasz
Copy link
Contributor

@jesmith17,

Thank you for your contribution.

While I am not a MongoDB user, the current behavior of using the connection string to infer the name of the database and collection does not make sense to me. Your proposal looks like a valid improvement. According to the MongoDB connection string documentation the connection string has the form:

mongodb://[username:password@]host1[:port1][,...hostN[:portN]][/[defaultauthdb][?options]]

where defaultauthdb is documented as:

The authentication database to use if the connection string includes username:password@ authentication credentials but the authSource option is unspecified.

Therefore:

  • defaultauthdb does not seem to be the right value to be used as database name to store the logs: I imagine that some MongoDB installation might use a different database for authentication and to store the data, right?
  • the collection name does not even appear in the official connection string. It only appears in the Java-specific ConnectionString class. I guess that many users don't know that syntax.

@garydgregory, what do you think?

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jesmith17,

This looks like a very valid improvement! 💯

The changes to the MongoDB provider should be documented. Can you:

  • Modify database.adoc and add a description of the databaseName and collectionName configuration attributes. IMHO these attributes should be marked as "required" in the documentation and the description should mention their fallback values (which are kept only for backward compatibility). The fallback values can be marked as deprecated and we might remove them in a future Log4j Core version.
  • Since IMHO providing the database and collection name explicitly seems like the right thing to do, all the examples and integration tests should be rewritten to use the databaseName and collectionName attributes.

@ppkarwasz
Copy link
Contributor

@jesmith17,

The integration tests in the log4j-mongodb module use Docker. If you want to run the tests locally, you need to have Docker available on your system and run:

./mvnw verify -pl log4j-mongodb -Pdocker

@garydgregory
Copy link
Member

I'm not a fan because this creates confusion with the connection string. IMO: It should be that EITHER you provide a connection string OR you provide all the bits and pieces to build a connection string. I don't like the idea of doing it half one way and half the other.

IMO, we should follow the KISS principle and only provide connection string support.

Just like with JDBC, you just need to know a driver's connection string syntax, which invariably changes from time to time, and I certainly don't want to have to change this code because there is some Mongo driver change in some version that breaks our code.

@ppkarwasz
Copy link
Contributor

@garydgregory,

I'm not a fan because this creates confusion with the connection string. IMO: It should be that EITHER you provide a connection string OR you provide all the bits and pieces to build a connection string. I don't like the idea of doing it half one way and half the other.

These were my initial thoughts too, until I dug deeper into the connection string.
From what I have understood from the connection string (mainly this documentation), it does not provide neither the database nor the collection name:

  • it does not contain the database name, because the defaultauthdb part is the name of the database, where the user is defined, not the database that will be used by the MongoDB client. The MongoDB client can use any database available on the server. Did I misinterpret this part?
  • the official connections string does not even have a "collectionName" part. The collection name is only defined by the Java-specific ConnectionString. So Java developers might know this alternative syntax, but MongoDB administrators might not know it.

As I mentioned before, I have almost no experience with MongoDB, so these interpretations might not make sense.

Just like with JDBC, you just need to know a driver's connection string syntax, which invariably changes from time to time, and I certainly don't want to have to change this code because there is some Mongo driver change in some version that breaks our code.

For JDBC you also need to provide the name of the table.

@garydgregory
Copy link
Member

So the real problem is that the Mongo documentation could be better, like in any software.

I am concerned this opens the door for more and more options to be added to "override" the connection string components, making maintenance harder.

We should only, IMO, offer configuration hooks for behavior that can't be achieved otherwise. Let's review:

The database as part of the connection string is documented here (where Google took me):
https://www.mongodb.com/docs/manual/reference/connection-string-examples/#std-label-connections-connection-examples
For example. 2 examples I picked from the page:
Self-Hosted records Database Running Locally:

mongodb://myDatabaseUser:D1fficultP%40ssw0rd@localhost/records

and fancier:
MongoDB Atlas Cluster that Authenticates with AWS IAM credentials:

mongodb+srv://<aws access key id>:<aws secret access key>@cluster0.example.com/testdb?authSource=$external&authMechanism=MONGODB-AWS

The collection is specified as "databaseName.collectionName", see our XML configuration files.

The classic problem is that we do not want to create support tickets like "My appender connects to A instead of B, why?" where we then have to say "Oh, that's because you specified the foo in both the connection string and in the other thing and this takes precedence over that, blah blah".

For these use cases of double configuration, we should throw an exception.

Who would port this to log4j-mongodb4? to master?

@vy
Copy link
Member

vy commented Dec 25, 2024

Agreeing with @garydgregory's sentiment that we should not add more configuration knobs, if the shared use case can be served with just the connection string. I can spare time for this further after the vacation (i.e., 2025-01-02), if not earlier.

@jesmith17
Copy link
Author

I agree that there is a slippery slope here, but specifying a table name for a db connection in the connection string is very very odd. Allowing the collection name via config instead of connection string puts this in line how the JDBC appender works which required a table name as part of the config. I would actually prefer to make this a config only part and have it override the connection string value. With the way the code currently functions you have to extract the db and collection name from the connection string to pass them to other code blocks anyway. But to preserve backwards compatibility I left it in.

@ppkarwasz
Copy link
Contributor

I looked into the Spring Boot configuration properties for MongoDB:

  • Spring Boot offers a set of properties (username, password, host, port, additional-hosts) that are mutually exclusive with uri. We should certainly not introduce those.
  • The database (and authentication-database) property can however be used **together** with uri`.

It seems that Spring Boot interprets the /defaultauthdb part as "default database to use for authentication and data", while it allows users to specify a different database for each one of these functionalities. We should probably offer a similar configuration option. As documented in Authentication Database, you can create a user in database foo and allow the user to access database bar, so we need at least (and at most 😉) an additional configuration knob for the database to use (or authentication database to use).

@jesmith17
Copy link
Author

I would also point out that spring doesn't have a configuration property for the collection, and that's really the key item here. They assume, appropriately, that you will have various objects being written to various collections inside the app. So you configure the collection for items differently than you do the DB.

That's really the pattern I think is needed here.

@jesmith17
Copy link
Author

So I am working on making the unit test changes but I am finding 2 key challenges.

  1. The MongoDbProvider and the MongoDbConnection classes aren't easily unitTested. Essentially they don't expose anyways to ask them about the database or collection (or even the connection for that matter) based on how they currently sit. So Im finding that I have to make some more changes to make them really testable.
  2. I am getting blocked by the io.fabric5:docker-maven-plugin and it's unhappiness that I am running on a M1 Mac. It can't seem to resolve any of the docker server connections so it fails when trying to start docker. I have verified that it's all running, but seem to be stuck. Has anyone recently hit this issue with the Mac M1 process and docker? According to the docker-maven-plugin this was supposed to be resolved in 0.38 and the version mine is finding is 0.45.1.

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jesmith17,

This looks almost done to me, thanks! It is still missing some details:

  • can you modify the config files in log4j-mongodb/src/test/resources to use the databaseName and collectionName attributes?
  • can you add a changelog entry to src/changelog/.2.x.x? Just copy/paste an existing one and modify the data.

@jesmith17
Copy link
Author

Unable to do the changelog entry as this work is done on the main (3.x) branch and that file doesn't exist. I can work to back-port (or recreate these in the 2.x structure and can make those changes to that version as needed.

@ppkarwasz
Copy link
Contributor

Unable to do the changelog entry as this work is done on the main (3.x) branch and that file doesn't exist. I can work to back-port (or recreate these in the 2.x structure and can make those changes to that version as needed.

Sorry, I meant src/changelog/.3.x.x 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants