k3s server fails to start when using postgres datastore with no existing database named postgres #9033

Closed
mortenlj opened this issue Dec 10, 2023 · 11 comments · Fixed by k3s-io/kine#258


@mortenlj
Contributor

Environmental Info:
K3s Version: v1.28.4+k3s2

Node(s) CPU architecture, OS, and Version:

Linux rpi4b01 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux

Cluster Configuration:

1 server, 5 agents, on various kinds of Raspberry Pi. Using PostgreSQL external datastore, hosted on aiven.io (free tier).

Describe the bug:

When starting up, the server attempts to connect to the postgres database to check if the wanted database exists.
When there is no database named postgres on the server, k3s fails and exits.

Steps To Reproduce:

  • Installed K3s: Used DietPi installation, but I believe installation method to be irrelevant.
  • Configure PostgreSQL external datastore with database named something other than postgres
  • Ensure no database named postgres exists on the database server
  • Start k3s server

Expected behavior:

I expected k3s to connect to the configured database and start normally

Actual behavior:

k3s server attempts to connect to postgres database, fails to connect and exits with an error

Additional context / logs:

https://rancher-users.slack.com/archives/CGGQEHPPW/p1702161565508369
k3s-io/kine#241
https://www.postgresql.org/docs/9.1/creating-cluster.html

@Nyctelor

Nyctelor commented Dec 11, 2023

I'll add my own config to this issue:
--datastore-endpoint=postgres://[ROLENAME]:[PASSWORD]@[HOST]:5432/k3sdb

starting kubernetes: preparing server: creating storage endpoint: building kine: failed to connect to host=[HOST] user=[ROLENAME] database=postgres:

The database name the process is trying to connect to does not match the database name sent by --datastore-endpoint
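To make the mismatch concrete, here is a minimal Go sketch that extracts the database name from an endpoint in the format shown above. `databaseName` is a hypothetical helper written for this illustration, not a kine function, and the host and credentials are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// databaseName returns the database named in the path of a postgres:// endpoint.
// Hypothetical helper for illustration only; it is not part of kine.
func databaseName(endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	// The URL path is "/<database>", so drop the leading slash.
	return strings.TrimPrefix(u.Path, "/"), nil
}

func main() {
	// Placeholder role, password and host.
	name, err := databaseName("postgres://k3s:secret@db.example.com:5432/k3sdb")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("configured database:", name) // configured database: k3sdb
}
```

The endpoint names `k3sdb`, while the error message reports `database=postgres`, which is the mismatch described here.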

@brandond
Member

The postgres database is expected to exist by default, and is used temporarily while testing to see if the selected database exists or needs to be created. From the docs linked above:

> After initialization, a database cluster will contain a database named postgres, which is meant as a default database for use by utilities, users and third party applications. The database server itself does not require the postgres database to exist, but many external utility programs assume it exists.

@Nyctelor

Ok, thank you for the clarification. I added a line to pg_hba to let my k3s server reach the postgres DB and it is finally working.

@brandond
Member

We can look at a way to address that, but the current mechanism was supposed to be an improvement over the previous approach, which always tried to create the k3s database, and just handled the failure - but not in a consistently portable way. The new approach connects temporarily to the postgres database, and queries to see if the k3s database exists - and only creates it if it does not.
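The check-then-create flow described here can be sketched roughly as follows. This is a simplified model and not kine's actual code: the `adminConn` interface, its method names, and `fakeConn` are all hypothetical stand-ins for a temporary connection to the `postgres` database.

```go
package main

import (
	"errors"
	"fmt"
)

// adminConn stands in for a temporary connection to the "postgres" database.
// The interface and its method names are hypothetical.
type adminConn interface {
	DatabaseExists(name string) (bool, error)
	CreateDatabase(name string) error
}

// ensureDatabase creates name only when the existence check reports it missing.
// Every failure is returned to the caller, which is why an unreachable
// "postgres" database ends up being fatal in this model.
func ensureDatabase(conn adminConn, name string) (bool, error) {
	exists, err := conn.DatabaseExists(name)
	if err != nil {
		return false, fmt.Errorf("checking for database %q: %w", name, err)
	}
	if exists {
		return false, nil
	}
	if err := conn.CreateDatabase(name); err != nil {
		return false, fmt.Errorf("creating database %q: %w", name, err)
	}
	return true, nil
}

// fakeConn is an in-memory stand-in used only to exercise the flow.
type fakeConn struct {
	reachable bool
	databases map[string]bool
}

func (f *fakeConn) DatabaseExists(name string) (bool, error) {
	if !f.reachable {
		return false, errors.New("failed to connect to database=postgres")
	}
	return f.databases[name], nil
}

func (f *fakeConn) CreateDatabase(name string) error {
	if !f.reachable {
		return errors.New("failed to connect to database=postgres")
	}
	f.databases[name] = true
	return nil
}

func main() {
	conn := &fakeConn{reachable: true, databases: map[string]bool{"postgres": true}}
	created, err := ensureDatabase(conn, "k3sdb")
	fmt.Println(created, err) // true <nil>

	// With no reachable "postgres" database the check itself fails.
	unreachable := &fakeConn{reachable: false, databases: map[string]bool{}}
	_, err = ensureDatabase(unreachable, "k3sdb")
	fmt.Println(err)
}
```

The second call shows the failure mode reported in this issue: the target database may be perfectly usable, but the check never gets that far.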

@mortenlj
Contributor Author

I guess an argument to be made here is if k3s/kine really should be responsible for creating the database?

Creating the tables and anything else is normal and fine, but creating the database itself is in my experience often handled outside the application that uses it, because it requires giving quite elevated privileges to the application for something that should only be needed once in the application lifecycle.

@brandond
Member

brandond commented Dec 11, 2023

> I guess an argument to be made here is if k3s/kine really should be responsible for creating the database?

It seems to be the behavior that most users expect, so that's what we've always done. Changing that now is not likely.

> creating the database itself is in my experience often handled outside the application that uses it

Most users are not using k3s with a DBA team that handles database maintenance; they just create a managed database instance (think AWS RDS) and pass the credentials to k3s/kine. That was the original use case for kine and is still what we see most frequently.

@mortenlj
Contributor Author

Would it make sense to change it so that any errors inside createDBIfNotExist are logged as warnings but don't cause the server to exit?

Then you get the following scenarios:

| Database exists at start | Able to check for existence | Able to create | Consequence |
|---|---|---|---|
| Yes | Yes | Yes | Everything works |
| Yes | Yes | No | Everything works |
| Yes | No | Yes | Warning about not being able to check for existence, but everything works |
| Yes | No | No | Warning about not being able to check for existence, warning about not being able to create database, but everything works |
| No | Yes | Yes | Database created, everything works |
| No | Yes | No | Warning about not being able to create database, then error because database not found |
| No | No | Yes | Warning about not being able to check for existence, database created, then everything works |
| No | No | No | Warning about not being able to check for existence, warning about not being able to create database, then error because database not found |
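The eight scenarios listed above can be modelled with a small Go simulation. This is an illustrative sketch of the proposal, not kine's `createDBIfNotExist`: the `outcome` type and `simulate` function are hypothetical, and it assumes that attempting to create an already existing database is treated as a harmless no-op.

```go
package main

import "fmt"

// outcome records what a tolerant startup would report for one scenario.
type outcome struct {
	warnings []string
	created  bool
	ok       bool // true when the final connection to the target database succeeds
}

// simulate models the proposal: check and create failures become warnings,
// and only the final connection to the configured database decides success.
func simulate(exists, canCheck, canCreate bool) outcome {
	var out outcome
	needCreate := !exists
	if !canCheck {
		out.warnings = append(out.warnings, "unable to check for existence")
		needCreate = true // existence is unknown, so a create is attempted anyway
	}
	if needCreate {
		switch {
		case !canCreate:
			out.warnings = append(out.warnings, "unable to create database")
		case !exists:
			out.created = true
			exists = true
		}
		// Assumption: creating a database that already exists is a no-op.
	}
	out.ok = exists
	return out
}

func main() {
	for _, exists := range []bool{true, false} {
		for _, canCheck := range []bool{true, false} {
			for _, canCreate := range []bool{true, false} {
				o := simulate(exists, canCheck, canCreate)
				fmt.Printf("exists=%t check=%t create=%t -> warnings=%d created=%t ok=%t\n",
					exists, canCheck, canCreate, len(o.warnings), o.created, o.ok)
			}
		}
	}
}
```

In this model the server only fails when the database is missing and cannot be created, which is the one case where failing is unavoidable.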

@brandond
Member

brandond commented Dec 12, 2023

Sure, if you're up for it, a PR to https://github.com/k3s-io/kine would be welcome!

@ringerc

ringerc commented Jan 23, 2024

> It seems to be the behavior that most users expect, so that's what we've always done. Changing that now is not likely.

Most users expect insecure, but convenient, defaults.

That doesn't make them a good idea.

I explained here #9111 (comment) why this is a nice "quickstart setup" style option, but totally inappropriate for production deployments.

k3s should connect to the database it's given, with the role it's given.

If that fails, it can try to fall back to creating a DB, but that should really be behind a non-default config option and/or require the admin to configure a separate connection string for a more privileged database role with admin rights, e.g. --datastore-admin-endpoint=postgres://[ADMINROLENAME]:[PASSWORD]@[HOST]:5432/k3sdb. Ideally this connection string should only be available to the app during initial setup, then discarded once the app's dedicated database and DB user (role) are created.

(k3s should really also refuse to run if the query SHOW is_superuser returns on (true) for normal operational database use rather than admin use; this indicates that it's running with a way-over-privileged role.)
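A rough Go sketch of such a check, assuming the value of `SHOW is_superuser` has already been fetched as text. The `showFunc` type is a hypothetical seam so the example runs without a database driver, and what to do with the result (warn or refuse to start) is left to the caller.

```go
package main

import (
	"fmt"
	"strings"
)

// showFunc runs a SHOW statement and returns its single text value.
// Hypothetical seam for illustration; a real caller would query the database.
type showFunc func(setting string) (string, error)

// isSuperuser reports whether the current role is a superuser, based on the
// value of the is_superuser setting ("on" or "off").
func isSuperuser(show showFunc) (bool, error) {
	v, err := show("is_superuser")
	if err != nil {
		return false, fmt.Errorf("reading is_superuser: %w", err)
	}
	return strings.EqualFold(strings.TrimSpace(v), "on"), nil
}

func main() {
	// Canned response standing in for a real connection.
	fake := func(string) (string, error) { return "on", nil }
	super, err := isSuperuser(fake)
	if err != nil {
		fmt.Println("check failed:", err)
		return
	}
	if super {
		fmt.Println("warning: datastore role is a superuser")
	}
}
```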

@brandond
Member

brandond commented Jan 24, 2024

If you have suggestions on how to follow the principle of least privilege (don't access things you don't need), alongside the principle of least surprise (don't fail to run if there's a problem that can be fixed automatically, such as creating a database for the user), while also maintaining as much legacy behavior as possible to reduce disruption when upgrading... a PR to https://github.com/k3s-io/kine/blob/master/pkg/drivers/pgsql/pgsql.go would be appreciated.

Recent changes to postgres support have all been community driven, and while these ideas all sound reasonable, we don't have any DBAs on the team and are not well positioned to referee arguments about how a database cluster should be run.

Do note however that we are unlikely to accept any change that suddenly causes k3s to refuse to run if it has "excessive" privilege, as we commonly see users configuring k3s to use the default admin account provided by a managed database service or prepackaged container image.

@ringerc

ringerc commented Jan 26, 2024

> Do note however that we are unlikely to accept any change that suddenly causes k3s to refuse to run if it has "excessive" privilege, as we commonly see users configuring k3s to use the default admin account provided by a managed database service or prepackaged container image.

Right; since that's been historically permitted, it should emit at most a warning. Then add a non default feature flag warning that eventually that capability will go away, etc. Longer term stuff. While ideally it should not have permitted this configuration in the first place, it's a bit late to just turn it off.

As for the rest, I'll have to see how the k3s testing in the org I'm working with goes. Thanks for the pointer on where to start.
