tls: certificates auto renewal will become stuck if issuer is changed between config reloads #6732

Open
WeidiDeng opened this issue Dec 6, 2024 · 3 comments
@WeidiDeng
Member

When the config is reloaded with a changed ACME issuer, certmagic checks, the next time a certificate is due for renewal, whether a certificate created by the new issuer exists in storage. It does not exist, because obtaining it from the new issuer is exactly what we want to happen in the first place. certmagic will then try in vain for 30 days to renew it.

Detailed explanation:

When Caddy starts, a global TLS certificate cache is created if needed:

certCacheMu.Lock()
if certCache == nil {
	certCache = certmagic.NewCache(cacheOpts)
} else {
	certCache.SetOptions(cacheOpts)
}
certCacheMu.Unlock()

It is destroyed when TLS is no longer used:

} else {
	// no more TLS app running, so delete in-memory cert cache
	certCache.Stop()
	certCacheMu.Lock()
	certCache = nil
	certCacheMu.Unlock()
}

The TLS cache starts renewing certificates in the background:

https://github.com/caddyserver/certmagic/blob/3fcd710c0cfc6d80026011c8ef9b0d7e94860b2b/cache.go#L127

Managed domains are updated through caddy configuration.

Eventually, renewal will be done here

https://github.com/caddyserver/certmagic/blob/3fcd710c0cfc6d80026011c8ef9b0d7e94860b2b/maintain.go#L235

The TLS cache will try to renew the certificate using the latest issuer URL, but first it checks that the existing certificate is present in storage:

https://github.com/caddyserver/certmagic/blob/3fcd710c0cfc6d80026011c8ef9b0d7e94860b2b/config.go#L807-L812

It doesn't exist there because the old certificate is from a different issuer, while the path being checked is derived from the latest issuer.
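To make the mismatch concrete, here is a minimal sketch of how an issuer-dependent storage path leads to the failed lookup. The `certKeyPath` helper is hypothetical and is not certmagic's API; it only mimics the directory layout visible in the log lines later in this thread, and the `old-acme` / `new-acme` names are taken from those paths.

```go
package main

import (
	"fmt"
	"path"
)

// certKeyPath is a hypothetical helper (NOT certmagic's API). It mimics the
// layout seen in the logs: <root>/certificates/<issuer>/<domain>/<domain>.key
func certKeyPath(root, issuerKey, domain string) string {
	return path.Join(root, "certificates", issuerKey, domain, domain+".key")
}

func main() {
	// The certificate in the cache was stored under the old issuer...
	oldPath := certKeyPath("/tmp/caddy", "old-acme", "example.com")
	// ...but after the reload, the lookup is built from the new issuer.
	newPath := certKeyPath("/tmp/caddy", "new-acme", "example.com")

	fmt.Println("stored under:", oldPath)
	fmt.Println("looked up at:", newPath)
	fmt.Println("same path:", oldPath == newPath)
}
```

Because the issuer is part of the path, the lookup after the reload can never find the file that was written before the reload.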

This will be retried here

https://github.com/caddyserver/certmagic/blob/3fcd710c0cfc6d80026011c8ef9b0d7e94860b2b/config.go#L982

There are at least two ways to work around this: restart Caddy, or remove the active Caddy configuration and load it again, so that Caddy realizes these certificates don't exist and should be created instead.

@mholt
Member

mholt commented Dec 6, 2024

Thanks for the report. I know you did in Slack, but could you share your logs here too? That way they are on the record, and when I go to fix this I can make sure the proper code paths are recreated and I fix the right problem. :)

@mholt mholt added the bug 🐞 Something isn't working label Dec 6, 2024
@WeidiDeng
Member Author

This is the screenshot that shows the stuck job:

Image

The log is as follows:

Nov 12 16:37:54 linux caddy[455]: {"level":"warn","ts":1731400674.037309,"logger":"tls.cache.maintenance","msg":"error while checking if stored certificate is also expiring soon","identifiers":["example.com"],"error":"open /tmp/caddy/certificates/new-acme/example.com/example.com.key: no such file or directory"}
Nov 12 16:37:54 linux  caddy[455]: {"level":"info","ts":1731400674.0373702,"logger":"tls.cache.maintenance","msg":"certificate expires soon; queuing for renewal","identifiers":["example.com"],"remaining":1956125.962630629}
Nov 12 16:37:54 linux  caddy[455]: {"level":"info","ts":1731400674.0377727,"logger":"tls.cache.maintenance","msg":"attempting certificate renewal","identifiers":["example.com"],"remaining":1956125.962229608}
Nov 12 16:37:54 linux  caddy[455]: {"level":"info","ts":1731400674.1341906,"logger":"tls.renew","msg":"acquiring lock","identifier":"example.com"}
Nov 12 16:37:54 linux  caddy[455]: {"level":"info","ts":1731400674.1503463,"logger":"tls.renew","msg":"lock acquired","identifier":"example.com"}
Nov 12 16:37:54 linux  caddy[455]: {"level":"error","ts":1731400674.1505845,"logger":"tls.renew","msg":"will retry","error":"open /tmp/caddy/certificates/new-acme/example.com/example.com.key: no such file or directory","attempt":1,"retrying_in":60,"elapsed":0.000176754,"max_duration":2592000}
Nov 12 16:38:54 linux  caddy[455]: {"level":"error","ts":1731400734.152456,"logger":"tls.renew","msg":"will retry","error":"open /tmp/caddy/certificates/new-acme/example.com/example.com.key: no such file or directory","attempt":2,"retrying_in":120,"elapsed":60.002046754,"max_duration":2592000}
Nov 12 16:40:54 linux  caddy[455]: {"level":"error","ts":1731400854.1529422,"logger":"tls.renew","msg":"will retry","error":"open /tmp/caddy/certificates/new-acme/example.com/example.com.key: no such file or directory","attempt":3,"retrying_in":120,"elapsed":180.002532897,"max_duration":2592000}

The log lines "certificate expires soon; queuing for renewal" and "no such file or directory" appear many more times later and are omitted here.

The cached certificate is from /tmp/caddy/certificates/old-acme/example.com/example.com.key. To fix it, I reverted the ACME URL, deleted the config, and reloaded.

@crzdg

crzdg commented Dec 24, 2024

I am experiencing a similar issue with "no such file or directory". However, neither my config nor the issuer has changed. At least, I cannot see why it would have.
