Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KVv2 secrets are always entirely removed from Vault when RandomSecret is deleted from Kubernetes since v0.8.17 #196

Open
davoustp opened this issue Sep 15, 2023 · 20 comments

Comments

@davoustp
Copy link
Contributor

davoustp commented Sep 15, 2023

Release v0.8.17 changes the logic to delete the KVv2 secret entirely from Vault whenever the RandomSecret Kubernetes resource is deleted, instead of removing only the last version of the Vault secret (releases v0.8.16 and below).

As mentioned in #134 (comment) , there are situations where deleting only the last version of a KVv2 secret is still desirable.

A very common example is to use a RandomSecret to generate a root password or database admin password.

In such a case, you would like to protect this KVv2 secret at any cost - loosing it for any reason (think about an erroneous gitops operation removing the RandomSecret resource, or even a straight kubectl deletion) would be catastrophic as the secret can only be recovered from a Vault backup.

So I'd suggest to make this KVv2 deletion logic configurable.

The deletion logic (either delete-last-version or delete-all-versions) could be set using:

  1. an environment variable
  2. a new RandomSecret spec field (requires changing the CRD here)

Consider .2 first, then if not set, then use 1. , and if not set either, use the current behavior.

This looks to maximize the use case coverage and flexibility.

@trevorbox @raffaelespazzoli does this make sense?
I can provide a PR along this lines.

@davoustp davoustp changed the title RandomSecret are always entirely removed from Vault when deleted from Kubernetes since v0.8.17 KVv2 secret are always entirely removed from Vault when RandomSecret is deleted from Kubernetes since v0.8.17 Sep 15, 2023
@davoustp davoustp changed the title KVv2 secret are always entirely removed from Vault when RandomSecret is deleted from Kubernetes since v0.8.17 KVv2 secrets are always entirely removed from Vault when RandomSecret is deleted from Kubernetes since v0.8.17 Sep 15, 2023
@smnbbrv
Copy link

smnbbrv commented Oct 5, 2023

I find this very disturbing as well. As for me, I'd completely block the secret removal instead of even the version removal. If the secret has only one version and then this version gets removed, the entire secret is removed as well, if I get it right?

@davoustp
Copy link
Contributor Author

davoustp commented Oct 5, 2023

I find this very disturbing as well. As for me, I'd completely block the secret removal instead of even the version removal. If the secret has only one version and then this version gets removed, the entire secret is removed as well, if I get it right?

With v0.8.17+, the entire secret is removed from Vault, irrespective of the number of versions.
Previous behavior (for KVv2 secrets) was to mark the last version as deleted, so when attempting to access this secret, you would get an error stating that this secret does not exist. However, you could go to Vault and undelete this latest version, making it available again (no data loss here).

In some cases, the entire removal could be useful as well, such as a testing system with a lot of automation around it: not removing these secrets from Vault would leave a lot of useless mess behind, even potentially creating unexpected behaviour if reusing the same secrets across testing sessions.

So what I suggest is that the deletion logic should be made configurable to suit the use case at hand.
Some users may decide that no secret should be deleted for production systems, because too sensitive if secret is lost.
Some others could choose to delete the last version only, so that there is an error but without data loss (requiring manual undelete if this was a mistake).
Some could decide that the full removal better suits their need because their deployment process is stable and robust enough to avoid the situation entirely...

My point is that there is no one-suits-all deletion logic and should be made configurable...

@trevorbox
Copy link
Contributor

I find this very disturbing as well. As for me, I'd completely block the secret removal instead of even the version removal. If the secret has only one version and then this version gets removed, the entire secret is removed as well, if I get it right?

With v0.8.17+, the entire secret is removed from Vault, irrespective of the number of versions. Previous behavior (for KVv2 secrets) was to mark the last version as deleted, so when attempting to access this secret, you would get an error stating that this secret does not exist. However, you could go to Vault and undelete this latest version, making it available again (no data loss here).

In some cases, the entire removal could be useful as well, such as a testing system with a lot of automation around it: not removing these secrets from Vault would leave a lot of useless mess behind, even potentially creating unexpected behaviour if reusing the same secrets across testing sessions.

So what I suggest is that the deletion logic should be made configurable to suit the use case at hand. Some users may decide that no secret should be deleted for production systems, because too sensitive if secret is lost. Some others could choose to delete the last version only, so that there is an error but without data loss (requiring manual undelete if this was a mistake). Some could decide that the full removal better suits their need because their deployment process is stable and robust enough to avoid the situation entirely...

My point is that there is no one-suits-all deletion logic and should be made configurable...

I agree it would be safer to keep it configurable. The conditional should probably apply to both KVv1 and KVv2 RandomSecret types - completely remove the secret in vault or not upon RandomSecret deletion.

I don't see the point in worrying about deleting specific versions in a KVv2 secret, are there use cases anyone cares to share? I think it would be good to keep the deletion logic consistent/simple... completely delete or do nothing to the secret in vault.

@davoustp
Copy link
Contributor Author

davoustp commented Oct 6, 2023

@trevorbox thanks for the feedback.

I agree it would be safer to keep it configurable. The conditional should probably apply to both KVv1 and KVv2
RandomSecret types - completely remove the secret in vault or not upon RandomSecret deletion.

Ok, that makes sense.
Before I get into the coding phase, let's agree on details.
The global environment variable to drive this behavior could be named KEEP_VAULT_KV_ON_REMOVAL (when true, then KV secrets are NOT removed from Vault when the resource is deleted from k8s).
Thoughts, any other proposal?

Second, is there a need to override this settings at RandomSecret resource level? If yes, what could be the (optional) field's name?

I don't see the point in worrying about deleting specific versions in a KVv2 secret, are these use cases anyone cares to share? I think it would be good to keep the deletion logic consistent/simple... completely delete or do nothing to the secret in vault.

I can imagine some as written above (an additional one was to maintain previous versions behavior), but I personally have no such need.

@smnbbrv
Copy link

smnbbrv commented Oct 6, 2023

Probably it makes sense to reuse the k8s naming convention that they use for PersistentVolume that covers same issues.

https://kubernetes.io/docs/concepts/storage/persistent-volumes/#reclaiming

@davoustp
Copy link
Contributor Author

davoustp commented Oct 6, 2023

@smnbbrv you suggest to use the word "retain" instead of "keep" so that the environment variable is named something like RETAIN_VAULT_KV_ON_REMOVAL?
I like that idea. 👍

@trevorbox
Copy link
Contributor

@davoustp might be better to add another option to the the RandomSecretSpec for more granular control of deletion behavior? The retainVaulltKVSecretOnRemoval option should default to false to match existing behavior. Perhaps a RETAIN_VAULT_KV_SECRET_ON_REMOVAL_OVERRIDE env variable could override any individual option to make things simpler.

@davoustp
Copy link
Contributor Author

davoustp commented Oct 6, 2023

@trevorbox agreed, more granular control is better.
Last one: what's the best precedence? Use the envvar settings (defaulting to current behaviour) that can be overriden at resource level?
Or the other way around?

@smnbbrv
Copy link

smnbbrv commented Oct 6, 2023

@smnbbrv you suggest to use the word "retain" instead of "keep" so that the environment variable is named something like RETAIN_VAULT_KV_ON_REMOVAL?

Not really, I would just call it policy. KV_SECRET_RETAIN_POLICY or similar, that can further accept values like retain, delete or even deleteLastVersion in case it is implemented

@trevorbox
Copy link
Contributor

@trevorbox agreed, more granular control is better. Last one: what's the best precedence? Use the envvar settings (defaulting to current behaviour) that can be overriden at resource level? Or the other way around?

We need to decide if it would be useful to override any and all settings, if not then I don't see the point in an env var. I could see a use case to be able to override for testing purposes. Thoughts @davoustp @smnbbrv ?

@smnbbrv you suggest to use the word "retain" instead of "keep" so that the environment variable is named something like RETAIN_VAULT_KV_ON_REMOVAL?

Not really, I would just call it policy. KV_SECRET_RETAIN_POLICY or similar, that can further accept values like retain, delete or even deleteLastVersion in case it is implemented

I think that's a good idea, the field in the spec could be kvSecretRetainPolicy to default to delete with support for retain for now.

@davoustp
Copy link
Contributor Author

davoustp commented Oct 9, 2023

We need to decide if it would be useful to override any and all settings, if not then I don't see the point in an env var. I could see a use case to be able to override for testing purposes. Thoughts @davoustp @smnbbrv ?

@trevorbox when running in production, I would shoot for the safest policy - that is, retaining the Vault KV secret on deletion. If there is no way to set this globally as the default value, then it would mean that we need to explicitly set the flag on each and every RandomSecret resource, which is a pain.
Hence the proposal to use of an env var that would allow setting this globally (but any other global mechanism to set this flag would fit as well).

I think that's a good idea, the field in the spec could be kvSecretRetainPolicy to default to delete with support for retain for now.

Agreed.

@raffaelespazzoli
Copy link
Collaborator

guys I have been following the discussion.
I have two questions:

  1. if we control this with an environment variable the behavior will be applied to all the random secrets. Should we (also) control this behavior via a field in the CR?
  2. if I chose to retain a secret and the RandomSecret CR is deleted and then recreated, what behavior do you expect?

@davoustp
Copy link
Contributor Author

@raffaelespazzoli thanks for the (very relevant) questions! :-)

if we control this with an environment variable the behavior will be applied to all the random secrets. Should we (also) control this behavior via a field in the CR?

I think so - the envvar would be here to control the default policy, and the field to override this policy at resource level.

if I chose to retain a secret and the RandomSecret CR is deleted and then recreated, what behavior do you expect?

Unless the RandomSecret have periodic rotation defined, if it exists, leave it alone, as the sole goal of the rotation-less RandomSecret is to generate a Vault secret, which already exists in this case.

@davoustp
Copy link
Contributor Author

To be totally honest, the downside that I see with the environment variable defining the default policy is that the behavior of the RandomSecret resources is not self-defined anymore: you need to know the value of the envvar (if set) in addition to the resource spec to understand how it behaves globally.

And I dislike this situation - a lot.

It could prove useful, but it will also create headaches: each time a new issue is created with RandomSecrets will raise the question about the envvar... or deploying the same resources in two similar environments may lead to different behavior just because this envvar is not set to the same value.

So now, I'm backtracking on this envvar proposal of mine - I think that we should drop it and keep only the resource field.
This is more consistent in the long run.

Suggestions?

@trevorbox
Copy link
Contributor

To be totally honest, the downside that I see with the environment variable defining the default policy is that the behavior of the RandomSecret resources is not self-defined anymore: you need to know the value of the envvar (if set) in addition to the resource spec to understand how it behaves globally.

And I dislike this situation - a lot.

It could prove useful, but it will also create headaches: each time a new issue is created with RandomSecrets will raise the question about the envvar... or deploying the same resources in two similar environments may lead to different behavior just because this envvar is not set to the same value.

So now, I'm backtracking on this envvar proposal of mine - I think that we should drop it and keep only the resource field. This is more consistent in the long run.

Suggestions?

I think it most reasonable to simply add the retention policy to the CR and default to delete with an option to retain. The status section in the RandomSecret CR could explain an override env variable but its probably not intuitive so lets exclude that idea for now.

@davoustp
Copy link
Contributor Author

Ok. Let me summarize, this means:

  • adding of a new optional RandomSecret spec field named kvSecretRetainPolicy, accepting delete (default) and retain
  • updating the documentation
  • adding the related tests

@trevorbox @raffaelespazzoli @smnbbrv agreed?

If yes, I'll work along those lines and propose a PR.

@smnbbrv
Copy link

smnbbrv commented Oct 16, 2023

@davoustp LGTM, thank you! The default delete would also be aligned with https://kubernetes.io/docs/concepts/storage/storage-classes/#reclaim-policy

@raffaelespazzoli
Copy link
Collaborator

@davoustp I like it.

davoustp added a commit to davoustp/vault-config-operator that referenced this issue Oct 17, 2024
…and logic

Including one-off existing Vault secret overwrite protection
davoustp added a commit to davoustp/vault-config-operator that referenced this issue Oct 17, 2024
…and logic

Including one-off existing Vault secret overwrite protection
@davoustp
Copy link
Contributor Author

Hi @trevorbox , @smnbbrv and @raffaelespazzoli
Sorry, been busy :-( - and now I finally got a few hours to get back to this.

I just implemented what was discussed above a while ago (you can can have a look at fork above, code changes are quite small):

  • added the additional optional retention policy attribute to RandomSecret
  • added existing Vault secret overwrite protection upon one-off RandomSecret creation (no refresh period defined)
  • add a documentation section for this, describing the behaviour and use case
  • manually tested all cases I could think of (incl. v1 and v2)

However, and before I create the PR, I failed to find any existing RandomSecret integration tests that I could enrich with these use cases.
Are there any?
If not, I'm not that confident to scaffold everything from scratch, to be honest... What would be the best course of actions?

@davoustp
Copy link
Contributor Author

Hi @raffaelespazzoli @trevorbox ,
Would it be possible to have a look at that PR and provide feedback, please?
We've been bitten once again by RandomSecrets removal leading to the deletion of the corresponding Vault secret...
Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants