Got backup PartiallyFailed result when backing up PVCs which are not used by any pod #7233

Closed
danfengliu opened this issue Dec 20, 2023 · 11 comments

@danfengliu
Contributor

Describe the problem/challenge you have

The namespace being backed up contains PVCs that are not used by any pod, and the backup ends with a PartiallyFailed result.

k get pvc -n azure-csi-test
NAME               STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nginx-logs-e2e-1   Bound     pvc-e4dda753-8c88-4194-93b5-959a01be1de4   1Gi        RWO            managed-csi    16h
nginx-logs-e2e-2   Pending                                                                        managed-csi    16h
nginx-logs-e2e-3   Pending                                                                        managed-csi    16h
nginx-logs-e2e-4   Pending                                                                        managed-csi    16h
nginx-logs-e2e-5   Pending                                                                        managed-csi    16h

velero describe backup backup-csi-6 --details
Name:         backup-csi-6
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.28.0
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=28

Phase:  PartiallyFailed (run `velero backup logs backup-csi-6` for more information)


Errors:
  Velero:    name: /nginx-logs-e2e-2 message: /Error backing up item error: /error executing custom action (groupResource=persistentvolumeclaims, namespace=azure-csi-test, name=nginx-logs-e2e-2): rpc error: code = Unknown desc = PVC azure-csi-test/nginx-logs-e2e-2 has no volume backing this claim
             name: /nginx-logs-e2e-3 message: /Error backing up item error: /error executing custom action (groupResource=persistentvolumeclaims, namespace=azure-csi-test, name=nginx-logs-e2e-3): rpc error: code = Unknown desc = PVC azure-csi-test/nginx-logs-e2e-3 has no volume backing this claim
             name: /nginx-logs-e2e-4 message: /Error backing up item error: /error executing custom action (groupResource=persistentvolumeclaims, namespace=azure-csi-test, name=nginx-logs-e2e-4): rpc error: code = Unknown desc = PVC azure-csi-test/nginx-logs-e2e-4 has no volume backing this claim
             name: /nginx-logs-e2e-5 message: /Error backing up item error: /error executing custom action (groupResource=persistentvolumeclaims, namespace=azure-csi-test, name=nginx-logs-e2e-5): rpc error: code = Unknown desc = PVC azure-csi-test/nginx-logs-e2e-5 has no volume backing this claim

Describe the solution you'd like
A warning should be enough to let the user notice that this workload might have an issue.

Anything else you would like to add:

Environment:

  • Velero version (use velero version):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "The project would be better with this feature added"
  • 👎 for "This feature will not enhance the project in a meaningful way"
@danfengliu danfengliu added the "Needs triage" and "target/FC" labels Dec 20, 2023
@danfengliu danfengliu added this to the v1.13 milestone Dec 20, 2023
@blackpiglet
Contributor

blackpiglet commented Dec 20, 2023

I agree we should do better in this scenario.
There are some similar cases to this. Let's settle on a unified solution to all of them.
IMO, we should consider the k8s resources that are crucial for Velero, such as Pod, PV, and PVC.

First, the potential errors should be converted to warnings.
Second, we need to consider whether the volumes should be tracked by the skipped PV trackers.

@ywk253100 ywk253100 removed the "Needs triage" and "target/1.13-rc1" labels Dec 20, 2023
@ywk253100 ywk253100 removed this from the v1.13 milestone Dec 20, 2023
@hsinhoyeh

Hi team, thanks for creating this backup/restore tooling. Unfortunately, we hit this issue because our serverless applications rely on an RWX-mode PVC. We chose to run the backup at midnight, when traffic is low and the number of running pods is scaled down to zero. But the backup didn't cover the PVCs for our serverless applications :(

@blackpiglet
Contributor

@hsinhoyeh
Could you give more information about your scenario?
Do you know if you use the Filesystem or volume snapshot backup?
Is the PVC mounted by multiple pods when the backup is in progress?

@hsinhoyeh

@hsinhoyeh Could you give more information about your scenario? Do you know if you use the Filesystem or volume snapshot backup? Is the PVC mounted by multiple pods when the backup is in progress?

Hi @blackpiglet, we use file-system backup. The PVC is supposed to be mounted by multiple pods (access mode RWX). Having said that, during the backup our pods mostly read from the PVC rather than write to it.

@blackpiglet
Contributor

@hsinhoyeh
Thanks for the feedback.
Could you share the backup command or the backup CR YAML?

@blackpiglet
Contributor

If no pod mounts the PVC while a backup is running, the file-system backup cannot cover the PVC, because the file-system uploader reads the PVC's volume data through the pod's mount directory on the k8s node.
Please read the PodVolumeBackup description to understand how it works: https://velero.io/docs/v1.13/file-system-backup/#custom-resource-and-controllers.
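
To check ahead of time which PVCs would be missed for this reason, one can compare the claims that running pods reference against the claims in the namespace. The following is only a minimal sketch, not Velero code: the function name `unmounted_pvcs` is hypothetical, and it assumes the output of `kubectl get pods -n <namespace> -o json` and `kubectl get pvc -n <namespace> -o json` has already been parsed into dicts for the same namespace.

```python
from __future__ import annotations


def unmounted_pvcs(pods: dict, pvcs: dict) -> list[str]:
    """Return the names of PVCs that no Running pod mounts.

    Both arguments are parsed `kubectl get ... -o json` lists taken from
    the same namespace (hypothetical helper, not part of Velero).
    """
    mounted = set()
    for pod in pods.get("items", []):
        # Assumption: only Running pods have a mount directory on a node
        # that a file-system uploader could read from.
        if pod.get("status", {}).get("phase") != "Running":
            continue
        for volume in pod.get("spec", {}).get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            if claim:
                mounted.add(claim["claimName"])

    return sorted(
        item["metadata"]["name"]
        for item in pvcs.get("items", [])
        if item["metadata"]["name"] not in mounted
    )
```

Any name this returns is a PVC that a file-system backup taken at that moment would likely not cover, so it may be worth running the check at the scheduled backup time (for example, at midnight when pods are scaled to zero).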

For your scenario, if the PVC's volume supports the snapshot function, then we can use snapshot to back up the data.

@reasonerjt
Contributor

This is working as expected. I don't think we want to change the error into a warning, which would be a breaking change.

@stp-bsh

stp-bsh commented Jul 12, 2024

Is there any way to exclude PVCs with unbound PVs? As long as there is not, I would see this as a Warning and not as an Error.

@Varjoissa

Yes, even though this issue is closed, I think it should be re-opened.
PVCs can still be in Pending state for several reasons. If so, Velero should either warn or skip snapshotting the data, since there is no PV.
It seems out of scope for Velero to judge whether a PVC should be provisioned. If it is not provisioned with a PV, there is simply no PV to back up or snapshot.

Is there no possibility to add a pre-backup hook or add a status filter to the resource-policy configmap?
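
Until something like that exists, a status filter can be approximated outside Velero. The sketch below is only an illustration under stated assumptions: the helper names `unbound_pvcs` and `label_commands` are hypothetical, the input is the parsed output of `kubectl get pvc -n <namespace> -o json`, and it relies on Velero's documented `velero.io/exclude-from-backup=true` label. Whether excluding the PVC this way avoids the error in every plugin path is an assumption that has not been verified here.

```python
from __future__ import annotations

# Label documented by Velero for excluding individual resources from backups.
EXCLUDE_LABEL = "velero.io/exclude-from-backup=true"


def unbound_pvcs(pvcs: dict) -> list[tuple[str, str]]:
    """Return (namespace, name) for every PVC whose phase is not Bound."""
    result = []
    for item in pvcs.get("items", []):
        if item.get("status", {}).get("phase") != "Bound":
            meta = item["metadata"]
            result.append((meta["namespace"], meta["name"]))
    return sorted(result)


def label_commands(pvcs: dict) -> list[str]:
    """Build kubectl commands that label unbound PVCs for exclusion.

    The commands are only generated, not executed, so they can be
    reviewed before being run.
    """
    return [
        f"kubectl label pvc -n {namespace} {name} {EXCLUDE_LABEL}"
        for namespace, name in unbound_pvcs(pvcs)
    ]
```

The label would need to be removed again once the PVC becomes Bound; otherwise the claim stays excluded from later backups.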

@kaovilai
Member

kaovilai commented Feb 3, 2025

Is there no possibility to add a pre-backup hook or add a status filter to the resource-policy configmap?

Can you clarify what is needed here?

@kaovilai
Member

kaovilai commented Feb 3, 2025

Is there any way to exclude PVCs with unbound PVs? As long as there is not, I would see this as a Warning and not as an Error.

@stp-bsh That could be a separate new issue if you want to open one. Unbound =/= not used by any pod (i.e. this issue).
