Trident Protect restore of PVC into new namespace fails because the PVC is created in the trident-protect namespace #6

cbalan58 opened this issue Jan 30, 2025 · 0 comments
As the title says, I am trying to test restoring a PVC (mainly, but also the VM that uses it) that was backed up from one namespace into another (empty) namespace.
The issue is that 9 times out of 10 the restore ends up creating the PVC in the trident-protect namespace.
Other times, without changing anything except the name of the restore CR, it works and creates the PVC in the intended namespace.
Here is the YAML I use (I tried tridentctl-protect as well, with the same results, but the command is a bit too long to paste here):

apiVersion: protect.trident.netapp.io/v1
kind: BackupRestore
metadata:
  name: test-cos-restore-new-9
  namespace: test-migr2
spec:
  appArchivePath: test-cos-new-prt_8da00fde-7804-4f2b-9afe-ff074e434250/backups/test-cos-new_0b7bf817-709c-4c0e-a144-4d68186cada2
  appVaultRef: s3-atl2
  namespaceMapping: [{"source": "test-cos", "destination": "test-migr2"}]
  storageClassMapping:
    - destination: aff-volume
      source: aff-volume
  resourceFilter:
    resourceSelectionCriteria: "Include"
    resourceMatchers:
      - labelSelectors: ["pvcname=feds-pvc"]
        names: ["feds-pvc"]
        namespaces: ["test-cos"]
      - labelSelectors: ["vmname=feds-cos"]
        names: ["feds-cos"]
        namespaces: ["test-cos"]

Everything is internal and for test purposes, so no need to worry about the appVault name or path being disclosed.
I had to include two resourceMatchers because, although the backup contains the PVC bound to this VM, restoring only the VM (when it worked) would complain that the PVC did not exist, and the VM could not start since the PVC is part of its definition. The storageClassMapping should have no effect here, since this is the only storage class in our cluster.
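To spell out what I expect from the CR above (an illustrative Python sketch, not Trident Protect code): any resource whose source namespace matches a namespaceMapping entry should be rewritten to the destination namespace, and anything unmapped should restore in place. Nothing should ever land in trident-protect.

```python
# Illustrative sketch only -- NOT Trident Protect code. This is the behavior
# I expect from the namespaceMapping field in the BackupRestore CR above.
namespace_mapping = [{"source": "test-cos", "destination": "test-migr2"}]

def target_namespace(source_ns: str) -> str:
    """Return the namespace a restored resource should land in."""
    for m in namespace_mapping:
        if m["source"] == source_ns:
            return m["destination"]
    return source_ns  # unmapped namespaces restore in place

print(target_namespace("test-cos"))  # expected: test-migr2, never trident-protect
```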

Versions:
trident-protect-100.2410.1, trident-operator-100.2410.0
Nodes run on Debian GNU/Linux 12, kernel 6.1.0-23-amd64, containerd://1.7.20 with Kubernetes at v1.29.8
Kubevirt is on top at v1.3.1

How I reproduce it:
-- after a restore that worked OK:
tridentctl-protect get backuprestore -n test-migr2

+--------------------------+----------+-----------+--------+--------------------------------+
|           NAME           | APPVAULT |   STATE   |  AGE   |             ERROR              |
+--------------------------+----------+-----------+--------+--------------------------------+
| test-cos-restore-new-8   | s3-atl2  | Failed    | 22m55s | VolumeRestoreHandler failed    |
|                          |          |           |        | with permanent error k...      |
| test-cos-restore-new-9   | s3-atl2  | Completed | 1m42s  |                                |
+--------------------------+----------+-----------+--------+--------------------------------+

-- checking that they were restored as they should be:

k -n test-migr2 get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
feds-pvc   Bound    pvc-cebef5e5-ff9b-4453-989f-b251fe1f6fad   1Gi        RWO            aff-volume     <unset>                 83s
k -n test-migr2 get vms
NAME       AGE   STATUS    READY
feds-cos   89s   Running   True

-- delete all:
k -n test-migr2 delete vm feds-cos
virtualmachine.kubevirt.io "feds-cos" deleted
k -n test-migr2 delete pvc feds-pvc
persistentvolumeclaim "feds-pvc" deleted
k -n migr2 get all,pvc
Warning: kubevirt.io/v1 VirtualMachineInstancePresets is now deprecated and will be removed in v2.
No resources found
k delete -f test-cos-restore-to-new-new-ns.yaml
backuprestore.protect.trident.netapp.io "test-cos-restore-new-9" deleted

-- edit the YAML to set a new name for the restore CR:
vim test-cos-restore-to-new-new-ns.yaml

-- run another restore:
k apply -f test-cos-restore-to-new-new-ns.yaml

-- result:
tridentctl-protect get backuprestore -n test-migr2

+-------------------------+----------+--------+-------+--------------------------------+
|          NAME           | APPVAULT | STATE  |  AGE  |             ERROR              |
+-------------------------+----------+--------+-------+--------------------------------+
| test-cos-restore-new-10 | s3-atl2  | Failed | 6m59s | VolumeRestoreHandler failed    |
|                         |          |        |       | with permanent error k...      |
+-------------------------+----------+--------+-------+--------------------------------+

-- or, in more detail:

k -n test-migr2 get backuprestore.protect.trident.netapp.io/test-cos-restore-new-10|tail
NAME                      STATE    ERROR                                                                                                                                                                 AGE
test-cos-restore-new-10   Failed   VolumeRestoreHandler failed with permanent error kopiaVolumeRestore timed out for volume trident-protect/feds-pvc-b7f5e21a687e061098dbbb2c4cbddca0: permanent error   10m
-- the PVC is again in the trident-protect namespace instead of my test-migr2:
k -n trident-protect get pvc
NAME                                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
feds-pvc-b7f5e21a687e061098dbbb2c4cbddca0   Bound    pvc-988fb8ef-030a-4dcd-b1fe-d738af254a7e   1Gi        RWO            aff-volume     <unset>                 15m

-- grepping for errors in the trident-protect pod logs, I get only these:
2025-01-30T12:46:55.334333584Z ERROR KopiaVolumeRestore has timed out {"controller": "kopiavolumerestore", "controllerGroup": "protect.trident.netapp.io", "controllerKind": "KopiaVolumeRestore", "KopiaVolumeRestore": {"name":"kvr-c0167ecb-af03-45ab-a461-72d658bbb44d-c3eca3af-1c0d-4b","namespace":"test-migr2"}, "namespace": "test-migr2", "name": "kvr-c0167ecb-af03-45ab-a461-72d658bbb44d-c3eca3af-1c0d-4b", "reconcileID": "b9bb55f8-2dde-43c0-9103-1699dc857847", "correlationid": "b40fd216-ea23-4ae8-8bcd-a7dbe9d7e1f0", "resourceid": "6bdcf83a-b36d-416e-aeed-4e8eea7bfb7f", "lastUpdatedAt": "2025-01-30 12:41:55 +0000 UTC", "error": "progress has not been updated in the allotted time"}
2025-01-30T12:46:55.34862339Z ERROR Reconciler error {"controller": "kopiavolumerestore", "controllerGroup": "protect.trident.netapp.io", "controllerKind": "KopiaVolumeRestore", "KopiaVolumeRestore": {"name":"kvr-c0167ecb-af03-45ab-a461-72d658bbb44d-c3eca3af-1c0d-4b","namespace":"test-migr2"}, "namespace": "test-migr2", "name": "kvr-c0167ecb-af03-45ab-a461-72d658bbb44d-c3eca3af-1c0d-4b", "reconcileID": "b9bb55f8-2dde-43c0-9103-1699dc857847", "error": "progress has not been updated in the allotted time"}
2025-01-30T12:46:55.354051338Z ERROR KopiaVolumeRestore has timed out {"controller": "kopiavolumerestore", "controllerGroup": "protect.trident.netapp.io", "controllerKind": "KopiaVolumeRestore", "KopiaVolumeRestore": {"name":"kvr-c0167ecb-af03-45ab-a461-72d658bbb44d-c3eca3af-1c0d-4b","namespace":"test-migr2"}, "namespace": "test-migr2", "name": "kvr-c0167ecb-af03-45ab-a461-72d658bbb44d-c3eca3af-1c0d-4b", "reconcileID": "fd96e6f9-9bb9-4ac4-bfb9-63cc3bbf5ae2", "correlationid": "b40fd216-ea23-4ae8-8bcd-a7dbe9d7e1f0", "resourceid": "6bdcf83a-b36d-416e-aeed-4e8eea7bfb7f", "lastUpdatedAt": "2025-01-30 12:41:55 +0000 UTC", "error": "progress has not been updated in the allotted time"}
2025-01-30T12:46:55.361262051Z ERROR Failed to update status {"controller": "kopiavolumerestore", "controllerGroup": "protect.trident.netapp.io", "controllerKind": "KopiaVolumeRestore", "KopiaVolumeRestore": {"name":"kvr-c0167ecb-af03-45ab-a461-72d658bbb44d-c3eca3af-1c0d-4b","namespace":"test-migr2"}, "namespace": "test-migr2", "name": "kvr-c0167ecb-af03-45ab-a461-72d658bbb44d-c3eca3af-1c0d-4b", "reconcileID": "fd96e6f9-9bb9-4ac4-bfb9-63cc3bbf5ae2", "correlationid": "b40fd216-ea23-4ae8-8bcd-a7dbe9d7e1f0", "resourceid": "6bdcf83a-b36d-416e-aeed-4e8eea7bfb7f", "error": "Operation cannot be fulfilled on kopiavolumerestores.protect.trident.netapp.io "kvr-c0167ecb-af03-45ab-a461-72d658bbb44d-c3eca3af-1c0d-4b": the object has been modified; please apply your changes to the latest version and try again"}
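Side note, in case it helps anyone reproduce the grep: the operator log lines carry a JSON payload after the message text, so the structured fields can be pulled out with something like this (the sample line below is abbreviated from the first error above; this is just my own helper, not part of any tooling):

```python
import json

# Abbreviated sample of one operator log line: timestamp + level + message + JSON payload
line = ('2025-01-30T12:46:55.334333584Z ERROR KopiaVolumeRestore has timed out '
        '{"controller": "kopiavolumerestore", "namespace": "test-migr2", '
        '"error": "progress has not been updated in the allotted time"}')

# Everything from the first "{" onward is the structured payload
payload = json.loads(line[line.index("{"):])
print(payload["namespace"], "-", payload["error"])
```

Note that this only works on lines whose payload is valid JSON; the last "Failed to update status" line above has unescaped quotes inside the error string and would need extra handling.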

So, is something delaying restore operations and causing trident-protect to fall back to its own namespace? The nodes are not under any pressure, and this is a test environment with only a few deployments. Any ideas?
