
Velero doesn't do well with Minio bucket versioning #8466

Open
pdstefan opened this issue Nov 27, 2024 · 4 comments

@pdstefan

What steps did you take and what happened:

This will probably not make it past triage, but I hit this issue and solved it; someone else might benefit from this.

I am using helm to install minio and have been reusing the same values.yaml for a couple of years now:

mode: standalone
persistence:
  enabled: true
  annotations: {}
  labels:
    failure-domain.beta.kubernetes.io/zone: eu-de-03
  existingClaim: "minio-realtime"

  storageClass: ""
  volumeName: ""
  accessMode: ""
  size: ""
users:
  - accessKey: velero
    secretKey: veXXXXXXXXXXX
    policy: consoleAdmin

buckets:
  - name: velero
    policy: none
    purge: false
    versioning: false
    objectlocking: false

metrics:
  serviceMonitor:
    enabled: true
    includeNode: false
    public: true
    additionalLabels: {}
    relabelConfigs: {}
    relabelConfigsCluster: {}

resources:
  requests:
    memory: 1Gi

deploymentUpdate:
  type: Recreate
  maxUnavailable: 0
  maxSurge: 100%

With this deployment and Velero 1.15.0 I created a backup with a TTL of 2 hours.

NAME                                 STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
ora-nhc-demo4                        Completed   0        0          2024-11-26 11:22:15 +0000 UTC   52m       default            <none>

Once the backup expired velero kept trying to sync the expired backup:

time="2024-11-27T11:36:21Z" level=info msg="Found 1 backups in the backup location that do not exist in the cluster and need to be synced" backupLocation=velero-realtime/default controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:138"
time="2024-11-27T11:36:21Z" level=info msg="Attempting to sync backup into cluster" backup=ora-nhc-demo4 backupLocation=velero-realtime/default controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:146"
time="2024-11-27T11:36:21Z" level=info msg="plugin process exited" backupLocation=velero-realtime/default cmd=/plugins/velero-plugin-for-aws controller=backup-sync id=43286 logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:80" plugin=/plugins/velero-plugin-for-aws

I started digging around. mc (the MinIO client) shows the folder in the bucket as empty:

bash-5.1# mc ls realtime/velero/backups/ora-nhc-demo4/
bash-5.1# 
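
Listing with versions makes the leftover delete markers visible (a diagnostic sketch, assuming the same realtime alias as above; a plain mc ls hides objects whose latest version is a delete marker, while --versions shows every version, including the markers):

bash-5.1# mc ls --versions realtime/velero/backups/ora-nhc-demo4/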

If I exec into the minio container I see something very different:

bash-5.1$ ls -al /export/velero/backups/ora-nhc-demo4/
total 56
drwxr-sr-x. 14 1000 1000 4096 Nov 26 11:22 .
drwxr-sr-x. 17 1000 1000 4096 Nov 27 08:41 ..
drwxr-sr-x.  2 1000 1000 4096 Nov 26 11:22 ora-nhc-demo4-csi-volumesnapshotclasses.json.gz
drwxr-sr-x.  2 1000 1000 4096 Nov 26 11:22 ora-nhc-demo4-csi-volumesnapshotcontents.json.gz
drwxr-sr-x.  2 1000 1000 4096 Nov 26 11:22 ora-nhc-demo4-csi-volumesnapshots.json.gz
drwxr-sr-x.  2 1000 1000 4096 Nov 26 11:22 ora-nhc-demo4-itemoperations.json.gz
drwxr-sr-x.  2 1000 1000 4096 Nov 26 11:22 ora-nhc-demo4-logs.gz
drwxr-sr-x.  2 1000 1000 4096 Nov 26 11:22 ora-nhc-demo4-podvolumebackups.json.gz
drwxr-sr-x.  2 1000 1000 4096 Nov 26 11:22 ora-nhc-demo4-resource-list.json.gz
drwxr-sr-x.  2 1000 1000 4096 Nov 26 11:22 ora-nhc-demo4-results.gz
drwxr-sr-x.  2 1000 1000 4096 Nov 26 11:22 ora-nhc-demo4-volumeinfo.json.gz
drwxr-sr-x.  2 1000 1000 4096 Nov 26 11:22 ora-nhc-demo4-volumesnapshots.json.gz
drwxr-sr-x.  2 1000 1000 4096 Nov 26 11:22 ora-nhc-demo4.tar.gz
drwxr-sr-x.  2 1000 1000 4096 Nov 26 11:22 velero-backup.json
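
(What look like .gz files listed as directories is presumably minio's on-disk object format: each object is stored as a directory holding its version metadata, so the directories survive on disk even after every object version has been delete-marked.)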

It turns out versioning on the bucket is suspended, not un-versioned, even though the helm values say versioning: false:

bash-5.1# mc version info realtime/velero
realtime/velero versioning is suspended
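
One way to clean up the leftover versions and delete markers under a stale backup prefix is mc rm with the --versions flag (a sketch, assuming the same alias and path as above; this is destructive, so double-check the prefix first):

bash-5.1# mc rm --recursive --versions --force realtime/velero/backups/ora-nhc-demo4/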

I listed the default values of the minio helm chart and, behold, the magic of Satan:

buckets: []
  #   # Name of the bucket
  # - name: bucket1
  #   # Policy to be set on the
  #   # bucket [none|download|upload|public]
  #   policy: none
  #   # Purge if bucket exists already
  #   purge: false
  #   # set versioning for
  #   # bucket [true|false]
  #   versioning: false # remove this key if you do not want versioning feature
  #   # set objectlocking for
  #   # bucket [true|false] NOTE: versioning is enabled by default if you use locking
  #   objectlocking: false
  # - name: bucket2
  #   policy: none
  #   purge: false
  #   versioning: true
  #   # set objectlocking for
  #   # bucket [true|false] NOTE: versioning is enabled by default if you use locking
  #   objectlocking: false

# versioning: false # remove this key if you do not want versioning feature <--- 👎 👎 👎 Remove the line; setting it to false will suspend versioning, not disable it completely.
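
So the fix on the helm side is to drop the versioning key entirely instead of setting it to false (a sketch of the corrected buckets entry from the values.yaml above):

buckets:
  - name: velero
    policy: none
    purge: false
    # versioning intentionally omitted: setting it to false suspends
    # versioning on the bucket rather than leaving it un-versioned
    objectlocking: false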

What did you expect to happen:

Velero should either work with versioned S3 buckets or not pick up the metadata from the versioned blobs if a file was deleted (e.g., mc handles versions correctly and does not list a deleted file, even if older versions of it still exist).

The following information will help us better understand what's going on:

https://transfer.kronsoft.cloud/yp7er0/bundle-2024-11-27-11-42-30.tar.gz -> expires in 7d

Anything else you would like to add:

Environment:

  • Velero version (use velero version):
Client:
        Version: v1.15.0
        Git commit: 1d4f1475975b5107ec35f4d19ff17f7d1fcb3edf
Server:
        Version: v1.15.0

  • Velero features (use velero client config get features): features: <NOT SET>
  • Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"29+", GitVersion:"v1.29.2-r0-29.0.11.8", GitCommit:"f7c06b6c27b944deff9812a9ea859c06e26611e1", GitTreeState:"clean", BuildDate:"2024-05-29T14:51:00Z", GoVersion:"go1.21.8", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: Open Telekom Cloud
  • OS (e.g. from /etc/os-release):
  • Velero deployed via helm
NAME                    CHART VERSION   APP VERSION     DESCRIPTION
vmware-tanzu/velero     8.1.0           1.15.0          A Helm chart for velero

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@Lyndon-Li Lyndon-Li added the Area/Storage/Minio For marking the issues where backend storage is minio label Nov 28, 2024
@Lyndon-Li Lyndon-Li self-assigned this Dec 2, 2024
@Lyndon-Li
Contributor

Thanks for sharing; this is helpful troubleshooting.

Velero calls the standard S3 API, and any objects returned by it are regarded as valid, so Velero has no way to filter out objects in this scenario.
It looks like the problem is on the minio side --- the S3 API implementation doesn't behave the same way mc does.
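
For context, the backup sync discovers backups through a delimiter-based listing. The following aws-sdk-go sketch shows roughly what that boils down to on an S3-compatible backend; the endpoint, bucket, and prefix are illustrative assumptions, not Velero's actual wiring, which goes through the object-store plugin's ListCommonPrefixes:

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Endpoint:         aws.String("http://minio.example:9000"), // hypothetical MinIO endpoint
		Region:           aws.String("us-east-1"),
		S3ForcePathStyle: aws.Bool(true),
	}))
	svc := s3.New(sess)

	// A delimiter turns the flat keyspace into "folders": S3 returns one
	// CommonPrefix per backup directory. On AWS, delete markers are hidden
	// from this listing; the report above suggests minio with suspended
	// versioning still returns the prefix, so the expired backup reappears.
	out, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket:    aws.String("velero"),
		Prefix:    aws.String("backups/"),
		Delimiter: aws.String("/"),
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range out.CommonPrefixes {
		fmt.Println(*p.Prefix) // e.g. backups/ora-nhc-demo4/
	}
}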

@chrisamti

chrisamti commented Jan 16, 2025

On the contrary. I made a PR against pkg/persistence/object_store.go, func (s *objectBackupStore) ListBackups() ([]string, error), some time ago, but it never made it into your code.

func (s *objectBackupStore) ListBackups() ([]string, error) {
	prefixes, err := s.objectStore.ListCommonPrefixes(s.bucket, s.layout.subdirs["backups"], "/")
	if err != nil {
		return nil, err
	}
	if len(prefixes) == 0 {
		return []string{}, nil
	}

	output := make([]string, 0, len(prefixes))

	for _, prefix := range prefixes {
		// values returned from a call to ObjectStore's
		// ListCommonPrefixes method return the *full* prefix, inclusive
		// of s.backupsPrefix, and include the delimiter ("/") as a suffix. Trim
		// each of those off to get the backup name.
		backupName := strings.TrimSuffix(strings.TrimPrefix(prefix, s.layout.subdirs["backups"]), "/")

		// if a bucket is minio and versioned, the s3 folder still exists, even if all s3 objects have been deleted.
		// We should only take s3 folders having a velero-backup.json object.
		if ok, errObjectExists := s.objectStore.ObjectExists(s.bucket, s.layout.getBackupMetadataKey(backupName)); !ok || errObjectExists != nil {
			// velero-backup.json is missing or the existence check failed; do not add to the list
			if errObjectExists != nil {
				s.logger.WithError(errObjectExists).Debugf("could not check if %s/%s exists", s.bucket, s.layout.getBackupMetadataKey(backupName))
			}
			continue
		}

		output = append(output, backupName)
	}

	return output, nil
}

It simply checks whether a backup folder contains the velero-backup.json file, which is needed anyway.
If it is missing, Velero does not add the backup to the list.
This does not break the native AWS S3 API behavior, but it makes Velero compatible with minio.

We are running this patch since version v1.12.1 without any problems.

I could create another pull request if someone is interested.

@kaovilai
Member

Sure, please do. We will review.

@Lyndon-Li
Contributor

on the contrary. I made a pr in pkg/persistence/object_store.go func (s *objectBackupStore) ListBackups() ([]string, error) {}

Ultimately, the problem comes down to the minio side.
This idea amounts to having Velero not report a problematic backup folder, in a general way, so as to bypass the minio problem. But that has side effects --- when there is data corruption in the backup storage, e.g., critical files missing from the backup folder, Velero loses the ability to report the error.

chrisamti added a commit to chrisamti/velero that referenced this issue Jan 21, 2025
…inio, if bucket is versioned.

Signed-off-by: Christian Jürges <[email protected]>