Implement abort running backup job #2098

dns2utf8 · 2019-12-03T13:39:51Z

Describe the problem/challenge you have
If one submits a backup job and forgets the narrowing selector, the system is blocked.

Describe the solution you'd like
I would like to be able to abort a job so the queue continues to work.

Anything else you would like to add:
With short jobs that run every minute, the queue piles up very quickly.

Environment:

velero version
Client:
Version: v1.2.0
Git commit: 5d00849
Server:
Version: v1.2.0
kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.6", GitCommit:"7015f71e75f670eb9e7ebd4b5749639d42e20079", GitTreeState:"clean", BuildDate:"2019-11-13T11:20:18Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes installer & version: v1.15.3
Cloud provider or hardware configuration: Intel(R) Xeon(R) CPU E5-2650
OS (e.g. from /etc/os-release): Ubuntu 18.04.3 LTS (Bionic Beaver)


skriss · 2019-12-06T20:39:00Z

Adding to this, it would be nice to be able to:

- abort a running job
- detect and either fail or restart jobs that are hung

We might be able to get some of this behavior for free if we move to running backups/restores in their own Pods/Jobs, per #1653.

d3473r · 2021-02-12T23:16:53Z

I started a backup job on a tainted node on which no restic workload was running until I added the appropriate toleration key.
As I couldn't abort the running backup, I had to wait ~4 hours until the job ran into the timeout :/
The abort feature would be really helpful in a mishap like this.

dsu-igeek · 2021-05-10T20:52:43Z

Should also handle aborting restores in progress

KamranAzeem · 2022-03-14T16:41:24Z

I was stuck in a similar situation. I started a backup job, only to realize later that I had forgotten something important.

Instead of waiting for 2+ hours, I decided to delete Velero and reinstall it. We had a tight maintenance window, so I took this path. A backup abort would be very helpful in such situations.

ywk253100 · 2022-03-24T09:41:18Z

Cancelling the in-progress backup would be easier with #4772.

reasonerjt · 2022-06-17T09:01:53Z

I think this is a valid requirement; however, the whole backup currently runs within a single reconcile action.
We can only abort a running backup after we rework the mechanism that triggers it. We may combine this with the effort to support running backups concurrently.
I'll leave it in the backlog.

aterrell5 · 2022-12-13T20:22:26Z

Similar issue and requirement from production. I accidentally left a namespace and objects that didn't need backing up in a backup job. There was no way to stop or abort the backup from going to the offsite storage bucket, consuming space, bandwidth, money, and time. We desperately need a way to cleanly abort a task. The issue will only become more problematic over time as the dataset and workloads grow.

benedikt-bartscher · 2023-02-05T23:48:38Z

Same here: I typed a wrong label selector and created a backup of my whole cluster...

kaovilai · 2025-02-04T15:22:38Z

cc: @sseago @draghuram
re: backup abort

We can skip retries on NotExists errors so that we only retry on other errors.
Then we can watch the backup for delete events and cancel the backup's context.
A finalizer should be added to the backup while it is running and removed once the cancellation cleanup is done.

sseago · 2025-02-04T19:32:06Z

@kaovilai Watching delete might be helpful for cleaning up after questionable user actions, but I don't think kubectl backup delete should be the documented or expected API for backup cancellation. We want an explicit cancel -- either by introducing a backup.spec.cancel field or by adding a BackupCancellationRequest CR.

skriss added the Enhancement/User End-User Enhancement to Velero label Dec 6, 2019

skriss added the P2 - Long-term important label Dec 6, 2019

eleanor-millman added the Reviewed Q2 2021 label May 10, 2021

eleanor-millman mentioned this issue May 10, 2021: restore stuck with InProgress status #2171 (Closed)

eleanor-millman removed the P2 - Long-term important label Sep 15, 2021

ywk253100 mentioned this issue Sep 16, 2021: velero restore in nil status #4128 (Closed)

ywk253100 mentioned this issue Mar 25, 2022: After deleted a inprogressed restore, all created restore in hang state #4556 (Closed)

reasonerjt added kind/requirement Reviewed Q2 2021 Enhancement/User End-User Enhancement to Velero and removed Enhancement/User End-User Enhancement to Velero Reviewed Q2 2021 labels May 20, 2022

eleanor-millman added the 1.10-candidate The label used for 1.10 planning discussion. label May 25, 2022

reasonerjt added backlog Needs investigation and removed 1.10-candidate The label used for 1.10 planning discussion. labels Jun 17, 2022

Lyndon-Li self-assigned this Dec 6, 2022
