External USB drives possibly interrupting backup by going to sleep #7797

AlbertGoma · 2022-09-30T09:57:29Z

Qubes OS release

Upgrade from 4.0 to 4.1.1

Brief summary

When backing up relatively large VMs to an external USB drive, the drive's motor stops spinning at some point, and the resulting file contains only a contiguous portion of the data.

I already mentioned this in my comment on issue #7567, since it might be a cause of I/O errors, and a fix could probably solve both issues.

Steps to reproduce

Attach an external USB hard drive (in this particular scenario, a 3.5" SATA hard drive on a USB 3.0 dock) with a valid partition table, enough space and a healthy filesystem (in this case GPT and ext4).
Start a non-networked disposable VM, attach the drive's block device sys-usb:sda (not the USB device) to it, and mount it there.
Start the Qubes Backup tool and select around 40 VMs, a few of which use over 200 GiB of storage each, for a total exceeding 900 GiB. Fill some of that space from /dev/zero and some from /dev/urandom, just in case.
Uncheck the Compress backup checkbox and click Next.
Set the disposable VM as the Target qube and choose a Backup directory in the external hard drive's filesystem.
Click Next until the backup starts.
Wait until the backup is apparently finished and the hard drive motor has stopped spinning.

Expected behavior

The restored VMs' logical volumes have the same byte count as the originals had before the backup started.
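Matching byte counts are a weak check, since a truncated or corrupted restore can still be compared only by size. A stricter form of this expectation is that the original and restored volumes hash identically. A minimal Python sketch, assuming you can read both the original image and the restored one as files or block devices (which paths you pass is up to you) and that neither is in use while hashing:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file or block device, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def images_match(original: str, restored: str) -> bool:
    """True if both images have identical contents (and therefore identical size).

    Assumes the VMs using these volumes are shut down, so the data is stable.
    """
    return sha256_of(original) == sha256_of(restored)
```

Reading a multi-hundred-GiB volume twice takes a long time, so this is better suited to spot checks than to every backup.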

Actual behavior

In the Qubes Backup Restore tool, an I/O error popped up, and half of the VMs showed 0 bytes of Disk Usage in the Qube Manager.

During an emergency recovery, decrypting the chunks of those 0-byte VMs with scrypt failed with an "Unexpected EOF" error for every chunk. The chunks of one of the VMs were readable up to the 490th.

DemiMarie · 2022-10-01T00:12:16Z

First, Qubes Backup should definitely fail in this situation: it should report an error and return a non-zero exit code. If it does not, that is a bug.

Second, your hardware might have problems. One possibility is a failing hard drive; another is that it uses device-managed shingled magnetic recording (SMR). SMR drives can stall for long periods during garbage collection, which can cause Linux to treat them as failed and disconnect them.
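The first point, that a failed backup must surface as an error and a non-zero exit code, is easy to check when backups are scripted. A wrapper can also treat "never exits" as a failure, which covers the silent-hang behaviour reported later in this thread. A minimal Python sketch; the command is whatever you normally run (for example `qvm-backup` with your usual arguments), and the timeout is an assumption you would size generously for your data:

```python
import subprocess
from typing import Sequence, Tuple


def run_backup(cmd: Sequence[str], timeout_s: float) -> Tuple[bool, str]:
    """Run a backup command and report (succeeded, reason).

    A non-zero exit code and a run exceeding timeout_s both count as failures.
    On timeout, subprocess.run kills the child before raising TimeoutExpired.
    """
    try:
        result = subprocess.run(list(cmd), timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, f"no exit after {timeout_s} s (possible silent hang)"
    if result.returncode != 0:
        return False, f"exit code {result.returncode}"
    return True, "exit code 0"
```

A timeout that is too short will kill a healthy backup and leave a partial file, so it should be well above the longest backup you expect.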

AlbertGoma · 2022-10-01T15:05:26Z

Regarding the hard drive, I checked the manufacturer's technical specification sheet, and all 3.5" models in that product line use CMR (which I assume stands for Conventional Magnetic Recording). It could have been failing, but it's supposed to be a high-end drive, it is within its 5-year warranty, and it hasn't shown any other signs of failure yet. I could scan it for bad sectors if that would be useful.

Today I tried to reproduce the error on the same hard drive using the same USB 3.0 docking station, but unfortunately the backup finished and verified successfully. However:

In my current R4.1.1 install, sys-usb and the disposable VM where I mounted the drive are both based on the current debian-11 template, rather than R4.0's old fedora-32. The disposable VM's kernel version must also have differed between the two Qubes releases, since it uses PVH virtualization, but I don't remember which version I last had in R4.0. As sys-usb uses HVM, I understand it uses the latest kernel installed in the template.
According to my phone's Clock app stopwatch, the hard drive's motor stopped 20 minutes and a few seconds after python3 -m qubes.tarwriter started, but when scrypt enc - /tmp/randomname/vmXX/private.img.XXX.enc appeared in dom0's Task Manager, the motor started spinning again.
I only tried to back up a single AppVM with the following data in the /home/user directory:

-rw-r--r-- 1 user user  50G Oct  1 09:21 urandom.1
-rw-r--r-- 1 user user 100G Oct  1 09:47 urandom.3
-rw-r--r-- 1 user user 200G Oct  1 09:37 zero.2
-rw-r--r-- 1 user user  50G Oct  1 09:49 zero.4

The qubes-backup file size is 153,656.37 MiB while the Disk Usage displayed by the Qube Manager is 419,471.36 MiB, therefore sparse zeroes have been left out.
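The numbers fit that reading: urandom.1 and urandom.3 hold 50 + 100 = 150 GiB, i.e. 153,600 MiB, of random data, only about 56 MiB less than the backup file, while the 250 GiB written from /dev/zero added almost nothing. The Python sketch below illustrates the general idea of leaving out all-zero blocks by counting how many bytes of a file fall in non-zero blocks. The 4 KiB block size is an assumption, and this is not a claim about how qubes.tarwriter detects zeros internally:

```python
def nonzero_bytes(path: str, block_size: int = 4096) -> int:
    """Count bytes that lie in blocks which are not entirely zero.

    A writer that skips all-zero blocks would need to store roughly this much
    data for the file; the rest could be recorded as holes.
    """
    zero_block = bytes(block_size)
    kept = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            # Compare against a zero buffer of the same length (the last block may be short).
            if block != zero_block[: len(block)]:
                kept += len(block)
    return kept


# Arithmetic from the ls listing: the two urandom files (50G + 100G), in MiB.
URANDOM_MIB = (50 + 100) * 1024  # 153,600 MiB
```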

So sleep happens, although not causing any I/O errors under these settings. The old fedora-32 template survived the disaster, so maybe I should try again using it for both sys-usb and the dispVM in HVM mode, backing up enough VMs to almost fill the entire drive so that multiple sleep events occur within the same backup session.

rustybird · 2022-10-01T15:48:26Z

> The qubes-backup file size is 153,656.37 MiB while the Disk Usage displayed by the Qube Manager is 419,471.36 MiB, therefore sparse zeroes have been left out.
>
> So sleep happens, although not causing any I/O errors under these settings.

That's normal on LVM.

ddevz · 2022-10-07T16:39:12Z

> ... I made the risky decision of not verifying the backup's integrity, as it would have required a similar amount of hours ...

While I recommend doing verifies in the future, don't feel too bad about that decision, because the "verify" does not actually seem to verify that the backup happened. You could have done the verify, gotten "everything backed up fine", and still had the same problem. (The EOF message implies to me that this would have happened to you.) (Note: I've just turned the verify problem into its own issue at #7809.)

AlbertGoma · 2023-07-16T12:48:47Z

In case it's useful: the USB 3.0 dock I used to perform the backup was a Sharkoon QuickPort Combo USB3.0. Both the computer and the dock were plugged into an Uninterruptible Power Supply.

andrewdavidwong · 2023-07-16T13:37:27Z

To be clear: This happens only when using the dock; it does not happen when bypassing the dock and plugging the external USB hard drive directly into the computer?

AlbertGoma · 2023-07-17T10:05:39Z

This dock's function is to allow using internal drives as external ones. After the failure, I kept doing backups on that hard drive, but bypassed the dock and plugged the drive directly into the motherboard's SATA port. When I do backups this way, the motor never stops and the verify seems to work fine. (However, I haven't dared to restore them on my PC yet. If it would be useful, I could install Qubes on another drive and try restoring them there, to confirm that the verification process didn't give a false success message.)

SarneWeber · 2025-01-22T15:49:37Z

Affects R4.2.3

SarneWeber · 2025-01-22T20:14:54Z

In the qube that the drive is attached to, I ran a script that created and deleted a file every 5 seconds, hoping that would keep the drive awake, but unfortunately the backup still got stuck.

DemiMarie · 2025-01-22T22:56:39Z

> In the qube that the drive is attached to, I ran a script that created and deleted a file every 5 seconds, hoping that would keep the drive awake, but unfortunately the backup still got stuck.

You need to call sync or fsync to ensure the changes make it all the way out to storage, rather than just staying in a cache somewhere.
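A keep-alive loop along these lines would write a small file, force it out to the device with fsync, then delete it. A minimal Python sketch, meant to run inside the qube that has the drive mounted; the 5-second interval is taken from the comment above, while the file name and the directory you pass (e.g. a hypothetical "/mnt/backup-drive") are assumptions. A later comment in this thread reports that the backup still hung with syncing, so this mainly helps rule drive sleep in or out as a cause:

```python
import os
import time
from typing import Optional


def write_and_sync(directory: str, name: str = ".keepalive") -> None:
    """Write a small file, fsync it, delete it, then fsync the directory."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(b"keepalive\n")
        f.flush()
        os.fsync(f.fileno())  # push the data past the page cache to the device
    os.remove(path)
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the directory entry change as well
    finally:
        os.close(dir_fd)


def keep_awake(directory: str, interval_s: float = 5.0,
               iterations: Optional[int] = None) -> None:
    """Call write_and_sync every interval_s seconds (forever if iterations is None)."""
    done = 0
    while iterations is None or done < iterations:
        write_and_sync(directory)
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)
```

For example, `keep_awake("/mnt/backup-drive")` would run until interrupted.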

rustybird · 2025-01-23T12:36:58Z

Is it even a problem if the drive spins down? Normally it should just transparently spin up again when data transfer resumes. This is not supposed to upset filesystem mounts etc.

I think it's usually not that spinning down causes a pause in the backup, but the reverse: Some temporary or indefinite pause in the backup causes the drive to spin down.

@SarneWeber Do you see any kernel errors?

SarneWeber · 2025-01-24T18:17:32Z

Thanks for reminding me to call sync. I tried it with that, but got the same results unfortunately.

Then I tried something else: I used ssh to store the backup on a different computer, with a different (this time internal) drive. That drive did not go to sleep, and yet the backup still got stuck at the same number of bytes written as with my external USB drive. Therefore, I believe I was mistaken, as rustybird suggested.

I am still curious about what causes my backups to get stuck, but I'm unsure whether a comment on this issue is an appropriate place for it, as the issue's description likely does not match my problem. So please tell me if it is inappropriate.

Here are some details about my issue:

When the backup fails and gets stuck (always after roughly the same number of bytes written, with less than 0.1% difference), neither the gzip nor the scrypt process shows up in top.
If I back up only shut-down qubes (and exclude dom0 as well), the backup still gets stuck.
If I turn off compression, the backup still gets stuck, this time with a larger file size.
I see no concerning messages in journalctl or dmesg in dom0, in the disposable VM that the drive is attached to, or in sys-usb, apart from journalctl messages in dom0 that say [Time+date] dom0 qubesd[3004]: socket.send() raised exception.
Small backups do work
The backup still hangs even if I keep using my laptop throughout the backup process.
I have one particularly large qube. If I make a backup that excludes that qube and all running VMs, the backup still fails.
I'm currently in the process of testing the backup with only the large qube. The large qube is significantly larger than the failed backup files.
If I try to verify these failed backups, the recovery tool successfully recognises that there's an unexpected end of file.
Sometimes the Qubes Backup window becomes unresponsive
I tried the CLI utility once in verbose mode, but got the same result.
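Two of these observations, that the backup file stops growing and that it stops at nearly the same size every time, can be checked mechanically. A Python sketch: `poll_sizes` samples the size of the growing backup file, `find_stall` reports the first span in which the size stayed unchanged for at least `max_idle_s`, and `same_stall_point` checks whether several failed runs stopped within a relative tolerance (0.1% here, matching the observation above). The file path, polling interval, and idle threshold are assumptions to adjust:

```python
import os
import time
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, int]  # (monotonic timestamp in seconds, file size in bytes)


def poll_sizes(path: str, interval_s: float, count: int) -> List[Sample]:
    """Record the size of path `count` times, interval_s seconds apart."""
    samples: List[Sample] = []
    for i in range(count):
        samples.append((time.monotonic(), os.path.getsize(path)))
        if i + 1 < count:
            time.sleep(interval_s)
    return samples


def find_stall(samples: Sequence[Sample], max_idle_s: float) -> Optional[Sample]:
    """Return (since, size) for the first run of unchanged sizes lasting >= max_idle_s."""
    if not samples:
        return None
    since, last_size = samples[0]
    for t, size in samples[1:]:
        if size != last_size:
            since, last_size = t, size  # still growing: restart the idle clock
        elif t - since >= max_idle_s:
            return since, last_size
    return None


def same_stall_point(final_sizes: Sequence[int], tolerance: float = 0.001) -> bool:
    """True if all failed runs stopped within `tolerance` (relative) of each other."""
    if not final_sizes or max(final_sizes) == 0:
        return False
    return (max(final_sizes) - min(final_sizes)) / max(final_sizes) < tolerance
```

A stall that recurs at the same size regardless of target drive points at the data being backed up rather than at the storage device.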

andrewdavidwong · 2025-01-25T06:08:34Z

@SarneWeber, if I understand you correctly, you're saying that this issue does not really affect Qubes 4.2 after all and therefore should not have been reopened. In that case, I'll re-close this issue.

Regarding your related problem, please note that this issue tracker (qubes-issues) is not intended to serve as a help desk or tech support center. Instead, we've set up other venues where you can ask for help and support, ask questions, and have discussions. (By contrast, the issue tracker is more of a technical tool intended to support our developers in their work.) Thank you for your understanding.

github-actions · 2025-01-25T06:09:02Z

This issue is being closed because:

This issue is believed to affect only Qubes OS 4.1 (and possibly earlier).
Qubes OS 4.1 has reached end-of-life (EOL).

If anyone believes that this issue should be reopened, please leave a comment saying so.
(For example, if a bug still affects Qubes OS 4.2, then the comment "Affects 4.2" will suffice.)

rustybird · 2025-01-25T11:49:20Z

@SarneWeber This could be a combination of two problems:

Some of your VMs are causing an error when they are being backed up. Try to narrow it down to one individual VM by doing a binary search (i.e. divide the VMs to be backed up in half to see which half hangs forever and repeat the process with that half). Then maybe create a new ticket or forum post?
What should be a noisy fatal error sometimes causes the backup system to silently hang forever. I've opened "Some fatal backup errors cause the backup system to silently hang forever (unless backing up to dom0?)" #9739 to track this aspect.
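The binary search suggested in the first point can be written down as a small helper. A Python sketch, assuming a hypothetical predicate `fails(vms)` that you implement yourself (for example, back up just those VMs with a timeout and return True if the run errors or hangs), and assuming that a single VM is responsible. Each probe is a full backup run, so with n VMs it takes about log2(n) rounds, with up to two backups per round:

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bisect_culprit(items: Sequence[T], fails: Callable[[List[T]], bool]) -> T:
    """Narrow a failing set of items down to a single item by repeated halving.

    Assumes fails(items) is True and that one item alone triggers the failure.
    """
    candidates = list(items)
    if not candidates or not fails(candidates):
        raise ValueError("the full set must reproduce the failure")
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        if fails(first):
            candidates = first
        elif fails(second):
            candidates = second
        else:
            # Neither half fails on its own: the problem needs VMs from both halves.
            raise ValueError("failure only reproduces with items from both halves")
    return candidates[0]
```

Checking the second half explicitly, instead of assuming it fails, costs an extra backup per round but detects when the single-culprit assumption does not hold.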

AlbertGoma added the P: default and T: bug labels Sep 30, 2022

AlbertGoma mentioned this issue Sep 30, 2022

Qubes Backup hangs if there is an I/O error #7567 (Closed)

DemiMarie added the P: critical label and removed the P: default label Oct 1, 2022

andrewdavidwong added the C: core, C: usb proxy, needs diagnosis, and hardware support labels Oct 1, 2022

andrewdavidwong added this to the Release 4.1 updates milestone Oct 1, 2022

andrewdavidwong added the affects-4.1 label Aug 8, 2023

andrewdavidwong removed this from the Release 4.1 updates milestone Aug 13, 2023

andrewdavidwong added the eol-4.1 label Dec 7, 2024


github-actions bot closed this as not planned Dec 7, 2024

andrewdavidwong removed the needs diagnosis label Dec 7, 2024

andrewdavidwong reopened this Jan 22, 2025

andrewdavidwong added the needs diagnosis and affects-4.2 labels and removed the eol-4.1 and T: bug labels Jan 22, 2025

andrewdavidwong closed this as completed Jan 25, 2025

andrewdavidwong added the eol-4.1 label and removed the needs diagnosis and affects-4.2 labels Jan 25, 2025

andrewdavidwong closed this as not planned Jan 25, 2025

rustybird mentioned this issue Jan 25, 2025

Some fatal backup errors cause the backup system to silently hang forever (unless backing up to dom0?) #9739 (Open)
