Changed skip flags when calling _ltfs_search_index_wp #493

Merged

Conversation

amissael95
Contributor

@amissael95 amissael95 commented Jan 25, 2025

Summary of changes

This pull request includes the following changes or fixes.

Description

When mounting a tape cartridge that has a permanent write error, if the MAM (cartridge memory) attribute of the Index Partition (IP) stores a generation number different from the one in the MAM attribute of the Data Partition (DP), the mount fails even though ltfs could still search for the index on the other partition.

The code path that handles a write perm (WP) error on the IP set the "can_skip_ip" flag to false, when it should be true. Conversely, the code path that handles a WP error on the DP set "can_skip_ip" to true, when it should be false. The logic is now adjusted.
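
To picture what the flag controls, here is a minimal sketch. It is not the LTFS implementation: `search_final_index`, `ip_readable` and `dp_readable` are hypothetical names, and the model deliberately ignores that the real search also compares the indexes found on both partitions. The only behaviour taken from this PR is that a failed search on IP aborts the mount unless the skip flag is set.

```c
#include <stdbool.h>

/* Hypothetical result codes for this sketch, not LTFS constants. */
enum search_result { FOUND_ON_IP, FOUND_ON_DP, SEARCH_FAILED };

/*
 * Toy model of a search that honours a "can skip IP" flag.
 * ip_readable / dp_readable stand in for "the final index on that
 * partition can be parsed".
 */
enum search_result search_final_index(bool can_skip_ip,
                                      bool ip_readable,
                                      bool dp_readable)
{
	if (ip_readable)
		return FOUND_ON_IP;
	if (!can_skip_ip)
		return SEARCH_FAILED;	/* IP unreadable and skipping not allowed */
	return dp_readable ? FOUND_ON_DP : SEARCH_FAILED;
}
```

Under these assumptions, a corrupted IP index with the flag set to false gives `SEARCH_FAILED`, while the same cartridge with the flag set to true falls through to DP.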

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have confirmed my fix is effective or that my feature works

@amissael95
Contributor Author

amissael95 commented Jan 25, 2025

TEST 1

Using file back end

Preconditions:

  • A write perm error happened on both partitions (forced by changing the tfs.vendor.IBM.forceErrorWrite attr)
  • The Generation of the MAM Coherency for the DP is greater than the Generation for the IP (set by changing the "count" value in the MAM hex file)
  • The VCR value of the MAM Coherency is less than the MAM VCR value
  • The search for the final index in IP fails (forced by corrupting the index file for IP)

Before code change (current behavior)

Result: Tape cannot be mounted

# /usr/local/bin/ltfs -o tape_backend=file -o devname=/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5 /ltfs
2d3f LTFS14000I LTFS starting, LTFS version 2.4.5.1 (Prelim), log level 2.
2d3f LTFS14058I LTFS Format Specification version 2.4.0.
2d3f LTFS14104I Launched by "/usr/local/bin/ltfs -o tape_backend=file -o devname=/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5 /ltfs".
2d3f LTFS14105I This binary is built for Linux (x86_64).
2d3f LTFS14106I GCC version is 4.8.5 20150623 (Red Hat 4.8.5-44).
2d3f LTFS17087I Kernel version: Linux version 3.10.0-1160.62.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Wed Mar 23 09:04:02 UTC 2022 i386.
2d3f LTFS17089I Distribution: NAME="Red Hat Enterprise Linux Server".
2d3f LTFS17089I Distribution: Red Hat Enterprise Linux Server release 7.9 (Maipo).
2d3f LTFS17089I Distribution: Red Hat Enterprise Linux Server release 7.9 (Maipo).
2d3f LTFS14063I Sync type is "time", Sync time is 300 sec.
2d3f LTFS17085I Plugin: Loading "file" tape backend.
2d3f LTFS17085I Plugin: Loading "unified" iosched backend.
2d3f LTFS14095I Set the tape device write-anywhere mode to avoid cartridge ejection.
2d3f LTFS30000I Opening a device through generic file driver (/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5).
2d3f LTFS30081I Getting the device directory (/tmp/ltfs11583).
2d3f LTFS30082I No device directory is specified (/tmp/ltfs11583).
2d3f LTFS30001I Opening a redirecting file through generic file driver (/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5).
2d3f LTFS30061E Cannot unlock medium: unit not ready.
2d3f LTFS17160I Maximum device block size is 4194304.
2d3f LTFS11330I Loading cartridge.
2d3f LTFS30048I Loading a directory through generic file driver (/gpfs/virtual_library/VLIB01/V00001L5).
2d3f LTFS11332I Load successful.
2d3f LTFS17157I Changing the drive setting to write-anywhere mode.
2d3f LTFS11005I Mounting the volume.
2d3f LTFS11333I A cartridge with write-perm error is detected on both partitions. Seek the newest index (IP: Gen = 5, VCR = 4) (DP: Gen = 6, VCR = 4) (VCR = 5).
2d3f LTFS17264I The index on DP is newer, but MAM shows a permanent write error happened on both partitions.
2d3f LTFS17283I Detected unmatched VCR value between MAM and VCR (4, 5).
2d3f LTFS17284I Seaching the final index in IP.
2d3f LTFS17037E XML parser: failed to read from XML stream.
2d3f LTFS17016E Cannot parse index direct from medium (-5000).
2d3f LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
2d3f LTFS17037E XML parser: failed to read from XML stream.
2d3f LTFS17016E Cannot parse index direct from medium (-5000).
2d3f LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
2d3f LTFS17285E Failed to search the final index in IP (1).
2d3f LTFS14013E Cannot mount the volume.

After code change (new behavior)

Change on line 1661 in ltfs_mount function:

/* Index of IP could be corrupted. So set skip flag to true */
ret = _ltfs_search_index_wp(recover_symlink, true, &seekpos, vol);

Result: The mount now succeeds, since the index could be found on DP:

# /usr/local/bin/ltfs -o tape_backend=file -o devname=/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5 /ltfs
913 LTFS14000I LTFS starting, LTFS version 2.4.5.1 (Prelim), log level 2.
913 LTFS14058I LTFS Format Specification version 2.4.0.
913 LTFS14104I Launched by "/usr/local/bin/ltfs -o tape_backend=file -o devname=/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5 /ltfs".
913 LTFS14105I This binary is built for Linux (x86_64).
913 LTFS14106I GCC version is 4.8.5 20150623 (Red Hat 4.8.5-44).
913 LTFS17087I Kernel version: Linux version 3.10.0-1160.62.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Wed Mar 23 09:04:02 UTC 2022 i386.
913 LTFS17089I Distribution: NAME="Red Hat Enterprise Linux Server".
913 LTFS17089I Distribution: Red Hat Enterprise Linux Server release 7.9 (Maipo).
913 LTFS17089I Distribution: Red Hat Enterprise Linux Server release 7.9 (Maipo).
913 LTFS14063I Sync type is "time", Sync time is 300 sec.
913 LTFS17085I Plugin: Loading "file" tape backend.
913 LTFS17085I Plugin: Loading "unified" iosched backend.
913 LTFS14095I Set the tape device write-anywhere mode to avoid cartridge ejection.
913 LTFS30000I Opening a device through generic file driver (/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5).
913 LTFS30081I Getting the device directory (/tmp/ltfs2323).
913 LTFS30082I No device directory is specified (/tmp/ltfs2323).
913 LTFS30001I Opening a redirecting file through generic file driver (/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5).
913 LTFS30061E Cannot unlock medium: unit not ready.
913 LTFS17160I Maximum device block size is 4194304.
913 LTFS11330I Loading cartridge.
913 LTFS30048I Loading a directory through generic file driver (/gpfs/virtual_library/VLIB01/V00001L5).
913 LTFS11332I Load successful.
913 LTFS17157I Changing the drive setting to write-anywhere mode.
913 LTFS11005I Mounting the volume.
913 LTFS11333I A cartridge with write-perm error is detected on both partitions. Seek the newest index (IP: Gen = 5, VCR = 4) (DP: Gen = 6, VCR = 4) (VCR = 5).
913 LTFS17264I The index on DP is newer, but MAM shows a permanent write error happened on both partitions.
913 LTFS17283I Detected unmatched VCR value between MAM and VCR (4, 5).
913 LTFS17284I Seaching the final index in IP.
913 LTFS17037E XML parser: failed to read from XML stream.
913 LTFS17016E Cannot parse index direct from medium (-5000).
913 LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
913 LTFS17037E XML parser: failed to read from XML stream.
913 LTFS17016E Cannot parse index direct from medium (-5000).
913 LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
913 LTFS17289I Skip parsing the final index on IP.
913 LTFS17284I Seaching the final index in DP.
913 LTFS17037E XML parser: failed to read from XML stream.
913 LTFS17016E Cannot parse index direct from medium (-5000).
913 LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
913 LTFS17288I Detected the final indexes (IP: Gen = 0, Pos = 0) (DP: Gen = 5, Pos = 35).
913 LTFS17287I Making R/O mount from the location (b, 35).
913 LTFS17227I Tape attribute: Vendor = IBM     .
913 LTFS17227I Tape attribute: Application Name = LTFS                            .
913 LTFS17227I Tape attribute: Application Version = 2.4.7.0 .
913 LTFS17227I Tape attribute: Medium Label = .
913 LTFS17228I Tape attribute: Text Localization ID = 0x81.
913 LTFS17227I Tape attribute: Barcode = V00001                          .
913 LTFS17227I Tape attribute: Application Format Version = 2.4.0           .
913 LTFS17228I Tape attribute: Volume Lock Status = 0x06.
913 LTFS17227I Tape attribute: Media Pool name = .
913 LTFS11031I Volume mounted successfully. V00001 : Gen = 5 / (b, 35) -> (b, 24) / VDRIVE0001.
913 LTFS14019I Medium is write protected. Mounting read-only.
913 LTFS14111I Initial setup completed successfully.
913 LTFS14112I Invoke 'mount' command to check the result of final setup.
913 LTFS14113I Specified mount point is listed if succeeded.

@amissael95
Contributor Author

@juliocelon could you please help me link the issue so that it is fixed by this PR automatically, and set the reviewers? It looks like I am unable to do it with my current rights.

@amissael95
Contributor Author

amissael95 commented Jan 25, 2025

TEST 2

Using file backend:

Preconditions:

  • A write perm error happened on both partitions (forced by changing the tfs.vendor.IBM.forceErrorWrite attr)
  • The Generation of the MAM Coherency is the same for both partitions
  • The VCR value of the MAM Coherency is less than the MAM VCR value
  • The search for the final index in IP fails (forced by corrupting the index file for IP)

Before code change (current behavior)

Result: The tape mounts successfully if the index on DP can be read:

# /usr/local/bin/ltfs -o tape_backend=file -o devname=/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5 /ltfs
7ff0 LTFS14000I LTFS starting, LTFS version 2.4.5.1 (Prelim), log level 2.
7ff0 LTFS14058I LTFS Format Specification version 2.4.0.
7ff0 LTFS14104I Launched by "/usr/local/bin/ltfs -o tape_backend=file -o devname=/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5 /ltfs".
7ff0 LTFS14105I This binary is built for Linux (x86_64).
7ff0 LTFS14106I GCC version is 4.8.5 20150623 (Red Hat 4.8.5-44).
7ff0 LTFS17087I Kernel version: Linux version 3.10.0-1160.62.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Wed Mar 23 09:04:02 UTC 2022 i386.
7ff0 LTFS17089I Distribution: NAME="Red Hat Enterprise Linux Server".
7ff0 LTFS17089I Distribution: Red Hat Enterprise Linux Server release 7.9 (Maipo).
7ff0 LTFS17089I Distribution: Red Hat Enterprise Linux Server release 7.9 (Maipo).
7ff0 LTFS14063I Sync type is "time", Sync time is 300 sec.
7ff0 LTFS17085I Plugin: Loading "file" tape backend.
7ff0 LTFS17085I Plugin: Loading "unified" iosched backend.
7ff0 LTFS14095I Set the tape device write-anywhere mode to avoid cartridge ejection.
7ff0 LTFS30000I Opening a device through generic file driver (/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5).
7ff0 LTFS30081I Getting the device directory (/tmp/ltfs32752).
7ff0 LTFS30082I No device directory is specified (/tmp/ltfs32752).
7ff0 LTFS30001I Opening a redirecting file through generic file driver (/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5).
7ff0 LTFS30061E Cannot unlock medium: unit not ready.
7ff0 LTFS17160I Maximum device block size is 4194304.
7ff0 LTFS11330I Loading cartridge.
7ff0 LTFS30048I Loading a directory through generic file driver (/gpfs/virtual_library/VLIB01/V00001L5).
7ff0 LTFS11332I Load successful.
7ff0 LTFS17157I Changing the drive setting to write-anywhere mode.
7ff0 LTFS11005I Mounting the volume.
7ff0 LTFS11333I A cartridge with write-perm error is detected on both partitions. Seek the newest index (IP: Gen = 5, VCR = 4) (DP: Gen = 5, VCR = 4) (VCR = 5).
7ff0 LTFS17264I The index on IP is newer, but MAM shows a permanent write error happened on both partitions.
7ff0 LTFS17283I Detected unmatched VCR value between MAM and VCR (4, 5).
7ff0 LTFS17284I Seaching the final index in IP.
7ff0 LTFS17037E XML parser: failed to read from XML stream.
7ff0 LTFS17016E Cannot parse index direct from medium (-5000).
7ff0 LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
7ff0 LTFS17037E XML parser: failed to read from XML stream.
7ff0 LTFS17016E Cannot parse index direct from medium (-5000).
7ff0 LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
7ff0 LTFS17289I Skip parsing the final index on IP.
7ff0 LTFS17284I Seaching the final index in DP.
7ff0 LTFS17037E XML parser: failed to read from XML stream.
7ff0 LTFS17016E Cannot parse index direct from medium (-5000).
7ff0 LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
7ff0 LTFS17288I Detected the final indexes (IP: Gen = 0, Pos = 0) (DP: Gen = 5, Pos = 35).
7ff0 LTFS17287I Making R/O mount from the location (b, 35).
7ff0 LTFS17227I Tape attribute: Vendor = IBM     .
7ff0 LTFS17227I Tape attribute: Application Name = LTFS                            .
7ff0 LTFS17227I Tape attribute: Application Version = 2.4.7.0 .
7ff0 LTFS17227I Tape attribute: Medium Label = .
7ff0 LTFS17228I Tape attribute: Text Localization ID = 0x81.
7ff0 LTFS17227I Tape attribute: Barcode = V00001                          .
7ff0 LTFS17227I Tape attribute: Application Format Version = 2.4.0           .
7ff0 LTFS17228I Tape attribute: Volume Lock Status = 0x06.
7ff0 LTFS17227I Tape attribute: Media Pool name = .
7ff0 LTFS11031I Volume mounted successfully. V00001 : Gen = 5 / (b, 35) -> (b, 24) / VDRIVE0001.
7ff0 LTFS14019I Medium is write protected. Mounting read-only.
7ff0 LTFS14111I Initial setup completed successfully.
7ff0 LTFS14112I Invoke 'mount' command to check the result of final setup.
7ff0 LTFS14113I Specified mount point is listed if succeeded.

After code change (new behavior)

Change on line 1691 at ltfs_mount function:

/* Index of DP could be corrupted. So set skip flag to false */
ret = _ltfs_search_index_wp(recover_symlink, false, &seekpos, vol);

Result: The tape is not mounted

# /usr/local/bin/ltfs -o tape_backend=file -o devname=/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5 /ltfs
5ae2 LTFS14000I LTFS starting, LTFS version 2.4.5.1 (Prelim), log level 2.
5ae2 LTFS14058I LTFS Format Specification version 2.4.0.
5ae2 LTFS14104I Launched by "/usr/local/bin/ltfs -o tape_backend=file -o devname=/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5 /ltfs".
5ae2 LTFS14105I This binary is built for Linux (x86_64).
5ae2 LTFS14106I GCC version is 4.8.5 20150623 (Red Hat 4.8.5-44).
5ae2 LTFS17087I Kernel version: Linux version 3.10.0-1160.62.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Wed Mar 23 09:04:02 UTC 2022 i386.
5ae2 LTFS17089I Distribution: NAME="Red Hat Enterprise Linux Server".
5ae2 LTFS17089I Distribution: Red Hat Enterprise Linux Server release 7.9 (Maipo).
5ae2 LTFS17089I Distribution: Red Hat Enterprise Linux Server release 7.9 (Maipo).
5ae2 LTFS14063I Sync type is "time", Sync time is 300 sec.
5ae2 LTFS17085I Plugin: Loading "file" tape backend.
5ae2 LTFS17085I Plugin: Loading "unified" iosched backend.
5ae2 LTFS14095I Set the tape device write-anywhere mode to avoid cartridge ejection.
5ae2 LTFS30000I Opening a device through generic file driver (/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5).
5ae2 LTFS30081I Getting the device directory (/tmp/ltfs23266).
5ae2 LTFS30082I No device directory is specified (/tmp/ltfs23266).
5ae2 LTFS30001I Opening a redirecting file through generic file driver (/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5).
5ae2 LTFS30061E Cannot unlock medium: unit not ready.
5ae2 LTFS17160I Maximum device block size is 4194304.
5ae2 LTFS11330I Loading cartridge.
5ae2 LTFS30048I Loading a directory through generic file driver (/gpfs/virtual_library/VLIB01/V00001L5).
5ae2 LTFS11332I Load successful.
5ae2 LTFS17157I Changing the drive setting to write-anywhere mode.
5ae2 LTFS11005I Mounting the volume.
5ae2 LTFS11333I A cartridge with write-perm error is detected on both partitions. Seek the newest index (IP: Gen = 5, VCR = 4) (DP: Gen = 5, VCR = 4) (VCR = 5).
5ae2 LTFS17264I The index on IP is newer, but MAM shows a permanent write error happened on both partitions.
5ae2 LTFS17283I Detected unmatched VCR value between MAM and VCR (4, 5).
5ae2 LTFS17284I Seaching the final index in IP.
5ae2 LTFS17037E XML parser: failed to read from XML stream.
5ae2 LTFS17016E Cannot parse index direct from medium (-5000).
5ae2 LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
5ae2 LTFS17037E XML parser: failed to read from XML stream.
5ae2 LTFS17016E Cannot parse index direct from medium (-5000).
5ae2 LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
5ae2 LTFS17285E Failed to search the final index in IP (1).
5ae2 LTFS14013E Cannot mount the volume.

@amissael95
Contributor Author

amissael95 commented Jan 25, 2025

Hello @piste-jp,

Please refer to "TEST 2" for the case where the Generations of the MAM Coherency are the same. The current flag change makes the mount fail.

I have not yet found the part of the code where the Generation (Count Value) of the MAM Coherency could be left with the same value after a write perm error on both partitions, but it looks like the quicker solution is to modify line 1647 in ltfs.c to use <= so that the index can be searched on both partitions:

 if (vol->ip_coh.count <= vol->dp_coh.count) {
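
To make the effect of the operator concrete, the two comparisons can be put side by side. This is only an illustration: the helper names are made up, and the parameters stand in for vol->ip_coh.count and vol->dp_coh.count.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the "DP is newer" branch is taken with the existing `<` test. */
bool takes_dp_branch_lt(uint64_t ip_count, uint64_t dp_count)
{
	return ip_count < dp_count;
}

/* Same decision with the `<=` test proposed in this comment. */
bool takes_dp_branch_le(uint64_t ip_count, uint64_t dp_count)
{
	return ip_count <= dp_count;
}
```

With the TEST 2 values (IP Gen = 5, DP Gen = 5) the first helper returns false and the second returns true, so only the `<=` form sends equal counts through the branch that passes true as the skip flag.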

I will look forward to your comments.

@amissael95
Contributor Author

amissael95 commented Jan 27, 2025

@piste-jp do you think that making the modification on line 1647 described in the comment above could cause any side effects? Can we really trust either partition if a write perm error happened on both? In my view, allowing the search on both partitions in those scenarios, where the Generation of the MAM Coherency could be the same, is safe to implement.

@syaoraang syaoraang left a comment

Looks good for me

@piste-jp
Member

do you think that making the modification on line 1647 described in the comment above could cause any side effects?

I don't think so. Thus, I suggested. Do you think there are side effects?

Can we really trust either partition if a write perm error happened on both?

Yes, I think so.

In my view, allowing the search on both partitions in those scenarios, where the Generation of the MAM Coherency could be the same, is safe to implement.

I cannot agree with this.

The block that starts below is meant to handle the case where DP has the more recent index; see the comment in L1644. But I believe DP is not the more recent one if the coherency counts are the same.

ltfs/src/libltfs/ltfs.c

Lines 1641 to 1650 in 7271446

if (vol->ip_coh.count < vol->dp_coh.count) {
	if (vollock != PWE_MAM_IP && vollock != PWE_MAM) {
		/*
		 * The index on DP is newer but MAM shows write perm doesn't happen in IP.
		 * If LTFS failed to write IP with non-medium reason error (like cable pull on locate)
		 * while write error handling at DP in the previous session, this condition would happen.
		 */
		ltfsmsg(LTFS_INFO, 17264I, "DP", vl_print);
	}

So if it is equal, we must treat IP as having the latest index. The only exception is the double write perm case.

I believe we need to build the reasonable logic which can be followed by other people. But in my point of view, your code doesn't have it and just hides the problem found in TEST2.

@amissael95
Contributor Author

@piste-jp,

Yes, I think so.

OK, so we can trust either partition if a write perm error happened on both.

I cannot agree with this.

I did not follow this. If we can trust either partition when a write permanent error happens on both, then it should be okay to search both partitions. Why do you not agree that allowing the search on both partitions in that scenario is safe to implement?

The block that starts below is meant to handle the case where DP has the more recent index; see the comment in L1644. But I believe DP is not the more recent one if the coherency counts are the same.

I saw the comment in L1644, but it is not the case we are talking about here. To be clear, I am talking about a write perm error on both partitions where the MAM coherency count is the same.

So if it is equal, we must treat IP as having the latest index. The only exception is the double write perm case.

I meant a double write perm case

I believe we need to build the reasonable logic which can be followed by other people. But in my point of view, your code doesn't have it and just hides the problem found in TEST2.

The code that is on the PR does not include the change described in comment #493 (comment), which is the following change:

if (vol->ip_coh.count <= vol->dp_coh.count) {

So are you saying that this change is not the way to solve the issue in TEST 2?

@piste-jp
Member

I cannot agree with this.

I did not follow this. If we can trust either partition when a write permanent error happens on both, then it should be okay to search both partitions. Why do you not agree that allowing the search on both partitions in that scenario is safe to implement?

See my suggestion. If the coherency counts are equal and both partitions have a write perm error, allow skipping IP. So what is the problem?

@piste-jp
Member

The block that starts below is meant to handle the case where DP has the more recent index; see the comment in L1644. But I believe DP is not the more recent one if the coherency counts are the same.

I saw the comment in L1644 but it is not the case we are talking here; to be clear, I am talking about a write perm error on both partitions where a MAM coherency count is the same.

I believe that when "a MAM coherency count is the same", the code jumps into the else block that starts at L1670, and then evaluates whether there is "a write perm error on both partitions" or not.

What is your point?

@piste-jp
Member

I believe we need to build the reasonable logic which can be followed by other people. But in my point of view, your code doesn't have it and just hides the problem found in TEST2.

The code that is on the PR does not include the change described in comment #493 (comment), which is the following change:

if (vol->ip_coh.count <= vol->dp_coh.count) {

So are you saying that this change is not the way to solve the issue in TEST 2?

Your suggestion, if (vol->ip_coh.count <= vol->dp_coh.count) {, might suppress the behavior seen in TEST2. But I am just suggesting a better way to make the code easier to read for other coders.

  • vol->ip_coh.count < vol->dp_coh.count: Tape drive wrote down the index of DP, but cannot write down the index of IP
  • vol->ip_coh.count == vol->dp_coh.count: Cannot write any index

I don't want to handle the two cases above in the same condition block.

@amissael95
Contributor Author

See my suggestion. If the coherency counts are equal and both partitions have a write perm error, allow skipping IP. So what is the problem?

Correct, I do not see any problem with allowing IP to be skipped when the coherency counts are equal.

I believe that when "a MAM coherency count is the same", the code jumps into the else block that starts at L1670, and then evaluates whether there is "a write perm error on both partitions" or not.
What is your point?

The point is that this is no longer true with the current change in commit c05bad9: the "else" block now sets the skip flag to false, so the tape cannot be mounted.

I don't want to handle the two cases above in the same condition block.

OK, then the change in this PR to avoid the problem in TEST 2 should be the following:

vol->ip_coh.count < vol->dp_coh.count:
The tape drive wrote the index to DP but could not write the index to IP, so set skip_flag to true to allow the search on DP.

vol->ip_coh.count == vol->dp_coh.count:
The tape drive was not able to write any index (double write perm error), so set skip_flag to true to allow the search on DP.

vol->ip_coh.count > vol->dp_coh.count (else block):
The tape drive wrote the index to IP, so set skip_flag to false to prevent the search on DP.

Do you agree? Especially with the vol->ip_coh.count == vol->dp_coh.count case.
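
Written out as code, the three cases in this comment reduce to the following sketch. The function name is hypothetical and the parameters stand in for vol->ip_coh.count and vol->dp_coh.count; it restates the proposal made here, not the code that was eventually merged.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Skip flag as proposed in this comment: true unless the IP count is
 * strictly greater than the DP count.
 */
bool proposed_skip_flag(uint64_t ip_count, uint64_t dp_count)
{
	if (ip_count < dp_count)
		return true;	/* index written to DP but not to IP */
	if (ip_count == dp_count)
		return true;	/* no index written (double write perm error) */
	return false;		/* index written to IP */
}
```

The first two branches are kept separate on purpose, mirroring the request above not to fold both cases into one condition.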

@piste-jp
Member

I believe that when "a MAM coherency count is the same", the code jumps into the else block that starts at L1670, and then evaluates whether there is "a write perm error on both partitions" or not.
What is your point?

The point is that this is no longer true with the current change in commit c05bad9: the "else" block now sets the skip flag to false, so the tape cannot be mounted.

It looks like you have misunderstood something... Please take a careful look at my suggestion, with line numbers.

My suggestion is

		if (vol->ip_coh.count < vol->dp_coh.count) {
			if (vollock != PWE_MAM_IP && vollock != PWE_MAM) {
				/*
				 * The index on DP is newer but MAM shows write perm doesn't happen in IP.
				 * If LTFS failed to write IP with non-medium reason error (like cable pull on locate)
				 * while write error handling at DP in the previous session, this condition would happen.
				 */
				ltfsmsg(LTFS_INFO, 17264I, "DP", vl_print);
			}

			if (volume_change_ref != vol->dp_coh.volume_change_ref) {
				/*
				 * Cannot trust the index info on MAM, search the last indexes
				 * This would happen when the drive returns an error against acquiring the VCR
				 * while write error handling.
				 */
				ltfsmsg(LTFS_INFO, 17283I,
						(unsigned long long)vol->dp_coh.volume_change_ref,
						(unsigned long long)volume_change_ref);

				ret = _ltfs_search_index_wp(recover_symlink, true, &seekpos, vol);
				if (ret < 0)
					goto out_unlock;

			} else {
				ltfsmsg(LTFS_INFO, 17286I, "DP", (unsigned long long)volume_change_ref);
				seekpos.partition = ltfs_part_id2num(vol->label->partid_dp, vol);
				seekpos.block = vol->dp_coh.set_id;
			}
		} else {
			if (vollock != PWE_MAM_DP && vollock != PWE_MAM) {
				/*
				 * The index on IP is newer but MAM shows write perm doesn't happen in DP.
				 * LTFS already have written an index on DP when it is writing an index on IP,
				 * so this condition wouldn't happen logically.
				 */
				ltfsmsg(LTFS_INFO, 17264I, "IP", vl_print);
			}

			if (volume_change_ref != vol->ip_coh.volume_change_ref) {
				/*
				 * Cannot trust the index info on MAM, search the last indexes
				 * This would happen when the drive returns an error against acquiring the VCR
				 * while write error handling.
				 */
				ltfsmsg(LTFS_INFO, 17283I,
						(unsigned long long)vol->dp_coh.volume_change_ref,
						(unsigned long long)volume_change_ref);

				/* Index of IP could be corrupted. So set skip flag */
				if (vollock == PWE_MAM_BOTH) {
					/* Index of IP could be corrupted (because of double write perm). So set skip flag to true */
					ret = _ltfs_search_index_wp(recover_symlink, true, &seekpos, vol);
				} else {
					/* Index of DP could be corrupted. So set skip flag to false */
					ret = _ltfs_search_index_wp(recover_symlink, false, &seekpos, vol);
				}
				if (ret < 0)
					goto out_unlock;

			} else {
				ltfsmsg(LTFS_INFO, 17286I, "IP", (unsigned long long)volume_change_ref);
				seekpos.partition = ltfs_part_id2num(vol->label->partid_ip, vol);
				seekpos.block = vol->ip_coh.set_id;
			}
		}
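
As a reading aid only, the skip-flag choice in the suggestion above (the two paths where the VCR values do not match) can be condensed into one small function. The enum and the function name are hypothetical stand-ins, not LTFS definitions; only the branching follows the suggested code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the volume lock states used in the thread. */
enum vl_state { VL_PWE, VL_PWE_DP, VL_PWE_IP, VL_PWE_BOTH };

/*
 * Skip flag passed to the index search when the VCR values differ:
 *  - DP count is higher: always allow skipping IP;
 *  - otherwise: allow skipping IP only for a double write perm error.
 */
bool suggested_skip_flag(uint64_t ip_count, uint64_t dp_count,
                         enum vl_state vollock)
{
	if (ip_count < dp_count)
		return true;
	return vollock == VL_PWE_BOTH;
}
```

For the TEST 2 cartridge (equal counts, write perm error on both partitions) this yields true, which is the case the earlier revision of the PR got wrong.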

@amissael95
Contributor Author

It looks like you have misunderstood something... Please take a careful look at my suggestion, with line numbers.

I do not think there is a big misunderstanding, since your suggestion follows the idea we talked about before and the one in my last comment.

I really appreciate your support @piste-jp. I will implement the suggestion you wrote and test it.

Regards

@amissael95
Contributor Author

amissael95 commented Jan 29, 2025

TEST 3

Same preconditions as in TEST 2:

  • A write perm error happened on both partitions (forced by changing the tfs.vendor.IBM.forceErrorWrite attr)
  • The Generation of the MAM Coherency is the same for both partitions
  • The VCR value of the MAM Coherency is less than the MAM VCR value
  • The search for the final index in IP fails (forced by corrupting the index file for IP)

Now, with the change in a4e44c2 included, the tape with a double write perm error mounts successfully.

# /usr/local/bin/ltfs -o tape_backend=file -o devname=/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5 /ltfs
22f4 LTFS14000I LTFS starting, LTFS version 2.4.5.1 (Prelim), log level 2.
22f4 LTFS14058I LTFS Format Specification version 2.4.0.
22f4 LTFS14104I Launched by "/usr/local/bin/ltfs -o tape_backend=file -o devname=/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5 /ltfs".
22f4 LTFS14105I This binary is built for Linux (x86_64).
22f4 LTFS14106I GCC version is 4.8.5 20150623 (Red Hat 4.8.5-44).
22f4 LTFS17087I Kernel version: Linux version 3.10.0-1160.62.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Wed Mar 23 09:04:02 UTC 2022 i386.
22f4 LTFS17089I Distribution: NAME="Red Hat Enterprise Linux Server".
22f4 LTFS17089I Distribution: Red Hat Enterprise Linux Server release 7.9 (Maipo).
22f4 LTFS17089I Distribution: Red Hat Enterprise Linux Server release 7.9 (Maipo).
22f4 LTFS14063I Sync type is "time", Sync time is 300 sec.
22f4 LTFS17085I Plugin: Loading "file" tape backend.
22f4 LTFS17085I Plugin: Loading "unified" iosched backend.
22f4 LTFS14095I Set the tape device write-anywhere mode to avoid cartridge ejection.
22f4 LTFS30000I Opening a device through generic file driver (/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5).
22f4 LTFS30081I Getting the device directory (/tmp/ltfs8948).
22f4 LTFS30082I No device directory is specified (/tmp/ltfs8948).
22f4 LTFS30001I Opening a redirecting file through generic file driver (/gpfs/virtual_library/VLIB01/Drive_VDRIVE0001.ULT3580-TD5).
22f4 LTFS30061E Cannot unlock medium: unit not ready.
22f4 LTFS17160I Maximum device block size is 4194304.
22f4 LTFS11330I Loading cartridge.
22f4 LTFS30048I Loading a directory through generic file driver (/gpfs/virtual_library/VLIB01/V00001L5).
22f4 LTFS11332I Load successful.
22f4 LTFS17157I Changing the drive setting to write-anywhere mode.
22f4 LTFS11005I Mounting the volume.
22f4 LTFS11333I A cartridge with write-perm error is detected on both partitions. Seek the newest index (IP: Gen = 5, VCR = 4) (DP: Gen = 5, VCR = 4) (VCR = 5).
22f4 LTFS17283I Detected unmatched VCR value between MAM and VCR (4, 5).
22f4 LTFS17284I Seaching the final index in IP.
22f4 LTFS17037E XML parser: failed to read from XML stream.
22f4 LTFS17016E Cannot parse index direct from medium (-5000).
22f4 LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
22f4 LTFS17037E XML parser: failed to read from XML stream.
22f4 LTFS17016E Cannot parse index direct from medium (-5000).
22f4 LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
22f4 LTFS17289I Skip parsing the final index on IP.
22f4 LTFS17284I Seaching the final index in DP.
22f4 LTFS17037E XML parser: failed to read from XML stream.
22f4 LTFS17016E Cannot parse index direct from medium (-5000).
22f4 LTFS11194W Cannot read index: failed to read and parse XML data (-5000).
22f4 LTFS17288I Detected the final indexes (IP: Gen = 0, Pos = 0) (DP: Gen = 5, Pos = 35).
22f4 LTFS17287I Making R/O mount from the location (b, 35).
22f4 LTFS17227I Tape attribute: Vendor = IBM     .
22f4 LTFS17227I Tape attribute: Application Name = LTFS                            .
22f4 LTFS17227I Tape attribute: Application Version = 2.4.7.0 .
22f4 LTFS17227I Tape attribute: Medium Label = .
22f4 LTFS17228I Tape attribute: Text Localization ID = 0x81.
22f4 LTFS17227I Tape attribute: Barcode = V00001                          .
22f4 LTFS17227I Tape attribute: Application Format Version = 2.4.0           .
22f4 LTFS17228I Tape attribute: Volume Lock Status = 0x06.
22f4 LTFS17227I Tape attribute: Media Pool name = .
22f4 LTFS11031I Volume mounted successfully. V00001 : Gen = 5 / (b, 35) -> (b, 24) / VDRIVE0001.
22f4 LTFS14019I Medium is write protected. Mounting read-only.
22f4 LTFS14111I Initial setup completed successfully.
22f4 LTFS14112I Invoke 'mount' command to check the result of final setup.
22f4 LTFS14113I Specified mount point is listed if succeeded.

As expected, message LTFS17264I ("The index on IP is newer...") is not shown, because the MAM count for IP is not actually newer than the MAM count for DP.

@amissael95
Contributor Author

amissael95 commented Jan 29, 2025

@piste-jp,

I have added and tested your proposal in this PR, see:
https://github.com/LinearTapeFileSystem/ltfs/pull/493/files
#493 (comment)

The only modification I made was to check whether the MAM count for IP is actually newer than the MAM count for DP, as follows:

if (vol->ip_coh.count > vol->dp_coh.count && vollock != PWE_MAM_DP && vollock != PWE_MAM) {

This prevents the message LTFS17264I ("The index on IP is newer...") from being shown in a scenario where it is not true.
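
The guard can be checked in isolation with a small sketch. The enum and the function name are hypothetical stand-ins for the LTFS volume lock states and for the condition around the ltfsmsg() call; only the boolean expression mirrors the line quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the volume lock states used in the thread. */
enum lock_state { LOCK_PWE, LOCK_PWE_DP, LOCK_PWE_IP, LOCK_PWE_BOTH };

/* True when the "index on IP is newer" message would be logged. */
bool logs_ip_newer(uint64_t ip_count, uint64_t dp_count, enum lock_state vollock)
{
	return ip_count > dp_count &&
	       vollock != LOCK_PWE_DP &&
	       vollock != LOCK_PWE;
}
```

With equal counts, as in TEST 3, the function returns false regardless of the lock state, which matches the log above where the message no longer appears.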

Please let me know if there are any comments. I think the change looks good.

Member

@piste-jp piste-jp left a comment

Looks good to me.

Contributor

@juliocelon juliocelon left a comment

Thanks, Missael! It looks great to me.

@chukero chukero left a comment

approved

@juliocelon juliocelon merged commit 666a9f4 into LinearTapeFileSystem:v2.4-stable Jan 30, 2025
9 checks passed
@amissael95 amissael95 deleted the fix-mount-fails branch January 30, 2025 20:54
@piste-jp
Member

Merged to the master branch.
