fix data-loss issue with replays w/o checksums #689

Open: wants to merge 1 commit into master from fix_replay_nochecksums

@vassilit (Collaborator) commented Mar 2, 2025

Closes: #672

See bug details in #672 (comment).

This PR is only a bugfix. I think we could fix a bit more, but I'm not sure that doing so wouldn't introduce a regression, because I haven't checked all the consumers of file->digest.

In replay.c we do:

    file->digest = rm_digest_new(RM_DIGEST_EXT, 0);
[...]
    /* Fake the checksum using RM_DIGEST_EXT */
    JsonNode *cksum_node = json_object_get_member(object, "checksum");
    if(cksum_node != NULL) {
        const char *cksum = json_object_get_string_member(object, "checksum");
        if(cksum != NULL) {
            rm_digest_update(file->digest, (unsigned char *)cksum, strlen(cksum));
        }
    }
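To illustrate why the current code can lose data, here is a small self-contained model. It is not rmlint code: ext_digest, load_digest_current and ext_digest_equal are hypothetical stand-ins, and the sketch assumes that an RM_DIGEST_EXT digest compares equal whenever the bytes fed into it are equal. Under that assumption, every replayed file whose JSON entry has no "checksum" member ends up with the same empty digest, so two unrelated files look identical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an RM_DIGEST_EXT digest: it only stores the
 * bytes fed to it (here, the checksum string read from the JSON). */
typedef struct {
    unsigned char bytes[128];
    size_t len;
} ext_digest;

static void ext_digest_init(ext_digest *d) {
    d->len = 0;
}

/* Append bytes, truncating at the buffer size (good enough for a sketch). */
static void ext_digest_update(ext_digest *d, const unsigned char *data, size_t n) {
    size_t room = sizeof d->bytes - d->len;
    if (n > room) {
        n = room;
    }
    memcpy(d->bytes + d->len, data, n);
    d->len += n;
}

/* Assumed equality rule: same length and same bytes. */
static bool ext_digest_equal(const ext_digest *a, const ext_digest *b) {
    return a->len == b->len && memcmp(a->bytes, b->bytes, a->len) == 0;
}

/* Mirrors the current replay.c logic: a digest is always created and is
 * only fed when a checksum string is present. */
static void load_digest_current(ext_digest *d, const char *cksum) {
    ext_digest_init(d);
    if (cksum != NULL) {
        ext_digest_update(d, (const unsigned char *)cksum, strlen(cksum));
    }
}
```

In this model, two entries loaded with a NULL checksum get empty digests that ext_digest_equal reports as equal, which is the kind of false match that could make a replay treat unrelated files as duplicates.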

I would do this instead:

    file->digest = NULL;
[...]
    /* Fake the checksum using RM_DIGEST_EXT */
    JsonNode *cksum_node = json_object_get_member(object, "checksum");
    if(cksum_node != NULL) {
        const char *cksum = json_object_get_string_member(object, "checksum");
        if(cksum != NULL) {
            file->digest = rm_digest_new(RM_DIGEST_EXT, 0);
            rm_digest_update(file->digest, (unsigned char *)cksum, strlen(cksum));
        }
    }
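The proposed lazy allocation can be modelled as below. This is a sketch with hypothetical names (ext_digest, load_digest_lazy, ext_digest_free), not rmlint's RmDigest API: a digest is only created when the entry carries a checksum, and NULL stands for "no checksum recorded".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical digest type standing in for rmlint's digest object. */
typedef struct {
    char *cksum; /* private copy of the checksum string from the JSON */
} ext_digest;

/* Only allocate when a checksum is present; otherwise return NULL.
 * Note: in this sketch an allocation failure also returns NULL, so a real
 * implementation would need to report out-of-memory separately. */
static ext_digest *load_digest_lazy(const char *cksum) {
    if (cksum == NULL) {
        return NULL;
    }
    ext_digest *d = malloc(sizeof *d);
    if (d == NULL) {
        return NULL;
    }
    size_t n = strlen(cksum) + 1;
    d->cksum = malloc(n);
    if (d->cksum == NULL) {
        free(d);
        return NULL;
    }
    memcpy(d->cksum, cksum, n);
    return d;
}

/* Safe to call with NULL, like free(). */
static void ext_digest_free(ext_digest *d) {
    if (d != NULL) {
        free(d->cksum);
        free(d);
    }
}
```

Entries without a checksum then cost no allocation at all, which is where the saving mentioned below comes from.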

And reintroduce the checks for file->digest == NULL in its consumers; that would also avoid pointless allocations.
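A NULL check in a consumer could look roughly like this. digests_match is a hypothetical helper, not an existing rmlint function; the design choice sketched here is that an unknown checksum never matches anything, not even another unknown checksum, so checksum-less files can never be grouped as duplicates by digest alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical digest type: a NULL pointer means "no checksum recorded". */
typedef struct {
    const char *cksum;
} ext_digest;

/* Conservative, NULL-aware comparison: missing digests never match. */
static bool digests_match(const ext_digest *a, const ext_digest *b) {
    if (a == NULL || b == NULL) {
        return false;
    }
    return strcmp(a->cksum, b->cksum) == 0;
}
```

The trade-off is that genuine duplicates without checksums (such as hardlinks) would no longer be matched by digest, which is what an extra inode check could cover.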
Another way to be safe would be to compare file->inode to confirm that the files really belong to the same group.
Are hardlinks the only case in which rmlint does not generate checksums?
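One possible shape for the inode safeguard, as a sketch only: replay_entry and may_group are hypothetical names, not rmlint's RmFile fields or functions. Entries with checksums are compared by checksum; otherwise they are grouped only if they are provably the same file. The sketch also compares the device number, because inode numbers are only unique within a single filesystem.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, simplified view of a replayed file entry.
 * cksum is NULL when the JSON entry had no "checksum" member. */
typedef struct {
    dev_t dev;         /* device the file lives on */
    ino_t inode;       /* inode number on that device */
    const char *cksum; /* checksum string, or NULL if unknown */
} replay_entry;

/* Decide whether two replayed entries may be placed in the same group. */
static bool may_group(const replay_entry *a, const replay_entry *b) {
    if (a->cksum != NULL && b->cksum != NULL) {
        return strcmp(a->cksum, b->cksum) == 0;
    }
    /* Without both checksums, only accept entries that are the same file:
     * hardlinks share both the device and the inode number. */
    return a->dev == b->dev && a->inode == b->inode;
}
```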

PS: @sahib, if you still follow this repository: the bird theme is nice. It's a welcome change from the overly formal naming in other projects' code :)

@vassilit force-pushed the fix_replay_nochecksums branch from 2ac5bfc to 5967cbf (March 2, 2025 00:44)
@vassilit force-pushed the fix_replay_nochecksums branch from 5967cbf to ee27d64 (March 2, 2025 13:58)
@vassilit requested review from sahib and SeeSpotRun (March 2, 2025 14:48)
@vassilit added this to the 2.10.3 To Be Determined milestone (Mar 2, 2025)
@RayOei (Collaborator) commented Mar 3, 2025

I would suggest moving this into its own issue as a future improvement, so it doesn't get forgotten.

@vassilit removed the review request for sahib and SeeSpotRun (March 3, 2025 19:06)
Successfully merging this pull request may close these issues.

Data loss with --replay and hard-linked files due to missing checksums.