Ignore repair_run_by_cluster_v2 rows with no corresponding repair #1478
Conversation
When bad things (like Reaper filling the drive with logs) happen, it's possible to end up with repair_run_by_cluster_v2 rows with no corresponding repair row, which breaks Reaper. So, just skip them.
No linked issues found. Please add the corresponding issues in the pull request description.
I'm not actually sure if anything will clean up the repair_run_by_cluster_v2 rows, but I'm not sure it's a problem. The mismatch between repair_run and repair_run_by_cluster_v2 is frequent enough to be a pain (basically, a percentage of the time when anything goes wrong with the cluster and Reaper logs until the drive fills), but not often enough to generate measurable load.
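To make the failure mode concrete, here's a minimal stand-alone sketch (not Reaper's actual code; the table is modeled as a plain `Map` and the names are hypothetical): when an id from `repair_run_by_cluster_v2` has no matching `repair_run` row, the lookup yields `null`, and any downstream dereference throws the NPE.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.stream.Collectors;

public class OrphanedRowNpe {

  // Stand-in for the repair_run table: an orphaned id has no row here.
  static final Map<UUID, String> REPAIR_RUN = new HashMap<>();

  // Naive listing: map each id straight to its repair_run row, then use it.
  // Returns true if an orphaned id caused an NPE.
  static boolean triggersNpe(List<UUID> idsFromByClusterTable) {
    try {
      idsFromByClusterTable.stream()
          .map(REPAIR_RUN::get)   // null for an orphaned id
          .map(String::length)    // dereferencing that null throws
          .collect(Collectors.toList());
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    UUID orphaned = UUID.randomUUID(); // present only in repair_run_by_cluster_v2
    System.out.println(triggersNpe(List.of(orphaned))); // prints true
  }
}
```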
src/server/src/main/java/io/cassandrareaper/storage/repairrun/CassandraRepairRunDao.java
Per the comment, rewrote it to stick a filter in between instead of shoving a ternary into the map. Cleaner and easier to read.
Looks like there's an issue for this already as well: #1463
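The shape of the refactor can be sketched like this (a minimal illustration under assumed names; `runsForCluster` and the `Map`-backed table are stand-ins, not Reaper's actual API): a `filter` between the two `map` stages lets orphaned ids simply drop out of the stream instead of threading a ternary through the mapping function.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.stream.Collectors;

public class SkipOrphanedRuns {

  // Stand-in for repair_run: only one of the two ids below has a row.
  static final UUID PRESENT = UUID.fromString("00000000-0000-0000-0000-000000000001");
  static final UUID ORPHANED = UUID.fromString("00000000-0000-0000-0000-000000000002");
  static final Map<UUID, String> REPAIR_RUN = Map.of(PRESENT, "run-1");

  // Filter between the maps instead of a ternary inside map():
  // ids with no repair_run row are skipped, so nothing downstream sees null.
  static List<String> runsForCluster(List<UUID> idsFromByClusterTable) {
    return idsFromByClusterTable.stream()
        .map(REPAIR_RUN::get)        // null for repair_run_by_cluster_v2 orphans
        .filter(Objects::nonNull)    // skip them
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(runsForCluster(List.of(PRESENT, ORPHANED))); // prints [run-1]
  }
}
```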
I'm looking at this, having some issues with my local setup as far as testing goes, so bear with me while I try to work around them.
In terms of testing, I've followed the procedure below.

Try to repro the issue from main by:
- First commenting out the `testSSLHotReload` test (which is a problem to run locally), then building the repo into a Docker image, `kind load`ing it into a `kind` cluster, and spinning up a K8ssandraCluster with the new Reaper image.
- Creating a new repair via the UI.
- Confirming the repair exists in both `repair_run_by_cluster_v2` and `repair_run`.
- Deleting from `repair_run` via `delete from repair_run where id = d03c24a0-e59e-11ee-9a67-5dd395ae716d`.
When reloading the UI I then see:

Which I think is what we want based on the error repro'd here.
I then go and rebuild the image and re-deploy everything using the branch from this PR. Following the same procedure, I no longer see the NPE visible in Reaper's logs. Instead, the UI simply returns no results for repair runs in the DONE or RUNNING state.
I'm going to chat to @adejanovski about whether we need some more work to ensure that repair runs are purged from repair_run_by_cluster_v2, and whether this data inconsistency should be logged as a warning. But I'll approve this for now on the basis that it eliminates the NPE.
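If the inconsistency does end up being logged as a warning, one possible shape is to move the existence check into the filter predicate and warn before dropping the id. This is only a sketch under assumed names (`runsForCluster`, a `Map`-backed stand-in for the table), not Reaper's actual code:

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.logging.Logger;
import java.util.stream.Collectors;

public class WarnOnOrphanedRuns {

  private static final Logger LOG = Logger.getLogger(WarnOnOrphanedRuns.class.getName());

  // Stand-in for repair_run; real code would query Cassandra.
  static final UUID PRESENT = UUID.fromString("00000000-0000-0000-0000-000000000001");
  static final Map<UUID, String> REPAIR_RUN = Map.of(PRESENT, "run-1");

  // Same skip-the-orphans filter, but the repair_run /
  // repair_run_by_cluster_v2 mismatch is made visible in the logs.
  static List<String> runsForCluster(List<UUID> idsFromByClusterTable) {
    return idsFromByClusterTable.stream()
        .filter(id -> {
          boolean exists = REPAIR_RUN.containsKey(id);
          if (!exists) {
            LOG.warning("repair_run_by_cluster_v2 row " + id
                + " has no matching repair_run; skipping");
          }
          return exists;
        })
        .map(REPAIR_RUN::get)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    UUID orphaned = UUID.fromString("00000000-0000-0000-0000-000000000002");
    System.out.println(runsForCluster(List.of(PRESENT, orphaned))); // prints [run-1]
  }
}
```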