Skip to content

Commit

Permalink
Consider worker failures as cancelled executions in tests
Browse files Browse the repository at this point in the history
Summary:
When a sandcastle job that runs tests using a worker tool is cancelled, the sandcastle worker will kill buck, which in turn will kill the worker, and once that is killed, it'll kill the test runner too. In the time between the worker is killed and the test runner is killed, the test runner still attempts to run tests via the worker, but the worker returns `GatherOutputStatus::SpawnFailed` (see https://www.internalfb.com/code/fbsource/[88a84f351975921f560270e1096a8687d6f62160]/fbcode/buck2/app/buck2_execute_impl/src/executors/worker.rs?lines=479%2C492). This causes the test runner to report a ton of fatal test results (e.g. https://www.internalfb.com/intern/testinfra/diagnostics/12666374014604584.844425084763262.1735210981).

To avoid this, I'd like to tell the test runner that test execution was cancelled instead of completed (un)successfully. This will produce omitted test results.

Reviewed By: shashkambham

Differential Revision: D67734577

fbshipit-source-id: fc7616ec2cf26260a46ff721474e52e63da254da
  • Loading branch information
Ron Mordechai authored and facebook-github-bot committed Jan 4, 2025
1 parent a17643e commit 7aa181c
Show file tree
Hide file tree
Showing 3 changed files with 56 additions and 12 deletions.
26 changes: 26 additions & 0 deletions app/buck2_execute/src/execute/manager.rs
Original file line number Diff line number Diff line change
Expand Up @@ -250,6 +250,13 @@ pub trait CommandExecutionManagerExt: Sized {
additional_message: Option<String>,
) -> CommandExecutionResult;

fn worker_failure(
self,
execution_kind: CommandExecutionKind,
stderr: String,
timing: CommandExecutionMetadata,
) -> CommandExecutionResult;

fn timeout(
self,
execution_kind: CommandExecutionKind,
Expand Down Expand Up @@ -298,6 +305,25 @@ where
)
}

fn worker_failure(
self,
execution_kind: CommandExecutionKind,
stderr: String,
timing: CommandExecutionMetadata,
) -> CommandExecutionResult {
self.result(
CommandExecutionStatus::WorkerFailure { execution_kind },
Default::default(),
CommandStdStreams::Local {
stdout: Default::default(),
stderr: stderr.into_bytes(),
},
None,
timing,
None,
)
}

fn timeout(
self,
execution_kind: CommandExecutionKind,
Expand Down
37 changes: 25 additions & 12 deletions app/buck2_execute_impl/src/executors/local.rs
Original file line number Diff line number Diff line change
Expand Up @@ -535,18 +535,31 @@ impl LocalExecutor {
// We are lying about the std streams here because we don't have a good mechanism
// to report that the command does not exist, and because that's exactly what RE
// also does when this happens.
manager.failure(
execution_kind,
Default::default(),
CommandStdStreams::Local {
stdout: Default::default(),
stderr: format!("Spawning executable `{}` failed: {}", args[0], reason)
.into_bytes(),
},
None,
*timing,
None,
)
if matches!(execution_kind, CommandExecutionKind::Local { .. }) {
manager.failure(
execution_kind,
Default::default(),
CommandStdStreams::Local {
stdout: Default::default(),
stderr: format!("Spawning executable `{}` failed: {}", args[0], reason)
.into_bytes(),
},
None,
*timing,
None,
)
} else {
// Workers executing tests often employ a health check to avoid producing
// invalid test results. Differentiating a worker spawn failure from a normal
// spawn or execution failure allows the test runner to handle this case
// accordingly.
manager.worker_failure(
execution_kind,
// Could probably use a better error message.
format!("Spawning executable `{}` failed: {}", args[0], reason),
*timing,
)
}
}
GatherOutputStatus::TimedOut(duration) => {
manager.timeout(execution_kind, duration, std_streams, *timing, None)
Expand Down
5 changes: 5 additions & 0 deletions app/buck2_test/src/orchestrator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1066,6 +1066,11 @@ impl<'b> BuckTestOrchestrator<'b> {
execution_kind: Some(execution_kind),
outputs,
},
CommandExecutionStatus::WorkerFailure {
execution_kind: CommandExecutionKind::LocalWorker { .. },
} => {
return Err(ExecuteError::Cancelled(Cancelled));
}
CommandExecutionStatus::Failure { execution_kind }
| CommandExecutionStatus::WorkerFailure { execution_kind } => ExecuteData {
stdout,
Expand Down

0 comments on commit 7aa181c

Please sign in to comment.