Commit
allow failures setting nextWorkerNumberToUse (#626)
Issue: Sometimes the underlying store isn't available for the frequent writes made to update nextWorkerNumberToUse. When a write fails, an exception is thrown and cascades upstream, preventing further writes and updates to workers from JobActor. To handle this, we tolerate a limited number of consecutive write failures for this call before propagating the exception upstream.

Invariant: This is fine from a constraints perspective. Here's why: on a new mantis master leader election, job and worker state is read from the DB. If the worker count was synced correctly, there is no issue. If it was not and there are more workers than the workerMax stored in the DB, the extra workers will be killed when the TEs send a heartbeat. If it was not and the workerMax in the DB is higher than the actual number of workers, we'll schedule the missing workers on leader election.
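The tolerance described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `WorkerNumberStore`, `TolerantWorkerNumberWriter`, and the `maxConsecutiveFailures` parameter are hypothetical names, and the real change lives inside the Mantis job actor code. The idea is to count consecutive failed writes, reset the count on any success, and rethrow only once the count exceeds the allowed limit.

```java
import java.io.IOException;

/** Hypothetical stand-in for the underlying store; not the Mantis API. */
interface WorkerNumberStore {
    void storeNextWorkerNumber(int nextWorkerNumber) throws IOException;
}

/** Wraps a store and tolerates a bounded run of consecutive write failures. */
final class TolerantWorkerNumberWriter {
    private final WorkerNumberStore store;
    private final int maxConsecutiveFailures; // assumed tunable, value is illustrative
    private int consecutiveFailures = 0;

    TolerantWorkerNumberWriter(WorkerNumberStore store, int maxConsecutiveFailures) {
        this.store = store;
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /**
     * Attempts to persist the value.
     * Returns true if the write succeeded, false if it failed but was tolerated.
     * Rethrows the store's exception once failures exceed the allowed limit.
     */
    boolean write(int nextWorkerNumber) throws IOException {
        try {
            store.storeNextWorkerNumber(nextWorkerNumber);
            consecutiveFailures = 0; // any success resets the failure streak
            return true;
        } catch (IOException e) {
            consecutiveFailures++;
            if (consecutiveFailures > maxConsecutiveFailures) {
                throw e; // too many failures in a row: propagate upstream
            }
            return false; // swallow the failure and let the caller continue
        }
    }
}
```

With a limit of 2, for example, two failed writes in a row are swallowed and the third consecutive failure is propagated. A successful write in between resets the streak, so intermittent store outages do not block the caller.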