fix: abort transmission after response timeout, do not retry, do not restart controller unless callback is missing #6408

Merged
merged 3 commits into from
Oct 17, 2023

AlCalzone
Member

@AlCalzone AlCalzone commented Oct 17, 2023

for #6402

This PR changes how timed-out Send Data responses are handled. Previously, we'd retry the command up to 3 times without waiting for the command cycle to end. This likely puts the controller in a weird state and can trigger the "unresponsive controller" recovery unnecessarily.

Now we abort the transmission and wait for the callback (expected to have status NoACK), then treat the result like a NoACK (node marked dead/asleep).

If the callback never arrives, the recovery kicks in, possibly restarting the controller.

Since we now actively abort, the default response timeout was reduced back to 10s, because not responding to incoming commands for an extended time can cause further problems. Users who regularly experience these timeouts should fix their mesh or increase the timeout.
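
For readers skimming the thread, the new handling can be summarized as a small decision function. This is only a sketch of the behavior described above, not the actual driver code: `resolveSendData`, `CallbackStatus`, `Outcome` and `SendDataResult` are hypothetical names, and the non-timeout path is simplified to "OK means delivered, anything else means unreachable".

```typescript
// Hypothetical names throughout; this is NOT the zwave-js implementation,
// just the decision logic described in this PR.
type CallbackStatus = "OK" | "NoAck" | "Fail";
type Outcome = "delivered" | "nodeUnreachable" | "recoverController";

interface SendDataResult {
  /** Whether the transmission was aborted because the response timed out */
  abortSent: boolean;
  outcome: Outcome;
}

function resolveSendData(
  responseReceived: boolean,
  callback: CallbackStatus | undefined, // undefined = callback never arrived
): SendDataResult {
  // A response timeout no longer causes retries; the transmission is aborted.
  const abortSent = !responseReceived;

  if (callback === undefined) {
    // Only a missing callback lets the controller recovery kick in.
    return { abortSent, outcome: "recoverController" };
  }

  if (abortSent) {
    // After an abort, the callback is treated like a NoACK,
    // i.e. the node gets marked dead/asleep.
    return { abortSent, outcome: "nodeUnreachable" };
  }

  // Simplified normal path (an assumption made for this sketch).
  return {
    abortSent,
    outcome: callback === "OK" ? "delivered" : "nodeUnreachable",
  };
}
```

For example, `resolveSendData(false, "NoAck")` models a response timeout followed by the expected callback, and `resolveSendData(false, undefined)` models the case where the callback is missing as well.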

@AlCalzone
Member Author

@zwave-js-bot automerge

@zwave-js-bot zwave-js-bot enabled auto-merge (squash) October 17, 2023 09:18
@zwave-js-bot zwave-js-bot merged commit 61b3fc9 into master Oct 17, 2023
@zwave-js-bot zwave-js-bot deleted the response-timeout branch October 17, 2023 09:22
AlCalzone added a commit that referenced this pull request Oct 17, 2023
This release includes several more fixes and workarounds for the problematic interaction between some controller firmware bugs and the automatic controller recovery introduced in the `v12` release:
* Added a workaround to recognize corrupted `ACK` frames after soft-reset of controllers running a 7.19.x firmware or higher. Previously this triggered the unresponsive controller detection and recovery process. (#6409)
* When the response to a `Send Data` command times out, the command is now aborted, instead of retrying and potentially putting the controller in a bad state due to not waiting for the command cycle to complete. When this happens, Z-Wave JS no longer attempts to recover the controller by restarting it, unless the callback is also missing. (#6408)
* When the callback to a `Send Data` command continues to be missing after restarting the controller, Z-Wave JS no longer restarts itself. Instead the old behavior of marking the node as `dead` is now restored, as the node being unresponsive/unreachable is most likely the actual problem. (#6403)
* In addition, the `Send Data` callback timeout has been reduced to 30 seconds and ongoing transmissions are now aborted before reaching this timeout. This should limit the impact of the controller taking excessively long to transmit, especially in busy networks with lots of unsolicited reporting and end nodes expecting a timely response (#6411)

### Features
* The `Driver` constructor now accepts multiple sets of options and curated presets are available (#6412)

### Additional Bugfixes
* Only auto-refresh `Meter` and `Multilevel Sensor CC` values if none were updated recently (#6398)
* Export all option types for `Configuration CC` (#6413)

### Config file changes
* Add NEO Cool Cam Repeater (#6332)
* Increase report timeout for Aeotec Multisensor 6 to 2s (#6397)