Nodes can receive a wake up violation when they are actually shutting down #29
So if I understand correctly: the farmer bot requests a boot by switching the target from down to up. While the node is apparently booting, the bot switches the target back to down, and then back to up to request a second boot. The behavior is correct, since the node did not finish its expected boot sequence for the first request: it must both send an uptime report and switch its power state, and the latter only happens if its target is up. When the farmer bot was initially implemented, it was agreed that, for verification purposes, a node MUST answer every power on request by fully booting. This is also what underpins the random wakeups. As a side note, there is currently no specific ordering of calls in zos, and calls from multiple tasks that happen at the same time are inherently racy.
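To make the boot requirement above concrete, here is a minimal Python sketch. It is a hypothetical model, not zos or tfchain code: the `Power` enum and the `next_power_state` and `boot_completed` functions are illustrative names, and it assumes (as described above) that the node only switches its state to Up while its target is Up.

```python
from enum import Enum


class Power(Enum):
    UP = "Up"
    DOWN = "Down"


def next_power_state(target: Power, uptime_reported: bool, state: Power) -> Power:
    """Hypothetical node-side step: after its uptime report, the node only
    switches its power state to Up while its power target is Up."""
    if uptime_reported and target is Power.UP:
        return Power.UP
    return state


def boot_completed(uptime_reported: bool, state: Power) -> bool:
    """A power-on request counts as answered only once the node has both
    sent an uptime report and switched its power state to Up."""
    return uptime_reported and state is Power.UP


# Target flipped back to Down mid-boot: the state stays Down, so the
# first request is never answered and a later Up needs a full boot.
state = next_power_state(Power.DOWN, uptime_reported=True, state=Power.DOWN)
print(boot_completed(True, state))  # False
```

In this model, flipping the target back to Down after the uptime report leaves the state at Down, so the first request stays unanswered and the next Up request again requires a full boot.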
The power target isn't switched back to down until after the node has fully booted and set its power state to up. tfchain allows creating additional events that set the power target to up, even when the target is already up. So in this case the bot is repeatedly attempting to wake the node before the node finishes booting.
That's good to know. In my observations, nodes tend to set their power state to up in the block immediately following their first uptime report after waking up.
In that case this is a bug in tfchain. An event should only be emitted to signal a change in state, and here the state doesn't change; the event is even explicitly named `PowerTargetChanged`. It should be easy to update the tfchain code so that no event is emitted when there is no actual change. Beyond that, if the intent is for something to observe the current state, that something should simply query the latest chain state.
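A rough sketch of the suggested tfchain change, written as a Python toy model rather than actual Substrate pallet code. The `Chain` class, its `change_power_target` method, and the string targets are hypothetical; the only point illustrated is that a request for the target a node already has writes nothing and emits no `PowerTargetChanged` event.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PowerTargetChanged:
    node_id: int
    target: str  # "Up" or "Down"


@dataclass
class Chain:
    """Toy stand-in for tfchain storage plus its event log (hypothetical)."""
    targets: Dict[int, str] = field(default_factory=dict)
    events: List[PowerTargetChanged] = field(default_factory=list)

    def change_power_target(self, node_id: int, target: str) -> Optional[PowerTargetChanged]:
        # No actual change: skip the storage write and emit no event.
        if self.targets.get(node_id) == target:
            return None
        self.targets[node_id] = target
        event = PowerTargetChanged(node_id, target)
        self.events.append(event)
        return event


chain = Chain()
chain.change_power_target(1, "Up")
chain.change_power_target(1, "Up")  # repeated request, ignored
print(len(chain.events))  # 1
```

With this shape, anything that cares about the current target reads `targets` directly instead of inferring it from the event stream.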
I've observed a rare case where a node receives a wake up violation for failing to boot within 30 minutes when the node is in fact shutting down.
Here's the sequence of events:
- … `power_managed` and `power_managed_boot` set to `None`.
- … `Up` for this node. Maybe this shouldn't happen in normal circumstances, but it can and actually has. Since the power state for this node is still `Down` at this point, `power_managed_boot` will be set.
- … `Up` in the next block after its first uptime report, typically …
- … `Down` for this node more than 30 minutes after the target change to `Up`.
- … `Down`, and thus both `power_managed` and `power_managed_boot` are not `None`.
If we accept that it's legitimate to send multiple power target changes until a node wakes up, then this definitely shouldn't result in a violation.
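One possible way to keep repeated Up requests from producing a violation, sketched in Python under loose assumptions: timestamps are in seconds, `power_managed_boot` is modeled as the time of the pending boot request, and `BootTracker` with its methods is a hypothetical stand-in, not the actual violation check. The design choice is that a pending request is only cleared once the full boot sequence (uptime report plus power state Up) is observed, and an Up request arriving while one is already pending keeps the original timestamp.

```python
from typing import Optional

BOOT_DEADLINE_SECS = 30 * 60  # the 30 minute window from this issue


class BootTracker:
    """Hypothetical wake-up check for one node; not the actual tfchain code.
    Timestamps are assumed to be in seconds."""

    def __init__(self) -> None:
        # Time of the pending boot request, or None if no boot is pending.
        self.power_managed_boot: Optional[int] = None

    def on_target_up(self, now: int, power_state: str) -> None:
        # A boot is only owed if the node is Down. Repeated Up requests while a
        # boot is pending keep the original timestamp instead of starting over.
        if power_state == "Down" and self.power_managed_boot is None:
            self.power_managed_boot = now

    def on_boot_completed(self) -> None:
        # Called once the node has sent an uptime report AND set its state to Up.
        self.power_managed_boot = None

    def violation(self, now: int) -> bool:
        return (
            self.power_managed_boot is not None
            and now - self.power_managed_boot > BOOT_DEADLINE_SECS
        )


tracker = BootTracker()
tracker.on_target_up(0, "Down")
tracker.on_target_up(300, "Down")  # second Up before the boot finished
tracker.on_boot_completed()        # uptime report + state Up
print(tracker.violation(3600))     # False
```

Because the pending request survives until the state actually switches to Up, a second Up landing between the uptime report and the state change cannot open a fresh 30 minute window that later expires while the node is shutting down.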
Perhaps the solution would be to reorder the sequence of operations in Zos, but I guess that it was implemented this way for a reason, and of course rolling out changes to Zos is slow.