Skip to content
This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Unusual blocktime with same validator in subsequent blocks after OracleRequest #882

Closed
mialbu opened this issue Oct 17, 2022 · 4 comments
Closed

Comments

@mialbu
Copy link

mialbu commented Oct 17, 2022

Observation

The blocks 2336997 and 2336998 showed some inconsistent behaviour. They have timestamps 90 seconds apart, and they seem to have had the same validator (NSPCCpw8YmgNDYWiBfXJHRfz38NDjv6WW3), which, afaik, is the only Neo-Go node amongst the 7 validators by that time with the others running a C# node. Another observation is that in block 2336997 there's a transaction with an OracleRequest, and in the next block (2336998) is a transaction with an OracleResponse.

The blocks before 2336997 and after 2336998 seem to have had normal behaviour.

Expected behaviour

With the same validator in subsequent blocks, it means (in my understanding) that the other validators failed to build and propose a block. In that scenario, I would expect this validator to only propose a new block after exponential time, i.e., waiting a lot longer than 90 seconds (6*15 seconds, with 15 seconds block time).

Potential Issues

@vang1ong7ang, @igormcoelho, and I have already looked into it to some extent, but we haven't found anything concrete yet to explain the observation. The current assumption is that the 6 validators that run a C# node could somehow not manage to include an OracleResponse for the corresponding previous OracleRequest in a block, i.e., there might be an issue with the Oracle plugin there. Further, it seems the Neo Go node proposed a block with linear (instead of exponential) timeout after the C# nodes did not propose any block, which could explain why the Neo Go node proposed a block already after 90 seconds. However, it seems that the Go dbft implementation contains this calculation correctly here. @roman-khimov

@vang1ong7ang, @igormcoelho, please correct me if I got something wrong in this one.

@mialbu mialbu changed the title Unusual blocktime after OracleRequest and same validator in subsequent blocks Unusual blocktime with same validator in subsequent blocks after OracleRequest Oct 17, 2022
@roman-khimov
Copy link

2336997 is a boring regular one:

Oct 14 21:11:23 267045 neogo-n3c[20321]: 2022-10-14T21:11:23.117+0300#011INFO#011initializing dbft#011{"height": 2336997, "view": 0, "index": 5, "role": "Primary"}
Oct 14 21:11:23 267045 neogo-n3c[20321]: 2022-10-14T21:11:23.201+0300#011INFO#011persisted to disk#011{"blocks": 1, "keys": 24, "headerHeight": 2336996, "blockHeight": 2336996, "took": "5.128348ms"}
Oct 14 21:11:27 267045 neogo-n3c[20321]: 2022-10-14T21:11:27.203+0300#011INFO#011persisted to disk#011{"blocks": 0, "keys": 2, "headerHeight": 2336996, "blockHeight": 2336996, "took": "3.861498ms"}
Oct 14 21:11:38 267045 neogo-n3c[20321]: 2022-10-14T21:11:38.076+0300#011INFO#011sending PrepareRequest#011{"height": 2336997, "view": 0}
Oct 14 21:11:38 267045 neogo-n3c[20321]: 2022-10-14T21:11:38.482+0300#011INFO#011received PrepareResponse#011{"validator": 4}
Oct 14 21:11:38 267045 neogo-n3c[20321]: 2022-10-14T21:11:38.494+0300#011INFO#011received PrepareResponse#011{"validator": 2}
Oct 14 21:11:38 267045 neogo-n3c[20321]: 2022-10-14T21:11:38.532+0300#011INFO#011received PrepareResponse#011{"validator": 1}
Oct 14 21:11:38 267045 neogo-n3c[20321]: 2022-10-14T21:11:38.533+0300#011INFO#011received PrepareResponse#011{"validator": 0}
Oct 14 21:11:38 267045 neogo-n3c[20321]: 2022-10-14T21:11:38.534+0300#011INFO#011sending Commit#011{"height": 2336997, "view": 0}
Oct 14 21:11:38 267045 neogo-n3c[20321]: 2022-10-14T21:11:38.878+0300#011INFO#011received Commit#011{"validator": 2}
Oct 14 21:11:38 267045 neogo-n3c[20321]: 2022-10-14T21:11:38.887+0300#011INFO#011received Commit#011{"validator": 1}
Oct 14 21:11:38 267045 neogo-n3c[20321]: 2022-10-14T21:11:38.981+0300#011INFO#011received Commit#011{"validator": 0}
Oct 14 21:11:38 267045 neogo-n3c[20321]: 2022-10-14T21:11:38.989+0300#011INFO#011received Commit#011{"validator": 4}
Oct 14 21:11:38 267045 neogo-n3c[20321]: 2022-10-14T21:11:38.989+0300#011INFO#011approving block#011{"height": 2336997, "hash": "dbb0913c32edd179d9d7e85315de92f5166060ade658b9efe7e620efd1fe9ab4", "tx_count": 1, "merkle": "a95b9f9ed242bcb10cb8aebef6a4e705756dba4876839dc85ebd84bd024a7e48", "prev": "740f5278d7011ec045e2545ed9ad9ae92dcb02ec5cfbbad3a8e78505ed27ad22"}

2336998 is a bit more interesting:

Oct 14 21:11:39 267045 neogo-n3c[20321]: 2022-10-14T21:11:39.039+0300#011INFO#011initializing dbft#011{"height": 2336998, "view": 0, "index": 5, "role": "Backup"}
Oct 14 21:11:39 267045 neogo-n3c[20321]: 2022-10-14T21:11:39.213+0300#011INFO#011persisted to disk#011{"blocks": 1, "keys": 102, "headerHeight": 2336997, "blockHeight": 2336997, "took": "4.97205ms"}
Oct 14 21:11:45 267045 neogo-n3c[20321]: 2022-10-14T21:11:45.217+0300#011INFO#011persisted to disk#011{"blocks": 0, "keys": 2, "headerHeight": 2336997, "blockHeight": 2336997, "took": "3.608943ms"}
Oct 14 21:12:08 267045 neogo-n3c[20321]: 2022-10-14T21:12:08.989+0300#011INFO#011skip change view#011{"nc": 0, "nf": 6}
Oct 14 21:12:09 267045 neogo-n3c[20321]: 2022-10-14T21:12:09.463+0300#011INFO#011received ChangeView#011{"validator": 1, "reason": "Timeout", "new view": 1}
Oct 14 21:12:09 267045 neogo-n3c[20321]: 2022-10-14T21:12:09.564+0300#011INFO#011received ChangeView#011{"validator": 0, "reason": "Timeout", "new view": 1}
Oct 14 21:12:09 267045 neogo-n3c[20321]: 2022-10-14T21:12:09.961+0300#011INFO#011received ChangeView#011{"validator": 2, "reason": "Timeout", "new view": 1}
Oct 14 21:12:11 267045 neogo-n3c[20321]: 2022-10-14T21:12:11.201+0300#011INFO#011received ChangeView#011{"validator": 4, "reason": "Timeout", "new view": 1}
Oct 14 21:13:08 267045 neogo-n3c[20321]: 2022-10-14T21:13:08.990+0300#011INFO#011request change view#011{"view": 0, "height": 2336998, "reason": "Timeout", "new_view": 1, "nc": 0, "nf": 1}
Oct 14 21:13:08 267045 neogo-n3c[20321]: 2022-10-14T21:13:08.991+0300#011INFO#011changing dbft view#011{"height": 2336998, "view": 1, "index": 5, "role": "Primary"}
Oct 14 21:13:08 267045 neogo-n3c[20321]: 2022-10-14T21:13:08.991+0300#011INFO#011sending PrepareRequest#011{"height": 2336998, "view": 1}
Oct 14 21:13:09 267045 neogo-n3c[20321]: 2022-10-14T21:13:09.409+0300#011INFO#011received PrepareResponse#011{"validator": 2}
Oct 14 21:13:09 267045 neogo-n3c[20321]: 2022-10-14T21:13:09.446+0300#011INFO#011received PrepareResponse#011{"validator": 1}
Oct 14 21:13:09 267045 neogo-n3c[20321]: 2022-10-14T21:13:09.456+0300#011INFO#011received PrepareResponse#011{"validator": 0}
Oct 14 21:13:09 267045 neogo-n3c[20321]: 2022-10-14T21:13:09.493+0300#011INFO#011received PrepareResponse#011{"validator": 4}
Oct 14 21:13:09 267045 neogo-n3c[20321]: 2022-10-14T21:13:09.494+0300#011INFO#011sending Commit#011{"height": 2336998, "view": 1}
Oct 14 21:13:09 267045 neogo-n3c[20321]: 2022-10-14T21:13:09.903+0300#011INFO#011received Commit#011{"validator": 0}
Oct 14 21:13:09 267045 neogo-n3c[20321]: 2022-10-14T21:13:09.929+0300#011INFO#011received Commit#011{"validator": 1}
Oct 14 21:13:09 267045 neogo-n3c[20321]: 2022-10-14T21:13:09.942+0300#011INFO#011received Commit#011{"validator": 4}
Oct 14 21:13:11 267045 neogo-n3c[20321]: 2022-10-14T21:13:11.225+0300#011INFO#011received PrepareResponse#011{"validator": 3}
Oct 14 21:13:11 267045 neogo-n3c[20321]: 2022-10-14T21:13:11.391+0300#011INFO#011received Commit#011{"validator": 0}
Oct 14 21:13:11 267045 neogo-n3c[20321]: 2022-10-14T21:13:11.391+0300#011INFO#011received Commit#011{"validator": 1}
Oct 14 21:13:11 267045 neogo-n3c[20321]: 2022-10-14T21:13:11.391+0300#011INFO#011received Commit#011{"validator": 4}
Oct 14 21:13:11 267045 neogo-n3c[20321]: 2022-10-14T21:13:11.391+0300#011INFO#011received Commit#011{"validator": 5}
Oct 14 21:13:11 267045 neogo-n3c[20321]: 2022-10-14T21:13:11.583+0300#011INFO#011received Commit#011{"validator": 3}
Oct 14 21:13:11 267045 neogo-n3c[20321]: 2022-10-14T21:13:11.584+0300#011INFO#011approving block#011{"height": 2336998, "hash": "69b7bfde3dedebe667846a3a5798b09630a21528b67f91a19472fe201757d324", "tx_count": 2, "merkle": "2493ffa02ef83ddc6ccdb4517d8eeecc48640e6177194916dc774b18767ee446", "prev": "dbb0913c32edd179d9d7e85315de92f5166060ade658b9efe7e620efd1fe9ab4"}

One simple view change and we're done. It just happened so that for this block/view number combination our node is the one to create PrepareRequest. The only thing that might be interesting is this line:

Oct 14 21:12:08 267045 neogo-n3c[20321]: 2022-10-14T21:12:08.989+0300#011INFO#011skip change view#011{"nc": 0, "nf": 6}

but that's neo-project/neo#2057, the node has not seen any traffic from other nodes, so it's a little shy to send CV, instead it sends a recovery requests and waits for other messages. Then 4 CVs are delivered and then after the second timer event (+60 seconds) it feels brave enough to join the CV party.

I would expect this validator to only propose a new block after exponential time, i.e., waiting a lot longer than 90 seconds (6*15 seconds, with 15 seconds block time).

I think this expectation is not a correct one, once we're past 15s blocks will be produced right after the view change (if primary node is alive for this view of course). Exponential backoff doesn't mean that the first view change will happen after 6*15, in fact it could happen right after 30s, but then timers will be adjusted to give more and more time after each CV.

The current assumption is that the 6 validators that run a C# node could somehow not manage to include an OracleResponse for the corresponding previous OracleRequest in a block, i.e., there might be an issue with the Oracle plugin there

  1. While technically it's possible to have an oracle node that at the same time is a CN, I don't think that's the setup we have (at least at NSPCC those are two different node instances). And if they're properly separated, then CNs get the response transaction via P2P, they don't care much about what oracle nodes do to create this transaction.
  2. In this particular case it's one C# node that failed to create a proposal for a new block. It happens, maybe it was down at this moment, not the end of the world. I'd worry if I'm to see that this happens with every oracle request/response, but I doubt that's the case, at least in some previous tests we've seen C# nodes including oracle response transactions into blocks without any issues.

@mialbu
Copy link
Author

mialbu commented Oct 21, 2022

Hey @roman-khimov thanks for the logs and the explanation! Okay, then it seems everything is fine. 🙏🏼

So, just to get this straight:

As the NSPCC node was speaker for the last block, it sets its timeout to 15 << 0+1 (=30s) with view=0. So, after not receiving anything until this timeout was reached, i.e., 30s after the last block was approved, it sent a recovery request (skip change view log).

T21:11:38.989 (0s)

Oct 14 21:11:38 267045 neogo-n3c[20321]: 2022-10-14T21:11:38.989+0300#011INFO#011approving block#011{"height": 2336997, "hash": "dbb0913c32edd179d9d7e85315de92f5166060ade658b9efe7e620efd1fe9ab4", "tx_count": 1, "merkle": "a95b9f9ed242bcb10cb8aebef6a4e705756dba4876839dc85ebd84bd024a7e48", "prev": "740f5278d7011ec045e2545ed9ad9ae92dcb02ec5cfbbad3a8e78505ed27ad22"}

T21:12:08.989 (+30s)

Oct 14 21:12:08 267045 neogo-n3c[20321]: 2022-10-14T21:12:08.989+0300#011INFO#011skip change view#011{"nc": 0, "nf": 6}

Then, the timeout is set to 15 << 1+1 (= 60s), and 60s after that recovery request, the node actually changes the view (request change view and changing dbft view logs) and then immediately sends a PrepareRequest to propose the next block.

T21:13:08.990 (+60s -> +90s in total)

Oct 14 21:13:08 267045 neogo-n3c[20321]: 2022-10-14T21:13:08.990+0300#011INFO#011request change view#011{"view": 0, "height": 2336998, "reason": "Timeout", "new_view": 1, "nc": 0, "nf": 1}
Oct 14 21:13:08 267045 neogo-n3c[20321]: 2022-10-14T21:13:08.991+0300#011INFO#011changing dbft view#011{"height": 2336998, "view": 1, "index": 5, "role": "Primary"}
Oct 14 21:13:08 267045 neogo-n3c[20321]: 2022-10-14T21:13:08.991+0300#011INFO#011sending PrepareRequest#011{"height": 2336998, "view": 1}
...
Oct 14 21:13:11 267045 neogo-n3c[20321]: 2022-10-14T21:13:11.584+0300#011INFO#011approving block#011{"height": 2336998, "hash": "69b7bfde3dedebe667846a3a5798b09630a21528b67f91a19472fe201757d324", "tx_count": 2, "merkle": "2493ffa02ef83ddc6ccdb4517d8eeecc48640e6177194916dc774b18767ee446", "prev": "dbb0913c32edd179d9d7e85315de92f5166060ade658b9efe7e620efd1fe9ab4"}

Note: By now, I know for a fact that actually a node was not running at the time.

Did I get this right?

@roman-khimov
Copy link

Yep, that's how it all has happened.

@mialbu
Copy link
Author

mialbu commented Oct 21, 2022

Yep, that's how it all has happened.

All right, thanks for verifying. 👍🏼

@mialbu mialbu closed this as completed Oct 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants