If the node is in the snapshot prune phase, the synchronization process is affected, causing the block height to lag behind #587
Comments
Will do more investigation.
Is your node not on the chain tip?
"[snapshots] pruning state " |
You can grep logs about |
Could you please provide more logs regarding |
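A minimal sketch of that grep, assuming file logging is enabled and the log sits under the datadir used later in this thread (the path and container name are placeholders; adjust them, or pull from `docker logs` for a containerized node):

```bash
# Assumed log location under the datadir from the startup command below.
grep -n "\[snapshots\] pruning state" /root/node/logs/erigon.log

# For a containerized node (container name is a placeholder):
docker logs erigon-node 2>&1 | grep "pruning state"
```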
|
Try with an i7ie instance; 14.44 Mgas/s is too slow. Faster CPU frequencies will help more during live sync.
@blxdyx r7a.4xlarge + io2? How many IOPS does the io2 EBS volume have?
8000 IOPS
Are you using io2 or gp3 volumes for storage? If it's an 8000 IOPS gp3 volume, it might not be sufficient, right? Let me run a test.
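One rough way to run that test, assuming `fio` is installed and `/root/node` sits on the volume being checked (the job writes about 4 GiB of temporary files, so make sure there is room):

```bash
# Random 4k read benchmark against the datadir volume; compare the
# reported IOPS with the provisioned gp3/io2 figure.
fio --name=iops-test --directory=/root/node \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=64 --numjobs=4 --size=1G \
    --runtime=60 --time_based --group_reporting
```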
Hi @blxdyx
Our i4i.4xlarge also achieves a speed of 125.54 Mgas/s. However, the node still falls behind in block synchronization during the snapshot prune state.
That's not live sync. Our node can reach 300 Mgas/s during history sync.
Sorry, it's a 16000 IOPS gp3 volume; you could try that.
It will affect the performance, but if your node has a stable live sync speed of 40 Mgas/s, it shouldn't lag.
What about stopping the chain-scanning tasks? You may need more RAM and higher CPU frequencies.
The latest log shows it's on the chain tip after step 4308. Before that, the node was still doing history sync.
This block lagging issue occurs approximately once a day, always during the prune state.
But the logs seem to show that your node restarted at
After the restart it will do history sync. That's why step 4134 needed so much time in pruneSmallBatches.
It shows it was never on the chain tip. You can check the Mgas/s in this node's logs.
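For reference, one way to pull that figure from the logs, assuming file logging is enabled (the exact field name can vary between Erigon versions, so adjust the pattern and path to your setup):

```bash
# Show the most recent throughput lines from the assumed log location.
grep -i "gas/s" /root/node/logs/erigon.log | tail -n 20
```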
This node has consistently stayed at the chain tip about 99% of the time, maintaining the latest block height. Only 1% of the time, when the logs show "snapshot state prune," does the block height fall behind. We watch this closely because we use the node for chain scanning. We have multiple Erigon nodes, and we also compare the block height with the official nodes and our other nodes. If this node falls behind, an alert is triggered immediately and we are notified right away.
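A minimal sketch of that kind of lag check, assuming the local RPC from the startup command below (port 8545), `jq` installed, and a reachable reference endpoint; the reference URL and alert threshold are placeholders to be replaced with your own:

```bash
#!/usr/bin/env bash
# Compare local block height against a reference node and flag a lag.
LOCAL=http://127.0.0.1:8545
REF=https://bsc-dataseed.binance.org   # placeholder public BSC endpoint

height() {
  curl -s -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "$1" | jq -r .result
}

local_h=$(( $(height "$LOCAL") ))   # hex -> decimal
ref_h=$(( $(height "$REF") ))
lag=$(( ref_h - local_h ))

echo "local=$local_h reference=$ref_h lag=$lag"
if [ "$lag" -gt 50 ]; then          # threshold is arbitrary
  echo "ALERT: node is $lag blocks behind" >&2
fi
```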
That's on the chain tip. But your logs show that before
The memory is sufficient. I've tested it on i3en.3xlarge, i4i.4xlarge, and r7a.4xlarge; they all produce the same logs. That is, every day there are 10 to 20 minutes when the block is not on the chain tip.
So could you show the log before the
This node only falls behind during the pruning period; at all other times it is at the chain tip, maintaining the latest block height.
Other nodes are similar, but their Mgas/s is higher than this node's.
The reason is that your device needs an upgrade. You can see in the log that your speed is nearly
The
Thanks a lot. Let me have a try.
Are you sure you're using gp3 storage with 16k IOPS? Why is it that, with the same machine and storage we agreed on, it can't even reach 40 Mgas/s without any API requests?
Do these affect the speed?
No, I've confirmed from our Grafana that it reaches 40 Mgas/s.
It may affect mmap or other system call performance; maybe try a bigger memory limit.
https://github.com/erigontech/erigon?tab=readme-ov-file#htop-shows-incorrect-memory-usage |
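Since Erigon maps its database with mmap, the RES column in htop mostly reflects file-backed pages (as the link above explains). A quick way to separate anonymous memory from mmap-backed memory, assuming a single process named `erigon` as in the startup command below:

```bash
# RssAnon = real anonymous memory, RssFile = mmap/page-cache backed memory.
grep -E 'VmRSS|RssAnon|RssFile' /proc/$(pidof erigon)/status
```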
Reopen if you still have the problem.
I'm getting this from time to time but it doesn't seem like I have a memory limit set in the container? @atlasW which arguments did you end up using for your conf?
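One way to confirm whether a limit is actually applied, checked from inside the container (cgroup v2 path shown; on cgroup v1 it is `memory/memory.limit_in_bytes`) and from the host via Docker (container name is a placeholder):

```bash
# Inside the container: "max" means no limit is set.
cat /sys/fs/cgroup/memory.max

# From the host: 0 means no memory limit is configured.
docker inspect --format '{{.HostConfig.Memory}}' erigon-node
```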
CPU, RAM, and disk all have an effect. Could you show more details?
No matter how I set up and configure it inside the container, the problem couldn't be resolved. I tried various Docker configurations and node settings, but none of them could prevent this issue. In the end, I abandoned the container and ran it directly on the system, which finally made it stable.
Welp, I'm not sure what it could be. It only started happening with Erigon 3; it never happened with the previous versions. I don't think Docker should affect the I/O performance.
version: bsc-erigon_v1.3.0-alpha7
EC2 instance: i4i.4xlarge
Startup parameters:
/root/bsc-erigon/erigon \
  --datadir="/root/node" \
  --chain=bsc \
  --port=30303 \
  --http.port=8545 \
  --authrpc.port=8551 \
  --torrent.port=42069 \
  --private.api.addr=127.0.0.1:9090 \
  --http --ws \
  --http.addr=0.0.0.0 \
  --rpc.batch.limit=300 \
  --http.vhosts=* \
  --prune.mode=archive \
  --prune.distance.blocks=900000 \
  --prune.distance=900000 \
  --http.api=eth,debug,net,trace,web3,erigon,bsc \
  --nat=none
issue: We have three Erigon nodes, and they all enter the snapshot prune phase simultaneously. During this phase, block synchronization is affected, causing the nodes to fall behind in block height. Each occurrence lasts approximately 17 minutes. Is there any way to mitigate this impact?