Forest doesn't recover from drand failure #5055

Open
LesnyRumcajs opened this issue Dec 9, 2024 · 0 comments
Labels
Type: Bug Something isn't working

Comments

@LesnyRumcajs
Member

LesnyRumcajs commented Dec 9, 2024

Describe the bug

Reported on Slack by @parthshah1

When Lotus Miner and Lotus are unable to fetch the drand beacon, they fail with:

[        42.161] [     service_lotus-1] [err] 2024-12-06T02:46:13.407Z	WARN	rpc	[email protected]/handler.go:421	error in RPC call to 'Filecoin.StateGetBeaconEntry': drand failed Get request:
[        42.161] [     service_lotus-1] [err]     github.com/filecoin-project/lotus/chain/beacon/drand.(*DrandBeacon).Entry.func1
[        42.161] [     service_lotus-1] [err]         /lotus/chain/beacon/drand/drand.go:166
[        42.161] [     service_lotus-1] [err]   - no valid clients - decoding response: EOF - decoding response: EOF - decoding response: EOF
[        42.162] [service_lotus-miner-1] [err] 2024-12-06T02:46:13.408Z	ERROR	miner	miner/miner.go:274	failed getting beacon entry: drand failed Get request: no valid clients - decoding response: EOF - decoding response: EOF - decoding response: EOF
[        42.162] [service_lotus-miner-1] [err] 2024-12-06T02:46:13.409Z	INFO	storageminer	storage/winning_prover.go:70	Computing WinningPoSt ;[{SealProof:5 SectorNumber:0 SectorKey:<nil> SealedCID:bagboea4b5abcaxqsx6tl25pbxyq3uyyqnwghmdmdrkyy5odehf4eotl6is2yp3zu}]; [79 244 33 56 182 139 242 46 79 79 12 85 255 6 75 126 190 6 129 110 37 10 97 226 86 195 112 190 76 143 86 111]
[        42.162] [service_lotus-miner-1] [err] 2024-12-06T02:46:13.409Z	INFO	advmgr	sealer/manager_post.go:24	GenerateWinningPoSt run at lotus-miner
[        42.164] [     service_lotus-1] [err] 2024-12-06T02:46:13.410Z	WARN	fullnode	full/mpool.go:202	Push from ID address (t0100), adjusting to t3w4jtmrcx3mcvfdmjbbkrkyod7bgqupkjl4lrvej7zlgfkabekkirdlsuqzqm6fugfauvnhwjq6k2skrjvxja
[        42.166] [service_lotus-miner-1] [err] 2024-12-06T02:46:13.412Z	INFO	main	lotus-miner/init.go:625	Waiting for message: bafy2bzacebki4eylo4ruswp6einmq2y223fglaiaceldjafbpyyzw7flwi2ei
[        42.341] [service_lotus-miner-1] [err] 2024-12-06T02:46:13.587Z	INFO	storageminer	storage/winning_prover.go:77	GenerateWinningPoSt took 178.620219ms
[        42.341] [service_lotus-miner-1] [err] 2024-12-06T02:46:13.587Z	INFO	miner	miner/warmup.go:87	winning PoSt warmup successful	{"took": 0.181291282}
[        43.164] [     service_lotus-1] [err] 2024-12-06T02:46:14.410Z	WARN	rpc	[email protected]/handler.go:421	error in RPC call to 'Filecoin.StateGetBeaconEntry': drand failed Get request:
[        43.164] [     service_lotus-1] [err]     github.com/filecoin-project/lotus/chain/beacon/drand.(*DrandBeacon).Entry.func1

After retrying, both recover and the miner starts mining blocks. Forest, on the other hand, isn't able to fetch the drand beacon during startup and never recovers: it never syncs and keeps throwing "validation errors".

[      1149.766] [      service_forest] [inf] 2024-12-06T03:04:41.012644Z  WARN forest_filecoin::chain_sync::tipset_syncer: Validating block [CID = bafy2bzaceab7cz4luun43frmnssroiwnprsvrfunepsht3mmtcy4ez7ce7twc] in EPOCH = 2 failed: Validation error: Validation error: Consensus error: Failed to validate blocks random beacon values: Error validating data: beacon entry was invalid
[      1149.766] [      service_forest] [inf] 2024-12-06T03:04:41.012690Z ERROR forest_filecoin::chain_sync::tipset_syncer: Sync messages check state failed for tipset range
[      1149.766] [      service_forest] [inf] 2024-12-06T03:04:41.012726Z ERROR forest_filecoin::chain_sync::chain_muxer: Bootstrapping failed, re-evaluating the network head to retry the bootstrap. Error = TipsetRangeSyncer(Validation("Validation error: Consensus error: Failed to validate blocks random beacon values: Error validating data: beacon entry was invalid"))

To reproduce

It's likely reproducible by cutting internet access from the relevant services and then restoring it, or by some other means of injecting a drand fault.

Log output

events_with_drand.log
events_3.log

Expected behaviour

Forest is able to follow the chain, as Lotus does.
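
For illustration only, here is a minimal sketch of the retry-with-backoff pattern that would let a node ride out a transient beacon outage, as Lotus appears to do in the logs above. This is not Forest's or Lotus's actual code: `BeaconError`, `fetch_with_retry`, and the `fetch` closure are hypothetical names, and the root cause in Forest may lie elsewhere (for example, in how a failed fetch surfaces as "beacon entry was invalid").

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type standing in for a transient drand failure
/// (e.g. "no valid clients - decoding response: EOF").
#[derive(Debug, Clone, PartialEq)]
pub struct BeaconError(pub String);

/// Calls `fetch` up to `max_attempts` times, sleeping between failed
/// attempts and doubling the delay each time. Returns the first success,
/// or the last error once all attempts are exhausted.
pub fn fetch_with_retry<T, F>(
    mut fetch: F,
    max_attempts: u32,
    initial_delay: Duration,
) -> Result<T, BeaconError>
where
    F: FnMut() -> Result<T, BeaconError>,
{
    let mut delay = initial_delay;
    let mut last_err = BeaconError("no attempts made".to_string());

    for attempt in 1..=max_attempts {
        match fetch() {
            Ok(entry) => return Ok(entry),
            Err(err) => {
                last_err = err;
                // Only wait if another attempt is coming.
                if attempt < max_attempts {
                    sleep(delay);
                    delay = delay.saturating_mul(2);
                }
            }
        }
    }

    Err(last_err)
}
```

In a real node the closure would wrap the drand client call, and a fetch failure would presumably need to be reported separately from a genuinely invalid beacon entry so that the syncer can retry instead of rejecting the block.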

Environment

  • OS: Linux
  • Forest commit: dd120ad
  • Lotus: lotus version 1.31.0+2k+git.198ee01.dirty

Other information and links

The setup is in https://github.com/FilecoinFoundationWeb/Filecoin-Antithesis, which is similar to our devnet compose but with a custom drand service injected. Ping for access.
