Random failure in worker node doing PC1: handle me:write tcp 192.168.2.4:3456->192.168.2.100:58696: i/o timeout #3585

Closed
RobQuistNL opened this issue Sep 5, 2020 · 9 comments

Comments

@RobQuistNL
Contributor

Describe the bug
A worker was running two PC1 tasks when it stopped unexpectedly:

2020-09-05T23:29:04.368 INFO storage_proofs_porep::stacked::vanilla::proof > generating layer: 7
2020-09-05T23:36:45.261 INFO storage_proofs_porep::stacked::vanilla::proof >   storing labels on disk
2020-09-05T23:37:09.822 INFO storage_proofs_porep::stacked::vanilla::proof >   generated layer 11 store with id layer-11
2020-09-05T23:37:09.823 INFO storage_proofs_porep::stacked::vanilla::proof >   setting exp parents
2020-09-05T23:37:12.338 INFO filecoin_proofs::api::seal > seal_pre_commit_phase1:finish
2020-09-05T23:37:12.371 INFO filcrypto::proofs::api > seal_pre_commit_phase1: finish
2020-09-05T23:37:12.389Z	ERROR	rpc	[email protected]/websocket.go:121	handle me:write tcp 192.168.2.4:3456->192.168.2.100:58696: i/o timeout

To Reproduce
Unknown. The machine was just running the PC1 tasks when it stopped.

Expected behavior
No i/o timeout. If a hardware or network fault does interrupt the connection, it should be retried.

Version: 0.5.10

@jennijuju
Member

Quick update - this should be improved in a coming release.

@RobQuistNL
Contributor Author

Do you know what commit fixed it @jennijuju ? I'm just curious :D

@jennijuju
Member

> Do you know what commit fixed it @jennijuju ? I'm just curious :D

It's still a WIP, but you can take a sneak peek haha -> filecoin-project/go-jsonrpc#27

@RobQuistNL
Contributor Author

Awesome 😎 thanks :)

@BigOceanGG
Contributor

@RobQuistNL How can this be fixed?

@Meatball13

Meatball13 commented Oct 2, 2020

Did a fix ever make it into a release? I have one worker that hits this once every day or two. It has happened on both 0.8.0 and 0.8.1. The only way to recover is to kill the worker process and restart it, which usually means starting PC1 over again.

@RobQuistNL
Contributor Author

On 0.8.1 now too, and my workers keep disconnecting from the main miner.

2020-10-03T15:31:58.853 INFO storage_proofs_porep::stacked::vanilla::proof >   generated layer 11 store with id layer-11
2020-10-03T15:31:58.855 INFO storage_proofs_porep::stacked::vanilla::proof >   setting exp parents
2020-10-03T15:32:01.419 INFO filecoin_proofs::api::seal > seal_pre_commit_phase1:finish
2020-10-03T15:32:01.428 INFO filcrypto::proofs::api > seal_pre_commit_phase1: finish
2020-10-03T15:32:01.435Z	ERROR	rpc	[email protected]/websocket.go:121	handle me:write tcp 192.168.2.4:3456->192.168.2.100:38632: i/o timeout
2020-10-03T15:38:33.370Z	INFO	stores	stores/http_handler.go:55	SERVE GET /remote/sealed/s-t02388-378

@cryptowhizzard

Same here
