forked from jl777/komodo
-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes for staking code and thread shutdown #505
Closed
Closed
Changes from all commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
acd2b3b
wait for mining threads to stop
dimxy 24f7d50
convert staking array to vector (to correctly destroy members) and ma…
dimxy 87ac53d
staking logging improved
dimxy f582ad9
maxkp replaced with array.capacity
dimxy ba1b22a
'struct' eliminated
dimxy 5098d67
unneeded memset removed
dimxy dcbf657
Update src/miner.cpp
dimxy f8c17ce
add interruption_point in komodo_waituntilelegible
dimxy 12261b9
change sleeps to interruptible in miner
dimxy 106c9d3
Merge branch 'dev' into dimxy-fix-staking-array
dimxy File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the lack of interruption points, using both interrupt_all() and join_all() can (and "randomly" will) bring everything to a grinding halt when between interruptibles and unjoinable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about lack of interruption points.
Currently we do not have join what leads to inaccurate thread shutdown.
And adding join_all works well in marmara and fixed a crash when a user called 'setgenerate false' and 'setgenerate true' quickly in a sequence.
But I missed an extra interruption_point (from the marmara code) in komodo_waituntilelegible which does a long loop which would make thread shutdowns faster, adding it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually where adding the join_all() creates the issue. Most specifically, it creates an issue with NN mining and CreateNewBlock (which MCL doesn't use).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain what issue you mean? I believe joins may create an issue when a thread can never be joined and hangs because it may have a long loop without interruption points or sleep or wait calls. If we have such loops we need to fix this to provide thread graceful shutdown to work.
(I am going to run my dev NN node on this branch for testing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BitcoinMiner
only contains a singleinterruption_point()
That point comes after the while loop at:
komodo/src/miner.cpp
Line 1969 in dcbf657
which can hold up to 17.5 minutes, as set by:
komodo/src/miner.cpp
Line 860 in dcbf657
Ironically, the likelihood of that longest pause is increased by the new "stall reduction" code increasing the possibility for
gpucount
to reach 0 (assuming the rand hits 1056, which is possible).Being async, the thought would be that this part would stick on its own and there would be no care; however, in reality, there are about a dozen circumstances where it locks everything (one of those circumstances being that threads are ignorant of each other and on advanced hardware come back with multiple solves when many miner threads are used).
When you look at what "should be" vs "what is", it shouldn't be a problem because NNs "should be" only running one thread; however, this is becoming decreasingly true with more and more NNs seeking to hit smaller and more predictable gaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good catch, this loop.
However I believe when a user calls setgenerate false and then setgenerate true without join there will be two running threads for some time and this is basically not good at all (in a staking chain this could even create a crash as there is a static komodo_staking *array var that could be corrupted in this case). So I think threads should be stopped gracefully by joining.
This loop you mentioned has sleep() inside and we can replace it on boost::this_thread::sleep_for function (which allows to interrupt the thread) and we should check other remaining loops in miner.cpp to add interruption points in a similar way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However you want to do it, just want to make sure that you/everyone is aware that doing it as-is will lock up NNs; so, whatever way it's done to protect stakers that could crash needs to be done in such a way as to protect both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about your doubts though.
Deleting a thread object without join is an obvious bug IMO and should be fixed.
Fixing it may add some delay on daemon stopping or setgenerate false but this is not a lock-up if we do this properly.
Checking all loops in miner.cpp...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw this code does not work at all:
komodo/src/miner.cpp
Line 251 in 91ea37b
Maybe we should fix this too as it was intended
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't fully tested it yet, but (along with the others)
komodo/src/miner.cpp
Line 1989 in 12261b9
does look like it'll solve the issue of my concern. ty