
p2p: fan in incoming txns into backlog worker #6126

Open · wants to merge 8 commits into master
Conversation

algorandskiy
Contributor

@algorandskiy algorandskiy commented Sep 10, 2024

Summary

While investigating p2p TX traffic and performance, I found congestion on the transaction pool mutex. This PR is a PoC that makes backlogWorker the sole accessor of the pool, similarly to wsnet.
Implementation summary:

  • Added a syncCh channel to the work item wi.
  • When the item originates from the gossipsub validator, syncCh is set and awaited, keeping validation synchronous as before.
  • On each tx validation check, backlogWorker checks whether syncCh is set and, if so, sends the validation result to the channel.

Additionally, a couple more fixes helped with TPS:

  • Added some workers to pull sub.Next() faster.
  • Cached the signed transaction ID so it can be reused in remember and in txpool recomputation when the next block arrives.

Test Plan

  1. Unit tests passed.
  2. A cluster TPS test showed a +700-900 TPS gain (7.5k TPS total) on a single-txn payment scenario.


codecov bot commented Sep 10, 2024

Codecov Report

Attention: Patch coverage is 35.41667% with 31 lines in your changes missing coverage. Please review.

Project coverage is 50.77%. Comparing base (8832ed5) to head (ab33303).

Files with missing lines           | Patch %  | Lines
data/txHandler.go                  |   8.33%  | 17 Missing and 5 partials ⚠️
data/transactions/signedtxn.go     |   0.00%  | 4 Missing and 1 partial ⚠️
network/p2pNetwork.go              |  78.94%  | 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6126      +/-   ##
==========================================
- Coverage   51.73%   50.77%   -0.97%     
==========================================
  Files         644      644              
  Lines       86519    86529      +10     
==========================================
- Hits        44763    43937     -826     
- Misses      38889    39683     +794     
- Partials     2867     2909      +42     

☔ View full report in Codecov by Sentry.

@algorandskiy algorandskiy requested a review from cce September 10, 2024 20:41
@algorandskiy algorandskiy force-pushed the pavel/p2p-txhander-bg branch 4 times, most recently from a65acca to 2e055ca Compare September 20, 2024 18:54
@algorandskiy algorandskiy changed the title WIP: p2p: fan in incoming txns into backlog worker p2p: fan in incoming txns into backlog worker Sep 21, 2024
@algorandskiy algorandskiy marked this pull request as ready for review September 21, 2024 00:12
@algorandskiy
Copy link
Contributor Author

Merged master in.

@@ -59,9 +61,18 @@ type SignedTxnWithAD struct {

// ID returns the Txid (i.e., hash) of the underlying transaction.
func (s SignedTxn) ID() Txid {
if s.cachedID != nil {
Contributor

Curious: did we test performance both with and without the cachedID change?

Contributor Author

I do not recall, but I can run perf tests.

}

if handler.checkAlreadyCommitted(wi) {
Contributor

So we are cutting down on duplicated logic between what the backlogWorker does and incoming message validation.


transactionMessagesRemember.Inc(nil)
Contributor

relying on backlog worker for this?

@@ -2774,6 +2774,8 @@ func TestTxHandlerValidateIncomingTxMessage(t *testing.T) {

handler, err := makeTestTxHandler(ledger, cfg)
require.NoError(t, err)
handler.Start()
defer handler.Stop()
Contributor

The handler variable gets set to a new test handler on line 2810; which handler will this refer to when Stop() is called? (I would think the latest referenced object.)

if err != nil {
if err != pubsub.ErrSubscriptionCancelled && err != context.Canceled {
n.log.Errorf("Error reading from subscription %v, peerId %s", err, n.service.ID())
const threads = incomingThreads / 2 // perf tests showed that 10 (half of incomingThreads) was optimal in terms of TPS (attempted 1, 5, 10, 20)
Contributor

This was on our default specced hardware?

Contributor
@gmalouf gmalouf left a comment

Put a few comments in. I think I follow the syncCh changes; I assume perf testing was also done to isolate that change specifically (vs. the others in the PR)?
