Problem: gen-tx don't run in parallel for single node #1645

Merged 4 commits on Oct 17, 2024

Conversation

yihuang
Collaborator

@yihuang yihuang commented Oct 17, 2024

Solution:

  • use multiprocessing library to do parallel tx gen

PR Checklist:

  • Have you read the CONTRIBUTING.md?
  • Does your PR follow the C4 patch requirements?
  • Have you rebased your work on top of the latest master?
  • Have you checked your code compiles? (make)
  • Have you included tests for any non-trivial functionality?
  • Have you checked your code passes the unit tests? (make test)
  • Have you checked your code formatting is correct? (go fmt)
  • Have you checked your basic code style is fine? (golangci-lint run)
  • If you added any dependencies, have you checked they do not contain any known vulnerabilities? (go list -json -m all | nancy sleuth)
  • If your changes affect the client infrastructure, have you run the integration test?
  • If your changes affect public APIs, does your PR follow the C4 evolution of public contracts?
  • If your code changes public APIs, have you incremented the crate version numbers and documented your changes in the CHANGELOG.md?
  • If you are contributing for the first time, please read the agreement in CONTRIBUTING.md now and add a comment to this pull request stating that your PR is in accordance with the Developer's Certificate of Origin.

Thank you for your code, it's appreciated! :)

Summary by CodeRabbit

  • New Features

    • Enhanced transaction generation process with parallelization for improved performance.
    • New function to split ranges into equal parts for better job distribution.
  • Bug Fixes

    • Resolved various issues related to transaction validation and multisig accounts.
    • Improved handling of acknowledgment processes and governance parameters.
  • Documentation

    • Updated CHANGELOG.md to reflect recent changes and improvements.

@yihuang yihuang requested a review from a team as a code owner October 17, 2024 04:18
@yihuang yihuang requested review from JayT106 and thomas-nguy and removed request for a team October 17, 2024 04:18
CHANGELOG.md (review thread: outdated, resolved)
Signed-off-by: yihuang <[email protected]>
@yihuang yihuang requested a review from mmsqe October 17, 2024 04:21
codecov bot commented Oct 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 17.87%. Comparing base (3b38bcc) to head (84eb4e6).
Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1645   +/-   ##
=======================================
  Coverage   17.87%   17.87%           
=======================================
  Files          72       72           
  Lines        5170     5170           
=======================================
  Hits          924      924           
  Misses       4123     4123           
  Partials      123      123           

@yihuang yihuang added this pull request to the merge queue Oct 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Oct 17, 2024
Contributor

coderabbitai bot commented Oct 17, 2024

Walkthrough

The pull request updates the CHANGELOG.md to document enhancements in transaction generation and various bug fixes. It introduces parallel transaction generation in transaction.py, utilizing multiprocessing for efficiency. Additionally, a new utility function for splitting ranges is added in utils.py. The changes collectively aim to improve performance, enhance testing capabilities, and fix critical issues in the system.

Changes

| File Path | Change Summary |
| --- | --- |
| CHANGELOG.md | Updated with new entry for parallel test transactions and improvements in v1.4.0-rc1, including bug fixes. |
| testground/benchmark/benchmark/transaction.py | Refactored gen function for parallel transaction generation; added Job named tuple for job handling. |
| testground/benchmark/benchmark/utils.py | Added split(a: int, n: int) function to split a range into n parts. |

Possibly related PRs

Suggested reviewers

  • mmsqe
  • calvinaco

🐇 "In fields of code where bunnies play,
New transactions hop in a parallel way.
With jobs in chunks, they swiftly align,
Bugs fixed and features, all looking fine!
So let’s celebrate with a joyful cheer,
For a robust system, we hold so dear!" 🐇


@yihuang yihuang enabled auto-merge October 17, 2024 07:17
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (4)
testground/benchmark/benchmark/utils.py (1)

176-181: LGTM! Consider a more specific function name.

The split function is well-implemented and efficiently splits a range into n parts. The logic is correct and handles uneven divisions properly. The docstring and type hints improve readability and maintainability.

Consider renaming the function to something more specific, like split_range or partition_range, to avoid potential confusion with the built-in split method for strings and to better describe its purpose.

-def split(a: int, n: int):
+def split_range(a: int, n: int):
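
For illustration, a range-splitting helper of this kind can be sketched as follows. This is not the code from utils.py, which is not shown in this conversation; the name split_range follows the rename suggested above, and the choice to return half-open (begin, end) pairs is an assumption.

```python
def split_range(a: int, n: int) -> list[tuple[int, int]]:
    """Split range(a) into n contiguous half-open (begin, end) parts.

    Hypothetical sketch: part sizes differ by at most one, and the
    first `a % n` parts receive the extra element.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    size, extra = divmod(a, n)
    return [
        (i * size + min(i, extra), (i + 1) * size + min(i + 1, extra))
        for i in range(n)
    ]
```

With 10 items and 3 parts this gives parts of sizes 4, 3 and 3. When n exceeds a, the trailing parts are empty, which a caller may want to skip before creating jobs.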
CHANGELOG.md (1)

7-8: LGTM! Consider fixing the PR link format.

The addition of parallel test transaction generation for single nodes is a valuable improvement that should enhance testing efficiency.

Consider updating the format of the second entry to match the first one:

-* (testground)[#1644](https://github.com/crypto-org-chain/cronos/pull/1644) load generator retry with backoff on error.
+* [#1644](https://github.com/crypto-org-chain/cronos/pull/1644) load generator retry with backoff on error.

This will make the PR link consistent with the other entries in the changelog.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~7-~7: Possible missing comma found.
Context: ...-chain/cronos/pull/1645) Gen test tx in parallel even in single node. * (testground)[#16...

(AI_HYDRA_LEO_MISSING_COMMA)

testground/benchmark/benchmark/transaction.py (2)

68-69: Use the logging module for thread-safe output in multiprocessing

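Using print statements within multiprocessing code can result in jumbled or out-of-order console output because several worker processes write to stdout concurrently. The logging module provides better control over log messages (levels, timestamps, process names). Note that its thread-safety guarantee applies within a single process; it does not by itself serialize output across worker processes.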

Consider replacing print with the logging module:

+    import logging
+
+    # Configure logging at the beginning of your script or module
+    logging.basicConfig(level=logging.INFO)
+
     for acct, txs in zip(accounts, acct_txs):
         for nonce in range(job.num_txs):
             txs.append(acct.sign_transaction(job.create_tx(nonce)).rawTransaction.hex())
             total += 1
             if total % 1000 == 0:
-                print("generated", total, "txs for node", job.global_seq)
+                logging.info(f"Generated {total} txs for node {job.global_seq}")

This change routes progress messages through the logging module, which makes them easier to filter and format. Lines emitted by different worker processes can still interleave on the console.
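
A minimal sketch of logging from pool workers, assuming a stand-in progress loop rather than the real signing code. The names init_worker_logging, count_with_progress and run are hypothetical; the pool initializer configures logging once in each worker so every line carries the process name.

```python
import logging
import multiprocessing


def init_worker_logging(level=logging.INFO):
    """Pool initializer: configure logging once per worker process."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(processName)s %(levelname)s %(message)s",
    )


def count_with_progress(args):
    """Stand-in for a signing loop; logs progress every `every` items."""
    seq, num_items, every = args
    log = logging.getLogger("gen")
    total = 0
    for _ in range(num_items):
        total += 1
        if total % every == 0:
            # Lazy %-formatting: the message is built only if it is emitted.
            log.info("generated %d txs for node %d", total, seq)
    return total


def run(jobs, workers=2):
    """Run the jobs in a process pool with per-worker logging set up."""
    with multiprocessing.Pool(workers, initializer=init_worker_logging) as pool:
        return pool.map(count_with_progress, jobs)
```

On platforms that start workers with spawn, run should be called from under an `if __name__ == "__main__":` guard, for example `run([(0, 5000, 1000), (1, 5000, 1000)])`.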


1-5: Remove unused imports if not necessary

The imports asyncio and aiohttp are used for asynchronous operations later in the file, but please ensure that all imported modules are necessary for this script.

If any imports are not used, consider removing them to keep the code clean.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 3b38bcc and 8fd2ae7.

📒 Files selected for processing (3)
  • CHANGELOG.md (1 hunks)
  • testground/benchmark/benchmark/transaction.py (3 hunks)
  • testground/benchmark/benchmark/utils.py (1 hunks)
🔇 Additional comments (1)
testground/benchmark/benchmark/utils.py (1)

176-181: Verify the usage of the new function in the codebase.

The new split function looks good, but it's important to ensure it's being used correctly throughout the project.

Let's check for its usage:

This will help us understand how and where the function is being used, ensuring it's implemented correctly in other parts of the codebase.

✅ Verification successful

Adjusting the search to target .py files explicitly.


The usage of the new split function has been verified and is correctly utilized in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for usage of the split function in Python files.

# Test: Look for import statements or direct usage of the split function
rg --type python -e "from.*utils.*import.*split" -e "utils\.split\(" -e "split\([^\"']" -C 3

Length of output: 127


Script:

#!/bin/bash
# Description: Search for usage of the split function in Python files by targeting .py extensions.

# Test: Look for import statements or direct usage of the split function
rg -g "*.py" -e "from\s+.*utils\s+import\s+split" -e "utils\.split\(" -e "split\([^\"']" -C 3

Length of output: 2769

testground/benchmark/benchmark/transaction.py (review thread: resolved)
testground/benchmark/benchmark/transaction.py (review thread: outdated, resolved)
testground/benchmark/benchmark/transaction.py (review thread: outdated, resolved)
@yihuang yihuang added this pull request to the merge queue Oct 17, 2024
@yihuang yihuang removed this pull request from the merge queue due to a manual request Oct 17, 2024
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
testground/benchmark/benchmark/transaction.py (1)

74-89: LGTM with a minor suggestion: Parallel transaction generation

The refactoring of the gen function to use multiprocessing is well-implemented and should significantly improve performance for transaction generation. The approach of splitting work into chunks based on CPU count is appropriate.

One minor suggestion:
When combining results using itertools.zip_longest, you might introduce None values if job results have different lengths. Consider filtering out these None values:

all_txs = []
for txs in itertools.zip_longest(*itertools.chain(*acct_txs)):
    all_txs.extend(filter(None, txs))

This change ensures that only valid transactions are added to all_txs.
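
The round-robin merge discussed here can be sketched in isolation. The function name interleave and the sample data are hypothetical. The sketch tests `is not None` explicitly, because `filter(None, ...)` drops every falsy value (for example an empty string), not only the padding added by zip_longest.

```python
import itertools


def interleave(per_account_txs):
    """Round-robin merge: first tx of each account, then the second, etc.

    zip_longest pads shorter lists with None; the padding is skipped.
    """
    merged = []
    for column in itertools.zip_longest(*per_account_txs):
        merged.extend(tx for tx in column if tx is not None)
    return merged
```

For example, merging `[["a0", "a1", "a2"], ["b0"], ["c0", "c1"]]` yields `["a0", "b0", "c0", "a1", "c1", "a2"]`.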

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 8fd2ae7 and 84eb4e6.

📒 Files selected for processing (1)
  • testground/benchmark/benchmark/transaction.py (3 hunks)
🧰 Additional context used
🔇 Additional comments (3)
testground/benchmark/benchmark/transaction.py (3)

2-5: LGTM: New imports for parallel processing

The added imports (itertools, multiprocessing, os, and namedtuple from collections) are appropriate for implementing parallel transaction generation. These additions align well with the PR's objective of enabling parallel execution for gen-tx.


55-57: LGTM: Well-structured Job namedtuple

The Job namedtuple is well-defined with all necessary fields for parallel transaction generation. This structure will help in organizing and passing job-related data efficiently between processes.
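
As an illustration of the pattern, a job record and its worker function might look like the sketch below. The field names chunk, global_seq, num_txs and create_tx are assumptions (only the last three appear in the diff quoted earlier in this review), and fake_sign stands in for real account signing.

```python
from collections import namedtuple

# Hypothetical field layout; the actual Job definition is not shown here.
Job = namedtuple("Job", ["chunk", "global_seq", "num_txs", "create_tx"])


def simple_tx(nonce):
    """Module-level tx factory, so it can be pickled and sent to a worker."""
    return {"nonce": nonce}


def fake_sign(account_index, tx):
    """Stand-in for signing: returns a readable string, not a real signature."""
    return f"{account_index}:{tx['nonce']}"


def run_job(job):
    """Generate num_txs transactions for every account index in job.chunk."""
    begin, end = job.chunk
    return [
        [fake_sign(index, job.create_tx(nonce)) for nonce in range(job.num_txs)]
        for index in range(begin, end)
    ]
```

Everything stored in a job has to be picklable to cross the process boundary, which is why the tx factory is a module-level function rather than a lambda.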


Line range hint 1-91: Overall: Excellent implementation of parallel transaction generation

The changes made to this file successfully implement parallel transaction generation, which aligns perfectly with the PR's objective. Key points:

  1. The use of multiprocessing should significantly improve performance, especially for large numbers of transactions.
  2. The code structure is clean and maintainable, with clear separation of concerns between job definition, execution, and result aggregation.
  3. The core logic of transaction creation remains intact, minimizing the risk of introducing new bugs.

These changes should result in a substantial performance improvement for the gen-tx process when running on a single node with multiple CPU cores.

To further validate the improvements:

This script will help verify that the parallel implementation indeed provides a performance boost.
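
The script itself does not appear in this conversation. A minimal timing sketch of such a check, assuming a CPU-bound stand-in (the hypothetical busy_work) in place of real transaction signing, could look like this:

```python
import hashlib
import multiprocessing
import time


def busy_work(n):
    """CPU-bound stand-in for signing n transactions (repeated hashing)."""
    digest = b""
    for i in range(n):
        digest = hashlib.sha256(digest + i.to_bytes(8, "big")).digest()
    return digest.hex()


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run_sequential(chunks):
    return [busy_work(n) for n in chunks]


def run_parallel(chunks, workers=None):
    # workers=None lets the pool pick a size based on the CPU count.
    with multiprocessing.Pool(workers) as pool:
        return pool.map(busy_work, chunks)


def compare(chunks):
    """Time both variants and check that they produce identical results."""
    seq_result, seq_time = timed(run_sequential, chunks)
    par_result, par_time = timed(run_parallel, chunks)
    assert seq_result == par_result
    return seq_time, par_time
```

Calling `compare([200_000] * 8)` from under an `if __name__ == "__main__":` guard returns the two timings. Any speedup depends on the number of cores and on the chunks being large enough to outweigh process start-up cost.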

testground/benchmark/benchmark/transaction.py (review thread: resolved)
@yihuang yihuang added this pull request to the merge queue Oct 17, 2024
Merged via the queue into crypto-org-chain:main with commit f3746f6 Oct 17, 2024
35 checks passed