docs: Add test reporting doc to benchmarks dir #3238

Open: stubbsta wants to merge 3 commits into master from add-performance-benchmarks-overview

@stubbsta stubbsta commented Jan 14, 2025

Description

This PR is a first pass at adding an nwaku test summary page. It aims to give anyone implementing the Waku protocol with nwaku a quick reference for expected performance, as well as quick access to test reports.

Changes

  • Added file docs/benchmarks/test-results-summary.md

How to test

  1. Pull https://github.com/waku-org/docs.waku.org
  2. Edit the URL to use this branch: https://github.com/waku-org/docs.waku.org/blob/develop/fetch-content.js#L64
  3. yarn build
  4. yarn serve

@stubbsta stubbsta changed the title Add test reporting doc to benchmarks dir docs: Add test reporting doc to benchmarks dir Jan 14, 2025
@stubbsta
Author

This is a draft to get inputs on the formatting and content.
The Quick ref section definitely needs work. I would appreciate some info on how to relate the data back to Status App.

Contributor

@jm-clius jm-clius left a comment

Thanks for this! I've added a couple of comments and suggestions on what other sections to add.

@stubbsta stubbsta force-pushed the add-performance-benchmarks-overview branch from 2362ed1 to e01a61e Compare January 20, 2025 06:00
@stubbsta
Author

@fryorcraken I wonder whether the TL;DR section is too wordy. Isn't the requirement for it to be something very short that can be read quickly and easily remembered, such as:

  • Relay network average bandwidth usage: x KB/s
  • Discv5 average bandwidth usage: y KB/s
  • etc.

Then, if the reader wants more info (such as the network size and message rate for the simulations where the above values were obtained), they can look at the Insights section, and if they want even more info they can look at the reports on Notion?

@stubbsta stubbsta requested a review from jm-clius January 27, 2025 09:30
Contributor

@jm-clius jm-clius left a comment

A few more comments. :)


> ## TL;DR
>
> - libp2p bandwidth usage fluctuates between 5 and 15 KB/s for topologies of up to 1000 nodes, with average bandwidth usage at **10 KB/s**.
Contributor

For the bandwidth numbers to make sense, we need to add the message rate and size. Perhaps just mentioning the average and max bandwidth is enough?

> This is expected for Relay networks and the slight fluctuation could be due to simulation artifacts or chance differences in routing or connectivity between test runs.
> - The average time for a message to propagate to 100% of nodes in topologies of up to 2000 Relay nodes is **0.4s**.
> - The average per-node bandwidth usage of the discv5 protocol is **8 KB/s** for incoming traffic and **7.4 KB/s** for outgoing traffic.
> This is for a network with 100 continuously online nodes, sending 1KB messages at 1s intervals.
Contributor

Perhaps the only relevant detail for discv5 here is the number of nodes and not the message size or rate. However, do we know whether discv5 bandwidth usage fluctuates much with the number of nodes? If not, we can leave out the number of nodes too.

Author

According to the results here: https://www.notion.so/Measure-DiscV5-bandwidth-with-Waku-discovery-1698f96fb65c80659fa1fbfdac49b1ef?pvs=4#16a8f96fb65c8060ac93dd35e2b9c464
there is some fluctuation in bandwidth usage as the total number of nodes in the network varies.
The data from the referenced test includes the start of the simulation, and we have not yet determined how much this affects the bandwidth usage results.
Because of this, I think it's best to keep the total number of nodes in until we get more information, if you agree?

Contributor

Yes, that would make sense. I would still suggest leaving out the message size/rate stats, as they describe a domain separate from discv5.

Comment on lines +39 to +41
| [Relay](https://www.notion.so/Waku-regression-testing-v0-34-1618f96fb65c803bb7bad6ecd6bafff9) (1000 nodes) | 0.05 | 1.6 |
| [Mixed](https://www.notion.so/Mixed-environment-analysis-1688f96fb65c809eb235c59b97d6e15b) (210 nodes) | 0.0125 | 0.007 |
| [Non-persistent Relay](https://www.notion.so/High-Churn-Relay-Store-Reliability-16c8f96fb65c8008bacaf5e86881160c) (510 nodes)| 0.0125 | 0.25 |
Contributor

I would just add a very brief description of what "Relay", "Mixed" and "Non-persistent Relay" mean, so that a reader doesn't have to click the links to get an intuitive understanding.


## Testing
### DST
The VAC DST team performs regression testing on all new **nwaku** releases, comparing performance with previous versions. They simulate large Waku networks with a variety of network and protocol configurations that are representative of real-world usage.
Contributor

Suggested change:
- The VAC DST team performs regression testing on all new **nwaku** releases, comparing performance with previous versions. They simulate large Waku networks with a variety of network and protocol configurations that are representative of real-world usage.
+ The VAC DST team performs regression testing on all new **nwaku** releases, comparing performance with previous versions.
+ They simulate large Waku networks with a variety of network and protocol configurations that are representative of real-world usage.

Semantic breaks, here and further down. :)


> ## TL;DR
>
> - libp2p bandwidth usage fluctuates between 5 and 15 KB/s for topologies of up to 1000 nodes, with average bandwidth usage at **10 KB/s**.
Contributor

Do we know the configured degree D here - I think this is just the default of 6? Perhaps not worth mentioning if this is a "well-known fact" about Waku.

So, on second thought I think we can simplify this TL;DR, focus on the critical conclusion and use less domain terms. For example, our first sentence suggests that we have concluded an average of 10 KB/s only up to 1000 nodes, but in the next sentence we say roughly the same but this time for up to 2000 nodes. I'd suggest something like:

Waku bandwidth (minus traffic related to discv5 Discovery) averages ~10KB/s for a message injection rate of X KB/s for any topology size* (*confirmed up to 2000 nodes).

I think X is 1KB/s (i.e. 1KB message every 1 second)?
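
For reference, the X in the proposed sentence is message size divided by send interval. A minimal Python sketch of that arithmetic follows, assuming the workload mentioned in this thread (a 1 KB message every 1 s) and the ~10 KB/s average from the TL;DR. The function names are hypothetical and not part of nwaku or the test tooling, "amplification" is just an illustrative label for the ratio, and the injection rate is treated as a single aggregate figure:

```python
def injection_rate_kb_per_s(message_size_kb: float, interval_s: float) -> float:
    """Injection rate in KB/s for fixed-size messages sent at a fixed interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return message_size_kb / interval_s


def bandwidth_amplification(avg_bandwidth_kb_per_s: float, injection_kb_per_s: float) -> float:
    """Ratio of observed average bandwidth to the injected message rate."""
    if injection_kb_per_s <= 0:
        raise ValueError("injection_kb_per_s must be positive")
    return avg_bandwidth_kb_per_s / injection_kb_per_s


if __name__ == "__main__":
    # Assumed workload: 1 KB message every 1 s (to be confirmed against the reports).
    rate = injection_rate_kb_per_s(message_size_kb=1.0, interval_s=1.0)
    # Assumed observation: ~10 KB/s average bandwidth from the TL;DR.
    factor = bandwidth_amplification(avg_bandwidth_kb_per_s=10.0, injection_kb_per_s=rate)
    print(f"injection rate: {rate} KB/s, bandwidth / injection ratio: {factor}")
```

Stating the injection rate next to the bandwidth figure lets a reader scale the number to their own workload, which is the point of the earlier comment about adding message rate and size.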


2 participants