Feedback on fuzzer benchmarking setup #4590

Closed
wuestholz opened this issue Mar 17, 2023 · 3 comments

Comments

@wuestholz

I'm trying to compare the Forge fuzzer with Echidna on several benchmark contracts.

To make the comparison as fair as possible, I've created a benchmark generator that automatically generates challenging contracts. The benchmarks intentionally use a limited subset of Solidity to avoid language features that could be handled differently by different tools. Each contract contains ~50 assertions (some can fail, but others cannot due to infeasible path conditions). (If you're curious, you can find one of the benchmarks here. The benchmark-generation approach is inspired by the Fuzzle benchmark generator for C-based fuzzers.) To find the assertions that can fail, a fuzzer needs to generate up to ~15 transactions and satisfy some input constraints for each transaction.

Since I'm not deeply familiar with the Forge fuzzer, I'd like to check whether there are any potential issues with my benchmark setup before sharing results.

Since the fuzzer does not support limiting the execution time (see #4517), I repeatedly run it for shorter periods until the time limit shared by all fuzzers is reached (for instance, 1 hour per contract). For each of these shorter fuzzing campaigns, I use the following settings, which deviate from the defaults:

  • fuzz.runs: 2048 (instead of 256)
  • fuzz.max_test_rejects: 1073741823 (instead of 65536)
  • invariant.runs: 2048 (instead of 256)
  • invariant.depth: 30 (instead of 15)
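
For reference, the four overrides listed above might be written in `foundry.toml` roughly as follows. This is a sketch that assumes the standalone `[fuzz]` and `[invariant]` sections of the Foundry configuration file; only the four values come from this issue.

```toml
# Sketch of the non-default settings described in this issue.
# Section layout is an assumption; the values are taken from the list above.
[fuzz]
runs = 2048                    # default: 256
max_test_rejects = 1073741823  # default: 65536

[invariant]
runs = 2048                    # default: 256
depth = 30                     # default: 15
```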

The motivation for increasing the runs settings is that a campaign with 256 runs terminates very quickly, and I'd like to keep the overhead of repeatedly starting the fuzzer low.
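
The restart loop described above can be sketched as a small driver. This is a minimal illustration, not the actual benchmarking script: the function name `run_until_budget` and its parameters are hypothetical, and the `run` and `clock` arguments exist only so the loop can be exercised without launching a real fuzzer.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_until_budget(
    cmd: Sequence[str],
    budget_seconds: float,
    run: Callable[[Sequence[str]], int] = lambda c: subprocess.run(c).returncode,
    clock: Callable[[], float] = time.monotonic,
) -> list[int]:
    """Re-run `cmd` until `budget_seconds` have elapsed; return each exit code.

    A campaign that is already running is never interrupted, so the total
    time can overshoot the budget by up to one campaign's duration.
    """
    deadline = clock() + budget_seconds
    exit_codes = []
    # Start a new campaign only while the shared time budget is not used up.
    while clock() < deadline:
        exit_codes.append(run(cmd))
    return exit_codes


# Hypothetical usage: one hour of back-to-back campaigns for one contract.
# codes = run_until_budget(["forge", "test"], budget_seconds=3600)
```

Because the budget is checked only between campaigns, shorter campaigns track the time limit more closely, while longer ones reduce the relative startup overhead; the runs setting trades one against the other.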

I increased the depth to 30 since some assertions may require up to ~15 transactions, and some generated transactions may fail. Echidna uses 100 by default, and I'm happy to also use the same setting for Forge.

I also increased max_test_rejects to avoid terminating the fuzzer early (although with 2048 runs this bound would probably not be hit anyway).

Please let me know if you see any potential issues with this setup.

@grandizzy
Collaborator

grandizzy commented May 1, 2024

Hey @wuestholz, a little late here, but giving it a try anyway :) I don't see any issue with your setup. You can safely bump the depth to 100 to match the Echidna default, or even higher. You can also raise runs above 2048. Regarding max_test_rejects: it is only relevant if you use the vm.assume cheatcode (see https://book.getfoundry.sh/cheatcodes/assume#assume), so you can leave it at the default.
If you're still working on the benchmark, I can help with any issues you encounter.

Also, the suggestion in #990 (comment) makes sense, with a small tweak of setting runs to the maximum possible; I will be proposing an implementation soon. Thank you.

@wuestholz
Author

wuestholz commented May 16, 2024

@grandizzy Thanks! That's good to know. :) You can see the current benchmarking setup for Foundry at https://github.com/Consensys/daedaluzz/blob/master/run-foundry.sh. Feel free to close this issue.

@zerosnacks
Member

Marking as resolved

@jenpaff jenpaff moved this from Todo to Completed in Foundry Sep 30, 2024