
Inquiry Regarding Edge Coverage Discrepancy #2

Open
zhanghaoran1135 opened this issue Apr 1, 2024 · 4 comments

Comments

@zhanghaoran1135

In my reproduction experiment, the edge coverage on readelf reached only 10,606 within 24 hours and had basically stopped rising. However, the paper reports an edge coverage of 53,859, which differs significantly from my experimental result.

I use
./fuzz -i readelf_new -o seeds -l 7507 ./readelf -a @@
and
python nn.py ./readelf -a
to start the programs I pulled from the GitHub project. According to the log file (log_fuzz):
2024-03-30 09:18:03: gradient fuzzing state: line_cnt 10 and edge num 5116 (when I started the program)
2024-03-31 09:18:48: gradient fuzzing state: line_cnt 100 and edge num 10622 (the 24-hour mark)
2024-04-01 01:49:13: gradient fuzzing state: line_cnt 60 and edge num 10947 (hardly any growth)
My machine has two RTX 4090 GPUs and an Intel(R) Xeon(R) Gold 6248R CPU.

Can you give me some suggestions about this phenomenon? Thanks!

TIAmoCA commented Apr 1, 2024

Hi, I also ran it in a virtual machine for a long time, and the coverage on objdump would not go up either. I don't know what is going on; could we discuss it?

@zhanghaoran1135 (Author)

> Hi, I also ran it in a virtual machine for a long time, and the coverage on objdump would not go up either. I don't know what is going on; could we discuss it?

I have tested in both Docker and the host environment. With both neuzz and MTFuzz on readelf, the edge coverage stays around 10,000 and basically stops rising, which is a big gap from the 50,000+ reported in the paper. Could it be that the authors used a different metric or calculation method? I would like to ask for your thoughts.

TIAmoCA commented Apr 1, 2024

Yes, mine is even lower; my objdump tests have stayed around 5,500 the whole time....

@Tricker-z (Owner)

Hi

Thank you for your query regarding our method for measuring code coverage. I understand you've noticed some discrepancies between the metric described in our paper and what's implemented in this GitHub repository. I appreciate your diligence in seeking clarification on this matter.

To resolve any confusion, I would recommend reviewing Section 3.4.1 in our paper, where we discuss the rationale behind our choice of metrics, including any variations and the impact they have on our results.

To be specific, AFL uses trace_bits to store the state of edge coverage: each edge is represented by 8 bits, with each bit denoting a different hit-count bucket. The AFL framework monitors changes at this bit level and preserves a seed file for each unique change observed, which is central to its efficiency in uncovering unique paths through a codebase.
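To make this concrete, below is a minimal Python sketch of AFL's documented behavior: raw per-edge hit counters are classified into eight buckets, each bucket occupying one bit of the byte kept in trace_bits, and an input is kept whenever its trace flips a bit that is still set in the virgin map. The function names are mine for illustration, not code from this repository.

```python
MAP_SIZE = 1 << 16  # AFL's default 64 KB coverage map

def classify_count(hits):
    """Bucket a raw edge hit count the way AFL's count-class lookup does;
    each bucket corresponds to a single bit of the byte in trace_bits."""
    if hits == 0:
        return 0
    if hits == 1:
        return 1
    if hits == 2:
        return 2
    if hits == 3:
        return 4
    if hits <= 7:
        return 8
    if hits <= 15:
        return 16
    if hits <= 31:
        return 32
    if hits <= 127:
        return 64
    return 128

def has_new_bits(trace_bits, virgin_bits):
    """AFL-style novelty check: virgin_bits starts as all 0xFF, and bits
    are cleared as their buckets are observed. Any overlap between the
    current trace and the remaining virgin bits means the input is kept."""
    new = False
    for i, t in enumerate(trace_bits):
        if t & virgin_bits[i]:
            virgin_bits[i] &= ~t & 0xFF
            new = True
    return new
```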

In our research, we adopted a more nuanced approach to measuring coverage: we consider not only the edge coverage data (i.e., whether an edge has been hit) but also perform an in-depth analysis of all bit-level changes within trace_bits. This methodology captures coverage at a more granular level, accounting for the varying frequencies of edge execution, which in turn offers insight into the thoroughness of the test suite and its ability to stimulate different behaviors in the target software.

This distinction is critical, as it enables us to evaluate the effectiveness of test cases on a more detailed scale than merely observing whether an edge was covered. By quantifying all bit changes, we assess the diversity and depth of the test coverage, providing a more comprehensive measure of the fuzzing process's effectiveness.
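To illustrate how the two numbers can diverge, here is a hedged sketch of both metrics computed over the virgin map from the snippet above; this is my reading of the description, not the paper's actual measurement code. Plain edge coverage counts each edge once, while the bit-level count can credit a single edge up to eight times, one per hit-count bucket, which is consistent with the bit-level figure being several times larger than the plain edge count.

```python
def edge_coverage(virgin_bits):
    """Edges hit at least once: bytes whose value changed from 0xFF."""
    return sum(1 for v in virgin_bits if v != 0xFF)

def bit_level_coverage(virgin_bits):
    """Every bucket bit ever flipped: one edge exercised at many different
    hit counts can contribute up to 8 to this total."""
    return sum(bin(v ^ 0xFF).count("1") for v in virgin_bits)
```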

Thank you once again for your interest in our work. Your engagement with the finer points of our methodology is invaluable to us and the broader research community.

Best regards
