
Do a GC run before running the benchmarks #72

Open

wants to merge 2 commits into main
Conversation

@udesou commented Oct 10, 2023

When I was running the benchmarks with MMTk, I noticed that, for some reason, running them through a script would trigger a GC inside the benchmark, whereas running them from the command line wouldn't.
The fix: make sure a full GC is run before the benchmark starts, to get rid of any garbage that may already exist. Since this may also be a problem for the stock Julia GC, I'm opening this PR.

NB: when collecting statistics via MMTk's harness methods, a GC is always performed before the benchmark, precisely to address this issue. However, we are using Julia's GC stats (as much as we can), which is why it was problematic in our case.
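
For context, the fix amounts to something like the following sketch (`run_benchmark()` is a placeholder for the actual benchmark entry point, not a function in this repo):

```julia
# Force a full collection so garbage left over from the wrapper script
# doesn't show up in the benchmark's GC statistics.
GC.gc(true)            # full = true: collect all generations

# ...then measure the benchmark itself.
stats = @timed run_benchmark()   # run_benchmark() is a placeholder
```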

@qinsoon commented Oct 11, 2023

A bit more information on this PR: the GCBenchmarks are executed by a wrapper (run_benchmarks.jl) and the gctime macro in Julia. Any objects created in this Julia code before the actual benchmark commences will also be present in the heap. These objects aren't allocated by the benchmark itself, but they may influence GC decisions and statistics for the benchmark.

In practice, we found a scenario where we observed different GC decisions (1 GC vs. 0 GCs) when executing run_benchmarks.jl through a Python script compared to running it directly in a terminal with the same command line. We suspect that Julia may allocate resources differently depending on the runtime environment; different environments can therefore produce different initial heap states and, in turn, different GC behavior.

Performing an explicit GC before the actual benchmark clears any residual garbage and minimizes this discrepancy. This adjustment resolved the specific issue we encountered.
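
To make the mechanism concrete, here is a rough sketch (not the actual run_benchmarks.jl code; `measure` and `workload` are illustrative names) of how Julia's built-in GC counters, which @timed and similar macros read, can be skewed by residual garbage:

```julia
# Sketch: read Julia's GC counters around a workload, as @timed does
# internally. An explicit full collection first clears residual garbage
# from the wrapper, so it cannot tip the benchmark from 0 GCs to 1 GC.
function measure(workload)
    GC.gc(true)                      # clear pre-existing garbage
    before = Base.gc_num()
    workload()
    diff = Base.GC_Diff(Base.gc_num(), before)
    return (pauses = diff.pause, gc_time_ns = diff.total_time)
end
```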

@oscardssmith (Collaborator) commented
I think this looks relatively reasonable.

@steveblackburn commented

Wait. I'm confused by this.

The key question is whether the workload is properly harnessed. This is a key concept in our methodology: we control precisely what we measure. We typically do not 'harness' the entire execution; rather, we harness the core of the measurable workload and (optionally) run that harnessed core multiple times.

It sounds to me like you're comparing a harnessed workload to the total wall clock time for the workload.

If that's correct, this is not an MMTk issue; it's just a methodological problem (which is easily fixed :-).
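
For illustration, the harnessing pattern being described might look like the sketch below; `setup`, `kernel`, `harness_begin`, and `harness_end` are hypothetical names standing in for the benchmark's core and the harness hooks (e.g. MMTk's), not functions that exist in this repo:

```julia
# Hypothetical harnessed run: only the core of the workload is measured,
# and setup/wrapper allocations stay outside the measured region.
function run_harnessed(n_iterations)
    setup()                  # untimed: wrapper work, loading data, etc.
    for _ in 1:n_iterations
        harness_begin()      # hypothetical hook: force GC, reset stats
        kernel()             # the measurable core of the workload
        harness_end()        # hypothetical hook: stop measuring, report
    end
end
```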
