Commit

Updated benchmark results
Signed-off-by: Andrea Zoppi <[email protected]>
TexZK committed Apr 7, 2024
1 parent 32761bb commit 12d778c
Showing 12 changed files with 32 additions and 46 deletions.
45 changes: 32 additions & 13 deletions README.md
@@ -108,37 +108,56 @@ A basic benchmark suite is run via the following commands:
```sh
cd PATH_TO_PROJECT_ROOT/builddir
meson test --benchmark
meson compile benchmark-report
meson compile benchmark-report-tda8425
meson compile benchmark-report-ym7128
meson compile benchmark-report-ymf262
```


### OPL3 Benchmark Results
### Benchmark Results

Preliminary benchmarks were run on three very different CPUs:

| System | OS | CPU | SIMD | Notes |
|:-|:-|:-|:-|:-|
| PC | Windows 10 | i7 6700k | x86 SSE4.1 + AVX2 | Home PC |
| PC | Windows 10 | i7 6700k | x86 SSE4.1 + AVX2 | 2016 gaming PC |
| BeagleBone Black | Debian 11 | ARM Cortex-A8 | ARMv7 NEON | Headless |
| Raspberry Pi 5 | Debian 12 | ARM Cortex-A76 | ARMv7 NEON | Headless + Heatsink Fan |

All the systems were updated to their latest software and OS releases.
The compiler was *GCC* for all these machines.
All the scores were played via `aymo_ymf262_play --benchmark --loops 3`, except for the *BBB* which did not loop (too slow!).

All the systems run `--cpu-ext dummy`, which mimics the overhead of the test harness itself (mostly the score decoder), to subtract it from the actual benchmarks.
The reference implementation is *NukedOPL3*, run as `--cpu-ext none`.

Here's a summary of the results:
All the benchmark results are normalized against the plain *C* implementation, run as `--cpu-ext none`.

| CPU | SIMD | Ratio | DevSt | Speedup |
|:-|:-|-:|-:|-:|
| i7 6700k | x86 SSE4.1 | 0.590 | 0.026 | 1.695 |
| i7 6700k | x86 AVX2 | 0.302 | 0.013 | 3.315 |
| ARM Cortex-A8 | ARMv7 NEON | 0.575 | 0.035 | 1.740 |
| ARM Cortex-A76 | ARMv7 NEON | 0.374 | 0.010 | 2.671 |
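
The *Ratio* and *Speedup* columns above appear to be reciprocals of each other. The following is a minimal sketch of how such figures could be derived from raw timings, assuming the `dummy` harness time is subtracted from both runs before dividing. The function names and the exact formula are illustrative assumptions, not taken from the *AYMO* report scripts.

```python
def normalized_ratio(t_impl, t_ref, t_dummy=0.0):
    """Run time of an implementation relative to the reference.

    Assumption: the harness overhead (t_dummy) is subtracted from both
    timings before the division. All arguments use the same time unit.
    """
    net_ref = t_ref - t_dummy
    if net_ref <= 0:
        raise ValueError("reference time must exceed the harness overhead")
    return (t_impl - t_dummy) / net_ref


def speedup(ratio):
    """Speedup as the reciprocal of the normalized ratio."""
    return 1.0 / ratio


# Ratio values copied from the summary table.
table_ratios = {
    "i7 6700k / SSE4.1": 0.590,
    "i7 6700k / AVX2": 0.302,
    "Cortex-A8 / NEON": 0.575,
    "Cortex-A76 / NEON": 0.374,
}

for label, ratio in table_ratios.items():
    print(f"{label}: speedup = {speedup(ratio):.3f}")
```

The reciprocals of the rounded ratios agree with the published *Speedup* column to within about 0.005; the small differences are presumably due to the table having been computed from unrounded values.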

![Benchmark Results](./doc/benchmarks/benchmark-results.png)
#### TDA8425

A basic *TDA8425* can be emulated with simple DSP techniques (mostly IIR filters), so the implementation can be rather straightforward.
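
As a rough illustration of what an IIR filter involves, here is a generic one-pole low-pass filter in Python. It is only a sketch of the general technique: the function name and the coefficient `alpha` are hypothetical, and this is not the filter topology or the coefficients used by the *AYMO* emulation.

```python
def one_pole_lowpass(samples, alpha):
    """Generic one-pole IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    alpha is a smoothing coefficient in (0, 1]; larger values track the
    input faster. Each output depends on the previous output, which is
    what makes the filter recursive (IIR).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    y = 0.0
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


# Step response: the output converges towards the input level.
print(one_pole_lowpass([1.0, 1.0, 1.0], 0.5))
```

Because each output sample depends on the previous one, a single filter is sequential along time; SIMD gains for this kind of code typically come from processing independent channels or filter stages side by side.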

Surprisingly, the *BBB* shows a much higher speedup than the other SIMD architectures I tested.
Perhaps its CPU core cannot optimize the plain C implementation as well as higher-grade CPUs do.
This suggests that *AYMO* is particularly beneficial on older embedded systems.

![Benchmark Results](./doc/benchmarks-tda8425.png)


#### YM7128

The *YM7128* is a simple fixed-point delay unit, with lots of parallel computations.
The results are indeed very interesting for all the SIMD architectures under test, consistently showing some nice speedup.
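
To show why this kind of workload parallelizes well, here is a minimal sketch of a generic fixed-point multi-tap delay in Python. The tap count, delays, and Q15 gain format are illustrative assumptions, not the chip's actual parameters nor the *AYMO* implementation.

```python
def multitap_delay(samples, tap_delays, tap_gains_q15):
    """Sum of delayed copies of an integer input signal.

    tap_delays: non-negative delays, in samples, one per tap.
    tap_gains_q15: integer gains in Q15 format (32768 represents 1.0).
    """
    if not tap_delays or len(tap_delays) != len(tap_gains_q15):
        raise ValueError("need one gain per tap, and at least one tap")
    size = max(tap_delays) + 1
    history = [0] * size  # circular buffer holding the most recent inputs
    pos = 0
    out = []
    for x in samples:
        history[pos] = x
        acc = 0
        # Each tap is an independent multiply-accumulate: this inner loop
        # is the part that maps naturally onto SIMD lanes.
        for delay, gain in zip(tap_delays, tap_gains_q15):
            acc += history[(pos - delay) % size] * gain
        out.append(acc >> 15)  # scale the Q15 product back to sample range
        pos = (pos + 1) % size
    return out


# Impulse response with a direct tap (gain 0.5) and a 2-sample echo (gain 0.25).
print(multitap_delay([1000, 0, 0, 0], [0, 2], [16384, 8192]))
# prints [500, 0, 250, 0]
```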

![Benchmark Results](./doc/benchmarks-ym7128.png)


#### YMF262

The reference *OPL3* implementation is *NukedOPL3*.

All the *OPL3* scores were played via `aymo_ymf262_play --benchmark --loops 3`, except for the *BBB* which did not loop (too slow!).

![YMF262 Benchmark Results](./doc/benchmarks-ymf262.png)


## Integration
Binary file added doc/benchmarks-tda8425.png
Binary file added doc/benchmarks-ym7128.png
Binary file added doc/benchmarks-ymf262.png
11 changes: 0 additions & 11 deletions doc/benchmarks/BBB_ARM-A8.csv

This file was deleted.

11 changes: 0 additions & 11 deletions doc/benchmarks/PC_i7-6700k.csv

This file was deleted.

11 changes: 0 additions & 11 deletions doc/benchmarks/RPi5_ARM-A76.csv

This file was deleted.

Binary file added doc/benchmarks/bbb-testlog.7z
Binary file not shown.
Binary file removed doc/benchmarks/benchmark-results.png
Binary file not shown.
Binary file modified doc/benchmarks/benchmark-results.xlsx
Binary file not shown.
Binary file added doc/benchmarks/pc-testlog.7z
Binary file not shown.
Binary file added doc/benchmarks/rpi5-testlog.7z
Binary file not shown.
