[benchmarks] use manifest to build compute-runtime dependencies #2621

Merged: 1 commit merged into oneapi-src:main from compute-runtime-manifest on Jan 29, 2025

Conversation

@pbalcer (Contributor) commented on Jan 27, 2025

Instead of hardcoding the compute-runtime dependencies, the scripts now fetch its manifest file to determine the correct dependency versions to build.
This patch also adds an option to build IGC from source.
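
As a rough illustration of the manifest-driven approach described above, here is a minimal sketch. The manifest path, YAML layout, and component names are assumptions for illustration only; the actual benchmark scripts may differ.

```python
# Hedged sketch: fetch compute-runtime's dependency manifest and read the
# pinned revisions instead of hardcoding them. The manifest URL, YAML layout,
# and component names ("igc", "gmmlib") are assumptions, not the real script.
import requests
import yaml

MANIFEST_URL = ("https://raw.githubusercontent.com/intel/compute-runtime/"
                "{rev}/manifests/manifest.yml")  # assumed manifest location

def dependency_revisions(compute_runtime_rev: str) -> dict[str, str]:
    """Return {component: revision} as pinned by the given compute-runtime tag."""
    resp = requests.get(MANIFEST_URL.format(rev=compute_runtime_rev), timeout=30)
    resp.raise_for_status()
    manifest = yaml.safe_load(resp.text)
    # Assumed structure: components keyed by name, each carrying a 'revision'.
    return {name: comp["revision"]
            for name, comp in manifest.get("components", {}).items()}

if __name__ == "__main__":
    revs = dependency_revisions("24.52.32224.5")  # hypothetical tag
    print("igc:", revs.get("igc"), "gmmlib:", revs.get("gmmlib"))
```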

@pbalcer requested a review from a team as a code owner on January 27, 2025 14:18
The github-actions bot added the ci/cd (Continuous integration/delivery) label on Jan 27, 2025

Compute Benchmarks level_zero run (with params: --build-igc):
https://github.com/oneapi-src/unified-runtime/actions/runs/12991035463


Compute Benchmarks level_zero run (--build-igc):
https://github.com/oneapi-src/unified-runtime/actions/runs/12991035463
Job status: success. Test status: success.

Summary

Total of 138 benchmarks included in the mean.
Geomean: 100.475%.
Improved: 18, Regressed: 16 (threshold: 2.00%)

(overall result is better than baseline)
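
As a rough illustration of how these summary figures could be derived from the per-benchmark "Relative perf" values listed below, here is a sketch under assumed semantics of the 2.00% threshold; it is not the actual reporting script.

```python
# Hedged sketch: geomean and improved/regressed counts from per-benchmark
# relative-perf percentages; the exact threshold semantics are assumed.
from math import prod

def summarize(relative_perfs: list[float], threshold_pct: float = 2.0):
    n = len(relative_perfs)
    geomean = 100.0 * prod(r / 100.0 for r in relative_perfs) ** (1.0 / n)
    improved = sum(1 for r in relative_perfs if r - 100.0 > threshold_pct)
    regressed = sum(1 for r in relative_perfs if 100.0 - r > threshold_pct)
    return geomean, improved, regressed

# Example with three hypothetical benchmarks at 104.54%, 92.73%, and 100.00%:
print(summarize([104.54, 92.73, 100.00]))  # -> (~98.97, 1 improved, 1 regressed)
```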

Performance change in benchmark groups

Relative perf in group api (12): 100.814%
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_ur SubmitKernel out of order 15.783000 μs 16.073 μs 101.84% 1.84% .
api_overhead_benchmark_sycl SubmitKernel out of order 22.899000 μs 23.287 μs 101.69% 1.69% .
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.139000 μs 2.175 μs 101.68% 1.68% .
api_overhead_benchmark_l0 SubmitKernel out of order 11.464000 μs 11.629 μs 101.44% 1.44% .
api_overhead_benchmark_ur SubmitKernel in order 16.490000 μs 16.703 μs 101.29% 1.29% .
api_overhead_benchmark_l0 SubmitKernel in order 11.673000 μs 11.800 μs 101.09% 1.09% .
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.698000 μs 1.706 μs 100.47% 0.47% .
api_overhead_benchmark_ur SubmitKernel in order with measure completion 21.409000 μs 21.473 μs 100.30% 0.30% .
api_overhead_benchmark_ur SubmitKernel out of order CPU count 105463.000000 instr 105463.000 instr 100.00% 0.00% .
api_overhead_benchmark_ur SubmitKernel in order CPU count 110815.000000 instr 110815.000 instr 100.00% 0.00% .
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count 123991.000000 instr 123991.000 instr 100.00% 0.00% .
api_overhead_benchmark_sycl SubmitKernel in order 24.665 μs 24.664000 μs 100.00% -0.00% .
Relative perf in group memory (4): 99.971%
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 251.988000 μs 256.472 μs 101.78% 1.78% .
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 219.662 μs 219.201000 μs 99.79% -0.21% .
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.977 μs 5.932000 μs 99.25% -0.75% .
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.046 GB/s 3.074000 GB/s 99.09% -0.91% .
Relative perf in group miscellaneous (1): 100.582%
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 856.272000 bw GB/s 861.253 bw GB/s 100.58% 0.58% .
Relative perf in group multithread (10): 100.258%
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events 112224.361000 μs 114139.645 μs 101.71% 1.71% .
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 8958.534000 μs 9073.725 μs 101.29% 1.29% .
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events 42651.621000 μs 43064.999 μs 100.97% 0.97% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 46873.158000 μs 47306.654 μs 100.92% 0.92% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 6938.607000 μs 6943.025 μs 100.06% 0.06% .
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 1210.240000 μs 1210.999 μs 100.06% 0.06% .
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 2093.331 μs 2083.870000 μs 99.55% -0.45% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 26850.745 μs 26707.698000 μs 99.47% -0.53% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 7868.101 μs 7821.718000 μs 99.41% -0.59% .
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 17373.086 μs 17230.283000 μs 99.18% -0.82% .
Relative perf in group graph (10): 99.728%
Benchmark This PR baseline Relative perf Change -
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 676.098000 μs 679.477 μs 100.50% 0.50% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 61.653000 μs 61.889 μs 100.38% 0.38% .
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 71760.962000 μs 71856.495 μs 100.13% 0.13% .
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 54.102000 μs 54.135 μs 100.06% 0.06% .
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 353481.427 μs 353404.211000 μs 99.98% -0.02% .
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 353402.926 μs 353223.514000 μs 99.95% -0.05% .
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 72593.537 μs 72543.241000 μs 99.93% -0.07% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 57499.277 μs 57263.652000 μs 99.59% -0.41% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 5703.589 μs 5615.778000 μs 98.46% -1.54% .
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 5707.830 μs 5611.771000 μs 98.32% -1.68% .
Relative perf in group Velocity-Bench (9): 98.895%
Benchmark This PR baseline Relative perf Change -
Velocity-Bench svm 0.138100 s 0.141 s 101.88% 1.88% .
Velocity-Bench dl-cifar 23.625200 s 23.892 s 101.13% 1.13% .
Velocity-Bench dl-mnist 2.400 s 2.390000 s 99.58% -0.42% .
Velocity-Bench CudaSift 205.640 ms 204.632000 ms 99.51% -0.49% .
Velocity-Bench Hashtable 351.692 M keys/sec 353.884706 M keys/sec 99.38% -0.62% .
Velocity-Bench Easywave 237.000 ms 235.000000 ms 99.16% -0.84% .
Velocity-Bench QuickSilver 116.890 MMS/CTT 118.320000 MMS/CTT 98.79% -1.21% .
Velocity-Bench Sobel Filter 626.625 ms 615.149000 ms 98.17% -1.83% .
Velocity-Bench Bitcracker 38.533 s 35.731600 s 92.73% -7.27% ---
Relative perf in group Runtime (8): 102.173%
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor 264.998000 ms 277.037 ms 104.54% 4.54% ++
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor 267.945000 ms 275.224 ms 102.72% 2.72% +
Runtime_IndependentDAGTaskThroughput_SingleTask 258.683000 ms 265.060 ms 102.47% 2.47% +
Runtime_DAGTaskThroughput_BasicParallelFor 1711.305000 ms 1747.525 ms 102.12% 2.12% +
Runtime_IndependentDAGTaskThroughput_BasicParallelFor 282.341000 ms 287.518 ms 101.83% 1.83% .
Runtime_DAGTaskThroughput_SingleTask 1648.390000 ms 1678.531 ms 101.83% 1.83% .
Runtime_DAGTaskThroughput_HierarchicalParallelFor 1694.925000 ms 1718.971 ms 101.42% 1.42% .
Runtime_DAGTaskThroughput_NDRangeParallelFor 1674.339000 ms 1682.917 ms 100.51% 0.51% .
Relative perf in group MicroBench (14): 98.816%
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous 618.128 ms 618.122000 ms 100.00% -0.00% .
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous 618.130 ms 618.120000 ms 100.00% -0.00% .
MicroBench_HostDeviceBandwidth_3D_D2H_Strided 617.531 ms 617.480000 ms 99.99% -0.01% .
MicroBench_HostDeviceBandwidth_2D_D2H_Strided 617.708 ms 617.529000 ms 99.97% -0.03% .
MicroBench_LocalMem_int32_4096 29.896 ms 29.887000 ms 99.97% -0.03% .
MicroBench_LocalMem_fp32_4096 29.938 ms 29.884000 ms 99.82% -0.18% .
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous 4.798 ms 4.764000 ms 99.29% -0.71% .
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous 4.873 ms 4.832000 ms 99.16% -0.84% .
MicroBench_HostDeviceBandwidth_3D_H2D_Strided 5.072 ms 5.024000 ms 99.05% -0.95% .
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous 4.753 ms 4.690000 ms 98.67% -1.33% .
MicroBench_HostDeviceBandwidth_1D_D2H_Strided 4.960 ms 4.854000 ms 97.86% -2.14% -
MicroBench_HostDeviceBandwidth_2D_H2D_Strided 5.281 ms 5.130000 ms 97.14% -2.86% -
MicroBench_HostDeviceBandwidth_1D_H2D_Strided 4.848 ms 4.700000 ms 96.95% -3.05% -
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous 4.944 ms 4.730000 ms 95.67% -4.33% --
Relative perf in group Pattern (10): 99.907%
Benchmark This PR baseline Relative perf Change -
Pattern_SegmentedReduction_NDRange_int16 2.265000 ms 2.266 ms 100.04% 0.04% .
Pattern_SegmentedReduction_Hierarchical_int64 11.783000 ms 11.784 ms 100.01% 0.01% .
Pattern_SegmentedReduction_Hierarchical_int32 11.592 ms 11.588000 ms 99.97% -0.03% .
Pattern_SegmentedReduction_Hierarchical_fp32 11.590 ms 11.585000 ms 99.96% -0.04% .
Pattern_SegmentedReduction_NDRange_fp32 2.166 ms 2.165000 ms 99.95% -0.05% .
Pattern_SegmentedReduction_Hierarchical_int16 11.808 ms 11.799000 ms 99.92% -0.08% .
Pattern_SegmentedReduction_NDRange_int64 2.340 ms 2.338000 ms 99.91% -0.09% .
Pattern_SegmentedReduction_NDRange_int32 2.166 ms 2.164000 ms 99.91% -0.09% .
Pattern_Reduction_Hierarchical_int32 16.735 ms 16.716000 ms 99.89% -0.11% .
Pattern_Reduction_NDRange_int32 16.803 ms 16.720000 ms 99.51% -0.49% .
Relative perf in group ScalarProduct (6): 100.001%
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_fp32 3.735000 ms 3.773 ms 101.02% 1.02% .
ScalarProduct_Hierarchical_int64 11.481000 ms 11.502 ms 100.18% 0.18% .
ScalarProduct_Hierarchical_fp32 10.149000 ms 10.158 ms 100.09% 0.09% .
ScalarProduct_NDRange_int64 5.462 ms 5.461000 ms 99.98% -0.02% .
ScalarProduct_Hierarchical_int32 10.535 ms 10.533000 ms 99.98% -0.02% .
ScalarProduct_NDRange_int32 3.816 ms 3.769000 ms 98.77% -1.23% .
Relative perf in group USM (7): 103.950%
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device 0.053000 ms 0.067 ms 126.42% 26.42% ++++++++++
USM_Allocation_latency_fp32_shared 0.055000 ms 0.057 ms 103.64% 3.64% +
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch 1.242000 ms 1.256 ms 101.13% 1.13% .
USM_Allocation_latency_fp32_host 37.351 ms 37.342000 ms 99.98% -0.02% .
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch 1.689 ms 1.684000 ms 99.70% -0.30% .
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch 1.856 ms 1.850000 ms 99.68% -0.32% .
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch 1.078 ms 1.074000 ms 99.63% -0.37% .
Relative perf in group VectorAddition (3): 98.001%
Benchmark This PR baseline Relative perf Change -
VectorAddition_fp32 1.482 ms 1.468000 ms 99.06% -0.94% .
VectorAddition_int64 3.124 ms 3.061000 ms 97.98% -2.02% -
VectorAddition_int32 1.521 ms 1.475000 ms 96.98% -3.02% -
Relative perf in group Polybench (3): 114.720%
Benchmark This PR baseline Relative perf Change -
Polybench_2mm 1.039000 ms 1.227 ms 118.09% 18.09% +++++++
Polybench_3mm 1.482000 ms 1.729 ms 116.67% 16.67% ++++++
Polybench_Atax 6.283000 ms 6.885 ms 109.58% 9.58% ++++
Relative perf in group Kmeans (1): 113.656%
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 14.148000 ms 16.080 ms 113.66% 13.66% +++++
Relative perf in group LinearRegressionCoeff (1): 90.624%
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 1032.596 ms 935.779000 ms 90.62% -9.38% ----
Relative perf in group MolecularDynamics (1): 100.000%
Benchmark This PR baseline Relative perf Change -
MolecularDynamics 0.029000 ms 0.029 ms 100.00% 0.00% .
Relative perf in group llama.cpp (6): 99.852%
Benchmark This PR baseline Relative perf Change -
llama.cpp Prompt Processing Batched 128 836.032466 token/s 829.273 token/s 100.82% 0.82% .
llama.cpp Prompt Processing Batched 256 868.211240 token/s 867.896 token/s 100.04% 0.04% .
llama.cpp Text Generation Batched 256 62.237 token/s 62.451865 token/s 99.66% -0.34% .
llama.cpp Text Generation Batched 512 62.241 token/s 62.506870 token/s 99.57% -0.43% .
llama.cpp Text Generation Batched 128 62.201 token/s 62.469368 token/s 99.57% -0.43% .
llama.cpp Prompt Processing Batched 512 426.311 token/s 428.586901 token/s 99.47% -0.53% .
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): 101.561%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc 2630.560000 ns 2723.560 ns 103.54% 3.54% +
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider 2071.280000 ns 2119.200 ns 102.31% 2.31% +
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> 289.569000 ns 294.824 ns 101.81% 1.81% .
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3167.420 ns 3124.490000 ns 98.64% -1.36% .
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): 99.254%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider 192.351000 ns 195.800 ns 101.79% 1.79% .
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> 212.179000 ns 213.357 ns 100.56% 0.56% .
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc 709.420 ns 699.961000 ns 98.67% -1.33% .
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> 280.799 ns 269.830000 ns 96.09% -3.91% -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): 99.774%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider 1751.980000 ns 1896.370 ns 108.24% 8.24% +++
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> 261.517 ns 260.987000 ns 99.80% -0.20% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc 1444.560 ns 1399.010000 ns 96.85% -3.15% -
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3360.370 ns 3183.170000 ns 94.73% -5.27% --
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): 98.885%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider 186.423000 ns 192.753 ns 103.40% 3.40% +
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc 750.539 ns 737.865000 ns 98.31% -1.69% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> 319.202 ns 310.425000 ns 97.25% -2.75% -
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> 211.336 ns 204.412000 ns 96.72% -3.28% -
Relative perf in group alloc/min (4): 98.696%
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> 1049.430000 ns 1083.760 ns 103.27% 3.27% +
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> 974.306 ns 960.784000 ns 98.61% -1.39% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc 179.353 ns 174.373000 ns 97.22% -2.78% -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc 836.629 ns 801.763000 ns 95.83% -4.17% --
Relative perf in group multiple (12): 100.742%
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc 31894.300000 ns 34482.100 ns 108.11% 8.11% +++
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc 30336.600000 ns 31243.600 ns 102.99% 2.99% +
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> 73983.000000 ns 75587.500 ns 102.17% 2.17% +
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> 42807.500000 ns 43475.800 ns 101.56% 1.56% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider 1189430.000000 ns 1201570.000 ns 101.02% 1.02% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> 27318.200000 ns 27465.300 ns 100.54% 0.54% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc 140767.000000 ns 141214.000 ns 100.32% 0.32% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> 15142.900 ns 15099.900000 ns 99.72% -0.28% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider 148534.000 ns 147271.000000 ns 99.15% -0.85% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> 1206450.000 ns 1185020.000000 ns 98.22% -1.78% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc 4283.900 ns 4207.320000 ns 98.21% -1.79% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> 164679.000 ns 160292.000000 ns 97.34% -2.66% -
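
For reference, a minimal sketch of how the "Relative perf" and "Change" columns in the tables above appear to relate, with values above 100% favoring this PR. Whether a given metric is lower-is-better (times) or higher-is-better (throughput) is inferred here from its unit and is an assumption, not taken from the reporting script.

```python
# Hedged sketch: per-row relative performance of this PR against the baseline.
def relative_perf(this_pr: float, baseline: float, lower_is_better: bool) -> float:
    ratio = baseline / this_pr if lower_is_better else this_pr / baseline
    return 100.0 * ratio

def change_pct(this_pr: float, baseline: float, lower_is_better: bool) -> float:
    # Signed percentage change; positive means this PR improved.
    return relative_perf(this_pr, baseline, lower_is_better) - 100.0

# api_overhead_benchmark_ur SubmitKernel out of order: 15.783 us vs 16.073 us baseline
print(round(relative_perf(15.783, 16.073, lower_is_better=True), 2))        # ~101.84
# llama.cpp Prompt Processing Batched 128: 836.03 vs 829.27 token/s baseline
print(round(relative_perf(836.032466, 829.273, lower_is_better=False), 2))  # ~100.82
```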

Details

Benchmark details - environment, command...
api_overhead_benchmark_l0 SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=1 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=1

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=100

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=100

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Velocity-Bench dl-cifar

Environment Variables:

Command:

/home/pmdk/bench_workdir/dl-cifar/dl-cifar_sycl

Velocity-Bench dl-mnist

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Command:

/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Velocity-Bench svm

Environment Variables:

Command:

/home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

VectorAddition_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

VectorAddition_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

VectorAddition_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

Polybench_2mm

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/2mm.csv --size=512

Polybench_3mm

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/3mm.csv --size=512

Polybench_Atax

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Atax.csv --size=8192

Kmeans_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Kmeans.csv --size=700000000

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

MolecularDynamics

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/MolecularDynamics.csv --size=8196

llama.cpp Prompt Processing Batched 128

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 128

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Prompt Processing Batched 256

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 256

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Prompt Processing Batched 512

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 512

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

@pbalcer force-pushed the compute-runtime-manifest branch from 8a32b34 to 75d0684 on January 29, 2025 10:03
@pbalcer merged commit 5d7be10 into oneapi-src:main on Jan 29, 2025
24 checks passed