Remove virtual methods from ur_mem_handle_t_ #2620
base: main
Compute Benchmarks level_zero run (with params: ):
893743b to aea4883
Compute Benchmarks level_zero run ():
Summary
Total 138 benchmarks in mean. (result is better)
Performance change in benchmark groups
Relative perf in group api (12): 100.144%
Relative perf in group memory (4): 100.303%
Relative perf in group miscellaneous (1): 106.845%
Relative perf in group multithread (10): 100.053%
Relative perf in group graph (10): 98.886%
Relative perf in group Velocity-Bench (9): 99.011%
Relative perf in group Runtime (8): 98.330%
Relative perf in group MicroBench (14): 99.797%
Relative perf in group Pattern (10): 99.964%
Relative perf in group ScalarProduct (6): 100.096%
Relative perf in group USM (7): 101.680%
Relative perf in group VectorAddition (3): 100.503%
Relative perf in group Polybench (3): 100.403%
Relative perf in group Kmeans (1): 100.044%
Relative perf in group LinearRegressionCoeff (1): 100.098%
Relative perf in group MolecularDynamics (1): 100.000%
Relative perf in group llama.cpp (6): 100.040%
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): 106.148%
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): 100.357%
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): 96.625%
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): 89.777%
Relative perf in group alloc/min (4): 100.146%
Relative perf in group multiple (12): 102.635%
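The per-group lines above follow a fixed pattern, so they can be checked mechanically. The following is a minimal Python sketch, not part of the benchmark tooling: the names `parse_groups` and `outliers` and the 3% tolerance are assumptions made for illustration only. It parses the lines as posted and lists the groups that deviate from 100% by more than the chosen tolerance.

```python
import re

# Matches lines such as "Relative perf in group api (12): 100.144%".
GROUP_LINE = re.compile(r"Relative perf in group (.+) \((\d+)\): ([\d.]+)%")

REPORT = """\
Relative perf in group api (12): 100.144%
Relative perf in group memory (4): 100.303%
Relative perf in group miscellaneous (1): 106.845%
Relative perf in group multithread (10): 100.053%
Relative perf in group graph (10): 98.886%
Relative perf in group Velocity-Bench (9): 99.011%
Relative perf in group Runtime (8): 98.330%
Relative perf in group MicroBench (14): 99.797%
Relative perf in group Pattern (10): 99.964%
Relative perf in group ScalarProduct (6): 100.096%
Relative perf in group USM (7): 101.680%
Relative perf in group VectorAddition (3): 100.503%
Relative perf in group Polybench (3): 100.403%
Relative perf in group Kmeans (1): 100.044%
Relative perf in group LinearRegressionCoeff (1): 100.098%
Relative perf in group MolecularDynamics (1): 100.000%
Relative perf in group llama.cpp (6): 100.040%
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): 106.148%
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): 100.357%
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): 96.625%
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): 89.777%
Relative perf in group alloc/min (4): 100.146%
Relative perf in group multiple (12): 102.635%
"""


def parse_groups(report: str) -> list[tuple[str, int, float]]:
    """Return (group name, benchmark count, relative perf in percent) per line."""
    return [
        (m.group(1), int(m.group(2)), float(m.group(3)))
        for m in GROUP_LINE.finditer(report)
    ]


def outliers(groups, tolerance_pct: float = 3.0):
    """Groups whose relative perf differs from 100% by more than tolerance_pct.

    The 3% default is an arbitrary, hypothetical threshold, not one used by the bot.
    """
    return [(name, perf) for name, _, perf in groups if abs(perf - 100.0) > tolerance_pct]


if __name__ == "__main__":
    groups = parse_groups(REPORT)
    print("benchmarks counted:", sum(count for _, count, _ in groups))
    for name, perf in outliers(groups):
        print(f"{name}: {perf:.3f}%")
```

The per-group counts add up to the 138 benchmarks reported in the summary. With the assumed 3% tolerance, the groups listed are miscellaneous and three of the alloc/size groups; the sketch does not judge whether those deviations are noise or real changes.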
Details
Benchmark details - environment, command...

api_overhead_benchmark_l0 SubmitKernel out of order
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=1 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=1

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=100

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=100

api_overhead_benchmark_ur SubmitKernel out of order CPU count
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion
Environment Variables:
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Velocity-Bench Hashtable
Environment Variables:
Command: /home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker
Environment Variables:
Command: /home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench CudaSift
Environment Variables:
Command: /home/pmdk/bench_workdir/cudaSift/cudaSift

Velocity-Bench Easywave
Environment Variables:
Command: /home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Velocity-Bench QuickSilver
Environment Variables: QS_DEVICE=GPU
Command: /home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Velocity-Bench Sobel Filter
Environment Variables: OPENCV_IO_MAX_IMAGE_PIXELS=1677721600
Command: /home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Velocity-Bench dl-cifar
Environment Variables:
Command: /home/pmdk/bench_workdir/dl-cifar/dl-cifar_sycl

Velocity-Bench dl-mnist
Environment Variables: NEOReadDebugKeys=1
Command: /home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Velocity-Bench svm
Environment Variables:
Command: /home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Runtime_IndependentDAGTaskThroughput_SingleTask
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_BasicParallelFor
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Runtime_DAGTaskThroughput_SingleTask
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_BasicParallelFor
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_HierarchicalParallelFor
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Runtime_DAGTaskThroughput_NDRangeParallelFor
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_H2D_Strided
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_H2D_Strided
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_H2D_Strided
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_1D_D2H_Strided
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_2D_D2H_Strided
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_HostDeviceBandwidth_3D_D2H_Strided
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

MicroBench_LocalMem_int32_4096
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000

MicroBench_LocalMem_fp32_4096
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000

Pattern_Reduction_NDRange_int32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Pattern_Reduction_Hierarchical_int32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

ScalarProduct_NDRange_int32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_NDRange_int64
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_NDRange_fp32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_int32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_int64
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

ScalarProduct_Hierarchical_fp32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int16
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_int64
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_NDRange_fp32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int16
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_int64
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Pattern_SegmentedReduction_Hierarchical_fp32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

USM_Allocation_latency_fp32_device
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Allocation_latency_fp32_host
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Allocation_latency_fp32_shared
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

VectorAddition_int32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

VectorAddition_int64
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

VectorAddition_fp32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

Polybench_2mm
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/2mm.csv --size=512

Polybench_3mm
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/3mm.csv --size=512

Polybench_Atax
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Atax.csv --size=8192

Kmeans_fp32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Kmeans.csv --size=700000000

LinearRegressionCoeff_fp32
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

MolecularDynamics
Environment Variables:
Command: /home/pmdk/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/MolecularDynamics.csv --size=8196

llama.cpp Prompt Processing Batched 128
Environment Variables:
Command: /home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 128
Environment Variables:
Command: /home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Prompt Processing Batched 256
Environment Variables:
Command: /home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 256
Environment Variables:
Command: /home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Prompt Processing Batched 512
Environment Variables:
Command: /home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 512
Environment Variables:
Command: /home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

alloc/size:10000/0/4096/iterations:200000/threads:4 glibc
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool
Environment Variables:
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv
aea4883 to 1f999aa
We want to transition to handle pointers that contain the DDI table as their first element. For this to work, the handle object must not have a vtable, because a vtable pointer would otherwise occupy the start of the object. Since ur_mem_handle_t_ is relatively simple, it's easy enough to roll our own version of dynamic dispatch.
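To illustrate the idea, here is a minimal sketch of a handle that keeps a table pointer as its first member and dispatches on a type tag instead of virtual methods. It is not the code from this PR: `ddi_table_t`, `mem_handle_base`, `buffer_t`, `sub_buffer_t`, `kind_t`, and `get_offset` are all hypothetical names chosen for the example, and the real `ur_mem_handle_t_` may be organized differently.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the DDI (function pointer) table.
struct ddi_table_t {
    int version;
};

// Base handle without virtual methods, so no vtable pointer is emitted
// and the table pointer can sit at offset 0.
struct mem_handle_base {
    enum class kind_t { buffer, sub_buffer };

    const ddi_table_t *ddi; // assumption: must be the first member
    kind_t kind;            // tag used for hand-rolled dispatch
    std::size_t size;

    mem_handle_base(const ddi_table_t *ddi, kind_t kind, std::size_t size)
        : ddi(ddi), kind(kind), size(size) {}
};

struct buffer_t : mem_handle_base {
    buffer_t(const ddi_table_t *ddi, std::size_t size)
        : mem_handle_base(ddi, kind_t::buffer, size) {}
    std::size_t offset_impl() const { return 0; }
};

struct sub_buffer_t : mem_handle_base {
    std::size_t offset;
    sub_buffer_t(const ddi_table_t *ddi, std::size_t size, std::size_t offset)
        : mem_handle_base(ddi, kind_t::sub_buffer, size), offset(offset) {}
    std::size_t offset_impl() const { return offset; }
};

// Manual replacement for a virtual call: switch on the tag and
// static_cast to the concrete type.
inline std::size_t get_offset(const mem_handle_base *h) {
    switch (h->kind) {
    case mem_handle_base::kind_t::buffer:
        return static_cast<const buffer_t *>(h)->offset_impl();
    case mem_handle_base::kind_t::sub_buffer:
        return static_cast<const sub_buffer_t *>(h)->offset_impl();
    }
    return 0; // unreachable for valid tags
}

// Compile-time checks for the layout property the sketch relies on.
static_assert(!std::is_polymorphic_v<mem_handle_base>, "no vtable allowed");
static_assert(std::is_standard_layout_v<mem_handle_base>, "layout must be predictable");
static_assert(offsetof(mem_handle_base, ddi) == 0, "table must come first");

// Small usage example: the parent buffer reports offset 0,
// the sub-buffer reports the offset it was created with.
inline std::size_t example_offsets() {
    static const ddi_table_t table{1};
    buffer_t parent(&table, 1024);
    sub_buffer_t child(&table, 256, 128);
    return get_offset(&parent) + get_offset(&child);
}
```

The `static_assert`s make the constraint explicit: if someone later adds a virtual method to the base, the build fails instead of silently shifting the table pointer away from offset 0.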
1f999aa to 843c049