Replace loader handles with field at start of handle data #2622
base: main
9646680 to 1a50495
1a50495 to 974b9eb
Compute Benchmarks level_zero run (with params: ):
Summary
Total 38 benchmarks in mean. (result is better)
Performance change in benchmark groups
Relative perf in group memory (4): 100.808%
Relative perf in group api (12): 101.685%
Relative perf in group Velocity-Bench (9): 99.170%
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): 98.331%
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): 99.292%
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): 102.184%
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): 98.976%
Relative perf in group alloc/min (4): 100.564%
Relative perf in group multiple (12): 100.475%
Relative perf in group miscellaneous (1): cannot calculate
Relative perf in group multithread (10): cannot calculate
Relative perf in group graph (10): cannot calculate
Relative perf in group Runtime (8): cannot calculate
Relative perf in group MicroBench (14): cannot calculate
Relative perf in group Pattern (10): cannot calculate
Relative perf in group ScalarProduct (6): cannot calculate
Relative perf in group USM (7): cannot calculate
Relative perf in group VectorAddition (3): cannot calculate
Relative perf in group Polybench (3): cannot calculate
Relative perf in group Kmeans (1): cannot calculate
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Relative perf in group MolecularDynamics (1): cannot calculate
Relative perf in group llama.cpp (6): cannot calculate
Details
Benchmark details - environment, command...

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
Environment Variables: (none)
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
Environment Variables: (none)
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
Environment Variables: (none)
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
Environment Variables: (none)
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
Environment Variables: (none)
Command: /home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Velocity-Bench dl-mnist
Environment Variables: NEOReadDebugKeys=1
Command: /home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

The remaining UMF benchmarks all use the same environment and command:
Environment Variables: (none)
Command: /home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 glibc
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool
a5e38c1 to 3d54672
42ac088 to 953f359
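We want to transition to handle pointers that contain the ddi table as their first element. For this to work, handle objects must not have a vtable: in common C++ ABIs a polymorphic object typically stores its vtable pointer in its first word, which is exactly where the ddi table needs to live. Since ur_mem_handle_t_ is relatively simple, it's easy enough to roll our own version of dynamic dispatch.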
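A minimal sketch of what "rolling our own" dispatch could look like, assuming a tag-plus-switch design. Every name here (`ddi_table_t`, `mem_kind`, `mem_handle_t`, `mem_buffer_t`, `mem_image_t`, `mem_size`) is hypothetical and is not taken from the PR. The code illustrates the constraint described above: because it has no virtual functions, the common header stays standard-layout and the ddi pointer is guaranteed to sit at offset 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical, trimmed-down dispatch table (the real UR ddi tables are
// generated and much larger).
struct ddi_table_t {
  int (*memRelease)(void *hMem);
};

// Hypothetical tag that replaces virtual dispatch.
enum class mem_kind : std::uint8_t { buffer, image };

// Hypothetical simplified stand-in for ur_mem_handle_t_. Because it has no
// virtual functions, no hidden vtable pointer is inserted, so `ddi` really is
// the first thing in memory.
struct mem_handle_t {
  const ddi_table_t *ddi; // must stay the first member
  mem_kind kind;
};

// Concrete kinds embed the header as their first member. Composition rather
// than inheritance keeps every type standard-layout.
struct mem_buffer_t {
  mem_handle_t hdr;
  std::size_t size;
};

struct mem_image_t {
  mem_handle_t hdr;
  std::size_t width, height, bytes_per_pixel;
};

static_assert(std::is_standard_layout_v<mem_handle_t>);
static_assert(std::is_standard_layout_v<mem_buffer_t>);
static_assert(std::is_standard_layout_v<mem_image_t>);
static_assert(offsetof(mem_handle_t, ddi) == 0);
static_assert(offsetof(mem_buffer_t, hdr) == 0);
static_assert(offsetof(mem_image_t, hdr) == 0);

// Hand-rolled "virtual" call: switch on the tag, then downcast. The cast is
// valid because a standard-layout object and its first member are
// pointer-interconvertible.
std::size_t mem_size(const mem_handle_t *h) {
  switch (h->kind) {
  case mem_kind::buffer:
    return reinterpret_cast<const mem_buffer_t *>(h)->size;
  case mem_kind::image: {
    auto *img = reinterpret_cast<const mem_image_t *>(h);
    return img->width * img->height * img->bytes_per_pixel;
  }
  }
  assert(false && "unknown mem_kind");
  return 0;
}
```

The switch costs one branch per call instead of an indirect call. For a type with only a couple of variants, this is usually an acceptable trade for keeping the first word free.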
953f359 to 3bfdda6
This changes the loader's handle logic from wrapped pointers to a ddi table stored at the start of the handle struct itself. Just testing something...
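The following is a hedged sketch of how a loader entry point could dispatch once the ddi table is the first field of every handle. It assumes that each adapter fills in that field when it creates a handle. With that in place, the loader can read the table straight out of the handle and forward the same pointer unchanged, with no loader-owned wrapper to look up or unwrap. The names `ddi_table_t`, `handle_header_t`, `loader_memRelease`, `adapter_mem_t`, `adapter_memRelease`, `adapter_ddi` and the `-1` error code are hypothetical and are not UR APIs.

```cpp
#include <cassert>

// Hypothetical, trimmed-down dispatch table.
struct ddi_table_t {
  int (*memRelease)(void *hMem);
};

// The only thing the loader assumes about any adapter handle.
struct handle_header_t {
  const ddi_table_t *ddi; // first field of every handle
};

// Loader entry point: read the table from the handle itself and pass the
// *same* handle through to the adapter. No wrapper allocation, no unwrapping.
int loader_memRelease(void *hMem) {
  if (hMem == nullptr)
    return -1; // hypothetical "invalid handle" code
  const ddi_table_t *ddi = static_cast<handle_header_t *>(hMem)->ddi;
  assert(ddi != nullptr && ddi->memRelease != nullptr);
  return ddi->memRelease(hMem);
}

// Hypothetical adapter-side handle: the header is its first member, so a
// pointer to the handle is also a valid pointer to the header.
struct adapter_mem_t {
  handle_header_t hdr;
  int refcount;
};

// Hypothetical adapter implementation: returns the remaining refcount.
int adapter_memRelease(void *hMem) {
  auto *mem = static_cast<adapter_mem_t *>(hMem);
  return --mem->refcount;
}

// Table the adapter stores into every handle it creates.
const ddi_table_t adapter_ddi = {adapter_memRelease};
```

One consequence of this design is that every adapter has to cooperate by reserving and populating the first field.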
3bfdda6 to 3c26247
Currently this only works for L0 (v1) and HIP.