In-place versions of mttkrp and khatrirao #43
Without buffers for temp arrays.
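For readers unfamiliar with the pattern, here is a minimal sketch of what an in-place MTTKRP without a caller-supplied buffer can look like for a 3-way array in mode 1. This is not the code in this PR: the names `khatrirao_sketch!` and `mttkrp_mode1_sketch!` are hypothetical, and the restriction to three modes and mode 1 is an assumption made to keep the example short. The output `M` is overwritten in place, while the Khatri-Rao temporary is still allocated inside the function.

```julia
using LinearAlgebra  # provides mul!

# Hypothetical in-place Khatri-Rao product of two matrices (illustration only).
# Writes the column-wise Kronecker product of A and B into the preallocated K.
function khatrirao_sketch!(K::AbstractMatrix, A::AbstractMatrix, B::AbstractMatrix)
    nA, R = size(A)
    nB = size(B, 1)
    size(B, 2) == R || throw(DimensionMismatch("A and B need the same number of columns"))
    size(K) == (nA * nB, R) || throw(DimensionMismatch("K must have size (nA*nB, R)"))
    for r in 1:R, i in 1:nA, j in 1:nB
        K[(i - 1) * nB + j, r] = A[i, r] * B[j, r]
    end
    return K
end

# Hypothetical in-place mode-1 MTTKRP for a 3-way array X with factors U2 and U3.
# M is overwritten with the result; the Khatri-Rao temporary is allocated here,
# i.e. no buffer for the temporary is passed in by the caller.
function mttkrp_mode1_sketch!(M::AbstractMatrix, X::AbstractArray{<:Any,3}, U2, U3)
    T = promote_type(eltype(U2), eltype(U3))
    K = similar(U2, T, size(U3, 1) * size(U2, 1), size(U2, 2))
    khatrirao_sketch!(K, U3, U2)      # row (k-1)*J + j holds U3[k, r] * U2[j, r]
    X1 = reshape(X, size(X, 1), :)    # mode-1 unfolding, shares memory with X
    mul!(M, X1, K)                    # writes X1 * K directly into M
    return M
end

# Small usage example with arbitrary values.
X = reshape(collect(1.0:24.0), 4, 3, 2)
U2 = [1.0 2.0; 3.0 4.0; 5.0 6.0]
U3 = [1.0 0.0; 0.0 1.0]
M = zeros(4, 2)
mttkrp_mode1_sketch!(M, X, U2, U3)
```

Only the output allocation is avoided in this sketch. Reusing the Khatri-Rao temporary across calls would need an extra buffer argument, which this PR leaves out.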
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #43 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 8 8
Lines 220 240 +20
=========================================
+ Hits 220 240 +20
Using GCPDecompositions.jl/src/kernels.jl Lines 54 to 56 in 2bbf6f1
Benchmark Report for
ID | time ratio | memory ratio |
---|---|---|
["mttkrp", "size=(100, 100, 100), rank=10, mode=2"] | 1.00 (5%) | 1.02 (1%) ❌ |
["mttkrp", "size=(100, 100, 100), rank=100, mode=2"] | 1.00 (5%) | 1.02 (1%) ❌ |
["mttkrp", "size=(100, 100, 100), rank=150, mode=1"] | 0.95 (5%) ✅ | 1.00 (1%) |
["mttkrp", "size=(100, 100, 100), rank=150, mode=2"] | 0.92 (5%) ✅ | 1.02 (1%) ❌ |
["mttkrp", "size=(100, 100, 100), rank=200, mode=2"] | 0.99 (5%) | 1.02 (1%) ❌ |
["mttkrp", "size=(100, 100, 100), rank=250, mode=1"] | 1.06 (5%) ❌ | 1.00 (1%) |
["mttkrp", "size=(100, 100, 100), rank=250, mode=2"] | 0.96 (5%) | 1.02 (1%) ❌ |
["mttkrp", "size=(100, 100, 100), rank=300, mode=2"] | 0.96 (5%) | 1.02 (1%) ❌ |
["mttkrp", "size=(100, 100, 100), rank=50, mode=2"] | 0.98 (5%) | 1.02 (1%) ❌ |
["mttkrp", "size=(1000, 100, 30), rank=10, mode=2"] | 1.08 (5%) ❌ | 1.33 (1%) ❌ |
["mttkrp", "size=(1000, 100, 30), rank=100, mode=2"] | 1.02 (5%) | 1.33 (1%) ❌ |
["mttkrp", "size=(1000, 100, 30), rank=200, mode=2"] | 1.01 (5%) | 1.33 (1%) ❌ |
["mttkrp", "size=(1000, 100, 30), rank=300, mode=2"] | 1.02 (5%) | 1.33 (1%) ❌ |
["mttkrp", "size=(150, 150, 150), rank=10, mode=2"] | 1.00 (5%) | 1.01 (1%) ❌ |
["mttkrp", "size=(150, 150, 150), rank=100, mode=2"] | 0.99 (5%) | 1.01 (1%) ❌ |
["mttkrp", "size=(150, 150, 150), rank=150, mode=2"] | 1.00 (5%) | 1.01 (1%) ❌ |
["mttkrp", "size=(150, 150, 150), rank=200, mode=2"] | 1.00 (5%) | 1.01 (1%) ❌ |
["mttkrp", "size=(150, 150, 150), rank=250, mode=2"] | 0.99 (5%) | 1.01 (1%) ❌ |
["mttkrp", "size=(150, 150, 150), rank=300, mode=2"] | 1.02 (5%) | 1.01 (1%) ❌ |
["mttkrp", "size=(150, 150, 150), rank=50, mode=2"] | 1.06 (5%) ❌ | 1.01 (1%) ❌ |
["mttkrp", "size=(200, 200, 200), rank=10, mode=1"] | 0.93 (5%) ✅ | 1.00 (1%) |
["mttkrp", "size=(200, 200, 200), rank=10, mode=2"] | 0.98 (5%) | 1.01 (1%) ❌ |
["mttkrp", "size=(30, 100, 1000), rank=10, mode=2"] | 1.03 (5%) | 1.33 (1%) ❌ |
["mttkrp", "size=(30, 100, 1000), rank=100, mode=1"] | 1.09 (5%) ❌ | 1.00 (1%) |
["mttkrp", "size=(30, 100, 1000), rank=100, mode=2"] | 1.00 (5%) | 1.33 (1%) ❌ |
["mttkrp", "size=(30, 100, 1000), rank=200, mode=1"] | 1.11 (5%) ❌ | 1.00 (1%) |
["mttkrp", "size=(30, 100, 1000), rank=200, mode=2"] | 1.01 (5%) | 1.33 (1%) ❌ |
["mttkrp", "size=(30, 100, 1000), rank=300, mode=2"] | 1.02 (5%) | 1.32 (1%) ❌ |
["mttkrp", "size=(50, 50, 50), rank=10, mode=2"] | 1.06 (5%) ❌ | 1.04 (1%) ❌ |
["mttkrp", "size=(50, 50, 50), rank=100, mode=2"] | 0.93 (5%) ✅ | 1.04 (1%) ❌ |
["mttkrp", "size=(50, 50, 50), rank=150, mode=2"] | 0.93 (5%) ✅ | 1.04 (1%) ❌ |
["mttkrp", "size=(50, 50, 50), rank=200, mode=2"] | 0.93 (5%) ✅ | 1.04 (1%) ❌ |
["mttkrp", "size=(50, 50, 50), rank=250, mode=2"] | 0.91 (5%) ✅ | 1.04 (1%) ❌ |
["mttkrp", "size=(50, 50, 50), rank=300, mode=2"] | 0.91 (5%) ✅ | 1.04 (1%) ❌ |
["mttkrp", "size=(50, 50, 50), rank=50, mode=2"] | 0.94 (5%) ✅ | 1.04 (1%) ❌ |
Benchmark Group List
Here's a list of all the benchmark groups executed by this job:
["mttkrp"]
Julia versioninfo
Target
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
uname: Darwin 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:53:18 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6000 arm64 arm
CPU: Apple M1 Max:
speed user nice sys idle irq
#1-10 2400 MHz 1482743 s 0 s 567725 s 20188677 s 0 s
Memory: 64.0 GB (36831.0625 MB free)
Uptime: 488287.0 sec
Load Avg: 5.26708984375 2.78662109375 2.3720703125
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
Threads: 1 on 8 virtual cores
Baseline
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
uname: Darwin 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:53:18 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6000 arm64 arm
CPU: Apple M1 Max:
speed user nice sys idle irq
#1-10 2400 MHz 1484453 s 0 s 568304 s 20189652 s 0 s
Memory: 64.0 GB (36949.71875 MB free)
Uptime: 488320.0 sec
Load Avg: 5.80224609375 3.189453125 2.5341796875
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
Threads: 1 on 8 virtual cores
MTTKRP benchmark plots
Runtime vs. size (for square tensors)
Plots (not preserved here) of the MTTKRP runtime in milliseconds as a function of the size of the square tensor, for Target and Baseline, with one panel per combination of ndims = 3, rank in {10, 50, 100, 150, 200, 250, 300}, and mode in {1, 2, 3}.
Runtime vs. rank
Plots (not preserved here) of the MTTKRP runtime in milliseconds as a function of the rank, for Target and Baseline, with one panel per combination of size in {(30, 100, 1000), (50, 50, 50), (100, 100, 100), (150, 150, 150), (200, 200, 200), (1000, 100, 30)} and mode in {1, 2, 3}.
Runtime vs. mode
Plots (not preserved here) of the MTTKRP runtime in milliseconds as a function of the mode, for Target and Baseline, with one panel per size and rank: sizes (30, 100, 1000) and (1000, 100, 30) with rank in {10, 100, 200, 300}; sizes (50, 50, 50), (100, 100, 100), (150, 150, 150), and (200, 200, 200) with rank in {10, 50, 100, 150, 200, 250, 300}.
After handling the special case of a Khatri-Rao product with only one matrix, we recover the earlier performance.
Benchmark Report for
ID | time ratio | memory ratio |
---|---|---|
["mttkrp", "size=(100, 100, 100), rank=150, mode=2"] | 1.08 (5%) ❌ | 1.00 (1%) |
["mttkrp", "size=(100, 100, 100), rank=250, mode=3"] | 1.07 (5%) ❌ | 1.00 (1%) |
["mttkrp", "size=(1000, 100, 30), rank=10, mode=3"] | 0.93 (5%) ✅ | 1.00 (1%) |
["mttkrp", "size=(1000, 100, 30), rank=100, mode=3"] | 0.93 (5%) ✅ | 1.00 (1%) |
["mttkrp", "size=(150, 150, 150), rank=10, mode=1"] | 0.94 (5%) ✅ | 1.00 (1%) |
["mttkrp", "size=(200, 200, 200), rank=50, mode=1"] | 0.95 (5%) ✅ | 1.00 (1%) |
["mttkrp", "size=(200, 200, 200), rank=50, mode=2"] | 0.43 (5%) ✅ | 1.00 (1%) |
["mttkrp", "size=(30, 100, 1000), rank=100, mode=1"] | 0.94 (5%) ✅ | 1.00 (1%) |
["mttkrp", "size=(50, 50, 50), rank=10, mode=2"] | 1.13 (5%) ❌ | 1.00 (1%) |
["mttkrp", "size=(50, 50, 50), rank=100, mode=3"] | 0.95 (5%) ✅ | 1.00 (1%) |
["mttkrp", "size=(50, 50, 50), rank=50, mode=2"] | 1.06 (5%) ❌ | 1.00 (1%) |
Benchmark Group List
Here's a list of all the benchmark groups executed by this job:
["mttkrp"]
Julia versioninfo
Target
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
uname: Darwin 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:53:18 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6000 arm64 arm
CPU: Apple M1 Max:
speed user nice sys idle irq
#1-10 2400 MHz 1499358 s 0 s 574884 s 20313069 s 0 s
Memory: 64.0 GB (35929.046875 MB free)
Uptime: 489776.0 sec
Load Avg: 6.3447265625 4.35791015625 3.13671875
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
Threads: 1 on 8 virtual cores
Baseline
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
uname: Darwin 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:53:18 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6000 arm64 arm
CPU: Apple M1 Max:
speed user nice sys idle irq
#1-10 2400 MHz 1500949 s 0 s 575574 s 20314036 s 0 s
Memory: 64.0 GB (36006.421875 MB free)
Uptime: 489809.0 sec
Load Avg: 6.70654296875 4.61376953125 3.27001953125
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
Threads: 1 on 8 virtual cores
MTTKRP benchmark plots
Runtime vs. size (for square tensors)
Plots (not preserved here) of the MTTKRP runtime in milliseconds as a function of the size of the square tensor, for Target and Baseline, with one panel per combination of ndims = 3, rank in {10, 50, 100, 150, 200, 250, 300}, and mode in {1, 2, 3}.
Runtime vs. rank
Plots (not preserved here) of the MTTKRP runtime in milliseconds as a function of the rank, for Target and Baseline, with one panel per combination of size in {(30, 100, 1000), (50, 50, 50), (100, 100, 100), (150, 150, 150), (200, 200, 200), (1000, 100, 30)} and mode in {1, 2, 3}.
Runtime vs. mode
Plots (not preserved here) of the MTTKRP runtime in milliseconds as a function of the mode, for Target and Baseline, with one panel per size and rank: sizes (30, 100, 1000) and (1000, 100, 30) with rank in {10, 100, 200, 300}; sizes (50, 50, 50), (100, 100, 100), (150, 150, 150), and (200, 200, 200) with rank in {10, 50, 100, 150, 200, 250, 300}.
I considered splitting the cases into different methods, but I think it is probably simpler to leave them together.
As desired, with new in-place methods,
Benchmark Report for
ID | time ratio | memory ratio |
---|---|---|
["mttkrp", "size=(100, 100, 100), rank=200, mode=2"] | 0.92 (5%) ✅ | 0.99 (1%) |
["mttkrp", "size=(100, 100, 100), rank=200, mode=3"] | 1.07 (5%) ❌ | 1.00 (1%) |
["mttkrp", "size=(1000, 100, 30), rank=10, mode=1"] | 1.17 (5%) ❌ | 0.99 (1%) |
["mttkrp", "size=(1000, 100, 30), rank=10, mode=2"] | 0.94 (5%) ✅ | 0.98 (1%) ✅ |
["mttkrp", "size=(1000, 100, 30), rank=100, mode=2"] | 0.99 (5%) | 0.98 (1%) ✅ |
["mttkrp", "size=(1000, 100, 30), rank=200, mode=2"] | 0.97 (5%) | 0.98 (1%) ✅ |
["mttkrp", "size=(1000, 100, 30), rank=300, mode=1"] | 1.06 (5%) ❌ | 1.00 (1%) |
["mttkrp", "size=(1000, 100, 30), rank=300, mode=2"] | 0.98 (5%) | 0.98 (1%) ✅ |
["mttkrp", "size=(200, 200, 200), rank=10, mode=1"] | 1.12 (5%) ❌ | 1.00 (1%) |
["mttkrp", "size=(200, 200, 200), rank=10, mode=3"] | 0.95 (5%) ✅ | 1.00 (1%) |
["mttkrp", "size=(30, 100, 1000), rank=10, mode=2"] | 1.02 (5%) | 0.98 (1%) ✅ |
["mttkrp", "size=(30, 100, 1000), rank=100, mode=1"] | 1.07 (5%) ❌ | 1.00 (1%) |
["mttkrp", "size=(30, 100, 1000), rank=100, mode=2"] | 0.97 (5%) | 0.98 (1%) ✅ |
["mttkrp", "size=(30, 100, 1000), rank=200, mode=2"] | 0.97 (5%) | 0.98 (1%) ✅ |
["mttkrp", "size=(30, 100, 1000), rank=300, mode=1"] | 0.94 (5%) ✅ | 1.00 (1%) |
["mttkrp", "size=(30, 100, 1000), rank=300, mode=2"] | 0.97 (5%) | 0.98 (1%) ✅ |
["mttkrp", "size=(50, 50, 50), rank=10, mode=2"] | 0.97 (5%) | 0.97 (1%) ✅ |
["mttkrp", "size=(50, 50, 50), rank=100, mode=2"] | 0.90 (5%) ✅ | 0.98 (1%) ✅ |
["mttkrp", "size=(50, 50, 50), rank=150, mode=2"] | 0.91 (5%) ✅ | 0.98 (1%) ✅ |
["mttkrp", "size=(50, 50, 50), rank=200, mode=2"] | 0.89 (5%) ✅ | 0.98 (1%) ✅ |
["mttkrp", "size=(50, 50, 50), rank=250, mode=2"] | 0.86 (5%) ✅ | 0.98 (1%) ✅ |
["mttkrp", "size=(50, 50, 50), rank=300, mode=2"] | 0.87 (5%) ✅ | 0.98 (1%) ✅ |
["mttkrp", "size=(50, 50, 50), rank=50, mode=2"] | 0.86 (5%) ✅ | 0.98 (1%) ✅ |
Benchmark Group List
Here's a list of all the benchmark groups executed by this job:
["mttkrp"]
Julia versioninfo
Target
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
uname: Darwin 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:53:18 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6000 arm64 arm
CPU: Apple M1 Max:
speed user nice sys idle irq
#1-10 2400 MHz 1588339 s 0 s 606505 s 21093654 s 0 s
Memory: 64.0 GB (33643.921875 MB free)
Uptime: 498837.0 sec
Load Avg: 5.25634765625 3.19482421875 3.0380859375
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
Threads: 1 on 8 virtual cores
Baseline
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
uname: Darwin 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:53:18 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6000 arm64 arm
CPU: Apple M1 Max:
speed user nice sys idle irq
#1-10 2400 MHz 1589686 s 0 s 607606 s 21094507 s 0 s
Memory: 64.0 GB (33974.0625 MB free)
Uptime: 498871.0 sec
Load Avg: 6.8251953125 3.79345703125 3.26025390625
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
Threads: 1 on 8 virtual cores
MTTKRP benchmark plots
Runtime vs. size (for square tensors)
Plots (not preserved here) of the MTTKRP runtime in milliseconds as a function of the size of the square tensor, for Target and Baseline, with one panel per combination of ndims = 3, rank in {10, 50, 100, 150, 200, 250, 300}, and mode in {1, 2, 3}.
Runtime vs. rank
Plots (not preserved here) of the MTTKRP runtime in milliseconds as a function of the rank, for Target and Baseline, with one panel per combination of size in {(30, 100, 1000), (50, 50, 50), (100, 100, 100), (150, 150, 150), (200, 200, 200), (1000, 100, 30)} and mode in {1, 2, 3}.
Runtime vs. mode
Plots (not preserved here) of the MTTKRP runtime in milliseconds as a function of the mode, for Target and Baseline, with one panel per size and rank: sizes (30, 100, 1000) and (1000, 100, 30) with rank in {10, 100, 200, 300}; sizes (50, 50, 50), (100, 100, 100), (150, 150, 150), and (200, 200, 200) with rank in {10, 50, 100, 150, 200, 250, 300}.
And yields benefits for
Benchmark Report for
ID | time ratio | memory ratio |
---|---|---|
["gcp", "bernoulliOdds-size(X)=(15, 20, 25), rank(X)=1"] | 0.95 (5%) | 0.96 (1%) ✅ |
["gcp", "bernoulliOdds-size(X)=(15, 20, 25), rank(X)=2"] | 1.14 (5%) ❌ | 1.18 (1%) ❌ |
["gcp", "bernoulliOdds-size(X)=(30, 40, 50), rank(X)=1"] | 0.96 (5%) | 0.95 (1%) ✅ |
["gcp", "bernoulliOdds-size(X)=(30, 40, 50), rank(X)=2"] | 0.92 (5%) ✅ | 0.94 (1%) ✅ |
["gcp", "gamma-size(X)=(15, 20, 25), rank(X)=2"] | 0.95 (5%) | 0.98 (1%) ✅ |
["gcp", "gamma-size(X)=(30, 40, 50), rank(X)=1"] | 1.04 (5%) | 1.05 (1%) ❌ |
["gcp", "gamma-size(X)=(30, 40, 50), rank(X)=2"] | 0.95 (5%) | 0.96 (1%) ✅ |
["gcp", "least-squares-size(X)=(15, 20, 25), rank(X)=1"] | 0.64 (5%) ✅ | 0.42 (1%) ✅ |
["gcp", "least-squares-size(X)=(15, 20, 25), rank(X)=2"] | 0.68 (5%) ✅ | 0.35 (1%) ✅ |
["gcp", "least-squares-size(X)=(30, 40, 50), rank(X)=1"] | 0.88 (5%) ✅ | 0.55 (1%) ✅ |
["gcp", "least-squares-size(X)=(30, 40, 50), rank(X)=2"] | 0.89 (5%) ✅ | 0.41 (1%) ✅ |
["gcp", "poisson-size(X)=(30, 40, 50), rank(X)=1"] | 0.97 (5%) | 0.99 (1%) ✅ |
["gcp", "poisson-size(X)=(30, 40, 50), rank(X)=2"] | 0.97 (5%) | 0.97 (1%) ✅ |
Benchmark Group List
Here's a list of all the benchmark groups executed by this job:
["gcp"]
Julia versioninfo
Target
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
uname: Darwin 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:53:18 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6000 arm64 arm
CPU: Apple M1 Max:
speed user nice sys idle irq
#1-10 2400 MHz 1592886 s 0 s 608959 s 21131960 s 0 s
Memory: 64.0 GB (36595.609375 MB free)
Uptime: 499293.0 sec
Load Avg: 2.35498046875 2.36669921875 2.69384765625
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
Threads: 1 on 8 virtual cores
Baseline
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
uname: Darwin 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:53:18 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6000 arm64 arm
CPU: Apple M1 Max:
speed user nice sys idle irq
#1-10 2400 MHz 1594429 s 0 s 609486 s 21139488 s 0 s
Memory: 64.0 GB (36540.078125 MB free)
Uptime: 499390.0 sec
Load Avg: 2.14990234375 2.32861328125 2.64208984375
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
Threads: 1 on 8 virtual cores
All tests pass, except on nightly (which is likely due to the other issues). Merged!
Fixes #42