Speed up test suite #412
The doctests take longer than the nofib tests on my machine. I think that is because it has to recompile all modules before it can run the doctests.
Blackscholes is the test that takes the longest on my machine (about 2 seconds). It doesn't seem like it would benefit from using Otherwise maybe the generation of the array can be sped up. Currently, it first generates a list and then converts that to an array (maybe some of this is fused away?).
Generating large test arrays is important for the parallel backends, because they can use different implementations for larger arrays (for example, the GPU backends need to use multiple thread blocks after a certain, hardware-dependent, size). It could definitely be a problem with the random number/array generation; we should look into that as well.
See the big edit at the bottom

So, I think we could generate random arrays much more quickly using

import qualified Prelude
import Data.Array.Accelerate
import Data.Array.Accelerate.Data.Bits
import Data.Function ((&))
rotl64 :: Exp Word64 -> Exp Int -> Exp Word64
rotl64 x r = x `unsafeShiftL` r .|. x `unsafeShiftR` (64 - r)
fmix64 :: Exp Word64 -> Exp Word64
fmix64 key = key
  & (\x -> x `xor` (x `unsafeShiftR` 33))
  & (* 0xff51afd7ed558ccd)
  & (\x -> x `xor` (x `unsafeShiftR` 33))
  & (* 0xc4ceb9fe1a85ec53)
  & (\x -> x `xor` (x `unsafeShiftR` 33))
murmur :: Exp Word64 -> Exp Word64 -> Exp Word64
murmur seed key = key
  & (* 0x87c37b91114253d5)
  & (\x -> rotl64 x 31)
  & (* 0x4cf5ad432745937f)
  & xor seed
  & fmix64
random :: Shape sh => Exp Word64 -> Exp sh -> Acc (Array sh Word64)
random seed sh = generate sh (murmur seed . fromIntegral . toIndex sh)

This only produces uniformly pseudo random

This is the source code of murmur that I adapted in the code above: https://github.com/aappleby/smhasher/blob/61a0530f28277f2e850bfc39600ce61d02b518de/src/MurmurHash3.cpp#L289-L331

Here the randomness of murmur is discussed: https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633

Another problem is that murmur is not intended for such short inputs, so it might be better to look for some other hash function, but I could not find any that were specifically intended for this purpose.

Edit: This post might help, but I haven't read it completely yet: https://mostlymangling.blogspot.com/2018/07/on-mixing-functions-in-fast-splittable.html

Edit: I just realized that of course random array generation already exists in accelerate. I looked at mwc-random-accelerate (specifically: https://github.com/tmcdonell/mwc-random-accelerate/blob/e840871e2edbc583bc90230b1bb9d9452e89d3d6/Data/Array/Accelerate/System/Random/MWC.hs#L133). It seems that that uses a sequential random generation method, which is probably much slower than the method I describe here, so there may still be some use for this method.

Big edit:

import qualified Prelude
import Data.Array.Accelerate
import Data.Array.Accelerate.Data.Bits
import Data.Function ((&))
xnasamx :: Exp Word64 -> Exp Word64 -> Exp Word64
xnasamx x c = x
  & xor c
  & (\x -> x `xor` rotateR x 25 `xor` rotateR x 47)
  & (* 0x9E6C63D0676A9A99)
  & (\x -> x `xor` (x `unsafeShiftR` 23) `xor` (x `unsafeShiftR` 51))
  & (* 0x9E6D62D06F6A9A9B)
  & (\x -> x `xor` (x `unsafeShiftR` 23) `xor` (x `unsafeShiftR` 51))
  & xor c
random :: Shape sh => Exp Word64 -> Exp sh -> Acc (Array sh Word64)
random seed sh = generate sh ((\x -> xnasamx x seed) . fromIntegral . toIndex sh)

This should beat the

Edit: The website https://www.pcg-random.org contains lots of very useful information about random number generators. PCG itself is not directly suitable for parallel random number generation, because it still uses a step function that does more than simply increment the state. But they do use a permutation on the state to produce their final output, similar to the mixing functions I described here. So, their website and paper contain more information about this topic.
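The idea in the comment above is that each element is a pure function of the seed and the element's linear index, so no generator state has to be threaded from one element to the next. A minimal host-side sketch of that idea follows, using plain `Word64` and `Data.Bits` from base instead of Accelerate's `Exp`. The names `xnasamxHost` and `randomList` are hypothetical, and the mixer is only a direct transliteration of the `xnasamx` code in this thread, not a vetted generator.

```haskell
import Data.Bits (rotateR, unsafeShiftR, xor)
import Data.Function ((&))
import Data.Word (Word64)

-- Plain-Word64 transliteration of the xnasamx mixer from this thread
-- (hypothetical name; same constants and steps as the Exp version).
xnasamxHost :: Word64 -> Word64 -> Word64
xnasamxHost x0 c = x0
  & xor c
  & (\x -> x `xor` rotateR x 25 `xor` rotateR x 47)
  & (* 0x9E6C63D0676A9A99)
  & (\x -> x `xor` (x `unsafeShiftR` 23) `xor` (x `unsafeShiftR` 51))
  & (* 0x9E6D62D06F6A9A9B)
  & (\x -> x `xor` (x `unsafeShiftR` 23) `xor` (x `unsafeShiftR` 51))
  & xor c

-- Counter-based generation: element i depends only on (seed, i),
-- so every element could be computed independently and in parallel.
randomList :: Word64 -> Int -> [Word64]
randomList seed n = [ xnasamxHost (fromIntegral i) seed | i <- [0 .. n - 1] ]
```

Because every step of the mixer is invertible for a fixed seed, distinct indices map to distinct outputs; whether the outputs are statistically good enough still has to be checked with a test suite such as PractRand.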
I have some benchmark results:
I had to modify

Here's the code:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeApplications #-}
module Main where
import Internal
import System.Random (randomIO)
import qualified Data.Array.Accelerate.LLVM.PTX as PTX
import qualified Data.Array.Accelerate.LLVM.Native as Native
import qualified Data.Array.Accelerate as A
import qualified Data.Array.Accelerate.System.Random.MWC as MWC
import Data.Word
import Criterion.Main
randomNative :: Word64 -> Int -> A.Array A.DIM1 Word64
randomNative seed n =
  $(let f :: A.Acc (A.Scalar Word64) -> A.Acc (A.Scalar A.DIM1) -> A.Acc (A.Array A.DIM1 Word64)
        f seed sh = A.compute (random (A.the seed) (A.the sh))
    in Native.runQ f)
    (A.fromList A.Z [seed])
    (A.fromList A.Z [(A.Z A.:. n)] :: A.Scalar A.DIM1)

randomPTX :: Word64 -> Int -> A.Array A.DIM1 Word64
randomPTX seed n =
  $(let f :: A.Acc (A.Scalar Word64) -> A.Acc (A.Scalar A.DIM1) -> A.Acc (A.Array A.DIM1 Word64)
        f seed sh = A.compute (random (A.the seed) (A.the sh))
    in PTX.runQ f)
    (A.fromList A.Z [seed])
    (A.fromList A.Z [(A.Z A.:. n)] :: A.Scalar A.DIM1)

bench10 :: (Int -> Benchmarkable) -> [Benchmark]
bench10 f =
  [ bench "1000" $ f 1000
  , bench "10000" $ f 10000
  , bench "100000" $ f 100000
  ]

main :: IO ()
main = do
  seed <- randomIO
  defaultMain
    [ bgroup "xnasamx Native"
        $ bench10 (whnf ((\arr -> A.linearIndexArray arr 0) . randomNative seed))
    , bgroup "xnasamx PTX"
        $ bench10 (whnf ((\arr -> A.linearIndexArray arr 0) . randomPTX seed))
    , bgroup "mwc"
        $ bench10 (whnfIO . fmap (\arr -> A.linearIndexArray arr 0) . MWC.randomArray (MWC.uniform @_ @Word64) . (A.Z A.:.))
    ]

Where

By the way, how can I properly force array computation? I have now used

Edit: The benchmark above mostly measures the time it takes to allocate memory... running xnasamx in a tight strict loop in a single thread yields the following results:
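A tight strict loop of the kind mentioned in the edit above could be sketched as follows. This is only an illustration under assumptions, not the code that was actually measured: the name `strictLoop` is hypothetical, the mixer is passed in as a parameter, and the outputs are folded together with `xor` so that every call is demanded while no array is allocated.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Bits (xor)
import Data.Word (Word64)

-- Apply a mixing function to the counters 0 .. n-1 and xor the results.
-- The bang patterns keep the accumulator and the counter evaluated, so the
-- loop runs without building thunks or any intermediate list or array.
strictLoop :: (Word64 -> Word64) -> Word64 -> Word64
strictLoop mix n = go 0 0
  where
    go !acc !i
      | i >= n    = acc
      | otherwise = go (acc `xor` mix i) (i + 1)
```

Passing a plain `Word64` version of the mixer for `mix` and timing `strictLoop mix n` for a large `n` gives a rough per-element cost that excludes allocation.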
Hi @noughtmare, sorry for the slow response. This looks great! I wonder whether this will work together with hedgehog's notion of random number generation; that is, shrinking on failure? At any rate, it looks like this could be used as the basis for a fast random number generation package (as opposed to
We might be able to use the recently announced doctest-extract package to improve the speed of the doctests. As for the hedgehog tests: I've looked into it a little, and it doesn't look very easy to integrate this way of generating random numbers into hedgehog. It might also be a bit of a bootstrapping situation, where accelerate is tested on arrays that are generated by accelerate code.
I have pulled one of the PractRand generators out into the sfc-random-accelerate package.
That looks great, probably even better than that nasam function. I didn't know PractRand included those generators.
I just tried
The standard accelerate test suite, used by all the backends, can be quite slow. Several of the tests are significantly slower than the others, for example segmented folds and scans, which I believe is because the reference implementations are very inefficient. Writing some more efficient reference implementations (e.g. using Data.Vector.Unboxed) should help speed things up.
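As a rough illustration of what a faster reference implementation might look like, here is a sketch of a segmented fold over `Data.Vector.Unboxed`. It assumes the `vector` package is available; the name `foldSegRef` and its argument order are hypothetical and not taken from the actual test suite.

```haskell
import qualified Data.Vector.Unboxed as U

-- Reference segmented fold: `segs` holds the length of each segment and
-- `xs` holds all segments laid out back to back. Each segment is reduced
-- with a strict left fold over an O(1) slice, so no lists are built.
foldSegRef
  :: U.Unbox a
  => (a -> a -> a)   -- combining function
  -> a               -- initial value for every segment
  -> U.Vector Int    -- segment lengths
  -> U.Vector a      -- flat input data
  -> U.Vector a      -- one result per segment
foldSegRef f z segs xs =
  U.zipWith (\off len -> U.foldl' f z (U.slice off len xs)) offsets segs
  where
    -- Exclusive prefix sum of the lengths gives each segment's start offset.
    offsets = U.prescanl' (+) 0 segs
```

For example, `foldSegRef (+) 0 (U.fromList [2, 0, 3]) (U.fromList [1, 2, 3, 4, 5])` sums the segments `[1,2]`, `[]` and `[3,4,5]`. The sketch assumes the segment lengths add up to at most the length of the input; `U.slice` raises an error otherwise.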