
GCN backend. #7

Open
YellowOnion opened this issue Jun 15, 2016 · 17 comments

Comments

@YellowOnion

Hey, I posted on the user group a few weeks back about AMD/OpenCL support, and opening this as an enhancement issue was suggested. I would love to attempt to get something working. Could someone point me in the right direction on getting started?

@tmcdonell
Member

tmcdonell commented Jun 16, 2016

Great!

I think what is missing at the moment is:

  • Support in llvm-general for llvm >= 3.6. As far as I know, the GCN backend was first released as part of llvm-3.6, and the current version of llvm-general is for llvm-3.5. Work there is currently underway for llvm-3.8 support, so that should be just a matter of time.
  • Some way to talk to the AMD hardware. Basically, an equivalent of my cuda package, which is a bunch of FFI bindings to the low-level CUDA driver to allocate memory, transfer data, launch a kernel, etc. (see the sketch below). If that doesn't already exist somewhere, that is probably the biggest missing piece. (Actually, I'm not even sure what kind of API AMD uses to control its hardware.)

Once we had those two things, getting an accelerate-llvm-gcn backend up and running should be relatively easy.
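
For a feel of what those bindings involve, here is a minimal sketch in the style of the cuda package, using two calls from the CUDA driver API as the model. The AMD equivalent, whatever API that turns out to be, would be bound the same way; this is illustrative only, not part of any package:

{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign
import Foreign.C.Types

type DevicePtr = Word64   -- CUdeviceptr: an opaque device address
type Status    = CInt     -- CUresult: 0 means success

-- allocate n bytes of device memory, writing the address through the pointer
foreign import ccall unsafe "cuMemAlloc_v2"
  cuMemAlloc :: Ptr DevicePtr -> CSize -> IO Status

-- copy n bytes from host memory to device memory
foreign import ccall unsafe "cuMemcpyHtoD_v2"
  cuMemcpyHtoD :: DevicePtr -> Ptr () -> CSize -> IO Status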

@YellowOnion
Author

I've been doing some research and it doesn't look promising.

The LLVM code that is generated seems to be quite low level, e.g. it targets the drivers directly on Linux (which means it is not portable, and you need to mess around below the Mesa stack).

OpenCL takes another form of binary (possibly partly compatible).

The only other option I've found is to use the HSA runtime to upload the kernels, but HSA is designed for AMD's APUs, so I'm not sure how useful that is either.

The final option is to find some way to have LLVM spit out SPIR (OpenCL's bytecode) so that any OpenCL device can work.

@tmcdonell
Member

The low-level nature of LLVM is fine; that's all handled by accelerate-llvm. Note that an accelerate-llvm-gcn (or whatever we call it) backend generates code only for the kernel parts that are executed on the GPU, which is exactly what the accelerate-llvm-ptx and accelerate-cuda backends do. The current documentation for LLVM's AMDGPU target is anaemic, but otherwise holds no surprises if you've seen the NVPTX documentation, and it fits in with how the PTX backend works.

I should probably mention that I expect to use no OpenCL at all. The accelerate-llvm-ptx backend is so named because, despite targeting 'CUDA capable NVIDIA GPUs', it does not actually generate CUDA code, and I'd expect the same to happen here as well. CUDA is an umbrella term that covers both the stuff executing on the GPU (which we are interested in) and the control and coordination carried out by the host (which we are not; that part is taken over by the accelerate runtime). OpenCL is the multi-vendor equivalent, so we're interested in what OpenCL does once you tell it you are targeting an AMD GPU. (Does that help clarify the goal at all?)

HSA actually looks like a promising avenue. This example looks like it is showing how to launch the same "hello world" kernel as at the bottom of the LLVM AMDGPU documentation. That example is all wrapped in C++ (worryingly), but it looks like the actual hsa.h interface is just regular C, which is fine (from a Haskell-FFI perspective).

I'd have to look closer at the other examples, but I think the next step is a Haskell FFI binding to the HSA Runtime API.
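
For example, the very first entry points such a binding would need (hsa_init and hsa_shut_down, straight out of hsa.h) are easy to express; memory regions, queues, and kernel dispatch would be bound the same way. A rough sketch:

{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types

-- hsa_status_t is a C enum; HSA_STATUS_SUCCESS is 0
type HsaStatus = CInt

-- initialise the HSA runtime
foreign import ccall unsafe "hsa_init"
  hsaInit :: IO HsaStatus

-- tear the HSA runtime down again
foreign import ccall unsafe "hsa_shut_down"
  hsaShutDown :: IO HsaStatus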

@YellowOnion
Author

YellowOnion commented Jun 27, 2016

Yeah, I had found that repository, but most of the documentation I've found shows them targeting APUs; this page's title even states that it only supports Kaveri & Carrizo APUs.

I'll try to get something working on my non-APU system; I guess that would be the best way to prove myself wrong.

@tmcdonell
Member

It might be worthwhile shooting an email to the AMD / LLVM mailing lists, or opening a GitHub issue on that repo, asking for advice / clarification / a pointer to the correct documentation explaining how to use the LLVM AMD target.

I don't have a machine with an AMD card in it at the moment so I can't be much help trying things out, sorry.

@lancelet

On macOS (ROCm is Linux-only), I've been having problems getting amdgcn to work at all.

If I compile an OpenCL kernel with Apple's own openclc, like this:

/System/Library/Frameworks/OpenCL.framework/Libraries/openclc -c -emit-llvm -arch gpu_32 -o <output>.bc <input>.cl

I get a kernel which I can use with clCreateProgramWithBinary.

However, clang-5.0.0 (from nixpkgs) with amdgcn, invoked as follows:

clang -c -cl-std=CL1.2 -arch amdgcn -emit-llvm -Xclang -finclude-default-header -o <output>.bc <input>.cl

Produces an error like this with clCreateProgramWithBinary:

[CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:\nUnknown bitstream version!\n

Looking at a hex dump of the two files, they seem quite similar:

Apple:

00000000  de c0 17 0b 00 00 00 00  14 00 00 00 b0 04 00 00  |................|
00000010  ff ff ff ff 42 43 c0 de  21 0c 00 00 29 01 00 00  |....BC..!...)...|
...

AMDGCN:

00000000  de c0 17 0b 00 00 00 00  14 00 00 00 50 0c 00 00  |............P...|
00000010  ff ff ff ff 42 43 c0 de  35 14 00 00 05 00 00 00  |....BC..5.......|
...

So AFAICT, they appear to be "the same kind of thing": both start with the LLVM bitcode wrapper magic (0x0B17C0DE) followed by the raw bitcode magic ('BC' 0xC0DE), rather than two completely different sets of magic numbers.

Does anyone know if I'm missing something here? Has anyone come across instructions for running AMDGCN-compiled code on a Mac?

@tmcdonell
Member

If you have an older version of LLVM available, it might be worth trying that? Bitcode is not forward-compatible, so a runtime built on an older LLVM can't read bitcode emitted by a newer clang.

I know the NVIDIA tools are also based on LLVM, but they typically lag by a few releases.

@typedrat

typedrat commented Jan 9, 2019

There's no cross-platform way to load GCN binaries other than OpenCL. (And no alternative of any kind on macOS!)

That's not to say that it'd be generating OpenCL code, but it has to work with the API a little to load the object files.
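
Concretely, the loading step is one call to clCreateProgramWithBinary; a direct FFI import of it from Haskell would look roughly like this (the handle type synonyms are stand-ins for the opaque OpenCL typedefs):

{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign
import Foreign.C.Types

type ClContext  = Ptr ()   -- cl_context
type ClDeviceId = Ptr ()   -- cl_device_id
type ClProgram  = Ptr ()   -- cl_program

-- create a program object from one pre-compiled binary per device
foreign import ccall unsafe "clCreateProgramWithBinary"
  clCreateProgramWithBinary
    :: ClContext          -- context
    -> CUInt              -- number of devices
    -> Ptr ClDeviceId     -- device list
    -> Ptr CSize          -- length of each binary
    -> Ptr (Ptr Word8)    -- the binaries themselves
    -> Ptr CInt           -- per-device load status (out)
    -> Ptr CInt           -- error code (out)
    -> IO ClProgram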

@tmcdonell
Member

@typedrat thanks for the info!

@YellowOnion
Author

YellowOnion commented Sep 6, 2019

I got an AMD GPU again and I got curious about this bug.

These have appeared:

https://github.com/RadeonOpenCompute/clang-ocl/blob/master/clang-ocl.in
https://rocm.github.io/QuickStartOCL.html

This issue mentions that there is no Windows support, which leaves me out of options: ROCm/clang-ocl#4

@tmcdonell
Member

There is some documentation on the AMD toolchain here.

@typedrat

typedrat commented Jun 6, 2020

Unfortunately all of that is about the HSA stuff, which is the old name for ROCm, and is also Linux-only (and quite a pain to get working properly, from painful recent experience). There's nothing to be done as it stands; AMD simply refuses to provide a workable target on other OSes.

  1. If we are willing to support only Linux, the situation is relatively tractable. It's a bugbear and a half to get the ROCm runtime working correctly, and AMD tries to force you into using Docker in unpleasant ways, but once you get it working the API is simple and nice, and it's easy enough to fling IR into LLVM and turn it into something you can load. That's about on par with the level of complexity the existing NVIDIA backend has.
  2. macOS support is impossible up front, because OpenCL is the only way to load binaries on anything that isn't Linux, and it's deprecated there. Even ignoring that and assuming it won't really go away, there isn't any documentation of the binary format, and we'd need that documentation to manually munge the Linux-ROCm format that LLVM produces into something Apple's OpenCL runtime can be tricked into loading. That's a reverse-engineering project that I wouldn't want to depend on.
  3. On Windows, there are two options:
    1. Take advantage of the somewhat better documentation and try to force LLVM's output into clCreateProgramWithBinary, as discussed for macOS. This would be either easy or impossible: if the ABI is the same, the repackaging is trivial and there are already libraries that can do it; if it isn't, the problem is immediately intractable.
    2. Hope that the announced WSL GPU compute support will be good enough.

@tmcdonell
Member

@typedrat wow thanks for the insight!

Well, I just bought a Radeon VII, so let's see what we can do.

@gozzarda

gozzarda commented Feb 9, 2022

This is a bit beyond my expertise, but in case it helps, I thought I would drop a mention of https://github.com/google/clspv.

This project apparently provides LLVM modules for targeting Vulkan compute shaders. I could be entirely wrong, but this sounds like it could be the linchpin of an accelerate-llvm-vulkan backend with broad device support.
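
If I'm reading its README right, the basic invocation is a single command that compiles an OpenCL C kernel to a SPIR-V module (the file names here are placeholders):

clspv <input>.cl -o <output>.spv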

Sorry if this isn't helpful; I don't know enough to tell.

@tmcdonell
Member

@gozzarda oh nice find, thanks!

@typedrat

If there's still any interest in something like this, time has done a lot of the work for us: there is now ROCm support on Windows, and Macs with discrete GPUs and x86 processors are pretty firmly deprecated.

It's still a bit annoying to get ROCm working on Linux, and the supported GPU list is a lot shorter than you'd hope, but a ROCm-based backend is definitely becoming more feasible.

I'll probably be working on a low-level Haskell library for GPU compute on AMD hardware soon, no matter what.
