
GCN backend. #7

Open
YellowOnion opened this issue Jun 15, 2016 · 17 comments

Comments

@YellowOnion

Hey, I posted on the user group a few weeks back about AMD/OpenCL support, and opening this as an enhancement issue was suggested. I would love to attempt to get something working. Could someone point me in the right direction on getting started?

@tmcdonell
Member

tmcdonell commented Jun 16, 2016

Great!

I think what is missing at the moment is:

  • Support in llvm-general for llvm >= 3.6. As far as I know, the GCN backend was first released as part of llvm-3.6, and the current version of llvm-general is for llvm-3.5. Work there is currently underway for llvm-3.8 support, so that should be just a matter of time.
  • Some way to talk to the AMD hardware. Basically, an equivalent of my cuda package, which is a bunch of FFI bindings to the low-level CUDA driver to allocate memory, transfer data, launch a kernel, etc. (see the sketch below). If that doesn't already exist somewhere, that is probably the biggest missing piece. (Actually, I'm not even sure what kind of API AMD uses to control its hardware.)

Once we had those two things, getting an accelerate-llvm-gcn backend up and running should be relatively easy.
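
For a feel of what those bindings involve, here is a minimal sketch in the style of the cuda package, using two calls from the CUDA driver API as the model. The AMD equivalent, whatever API that turns out to be, would be bound the same way; this is illustrative only, not part of any package:

{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign
import Foreign.C.Types

type DevicePtr = Word64   -- CUdeviceptr: an opaque device address
type Status    = CInt     -- CUresult: 0 means success

-- allocate n bytes of device memory, writing the address through the pointer
foreign import ccall unsafe "cuMemAlloc_v2"
  cuMemAlloc :: Ptr DevicePtr -> CSize -> IO Status

-- copy n bytes from host memory to device memory
foreign import ccall unsafe "cuMemcpyHtoD_v2"
  cuMemcpyHtoD :: DevicePtr -> Ptr () -> CSize -> IO Status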

@YellowOnion
Author

I've been doing some research and it doesn't look promising.

The LLVM code that is generated seems to be quite low level, e.g. it targets the drivers directly on Linux (which means it is not portable, and you need to mess around below the Mesa stack).

OpenCL takes another form of binary (possibly partly compatible).

The only other option I've found is to use the HSA runtime to upload the kernels, but HSA is designed for AMD's APUs, so I'm not sure how useful that is either.

The final option is to find some way to have LLVM spit out SPIR (OpenCL's bytecode) so that any OpenCL device can work.

@tmcdonell
Member

The low-level nature of LLVM is fine; that's all handled by accelerate-llvm. Note that an accelerate-llvm-gcn (or whatever we call it) backend generates code only for the kernel parts that are executed on the GPU, which is exactly what the accelerate-llvm-ptx and accelerate-cuda backends do. The current documentation for LLVM's AMDGPU target is anaemic, but otherwise holds no surprises if you've seen the NVPTX documentation, and it fits in with how the PTX backend works.

I should probably mention that I expect to use no OpenCL at all. The accelerate-llvm-ptx backend is so named because, despite targeting 'CUDA capable NVIDIA GPUs', it does not actually generate CUDA code, and I'd expect the same to happen here as well. CUDA is an umbrella term that covers both the stuff executing on the GPU (which we are interested in) and the control and coordination carried out by the host (which we are not; that part is taken over by the accelerate runtime). OpenCL is the multi-vendor equivalent, so we're interested in what OpenCL does once you tell it you are targeting an AMD GPU. (Does that help clarify the goal at all?)

HSA actually looks like a promising avenue. This example looks like it is showing how to launch the same "hello world" kernel as at the bottom of the LLVM AMDGPU documentation. That example is all wrapped in C++ (worryingly), but it looks like the actual hsa.h interface is just regular C, which is fine (from a Haskell-FFI perspective).

I'd have to look closer at the other examples, but I think the next step is a Haskell FFI binding to the HSA Runtime API.
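
For example, the very first entry points such a binding would need (hsa_init and hsa_shut_down, straight out of hsa.h) are easy to express; memory regions, queues, and kernel dispatch would be bound the same way. A rough sketch:

{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types

-- hsa_status_t is a C enum; HSA_STATUS_SUCCESS is 0
type HsaStatus = CInt

-- initialise the HSA runtime
foreign import ccall unsafe "hsa_init"
  hsaInit :: IO HsaStatus

-- tear the HSA runtime down again
foreign import ccall unsafe "hsa_shut_down"
  hsaShutDown :: IO HsaStatus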

@YellowOnion
Author

YellowOnion commented Jun 27, 2016

Yeah, I had found that repository, but most of the documentation I've found shows them targeting APUs; this page's title even states that it only supports Kaveri & Carrizo APUs.

I'll try to get something working on my non-APU system; I guess that would be the best way to prove myself wrong.

@tmcdonell
Member

It might be worthwhile shooting an email to the AMD / LLVM mailing lists, or opening a GitHub issue on that repo, asking for advice / clarification / a pointer to the correct documentation explaining how to use the LLVM AMD target.

I don't have a machine with an AMD card in it at the moment so I can't be much help trying things out, sorry.

@lancelet

On macOS (ROCm is Linux-only), I've been having problems getting amdgcn to work at all.

If I compile an OpenCL kernel with Apple's own openclc, like this:

/System/Library/Frameworks/OpenCL.framework/Libraries/openclc -c -emit-llvm -arch gpu_32 -o <output>.bc <input>.cl

I get a kernel which I can use with clCreateProgramWithBinary.

However, clang-5.0.0 (from nixpkgs) with amdgcn, invoked as follows:

clang -c -cl-std=CL1.2 -arch amdgcn -emit-llvm -Xclang -finclude-default-header -o <output>.bc <input>.cl

Produces an error like this with clCreateProgramWithBinary:

[CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:\nUnknown bitstream version!\n

Looking at a hex dump of the two files, they seem quite similar:

Apple:

00000000  de c0 17 0b 00 00 00 00  14 00 00 00 b0 04 00 00  |................|
00000010  ff ff ff ff 42 43 c0 de  21 0c 00 00 29 01 00 00  |....BC..!...)...|
...

AMDGCN:

00000000  de c0 17 0b 00 00 00 00  14 00 00 00 50 0c 00 00  |............P...|
00000010  ff ff ff ff 42 43 c0 de  35 14 00 00 05 00 00 00  |....BC..5.......|
...

So AFAICT, they appear to be "the same kind of thing": both start with the LLVM bitcode wrapper magic (0x0B17C0DE) followed by the raw bitcode magic ('BC' 0xC0DE), rather than two completely different sets of magic numbers.

Does anyone know if I'm missing something here? Has anyone come across instructions for running AMDGCN-compiled code on a Mac?

@tmcdonell
Member

If you have an older version of LLVM available, it might be worth trying that? Bitcode is not forward-compatible, so a runtime built on an older LLVM can't read bitcode emitted by a newer clang.

I know the NVIDIA tools are also based on LLVM, but they typically lag by a few releases.

@typedrat

typedrat commented Jan 9, 2019

There's no cross-platform way to load GCN binaries other than OpenCL. (And no alternative of any kind on macOS!)

That's not to say that it'd be generating OpenCL code, but it has to work with the API a little to load the object files.
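
Concretely, the loading step is one call to clCreateProgramWithBinary; a direct FFI import of it from Haskell would look roughly like this (the handle type synonyms are stand-ins for the opaque OpenCL typedefs):

{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign
import Foreign.C.Types

type ClContext  = Ptr ()   -- cl_context
type ClDeviceId = Ptr ()   -- cl_device_id
type ClProgram  = Ptr ()   -- cl_program

-- create a program object from one pre-compiled binary per device
foreign import ccall unsafe "clCreateProgramWithBinary"
  clCreateProgramWithBinary
    :: ClContext          -- context
    -> CUInt              -- number of devices
    -> Ptr ClDeviceId     -- device list
    -> Ptr CSize          -- length of each binary
    -> Ptr (Ptr Word8)    -- the binaries themselves
    -> Ptr CInt           -- per-device load status (out)
    -> Ptr CInt           -- error code (out)
    -> IO ClProgram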

@tmcdonell
Member

@typedrat thanks for the info!

@YellowOnion
Author

YellowOnion commented Sep 6, 2019

I got an AMD GPU again and I got curious about this bug.

These have appeared:

https://github.com/RadeonOpenCompute/clang-ocl/blob/master/clang-ocl.in
https://rocm.github.io/QuickStartOCL.html

This issue mentions that there is no Windows support, which leaves me out of options: ROCm/clang-ocl#4

@tmcdonell
Member

There is some documentation on the AMD toolchain here.

@typedrat

typedrat commented Jun 6, 2020

Unfortunately all of that is about the HSA stuff, which is the old name for ROCm, and is also Linux-only (and quite a pain to get working properly, from painful recent experience). There's nothing to be done as it stands; AMD simply refuses to provide a workable target on other OSes.

  1. If we are willing to support only Linux, the situation is relatively tractable. It's a bugbear and a half to get the ROCm runtime working correctly, and AMD tries to force you into using Docker in unpleasant ways, but once you get it working the API is simple and nice, and it's easy enough to fling IR into LLVM and turn it into something you can load. That's about on par with the level of complexity the existing NVIDIA backend has.
  2. macOS support is impossible up front, because OpenCL is the only way to load binaries on anything that isn't Linux, and it's deprecated there. Even ignoring that and assuming it won't really go away, there isn't any documentation of the binary format, and we'd need that documentation to manually munge the Linux-ROCm format that LLVM produces into something Apple's OpenCL runtime can be tricked into loading. That's a reverse-engineering project that I wouldn't want to depend on.
  3. On Windows, there are two options:
    1. Take advantage of the somewhat better documentation and try to force LLVM's output into clCreateProgramWithBinary, as discussed for macOS. This would be either easy or impossible: if the ABI is the same, the repackaging is trivial and there are already libraries that can do it; if it isn't, the problem is immediately intractable.
    2. Hope that the announced WSL GPU compute support will be good enough.

@tmcdonell
Member

@typedrat wow thanks for the insight!

Well, I just bought a Radeon VII, so let's see what we can do.

@gozzarda

gozzarda commented Feb 9, 2022

This is a bit beyond my expertise, but in case it helps, I thought I would drop a mention of https://github.com/google/clspv.

This project apparently provides LLVM modules for targeting Vulkan compute shaders. I could be entirely wrong, but this sounds like it could be the linchpin of an accelerate-llvm-vulkan backend with broad device support.
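
If I'm reading its README right, the basic invocation is a single command that compiles an OpenCL C kernel to a SPIR-V module (the file names here are placeholders):

clspv <input>.cl -o <output>.spv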

Sorry if this isn't helpful; I don't know enough to tell.

@tmcdonell
Member

@gozzarda oh nice find, thanks!

@typedrat

If there's still any interest in something like this, time has done a lot of the work for us: there is now ROCm support on Windows, and Macs with discrete GPUs and x86 processors are pretty firmly deprecated.

It's still a bit annoying to get ROCm working on Linux, and the supported GPU list is a lot shorter than you'd hope, but a ROCm-based backend is definitely becoming more feasible.

I'll probably be working on a low-level Haskell library for GPU compute on AMD hardware soon, no matter what.
