Add amdgpu target #823

Flakebi · 2024-12-26T01:14:18Z

Proposal

Add the amdgpu target to rustc that allows to generate code for AMD GPUs.

The LLVM backend has good support for this backend. The goal is to expose this to Rust, enabling Rust as another language on these GPUs. The main target is compute capabilities. This is in contrast to the rust-gpu project, which targets graphics capabilities through spir-v (the graphics variant). The base runtime to run compute programs on AMD GPUs is HSA (Heterogeneous System Architecture), which is implemented in ROCR-Runtime. Therefore, the Rust backend should target the amdhsa OS. To support the target in rustc, LLVM needs to be compiled with the amdgpu backend enabled. On Windows (and Linux), HIP can be used to load the same compiled amdgpu programs.

There are two points in which the amdgpu target is different from other (mainstream/x86) targets.

Address spaces

Address spaces can be thought of as denoting different physical memory areas (this is a thought concept, it can be that way in hardware, but it does not need to be). In LLVM IR, each pointer has an address space, defaulting to addrspace(0) (this is implicit in textual IR, which is why you won’t see it there). Different address spaces have different properties, e.g. they can have a different pointer size, the nullptr can be different (e.g. 0 vs -1) and the machine instructions used to access them can be different.

The amdgpu LLVM backend makes heavy use of address spaces. This is also a problem for other targets that want to support Rust (though mostly more exotic ones). The use of address spaces leads to situations, where a pointer in one address space needs to be casted to a pointer in a different address space. In LLVM IR, bitcast is invalid for this case, addrspacecast needs to be used.

The changes to rustc code should be mostly about fixing problems in a rather contained way. (I don’t know for sure how contained it will be, but compiling core required surprisingly few changes as can be seen in rust-lang/rust#134740; disclaimer: I only tried running a very simple program with -Zbuild-std=core so far.) The changes will bring the rust llvm backend closer to how LLVM envisions address spaces, which should make it easier to support other future targets that use more address spaces (there was already some work for other targets, which in turn made it easier for amdgpu).

To get a feeling for what LLVM address spaces are used for, here is the list of the important amdgpu address spaces (from https://llvm.org/docs/AMDGPUUsage.html#address-spaces):

0 (generic/flat): This is a “catch all”. Loads and stores to addrspace(0) may go to any of the below address spaces. The hardware switches at runtime, depending on the pointer. This works for all pointers, but is the slowest.
1 (global): This is basically VRAM (i.e. the most “normal” memory).
3 (local/LDS): This is different memory in hardware, basically a software-defined cache. Loads and stores are a lot faster than to VRAM, but not globally visible to all other threads. (Known as groupshared in HLSL or shared in GLSL.)
4 (constant): Same memory region as global (1), but guaranteed to be constant throughout the program. This allows using different instructions that are faster by going through a different cache.
5 (private/scratch): This is used for the stack, all allocas need to be in addrspace(5). A thread can only access its own private memory.

Basic support for the amdgpu target means using address spaces 1 (incoming pointers), 5 (allocas) and 0 (if we don’t know which one it is). Support for groupshared memory in the language requires its own RFC (something similar to thread_local probably makes sense).

Casting pointers to `addrspace(0)` before use

I experimented more and came to the conclusion that all pointers need to be casted to addrspace (0) before they are used (this affects alloca and global variables). If we don’t do that, things
go wrong with the below code:

fn f(p: *const i8 /* addrspace(0) */) -> *const i8 /* addrspace(0) */ {
	let local = 0i8; /* addrspace(5) */
	let res = if cond { p } else { &raw const local };
	res
}

results in

	%local = alloca addrspace(5) i8
	%res = alloca addrspace(5) ptr

if:
	; Store 64-bit flat pointer
	store ptr %p, ptr addrspace(5) %res

else:
	; Store 32-bit scratch pointer
	store ptr addrspace(5) %local, ptr addrspace(5) %res

ret:
	; Load and return 64-bit flat pointer
	%res.load = load ptr, ptr addrspace(5) %res
	ret ptr %res.load

This may store a 32-bit pointer and read it back as a 64-bit pointer, which is obviously wrong and
cannot work. Instead, we need to addrspacecast %local to ptr addrspace(0), then we store and load
the correct type.

So, I think the way to go is casting every pointer to addrspace(0) immediately after creating an
alloca or a global. For alloca, the change is just 2 lines, for globals it is a bit more involved due to vtables, where the global variable is modified after it is created and therefore we need to look through the addrspacecast constexpr when adding attributes.

See https://rust-lang.zulipchat.com/#narrow/channel/233931-t-compiler.2Fmajor-changes/topic/Add.20amdgpu.20target.20compiler-team.23823/near/491175958.

Many processors / target-cpus

Every generation of GPUs uses different machine code (to some extent). LLVM supports them as different “cpu”s or processors (-Ctarget-cpu= argument for rustc).

There are two challenges for this regarding Rust support

A single processor needs to be the default in the target description.
If Rust would distribute compiled libraries (e.g. core), it would need to be for all processors to be useful.

There is no obvious choice for the default processor. If some processor is the default and a user tries to use the amdgpu target without overwriting the target-cpu, it likely results in an unusable binary. We could use a non-existing “cpu” as the default, resulting in compiler errors, to make users aware of the need to set a target-cpu.
E.g "please specify -Ctarget-cpu" results in failing compilations and warnings:

'please specify -Ctarget-cpu' is not a recognized processor for this target (ignoring processor)

Alternatively, some choice can be made, like gfx900.

There is a PR that fails compilation if no cpu is specified explicitly: rust-lang/rust#135030

Regarding 2., there may be a generic backend for amdgpu in the future, relying on spir-v (the compute variant) which would solve both these issues, but that is a long way to go and unsure if it eventually happens. For the reason of binary size alone, it does not make sense for Rust to distribute pre-compiled code for the amdgpu backend. Users should instead specify their processor via -Ctarget-cpu= and compile core via -Zbuild-std=core or similar means.

The list of processors supported by LLVM is here: https://llvm.org/docs/AMDGPUUsage.html#processors

Related issues

The PR to add the amdgpu target: Add amdgpu target rust#134740
New tracking issue: Tracking Issue for amdgpu target rust#135024
The original tracking issue for amdgpu (has been closed for a while): Tracking issue for targeting AMDGPU devices rust#51575

Mentors or Reviewers

None yet, this is my first Rust contribution :)

Process

The main points of the Major Change Process are as follows:

File an issue describing the proposal.
A compiler team member or contributor who is knowledgeable in the area can second by writing @rustbot second.
- Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a -C flag, then full team check-off is required.
- Compiler team members can initiate a check-off via @rfcbot fcp merge on either the MCP or the PR.
Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.

You can read more about Major Change Proposals on forge.

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

The text was updated successfully, but these errors were encountered:

rustbot · 2024-12-26T01:14:20Z

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

Concerns or objections to the proposal should be discussed on Zulip and formally registered here by adding a comment with the following syntax:

@rfcbot concern reason-for-concern 
<description of the concern>

Concerns can be lifted with:

@rfcbot resolve reason-for-concern

See documentation at https://forge.rust-lang.org

cc @rust-lang/compiler

workingjubilee · 2025-01-13T18:28:18Z

This target will require quite a bit of consideration by rustc due to being unusual in many ways, but I see no problem with adding it on an experimental basis (so, tier 3), especially given that we've already added PTX targets.

@rustbot second

LegNeato · 2025-01-13T23:07:58Z

Just a quick note, rust-gpu has shifted its stated focus to compute over graphics...it just so happens that many existing contributors are graphics-focused 🍻

LegNeato · 2025-01-13T23:08:38Z

We are very interested in address spaces as well, /cc @eddyb

Flakebi added major-change A proposal to make a major change to rustc T-compiler Add this label so rfcbot knows to poll the compiler team labels Dec 26, 2024

rust-lang locked and limited conversation to collaborators Dec 26, 2024

rustbot added the to-announce Announce this issue on triage meeting label Dec 26, 2024

Flakebi mentioned this issue Dec 26, 2024

Add amdgpu target rust-lang/rust#134740

Open

apiraino removed the to-announce Announce this issue on triage meeting label Dec 26, 2024

Flakebi mentioned this issue Jan 2, 2025

Tracking Issue for amdgpu target rust-lang/rust#135024

Open

12 tasks

ZuseZ4 mentioned this issue Jan 3, 2025

Expose experimental LLVM features for automatic differentiation and GPU offloading rust-lang/rust-project-goals#109

Open

9 tasks

rust-lang unlocked this conversation Jan 5, 2025

rustbot added the final-comment-period The FCP has started, most (if not all) team members are in agreement label Jan 13, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add amdgpu target #823

Add amdgpu target #823

Flakebi commented Dec 26, 2024 •

edited

Loading

rustbot commented Dec 26, 2024

workingjubilee commented Jan 13, 2025

LegNeato commented Jan 13, 2025

LegNeato commented Jan 13, 2025

Add amdgpu target #823

Add amdgpu target #823

Comments

Flakebi commented Dec 26, 2024 • edited Loading

Proposal

Address spaces

Casting pointers to addrspace(0) before use

Many processors / target-cpus

Related issues

Mentors or Reviewers

Process

Comments

rustbot commented Dec 26, 2024

workingjubilee commented Jan 13, 2025

LegNeato commented Jan 13, 2025

LegNeato commented Jan 13, 2025

Flakebi commented Dec 26, 2024 •

edited

Loading

Casting pointers to `addrspace(0)` before use