Adds API for non-caching-store and load #1476

ranapratap55 · 2025-01-02T07:28:57Z

RCCL provides "low latency" protocols for communication between agents, in which the entire message, consisting of data and flags, is packed into a single L2 cache line. This is usually accomplished with relaxed atomic instructions in LLVM. However, the 128-byte version of this protocol (LL-128) requires 128-bit load and store instructions that bypass the cache and are not split into multiple instructions. The nontemporal builtin is not always suitable for this use case.

The proposed approach is to provide a C++ function template that encapsulates an inline assembly call. This asm is intended to use the appropriate load/store parameters for each combination of data size and architecture.
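A minimal host-side sketch of the structure described above, assuming one code path per access width. The names `nc_load`, `nc_store`, `nc_supported_v` and `Vec128` are hypothetical and are not the PR's API, and the plain `memcpy` is only a stand-in for the per-architecture inline assembly, which is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical names; the PR's real entry points and asm bodies differ.
// The widths mirror the sizes the PR tests: 1, 2, 4, 8 and 16 bytes.
template <typename T>
constexpr bool nc_supported_v =
    std::is_trivially_copyable<T>::value &&
    (sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 ||
     sizeof(T) == 8 || sizeof(T) == 16);

// 16-byte payload, aligned so it could map to a single 128-bit access.
struct alignas(16) Vec128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

template <typename T>
inline T nc_load(const T* p) {
    static_assert(nc_supported_v<T>, "unsupported access width");
    assert(reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0);
    T value;
    // Placeholder: a device build would emit one load instruction of
    // width sizeof(T) here, selected per target architecture.
    std::memcpy(&value, p, sizeof(T));
    return value;
}

template <typename T>
inline void nc_store(T* p, T value) {
    static_assert(nc_supported_v<T>, "unsupported access width");
    assert(reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0);
    // Placeholder for a single store instruction of width sizeof(T).
    std::memcpy(p, &value, sizeof(T));
}
```

The `static_assert` rejects any type whose size has no single-instruction form, which is the property LL-128 depends on: a 16-byte value must never be split into several narrower accesses.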

ranapratap55 · 2025-01-02T07:34:03Z

Created a new PR with both load and store. Added test cases for 1-, 2-, 4-, 8-, and 16-byte accesses.

Closed the old PR #1289.

wenkaidu · 2025-01-02T20:55:12Z

@ranapratap55 can you clarify if the non-caching load and store can work over coarse grained device memory, which is the memory type used in your tests?

ranapratap55 · 2025-01-08T04:41:09Z

> @ranapratap55 can you clarify if the non-caching load and store can work over coarse grained device memory, which is the memory type used in your tests?

Yes, it works on coarse-grained memory. I used hipMalloc(), which allocates coarse-grained memory.

wenkaidu · 2025-01-08T17:08:55Z

> > @ranapratap55 can you clarify if the non-caching load and store can work over coarse grained device memory, which is the memory type used in your tests?
>
> yes, it works on coarse-grained memory. I have used the hipMalloc() which allocates coarse-grained memory.

Can you port https://github.com/ROCm/rccl/tree/develop/tools/p2p-latency-test to the new API, test it, and add the commit to this PR? It is a more accurate test for RCCL use cases. Please try changing the memory from "uncached" to coarse-grained in these tests.

mustafabar · 2025-01-15T14:50:19Z

src/device/non_caching_load.h

+        #else
+            #define BITS "glc slc dlc"
+        #endif
+        #define WAIT ((0 << 14) | (0x3f << 8) | (0x7) << 4)


Please add a comment explaining where the WAIT values come from.
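For reference, the arithmetic behind the macro can be checked on the host. The sketch below evaluates the same expression and splits it into fields, assuming the pre-GFX11 `s_waitcnt` immediate layout (vmcnt in bits 3:0 and 15:14, expcnt in bits 6:4, lgkmcnt in bits 13:8). That layout is an assumption about what the macro targets and is not confirmed anywhere in this thread; `kWait`, `WaitFields` and `decode_wait` are hypothetical names.

```cpp
#include <cstdint>

// Same expression as the WAIT macro in the diff.
constexpr std::uint32_t kWait = (0 << 14) | (0x3f << 8) | (0x7) << 4;

struct WaitFields {
    std::uint32_t vmcnt;    // outstanding vector-memory operations
    std::uint32_t expcnt;   // outstanding export operations
    std::uint32_t lgkmcnt;  // outstanding LDS/GDS/constant/message operations
};

// Assumed bit layout (not confirmed by the PR): vmcnt is split across
// bits 3:0 and 15:14, expcnt is bits 6:4, lgkmcnt is bits 13:8.
constexpr WaitFields decode_wait(std::uint32_t imm) {
    return WaitFields{
        (imm & 0xf) | (((imm >> 14) & 0x3) << 4),
        (imm >> 4) & 0x7,
        (imm >> 8) & 0x3f,
    };
}

static_assert(kWait == 0x3f70, "macro expression evaluates to 0x3f70");
static_assert(decode_wait(kWait).vmcnt == 0, "vmcnt field is zero");
static_assert(decode_wait(kWait).expcnt == 0x7, "expcnt field is all ones");
static_assert(decode_wait(kWait).lgkmcnt == 0x3f, "lgkmcnt field is all ones");
```

Under that assumed layout, the value would wait until no vector-memory operations are outstanding and leave the other two counters at their maximum, that is, unconstrained. A comment in the header stating the intended layout and its source would let readers confirm this.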

mustafabar · 2025-01-15T14:58:11Z

src/device/non_caching_load.h

+__attribute__((always_inline))
+__host__ __device__ T __non_caching_load(const T* p)
+{
+    #if !defined(__GFX11__) && !defined(GFX12)


There are two families of instructions (e.g., *_u8 vs. *_ubyte). Both need coverage in the tests @wenkaidu pointed out, by targeting one architecture from each family.

mustafabar · 2025-01-15T15:04:03Z

tools/non-caching-store/non-caching-store.cpp

+    return;
+}
+
+int main(int argc, char **argv)


Can you add a validation .cpp?
For example, cover memory-consistency cases such as read-after-write, write-after-write, read-after-read, and so on.
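A minimal host-side sketch of what such checks could look like. `load_under_test` and `store_under_test` are hypothetical stand-ins for the new API, implemented here with plain volatile accesses so the example compiles without a GPU toolchain. A real validation would run these patterns in device kernels, ideally across agents, which this single-threaded sketch does not attempt.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the API under test (host-only).
template <typename T>
T load_under_test(const volatile T* p) { return *p; }

template <typename T>
void store_under_test(volatile T* p, T v) { *p = v; }

// Read-after-write: a load must observe the value just stored.
inline bool check_read_after_write(std::vector<std::uint32_t>& buf) {
    for (std::size_t i = 0; i < buf.size(); ++i) {
        const std::uint32_t v = static_cast<std::uint32_t>(i) * 2654435761u;
        store_under_test(&buf[i], v);
        if (load_under_test(&buf[i]) != v) return false;
    }
    return true;
}

// Write-after-write: the second store to an address must win.
inline bool check_write_after_write(std::vector<std::uint32_t>& buf) {
    for (std::size_t i = 0; i < buf.size(); ++i) {
        const std::uint32_t first = 0xAAAAAAAAu;
        const std::uint32_t second = static_cast<std::uint32_t>(i) + 1u;
        store_under_test(&buf[i], first);
        store_under_test(&buf[i], second);
        if (load_under_test(&buf[i]) != second) return false;
    }
    return true;
}

// Read-after-read: two loads with no intervening store must agree.
inline bool check_read_after_read(const std::vector<std::uint32_t>& buf) {
    for (std::size_t i = 0; i < buf.size(); ++i) {
        const std::uint32_t a = load_under_test(&buf[i]);
        const std::uint32_t b = load_under_test(&buf[i]);
        if (a != b) return false;
    }
    return true;
}
```

Each check returns a bool so a test driver can report which ordering case failed rather than aborting on the first mismatch.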

Adds API for non-caching-store

44bbfe5

ranapratap55 requested review from wenkaidu, gilbertlee-amd, akolliasAMD, edgargabriel, PedramAlizadeh, nusislam, nileshnegi, KawtharShafie, AtlantaPepsi, mberenjk, corey-derochie-amd, mustafabar, thananon and haripriya-amd as code owners January 2, 2025 07:28

Adds API for non-caching-load

d13355b

ranapratap55 changed the title ~~Adds API for non-caching-store~~ Adds API for non-caching-store and load Jan 2, 2025

mustafabar reviewed Jan 15, 2025
