Enable Triton Auto-tuning in XLA #81

zoranjovanovic-ns · 2024-12-15T00:03:33Z

No description provided.

third_party/triton/temporary/fix_InsertInstructionSchedHints.patch

i-chaochen · 2025-01-08T11:53:14Z

xla/service/gpu/autotuning/gemm_fusion_autotuner_test.cc

@@ -565,6 +607,9 @@ ENTRY e {

 // TODO(b/344770374): Make this test not fragile.
 TEST_F(GemmFusionAutotunerTest, DoNotRunAutotuningKernelSpillingRegisters) {
+  if (isRocm()) {
+    GTEST_SKIP() << "Not supported on ROCm.";
+  }
  const std::string kHloText = R"(


Hmm, does Triton autotuning on ROCm not have register-spilling prevention?

It seems that this test case does not trigger register spilling on ROCm.

Oh yes, that rings a bell. I remember you mentioned it and we discussed it a long while ago. It's more about the test case itself. But do we know how Triton triggers register spilling on ROCm?

We probably need to create a corresponding test case for ROCm, but I have not had time to focus on it yet.
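
For readers unfamiliar with what this test guards, the idea suggested by its name (DoNotRunAutotuningKernelSpillingRegisters) can be sketched as follows. This is a minimal illustration, not XLA's implementation: the `Candidate` struct, its fields, and `PickFastestNonSpilling` are hypothetical names introduced here, and the assumption that the compiler reports spilling per candidate is only that, an assumption.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical record for one autotuning candidate; not an XLA type.
struct Candidate {
  std::string name;
  bool spills_registers;  // assumed to be reported by the compiler backend
  double runtime_us;      // measured runtime of the candidate
};

// Returns the fastest candidate that does not spill registers, or
// std::nullopt if every candidate spills (or the list is empty).
std::optional<Candidate> PickFastestNonSpilling(
    const std::vector<Candidate>& candidates) {
  std::optional<Candidate> best;
  for (const Candidate& c : candidates) {
    if (c.spills_registers) continue;  // spilling candidates are never chosen
    if (!best.has_value() || c.runtime_us < best->runtime_us) best = c;
  }
  return best;
}
```

A ROCm-specific test along these lines would need an input that actually makes at least one candidate spill on that backend, which is the open question raised in the thread.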

i-chaochen · 2025-01-08T11:54:31Z

xla/service/gpu/autotuning/gemm_fusion_autotuner_test.cc

@@ -758,6 +803,9 @@ ENTRY main {
 }

 TEST_F(GemmFusionAutotunerDumpTest, DumpingWorks) {
+  if (isRocm()) {
+    GTEST_SKIP() << "cuBLAS not selected on ROCM.";
+  }


Does this not fall back to rocBLAS or hipBLASLt?

If I remember correctly, Triton is selected on ROCm. The difference between rocBLAS and Triton seems to be small, and Triton ends up being selected.

I thought it was designed so that the GEMM autotuner falls back to cuBLAS when the Triton GEMM is not good enough. But after checking gemm_fusion_autotuner.cc (correct me if I'm wrong), this fallback flag is for some particular cases that are not related to the GEMM fusion autotuner choosing among cuDNN, Triton, cuBLAS, and custom kernels.

Except for this one, it is only used here as well:

xla/xla/service/gpu/gpu_compiler_test.cc

Lines 547 to 575 in 2bd6f7e

TEST_P(FloatNormalizationTest, Fp8Normalization) {
  // TODO(b/344573710) Make this test not require a GPU when AutotuneCacheKey is
  // more stable.
  const PrimitiveType lhs_type = GetParam().first;
  const PrimitiveType rhs_type = GetParam().second;
  const std::string lhs_name =
      primitive_util::LowercasePrimitiveTypeName(lhs_type);
  const std::string rhs_name =
      primitive_util::LowercasePrimitiveTypeName(rhs_type);
  const std::string module_str = absl::Substitute(R"(
    HloModule sch

    ENTRY main {
      parameter = $0[1600,1600]{1,0} parameter(0)
      parameter.1 = $1[1600,1600]{1,0} parameter(1)
      neg = $1[1600,1600]{1,0} negate(parameter.1)
      dot = f16[1600,1600]{1,0} dot(parameter,neg), lhs_contracting_dims={1}, rhs_contracting_dims={0}
      constant = f16[] constant(0)
      broadcast = f16[1600,1600]{1,0} broadcast(constant), dimensions={}
      ROOT maximum = f16[1600,1600]{1,0} maximum(dot,broadcast)
    })",
                                                  lhs_name, rhs_name);

  auto optimize_module = [&](bool enable_triton, bool enable_blas,
                             bool enable_blas_fallback)
      -> absl::StatusOr<std::unique_ptr<HloModule>> {
    HloModuleConfig config;
    DebugOptions debug_options = GetDebugOptionsForTest();
    debug_options.set_xla_gpu_cublas_fallback(enable_blas_fallback);
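
The earlier reply in this thread (Triton ends up selected on ROCm because its difference from rocBLAS is small) explains why a test that expects the BLAS path is skipped. A minimal sketch of that kind of selection, assuming the choice is made purely on measured runtime, is shown below. `BackendTiming` and `SelectFastestBackend` are hypothetical names for illustration and do not describe the actual logic in gemm_fusion_autotuner.cc or the semantics of the fallback flag.

```cpp
#include <string>
#include <vector>

// Hypothetical measurement for one backend; not an XLA type.
struct BackendTiming {
  std::string backend;  // e.g. "triton" or "rocblas"
  double runtime_us;    // measured runtime for the same GEMM
};

// Returns the name of the backend with the lowest measured runtime.
// On a tie the earlier entry is kept; an empty list yields "".
std::string SelectFastestBackend(const std::vector<BackendTiming>& timings) {
  std::string best_name;
  double best_time = 0.0;
  bool have_best = false;
  for (const BackendTiming& t : timings) {
    if (!have_best || t.runtime_us < best_time) {
      best_name = t.backend;
      best_time = t.runtime_us;
      have_best = true;
    }
  }
  return best_name;
}
```

Under this assumption, even a small margin in favor of Triton means the BLAS backend is never chosen, so an assertion that the BLAS path was selected cannot hold on that platform.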

i-chaochen · 2025-01-08T11:56:38Z

xla/service/gpu/autotuning/gemm_fusion_autotuner_test.cc

@@ -1148,6 +1202,9 @@ TEST_F(GemmFusionAutotunerTest, CreatesCustomKernelFusionConfigs) {
 }

 TEST_F(GemmFusionAutotunerTest, GeneratesConfigForUpcastGemmWithPrologue) {
+  if (isRocm()) {
+    GTEST_SKIP() << "Not supported on ROCm.";
+  }


May I ask why GeneratesConfigForUpcastGemmWithPrologue and GeneratesConfigForUpcastGemmWithPrologueAndEpilogue are not supported? Is it because they expect a CustomKernelFusionConfig? If so, could you mention that in a comment?

i-chaochen

LGTM

thanks!

[ROCm] Enable gemm fusion autotuner.

2d9229a

zoranjovanovic-ns requested a review from jayfurmanek December 15, 2024 00:03

[ROCm] Fixed an issue with InstructionSchedHintsPass

2bd6f7e

i-chaochen reviewed Jan 8, 2025

third_party/triton/temporary/fix_InsertInstructionSchedHints.patch

i-chaochen reviewed Jan 8, 2025

i-chaochen approved these changes Jan 8, 2025

i-chaochen merged commit f55fc88 into rocm-jaxlib-v0.4.35-qa Jan 8, 2025
6 of 9 checks passed
