Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement batched serial syr #2497

Merged
merged 5 commits into from
Feb 11, 2025

Conversation

yasahi-hpc
Copy link
Contributor

This PR implements syr function.

Following files are added:

  1. KokkosBatched_Syr_Serial_Impl.hpp: Internal interfaces
  2. KokkosBatched_Syr_Serial_Internal.hpp: Implementation details
  3. KokkosBatched_Syr.hpp: APIs
  4. Test_Batched_SerialSyr.hpp: Unit tests for that

Detailed description

It performs the rank 1 operation A:= alpha*x*x**T + A ({s,d,c,z}syr) or A:= alpha*x*x**H + A ({c,z}her)
Here, the matrix has the following shape.

  • x: (batch_count, n)
    On entry, it contains the n elements of x.
  • A: (batch_count, lda, n)
    On entry, the leading n by n part of the array A must contain the matrix of coefficients.
    On exit, A is overwritten by the updated matrix.
    It should be noted that only the Upper or Lower triangular part is modified.
    Imaginary parts of the diagonal elements are suppressed for {c, z}her.

Parallelization would be made in the following manner. This is efficient only when
A is given in LayoutLeft for GPUs and LayoutRight for CPUs (parallelized over batch direction).

Kokkos::parallel_for('syr', 
    Kokkos::RangePolicy<execution_space> policy(0, n),
    [=](const int k) {
        auto aa = Kokkos::subview(m_a, k, Kokkos::ALL(), Kokkos::ALL());
        auto xx = Kokkos::subview(m_x k, Kokkos::ALL());

        KokkosBatched::SerialSyr<UploType, TransType>::invoke(xx, aa);
    });

Tests

  1. Make a random x, and A, while copying A into A_ref. The reference A_ref is computed by A_ref:= alpha*x*x**T + A_ref or A_ref:= alpha*x*x**H + A_ref at host. Finally, we confirm A computed by serial ger and A_ref are the same. A == 0 case is tested as well.
  2. Simple and small analytical test, i.e. choose x, and A as follows to confirm A is updated as expected. Both upper and lower cases are tested.
A = [[1, -3, -2,  0],
     [0,  1, -3, -2],
     [0,  0,  1, -3],
     [0,  0,  0,  1]]
x = [1, 2, 3, 4]
Ref = [[ 2.5,  0.,   2.5,  6., ],
       [ 0.,   7.,   6.,  10., ],
       [ 0.,   0.,  14.5, 15., ],
       [ 0.,   0.,   0.,  25., ]]

@lucbv lucbv self-requested a review February 9, 2025 21:55
@lucbv lucbv added the AT2-CI-APPROVAL Approve CI to run at SNL label Feb 9, 2025
Copy link
Contributor

@lucbv lucbv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are there not test or implementation for the non-transpose case? Are we expecting that users will not want non-transpose case?

template <>
struct SerialSyr<Uplo::Lower, Trans::NoTranspose>
...

@yasahi-hpc
Copy link
Contributor Author

yasahi-hpc commented Feb 10, 2025

Actually, the Trans template argument is used in a slightly different manner. Usually, Trans arg corresponds to TRANS parameter in lapack which specifies the form of the system of equations. Here, I use Trans arg to specify whether the transpose operation in syr is an usual transpose (Trans::Transpose) or hermitian (Trans::ConjTranspose). Thus, I did not implement Trans::NoTranspose.

Maybe I can introduce Trans::NoTranspose specialization with static_assertion to warn users not to use this specialization. What do you think?

 // Dummy interface, we need transpose for {s,d,c,z}syr or {c,z}her
 template <>
 struct SerialSyr<Uplo::Upper, Trans::NoTranspose> {
   template <typename ScalarType, typename XViewType, typename AViewType>
   KOKKOS_INLINE_FUNCTION static int invoke(const ScalarType /*alpha*/, const XViewType /*&x*/, const AViewType /*&A*/) {
     static_assert(false, "KokkosBatched::syr: Use Trans::Transpose for {s,d,c,z}syr or Trans::ConjTranspose for {c,z}her");
   }
 };

Copy link
Contributor

@lucbv lucbv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some small changes are needed to be more specific about the supported options.

Yuuichi Asahi added 5 commits February 11, 2025 14:54
@yasahi-hpc yasahi-hpc force-pushed the implement-batched-serial-syr branch from d7ac236 to b48a9b2 Compare February 11, 2025 05:55
@yasahi-hpc yasahi-hpc requested a review from lucbv February 11, 2025 07:35
@yasahi-hpc
Copy link
Contributor Author

@lucbv
Thank you for the review. I have fixed accordingly.

@lucbv lucbv merged commit fbb9b8d into kokkos:develop Feb 11, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AT2-CI-APPROVAL Approve CI to run at SNL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants