
Allow some of the sparse utility functions to handle larger matrices #2541

Open. Wants to merge 33 commits into base branch branch-25.04.

Conversation

viclafargue (Contributor)

@viclafargue requested a review from a team as a code owner (January 14, 2025 15:22)
@github-actions (bot) added the cpp label (Jan 14, 2025)
copy-pr-bot (bot) commented Jan 16, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@pablete commented Jan 22, 2025

Any updates here?
Thanks for working on the fix for very large matrices!

@viclafargue (Contributor, Author) commented Jan 22, 2025

> Any updates here? Thanks for working on the fix for very large matrices!

These updates fix several of the RAFT sparse utilities so they can handle larger matrices, allowing cuML's UMAP to process very large datasets. The PR is ready for review.

@viclafargue viclafargue changed the title Fix sparse utilities issues with large matrices Allow some of the sparse utility functions to handle larger matrices Jan 22, 2025
@dantegd added the improvement label (Improvement / enhancement to an existing function) Jan 22, 2025
@cjnolet removed the improvement label Jan 23, 2025
@cjnolet (Member) commented Jan 23, 2025

/ok to test

@dantegd (Member) commented Jan 28, 2025

/ok to test


@dantegd (Member) left a comment:

Changes LGTM. @wphicks @divyegala maybe you want to take a second look, but tests seem to pass fine all around.

@cjnolet (Member) commented Jan 28, 2025

Thanks @dantegd. I've asked @viclafargue to test the cuML side to make sure the hardcoded changes from uint32 to uint64 aren't going to cause any perf regressions or concerns.

@divyegala divyegala changed the base branch from branch-25.02 to branch-25.04 February 1, 2025 00:23
@divyegala (Member)

/ok to test

@cjnolet (Member) commented Feb 4, 2025

/ok to test


{
-  detail::coo_degree_scalar<64>(rows, vals, nnz, scalar, results, stream);
+  detail::coo_degree_scalar<64>(rows, vals, (uint64_t)nnz, scalar, results, stream);

Member: Did we miss one here? Why's this hardcoded?

{
-  int row = (blockIdx.x * TPB_X) + threadIdx.x;
+  uint64_t row = (blockIdx.x * TPB_X) + threadIdx.x;
   if (row < nnz) { atomicAdd(results + rows[row], (T)1); }

Member: nnz_t?

{
   int row = (blockIdx.x * TPB_X) + threadIdx.x;
   if (row < nnz && vals[row] != 0.0) { raft::myAtomicAdd(results + rows[row], 1); }
}

-template <int TPB_X = 64, typename T>
+template <uint64_t TPB_X = 64, typename T, typename outT, typename nnz_t>

Member: This should be int always because it's an int type. Cast to nnz_t in place if you need this to match another type.

{
-  int row = (blockIdx.x * TPB_X) + threadIdx.x;
+  uint64_t row = (blockIdx.x * TPB_X) + threadIdx.x;
   if (row < nnz && vals[row] != scalar) { raft::myAtomicAdd(results + rows[row], 1); }

Member: Here too: why hardcoded? This should be nnz_t.

@@ -90,9 +90,9 @@ RAFT_KERNEL coo_degree_scalar_kernel(
 * @param results: output row counts
 * @param stream: cuda stream to use
 */
-template <int TPB_X = 64, typename T>
+template <uint64_t TPB_X = 64, typename T, typename outT, typename nnz_t>

Member: Same as above w/ int.

@@ -104,7 +104,7 @@ RAFT_KERNEL coo_symmetrize_kernel(int* row_ind,
 // Note that if we did find a match, we don't need to
 // compute `res` on it here because it will be computed
 // in a different thread.
-if (!found_match && vals[idx] != 0.0) {
+if (!found_match && cur_val != 0.0) {

Member: I'm a little apprehensive about this. This is changing the actual value of this... are we sure this is correct?

@@ -142,7 +142,7 @@ void coo_symmetrize(COO<T>* in,

 ASSERT(!out->validate_mem(), "Expecting unallocated COO for output");

-rmm::device_uvector<int> in_row_ind(in->n_rows, stream);
+rmm::device_uvector<uint64_t> in_row_ind(in->n_rows, stream);

Member: nnz_t

cudaStream_t stream)
{
-  rmm::device_uvector<int> ex_scan(n, stream);
+  rmm::device_uvector<uint64_t> ex_scan(n, stream);
   rmm::device_uvector<int> cur_ex_scan(n, stream);

Member: nnz_t? No hardcoding please.

@@ -83,10 +83,11 @@ void coo_sort(IdxT m, IdxT n, IdxT nnz, IdxT* rows, IdxT* cols, T* vals, cudaStr
 * @param in: COO to sort by row
 * @param stream: the cuda stream to use
 */
-template <typename T, typename IdxT = int>
+template <typename T, typename IdxT = int, typename nnz_t = uint64_t>
 void coo_sort(COO<T, IdxT>* const in, cudaStream_t stream)

Member: Do we need the default there? Can we get away without it?

-  rmm::device_uvector<int> cur_ex_scan(n, stream);
+  rmm::device_uvector<uint64_t> ex_scan(n, stream);
+  rmm::device_uvector<uint64_t> cur_ex_scan(n, stream);
+  RAFT_CUDA_TRY(cudaMemsetAsync(ex_scan.data(), 0, (nnz_t)n * sizeof(uint64_t), stream));

Member: Should this be sizeof(nnz_t)?

@@ -151,7 +152,7 @@ int performLanczosIteration(raft::resources const& handle,

 RAFT_EXPECTS(A != nullptr, "Null matrix pointer.");

-index_type_t n = A->nrows_;
+uint64_t n = A->nrows_;

Member: nnz_type_t?

@@ -1160,7 +1162,7 @@ int computeLargestEigenvectors(
 constexpr value_type_t zero = 0;

 // Matrix dimension
-index_type_t n = A->nrows_;
+uint64_t n = A->nrows_;

Member: nnz_type_t?

Labels: bug (Something isn't working), cpp, non-breaking (Non-breaking change)
Status: In Progress
5 participants