
Allow some of the sparse utility functions to handle larger matrices #2541

Open. Wants to merge 33 commits into base branch branch-25.04.

Conversation

viclafargue (Contributor)

@viclafargue requested a review from a team as a code owner (January 14, 2025 15:22)
@github-actions (bot) added the cpp label (Jan 14, 2025)
copy-pr-bot (bot) commented Jan 16, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@pablete commented Jan 22, 2025

Any updates here?
Thanks for working on the fix for very large matrices!

@viclafargue (Contributor, Author) commented Jan 22, 2025

> Any updates here? Thanks for working on the fix for very large matrices!

These updates fix several of the RAFT sparse utilities so they can handle larger matrices, allowing cuML's UMAP to process very large datasets. The PR is ready for review.

@viclafargue viclafargue changed the title Fix sparse utilities issues with large matrices Allow some of the sparse utility functions to handle larger matrices Jan 22, 2025
@dantegd added the improvement label (Improvement / enhancement to an existing function) Jan 22, 2025
@cjnolet removed the improvement label Jan 23, 2025
@cjnolet (Member) commented Jan 23, 2025

/ok to test

@dantegd (Member) commented Jan 28, 2025

/ok to test


@dantegd (Member) left a comment:

Changes LGTM. @wphicks @divyegala maybe you want to take a second look, but tests seem to pass fine all around.

@cjnolet (Member) commented Jan 28, 2025

Thanks @dantegd. I've asked @viclafargue to test the cuML side to make sure the hardcoded changes from uint32 to uint64 aren't going to cause any perf regressions or concerns.

@divyegala divyegala changed the base branch from branch-25.02 to branch-25.04 February 1, 2025 00:23
@divyegala (Member)

/ok to test

@cjnolet (Member) commented Feb 4, 2025

/ok to test


{
-  detail::coo_degree_scalar<64>(rows, vals, nnz, scalar, results, stream);
+  detail::coo_degree_scalar<64>(rows, vals, (uint64_t)nnz, scalar, results, stream);

Member: Did we miss one here? Why's this hardcoded?

{
-  int row = (blockIdx.x * TPB_X) + threadIdx.x;
+  uint64_t row = (blockIdx.x * TPB_X) + threadIdx.x;
   if (row < nnz) { atomicAdd(results + rows[row], (T)1); }

Member: nnz_t?

{
   int row = (blockIdx.x * TPB_X) + threadIdx.x;
   if (row < nnz && vals[row] != 0.0) { raft::myAtomicAdd(results + rows[row], 1); }
}

-template <int TPB_X = 64, typename T>
+template <uint64_t TPB_X = 64, typename T, typename outT, typename nnz_t>

Member: This should be int always because it's an int type. Cast to nnz_t in place if you need this to match another type.

{
-  int row = (blockIdx.x * TPB_X) + threadIdx.x;
+  uint64_t row = (blockIdx.x * TPB_X) + threadIdx.x;
   if (row < nnz && vals[row] != scalar) { raft::myAtomicAdd(results + rows[row], 1); }

Member: Here too: why hardcoded? This should be nnz_t.

@@ -90,9 +90,9 @@ RAFT_KERNEL coo_degree_scalar_kernel(
 * @param results: output row counts
 * @param stream: cuda stream to use
 */
-template <int TPB_X = 64, typename T>
+template <uint64_t TPB_X = 64, typename T, typename outT, typename nnz_t>

Member: Same as above w/ int.

@@ -104,7 +104,7 @@ RAFT_KERNEL coo_symmetrize_kernel(int* row_ind,
 // Note that if we did find a match, we don't need to
 // compute `res` on it here because it will be computed
 // in a different thread.
-if (!found_match && vals[idx] != 0.0) {
+if (!found_match && cur_val != 0.0) {

Member: I'm a little apprehensive about this. This is changing the actual value of this... are we sure this is correct?

@@ -142,7 +142,7 @@ void coo_symmetrize(COO<T>* in,

 ASSERT(!out->validate_mem(), "Expecting unallocated COO for output");

-rmm::device_uvector<int> in_row_ind(in->n_rows, stream);
+rmm::device_uvector<uint64_t> in_row_ind(in->n_rows, stream);

Member: nnz_t

cudaStream_t stream)
{
-  rmm::device_uvector<int> ex_scan(n, stream);
+  rmm::device_uvector<uint64_t> ex_scan(n, stream);
   rmm::device_uvector<int> cur_ex_scan(n, stream);

Member: nnz_t? No hardcoding please.

@@ -83,10 +83,11 @@ void coo_sort(IdxT m, IdxT n, IdxT nnz, IdxT* rows, IdxT* cols, T* vals, cudaStr
 * @param in: COO to sort by row
 * @param stream: the cuda stream to use
 */
-template <typename T, typename IdxT = int>
+template <typename T, typename IdxT = int, typename nnz_t = uint64_t>
 void coo_sort(COO<T, IdxT>* const in, cudaStream_t stream)

Member: Do we need the default there? Can we get away without it?

-  rmm::device_uvector<int> cur_ex_scan(n, stream);
+  rmm::device_uvector<uint64_t> ex_scan(n, stream);
+  rmm::device_uvector<uint64_t> cur_ex_scan(n, stream);
+  RAFT_CUDA_TRY(cudaMemsetAsync(ex_scan.data(), 0, (nnz_t)n * sizeof(uint64_t), stream));

Member: Should this be sizeof(nnz_t)?

@@ -151,7 +152,7 @@ int performLanczosIteration(raft::resources const& handle,

 RAFT_EXPECTS(A != nullptr, "Null matrix pointer.");

-index_type_t n = A->nrows_;
+uint64_t n = A->nrows_;

Member: nnz_type_t?

@@ -1160,7 +1162,7 @@ int computeLargestEigenvectors(
 constexpr value_type_t zero = 0;

 // Matrix dimension
-index_type_t n = A->nrows_;
+uint64_t n = A->nrows_;

Member: nnz_type_t?

Labels: bug (Something isn't working), cpp, non-breaking (Non-breaking change)
Status: In Progress
5 participants