Account for RAFT update to SNMG APIs #454

viclafargue · 2024-11-08T17:25:54Z

NCCL clique initialization function PR in RAFT: rapidsai/raft#2487

Removed instantiation of raft::comms::build_comms_nccl_only (rapidsai/raft#2465)

cjnolet · 2025-01-07T22:03:11Z

cpp/include/cuvs/neighbors/mg.hpp

 * @param[in] index_params configure the index building
 * @param[in] index_dataset a row-major matrix on host [n_rows, dim]
 *
 * @return the constructed IVF-Flat MG index
 */
-auto build(const raft::device_resources& handle,
+auto build(const raft::device_resources_snmg& clique,


One of the main goals of having the "snmg" resources object match the single-GPU object was to be able to remove this additional MG API. We should now be able to accept device_resources and then check whether an NCCL clique has been set on it (which would imply that it's a multi-GPU resources object and not a single-GPU one). The whole point of doing this was to consolidate code paths.
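To make the suggestion above concrete, here is a minimal sketch of dispatching on whether a clique is present on the handle. It does not use the RAFT API: `resources_stub`, `nccl_clique_stub`, `has_clique()` and this `build` are hypothetical stand-ins invented for illustration, and the real resource lookup in RAFT may look quite different.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an NCCL clique (not the RAFT type).
struct nccl_clique_stub {
  std::vector<int> device_ids;  // GPUs participating in the clique
};

// Hypothetical stand-in for a resources handle (not raft::device_resources).
struct resources_stub {
  std::optional<nccl_clique_stub> clique;  // empty for a single-GPU handle

  bool has_clique() const { return clique.has_value(); }
};

// Single entry point: the presence of a clique selects the code path.
std::string build(const resources_stub& handle)
{
  if (handle.has_clique()) {
    return "multi-GPU build on " +
           std::to_string(handle.clique->device_ids.size()) + " devices";
  }
  return "single-GPU build";
}
```

With this shape, one `build` overload serves both cases, and callers opt into multi-GPU only by how they construct the handle.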

The NCCL clique is not set as a resource anymore, but we should still be able to implement the dispatching by checking the dynamic type of the device_resources. The real question then is: do we truly want dispatching on both the regular API (cuvs::neighbors::build) and the mg namespace (cuvs::neighbors::mg::build)? It makes sense that a user providing a device_resources_snmg instance to the regular API (cuvs::neighbors::build) would want things to be deployed on multiple GPUs. However, the reverse is not necessarily true. A user who explicitly chose the mg namespace but did not provide an adequate device_resources_snmg would fall back to single GPU, potentially unintentionally. Is this what we want?

I propose that we implement the dispatching mechanism solely on the regular API (cuvs::neighbors::build) in a dedicated follow-up PR. This also allows the MG doc to explicitly warn users that they must pass an adequate device_resources_snmg to use the MG API. What do you think?
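The proposal above could be sketched roughly as follows. This is an illustration under stated assumptions, not the cuVS or RAFT implementation: the types in namespace `sketch` are hypothetical stand-ins for raft::device_resources and raft::device_resources_snmg, they are assumed to be polymorphic so that `dynamic_cast` can inspect the dynamic type, and the `n_gpus` member and the returned strings exist only for demonstration.

```cpp
#include <string>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for the single-GPU handle; assumed polymorphic.
struct device_resources {
  virtual ~device_resources() = default;
};

// Hypothetical stand-in for the SNMG handle, derived from the single-GPU one.
struct device_resources_snmg : device_resources {
  int n_gpus = 2;  // illustrative only
};

namespace mg {
// MG API: accepts only the SNMG handle, so there is no single-GPU fallback.
inline std::string build(const device_resources_snmg& clique)
{
  return "mg build on " + std::to_string(clique.n_gpus) + " GPUs";
}
}  // namespace mg

// Passing a plain single-GPU handle to the MG API is rejected at compile time.
static_assert(!std::is_invocable_v<decltype(&mg::build), const device_resources&>);

// Regular API: dispatches on the dynamic type of the handle.
inline std::string build(const device_resources& handle)
{
  if (auto* clique = dynamic_cast<const device_resources_snmg*>(&handle)) {
    return mg::build(*clique);  // SNMG handle: forward to the MG code path
  }
  return "single-GPU build";
}

}  // namespace sketch
```

In this arrangement only the regular entry point makes a runtime decision, while the mg entry point expresses its requirement in its signature, which is the asymmetry the comment argues for.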

Account for RAFT update

ee98593

viclafargue requested review from a team as code owners November 8, 2024 17:25

github-actions bot added cpp CMake labels Nov 8, 2024

viclafargue added 6 commits November 15, 2024 14:25

use new device_resources_snmg

3a99b40

improved device_resources_snmg

e16b68e

Merge branch 'branch-24.12' into account-for-raft-update

cdd5cfb

switch from RAFT_LOG_INFO to RAFT_LOG_DEBUG for mg logs

96e69fc

clique as device_resource

657bf9e

updating MG tests

1fdccd4

cjnolet changed the title ~~Account for RAFT update~~ Account for RAFT update to SNMG APIs Dec 18, 2024

cjnolet assigned viclafargue Dec 18, 2024

cjnolet reviewed Jan 7, 2025


viclafargue replied Jan 8, 2025 (edited)
