Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding correlation distance metric in oneDAL primitives #3059

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

richardnorth3
Copy link

@richardnorth3 richardnorth3 commented Feb 3, 2025

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@icfaust
Copy link
Contributor

icfaust commented Feb 3, 2025

inline sycl::event reduce_by_columns(sycl::queue& q,

https://github.com/uxlfoundation/oneDAL/blob/main/cpp/oneapi/dal/backend/primitives/stat/cov.hpp#L34
should help you in calculating the means for each row (column_wise reduction to generate the sum, then calculate the mean using the second function)

this will leverage the GPUs capabilities

ndview<Float, 2>& out,
const event_vector& deps = {}) const {
const std::int64_t n = u.get_dimension(0);
auto u_sum = ndarray<Float, 1>::empty({ 1 });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up @richardnorth3 ! https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.correlation.html should be a good reference for the means necessary. The mean doesn't need to be a single value, but a vector the size of a single sample with each being the mean value for each feature (i.e. for each row). This will then be a vector subtraction. Let me know if I am misunderstanding things about your implementation. Good progress.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey sorry about that, the definition is the mean of the 1d array, so my recommendations have been a little bit wrong.

@richardnorth3
Copy link
Author

richardnorth3 commented Feb 24, 2025

https://github.com/richardnorth3/oneDAL/blob/9b09822dd84b8eb2ee555e08b4576c8ce7d9c0e2/cpp/oneapi/dal/backend/primitives/distance/correlation_distance_dpc.cpp#L50-L95

After heavily mediating over numerous drafts and codebase reviews, I ultimately came back to your original advice @icfaust after an epiphany. This version of the correlation distance reuses most of the cosine distance code but calculates the deviations first, passes those values to get_inversed_norms(), and proceeds with the same functionality as the cosine distance computation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants