Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Initial additions for fp32 and windows GPU test support #1778

Merged
merged 29 commits into from
Jun 10, 2024

Conversation

ethanglaser
Copy link
Contributor

@ethanglaser ethanglaser commented Mar 27, 2024

Description

Changes for green CI on GPU validation on devices without fp64 support (includes windows GPU sklearnex validation)

To be merged with infra PR 713

Changes include:

  • Correct sequencing of _convert_to_supported() function for proper dtype
  • Use of result.dtype instead of pytest parameter dtype so that in non fp32-supported systems where fp64 is passed into the estimator, the necessary tolerance is selected
  • Minor general threshold realignments based on initial test results
  • Disabling of forest GPU tests as results are inconsistent, task create to look further into the issue
  • Add required platform for spmd algo examples (spmd backend not created for windows)

@samir-nasibli samir-nasibli changed the title CI: Initial additions for fp32 and windows GPU test support ENH: Initial additions for fp32 and windows GPU test support Apr 3, 2024
@icfaust
Copy link
Contributor

icfaust commented Apr 16, 2024

@ethanglaser PRs #1674 should solve some of the outstanding issues with IncrementalEmpiricalCovariance for fp32 sklearnex testing, but it needs #1795 to be pulled in first. I will make a meeting for #1674, but could you review #1795? Its a relatively small change, already passes CI

@samir-nasibli
Copy link
Contributor

@ethanglaser please rebase your branch

@ethanglaser
Copy link
Contributor Author

ethanglaser commented May 7, 2024

Latest CI with just GPU tests: http://intel-ci.intel.com/ef0e2f47-e2c3-f1f1-aee4-a4bf010d0e2e

@ethanglaser ethanglaser changed the title ENH: Initial additions for fp32 and windows GPU test support TEST: Initial additions for fp32 and windows GPU test support May 8, 2024
@icfaust icfaust marked this pull request as ready for review May 15, 2024 04:40
@icfaust icfaust requested a review from Alexsandruss as a code owner May 15, 2024 04:40
Copy link
Contributor

@icfaust icfaust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some small things, otherwise looks fine.

@@ -66,7 +66,8 @@ def test_dense_self_rbf_kernel(queue):
result = rbf_kernel(X, queue=queue)
expected = sklearn_rbf_kernel(X)

assert_allclose(result, expected, rtol=1e-14)
tol = 1e-5 if result.dtype == np.float32 else 1e-14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, a 10^9 change in performance. Would this also warrant an investigation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why 1e-14 was used here specifically - generally we don't go more specific than 1e-7. For rtol, 1e-5 should be acceptable, as this is a typical threshold used in other fp32 testing

@ethanglaser
Copy link
Contributor Author

Should be ready for final review

@ethanglaser ethanglaser requested review from icfaust, homksei and Vika-F May 23, 2024 16:19
@ethanglaser ethanglaser requested a review from avolkov-intel May 23, 2024 16:28
@ethanglaser ethanglaser changed the title TEST: Initial additions for fp32 and windows GPU test support CI: Initial additions for fp32 and windows GPU test support May 23, 2024
@ethanglaser
Copy link
Contributor Author

/intelci: run

@ethanglaser
Copy link
Contributor Author

CI with infra branch (includes fp32 + windows GPU validation): http://intel-ci.intel.com/ef2436e8-ef5e-f182-b1e2-a4bf010d0e2e

@@ -85,6 +85,8 @@ def test_generated_dataset(queue, dtype, n_dim, n_cluster):
d, i = nn.fit(rs_centroids).kneighbors(cs)
# We have applied 2 sigma rule once
desired_accuracy = int(0.9973 * n_cluster)
if d.dtype == np.float64:
desired_accuracy = desired_accuracy - 1
Copy link
Contributor

@md-shafiul-alam md-shafiul-alam Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the logic behind the desired accuracy -1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it matches a minor threshold change added to test_kmeans (https://github.com/intel/scikit-learn-intelex/blob/main/onedal/cluster/tests/test_kmeans.py#L87) not sure exactly how the desired_accuracy was set but seems the threshold doesn't necessarily matchup with actual performance

Copy link
Contributor

@md-shafiul-alam md-shafiul-alam Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the kmeans++ init issue may get fixed with the changes proposed here uxlfoundation/oneDAL#2796. The onedal code uses 1 trial at the moment.

Copy link
Contributor

@md-shafiul-alam md-shafiul-alam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, but worth waiting for @icfaust about all the force_all_finite business

@ethanglaser ethanglaser merged commit 563c65a into uxlfoundation:main Jun 10, 2024
16 checks passed
@ethanglaser
Copy link
Contributor Author

Thanks for reviews :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants