
Added support for compiling the CUDA stubs on Windows. #15518

Closed

Conversation

eaplatanios
Contributor

I'm not 100% sure that I'm doing the right thing, but after this change I got rid of some compilation errors on Windows and the linker seems to be happy, so I think it may be OK. Let me describe my reasoning:

  1. This [commit](openxla/xla@b021ae8) added support for lazily loading symbols from the CUDA shared libraries.
  2. As discussed in [this issue](openxla/xla#6993), the current approach is not supported on Windows because it uses GNU assembly, which results in compilation errors with MSVC.
  3. After reading into this a little, I believe that Windows libraries built with MSVC already load their linked dynamic libraries lazily, so this trampoline mechanism appears to be unnecessary on Windows.
  4. For this reason, I made the trampoline bits conditional on not being on Windows, hoping that the symbols will now be resolved directly against the CUDA dependencies that come in via the CUDA header files.

@metab0t is this what you were suggesting in the end of that discussion?

cc @ddunl (this should be the last PR that touches the TSL code for Windows CUDA support)

@NaiyerRizz NaiyerRizz requested a review from ezhulenev July 31, 2024 06:07
@NaiyerRizz NaiyerRizz self-assigned this Jul 31, 2024
@ezhulenev ezhulenev requested a review from ddunl July 31, 2024 17:02
@dimitar-asenov dimitar-asenov requested a review from beckerhe August 5, 2024 08:54
@ddunl ddunl added the kokoro:force-run Forces CI to rerun label Aug 5, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Forces CI to rerun label Aug 5, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Aug 6, 2024
Imported from GitHub PR openxla/xla#15518

Copybara import of the project:

--
ccd500cea1fa08f65c680d67b02d444c29422981 by eaplatanios <[email protected]>:

Added support for compiling the CUDA stubs on Windows.

Merging this change closes #15518

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#15518 from eaplatanios:u/eaplatanios/cuda-stubs-windows ccd500cea1fa08f65c680d67b02d444c29422981
PiperOrigin-RevId: 659724373
@dimitar-asenov
Member

This fails our Ubuntu CI tests because the linker can't find a ton of CUDA symbols. What's likely happening is that the new config causes the necessary .S and/or .cc files to not be linked at all. However, I couldn't quite figure out what exactly the issue is. I suspect that something is going wrong when replacing if_cuda_is_configured with @local_config_cuda, but I'm not sure.

@eaplatanios Can you have a look, do you know what might be going wrong?

@dimitar-asenov
Member

I suspect that what's going wrong is that if_cuda_is_configured was replaced with is_cuda_enabled. According to the documentation of these rules, the former does not need --config=cuda, whereas the latter does. So the condition became stricter and is no longer true, leading to the symbols being undefined.
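The difference can be sketched in a BUILD file. Everything below is illustrative; the exact macro signatures and condition labels in `@local_config_cuda` may differ:

```starlark
# if_cuda_is_configured() is a macro evaluated at loading time: it returns
# its argument whenever the CUDA repository was configured on the machine,
# independent of whether the build runs with --config=cuda.
srcs = if_cuda_is_configured(["cublas_stub.cc", "cublas.tramp.S"])

# An is_cuda_enabled-style check is a select() on a build setting, so the
# branch only fires when --config=cuda is actually passed:
cc_library(
    name = "uses_cuda_stubs",
    deps = select({
        ":is_cuda_enabled": [":cuda_stubs"],  # requires --config=cuda
        "//conditions:default": [],           # stubs dropped -> undefined symbols
    }),
)
```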

@dimitar-asenov dimitar-asenov requested a review from chsigg August 7, 2024 14:15
@dimitar-asenov
Member

dimitar-asenov commented Aug 7, 2024

@chsigg I see that a long time ago, you made some changes to the if_cuda_is_configured macro and friends. Do you have a suggestion for what can be done here to make this PR work?

@beckerhe
Member

beckerhe commented Aug 7, 2024

Hey,
sorry for reacting so late. I have a few concerns with this change:

  1. We are currently trying to make a bigger change to how we consume CUDA in XLA; the PR is here: Introduce hermetic CUDA in Google ML projects. #10673. This change will interfere with that, so I would prefer to land this after the mentioned PR gets merged. (Sorry for the inconvenience; it's a massive change that affects XLA, TensorFlow, and JAX.)
  2. We shouldn't modify the trampoline files, since they are mostly autogenerated; manually updating them would make future updates harder. In fact, I would like to have them autogenerated during the build instead of keeping a copy checked in. Instead, I suggest the following:
     1. Rename the cublas cc_library rule to cublas_stub.
     2. Create an alias rule, alias(name = "cublas", actual = select(...)), which depends either on cublas_stub or on the library itself.
     Something similar will be done as part of Introduce hermetic CUDA in Google ML projects. #10673 and can probably be reused/extended for this purpose. Have a look at the changes to xla/tsl/cuda/BUILD.bazel in #10673 for more details.
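The suggested structure might look roughly like this (a sketch only; the `:use_cuda_stubs` condition label and the real-library target name are assumptions, not the actual names from #10673):

```starlark
# Sketch of xla/tsl/cuda/BUILD.bazel after the suggested refactoring.

# 1. The existing stub target, renamed so "cublas" is free for the alias.
cc_library(
    name = "cublas_stub",
    srcs = ["cublas_stub.cc"],
)

# 2. An alias that resolves either to the stub or to the real library,
#    depending on configuration (condition label is hypothetical).
alias(
    name = "cublas",
    actual = select({
        ":use_cuda_stubs": ":cublas_stub",
        "//conditions:default": "@cuda_cublas//:cublas",
    }),
)
```

Downstream targets keep depending on `//xla/tsl/cuda:cublas`, and the `select()` picks the right implementation without anyone touching the autogenerated trampoline files.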

Henning

@golechwierowicz golechwierowicz self-requested a review August 9, 2024 08:31
@vam-google
Contributor

@eaplatanios @ddunl Please check this comment in the parallel thread #6993 (comment). tl;dr: unless there is a plan to provide proper CUDA support on Windows for XLA, which would be a huge task, I don't think we should be trying to compile CUDA-specific code on Windows.

@golechwierowicz golechwierowicz removed their request for review August 20, 2024 08:15
@eaplatanios
Contributor Author

@vam-google that sounds fine to me. Sorry for the late response; I was traveling over the past month. I also decided to give up on CUDA support on Windows for XLA basically for the reason you mentioned. It does seem to be a huge task and I can just sort out something to work with WSL on Windows rather than take this on right now. Thanks for reviewing this!
