cuda_stub for Windows #6993

metab0t · 2023-11-14T13:24:34Z

This commit b021ae8 introduces cuda_stub but it is not implemented for Windows.
Can we run make_stub.py under Windows?

The text was updated successfully, but these errors were encountered:

hawkinsp · 2023-11-17T13:02:34Z

What happens when you try it on Windows? It might just need a BUILD file change.

I thought about this when making that change, and my conclusion about Windows was "this will probably work, under the Windows x64 ABI convention".

The use of a GNU-syntax assembler script might be problematic; we might need to use nasm instead.

metab0t · 2023-11-17T15:22:30Z

It just reaches this branch

xla/third_party/tsl/tsl/cuda/stub.bzl

Line 24 in 5f3417f

"//conditions:default": "NOT_IMPLEMENTED_FOR_THIS_PLATFORM_OR_ARCHITECTURE",

and errors: "NOT_IMPLEMENTED_FOR_THIS_PLATFORM_OR_ARCHITECTURE"

I don't have access to the GPU machine during weekend and will post the error log next week if needed.

metab0t · 2023-11-19T13:20:50Z

The error log:

ERROR: C:/users/yangyue/_bazel_yangyue/736lra7s/external/tsl/tsl/cuda/BUILD.bazel:243:10: Executing genrule @tsl//tsl/cuda:cusparse_stub_gen failed: (Exit 127): bash.exe failed: error executing command (from target @tsl//tsl/cuda:cusparse_stub_gen)
  cd /d C:/users/yangyue/_bazel_yangyue/736lra7s/execroot/__main__
  SET CUDA_TOOLKIT_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3
    SET CUDNN_INSTALL_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3
    SET PATH=D:\msys64\usr\bin;D:\msys64\bin;C:\Windows;C:\Windows\System32;C:\Windows\System32\WindowsPowerShell\v1.0
    SET RUNFILES_MANIFEST_ONLY=1
    SET TF_CUDA_COMPUTE_CAPABILITIES=sm_52,sm_60,sm_70,sm_80,compute_90
    SET TF_CUDA_PATHS=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3
    SET TF_CUDA_VERSION=12.3
    SET TF_CUDNN_VERSION=8.9.5
  D:\msys64\usr\bin\bash.exe -c source external/bazel_tools/tools/genrule/genrule-setup.sh; NOT_IMPLEMENTED_FOR_THIS_PLATFORM_OR_ARCHITECTURE
# Configuration: 596e7d16644464d93740f9db8a95bec414df1574424c281b4be464edd2aa5f19
# Execution platform: @local_execution_config_platform//:platform
/usr/bin/bash: line 1: NOT_IMPLEMENTED_FOR_THIS_PLATFORM_OR_ARCHITECTURE: command not found
Target //jaxlib/tools:build_wheel failed to build
INFO: Elapsed time: 267.232s, Critical Path: 0.45s
INFO: 30 processes: 25 internal, 5 local.
FAILED: Build did NOT complete successfully

hawkinsp · 2023-11-20T13:33:47Z

Well, note that XLA on GPU on Windows is community-supported. So we'll welcome PRs, but you'll have to drive this!

For this specific issue: try adding a condition that handles Windows x86_64 the same way as Linux x86_64?

eaplatanios · 2024-07-28T22:29:38Z

@hawkinsp I've been trying to get CUDA support to compile on Windows and among other issues, I bumped into this one. I tried adding a case for Windows and the stubs are generated, but then when trying to compile them I get a bunch of compilation errors about that assembly code not having the right syntax (even the starting block comments in each file result in errors).

I can get past these compilation errors if I remove the *.tramp.S files from the sources though this seems wrong but I don't understand what's going on exactly with these stubs to be able to debug this.

FYI I've started putting up PRs with other changes required for Windows support (#15444, #15448).

eaplatanios · 2024-07-30T09:02:01Z

Ok I finally resolved all other issues and was able to compile the library by excluding the .tramp.S files. This of course results in linking errors down the line. I looked a bit into compiling those files and I think I cannot actually compile them using MSVC due to the syntax being unsupported. I'm guessing this is directly related to your earlier comment @hawkinsp:

The use of a GNU-syntax assembler script might be problematic; we might need to use nasm instead.

I'm not really familiar at all with these assembly files and their syntax. Do you know what a good next step for how to proceed would be?

metab0t · 2024-07-30T09:45:38Z

It seems that the trampoline written in assembly in https://github.com/yugr/Implib.so should be ported to Windows.

In fact, I wonder why xla needs to build the stubs as binary when it already knows all the names of functions, which can be looked up dynamically in runtime.

eaplatanios · 2024-07-30T23:19:55Z

@metab0t is this what you are suggesting? It compiles successfully and the linker seems happy with it so I think it might be good.

@metab0t

Imported from GitHub PR openxla/xla#15518 I'm not 100% that I'm doing the right thing but I'll just say that after this, I got rid of some compilation errors on Windows and the linker seems to be happy so I think it may be ok. But let me describe my reasoning: 1. This [commit](openxla/xla@b021ae8) added support for lazily loading symbols from the CUDA shared libraries. 2. As discussed in [this issue](openxla/xla#6993), the current approach is not supported on Windows due to the use of GNU assembly (it results in compilation errors with MSVC). 3. After reading a little into this, I believe that Windows libraries built with MSVC already load the linked dynamic libraries lazily and so it appears that this trampoline mechanism is not needed on Windows. 4. For this reason, I made the trampoline bits conditional on not being on Windows hoping that the symbols will not directly be resolved to the CUDA dependencies that are coming in via the CUDA header files. @metab0t is this what you were suggesting in the end of that discussion? cc @ddunl (this should be the last PR that touches the TSL code for Windows CUDA support) Copybara import of the project: -- ccd500cea1fa08f65c680d67b02d444c29422981 by eaplatanios <[email protected]>: Added support for compiling the CUDA stubs on Windows. Merging this change closes #15518 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#15518 from eaplatanios:u/eaplatanios/cuda-stubs-windows ccd500cea1fa08f65c680d67b02d444c29422981 PiperOrigin-RevId: 659724373

@metab0t

Imported from GitHub PR openxla/xla#15518 I'm not 100% that I'm doing the right thing but I'll just say that after this, I got rid of some compilation errors on Windows and the linker seems to be happy so I think it may be ok. But let me describe my reasoning: 1. This [commit](openxla/xla@b021ae8) added support for lazily loading symbols from the CUDA shared libraries. 2. As discussed in [this issue](openxla/xla#6993), the current approach is not supported on Windows due to the use of GNU assembly (it results in compilation errors with MSVC). 3. After reading a little into this, I believe that Windows libraries built with MSVC already load the linked dynamic libraries lazily and so it appears that this trampoline mechanism is not needed on Windows. 4. For this reason, I made the trampoline bits conditional on not being on Windows hoping that the symbols will not directly be resolved to the CUDA dependencies that are coming in via the CUDA header files. @metab0t is this what you were suggesting in the end of that discussion? cc @ddunl (this should be the last PR that touches the TSL code for Windows CUDA support) Copybara import of the project: -- ccd500cea1fa08f65c680d67b02d444c29422981 by eaplatanios <[email protected]>: Added support for compiling the CUDA stubs on Windows. Merging this change closes #15518 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#15518 from eaplatanios:u/eaplatanios/cuda-stubs-windows ccd500cea1fa08f65c680d67b02d444c29422981 PiperOrigin-RevId: 659724373

@metab0t

Imported from GitHub PR openxla/xla#15518 I'm not 100% that I'm doing the right thing but I'll just say that after this, I got rid of some compilation errors on Windows and the linker seems to be happy so I think it may be ok. But let me describe my reasoning: 1. This [commit](openxla/xla@b021ae8) added support for lazily loading symbols from the CUDA shared libraries. 2. As discussed in [this issue](openxla/xla#6993), the current approach is not supported on Windows due to the use of GNU assembly (it results in compilation errors with MSVC). 3. After reading a little into this, I believe that Windows libraries built with MSVC already load the linked dynamic libraries lazily and so it appears that this trampoline mechanism is not needed on Windows. 4. For this reason, I made the trampoline bits conditional on not being on Windows hoping that the symbols will not directly be resolved to the CUDA dependencies that are coming in via the CUDA header files. @metab0t is this what you were suggesting in the end of that discussion? cc @ddunl (this should be the last PR that touches the TSL code for Windows CUDA support) Copybara import of the project: -- ccd500cea1fa08f65c680d67b02d444c29422981 by eaplatanios <[email protected]>: Added support for compiling the CUDA stubs on Windows. Merging this change closes #15518 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#15518 from eaplatanios:u/eaplatanios/cuda-stubs-windows ccd500cea1fa08f65c680d67b02d444c29422981 PiperOrigin-RevId: 659724373

@metab0t

Imported from GitHub PR openxla/xla#15518 I'm not 100% that I'm doing the right thing but I'll just say that after this, I got rid of some compilation errors on Windows and the linker seems to be happy so I think it may be ok. But let me describe my reasoning: 1. This [commit](openxla/xla@b021ae8) added support for lazily loading symbols from the CUDA shared libraries. 2. As discussed in [this issue](openxla/xla#6993), the current approach is not supported on Windows due to the use of GNU assembly (it results in compilation errors with MSVC). 3. After reading a little into this, I believe that Windows libraries built with MSVC already load the linked dynamic libraries lazily and so it appears that this trampoline mechanism is not needed on Windows. 4. For this reason, I made the trampoline bits conditional on not being on Windows hoping that the symbols will not directly be resolved to the CUDA dependencies that are coming in via the CUDA header files. @metab0t is this what you were suggesting in the end of that discussion? cc @ddunl (this should be the last PR that touches the TSL code for Windows CUDA support) Copybara import of the project: -- ccd500cea1fa08f65c680d67b02d444c29422981 by eaplatanios <[email protected]>: Added support for compiling the CUDA stubs on Windows. Merging this change closes #15518 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#15518 from eaplatanios:u/eaplatanios/cuda-stubs-windows ccd500cea1fa08f65c680d67b02d444c29422981 PiperOrigin-RevId: 659724373

vam-google · 2024-08-15T00:51:13Z

What is the final goal of this effort? Note, as for the stubs (including *.tramp.S part but not only): ELF (linux) and PE (Windows) dynamic linking models are very different, with Windows being much more demanding than Linux, and there is a big gap between a dynamic library working on Linux and similar approach working on Windows. Complications of getting windows .dll's to work include:

need to have exported symbols to be explicitly defined either via .def file or via __declspec(dllexport) keyword;
making sure that exported symbols are properly consumed via __declspec(dllimport) (otherwise it would fail miserably if the exported symbol is DATA);
need to create import library to accompany the .dll;
need to create export library in case bidirectional dependency between two .dlls
final dlls must include direct symbol->source.dll binding map for all symbols resolved dynamically, and that map is basically hardcoded inside each dll on linking-time (which complicates the build process a lot), unlike in ELF, when symbol->source.so binding is done in runtime
... many other small windows-specific peculiarities which you normally don't know about until they hit you on the head.

What I'm trying to say is that unless this has intention to also have a proper support of CUDA in runtime (i.r. run legit cuda-dependent tests) trying to make it just compile for windows is more likely to cause more harm than good. At the same time, proper maintaining of working CUDA on windows is a colossal task, which I don't think is possible without having a dedicated team (or at least a few people) and entire build/test infrastructure behind it. If there is no plan to have all that I truly don't think we should be trying to make this compile under Windows, because compiling it does not bring us much closer to it actually working, but introduces windows-specific logic in the build, which is not being properly tested but has to be maintained.

eaplatanios · 2024-08-30T17:53:28Z

@vam-google I responded here mentioning that I decided to give up on this. Dynamic linking on Windows is beyond my knowledge and this turned out to be a lot more work than I was expecting to be worth it. Instead I'll try to figure something out using WSL and the Linux CUDA build of XLA.

penpornk added enhancement New feature or request GPU XLA on GPU labels Feb 29, 2024

This was referenced Jul 30, 2024

Support TSL compilation on Windows with CUDA support. #15499

Closed

Added support for compiling the CUDA stubs on Windows. #15518

Closed

copybara-service bot mentioned this issue Aug 6, 2024

PR #15518: Added support for compiling the CUDA stubs on Windows. tensorflow/tensorflow#73195

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cuda_stub for Windows #6993

cuda_stub for Windows #6993

metab0t commented Nov 14, 2023

hawkinsp commented Nov 17, 2023

metab0t commented Nov 17, 2023

metab0t commented Nov 19, 2023

hawkinsp commented Nov 20, 2023

eaplatanios commented Jul 28, 2024 •

edited

Loading

eaplatanios commented Jul 30, 2024

metab0t commented Jul 30, 2024 •

edited

Loading

eaplatanios commented Jul 30, 2024 •

edited

Loading

vam-google commented Aug 15, 2024 •

edited

Loading

eaplatanios commented Aug 30, 2024

cuda_stub for Windows #6993

cuda_stub for Windows #6993

Comments

metab0t commented Nov 14, 2023

hawkinsp commented Nov 17, 2023

metab0t commented Nov 17, 2023

metab0t commented Nov 19, 2023

hawkinsp commented Nov 20, 2023

eaplatanios commented Jul 28, 2024 • edited Loading

eaplatanios commented Jul 30, 2024

metab0t commented Jul 30, 2024 • edited Loading

eaplatanios commented Jul 30, 2024 • edited Loading

vam-google commented Aug 15, 2024 • edited Loading

eaplatanios commented Aug 30, 2024

eaplatanios commented Jul 28, 2024 •

edited

Loading

metab0t commented Jul 30, 2024 •

edited

Loading

eaplatanios commented Jul 30, 2024 •

edited

Loading

vam-google commented Aug 15, 2024 •

edited

Loading