
tensorflow v2.18.0 #408

Merged · 23 commits merged into conda-forge:main · Feb 9, 2025

Conversation

regro-cf-autotick-bot (Contributor)

It is very likely that the current package version for this feedstock is out of date.

Checklist before merging this PR:

  • Dependencies have been updated if changed: see upstream
  • Tests have passed
  • Updated license if changed and license_file is packaged

Information about this PR:

  1. Feel free to push to the bot's branch to update this PR if needed.
  2. The bot will almost always only open one PR per version.
  3. The bot will stop issuing PRs if more than 3 version bump PRs generated by the bot are open. If you don't want to package a particular version, please close the PR.
  4. If you want these PRs to be merged automatically, make an issue with "@conda-forge-admin, please add bot automerge" in the title and merge the resulting PR. This command will add our bot automerge feature to your feedstock.
  5. If this PR was opened in error or needs to be updated, please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase @conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.

Pending Dependency Version Updates

Here is a list of all the pending dependency version updates for this repo. Please double check all dependencies before merging.

Name           | Upstream Version | Current Version
bazel          | 7.4.0            | (badge)
cudnn          | 9.4.0.58         | (badge)
icu            | 2023-10-04       | (badge)
libjpeg-turbo  | 9e               | (badge)
protobuf       | 28.3             | (badge)
tensorflow     | 2.18.0           | (badge)

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/cf-scripts/actions/runs/11511568857 - please use this URL for debugging.

@conda-forge-admin (Contributor)

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@jdblischak mentioned this pull request on Nov 12, 2024
@jdblischak (Member)

TensorFlow 2.18.0 supports numpy 2. Can we combine this with the numpy 2 migration in #389?

@hmaarrfk (Contributor)

yes, but somebody has to do the hard work of getting the patches updated.

@h-vetinari (Member)

yes, but somebody has to do the hard work of getting the patches updated.

@xhochy has already done that in #405 (it's also not that hard; I just did it before seeing that the patches were already updated in that other PR).

@njzjz (Member) commented on Nov 19, 2024

I cancelled the running CI. As pointed out in #405 (comment), the CI hangs at configure.py due to the following change:

tensorflow/tensorflow@9b5fa66#diff-4d5f3192809ec1b9add6b33007e0c50031ad9a0a2f3f55a481b506468824db2c

@xhochy (Member) commented on Nov 19, 2024

Thanks for the comment! I was away a bit (longer than expected) and did not remember what my local changes were. They are related to the new hermetic CUDA. Someone should port the stuff I did in jaxlib over here.

@jakirkham (Member) commented on Dec 19, 2024

Side note: it is possible to dlopen the stub library.

Here is a simple example of this behavior on Linux ARM:

conda create -n tst_cuda_stub python=3.12 ipython cuda-nvcc
conda activate tst_cuda_stub
In [1]: import ctypes

In [2]: ctypes.cdll.LoadLibrary(
   ...:     "/opt/conda/envs/tst/targets/sbsa-linux/lib/stubs/libcuda.so"
   ...: )
Out[2]: <CDLL '/opt/conda/envs/tst/targets/sbsa-linux/lib/stubs/libcuda.so', handle aaaacafa37c0 at 0xffff85e2e720>

I'm just not clear on why TensorFlow wants to load libcuda.so, so I'm not sure whether we should be handling it this way or not (given this is not really the same as loading the driver library).

@h-vetinari (Member)

would we retain the ability to use tensorflow compiled with cuda support to run on a CPU-only machine?

We have an explicit requirement on __cuda for the CUDA variant:

# avoid that people without GPUs needlessly download ~0.5-1GB
- __cuda # [cuda_compiler_version != "None"]

So we already don't support that (well, unless someone uses CONDA_OVERRIDE_CUDA). I'm not saying that losing this ability would be desirable, just trying to figure out why it's a concern in the first place.

@hmaarrfk (Contributor)

So we already don't support that (well, unless someone uses CONDA_OVERRIDE_CUDA). I'm not saying that losing this ability would be desirable, just trying to figure out why it's a concern in the first place.

I think this is a fair question. The reason that CONDA_OVERRIDE_CUDA exists is to allow advanced users to explicitly request CUDA packages when the "installation system" doesn't really support it.

  1. A login node on a supercomputer might have this. These are typically "interactive nodes" without much "compute". It would be good not to have to duplicate your environments for this.
  2. Creating a system "test" image with one's software.
    • Should be able to test CUDA stuff when a GPU is installed on new hardware.
    • Should be able to test CPU stuff when no GPU is detected (and not fail at import time).

I am personally in camp 2, though, years ago, I was in camp 1.

If I recall correctly, one of the (many) reasons we added __cuda was to avoid CPU-only users filling their disk space (often on a storage-limited laptop), and to save on their installation time.
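
For reference, a minimal sketch of the override workflow described above (the version number and environment name here are illustrative assumptions):

# On a machine without a GPU driver (e.g. a supercomputer login node),
# set conda's __cuda virtual package explicitly so the CUDA variant solves:
CONDA_OVERRIDE_CUDA="12.0" conda create -n tf-gpu tensorflow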

@h-vetinari (Member)

Creating a system "test" image with one's software.

I mean, how well can you test your setup if you're on a system that will end up taking completely different code paths (CPU vs. GPU) compared to the target environment?

In any case, I'm in favour of keeping the ability to run without a GPU driver, but at the same time, I don't think it's worth an extreme maintenance investment if upstream tensorflow now indeed requires that.

@jaimergp (Member) commented on Dec 19, 2024

According to their docs, no GPU should be needed at build time 🤔

Or does that mean that building with GPU support does require the actual drivers (and hence a GPU), but CPU-only wheels can be built without them? Nah, it does seem like the former. See this commit: tensorflow/docs@7d4187e

@jakirkham (Member)

OK, we could add the stub library to the library search path at build time.
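
A rough sketch of that idea, assuming a linux-64 conda build where the CUDA packages ship driver stubs under targets/x86_64-linux/lib/stubs (the exact path is an assumption, untested here):

# Expose the driver stub to the loader during the build only, never at runtime:
export LD_LIBRARY_PATH="${BUILD_PREFIX}/targets/x86_64-linux/lib/stubs:${LD_LIBRARY_PATH:-}"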

@njzjz (Member) commented on Dec 19, 2024

According to their docs, no GPU should be needed at build time 🤔

Or does that mean that building with GPU support does require the actual drivers (and hence a GPU), but CPU-only wheels can be built without them? Nah, it does seem like the former. See this commit: tensorflow/docs@7d4187e

Although Bazel is able to download CUDA (including drivers), in conda-forge we set the environment variable LOCAL_CUDA_PATH to use the local CUDA provided by conda-forge, just like other dependencies.

Related documentation can be found here: https://github.com/openxla/xla/blob/main/docs/hermetic_cuda.md

"When CUDA forward compatibility mode is disabled, Bazel targets will use User Mode and Kernel Mode Drivers pre-installed on the system."

@h-vetinari (Member)

JFYI, we're currently building stuff on pytorch that's occupying the only available GPUs (half of them are currently offline too), just in case you're wondering why things might not start. Cf. also conda-forge/pytorch-cpu-feedstock#314.

@h-vetinari (Member)

The builds look like they're passing on linux now. Is someone available to build the OSX side of things? I'd like to cancel the builds here for now to get in a big pytorch PR, especially since we still need to figure out the osx builds here (and IMO we should include #411 before merging).

Thoughts @njzjz @hmaarrfk @xhochy @ngam @conda-forge/tensorflow?

@xhochy (Member) commented on Jan 10, 2025

I can build stuff. I haven't had time to catch up on any conda-forge stuff since Christmas, but once this is in a buildable shape, feel free @h-vetinari to drop me a DM in the usual channels so that I actually look at it.

@njzjz (Member) commented on Jan 10, 2025

I'd like to cancel the builds here for now to get in a big pytorch PR, especially since we still need to figure out the osx builds here (and IMO we should include #411 before merging).

I am okay with this.

@h-vetinari (Member)

Ah, I just noticed that this PR doesn't even include #407... So while I can include #411 here, it's possible that this brings new failures. I guess it'd already be a win to build any tensorflow 2.18, even if it's not very compatible with the rest of present-day conda-forge.

@hmaarrfk (Contributor)

this doesn't even seem to be rerendered with the latest abseil used conda-forge-wide. I'll try to build locally.

@h-vetinari (Member)

this doesn't even seem to be rerendered with the latest abseil used conda-forge-wide.

That's what I meant in my last comment.

I'll try to build locally.

Nice! With an abseil update, or as-is?

@hmaarrfk (Contributor)

Nice! With an abseil update, or as-is?

With a rerender:

diff --git a/.ci_support/linux_64_c_compiler_version13cuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13python3.10.____cpython.yaml b/.ci_support/linux_64_c_compiler_version13cuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13python3.10.____cpython.yaml
index 81a4855..07907c6 100644
--- a/.ci_support/linux_64_c_compiler_version13cuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13python3.10.____cpython.yaml
+++ b/.ci_support/linux_64_c_compiler_version13cuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13python3.10.____cpython.yaml
@@ -37,13 +37,13 @@ libabseil:
 libcurl:
 - '8'
 libgrpc:
-- '1.65'
+- '1.67'
 libjpeg_turbo:
 - '3'
 libpng:
 - '1.6'
 libprotobuf:
-- 5.27.5
+- 5.28.3
 nccl:
 - '2'
 numpy:

@hmaarrfk (Contributor)

The CPU build completed, but the CUDA build errored with:

+ /home/conda/recipe_root/add_py_toolchain.sh
+ bazel build tensorflow_estimator/tools/pip_package:build_pip_package
Starting local Bazel server and connecting to it...
Loading:
Loading:
Loading: 0 packages loaded
Analyzing: target //tensorflow_estimator/tools/pip_package:build_pip_package (1 packages loaded, 0 targets configured)
Analyzing: target //tensorflow_estimator/tools/pip_package:build_pip_package (42 packages loaded, 191 targets configured)
Analyzing: target //tensorflow_estimator/tools/pip_package:build_pip_package (45 packages loaded, 284 targets configured)
INFO: Analyzed target //tensorflow_estimator/tools/pip_package:build_pip_package (48 packages loaded, 333 targets configured).
INFO: Found 1 target...
 checking cached actions
[0 / 75] [Prepa] Creating source manifest for //tensorflow_estimator/tools/pip_package:build_pip_package ... (2 actions, 0 running)
ERROR: /home/conda/feedstock_root/build_artifacts/tensorflow-split_1736558176987/work/tensorflow-estimator/tensorflow_estimator/python/estimator/BUILD:936:11: Extracting tensorflow_estimator APIs for //tensorflow_estimator/python/estimator:export_output to bazel-out/k8-fastbuild/bin/tensorflow_estimator/python/estimator/export_output_extracted_tensorflow_estimator_api.json. failed: (Exit 1): extractor_wrapper failed: error executing command (from target //tensorflow_estimator/python/estimator:export_output) bazel-out/k8-opt-exec-2B5CBBC6/bin/tensorflow_estimator/python/estimator/api/extractor_wrapper --output ... (remaining 6 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
Traceback (most recent call last):
  File "/home/conda/feedstock_root/build_artifacts/tensorflow-split_1736558176987/_build_env/share/bazel/5c6baf3e46c9012dcd9a59f49811a703/sandbox/processwrapper-sandbox/30/execroot/org_tensorflow_estimator/bazel-out/k8-opt-exec-2B5CBBC6/bin/tensorflow_estimator/python/estimator/api/extractor_wrapper.runfiles/org_tensorflow_estimator/tensorflow_estimator/python/estimator/api/extractor_wrapper.py", line 18, in <module>
    from tensorflow.python.tools.api.generator2.extractor import extractor
  File "/home/conda/feedstock_root/build_artifacts/tensorflow-split_1736558176987/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.10/site-packages/tensorflow/__init__.py", line 40, in <module>
    from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow  # pylint: disable=unused-import
  File "/home/conda/feedstock_root/build_artifacts/tensorflow-split_1736558176987/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.10/site-packages/tensorflow/python/pywrap_tensorflow.py", line 34, in <module>
    self_check.preload_check()
  File "/home/conda/feedstock_root/build_artifacts/tensorflow-split_1736558176987/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.10/site-packages/tensorflow/python/platform/self_check.py", line 63, in preload_check
    from tensorflow.python.platform import _pywrap_cpu_feature_guard
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
ERROR: /home/conda/feedstock_root/build_artifacts/tensorflow-split_1736558176987/work/tensorflow-estimator/tensorflow_estimator/python/estimator/BUILD:1065:11: Extracting tensorflow_estimator APIs for //tensorflow_estimator/python/estimator:base_head to bazel-out/k8-fastbuild/bin/tensorflow_estimator/python/estimator/base_head_extracted_tensorflow_estimator_api.json. failed: (Exit 1): extractor_wrapper failed: error executing command (from target //tensorflow_estimator/python/estimator:base_head) bazel-out/k8-opt-exec-2B5CBBC6/bin/tensorflow_estimator/python/estimator/api/extractor_wrapper --output ... (remaining 6 arguments skipped)

@h-vetinari (Member)

That's because the GPUs are currently physically offline: conda-forge/status#189

@hmaarrfk (Contributor)

That's because the GPUs are currently physically offline: conda-forge/status#189

This is on my machine locally. I guess I missed when we made it mandatory to expose the GPUs we have to Docker (I have a GPU on the machine I compiled on). Can we revert this mandatory requirement so I can compile things locally?

@h-vetinari (Member)

I think it's tensorflow itself that changed to require finding CUDA at build time. If you manage to patch that out, that would be great; then we could build on CPU agents.

@hmaarrfk (Contributor)

Hopefully this:

diff --git a/recipe/build_pkg.sh b/recipe/build_pkg.sh
index 22eb892..dc40b31 100644
--- a/recipe/build_pkg.sh
+++ b/recipe/build_pkg.sh
@@ -1,9 +1,19 @@
 #! /bin/bash

 set -exuo pipefail
+if [[ "${cuda_compiler_version}" == 12* ]]; then
+    # cuda-compat is used for providing libcuda.so.1 temporarily
+    cp $PREFIX/cuda-compat/libcuda.so.1 $PREFIX/lib/libcuda.so.1
+fi

 # install the whl making sure to use host pip/python if cross-compiling
 ${PYTHON} -m pip install --no-deps $SRC_DIR/tensorflow_pkg/*.whl

 # The tensorboard package has the proper entrypoint
 rm -f ${PREFIX}/bin/tensorboard
+
+if [[ "${cuda_compiler_version}" == 12* ]]; then
+    # This was needed to load in the cuda symbols correctly temporarily
+    # https://github.com/conda-forge/tensorflow-feedstock/pull/408#issuecomment-2585259178
+    rm -f $PREFIX/lib/libcuda.so.1
+fi

fixes things.
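
A quick sanity check one might add after packaging (hypothetical, just to confirm the temporary stub does not leak into the final artifact):

# libcuda.so.1 must always come from the host driver at runtime,
# so the installed package should not ship its own copy:
test ! -e "${PREFIX}/lib/libcuda.so.1" && echo "OK: no stray libcuda.so.1"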

copybara-service bot pushed a commit to openxla/xla that referenced this pull request on Jan 20, 2025
cuda_root_path: Find cuda libraries when installed with conda packages

Imported from GitHub PR #20288

This fix emerged while looking into solving jax-ml/jax#24604. In a nutshell, the official CUDA packages for conda (in both the `conda-forge` and `nvidia` conda channels) install the CUDA libraries in a different location than PyPI packages do, so the lookup logic needs to be augmented to find the CUDA libraries when they are installed from conda packages.

I have not tested this with a tensorflow build, but it will probably also help in solving tensorflow/tensorflow#56927.

xref: conda-forge/tensorflow-feedstock#408
xref: conda-forge/jaxlib-feedstock#288
Copybara import of the project:

--
a2ce85c by Silvio Traversaro <[email protected]>:

cuda_root_path: Find cuda libraries when installed with conda packages

Merging this change closes #20288

FUTURE_COPYBARA_INTEGRATE_REVIEW=#20288 from traversaro:fixloadcudaconda a2ce85c
PiperOrigin-RevId: 717411600
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request on Jan 20, 2025
cuda_root_path: Find cuda libraries when installed with conda packages

Imported from GitHub PR openxla/xla#20288

This fix emerged while looking into solving jax-ml/jax#24604. In a nutshell, the official CUDA packages for conda (in both the `conda-forge` and `nvidia` conda channels) install the CUDA libraries in a different location than PyPI packages do, so the lookup logic needs to be augmented to find the CUDA libraries when they are installed from conda packages.

I have not tested this with a tensorflow build, but it will probably also help in solving #56927.

xref: conda-forge/tensorflow-feedstock#408
xref: conda-forge/jaxlib-feedstock#288
Copybara import of the project:

--
a2ce85cf9df1ede3f3c1843ede55d4c76673910e by Silvio Traversaro <[email protected]>:

cuda_root_path: Find cuda libraries when installed with conda packages

Merging this change closes #20288

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#20288 from traversaro:fixloadcudaconda a2ce85cf9df1ede3f3c1843ede55d4c76673910e
PiperOrigin-RevId: 717411600
copybara-service bot pushed a commit to openxla/xla that referenced this pull request on Jan 20, 2025
cuda_root_path: Find cuda libraries when installed with conda packages

Imported from GitHub PR #20288

This fix emerged while looking into solving jax-ml/jax#24604. In a nutshell, the official CUDA packages for conda (in both the `conda-forge` and `nvidia` conda channels) install the CUDA libraries in a different location than PyPI packages do, so the lookup logic needs to be augmented to find the CUDA libraries when they are installed from conda packages.

I have not tested this with a tensorflow build, but it will probably also help in solving tensorflow/tensorflow#56927.

xref: conda-forge/tensorflow-feedstock#408
xref: conda-forge/jaxlib-feedstock#288
Copybara import of the project:

--
a2ce85c by Silvio Traversaro <[email protected]>:

cuda_root_path: Find cuda libraries when installed with conda packages

Merging this change closes #20288

COPYBARA_INTEGRATE_REVIEW=#20288 from traversaro:fixloadcudaconda a2ce85c
PiperOrigin-RevId: 717440484
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request on Jan 20, 2025
cuda_root_path: Find cuda libraries when installed with conda packages

Imported from GitHub PR openxla/xla#20288

This fix emerged while looking into solving jax-ml/jax#24604. In a nutshell, the official CUDA packages for conda (in both the `conda-forge` and `nvidia` conda channels) install the CUDA libraries in a different location than PyPI packages do, so the lookup logic needs to be augmented to find the CUDA libraries when they are installed from conda packages.

I have not tested this with a tensorflow build, but it will probably also help in solving #56927.

xref: conda-forge/tensorflow-feedstock#408
xref: conda-forge/jaxlib-feedstock#288
Copybara import of the project:

--
a2ce85cf9df1ede3f3c1843ede55d4c76673910e by Silvio Traversaro <[email protected]>:

cuda_root_path: Find cuda libraries when installed with conda packages

Merging this change closes #20288

PiperOrigin-RevId: 717440484
@h-vetinari merged commit 9eb9209 into conda-forge:main on Feb 9, 2025
6 of 20 checks passed
@regro-cf-autotick-bot deleted the 2.18.0_hee22a6 branch on February 9, 2025 at 08:15