Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v2.8.0 with system LLVM #213

Closed
wants to merge 53 commits into from
Closed

v2.8.0 with system LLVM #213

wants to merge 53 commits into from

Conversation

xhochy
Copy link
Member

@xhochy xhochy commented Mar 1, 2022

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Closes #155

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe:

  • It looks like the 'tensorflow-base' output doesn't have any tests.

@xhochy
Copy link
Member Author

xhochy commented Mar 1, 2022

This is a draft to use pre-built binaries for llvm and mlir. This depends on the following PRs:

I created a script called generate_llvm.py that generates the Bazel configuration for LLVM from the one that is part of the LLVM codebase. Sadly my efforts were only partially successful as I had a lot of missing symbols at the end. Thus this include the "hack" that the Support libraries link in all static libraries and hope that the linker does --as-needed linkage correctly. If we want to keep this hack, I can probably trim the Python script a bit more down than what has been needed for the non-hacky version.

@h-vetinari @hmaarrfk @ngam @isuruf You're probably the right audience to look at this and assess whether it would be interesting to push it over the finish line.

@xhochy
Copy link
Member Author

xhochy commented Mar 1, 2022

Locally on osx-*, this seems build successfully 💚

@h-vetinari
Copy link
Member

This is a draft to use pre-built binaries for llvm and mlir.

This is amazing. ❤️

I don't yet understand the symbol and bazel config issues, but I don't see why we shouldn't merge the precursor PRs and continue iterating here; though perhaps with the one caveat that if we think this has a reasonable chance of working, then I'd prefer to have a dedicated label llvm_tf (instead of llvm_dev) and builds à la: 14.0.0.tf28 or similar.

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Mar 1, 2022

Are you able to add the dev packages to a channel and ask Azure to build them? Does this get the builds completed in time remotely?

@isuruf
Copy link
Member

isuruf commented Mar 1, 2022

You can have a look at https://github.com/conda-forge/tensorflow-feedstock/pull/78/files as well.

--as-needed doesn't work with static libraries.

@xhochy
Copy link
Member Author

xhochy commented Mar 2, 2022

You can have a look at https://github.com/conda-forge/tensorflow-feedstock/pull/78/files as well.

--as-needed doesn't work with static libraries.

Yes, the PR was my initial inspiration but sadly the LLVM Bazel definitions got a lot more complicated and diverged from what is built with CMake.

As --as-needed won't work with static libraries, I'll build my own version of it then 🤣

@xhochy
Copy link
Member Author

xhochy commented Mar 2, 2022

This is a draft to use pre-built binaries for llvm and mlir.

This is amazing. ❤️

I don't yet understand the symbol and bazel config issues, but I don't see why we shouldn't merge the precursor PRs and continue iterating here; though perhaps with the one caveat that if we think this has a reasonable chance of working, then I'd prefer to have a dedicated label llvm_tf (instead of llvm_dev) and builds à la: 14.0.0.tf28 or similar.

Would be nice to discuss that over at conda-forge/llvmdev-feedstock#148 I would also use these binaries for jaxlib but in the end that is also Tensorflow which actually builds it. Once this is discussed and merged, we can get the CI started here.

@xhochy
Copy link
Member Author

xhochy commented Mar 25, 2022

FTR: the latest commit took 582,69s user 589,93s system 13% cpu 2:29:32,04 total for conda mambabuild -m .ci_support/osx_arm64_python3.10.____cpython.yaml -c conda-forge/label/llvm_tf -c conda-forge/label/mlir_tf recipe on my M1 using an x86 mambaforge.

@xhochy
Copy link
Member Author

xhochy commented Apr 2, 2022

@conda-forge-admin please rerender

@xhochy
Copy link
Member Author

xhochy commented Apr 10, 2022

@conda-forge-admin please rerender

@xhochy xhochy marked this pull request as ready for review April 11, 2022 08:21
@xhochy
Copy link
Member Author

xhochy commented Apr 11, 2022

@conda-forge/tensorflow I would like to build this (locally). Could I get a review before I spent the CPU hours?

Copy link
Member

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a slightly-less-than-superficial pass, looking very good. Most of my comments are on the level of nits and not critical.

@@ -2,6 +2,26 @@

set -ex

if [[ "${target_platform}" != "${build_platform}" ]]; then
if [ ! -d "llvm-build" ]; then
conda create -y -p llvm-build llvmdev=14.0.0.tf280_55c71c9eac9b mlir=14.0.0.tf280_55c71c9eac9b -c conda-forge/label/llvm_tf -c conda-forge/label/mlir_tf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put the 14.0.0.tf280_55c71c9eac9b into cbc.yaml?

Also, this might benefit from a comment. I can see that in cross-compilation we're installing the conda-forge packages and moving the binaries into our build-prefix - I guess what I'm missing is that conda create [pkgs] will automatically use the build_platform as an arch?

Comment on lines +48 to +49
-e "s:\${MACOSX_SDK_VERSION}:${MACOSX_SDK_VERSION:-}:" \
-e "s:\${MACOSX_DEPLOYMENT_TARGET}:${MACOSX_DEPLOYMENT_TARGET:-}:" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent is off here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is correct. for some reason the tab character isn't rendering correclty.

recipe/gen-bazel-toolchain.sh Show resolved Hide resolved
Comment on lines +10 to +11
# TODO: Automatically extract this
llvm_commit = "55c71c9eac9bc7f956a05fa9258fad4f86565450"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing for cbc (next to 14.0.0.tf280_55c71c9eac9b)...?

# Generate LLVM build defintion

PREFIX = Path(os.environ["PREFIX"])
SYMBOL = "LLVM"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm overlooking something, but I find this a strange variable name. In the loop above you're really resolving symbols from the nm output, but calling a variable containing the string "LLLVM" a symbol is confusing me.

Comment on lines +230 to +231
# Load the BUILD file and run it with the above defined custom functions
exec(r.text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big fan of the distance between creating r and using it, especially given it's very short name.

Can we do upstream_build_bazel = r.text after the r= above and then

Suggested change
# Load the BUILD file and run it with the above defined custom functions
exec(r.text)
# Run the upstream BUILD file with the above defined custom functions
exec(upstream_build_bazel)

here?

Comment on lines +363 to +364
# Load the BUILD file and run it with the above defined custom functions
exec(r.text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here (even larger distance between definition and use)

recipe/meta.yaml Show resolved Hide resolved
recipe/meta.yaml Outdated Show resolved Hide resolved
Comment on lines +33 to +40
+ #tf_http_archive(
+ # name = "llvm-project",
+ # build_file = "//third_party/llvm:BUILD",
+ # sha256 = "22054926b426c66d8f2bc22071365df6e35f3aacf19ad943bc6167d4cae3bebb",
+ # strip_prefix = "no_valid_prefix",
+ # system_build_file = "//third_party/systemlibs:llvm.BUILD",
+ # urls = tf_mirror_urls("https://github.com/invalid/mirror/source.tar.gz"),
+ #)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commented out code doesn't seem to have existed previously - any reason we're patching it in?

@ngam
Copy link
Contributor

ngam commented Apr 19, 2022

Thanks a lot @xhochy. I am not really knowledgeable enough to review the changes here, but I will happily observe and help build when ready.

@isuruf and @hmaarrfk could you have a quick look?

@hmaarrfk, you've built the previous linux/cuda ones --- I am not sure if you have the time or energy this time around, so I am willing to figure out a solution when the time comes. Just let me know. The idea is: let's not delay building because we the lack compute resources (I will get us something to work; I don't have access to local machines, but I will figure out something if/when needed).

Note: 2.9.0-rc0 has been released a week or so ago, so we could try to get that out asap as well.

Tagging @ganand1 for visibility but also if you feel like you could offer thoughts on this PR.

@@ -3,3 +3,5 @@
# available on 10.13
MACOSX_SDK_VERSION: # [osx and x86_64]
- "10.13" # [osx and x86_64]
channel_sources:
- conda-forge/label/mlir_tf,conda-forge/label/llvm_tf,conda-forge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already forgot what mlir stood for. Could you add a sentence or two as to what these are for.

Copy link
Contributor

@hmaarrfk hmaarrfk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is seriously complicated.

Can we have a summary as to why this is useful? We don't get our builds under the CI limit for the GPU builds.

It doesn't seem like these changes can be upstreamed.

I fear like this is simply not going to be maintainable.

@xhochy
Copy link
Member Author

xhochy commented Apr 26, 2022

Can we have a summary as to why this is useful? We don't get our builds under the CI limit for the GPU builds.

Getting GPU or OSX builds running in CI would have been something that would have offset the maintenance costs here. Without that I feel I have nothing in my hands to argue why this should be maintained.

recipe/meta.yaml Outdated Show resolved Hide resolved
Co-authored-by: h-vetinari <[email protected]>
Co-authored-by: Mark Harfouche <[email protected]>
@xhochy
Copy link
Member Author

xhochy commented Apr 26, 2022

Without that I feel I have nothing in my hands to argue why this should be maintained.

Still, I won't give up here.

@hmaarrfk hmaarrfk marked this pull request as draft May 18, 2022 02:03
@xhochy xhochy closed this Jan 10, 2024
@xhochy xhochy deleted the v2.8.0-llvm branch January 10, 2024 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] unbundle LLVM?
6 participants