Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding tensorflow_io_gcs_filesystem #17069

Closed
wants to merge 37 commits into from
Closed

Conversation

ngam
Copy link
Contributor

@ngam ngam commented Nov 24, 2021

Checklist

  • Title of this PR is meaningful: e.g. "Adding my_nifty_package", not "updated meta.yaml".
  • License file is packaged (see here for an example).
  • Source is from official source.
  • Package does not vendor other packages. (If a package uses the source of another package, they should be separate packages or the licenses of all packages need to be packaged).
  • If static libraries are linked in, the license of the static library is packaged.
  • Build number is 0.
  • A tarball (url) rather than a repo (e.g. git_url) is used in your recipe (see here for more details).
  • GitHub users listed in the maintainer section have posted a comment confirming they are willing to be listed there.
  • When in trouble, please check our knowledge base documentation before pinging a team.

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/tensorflow_io_gcs_filesystem) and found it was in an excellent condition.

@h-vetinari
Copy link
Member

Hey @ngam, just looking at this - thanks for starting this off!

I think there's a bunch of things to fix though - I can offer to commit into this PR if you make me a collaborator on your fork of staged-recipes, otherwise I'll open a new PR.

@ngam
Copy link
Contributor Author

ngam commented Nov 25, 2021

I think there's a bunch of things to fix though - I can offer to commit into this PR if you make me a collaborator on your fork of staged-recipes, otherwise I'll open a new PR.

Of course, I have just added you as a collaborator. Feel free to do anything :)

@ngam
Copy link
Contributor Author

ngam commented Nov 25, 2021

This is their build description: https://github.com/tensorflow/io/blob/master/docs/development.md#centos-7

@ngam
Copy link
Contributor Author

ngam commented Nov 25, 2021

This has a cyclic dependency on tensorflow, so we probably need to do the tensorflow / tensorflow-base trickery again inside the tensorflow-feedstock. The build script here is pip installing the wheel from PyPI.

Does it make more sense to just put in the tensorflow build as a split or whatever?

Yes but I would value @h-vetinari opinion here as he just reversed that split :D

Sounds good, I added you both as collaborators --- feel free to do whatever you'd like. I am going to note again that building locally on an ubuntu 20.04.3 goes further than this.

@h-vetinari
Copy link
Member

Yes but I would value @h-vetinari opinion here as he just reversed that split :D

Ah, I didn't realize we had a cyclic dependency issue. That's super annoying.

Is the situation the same as with tf-estimator? There, it was my understanding that the cycle can be removed (in any case, maybe my recent changes aren't even functional, and it hasn't been merged yet 😋). I just trusted your opinion on this so far ;-)

@hmaarrfk: I'm not too sure why tensorflow-base still exists today. I think it was historically there to help with circular dependencies with estimator

@xhochy: Yes!

@h-vetinari
Copy link
Member

Actually, there's an unreleased commit upstream that breaks that cyclic dependency by making tensorflow an extras_require.

@ngam
Copy link
Contributor Author

ngam commented Dec 3, 2021

Actually, there's an unreleased commit upstream that breaks that cyclic dependency by making tensorflow an extras_require.

@h-vetinari should I close this for now and go back to the drawing board in conda-forge/tensorflow-feedstock#176? I don't think I will be able to pull this off, let's be real 🥲

@h-vetinari
Copy link
Member

So I looked at the upstream situation for about half an hour now, and all that comes to mind is "oh what a tangled web we weave..."

It could be possible to build the package by backporting the commit I noted, but even then, the entire test suite requires tensorflow. I mean, I'm not actually that sure how many of the tests (especially against cloud infra) we could be running at all, but I'd rather slim down the tensorflow-build than stuff more modules into it. So I might still try to make an importable package here... 🤷
Opinions welcome.

@h-vetinari
Copy link
Member

Ah, and tools/build/configure.py also wants tensorflow. 🤦

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Dec 5, 2021

Just cross posting for clarity, I don't think that tensorfow depends on this package yet.
It seems to only be true if users want to manually opt in and set an environment variable.
https://github.com/tensorflow/tensorflow/pull/51460/files

Therefore, we should move forward with 2.7, then add this package when 2.7 is built.

@ngam
Copy link
Contributor Author

ngam commented Dec 5, 2021

@hmaarrfk I think the tests will fail without it --- so yes, in theory, it is not strictly needed. I built tf2.7 from feedstock and it was totally fine, it's just that some tests failed. I reported that here: conda-forge/tensorflow-feedstock#169

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Dec 5, 2021

I might suggest building 0.21 first, then upgrading to 0.22 later. It would likely break the circular dependency.

@ngam
Copy link
Contributor Author

ngam commented Dec 5, 2021

ERROR: Could not find a version that satisfies the requirement tensorflow<2.7.0,>=2.6.0 (from versions: none)
ERROR: No matching distribution found for tensorflow<2.7.0,>=2.6.0

@ngam
Copy link
Contributor Author

ngam commented Dec 5, 2021

@hmaarrfk does this error basically say add protobuf?

bazel-out/k8-fastbuild/bin/external/local_config_tf/include/tensorflow/core/protobuf/error_codes.pb.h:10:10: fatal error: google/protobuf/port_def.inc: No such file or directory
   10 | #include <google/protobuf/port_def.inc>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Dec 5, 2021

probably libprotobuf

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Dec 5, 2021

we likely have to specify a protobuf include path

@ngam
Copy link
Contributor Author

ngam commented Dec 5, 2021

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Dec 5, 2021

This recipe really won't be easy to build. I don't think we need it for TF 2.7 which really unpins alot of the other dependencies.

The issue is that TF (and tensorflow io) really likes to vendor dependencies, which is frowned upon at Conda-forge (though encouraged with pip+pypi).

Basically, somebody will have to audit every occurrence of something that looks like a fetch:

xtracting Bazel installation...
Starting local Bazel server and connecting to it...
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/googleapis/google-cloud-cpp/archive/v1.21.0.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/bazelbuild/rules_python/archive/38f86fb55b698c51e8510c807489c9f4e047480e.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
WARNING: Download from http://mirror.tensorflow.org/github.com/tensorflow/runtime/archive/b570a1921c9e55ac53c8972bd2bfd37cd0eb510d.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
DEBUG: /home/conda/.cache/bazel/_bazel_conda/c5d58fe02b94fd55582e27e7c3638976/external/org_tensorflow/third_party/repo.bzl:109:14: 
Warning: skipping import of repository 'aws-c-common' because it already exists.
DEBUG: /home/conda/.cache/bazel/_bazel_conda/c5d58fe02b94fd55582e27e7c3638976/external/org_tensorflow/third_party/repo.bzl:109:14: 
Warning: skipping import of repository 'aws-c-event-stream' because it already exists.
DEBUG: /home/conda/.cache/bazel/_bazel_conda/c5d58fe02b94fd55582e27e7c3638976/external/org_tensorflow/third_party/repo.bzl:109:14: 
Warning: skipping import of repository 'aws-checksums' because it already exists.
DEBUG: /home/conda/.cache/bazel/_bazel_conda/c5d58fe02b94fd55582e27e7c3638976/external/org_tensorflow/third_party/repo.bzl:109:14: 
Warning: skipping import of repository 'com_google_googleapis' because it already exists.
DEBUG: /home/conda/.cache/bazel/_bazel_conda/c5d58fe02b94fd55582e27e7c3638976/external/org_tensorflow/third_party/repo.bzl:109:14: 
Warning: skipping import of repository 'boringssl' because it already exists.
DEBUG: /home/conda/.cache/bazel/_bazel_conda/c5d58fe02b94fd55582e27e7c3638976/external/org_tensorflow/third_party/repo.bzl:109:14: 
Warning: skipping import of repository 'zlib' because it already exists.
DEBUG: /home/conda/.cache/bazel/_bazel_conda/c5d58fe02b94fd55582e27e7c3638976/external/org_tensorflow/third_party/repo.bzl:109:14: 
Warning: skipping import of repository 'snappy' because it already exists.
DEBUG: /home/conda/.cache/bazel/_bazel_conda/c5d58fe02b94fd55582e27e7c3638976/external/org_tensorflow/third_party/repo.bzl:109:14: 
Warning: skipping import of repository 'rules_python' because it already exists.
DEBUG: /home/conda/.cache/bazel/_bazel_conda/c5d58fe02b94fd55582e27e7c3638976/external/tf_runtime/third_party/cuda/dependencies.bzl:51:10: The following command will download NVIDIA proprietary software. By using the software you agree to comply with the terms of the license agreement that accompanies the software. If you do not agree to the terms of the license agreement, do not use the software.
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/grailbio/bazel-toolchain/archive/edd07e96a2ecaa131af9234d6582875d980c0ac7.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
WARNING: Download from

and remove it.

I'm also unsure of their configure command.

It might just be easier to write the bazel file yourself given how straightforward their configuration file is.

This isn't really something that I have too much time to work on right now (especially since it isn't a hard dependency yet) so i'm not sure how much I can help before the new year.

@ngam
Copy link
Contributor Author

ngam commented Dec 5, 2021

This isn't really something that I have too much time to work on right now (especially since it isn't a hard dependency yet) so i'm not sure how much I can help before the new year.

Yes, totally understand. I think it would be safe to ignore this for now. As you say, it is pretty hard; and I don't think I am personally in a position to pull it off. However, I will do my best to push this forward without disrupting the bigger project in the main pr.

@ngam ngam closed this Dec 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants