Help mamba decide on CUDA variant #162

Open
wolfv opened this issue Nov 4, 2021 · 27 comments

Comments

wolfv (Member) commented Nov 4, 2021

Mamba currently can't properly decide which tensorflow CUDA package to prefer, since the tensorflow package doesn't directly depend on cudatoolkit.

Only the tensorflow-base package has a cudatoolkit dependency.

However, tensorflow-base is connected to tensorflow via an exact dependency.

If we also added cudatoolkit to the tensorflow package, mamba would be able to select the best version.

Alternatively, we could remove the tensorflow variants completely and just have a single metapackage that depends on tensorflow-base {{ VERSION }} (but not exactly pinned). Then mamba could decide the tensorflow-base version more freely.
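A minimal sketch of what that metapackage option could look like (hypothetical `outputs` section; names, version jinja, and pins are illustrative, not this feedstock's actual recipe):

```yaml
# Hypothetical sketch: `tensorflow` pins tensorflow-base by version only,
# not by exact build string, so the solver is free to pick whichever
# cudatoolkit variant of tensorflow-base it prefers.
outputs:
  - name: tensorflow-base
    requirements:
      host:
        - cudatoolkit {{ cudatoolkit }}   # variant-pinned, e.g. 11.2
  - name: tensorflow
    requirements:
      run:
        - tensorflow-base {{ version }}   # loose pin: any build of this version
```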

wolfv (Member Author) commented Nov 4, 2021

What currently happens is that mamba "randomly" picks CUDA 11.1 (even though CUDA 11.2 would probably be preferred).

183amir commented Nov 4, 2021

I don't think the issue is here. Mamba should be able to handle this instead.

h-vetinari (Member) commented:

I don't think the issue is here. Mamba should be able to handle this instead.

As long as we're not compromising on the quality of the packaging (i.e. everything works as it should), I don't see why we shouldn't accommodate mamba here. Yes, it would be nicer if it were fixed in mamba, but mamba is used more and more (not least in conda-forge CI), so knowingly keeping things suboptimal needs a commensurate benefit to justify it (especially as the workaround is really not painful), IMO.

wolfv (Member Author) commented Nov 5, 2021

@183amir the problem in this case is that we have a variant package (tensorflow) that restricts the cuda version of downstream packages.
But we have no way of choosing which tensorflow package is the best without doing a global optimization.

We could change this in the following ways:

  • have only a tensorflow pyXX meta-package (without CUDA, or even without Python!); the tensorflow package would then depend on tensorflow-base 2.6 (pyXX) and variant selection would work fine
  • add a direct dependency on the same cudatoolkit as in the build string; then variant selection would also work fine :)

Global optimization, with the number of variant packages that exist in conda-forge, takes a looong time (because many branches have to be explored). I think it's good to make it as simple as possible for the solver to find the best solution.
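The second bullet could be sketched roughly like this (assuming the usual conda-forge variant setup, where `cudatoolkit` comes from `conda_build_config.yaml`; the section below is illustrative, not this feedstock's actual recipe):

```yaml
# Hypothetical: repeat the variant's cudatoolkit pin directly on the
# `tensorflow` output, so the solver can rank CUDA variants by looking
# at first-order dependencies alone.
outputs:
  - name: tensorflow
    requirements:
      run:
        - {{ pin_subpackage("tensorflow-base", exact=True) }}
        - cudatoolkit {{ cudatoolkit }}   # same version as in the build string
```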

wolfv (Member Author) commented Nov 5, 2021

I tried to write down how the sorting and solving works here: https://mamba.readthedocs.io/en/latest/advanced_usage/package_resolution.html

183amir commented Nov 5, 2021

As long as we're not compromising on the quality of the packaging (i.e. everything works as it should), I don't see why we shouldn't accommodate mamba here.

I agree.

have only tensorflow pyXX meta-package (without CUDA, or even without Python!), then the tensorflow packages depends on tensorflow-base 2.6 (pyXX) and variant selection would work fine

I don't think this is a good solution because the exact dependency on tensorflow-base is there to make sure the split packages are all installed from one build. That is, tensorflow, tensorflow-estimator, etc. are installed from one build.

Add a direct dependency to the same cudatoolkit from the build string, then variant selection would also work fine :)

I am not sure what this means. Could you please explain more? Maybe you could show what the recipe would look like.

izahn commented Nov 5, 2021

This might be a separate topic, but do we actually need both tensorflow and tensorflow-base packages? Currently the only difference between mamba create -d -n test "tensorflow=*=cuda112*" and mamba create -d -n test "tensorflow-base=*=cuda112*" is that the first one also installs tensorflow and tensorflow-estimator. The same is true for the other variants. Unless there is a common need to install tensorflow without tensorflow-estimator, that doesn't make much sense to me.

wolfv (Member Author) commented Nov 5, 2021

So right now we have the following situation:

tensorflow-py39-cuda112
  depends on:
  - python 3.9
  - tensorflow-base py39-cuda112 (heavily constrained; can only use a single tensorflow-base!)
tensorflow-py38-cuda112
  depends on:
  - python 3.8
  - tensorflow-base py38-cuda112
tensorflow-py39-cuda111
  depends on:
  - python 3.9
  - tensorflow-base py39-cuda111

So here, mamba can correctly decide for tensorflow-py39..., but it cannot select the CUDA variant, because it cannot judge by inspecting the first-order dependencies.

If we add cudatoolkit at this level, it would work fine. Otherwise, we could modify it like this:

tensorflow-py39
  depends on:
  - python 3.9
  - tensorflow-base py39*
    can be resolved to
    - tensorflow-base py39-cuda102
    - tensorflow-base py39-cuda110
    - tensorflow-base py39-cuda111
    - tensorflow-base py39-cuda112

Then we could again resolve properly, because we can decide for tensorflow-base and select the build that uses the highest cudatoolkit.

I don't know if this makes sense? It's kinda hard to explain.

Basically with the current setup we would need to look at all tensorflow packages, and then at all tensorflow-base packages to figure out which one is best.

What I am proposing is to

  • either make it possible to decide in tensorflow (by adding cudatoolkit as a dependency)
  • or leave the decision to the choice of tensorflow-base (by removing the exact pin)

Eliminating tensorflow-base would also fix this :)

hmaarrfk (Contributor) commented Nov 8, 2021

I kinda feel like we should add cudatoolkit to tensorflow. I'm not too sure why tensorflow-base still exists today. I think it was historically there to help with circular dependencies with estimator.

xhochy (Member) commented Nov 8, 2021

I think it was historically there to help with circular dependencies with estimator

Yes!

hmaarrfk (Contributor) commented Nov 8, 2021

I'm sorry, do we want to add:

  • cudatoolkit
    or
  • cudnn
  • nccl

to the host section?

wolfv (Member Author) commented Nov 8, 2021

cudatoolkit is the one whose version is in the variant, right? So that should be enough to help mamba decide.

hmaarrfk (Contributor) commented Nov 8, 2021

I guess I don't see cudatoolkit anywhere in this recipe, or in the pytorch recipe. I feel like we should make this recipe a model for other recipes to follow.

wolfv (Member Author) commented Nov 8, 2021

I think it's exported from compiler('cuda')?
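For context, this is roughly how that export works in the conda-forge pattern (a sketch under the assumption that the CUDA compiler package carries a strong `run_exports`; the exact jinja and version spelling below are illustrative, not copied from the nvcc feedstock):

```yaml
# In a recipe, the CUDA compiler is requested via the jinja helper:
requirements:
  build:
    - {{ compiler('cuda') }}   # resolves to an nvcc package for the variant

# The nvcc package itself is assumed to carry a run_exports section, roughly:
#   build:
#     run_exports:
#       strong:
#         - cudatoolkit {{ cuda_version }}.*
# so every output compiled with it gets a cudatoolkit run requirement
# injected automatically, without cudatoolkit appearing in this recipe.
```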

hmaarrfk (Contributor) commented Nov 8, 2021

Great, makes sense. Let's add that.

It seems like we should be adding it to libtensorflow_cc as well, right?

hmaarrfk (Contributor) commented Nov 8, 2021

Ok, I've folded it into the recent migration.

h-vetinari (Member) commented:

IIUC, we could get rid of tensorflow-base now that the estimator is folded into this feedstock?

How about we do this for 2.7.0?

hmaarrfk (Contributor) commented Nov 9, 2021

We could do that... if downstream packages are not depending on it.

There seem to be quite a few things on GitHub that are depending on it already, so we kinda have to carry it forward.

h-vetinari (Member) commented:

If it's still needed, we could still keep it as a compatibility output that just 1:1 installs tensorflow, ideally together with an activation script that warns to change the dependence away from tensorflow-base.

hmaarrfk (Contributor) commented Nov 9, 2021

We had tried to keep tensorflow as simply installing tensorflow-base exactly pinned, but we needed to add more constraints to it.

I'm not sure there is an easy answer to this.

h-vetinari (Member) commented:

If we don't need tensorflow-base anymore to break circular dependencies with tf-estimator, then we can just fully absorb it into tensorflow in the recipe, and then add a new output tensorflow-base as a thin wrapper around tensorflow. I can't see how that wouldn't work...

hmaarrfk (Contributor) commented Nov 9, 2021

In your sentence, swap tensorflow-base and tensorflow and that is what we had 2 weeks ago.

xhochy (Member) commented Nov 9, 2021

I introduced tensorflow-base only to get tensorflow-estimator building inside this feedstock. Will we still need that, or do we have a workaround for the cyclic dependency?

h-vetinari (Member) commented:

In your sentence, swap tensorflow-base and tensorflow and that is what we had 2 weeks ago.

You said above:

We had tried to keep tensorflow as simply installing tensorflow-base exactly pinned, but we needed to add more constraints to it.

Why not still add those constraints and then call it tensorflow / tensorflow-base (whichever way the wrapper goes)?

hmaarrfk (Contributor) commented Nov 9, 2021

Maybe I'm just confused. It might be easier if you just show me in a PR.

Can we agree on an order of priorities? I suggest:

  1. Get the migration moving forward: Migrate libprotobuf, grpcio, icu #144
  2. Build Windows (CPU only)
  3. Clean up the recipe (with what you are suggesting).

h-vetinari (Member) commented:

Can we agree on an order of priorities?

I agree that #144 is priority number 1, and as I said, I'm fine to clean up the tf-base situation together with 2.7.0. I don't think we should make these improvements dependent on Windows builds, which have a large number of known & unknown unknowns.

IOW, from your list, I suggest 1. -> 3. -> 2.

hmaarrfk (Contributor) commented:

Has this been resolved?


6 participants