On tensorflow and incompatible protobuf version #288

Open
1 task done
drasmuss opened this issue Dec 5, 2022 · 16 comments

Comments

@drasmuss

drasmuss commented Dec 5, 2022

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

Installing tensorflow-gpu=2.10 results in the protobuf dependency being installed with an incompatible version (4.x; tensorflow is only compatible with 3.x).

Note that installing tensorflow=2.10 (rather than tensorflow-gpu) results in a compatible 3.x protobuf version being installed, as does installing an older version (e.g., tensorflow-gpu=2.8).

The installation with a (seemingly) incompatible protobuf version does actually seem to work, as long as you leave everything as is. But if the tensorflow installation ever gets triggered again for some reason (e.g. in a later step of an installation pipeline), it will downgrade protobuf from 4.x to 3.x, and then the tensorflow installation will be broken.
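A minimal sketch for checking which protobuf series ended up in an environment, using only the standard library (`importlib.metadata`). It assumes nothing beyond what this issue states, namely that tensorflow 2.10 declares compatibility with protobuf 3.x only; the helper names are hypothetical.

```python
import importlib.metadata
from typing import Optional


def protobuf_major(version: str) -> int:
    """Major component of a version string such as '4.21.10'."""
    return int(version.split(".", 1)[0])


def is_3x_protobuf(version: str) -> bool:
    """True for the 3.x series that tensorflow 2.10 declares compatibility with."""
    return protobuf_major(version) == 3


def installed_protobuf_version() -> Optional[str]:
    """Version of the installed 'protobuf' distribution, or None if it is missing."""
    try:
        return importlib.metadata.version("protobuf")
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_protobuf_version()
    if version is None:
        print("protobuf is not installed")
    elif is_3x_protobuf(version):
        print(f"protobuf {version}: 3.x series")
    else:
        print(f"protobuf {version}: not 3.x, so a later pip step may try to downgrade it")
```

Running this before and after a later `pip install` step shows whether protobuf was downgraded in between.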

To reproduce:

conda create -n tmp python=3.9
conda activate tmp
conda install -c conda-forge tensorflow-gpu=2.10
pip install tensorflow
python -c "import tensorflow"

Installed packages

_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
absl-py                   1.3.0              pyhd8ed1ab_0    conda-forge
aiohttp                   3.8.3            py39hb9d737c_1    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.2              pyhd8ed1ab_0    conda-forge
attrs                     22.1.0             pyh71513ae_1    conda-forge
blinker                   1.5                pyhd8ed1ab_0    conda-forge
brotlipy                  0.7.0           py39hb9d737c_1005    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.9.24            ha878542_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.2.0              pyhd8ed1ab_0    conda-forge
certifi                   2022.9.24          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py39he91dace_2    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
cryptography              38.0.4           py39hd97740a_0    conda-forge
cudatoolkit               11.8.0              h37601d7_11    conda-forge
cudnn                     8.4.1.50             hed8a83a_0    conda-forge
flatbuffers               2.0.7                h27087fc_0    conda-forge
frozenlist                1.3.3            py39hb9d737c_0    conda-forge
gast                      0.4.0              pyh9f0ad1d_0    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
google-auth               2.15.0             pyh1a96a4e_0    conda-forge
google-auth-oauthlib      0.4.6              pyhd8ed1ab_0    conda-forge
google-pasta              0.2.0              pyh8c360ce_0    conda-forge
grpc-cpp                  1.47.1               h05bd8bd_7    conda-forge
grpcio                    1.47.1           py39h712372c_7    conda-forge
h5py                      3.7.0           nompi_py39h817c9c5_102    conda-forge
hdf5                      1.12.2          nompi_h2386368_100    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        5.1.0              pyha770c72_0    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
keras                     2.10.0           py39h06a4308_0
keras-preprocessing       1.1.2              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.19.3               h3790be6_0    conda-forge
ld_impl_linux-64          2.38                 h1181459_1
libabseil                 20220623.0      cxx17_h48a1fff_5    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcurl                   7.86.0               h7bff187_1    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.4.2                h6a678d5_6
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libnghttp2                1.47.0               hdcd2b5c_1    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libprotobuf               3.21.10              h6239696_0    conda-forge
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               haa6b8db_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
llvm-openmp               15.0.6               he0ac6c6_0    conda-forge
markdown                  3.4.1              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.1            py39hb9d737c_2    conda-forge
multidict                 6.0.2            py39hb9d737c_2    conda-forge
nccl                      2.14.3.1             h0800d71_0    conda-forge
ncurses                   6.3                  h5eee18b_3
numpy                     1.23.5           py39h3d75532_0    conda-forge
oauthlib                  3.2.2              pyhd8ed1ab_0    conda-forge
openssl                   1.1.1s               h0b41bf4_1    conda-forge
opt_einsum                3.3.0              pyhd8ed1ab_1    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pip                       22.2.2           py39h06a4308_0
protobuf                  4.21.10          py39h5a03fae_0    conda-forge
pyasn1                    0.4.8                      py_0    conda-forge
pyasn1-modules            0.2.7                      py_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyjwt                     2.6.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 22.1.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.9.15               h7a1cb2a_2
python-flatbuffers        2.0                pyhd8ed1ab_0    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pyu2f                     0.1.5              pyhd8ed1ab_0    conda-forge
re2                       2022.06.01           h27087fc_1    conda-forge
readline                  8.2                  h5eee18b_0
requests                  2.28.1             pyhd8ed1ab_1    conda-forge
requests-oauthlib         1.3.1              pyhd8ed1ab_0    conda-forge
rsa                       4.9                pyhd8ed1ab_0    conda-forge
scipy                     1.9.3            py39hddc5342_2    conda-forge
setuptools                65.5.0           py39h06a4308_0
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.9                hbd366e4_2    conda-forge
sqlite                    3.40.0               h5082296_0
tensorboard               2.10.0           py39h06a4308_0
tensorboard-data-server   0.6.1            py39hd97740a_4    conda-forge
tensorboard-plugin-wit    1.8.1              pyhd8ed1ab_0    conda-forge
tensorflow                2.10.0          cuda112py39h01bd6f0_0    conda-forge
tensorflow-base           2.10.0          cuda112py39h2957820_0    conda-forge
tensorflow-estimator      2.10.0          cuda112py39hd320b7a_0    conda-forge
tensorflow-gpu            2.10.0          cuda112py39h0bbbad9_0    conda-forge
termcolor                 2.1.1              pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h1ccaba5_0
typing-extensions         4.4.0                hd8ed1ab_0    conda-forge
typing_extensions         4.4.0              pyha770c72_0    conda-forge
tzdata                    2022f                h04d1e81_0
urllib3                   1.26.13            pyhd8ed1ab_0    conda-forge
werkzeug                  2.2.2              pyhd8ed1ab_0    conda-forge
wheel                     0.37.1             pyhd3eb1b0_0
wrapt                     1.14.1           py39hb9d737c_1    conda-forge
xz                        5.2.8                h5eee18b_0
yarl                      1.8.1            py39hb9d737c_0    conda-forge
zipp                      3.11.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge

Environment info

active environment : tmp
    active env location : /home/drasmuss/miniconda3/envs/tmp
            shell level : 2
       user config file : /home/drasmuss/.condarc
 populated config files :
          conda version : 22.9.0
    conda-build version : not installed
         python version : 3.9.5.final.0
       virtual packages : __cuda=12.0=0
                          __linux=5.15.74.2=0
                          __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/drasmuss/miniconda3  (writable)
      conda av data dir : /home/drasmuss/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/drasmuss/miniconda3/pkgs
                          /home/drasmuss/.conda/pkgs
       envs directories : /home/drasmuss/miniconda3/envs
                          /home/drasmuss/.conda/envs
               platform : linux-64
             user-agent : conda/22.9.0 requests/2.28.1 CPython/3.9.5 Linux/5.15.74.2-microsoft-standard-WSL2 ubuntu/20.04.5 glibc/2.31
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False
@drasmuss drasmuss added the bug label Dec 5, 2022
@hmaarrfk
Contributor

hmaarrfk commented Dec 5, 2022

Please try to create an environment following the instructions on the conda-forge.org main website:
https://conda-forge.org/

For one, you are "mixing" defaults and conda-forge channel in your environment.

Does the environment created with the following command work:

mamba create --name cftf tensorflow=2.10*=cuda* python=3.9 --channel conda-forge --override-channels
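In the `conda list` output in the original report, packages from the defaults channel (for example keras, which the solver plan later in this thread shows as coming from `pkgs/main`) have an empty channel column, while conda-forge packages list `conda-forge`. A minimal sketch, assuming that column layout, that groups an environment's packages by channel so this kind of mixing is easy to spot; the function names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class CondaEntry(NamedTuple):
    name: str
    version: str
    build: str
    channel: Optional[str]  # None when the channel column is empty


def parse_conda_list(text: str) -> List[CondaEntry]:
    """Parse `conda list` rows of the form: name version build [channel]."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        if len(fields) == 3:
            name, version, build = fields
            channel = None
        elif len(fields) == 4:
            name, version, build, channel = fields
        else:
            continue  # unexpected layout: ignore rather than guess
        entries.append(CondaEntry(name, version, build, channel))
    return entries


def channels_in_use(entries: List[CondaEntry]) -> Dict[str, List[str]]:
    """Group package names by channel; an empty column is reported as 'defaults'."""
    groups: Dict[str, List[str]] = {}
    for entry in entries:
        groups.setdefault(entry.channel or "defaults", []).append(entry.name)
    return groups


if __name__ == "__main__":
    # Two rows copied from the listing in the original report.
    sample = (
        "keras                     2.10.0           py39h06a4308_0\n"
        "protobuf                  4.21.10          py39h5a03fae_0    conda-forge\n"
    )
    print(channels_in_use(parse_conda_list(sample)))
    # {'defaults': ['keras'], 'conda-forge': ['protobuf']}
```

More than one key in the result means the environment mixes channels.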

@hmaarrfk hmaarrfk added question and removed bug labels Dec 5, 2022
@hmaarrfk
Contributor

hmaarrfk commented Dec 5, 2022

I guess "tensorflow thinks" it is only compatible with 3.x, but we haven't found any usability issues moving to 4.x.

@hmaarrfk
Contributor

hmaarrfk commented Dec 5, 2022

Are you hitting a bug/crash with the environment as is?

@drasmuss
Author

drasmuss commented Dec 5, 2022

The problem is related to this part:

But if the tensorflow installation ever gets triggered again for some reason (e.g. in some later step of an installation pipeline), then it will downgrade the protobuf version from 4.x to 3.x, and then the tensorflow installation will be broken.

Since tensorflow thinks that it is only compatible with 3.x, it's very easy for any other installation steps you do through pip or something like that (e.g. installing some other package that has a tensorflow dependency) to trigger protobuf 3.x to be installed. And as soon as that happens, the tensorflow installation will be broken (because it was built with 4.x).

@drasmuss
Author

drasmuss commented Dec 5, 2022

Does the environment created with the following command work:
mamba create --name cftf tensorflow=2.10*=cuda* python=3.9 --channel conda-forge --override-channels

This shows the same behaviour of installing protobuf 4.x, but differs in that doing pip install tensorflow no longer downgrades protobuf (it stays at 4.x). So I think that mostly fixes things: as long as protobuf is less likely to get downgraded later on, it's probably fine to install a version that is technically incompatible but doesn't seem to cause any actual issues.

@hmaarrfk
Contributor

hmaarrfk commented Dec 5, 2022

Can you give an example of a package that would depend on tensorflow but trigger protobuf to be updated?

We run pip check in our recipe to ensure consistency.
https://github.com/conda-forge/tensorflow-feedstock/blob/main/recipe/meta.yaml#L250
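`pip check` reports installed packages whose declared requirements are not satisfied by what is installed, which makes it a quick way to see whether pip considers the current protobuf compatible. A minimal sketch that runs it through the current interpreter and parses the conflict lines; the regular expression assumes the `<pkg> <version> has requirement <req>, but you have <dep> <version>.` wording of recent pip releases, so treat that format as an assumption. The helper names are hypothetical.

```python
import re
import subprocess
import sys
from typing import List, NamedTuple, Optional

# Assumed shape of a pip check conflict line:
# "<pkg> <ver> has requirement <req>, but you have <dep> <dep-ver>."
_CONFLICT = re.compile(
    r"^(?P<pkg>\S+) (?P<version>\S+) has requirement (?P<requirement>.+), "
    r"but you have (?P<dep>\S+) (?P<dep_version>\S+)\.$"
)


class Conflict(NamedTuple):
    package: str
    version: str
    requirement: str
    installed: str
    installed_version: str


def parse_conflict(line: str) -> Optional[Conflict]:
    """Return the parsed conflict, or None if the line is not a conflict report."""
    match = _CONFLICT.match(line.strip())
    if match is None:
        return None
    return Conflict(*match.group("pkg", "version", "requirement", "dep", "dep_version"))


def pip_check_conflicts() -> List[Conflict]:
    """Run `python -m pip check` for the current interpreter and collect conflicts."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
        check=False,  # pip check exits non-zero when it finds problems
    )
    parsed = (parse_conflict(line) for line in result.stdout.splitlines())
    return [conflict for conflict in parsed if conflict is not None]


if __name__ == "__main__":
    for conflict in pip_check_conflicts():
        print(f"{conflict.package} wants {conflict.requirement}, "
              f"found {conflict.installed} {conflict.installed_version}")
```

Lines that report missing packages rather than version conflicts are deliberately ignored by this sketch.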

Mixing pip and conda is challenging: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html#installing-non-conda-packages

Maybe you can give us a concrete example of how things fail.

@hmaarrfk
Contributor

hmaarrfk commented Dec 5, 2022

I guess it seems that we aren't providing the stubs that tensorflow actually expects.

@drasmuss
Author

drasmuss commented Dec 5, 2022

Can you give an example of a package that would depend on tensorflow but trigger protobuf to be updated?

Here is an example of this happening in the real world https://github.com/nengo/nengo-dl/actions/runs/3621436928/jobs/6104897544#step:5:1524. This is part of a large, somewhat complicated CI pipeline using a mix of libraries, so it isn't easy to control all parts to ensure things are only being installed in the cleanest way.

@hmaarrfk
Contributor

hmaarrfk commented Dec 5, 2022

And you are sure it isn't being triggered by: https://github.com/nengo/nengo-dl/blob/master/setup.py#L80 ??

@drasmuss
Author

drasmuss commented Dec 5, 2022

Yes, the same thing happens with or without that requirement (and the pip installation step in question that is downgrading protobuf doesn't involve that nengo-dl package: https://github.com/nengo/nengo-dl/actions/runs/3621436928/jobs/6104897544#step:5:1524).

@hmaarrfk
Contributor

hmaarrfk commented Dec 5, 2022

CONDA_OVERRIDE_CUDA=11.2 mamba create --name cftf tensorflow=2.10*=cuda* python=3.9 --channel conda-forge --override-channels --yes | tee log.txt
mamba activate cftf
pip install tensorflow | tee -a log.txt

But pip list still reports

protobuf                4.21.10

log.txt

So there must be some kind of dependency pulling protobuf down.

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Dec 5, 2022

CONDA_OVERRIDE_CUDA=11.2

I needed this flag since the computer I used doesn't have a CUDA GPU.

@drasmuss
Copy link
Author

drasmuss commented Dec 5, 2022

Did a bit more digging. I think the issue is actually triggered by tensorboard. If you create an environment with flexible channel priority, e.g.

conda config --set channel_priority flexible
conda create -n tmp python=3.9 tensorflow-gpu=2.10 -c conda-forge

then you end up with tensorboard installed from the main channel (rather than conda-forge):

tensorboard        pkgs/main/linux-64::tensorboard-2.10.0-py39h06a4308_0 None

Whereas if you switch the above to conda config --set channel_priority strict, or set --override-channels, you end up with

tensorboard        conda-forge/noarch::tensorboard-2.10.1-pyhd8ed1ab_0 None

Then when you do pip install tensorflow this leads to different outcomes. I don't know enough about the details of pip/conda to say exactly what's going on. But I'm guessing that something about the first case (perhaps because it's 2.10.0 instead of 2.10.1) causes pip to re-examine the tensorboard requirements. And even though it doesn't end up reinstalling tensorboard (since 2.10.0/2.10.1 both satisfy the tensorflow requirements), it does notice that there is an "incompatible" version of protobuf installed, which causes the downgrade to 3.x.

So to sum up, setting channel_priority strict does make the problem go away. But without understanding what triggers pip to "re-examine" the protobuf requirements, I'm not sure whether that's a relatively permanent fix or just dodges this particular situation.

In any case, appreciate your time looking into this!

@hmaarrfk
Copy link
Contributor

hmaarrfk commented Dec 5, 2022

Thank you for digging into this.

That "flexible" solve should have resulted in tensorboard being installed from our channel, but maybe we are out of sync. That said, it is quite "random" what the solver might find.

I see that the tensorboard feedstock is at 2.11, while tensorflow (2.10 on conda-forge) requires tensorboard 2.10, which may not be fully up to date (even though nothing specific comes to mind).

I unfortunately do not have a quick answer for you.

Maybe you can try to use strict priorities? From my experience, the issue will only get worse if you continue to "mix" channels. But that is just my opinion.

@hmaarrfk
Contributor

hmaarrfk commented Dec 5, 2022

It seems that it may also be a difference between conda and mamba. I have not been using conda in a while since it is rather slow.

I can recreate the effect with:

conda config --set channel_priority flexible
CONDA_OVERRIDE_CUDA=12.0 conda create --name cftf tensorflow-gpu=2.10 python=3.9 --channel conda-forge --channel defaults --override-channels

I notice that the following packages are picked out from main:

tensorboard        pkgs/main/linux-64::tensorboard-2.10.0-py39h06a4308_0 None
keras              pkgs/main/linux-64::keras-2.10.0-py39h06a4308_0 None
libprotobuf        conda-forge/linux-64::libprotobuf-3.21.10-h6239696_0 None
tensorflow-gpu     conda-forge/linux-64::tensorflow-gpu-2.10.0-cuda112py39h0bbbad9_0 None
protobuf           conda-forge/linux-64::protobuf-4.21.10-py39h5a03fae_0 None
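The solver plan lines quoted in this thread (`channel/subdir::name-version-build`) make it easy to compare the flexible and strict solves programmatically. A minimal sketch that splits such a line into its parts; it assumes that layout (the second whitespace-separated field is the spec, and the trailing `None` is ignored), and the helper names are hypothetical.

```python
from typing import NamedTuple


class PlannedPackage(NamedTuple):
    channel: str
    subdir: str
    name: str
    version: str
    build: str


def parse_plan_spec(spec: str) -> PlannedPackage:
    """Split 'channel/subdir::name-version-build' into its components."""
    location, _, dist = spec.partition("::")
    # The channel may itself contain a slash (pkgs/main), so split from the right.
    channel, _, subdir = location.rpartition("/")
    # Package names may contain dashes (tensorflow-gpu), so split from the right too.
    name, version, build = dist.rsplit("-", 2)
    return PlannedPackage(channel, subdir, name, version, build)


def parse_plan_line(line: str) -> PlannedPackage:
    """Parse a plan line such as 'tensorboard   pkgs/main/linux-64::tensorboard-... None'."""
    return parse_plan_spec(line.split()[1])


if __name__ == "__main__":
    for line in [
        "tensorboard        pkgs/main/linux-64::tensorboard-2.10.0-py39h06a4308_0 None",
        "tensorflow-gpu     conda-forge/linux-64::tensorflow-gpu-2.10.0-cuda112py39h0bbbad9_0 None",
    ]:
        pkg = parse_plan_line(line)
        print(pkg.name, pkg.version, "from", pkg.channel)
    # tensorboard 2.10.0 from pkgs/main
    # tensorflow-gpu 2.10.0 from conda-forge
```

Filtering the parsed plan for any channel other than `conda-forge` highlights exactly the packages (tensorboard and keras here) that the flexible solve pulled from main.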

It may be that, for keras, it is preferring the architecture-specific package on the main channel over the noarch package on the conda-forge channel. I'm not sure how the solver should behave:

  1. Choose the package from the higher priority channel that is noarch
  2. Choose the package from the lower priority channel that has potential optimizations for the architecture at hand.

Unfortunately, because the main channel is included, I don't think we can do very much at the feedstock stage to help with this issue.

We do want to allow users to mix and match (I do so myself for my use case); however, it is up to users to be careful when they mix in packages from outside conda-forge.

In your case, it would mean using strict channel priorities.

@hmaarrfk hmaarrfk changed the title tensorflow-gpu 2.10 installs incompatible protobuf version On tensorflow and incompatible protobuf version Dec 5, 2022
@h-vetinari
Member

h-vetinari commented Dec 5, 2022

The issue is that protobuf changed to a very weird version scheme, where the minor number is the main number, and the major number for the C++ lib stayed the same (3) while the major number for python got bumped (4).
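Under this scheme, the `protobuf` 4.21.10 and `libprotobuf` 3.21.10 entries in the `conda list` output of the original report belong to the same release. A minimal sketch of that comparison, assuming plain `X.Y.Z` version strings and comparing only the components after the major number; the function names are hypothetical.

```python
def release_part(version: str) -> str:
    """Everything after the major component: '4.21.10' -> '21.10'."""
    return version.split(".", 1)[1]


def same_protobuf_release(python_version: str, cpp_version: str) -> bool:
    """True if the Python package and the C++ library come from the same release.

    The major numbers may legitimately differ (4 for Python, 3 for C++),
    so only the minor and patch components are compared.
    """
    return release_part(python_version) == release_part(cpp_version)


if __name__ == "__main__":
    # Versions taken from the conda list output in the original report.
    print(same_protobuf_release("4.21.10", "3.21.10"))  # True
```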

Protobuf is hard to distribute, so it's pinned quite tightly in the pip metadata; however, this is not necessary, because compatibility "just" depends on correctly recompiling the code. This is something you wouldn't ask of your average user, but conda-forge can do it, and in fact must, because we need to rebuild our whole ecosystem for a consistent protobuf version in order to be able to use shared libraries.

Since we still need to follow upstream versioning to keep things manageable, we simply moved past the point where this major version bump happened (3.20 -> 4.21), and for conda-forge it was entirely uneventful.

The answer here is: don't mix channels, and certainly don't install anything with pip. If you cannot help the latter, then patch the metadata or use something like --no-deps. Better still, open a PR in https://www.github.com/conda-forge/staged-recipes to bring potentially missing libraries to conda-forge, which would avoid the problem completely.
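When a pip install cannot be avoided, `pip install --no-deps` installs only the named packages and leaves their dependencies, such as the conda-provided protobuf, untouched. A minimal sketch that runs it through the current interpreter; the helper names are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence


def pip_install_no_deps_cmd(packages: Sequence[str]) -> List[str]:
    """Build a pip command that installs only `packages`, skipping their dependencies."""
    return [sys.executable, "-m", "pip", "install", "--no-deps", *packages]


def pip_install_no_deps(packages: Sequence[str]) -> None:
    """Run the command; raises subprocess.CalledProcessError if pip fails."""
    subprocess.run(pip_install_no_deps_cmd(packages), check=True)
```

With `--no-deps`, every dependency the package needs must already be provided by conda; running `pip check` afterwards is one way to confirm that nothing is missing or mismatched.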
