To create a Docker Linux DevContainer that supports Tensorflow GPU without frustrating and complicated local installation, especially on Windows platforms.
The current version willl create the following setup:
- Python
3.11
- Tensorflow
2.15.0
- CUDA
12.2
+ cuDNN
The current version utilizes
tensorflow[and-cuda]
to install compatible CUDA/cuDNN on a regular Python container. My original and previous version used a NVIDIA CUDA image to install Python and cuDNN.For now Tensorflow
2.16.1
cannot be installed currectly. And, I still have no success to have installed TensorRT detected. Let me know if you managed to get it running!
The current version has been tested on:
- A Windows 11 gaming laptop with a built-in RTX 3070 Ti
I've tried with my another laptop connecting to a eGPU (GTX 1660 Ti) without success so far.
- An amd64 (x64) machine with a CUDA-compatible NVIDIA GPU card
- Docker engine or Docker Desktop (and setup .wslconfig to use more cores and memory than default if you are on Windows.)
- Latest version of the NVIDIA graphic card driver
- NVIDIA Container Toolkit (which is already included in Windows’ Docker Desktop)
- Visual Studio Code with DevContainer extension installed
See here for more detailed hardware and system requirements of running Tensorflow.
Note: some deep learning models require more GPU memory than others and may cause the Python kernel to crash. You may need to try setting a smaller training batch size.
Some older cards may only support older drivers thus older CUDA versions. See the CUDA Support Matrix for details, although I cannot test if
tensorflow[and-cuda]
will install older libraries automatically.
Download the repo as a .zip file, unzip then open the folder in VS Code, or to use Git:
git clone https://github.com/alankrantas/tensorflow-cuda-gpu-devcontainer.git
cd tensorflow-cuda-gpu-devcontainer
code .
Modify requirements.txt to include packages you'd like to install. ipykernel
is required for executing IPython notebook cells in VS Code.
Note: VS Code/Git may messed up the new line characters of
.devcontainer/install-dev-tools.sh
and.devcontainer/requirements
which will cause the DevContainer failed to run the scripts.Click
CRLF
on the button right and change them toLF
.
In VS Code, press F1
to bring up the Command Palette, and select Dev Containers: Open Folder in Container...
Wait until the DevContainer is up and running (nvidia-smi
is executed):
Wed May 1 03:24:54 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76.01 Driver Version: 552.22 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3070 ... On | 00000000:01:00.0 Off | N/A |
| N/A 46C P0 29W / 130W | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Although
nvidia-smi
informed us the current NVIDIA driver uses CUDA12.4
, it doesn't mean Tensorflow supports it. This is why usingtensorflow[and-cuda]
is easier to use existing NVIDIA CUDA/cuDNN images.
Then open a new terminal (Terminal
-> New Terminal
) to test if the Tensorflow detects the GPU correctly:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
If success, you should see something like
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Afterwards, simply open the folder in VS Code to restart the built container.
Test run using the example file - open autokeras-test.py
and seletct Run
-> Run Without Debugging
, or run in the terminal
python3 autokeras-test.py
Which generates a result like below:
Trial 1 Complete [00h 03m 51s]
val_loss: 0.03894084319472313
Best val_loss So Far: 0.03894084319472313
Total elapsed time: 00h 03m 51s
...
Prediction loss: 0.0315
Prediction accuracy: 0.9910
precision recall f1-score support
0 0.99 1.00 0.99 980
1 1.00 1.00 1.00 1135
2 0.99 0.99 0.99 1032
3 0.99 1.00 0.99 1010
4 0.99 0.99 0.99 982
5 0.99 0.99 0.99 892
6 1.00 0.98 0.99 958
7 0.99 0.99 0.99 1028
8 0.99 0.99 0.99 974
9 0.99 0.99 0.99 1009
accuracy 0.99 10000
macro avg 0.99 0.99 0.99 10000
weighted avg 0.99 0.99 0.99 10000
Or open autokeras-test.ipynb
, run the cells or select Run All
on top of the notebook.