Running on GPU: CUDA_ERROR_NO_DEVICE #44
Does it use the GPU when you run your training code locally? You can also try to set CUDA_VISIBLE_DEVICES to see if that changes anything.
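A minimal local check along these lines, assuming TensorFlow 2.x and a single GPU with index 0 (both assumptions, adjust for your setup):

```python
import os

# Assumption: one local GPU with index 0; set this before TensorFlow initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf

# An empty list here reproduces the CUDA_ERROR_NO_DEVICE symptom locally.
print(tf.config.list_physical_devices("GPU"))
```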
I would also execute:
Hello, could it be a mismatch between the installed CUDA version and the TensorFlow version? For example, CUDA 9.0 installed alongside a TensorFlow build that requires CUDA 10.0.
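One rough way to compare the two, assuming a recent TensorFlow 2.x release (tf.sysconfig.get_build_info() is not available in older versions):

```python
import tensorflow as tf

build = tf.sysconfig.get_build_info()
print("TensorFlow:", tf.__version__)
print("CUDA version TF was built against:", build.get("cuda_version"))
print("cuDNN version TF was built against:", build.get("cudnn_version"))
# Compare with the toolkit actually installed on the node, e.g. `nvcc --version`.
```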
@fhoering, as suggested, I've used the interactive mode, unfortunately without success. Software installed on the GPU Datanode:
Yarn container log:
Hi @akimboyko,
I downloaded the pex file from HDFS and executed it on the Datanode.
It works, and this is the output:
@cguegi
Hello
I have a properly configured GPU node with Nvidia/CUDA drivers as well as the CUDA toolkit.
nvidia-smi as well as CUDA samples such as deviceQuery and bandwidthTest run successfully.
TensorFlow, executed locally, detects the GPU device with:
python -c "import tensorflow as tf;tf.config.list_physical_devices('GPU')"
As described here, the YARN node label “gpu” exists and is associated with the above node.
For test purposes I modified keras_example.py as follows:
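(The modified code is not preserved in this thread. Purely as an illustrative sketch, not the actual change, a GPU-visibility check added to the example could look like the snippet below; the helper name log_gpu_visibility is an assumption.)

```python
import logging

import tensorflow as tf

logger = logging.getLogger(__name__)

def log_gpu_visibility():
    # Hypothetical helper: log which physical GPUs TensorFlow sees inside the worker.
    gpus = tf.config.list_physical_devices("GPU")
    logger.info("Physical GPUs visible to this worker: %s", gpus)
    logger.info("TensorFlow built with CUDA support: %s", tf.test.is_built_with_cuda())
```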
The worker.log shows that no GPU has been detected:
Neither nvidia-smi nor YARN RM ui show processes on the GPU. Hence the CPU is used for processing.
Any ideas or hints how to further debug and solve this issue?
Many thanks in advance!