Apple Silicon GPU compatibility for Tensorflow #2184
Thanks @comane for this! Good to see that this is indeed what's needed to make it run.
Out of curiosity, how is the performance (how many replicas could you run, etc.)?
.. code-block:: bash

   conda install -c apple tensorflow-deps
   pip install tensorflow-macos==2.13.0 tensorflow-metal wandb==0.15.9
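After installing these packages, a quick way to confirm that TensorFlow actually sees the Metal GPU is to list the physical devices. This is a minimal sketch and not part of the PR: the helper name `list_gpus` is hypothetical, while `tf.config.list_physical_devices` is the standard TensorFlow call.

```python
import importlib.util


def list_gpus():
    """Return the names of the GPUs TensorFlow can see, or None if TensorFlow is missing.

    `list_gpus` is a hypothetical helper name used only for this illustration.
    """
    # tensorflow-macos still installs under the module name "tensorflow".
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    # An empty list means TensorFlow is installed but no GPU device was registered
    # (for example, if the tensorflow-metal plugin is not active).
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow is installed but no GPU is visible")
    else:
        print("Visible GPUs:", gpus)
```

An empty list here would suggest the fit silently falls back to CPU, which is worth ruling out before comparing timings.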
Why is it necessary to really pin this version?
It is related to this issue here: wandb/wandb#5935
I was not able to make it run on Mac M2 GPUs with other versions.
Could you also add a reference to that issue in the docs?
The performance is not really good; at least on my laptop, it takes longer than using CPUs. But this might be different for someone else with a more powerful Mac.
Out of curiosity, does it not work on M3 at all, or did you only have M1 and M2 to test?
I only tested on M2, but the above-mentioned issue is for M1. I assume it works on M3 as well.
When you say longer, how much longer is it? With how many replicas? (Maybe you are hitting a memory bottleneck?) But in any case, if it is not 4/5 times slower I'd say that's still good, because you get all the replicas at the same time.
If @ecole41 can test it that would be great. I'd suggest anyway changing from M1/M2 to something along the lines of "Apple Silicon".
I would say even 4/5 is still good. In my case, I can run entire fits in ~3 hours on my desktop's GPU, while a single replica takes about 40 minutes. It's about 5 times longer, but when the cluster is busy it is the difference between having the fits ready the same morning or one day later.
That's absolutely true! My threshold was really pessimistic 😅
Running with 10 replicas only on GPUs takes 15 minutes to get to epoch 4400 / 17000. If I run the same thing on CPU (still on my laptop) it takes 2 min 45 sec.
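To put those two timings side by side, here is the arithmetic as a small sketch. The measured numbers (15 min on GPU, 2 min 45 s on CPU, epoch 4400 of 17000) come from the comment above; the extrapolation to the full 17000 epochs assumes the time per epoch stays constant, which was not measured.

```python
# Timings quoted in the comment above, converted to seconds.
gpu_seconds = 15 * 60          # 15 min on GPU to reach epoch 4400
cpu_seconds = 2 * 60 + 45      # 2 min 45 s on CPU for the same run
epochs_done = 4400
epochs_total = 17000

# How many times slower the GPU run was on this laptop.
slowdown = gpu_seconds / cpu_seconds
print(f"GPU / CPU time ratio: {slowdown:.1f}")  # 5.5

# ASSUMPTION: constant time per epoch, so the total scales linearly with epochs.
scale = epochs_total / epochs_done
gpu_total_minutes = gpu_seconds * scale / 60
cpu_total_minutes = cpu_seconds * scale / 60
print(f"Projected full run on GPU: {gpu_total_minutes:.0f} min")  # 58 min
print(f"Projected full run on CPU: {cpu_total_minutes:.1f} min")  # 10.6 min
```

So on this laptop the GPU run is roughly 5.5 times slower than the CPU run for the same 10 replicas, unlike the NVIDIA case discussed earlier where the GPU runs all replicas at once for about 5 times the cost of one.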
@scarlehoff when you say on your desktop, do you mean macOS?
Nope, a Linux desktop with an NVIDIA GPU (at some point I tried it with an AMD one as well and it worked, fwiw).
Thanks for testing this (and adding it to the docs!!!)
Hello @scarlehoff @comane @Radonirinaunimi, I tested this on M3 GPUs and it worked, but the performance on GPUs was much slower than on CPUs on my Mac.
On CPU did you also run 100 replicas, or is this 1 replica on CPU vs 100 on GPU?
100 replicas on both GPU and CPU.
Is that the timing only for the 200 epochs or does it include overhead?
I think it includes overhead. But in any case, it seems that running a fit on a Mac is not really going to be doable just yet :( Maybe there's some low-hanging fruit to improve it, but I'm not sure the effort is worth it.
This pull request includes updates to the `doc/sphinx/source/n3fit/runcard_detailed.rst` file to clarify instructions for running parallel models and using GPUs on M1/M2 Macs.

Updates to parallel model instructions:

- `savepseudodata` must be set to `false` in the `fitting` section of the runcard to run with parallel models. (`doc/sphinx/source/n3fit/runcard_detailed.rst`)

Updates for GPU usage on M1/M2 Macs:

- Packages required (`tensorflow-deps`, `tensorflow-macos`, `tensorflow-metal`, and `wandb`) to run replicas in parallel using GPUs on M1/M2 Macs. (`doc/sphinx/source/n3fit/runcard_detailed.rst`)
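The runcard constraint in the summary can be illustrated with a small check on an already-parsed runcard. This is only a sketch: the helper `check_parallel_runcard` is hypothetical and not part of n3fit, and the top-level `parallel_models` key is an assumption about how parallel running is switched on; only `fitting` and `savepseudodata` are named in this PR.

```python
def check_parallel_runcard(runcard):
    """Return a list of problems for running this runcard with parallel models.

    Hypothetical helper for illustration only; `runcard` is a plain dict,
    e.g. the result of loading the YAML runcard.
    """
    problems = []
    # ASSUMPTION: parallel running is enabled by a top-level `parallel_models` key.
    if not runcard.get("parallel_models", False):
        return problems

    fitting = runcard.get("fitting", {})
    # The PR states that `savepseudodata` must be set to false, so anything
    # other than an explicit False (including a missing key) is flagged.
    if fitting.get("savepseudodata") is not False:
        problems.append("fitting.savepseudodata must be set to false for parallel models")
    return problems


ok = {"parallel_models": True, "fitting": {"savepseudodata": False}}
bad = {"parallel_models": True, "fitting": {"savepseudodata": True}}
print(check_parallel_runcard(ok))   # []
print(check_parallel_runcard(bad))
```

Flagging a missing key is a deliberately strict choice, since the default value of `savepseudodata` is not stated in this conversation.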