Why use float_64 for gradient #2
Comments
Hey Kaiyu (@Stonesjtu), thanks for pointing this out. There was an issue I faced when developing this prototype that led me to do it this way. Thanks!
I have tested np.float32 without a problem, and I don't quite understand what issue you ran into.
Sorry for being confusing, @Stonesjtu. The issue I mentioned was related to this line, which I wrote for an old version where there wasn't any gradient compression strategy and each worker just sent the raw gradient matrices as numpy arrays. To send a numpy array directly, mpi4py provides a series of APIs that start with a capital letter, e.g. comm.Send/comm.Recv, which transfer the array's buffer rather than pickling a Python object.

However, all of the foregoing is with respect to an old version. You're right that the new version with gradient compression works with float32. According to my test (on a cluster of 17 m4.2xlarge AWS EC2 instances, 1 parameter server + 16 workers), changing from float64 to float32 …

Thanks a lot for your contribution!
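To make the distinction concrete, here is a minimal sketch of the two mpi4py call styles. It is an illustration only: the shapes, tags, and variable names are made up and are not taken from this repo. With the uppercase calls, the sender and the preallocated receive buffer must agree on dtype and shape, which is why the dtype of the transmitted gradient matters.

```python
# Illustration only (not code from this repo): mpi4py's uppercase Send/Recv
# transmit a numpy array's raw buffer, so both ranks must agree on dtype/shape;
# the lowercase send/recv pickle an arbitrary Python object instead.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

shape, dtype = (4, 4), np.float64   # both sides must use the same dtype and shape

if rank == 0:
    grad = np.random.randn(*shape).astype(dtype)
    comm.Send(grad, dest=1, tag=11)            # buffer-based fast path
    comm.send(grad.tolist(), dest=1, tag=12)   # pickle-based, no dtype agreement needed
elif rank == 1:
    buf = np.empty(shape, dtype=dtype)         # preallocated receive buffer
    comm.Recv(buf, source=0, tag=11)
    obj = comm.recv(source=0, tag=12)
```

Run with two processes, e.g. `mpiexec -n 2 python send_recv_sketch.py` (the file name is just an example). Switching `dtype` to `np.float32` halves the bytes per message, as long as the worker and the parameter server use the same dtype on both ends.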
So, will you try …?
Actually, I think what would be interesting is to add a … If that's what you're suggesting, then yes, I'm planning on it. Please feel free to do it if you want; any PR is appreciated.
I do think simply transferring …
Hi wang,

I'm just wondering why you convert the gradient Tensor into float64. I thought it could just be float32, which should already be more accurate than SGD requires.

ps_pytorch/src/distributed_worker.py
Line 258 in 89a1cfa
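The referenced line is not reproduced here; as a hedged illustration of the kind of conversion under discussion, the following sketch shows where the dtype choice would typically appear when a PyTorch gradient is turned into a numpy buffer. The function name and the surrounding code are hypothetical, not the repo's actual distributed_worker.py.

```python
# Hypothetical sketch, not the repo's actual line 258: pull a layer's gradient
# out of a PyTorch parameter and cast it before it is shipped to the parameter
# server. float64 doubles the bytes on the wire compared with float32.
import numpy as np
import torch

def gradient_as_numpy(param, dtype=np.float64):
    """Return the parameter's gradient as a contiguous numpy array in `dtype`."""
    grad = param.grad.detach().cpu().numpy()
    return np.ascontiguousarray(grad, dtype=dtype)

# Tiny usage example: float32 halves the communication volume and is normally
# precise enough for SGD.
layer = torch.nn.Linear(8, 4)
layer(torch.randn(2, 8)).sum().backward()
buf64 = gradient_as_numpy(layer.weight)                    # 8 bytes per element
buf32 = gradient_as_numpy(layer.weight, dtype=np.float32)  # 4 bytes per element
```

Whichever dtype is chosen here has to match the buffer the receiving side preallocates for `Recv`, which is the coupling discussed in the comments above.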