Distributed training never hits the `if` branch #8

mambo06 · 2023-07-18T02:07:43Z

I ran train_dist.py and added `print(param)` inside the `if` block:

```
if type(param) is torch.Tensor:
    print(param)
    dist.all_reduce(param.grad.data, op=dist.reduce_op.SUM, group=0)
```

`param` is never printed, which means `all_reduce` is never called.

TomsyPaul · 2024-12-05T08:22:41Z

I even commented out the `average_gradients(model)` call in `run()`, and there is no difference in the output!

TomsyPaul · 2024-12-19T08:03:57Z

I think this line can be commented out:
`# if type(param) is torch.Tensor:`
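
A minimal sketch of why that check probably never passes. It relies on one PyTorch fact: `model.parameters()` yields `torch.nn.Parameter` objects, and `Parameter` is a subclass of `torch.Tensor`, so an exact `type(...) is` comparison is false. The `Tensor` and `Parameter` classes below are hypothetical stand-ins so the snippet runs without PyTorch; the two helper functions are also made up for illustration.

```python
# Stand-in classes (hypothetical): they only mirror the fact that
# torch.nn.Parameter is a subclass of torch.Tensor.
class Tensor:
    def __init__(self, grad=None):
        self.grad = grad


class Parameter(Tensor):
    pass


def params_passing_exact_check(params):
    """Mimics `if type(param) is torch.Tensor:` from the issue."""
    return [p for p in params if type(p) is Tensor]


def params_passing_isinstance(params):
    """Alternative check that also accepts subclasses."""
    return [p for p in params if isinstance(p, Tensor)]


# Every element is a Parameter, as it would be for model.parameters().
params = [Parameter(grad=1.0), Parameter(grad=2.0)]
exact = params_passing_exact_check(params)
relaxed = params_passing_isinstance(params)

print(len(exact), len(relaxed))  # prints "0 2"
```

If the same holds in train_dist.py, either dropping the check or replacing it with `isinstance(param, torch.Tensor)` should let the `all_reduce` call run. A guard such as `param.grad is not None` may still be useful, since parameters that did not take part in the backward pass have no gradient.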