- Package initialized.
- Initial release.
- Dramatically improved documentation.
- Added channel-distributed convolutional layer (a conceptual sketch appears after this list).
- Abstracted the convolutional layer interface; it now auto-selects an appropriate implementation.
- Added pre-forward hooks so that communication buffers are allocated only when the shape of the input tensor changes (the hook pattern is sketched after this list).
- Improved general consistency of layer structure and member names.
- Corrected use of dtype in internal buffers.
- Cleaned up partition API.
- Fixed a bug where MPI resources were not released.
- Removed assumption that transpose requires load-balanced input.
- Added smarter buffer re-use.
- Added distributed batch normalization layer (sketched after this list).
- Added distributed upsampling (interpolation) layer.
- Reorganized code to follow standard PyTorch naming conventions.
- Fixed bugs related to invalid convolution arguments.
- Improved convolution and pooling implementations to reduce constraints on inputs.
- Added all-sum-reduce.
- Added distributed loss functions (both are sketched after this list).
- Added initial GPU support for the MPI backend (experimental).
- Moved from Travis CI to GitHub Actions.
- Multiple documentation fixes.
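
The channel-distributed convolution can be illustrated with a minimal sketch. Assuming input channels are split evenly across workers and a `torch.distributed` process group is already initialized, each worker convolves its shard and the partial outputs are summed. `ChannelDistributedConv2d` is a hypothetical name for illustration, not this library's actual API.

```python
import torch
import torch.nn as nn
import torch.distributed as dist


class ChannelDistributedConv2d(nn.Module):
    """Hypothetical sketch: convolution with input channels sharded
    across workers; not the library's actual implementation."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        world_size = dist.get_world_size()
        assert in_channels % world_size == 0
        # Each worker owns in_channels // world_size of the input channels.
        self.local_conv = nn.Conv2d(in_channels // world_size,
                                    out_channels, kernel_size, bias=False)

    def forward(self, x_local):
        # x_local holds only this worker's shard of the input channels.
        y_partial = self.local_conv(x_local)
        # Convolution is linear in its input channels, so summing the
        # partial outputs across workers reproduces the full result.
        # Note: dist.all_reduce is not autograd-aware; training would need
        # an autograd-aware variant like the all-sum-reduce sketched below.
        dist.all_reduce(y_partial)
        return y_partial
```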
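The lazy buffer allocation entry describes a pattern that PyTorch's standard `register_forward_pre_hook` supports directly. The sketch below is illustrative (the class name and buffer are assumptions); the point is that allocation happens only when the input shape changes.

```python
import torch
import torch.nn as nn


class LazyCommBuffer(nn.Module):
    """Illustrative layer that (re)allocates its communication buffer
    only when the input shape changes."""

    def __init__(self):
        super().__init__()
        self._buffer = None
        self._buffer_shape = None
        # The pre-forward hook fires before forward() and sees the inputs.
        self.register_forward_pre_hook(self._allocate_buffers)

    @staticmethod
    def _allocate_buffers(module, inputs):
        x = inputs[0]
        if module._buffer_shape != x.shape:
            # First call or shape change: allocate a fresh buffer.
            module._buffer = torch.empty_like(x)
            module._buffer_shape = x.shape

    def forward(self, x):
        # The buffer is guaranteed to match x here; a real layer would
        # stage data into it for communication (e.g., via MPI).
        self._buffer.copy_(x)
        return self._buffer
```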
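Distributed batch normalization typically follows the standard construction of globalizing batch statistics with sum reductions. The function below is a simplified sketch of that idea (no affine parameters or running statistics), assuming an initialized `torch.distributed` process group and NCHW input; it is not necessarily this library's exact implementation.

```python
import torch
import torch.distributed as dist


def distributed_batch_norm(x, eps=1e-5):
    """Simplified sketch: normalize over the globally distributed batch."""
    # Per-worker sums over every dimension except the channel dimension.
    dims = [d for d in range(x.dim()) if d != 1]
    count = torch.tensor(float(x.numel() // x.shape[1]), device=x.device)
    total = x.sum(dim=dims)
    sq_total = (x * x).sum(dim=dims)

    # Globalize the statistics with sum all-reduces.
    dist.all_reduce(count)
    dist.all_reduce(total)
    dist.all_reduce(sq_total)

    mean = total / count
    var = sq_total / count - mean * mean

    # Reshape for broadcasting over the non-channel dimensions.
    shape = [1, -1] + [1] * (x.dim() - 2)
    return (x - mean.view(shape)) / torch.sqrt(var.view(shape) + eps)
```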
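The all-sum-reduce and the distributed loss functions compose naturally: the adjoint of a sum all-reduce is itself a sum all-reduce, so wrapping the collective in an autograd `Function` lets a loss over sharded data backpropagate correctly. This is a sketch of the idea under that assumption, not the library's API; `distributed_mse_loss` is a hypothetical example.

```python
import torch
import torch.distributed as dist


class AllSumReduce(torch.autograd.Function):
    """Sketch of an autograd-aware sum all-reduce."""

    @staticmethod
    def forward(ctx, x):
        y = x.clone()
        dist.all_reduce(y)  # the default reduction op is SUM
        return y

    @staticmethod
    def backward(ctx, grad_output):
        # The adjoint of a sum all-reduce is a sum all-reduce.
        g = grad_output.clone()
        dist.all_reduce(g)
        return g


def distributed_mse_loss(output, target):
    """Sketch of a distributed MSE loss over data sharded across workers."""
    # Each worker reduces over its local shard only...
    local_sq_err = (output - target).pow(2).sum()
    local_count = torch.tensor(float(output.numel()), device=output.device)
    # ...then the all-sum-reduce globalizes numerator and denominator, so
    # every worker sees the same global mean-squared error.
    global_sq_err = AllSumReduce.apply(local_sq_err)
    global_count = AllSumReduce.apply(local_count)
    return global_sq_err / global_count
```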