Skip to content

Decentralized Asynchronous Training on Heterogeneous Devices

License

Notifications You must be signed in to change notification settings

ravenprotocol/ravnest

Repository files navigation

Ravnest

Documentation Status Documentation Status

Ravnest introduces a novel asynchronous parallel training approach that combines the best aspects of data and model parallelism. This method enables the distributed training of complex deep learning models across large datasets, utilizing clusters of heterogeneous consumer-grade PCs connected via the internet. Designed with scalability and performance as key objectives, Ravnest seeks to empower researchers and machine learning practitioners. It simplifies the development and deployment of deep learning models, paving the way for innovative research and practical real-world applications.

Documentation: https://ravnest.readthedocs.io

Research Paper: https://arxiv.org/abs/2401.01728

-----------------------------------------------------

Installation

pip install git+https://github.com/ravenprotocol/ravnest.git

Usage

Clone the Repository:

git clone https://github.com/ravenprotocol/ravnest.git

Generate the submodel files:

python cluster_formation.py

NOTE: Uncomment the correct lines in cluster_formation.py for CNN/ResNet-50/Inception-V3/GPT-Sorter/BERT models.

Execution of Clients (in 3 terminals) for CNN:

Create 3 copies of the provider.py file inside examples/cnn/ folder. Rename these files as provider_0.py, provider_1.py and provider_2.py. In each of these files, set the name parameter of Node() object to 'node_0', 'node_1' and 'node_2'.

python examples/cnn/provider_0.py
python examples/cnn/provider_1.py
python examples/cnn/provider_2.py

NOTE: If you have installed Ravnest via Pip, you will have to delete the entire ravnest subfolder in your cloned directory so that your scripts utilize methods and classes pointing to the pip installed library.

-----------------------------------------------------

Citation

If you have found Ravnest or its foundational components and algorithms to be beneficial in your research, please consider citing the following source:

@misc{menon2024ravnest,
      title={Ravnest: Decentralized Asynchronous Training on Heterogeneous Devices}, 
      author={Anirudh Rajiv Menon and Unnikrishnan Menon and Kailash Ahirwar},
      year={2024},
      eprint={2401.01728},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}