Merge pull request #203 from HewlettPackard/2.1.0
Merge of 2.1.0 feature branch to master
suresh-ls authored Oct 11, 2023
2 parents 7c14f20 + 25ffac0 commit b8310ac
Showing 110 changed files with 13,607 additions and 2,012 deletions.
24 changes: 16 additions & 8 deletions README.md
@@ -1,15 +1,15 @@
# <d></d> <img style="float: right;" src="docs/images/GettyImages-1148109728_EAA-graphic-A_112_0_72_RGB.jpg?raw=true"/> SWARM LEARNING

#### Product version: 2.0.0
#### Product version: 2.1.0

Swarm Learning is a decentralized, privacy-preserving Machine Learning framework. This framework utilizes the computing power at, or near, the distributed data sources to run the Machine Learning algorithms that train the models. It uses the security of a blockchain platform to share learnings with peers in a safe and secure manner. In Swarm Learning, training of the model occurs at the edge, where data is most recent, and where prompt, data-driven decisions are mostly necessary. In this completely decentralized architecture, only the insights learned are shared with the collaborating ML peers, not the raw data. This tremendously enhances data security and privacy.
<img width="70%" height="70%" src="/docs/User/GUID-E80D248E-E754-498E-99D6-67508092F779-high.png">
<img width="70%" height="70%" src="/docs/User/GUID-899B556F-D33F-42D1-8D0D-37F191715709-high.png">

Swarm Learning framework is made up of various components known as nodes, such as Swarm Learning (SL) nodes, Swarm Network (SN) nodes, Swarm Learning Command Interface (SWCI) nodes, and Swarm Operator (SWOP) nodes. Each node of Swarm Learning is modularized and runs in a separate container. The **nodes represent different Swarm Learning _functionality_ and not physical server nodes**.

- SL nodes run the core of Swarm Learning. An SL node works in collaboration with all the other SL nodes in the network. It regularly shares its learnings with the other nodes and incorporates their insights. SL nodes act as an interface between the user model application and other Swarm Learning components. SL nodes take care of distributing and merging model weights in a secured way.

- SN nodes form the blockchain network. The current version of Swarm Learning uses an open-source version of Ethereum as the underlying blockchain platform. The SN nodes interact with each other using this blockchain platform to maintain and track progress. The SN nodes use this state and progress information to co-ordinate the working of the other swarm learning components.
- SN nodes form the blockchain network. The current version of Swarm Learning uses an open-source version of Ethereum as the underlying blockchain platform. The SN nodes interact with each other using this blockchain platform to maintain and track progress. The SN nodes use this state and progress information to co-ordinate the working of the other swarm learning components. The blockchain can be persisted across SN restarts to preserve the past progress of the network, so users can look up the blockchain and see the full history of operations. Users have the flexibility to stop Swarm after training is completed. Once a user restarts the SN network, the existing history can be accessed using the `get` or `list` command of the SWCI management interface.

**Sentinel Node** is a special SN node. The Sentinel node is responsible for initializing the blockchain network. This is the first node to start.

Expand All @@ -20,7 +20,7 @@ Swarm Learning framework is made up of various components known as nodes, such a
- SWCI node is the command interface tool to the Swarm Learning framework. It is used to monitor the Swarm Learning framework. SWCI nodes can connect to any of the SN nodes in a given Swarm Learning framework to manage the framework.
For more information on SWCI, see [Swarm Learning Command Interface](./docs/User/Swarm_Learning_Command_Interface.md).

- SWOP is an agent that can manage Swarm Learning operations. SWOP is responsible for executing the tasks assigned to it. A SWOP node can execute only one task at a time. SWOP helps in executing tasks such as starting and stopping Swarm runs, building and upgrading ML containers, and sharing models for training. For more information about SWOP, see [Swarm Operator node \(SWOP\)](./docs/User/Swarm_Operator_node_(SWOP).md).
- SWOP node is an agent that can manage Swarm Learning operations. SWOP is responsible for executing the tasks assigned to it. A SWOP node can execute only one task at a time. SWOP helps in executing tasks such as starting and stopping Swarm runs, building and upgrading ML containers, and sharing models for training. For more information about SWOP, see [Swarm Operator node \(SWOP\)](./docs/User/Swarm_Operator_node_(SWOP).md).

- Swarm Learning security and digital identity aspects are handled by X.509 certificates. Communication among Swarm Learning components is secured using X.509 certificates. Users can either generate their own certificates or directly use certificates generated by any standard security software such as SPIRE. For more information on SPIRE, see [https://thebottomturtle.io/Solving-the-bottom-turtle-SPIFFE-SPIRE-Book.pdf](https://thebottomturtle.io/Solving-the-bottom-turtle-SPIFFE-SPIRE-Book.pdf) and [https://spiffe.io/](https://spiffe.io/).
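
As a rough illustration of what "merging model weights" on SL nodes can look like, the sketch below combines equally shaped weight vectors from several peers with a coordinate-wise mean and, alternatively, a coordinate-wise median. The function names and the flat list-of-floats layout are assumptions made for this example; this is not the Swarm Learning implementation or its API.

```python
from statistics import fmean, median


def mean_merge(peer_weights):
    """Coordinate-wise mean of equally shaped weight vectors, one per peer."""
    return [fmean(column) for column in zip(*peer_weights)]


def coordinate_median_merge(peer_weights):
    """Coordinate-wise median; less sensitive to a single outlying peer."""
    return [median(column) for column in zip(*peer_weights)]


# Hypothetical weights reported by three peers after local training.
peers = [
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 4.0],
    [5.0, 4.0, 6.0],
]

merged_mean = mean_merge(peers)                  # [2.0, 2.0, 4.0]
merged_median = coordinate_median_merge(peers)   # [1.0, 1.0, 4.0]
```

Only the merged values would be shared back to the peers; the raw training data never leaves each node.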

Expand Down Expand Up @@ -61,9 +61,12 @@ NOTE: All the ML nodes must use the same ML platform either Keras (based on Tens
1. [Prerequisites](/docs/Install/Prerequisites.md) for Swarm Learning
2. [Upgrading from earlier evaluation versions](/docs/Install/Versioning_and_upgrade.md)
3. [Download and setup Swarm Learning](/docs/Install/HPE_Swarm_Learning_installation.md) using the SLM-UI installer
4. Execute [MNIST example](/examples/mnist/README.md)
5. [Frequently Asked Questions](/docs/User/Frequently_asked_questions.md)
6. [Troubleshooting](/docs/User/Troubleshooting.md)
4. Execute [MNIST example](/examples/mnist/README.md)
5. [Running MNIST example using SLM-UI](/docs/User/Running_MNIST_example_using_SLM-UI.md)
6. [Monitoring Swarm Learning training using SLM-UI](/docs/User/Monitoring_Swarm_Learning_training_using_SLM-UI.md)
7. [Frequently Asked Questions](/docs/User/Frequently_asked_questions.md)
8. [Troubleshooting](/docs/User/Troubleshooting.md)
9. [Release Notes](/docs/HPE_Swarm_learning_2.1.0_Release_Notes.pdf)

<blockquote>

Expand All @@ -84,9 +87,14 @@ NOTE: **Accessing Hewlett Packard Enterprise Support** clause and **Concurrent s
- [Using SWOP](/docs/User/Swarm_Operator_node_(SWOP).md)
- [Running Swarm learning examples using SLM-UI](/docs/Install/Running_Swarm_Learning_examples_using_SLM-UI.md)
- [Running Swarm Learning using CLI](/docs/Install/Running_Swarm_Learning_using_CLI.md)
- [Running Swarm Learning with SE Linux](/docs/Install/Running_Swarm_with_SE_Linux.md)
- [Running Swarm Learning with Podman](/docs/Install/Running_Swarm_Learning_with_Podman.md)
- [Examples](/examples/README.md)
- [Swarm Learning Log Collection](/docs/User/Swarm_Log_Collector.md)
- [Swarm Learning diagnostics using CLI](/docs/User/Swarm_Log_Collector.md)
- [Centralized Swarm diagnostics using SLM-UI](/docs/Install/Centralized_Swarm_diagnostic.md)
- [Extending Swarm Learning for new ML platforms](/lib/src/README.md)
- [Merge Methods - Whitepaper](/docs/HPE_Merge_Methods_Whitepaper.pdf)
- [Uninstalling Swarm Learning using SLM-UI](/docs/Install/Uninstalling_Swarm_Learning_using_SLM-UI.md)

## References

Binary file modified docs/HPE AutoPass License Server User Guide.pdf
Binary file not shown.
Binary file added docs/HPE_Merge_Methods_Whitepaper.pdf
Binary file not shown.
Binary file modified docs/HPE_Swarm_Installation_and_Configuration_Guide.pdf
Binary file not shown.
Binary file removed docs/HPE_Swarm_Learning_Release_Notes.pdf
Binary file not shown.
Binary file modified docs/HPE_Swarm_Learning_User_Guide.pdf
Binary file not shown.
Binary file added docs/HPE_Swarm_learning_2.1.0_Release_Notes.pdf
Binary file not shown.
25 changes: 0 additions & 25 deletions docs/Install/Adding_a_Swarm_Host_in_SLM-UI.md

This file was deleted.

26 changes: 26 additions & 0 deletions docs/Install/Centralized_Swarm_diagnostic.md
@@ -0,0 +1,26 @@
# Centralized Swarm diagnostic

The Centralized Swarm diagnostics utility collects and uploads the logs from all the hosts associated with a project. The collected logs can be sent to HPE when reporting Swarm issues.

![Centralized Swarm Diagnostics](GUID-EA3ED67E-52AD-464F-B126-E21C6F835125-high.png)

1. In the **Projects** tab, click the **Collect Log** icon.

![MyProjects](GUID-E0930943-5847-4F31-8231-D1AD21862F1D-high.png)

2. Select the host address from the **Host Address** drop-down menu.

![Host Address](GUID-31258354-DEB0-428E-BC00-B706BD0BFCBA-high.png)

**NOTE:**

The `sshpass` utility must be installed in the VM (`apt install sshpass`).

3. Click **Create** to complete the Log Collection.

![Log Collection Completed](GUID-5383ABE4-9B19-40A1-83CB-47F605828833-high.png)
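
Because log collection depends on `sshpass` being present in the VM, a quick pre-check can save a failed run. The following is a minimal sketch using Python's standard `shutil.which`; the helper name `missing_tools` is hypothetical and not part of SLM-UI.

```python
import shutil


def missing_tools(required=("sshpass",), which=shutil.which):
    """Return the required command-line tools that are not found on PATH."""
    return [tool for tool in required if which(tool) is None]


missing = missing_tools()
if missing:
    # On Debian/Ubuntu hosts the note above suggests: apt install sshpass
    print("Install before collecting logs:", ", ".join(missing))
```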


**Parent topic:**[Running Swarm Learning examples using SLM-UI](Running_Swarm_Learning_examples_using_SLM-UI.md)

2 changes: 1 addition & 1 deletion docs/Install/Configuring_the_License_Settings.md
@@ -6,7 +6,7 @@

3. Click **Update** to update the License server.

![License Settings](GUID-2DC5F818-A512-4A5E-AECE-C5635663C7E7-high.png)
![License Settings](GUID-22FAC22D-C266-49F6-B2F9-8B0FDCE07DBD-high.png)


**Parent topic:**[Managing the Global Settings](Managing_the_Global_Settings.md)
2 changes: 1 addition & 1 deletion docs/Install/Configuring_the_Swarm_Settings.md
@@ -10,7 +10,7 @@

5. Select the **Set as evaluation** checkbox if you are using the community version of Swarm Learning.

For more information, see [Versioning and upgrade](GUID-2E350669-7E5A-47BC-AB15-58AC4CFAD9C1.md) section.
For more information, see [Versioning and upgrade](Versioning_and_upgrade.md) section.

6. Click **Create** to create the Swarm version.

2 changes: 1 addition & 1 deletion docs/Install/Creating_a_Project_in_SLM-UI.md
@@ -44,7 +44,7 @@
- In a two node example, the network names should be `host-1-net` for the sentinel node and `host-2-net` for the non-sentinel node, respectively.
5. Click **Save Project** to create the Project.

![My Projects](GUID-18FA0377-F495-443F-BD38-76AEACC22D98-high.png)
![My Projects](GUID-E0930943-5847-4F31-8231-D1AD21862F1D-high.png)


**Parent topic:**[Running Swarm Learning examples using SLM-UI](Running_Swarm_Learning_examples_using_SLM-UI.md)
15 changes: 13 additions & 2 deletions docs/Install/Environment_variables.md
@@ -1,6 +1,16 @@
# <a name="GUID-C1705ADA-8DC5-47D6-B22D-EEDD2F938059"/> Environment variables

The environment variables are passed to containers or added to the environment variable through profile or configuration files. The following environment variables are available to set and modify:
Environment variables are either passed directly to containers or set through profile or configuration files. <br>
<blockquote>

**Note:**

Environment variables that start with a Swarm component name (for example, SN_, SL_) apply only to that component. Environment variables that do not start with a Swarm component name apply to all Swarm components.

</blockquote>
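
The prefix convention in the note above can be sketched as a small lookup. This is only an illustration: the prefix list (in particular `SWOP_`), the helper names, and the variable `HYPOTHETICAL_SHARED_SETTING` are assumptions made for the example, not part of Swarm Learning.

```python
# Assumed component prefixes; SN_ and SL_ come from the note, the rest are guesses.
COMPONENT_PREFIXES = ("SN_", "SL_", "SWCI_", "SWOP_")


def target_component(name):
    """Return the component a variable is meant for, or 'ALL' if unprefixed."""
    for prefix in COMPONENT_PREFIXES:
        if name.startswith(prefix):
            return prefix.rstrip("_")
    return "ALL"


def group_by_component(env):
    """Group a name -> value mapping by the component each variable targets."""
    groups = {}
    for name, value in env.items():
        groups.setdefault(target_component(name), {})[name] = value
    return groups


example_env = {
    "SN_I_AM_SENTINEL": "true",
    "SL_MAKE_ME_ADMIN": "False",
    "SWCI_MODE": "CLI",
    "HYPOTHETICAL_SHARED_SETTING": "1",  # made-up name, applies to all components
}
groups = group_by_component(example_env)
```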


The following environment variables are available to set and modify:

|<strong>Environment variable name</strong>|<strong>Description</strong>|
|------------------------------------------|----------------------------|
@@ -11,8 +21,9 @@ The environment variables are passed to containers or added to the environment v
|`SN_I_AM_SENTINEL`| Sets a Swarm Network node to become the Sentinel node, only when it is set to true.<br> Default value: False<br> |
|`SN_START_MINING`| Starts mining on non-sentinel nodes. \(Optional\)<br> Default value: False<br> |
|`SL_MAKE_ME_ADMIN`| Determines whether an SL node can participate in leader election or not. \(Optional\)<br> Default value: True<br> If SL_MAKE_ME_ADMIN is set to ‘False’, the corresponding SL node does not participate in leader election. Set it to ‘False’ if you do not want a slow node (for example, one with less compute power or network bandwidth) to become the leader. |
|`SL_LEADER_FAILURE_BASE_TIMEOUT`|Sets the minimum timeout value \(in seconds\). If Swarm merging does not happen within this timeout, a new SL leader node is selected. The swarm training continues to run, regardless of SL leader node failures. This timeout takes effect after `min_peers` nodes have completed their local training. <br> Default value: 120 seconds. <br>This variable may need tuning depending on the complexity of the ML application.|
|`SL_LEADER_FAILURE_BASE_TIMEOUT`|Sets the minimum timeout value \(in seconds\). If Swarm merging does not happen within this timeout, a new SL leader node is selected. The swarm training continues to run, regardless of SL leader node failures. This timeout takes effect after `min_peers` nodes have completed their local training. <br> Default value: 600 seconds. <br>This variable may need tuning depending on the complexity of the ML application.|
|`SL_WAIT_FOR_FULL_QUORUM_SECONDS`|Sets the maximum time for an SL leader node to wait for full quorum after minPeers are ready for merge. This parameter lets you maximize the number of peers participating in the merge process.<br>Default value: 30 seconds|
|`SL_RAM_INTENSIVE`|Optimizes RAM usage in the SL leader node for the coordinate and geometric median merge methods. Unlike the mean merge method, the coordinate and geometric median merge methods involve memory-intensive operations. If the SL leader node has limited RAM, merging the intermediate model parameters with the median methods can cause memory issues. In such scenarios, set the SL\_RAM\_INTENSIVE flag to 'False' to merge the model parameters layer by layer. The 'False' option relies on I/O operations and is time consuming, hence the default is 'True'.<br> You can pass this parameter in the slenvvars option within the SWOP profile. The value can differ for each SL node depending on its hardware capacity. Example: 'slenvvars : \[SL\_RAM\_INTENSIVE : False\]' <br> Default value: True|
|`SWCI_TASK_MAX_WAIT_TIME`|Specifies the maximum time allowed for a task to complete.<br>The value is set in minutes; the default is 120 minutes (2 hours).|
|`SWCI_MODE`| Enables SWCI's web interface instead of the command line interface. Allowed values are CLI and WEB.<br> Default value: CLI<br> |
|`SWCI_STARTUP_SCRIPT`|This is a default start script of SWCI.|
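
To make the defaults in the table concrete, here is a minimal sketch of reading a few of these variables with typed fallbacks. The variable names and default values are taken from the table; the parsing rules (for example, treating only the string "true", case-insensitively, as true) and the helper names are assumptions, since how the Swarm containers actually parse these values is not described here.

```python
import os


def env_bool(name, default, env=os.environ):
    """Read a boolean variable; unset means default (parsing rule is assumed)."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


def env_int(name, default, env=os.environ):
    """Read an integer variable; unset means default."""
    raw = env.get(name)
    return default if raw is None else int(raw)


def load_settings(env=os.environ):
    """Collect a few documented variables using the defaults listed in the table."""
    return {
        "leader_failure_base_timeout_s": env_int("SL_LEADER_FAILURE_BASE_TIMEOUT", 600, env),
        "wait_for_full_quorum_s": env_int("SL_WAIT_FOR_FULL_QUORUM_SECONDS", 30, env),
        "ram_intensive": env_bool("SL_RAM_INTENSIVE", True, env),
        "make_me_admin": env_bool("SL_MAKE_ME_ADMIN", True, env),
        "swci_task_max_wait_min": env_int("SWCI_TASK_MAX_WAIT_TIME", 120, env),
    }


# Example: a memory-constrained SL node with a longer leader-failure timeout.
settings = load_settings({
    "SL_RAM_INTENSIVE": "False",
    "SL_LEADER_FAILURE_BASE_TIMEOUT": "900",
})
```

Passing an explicit mapping instead of `os.environ` keeps the sketch easy to try without changing the real environment.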
10 changes: 3 additions & 7 deletions docs/Install/HPE_Swarm_Learning_installation.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,5 @@
# <a name="GUID-01199457-73B6-45F3-99FC-164E4B25A0A3"/> HPE Swarm Learning installation

1. [Download the License Server](Download_the_License_Server.md)
2. [Install the License Server and download Swarm Learning](Install_the_License_Server.md) from MY HPE SOFTWARE CENTER home page.
3. Installing Swarm Learning is a two-step process.
- Using the SLM-UI Installer, you can install the Swarm Learning Management UI (SLM-UI) on one host. For more information, see [Running SLM-UI Installer](Running_SLM-UI_Installer.md).
- Using SLM-UI, you can install Swarm Learning on multiple hosts. For more information, see [Adding a Swarm Host in SLM-UI](Adding_a_Swarm_Host_in_SLM-UI.md).


1. [Installing the License Server](Install_the_License_Server.md)
2. [Installing HPE Swarm Learning Management UI](Installing_HPE_Swarm_Learning_Management_UI(SLM-UI).md)
3. [Installing Swarm Learning using SLM-UI](Installing_Swarm_Learning_using_SLM-UI.md)