Project Overview
- Title: Email Spam Classification using BERT with Federated Learning (Flower)
- Description: This project demonstrates how to fine-tune BERT (Bidirectional Encoder Representations from Transformers), a pre-trained deep learning model, for email spam classification using Flower, a federated learning framework.
Key Concepts
- BERT: A pre-trained transformer model adept at natural language understanding tasks.
- Federated Learning: A collaborative machine learning approach where training data remains distributed across devices or servers, enhancing privacy and security.
- Flower: A framework that simplifies federated learning workflows.
Data Preparation
- Data Source: Describe the email spam classification dataset (e.g., publicly available, custom collection).
- Preprocessing: Explain how you prepare the data for BERT (a minimal tokenization sketch follows this list):
  - Text cleaning (removing noise, punctuation)
  - Tokenization (breaking text into smaller units)
  - Padding/truncating sequences to a uniform length
  - Converting tokens to numerical representations (using BERT's vocabulary)
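For illustration, here is a minimal preprocessing sketch using the Hugging Face transformers tokenizer; the example emails are placeholders for the chosen dataset:

from transformers import BertTokenizer

# Load the tokenizer matching the chosen BERT variant.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Placeholder emails standing in for the cleaned dataset texts.
emails = ["Win a FREE prize now!!!", "Meeting moved to 3pm, agenda attached."]

# Tokenize, pad/truncate to a uniform length, and map tokens to vocabulary ids.
encodings = tokenizer(
    emails,
    padding="max_length",  # pad shorter emails up to max_length
    truncation=True,       # cut longer emails down to max_length
    max_length=128,
    return_tensors="pt",   # return PyTorch tensors
)

print(encodings["input_ids"].shape)       # torch.Size([2, 128])
print(encodings["attention_mask"].shape)  # torch.Size([2, 128])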
Model Definition
- Model Selection: Specify the chosen pre-trained BERT model variant (e.g., bert-base-uncased).
- Fine-Tuning: Explain how you adapt the pre-trained model for email spam classification (a minimal sketch follows this list):
  - Adding a custom final layer (e.g., sigmoid for binary classification)
  - Freezing or fine-tuning specific BERT layers based on task complexity and dataset size
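A minimal fine-tuning sketch, assuming PyTorch and the transformers library. BertForSequenceClassification attaches a fresh classification head to the pre-trained encoder; two logits with softmax are equivalent to a single sigmoid output for binary spam/ham classification:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Optionally freeze the encoder so only the new classification head trains;
# useful for small datasets or tight compute budgets.
for param in model.bert.parameters():
    param.requires_grad = False

# One training step on a placeholder batch (1 = spam, 0 = ham).
batch = tokenizer(["Win a FREE prize now!!!", "Agenda attached."],
                  padding=True, truncation=True, max_length=128, return_tensors="pt")
labels = torch.tensor([1, 0])
outputs = model(**batch, labels=labels)  # computes cross-entropy loss internally
outputs.loss.backward()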
Federated Learning with Flower
- Client Simulation: Describe how you simulate multiple clients on a single machine using Flower (a runnable sketch follows this list):
  - Distribution of training data to simulated clients (e.g., random or stratified sampling)
  - Communication between clients and the central Flower server for model updates
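As a hedged sketch of how this can look with Flower's simulation API (assumes pip install "flwr[simulation]"; local training and evaluation are stubbed out with placeholders, and the to_client() conversion follows recent Flower 1.x versions):

from collections import OrderedDict
import flwr as fl
import torch
from transformers import BertForSequenceClassification

class SpamClient(fl.client.NumPyClient):
    """One simulated client holding a private shard of the email dataset."""

    def __init__(self, cid: str):
        self.cid = cid
        self.model = BertForSequenceClassification.from_pretrained(
            "bert-base-uncased", num_labels=2)

    def get_parameters(self, config):
        return [v.cpu().numpy() for v in self.model.state_dict().values()]

    def set_parameters(self, parameters):
        keys = self.model.state_dict().keys()
        state = OrderedDict((k, torch.tensor(v)) for k, v in zip(keys, parameters))
        self.model.load_state_dict(state)

    def fit(self, parameters, config):
        self.set_parameters(parameters)
        # A real client would train here on its local shard for a few epochs.
        num_examples = 100  # placeholder: size of this client's local dataset
        return self.get_parameters(config), num_examples, {}

    def evaluate(self, parameters, config):
        self.set_parameters(parameters)
        # A real client would evaluate the global model on held-out local data.
        return 0.0, 100, {"accuracy": 0.0}  # placeholder loss, count, metrics

def client_fn(cid: str):
    return SpamClient(cid).to_client()

fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=4,  # simulated clients on one machine
    config=fl.server.ServerConfig(num_rounds=3),
)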
Federated Learning Workflow (Graphviz DOT Diagram)
digraph G {
  // Nodes
  A [label="Clients (Simulated)"];
  B [label="Local Models"];
  C [label="Global Model"];
  A1 [label="Weighted Averaging"];
  A2 [label="Median Aggregation"];
  A3 [label="Robust Aggregation"];
  A4 [label="Differential Privacy"];
  A5 [label="Secure Aggregation"];
  A6 [label="Homomorphic Encryption"];
  A7 [label="Efficient Communication"];
  A8 [label="Asynchronous Updates"];
  // Edges
  A -> B [label="Train on Local Data"];
  B -> C [label="Aggregate Updates"];
  C -> A [label="Update Clients"];
  // Subgraphs
  subgraph cluster_0 {
    label = "Custom Aggregation Strategy";
    A1; A2; A3;
  }
  subgraph cluster_1 {
    label = "Data Privacy and Security";
    A4; A5; A6;
  }
  subgraph cluster_2 {
    label = "Communication and Efficiency";
    A7; A8;
  }
}
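The "Custom Aggregation Strategy" cluster in the diagram can be realized by subclassing one of Flower's built-in strategies. A sketch of coordinate-wise median aggregation (one of the robust options named above), assuming Flower 1.x's FedAvg strategy and its parameter-conversion helpers:

import numpy as np
import flwr as fl
from flwr.common import ndarrays_to_parameters, parameters_to_ndarrays

class MedianAggregation(fl.server.strategy.FedAvg):
    """Replaces FedAvg's weighted mean with a coordinate-wise median,
    which is more robust to outlier or malicious client updates."""

    def aggregate_fit(self, server_round, results, failures):
        if not results:
            return None, {}
        # Each client's update as a list of NumPy arrays, one per layer.
        client_weights = [
            parameters_to_ndarrays(fit_res.parameters) for _, fit_res in results
        ]
        # Coordinate-wise median across clients, computed layer by layer.
        median_weights = [
            np.median(np.stack(layer_group), axis=0)
            for layer_group in zip(*client_weights)
        ]
        return ndarrays_to_parameters(median_weights), {}

Passing strategy=MedianAggregation() to start_simulation would swap it in for the default FedAvg.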
Prerequisites
- List required libraries (e.g., transformers, torch, and flwr, which is Flower's package name on PyPI) and installation instructions.
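A typical installation might look like this; the simulation extra pulls in Ray for single-machine client simulation:

pip install torch transformers "flwr[simulation]"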
Customization
- Explain how to modify the main.py file to adjust training parameters (a hypothetical example follows this list):
  - Number of clients
  - Training epochs per round
  - Number of federated learning rounds
  - Client script filename
  - Dataset size per client
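What those knobs might look like near the top of main.py (all names hypothetical; adapt them to the actual script):

NUM_CLIENTS = 4               # number of simulated clients
EPOCHS_PER_ROUND = 1          # local training epochs per federated round
NUM_ROUNDS = 3                # number of federated learning rounds
CLIENT_SCRIPT = "client.py"   # client script filename
SAMPLES_PER_CLIENT = 500      # dataset size per client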
Execution
- Provide clear instructions on running the project with the modified main.py file:
  - Command to execute:
    python3 main.py
  - Expected output (training progress, evaluation metrics)