
Fine-Tuning BERT Models with TensorFlow Hub and Hugging Face Transformers

This project focuses on fine-tuning BERT models using TensorFlow Hub and Hugging Face Transformers on the Quora Insincere Questions dataset. Each sample pairs question text with a binary target label, and the task is to classify questions as either sincere or insincere. The full dataset, available on Kaggle, comprises over 1 million samples, but due to limited GPU resources only a fraction of the data (approximately 10,000 samples for training and 1,000 samples for validation) was used for fine-tuning.

Data Preparation and Input Pipeline

Data preparation was done in Google Colaboratory: the data was read with Pandas from an archive hosted on Archive.org and split into training and validation samples using train_test_split. A tf.data.Dataset input pipeline converts the raw text into the features BERT expects, namely input_ids and attention_mask. The maximum input length was capped at 128 tokens, rather than BERT's usual 512, to speed up processing on limited GPU resources, and the pipeline uses prefetching to overlap preprocessing with model execution.
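The sketch below illustrates this pipeline. The local file name train.csv, the column names question_text and target, the batch size, and the random seed are assumptions for illustration, not the project's exact code.

```python
# A minimal sketch of the pipeline described above, assuming a local copy of
# the Kaggle CSV ("train.csv") with "question_text" and "target" columns;
# the batch size and random seed are illustrative.
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from transformers import BertTokenizerFast

MAX_LEN = 128  # capped below BERT's 512 to speed up training

df = pd.read_csv("train.csv")  # hypothetical local path to the dataset
train_df, val_df = train_test_split(
    df, train_size=10_000, test_size=1_000,
    stratify=df["target"], random_state=42)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

def encode(texts):
    # Tokenize raw question text into the tensors BERT expects.
    enc = tokenizer(list(texts), padding="max_length", truncation=True,
                    max_length=MAX_LEN, return_tensors="tf")
    return {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]}

def make_dataset(frame, shuffle=False):
    ds = tf.data.Dataset.from_tensor_slices(
        (encode(frame["question_text"]), frame["target"].values))
    if shuffle:
        ds = ds.shuffle(len(frame))
    # prefetch overlaps preprocessing with model execution
    return ds.batch(32).prefetch(tf.data.AUTOTUNE)

train_ds = make_dataset(train_df, shuffle=True)
val_ds = make_dataset(val_df)
```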

Model Architecture

The model uses the pre-trained bert-base-uncased checkpoint imported via the Hugging Face Transformers library. A classifier head sits on top of BERT: a dense layer with ReLU activation, followed by a dropout layer to reduce overfitting, and finally a dense output layer with sigmoid activation that predicts the binary label (sincere vs. insincere).
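A minimal sketch of that architecture using the Keras functional API might look like this; the hidden-layer width and dropout rate are illustrative assumptions, as the original values are not specified here.

```python
# A sketch of the architecture described above, built with the Keras
# functional API; the hidden width (256) and dropout rate (0.3) are assumptions.
import tensorflow as tf
from transformers import TFBertModel

MAX_LEN = 128

bert = TFBertModel.from_pretrained("bert-base-uncased")

input_ids = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="attention_mask")

# Pooled [CLS] representation summarizes the whole question.
pooled = bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
x = tf.keras.layers.Dense(256, activation="relu")(pooled)
x = tf.keras.layers.Dropout(0.3)(x)  # guards against overfitting
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # sincere vs. insincere

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
              loss="binary_crossentropy",
              metrics=["accuracy"])
```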

Results

The fine-tuned BERT model was evaluated on the validation set and used to generate predictions.
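Assuming the model and datasets from the sketches above, fine-tuning, evaluation, and prediction could look roughly like this; the epoch count and decision threshold are assumptions.

```python
# A brief sketch of fine-tuning and evaluation, assuming `model`, `train_ds`,
# and `val_ds` from the sketches above; epochs and threshold are illustrative.
model.fit(train_ds, validation_data=val_ds, epochs=2)

loss, accuracy = model.evaluate(val_ds)
probs = model.predict(val_ds)      # sigmoid probabilities in [0, 1]
preds = (probs > 0.5).astype(int)  # 1 = insincere, 0 = sincere
```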

Conclusion

This project demonstrates that a fine-tuned BERT model can effectively classify questions as sincere or insincere. Using TensorFlow Hub and Hugging Face Transformers, we handled data preparation, built an efficient input pipeline, and assembled a robust model architecture, while shorter sequence lengths and a reduced sample count kept training feasible on the limited GPU resources available in Google Colaboratory.
