Spam Detection using NLP
- Install all the required libraries using `pip install -r requirements.txt`
- Start the app by running `python -m streamlit run GUI/homepage.py`
- Type in a sentence to classify it as spam or ham
- Use the Clear results button to clear the results and reset the session state
- Drag and drop or upload your own `.csv` file to classify multiple input sentences at once
The best-performing ML model is highlighted in fluorescent green.
In our spam detection model, a `TF-IDF Vectorizer` converts the preprocessed text into numerical features. This vectorisation technique computes the term frequency (how often a word occurs in a message) and the inverse document frequency (how rare the word is across the dataset) for each word, and represents the text as a matrix with one row per message and one column per word. Each value captures how important a word is to its message relative to the entire corpus. Because words that appear in most messages receive low weights, the representation emphasises distinctive terms, which enhances the model's ability to distinguish between spam and non-spam messages.
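
The weighting described above can be illustrated with a minimal pure-Python sketch. It is not the project's code: the `tokenize` and `tfidf_matrix` helpers and the sample messages are hypothetical. It also assumes the textbook formula `tf * log(N / df)`, whereas library vectorizers such as scikit-learn's `TfidfVectorizer` use a smoothed and normalised variant by default.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_matrix(documents):
    """Return (vocabulary, rows); rows[i][j] is the weight of vocabulary[j] in document i."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(documents)

    # Document frequency: the number of documents that contain each word.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty messages
        row = []
        for word in vocabulary:
            tf = counts[word] / total                 # term frequency in this message
            idf = math.log(n_docs / doc_freq[word])   # rarity across all messages
            row.append(tf * idf)
        rows.append(row)
    return vocabulary, rows


# Made-up sample messages, for illustration only.
messages = [
    "win a free prize now",
    "free entry win cash now",
    "are we still meeting for lunch",
]
vocabulary, rows = tfidf_matrix(messages)
```

In the first message, "prize" receives a higher weight than "free", because "free" also appears in the second message and is therefore less distinctive. A word that occurred in every message would receive a weight of zero under this formula.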