Predicting which tweets are about real disasters. Using Bag-of-Words, TF-IDF Vectors, Naive Bayes, Linear Discriminant Analysis, Truncated SVD, custom tokenizer, lemmatization, GridSearchCV.

bilge-karaca/NLP_with_Disaster_Tweets

Natural Language Processing with Disaster Tweets

The aim of this project is to predict which tweets are about real disasters and which ones are not.

For this project I use the following methods:

  • Bag of words (CountVectorizer, TfidfTransformer, etc.)
  • A custom tokenizer that also performs lemmatization, with nltk's TweetTokenizer as an alternative
  • Classification Method 1: Multinomial Naive Bayes
  • Classification Method 2: Linear Discriminant Analysis (LDA)
  • TruncatedSVD along with LDA to avoid potential dimensionality problems
  • Hyperparameter search for LDA through GridSearchCV
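
The methods listed above can be combined roughly as in the sketch below. It is a minimal illustration, not the code from this repository: the toy tweets, the `simple_tokenizer` helper, and the parameter grid are all assumptions made for the example. In particular, `simple_tokenizer` only lowercases and splits on letters; it does not lemmatize. A lemmatizing callable (for instance one built on nltk's WordNetLemmatizer) or `TweetTokenizer().tokenize` could be passed in the same `tokenizer=` slot.

```python
import re

from sklearn.decomposition import TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data invented for illustration (1 = real disaster, 0 = not).
# The actual competition data comes from Kaggle and is not reproduced here.
texts = [
    "Forest fire near the village, residents evacuated",
    "Flood waters rising fast, roads closed downtown",
    "Earthquake damages buildings in the city centre",
    "Wildfire spreads, thousands evacuated overnight",
    "Storm floods streets and destroys homes",
    "Buildings collapsed after the strong earthquake",
    "This new album is fire, love it",
    "I am flooded with emails this morning",
    "That movie was a total disaster lol",
    "My heart is on fire for this team",
    "Traffic was a storm of honking today",
    "Exams this week are an earthquake for my schedule",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def simple_tokenizer(text):
    """Hypothetical stand-in tokenizer: lowercase, keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


# Method 1: bag of words -> TF-IDF -> Multinomial Naive Bayes.
nb_pipeline = Pipeline([
    ("counts", CountVectorizer(tokenizer=simple_tokenizer, token_pattern=None)),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])
nb_pipeline.fit(texts, labels)

# Method 2: the same features, reduced with TruncatedSVD before LDA,
# because LDA needs a dense, low-dimensional input.
lda_pipeline = Pipeline([
    ("counts", CountVectorizer(tokenizer=simple_tokenizer, token_pattern=None)),
    ("tfidf", TfidfTransformer()),
    ("svd", TruncatedSVD(random_state=0)),
    ("clf", LinearDiscriminantAnalysis()),
])

# Illustrative grid only; the values searched in the project may differ.
param_grid = {
    "svd__n_components": [2, 4],
    "clf__solver": ["svd", "lsqr"],
}
search = GridSearchCV(lda_pipeline, param_grid, cv=3)
search.fit(texts, labels)

print("Naive Bayes prediction:", nb_pipeline.predict(["Fire forces evacuation"]))
print("Best LDA parameters:", search.best_params_)
```

Wrapping every step in a `Pipeline` lets GridSearchCV refit the vectorizer and the SVD inside each cross-validation fold, so the held-out fold never influences the vocabulary or the projection.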

Link to Kaggle competition: Addison Howard, devrishi, Phil Culliton, Yufeng Guo. (2019). Natural Language Processing with Disaster Tweets. Kaggle. https://kaggle.com/competitions/nlp-getting-started
