Text Classification

This folder contains examples of text classification models.

What is Text Classification?

Text classification is a supervised learning task: learning to predict the category (class) of a document from its text content. State-of-the-art methods are based on neural networks of various architectures, as well as on pre-trained language models or word embeddings.

https://github.com/microsoft/nlp-recipes/blob/master/examples/text_classification/README.md
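As a rough illustration of the definition above, the following is a minimal sketch of a supervised text classifier built from TF-IDF vectors and logistic regression with scikit-learn. The toy English sentences, the labels, and the variable names are assumptions made up for this example; they are not taken from the notebooks in this folder. Japanese text would generally need a tokenizer or character n-grams before vectorization, which is not shown here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: each document comes with a known class label.
train_texts = [
    "the team scored a late goal",
    "the striker won the match",
    "shares fell as the market closed",
    "the bank raised interest rates",
]
train_labels = ["sports", "sports", "finance", "finance"]

# TF-IDF turns each document into a sparse vector;
# logistic regression learns to map those vectors to labels.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Predict the class of unseen documents.
predictions = model.predict(
    ["the team won the match", "the market and interest rates"]
)
print(list(predictions))
```

The same fit/predict pattern applies to the other model families, although neural models replace the TF-IDF step with learned representations.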

Summary

| Notebook | Environment | Description | ACC |
| --- | --- | --- | --- |
| TF-IDF & Logistic Regression | Local | Logistic Regression with TF-IDF vectors | 0.9308 |
| TF-IDF & LightGBM | Local | LightGBM with TF-IDF vectors | 0.9512 |
| BERT 'cl-tohoku/bert-base-japanese-v2' | Local | Transformers BERT | 0.9362 |
| BERT 'cl-tohoku/bert-base-japanese-char-v2' | Local | Transformers BERT | 0.9274 |
| BERT 'cl-tohoku/bert-base-large' | Local | Transformers BERT | - |
| T5 | Local | T5 for Japanese | 0.9566 |

Accuracy scores (ACC) were computed on fold 0 only, with the dataset divided into train/val/test at a ratio of 0.6/0.2/0.2. Note that the scores are strongly affected by how the dataset is split and by hyperparameters such as the number of epochs.
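A 0.6/0.2/0.2 split and the accuracy metric can be sketched as follows. This is only an illustration under stated assumptions: the helper names `split_indices` and `accuracy`, the seeded shuffle, and the sample count are hypothetical, and the actual fold scheme used in the notebooks is not described here. Changing the seed changes which samples land in each part, which is one reason the reported scores depend on the split.

```python
import random


def split_indices(n_samples, train_ratio=0.6, val_ratio=0.2, seed=0):
    """Shuffle sample indices and cut them into train/val/test parts.

    Whatever remains after the train and val parts goes to test.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_train = round(n_samples * train_ratio)
    n_val = round(n_samples * val_ratio)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))

# Hypothetical labels: three of four predictions are correct.
print(accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"]))
```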