This folder contains examples of text classification models.
Text classification is a supervised learning task in which a model learns to predict the category (class) of a document from its text content. State-of-the-art methods are based on neural networks of various architectures, together with pre-trained language models or word embeddings.
https://github.com/microsoft/nlp-recipes/blob/master/examples/text_classification/README.md
Notebook | Environment | Description | ACC |
---|---|---|---|
TF-IDF & Logistic Regression | Local | Logistic Regression with TF-IDF vectors | 0.9308 |
TF-IDF & LightGBM | Local | LightGBM with TF-IDF vectors | 0.9512 |
BERT 'cl-tohoku/bert-base-japanese-v2' | Local | Transformers BERT | 0.9362 |
BERT 'cl-tohoku/bert-base-japanese-char-v2' | Local | Transformers BERT | 0.9274 |
BERT 'cl-tohoku/bert-base-large' | Local | Transformers BERT | - |
T5 | Local | T5 for Japanese | 0.9566 |
Accuracy (ACC) scores are computed by running the code on fold 0 only, with the dataset divided into train/val/test sets at a ratio of 0.6/0.2/0.2. Note that the scores depend heavily on how the dataset is split and on hyperparameters such as the number of epochs.
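
To make the 0.6/0.2/0.2 split and the ACC metric concrete, here is a minimal sketch in plain Python. It is not the code used in the notebooks: the function names `split_indices` and `accuracy`, the `seed` parameter, and the single seeded shuffle are all assumptions made for illustration, and the way the notebooks define folds is not reproduced here.

```python
import random


def split_indices(n_samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into train/val/test parts.

    Hypothetical helper: the seed stands in for whatever fold mechanism
    the notebooks actually use.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = round(n_samples * ratios[0])
    n_val = round(n_samples * ratios[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder absorbs any rounding
    return train, val, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200
```

Changing `seed` gives a different partition of the same data, which is one reason scores from a single split should be compared with care.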