A tool to visualize language models; only preliminary experiments so far.
Prepare the data using notebooks in ./pp
- Split and pack data
- Preprocess
Run the baselines in ./baselines
- Use notebooks in ./pp for additional preprocessing, if necessary
- Run the baselines in the notebooks
The top baseline, using plain TF-IDF features with no data balancing or other special tricks, is NB-SVM (a variation of J. Howard's Kaggle implementation for multi-label problems):
accuracy_score: 0.371
roc_auc_score: 0.766
hamming_loss: 0.221
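
The gap between the low accuracy and the moderate Hamming loss above is easier to read with a small example. This is a minimal sketch, assuming the metric names refer to the scikit-learn functions of the same name and that the labels were passed as binary indicator vectors; in that setting scikit-learn's accuracy_score counts a sample as correct only if every label matches (subset accuracy), while hamming_loss counts individual label errors. The functions and toy data below are illustrative stand-ins written in plain Python, not code or results from this repository.

```python
# Sketch only: plain-Python versions of two multi-label metrics.
# y_true / y_pred are made-up toy data, not outputs of the baselines.

def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted wrongly."""
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    total = sum(len(t) for t in y_true)
    return wrong / total


# Four samples, four possible labels each (1 = label present).
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0]]
y_pred = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0]]

print(subset_accuracy(y_true, y_pred))  # 0.5: two samples have one wrong label
print(hamming_loss(y_true, y_pred))     # 0.125: 2 wrong slots out of 16
```

A single wrong label makes the whole sample count as a miss for subset accuracy but adds only one error to the Hamming loss, so the first metric is much stricter when there are many labels.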
- Preprocess using respective notebooks in '.', if needed
- Train/finetune language models
- Train classifiers