To reproduce the results reported in the paper, install all libraries listed in requirements.txt using Python 3.8. Once the libraries are installed, follow one of the two routes below:
-
Training and Inference -
1. For pre-training with masked language modeling (MLM), execute "Domain Adaptation.py".
2. For fine-tuning sentence classification on labeled data after Step 1, execute Classification.py.
3. For fine-tuning sentence classification on labeled data without Step 1, execute training_wo_mlm.py.
4. To get the predictions after Step 2, execute inference_w_mlm.py.
5. To get the predictions after Step 3, execute inference_wo_mlm.py.
6. Use Word_Distribution_Analysis.ipynb to generate the charts in the "PragTag 2023 - Vocabulary Analysis" paper.
7. To evaluate, on out-of-split data, the model fine-tuned after MLM pre-training, execute inference_w_mlm_cv.py.
8. To evaluate, on out-of-split data, the model fine-tuned without MLM pre-training, execute inference_wo_mlm_cv.py.
-
Inference -
- To get the predictions from models trained after MLM pre-training, execute inference_w_mlm.py
- To get the predictions from models trained without MLM pre-training, execute inference_wo_mlm.py
- Use Word_Distribution_Analysis.ipynb to generate charts in the "PragTag 2023 - Vocabulary Analysis" paper.
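
The script order for either route can be sketched as a small Python driver. This is only an illustration: the script names come from the lists above, but the route keys, `build_commands`, and `run_route` are hypothetical helpers that are not part of this repository. The sketch also assumes the scripts take no command-line arguments and are run from the repository root. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Script names are taken from the instructions above. The route keys and
# helper functions are illustrative assumptions, not part of the repository.
ROUTES = {
    "train_with_mlm": [
        "Domain Adaptation.py",  # MLM pre-training
        "Classification.py",     # fine-tuning after MLM pre-training
        "inference_w_mlm.py",    # predictions from the fine-tuned model
    ],
    "train_without_mlm": [
        "training_wo_mlm.py",    # fine-tuning without MLM pre-training
        "inference_wo_mlm.py",   # predictions from the fine-tuned model
    ],
    "inference_only_with_mlm": ["inference_w_mlm.py"],
    "inference_only_without_mlm": ["inference_wo_mlm.py"],
}


def build_commands(route, python=sys.executable):
    """Return one argv list per script in the chosen route, in run order."""
    return [[python, script] for script in ROUTES[route]]


def run_route(route, dry_run=True):
    """Print each command; when dry_run is False, also execute it.

    check=True makes subprocess raise on a non-zero exit code, so a failed
    step stops the route before later scripts run.
    """
    for cmd in build_commands(route):
        print(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run only: prints the commands without executing any script.
    run_route("train_with_mlm", dry_run=True)
```

Passing each command as an argument list rather than a single shell string means the space in "Domain Adaptation.py" needs no extra quoting.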