pillars-of-gec

This repository provides code, state-of-the-art predictions, and links to the pretrained Grammatical Error Correction (GEC) models for the paper "Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models", which was accepted for publication at BEA-2024 (19th Workshop on Innovative Use of NLP for Building Educational Applications, co-located with NAACL 2024).

Structure

The scripts directory contains the code required to reproduce some of the baselines and build ensembles.
The data directory contains the outputs of single systems and ensembles on the 3 main GEC benchmarks.
The table below contains single-system scores and links to the trained models available for download.

Pretrained models and results

| Model name | CoNLL-2014 (test) Precision | CoNLL-2014 (test) Recall | CoNLL-2014 (test) F0.5 | BEA-2019 (dev) Precision | BEA-2019 (dev) Recall | BEA-2019 (dev) F0.5 | BEA-2019 (test) Precision | BEA-2019 (test) Recall | BEA-2019 (test) F0.5 |
|---|---|---|---|---|---|---|---|---|---|
| CTC-copy [repo] | 72.6 | 47.0 | 65.5 | 58.2 | 38.0 | 52.7 | 71.7 | 59.9 | 69.0 |
| GECToR-2024 [link] | 75.0 | 44.7 | 66.0 | 64.6 | 37.2 | 56.3 | 77.7 | 59.0 | 73.1 |
| EditScorer [repo] | 78.5 | 39.4 | 65.5 | 67.3 | 36.1 | 57.4 | 81.0 | 56.1 | 74.4 |
| T5-11B [link] | 70.9 | 56.5 | 67.5 | 60.9 | 51.1 | 58.6 | 73.2 | 71.2 | 72.8 |
| UL2-20B [link] | 73.8 | 50.4 | 67.5 | 60.5 | 48.6 | 57.7 | 75.2 | 70.0 | 74.1 |
| Chat-LLaMa-2-7B-FT [link] | 75.5 | 46.8 | 67.2 | 58.3 | 46.0 | 55.3 | 72.3 | 67.4 | 71.2 |
| Chat-LLaMa-2-13B-FT [link] | 77.2 | 45.6 | 67.9 | 59.8 | 46.1 | 56.4 | 74.6 | 67.8 | 73.1 |
| Majority-voting ensemble (best 7) | 83.7 | 45.7 | 71.8 | 71.7 | 42.2 | 62.9 | 87.3 | 64.1 | 81.4 |
| MAJORITY-VOTING ✚[ majority-voting(best 7), GRECO-rank-w(best 7), GPT-4-rank-a(clust 3) ] | 83.9 | 47.5 | 72.8 | 70.6 | 43.5 | 62.8 | 86.1 | 65.6 | 81.1 |
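
The F0.5 scores reported in the table combine precision and recall with the standard F-beta formula at beta = 0.5, which favors precision over recall. The sketch below recomputes one table entry as a sanity check. The function name `f_beta` is illustrative and not part of this repository, and because the scorers compute F0.5 from raw edit counts, recomputing it from rounded precision and recall may differ in the last digit.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Return the F-beta score for a precision/recall pair.

    Inputs and output share the same scale (percentages here).
    `f_beta` is a hypothetical helper, not a function from this repository.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was predicted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# CTC-copy on CoNLL-2014 (test): Precision = 72.6, Recall = 47.0
print(round(f_beta(72.6, 47.0), 1))  # → 65.5
```

With beta = 0.5 the recall term is down-weighted, which is why a system with high precision and modest recall (for example EditScorer on BEA-2019 test) can outscore a system with more balanced numbers.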

Evaluation

We use three evaluation sets for GEC:

  1. CoNLL-2014 (nucle14-2a; the m2 file is available; m2scorer is the official scorer)
  2. BEA19-dev (bea-dev; the m2 file is available; ERRANT is the official scorer)
  3. BEA19-test (bea-test; the m2 file is NOT available; scores can be obtained only through a CodaLab submission)

Examples of evaluation

Evaluation sets directory: data/evaluation_sets.

  1. Example of evaluation with ERRANT
ERRANT_SCORER=path_to_errant_scorer_directory
INPUT_FILE=data/evaluation_sets/bea-dev.txt
M2_FILE=data/evaluation_sets/bea-dev.m2
PRED_FILE=YOUR_PRED_FILE.txt
TMP_FILE=YOUR_TMP_FILE.m2

# Convert the source sentences and the predictions into an M2 file of hypothesis edits
python "$ERRANT_SCORER/parallel_to_m2.py" -orig "$INPUT_FILE" -cor "$PRED_FILE" -out "$TMP_FILE"
# Score the hypothesis edits against the reference M2 file
python "$ERRANT_SCORER/compare_m2.py" -hyp "$TMP_FILE" -ref "$M2_FILE" >> {{result}}
  2. Example of evaluation with m2scorer
M2_SCORER=path_to_m2scorer
M2_FILE=data/evaluation_sets/nucle14-2a.m2
PRED_FILE=YOUR_PRED_FILE.txt
# Score the predictions against the CoNLL-2014 reference M2 file
"$M2_SCORER" "$PRED_FILE" "$M2_FILE" >> {{result}}
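
Both evaluation examples above pair predictions with references sentence by sentence, so the prediction file should contain exactly one line per source sentence. The following is a minimal sketch of a pre-flight check, assuming the usual M2 convention that each source sentence is a line starting with `S `. The helper names (`count_lines`, `count_m2_sentences`, `check_prediction_size`) are hypothetical and are not part of this repository or of either scorer.

```python
def count_lines(path: str) -> int:
    """Count lines in a plain-text file with one sentence per line."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def count_m2_sentences(path: str) -> int:
    """Count source sentences in an M2 file (assumed: lines starting with 'S ')."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.startswith("S "))


def check_prediction_size(pred_file: str, reference_file: str) -> int:
    """Raise ValueError if predictions and reference disagree on sentence count.

    `reference_file` may be a plain-text source file or an .m2 file.
    Returns the number of sentences when the counts match.
    """
    if reference_file.endswith(".m2"):
        expected = count_m2_sentences(reference_file)
    else:
        expected = count_lines(reference_file)
    actual = count_lines(pred_file)
    if actual != expected:
        raise ValueError(
            f"{pred_file} has {actual} lines, but {reference_file} "
            f"has {expected} sentences"
        )
    return actual
```

For instance, calling `check_prediction_size("YOUR_PRED_FILE.txt", "data/evaluation_sets/bea-dev.txt")` before the ERRANT example, or passing `data/evaluation_sets/nucle14-2a.m2` as the reference before the m2scorer example, would catch a truncated prediction file before a scorer is run.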

Citation

[to be updated once proceedings are published]

@misc{omelianchuk2024pillars,
      title={Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models}, 
      author={Kostiantyn Omelianchuk and Andrii Liubonko and Oleksandr Skurzhanskyi and Artem Chernodub and Oleksandr Korniienko and Igor Samokhin},
      year={2024},
      eprint={2404.14914},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}