jiiiisoo/AIC-kpmg2023

 
 

AIC-kpmg2023

This project fine-tunes pretrained KoELECTRA models on the KorQuAD 2.1 dataset. Specifically:

  • We added data preprocessing
  • We modified the transformer to fit the KorQuAD 2.1 dataset
  • We implemented a sliding window over long contexts to improve accuracy
  • We created our own Q&A datasets from business reports and used them for training
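The sliding window mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code used in this repo: the function name `sliding_windows` and the parameters `max_len` and `stride` are hypothetical. The idea is that a context longer than the model's input limit is split into overlapping windows, so an answer that straddles one window boundary still appears whole in a neighbouring window.

```python
from typing import List, Tuple


def sliding_windows(tokens: List[str], max_len: int, stride: int) -> List[Tuple[int, List[str]]]:
    """Split `tokens` into overlapping windows.

    Returns (start_offset, window) pairs. `stride` is the step between
    window starts, so consecutive windows overlap by `max_len - stride` tokens.
    All names here are hypothetical, chosen for illustration only.
    """
    if max_len <= 0 or stride <= 0 or stride > max_len:
        raise ValueError("need 0 < stride <= max_len")

    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + max_len]))
        # Stop once the current window reaches the end of the context.
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows


if __name__ == "__main__":
    for offset, window in sliding_windows(list("abcdefghij"), max_len=4, stride=2):
        print(offset, window)
```

Keeping the start offset with each window makes it possible to map a predicted answer span back to its position in the full context.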

For the backend and frontend of AIC, see AIC-BE and AIC-FE.

Preparation

Data Preprocessing

To remove unnecessary HTML tags from the data files, run:

python tag_remover.py --task korquad --config_file koelectra-base-v3.json
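For readers unfamiliar with this kind of preprocessing, here is a rough sketch of tag removal using only the Python standard library. It is not the contents of tag_remover.py: the names `strip_tags` and `_TextExtractor` are hypothetical, and it assumes every tag is unnecessary, whereas the actual script may keep some tags (for example tables) that KorQuAD 2.1 answers depend on.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content and skips the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self) -> str:
        # Join fragments with spaces, then collapse runs of whitespace.
        # Note: this also puts a space where a tag splits a word.
        return " ".join(" ".join(self._parts).split())


def strip_tags(html: str) -> str:
    """Return the visible text of `html` with all tags removed (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>world</b></p><script>x = 1</script>"))
```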

Training/Validation

Clone the KoELECTRA repo to your own machine, then copy our files into the KoELECTRA/finetune directory, overwriting the originals.

To train the model, run:

python run_squad.py --task korquad --config_file koelectra-base-v3.json

To validate the model, run:

python run_squad.py --task korquad --config_file koelectra-base-v3_test.json

Making Custom QA Dataset

To build a custom dataset in KorQuAD 2.1 format from your target HTML files, run:

python make_custom_dataset.py --data_dir {directory containing html files} --name 정빈

Use --name to distinguish contributors when more than one person is building the dataset. The name is used to generate unique IDs.
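As an illustration of why a per-contributor name helps, the sketch below builds one question-answer entry whose ID is prefixed with the contributor's name. It is not the logic of make_custom_dataset.py: the helpers `make_qa_id` and `make_entry` are hypothetical, and the field names follow a generic SQuAD-style layout that is assumed here, so check an actual KorQuAD 2.1 file for the exact schema.

```python
import json
import uuid


def make_qa_id(name: str) -> str:
    """Build an ID that cannot collide across contributors (hypothetical scheme)."""
    return f"{name}_{uuid.uuid4().hex}"


def make_entry(title: str, context: str, question: str, answer_text: str, name: str) -> dict:
    """Create one SQuAD-style entry; field names are assumptions, not the verified schema."""
    answer_start = context.find(answer_text)
    if answer_start < 0:
        raise ValueError("answer text not found in context")
    return {
        "title": title,
        "context": context,
        "qas": [
            {
                "id": make_qa_id(name),
                "question": question,
                "answer": {"text": answer_text, "answer_start": answer_start},
            }
        ],
    }


if __name__ == "__main__":
    entry = make_entry(
        title="Sample report",
        context="<p>Revenue grew in 2022.</p>",
        question="When did revenue grow?",
        answer_text="2022",
        name="정빈",
    )
    # ensure_ascii=False keeps Korean text readable in the output file.
    print(json.dumps(entry, ensure_ascii=False, indent=2))
```

Prefixing the ID with the name means two contributors can merge their files without checking each other's IDs.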
