We leverage pretrained models from KoELECTRA and adapt them for training on the KorQuAD 2.1 dataset. Specifically:
- We added data preprocessing
- We modified the transformer to fit the KorQuAD 2.1 dataset
- We implemented a sliding window over long contexts to improve accuracy (see the sketch after this list)
- We created our own Q&A datasets from business reports and used them for training
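The sliding window mentioned above can be implemented with the Hugging Face tokenizer's overflow feature. A minimal sketch, assuming the monologg/koelectra-base-v3-discriminator tokenizer (substitute the model ID from the links below if it differs):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/koelectra-base-v3-discriminator")

question = "사업보고서의 제출 기한은 언제인가?"  # "When is the business report due?"
long_context = "..."  # a context longer than the model's 512-token limit

# truncation="only_second" keeps the question intact and splits only the
# context into overlapping windows; stride controls the token overlap.
encodings = tokenizer(
    question,
    long_context,
    truncation="only_second",
    max_length=512,
    stride=128,
    return_overflowing_tokens=True,
    return_offsets_mapping=True,
)

# Each entry in encodings["input_ids"] is one window over the long context;
# at inference time, the highest-scoring answer span across windows wins.
print(len(encodings["input_ids"]))
```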
If you want to see the backend and frontend of AIC, see AIC-BE / AIC-FE.
- The KoELECTRA fine-tuning was performed by referring to this link
- The transformer can be used directly through this Hugging Face link
- You can download the KorQuAD 2.1 dataset from this link
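For example, the pretrained weights can be pulled straight from the hub. A minimal sketch; the model ID below is our assumption, so substitute the one from the link above:

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Load the pretrained KoELECTRA encoder with a span-prediction head for QA.
# The model ID is an assumption; use the one from the Hugging Face link.
model_id = "monologg/koelectra-base-v3-discriminator"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id)
```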
To remove unnecessary HTML tags from the data files, run:
python tag_remover.py --task korquad --config_file koelectra-base-v3.json
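Internally, tag removal amounts to something like the following hypothetical snippet (BeautifulSoup-based; the function name is ours, not the script's):

```python
from bs4 import BeautifulSoup

def strip_tags(html: str) -> str:
    """Hypothetical helper: drop non-content tags and return visible text only."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # remove scripts and styles entirely
    return soup.get_text(separator=" ", strip=True)
```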
You can simply clone the KoELECTRA repo to your own machine, then overwrite our files into the KoELECTRA/finetune
directory.
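For example (assuming the upstream repo is monologg/KoELECTRA on GitHub and our files sit in a local finetune/ directory):

git clone https://github.com/monologg/KoELECTRA.git
cp -r finetune/* KoELECTRA/finetune/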
To train this model, run:
python run_squad.py --task korquad --config_file koelectra-base-v3.json
To validate this model, run:
python run_squad.py --task korquad --config_file koelectra-base-v3_test.json
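Once training finishes, the checkpoint can be used for inference. A minimal sketch, assuming run_squad.py saved a transformers-compatible checkpoint under ./output (the path is our assumption; point it at your actual output directory):

```python
from transformers import pipeline

# Build a question-answering pipeline from the fine-tuned checkpoint.
# "./output" is an assumed save path, not a fixed convention of run_squad.py.
qa = pipeline("question-answering", model="./output", tokenizer="./output")

result = qa(
    question="회사의 주요 사업은 무엇인가?",  # "What is the company's main business?"
    context="당사는 반도체 장비의 제조 및 판매를 주요 사업으로 한다.",
)
print(result["answer"], result["score"])
```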
To make a custom dataset in the KorQuAD 2.1 format from target files, run:
python make_custom_dataset.py --data_dir {directory containing html files} --name 정빈
Use --name to distinguish contributors when more than one person is building the dataset (it keeps example IDs unique).
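The emitted records follow the KorQuAD 2.1 (SQuAD-style) layout, roughly like the sketch below; the field names are our reading of the format, so check the official KorQuAD 2.1 schema for exact details:

```python
# Rough sketch of one record in the generated dataset (field names assumed
# from the SQuAD-style layout KorQuAD 2.1 uses; verify against the schema).
example = {
    "version": "KorQuAD_2.1_custom",  # any version string
    "data": [{
        "title": "사업보고서",  # "business report"
        "context": "<html> ... raw page ... </html>",
        "qas": [{
            "id": "정빈_0001",  # the --name prefix keeps ids unique per contributor
            "question": "제출 회사의 명칭은 무엇인가?",
            "answer": {"text": "...", "answer_start": 0},
        }],
    }],
}
```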