Update README.md
soroush-abbasi authored Apr 30, 2024
1 parent adffb34 commit 58629e7
Showing 1 changed file: gpt/README.md (5 additions, 169 deletions).

# Adapting GPT-2 using LoRA

This folder contains the implementation of LoRA in GPT-2 using the Python package `lora`, along with the steps to replicate the results in our recent paper:

**LoRA: Low-Rank Adaptation of Large Language Models** <br>
*Edward J. Hu\*, Yelong Shen\*, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen* <br>
Paper: https://arxiv.org/abs/2106.09685 <br>

<p>
<img src="figures/LoRA_GPT2.PNG" width="800" >
</p>

This repo reproduces our experiments on GPT-2.
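
For readers new to the method, the core idea of the paper is that the pretrained weight stays frozen while a low-rank update, scaled by `alpha / r`, is trained in its place. The following is a minimal PyTorch sketch of that idea; it is not the code in `src/` or the `lora` package, and all names are ours.

```
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinearSketch(nn.Module):
    """Illustrative LoRA layer: h = x W^T + (alpha / r) * (x A^T) B^T."""
    def __init__(self, in_features, out_features, r=4, alpha=32):
        super().__init__()
        # Stand-in for the frozen pretrained weight W.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02,
                                   requires_grad=False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.02)  # random init
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))        # zero init => no change at step 0
        self.scaling = alpha / r

    def forward(self, x):
        frozen = F.linear(x, self.weight)
        update = F.linear(F.linear(x, self.lora_A), self.lora_B)
        return frozen + self.scaling * update

layer = LoRALinearSketch(1024, 1024, r=4, alpha=32)
out = layer(torch.randn(2, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)  # torch.Size([2, 1024]) 8192
```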

## Repository Overview

Our implementation is based on the fine-tuning code for GPT-2 in [Hugging Face](https://huggingface.co/).
There are several directories in this repo:
* [src/](src) contains the source code used for data processing, training, and decoding.
* [eval/](eval) contains the code for task-specific evaluation scripts.
* [data/](data) contains the raw data we used in our experiments.
* [vocab/](vocab) contains the GPT-2 vocabulary files.

## Getting Started

1. On a GPU-capable machine, you can start from the Docker image `nvcr.io/nvidia/pytorch:20.03-py3`, though any generic PyTorch image should work:
```
docker run -it nvcr.io/nvidia/pytorch:20.03-py3
```

2. Clone the repo and install dependencies in a virtual environment (remove `sudo` if running inside a Docker container). For NOLA (large language model finetuning on GPT-2), we modified the [LoRA](https://github.com/microsoft/LoRA/tree/main) code and added NOLA to the LoRA model; a minimal sketch of the NOLA idea follows the commands below.

```
sudo apt-get update
sudo apt-get -y install git jq virtualenv
git clone https://github.com/microsoft/LoRA.git; cd LoRA
virtualenv -p `which python3` ./venv
. ./venv/bin/activate
pip install -r requirement.txt
bash download_pretrained_checkpoints.sh
bash create_datasets.sh
cd ..
```
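
NOLA, as described in the NOLA paper, compresses the adapter further: each low-rank factor is expressed as a linear combination of frozen, randomly generated basis matrices, so only the mixture coefficients (plus the random seed) are trained and stored. The sketch below is a minimal illustration of that idea under our own naming, not the implementation shipped in this repo.

```
import torch
import torch.nn as nn

class NOLAFactorSketch(nn.Module):
    """Illustrative NOLA factor: A = sum_i coeff_i * basis_i, with the basis frozen."""
    def __init__(self, rows, cols, num_basis=64, seed=0):
        super().__init__()
        gen = torch.Generator().manual_seed(seed)             # basis is reproducible from a seed
        self.register_buffer("basis", torch.randn(num_basis, rows, cols, generator=gen))
        self.coeff = nn.Parameter(torch.zeros(num_basis))     # only these are trained

    def materialize(self):
        # (num_basis,) x (num_basis, rows, cols) -> (rows, cols)
        return torch.einsum("k,krc->rc", self.coeff, self.basis)

# A LoRA-style update delta_W = B @ A built from two NOLA factors.
A = NOLAFactorSketch(rows=4, cols=1024, num_basis=64)
B = NOLAFactorSketch(rows=1024, cols=4, num_basis=64)
delta_W = B.materialize() @ A.materialize()                   # (1024, 1024) low-rank update
trainable = sum(p.numel() for m in (A, B) for p in m.parameters())
print(delta_W.shape, trainable)                               # only 128 trainable scalars
```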

#### Now we are ready to replicate the results in our paper.

## Replicating Our Result on E2E

1. Train GPT-2 Medium with LoRA (see our paper for hyperparameters for GPT-2 Medium)
```
python -m torch.distributed.launch --nproc_per_node=1 src/gpt2_ft.py \
--train_data ./data/e2e/train.jsonl \
--valid_data ./data/e2e/valid.jsonl \
--train_batch_size 8 \
--grad_acc 1 \
--valid_batch_size 4 \
--seq_len 512 \
--model_card gpt2.md \
--init_checkpoint ./pretrained_checkpoints/gpt2-medium-pytorch_model.bin \
--platform local \
--clip 0.0 \
--lr 0.0002 \
--weight_decay 0.01 \
--correct_bias \
--adam_beta2 0.999 \
--scheduler linear \
--warmup_step 500 \
--max_epoch 5 \
--save_interval 1000 \
--lora_dim 4 \
--lora_alpha 32 \
--lora_dropout 0.1 \
--label_smooth 0.1 \
--work_dir ./trained_models/GPT2_M/e2e \
--random_seed 110
```
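
The `--lora_dim`, `--lora_alpha`, and `--lora_dropout` flags correspond to the rank, scaling, and dropout arguments of the `lora` package's layers. As a rough picture of what the training script configures internally, here is a minimal `loralib`-style usage sketch; the toy model is ours and is not the GPT-2 code in `src/`.

```
import torch
import loralib as lora

# One LoRA-adapted projection; the real script adapts GPT-2's attention projections.
layer = lora.Linear(1024, 1024, r=4, lora_alpha=32, lora_dropout=0.1)
model = torch.nn.Sequential(layer, torch.nn.ReLU())

lora.mark_only_lora_as_trainable(model)    # freeze everything except the LoRA parameters
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")

# Checkpoints only need the small LoRA state dict, not the full model.
torch.save(lora.lora_state_dict(model), "lora_only.pt")
```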

2. Generate outputs from the trained model using beam search:
```
python -m torch.distributed.launch --nproc_per_node=1 src/gpt2_beam.py \
--data ./data/e2e/test.jsonl \
--batch_size 1 \
--seq_len 512 \
--eval_len 64 \
--model_card gpt2.md \
--init_checkpoint ./trained_models/GPT2_M/e2e/model.26289.pt \
--platform local \
--lora_dim 4 \
--lora_alpha 32 \
--beam 10 \
--length_penalty 0.8 \
--no_repeat_ngram_size 4 \
--repetition_penalty 1.0 \
--eos_token_id 628 \
--work_dir ./trained_models/GPT2_M/e2e \
--output_file predict.26289.b10p08r4.jsonl
```
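
`gpt2_beam.py` implements its own decoding loop, but the flags above are standard beam-search settings. Purely to illustrate what they mean, here is a rough Hugging Face `transformers` equivalent (an assumption for illustration only; the repo does not use `model.generate`, and the prompt string is made up):

```
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

prompt = "name : The Vaults | Type : pub | price : moderate"   # made-up E2E-style input
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=64,        # cf. --eval_len 64
    num_beams=10,             # cf. --beam 10
    length_penalty=0.8,       # cf. --length_penalty 0.8
    no_repeat_ngram_size=4,   # cf. --no_repeat_ngram_size 4
    repetition_penalty=1.0,   # cf. --repetition_penalty 1.0
    eos_token_id=628,         # cf. --eos_token_id 628 (the script's stop token)
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```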
Next, use the corresponding scripts to fine-tune GPT-2 models with NOLA. For instance, to fine-tune GPT-2 Large on the E2E dataset, run `bash scripts/finetune_gpt2l_e2e_qv_nola.sh`.

3. Decode outputs from step (2)
```
python src/gpt2_decode.py \
--vocab ./vocab \
--sample_file ./trained_models/GPT2_M/e2e/predict.26289.b10p08r4.jsonl \
--input_file ./data/e2e/test_formatted.jsonl \
--output_ref_file e2e_ref.txt \
--output_pred_file e2e_pred.txt
```

4. Run evaluation on E2E test set

```
python eval/e2e/measure_scores.py e2e_ref.txt e2e_pred.txt -p
```
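
`measure_scores.py` comes from the official E2E evaluation toolkit and produces the reference metrics. If you only want a quick sanity check of a few outputs before running it, corpus BLEU can be computed with `sacrebleu` (an extra dependency not listed in `requirement.txt`; the strings below are made up and this is not a substitute for the official script):

```
import sacrebleu  # pip install sacrebleu

predictions = ["The Vaults is a moderately priced pub."]
references = [["The Vaults is a pub with moderate prices."]]   # one reference stream

score = sacrebleu.corpus_bleu(predictions, references)
print(f"BLEU = {score.score:.2f}")
```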

## Replicating Our Result on WebNLG

1. Follow steps 1 and 2 from the E2E pipeline, replacing references to E2E with WebNLG (see our paper for hyperparameters)

2. Decode outputs from beam search (step 2 above)
```
python src/gpt2_decode.py \
--vocab ./vocab \
--sample_file ./trained_models/GPT2_M/webnlg/predict.20000.b10p08.jsonl \
--input_file ./data/webnlg_challenge_2017/test_formatted.jsonl \
--ref_type webnlg \
--ref_num 6 \
--output_ref_file eval/GenerationEval/data/references_webnlg \
--output_pred_file eval/GenerationEval/data/hypothesis_webnlg \
--tokenize --lower
```

3. Run evaluation on WebNLG test set
```
cd ./eval/GenerationEval/
python eval.py \
-R data/references_webnlg/reference \
-H data/hypothesis_webnlg \
-nr 6 \
-m bleu,meteor,ter
cd ../..
```

## Replicating Our Result on DART

1. Follow steps 1 and 2 from the E2E pipeline, replacing references to E2E with DART (see our paper for hyperparameters)

2. Decode outputs from beam search (step 2 above)
```
python src/gpt2_decode.py \
--vocab ./vocab \
--sample_file ./trained_models/GPT2_M/dart/predict.20000.b10p08.jsonl \
--input_file ./data/dart/test_formatted.jsonl \
--ref_type dart \
--ref_num 6 \
--output_ref_file eval/GenerationEval/data/references_dart \
--output_pred_file eval/GenerationEval/data/hypothesis_dart \
--tokenize --lower
```

3. Run evaluation on DART test set
```
cd ./eval/GenerationEval/
python eval.py \
-R data/references_dart/reference \
-H data/hypothesis_dart \
-nr 6 \
-m bleu,meteor,ter
cd ../..
```

## Citation
```
@misc{hu2021lora,
    title={LoRA: Low-Rank Adaptation of Large Language Models},
    author={Hu, Edward and Shen, Yelong and Wallis, Phil and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Lu and Chen, Weizhu},
    year={2021},
    eprint={2106.09685},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
