This repo contains simple Python code (built on HuggingFace) for instruction tuning common LLMs with LoRA/QLoRA. It includes the training code, as well as several scripts for evaluating model generations.
Setup • Details • Usage • Future Work
Install the necessary dependencies as follows:
> conda create -n lora_tuning python=3.11 anaconda
> conda activate lora_tuning
> pip install -r requirements.txt
The repo supports instruction tuning with LoRA and QLoRA, built on top of HuggingFace's PEFT library and bitsandbytes.
Currently, the example scripts instruction tune the Mistral-7B model, though other models can be specified via the --model_name_or_path argument.
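For a sense of what this looks like in code, here is a minimal sketch of preparing a model for QLoRA-style tuning with PEFT and bitsandbytes. The quantization settings and LoRA hyperparameters below are illustrative, not necessarily the repo's defaults; see setup.py and args.py for the actual configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization for QLoRA (illustrative settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "mistralai/Mistral-7B-v0.1"  # overridable via --model_name_or_path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)

# Wrap the quantized base model with trainable low-rank adapters
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```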
A breakdown of the main files within the repository is as follows...
| File | Description |
| --- | --- |
| `train.py` | Main training code |
| `generate.py` | Script for examining model output |
| `setup.py` | Functions for downloading and configuring models/tokenizers |
| `data.py` | Code for configuring datasets |
| `./scripts` | Scripts for training/evaluation (`train.sh`: run instruction tuning on 2x3090 GPUs; `generate.sh`: examine model outputs) |
| `./data` | Supplemental data files (`vicuna_questions.json`: evaluation questions from Vicuna) |
The training process supports either the Alpaca or Assistant Chatbot dataset.
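As a rough illustration, Alpaca-style records are typically flattened into a single prompt/response string before tokenization, along the lines of the sketch below. The template here is the standard Alpaca one and may differ from the repo's actual formatting in data.py.

```python
def format_alpaca_example(example):
    """Flatten one Alpaca-style record into a single training string.

    Illustrative template only; the repo's actual formatting lives in data.py.
    """
    if example.get("input"):
        prompt = (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n### Response:\n"
        )
    else:
        prompt = (
            "Below is an instruction that describes a task. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n### Response:\n"
        )
    return prompt + example["output"]
```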
Evaluation is performed using the set of questions proposed for evaluating Vicuna (see here).
However, model outputs can be inspected on arbitrary datasets via the generate.py script.
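In spirit, generating outputs from a tuned adapter looks roughly like the following. The base model name, adapter path, prompt, and generation settings are placeholders; the real logic is in generate.py.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "mistralai/Mistral-7B-v0.1"  # placeholder base model
adapter_dir = "./output/lora_adapter"    # placeholder path to trained LoRA weights

tokenizer = AutoTokenizer.from_pretrained(base_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach the trained LoRA adapter on top of the frozen base model
model = PeftModel.from_pretrained(base_model, adapter_dir)
model.eval()

prompt = "### Instruction:\nExplain LoRA in one sentence.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```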
The training process logs all metrics to wandb (assuming --report_to wandb
is specified in the arguments) and, at the end of training, generates model outputs for the Vicuna evaluation set, which are also logged to wandb.
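Logging the Vicuna-set generations amounts to something like a wandb table at the end of training. The sketch below uses placeholder data and a hypothetical project name; see train.py for how the repo actually logs generations.

```python
import wandb

# Placeholder data; in the repo the questions come from ./data/vicuna_questions.json
# and the answers are produced by the tuned model at the end of training.
questions = ["How do neural networks learn?"]
answers = ["(model output here)"]

wandb.init(project="lora_tuning")  # hypothetical project name
table = wandb.Table(columns=["question", "model_output"])
for question, answer in zip(questions, answers):
    table.add_data(question, answer)
wandb.log({"vicuna_eval": table})
```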
Example scripts are located in the ./scripts
folder and can be run as follows:
> bash ./scripts/train.sh
> bash ./scripts/generate.sh
These scripts can also be customized by tweaking their arguments. See args.py for a full list of arguments for the model, training, data, and generation.
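The arguments follow the usual HuggingFace dataclass pattern. A trimmed, hypothetical example of what a model argument group might look like is shown below; the field names are illustrative, and the real argument set is defined in args.py.

```python
from dataclasses import dataclass, field
from transformers import HfArgumentParser, TrainingArguments

@dataclass
class ModelArguments:
    # Illustrative fields only; see args.py for the repo's actual arguments
    model_name_or_path: str = field(default="mistralai/Mistral-7B-v0.1")
    lora_r: int = field(default=16, metadata={"help": "LoRA rank"})

parser = HfArgumentParser((ModelArguments, TrainingArguments))
model_args, training_args = parser.parse_args_into_dataclasses()
```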
This repository is intentionally minimal for now. Future efforts will likely include:
- Expansion to more datasets (for training and evaluation)
- Implementing an LLM-as-a-judge style evaluation pipeline
- Adding evaluation on MMLU
- Trying out LoRA+ with different learning rates for the A and B matrices (a rough sketch follows below)
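On the last item: LoRA+ boils down to giving the adapter's B matrices a larger learning rate than the A matrices. One way this could be wired up with standard PyTorch parameter groups is sketched below; the ratio and function name are illustrative, and this is not yet implemented in the repo.

```python
import torch

def build_lora_plus_optimizer(model, lr=2e-4, b_lr_ratio=16.0):
    """Group LoRA A and B parameters so B gets a larger learning rate (LoRA+ style)."""
    a_params, b_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if "lora_A" in name:
            a_params.append(param)
        elif "lora_B" in name:
            b_params.append(param)
    return torch.optim.AdamW([
        {"params": a_params, "lr": lr},
        {"params": b_params, "lr": lr * b_lr_ratio},
    ])
```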