This is a demonstration of using GRPO-style reinforcement learning to fine-tune a model. Unsloth's implementation makes this incredibly simple: by combining PEFT (LoRA with 4-bit quantization) and a pre-trained reasoning model, you can fine-tune for any task of reasonable difficulty for the model size.
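For reference, the setup inside the training script looks roughly like this with Unsloth's quickstart API (a minimal sketch; the model name and LoRA hyperparameters are illustrative assumptions, not this repo's exact settings):

```python
# Sketch: 4-bit LoRA setup with Unsloth. Values are illustrative.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Qwen-1.5B",  # assumed reasoning model
    max_seq_length=1024,
    load_in_4bit=True,       # the Q4 quantization mentioned above
    fast_inference=True,     # vLLM-backed generation for GRPO rollouts
    max_lora_rank=16,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                    # LoRA rank (illustrative)
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_gradient_checkpointing="unsloth",
)
```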
In this demo I fine-tuned it on GSM8K (a dataset it has seen before), but trained it toward a different answering format.
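GRPO needs a reward signal for that format; a minimal sketch, assuming a hypothetical `<answer>...</answer>` tag convention (the repo's actual format may differ):

```python
# Sketch: reward completions that emit a well-formed <answer> block.
# The tag format here is a hypothetical stand-in for the repo's format.
import re

ANSWER_RE = re.compile(r"<answer>\s*(-?[\d,.]+)\s*</answer>")

def format_reward(completions, **kwargs):
    rewards = []
    for completion in completions:
        # trl passes plain strings, or message lists in conversational mode
        text = completion if isinstance(completion, str) else completion[0]["content"]
        rewards.append(1.0 if ANSWER_RE.search(text) else 0.0)
    return rewards
```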
This was done on a single RTX 4090 over ~30 hours. Results after 2500 steps (before I got kicked off the UCL GPUs 🥹):
conda create -n grpo python=3.11  # note 1
conda activate grpo
pip install unsloth vllm matplotlib pandas "huggingface_hub[cli]"
huggingface-cli login  # note 2
# note 1: unsloth requires python>=3.10
# note 2: if you want to upload the model later, remember to give the token write permissions for the specific repo
python continued_distilled_grpo_training.py
python continued_distilled_grpo_test.py
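The training script's core loop roughly follows trl's `GRPOTrainer` (a minimal sketch, reusing `model`, `tokenizer`, and `format_reward` from the sketches above; hyperparameters are illustrative, not this repo's exact values):

```python
# Sketch: GRPO training on GSM8K prompts. Hyperparameters are illustrative.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})  # GRPOTrainer expects a "prompt" column

training_args = GRPOConfig(
    output_dir="outputs",
    learning_rate=5e-6,
    per_device_train_batch_size=8,
    num_generations=8,           # completions sampled per prompt for the group baseline
    max_prompt_length=256,
    max_completion_length=512,
    max_steps=2500,
)
trainer = GRPOTrainer(
    model=model,                 # from the Unsloth sketch above
    processing_class=tokenizer,
    reward_funcs=[format_reward],
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```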
model.push_to_hub('<repo>/<model>', commit_message='<commit_message>')
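If you want standalone merged weights rather than just the LoRA adapter, Unsloth also supports a merged upload (sketch; the repo name is a placeholder):

```python
# Optional: push merged 16-bit weights instead of the adapter alone.
model.push_to_hub_merged('<repo>/<model>', tokenizer, save_method='merged_16bit')
```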