Signed-off-by: Kelly Brown <[email protected]>
1 parent
99d4468
commit 1c95f93
Showing 1 changed file with 114 additions and 92 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,43 +5,55 @@ | |
![Release](https://img.shields.io/github/v/release/instructlab/training) | ||
![License](https://img.shields.io/github/license/instructlab/training) | ||
|
||
In order to simplify the process of fine-tuning models through the LAB | ||
method, this library provides a simple training interface. | ||
- [Installing](#installing-the-library) | ||
- [Additional Nvidia packages](#additional-nvidia-packages) | ||
- [Using the library](#using-the-library) | ||
- [Learning about the training arguments](#learning-about-training-arguments) | ||
- [`TrainingArgs`](#trainingargs) | ||
- [`DeepSpeedOptions`](#deepspeedoptions) | ||
- [`LoraOptions`](#loraoptions) | ||
- [Learning about `TorchrunArgs` arguments](#learning-about-torchrunargs-arguments) | ||
- [Example training run with arguments](#example-training-run-with-arguments) | ||
|
||
## Installation | ||
To simplify the process of fine-tuning models with the [LAB | ||
method](https://arxiv.org/abs/2403.01081), this library provides a simple training interface. | ||
|
||
To get started with the library, you must clone this repo and install it from source via `pip`: | ||
## Installing the library | ||
|
||
```bash | ||
# clone the repo and switch to the directory | ||
git clone https://github.com/instructlab/training | ||
cd training | ||
To get started with the library, you must clone this repository and install it via `pip`. | ||
|
||
Install the library: | ||
|
||
# install the library | ||
pip install . | ||
```bash | ||
pip install instructlab-training | ||
``` | ||
|
||
For development, install it with `pip install -e .` instead | ||
to make local changes while using this library elsewhere. | ||
You can then install the library for development: | ||
Check failure on line 31 in README.md (GitHub Actions / markdown-lint): Trailing spaces
|
||
|
||
```bash | ||
pip install -e ./training | ||
``` | ||
|
||
### Installing Additional NVIDIA packages | ||
### Additional NVIDIA packages | ||
|
||
We make use of `flash-attn` and other packages which rely on NVIDIA-specific | ||
CUDA tooling to be installed. | ||
This library uses the `flash-attn` package as well as other packages, which rely on NVIDIA-specific CUDA tooling to be installed. | ||
If you are using NVIDIA hardware with CUDA, you need to install the following additional dependencies. | ||
|
||
If you are using NVIDIA hardware with CUDA, please install the additional dependencies via: | ||
Basic install | ||
|
||
```bash | ||
# for a regular install | ||
pip install .[cuda] | ||
``` | ||
|
||
Editable install (development) | ||
|
||
# or, for an editable install (development) | ||
```bash | ||
pip install -e .[cuda] | ||
``` | ||
|
||
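As a quick sanity check after installing the CUDA extras, the snippet below tests whether the optional packages can be located by the interpreter. It is a minimal sketch: `is_installed` is a hypothetical helper, not part of this library, and it assumes the `flash-attn` package is importable under the module name `flash_attn`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # `flash_attn` is the import name assumed here for the `flash-attn` package.
    for name in ("torch", "flash_attn"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

A `missing` result for `flash_attn` suggests that the `[cuda]` extras were not installed into the active environment.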
## Usage | ||
## Using the library | ||
|
||
Using the library is fairly straightforward, import the necessary items, | ||
You can utilize this training library by importing the necessary items. | ||
|
||
```py | ||
from instructlab.training import ( | ||
|
@@ -52,65 +64,18 @@ from instructlab.training import ( | |
) | ||
``` | ||
|
||
Then, define the training arguments which will serve as the | ||
parameters for our training run: | ||
|
||
```py | ||
# define training-specific arguments | ||
training_args = TrainingArgs( | ||
# define data-specific arguments | ||
model_path = "ibm-granite/granite-7b-base", | ||
data_path = "path/to/dataset.jsonl", | ||
ckpt_output_dir = "data/saved_checkpoints", | ||
data_output_dir = "data/outputs", | ||
|
||
# define model-training parameters | ||
max_seq_len = 4096, | ||
max_batch_len = 60000, | ||
num_epochs = 10, | ||
effective_batch_size = 3840, | ||
save_samples = 250000, | ||
learning_rate = 2e-6, | ||
warmup_steps = 800, | ||
is_padding_free = True, # set this to true when using Granite-based models | ||
random_seed = 42, | ||
) | ||
``` | ||
|
||
We'll also need to define the settings for running a multi-process job | ||
via `torchrun`. To do this, create a `TorchrunArgs` object. | ||
|
||
> [!TIP] | ||
> Note, for single-GPU jobs, you can simply set `nnodes = 1` and `nproc_per_node=1`. | ||
```py | ||
torchrun_args = TorchrunArgs( | ||
nnodes = 1, # number of machines | ||
nproc_per_node = 8, # num GPUs per machine | ||
node_rank = 0, # node rank for this machine | ||
rdzv_id = 123, | ||
rdzv_endpoint = '127.0.0.1:12345' | ||
) | ||
``` | ||
|
||
Finally, you can just call `run_training` and this library will handle | ||
the rest 🙂. | ||
|
||
```py | ||
run_training( | ||
torch_args=torchrun_args, | ||
train_args=training_args, | ||
) | ||
You can then define various training arguments. They will serve as the parameters for your training runs. See: | ||
|
||
``` | ||
- [Learning about the training arguments](#learning-about-training-arguments) | ||
- [Example training run with arguments](#example-training-run-with-arguments) | ||
|
||
### Customizing `TrainingArgs` | ||
## Learning about training arguments | ||
|
||
The `TrainingArgs` class provides most of the customization options | ||
for the training job itself. There are a number of options you can specify, such as setting | ||
DeepSpeed config values or running a LoRA training job instead of a full fine-tune. | ||
for training jobs. There are a number of options you can specify, such as setting | ||
`DeepSpeed` config values or running a `LoRA` training job instead of a full fine-tune. | ||
|
||
Here is a breakdown of the general options: | ||
### `TrainingArgs` | ||
|
||
| Field | Description | | ||
| --- | --- | | ||
|
@@ -132,17 +97,31 @@ Here is a breakdown of the general options: | |
| deepspeed_options | Config options to specify for the DeepSpeed optimizer. | | ||
| lora | Options to specify if you intend to perform a LoRA train instead of a full fine-tune. | | ||
|
||
#### `DeepSpeedOptions` | ||
### `DeepSpeedOptions` | ||
|
||
We only currently support a few options in `DeepSpeedOptions`: | ||
This library currently supports only a few options in `DeepSpeedOptions`: | ||
The default is to run with DeepSpeed, so these options only currently | ||
allow you to customize aspects of the ZeRO stage 2 optimizer. | ||
|
||
| Field | Description | | ||
| --- | --- | | ||
| cpu_offload_optimizer | Whether or not to do CPU offloading in DeepSpeed stage 2. | | ||
|
||
#### `LoraOptions` | ||
For more information about DeepSpeed, see [deepspeed.ai](https://www.deepspeed.ai/). | ||
|
||
### `LoraOptions` | ||
|
||
LoRA options currently supported: | ||
|
||
| Field | Description | | ||
| --- | --- | | ||
| rank | The rank parameter for LoRA training. | | ||
| alpha | The alpha parameter for LoRA training. | | ||
| dropout | The dropout rate for LoRA training. | | ||
| target_modules | The list of target modules for LoRA training. | | ||
| quantize_data_type | The data type for quantization in LoRA training. Valid options are `None` and `"nf4"` | | ||
|
||
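To give a feel for what `rank` and `alpha` control, the sketch below uses the standard LoRA formulation, in which a frozen weight matrix of shape `d_out x d_in` receives a trainable update `(alpha / rank) * B @ A`. The helper names and the 4096 x 4096 layer size are illustrative assumptions, not values taken from this library.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Count parameters in the low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the update B @ A before it is added to the frozen weight."""
    return alpha / rank


if __name__ == "__main__":
    d_in = d_out = 4096  # hypothetical projection size, for illustration only
    full = d_in * d_out
    for rank in (4, 8, 16):
        lora = lora_trainable_params(d_in, d_out, rank)
        share = 100 * lora / full
        print(f"rank={rank}: {lora} trainable params ({share:.2f}% of {full})")
    print("scaling for alpha=32, rank=8:", lora_scaling(32, 8))
```

A higher `rank` adds trainable parameters linearly, while `alpha` only rescales the update, so the two are usually tuned together.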
#### Example run with LoRA options | ||
|
||
If you'd like to do a LoRA train, you can specify a LoRA | ||
option to `TrainingArgs` via the `LoraOptions` object. | ||
|
@@ -160,23 +139,12 @@ training_args = TrainingArgs( | |
) | ||
``` | ||
|
||
Here is the definition for what we currently support today: | ||
|
||
| Field | Description | | ||
| --- | --- | | ||
| rank | The rank parameter for LoRA training. | | ||
| alpha | The alpha parameter for LoRA training. | | ||
| dropout | The dropout rate for LoRA training. | | ||
| target_modules | The list of target modules for LoRA training. | | ||
| quantize_data_type | The data type for quantization in LoRA training. Valid options are `None` and `"nf4"` | | ||
|
||
### Customizing `TorchrunArgs` | ||
### Learning about `TorchrunArgs` arguments | ||
|
||
When running the training script, we always invoke `torchrun`. | ||
|
||
If you are running a single-GPU system or something that doesn't | ||
otherwise require distributed training configuration, you can | ||
just create a default object: | ||
otherwise require distributed training configuration, you can create a default object: | ||
|
||
```python | ||
run_training( | ||
|
@@ -188,12 +156,14 @@ run_training( | |
``` | ||
|
||
However, if you want to specify a more complex configuration, | ||
we currently expose all of the options that [torchrun accepts | ||
the library currently supports all the options that [torchrun accepts | ||
today](https://pytorch.org/docs/stable/elastic/run.html#definitions). | ||
|
||
> ![NOTE] | ||
> [!NOTE] | ||
> For more information about the `torchrun` arguments, please consult the [torchrun documentation](https://pytorch.org/docs/stable/elastic/run.html#definitions). | ||
#### Example training run with `TorchrunArgs` arguments | ||
|
||
For example, in an 8-GPU, 2-machine system, we would | ||
specify the following torchrun config: | ||
|
||
|
@@ -236,3 +206,55 @@ run_training( | |
train_args=training_args | ||
) | ||
``` | ||
|
||
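In a static, homogeneous setup, `torchrun` typically launches `nnodes * nproc_per_node` worker processes in total, and each worker's global rank follows from its node rank and local rank. The sketch below illustrates that arithmetic for a hypothetical layout of 2 machines with 4 GPUs each; `world_size` and `global_rank` are illustrative helpers, not part of this library or of `torchrun`.

```python
def world_size(nnodes: int, nproc_per_node: int) -> int:
    """Total number of worker processes across all machines."""
    return nnodes * nproc_per_node


def global_rank(node_rank: int, local_rank: int, nproc_per_node: int) -> int:
    """Global rank of a worker, assuming ranks are assigned contiguously per node."""
    if not 0 <= local_rank < nproc_per_node:
        raise ValueError("local_rank must be in [0, nproc_per_node)")
    return node_rank * nproc_per_node + local_rank


if __name__ == "__main__":
    nnodes, nproc_per_node = 2, 4  # assumed layout: 2 machines, 4 GPUs each
    print("world size:", world_size(nnodes, nproc_per_node))
    for node_rank in range(nnodes):
        ranks = [global_rank(node_rank, lr, nproc_per_node) for lr in range(nproc_per_node)]
        print(f"node {node_rank}: global ranks {ranks}")
```

Each machine runs the same script with its own `node_rank`, and all of them point at the same `rdzv_endpoint`.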
## Example training run with arguments | ||
|
||
Define the training arguments which will serve as the | ||
parameters for our training run: | ||
|
||
```py | ||
# define training-specific arguments | ||
training_args = TrainingArgs( | ||
# define data-specific arguments | ||
model_path = "ibm-granite/granite-7b-base", | ||
data_path = "path/to/dataset.jsonl", | ||
ckpt_output_dir = "data/saved_checkpoints", | ||
data_output_dir = "data/outputs", | ||
|
||
# define model-training parameters | ||
max_seq_len = 4096, | ||
max_batch_len = 60000, | ||
num_epochs = 10, | ||
effective_batch_size = 3840, | ||
save_samples = 250000, | ||
learning_rate = 2e-6, | ||
warmup_steps = 800, | ||
is_padding_free = True, # set this to true when using Granite-based models | ||
random_seed = 42, | ||
) | ||
``` | ||
|
||
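To illustrate how `learning_rate` and `warmup_steps` interact, the sketch below assumes a plain linear warmup that ramps from zero to the target rate and then holds it. The scheduler this library actually applies may differ; `warmup_lr` is a hypothetical helper whose defaults mirror the example values above.

```python
def warmup_lr(step: int, base_lr: float = 2e-6, warmup_steps: int = 800) -> float:
    """Linearly ramp the learning rate to base_lr over warmup_steps, then hold it."""
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, step / warmup_steps)


if __name__ == "__main__":
    for step in (0, 200, 400, 800, 5000):
        print(f"step {step}: lr = {warmup_lr(step):.2e}")
```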
We'll also need to define the settings for running a multi-process job | ||
via `torchrun`. To do this, create a `TorchrunArgs` object. | ||
|
||
> [!TIP] | ||
> Note, for single-GPU jobs, you can simply set `nnodes = 1` and `nproc_per_node=1`. | ||
```py | ||
torchrun_args = TorchrunArgs( | ||
nnodes = 1, # number of machines | ||
nproc_per_node = 8, # num GPUs per machine | ||
node_rank = 0, # node rank for this machine | ||
rdzv_id = 123, | ||
rdzv_endpoint = '127.0.0.1:12345' | ||
) | ||
``` | ||
|
||
Finally, you can just call `run_training` and this library will handle | ||
the rest 🙂. | ||
|
||
```py | ||
run_training( | ||
torch_args=torchrun_args, | ||
train_args=training_args, | ||
) |