Initial implementation of disaggregated attention and qkv projection #1433

Closed — wants to merge 226 commits

Commits
cbfd652
.
jiazhihao Sep 27, 2023
f53a67a
Merge branch 'inference' of https://github.com/flexflow/FlexFlow into…
jiazhihao Sep 27, 2023
60702fc
format
jiazhihao Sep 27, 2023
8360de0
Merge branch 'inference' into peft
goliaro Sep 29, 2023
102745a
resolve merge conflict
jiazhihao Oct 2, 2023
eaf42a4
Merge branch 'peft' of https://github.com/flexflow/FlexFlow into peft
jiazhihao Oct 2, 2023
da9ce1b
implement LoraLinear
jiazhihao Oct 3, 2023
66230bd
add missing files
jiazhihao Oct 3, 2023
f0d1155
format
jiazhihao Oct 3, 2023
00f926b
Merge branch 'inference' into peft
jiazhihao Oct 3, 2023
fb203cc
LoraLinear now takes two inputs and generates one output
jiazhihao Oct 4, 2023
c3d9c38
LoRA forward pass works now
jiazhihao Oct 4, 2023
c4cfcc3
[LoraLinear] update to allocate weight through per-GPU PEFTWeightAllo…
jiazhihao Oct 7, 2023
8b98c45
Merge branch 'inference' into peft
jiazhihao Oct 7, 2023
ea8920b
add API for registering PEFT models
jiazhihao Oct 8, 2023
0e09ac1
Merge branch 'peft' of https://github.com/flexflow/FlexFlow into peft
jiazhihao Oct 8, 2023
44cc16b
bug fix
jiazhihao Oct 8, 2023
29e5547
format
jiazhihao Oct 8, 2023
dfd1c9a
add reserved work space for peft activations and weights
jiazhihao Oct 8, 2023
bb76f75
Merge branch 'inference' into peft
goliaro Oct 10, 2023
e6f671d
fix merge conflicts, implement layernorm peft_bwd
goliaro Oct 11, 2023
207b127
cleanup
goliaro Oct 11, 2023
231e244
rms backward
goliaro Oct 11, 2023
416c322
rms peft
goliaro Oct 11, 2023
f72067a
add LoraLinearConfig
jiazhihao Oct 11, 2023
49e5664
add an API for register peft request
jiazhihao Oct 11, 2023
367bfa5
resolve merge conflict
jiazhihao Oct 15, 2023
008ffd9
format
jiazhihao Oct 15, 2023
2e0aa76
resolve merge conflict
jiazhihao Oct 16, 2023
ace7e3f
.
jiazhihao Oct 17, 2023
6bbb81e
variable renaming
jiazhihao Oct 17, 2023
b6735f9
checkpoint
jiazhihao Oct 17, 2023
0978844
Merge branch 'inference' into peft
goliaro Oct 17, 2023
6a0d51b
resolve merge conflict
jiazhihao Oct 17, 2023
54084c4
resolve conflict
jiazhihao Oct 17, 2023
91f849e
Merge branch 'peft' of https://github.com/flexflow/FlexFlow into peft
jiazhihao Oct 17, 2023
a44e33d
add missing functions
jiazhihao Oct 18, 2023
4d55b40
remove OpMeta(FFhandler) constructor
jiazhihao Oct 18, 2023
eb14798
residual rms norm backward
goliaro Oct 18, 2023
e7fa9ce
cleanup
goliaro Oct 18, 2023
5ca5e49
Merge branch 'inference' of https://github.com/flexflow/FlexFlow into…
jiazhihao Oct 18, 2023
5f7f710
bug fix
jiazhihao Oct 18, 2023
7b2bd08
finished peft bwd for residual rms norm
goliaro Oct 19, 2023
d2f177d
sigmoid_silu_multi backward and peft_bwd
goliaro Oct 19, 2023
8b1f76b
hip_rocm update
goliaro Oct 19, 2023
84c391b
support peft_bwd for fused layers
jiazhihao Oct 20, 2023
1cc723e
format
jiazhihao Oct 20, 2023
f1d5dc0
residual layer norm bwd / peft_bwd
goliaro Oct 21, 2023
3b50e17
fix typo
goliaro Oct 22, 2023
bdb590b
add_bias_residual_layer_norm backward and peft_bwd
goliaro Oct 22, 2023
60c0418
implement IncMHA peft_bwd
jiazhihao Oct 22, 2023
1ce2f27
Merge branch 'peft' of https://github.com/flexflow/FlexFlow into peft
jiazhihao Oct 22, 2023
d6d39ce
resolve merge conflict
jiazhihao Oct 22, 2023
b763ac2
Merge branch 'peft' of https://github.com/flexflow/FlexFlow into peft
jiazhihao Oct 22, 2023
509c54c
several bug fixes
jiazhihao Oct 22, 2023
bc9f538
[rms_norm] do not compute non-peft-bwd tokens in peft-bwd
jiazhihao Oct 23, 2023
d8e92e9
.
jiazhihao Oct 23, 2023
0a512d2
.
jiazhihao Oct 24, 2023
4ee710a
Update the default cublas behavior when CUDA_VERSION is not specified
jiazhihao Oct 24, 2023
2adca3a
Merge branch 'fix_cublas_default' of https://github.com/flexflow/Flex…
jiazhihao Oct 24, 2023
464424e
fix bugs in IncMHA peft_bwd kernel
jiazhihao Oct 24, 2023
82d6e58
resolve merge conflict
jiazhihao Oct 24, 2023
45c1e01
uncomment softmaxbackward
jiazhihao Oct 24, 2023
07636e8
add layernorm to align test
goliaro Oct 24, 2023
28a5e84
add peft test scripts
goliaro Oct 24, 2023
dd94370
fix import
goliaro Oct 24, 2023
3c01328
fix
goliaro Oct 24, 2023
fa56364
add code to convert peft models
goliaro Oct 26, 2023
a484100
add script to download peft for c++, fix bug
goliaro Oct 26, 2023
c83c376
fix
goliaro Oct 26, 2023
aa9f004
add script to fine-tune models
goliaro Oct 27, 2023
4609e9e
implement loading lora configs/weights from file
goliaro Oct 31, 2023
17fa6f3
remove peft_bwd assertion failure in embedding
goliaro Oct 31, 2023
cdc12e6
fix download script
goliaro Oct 31, 2023
eb9e2b8
add peft dependencies in dockerfile
goliaro Oct 31, 2023
3dfa14d
fix softmax backward
goliaro Oct 31, 2023
78523e8
fix bc print indentation
goliaro Nov 1, 2023
bf78ea4
Temporarily Revert "Update the default cublas behavior when CUDA_VERS…
goliaro Nov 2, 2023
b9e7f60
Fix cublas default (#1220)
goliaro Nov 2, 2023
463c757
fix bugs, work on align opt-lora
goliaro Nov 3, 2023
1c231ba
Merge branch 'inference' into peft
goliaro Nov 6, 2023
7c65521
update scripts
goliaro Nov 6, 2023
f4b3f8f
add code to output peft tensors in hf
goliaro Nov 6, 2023
9e5fea9
update, fixes
goliaro Nov 7, 2023
62edfaa
linting
goliaro Nov 7, 2023
ddb5c29
fix printing of tensors for numpy
goliaro Nov 7, 2023
d276496
update save_inference_tensors_to_file
goliaro Nov 8, 2023
bc79d3b
linting
goliaro Nov 8, 2023
8e34632
update
goliaro Nov 8, 2023
b11c5e9
fix issue with save_inference_tensors_to_file
goliaro Nov 8, 2023
fca16cc
fix layer names for save_inference_tensors_to_file
goliaro Nov 8, 2023
9095f2b
fix peft
goliaro Nov 9, 2023
9769604
fix bwd bugs
goliaro Nov 10, 2023
880ede8
linting
goliaro Nov 10, 2023
818375d
fixes
goliaro Nov 10, 2023
2990e20
fix
goliaro Nov 10, 2023
6959e68
fix
goliaro Nov 10, 2023
266368c
fix
goliaro Nov 10, 2023
06775bd
add bc fields for peft training
goliaro Nov 10, 2023
ca879e2
merge conflicts
goliaro Nov 10, 2023
9f60177
linting
goliaro Nov 10, 2023
9442b62
fix
goliaro Nov 10, 2023
11eccb1
remove ptr check
goliaro Nov 10, 2023
9bfc557
fix
goliaro Nov 10, 2023
bcfae08
implement save_operators for bwd
goliaro Nov 12, 2023
d86272c
fix bug
goliaro Nov 13, 2023
0a3258a
implement save tensors for bwd
goliaro Nov 13, 2023
e34c405
.
goliaro Nov 15, 2023
87fbada
bug fix
goliaro Nov 15, 2023
52759bd
fix
goliaro Nov 15, 2023
2a5371d
align linear
goliaro Nov 15, 2023
ed0be61
fix
goliaro Nov 16, 2023
8a0b6ea
bwd kernel updates
goliaro Nov 17, 2023
b0e686d
undo use of CUBLAS_COMPUTE_32F_FAST_16F for now
goliaro Nov 17, 2023
0daf232
only send dataset entry once
goliaro Nov 19, 2023
ec131c7
update peft test scripts
goliaro Nov 20, 2023
0431c73
loss
xinhaoc Nov 20, 2023
371dffd
.
xinhaoc Nov 20, 2023
da690ff
update generate/request api to take both inference and fine-tuning pr…
goliaro Nov 21, 2023
1e5bb72
linting
goliaro Nov 21, 2023
f3ff40b
alignment fixes in lora & linear layer
goliaro Nov 21, 2023
7efd3a7
alignment fix
goliaro Nov 21, 2023
b6fe334
diagonal
xinhaoc Nov 22, 2023
bcf8b19
fix
goliaro Nov 22, 2023
4bfee96
alignment fix ssm
goliaro Nov 22, 2023
efd1976
sigmoid-silu-multi now fully aligned
goliaro Nov 24, 2023
7ae195a
rms norm kernel updates
goliaro Nov 24, 2023
7030814
fix
goliaro Nov 26, 2023
eb3b6ab
in-place residual rms
goliaro Nov 26, 2023
9f26cc1
Merge branch 'inference' into peft
goliaro Nov 27, 2023
a122e30
bug fix and linting
goliaro Nov 28, 2023
53e737b
align backward of o_proj, attn_heads, qk_prods_softmax, and v_proj wi…
goliaro Nov 30, 2023
edc02af
cleanup
goliaro Nov 30, 2023
f00c7e0
finished all alignment fixes in attention backward kernel
goliaro Nov 30, 2023
3955b0b
fix
goliaro Nov 30, 2023
c534638
Update inc_multihead_self_attention.cu
goliaro Dec 3, 2023
fd956c9
Update inc_multihead_self_attention.cu
goliaro Dec 4, 2023
d9b154f
Merge branch 'inference' into peft
goliaro Dec 4, 2023
3a34c88
use grad to store peft in/output (#1241)
xinhaoc Dec 6, 2023
94230d9
format
jiazhihao Dec 6, 2023
b985cc9
enable peft request
jiazhihao Dec 6, 2023
b9c3926
several hacks for performance measurement; some of the changes should…
jiazhihao Dec 6, 2023
4d5c3e0
Update sigmoid_silu_multi.cu
goliaro Dec 16, 2023
7bf863a
RoPE backward
goliaro Dec 18, 2023
960654e
PEFT bug fixes and alignment (#1269)
goliaro Jan 10, 2024
2028900
Fuse bias + relu in OPT (#1271)
goliaro Jan 10, 2024
3bbde56
fix
goliaro Jan 10, 2024
2ebd7f4
fix
goliaro Jan 17, 2024
1b2018b
fix
goliaro Jan 17, 2024
bc61e9d
Peft alignment & debugging tools (#1288)
goliaro Jan 27, 2024
32f0a15
fix legion aliasing error
goliaro Jan 27, 2024
c97f63a
fix warnings
goliaro Jan 27, 2024
3d5a37c
fix
goliaro Jan 27, 2024
571f0d3
fix pipeline parallelism
goliaro Jan 29, 2024
f4a10f3
fix tp issue in combine op
goliaro Jan 29, 2024
ca683f7
fix lora weight loading with tensor parallelism
goliaro Jan 29, 2024
378bdb5
fixes, implement Combine::peft_bwd_task
goliaro Jan 29, 2024
afdae45
fix
goliaro Jan 29, 2024
5660f55
replicate peft bwd
goliaro Jan 29, 2024
a9bacd3
fixes
goliaro Jan 30, 2024
f3a97ff
fix
goliaro Jan 31, 2024
e0a58bb
fix combine and fwd-bwd pass dependencies
goliaro Jan 31, 2024
50fc13d
fix replicate bwd
goliaro Jan 31, 2024
f2c9a05
fix
goliaro Feb 1, 2024
cd68f5d
let user control amount of peft memory
goliaro Feb 3, 2024
64a59d8
only run peft_bwd if peft is enabled
goliaro Feb 3, 2024
32a0716
fix rms norm inference region reqs
goliaro Feb 6, 2024
a37b173
fix in-place fusion (part 1)
goliaro Feb 7, 2024
85f4d40
fix inplace fusion (part 2)
goliaro Feb 7, 2024
bb56a99
fix
goliaro Feb 7, 2024
63f1fce
disable automatic inplace rms norm for now
goliaro Feb 7, 2024
0d3aa7e
fix inf fusion inplace
goliaro Feb 8, 2024
b658061
fix rest input grads for peft without inplace residuals
goliaro Feb 9, 2024
3255fe4
fix
goliaro Feb 9, 2024
ec2002e
fix
goliaro Feb 15, 2024
098e880
fix residual rms
goliaro Feb 16, 2024
5688e16
fix
goliaro Feb 16, 2024
9225e0c
fix
goliaro Feb 16, 2024
e12bff1
enable inf debugging in fusion bwd
goliaro Feb 19, 2024
ed9afb7
hack to silence warning in fused bwd
goliaro Feb 19, 2024
96d0e9b
fix
goliaro Feb 19, 2024
fcbeea0
Merge branch 'inference' into peft
goliaro Feb 19, 2024
2cbc0b7
fix
goliaro Feb 19, 2024
36cb2b3
fix build
goliaro Feb 19, 2024
21b77f1
fix
goliaro Feb 19, 2024
9075d3f
fix
goliaro Feb 19, 2024
0b35b0c
add draft peft test
goliaro Mar 22, 2024
b6ada2f
Peft python interface (#1306)
goliaro Mar 27, 2024
29fcda7
Merge branch 'inference' into peft
goliaro Apr 8, 2024
0ed889a
fix
goliaro Apr 8, 2024
48c431a
update
goliaro Apr 11, 2024
40649ee
fix
goliaro Apr 12, 2024
0580d7e
fix to support prompts larger than max tokens per batch
goliaro Apr 13, 2024
0affe27
fixes to support benchmarking of finetuning throughput
goliaro Apr 14, 2024
d7ebeaf
many upgrades and updates related to finetuning
goliaro Apr 15, 2024
33e873d
add ttft statistics
goliaro Apr 15, 2024
2f92a65
add warmup phase
goliaro Apr 15, 2024
b1e97b1
add benchmarking code
goliaro Apr 16, 2024
e35ebb2
Add scripts for evaluation with Microsoft Azure trace (#1363)
Flechman Apr 17, 2024
f3f6226
Merge branch 'inference' into peft
goliaro Apr 24, 2024
b33f10f
fix
goliaro Apr 25, 2024
97562d6
fix
goliaro May 1, 2024
985c254
add peft tests to ci
goliaro May 1, 2024
33dbd3d
Merge branch 'inference' into peft
goliaro May 1, 2024
f033b4e
shellcheck
goliaro May 8, 2024
1011927
fix
goliaro May 9, 2024
9064c2b
fix python requirements
goliaro May 9, 2024
a125e86
fix
goliaro May 10, 2024
d74fe53
fix
goliaro May 11, 2024
0c6ae09
update ci test
goliaro May 17, 2024
93b6032
update alignment doc
goliaro May 17, 2024
9546239
fix cross entropy loss bug
goliaro May 19, 2024
ff4b703
update alignment test
goliaro May 19, 2024
b613666
update test
goliaro May 20, 2024
dde0b61
add llama peft alignment test to ci
goliaro May 20, 2024
1a31b65
Fix values for unused params in incr_decoding
Flechman May 24, 2024
7e3d111
Add PEFTModelID NO_ID singleton instead of None
Flechman May 24, 2024
079ba59
Fix PEFTModelID::NO_ID reference
Flechman May 25, 2024
f464eb8
reduce logging
goliaro May 25, 2024
8d89acd
fix
goliaro May 26, 2024
33c0fef
fix
goliaro May 29, 2024
6727d3a
Add peft demo
Flechman Jun 11, 2024
6d7c245
Add readme for demo
Flechman Jun 11, 2024
511fd64
fix alignment issue
goliaro Jun 20, 2024
2899ba2
Initial implementation of disaggregated attention and qkv projection
yingchen21 Jul 9, 2024
94e1563
fixed filename problem from renaming weight file
yingchen21 Jul 10, 2024
4 changes: 4 additions & 0 deletions .github/workflows/gpu-ci.yml
@@ -174,6 +174,10 @@ jobs:
# Inference tests
source ./build/set_python_envs.sh
./tests/inference_tests.sh

# PEFT tests
./tests/peft_tests.sh
python ./tests/peft/peft_alignment_test.py

- name: Save inference output as an artifact
if: always()
5 changes: 5 additions & 0 deletions .gitignore
@@ -187,4 +187,9 @@ gpt_tokenizer
python/flexflow/version.txt

inference_tensors
hf_peft_tensors
lora_training_logs

Untitled-1.ipynb
Untitled-2.ipynb
tests/inference/python_test_configs/*.json
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -558,6 +558,7 @@ if(NOT BUILD_LEGION_ONLY)
if(FF_BUILD_ALL_INFERENCE_EXAMPLES OR FF_BUILD_ALL_EXAMPLES)
add_subdirectory(inference/spec_infer)
add_subdirectory(inference/incr_decoding)
add_subdirectory(inference/peft)
endif()


7 changes: 7 additions & 0 deletions conda/flexflow.yml
@@ -25,3 +25,10 @@ dependencies:
- sentencepiece
- einops
- requests
- scipy
- bitsandbytes
- datasets
- accelerate
- loralib
- triton
- peft
2 changes: 2 additions & 0 deletions docker/flexflow-environment/Dockerfile
@@ -93,6 +93,8 @@ RUN conda install -c conda-forge cmake make pillow cmake-build-extension pybind1
RUN conda install pytorch torchvision torchaudio -c pytorch
RUN conda install -c conda-forge onnx transformers>=4.31.0 sentencepiece einops
RUN pip3 install tensorflow notebook
# PEFT-related
RUN pip3 install scipy bitsandbytes datasets accelerate loralib triton peft

# Install Rust
RUN curl https://sh.rustup.rs -sSf | sh -s -- -y
27 changes: 23 additions & 4 deletions include/flexflow/batch_config.h
@@ -16,6 +16,7 @@
#pragma once

#include "flexflow/ffconst.h"
#include "flexflow/fftype.h"
#include "legion.h"
#include <cstddef>
#include <cstdlib>
@@ -43,6 +44,8 @@ class BatchConfig {
BatchConfig();
int num_active_requests() const;
int num_active_tokens() const;
int num_active_infr_tokens() const;
int num_active_peft_tokens() const;
static int max_requests_per_batch();
static int max_tokens_per_batch();
static int max_verify_tokens_per_batch();
@@ -56,26 +59,41 @@ class BatchConfig {
// Maximum possible values for different parameters
// These maximum values are used for copying BatchConfig
// across workers
static int const MAX_NUM_REQUESTS = 64;
static int const MAX_NUM_REQUESTS = 65;
static int const MAX_NUM_TOKENS = 1024;
static int const MAX_SPEC_TREE_TOKEN_NUM = 64;

// Set by update
int num_tokens;

int num_tokens = 0, num_peft_tokens = 0, num_peft_label_tokens = 0;
// number of tokens in prompt phase, start offset of tokens in inc_decoding
// phase. num_tokens - num_prompt_tokens = num_generation_tokens;
int num_generation_tokens;
int num_generation_tokens = 0;

struct PerRequestInfo {
PerRequestInfo() {
first_token_depth_in_request = 0;
first_token_offset_in_batch = 0;
num_tokens_in_batch = 0;
max_sequence_length = 0;
request_guid = 0;
prompt_phase = false;
batch_config_request_id = -1;
peft_model_id = PEFTModelID::NO_ID;
peft_bwd = false;
}
int first_token_depth_in_request;
int first_token_offset_in_batch;
int num_tokens_in_batch;
int max_sequence_length;

// request id in batch config:
int batch_config_request_id;
int batch_config_request_id = -1;
bool prompt_phase = false;
RequestGuid request_guid;
// PEFT fields
PEFTModelID peft_model_id;
bool peft_bwd;
};
struct PerTokenInfo {
int abs_depth_in_request;
@@ -102,6 +120,7 @@ class BatchConfig {
BitMask causalMask[MAX_NUM_REQUESTS];
PerRequestInfo requestsInfo[MAX_NUM_REQUESTS];
PerTokenInfo tokensInfo[MAX_NUM_TOKENS];
PerTokenInfo labelsInfo[MAX_NUM_TOKENS];

bool request_completed[MAX_NUM_REQUESTS];
bool request_running[MAX_NUM_REQUESTS];
41 changes: 33 additions & 8 deletions include/flexflow/config.h
@@ -65,6 +65,25 @@ constexpr ParameterSyncType CHOSEN_SYNC_TYPE = ParameterSyncType::PS;
#endif

class FFConfig;
class MemoryAllocator;
class PEFTWeightAllocator;

struct CombinedBatchConfigMetaStruct {
BatchConfig::PerTokenInfo tokens_info[BatchConfig::MAX_NUM_TOKENS];
BatchConfig::PerRequestInfo requestsInfo[BatchConfig::MAX_NUM_REQUESTS];
BatchConfig::BitMask causalMask[BatchConfig::MAX_NUM_REQUESTS];
bool request_completed[BatchConfig::MAX_NUM_REQUESTS];

BeamSearchBatchConfig::BeamSearchPerTokenInfo
beamTokenInfo[BeamSearchBatchConfig::MAX_NUM_TOKENS +
BeamSearchBatchConfig::MAX_SPEC_TREE_TOKEN_NUM *
BeamSearchBatchConfig::MAX_NUM_REQUESTS];
BeamSearchBatchConfig::BeamSearchPerRequestInfo
beamRequestsInfo[BeamSearchBatchConfig::MAX_NUM_REQUESTS];

TreeVerifyBatchConfig::CommittedTokensInfo
committed_tokens[TreeVerifyBatchConfig::MAX_NUM_TOKENS];
};

struct FFHandler {
#if defined(FF_USE_CUDA) || defined(FF_USE_HIP_CUDA)
@@ -76,18 +95,18 @@ struct FFHandler {
#endif
void *workSpace;
size_t workSpaceSize;
void *batch_config_metadata;
CombinedBatchConfigMetaStruct *batch_config_metadata;

// request info + token info + topology mask info
size_t batch_config_metadata_size =
sizeof(BatchConfig::tokensInfo) + sizeof(BatchConfig::requestsInfo) +
sizeof(BeamSearchBatchConfig::beamTokenInfo) +
sizeof(BeamSearchBatchConfig::beamRequestsInfo) +
sizeof(BatchConfig::causalMask) +
sizeof(TreeVerifyBatchConfig::committed_tokens) +
sizeof(BatchConfig::request_completed);
size_t batch_config_metadata_size = sizeof(CombinedBatchConfigMetaStruct);
void *offload_reserve_space;
size_t offload_reserve_space_size;
// PEFT related fields
MemoryAllocator *peft_activation_allocator;
size_t peft_activation_reserve_space_size;
PEFTWeightAllocator *peft_weight_allocator;
size_t peft_weight_reserve_space_size;
// Quantization fields
DataType quantization_type;
bool allowTensorOpMathConversion;
#ifdef FF_USE_NCCL
@@ -98,6 +117,8 @@ struct FFInitInfo {
struct FFInitInfo {
size_t workSpaceSize;
size_t offload_reserve_space_size;
size_t peft_activation_reserve_space_size;
size_t peft_weight_reserve_space_size;
DataType quantization_type;
bool allowTensorOpMathConversion;
// int myRank, allRanks;
@@ -155,6 +176,10 @@ class FFConfig {
bool cpu_offload;
size_t offload_reserve_space_size;
DataType quantization_type;
// PEFT related fields
bool enable_peft;
size_t peft_activation_reserve_space_size;
size_t peft_weight_reserve_space_size;
// Control parallelizable dimensions
bool only_data_parallel;
bool enable_sample_parallel;
15 changes: 15 additions & 0 deletions include/flexflow/ffconst.h
@@ -46,6 +46,12 @@ enum LossType {
LOSS_IDENTITY = 54,
};

enum OptimizerType {
OPTIMIZER_TYPE_NONE = 60,
OPTIMIZER_TYPE_SGD = 61,
OPTIMIZER_TYPE_ADAM = 62,
};

enum CompMode {
COMP_MODE_TRAINING = 70,
COMP_MODE_INFERENCE = 71,
@@ -72,6 +78,11 @@ enum InferenceMode {
TREE_VERIFY_MODE = 2003,
};

enum RequestType {
REQ_INFERENCE = 4001,
REQ_FINETUNING = 4002,
};

// This is consistent with TASO's OpType
// https://github.com/jiazhihao/TASO/blob/master/include/taso/ops.h#L75-L138
enum OperatorType {
@@ -172,6 +183,8 @@ enum OperatorType {
OP_SPEC_INC_MULTIHEAD_SELF_ATTENTION,
OP_TREE_INC_MULTIHEAD_SELF_ATTENTION,
OP_SAMPLING,
// PEFT Ops
OP_LORA,
// Parallel Ops
OP_REPARTITION,
OP_COMBINE,
@@ -268,5 +281,7 @@ enum {
TENSOR_GUID_LAST_VALID = 3999999,
PARALLEL_TENSOR_GUID_FIRST_VALID = 4000000,
NODE_GUID_FIRST_VALID = 5000000,
PEFT_MODEL_ID_FIRST_VALID = 6000000,
PEFT_MODEL_ID_LAST_VALID = 6999999
};
#endif // _FLEXFLOW_CONST_H_
25 changes: 25 additions & 0 deletions include/flexflow/fftype.h
@@ -3,6 +3,8 @@

#include "flexflow/ffconst.h"
#include <cstddef>
#include <functional>
#include <iostream>

namespace FlexFlow {

@@ -18,6 +20,29 @@ class LayerID {
size_t id, transformer_layer_id, model_id;
};

class PEFTModelID {
public:
static const PEFTModelID NO_ID;
PEFTModelID();
PEFTModelID(size_t id);
bool is_valid_id() const;
friend bool operator==(PEFTModelID const &lhs, PEFTModelID const &rhs);
friend std::ostream &operator<<(std::ostream &os,
PEFTModelID const &peft_model_id);

public:
size_t id;
};

}; // namespace FlexFlow

namespace std {
template <>
struct hash<FlexFlow::PEFTModelID> {
size_t operator()(FlexFlow::PEFTModelID const &n) const {
return n.id;
}
};
} // namespace std

#endif // _FF_TYPE_H
44 changes: 40 additions & 4 deletions include/flexflow/flexflow_c.h
@@ -55,6 +55,8 @@ FF_NEW_OPAQUE_TYPE(flexflow_inference_manager_t);
FF_NEW_OPAQUE_TYPE(flexflow_request_manager_t);
FF_NEW_OPAQUE_TYPE(flexflow_file_data_loader_t);
FF_NEW_OPAQUE_TYPE(flexflow_generation_result_t);
FF_NEW_OPAQUE_TYPE(flexflow_lora_linear_config_t);
FF_NEW_OPAQUE_TYPE(flexflow_peft_model_id_t);

// -----------------------------------------------------------------------
// FFConfig
@@ -270,6 +272,7 @@ flexflow_tensor_t *
bool elementwise_affine,
float eps,
bool use_bias,
bool inplace_residual,
char const *name);

flexflow_tensor_t *flexflow_model_add_add_bias_residual_layer_norm(
Expand All @@ -281,6 +284,7 @@ flexflow_tensor_t *flexflow_model_add_add_bias_residual_layer_norm(
bool elementwise_affine,
float eps,
bool use_bias,
bool inplace_residual,
char const *name);

flexflow_tensor_t
@@ -565,6 +569,7 @@ flexflow_tensor_t *
const flexflow_tensor_t input2_,
float eps,
int dim,
bool inplace_residual,
char const *name);

flexflow_tensor_t flexflow_model_add_arg_top_k(flexflow_model_t handle_,
@@ -590,6 +595,9 @@ flexflow_tensor_t flexflow_model_add_argmax(flexflow_model_t handle_,
bool beam_search,
char const *name);

flexflow_peft_model_id_t flexflow_model_add_lora_layer(
flexflow_model_t handle_, const flexflow_lora_linear_config_t peft_config_);

void flexflow_model_set_sgd_optimizer(flexflow_model_t handle,
flexflow_sgd_optimizer_t optimizer);

@@ -613,10 +621,13 @@ void flexflow_model_set_transformer_layer_id(flexflow_model_t handle, int id);

void flexflow_model_generate(flexflow_model_t handle_,
int num_requests,
char const **input_text,
int max_num_chars,
char **output_text,
int max_seq_length,
enum RequestType *request_types,
char const **input_texts,
char **output_texts,
int *max_seq_lengths,
flexflow_peft_model_id_t *peft_model_ids,
char const **dataset_filepaths,
int *training_steps,
int **output_length_and_tokens);

void flexflow_model_set_position_offset(flexflow_model_t handle, int offset);
@@ -978,6 +989,9 @@ void flexflow_request_manager_set_max_spec_tree_token_num(
void flexflow_request_manager_set_max_sequence_length(
flexflow_request_manager_t handle_, int max_seq_length);

void flexflow_request_manager_set_enable_peft_finetuning(
flexflow_request_manager_t handle_, bool enable_peft_finetuning_);

void flexflow_request_manager_register_tokenizer(
flexflow_request_manager_t handle_,
enum ModelType model_type,
@@ -1036,6 +1050,28 @@ void flexflow_file_data_loader_destroy(flexflow_file_data_loader_t handle_);
void flexflow_file_data_loader_load_weights(flexflow_file_data_loader_t handle_,
flexflow_model_t model_handle_);

// -----------------------------------------------------------------------
// LoraLinearConfig
// -----------------------------------------------------------------------

flexflow_lora_linear_config_t
flexflow_lora_linear_config_create(char const *cache_folder_,
char const *peft_model_id_);

void flexflow_lora_linear_config_destroy(flexflow_lora_linear_config_t handle_);

// -----------------------------------------------------------------------
// PEFTModelID
// -----------------------------------------------------------------------

flexflow_peft_model_id_t flexflow_peft_model_id_create();

flexflow_peft_model_id_t flexflow_peft_model_id_create_id(unsigned long id);

flexflow_peft_model_id_t flexflow_peft_model_id_no_id();

void flexflow_peft_model_id_destroy(flexflow_peft_model_id_t handle_);

#ifdef __cplusplus
}
#endif
2 changes: 1 addition & 1 deletion include/flexflow/layer.h
@@ -49,7 +49,7 @@ class Layer {
Tensor outputs[MAX_NUM_OUTPUTS];
Tensor inputs[MAX_NUM_INPUTS];
Tensor weights[MAX_NUM_WEIGHTS];
bool trainableInputs[MAX_NUM_INPUTS];
// bool trainable_inputs[MAX_NUM_INPUTS];
int numInputs, numWeights, numOutputs;
bool profiling;
bool inference_debugging;