load_diff bugfix
chromecast56 committed Feb 23, 2024
1 parent 964ead3 commit aa06fa6
Showing 2 changed files with 13 additions and 13 deletions.
18 changes: 8 additions & 10 deletions README.md
@@ -1,5 +1,3 @@
[# Compressing Model Diffs for High-Throughput Multi-Model Serving]: #

# BitDelta: Your Fine-Tune May Only Be Worth One Bit

[[Paper](https://arxiv.org/abs/2402.10193)][[Blog](https://fasterdecoding.github.io/BitDelta/)]
@@ -12,13 +10,8 @@ BitDelta compresses the weight delta between a fine-tuned and base model LLM to
</a>
</div>


The current release supports:





- Llama-2- and Mistral-based models.
- Memory-efficient 16-bit + 1-bit Δ Linear in PyTorch
- Triton kernel for fast inference
@@ -63,7 +56,6 @@ See [`demo/README.md`](https://github.com/FasterDecoding/BitDelta/blob/main/demo

[BitDelta Demo.webm](https://github.com/FasterDecoding/BitDelta/assets/51351043/b56747df-1108-42f2-ae6f-05e1c460080c)


## Usage

We provide scripts in `./scripts` so you can compress your own models! As an example, we will compress `lmsys/vicuna-7b-v1.5` with base model `meta-llama/Llama-2-7b-hf`.
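
Conceptually, the compression these scripts perform binarizes the weight delta between the fine-tuned and base model: only the sign of each delta entry is kept, plus one scale per weight matrix. Below is a toy sketch under stated assumptions; the `compress_delta` helper and the mean-absolute-delta scale initialization are illustrative, not the scripts' actual code (the paper additionally refines the scales by distillation):

```
# Toy sketch of BitDelta-style delta compression; the helper name and the
# mean-|delta| scale initialization are assumptions, not the repo's code.
import torch

def compress_delta(w_base, w_ft):
    delta = w_ft.float() - w_base.float()
    coeff = delta.abs().mean()                  # one scalar scale per matrix
    sign = torch.where(delta >= 0, 1.0, -1.0)   # the 1-bit part: {-1, +1}
    return coeff, sign

w_base = torch.randn(8, 8, dtype=torch.float16)                # stand-in for the base model
w_ft = w_base + 0.01 * torch.randn(8, 8, dtype=torch.float16)  # stand-in for the fine-tune
coeff, sign = compress_delta(w_base, w_ft)
w_hat = w_base.float() + coeff * sign           # ≈ w_ft, recovered from a 1-bit delta
```

Serving many fine-tunes then requires only one copy of the base weights plus a 1-bit mask and a scale per fine-tune.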
@@ -92,7 +84,7 @@ If `--save_full_model` is specified, the compressed model will also be saved in
Double-check the perplexity of the compressed model:

```
CUDA_VISIBLE_DEVICES=0 python \
bitdelta/eval_ppl.py \
--base_model meta-llama/Llama-2-7b-hf \
--dataset_name wikitext \
@@ -103,17 +95,23 @@ CUDA_VISIBLE_DEVICES=0 python \
```

### Perplexity Check

To replicate our other results, please use `--save_full_model` to run the model in Llama format for compatibility with eval harnesses.
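
For intuition, a perplexity check of this kind reduces to exp of the mean token-level negative log-likelihood over a held-out corpus. A minimal sketch, assuming HF `transformers`/`datasets` and a simple 2048-token windowing — not the actual logic or flags of `bitdelta/eval_ppl.py`:

```
# Minimal perplexity sketch; the chunking and dataset handling here are
# assumptions, not the implementation of bitdelta/eval_ppl.py.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="cuda")
tok = AutoTokenizer.from_pretrained(name)

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids.to(model.device)

nll_sum, n_tokens, stride = 0.0, 0, 2048
for i in range(0, ids.size(1) - 1, stride):
    chunk = ids[:, i : i + stride + 1]          # predict chunk[1:] from chunk[:-1]
    with torch.no_grad():
        loss = model(chunk, labels=chunk).loss  # mean NLL over the chunk
    nll_sum += loss.item() * (chunk.size(1) - 1)
    n_tokens += chunk.size(1) - 1
print(f"perplexity: {math.exp(nll_sum / n_tokens):.2f}")
```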

## Citation

If you find BitDelta useful, please consider citing:

```
@misc{liu2024bitdelta,
title={BitDelta: Your Fine-Tune May Only Be Worth One Bit},
author={James Liu and Guangxuan Xiao and Kai Li and Jason D. Lee and Song Han and Tri Dao and Tianle Cai},
year={2024},
eprint={2402.10193},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```

[# Compressing Model Diffs for High-Throughput Multi-Model Serving]: #
8 changes: 5 additions & 3 deletions bitdelta/diff.py
Original file line number Diff line number Diff line change
@@ -88,9 +88,11 @@ def load_diff(model, diff_dir):
coeff = diff_dict[name + ".coeff"].to(device)
mask = diff_dict[name + ".mask"].to(device)

setattr(module, "mask", mask)
setattr(module, "coeff", coeff)
# module.weight.add_((mask * coeff).to(module.weight.dtype))
# setattr(module, "mask", mask)
# setattr(module, "coeff", coeff)
weight = (unpack(mask)*2-1) * coeff

module.weight.add_(weight.T.to(module.weight.dtype))
elif name + ".weight" in diff_dict:
module.weight = nn.Parameter(diff_dict[name + ".weight"].to(device).to(module.weight.dtype))
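
The fix above stops stashing `mask` and `coeff` as module attributes and instead eagerly reconstructs the binarized delta and folds it into the weight. A sketch of what that reconstruction computes, with an assumed LSB-first uint8 bit-packing for `unpack` (the repo's real `unpack` lives elsewhere and may pack differently):

```
# Assumption-labeled sketch of the reconstruction in the new load_diff lines;
# this unpack() guesses LSB-first uint8 packing and is not the repo's code.
import torch

def unpack(mask):
    shifts = torch.arange(8, device=mask.device, dtype=torch.uint8)
    bits = (mask.unsqueeze(-1) >> shifts) & 1   # (..., n//8, 8) of {0, 1}
    return bits.flatten(-2)                     # one bit per weight entry

in_f, out_f = 4096, 4096
mask = torch.randint(0, 256, (in_f, out_f // 8), dtype=torch.uint8)
coeff = torch.tensor(0.01)

weight = (unpack(mask).float() * 2 - 1) * coeff  # {0,1} -> {-1,+1}, then scale
base = torch.zeros(out_f, in_f, dtype=torch.float16)
base.add_(weight.T.to(base.dtype))               # same in-place add as the diff
```

Note the `.T`: the mask is evidently stored for the transposed weight, so the reconstructed delta is flipped back before the in-place `add_`.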

