
Commit

[Inference] Move quantization code from run_finetune.py to run_quantization.py (#9450)

* 1. Move quantization code from run_finetune.py to run_quantization.py
  2. Remove experimental quantization code in llm/experimental; this code will be merged into PaddleSlim.

* fix experimental/ceval

* update readme and remove useless code

* add test for run_quantization

* fix an incorrect comment

* update qwen2 fp8 quantization config
lixcli authored Nov 22, 2024
1 parent 7bfe5bc commit 9494e9a
Showing 16 changed files with 614 additions and 1,042 deletions.
8 changes: 4 additions & 4 deletions llm/README.md
@@ -228,16 +228,16 @@ python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./alignment/dpo

 ```shell
 # Reference command for launching PTQ quantization
-python run_finetune.py ./config/llama/ptq_argument.json
+python run_quantization.py ./config/llama/ptq_argument.json

 # Reference command for launching GPTQ quantization
-python run_finetune.py ./config/llama/ptq_argument.json
+python run_quantization.py ./config/llama/gptq_argument.json

 # Reference command for launching W8A8C8 (INT) quantization
-python run_finetune.py ./config/llama/ptq_c8_argument.json
+python run_quantization.py ./config/llama/ptq_c8_argument.json

 # Reference command for launching W8A8 (FP8) quantization
-python run_finetune.py ./config/llama/fp8_ptq_argument.json
+python run_quantization.py ./config/llama/fp8_ptq_argument.json
 ```

 See the [quantization docs](./docs/quantization.md) for more technical details and model quantization usage
3 changes: 1 addition & 2 deletions llm/config/qwen/AdvertiseGen/wfp8afp8_ptq_argument.json
@@ -17,6 +17,5 @@
   "unified_checkpoint": false,
   "smooth": false,
   "weight_quant_method": "abs_max",
-  "act_quant_method": "abs_max",
-  "skip_list_names": ["down_proj"]
+  "act_quant_method": "abs_max"
 }
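Configs written before this change may still carry the removed `skip_list_names` key. A small hypothetical migration helper (not part of the PR; the function name is illustrative) to strip it from old files:

```python
import json

def drop_removed_keys(path: str) -> dict:
    """Strip keys removed from the wfp8afp8 PTQ config (here: skip_list_names)
    and rewrite the file in place."""
    with open(path) as f:
        cfg = json.load(f)
    cfg.pop("skip_list_names", None)  # key removed by this commit
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2, ensure_ascii=False)
    return cfg
```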
10 changes: 5 additions & 5 deletions llm/docs/quantization.md
@@ -67,31 +67,31 @@ python prepare_data_for_ptq.py

 ### 2.3 PTQ quantization

 ```shell
-python run_finetune.py ./config/llama/ptq_argument.json
+python run_quantization.py ./config/llama/ptq_argument.json
 ```

 ### 2.4 GPTQ quantization

 ```shell
-python run_finetune.py ./config/llama/gptq_argument.json
+python run_quantization.py ./config/llama/gptq_argument.json
 ```

 ### 2.5 AWQ quantization

 ```shell
-python run_finetune.py ./config/llama/awq_argument.json
+python run_quantization.py ./config/llama/awq_argument.json
 ```

 ### 2.6 W8A8C8 (INT8) quantization

 ```shell
-python run_finetune.py ./config/llama/ptq_c8_argument.json
+python run_quantization.py ./config/llama/ptq_c8_argument.json
 ```

 ### 2.7 W8A8 (FP8) quantization

 ```shell
-python run_finetune.py ./config/llama/fp8_ptq_argument.json
+python run_quantization.py ./config/llama/fp8_ptq_argument.json
 ```

 ### 2.8 Quantization parameters
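Each of these recipes passes a single JSON argument file to the shared entry point. As an illustration only — the real run_quantization.py relies on PaddleNLP's argument-parsing utilities and accepts many more fields, and the names below are a hypothetical subset — a minimal sketch of consuming such a file:

```python
import json
from dataclasses import dataclass, fields

@dataclass
class QuantArguments:
    """Hypothetical subset of quantization arguments; field names mirror
    the JSON configs shown above, not the full PaddleNLP argument set."""
    weight_quant_method: str = "abs_max"
    act_quant_method: str = "abs_max"
    smooth: bool = False
    unified_checkpoint: bool = False

def load_quant_args(path: str) -> QuantArguments:
    """Read a *_argument.json file, keeping only the fields modeled here."""
    with open(path) as f:
        raw = json.load(f)
    known = {f.name for f in fields(QuantArguments)}
    return QuantArguments(**{k: v for k, v in raw.items() if k in known})
```

Unknown keys (model paths, dataset settings, and so on) are silently ignored in this sketch; the real parser validates them.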
182 changes: 0 additions & 182 deletions llm/experimental/layers/cache_kv.py

This file was deleted.

91 changes: 0 additions & 91 deletions llm/experimental/layers/custom_attention.py

This file was deleted.

