We use LLaMA-7B as an example here:
- Obtain the channel-wise scales required for initialization:
python generate_act_scale_shift.py --model /PATH/TO/LLaMA/llama-7b
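Under the hood, this step passes a small set of calibration samples through the model and records per-channel activation statistics for every linear layer. The sketch below shows one common way to collect such scales with forward hooks; the hook logic, calibration text, and output path are illustrative assumptions, not the exact contents of `generate_act_scale_shift.py`.

```python
# Illustrative sketch: collect per-channel activation scales with forward hooks.
# Assumption: this mirrors the spirit of generate_act_scale_shift.py, not its exact code.
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/PATH/TO/LLAMA/llama-7b"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

act_scales = {}

def make_hook(name):
    def hook(module, inputs, output):
        x = inputs[0].detach().abs().flatten(0, -2)      # (tokens, in_features)
        cur_max = x.max(dim=0).values.float().cpu()      # per-input-channel absolute max
        act_scales[name] = torch.maximum(act_scales[name], cur_max) if name in act_scales else cur_max
    return hook

handles = [m.register_forward_hook(make_hook(n))
           for n, m in model.named_modules() if isinstance(m, torch.nn.Linear)]

# Run a few calibration samples through the model (replace with real calibration data).
for text in ["Post-training quantization compresses large language models."] * 8:
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        model(ids)

for h in handles:
    h.remove()

os.makedirs("./act_scales", exist_ok=True)
torch.save(act_scales, "./act_scales/llama-7b.pt")       # output path is illustrative
```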
- Training and Evaluating PPL
CUDA_VISIBLE_DEVICES=0 python main.py --model /PATH/TO/LLAMA/llama-7b --epochs 20 --output_dir ./log/llama-7b --eval_ppl --wbits 4 --abits 16 --quant_type mix --lwc \
--save_dir /CHECKPOINT/TO/FIRST/PTQ \
--calib_dataset wikitext2
More detailed and optional arguments:
- `--model`: the local model path or Hugging Face model name.
- `--wbits`: weight quantization bit-width.
- `--quant_type`: quantization type; `mix` means using structured masks.
- `--lwc`: activate the weight quantizer (see the sketch after this list).
- `--epochs`: number of training epochs.
- `--nsamples`: number of calibration samples, 128 by default.
- `--eval_ppl`: evaluate the perplexity of the quantized model.
- `--multigpu`: run inference of larger models on multiple GPUs.
- `--save_dir`: directory in which to save the quantized model for further exploration.
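`--lwc` commonly stands for learnable weight clipping, where the per-channel clipping range of each weight matrix is made trainable during calibration. Below is a minimal, self-contained sketch of such a quantizer with sigmoid-parameterized clipping factors and a straight-through estimator for rounding; it is shown for intuition only, and the quantizer actually used in this repo may differ.

```python
# Illustrative LWC-style weight quantizer: per-channel asymmetric quantization with
# learnable clipping factors. Shown for intuition; the repo's quantizer may differ.
import torch
import torch.nn as nn

class LearnableClipQuantizer(nn.Module):
    def __init__(self, out_features, n_bits=4):
        super().__init__()
        self.n_bits = n_bits
        # Clipping factors are squashed to (0, 1) via sigmoid; init near 1 (sigmoid(4) ~ 0.98).
        self.upbound_factor = nn.Parameter(torch.full((out_features, 1), 4.0))
        self.lowbound_factor = nn.Parameter(torch.full((out_features, 1), 4.0))

    def forward(self, w):                                   # w: (out_features, in_features)
        xmax = w.max(dim=1, keepdim=True).values * torch.sigmoid(self.upbound_factor)
        xmin = w.min(dim=1, keepdim=True).values * torch.sigmoid(self.lowbound_factor)
        qmax = 2 ** self.n_bits - 1
        scale = (xmax - xmin).clamp(min=1e-5) / qmax
        zero_point = (-xmin / scale).round()
        x = w / scale + zero_point
        # Straight-through estimator: round/clamp in the forward pass, identity gradient backward.
        x_q = x + (x.round().clamp(0, qmax) - x).detach()
        return (x_q - zero_point) * scale                   # fake-quantized weight, same shape as w

quantizer = LearnableClipQuantizer(out_features=4096, n_bits=4)
w_q = quantizer(torch.randn(4096, 11008))
```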
- Reproduce the evaluation results of our paper.
- Download the prebuilt quantized model from our anonymous Hugging Face repo: https://huggingface.co/ptq161.
- For detailed reproduction steps, please refer to reproduce.ipynb.
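A hedged sketch of pulling a prebuilt checkpoint from the Hub with `huggingface_hub` is given below; `ptq161/MODEL_NAME` is a placeholder, so check https://huggingface.co/ptq161 and reproduce.ipynb for the actual repository names.

```python
# Illustrative download of a prebuilt quantized checkpoint from the Hugging Face Hub.
# "ptq161/MODEL_NAME" is a placeholder; see https://huggingface.co/ptq161 and
# reproduce.ipynb for the actual repository names.
from huggingface_hub import snapshot_download

local_path = snapshot_download(repo_id="ptq161/MODEL_NAME", local_dir="./prebuilt/llama-7b-w4")
print("checkpoint downloaded to", local_path)
```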
- Preprocessing
cd preprocessing
CUDA_VISIBLE_DEVICES=0 python restorative_lora.py --model_id /PATH/TO/LLAMA/llama-7b \
--save_dir /CHECKPOINT/TO/FIRST/PTQ
CUDA_VISIBLE_DEVICES=0 python test_perplexity.py --model_path /PATH/TO/LLAMA/llama-7b \
--ckpt /CHECKPOINT/TO/FIRST/PTQ \
--lora_path ./outputs/CHECKPOINT_NAME/step-r \
--output_path /PATH/TO/MERGED/MODEL
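Here, `restorative_lora.py` presumably trains a LoRA adapter on top of the first PTQ checkpoint, and `test_perplexity.py` appears to merge that adapter back into the base model and write the merged weights to `--output_path`. The sketch below shows a typical PEFT-based merge of this kind; the use of PEFT and the exact paths are assumptions for illustration.

```python
# Sketch of merging a LoRA adapter into the base model with PEFT and saving the result.
# Assumption: this illustrates the merge implied by --lora_path / --output_path;
# the actual test_perplexity.py may implement it differently.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/PATH/TO/LLAMA/llama-7b"
lora_path = "./outputs/CHECKPOINT_NAME/step-r"   # adapter produced by restorative_lora.py
output_path = "/PATH/TO/MERGED/MODEL"

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, lora_path).merge_and_unload()

merged.save_pretrained(output_path)
AutoTokenizer.from_pretrained(base_path, use_fast=False).save_pretrained(output_path)
```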
- Evaluating PPL after Preprocessing
CUDA_VISIBLE_DEVICES=0 python main.py --model /PATH/TO/MERGED/MODEL --epochs 20 --output_dir ./log/llama-7b --eval_ppl --wbits 4 --abits 16 --quant_type mix --lwc \
--save_dir /CHECKPOINT/TO/SECOND/PTQ \
--calib_dataset wikitext2
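For reference, the `--eval_ppl` path reports perplexity on the test split of the calibration dataset (WikiText-2 here). The sketch below shows the standard sliding-window perplexity computation with 2048-token segments; it approximates what such an evaluation does and is not the repo's exact code.

```python
# Sketch of the standard WikiText-2 perplexity evaluation over 2048-token windows.
# Assumption: --eval_ppl computes something along these lines; exact details may differ.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/PATH/TO/MERGED/MODEL"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids

seqlen, nlls = 2048, []
for i in range(ids.shape[1] // seqlen):
    batch = ids[:, i * seqlen:(i + 1) * seqlen].cuda()
    with torch.no_grad():
        loss = model(batch, labels=batch).loss           # mean NLL over the window
    nlls.append(loss.float() * seqlen)

ppl = torch.exp(torch.stack(nlls).sum() / (len(nlls) * seqlen))
print(f"WikiText-2 perplexity: {ppl.item():.2f}")
```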
Please follow lm-eval-harness for evaluating Hellaswag, PIQA, MMLU, GSM8K, LAMBADA, etc.
The `lm_eval` folder contains lm-evaluation-harness, an open-source evaluation framework from https://github.com/EleutherAI/lm-evaluation-harness, which provides the datasets, benchmarks, etc.
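As a starting point, the snippet below calls the harness through its Python API; the model type `"hf"`, the `model_args` string, and the task names are assumptions that depend on the lm-eval version you install, so check its documentation for the exact identifiers.

```python
# Sketch of calling lm-evaluation-harness through its Python API.
# Assumption: the model type "hf", the model_args string, and the task names depend on the
# installed lm_eval version; check the harness documentation for the exact identifiers.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=/PATH/TO/MERGED/MODEL,dtype=float16",
    tasks=["hellaswag", "piqa", "lambada_openai"],
    batch_size=8,
)
print(results["results"])
```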