GitHub - JohnConnor123/multilingual-capabilities-through-quantizations

Quickstart

P.s. This code works only on Linux and tested only with python3.12 and cuda12.1

Create enviroment:

python -m venv venv
source venv/bin/activate

Install dependencies:

pip install -r requirements.txt
git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ && python setup.py build && python setup.py install && cd ..

P.s. If you get the error:

libcusparse.so.12: undefined symbol: __nvJitLinkAddData_12_1, version libnvJitLink.so.12

when running the AutoGPTQ installation command, then just run the command unset LD_LIBRARY_PATH (more details: pytorch/pytorch#111469 (comment))

P.s. If you get the error:

In file included from /usr/local/cuda-12.1/include/cuda_runtime.h:83,
                 from <command-line>:
/usr/local/cuda-12.1/include/crt/host_config.h:132:2: error: #error -- unsupported GNU version! gcc versions later than 12 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
  132 | #error -- unsupported GNU version! gcc versions later than 12 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
      |  ^~~~~
error: command '/usr/local/cuda-12.1/bin/nvcc' failed with exit code 1

when running the AutoGPTQ installation command, then install gcc/g++ v12 and set it as default compiler: Installation gcc/g++ v12:

sudo apt install gcc-12 g++-12

Setting gcc/g++ v12 as default - paste in script.sh this:

cd /usr/bin
for f in gcc cpp g++ gcc-ar gcc-nm gcc-ranlib gcov gcov-dump gcov-tool lto-dump; do
    ln -vsf $f-12 $f
done

and run it via sudo bash script.sh (more details: https://askubuntu.com/a/1510476)

Download MERA benchmark:

git clone --recurse-submodules https://github.com/MERA-Evaluation/MERA.git
git pull --all --rebase --recurse-submodules
cd MERA/lm-evaluation-harness
pip install -e .[vllm]
cd ..

After that use mera evaluate commands in MERA directory

Download llama.cpp project:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
cd ..

P.s. If you want to speed up the build process, add -j <number of threads>. For example, the command below will use 8 threads to build llama.cpp: cmake --build build --config Release -j 8

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.gitignore		.gitignore
README.md		README.md
quantization_benchmark.py		quantization_benchmark.py
quantize_awq.py		quantize_awq.py
quantize_bnb.py		quantize_bnb.py
quantize_gguf.py		quantize_gguf.py
quantize_gptq.py		quantize_gptq.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Quickstart

About

Releases

Packages

Languages

JohnConnor123/multilingual-capabilities-through-quantizations

Folders and files

Latest commit

History

Repository files navigation

Quickstart

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages