# llama.cpp perplexity scorecard

A helper project to run perplexity tests for llama.cpp, a leading inference engine for Large Language Models (LLMs) such as Llama 2.

Perplexity is the most commonly used measure of a language model's performance on a given text corpus: it measures how well the model predicts the contents of a dataset. Lower perplexity scores are better.
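
For intuition, perplexity is the exponential of the average negative log-likelihood per token. The sketch below computes it from a list of made-up per-token log-probabilities (illustrative values only, not output from llama.cpp):

```python
# Minimal sketch: perplexity from per-token log-probabilities.
# The log-prob values below are made up for illustration.
import math

token_logprobs = [-2.1, -0.4, -1.3, -0.9]  # natural-log probability of each predicted token

# Perplexity = exp of the mean negative log-likelihood per token;
# lower values mean the model assigns higher probability to the text.
nll = -sum(token_logprobs) / len(token_logprobs)
perplexity = math.exp(nll)
print(f"perplexity = {perplexity:.2f}")
```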

See the background llama.cpp discussions on the needs and motives for this project here and here.

This Python app wraps the llama.cpp ./perplexity executable and uploads perplexity scores and test results as JSON to an Amazon S3 bucket for analysis.
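
As a rough illustration of that flow (not the actual implementation), the sketch below runs a perplexity binary via `subprocess` and uploads a JSON payload with `boto3`; the model path, flags, bucket name, object key, and payload fields are all assumptions:

```python
# Illustrative sketch of "wrap the perplexity binary, upload results to S3".
# Paths, flags, bucket, key, and payload fields are assumptions, not this project's real config.
import json
import subprocess

import boto3

result = subprocess.run(
    ["./perplexity", "-m", "models/llama-2-7b.gguf", "-f", "wiki.test.raw.406"],
    capture_output=True, text=True, check=True,
)

# In the real app the score would be parsed out of the tool's output;
# here we just bundle the raw stdout for illustration.
payload = {"model": "llama-2-7b", "output": result.stdout}

s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-perplexity-results",        # hypothetical bucket name
    Key="scorecards/llama-2-7b.json",      # hypothetical object key
    Body=json.dumps(payload).encode("utf-8"),
)
```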

The standard llama.cpp perplexity test uses wiki.test.raw.406, i.e. 406 lines taken from wiki.test.raw.
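
If you need to build that subset yourself, a minimal sketch follows; it assumes the subset is simply the first 406 lines of wiki.test.raw, which is an assumption rather than something this README specifies:

```python
# Sketch: write the first 406 lines of wiki.test.raw to wiki.test.raw.406.
# Assumes the subset is the *first* 406 lines.
from itertools import islice

with open("wiki.test.raw", encoding="utf-8") as src:
    lines = list(islice(src, 406))

with open("wiki.test.raw.406", "w", encoding="utf-8") as dst:
    dst.writelines(lines)
```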

## Install

```
pip install -r requirements.txt
```

## Config

Copy .env.example and update the config variables to suit your system.

You can point the script at an existing wiki.test.raw if you already have one; otherwise it will download the test corpus for you.
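
For illustration, the script might read its configuration from environment variables along these lines; the variable names and defaults here are hypothetical, so check .env.example for the real ones:

```python
# Hypothetical sketch of reading configuration from the environment.
# Variable names and defaults are assumptions; .env.example is authoritative.
import os

PERPLEXITY_BIN = os.getenv("PERPLEXITY_BIN", "./perplexity")
MODEL_PATH = os.getenv("MODEL_PATH", "models/llama-2-7b.gguf")
S3_BUCKET = os.getenv("S3_BUCKET", "my-perplexity-results")
```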

## Run

```
python3 perplexity_scorecard.py
```

Coming soon: the llama.cpp perplexity leaderboard, plus Jupyter (.ipynb) analysis and charting examples.

PRs are welcome 😀