[sharktank] Add perplexity CI to sharktank dashboard #466

Merged: 34 commits from update-perplexity-ci into main on Nov 22, 2024

Changes shown below are from 8 of the 34 commits.

Commits
All commits by archana-ramalingam:

8a1552c  Rename _vmfb to _iree (Nov 8, 2024)
c69d09b  Add perplexity scoreboard and description (Nov 8, 2024)
2e286c8  README updates (Nov 8, 2024)
f1b167f  Add perplexity to github.io dashboard (Nov 8, 2024)
d79236a  README updates (Nov 8, 2024)
16c100b  README updates (Nov 8, 2024)
b5d7b0e  Test github pages deployment (Nov 9, 2024)
f9171c1  Merge branch 'main' into update-perplexity-ci (Nov 9, 2024)
2de02a0  README.md updates (Nov 9, 2024)
074bc66  Pin actions-gh-pages to latest hash (Nov 11, 2024)
f95023f  Pin actions-gh-pages to latest hash (Nov 11, 2024)
6ff6535  Merge branch 'main' into update-perplexity-ci (Nov 15, 2024)
a5a1d49  Remove pre-submit debug (Nov 15, 2024)
5cb6bb4  Merge branch 'update-perplexity-ci' of https://github.com/nod-ai/SHAR… (Nov 15, 2024)
8f70b33  Update GH pages dir (Nov 16, 2024)
975e5a0  Test github pages deployment (Nov 16, 2024)
6665a2b  Merge branch 'main' into update-perplexity-ci (Nov 16, 2024)
f39d91e  Add keep_files and destination_dir (Nov 18, 2024)
4c83fcc  Merge branch 'update-perplexity-ci' of https://github.com/nod-ai/shar… (Nov 18, 2024)
5c209bd  Merge branch 'main' into update-perplexity-ci (Nov 18, 2024)
72b56dc  Test gh-pages deployment (Nov 18, 2024)
c569310  Merge branch 'update-perplexity-ci' of https://github.com/nod-ai/shar… (Nov 18, 2024)
507bc5d  Rename numerics to perplexity (Nov 18, 2024)
750c4b1  Update README.md (Nov 19, 2024)
0666514  Test gh pages deployment (Nov 19, 2024)
5ace2ce  Merge branch 'main' into update-perplexity-ci (Nov 19, 2024)
6e83d0e  Revert debug changes (Nov 20, 2024)
ac3c0ef  Merge branch 'main' into update-perplexity-ci (Nov 20, 2024)
382da34  Merge branch 'update-perplexity-ci' of https://github.com/nod-ai/shar… (Nov 20, 2024)
6ae5e9e  Merge branch 'main' into update-perplexity-ci (Nov 21, 2024)
eaeb097  Merge branch 'main' into update-perplexity-ci (Nov 21, 2024)
bec632b  Merge branch 'main' into update-perplexity-ci (Nov 22, 2024)
e27818a  Merge branch 'main' into update-perplexity-ci (Nov 22, 2024)
325b9af  Merge branch 'main' into update-perplexity-ci (Nov 22, 2024)
27 changes: 20 additions & 7 deletions .github/workflows/ci_eval.yaml
@@ -7,6 +7,7 @@
name: CI - Perplexity

on:
+  pull_request:
  workflow_dispatch:
  schedule:
    # Weekdays nightly at 07:00 UTC = 23:00 PST / 00:00 PDT.
@@ -21,9 +22,9 @@ concurrency:
  cancel-in-progress: true

jobs:
-  test_perplexity_vmfb:
+  test_perplexity_iree:
    timeout-minutes: 1000
-    name: "IREE/vmfb"
+    name: "Perplexity-IREE"
    strategy:
      matrix:
        version: [3.11]
@@ -74,12 +75,18 @@ jobs:
            iree-base-runtime \
            "numpy<2.0"

-      - name: Run perplexity test with vmfb
-        run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_vmfb_test.py --longrun --iree-device='hip://7' --iree-hip-target=gfx942 --iree-hal-target-backends=rocm --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
+      - name: Run perplexity test with IREE
+        run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_iree_test.py --longrun --iree-device='hip://7' --iree-hip-target=gfx942 --iree-hal-target-backends=rocm --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json --html=perplexity/perplexity_iree.html

+      - name: Deploy to GitHub Pages
+        uses: peaceiris/actions-gh-pages@v3
+        with:
+          github_token: ${{ secrets.SHARK_PLATFORM_GH_TOKEN }}
+          publish_dir: ./perplexity

  test_perplexity_torch:
    timeout-minutes: 1000
-    name: "Torch/eager mode"
+    name: "Perplexity-Torch"
    strategy:
      matrix:
        version: [3.11]
@@ -122,5 +129,11 @@ jobs:
        pip install --no-compile -f https://iree.dev/pip-release-links.html --src deps \
          -e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"

-      - name: Run perplexity test in eager mode
-        run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
+      - name: Run perplexity test with Torch
+        run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json --html=perplexity/perplexity_torch.html

+      - name: Deploy to GitHub Pages
+        uses: peaceiris/actions-gh-pages@v3
+        with:
+          github_token: ${{ secrets.SHARK_PLATFORM_GH_TOKEN }}
+          publish_dir: ./perplexity
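
Later commits in this PR extend this deploy step: f39d91e adds keep_files and destination_dir, and 074bc66/f95023f pin actions-gh-pages to a specific hash (not reproduced here). A sketch of what the extended step could look like, using the action's documented options; the destination_dir value is an assumption, not copied from the final diff:

```yaml
- name: Deploy to GitHub Pages
  uses: peaceiris/actions-gh-pages@v3  # later pinned to a full commit hash
  with:
    github_token: ${{ secrets.SHARK_PLATFORM_GH_TOKEN }}
    publish_dir: ./perplexity
    destination_dir: ./perplexity  # assumed target subdirectory on the gh-pages branch
    keep_files: true               # keep previously published reports alongside new ones
```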
19 changes: 17 additions & 2 deletions sharktank/sharktank/evaluate/README.md
@@ -9,16 +9,31 @@ pip install -r sharktank/requirements-tests.txt

### Perplexity

-Test perplexity for Llama3.1 8B & 405B (FP16 & FP8) models:
+The perplexity score measures a language model's ability to predict the next token in a sequence. A lower score indicates higher certainty in the model's predictions. Perplexity is an intrinsic evaluation metric of model quality, independent of any downstream task.
+
+In SHARK-Platform, we use perplexity to track code regressions and quality loss across quantized models (with FP16 as the baseline). We use 100 prompts randomly selected from the Wikitext-2 test set and report the mean perplexities shown below. Because implementations vary, these numbers are not comparable across models with different tokenizers, nor with other projects.
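
For reference, the standard definition (not spelled out in the PR itself): for a tokenized sequence of N scored tokens, perplexity is the exponentiated mean negative log-likelihood of each token given its prefix,

$$\mathrm{PPL}(x_{1:N}) = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\left(x_i \mid x_{<i}\right)\right)$$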

+* Test perplexity for Llama3.1 8B (FP16) model:

```bash
pytest sharktank/tests/evaluate/perplexity_test.py --longrun
```

-Get perplexity for a new model:
+* Calculate perplexity for a new model:

```bash
python -m sharktank.evaluate.perplexity \
--gguf-file=llama3_70b_f16.gguf \
--tokenizer-config-json=tokenizer_config.json
```
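
Numerically, the score the CLI prints reduces to exponentiating a mean token-level cross-entropy. A minimal PyTorch sketch of that computation (illustrative only, not the sharktank implementation; the function name is hypothetical):

```python
import torch
import torch.nn.functional as F

def sequence_perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Perplexity of one sequence.

    logits: [seq_len, vocab_size] next-token logits; targets: [seq_len] token ids.
    """
    # cross_entropy with mean reduction is the average negative
    # log-likelihood per token; exponentiating it yields perplexity.
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return torch.exp(nll).item()
```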

+### LLaMA 3.1 Perplexity Scoreboard
+
+| CPU           | GPU    |
+|:-------------:|:------:|
+| AMD EPYC 9554 | MI300X |
+
+| Models | Model size (GB) | Torch     | IREE      |
+|:-------|:----------------|:----------|:----------|
+| 8B f16 | 16.07           | 14.930181 | 14.991893 |
2 changes: 1 addition & 1 deletion sharktank/tests/evaluate/baseline_perplexity_scores.json
@@ -210,7 +210,7 @@
    ],
    "mean_perplexity": 6.060831
  },
-  "llama3_8B_f16_decomposed_vmfb": {
+  "llama3_8B_f16_decomposed_iree": {
    "perplexities": [
      6.651368,
      22.059452,
Renamed: sharktank/tests/evaluate/perplexity_vmfb_test.py → sharktank/tests/evaluate/perplexity_iree_test.py
@@ -8,7 +8,7 @@
import pytest
import json

-from sharktank.evaluate import perplexity_vmfb
+from sharktank.evaluate import perplexity_iree

longrun = pytest.mark.skipif("not config.getoption('longrun')")

@@ -32,10 +32,10 @@ def test_llama3_8B_f16_decomposed(self):

        # Llama 3.1 8B decomposed

-        model_name = "llama3_8B_f16_decomposed_vmfb"
+        model_name = "llama3_8B_f16_decomposed_iree"
        baseline_perplexity = self.baseline_perplexity[model_name]

-        current_perplexity = perplexity_vmfb.main(
+        current_perplexity = perplexity_iree.main(
            [
                f"--irpa-file={self.llama3_8b_f16_model}",
                f"--tokenizer-config-json={self.llama3_8b_tokenizer}",
@@ -67,10 +67,10 @@ def test_llama3_8B_f16(self):

        # Llama 3.1 8B non-decomposed

-        model_name = "llama3_8B_f16_vmfb"
+        model_name = "llama3_8B_f16_iree"
        baseline_perplexity = self.baseline_perplexity[model_name]

-        current_perplexity = perplexity_vmfb.main(
+        current_perplexity = perplexity_iree.main(
            [
                f"--irpa-file={self.llama3_8b_f16_model}",
                f"--tokenizer-config-json={self.llama3_8b_tokenizer}",
@@ -102,10 +102,10 @@ def test_llama3_8B_fp8_decomposed(self):

        # Llama 3.1 8B decomposed

-        model_name = "llama3_8B_fp8_decomposed_vmfb"
+        model_name = "llama3_8B_fp8_decomposed_iree"
        baseline_perplexity = self.baseline_perplexity[model_name]

-        current_perplexity = perplexity_vmfb.main(
+        current_perplexity = perplexity_iree.main(
            [
                f"--irpa-file={self.llama3_8b_fp8_model}",
                f"--tokenizer-config-json={self.llama3_8b_tokenizer}",
@@ -137,10 +137,10 @@ def test_llama3_8B_fp8(self):

        # Llama 3.1 8B non-decomposed

-        model_name = "llama3_8B_fp8_vmfb"
+        model_name = "llama3_8B_fp8_iree"
        baseline_perplexity = self.baseline_perplexity[model_name]

-        current_perplexity = perplexity_vmfb.main(
+        current_perplexity = perplexity_iree.main(
            [
                f"--irpa-file={self.llama3_8b_fp8_model}",
                f"--tokenizer-config-json={self.llama3_8b_tokenizer}",
@@ -172,10 +172,10 @@ def test_llama3_405B_f16_decomposed(self):

        # Llama 3.1 405B decomposed

-        model_name = "llama3_405B_f16_decomposed_vmfb"
+        model_name = "llama3_405B_f16_decomposed_iree"
        baseline_perplexity = self.baseline_perplexity[model_name]

-        current_perplexity = perplexity_vmfb.main(
+        current_perplexity = perplexity_iree.main(
            [
                f"--irpa-file={self.llama3_405b_f16_model}",
                f"--tokenizer-config-json={self.llama3_405b_tokenizer}",
@@ -207,10 +207,10 @@ def test_llama3_405B_f16(self):

        # Llama 3.1 405B non-decomposed

-        model_name = "llama3_405B_f16_vmfb"
+        model_name = "llama3_405B_f16_iree"
        baseline_perplexity = self.baseline_perplexity[model_name]

-        current_perplexity = perplexity_vmfb.main(
+        current_perplexity = perplexity_iree.main(
            [
                f"--irpa-file={self.llama3_405b_f16_model}",
                f"--tokenizer-config-json={self.llama3_405b_tokenizer}",
@@ -242,10 +242,10 @@ def test_llama3_405B_fp8_decomposed(self):

        # Llama 3.1 405B decomposed

-        model_name = "llama3_405B_fp8_decomposed_vmfb"
+        model_name = "llama3_405B_fp8_decomposed_iree"
        baseline_perplexity = self.baseline_perplexity[model_name]

-        current_perplexity = perplexity_vmfb.main(
+        current_perplexity = perplexity_iree.main(
            [
                f"--irpa-file={self.llama3_405b_fp8_model}",
                f"--tokenizer-config-json={self.llama3_405b_tokenizer}",
@@ -277,10 +277,10 @@ def test_llama3_405B_fp8(self):

        # Llama 3.1 405B non-decomposed

-        model_name = "llama3_405B_fp8_vmfb"
+        model_name = "llama3_405B_fp8_iree"
        baseline_perplexity = self.baseline_perplexity[model_name]

-        current_perplexity = perplexity_vmfb.main(
+        current_perplexity = perplexity_iree.main(
            [
                f"--irpa-file={self.llama3_405b_fp8_model}",
                f"--tokenizer-config-json={self.llama3_405b_tokenizer}",