Specscheduler evaluation support code #1542

goliaro · 2024-11-15T17:17:58Z

Description of changes:

This PR does the following:

LLAMA 3 speculation support:
- Add support for LLAMA 3.1 and 3.2
- Benchmark performance of LLAMA-3.1-70B paired with the following small speculative models (SSMs): Zhuominc/Llama-3-330M, meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B-Instruct, meta-llama/Llama-3.1-8B-Instruct (tl;dr: meta-llama/Llama-3.2-1B-Instruct performs best)
- Add support for serving SSMs with TP_degree > 1
Benchmarking:
- Gather fine-grained performance statistics so that additional metrics (e.g. P90 latency) can be computed later
- Fix an issue that caused the average number of generated tokens per step to be underestimated
- Add code to benchmark speculation accuracy and end-to-end performance for SpecInfer and incremental decoding with various SSMs and arrival rates.
- Plots available below:
  average_accepted_tokens.pdf
  throughput_vs_tpot.pdf
  ttft_vs_arrival_rate.pdf
  queueing_time_vs_arrival_rate.pdf
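
As an illustration of how a metric such as P90 latency can be derived after the fact from per-request statistics, here is a minimal sketch. The record fields (`arrival`, `finish`), the sample values, and the nearest-rank percentile definition are assumptions made for this example, not the format this PR actually logs.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the sample to return.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical per-request records (timestamps in seconds).
records = [
    {"arrival": 0.0, "finish": 1.5},
    {"arrival": 0.5, "finish": 1.7},
    {"arrival": 1.0, "finish": 4.0},
    {"arrival": 1.5, "finish": 2.4},
]

# End-to-end latency per request, then the tail percentile.
latencies = [r["finish"] - r["arrival"] for r in records]
p90_latency = percentile(latencies, 90)
print(f"P90 latency: {p90_latency:.2f}s")
```

Keeping the raw per-request timestamps rather than running averages is what makes this possible: any percentile, or a different metric altogether, can be computed later without rerunning the benchmark.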
Make evaluation easier/faster to run:
- Add code to load all the weights in parallel, fixing the context issue discussed with the Legion team here
- Record a memory usage breakdown when passing --log-instance-creation. Add a script to debug insufficient-memory issues by device and task. See here.
Bug fixes:
- Remove all-reduce deadlock by adding Legion barriers
- Detect EOS tokens produced in the middle of speculation (instead of only at the end) and stop early. This prevents generation from continuing until the max sequence length when the EOS token is in the middle of the verified sequence
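
The EOS handling described in the last bullet can be pictured with the following minimal sketch. The function name `truncate_at_eos`, the token ids, and the use of a set of EOS ids are hypothetical and chosen for illustration; the actual implementation in this PR may differ.

```python
def truncate_at_eos(verified_tokens, eos_token_ids):
    """Cut a verified token sequence at the first EOS token.

    Returns (tokens, finished): `tokens` keeps everything up to and
    including the first EOS, and `finished` tells the caller whether
    the request should stop generating.
    """
    for i, token in enumerate(verified_tokens):
        if token in eos_token_ids:
            # EOS found mid-sequence: discard the speculated tokens after it.
            return list(verified_tokens[: i + 1]), True
    # No EOS in this verification step: keep everything and continue.
    return list(verified_tokens), False


# Hypothetical ids: 2 stands in for the EOS token.
tokens, finished = truncate_at_eos([5, 7, 2, 9, 4], eos_token_ids={2})
print(tokens, finished)
```

Checking only the last verified token would miss the EOS in this example, since tokens 9 and 4 follow it, and the request would keep generating until it hit the maximum sequence length.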

Related Issues:

Linked Issues:

Issues closed by this PR:

Specscheduler evaluation support code (#1541)

b798385

goliaro marked this pull request as ready for review November 15, 2024 17:18

cleanup

2990c88

chenzhuofu self-requested a review November 16, 2024 04:46

chenzhuofu added 3 commits November 15, 2024 23:24

feat: use custom allreduce for performance

30efe4d

chore: minor

76df177

chore: minor

6c3bebc

chenzhuofu approved these changes Nov 16, 2024

chenzhuofu merged commit 13dcb23 into specscheduler Nov 16, 2024
33 of 39 checks passed

chenzhuofu deleted the specscheduler_eval branch November 16, 2024 10:20
