Specscheduler evaluation support code #1542

Merged: 5 commits, Nov 16, 2024


@goliaro (Collaborator) commented Nov 15, 2024

Description of changes:

This PR does the following:

  • LLAMA 3 speculation support:

    • Add support for LLAMA 3.1 and 3.2
    • Benchmark performance of LLAMA-3.1-70B with small models: Zhuominc/Llama-3-330M, meta-llama/Llama-3.2-1B-Instruct, meta-llama/Llama-3.2-3B-Instruct, meta-llama/Llama-3.1-8B-Instruct (tl;dr meta-llama/Llama-3.2-1B-Instruct is the best)
    • Add support for serving SSMs with TP_degree > 1
  • Benchmarking

  • Make evaluation easier/faster to run:

    • Add code to load all the weights in parallel, fixing the context issue discussed with the Legion team here
    • Record memory usage breakdown when passing --log-instance-creation. Add script to debug issues related to insufficient memory by device and task. See here.
  • Bug fixes

    • Remove the all-reduce deadlock by adding Legion barriers
    • Detect EOS tokens produced in the middle of speculation (not only at the end), and stop early when the EOS token falls in the middle of the verified sequence, so generation no longer runs on until the max sequence length
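
For reviewers, the EOS fix can be illustrated with a minimal Python sketch. This is not the code in this PR: the function name `truncate_at_eos`, its arguments, and the token ids in the usage note are hypothetical and only show the idea of cutting a verified sequence at the first EOS token instead of checking the last token only.

```python
from typing import List, Sequence, Set, Tuple


def truncate_at_eos(
    verified_tokens: Sequence[int], eos_token_ids: Set[int]
) -> Tuple[List[int], bool]:
    """Hypothetical helper: keep verified tokens up to and including the first EOS.

    Returns the tokens to commit and a flag telling the caller whether
    the request is finished and should stop generating.
    """
    for i, tok in enumerate(verified_tokens):
        if tok in eos_token_ids:
            # EOS found mid-sequence: drop everything after it and stop.
            return list(verified_tokens[: i + 1]), True
    # No EOS in the verified tokens: commit all of them and keep generating.
    return list(verified_tokens), False
```

With made-up ids, `truncate_at_eos([11, 42, 2, 99, 7], {2})` returns `([11, 42, 2], True)`, whereas a check on the last token alone would miss the EOS and continue generating. A set of ids is used on the assumption that a model may define more than one end token.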

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #


@goliaro goliaro marked this pull request as ready for review November 15, 2024 17:18
@chenzhuofu chenzhuofu self-requested a review November 16, 2024 04:46
@chenzhuofu chenzhuofu merged commit 13dcb23 into specscheduler Nov 16, 2024
33 of 39 checks passed
@chenzhuofu chenzhuofu deleted the specscheduler_eval branch November 16, 2024 10:20