Implement Min-P sampling and late temperature adjustment as a fused sampling layer #2643

Draft: wants to merge 5 commits into base: main

aikitoria commented Jan 2, 2025

This is a work-in-progress implementation of #1154. It implements both Min-P sampling and the option to apply temperature after the Min-P filter.

It first finds the element with the maximum probability, then computes the sum of the adjusted probabilities over all elements that pass the filter, and then samples one element directly from that range.
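
The three steps above can be sketched on the CPU as follows. This is only an illustration in plain Python, not the fused kernel from this PR. The function name `min_p_sample` and its parameters are hypothetical, and it assumes that the Min-P filter is evaluated on the unscaled (temperature 1) probabilities and that "adjusted" means temperature-scaled.

```python
import math
import random


def min_p_sample(logits, min_p, temperature=1.0, rng=random):
    """Sample one index with Min-P filtering, applying temperature afterwards.

    Hypothetical helper for illustration; assumes the filter uses the
    unscaled probabilities and temperature is applied only to survivors.
    """
    if not 0.0 < min_p <= 1.0:
        raise ValueError("min_p must be in (0, 1]")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")

    # Step 1: find the maximum logit, i.e. the most probable element.
    max_logit = max(logits)

    # p_i >= min_p * p_max  <=>  logit_i >= max_logit + ln(min_p),
    # because the softmax normaliser cancels on both sides.
    threshold = max_logit + math.log(min_p)

    def weight(logit):
        # Temperature-adjusted, unnormalised probability relative to the max.
        return math.exp((logit - max_logit) / temperature)

    # Step 2: sum the adjusted weights over the elements that pass the filter.
    total = sum(weight(l) for l in logits if l >= threshold)

    # Step 3: draw a point in [0, total) and walk the survivors until covered.
    target = rng.random() * total
    running = 0.0
    last_kept = None
    for index, logit in enumerate(logits):
        if logit < threshold:
            continue
        running += weight(logit)
        last_kept = index
        if target < running:
            return index
    return last_kept  # guards against floating-point round-off
```

For example, `min_p_sample([1.0, 3.0, 2.0], min_p=0.3, temperature=0.8)` can only return index 1 or 2, because index 0 falls below the threshold 3.0 + ln(0.3). Since the draw is made against the unnormalised total, no separate normalisation or sorting pass is needed.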

What makes this a draft:

  • It cannot be controlled from the frontend with a separate min_p parameter. But how could I implement this, when some random auxiliary code (executor) is closed source for no apparent reason?
  • It currently does not support processing requests that want top_k/top_p in the same batch as ones that want min_p. I am of the opinion that enabling top_k/top_p and min_p on the same request is nonsense.
  • It does not yet support exporting logprobs if requested by the user.
  • There are no tests.
