
Add new_deployment option for serve #275

Merged: 4 commits into intel:main on Aug 23, 2024
Conversation

cheehook (Contributor) commented:
  • Objective: allow users to deploy and keep multiple model endpoints alive so they can easily test and benchmark different models (currently, deploying the OpenAI-compatible API server kills the previous deployment).

Changes:

  • add a --new_deployment option to serve.py
  • when this option is enabled, the application name and route prefix are taken from the name defined in the deployment config, instead of the defaults router and / (see the sketch after this list)
  • for OpenAI-compatible API deployments, the endpoint URL becomes http://localhost:8000/{name}/v1 instead of http://localhost:8000/v1 (see the client example further below)
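
A minimal sketch, assuming Ray Serve 2.x, of how such a flag could be wired up. The --new_deployment option and the router and / defaults come from this PR; ModelServer and the --name argument are hypothetical stand-ins for the config-driven deployment that the real serve.py builds:

```python
import argparse

from ray import serve


@serve.deployment
class ModelServer:
    # Hypothetical stand-in for the model deployment built in serve.py.
    async def __call__(self, request):
        return "ok"


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--new_deployment",
        action="store_true",
        help="Keep previous deployments alive: serve this model under its "
        "own application name and route prefix instead of replacing the "
        "default 'router' app at '/'.",
    )
    parser.add_argument(
        "--name",
        type=str,
        default="gpt2",
        help="Model name from the deployment config (illustrative).",
    )
    args = parser.parse_args()

    app = ModelServer.bind()

    if args.new_deployment:
        # Each model gets its own Serve application and route prefix,
        # so earlier deployments stay alive alongside it.
        serve.run(app, name=args.name, route_prefix=f"/{args.name}")
    else:
        # Default behavior: a single app named "router" served at "/",
        # which replaces any previous deployment under that name.
        serve.run(app, name="router", route_prefix="/")


if __name__ == "__main__":
    main()
```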

Commit: enable option for deploying without overriding previous deployment
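
Once a model is deployed with --new_deployment, clients target the per-model route. A hedged example with the openai Python client, assuming a model deployed under the name gpt2 (both the name and the API key value are placeholders):

```python
from openai import OpenAI

# With --new_deployment, the endpoint is http://localhost:8000/{name}/v1;
# "gpt2" is an illustrative deployment name, and the API key is a placeholder.
client = OpenAI(base_url="http://localhost:8000/gpt2/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gpt2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```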

Two review threads on llm_on_ray/inference/serve.py (outdated, resolved)
KepingYan (Contributor) commented:

LGTM. Thanks @cheehook.

KepingYan merged commit fc6b44a into intel:main on Aug 23, 2024
13 checks passed