
Add new_deployment option for serve #275

Merged: 4 commits into intel:main on Aug 23, 2024
Conversation

cheehook (Contributor) commented:
  • Objective: allow users to deploy and keep multiple model endpoints alive so they can easily test and benchmark different models (currently, deploying the OpenAI-compatible API server kills the previous deployment).

Changes:

  • add a --new_deployment option to serve.py
  • when this option is enabled, the application name and route prefix are taken from the name defined in the deployment config, instead of the defaults router and / (see the sketch after this list)
  • for OpenAI-compatible API deployments, the endpoint URL becomes http://localhost:8000/{name}/v1 instead of http://localhost:8000/v1 (see the client example further below)
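
A minimal sketch, assuming Ray Serve 2.x, of how such a flag could be wired up. The --new_deployment option and the router and / defaults come from this PR; ModelServer and the --name argument are hypothetical stand-ins for the config-driven deployment that the real serve.py builds:

```python
import argparse

from ray import serve


@serve.deployment
class ModelServer:
    # Hypothetical stand-in for the model deployment built in serve.py.
    async def __call__(self, request):
        return "ok"


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--new_deployment",
        action="store_true",
        help="Keep previous deployments alive: serve this model under its "
        "own application name and route prefix instead of replacing the "
        "default 'router' app at '/'.",
    )
    parser.add_argument(
        "--name",
        type=str,
        default="gpt2",
        help="Model name from the deployment config (illustrative).",
    )
    args = parser.parse_args()

    app = ModelServer.bind()

    if args.new_deployment:
        # Each model gets its own Serve application and route prefix,
        # so earlier deployments stay alive alongside it.
        serve.run(app, name=args.name, route_prefix=f"/{args.name}")
    else:
        # Default behavior: a single app named "router" served at "/",
        # which replaces any previous deployment under that name.
        serve.run(app, name="router", route_prefix="/")


if __name__ == "__main__":
    main()
```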

Commit: enable option for deploying without overriding previous deployment
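
Once a model is deployed with --new_deployment, clients target the per-model route. A hedged example with the openai Python client, assuming a model deployed under the name gpt2 (both the name and the API key value are placeholders):

```python
from openai import OpenAI

# With --new_deployment, the endpoint is http://localhost:8000/{name}/v1;
# "gpt2" is an illustrative deployment name, and the API key is a placeholder.
client = OpenAI(base_url="http://localhost:8000/gpt2/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gpt2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```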

Two review threads on llm_on_ray/inference/serve.py (outdated, resolved)
KepingYan (Contributor) commented:

LGTM. Thanks @cheehook.

KepingYan merged commit fc6b44a into intel:main on Aug 23, 2024
13 checks passed