- Generate configuration:

  First, replace the model path and dataset path specified in `bench_end_to_end_muxserve.py` with your own paths. Specifically, modify the following variables according to the comments in the script:
  - `MODEL_TO_PATH`
  - `SHAREGPT_PATH`
  - `TOKENIZED_DATA_CACHE`
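  For orientation, the edits look roughly like the sketch below. All paths are placeholders, and the exact shape of these variables (e.g. whether `MODEL_TO_PATH` is a dict) should be taken from the comments in the script itself:

  ```python
  # Illustrative placeholders only -- replace with your own paths.
  MODEL_TO_PATH = {
      "llama-7b": "/path/to/llama-7b",  # hypothetical model name -> checkpoint path
  }
  SHAREGPT_PATH = "/path/to/sharegpt_dataset.json"    # ShareGPT dataset dump
  TOKENIZED_DATA_CACHE = "/path/to/tokenized_cache"   # cache of pre-tokenized requests
  ```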
  Then run the script to generate the configuration:

  ```bash
  python bench_end_to_end_muxserve.py
  ```

  This will generate the configuration and workload files for the corresponding end-to-end evaluation in the paper: `alpha` = 0.7, 0.9, 1.3, 1.7, 2.1.
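  As a quick sanity check, you can list what was generated. The directory naming below (`alpha*_scale0.5_max40` under `model_cfgs/` and `workloads/`) is inferred from the example invocation later in this section; your `scale`/`max` values may differ:

  ```python
  # Hedged sanity check: list the files the generation step produced.
  from pathlib import Path

  for alpha in (0.7, 0.9, 1.3, 1.7, 2.1):
      for root in ("model_cfgs", "workloads"):
          d = Path(f"{root}/alpha{alpha}_scale0.5_max40")  # naming inferred from the example below
          if d.is_dir():
              print(d, "->", sorted(p.name for p in d.iterdir()))
  ```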
- Start the experiment by running the `run_end_to_end.sh` script. Execute the following command in your terminal:

  ```bash
  bash run_end_to_end.sh <launch_type> <cuda_device> <yaml> <workload> [split_llm if 'spatial']
  ```

  - `launch_type` is chosen from [`muxserve`, `spatial`, `temporal`].
  - Note: `llm-id` is needed if `launch_type` is `temporal`; it is specified in the config file.
  - Note: Flexsm utilizes NVIDIA MPS. Running the muxserve component of the experiment requires root privileges. Replace the password in the script with your own password (marked as `YOUR_PASSWD` in `run_end_to_end.sh`).
  An example:

  ```bash
  bash run_end_to_end.sh spatial 0 \
      model_cfgs/alpha0.7_scale0.5_max40/spatial_cfg.yaml \
      workloads/alpha0.7_scale0.5_max40/sharegpt_n19_req.json 2
  ```
  Make sure you are in the directory where the `run_end_to_end.sh` script is located. The script initiates the necessary steps to run the end-to-end experiment. Once the test has started, run logs will be generated in `${PROJ_DIR}/benchmark/end_to_end/log` by default.
- Extract the evaluation results from the log files:

  We provide an automated script, `plot_p_latency.py`, that performs statistical analysis on the evaluation results and visualizes them.
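  For reference, the sketch below shows the kind of analysis such a script automates: computing percentile ("P50/P90/P99") latencies over per-request latencies. The log path and field name here are assumptions for illustration, not the actual log format:

  ```python
  # Minimal percentile-latency sketch; log path and "latency" field are assumed.
  import json

  import numpy as np

  with open("log/example_run.json") as f:      # hypothetical log file
      records = json.load(f)
  latencies = [r["latency"] for r in records]  # assumed per-request latency field

  for p in (50, 90, 99):
      print(f"P{p} latency: {np.percentile(latencies, p):.3f} s")
  ```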