[Bug]: Running sglang.launch_server raises KeyError "rope_type" #269

Open · 1 task done
JV-X opened this issue Dec 10, 2024 · 0 comments
Labels: bug (Something isn't working), triage

Comments


JV-X commented Dec 10, 2024

Is there an existing issue?

  • I have searched, and there is no existing issue.

Describe the bug

Following the official sglang documentation (https://sgl-project.github.io/start/install.html), I installed the sglang package with pip. Then, following the MiniCPM README, I ran python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code --port 30000 --chat-template chatml, which fails with the following error:

(sgl) hygx@hygx:~$ python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code --port 30000 --chat-template chatml
[2024-12-10 09:17:10] server_args=ServerArgs(model_path='openbmb/MiniCPM3-4B', tokenizer_path='openbmb/MiniCPM3-4B', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization=None, context_length=None, device='cuda', served_model_name='openbmb/MiniCPM3-4B', chat_template='chatml', is_embedding=False, revision=None, host='127.0.0.1', port=30000, mem_fraction_static=0.88, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, tp_size=1, stream_interval=1, random_seed=186360464, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='SGLang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=8, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, num_continuous_decode_steps=1, delete_ckpt_after_loading=False)
config.json: 100%|█████████████████████████████████████████████████████████████████| 1.93k/1.93k [00:00<00:00, 19.5MB/s]
configuration_minicpm.py: 100%|████████████████████████████████████████████████████| 9.23k/9.23k [00:00<00:00, 60.0MB/s]
A new version of the following files was downloaded from https://huggingface.co/openbmb/MiniCPM3-4B:
- configuration_minicpm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
tokenizer_config.json: 100%|███████████████████████████████████████████████████████| 10.4k/10.4k [00:00<00:00, 82.3MB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████| 1.18M/1.18M [00:00<00:00, 8.66MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████| 3.68M/3.68M [00:01<00:00, 2.77MB/s]
added_tokens.json: 100%|███████████████████████████████████████████████████████████████| 216/216 [00:00<00:00, 2.14MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████| 1.63k/1.63k [00:00<00:00, 5.12MB/s]
[2024-12-10 09:17:17] Use chat template for the OpenAI-compatible API server: chatml
[2024-12-10 09:17:18 TP0] MLA optimization is turned on. Use triton backend.
[2024-12-10 09:17:18 TP0] Init torch distributed begin.
[2024-12-10 09:17:18 TP0] Load weight begin. avail mem=22.50 GB
[2024-12-10 09:17:18 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1493, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 191, in __init__
    self.tp_worker = TpWorkerClass(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 62, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 62, in __init__
    self.model_runner = ModelRunner(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 155, in __init__
    self.load_model()
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 253, in load_model
    self.model = get_model(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 357, in load_model
    model = _initialize_model(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 138, in _initialize_model
    return model_class(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 551, in __init__
    self.model = MiniCPM3Model(config, quant_config=quant_config)
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 508, in __init__
    [
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 509, in <listcomp>
    MiniCPM3DecoderLayer(config, i, quant_config=quant_config)
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 416, in __init__
    self.self_attn = MiniCPM3AttentionMLA(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 313, in __init__
    self.rotary_emb = get_rope(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 978, in get_rope
    scaling_type = rope_scaling["rope_type"]
KeyError: 'rope_type'

Killed
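
For reference, the failure is inside vllm's get_rope, which indexes rope_scaling["rope_type"] directly. As a quick check (a minimal sketch on my side, assuming the downloaded config exposes a rope_scaling field, as the traceback suggests), printing the dict shows which keys it actually carries:

from transformers import AutoConfig

# Load the remote MiniCPM3 config and print its rope_scaling dict.
# If it only carries the legacy "type" key and no "rope_type",
# vllm's get_rope raises exactly the KeyError shown above.
cfg = AutoConfig.from_pretrained("openbmb/MiniCPM3-4B", trust_remote_code=True)
print(cfg.rope_scaling)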

What adjustments do I need to make to get the server running?
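
The server args printed above show that sglang accepts --json-model-override-args, so one possible workaround (an untested sketch; newer transformers releases renamed the rope_scaling "type" key to "rope_type", and the placeholder below must be replaced with the full rope_scaling dict from the model's config.json) would be to mirror the legacy key under the new name:

# Untested sketch: duplicate the legacy "type" value under "rope_type" so
# vllm's get_rope finds the key it expects. TYPE_FROM_CONFIG is a
# placeholder; copy the complete rope_scaling dict (all of its fields)
# from the config.json at https://huggingface.co/openbmb/MiniCPM3-4B.
python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code \
  --port 30000 --chat-template chatml \
  --json-model-override-args '{"rope_scaling": {"type": "TYPE_FROM_CONFIG", "rope_type": "TYPE_FROM_CONFIG"}}'

Upgrading sglang (and its vllm dependency) may also resolve this, since later versions normalize the legacy key.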

To Reproduce

conda create -n sgl python==3.10
conda activate sgl
python -m pip install --upgrade pip
python -m pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu121/torch2.4/flashinfer/
source switch_cuda.sh 11.6
python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code --port 30000 --chat-template chatml

Expected behavior

The model runs on my local machine.

Screenshots

No response

Environment

- OS: WSL2 on Windows 11
- Pytorch: 2.5.1+cu124 
- CUDA:11.6 
- Device: i9-14900KF + RTX 4090 D

Additional context

No response

JV-X added the bug and triage labels on Dec 10, 2024