Is there an existing issue?

Describe the bug

Following the official sglang installation docs (https://sgl-project.github.io/start/install.html), I installed the sglang package with pip and then, per the MiniCPM README, ran:
python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code --port 30000 --chat-template chatml
(sgl) hygx@hygx:~$ python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code --port 30000 --chat-template chatml
[2024-12-10 09:17:10] server_args=ServerArgs(model_path='openbmb/MiniCPM3-4B', tokenizer_path='openbmb/MiniCPM3-4B', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization=None, context_length=None, device='cuda', served_model_name='openbmb/MiniCPM3-4B', chat_template='chatml', is_embedding=False, revision=None, host='127.0.0.1', port=30000, mem_fraction_static=0.88, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, cpu_offload_gb=0, tp_size=1, stream_interval=1, random_seed=186360464, constrained_json_whitespace_pattern=None, watchdog_timeout=300, download_dir=None, base_gpu_id=0, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, enable_metrics=False, decode_log_interval=40, api_key=None, file_storage_pth='SGLang_storage', enable_cache_report=False, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', disable_radix_cache=False, disable_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=8, torchao_config='', enable_nan_detection=False, enable_p2p_check=False, triton_attention_reduce_in_fp32=False, num_continuous_decode_steps=1, delete_ckpt_after_loading=False)
config.json: 100%|████████| 1.93k/1.93k [00:00<00:00, 19.5MB/s]
configuration_minicpm.py: 100%|████████| 9.23k/9.23k [00:00<00:00, 60.0MB/s]
A new version of the following files was downloaded from https://huggingface.co/openbmb/MiniCPM3-4B:
- configuration_minicpm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
tokenizer_config.json: 100%|████████| 10.4k/10.4k [00:00<00:00, 82.3MB/s]
tokenizer.model: 100%|████████| 1.18M/1.18M [00:00<00:00, 8.66MB/s]
tokenizer.json: 100%|████████| 3.68M/3.68M [00:01<00:00, 2.77MB/s]
added_tokens.json: 100%|████████| 216/216 [00:00<00:00, 2.14MB/s]
special_tokens_map.json: 100%|████████| 1.63k/1.63k [00:00<00:00, 5.12MB/s]
[2024-12-10 09:17:17] Use chat template for the OpenAI-compatible API server: chatml
[2024-12-10 09:17:18 TP0] MLA optimization is turned on. Use triton backend.
[2024-12-10 09:17:18 TP0] Init torch distributed begin.
[2024-12-10 09:17:18 TP0] Load weight begin. avail mem=22.50 GB
[2024-12-10 09:17:18 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1493, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 191, in __init__
    self.tp_worker = TpWorkerClass(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 62, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 62, in __init__
    self.model_runner = ModelRunner(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 155, in __init__
    self.load_model()
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 253, in load_model
    self.model = get_model(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 357, in load_model
    model = _initialize_model(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/model_loader/loader.py", line 138, in _initialize_model
    return model_class(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 551, in __init__
    self.model = MiniCPM3Model(config, quant_config=quant_config)
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 508, in __init__
    [
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 509, in <listcomp>
    MiniCPM3DecoderLayer(config, i, quant_config=quant_config)
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 416, in __init__
    self.self_attn = MiniCPM3AttentionMLA(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/sglang/srt/models/minicpm3.py", line 313, in __init__
    self.rotary_emb = get_rope(
  File "/home/hygx/anaconda3/envs/sgl/lib/python3.10/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 978, in get_rope
    scaling_type = rope_scaling["rope_type"]
KeyError: 'rope_type'
Killed
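The final traceback frame shows vllm's get_rope() indexing rope_scaling["rope_type"], so the KeyError means the model's rope_scaling dict carries no "rope_type" key. A minimal diagnostic sketch (untested; assumes the transformers package is installed in the same env) to print the dict the config actually ships:

# Print MiniCPM3-4B's rope_scaling dict; if it only has the older "type" key,
# that would explain the KeyError above.
python -c "from transformers import AutoConfig; print(AutoConfig.from_pretrained('openbmb/MiniCPM3-4B', trust_remote_code=True).rope_scaling)"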
What do I need to change to get the server running normally?
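One avenue worth trying (a sketch, not a verified fix): the server_args dump above shows this sglang build accepts --json-model-override-args, which could inject the "rope_type" key that vllm's get_rope() expects. The value "longrope" and the omitted fields below are assumptions; the rest of the rope_scaling dict from the model's config.json would presumably need to be kept alongside the added key.

# Hypothetical relaunch with an overridden rope_scaling dict (placeholder values).
python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code \
  --port 30000 --chat-template chatml \
  --json-model-override-args '{"rope_scaling": {"rope_type": "longrope"}}'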
To Reproduce
conda create -n sgl python==3.10
conda activate sgl
python -m pip install --upgrade pip
python -m pip install "sglang[all]" --find-links https://flashinfer.ai/whl/cu121/torch2.4/flashinfer/
source switch_cuda.sh 11.6
python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code --port 30000 --chat-template chatml
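Since this kind of config-key mismatch is often version-dependent, it may help to record the exact package versions in the env (a small check added for completeness, not part of the original steps; assumes both packages expose __version__):

# Print the installed sglang and vllm versions alongside the report.
python -c "import sglang, vllm; print('sglang', sglang.__version__, '/ vllm', vllm.__version__)"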
Expected behavior

The model runs on my local machine.
Screenshots

No response
Environment

- OS: WSL2 on Windows 11
- PyTorch: 2.5.1+cu124
- CUDA: 11.6
- Device: i9-14900KF + RTX 4090 D
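The environment lists CUDA 11.6 next to a cu124 PyTorch build, so a quick runtime check (a sketch for completeness; torch is already a dependency here) can confirm which CUDA toolkit PyTorch actually uses:

# Print the torch version, the CUDA version torch was built against, and GPU visibility.
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"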
Additional context

No response