Issues: NVIDIA/TensorRT-LLM
[Issue Template] Short one-line summary of the issue #270
#783 opened Jan 1, 2024 by juney-nvidia
Are multimodal models supported by trtllm-serve?
Labels: OpenAI API, triaged (Issue has been triaged by maintainers)
#2714 opened Jan 23, 2025 by xiaoyuzju
Inconsistent Inference Results Between HuggingFace Python Model and TensorRT-LLM Triton Model (v0.11.0)
#2713 opened Jan 23, 2025 by fclearner
How to compile DeepSeek-V3?
Labels: Installation, triaged
#2711 opened Jan 22, 2025 by zmtttt
Support for Blackwell and Thor
Labels: triaged
#2710 opened Jan 21, 2025 by phantaurus
Speculative Decoding (Draft-Target model approach): issue with Triton Inference Server
Labels: Investigating, triaged, Triton Backend
#2709 opened Jan 21, 2025 by sivabreddy
[bug] Encountered an error in forwardAsync function: Assertion failed: mNextBlocks.empty()
Labels: Generic Runtime, Investigating, triaged
#2708 opened Jan 21, 2025 by akhoroshev
Convert NVILA with 0.16.0
Labels: bug (Something isn't working), Investigating, LLM API/Workflow, triaged
#2706 opened Jan 20, 2025 by dzy130120
Compared with HuggingFace, TensorRT-LLM speeds up the Qwen forward pass only in the context phase; the generation phase shows no speedup
Labels: bug
#2705 opened Jan 20, 2025 by nickole2018
Support for int2/int3 quantization
Labels: Investigating, Low Precision (Issue about lower bit quantization, including int8, int4, fp8), triaged
#2704 opened Jan 20, 2025 by ZHITENGLI
Quantized model using AWQ and LoRA weights
Labels: Investigating, Low Precision, triaged
#2703 opened Jan 17, 2025 by shuyuan-wang
Input length limitation (8192) despite model supporting 32k context window
#2702 opened Jan 17, 2025 by HuangZhen02
Wrong outputs with FP8 kv_cache reuse
Labels: bug, Investigating, KV-Cache Management, triaged
#2699 opened Jan 16, 2025 by lishicheng1996
What is "execution context memory"?
Labels: triaged
#2698 opened Jan 16, 2025 by wxsms
Custom allreduce performance improvement
Labels: Customized Kernels, Investigating, triaged
#2696 opened Jan 16, 2025 by yizhang2077
Failed TensorRT-LLM Benchmark
Labels: bug
#2694 opened Jan 15, 2025 by maulikmadhavi
0.16.0 Qwen2-72B-Struct SQ error
Labels: bug
#2693 opened Jan 15, 2025 by gy0514020329
NotImplementedError: Cannot copy out of meta tensor; no data!
Labels: bug
#2692 opened Jan 15, 2025 by chilljudaoren
(Memory leak) trtllm-build gets OOM without GPTAttentionPlugin
Labels: bug
#2690 opened Jan 14, 2025 by idantene
trtllm-build llama3.1-8b failed
Labels: Investigating, LLM API/Workflow, triaged
#2688 opened Jan 14, 2025 by 765500005
Multi-LoRA cpp inference error: Assertion failed: lora_weights has to few values for attn_k
Labels: Investigating, Lora/P-tuning, triaged
#2687 opened Jan 13, 2025 by lodm94
InternVL-2.5
Labels: triaged
#2686 opened Jan 13, 2025 by ChenJian7578
Inference error encountered while using the draft target model
Labels: bug
#2684 opened Jan 13, 2025 by pimang62
DeepSeek-V3 int4 weight-only inference outputs garbage words with TP 8 on NVIDIA H20 GPU
Labels: Investigating, Low Precision, triaged
#2683 opened Jan 13, 2025 by handoku