Custom samplingconfig addition #2633

Open
wants to merge 48 commits into
base: main
dcd773e
Updates for release/0.5.0
juney-nvidia Oct 15, 2023
6b32b40
refresh 0.5.0 release branch with the latest revision
juney-nvidia Oct 18, 2023
b4af28c
Fix memory leak in falcon weight loader (#8)
kaiyux Oct 18, 2023
4926a92
update aarch64 batch manager libraries to release/0.5.0 (#10)
Shixiaowei02 Oct 18, 2023
a42dc2b
Minor doc updates
juney-nvidia Oct 19, 2023
395f850
add git-lfs dependency for binaries (#11)
Shixiaowei02 Oct 19, 2023
ffd5af3
revise the homepage (#14)
Shixiaowei02 Oct 19, 2023
84fd0dd
Fix the link to the documentation (#15)
jdemouth Oct 19, 2023
9a5da6d
Fix small doc issue (#19)
juney-nvidia Oct 20, 2023
7295ce2
Fix two deadlinks in README.md (#21)
wangkuiyi Oct 20, 2023
26f797c
Update docs/source/batch_manager.md (#40)
kaiyux Oct 20, 2023
1c4a0ee
Update windows related documentation (#59)
juney-nvidia Oct 22, 2023
d0b56df
fix doc typo. (#114)
nv-guomingz Oct 25, 2023
11e1450
update the batch manager (#152)
Shixiaowei02 Oct 27, 2023
f84d5fe
Add batch manager lib (#221)
sestephens-nv Nov 2, 2023
d8ebeee
patch for commit f84d5fe (#245)
Shixiaowei02 Nov 2, 2023
93174ab
build: Update Windows torch versions (#309)
sestephens-nv Nov 8, 2023
71a5b97
Add Latest News section (#314)
Shixiaowei02 Nov 8, 2023
1f3a421
Add Latest News section (#361)
Shixiaowei02 Nov 13, 2023
7ce7e1d
Add Latest News section (#366)
Shixiaowei02 Nov 13, 2023
73a9ee4
Add Latest News section (#368)
Shixiaowei02 Nov 13, 2023
6837c81
Update the latest news (#379)
kaiyux Nov 14, 2023
a21e2f8
Fix an issue of mpi4py (#475)
Shixiaowei02 Nov 27, 2023
587d063
Update TensorRT-LLM (#506)
kaiyux Nov 30, 2023
119e216
update aarch64 libraries (#525)
Shixiaowei02 Dec 1, 2023
8dd9c91
Update TensorRT-LLM (#539)
kaiyux Dec 4, 2023
9b3e12d
Update TensorRT-LLM (#546)
Shixiaowei02 Dec 4, 2023
b40cfac
Update badge (#552)
kaiyux Dec 4, 2023
0268914
Add batch manager static lib for Windows (#569)
sestephens-nv Dec 6, 2023
59f41c0
Update TensorRT-LLM (#708)
Shixiaowei02 Dec 20, 2023
a8018c1
Fix a docker build error (#719)
Shixiaowei02 Dec 22, 2023
80bc075
Update TensorRT-LLM Release branch (#745)
kaiyux Dec 26, 2023
2f169d1
Add batch manager static lib for Windows (#814)
sestephens-nv Jan 5, 2024
5955b8a
Update TensorRT-LLM Release branch (#1192)
kaiyux Feb 29, 2024
37aee91
Add 0.8 batch manager static lib for Windows (#1202)
tp5uiuc Mar 1, 2024
250d9c2
Update TensorRT-LLM Release branch (#1445)
kaiyux Apr 12, 2024
6533c4e
Update documents for release 0.9 (#1461)
Shixiaowei02 Apr 17, 2024
a9356d4
Fix document (#1462)
kaiyux Apr 17, 2024
9bd15f1
TensorRT-LLM v0.10 update
kaiyux Jun 5, 2024
05316d3
TensorRT-LLM v0.11 Update (#1969)
kaiyux Jul 17, 2024
ab49b93
fix : v0.11 windows docs (#1975)
tp5uiuc Jul 18, 2024
bfc50a7
TensorRT-LLM v0.12 Update (#2164)
Shixiaowei02 Aug 29, 2024
28fb9aa
Add Windows library for release 0.12 (#2165)
Shixiaowei02 Aug 29, 2024
201135e
TensorRT-LLM v0.13 Update (#2269)
Shixiaowei02 Sep 30, 2024
9078696
Add the known issue to windows installation guide (#2275)
pamelap-nvidia Sep 30, 2024
b088016
Update TensorRT-LLM v0.14.0 (#2401)
kaiyux Nov 1, 2024
8f91cff
TensorRT-LLM Release 0.15.0 (#2529)
Shixiaowei02 Dec 4, 2024
713c01c
custom changes to add a new sampling config 'custom'
Dec 27, 2024
The diff you're trying to view is too large. We only load the first 3000 changed files.
1 change: 1 addition & 0 deletions .clang-format
@@ -59,6 +59,7 @@ PenaltyBreakString: 1000
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 60
PointerAlignment: Left
QualifierAlignment: Right
ReflowComments: true
SeparateDefinitionBlocks: Always
SortIncludes: CaseSensitive
2 changes: 2 additions & 0 deletions .dockerignore
@@ -1,5 +1,7 @@
build
cpp/*build*
cpp/cmake-*
cpp/.ccache
cpp/tests/resources/models
tensorrt_llm/libs
**/__pycache__
3 changes: 3 additions & 0 deletions .gitattributes
@@ -1 +1,4 @@
*.a filter=lfs diff=lfs merge=lfs -text
*.lib filter=lfs diff=lfs merge=lfs -text
*.so filter=lfs diff=lfs merge=lfs -text
*.dll filter=lfs diff=lfs merge=lfs -text
116 changes: 116 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,116 @@
name: "Bug Report"
description: Submit a bug report to help us improve TensorRT-LLM
labels: [ "bug" ]
body:
  - type: textarea
    id: system-info
    attributes:
      label: System Info
      description: Please share your system info with us.
      placeholder: |
        - CPU architecture (e.g., x86_64, aarch64)
        - CPU/Host memory size (if known)
        - GPU properties
          - GPU name (e.g., NVIDIA H100, NVIDIA A100, NVIDIA L40S)
          - GPU memory size (if known)
          - Clock frequencies used (if applicable)
        - Libraries
          - TensorRT-LLM branch or tag (e.g., main, v0.7.1)
          - TensorRT-LLM commit (if known)
          - Versions of TensorRT, Modelopt, CUDA, cuBLAS, etc. used
          - Container used (if running TensorRT-LLM in a container)
        - NVIDIA driver version
        - OS (Ubuntu 22.04, CentOS 7, Windows 10)
        - Any other information that may be useful in reproducing the bug
    validations:
      required: true

  - type: textarea
    id: who-can-help
    attributes:
      label: Who can help?
      description: |
        To expedite the response to your issue, it would be helpful if you could identify the appropriate person
        to tag using the **@** symbol. Here is a general guideline on **whom to tag**.

        Rest assured that all issues are reviewed by the core maintainers. If you are unsure about whom to tag,
        you can leave it blank, and a core maintainer will make sure to involve the appropriate person.

        Please tag fewer than 3 people.

        Quantization: @Tracin

        Documentation: @juney-nvidia

        Feature request: @ncomly-nvidia

        Performance: @kaiyux

        Others: @byshiue

      placeholder: "@Username ..."

  - type: checkboxes
    id: information-scripts-examples
    attributes:
      label: Information
      description: 'The problem arises when using:'
      options:
        - label: "The official example scripts"
        - label: "My own modified scripts"

  - type: checkboxes
    id: information-tasks
    attributes:
      label: Tasks
      description: "The tasks I am working on are:"
      options:
        - label: "An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)"
        - label: "My own task or dataset (give details below)"

  - type: textarea
    id: reproduction
    validations:
      required: true
    attributes:
      label: Reproduction
      description: |
        Kindly share a code example that demonstrates the issue you encountered. It is recommended to provide a code snippet directly.
        Additionally, if you have any error messages or stack traces related to the problem, please include them here.

        Remember to use code tags to properly format your code. You can refer to the
        link https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting for guidance on code formatting.

        Please refrain from using screenshots, as they can be difficult to read and prevent others from copying and pasting your code.
        It would be most helpful if we could reproduce your issue by simply copying and pasting your scripts and code.

      placeholder: |
        Steps to reproduce the behavior:

        1.
        2.
        3.

  - type: textarea
    id: expected-behavior
    validations:
      required: true
    attributes:
      label: Expected behavior
      description: "Provide a brief summary of the expected behavior of the software. Provide output files or examples if possible."

  - type: textarea
    id: actual-behavior
    validations:
      required: true
    attributes:
      label: actual behavior
      description: "Describe the actual behavior of the software and how it deviates from the expected behavior. Provide output files or examples if possible."

  - type: textarea
    id: additioanl-notes
    validations:
      required: true
    attributes:
      label: additional notes
      description: "Provide any additional context here you think might be useful for the TensorRT-LLM team to help debug this issue (such as experiments done, potential things to investigate)."
25 changes: 25 additions & 0 deletions .github/workflows/auto_close_inactive_issues.yml
@@ -0,0 +1,25 @@
# Ref: https://docs.github.com/en/actions/managing-issues-and-pull-requests/closing-inactive-issues
name: Close inactive issues
on:
  schedule:
    - cron: "30 1 * * *"

jobs:
  stale:
    runs-on: ubuntu-latest
    permissions:
      issues: write
      pull-requests: write
    steps:
      - uses: actions/stale@v9
        with:
          days-before-issue-stale: 30
          days-before-issue-close: 15
          stale-issue-label: "stale"
          exempt-issue-labels: ""
          stale-issue-message: "This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days."
          close-issue-message: "This issue was closed because it has been stalled for 15 days with no activity."
          days-before-pr-stale: -1
          days-before-pr-close: -1
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          debug-only: false
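The two issue thresholds in this workflow can be read as a simple timeline: after 30 days without activity an issue is labeled stale, and after a further 15 days without activity it is closed. The sketch below is a simplified model of that timeline, not the `actions/stale` implementation. The helper name `issue_state` is hypothetical, the exact boundary comparisons are an assumption, and the model ignores the once-a-day cron granularity and any activity after the label is applied.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the workflow's `with:` block.
DAYS_BEFORE_ISSUE_STALE = 30
DAYS_BEFORE_ISSUE_CLOSE = 15


def issue_state(last_activity: datetime, now: datetime) -> str:
    """Classify an issue by idle time (hypothetical helper, simplified model)."""
    idle = now - last_activity
    stale_after = timedelta(days=DAYS_BEFORE_ISSUE_STALE)
    close_after = stale_after + timedelta(days=DAYS_BEFORE_ISSUE_CLOSE)
    if idle >= close_after:
        return "closed"
    if idle >= stale_after:
        return "stale"
    return "active"


if __name__ == "__main__":
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    for days in (10, 31, 46):
        print(days, issue_state(now - timedelta(days=days), now))
```

Pull requests are not covered by this model because the workflow sets both `days-before-pr-stale` and `days-before-pr-close` to -1.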
41 changes: 38 additions & 3 deletions .gitignore
@@ -5,20 +5,55 @@ __pycache__/
*.cache
*.nsys-rep
.VSCodeCounter
build*/
*.so
cpp/build*
build
!tensorrt_llm/bench/build
!builders/
*.egg-info/
.coverage
*.csv
*.onnx
tmp/
venv/
.venv/
.local/
.hypothesis/
.idea/
dump*/
.trt-internal
*.dot
*.prof
*.log
*.pkl
*.hdf5
*.lock
config.json
/*.svg
cpp/cmake-build-*
cpp/.ccache
tensorrt_llm/bin
tensorrt_llm/libs
tensorrt_llm/bindings.*.so
tensorrt_llm/bindings.pyi
tensorrt_llm/bindings/**/*.pyi
*docs/cpp_docs*
*docs/source/_cpp_gen*
docs/source/llm-api/*.rst
docs/source/llm-api-examples/llm_*.rst
*.swp

# Testing
.coverage.*
results_trt/

# build/debug
*.safetensors
*/tllm_debug/**
*.patch

# Generated files
cpp/include/tensorrt_llm/executor/version.h

# User config files
CMakeUserPresets.json
compile_commands.json
*.bin
7 changes: 6 additions & 1 deletion .gitmodules
@@ -1,7 +1,6 @@
 [submodule "3rdparty/cutlass"]
     path = 3rdparty/cutlass
     url = https://github.com/NVIDIA/cutlass.git
-    branch = v2.10.0
 [submodule "3rdparty/json"]
     path = 3rdparty/json
     url = https://github.com/nlohmann/json.git
@@ -12,3 +11,9 @@
 [submodule "3rdparty/NVTX"]
     path = 3rdparty/NVTX
     url = https://github.com/NVIDIA/NVTX.git
+[submodule "3rdparty/ucxx"]
+    path = 3rdparty/ucxx
+    url = https://github.com/rapidsai/ucxx.git
+[submodule "3rdparty/pybind11"]
+    path = 3rdparty/pybind11
+    url = https://github.com/pybind/pybind11.git
16 changes: 12 additions & 4 deletions .pre-commit-config.yaml
@@ -15,7 +15,8 @@ repos:
     rev: v4.1.0
     hooks:
       - id: check-added-large-files
-        exclude: 'cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/cubin'
+        exclude: |
+          (?x)^(.*cubin.cpp)$
       - id: check-merge-conflict
       - id: check-symlinks
       - id: detect-private-key
@@ -33,10 +34,17 @@ repos:
       - id: clang-format
         types_or: [c++, c, cuda]
         exclude: |
-          (?x)^(
-            cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/cubin/.*
-          )$
+          (?x)^(.*cubin.cpp$ | .*fmha_cubin.h)$
   - repo: https://github.com/cheshirekow/cmake-format-precommit
     rev: v0.6.10
     hooks:
       - id: cmake-format
+  - repo: https://github.com/codespell-project/codespell
+    rev: v2.2.4
+    hooks:
+      - id: codespell
+        args:
+          - --skip=".git,3rdparty"
+          - --exclude-file=examples/whisper/tokenizer.py
+          - --ignore-words-list=rouge,inout,atleast,strat,nd,subtile,thrid,improbe
+        exclude: 'tests/llm-test-defs/turtle/test_input_files'
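The new clang-format `exclude` value, `(?x)^(.*cubin.cpp$ | .*fmha_cubin.h)$`, is a verbose-mode regular expression, so the spaces around `|` are ignored. A minimal sketch of how such a pattern filters file paths follows, assuming pre-commit applies `exclude` as a Python regular expression searched against each repository-relative path. The helper `is_excluded` and the sample paths are hypothetical and are not part of the PR.

```python
import re

# Pattern copied from the clang-format hook's `exclude` in this diff.
# (?x) enables verbose mode, so the whitespace around `|` is not matched.
EXCLUDE = re.compile(r"(?x)^(.*cubin.cpp$ | .*fmha_cubin.h)$")


def is_excluded(path: str) -> bool:
    """Return True if the pattern matches the path (hypothetical helper)."""
    return EXCLUDE.search(path) is not None


if __name__ == "__main__":
    # Sample paths are made up for illustration.
    for path in (
        "cpp/kernels/fmha_v2_cubin.cpp",
        "cpp/kernels/fmha_cubin.h",
        "cpp/runtime/gptSession.cpp",
    ):
        print(path, is_excluded(path))
```

Because the dots in the pattern are unescaped, they match any character, so a path ending in `cubin_cpp` would also be excluded. Writing `\.` would restrict the match to a literal dot.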
2 changes: 1 addition & 1 deletion 3rdparty/cutlass
Submodule cutlass updated 2222 files
2 changes: 1 addition & 1 deletion 3rdparty/json
Submodule json updated 165 files
1 change: 1 addition & 0 deletions 3rdparty/pybind11
Submodule pybind11 added at f99ffd
1 change: 1 addition & 0 deletions 3rdparty/ucxx
Submodule ucxx added at 5c7451