Releases: triton-inference-server/model_navigator

Triton Model Navigator v0.13.1

03 Jan 11:52
  • Updates:
    • fix: Add AutocastType to public API

Triton Model Navigator v0.13.0

06 Dec 14:35
  • Updates:
    • new: Introduced custom_args in TensorConfig for use by custom runners, which
      allows dynamic shape setup for TorchTensorRT compilation
    • new: autocast_dtype added to Torch runner configuration to set the dtype for autocast
    • new: New version of ONNX Runtime, 1.20, for Python >= 3.10
    • new: Use torch.compile path in heuristic search for max batch size
    • change: Removed TensorFlow dependencies for nav.jax.optimize
    • change: Removed PyTorch dependencies from nav.profile
    • change: Collect all Python packages in status instead of filtered list
    • change: Use default throughput cutoff threshold for max batch size heuristic when None provided in configuration
    • change: Updated default ONNX opset to 20 for Torch >= 2.5
    • fix: Exception is raised with Python >=3.11 due to wrong dataclass initialization
    • fix: Removed the option from ExportOption that was removed in Torch 2.5
    • fix: Improved preprocessing stage in Torch based runners
    • fix: Warn when using autocast with bfloat16 in Torch
    • fix: Pass runner configuration to runners in nav.profile
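
The max batch size heuristic mentioned in the updates above can be pictured with a small sketch. This is not the library's implementation: the function names, the doubling search, and the default value of 0.05 are assumptions made for illustration. It only shows the general idea of stopping once the relative throughput gain falls below a cutoff, and of falling back to a default cutoff when None is configured.

```python
from typing import Callable, Optional

# Hypothetical default; the library's real default value is not stated in these notes.
DEFAULT_THROUGHPUT_CUTOFF = 0.05


def find_max_batch_size(
    measure_throughput: Callable[[int], float],
    max_candidate: int = 1024,
    cutoff_threshold: Optional[float] = None,
) -> int:
    """Double the batch size until the relative throughput gain drops below the cutoff."""
    if cutoff_threshold is None:
        # Mirrors the idea of the v0.13.0 change: use a default when None is provided.
        cutoff_threshold = DEFAULT_THROUGHPUT_CUTOFF

    batch_size = 1
    best_throughput = measure_throughput(batch_size)
    while batch_size * 2 <= max_candidate:
        candidate = batch_size * 2
        throughput = measure_throughput(candidate)
        gain = (throughput - best_throughput) / best_throughput
        if gain < cutoff_threshold:
            break  # throughput has saturated; keep the previous batch size
        batch_size, best_throughput = candidate, throughput
    return batch_size


def fake_throughput(batch_size: int) -> float:
    """Toy model: throughput grows linearly, then flattens at batch size 16."""
    return float(min(batch_size, 16) * 100)


print(find_max_batch_size(fake_throughput))
```

With the toy throughput model, the search stops at the batch size where doubling no longer improves throughput. A real measurement function would run the model and time it instead.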

Triton Model Navigator v0.12.0

10 Sep 12:27
  • Updates:

    • new: simple and detailed reporting of the optimization process
    • new: adjusted exporting TensorFlow SavedModel for Keras 3.x
    • new: inform user when a wrapped module is not called during optimize
    • new: inform user when a module uses a custom forward function
    • new: support for dynamic shapes in Torch ExportedProgram
    • new: use ExportedProgram for Torch-TensorRT conversion
    • new: support back-off policy during profiling to avoid reporting local minimum
    • new: automatically scale conversion batch size when modules have different batch sizes in scope of a single pipeline
    • change: TensorRT conversion max batch size search relies on saturating throughput for base formats
    • change: adjusted profiling configuration for throughput cutoff search
    • change: include optimized pipeline in the list of examined variants during nav.profile
    • change: the performance command is not executed when correctness failed for a format and runtime
    • change: verify command is not executed when verify function is not provided
    • change: do not create a model copy before executing torch.compile
    • fix: pipelines sometimes obtain model and tensors on different devices during nav.profile
    • fix: extract graph from ExportedProgram for running inference
    • fix: runner configuration not propagated to pre-processing steps
  • Version of external components used during testing:

Triton Model Navigator v0.11.0

05 Aug 12:44
  • Updates:

    • new: Python 3.12 support
    • new: Improved logging
    • new: optimized in-place module can be stored to Triton model repository
    • new: multi-profile support for TensorRT model build and runtime
    • new: measure duration of each command executed in optimization pipeline
    • new: TensorRT-LLM model store generation for deployment on Triton Inference Server
    • change: filter unsupported runners instead of raising an error when running optimize
    • change: moved JAX support to the experimental module, with limited support
    • change: use autocast=True for Torch based runners
    • change: use torch.inference_mode or torch.no_grad context in nav.profile measurements
    • change: use multiple strategies to select optimized runtime, defaults to [MaxThroughputAndMinLatencyStrategy, MinLatencyStrategy]
    • change: trt_profiles are not set automatically for module when using nav.optimize
    • fix: properly revert log level after torch onnx dynamo export
  • Version of external components used during testing:
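
The strategy list named in the v0.11.0 updates, [MaxThroughputAndMinLatencyStrategy, MinLatencyStrategy], suggests an ordered fallback: try the first strategy, and move to the next one if it cannot pick a runtime. The sketch below illustrates that pattern only. The `Result` type, the function names, and the assumption that the first strategy succeeds only when one candidate is best on both metrics are hypothetical and do not reproduce the library's classes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Result:
    """Hypothetical per-runtime measurement; not the library's result type."""

    runtime: str
    throughput: float  # inferences per second
    latency_ms: float


Strategy = Callable[[Sequence[Result]], Optional[Result]]


def max_throughput_and_min_latency(results: Sequence[Result]) -> Optional[Result]:
    """Pick a runtime only if it is best on both metrics (assumed behavior)."""
    best_throughput = max(results, key=lambda r: r.throughput)
    best_latency = min(results, key=lambda r: r.latency_ms)
    return best_throughput if best_throughput == best_latency else None


def min_latency(results: Sequence[Result]) -> Optional[Result]:
    """Always pick the runtime with the lowest latency."""
    return min(results, key=lambda r: r.latency_ms)


def select_runtime(results: Sequence[Result], strategies: Sequence[Strategy]) -> Result:
    """Apply strategies in order and return the first successful choice."""
    if not results:
        raise ValueError("no results to select from")
    for strategy in strategies:
        choice = strategy(results)
        if choice is not None:
            return choice
    raise ValueError("no strategy selected a runtime")


measurements = [Result("onnx", 900.0, 4.0), Result("trt", 1200.0, 5.0)]
chosen = select_runtime(measurements, [max_throughput_and_min_latency, min_latency])
print(chosen.runtime)
```

In this example no runtime wins on both throughput and latency, so the first strategy declines and the second one picks the lowest-latency candidate.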

Triton Model Navigator v0.10.1

26 Jun 19:23

Triton Model Navigator v0.10.0

24 Jun 12:49
  • Updates:

    • new: inplace nav.Module accepts a batching flag, which overrides the config setting, and a precision argument, which allows setting the appropriate configuration for TensorRT
    • new: Allow setting the device when loading optimized modules using nav.load_optimized()
    • new: Add support for custom i/o names and dynamic shapes in Torch ONNX Dynamo path
    • new: Added nav.bundle.save and nav.bundle.load to save and load optimized models from cache
    • change: Improved optimize and profile status in inplace mode
    • change: Improved handling defaults for ONNX Dynamo when executing nav.package.optimize
    • fix: Maintaining modules device in nav.profile()
    • fix: Add support for all precisions for TensorRT in nav.profile()
    • fix: Forward method not passed to other inplace modules.
  • Version of external components used during testing:

Triton Model Navigator v0.9.0

07 May 18:21
  • Updates:

    • new: TensorRT Timing Tactics Cache Management - using timing tactics cache files for optimization performance improvements
    • new: Added throughput saturation verification in nav.profile() (enabled by default)
    • new: Allow overriding the Inplace cache dir through the MODEL_NAVIGATOR_DEFAULT_CACHE_DIR env variable
    • new: inplace nav.Module can now receive a function name to be used instead of __call__ in modules/submodules, which allows customizing modules with non-standard calls
    • fix: torch dynamo export and torch dynamo onnx export
    • fix: measurement stabilization in nav.profile()
    • fix: inplace inference through Torch
    • fix: trt_profiles argument handling in ONNX to TRT conversion
    • fix: optimal shape configuration for batch size in Inplace API
    • change: Disable TensorRT profile builder
    • change: nav.optimize() does not override module configuration
  • Known issues and limitations

    • DistilBERT ONNX dynamo export does not support dynamic shapes
  • Version of external components used during testing:
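
The MODEL_NAVIGATOR_DEFAULT_CACHE_DIR override listed in the v0.9.0 updates follows a common pattern: read an environment variable and fall back to a default path when it is unset. The sketch below shows that pattern with the standard library only. The fallback path and the `resolve_cache_dir` helper are hypothetical and are not taken from the library.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "MODEL_NAVIGATOR_DEFAULT_CACHE_DIR"
# Hypothetical fallback used only in this sketch; the library's real default may differ.
FALLBACK_CACHE_DIR = Path.home() / ".cache" / "model_navigator_example"


def resolve_cache_dir(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return the cache directory, preferring the environment variable when set."""
    environ = os.environ if environ is None else environ
    override = environ.get(ENV_VAR)
    return Path(override) if override else FALLBACK_CACHE_DIR


print(resolve_cache_dir({ENV_VAR: "/tmp/nav_cache"}))
print(resolve_cache_dir({}))
```

Passing the environment as an argument keeps the helper easy to test. In practice the variable would presumably need to be exported before the library reads it.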

Triton Model Navigator v0.8.1

04 Apr 14:04
  • fix: Inference with TensorRT when model has input with empty shape
  • fix: Using stabilized runners when model has no batching
  • fix: Invalid dependencies for cuDNN - review known issues
  • fix: Make ONNX Graph Surgeon produce artifacts within the protobuf limit (2 GB)
  • change: Remove TensorRTCUDAGraph from default runners
  • change: updated ONNX package to 1.16.0

Triton Model Navigator v0.8.0

22 Mar 16:23

Updates:

Triton Model Navigator v0.7.7

09 Feb 05:43

Updates:

  • change: Add input and output specs for Triton model repositories generated from packages

Version of external components used during testing: