diff --git a/docs/migrating/v2.mdx b/docs/migrating/v2.mdx index c0d3a8d2..2c22484f 100644 --- a/docs/migrating/v2.mdx +++ b/docs/migrating/v2.mdx @@ -3,6 +3,29 @@ title: Migrating from v1.x to v2 description: Here's what you need to know to update your application to use the latest & greatest version of ort. --- +## `Environment` no more +The `Environment` struct has been removed. Only one `Environment` was allowed per process, so it didn't really make sense to have an environment as a struct. + +To configure an `Environment`, you instead use the `ort::init` function, which returns the same `EnvironmentBuilder` as v1.x. Then call `commit()` to commit the environment. + +```rust +ort::init() + .with_execution_providers([CUDAExecutionProvider::default().build()]) + .commit()?; +``` + +`commit()` must be called before any sessions are created in order for it to take effect; otherwise, a default environment will be created, and it cannot be modified afterwards. + +## Session creation +`SessionBuilder::new(&environment)` has been soft-replaced with `Session::builder()`: + +```diff +-// v1.x +-let session = SessionBuilder::new(&environment)?.with_model_from_file("model.onnx")?; ++// v2 ++let session = Session::builder()?.with_model_from_file("model.onnx")?; +``` + ## Session inputs ### `CowArray`/`IxDyn`/`ndarray` no longer required @@ -17,7 +40,7 @@ One of the biggest usability changes is that the usual pattern of `CowArray::fro -// v1.x -let mut tokens = CowArray::from(Array1::from_iter(tokens.iter().cloned()).into_dyn()); +// v2 -+let mut tokens = Array1::from_iter(tokens.iter().cloned())); ++let mut tokens = Array1::from_iter(tokens.iter().cloned()); ``` It should be noted that there are some cases in which an array is cloned when converting into a `Value`, which may lead to a surprising performance hit. 
ONNX Runtime does not expose an API to specify the strides of a tensor, so if an array is reshaped before being transformed into a `Value`, it must be cloned in order to make the data contiguous. Specifically: @@ -62,7 +85,7 @@ In previous versions, `Value::from_array` took an allocator parameter. The alloc ```diff -// v1.x -let val = Value::from_array(session.allocator(), &array)?; -+// v2.0 ++// v2 +let val = Value::from_array(&array)?; ``` @@ -86,7 +109,7 @@ v2.0 adds the groundwork for supporting sequence and map types in `Value`s, so ```diff -// v1.x -let output1 = outputs[0].try_extract::<f32>()?; -+// v2.0 ++// v2 +let output1 = outputs[0].extract_tensor::<f32>()?; ``` @@ -97,6 +120,33 @@ Just like how inputs can now be specified by name, you can now retrieve session let l = outputs["latents"].extract_tensor::<f32>()?; +## Execution providers +Execution provider structs with public fields have been replaced with builder pattern structs. See the [API reference](https://docs.rs/ort/2.0.0-alpha.1/ort/index.html?search=ExecutionProvider) and the [execution providers reference](/perf/execution-providers) for more information. + +```diff +-// v1.x +-builder = builder.with_execution_providers(ExecutionProvider::DirectML(DirectMLExecutionProvider { +- device_id: 1 +-}))?; ++// v2 ++builder = builder.with_execution_providers([ ++ DirectMLExecutionProvider::default() ++ .with_device_id(1) ++ .build() ++])?; +``` + +## Updated dependencies & features + +### `ort::sys` has been split into `ort-sys` +The `ort::sys` module has been split out into [its own `ort-sys` crate](https://crates.io/crates/ort-sys). If you don't use `ort::sys` in your application, this won't affect you. + +### `ndarray` is now optional +The dependency on `ndarray` is now declared optional. If you use `ort` with `default-features = false`, you'll need to add the `ndarray` feature. + +## The Flattening +All modules except `download` are no longer public. Exports have been flattened to the crate root, so e.g. 
`ort::session::Session` becomes `ort::Session`. + ## Renamed types The following types have been renamed with no other changes. - `NdArrayExtensions` -> `ArrayExtensions` diff --git a/docs/migrating/version-mapping.mdx b/docs/migrating/version-mapping.mdx index 6422141a..ad5abae6 100644 --- a/docs/migrating/version-mapping.mdx +++ b/docs/migrating/version-mapping.mdx @@ -6,7 +6,7 @@ description: Information about `ort`'s versioning and relation to ONNX Runtime v ## A note on SemVer `ort` versions pre-2.0 were not SemVer compatible. From v2.0 onwards, breaking API changes are accompanied by a **major version update**. -Updates to the version of ONNX Runtime used by `ort` may occur on **minor** version updates, i.e. 2.0 ships with ONNX Runtime 1.16.2, but 2.1 may ship with 1.17.0. In case you require a specific version of ONNX Runtime, you should pin the minor version in your `Cargo.toml` using a [tilde requirement](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#tilde-requirements): +Updates to the version of ONNX Runtime used by `ort` may occur on **minor** version updates, e.g. 2.0 ships with ONNX Runtime 1.16.2, but 2.1 may ship with 1.17.0. ONNX Runtime is generally forward compatible, but in case you require a specific version of ONNX Runtime, you should pin the minor version in your `Cargo.toml` using a [tilde requirement](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#tilde-requirements): ```toml [dependencies] ort = { version = "~2.0", ... } diff --git a/docs/perf/execution-providers.mdx b/docs/perf/execution-providers.mdx index d00cc0d0..2d8d8d3c 100644 --- a/docs/perf/execution-providers.mdx +++ b/docs/perf/execution-providers.mdx @@ -2,3 +2,158 @@ title: Execution providers description: Learn how to enable execution providers to leverage hardware acceleration. --- + +Execution providers (EPs) enable ONNX Runtime to execute ONNX graphs with hardware acceleration. 
If you have specialized hardware like a GPU or NPU, execution providers can provide a massive performance boost to your `ort` applications. For more information on the intricacies of execution providers, see the [ONNX Runtime docs](https://onnxruntime.ai/docs/execution-providers/). + +ONNX Runtime must be compiled with support for each execution provider. pyke provides precompiled binaries for some of the most common EPs. Below is a table showing available EPs, their support in `ort`, and their binary availability status. + +| EP | Supported | Binaries | Static linking +|:-------- |:------- |:------ |:------ | +| NVIDIA CUDA | 🟢 | 🟢 | ❌ | +| NVIDIA TensorRT | 🟢 | 🟢 | ❌ | +| Microsoft DirectML | 🟢 | 🟢 | 🟢 | +| Apple CoreML | 🟢 | 🟢 | 🟢 | +| AMD ROCm | 🟢 | ❌ | ❌ | +| Intel OpenVINO | 🟢 | ❌ | ❓ | +| Intel oneDNN | 🟢 | ❌ | ❓ | +| XNNPACK | 🟢 | 🟢 | 🟢 | +| Qualcomm QNN | 🟢 | ❌ | ❓ | +| Huawei CANN | 🟢 | ❌ | ❓ | +| Android NNAPI | 🟢 | ❌ | ❓ | +| Apache TVM | 🟢 | ❌ | ❓ | +| Arm ACL | 🟢 | ❌ | ❓ | +| ArmNN | 🟢 | ❌ | ❓ | +| AMD MIGraphX | ❌ | ❌ | ❓ | +| AMD Vitis AI | ❌ | ❌ | ❓ | +| Microsoft Azure | ❌ | ❌ | ❓ | +| Rockchip RKNPU | ❌ | ❌ | ❓ | + +Some EPs supported by ONNX Runtime are not supported by `ort` due to a lack of hardware for testing. If your preferred EP is missing support and you've got the hardware, please [open an issue](https://github.com/pykeio/ort/issues/new)! + +## Registering execution providers +In order to use an execution provider, you must **register** it before creating an environment or session. You can do this via the `SessionBuilder::with_execution_providers` method. For example, to register the CUDA execution provider: + +```rust +use ort::{CUDAExecutionProvider, Session}; + +fn main() -> anyhow::Result<()> { + let session = Session::builder()? + .with_execution_providers([CUDAExecutionProvider::default().build()])? + .with_model_from_file("model.onnx")?; + + Ok(()) +} +``` + +You can, of course, specify multiple execution providers. 
`ort` will register all EPs specified, in order. If registration of an EP succeeds, `ort` will still continue to register the rest of the EPs so that ONNX Runtime can fall back to other EPs in case an EP does not support a certain operator. + +```rust +use ort::{CoreMLExecutionProvider, CUDAExecutionProvider, DirectMLExecutionProvider, TensorRTExecutionProvider, Session}; + +fn main() -> anyhow::Result<()> { + let session = Session::builder()? + .with_execution_providers([ + // Prefer TensorRT over CUDA. + TensorRTExecutionProvider::default().build(), + CUDAExecutionProvider::default().build(), + // Use DirectML on Windows if NVIDIA EPs are not available + DirectMLExecutionProvider::default().build(), + // Or use ANE on Apple platforms + CoreMLExecutionProvider::default().build() + ])? + .with_model_from_file("model.onnx")?; + + Ok(()) +} +``` + +## Configuring EPs +EPs have configuration options to control behavior or increase performance. Each `XXXExecutionProvider` struct returns a builder with configuration methods. See the [API reference](https://docs.rs/ort/2.0.0-alpha.1/ort/index.html?search=ExecutionProvider) for the EP structs for more information on which options are supported and what they do. + +```rust +use ort::{CoreMLExecutionProvider, Session}; + +fn main() -> anyhow::Result<()> { + let session = Session::builder()? + .with_execution_providers([ + CoreMLExecutionProvider::default() + // this model uses control flow operators, so enable CoreML on subgraphs too + .with_subgraphs() + // only use the ANE as the CoreML CPU implementation is super slow for this model + .with_ane_only() + .build() + ])? + .with_model_from_file("model.onnx")?; + + Ok(()) +} +``` + +## Fallback behavior +`ort` will silently fail and fall back to executing on the CPU if all execution providers fail to register. In many cases, though, you'll want to show the user an error message when an EP fails to register, or outright abort the process. 
+ +If you need to know the status of an EP registration, use `ExecutionProvider::register` on a `SessionBuilder`. + +```rust +use ort::{CUDAExecutionProvider, ExecutionProvider, Session}; + +fn main() -> anyhow::Result<()> { + let builder = Session::builder()?; + + let cuda = CUDAExecutionProvider::default(); + if cuda.register(&builder).is_err() { + eprintln!("Failed to register CUDA!"); + std::process::exit(1); + } + + let session = builder.with_model_from_file("model.onnx")?; + + Ok(()) +} +``` + +You can also check whether ONNX Runtime is even compiled with support for the execution provider with the `is_available` method. + +```rust +use ort::{CoreMLExecutionProvider, ExecutionProvider, Session}; + +fn main() -> anyhow::Result<()> { + let builder = Session::builder()?; + + let coreml = CoreMLExecutionProvider::default(); + if !coreml.is_available() { + eprintln!("Please compile ONNX Runtime with CoreML!"); + std::process::exit(1); + } + + let session = builder.with_model_from_file("model.onnx")?; + + Ok(()) +} +``` + +## Global defaults +You can configure `ort` to attempt to register a list of execution providers for all sessions created in an environment. + +```rust +use ort::{CUDAExecutionProvider, Session}; + +fn main() -> anyhow::Result<()> { + ort::init() + .with_execution_providers([CUDAExecutionProvider::default().build()]) + .commit()?; + + let session = Session::builder()?.with_model_from_file("model.onnx")?; + // The session will attempt to register the CUDA EP + // since we configured the environment default. + + Ok(()) +} +``` + +If a session is built `with_execution_providers` after configuring global defaults, the EPs provided by the session will be registered *after* the environment defaults. It does not override the environment EPs. + +Note that `ort::init` must come before you create any sessions, otherwise the configuration will not take effect! 
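The append-after-defaults behavior described above can be illustrated with a small plain-Rust sketch. This is *not* the actual `ort` API — `registration_order` is a hypothetical helper that only models how the final EP registration list is assembled:

```rust
// Hypothetical model of ort's EP registration order: environment defaults
// are registered first, and session-level EPs are appended after them.
fn registration_order(env_defaults: &[&str], session_eps: &[&str]) -> Vec<String> {
    let mut order: Vec<String> = env_defaults.iter().map(|s| s.to_string()).collect();
    // Session-level EPs do not replace the defaults; they are tried later.
    order.extend(session_eps.iter().map(|s| s.to_string()));
    order
}

fn main() {
    // Environment default is CUDA; the session also requests TensorRT.
    let order = registration_order(&["CUDA"], &["TensorRT"]);
    // CUDA (the environment default) is still registered first.
    assert_eq!(order, ["CUDA", "TensorRT"]);
    println!("{order:?}");
}
```

In other words, session-level `with_execution_providers` only extends the list the environment already set up.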
+ +## Troubleshooting +If it seems like the execution provider is not registering properly, or you are not getting acceptable performance, see the [Troubleshooting: Performance](/troubleshooting/performance) page for more information on how to debug any EP issues. diff --git a/docs/setup/platforms.mdx b/docs/setup/platforms.mdx index 12526e4d..d007ac1b 100644 --- a/docs/setup/platforms.mdx +++ b/docs/setup/platforms.mdx @@ -1,35 +1,27 @@ --- title: Platform support +description: ONNX Runtime, and by extension `ort`, supports a wide variety of platforms. For most desktop users, pre-built binaries are available, so setting up `ort` is as simple as adding it to your `Cargo.toml`! --- -ONNX Runtime, and by extension `ort`, supports a wide variety of platforms. For some platforms, pre-built binaries are available, so setting up `ort` is as simple as adding it to your `Cargo.toml`! - -Binaries may be provided by Microsoft, pyke, or both. Binaries provided by Microsoft are downloaded directly from ONNX Runtime's CI pipeline, though they may include telemetry. Binaries provided by pyke do not include this telemetry. - Here are the supported platforms and binary availability status, as of v2.0. -* 🟢 - Supported. Binaries provided by both Microsoft and pyke. -* 🔷 - Supported. Binaries provided by pyke. -* 🟨 - Supported. No binaries available. -* ❌ - Not supported. No binaries available. - -| Platform | Dynamic | Static | -|:-------- |:------- |:------ | -| **Windows** | 🟢 x86/x64, ARM/ARM64 | 🔷 x64, ARM64 | -| **Linux** | 🟢 x86/x64, ARM/ARM64 | 🔷 x64, ARM64 | -| **macOS** | 🟢 x64, ARM64 | 🔷 x64, ARM64 | -| **WebAssembly** | ❌ | 🔷 | -| **iOS** | 🟨 | 🟨 | -| **Android** | 🟨 | 🟨 | +* 🟢 - Supported. Dynamic & static binaries provided by pyke. +* 🔷 - Supported. Static binaries provided by pyke. +* 🟨 - Supported. Dynamic binaries provided by pyke. +* ⭕ - Supported. Precompiled binaries not available. +* ❌ - Not supported. 
+ +| Platform | x86 | x86-64 | ARMv7 | ARM64 | WASM32 | +|:-------- |:------- |:------ |:------ |:------ |:------ | +| **Windows** | ⭕ | 🟢 | ⭕ | 🔷 | ❌ | +| **Linux** | ⭕ | 🟢 | ⭕ | 🔷 | ❌ | +| **macOS** | ❌ | 🔷 | ❌ | 🔷 | ❌ | +| **iOS** | ❌ | ❌ | ❌ | ⭕ | ❌ | +| **Android** | ❌ | ❌ | ⭕ | ⭕ | ❌ | +| **Web** | ❌ | ❌ | ❌ | ❌ | 🔷 | Remember that "Dynamic" linking means ONNX Runtime will come as a DLL or `.so` alongside your application. "Static" means ONNX Runtime is linked directly into your application in a single file. If available, `ort` will always prefer linking to static binaries over dynamic binaries. -Currently, the default behavior in `ort` is to only download Microsoft binaries. To enable pyke binaries, enable the `"pyke-parcel"` feature in your `Cargo.toml`. -```toml -[dependencies] -ort = { version = "2.0", features = [ "pyke-parcel" ] } -``` - -Certain execution providers may or may not have binaries available; check EP support in the [execution providers](/perf/execution-providers) documentation. +Certain execution providers may not have binaries available. Most EPs only work with dynamic linking. You can check EP binary support in the [execution providers](/perf/execution-providers) documentation. For platforms marked as ⭕, you'll need to [compile ONNX Runtime from source](https://onnxruntime.ai/docs/build/). 
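On platforms where pyke provides binaries (🟢/🔷 in the table above), no extra setup is required beyond the dependency itself. A minimal `Cargo.toml` sketch — the version shown here is illustrative (matching the v2 alpha referenced elsewhere in these docs):

```toml
[dependencies]
# On platforms marked 🟢/🔷, precompiled binaries are linked automatically;
# no extra features are needed.
ort = "2.0.0-alpha.1"
```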
diff --git a/src/lib.rs b/src/lib.rs index 97627a81..f1aaf491 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -36,13 +36,7 @@ pub use self::environment::{init, EnvironmentBuilder}; #[cfg_attr(docsrs, doc(cfg(feature = "fetch-models")))] pub use self::error::FetchModelError; pub use self::error::{Error, ErrorInternal, Result}; -pub use self::execution_providers::{ - ACLExecutionProvider, ArenaExtendStrategy, ArmNNExecutionProvider, CANNExecutionProvider, CANNExecutionProviderImplementationMode, - CANNExecutionProviderPrecisionMode, CPUExecutionProvider, CUDAExecutionProvider, CUDAExecutionProviderCuDNNConvAlgoSearch, CoreMLExecutionProvider, - DirectMLExecutionProvider, ExecutionProviderDispatch, NNAPIExecutionProvider, OneDNNExecutionProvider, OpenVINOExecutionProvider, QNNExecutionProvider, - QNNExecutionProviderPerformanceMode, ROCmExecutionProvider, TVMExecutionProvider, TVMExecutorType, TVMTuningType, TensorRTExecutionProvider, - XNNPACKExecutionProvider -}; +pub use self::execution_providers::*; pub use self::io_binding::IoBinding; pub use self::memory::{AllocationDevice, Allocator, MemoryInfo}; pub use self::metadata::ModelMetadata;