Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model Builder API #23223

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Model Builder API #23223

wants to merge 2 commits into from

Conversation

skottmckay
Copy link
Contributor

Description

Supports creating a model programmatically using the ORT C or C++ API.
Supports augmenting an existing model to add nodes.

TODO: Validation API is feature complete and additional tests.

Motivation and Context

Supports creating a model programmatically using the ORT C or C++ API.
Supports augmenting an existing model to add nodes.
// FUTURE: This will also allow CopyTensors to utilize the IDataTransfer objects
// "0": Disabled. [DEFAULT]
// "1": Enable Model Builder Session
static const char* const kOrtSessionOptionsEnableModelBuilder = "session.model_builder_session";

Check notice

Code scanning / CodeQL

Unused static variable Note

Static variable kOrtSessionOptionsEnableModelBuilder is never read.
#include "core/framework/error_code_helper.h"
#include "core/framework/execution_provider.h"
#include "core/session/abi_session_options_impl.h"
// #include "core/session/environment.h"

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
// FUTURE: This will also allow CopyTensors to utilize the IDataTransfer objects
// "0": Disabled. [DEFAULT]
// "1": Enable Model Builder Session
static const char* const kOrtSessionOptionsEnableModelBuilder = "session.model_builder_session";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't currently used. Initially I was thinking it would enable copying inputs/initializers to the correct device AOT, but...

a) that requires knowing where the value would be used, which is dependent on partitioning that happens later and depends on which EPs are enabled so easy to get wrong;
and
b) would be counter-productive if an optimizer wanted to update the initializer as we'd have to copy it back to CPU to do that.

TBD if needed.

@Craigacp
Copy link
Contributor

Is it possible to save out the model from the builder via the C API? It'll be a nice alternative to building ONNX models with raw protobufs in languages which don't have a native ONNX library.

@skottmckay
Copy link
Contributor Author

Is it possible to save out the model from the builder via the C API? It'll be a nice alternative to building ONNX models with raw protobufs in languages which don't have a native ONNX library.

You can use the SessionOption that's typically used to save the optimized ONNX model.

ORT_API2_STATUS(SetOptimizedModelFilePath, _Inout_ OrtSessionOptions* options,
_In_ const ORTCHAR_T* optimized_model_filepath);

Caveat is that does not currently support saving tensors created with CreateTensorWithDataAsOrtValue or CreateTensorWithDataAndDeleterAsOrtValue, but could be updated to do so if required.

api.ReleaseTensorTypeAndShapeInfo(tensor_type_info); // input_type_info took a copy

// create ValueInfo and release the type info as CreateValueInfo takes a copy.
OrtValueInfo* input_value_info = nullptr;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where to release OrtValueInfo?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems SetGraphInputs() (and SetGraphOutputs()) would take ownership of OrtValueInfos. Resolving.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Craigacp
Copy link
Contributor

Craigacp commented Jan 1, 2025

Is it possible to save out the model from the builder via the C API? It'll be a nice alternative to building ONNX models with raw protobufs in languages which don't have a native ONNX library.

You can use the SessionOption that's typically used to save the optimized ONNX model.

ORT_API2_STATUS(SetOptimizedModelFilePath, _Inout_ OrtSessionOptions* options,
_In_ const ORTCHAR_T* optimized_model_filepath);

Caveat is that does not currently support saving tensors created with CreateTensorWithDataAsOrtValue or CreateTensorWithDataAndDeleterAsOrtValue, but could be updated to do so if required.

Is the optimized model one which has had op fusion and other passes done so it's no longer using ONNX standard ops everywhere, or is that a different process?

@skottmckay
Copy link
Contributor Author

Is the optimized model one which has had op fusion and other passes done so it's no longer using ONNX standard ops everywhere, or is that a different process?

You can specify the optimization level. If you keep it to level 1 (GraphOptimizationLevel.ORT_ENABLE_BASIC) it will only use standard ONNX ops.

typedef enum GraphOptimizationLevel {
ORT_DISABLE_ALL = 0,
ORT_ENABLE_BASIC = 1,

/** \brief Set the optimization level to apply when loading a graph
*
* Please see https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html for an in-depth explanation
* \param[in,out] options The session options object
* \param[in] graph_optimization_level The optimization level
*
* \snippet{doc} snippets.dox OrtStatus Return Value
*/
ORT_API2_STATUS(SetSessionGraphOptimizationLevel, _Inout_ OrtSessionOptions* options,
GraphOptimizationLevel graph_optimization_level);

https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html

huningxin added a commit to huningxin/chromium-src that referenced this pull request Jan 2, 2025
huningxin added a commit to huningxin/chromium-src that referenced this pull request Jan 2, 2025
mingmingtasd pushed a commit to shiyi9801/chromium that referenced this pull request Jan 2, 2025
* Pass ORT_API_VERSION to `OrtApiBase::GetApi()`

Also removes the inclusion of onnx.pb.h header.

* Add third_party/onnxruntime_headers

Import https://github.com/microsoft/onnxruntime/tree/main/include

Commit is based on microsoft/onnxruntime#23223

* Use ORT Model Builder API

* Refactor scoped ORT type ptr

1. Rename to ScopedOrtTypePtr
2. Use macros
3. Introduce `operator T*()`
4. Introduce `Release()` method
5. Rename `get_ptr()` to `Get()`
6. Rename `get_pptr()` to `GetAddressOf()`

* Remove ONNX Runtime headers from third_party/microsoft_dxheaders
if (attributes != nullptr) {
n->attributes.reserve(attribs_len);
for (size_t i = 0; i < attribs_len; ++i) {
n->attributes.push_back(*reinterpret_cast<const ONNX_NAMESPACE::AttributeProto*>(attributes[i]));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call ReleaseOpAttr after it's copied into the node so the user doesn't have to? Would be more consistent with the rest of the API to 'take ownership' of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same question applies to the CreateXXXTypeInfo/CreateValueInfo calls.

Those felt a little more re-usable (e.g. if you were constructing a model with KV cache you're going to be using the same TypeInfo for multiple inputs/outputs.) but maybe it's better overall to have a consistent pattern of ownership transferring when you add to a containing class instead of taking a copy in some places.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those felt a little more re-usable (e.g. if you were constructing a model with KV cache you're going to be using the same TypeInfo for multiple inputs/outputs.)

OpAttr might also be re-used?

but maybe it's better overall to have a consistent pattern of ownership transferring when you add to a containing class instead of taking a copy in some places.

I feel the consistency is for those AddXxxToXxx() methods who do ownership transferring? SetGraphInputs()/SetGrapOutputs() also take ownership (I was not aware of it when I initially used the API), should they be renamed to AddInputsToGraph()/AddOutputsToGraph()?

* Pre-existing memory:
* Use CreateTensorWithDataAsOrtValue or CreateTensorWithDataAndDeleterAsOrtValue to create an OrtValue
* with a tensor that contains a pointer to the existing data.
* User must keep pointer valid for lifetime of the inference session.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true if using CreateTensorWithDataAndDeleterAsOrtValue()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ownership transfers to ORT so the pointer must remain valid, but in that case the user shouldn't be freeing the memory at any point. Will update the comment to clarify.

ORT_API(const OrtModelBuilderApi*, GetModelBuilderApi);

ORT_API_STATUS_IMPL(CreateTensorWithDataAndDeleterAsOrtValue, _In_ OrtAllocator* deleter,
_In_ void* p_data, size_t p_data_len,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would p_data be written by ORT? Or should it take const void* if it is only read by ORT? A similar question is for CreateTensorWithDataAsOrtValue() which takes _Inout_ void* p_data.

https://github.com/microsoft/onnxruntime/blob/skottmckay/ModelBuilderAPI_PR/include/onnxruntime/core/session/onnxruntime_c_api.h#L1359C83-L1359C103

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ORT wouldn't write to p_data itself, but a user could use update the data in the OrtValue returned using the ORT API (e.g. call GetTensorMutableData and make changes).

@@ -27,6 +27,7 @@
#include "core/common/span_utils.h"
#include "core/common/status.h"
#include "core/common/logging/logging.h"
#include "core/framework/ort_value.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It introduces a circular dependency between onnxruntime_graph and onnxruntime_framework. ort_value is a core concept in onnxruntime_framework, which also depends on MemoryInfo, Allocators, etc. It means that the lifetime of a graph will be bound to an allocator. Furthermore, people may ask if the OrtValue can be put on GPU devices, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That dependency already existed with InjectExternalInitializedTensors in this file. And there are lots of places in the graph code where we use types from the framework library. If we want to fix that we might need to limit the graph library to fairly pure ONNX related types, and have ORT things built on top of those in the framework library. e.g. you'd have an ONNX Graph class as well as an onnxruntime Graph class, and things like OrtValue usage would be in the latter.

Long term I think it would be better to convert initializers to OrtValue when loading from the ONNX model so we detach from the protobuf types asap. There are many reasons for doing so. Having to add things like InjectExternalInitializedTensors to efficiently manage memory is a good sign the current setup isn't working well.

Can you elaborate on how the lifetime of the Graph is bound to an allocator? The OrtValue instances internally have a Tensor instance where the deleter is in a shared_ptr, so I would have thought the Graph instance can go away at any time, and the shared_ptr for the allocator in the Tensor deleter would also keep the allocator alive for as long as needed.

The problem I'm trying to address is that there's pre-existing memory where we want to transfer ownership to ORT. e.g. to free CPU based memory if we copy it to GPU. Because we have protobuf based initializers there's no way to attach the deleter to them, and the ORT API deals in OrtValue. So this was the best option I could find to essentially pass through that OrtValue to session state finalization.

The OrtValue could theoretically be on GPU. If you did that you could avoid a copy (if you knew for sure the value would be used on GPU) but you'd break the current setup with optimizers as they expect initializer data to be on CPU. Not clear we want to allow that.

Copy link
Member

@snnn snnn Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A tensor cannot live longer than the allocator that allocated the buffer.
An allocator cannot live longer than the corresponding EP(e.g. CUDA EP). Because the EP needs to manage a lot of handlers, and the allocator may need to use a device handler to do malloc/free. All such handlers get destroyed when the EP is destroyed.
That could make things complicated. For example, in InferenceSession class, we have:

  std::shared_ptr<onnxruntime::Model> model_;

  // The file path of where the model was loaded. e.g. /tmp/test_squeezenet/model.onnx
  PathString model_location_;

  // The list of execution providers.
  ExecutionProviders execution_providers_;

  //...
  std::unique_ptr<SessionState> session_state_;

The model_ variable contains a graph, which contains OrtValues, which should be deleted before the execution_providers_ . But they are not ordered in that way. We had similar issues with "execution_providers_" and "session_state_". So, this is very subtle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. So whilst the Tensor has a shared_ptr for the allocator, if the allocator depends on internals of the EP, and the EP goes away, it breaks due to that?

And if we add OrtValue to Graph, which is in InferenceSession::model_, which will be released after execution_providers_ it may break?

Should execution_providers_ therefore be declared prior to model_ in InferenceSession?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so.
However, the code is ok as for now if the graph's OrtValues only use CPU allocators which are relatively simple.

* If using CreateTensorWithDataAsOrtValue you must keep the pointer valid for lifetime of the inference session.
* Set `data_is_external` to true.
*
* Allocated memory:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to our testing, some ops seem to require shape-inference-related initializers being in allocated memory, including:

  • Reshape's shape
  • Reduce's axes
  • Expand's shape
  • Slice's starts, ends and steps

If using pre-existing memory, there will be shape inference error, e.g.

[ShapeInferenceError] Cannot parse data from external tensors. Please load external data into raw data for tensor: x

If that is the case, it would be helpful to be documented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general approach, there's about 60 bytes or so of overhead to use the external memory structure for pre-allocated memory, so if the value is less than say 128 bytes you're probably better off using allocated memory.

I think doing so could almost guaranteed that shape inferencing isn't going to break as I can't think of an input that shape inferencing would read that would have 128 bytes of data (16x int64_t dimension or indices values).

If it seems reasonable to do, we could enforce that pre-allocated data is a minimum size of 128 to reduce the chance of a user hitting a shape inferencing error, and document any edge cases we find in the ONNX ops where shape inferencing fails as that would be a much smaller set of operators if any.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it seems reasonable to do, we could enforce that pre-allocated data is a minimum size of 128 to reduce the chance of a user hitting a shape inferencing error, and document any edge cases we find in the ONNX ops where shape inferencing fails as that would be a much smaller set of operators if any.

SGTM, that would be very helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants