[DML EP] Add BFC allocator #16634

Closed
wants to merge 90 commits into from

PatriceVignola
Contributor

@PatriceVignola PatriceVignola commented Jul 8, 2023

This change adds the BFC allocator to the DML EP in order to reduce peak memory usage and allow bigger models (e.g. LLMs) to be loaded in memory. Note that we still need to keep the bucketized allocator, since the WinML API allows the caller to query D3D12 resources directly, which is incompatible with the way BFC allocations work.

The reserved resource logic is similar to what we had in TF. We leverage the existing ORT allocator by creating a tagged pointer that ORT can increment arithmetically, as if it were sequentially allocated memory. When we need to access the memory, we decode the tagged pointer to recover its allocation ID and offset. This lets us retrieve the appropriate resource and access it at the right offset.
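To make the tagged-pointer idea concrete, here is a minimal sketch. It is not the code from this PR: the 24/40 bit split, the 1-based allocation IDs, and the names `PackTaggedPointer`, `UnpackTaggedPointer`, `AdvanceBytes`, and `DecodedPointer` are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(void*) == 8, "this sketch assumes 64-bit pointers");

// Hypothetical layout (not taken from the PR): the upper 24 bits hold the
// allocation ID and the lower 40 bits hold the byte offset into that allocation.
constexpr uint64_t kOffsetBits = 40;
constexpr uint64_t kAllocationIdBits = 24;
constexpr uint64_t kOffsetMask = (uint64_t{1} << kOffsetBits) - 1;

struct DecodedPointer {
    uint32_t allocation_id;
    uint64_t offset;
};

// Encode (allocation ID, offset) into an opaque pointer-sized value.
// IDs are assumed to start at 1 so that a valid tagged pointer is never nullptr.
inline void* PackTaggedPointer(uint32_t allocation_id, uint64_t offset) {
    assert(allocation_id != 0);
    assert(allocation_id < (uint64_t{1} << kAllocationIdBits));
    assert(offset <= kOffsetMask);
    const uint64_t bits = (uint64_t{allocation_id} << kOffsetBits) | offset;
    return reinterpret_cast<void*>(static_cast<uintptr_t>(bits));
}

// Recover the allocation ID and offset from a tagged pointer.
inline DecodedPointer UnpackTaggedPointer(const void* ptr) {
    const uint64_t bits = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(ptr));
    return DecodedPointer{static_cast<uint32_t>(bits >> kOffsetBits), bits & kOffsetMask};
}

// Models the byte-wise arithmetic a generic allocator performs on the pointer.
// Done on the integer value, because the tagged pointer never refers to real memory.
inline void* AdvanceBytes(void* ptr, uint64_t byte_count) {
    return reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(ptr) +
                                   static_cast<uintptr_t>(byte_count));
}
```

With this layout, packing allocation 7 at offset 0 and then advancing the result by 4096 bytes decodes back to allocation 7 at offset 4096, which is why the generic allocator can treat the value as ordinary address space. The caveat is that an offset growing past 40 bits would carry into the ID bits, so each allocation has to stay below that size.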

To keep the legacy WinML APIs working, I added a way to detect whether custom ops have been registered at session creation time. I then added an ORT API that lets us disable the BFC allocator when creating the execution provider, so that the bucketized allocator is used instead. This doesn't cause any regressions, since custom op users were already using the bucketized buffer allocator, and it's a niche scenario anyway.

…origin/user/pavignol/add-bfc-allocator-4

namespace Dml
{
class DmlBfcAllocator : public onnxruntime::IAllocator
Member

DmlBfcAllocator

nit: maybe name DmlResourceAllocator

m_dataInterface = static_cast<IUnknown*>(m_impl->MutableDataRaw());
m_tensorData = m_impl->MutableDataRaw();
}
}
Member

What's going on here??

What happened to my Shadow Copy!??

Contributor Author

I went through this whole logic, and this is super outdated code. What this code is trying to do is decode what the actual pointer is (something that inherits IUnknown or a more specific ID3D12Resource), but we ended up always returning ID3D12Resource objects, even for the external operators. Also, this notion of "layout" doesn't seem to make any sense here, since all GetShadowCopyIfRequired does is increase the ref count, and that isn't needed because no layout conversion is done anywhere.

Maybe @jeffbloo can shed some light here, but from my analysis this all seems to be code that isn't needed anymore.

@PatriceVignola PatriceVignola requested a review from smk2007 August 16, 2023 05:43
Patrice Vignola and others added 2 commits August 16, 2023 06:56
smk2007 previously approved these changes Aug 16, 2023
Member

@smk2007 smk2007 left a comment

:shipit:

#ifdef USE_DML
const bool bothValuesOnGPU = copy_info.source_device.Type() == OrtDevice::GPU && copy_info.target_device.Type() == OrtDevice::GPU;
const bool sourceIsDmlAlloc = copy_info.source_device.MemType() == OrtDevice::MemType::DEFAULT || copy_info.source_device.MemType() == OrtDevice::MemType::DML_EXTERNAL;
const bool targetIsInternalAlloc = copy_info.target_device.MemType() == OrtDevice::MemType::DEFAULT;
Contributor

Use target_is_internal_alloc, since this is ORT code 🐫🐍 rather than the DML EP 🐫🐪.


namespace Dml
{

Contributor

[nit] extra blank line

@fdwr
Contributor

fdwr commented Jul 10, 2024

@PatriceVignola Happy 1 year anniversary on this open CR. Still relevant? 🤔

…origin/user/pavignol/add-bfc-allocator-4
ort_api
);
// Ensure resource is the same
WINML_EXPECT_EQUAL(d3d12_resource, d3d12_resource_from_allocation);

Check warning (Code scanning / PREfast): Potential comparison of a constant with another constant.
auto unique_cpu_memory_info = UniqueOrtMemoryInfo(cpu_memory_info, ort_api->ReleaseMemoryInfo);
auto cpu_tensor = CreateTensorFromMemoryInfo(unique_cpu_memory_info.get());
THROW_IF_NOT_OK_MSG(winml_adapter_api->ValueGetDeviceId(cpu_tensor.get(), &device_id), ort_api);
WINML_EXPECT_EQUAL(0, device_id);

Check warning (Code scanning / PREfast): Potential comparison of a constant with another constant.
THROW_IF_NOT_OK_MSG(
winml_adapter_api->SessionGetInputRequiredDeviceId(cpu_session.get(), "inputImage", &device_id), ort_api
);
WINML_EXPECT_EQUAL(0, device_id);

Check warning (Code scanning / PREfast): Potential comparison of a constant with another constant.
4 participants