
add mean pooling divisor to cuda stream #1863

Closed

zainhuda wants to merge 1 commit into pytorch:main from zainhuda:export-D55945969

zainhuda commented Apr 10, 2024

Summary:
The initial mean pooling implementation did not handle CUDA streams correctly when used with the train pipeline. We now record the divisor tensor on the CUDA stream through the context.

The key insight:
When a tensor is used on a different stream than the one it was created on, the CUDA caching allocator may reuse its memory unexpectedly unless the tensor is recorded on that stream.
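A minimal sketch of that pattern, assuming mean pooling divides each bag's summed embeddings by its length, with empty bags clamped to a divisor of 1. The helper names `compute_divisor` and `pooled_mean` and the locally created side stream are hypothetical stand-ins for the train pipeline's streams, not torchrec code; `Stream.wait_stream` and `Tensor.record_stream` are standard PyTorch APIs. On CPU the stream handling is skipped.

```python
import torch


def compute_divisor(lengths: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # Hypothetical helper: one divisor per bag, shaped [B, 1] so it broadcasts
    # over the embedding dimension. Empty bags are clamped to 1 (an assumption)
    # so their all-zero sums stay zero instead of becoming NaN.
    return lengths.clamp(min=1).unsqueeze(-1).to(dtype)


def pooled_mean(pooled_sums: torch.Tensor, lengths_cpu: torch.Tensor) -> torch.Tensor:
    if not pooled_sums.is_cuda:
        # CPU: there are no CUDA streams, so nothing needs to be recorded.
        return pooled_sums / compute_divisor(lengths_cpu, pooled_sums.dtype)

    main_stream = torch.cuda.current_stream()
    # Stand-in for a pipeline side stream: the divisor is allocated here.
    side_stream = torch.cuda.Stream()
    with torch.cuda.stream(side_stream):
        lengths = lengths_cpu.to(pooled_sums.device, non_blocking=True)
        divisor = compute_divisor(lengths, pooled_sums.dtype)

    # Order the main stream after the work queued on the side stream ...
    main_stream.wait_stream(side_stream)
    # ... and tell the caching allocator that `divisor` is also in use on
    # main_stream, so its block is not handed out again while the division
    # below is still queued there.
    divisor.record_stream(main_stream)
    return pooled_sums / divisor
```

Without the `record_stream` call, the allocator only knows about the side stream, so once `divisor` is freed its memory could be reused before the main stream has finished reading it.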

We also split the callback function into two: one that creates the divisor and one that applies mean pooling. The context now holds the divisor tensor instead of a callable, because recording a non-tensor object on a CUDA stream is non-trivial, whereas recording a tensor on a CUDA stream is directly supported. This introduces no performance regression relative to the original implementation and does not make the code less clear.
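The split described above can be sketched as follows. The class `MeanPoolingContext` and the functions `create_mean_pooling_divisor` and `apply_mean_pooling` are hypothetical names chosen for illustration, not torchrec's actual classes; the sketch only shows why storing a tensor in the context, rather than a callable, makes stream recording a one-line `Tensor.record_stream` call.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class MeanPoolingContext:
    # Hypothetical context object: it stores the divisor tensor itself rather
    # than a callable that would compute it later. record_stream is a Tensor
    # method; there is no equivalent for an arbitrary callable, since recording
    # it would mean finding every tensor it captures.
    divisor: Optional[torch.Tensor] = None

    def record_stream(self, stream: "torch.cuda.Stream") -> None:
        # Called when the context is consumed on another stream.
        if self.divisor is not None:
            self.divisor.record_stream(stream)


def create_mean_pooling_divisor(lengths: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # Part 1 of the split callback: one divisor per bag, shape [B, 1].
    # Empty bags are clamped to 1 (an assumption) so their zero sums stay zero.
    return lengths.clamp(min=1).unsqueeze(-1).to(dtype)


def apply_mean_pooling(pooled_sums: torch.Tensor, divisor: torch.Tensor) -> torch.Tensor:
    # Part 2 of the split callback: turn per-bag sums into per-bag means.
    return pooled_sums / divisor
```

For example, `ctx = MeanPoolingContext(divisor=create_mean_pooling_divisor(torch.tensor([2, 4]), torch.float32))` followed by `apply_mean_pooling(torch.tensor([[4.0], [8.0]]), ctx.divisor)` yields `[[2.0], [2.0]]`. In a pipeline, `ctx.record_stream(...)` would run between the two steps whenever the consuming stream differs from the one that created the divisor.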

Differential Revision: D55945969

facebook-github-bot added the CLA Signed label

Contributor

facebook-github-bot commented Apr 10, 2024

This pull request was exported from Phabricator. Differential Revision: D55945969

facebook-github-bot added the fb-exported label


zainhuda pushed a commit to zainhuda/torchrec that referenced this pull request


add mean pooling divisor to cuda stream (pytorch#1863)

d091d11


zainhuda force-pushed the export-D55945969 branch from c46a991 to d091d11 on April 10, 2024 01:08


zainhuda pushed a commit to zainhuda/torchrec that referenced this pull request


add mean pooling divisor to cuda stream (pytorch#1863)

0c19087


zainhuda force-pushed the export-D55945969 branch from d091d11 to 0c19087 on April 10, 2024 01:16

zainhuda pushed a commit to zainhuda/torchrec that referenced this pull request


add mean pooling divisor to cuda stream (pytorch#1863)

2ca3a1b

Reviewed By: joshuadeng


zainhuda force-pushed the export-D55945969 branch from 0c19087 to 2ca3a1b on April 10, 2024 05:33



add mean pooling divisor to cuda stream (pytorch#1863)

ba83f32


zainhuda force-pushed the export-D55945969 branch from 2ca3a1b to ba83f32 on April 10, 2024 05:34


facebook-github-bot closed this in e38728c


Labels

CLA Signed fb-exported