
Embedding grouper distinguish between prefetched vs non-prefetched table #1859

Closed

Conversation


@levythu levythu commented Apr 9, 2024

Summary:
If `prefetch_pipeline` is set as a fused parameter, the training pipeline will try to call `prefetch()` in a separate stream one batch ahead of time. Unfortunately, this process consumes a large amount of extra memory: at peak, roughly 8~9x the size of the input tensor.

Therefore, we want to make the input tensor to the `prefetch()` call as small as possible. To achieve that, we avoid grouping tables that require prefetch into the same TBE as tables that do not.
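The idea can be sketched with a minimal, hypothetical grouping function (the names and key structure below are illustrative, not TorchRec's actual grouping API): adding a "needs prefetch" flag to the grouping key guarantees that cache-offloaded tables never share a TBE with uncached ones, so the `prefetch()` input covers only the offloaded tables' batch.

```python
from collections import defaultdict

def group_tables(tables):
    """Group embedding tables into TBE groups.

    `tables` is a list of dicts such as
      {"name": "t0", "dtype": "fp16", "pooling": "sum", "needs_prefetch": True}
    The grouping key includes `needs_prefetch`, so tables that require
    prefetch land in separate groups from tables that do not, even when
    every other grouping attribute matches.
    Returns a dict mapping the grouping key to the table names in that group.
    """
    groups = defaultdict(list)
    for t in tables:
        key = (t["dtype"], t["pooling"], t["needs_prefetch"])
        groups[key].append(t["name"])
    return dict(groups)

tables = [
    {"name": "t0", "dtype": "fp16", "pooling": "sum", "needs_prefetch": True},
    {"name": "t1", "dtype": "fp16", "pooling": "sum", "needs_prefetch": False},
    {"name": "t2", "dtype": "fp16", "pooling": "sum", "needs_prefetch": True},
]
# t0 and t2 form one TBE group, t1 its own, despite identical dtype/pooling.
grouped = group_tables(tables)
```

Without the `needs_prefetch` component in the key, all three tables would be fused into a single TBE, and `prefetch()` would have to be fed the whole input batch rather than just the offloaded tables' slice.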

This diff does not change behavior for jobs without cached embedding offloading.

For embedding-offloaded jobs, this diff slightly decreases the performance of the TBE lookup, since it results in more TBEs (and subsequently more kernels in the forward and backward passes), but it greatly increases memory efficiency.

Differential Revision: D55901328

@facebook-github-bot added the CLA Signed label on Apr 9, 2024. (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.)
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D55901328

