
#14080: Preprocess weights for Conv2D on Device #16750

Open: wants to merge 26 commits into main from smanoj/conv_device_weights
Conversation

Contributor

@sankarmanoj-tt sankarmanoj-tt commented Jan 15, 2025

Ticket

#14080

Problem description

Currently, weight preprocessing takes place on the host, on a single thread. This is slow, especially for large weight matrices and when Debug mode is enabled.

What's changed

The weights are now loaded to the device in the same layout as PyTorch. All remaining processing, including permutation and padding, is done on the device.
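For intuition, the transformation described above can be sketched on the host with NumPy. This is an illustrative stand-in, not the actual ttnn device kernels: the exact permutation order, the tile size of 32, and the function name are assumptions made for the example.

```python
import numpy as np

# Hypothetical host-side equivalent of the on-device preprocessing (illustrative
# only). PyTorch conv2d weights are laid out as (out_channels, in_channels, kH, kW).

def preprocess_conv2d_weights(weights: np.ndarray, tile: int = 32) -> np.ndarray:
    """Permute OIHW weights and zero-pad channel dims up to a tile multiple."""
    out_c, in_c, kh, kw = weights.shape
    # Step 1: permute so the kernel spatial dims lead (an assumed target layout).
    w = weights.transpose(2, 3, 1, 0)          # (kH, kW, in_c, out_c)
    # Step 2: pad both channel dims up to the next multiple of the tile size.
    pad_in = (-in_c) % tile
    pad_out = (-out_c) % tile
    return np.pad(w, ((0, 0), (0, 0), (0, pad_in), (0, pad_out)))

w = np.ones((10, 3, 3, 3), dtype=np.float32)   # 10 out, 3 in, 3x3 kernel
prepared = preprocess_conv2d_weights(w)
print(prepared.shape)                           # (3, 3, 32, 32)
```

Doing these steps on device amortizes the cost across many cores instead of a single host thread, which is the motivation stated above.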

Checklist

  • Post commit CI passes
  • (For models and ops writers) Full new model tests pass
  • New/Existing tests provide coverage for changes

@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/conv_device_weights branch from fe919e2 to e00b7ec Compare January 15, 2025 10:53
@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/conv_device_weights branch 2 times, most recently from 6351705 to 7662eba Compare January 29, 2025 13:34
@sankarmanoj-tt sankarmanoj-tt marked this pull request as ready for review January 30, 2025 12:05
@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/conv_device_weights branch from 7f3a9c0 to c5c4540 Compare January 31, 2025 08:38
Contributor Author

@sankarmanoj-tt TODO: Re-enable transpose cast

@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/conv_device_weights branch from c5c4540 to 6c6da4d Compare February 3, 2025 10:12
@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/conv_device_weights branch 2 times, most recently from a967aee to 644f04b Compare February 5, 2025 17:08
@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/conv_device_weights branch from 667a8a4 to 687fc72 Compare February 13, 2025 04:40
Comment on lines 132 to 134
HEIGHT_SHARDED_LAYOUT,
BLOCK_SHARDED_LAYOUT,
WIDTH_SHARDED_LAYOUT,
Collaborator

Please revert. You're not using this, and I'd prefer users access the enum via TensorMemoryLayout.HEIGHT_SHARDED_LAYOUT.
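The reviewer's preference can be illustrated with a plain stdlib Enum; the class and member names below mirror the discussion but are a hypothetical sketch, not the real ttnn bindings.

```python
from enum import Enum

# Hypothetical sketch of the design point: keep enum members namespaced under
# their class instead of re-exporting them at module top level.
class TensorMemoryLayout(Enum):
    INTERLEAVED = 0
    HEIGHT_SHARDED = 1
    WIDTH_SHARDED = 2
    BLOCK_SHARDED = 3

# Preferred: access via the enum class, so the constant's origin is explicit.
layout = TensorMemoryLayout.HEIGHT_SHARDED

# Discouraged: flattening members into module scope (HEIGHT_SHARDED_LAYOUT, ...)
# duplicates names and loses the namespace.
```

Namespaced access also avoids collisions when several enums define similarly named members.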

Contributor Author

Reverted.

@@ -58,6 +58,7 @@ def run_conv(
config_override,
dilation=1,
use_shallow_conv_variant=False,
transpose_shards=True, # TODO: Fails when set to False
Collaborator

Why? Should this be tracked in a GitHub issue?

Contributor Author

It's a bug that needs to be fixed. I've created an issue here to track it.

template <typename T>
std::pair<ttnn::Tensor, std::optional<ttnn::Tensor>> prepare_conv_weights_biases_on_device(
const ttnn::Tensor& weight_tensor,
Collaborator

There's a lot of conv code that is templated on the device type but shouldn't need to be. Are there plans to clean it up?

Contributor Author

What is the correct way to support MeshDevice and Device?

@@ -1212,7 +1220,7 @@ def test_resnet50_conv_wh_fp32(
)
@pytest.mark.parametrize(
"weights_dtype",
[ttnn.bfloat8_b],
Contributor

revert or intentional change?

@@ -1354,7 +1362,7 @@ def test_sd_conv(
)
@pytest.mark.parametrize(
"activations_dtype",
[ttnn.bfloat16, ttnn.bfloat8_b],
Contributor

revert or intentional change?

@@ -1495,7 +1503,7 @@ def test_sd_conv_wh(
)
@pytest.mark.parametrize(
"weights_dtype",
[ttnn.bfloat8_b],
Contributor

revert or intentional change?

@@ -2007,6 +2020,7 @@ def test_halo_reshard_conv(
)


@skip_for_grayskull()
Contributor

Why is this now skipped on GS?

Contributor Author

This fails because the device ops used to prepare the weights don't support FP32 on Grayskull.

@@ -2618,7 +2633,7 @@ def test_conv_for_vanilla_unet(
)
@pytest.mark.parametrize(
"weights_dtype",
[ttnn.bfloat8_b, ttnn.bfloat16],
Contributor

revert? or is this an intentional change?

@@ -2855,6 +2873,7 @@ def test_shallow_conv_with_tiled_input(device):

# Tests running conv2d which maps to matmul w/o sharding the input tensor.
# Output tensor is in DRAM.
@skip_for_grayskull()
Contributor

Why is this now skipped on GS?

@pavlejosipovic
Contributor

On a high level:

  1. This change doesn't allow the user to pass in a tensor that is already on device and have it preprocessed; instead we have two paths, both of which require the tensor to be on host.
  2. The new codepath, which performs more (but not all) preprocessing on device, is not the default, yet all our tests use only it. So the default codepath is not tested, and we have two codepaths for the same thing.
  3. Does this change extend the runtime of our tests on post-commit?

@mywoodstock any thoughts on the above.

@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/conv_device_weights branch from 9f806ae to 0435cce Compare February 18, 2025 23:06