Distributed Arithmetic strategy for Dense, Conv1/2D, and EinsumDense #1191

Open · wants to merge 79 commits into main
Conversation

@calad0i (Contributor) commented Feb 11, 2025

Description

This PR introduces a new strategy, distributed_arithmetic, for the following layers (a configuration sketch follows the list):

  • Dense (io parallel / stream)
  • Conv1/2D (io parallel / stream)
  • EinsumDense (io parallel)
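
As a sketch of how one might enable it: the strategy string comes from this PR, but the config surface below is an assumption, namely that it plugs into the same per-model/per-layer `Strategy` knob used by the existing `Latency`/`Resource` strategies; the model and layer names are placeholders.

```python
import hls4ml
from tensorflow import keras

# Any model with Dense/Conv1D/Conv2D layers would do; this one is a placeholder.
model = keras.Sequential([keras.layers.Dense(8, input_shape=(16,))])

# granularity='name' exposes per-layer settings in the config dict.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Assumed usage: select the strategy globally, like 'Latency'/'Resource' ...
config['Model']['Strategy'] = 'distributed_arithmetic'
# ... or per layer, keyed by the layer's name in the model:
config['LayerName']['dense']['Strategy'] = 'distributed_arithmetic'

hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, backend='Vitis', output_dir='da_prj'
)
```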

With this strategy, all matmul-like operations in these layers are decomposed into optimized adder trees. The heavy lifting is offloaded to da4ml, where everything is JIT-compiled with numba; there, the constant-matrix-vector multiplication (CMVM) problem is optimized with greedy common subexpression elimination. When WRAP is used as the overflow mode, a LUT reduction of over 30% is frequently observed, together with improved latency. DSP consumption will almost always be zero with this strategy.
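
For intuition, here is a toy illustration (not da4ml's actual algorithm) of why common subexpression elimination shrinks a constant-matrix-vector multiply:

```python
# Two outputs of a constant (here all-ones) matrix-vector product:
#
#   y0 = x0 + x1 + x2    naive cost: 2 adders
#   y1 = x0 + x1 + x3    naive cost: 2 adders  -> 4 adders total
#
# Greedy CSE extracts the shared sub-sum (x0 + x1) once:
def cmvm_naive(x):
    y0 = x[0] + x[1] + x[2]
    y1 = x[0] + x[1] + x[3]
    return y0, y1              # 4 additions

def cmvm_cse(x):
    t = x[0] + x[1]            # shared subexpression, computed once
    return t + x[2], t + x[3]  # 3 additions total
```

Applied across all rows of a real weight matrix (with the constant weights further decomposed into shifts and adds), this kind of sharing is what drives the LUT savings reported above.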

This PR depends on s-quark-pr and includes all changes made there (QEinsumDense is not available otherwise).

Type of change

  • New feature (non-breaking change which adds functionality)

Tests

Tests have been added to test_hgq_layers.py and test_einsum_dense.py. The EinsumDense test will NOT be triggered in the current CI configuration due to the Keras v3 dependency.

Checklist

  • No docs for now.
