Distributed Arithmetic strategy for Dense, Conv1/2D, and EinsumDense #1191
Open: calad0i wants to merge 79 commits into fastmachinelearning:main from calad0i:da4ml-v2
fix syntax err in fused fixed_point_quantizer
Description
This PR introduces a new strategy, `distributed_arithmetic`, for:

- `Dense` (io parallel / stream)
- `Conv1/2D` (io parallel / stream)
- `EinsumDense` (io parallel)

With this strategy, all `matmul`-like operations in these layers are decomposed into optimized adder trees. The heavy lifting is offloaded to `da4ml`, where everything is jitted with `numba`. There, the `CMVM` problem is optimized with greedy common subexpression elimination. `LUT` consumption is frequently reduced by over `30%` when `WRAP` is used as the overflow mode, along with improved latency. `DSP` consumption will almost always be 0 with this strategy.

This PR depends on the s-quark-pr and includes all changes made there (`QEinsumDense` is not available otherwise).

Type of change
Tests
Tests added to `test_hgq_layers.py` and `test_einsum_dense.py`. The `EinsumDense` test will NOT be triggered in the current configuration due to its `keras v3` dependency.

Checklist
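
As a rough illustration of the idea in the Description (a constant matrix-vector product rewritten as shifts and adds, then shrunk by greedy common subexpression elimination), here is a minimal Python sketch. It is not the `da4ml` implementation: the helper names `to_terms`, `greedy_cse`, `evaluate`, and `adder_count` are hypothetical, the weights are assumed to be integers, and the elimination only merges pairs of terms that are exactly identical across rows, which is a simplification.

```python
from collections import Counter
from itertools import combinations

def to_terms(row):
    """Expand one row of integer weights into terms (var, shift, sign),
    each meaning sign * (x[var] << shift). No multiplier is needed."""
    terms = set()
    for j, w in enumerate(row):
        sign, mag, shift = (1 if w >= 0 else -1), abs(w), 0
        while mag:
            if mag & 1:
                terms.add((j, shift, sign))
            mag >>= 1
            shift += 1
    return terms

def greedy_cse(rows, n_inputs):
    """Repeatedly factor out the pair of terms shared by the most rows.
    Simplified assumption: only exactly identical pairs are merged."""
    rows = [set(r) for r in rows]  # work on copies
    defs = {}                      # new variable index -> pair of terms it sums
    next_var = n_inputs
    while True:
        counts = Counter(p for r in rows for p in combinations(sorted(r), 2))
        if not counts:
            break
        pair, n = counts.most_common(1)[0]
        if n < 2:                  # nothing is shared any more
            break
        defs[next_var] = pair
        for r in rows:
            if pair[0] in r and pair[1] in r:
                r.difference_update(pair)
                r.add((next_var, 0, 1))
        next_var += 1
    return rows, defs

def evaluate(rows, defs, x):
    """Evaluate the adder graph; intermediates only reference earlier variables."""
    vals = list(x)
    for v in sorted(defs):
        vals.append(sum(s * (vals[i] << sh) for i, sh, s in defs[v]))
    return [sum(s * (vals[i] << sh) for i, sh, s in r) for r in rows]

def adder_count(rows, defs):
    """One adder per intermediate, plus (terms - 1) adders per output row."""
    return len(defs) + sum(max(len(r) - 1, 0) for r in rows)

W = [[5, 3], [5, 7], [5, -3]]      # example constant weight matrix (made up)
x = [3, -2]
naive = [to_terms(r) for r in W]
optimized, defs = greedy_cse(naive, n_inputs=len(x))

direct = [sum(w * xi for w, xi in zip(r, x)) for r in W]
assert evaluate(optimized, defs, x) == direct
print("adders before CSE:", adder_count(naive, {}))
print("adders after CSE: ", adder_count(optimized, defs))
```

Each merge replaces a pair shared by at least two rows with a single intermediate, so the adder count drops by at least one per step in this toy model. Real implementations would also consider shifted or negated matches and the resulting tree depth, which this sketch ignores.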