-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add procs for interfacing with AXI peripherals in ZSTD Decoder #1613
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
4ca2f68
to
013deef
Compare
Moved the cocotb tests to a separate PR: #1616 |
Added performance improvements for the The PR is ready for review, @proppy please take a look. |
xls/modules/zstd/memory/BUILD
Outdated
dslx_top = "AxiReaderInst", | ||
library = ":axi_reader_dslx", | ||
opt_ir_args = { | ||
"top": "__axi_reader__AxiReaderInst__AxiReader_0__16_32_4_4_4_3_2_14_next", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that still required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is required for IR benchmarks because XLS does not allow optimizing IRs of procs that have empty next()
.
This is very often the case for parameterized procs.
We must specify additional Instance
proc that spawns the proc in question with specified parameters.
In such case the proc marked as default top
in the IR will be the Instance
proc which has empty next()
. If we want to benchmark the IR, it is required to point to the internal proc as the new top because this proc contains all of the logic.
xls/modules/zstd/memory/BUILD
Outdated
"streaming_channel_data_suffix": "_data", | ||
"flop_inputs_kind": "skid", | ||
"flop_outputs_kind": "skid", | ||
"clock_period_ps": "1800", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason the clock period is different here? should we keep a list of those at the top for easier tuning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The critical path delay from the IR benchmark for this proc exceeds the 750ps that we use throughout the ZSTD Decoder. In order to provide valid constraints for the scheduler we increased the clock period to the value that is close to the critical path delay. Moved this to a variable at the top of the file where the 750ps
clock period is also defined.
xls/modules/zstd/memory/BUILD
Outdated
|
||
axi_stream_remove_empty_codegen_args = common_codegen_args | { | ||
"module_name": "axi_stream_remove_empty", | ||
"clock_period_ps": "1300", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason the clock period is different here? should we keep a list of those at the top for easier tuning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The critical path delay from the IR benchmark for this proc exceeds the 750ps that we use throughout the ZSTD Decoder. In order to provide valid constraints for the scheduler we increased the clock period to the value that is close to the critical path delay. Moved this to a variable at the top of the file where the 750ps
clock period is also defined.
xls/modules/zstd/memory/BUILD
Outdated
"streaming_channel_data_suffix": "_data", | ||
"flop_inputs_kind": "skid", | ||
"flop_outputs_kind": "skid", | ||
"clock_period_ps": "2600", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason the clock period is different here? should we keep a list of those at the top for easier tuning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The critical path delay from the IR benchmark for this proc exceeds the 750ps that we use throughout the ZSTD Decoder. In order to provide valid constraints for the scheduler we increased the clock period to the value that is close to the critical path delay. Moved this to a variable at the top of the file where the 750ps
clock period is also defined.
xls/modules/zstd/memory/README.md
Outdated
4. Wait for the response submitted on the `resp_s` channel, which indicates | ||
if the write operation was successful or an error occurred. | ||
|
||
# Cocotb Simulation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move this section to #1616?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
The tests take quite a long time to execute:
is that expected? |
This is the problem described in #1615. Reverting afc720d, allowed us to pass the type-checking step, but most probably the issue requires a proper fix, rather than reverting the functionality. |
according to #1615 (comment), does this need to be updated? |
@proppy After the rebase we experienced a regression in Additionally, we noticed that We addressed your review comments and slightly changed the interface of the |
did you report it to https://github.com/hdl/bazel_rules_hdl ? /cc @mikesinouye @QuantamHD |
That sounds weird, can you share the shell-expanded bazel command line that only run the first target? |
) | ||
|
||
CLOCK_PERIOD_PS = "750" | ||
# Clock periods for modules that exceed the 750ps critical path in IR benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would increasing pipelining help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not enough for the scheduler as we constrain it from 3 sides:
- worst case throughput - 1
- target clock period - 750ps
- pipeline stages - as many as required
Combining worst_case_throughput==1
with fixed clock period requirement enforces a condition that each next()
evaluation must be possible to compute in a given clock period time limit.
Currently, some of the procs don't meet this requirement and scheduling in such case will fail with a message to increase the worst_case_throughput
. We can do that or we can stop enforcing specific clock period for such procs. This way, the module would still evaluate the next() function in a single clock cycle and we would still have an estimate for the max clock period from the IR benchmark.
src = ":axi_reader_verilog.opt.ir", | ||
benchmark_ir_args = axi_reader_codegen_args | { | ||
"pipeline_stages": "10", | ||
"top": "__axi_reader__AxiReaderInst__AxiReader_0__16_32_4_4_4_3_2_14_next", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this rule works mostly for the simple DSLX functions. I was not able generate a correct name of the proc with it. Tried e.g.:
get_mangled_ir_symbol("AxiReaderInst", "AxiReader", (16, 32, 4, 4, 4, 3, 2, 14))
, got:__AxiReaderInst__AxiReader__16_32_4_4_4_3_2_14
get_mangled_ir_symbol("AxiReaderInst", "AxiReader", is_proc_next=True)
, got:__AxiReaderInst__AxiReadernext
Looks like the case of a next()
function of an instance of the parameterized proc that is spawned in other proc is not supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah you're right:
is_proc_next: A boolean flag denoting whether the symbol is a next proc function. The argument is mutually exclusive with arguments: 'parametric_values' and 'is_implicit_token'.
We should file a feature request for this to be supported, one workaround I can see is to define wrapper proc rather than putting the parameters in the BUILD file.
That would be for example:
This will execute the IR benchmark only for the |
bazel run executes only the first target from the list of targets acquired from the output of bazel query. In order to properly call all targets it is required to loop through the targets and run one at a time Signed-off-by: Pawel Czarnecki <[email protected]>
Required to pass place_and_route Signed-off-by: Pawel Czarnecki <[email protected]>
Co-authored-by: Michal Czyz <[email protected]> Signed-off-by: Robert Winkler <[email protected]>
Signed-off-by: Robert Winkler <[email protected]>
This comit adds implementation of AxiReader proc that can be used to to issue AXI read requests as an AXI Manager device. Signed-off-by: Robert Winkler <[email protected]>
This commits adds AxiStreamRemoveEmpty proc, that can be used to remove bytes marked as containing no data in the Axi Stream Signed-off-by: Robert Winkler <[email protected]>
This commit adds AxiStreamDownscaler that can be used to convert AxiStream transactions from a wider bus, to multiple transactions on more narrow bus Signed-off-by: Robert Winkler <[email protected]>
This commit adds MemReader and MemReaderAdv procs for handling read transactions on the AXI bus. Signed-off-by: Robert Winkler <[email protected]>
Internal-tag: [#62924] Co-authred-by: Pawel Czarnecki <[email protected]> Co-authred-by: Robert Winkler <[email protected]> Signed-off-by: Michal Czyz <[email protected]> Signed-off-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]>
Internal-tag: [#64376] Signed-off-by: Pawel Czarnecki <[email protected]>
Internal-tag: [#65205] Signed-off-by: Pawel Czarnecki <[email protected]>
Co-authored-by: Pawel Czarnecki <[email protected]> Signed-off-by: Robert Winkler <[email protected]>
Looks like this is by design:
|
load("@rules_hdl//place_and_route:build_defs.bzl", "place_and_route") | ||
load("@rules_hdl//synthesis:build_defs.bzl", "benchmark_synth", "synthesize_rtl") | ||
load("@rules_hdl//verilog:providers.bzl", "verilog_library") | ||
load("@xls_pip_deps//:requirements.bzl", "requirement") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an unused import.
Can you run
buildifier --lint=fix xls/modules/zstd/memory/BUILD
on it ? (Sorry, we should have built into the CI)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing this now on import, but it is always a good idea to run buildifier before.
This PR introduces pros that enable the interface between DSLX modules and AXI subordinates. The attached README provides detailed documentation of the new functionality. Provided Verilog simulations show that the new procs allow for reading from and writing to RAM located on the AXI bus.
While working on this PR, we encountered a potential issue linked to afc720d, which appears to cause indefinite parsing and type-checking loops for certain files, such as
mem_writer.x
(provided in this PR).