[CFDFC] CFDFC Extraction & Caching Redesign #262
Replies: 5 comments
-
Indeed, while we are currently satisfied with FTD functionality-wise, it remains an "abstract" method as long as we cannot really benefit from all the handshake optimizations (especially FPGA'20).

I hope the following comment is relevant to your topic; otherwise, feel free to ignore it :) One problem we faced with FTD was the lack of a CFG structure at the end of the conversion pass. See these messages for full context: 1, 2, 3. Right now, we want to use the edge information to re-introduce a fake

Maybe this is not relevant to your concern at all, but if you feel that such an addition might be beneficial, we can find a way to accommodate both our scopes.
Is this something you were thinking of discovering during buffering or during a previous pass? Right now, the persistence of names across all passes is not fully guaranteed, thus the stored

P.S. The link here is broken!
-
@pcineverdies thanks for the comment!
I updated the flow that I had in mind. But I see your point, although the conversion passes do not change the names (something called
-
Thanks @Jiahui17 for initiating this discussion! I think for buffer placement to work for an arbitrary dataflow circuit (for instance, those produced by the fast token delivery strategy), we need the following:
I think @pcineverdies's work on propagating information about the CFG in the IR and being able to reconstruct it at any point in the

As for the naming problem in the identification of the conditions of BBs, I think we should rely on attributes rather than on the names of operations, as I suggest here.
-
Further to the above discussion, @pcineverdies and I were discussing today that it would be useful for everyone to have a single, consistent, and general methodology to refer to control flow information at any point in the

In @pcineverdies's work on implementing FTD, he added new annotations to the IR and used them to temporarily reconstruct the entirety of the CFG structure to benefit from all the standard analysis done at the

We are thinking of opening this in a separate issue, but feel free to object or support the idea here early on :)
-
Since this is being pushed as some kind of long-term "good" solution for this problem, I need to reiterate that, while functional for this use case, this is very much a hack that will be hard or impossible to maintain in the long run and is therefore not desirable as anything more than a short-term solution. The correct way to do this, as I have already alluded to on the issue where this was originally brought up, is to become a client of LLVM's
-
Introduction
Performance optimization algorithms like the one in FPGA'20 aim to optimize dataflow subcircuits called CFDFCs (choice-free dataflow circuits), each of which corresponds to one or multiple loops in the control-flow graph. Since then, many papers have built on this idea in the following ways:
The Requirements
Limitations of the current design. So far, the CFDFC data structure suits 1, but it is not designed with 2 and 3 in mind. I would like to initiate the discussion for a new implementation in this issue.
Better Caching Mechanism
Currently, the buffer placement pass logs the performance optimization result directly inside the MLIR file as a function attribute. This includes:
Note: Duplicated logic. As seen in the LSQ sizing pass, the CFDFC class in the buffering pass is not the same as the CFDFC class in the LSQ sizing pass. Thus, the LSQ sizing pass needs its own logic for recreating the CFDFCs from the function attributes.
Note: Complex instrumentation. As seen in the sharing pass, the buffer placement passes are instrumented to retrieve the performance optimization decision, and the sharing pass calls them internally (instead of reading the cached result from somewhere).
This approach violates the modularity principle (though it fundamentally stems from the fact that caching the performance information is hard).
Note: Reliance on BB organization. As I will discuss later, some handshake circuits omit BB organization. Therefore, a list of BBs is not enough to recover the CFDFC from this kind of circuit.
Thus, I suggest creating a simple Graphviz DOT format for caching the performance optimization decision across the optimization passes. Here is an example:
For each CFDFC, a .DOT file is created and it encodes the following information:

The CFDFC would be extended to:
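To make the DOT-based caching idea a bit more concrete, here is a minimal Python sketch of how one CFDFC could be written out as a DOT digraph. Everything in it is an assumption for illustration: the `Cfdfc` and `Channel` classes, the `throughput` graph attribute, and the `num_slots` edge attribute are hypothetical names, not the fields of the proposed format or of the existing CFDFC class.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """A dataflow channel between two units (hypothetical fields)."""
    src: str
    dst: str
    num_slots: int = 0  # assumed: buffer slots chosen by the optimizer


@dataclass
class Cfdfc:
    """Hypothetical in-memory CFDFC: units, channels, and a throughput."""
    name: str
    throughput: float
    units: list[str] = field(default_factory=list)
    channels: list[Channel] = field(default_factory=list)


def to_dot(cfdfc: Cfdfc) -> str:
    """Serialize one CFDFC to a DOT digraph string."""
    lines = [f'digraph "{cfdfc.name}" {{']
    # Graph-level attribute carrying the throughput found by the optimizer.
    lines.append(f'  throughput="{cfdfc.throughput}";')
    for unit in cfdfc.units:
        lines.append(f'  "{unit}";')
    for ch in cfdfc.channels:
        lines.append(f'  "{ch.src}" -> "{ch.dst}" [num_slots="{ch.num_slots}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up unit names; a real circuit would use its own operation names.
    loop = Cfdfc(
        name="cfdfc0",
        throughput=0.5,
        units=["mux0", "addi0", "cond_br0"],
        channels=[
            Channel("mux0", "addi0", 1),
            Channel("addi0", "cond_br0", 0),
            Channel("cond_br0", "mux0", 1),
        ],
    )
    print(to_dot(loop))
```

A later pass (LSQ sizing, sharing) could then parse this file with any DOT reader instead of re-deriving the CFDFC from function attributes, which is the modularity benefit argued for above.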
General CFDFC Extraction Method
More recent CF to handshake conversion passes (#177) omit BB organization for better performance. The current CFDFC extraction logic would not work here because it relies on BB information. Yet, if we can figure out a new way to extract CFDFCs, the performance optimization algorithm should work out of the box.
Current CFDFC extraction logic:
The result of software profiling returns a list of BB transitions. These transitions are then used to construct a list of "BB cycles". For example (the matvec example):

Based on these two BB loops, we identify:
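As a rough illustration of the "transitions to BB cycles" step, the sketch below enumerates simple cycles in a BB transition graph with a plain DFS. This is a simplification under stated assumptions: the transition list is made up (it is not the real matvec profile), the function names are hypothetical, and the actual buffering pass may select cycles differently, for instance by taking transition frequencies into account.

```python
def find_bb_cycles(transitions):
    """Enumerate simple cycles in a BB transition graph.

    Each cycle is reported once, starting from its smallest BB index.
    """
    succ = {}
    for src, dst in transitions:
        succ.setdefault(src, set()).add(dst)

    cycles = []

    def dfs(start, node, path):
        for nxt in sorted(succ.get(node, ())):
            if nxt == start:
                # Closed the loop back to the starting BB.
                cycles.append(path[:])
            elif nxt > start and nxt not in path:
                # Only visit larger BBs so each cycle is found exactly once.
                dfs(start, nxt, path + [nxt])

    for start in sorted(succ):
        dfs(start, start, [start])
    return cycles


def cycle_edges(cycle):
    """Return the BB-to-BB transitions that make up one cycle."""
    return [(cycle[i], cycle[(i + 1) % len(cycle)]) for i in range(len(cycle))]


if __name__ == "__main__":
    # Hypothetical transitions: an inner self-loop on BB 2 inside an outer loop.
    transitions = [(0, 1), (1, 2), (2, 2), (2, 3), (3, 1), (3, 4)]
    for cycle in find_bb_cycles(transitions):
        print(cycle, cycle_edges(cycle))
    # [1, 2, 3] [(1, 2), (2, 3), (3, 1)]
    # [2] [(2, 2)]
```

Note that only transitions listed by `cycle_edges` belong to a cycle here, so a circuit connection between BBs that are not consecutive in the cycle (say `(1, 3)`) would not be picked up by this kind of BB-based logic.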
An extraction flow. Ideally, we could have:
Note: Effect of omitting BB organization. Yet, the fast token delivery method would create "weird connections", like 1 -> 3 for the second loop. These connections would be completely ignored by the current logic, which might cause a performance penalty due to a lack of token balancing.

Proposed solution:
(here I am improvising and definitely need your help @AyaElAkhras @paolo-ienne @lana555)
Instead of identifying a list of BB loops, we identify the set of conditions that will "keep the loop running".
For instance, for the same loop 2 -> 2, we might get a hypothetical set of conditions:

From here, we can propagate this information to the rest of the circuit to remove inactive parts when the loop is running. For instance:
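One possible reading of this propagation step, purely as a hypothetical sketch: each channel may carry a guard (a condition name and the value under which the channel receives tokens), and fixing the conditions that keep the loop running removes every channel whose guard contradicts them. The guard encoding, the unit names, and the `prune_inactive` helper are all assumptions made for illustration and are not part of any existing pass.

```python
def prune_inactive(channels, conditions):
    """Keep only channels that can be active under the given conditions.

    channels:   list of (src, dst, guard), where guard is None (always
                active) or a (condition_name, required_value) pair.
    conditions: dict mapping condition names to the values assumed while
                the loop keeps running.
    """
    active = []
    for src, dst, guard in channels:
        if guard is not None:
            name, required = guard
            # Drop the channel only if the condition is known and contradicts it.
            if name in conditions and conditions[name] != required:
                continue
        active.append((src, dst))
    return active


if __name__ == "__main__":
    # Made-up loop body: cond_br0 sends tokens back to mux0 while "c0" is true
    # and to the exit otherwise.
    channels = [
        ("mux0", "addi0", None),
        ("addi0", "cond_br0", None),
        ("cond_br0", "mux0", ("c0", True)),
        ("cond_br0", "exit0", ("c0", False)),
    ]
    print(prune_inactive(channels, {"c0": True}))
    # [('mux0', 'addi0'), ('addi0', 'cond_br0'), ('cond_br0', 'mux0')]
```

The point of the sketch is only that this style of extraction needs no BB list: it works on channels and conditions, so connections that skip a BB are kept or dropped by the same rule as any other channel.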
Remarks
What do you think? I'd like to hear your thoughts on this :D