Bumpy road complexity metrics #714
Conversation
Elimination of duplicate Traverse* bodies via decorator functions. Minor refactors in the McCabe metric calculation, in anticipation of a similar approach with bumpy road metrics. Additional test cases for McCabe.
Order of Stmt and Expr traversal functions changed so that they are grouped together more coherently.
More centralized/flexible type-based scope creation spanning all potential cases; not just specialized functions. Unification of the statement scope stack and the statement stack.
…t even be considered at query level.
…ame, different ctors look the same; therefore they cannot be distinguished during tests.
For reference, here is a list of McCabe and bumpy road metrics exported from the Xerces-C project using the current state:
These types are only intended for scope-like usage (the ctor-dtor pairing matters), not for relocating data.
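For context, a minimal sketch of what scope-only usage can look like (illustrative; `ExampleScope` is a made-up name, not the actual class from this PR):

```cpp
// Illustrative sketch of a scope-only helper: it is meant to live on the
// stack so its constructor/destructor pair brackets a traversal step, and it
// is deliberately non-copyable and non-movable, so its state cannot be
// relocated out of the scope it guards.
class ExampleScope final
{
public:
  ExampleScope()  { /* pre-action on construction */ }
  ~ExampleScope() { /* post-action on destruction */ }

  ExampleScope(const ExampleScope&) = delete;
  ExampleScope& operator=(const ExampleScope&) = delete;
};
```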
I found an interesting phenomenon in the parser: In C++, we know that records and functions can be either just a declaration or an actual definition. However, our policy towards storing these in the parsed database is very different:
Note: In this PR, this call happens in
Essentially, this means that:
Note: The 3 different CppFunction entities are nearly identical in content, with the exception of fields computed from the function's body (e.g. mccabe and bumpiness metrics). The metrics fields of the declaration entities will retain their default values assigned in VisitFunctionDecl; only the definition (which has the body) will contain the actual metrics. As far as I can see:
The "multiple function entities" problem is also the root cause of why the McCabe and BumpyRoad tests fail for functions that are declared more than once: For such functions, I tried resolving this ambiguity by applying the same logic to functions as what we already utilize with records. In my local branch, I added a guard condition to only store the CppFunction entity for the definition.
The second option seems like the more rational alternative to me, both from a database design perspective and from a metrics-query perspective. It would also cut back on the size of the CppFunction table, which apparently contains a lot of "duplicates" right now. But I don't know whether this problem deserves an issue ticket of its own. I am also uncertain about the regressions this second option would introduce, so I definitely don't want to rush ahead with the development until the situation is clear. @mcserep Could you advise me on which of the above two options is the correct approach?
@dbukki As we discussed yesterday at the weekly meeting, please continue as follows:
If I execute incremental parsing on a project, parsing fails immediately with a segmentation fault. @dbukki Can you please check whether this is coming from your modifications?
I checked the current master (8e84d84) and the segfault is also present there. In any case, this is a separate problem.
`bool TraverseFunctionDecl(clang::FunctionDecl* fd_)`
`class TypeScope final`
Is there any special reason these helper classes are marked final?
@dbukki I have checked and branch
Looks fine, LGTM! 🚀
@dbukki The bumpy road metric was not added to the service. Thrift interface: CodeCompass/plugins/cpp_metrics/service/cxxmetrics.thrift Lines 7 to 13 in 7ba19f9
Service implementation: CodeCompass/plugins/cpp_metrics/service/src/cppmetricsservice.cpp Lines 26 to 46 in 7ba19f9
Please add this to the codebase in a new follow-up PR, so that the bumpy road metric becomes queryable through the web API.
Fixes #684
Formula
The bumpy road metric of a function is computed as the function's total bumpiness divided by the number of statements considered. The total bumpiness of a function is the sum of the depth of each considered statement, where depth is the statement's level of indentation, i.e. how many parent scopes it has.
bumpy_road(F) = ( SUM {s in S} (depth(s)) ) / count(S)
where S is the set of considered statements in function F.
Note: Functions with `count(S) = 0` are considered empty. In this case the result of the formula is `1`.
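For illustration (the depth values here are invented for this example, not taken from the PR): a function whose considered statements sit at depths 1, 1, 2 and 3 has a total bumpiness of 7 and scores

bumpy_road(F) = (1 + 1 + 2 + 3) / 4 = 1.75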
Domain
When traversing the AST, the bumpy road metric only considers

- control statements (`if`, `for`, ...),
- compound statements (`{ ... }`, function body), and
- simple statements (`int a;`, `x += y + 8;`, ...) (basically only the "proper" statements terminated by a semicolon).

Anything else (e.g. labels, sub-expressions, ...) is not counted towards this metric.
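As a rough illustration (this snippet is made up for this description, not taken from the test suite), the comments below mark which constructs are considered:

```cpp
// Illustrative only: which constructs the bumpy road metric considers.
void example(bool flag)
{                 // compound statement (function body): considered
  int a = 0;      // simple statement: considered
  if (flag)       // if statement: considered
  {               // compound statement: considered
    a += 1 + 2;   // simple statement: considered; the sub-expression
                  // "1 + 2" is not counted separately
  }
end:              // label: not considered
  return;         // simple statement: considered
}
```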
Range
For every function `F`, let `max(D)` be the depth of its deepest statement. The bumpy road metric of `F` falls within the range `[1, max(D)]`, where `1` means completely flat (=good), and `max(D)` means completely nested/bumpy (=bad).

Changes
In order to integrate this metric into CodeCompass, the total bumpiness and statement count had to be computed during parsing (since we do not store the necessary statement info in the database that the metrics plugin could utilize).
For these values, the `CppFunction` table now has two extra fields: `bumpiness` and `statementCount`. The metrics parser is then responsible for computing the quotient that makes up the final metric.
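A minimal sketch of that last step (the function name is hypothetical; the actual metrics parser code may differ):

```cpp
// Derive the final bumpy road value from the two persisted fields described
// above. Functions with statementCount == 0 are defined to score 1.
double bumpyRoadMetric(unsigned bumpiness_, unsigned statementCount_)
{
  return statementCount_ == 0
    ? 1.0
    : static_cast<double>(bumpiness_) / statementCount_;
}
```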
In the parser, most of our existing `Traverse*` functions had to be restructured. As this also impacted existing features (notably: McCabe and destructor usage via statement stacks), some refactoring also had to be done to ensure they still work like before.

Instead of the old template decorator approach, we now use scope objects (`StatementScope`, `TypeScope`, `EnumScope`, `FunctionScope`, `CtxStmtScope` and `ScopedValue`) to perform pre- and post-actions around `Base::Traverse*` calls with their ctors and dtors. Combined with some further generalizations at the root `TraverseDecl` and `TraverseStmt` functions, this not only reduces the number of specialized `Traverse*` functions (which usually contained duplicate bodies), but also shortens/condenses the code of the parser.

In order to be able to track nested statements, scopes, and therefore the depth of the current statement during traversal, a new `_stmtStack` member has been introduced to the parser. This member is of a special `StatementStack` type that is built up from `StatementScope` objects (one per each statement during traversal) to form the "parent chain" of the currently inspected statement.

Individual statement scopes can then be further configured by the specialized `Traverse*` functions to describe how that particular statement affects the depth of further statements on the stack. (Note: I did my best to add documentation/comments to the non-trivial parts of each scope type.)

With this new mechanism, the old `_mcCabeStack` and `_statements` stacks could be successfully folded into this logic, thus further reducing complexity (and unnecessary duplication of the same statement stack pattern).
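To make the scope-object mechanism a bit more concrete, here is a minimal RAII-style sketch of the pattern (illustrative only; `MetricsVisitor` and `DepthScope` are made-up names, and the real `StatementScope`/`StatementStack` types carry more configuration than this):

```cpp
#include <clang/AST/RecursiveASTVisitor.h>

// Illustrative sketch of performing pre- and post-actions around a
// Base::Traverse* call with a scope object's ctor and dtor.
class MetricsVisitor : public clang::RecursiveASTVisitor<MetricsVisitor>
{
  using Base = clang::RecursiveASTVisitor<MetricsVisitor>;

  unsigned _depth = 0; // nesting level of the construct being traversed

  // Pre-action in the constructor, post-action in the destructor, so every
  // return path out of the Traverse* call below restores the previous state.
  class DepthScope
  {
  public:
    explicit DepthScope(MetricsVisitor& visitor_) : _visitor(visitor_)
    {
      ++_visitor._depth;
    }

    ~DepthScope()
    {
      --_visitor._depth;
    }

  private:
    MetricsVisitor& _visitor;
  };

public:
  bool TraverseFunctionDecl(clang::FunctionDecl* fd_)
  {
    DepthScope scope(*this);                // pre-action runs here
    return Base::TraverseFunctionDecl(fd_); // post-action runs when 'scope'
                                            // is destroyed
  }
};
```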
Testing

Unit tests for bumpy road have been added to the test directory. Test cases have been written in a similar style to the McCabe metric tests. Further McCabe test cases have also been added to cover previously untested cases that I discovered during my attempts at manual regression testing.