Bumpy road complexity metrics #714

dbukki · 2024-02-29T14:15:28Z

Fixes #684

Formula

The bumpy road metric of a function is computed as the function's total bumpiness divided by the number of statements considered. The total bumpiness of a function is the sum of the depth of each considered statement, where depth is the level of the statement's indentation. (How many parent scopes does it have?)

bumpy_road(F) = ( SUM {s in S} (depth(s)) ) / count(S)

where S is the set of considered statements in function F.

Note: Functions with count(S)=0 are considered empty. In this case the result of the formula is 1.

Domain

The bumpy road metric only considers

control flow (e.g if, for, ...),
scope (e.g. { ... }, function body), and
top-level statements (e.g. int a;, x += y + 8;, ...) (basically only the "proper" statements terminated by a semicolon)

when traversing the AST. Anything else (e.g. labels, sub-expressions, ...) is not counted towards this metric.

Range

For every function F, let max(D) be the depth of its deepest statement. The bumpy road metric of F falls within the range:

[1, max(D)]

where

1 means completely flat (=good), and
max(D) means completely nested/bumpy (=bad).

Changes

In order to integrate this metric into CodeCompass, the total bumpiness and statement count had to be computed during parsing (since we do not store the necessary statement info in the database that the metrics plugin could utilize).

For these values, the CppFunction table now has two extra fields: bumpiness and statementCount.
The metrics parser is then responsible for computing the quotient that makes up the final metric.

In the parser, most of our existing Traverse* functions had to be restructured. As this also impacted existing features (notably: McCabe and destructor usage via statement stacks), some refactoring also had to be done to ensure they still work like before.

Instead of the old template decorator approach, we now use scope objects (StatementScope, TypeScope, EnumScope, FunctionScope, CtxStmtScope and ScopedValue) to perform pre- and post- actions around Base::Traverse* calls with their ctors and dtors. Combined with some further generalizations at the root TraverseDecl and TraverseStmt functions, this not only reduces the number of specialized Traverse* functions (which usually contained duplicate bodies), but also shortens/condenses the code of the parser.

In order to be able to track nested statements, scopes, and therefore the depth of the current statement during traversal, a new _stmtStack member has been introduced to the parser. This member is of a special StatementStack type that is built up from StatementScope objects (one per each statement during traversal) to form the "parent chain" of the currently inspected statement.
Individual statement scopes can then be further configured by the specialized Traverse* functions to describe how that particular statement affects the depth of further statements on the stack. (Note: I did my best to add documentation/comments to the non-trivial parts of each scope type.)
With this new mechanism, the old _mcCabeStack and _statements stacks could be successfully folded into this logic, thus further reducing complexity (and unnecessary duplication of the same statement stack pattern).

Testing

Unit tests for bumpy road have been added into to the test directory. Test cases have been written in a similar style as with the McCabe metric. Further McCabe test cases have also been added to check previously untested cases that I have discovered during my attempts at manual regression testing.

Elimination of duplicate Traverse* bodies via decorator functions. Minor refactors in the McCabe metric calculation, in anticipation of a similar approach with bumpy road metrics. Additional test cases for McCabe.

… stack.

Order of Stmt and Expr traversal functions changed so that they are grouped together more coherently.

More centralized/flexible type-based scope creation spanning all potential cases; not just specialized functions. Unification of the statement scope stack and the statement stack.

…t even be considered at query level.

…ame, different ctors look the same; therefore they cannot be distinguished during tests.

…licitly.

… performance.

…to bumpy-road

Elimination of duplicate Traverse* bodies via decorator functions. Minor refactors in the McCabe metric calculation, in anticipation of a similar approach with bumpy road metrics. Additional test cases for McCabe.

… stack.

Order of Stmt and Expr traversal functions changed so that they are grouped together more coherently.

More centralized/flexible type-based scope creation spanning all potential cases; not just specialized functions. Unification of the statement scope stack and the statement stack.

…t even be considered at query level.

…ame, different ctors look the same; therefore they cannot be distinguished during tests.

…licitly.

… performance.

…to bumpy-road

dbukki · 2024-02-29T14:18:40Z

For reference, here is a list of McCabe and bumpy road metrics exported from the Xerces-C project using the current state:
xerces-bumpyroad.log
xerces-mccabe.log

These types are only intended for scope-like usage (ctor-dtor pair matters), not the relocation of data.

dbukki · 2024-03-07T12:16:02Z

I found an interesting phenomenon in the parser:

In C++, we know that records and functions can be either just a declaration or an actual definition. However, our policy towards storing these in the parsed database is very different:

For record types, VisitRecordDecl returns early if the visited record declaration is not a definition. This causes the corresponding CppRecord entity to remain initialized with astNodeId==0, which eventually causes the record traversal functions to omit it from _types (and therefore the database):

if (_type->astNodeId)
  _visitor->_types.push_back(_type);

Note: In this PR, this call happens in TypeScope::~TypeScope, but the behavior itself is not new to this PR. In the current master, the same behavior can be found in the TraverseRecordDecl (and similar functions), so this is not a regression.

For functions, VisitFunctionDecl will initialize the CppFunction entity regardless of whether the function is a declaration or definition, which causes every occurrence of it to be added to _functions (and therefore the database):

if (_curFun->astNodeId)
  _visitor->_functions.push_back(_curFun);

Note: In this PR, this call happens in FunctionScope::~FunctionScope, but the behavior itself is not new to this PR. In the current master, the same behavior can be found in the TraverseFunctionDecl (and similar functions), so this is not a regression.

Essentially, this means that:

The following code persists only one CppRecord entity in the database (but 3 CppAstNode entities for each declaration):

class A;
class A {};
class A;

The following code persists no CppRecord entities in the database at all (only the 3 CppAstNode entities):

class NoDef0;
class NoDef1;
class NoDef2;

The following code persists 3 different (!) CppFunction entities in the database (along with the 3 CppAstNode entities):

void f();
void f() {}
void f();

Note: The 3 different CppFunction entities are nearly identical in content, with the exception of fields computed from the function's body (e.g. mccabe and bumpiness metrics). The metrics fields of the declaration entities will retain their default values assigned in VisitFunctionDecl; only the definition (which has the body) will contain the actual metrics.

As far as I can see:

Scenario 1 above is what we would normally expect: One record entity for the one definition.
Scenario 2 is a bit weird, since no actual type information is ever recorded for types that have only ever been declared, but never defined. Nonetheless, I can still image this being consistent with what we want to use our database for. (Since an AST node is still stored even for declarations, navigation via the UI is still possible here.)
Scenario 3 definitely smells fishy though. I would expect the same "one definition = one entity" rule to apply here as with record types.

The "multiple function entities" problem is also the root cause of why the McCabe and BumpyRoad tests fail for functions that are declared more than once: For such functions, _db->query_value<model::CppFunction>(...) runs into an assert as the query yields more than one results for the given name, out of which only one is truly meaningful.

I tried resolving this ambiguity by applying the same logic to functions as what we already utilize with records. In my local branch, I added a guard condition to only store the CppFunction entity for the definition.
However, this actually causes an existing test (namely: CppParserTest::FunctionDeclarationOnly) to fail. So this got me confused, and I now see two possibilities:

Either this "multiple function entities" phenomenon is the actual proper/expected behavior, and this test is right. That would mean that there has been reason in the past (and still is) to retain the current behavior and have a CppFunction record for every declaration of a function. But then by design, this also means that the metrics-related fields are also stored for each declaration, even though they have no true meaning there, which raises a database design concern.
Or, both our current policy for function storage and this test are wrong, and we should instead fix the database to only store every function entity once. By extension, this would cause functions without a definition to have no CppFunction entity stored in the database (as is already the case with records in Scenario 2). Thus the meaning of the FunctionDeclarationOnly test would have to change, but metrics (and relevant function info) would only be stored for every function once.

The second option seems like the more rational alternative to me, both from a database design perspective, and from a metrics-query perspective. It would also cut back on the size of the CppFunction table, which apparently contains a lot of "duplicates" right now. But I don't know if this problem deserves an issue ticket of its own or not. I am also uncertain about the regressions this second option would introduce, so I definitely don't want to rush ahead with the development until the situation is clear.

@mcserep Could you advise me on which of the above two options is the correct approach?

mcserep · 2024-03-14T10:12:18Z

@dbukki As we discussed yesterday on the weekly meeting, please continue as follows:

Do not deal with the found multiple function entities issue in this PR. Instead implement a simple solution, which can be used, e.g.:
- either insert the correct metric value for each CppFunction record; or
- insert the value only for the definition record.
Handle the multiple CppFunction issue in a separate PR, see Multiple CppFunction entities #720.

intjftw · 2024-04-24T13:25:54Z

If I execute incremental parsing on a project, parsing fails immediately with a segmentation fault. @dbukki can you please check if this is coming from your modifications?

dbukki · 2024-04-26T07:20:24Z

If I execute incremental parsing on a project, parsing fails immediately with a segmentation fault. @dbukki can you please check if this is coming from your modifications?

I checked the current master (8e84d84) and the segfault is also present there.
I put some logging into various places, including the constructor of ClangASTVisitor, and no logs are printed before the segfault happens. This leads me to believe that the crash happens at a much earlier step, long before any actual C++ parsing can take place.

In any case, this is a separate problem.
I have opened an issue for it: #735

mcserep · 2024-04-26T07:40:57Z

plugins/cpp/parser/src/clangastvisitor.h


-  bool TraverseFunctionDecl(clang::FunctionDecl* fd_)
+  class TypeScope final


Is there any special reason these helper classes are marked final?

mcserep · 2024-04-28T12:00:35Z

I checked the current master (8e84d84) and the segfault is also present there. I put some logging into various places, including the constructor of ClangASTVisitor, and no logs are printed before the segfault happens. This leads me to believe that the crash happens at a much earlier step, long before any actual C++ parsing can take place.

In any case, this is a separate problem. I have opened an issue for it: #735

@dbukki I have checked and branch release/gershwin is not affected by this bug, so it is highly likely to be related to the cppmetrics plugin, as that is the only new plugin in the upcoming release with a parser component. Nevertheless, it should be investigated in a separate issue (#735), as it is not directly related to the bumpy road metric.

intjftw

Looks fine, LGTM! 🚀

mcserep · 2024-05-14T07:55:11Z

@dbukki The bumpy road metrics was not added to the service.

Thrift interface:

CodeCompass/plugins/cpp_metrics/service/cxxmetrics.thrift

Lines 7 to 13 in 7ba19f9

    
           enum CppAstNodeMetricsType 
        
           { 
        
             ParameterCount = 1, 
        
             McCabe = 2, 
        
             LackOfCohesion = 3, 
        
             LackOfCohesionHS = 4 
        
           }

Service implementation:

CodeCompass/plugins/cpp_metrics/service/src/cppmetricsservice.cpp

Lines 26 to 46 in 7ba19f9

    
           void CppMetricsServiceHandler::getCppAstNodeMetricsTypeNames( 
        
             std::vector<CppAstNodeMetricsTypeName>& _return) 
        
           { 
        
             CppAstNodeMetricsTypeName typeName; 
        
             typeName.type = CppAstNodeMetricsType::ParameterCount; 
        
             typeName.name = "Number of function parameters"; 
        
             _return.push_back(typeName); 
        
             typeName.type = CppAstNodeMetricsType::McCabe; 
        
             typeName.name = "McCabe metric"; 
        
             _return.push_back(typeName); 
        
             typeName.type = CppAstNodeMetricsType::LackOfCohesion; 
        
             typeName.name = "Lack of cohesion of function"; 
        
             _return.push_back(typeName); 
        
             typeName.type = CppAstNodeMetricsType::LackOfCohesionHS; 
        
             typeName.name = "Lack of cohesion of function (Henderson-Sellers variant)"; 
        
             _return.push_back(typeName); 
        
           }

Please add this to the codebase in a new fixing PR, so the bumpy road metric become queryable through the web API.

dbukki added 30 commits February 22, 2024 12:46

Preliminary refactor of relevant parser code.

0debec5

Elimination of duplicate Traverse* bodies via decorator functions. Minor refactors in the McCabe metric calculation, in anticipation of a similar approach with bumpy road metrics. Additional test cases for McCabe.

First draft of computing bumpiness of functions via a statement scope…

b155518

… stack.

Bumpy road metrics extraction from parsed functions.

0d10be7

First set of unit tests.

03a60ef

Decorator functions replaced by scope objects on stack.

d44728a

Order of Stmt and Expr traversal functions changed so that they are grouped together more coherently.

TraverseDecl and TraverseStmt have been refactored.

0877185

More centralized/flexible type-based scope creation spanning all potential cases; not just specialized functions. Unification of the statement scope stack and the statement stack.

ScopedValue is moved to util with some extra C++ generalizations.

a8181ba

McCabe test case modifications.

0d30293

McCabe test fix: function declarations without a definition should no…

9bb9b62

…t even be considered at query level.

McCabe test fix again: Without signatures in the qualified function n…

4d2104e

…ame, different ctors look the same; therefore they cannot be distinguished during tests.

Added nested transparent blocks to handle case and default labels exp…

c78193a

…licitly.

Elimination of inheritence, virtual functions and dyn_cast-s to boost…

80a0f42

… performance.

Minor adjustments and code polishing.

c538bd1

Revoke local changes from gitignore (mistake).

7ebf7c6

Revoke local changes from gitignore (mistake).

e0d1450

Merge branch 'bumpy-road' of https://github.com/dbukki/CodeCompass in…

e2661e9

…to bumpy-road

Preliminary refactor of relevant parser code.

68daa6a

Elimination of duplicate Traverse* bodies via decorator functions. Minor refactors in the McCabe metric calculation, in anticipation of a similar approach with bumpy road metrics. Additional test cases for McCabe.

First draft of computing bumpiness of functions via a statement scope…

a7143e9

… stack.

Bumpy road metrics extraction from parsed functions.

ba32b6c

First set of unit tests.

0f81ae0

Decorator functions replaced by scope objects on stack.

8b77d47

Order of Stmt and Expr traversal functions changed so that they are grouped together more coherently.

TraverseDecl and TraverseStmt have been refactored.

14b8273

More centralized/flexible type-based scope creation spanning all potential cases; not just specialized functions. Unification of the statement scope stack and the statement stack.

ScopedValue is moved to util with some extra C++ generalizations.

605b49d

McCabe test case modifications.

a1904e7

McCabe test fix: function declarations without a definition should no…

0cd0796

…t even be considered at query level.

McCabe test fix again: Without signatures in the qualified function n…

a352eee

…ame, different ctors look the same; therefore they cannot be distinguished during tests.

Added nested transparent blocks to handle case and default labels exp…

16957d9

…licitly.

Elimination of inheritence, virtual functions and dyn_cast-s to boost…

3ac576f

… performance.

Minor adjustments and code polishing.

1bbc9c4

Revoke local changes from gitignore (mistake).

e51dff2

dbukki added 4 commits February 29, 2024 13:39

Revoke local changes from gitignore (mistake).

c8f9788

Elimination of some recently introduced compiler warnings.

f90aade

Merge branch 'bumpy-road' of https://github.com/dbukki/CodeCompass in…

244372b

…to bumpy-road

New line adjustments in diff.

f9bddaa

mcserep requested review from intjftw and mcserep March 6, 2024 14:46

mcserep assigned dbukki Mar 6, 2024

mcserep added Kind: Enhancement 🌟 Plugin: C++ Issues related to the parsing and presentation of C++ projects. Plugin: Metrics Issues related to the code metrics plugin. labels Mar 6, 2024

Copy and move semantics have been disabled for scope types.

e77d2cd

These types are only intended for scope-like usage (ctor-dtor pair matters), not the relocation of data.

mcserep mentioned this pull request Mar 14, 2024

Multiple CppFunction entities #720

Open

mcserep mentioned this pull request Mar 14, 2024

Separate parse-time C++ metrics information into distinct tables #721

Open

dbukki mentioned this pull request Apr 26, 2024

Incremental parsing fails with segmentation fault #735

Closed

mcserep reviewed Apr 26, 2024

View reviewed changes

intjftw approved these changes Apr 28, 2024

View reviewed changes

intjftw merged commit 3279e4b into Ericsson:master Apr 28, 2024
11 checks passed

mcserep mentioned this pull request Apr 29, 2024

Bumpy road complexity metrics #738

Merged

dbukki deleted the bumpy-road branch May 11, 2024 07:05

dbukki mentioned this pull request May 16, 2024

Update C++ metrics service with our new metrics #748

Merged

mcserep added this to the Upcoming Release milestone Jul 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bumpy road complexity metrics #714

Bumpy road complexity metrics #714

dbukki commented Feb 29, 2024

dbukki commented Feb 29, 2024

dbukki commented Mar 7, 2024

mcserep commented Mar 14, 2024

intjftw commented Apr 24, 2024

dbukki commented Apr 26, 2024 •

edited

Loading

mcserep Apr 26, 2024 •

edited

Loading

mcserep commented Apr 28, 2024 •

edited

Loading

intjftw left a comment

mcserep commented May 14, 2024


		bool TraverseFunctionDecl(clang::FunctionDecl* fd_)
		class TypeScope final

Bumpy road complexity metrics #714

Bumpy road complexity metrics #714

Conversation

dbukki commented Feb 29, 2024

Formula

Domain

Range

Changes

Testing

dbukki commented Feb 29, 2024

dbukki commented Mar 7, 2024

mcserep commented Mar 14, 2024

intjftw commented Apr 24, 2024

dbukki commented Apr 26, 2024 • edited Loading

mcserep Apr 26, 2024 • edited Loading

Choose a reason for hiding this comment

mcserep commented Apr 28, 2024 • edited Loading

intjftw left a comment

Choose a reason for hiding this comment

mcserep commented May 14, 2024

dbukki commented Apr 26, 2024 •

edited

Loading

mcserep Apr 26, 2024 •

edited

Loading

mcserep commented Apr 28, 2024 •

edited

Loading