I started thinking about this recently:
Why are we finding a lot of bugs in MontePy despite having around 98% code coverage?
This is a broad and complex issue, but while exploring it I came across the concept of "pseudo-tested methods" (originally written about Java).
The authors do provide a tool for finding these methods, but it is only implemented for Java.
The authors also wrote an article on this topic that I should read at some point (doi: 10.1145/2896941.2896944).
Not reading anything won't stop me from drawing conclusions from the abstract though:
> Automated tests play an important role in software evolution because they can rapidly detect faults introduced during changes. In practice, code-coverage metrics are often used as criteria to evaluate the effectiveness of test suites with focus on regression faults. However, code coverage only expresses which portion of a system has been executed by tests, but not how effective the tests actually are in detecting regression faults.
>
> Our goal was to evaluate the validity of code coverage as a measure for test effectiveness. To do so, we conducted an empirical study in which we applied an extreme mutation testing approach to analyze the tests of open-source projects written in Java. We assessed the ratio of pseudo-tested methods (those tested in a way such that faults would not be detected) to all covered methods and judged their impact on the software project. The results show that the ratio of pseudo-tested methods is acceptable for unit tests but not for system tests (that execute large portions of the whole system). Therefore, we conclude that the coverage metric is only a valid effectiveness indicator for unit tests.
So some actionable steps for the time being:
- Exclude `tests/test_integration`, etc. from coverage reports
- Limit the scope of coverage for specific test packages to specific source code. E.g., `tests/test_syntax_parsing` should not contribute to the `MCNP_Problem` coverage
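A rough sketch of how the first step could look with coverage.py (the paths and package name here are assumptions based on the directories named above, not MontePy's actual config):

```ini
# .coveragerc (sketch) -- measure only the package under test
[run]
source = montepy

# The integration suite is excluded from the report by simply not running
# it under coverage, e.g.:
#   coverage run -m pytest tests --ignore=tests/test_integration
```

For the second step, coverage.py has no built-in per-test-package scoping, but separate runs can approximate it, e.g. restricting `--source` to the parser package while running only the syntax-parsing tests.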
Wishlist:
- Have an automated tool to detect pseudo-tested functions
- Detect when a function's return value is not tested.