6 Large vascular networks: a computational challenge
6.2 Coding strategy
6.2.2 Reliability
To ensure the reliability of the code, a series of tests is executed regularly and any failure is reported immediately. The quality of these tests relies entirely on the ability to anticipate potential mistakes, from the most obvious to the most subtle. Far from being universal, the philosophy of testing has to be discussed up-front. Indeed, an infinite variety of tests exists [Bertolino, 2007] and a common strategy must be adopted for the tests to be meaningful. In the HNFV-Code, the tests can be sorted into two main classes:
• the one-function-one-test class is the lowest level of testing. One test is dedicated to one single functionality of the code. The aim of these tests is to check whether a given function performs as expected from a strictly computational point of view, independently of whether the results are physically or mathematically correct. Most of the time, it is actually impossible to isolate a functionality (e.g. any test on a network requires a call to the input file reader first). Thus, this kind of test is meaningful only if all the functions are tested, from the first computational step onwards, each test validating the ground on which the next one is built. In other words, this strategy implies that the number of tests is equal to the number of functions in the sources. In some professional software, the developers even reverse the writing steps and write the test associated with a new functionality before implementing the source code. This strategy is called “test-driven development” [Pitt-Francis et al., 2009] and is the ideal way of testing. Unfortunately, it is extremely time consuming. In our case, we first apply the one-function-one-test approach to crucial parts of the code, then progressively refine our testing during pair-programming sessions. However, covering 100% of the functions appears to be very difficult, unless it is the full-time job of one of the developers.
• the physical tests verify whether the outputs are physically accurate. This is the highest level of testing. The code is considered as a “black box” tool for biophysical applications. Contrary to the one-function-one-test class, the interest of the physical tests is to involve the largest possible series of functionalities. They ignore the internal step-by-step process and verify whether any modification of the inputs correctly impacts the outputs (e.g. a constant pressure imposed at each inlet and outlet of a network implies a constant pressure in the whole component). Because the possibilities of testing are infinite, due to the number of input files and operations that can be combined, we restrict the tests to our standard use of the code and to critical particular cases (e.g. boundary conditions that may imply a bad conditioning of the matrix).
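The constant-pressure example can be pictured with a minimal sketch. Everything below is hypothetical and is not the HNFV-Code: the `Vessel` structure, the Gauss-Seidel solver `solvePressures` and the five-node network are assumptions chosen only to keep the example self-contained. The test treats the solver as a black box and only inspects the spread of the output pressures.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical vessel joining nodes a and b; flow = conductance * (p_a - p_b).
struct Vessel {
    std::size_t a;
    std::size_t b;
    double conductance;
};

// Enforce mass conservation at every free node with Gauss-Seidel sweeps.
// imposed[n] is true where the pressure is a boundary condition; those
// entries of `pressure` are never modified.
std::vector<double> solvePressures(const std::vector<Vessel>& vessels,
                                   const std::vector<bool>& imposed,
                                   std::vector<double> pressure,
                                   int sweeps = 500) {
    for (int s = 0; s < sweeps; ++s) {
        for (std::size_t n = 0; n < pressure.size(); ++n) {
            if (imposed[n]) continue;
            double weighted = 0.0;
            double total = 0.0;
            for (const Vessel& v : vessels) {
                if (v.a == n) {
                    weighted += v.conductance * pressure[v.b];
                    total += v.conductance;
                } else if (v.b == n) {
                    weighted += v.conductance * pressure[v.a];
                    total += v.conductance;
                }
            }
            // Zero net flow at node n: its pressure is the
            // conductance-weighted mean of its neighbours' pressures.
            if (total > 0.0) pressure[n] = weighted / total;
        }
    }
    return pressure;
}

// Black-box physical check: returns max(p) - min(p) over the network
// 0 (inlet) - 1 - 2 - 3 (outlet), with a side branch 1 - 4 (outlet).
double pressureSpread(double inletPressure, double outletPressure) {
    const std::vector<Vessel> vessels = {
        {0, 1, 1.0}, {1, 2, 2.0}, {2, 3, 0.5}, {1, 4, 1.5}};
    const std::vector<bool> imposed = {true, false, false, true, true};
    const std::vector<double> start = {
        inletPressure, 0.0, 0.0, outletPressure, outletPressure};
    const std::vector<double> p = solvePressures(vessels, imposed, start);
    const auto range = std::minmax_element(p.begin(), p.end());
    return *range.second - *range.first;
}
```

With the same pressure imposed at the inlet and at both outlets, `pressureSpread` should vanish up to round-off; with different values it should not, which serves as a control that the test can actually fail.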
To offer a quality control process, the testing unit must be independent and cover all the sources. We use the external library CxxTest (cxxtest.com), which is a light, portable testing framework adapted to C++. This tool automatically checks all the assertions implemented in the tests and displays a detailed report in case of failure. Moreover, this library makes it possible to build classes, so the organization of the test files can mirror that of the sources (see Fig. 6.1), which is very convenient. This choice has also been influenced by its compatibility with GitLab. Indeed, upon each commit on the central repository, all the tests are run to check that no functionality has been inadvertently broken. This process is called “continuous integration”. It ensures that the code still performs as intended, within the limits of the accuracy defined by the set of tests. This approach does not guarantee bug-free software but makes bugs very rare, since the reaction to any bug must be to implement a test that prevents it from reappearing.
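What such a framework automates can be pictured with a framework-free sketch. None of the names below belong to CxxTest or to the HNFV-Code: `readEdges`, `countNeighbours`, `TestCase` and `runAll` are hypothetical stand-ins. The sketch registers one test per function, named after the source unit it mirrors, and collects the names of the failing tests. The second test depends on the reader validated by the first, as in the one-function-one-test class.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using EdgeList = std::vector<std::pair<int, int>>;

// Hypothetical input reader: one edge "a b" per line.
EdgeList readEdges(const std::string& text) {
    EdgeList edges;
    std::istringstream in(text);
    int a = 0;
    int b = 0;
    while (in >> a >> b) edges.emplace_back(a, b);
    return edges;
}

// Hypothetical network function: number of edges touching `node`.
std::size_t countNeighbours(const EdgeList& edges, int node) {
    std::size_t count = 0;
    for (const auto& e : edges) {
        if (e.first == node || e.second == node) ++count;
    }
    return count;
}

// A named test; the body returns true on success.
struct TestCase {
    std::string name;
    std::function<bool()> body;
};

// Run every test and return the names of those that failed.
std::vector<std::string> runAll(const std::vector<TestCase>& tests) {
    std::vector<std::string> failures;
    for (const auto& t : tests) {
        if (!t.body()) failures.push_back(t.name);
    }
    return failures;
}

// Test names mirror the (hypothetical) source layout: reader first,
// then the network function that relies on it.
std::vector<TestCase> networkTests() {
    const std::string input = "0 1\n1 2\n1 3\n";
    return {
        {"Reader::readEdges",
         [input] { return readEdges(input).size() == 3; }},
        {"Network::countNeighbours",
         [input] { return countNeighbours(readEdges(input), 1) == 3; }},
    };
}
```

Calling `runAll(networkTests())` returns an empty list when both tests pass. In a continuous integration job, a non-empty list would be turned into a failing exit status.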
To measure the coverage of the tests on the entire code, we further combine the test running with the coverage tools gcov and lcov. The first one records which lines and functions are executed during a run, and the second one gathers the outputs of the first one into an information file that can be read by the tool genhtml to generate a browsable report in HTML format. This report highlights the lack of coverage of some parts of the code. For instance, the global coverage in Fig. 6.2 shows that:
• the functions in the post-processing classes are poorly tested. This coincides with the highest level of testing. This lack of testing is not critical in the sense that no further step in the execution process depends on them. Moreover, the outputs resulting from these classes are regularly exploited by the users, thus implicitly validated.
• the class Mesh/Network/Writer is barely tested. This corresponds to the generation of input files for standard networks (e.g. 6- and 3-regular) or transformed networks (e.g. extraction or duplication). In the same spirit as the outputs, the input files are implicitly validated, mostly by visualization tools. Furthermore, automatic controls are implemented in the code to verify the entries, sending error messages if the input files are not built correctly (e.g. detection of a boundary node with no boundary condition assigned).
• the classes related to the coupling part also present a low rate of testing. This is explained by the fact that several coupling models have been implemented and investigated before finding the version we present in this manuscript. Thus a lot of functions are kept as a history but not exploited in the end.
Figure 6.2: Global test coverage. The coverage of the tests can be visualized by combining the use of the tools gcov, lcov and genhtml.
More precise observations can also be made by exploring the coverage report on lines in each file (Fig. 6.3):
• a lot of unexecuted lines correspond to the sending of error messages (e.g. solver divergence in Fig. 6.3(a)), meaning that we check whether the code functionalities work well in valid configurations but not whether they behave as expected in case of failure;
• some unexecuted lines correspond to unreached cases. These generally correspond to non-physical cases (e.g. a network node with two neighbours in Fig. 6.3(b)) that have been anticipated during the implementation but that are rarely reached in standard executions of the code;
• the test coverage is generated on a sequential execution of the tests (on one processor), thus the lines dedicated to a parallel execution are not tested (e.g. updating of ghosted values in Fig. 6.3(c)). It would be interesting to adapt the coverage process to an execution on several processors in the future.
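The first observation suggests adding tests that deliberately trigger the failure paths. The sketch below is only an illustration: `solveFixedPoint` is a hypothetical scalar iteration standing in for a real solver, and its thresholds are arbitrary assumptions. The helper `reportsFailure` checks that the error is actually raised. In a CxxTest suite, the same check would typically be written with an assertion macro such as `TS_ASSERT_THROWS`.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical stand-in for an iterative solver: x_{k+1} = a * x_k + b.
// The iteration converges to b / (1 - a) when |a| < 1 and diverges otherwise.
double solveFixedPoint(double a, double b,
                       double tolerance = 1e-12, int maxIterations = 200) {
    double x = 0.0;
    for (int k = 0; k < maxIterations; ++k) {
        const double next = a * x + b;
        // Error path: the kind of line a coverage report shows as unexecuted
        // when only valid configurations are tested.
        if (!std::isfinite(next) || std::fabs(next) > 1e12) {
            throw std::runtime_error("solver diverged");
        }
        if (std::fabs(next - x) < tolerance) return next;
        x = next;
    }
    throw std::runtime_error("solver did not converge");
}

// Failure-path test helper: true if the solver reports an error.
bool reportsFailure(double a, double b) {
    try {
        solveFixedPoint(a, b);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

A pair of tests then covers both branches: `solveFixedPoint(0.5, 1.0)` should return a value close to 2, while `reportsFailure(2.0, 1.0)` should be true.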
Figure 6.3: Zoom in on standard unexecuted areas. These three screenshots result from the test coverage using gcov, lcov and genhtml. The numbers in yellow represent the line number in the file. The blue lines correspond to executed lines (the number on the left-hand side of the colon shows how many times) and the red lines correspond to unexecuted lines. (a) The error report in case of divergence of the solver is not tested. (b) The non-physical case where a network node has two neighbours is not tested. (c) Operations dedicated to a parallel execution of the code are not tested.
Finally, it is quite challenging to evaluate the exact test coverage of the code and its relevance from this report. We can indeed conclude that the unexecuted functions (about 30%) are not tested, but not that the executed functions (about 70%) are tested. However, some trends can be observed concerning the unexecuted functions and lines, providing a valuable tool to improve or re-orient our testing strategies (e.g. focus on poorly tested parts of the code).
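The asymmetry between “executed” and “tested” can be illustrated with a small sketch. All names are hypothetical: `buggyMean` is deliberately wrong, `smokeTest` executes a function without checking its result, and `checkedTest` compares the result with a known value. Both tests run exactly the same lines of the function under test, so a line-coverage report could not tell them apart.

```cpp
#include <cmath>
#include <functional>
#include <vector>

using Averager = std::function<double(const std::vector<double>&)>;

// Reference implementation of the arithmetic mean.
double correctMean(const std::vector<double>& values) {
    double sum = 0.0;
    for (double v : values) sum += v;
    return values.empty() ? 0.0 : sum / static_cast<double>(values.size());
}

// Deliberately wrong (for illustration only): divides by size + 1.
double buggyMean(const std::vector<double>& values) {
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum / static_cast<double>(values.size() + 1);
}

// Executes every line of `mean` but verifies nothing:
// the function is covered, not tested.
bool smokeTest(const Averager& mean) {
    const std::vector<double> sample = {1.0, 2.0, 3.0};
    mean(sample);
    return true;
}

// Executes the same lines and checks the result against the expected value.
bool checkedTest(const Averager& mean) {
    const std::vector<double> sample = {1.0, 2.0, 3.0};
    return std::fabs(mean(sample) - 2.0) < 1e-12;
}
```

Here `smokeTest(buggyMean)` passes although the function is wrong, whereas `checkedTest(buggyMean)` fails, which is why a high execution rate alone does not say how well the code is tested.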