Speeding Up the Query Calculation - Automated model-based spreadsheet debugging

For large systems, determining the next query can take too long for an interactive sequential diagnosis process. The reason is that a set of diagnoses is required to determine the next query. Although it was shown that a set of 9 diagnoses is sufficient to determine a good query [Shc+12], for larger systems calculating even these 9 diagnoses can exceed acceptable response times.

Algorithmic approach: In [Shc+16b], a new algorithmic approach was presented to speed up the calculation of the diagnoses required to determine the next query. The approach builds upon the new concept of so-called partial diagnoses. These partial diagnoses are, as the name suggests, subsets of real diagnoses. The idea of using partial diagnoses is to search for conflicts only once during the HS-Tree construction, for example using MERGEXPLAIN (see Section 3.1), and to use the found conflicts to determine partial diagnoses without checking whether they fully explain the observed fault. Since the found conflicts are a subset of all conflicts of the system, the partial diagnoses determined from these conflicts will also be subsets of the (complete) diagnoses of the system. Therefore, queries that help to discriminate between the calculated partial diagnoses will also help to reduce the number of (complete) diagnoses.

If we had, for example, a system with components 1 to 8 and conflicts {{2,7}, {3,4}, {6,7,8}}, as used in the example of Section 3.1, the (complete) diagnoses for this system would be {{3,7}, {4,7}, {2,3,6}, {2,3,8}, {2,4,6}, {2,4,8}}. If we now assume that we only computed 2 of these 3 conflicts, for example, {{2,7}, {3,4}}, we could determine the partial diagnoses {{2,3}, {2,4}, {3,7}, {4,7}}. These partial diagnoses are all subsets of complete diagnoses. In fact, 2 of these partial diagnoses, {3,7} and {4,7}, are even complete although only 2 of the 3 conflicts of the system were used to calculate them.
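The relationship between conflicts, partial diagnoses, and complete diagnoses in this example can be checked with a small script. The following is only a minimal sketch: it computes diagnoses as subset-minimal hitting sets of the conflicts by brute-force enumeration, which is an assumption made for brevity and not the HS-Tree construction used in [Shc+16b]. The function name `minimal_hitting_sets` is hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(conflicts):
    """Return all subset-minimal hitting sets of the given conflicts.

    Brute-force enumeration for illustration only; it is not the
    HS-Tree algorithm and does not scale to large systems.
    """
    components = sorted(set().union(*conflicts))
    found = []
    # Enumerate candidates by increasing size, so every superset of an
    # already found hitting set can be rejected as non-minimal.
    for size in range(1, len(components) + 1):
        for candidate in map(frozenset, combinations(components, size)):
            if any(hit <= candidate for hit in found):
                continue
            # A hitting set shares at least one component with every conflict.
            if all(candidate & conflict for conflict in conflicts):
                found.append(candidate)
    return found


all_conflicts = [frozenset({2, 7}), frozenset({3, 4}), frozenset({6, 7, 8})]
known_conflicts = all_conflicts[:2]  # assume only {2,7} and {3,4} were found

complete = minimal_hitting_sets(all_conflicts)
partial = minimal_hitting_sets(known_conflicts)

# Every partial diagnosis is contained in at least one complete diagnosis.
covered = all(any(p <= d for d in complete) for p in partial)
# Partial diagnoses that already are complete diagnoses.
already_complete = [p for p in partial if p in complete]

print(sorted(map(sorted, complete)))
print(sorted(map(sorted, partial)))
print(covered, sorted(map(sorted, already_complete)))
```

Running the sketch on the conflicts above yields the six complete and four partial diagnoses listed in the text, with {3,7} and {4,7} appearing in both sets.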

Determine some conflicts: {2,7}, {3,4}
Calculate partial diagnoses: {2,3}, {2,4}, {3,7}, {4,7}
Find preferred partial diagnosis: {2,4}
Determine more conflicts: {6,7,8}
Calculate partial diagnoses: {2,4,6}, {2,4,8}
Find preferred partial diagnosis: {2,4,6}

Figure 4.2: Example of the sequential diagnosis process using partial diagnoses.

The concept of partial diagnoses can be utilized in the sequential diagnosis process using the following technique [Shc+16b]. An example of the process is shown in Figure 4.2. First, the algorithm searches for a set of conflicts in the given faulty system using MERGEXPLAIN or some other conflict detection technique that is, in the best case, able to efficiently determine multiple conflicts. In the example, it finds the conflicts {2,7} and {3,4}. The found conflicts are then used to determine a limited number, for example, 9, of partial diagnoses. In the example of Figure 4.2, however, only 4 partial diagnoses can be calculated from the found conflicts. The system uses these partial diagnoses to determine queries to pose to the user in the same way as the general sequential diagnosis approach does (see Section 4.1). The process of calculating the partial diagnoses, determining a query, and posing it to the user is repeated until only a single partial diagnosis remains, for example, {2,4}. This partial diagnosis is then called the preferred partial diagnosis and is known to be a subset of the true reason of the observed fault. The algorithm then continues to search for an additional set of conflicts with MERGEXPLAIN and repeats the process for these new conflicts. In the example, the new conflict {6,7,8} is found. The component 7, however, was already excluded by the answers to the previously asked queries and is thus ignored. Therefore, only 2 partial diagnoses can be calculated with the new conflict, and the system asks another query to find the preferred partial diagnosis among them. Since no more conflicts can be found in the next step, the preferred partial diagnosis determined this way is known to be a complete diagnosis and the true reason of the fault. In [Shc+16b], which is included in this thesis, the details of this technique are described and its correctness is proven.
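The loop described above can be simulated on the example of Figure 4.2. The sketch below rests on several simplifying assumptions that are not taken from [Shc+16b]: conflicts arrive in predefined batches instead of being computed by MERGEXPLAIN, a query is reduced to the hypothetical yes/no question "is component c faulty?", the queried component is chosen arbitrarily (the highest-numbered component that discriminates between the candidates), and partial diagnoses are computed by brute force. All function names are hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(conflicts, excluded=frozenset()):
    """Brute-force subset-minimal hitting sets that avoid excluded components."""
    components = sorted(set().union(*conflicts) - set(excluded))
    found = []
    for size in range(1, len(components) + 1):
        for candidate in map(frozenset, combinations(components, size)):
            if any(hit <= candidate for hit in found):
                continue  # a smaller hitting set is contained in the candidate
            if all(candidate & conflict for conflict in conflicts):
                found.append(candidate)
    return found


def sequential_diagnosis(conflict_batches, is_faulty):
    """Return (preferred diagnosis, number of queries) for the simulated session.

    conflict_batches stands in for repeated conflict searches;
    is_faulty(component) stands in for the user answering a query.
    """
    known, excluded, preferred, queries = [], set(), frozenset(), 0
    for batch in conflict_batches:
        known.extend(batch)
        # Partial diagnoses must extend the preferred partial diagnosis so far.
        candidates = [d for d in minimal_hitting_sets(known, excluded)
                      if preferred <= d]
        while len(candidates) > 1:
            # Components that occur in some, but not all, candidates.
            undecided = (set().union(*candidates)
                         - frozenset.intersection(*candidates))
            component = max(undecided)  # arbitrary placeholder for query selection
            answer = is_faulty(component)
            queries += 1
            if not answer:
                excluded.add(component)
            candidates = [d for d in candidates if (component in d) == answer]
        # Assumes the answers are consistent, so one candidate always remains.
        preferred = candidates[0]
    return preferred, queries


true_fault = {2, 4, 6}  # assumed true reason of the fault, as in Figure 4.2
batches = [
    [frozenset({2, 7}), frozenset({3, 4})],  # first conflict search
    [frozenset({6, 7, 8})],                  # second conflict search
]
result = sequential_diagnosis(batches, lambda c: c in true_fault)
print(sorted(result[0]), result[1])
```

With these assumptions the simulation first narrows the four partial diagnoses down to {2,4}, excluding component 7 along the way, and then needs one more query to decide between {2,4,6} and {2,4,8}. The real approach selects queries as described in Section 4.1, so the number of queries in practice depends on that selection strategy.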

Evaluation: To evaluate the new approach, we compared it to another technique that calculates diagnoses directly without using the concept of conflicts and that was shown to be efficient in [Shc+12]. The average reductions in computation time, number of queries, and number of queried statements contained in these queries are shown in Table 4.1 for the two tested types of systems.

Table 4.1: Average reductions of the computation time, number of queries, and number of queried statements of the new approach presented in [Shc+16b] compared to the technique presented in [Shc+12]. Values in parentheses show the reductions for systems that require more than a second to compute.

System type        Time         #Queries   #Statements
Digital circuits   61% (81%)    30%        1%
Ontologies         83% (88%)    4%         5%

The results show that using partial diagnoses significantly reduces the time required to calculate the queries. This reduction in time is even larger for those systems that require more than a second to compute (shown in parentheses in Table 4.1). For the most complex digital circuit, the technique proposed in [Shc+12] was not able to find the true reason of the fault within 24 hours, while the new approach needed about 40 minutes. Regarding the number of queries and queried statements required to find the true reason of the fault, using partial diagnoses resulted in about the same numbers as the compared approach, except for the number of queries for the digital circuits. For these systems, the new approach was able to reduce the number of required queries by 30%. This means that using partial diagnoses does not increase the effort required from the user.

5 Creating a Corpus of Faulty Spreadsheets

Most of the approaches for fault detection in spreadsheets are evaluated on real-world spreadsheets into which the researchers inserted faults manually or by randomly mutating the formulas [Jan+14a]. Although these evaluations are a good indicator that the tested approaches could theoretically help to locate faults in the spreadsheets, whether these approaches would work for spreadsheets with real faults cannot be evaluated with certainty based on these artificial faults.

To assess the quality of new approaches for fault detection in practice, spreadsheets are required that contain formula faults made by real users. An additional challenge is that although many real-world spreadsheets probably contain faults, it has to be known where these faults are in order to evaluate if the techniques for spreadsheet debugging are able to detect them. Therefore, we need to know which formulas are faulty and how they should be corrected.

5.1 Types of Spreadsheets Used in Research

In the research literature about fault detection in spreadsheets, three different types of spreadsheets with fault information are used to evaluate the efficiency or effectiveness of the approaches. Examples of these evaluations are given in [Jan+14a]. The different types of spreadsheets used in existing evaluations can be summarized as follows:

• Artificial spreadsheets with artificial faults: These spreadsheets were designed by the researchers in order to evaluate their new approach. Often, such spreadsheets are inspired by real-world spreadsheets, but are much simpler and did not evolve over time. In addition, as the faults were artificially inserted by the researchers, evaluations solely based on these spreadsheets can only serve as a first indicator for the quality of the approach.

• Artificial spreadsheets with real faults: Spreadsheets of this category are created in spreadsheet development experiments, see [Pan00] for examples. In these experiments the participants have to develop a spreadsheet to fulfill a given task. After the experiment, the experimenters can then check the created spreadsheets for faults as the expected behavior of the spreadsheets is well defined. Although the faults found this way are real, the spreadsheets themselves are artificial because they were only created for the experiment and it is not known how well the specified task fits the tasks encountered in practice.

• Real-world spreadsheets with artificial faults: Most of the approaches for fault detection in spreadsheets are evaluated on spreadsheets of this category. These spreadsheets were used in industry to solve real tasks and are thus good examples of the kinds of spreadsheets found in the real world. Although many of these spreadsheets probably contain faults, no information about the contained faults is available, as the intended semantics of a spreadsheet cannot be reconstructed with certainty. Therefore, researchers insert artificial faults into these spreadsheets in order to use them for their evaluations.

As none of these spreadsheet types is sufficient to fully evaluate how well new approaches work in the real world, spreadsheets of the fourth possible type are desirable.

• Real-world spreadsheets with real faults: The ideal spreadsheets to be used in an evaluation of a new fault detection approach are real-world spreadsheets for which the information about the contained real faults is available, i.e., the spreadsheets have faults made by real users and it is known which formulas are faulty and what the correct formulas should be. Since the spreadsheets of this category have been used to solve real tasks and their faults were made by real users, they represent good examples of faults that should be detected by all testing and fault localization techniques.
