4.4 Evaluation of SADGE Framework

4.4.1 The More Efficient and Effective Approach for Extracting Archi-

We conducted two experiments to determine which of the manual, automatic and semi-automatic approaches extracts architectural issues most efficiently. The results would indicate which of the automatic and semi-automatic approaches should be included in SADGE, and would also evaluate the efficiency and effectiveness of the automatic approach against the completely manual approach. In both experiments (with IT students and with expert architects), the participants were first asked to manually annotate a sample text from architectural documents. The outcome of this stage represents the manual approach. Prior to the experiment sessions, the sample text was annotated by the Automatic Annotator (AA); this output represents the automatic approach. In the second stage of both experiments, the sentences annotated by AA were given to the participants, who accepted or rejected each AA annotation. The outcome of this stage represents the semi-automatic approach. Details about the preparation, sessions, materials, participants and stages of the experiments are published in Paper 3 (P3) and Paper 4 (P4).

Table 4.1 shows the results of the experiment with 19 IT students. The lower processing time of the automatic approach compared to the manual and semi-automatic approaches is not surprising. However, we did not expect the automatic approach to yield the highest recall. The lower recall of the semi-automatic approach compared to the automatic approach reveals that the participants rejected some of the true positive results of the automatic part (the AA annotations). As reported in Paper 3, we speculated that the participants' expertise level was the reason.

Approach         Time (min)   Recall (%)
Manual           9            38
Automatic        0.03         86
Semi-automatic   3            55

Table 4.1: Results of experiment on IT students
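The relation between automatic and semi-automatic recall can be illustrated with a minimal sketch. It assumes that recall is the share of gold-standard issue sentences that end up annotated, and that participants in the second stage can only accept or reject AA annotations, not add new ones. The function names `recall` and `review` and all sentence ids are hypothetical and are not taken from the experiments.

```python
def recall(predicted, gold):
    """Fraction of gold-standard issue sentences that were annotated."""
    gold = set(gold)
    if not gold:
        raise ValueError("gold standard must not be empty")
    return len(set(predicted) & gold) / len(gold)


def review(aa_annotations, accepted):
    """Semi-automatic stage (assumed): keep only the AA annotations
    that the participant accepts."""
    return set(aa_annotations) & set(accepted)


# Hypothetical sentence ids, NOT data from the experiments.
gold = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}   # sentences that truly state an issue
aa = {1, 2, 3, 4, 5, 6, 7, 8, 20}        # AA output: 8 true positives, 1 false positive
accepted = {1, 2, 3, 4, 5, 20}           # participant rejects 6, 7 and 8

automatic_recall = recall(aa, gold)
semi_recall = recall(review(aa, accepted), gold)

print(f"automatic recall:      {automatic_recall:.2f}")
print(f"semi-automatic recall: {semi_recall:.2f}")
```

Under these assumptions the reviewed set is a subset of the AA annotations, so semi-automatic recall can at best equal automatic recall; every rejected true positive lowers it.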

After improving the framework, we conducted a new experiment to test the hypothesis that the participants' expertise level explains these rejections. This time the participants were 21 expert software/IT architects, and the materials were different from those of the first experiment. Table 4.2 shows the results.

Approach         Time (min)   Recall (%)
Manual           14           34
Automatic        0.03         53.5
Semi-automatic   7            32

Table 4.2: Results of experiment on expert architects

Although the numbers differ from the previous experiment, the same pattern occurs again: besides a much lower processing time, the automatic approach also shows a higher recall than the other two approaches. However, a statistical test is needed to determine whether this difference in recall is significant. To test data normality, the data was analyzed using the Shapiro-Wilk test. Since the data was shown to have a normal distribution, a two-sample Kolmogorov-Smirnov (K-S) test was applied to compare the recall of the automatic approach with that of the manual and semi-automatic approaches.

Comparison                       p-value
Automatic vs. Manual             0.001774
Automatic vs. Semi-automatic     0.0001581

Table 4.3: Results of K-S test
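For readers unfamiliar with the two-sample K-S test, the following is a minimal pure-Python sketch of how its statistic and an approximate p-value can be computed. It is not the analysis script used for Table 4.3: the recall samples are invented, the function names `ks_statistic` and `ks_pvalue` are hypothetical, and the p-value uses the asymptotic Kolmogorov series, which is only a rough approximation for small samples. The Shapiro-Wilk normality check is omitted here.

```python
import math


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value: 2 * sum (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here; the true value is essentially 1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return max(0.0, min(1.0, 2.0 * total))


# Hypothetical per-participant recall values, NOT data from the experiments.
automatic = [0.80, 0.90, 0.85, 0.95, 0.70]
manual = [0.30, 0.40, 0.35, 0.50, 0.20]

d = ks_statistic(automatic, manual)
p = ks_pvalue(d, len(automatic), len(manual))
print(f"D = {d:.3f}, approximate p = {p:.4f}")
```

In practice a statistics package would normally be used instead; for example, SciPy provides `scipy.stats.shapiro` and `scipy.stats.ks_2samp`, which also offer exact p-values for small samples.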

As Table 4.3 shows, the recall of the automatic approach is significantly higher than that of the manual and semi-automatic approaches at a significance level of α = 0.01. The significantly lower recall of the semi-automatic approach compared to the automatic approach in the experiment with expert architects rejects the assumption that the participants' expertise level was the reason for rejecting some of the correct suggestions of AA.

In the search for an explanation of the lower recall of the semi-automatic approach, we found that researchers in the psychology of decision-making offer an empirical one: "Several studies have shown that human decision makers are inferior to a prediction formula even when they are given the score suggested by the formula. They feel that they can overrule the formula because they have additional information about the case, but they are wrong more often than not" [Kah11]. Therefore, research in psychology suggests that "to maximize predictive accuracy, final decisions should be left to formulas, especially in low-validity environments" [Kah11]. Low-validity environments are domains that involve a substantial degree of uncertainty and unpredictability [Kah11]. The task of annotating architectural issues in a document involves a significant degree of uncertainty. Hence, it can be considered a low-validity context, and the lower recall of the semi-automatic approach compared to the automatic approach is not a surprising result.

To avoid any misunderstanding, we should note that the final decision here is not the architectural decision, but the decision whether or not to annotate a sentence as an architectural issue. SADGE is not an expert system that replaces humans with artificial intelligence for decision-making. Rather, as a support (recommender) system, it highlights recurring issues, thus empowering architects to make more comprehensive decisions.

In summary, considering the evaluation metrics of processing time and recall, the automatic approach has been shown to be more efficient than the semi-automatic approach for extracting architectural issues. In the initial version of the framework presented in Paper 3, we considered the manual fine-tuning stage a part of the framework. After the experiment with the experts, we conclude that this stage can be excluded. However, based on the results of the case study presented in Paper 4, we selected the hybrid method as the proper method for constructing the CoT. The hybrid method includes manual bootstrapping; therefore the framework can still be considered semi-automated, and its name, Semi-Automated Design Guidance Enhancer (SADGE), remains justified.

4.4.2 Efficiency and Effectiveness of SADGE in Extracting Architec-

In document A Rule-based Framework for Enhancing Architectural Decision Guidance (Page 60-63)