Chapter 5. Automatic Scenario Generation
5.3 Evaluation
The evaluation of the automatic scenario generation technique is based on four experiments. In the first experiment, we study the coverage the process was able to achieve, as well as the number of generated scenarios and their events, when generating test cases for a suite of web applications. In the second experiment, we compare the results of our method with the results obtained with a related method, Artemis [5]. In the third experiment, we compare the effectiveness of different prioritization functions in terms of achieved statement coverage. Finally, in the fourth experiment, we study how the process is able to generate feature scenarios for a case study application. All results were obtained with the Firecrow tool¹ (Chapter 8), which implements the algorithms described in this chapter. The code of all web applications, the generated scenarios, and other experiment results can be obtained from: www.fesb.hr/~jomaras/download/usageScenarioGenerator.zip.
5.3.1 Generating scenarios for the whole page
The first experiment was performed on a suite of web applications, most of them obtained from the 10k and 1k JavaScript challenges¹.
The goal of the first experiment was to compare the statement coverage achieved by simply loading the page with the coverage achieved by using our method. Table 5.1 shows the results. For each application it shows: the lines of code (LOC); the statement coverage achieved by simply loading the page (L-Cov); the maximum statement coverage the generated scenarios were able to achieve (A-Cov); the gain of our method over simply loading the page (Gain, the difference between A-Cov and L-Cov); the number of kept scenarios (S); and the total number of events in the kept scenarios (E). On average, the process achieved an additional 40 percentage points of statement coverage compared to loading the page. To cover these additional statements, our process generates, on average, 9 scenarios with 19 events.
¹ Firecrow: https://github.com/jomaras/Firecrow
¹ 10k and 1k JavaScript challenges: http://10k.aneventapart.com/ and http://js1k.com/
Table 5.1: Experiment results for analyzing 100 scenarios for the whole application. LOC (Lines of Code), Cov (statement Coverage), L (Loading), A (Achieved), S (Scenarios), E (Events)
Application       LOC     L-Cov   A-Cov   Gain    S    E
ajaxtabscontent   238     66%     91%     25%     7    7
angelJump         313     66%     71%     5%      2    11
minesweeper       177     60%     94%     34%     4    7
prism3D           402     61%     91%     30%     15   33
snake             211     58%     99%     41%     6    13
dynamicArticles   178     35%     85%     50%     3    6
fractalViewer     1630    36%     77%     41%     14   42
rentingAgency     297     41%     100%    59%     14   15
tinySlider        128     47%     76%     29%     4    5
snowpar           353     20%     99%     79%     24   32
floatwar          458     18%     67%     49%     7    39
Average           398.6   46%     86%     40.1%   9    19
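The Gain column and the averages in Table 5.1 can be recomputed from the per-application rows. The following is a minimal sketch of that arithmetic; it assumes that Gain is the difference between A-Cov and L-Cov in percentage points, and the names `ROWS`, `gain` and `mean` are introduced here for illustration only.

```python
# Rows of Table 5.1: (application, LOC, L-Cov %, A-Cov %, scenarios, events)
ROWS = [
    ("ajaxtabscontent", 238, 66, 91, 7, 7),
    ("angelJump", 313, 66, 71, 2, 11),
    ("minesweeper", 177, 60, 94, 4, 7),
    ("prism3D", 402, 61, 91, 15, 33),
    ("snake", 211, 58, 99, 6, 13),
    ("dynamicArticles", 178, 35, 85, 3, 6),
    ("fractalViewer", 1630, 36, 77, 14, 42),
    ("rentingAgency", 297, 41, 100, 14, 15),
    ("tinySlider", 128, 47, 76, 4, 5),
    ("snowpar", 353, 20, 99, 24, 32),
    ("floatwar", 458, 18, 67, 7, 39),
]


def gain(l_cov, a_cov):
    """Coverage gained over simply loading the page, in percentage points."""
    return a_cov - l_cov


def mean(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Per-application gain, keyed by application name.
gains = {row[0]: gain(row[2], row[3]) for row in ROWS}

avg_gain = mean(gains.values())               # average gain in percentage points
avg_scenarios = mean(row[4] for row in ROWS)  # average number of kept scenarios
avg_events = mean(row[5] for row in ROWS)     # average number of events

print(f"average gain: {avg_gain:.1f} percentage points")
print(f"average scenarios: {avg_scenarios:.1f}, average events: {avg_events:.1f}")
```

Working in percentage points rather than relative percentages keeps the numbers directly comparable with the L-Cov and A-Cov columns.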
5.3.2 Comparison with Artemis
For the second experiment, we compared our method with an existing method, Artemis [5], in terms of the maximum coverage the methods were able to achieve. We used the same web application suite as in the experiments described in [5], and we excluded the same libraries from the results (e.g. jQuery). The one exception is the AjaxPoll application, which we excluded because its functionality depends on the state of the server, a feature that our approach does not yet take into account.
Table 5.2 shows the experiment results. The gains achieved by our method range from 0 to 9 percentage points, with an average of 3.1. The gains are larger for applications in which input parameter values significantly influence the achieved coverage (e.g. if it is important which mouse button or keys were pressed), while they are smaller (or almost non-existent) in other types of applications. Our method achieves these improvements because it relies on symbolic execution, and is therefore able to generate values of event parameters that change the control flow of the application.
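The effect of event parameter values on coverage can be illustrated with a small sketch. It is not the Firecrow implementation: the handler `on_mouse_down`, the statement labels and the parameter domains are hypothetical, and a brute-force enumeration over a small domain stands in for the constraint solving that symbolic execution would perform on the branch conditions.

```python
from itertools import product


def on_mouse_down(event, covered):
    """Hypothetical handler; `covered` collects labels of executed statements."""
    covered.add("entry")
    if event["button"] == 2:       # secondary (usually right) mouse button
        covered.add("context_menu")
    elif event["shift_key"]:       # shift held while clicking
        covered.add("multi_select")
    else:
        covered.add("single_select")


ALL_STATEMENTS = {"entry", "context_menu", "multi_select", "single_select"}

# Assumed candidate values per event parameter. A symbolic executor would
# derive such values from the branch conditions; here they are enumerated.
DOMAINS = {"button": [0, 1, 2], "shift_key": [False, True]}


def coverage(events):
    """Fraction of statements executed when all given events are dispatched."""
    covered = set()
    for event in events:
        on_mouse_down(event, covered)
    return len(covered) / len(ALL_STATEMENTS)


def generate_events():
    """Keep only the parameter combinations that execute a new statement."""
    kept, covered = [], set()
    names = list(DOMAINS)
    for values in product(*(DOMAINS[name] for name in names)):
        event = dict(zip(names, values))
        before = len(covered)
        on_mouse_down(event, covered)
        if len(covered) > before:
            kept.append(event)
    return kept


# Coverage with a single default event versus parameter-aware events.
default_only = coverage([{"button": 0, "shift_key": False}])
generated = generate_events()
with_generated = coverage(generated)

print(f"default event only: {default_only:.0%}")
print(f"generated events ({len(generated)}): {with_generated:.0%}")
```

In this toy handler, a generator that always dispatches the same default event never reaches the branches guarded by the mouse button or the shift key, which mirrors the kind of application where the text reports larger gains.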
Table 5.2: Comparison of the experiment results between Artemis and our approach (Firecrow)
Application       Artemis   Firecrow   Gain
3dModeller        74%       82%        8%
ajaxtabscontent   89%       91%        2%
ball pool         90%       99%        9%
dragable-boxes    62%       63%        1%
dynamicArticles   82%       85%        3%
fractalViewer     75%       77%        2%
homeostasis       63%       63%        0%
htmlEdit          63%       63%        0%
pacman            44%       47%        3%
Average           71.3%     74.4%      3.1%
tests for a single application does not take more than 2 minutes on an average PC (but they do not present concrete numbers). In our case (Table 5.3), the time required to generate tests for a single application is, on average, 12 minutes. This happens because the backbone of their approach is the WebKit¹ browser engine, while we use our own JavaScript interpreter (Chapter 8), written in JavaScript, which is not optimized for speed. We do this because our approach uses the tracking of symbolic expressions and the dependency graph to generate scenarios, and this information is obtained with the custom JavaScript interpreter. However, the techniques and algorithms used in our approach could be adapted to WebKit, Firefox, or Chrome if these browsers supported the tracking of symbolic expressions and the generation of dependency graphs. This would considerably reduce the time required to generate web application scenarios, and is part of our future work.