We applied the approach described above to our Ruby on Rails implementation of the benchmark.
We now present and discuss the results of the SQLi benchmark, followed by those of the XSS benchmark.
7.3.1 Results of the SQLi benchmark
The SQLi benchmark was analysed with both Arachni and W3af. We will discuss these results per scanner, starting with Arachni.
Analysis with Arachni
The results of the analysis of the SQLi benchmark with Arachni are displayed in table 7.2. Arachni reported 55 potential vulnerabilities in total. We discuss these results per submodule, followed by a discussion of the scan run times.
           Normal SQLi       Blind SQLi diff   Blind SQLi time   Combined
           Total   TP   FP   Total   TP   FP   Total   TP   FP   Total   TP   FP       Time
Injection      2    2    0       0    0    0       0    0    0       2    2    0      00:14
Create         0    0    0       0    0    0       0    0    0       0    0    0      13:26
Read           0    0    0       0    0    0       1    0    1       1    0    1   01:23:32
Update         0    0    0      52    0   52       0    0    0      52    0   52      23:30
Delete         0    0    0       0    0    0       0    0    0       0    0    0      18:17
Total          2    2    0      52    0   52       1    0    1      55    2   53    2:18:59
Table 7.2: Experimental results of the Ruby on Rails SQLi benchmark analysis using Arachni. We list the vulnerabilities found for each benchmark submodule per auditor. Total indicates the number of vulnerabilities reported by that auditor. TP is the number of true positives (actual vulnerabilities), while FP is the number of false positives (wrong results). Time is the total running time of the complete scan for that submodule (in hh:mm:ss; entries with only two fields are mm:ss).
Injection tests
Arachni found two vulnerabilities in the injection tests module. Both are actual vulnerabilities, i.e. true positives. They were to be expected, since the injection module acts as our self-sanity check, as explained in benchmark design choice 9. This module contains two tests and Arachni indeed found two vulnerabilities, so everything seems to function correctly.
Read tests
Only one vulnerability was reported in the read tests module, by the auditor for blind SQLi using timing attacks. We classified it as a false positive. A first indication is that the normal SQL injection auditor did not find a vulnerability: the benchmark application has full error reporting enabled, so the normal SQLi auditor should be able to detect every SQL injection. The blind auditors are only enabled as a safeguard, as discussed in section 6.3.3. To be completely sure, we manually tried to reproduce the vulnerability. This confirmed that the result is indeed a false positive. It may have been caused by server lag: a slow server response can lead Arachni to believe that its timing attack succeeded.
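How server lag can produce such a false positive can be illustrated with a minimal Python sketch. It is not Arachni's actual implementation: the function names, the tolerance and all sample timings are hypothetical and chosen only for illustration. The first function decides on a single delayed response, the second requires that every repeated probe shows the delay.

```python
from statistics import mean


def looks_time_injectable(baseline_s, attack_s, injected_delay_s, tolerance_s=0.5):
    """Naive check (hypothetical): flag the input if one attack response is
    slower than the average baseline by roughly the injected delay."""
    return attack_s - mean(baseline_s) >= injected_delay_s - tolerance_s


def confirmed_by_repeats(baseline_s, attack_samples_s, injected_delay_s, tolerance_s=0.5):
    """Stricter check (hypothetical): every repeated probe must show the delay."""
    threshold = mean(baseline_s) + injected_delay_s - tolerance_s
    return all(sample >= threshold for sample in attack_samples_s)


# Made-up timings in seconds: the payload asks the database to sleep for 4 s.
baseline = [0.21, 0.19, 0.20]
lagging_probe = 4.3                 # slow only because the server happened to lag
repeated_probes = [4.3, 0.22, 0.20]  # the delay does not reproduce

naive_verdict = looks_time_injectable(baseline, lagging_probe, 4.0)
strict_verdict = confirmed_by_repeats(baseline, repeated_probes, 4.0)
print(naive_verdict, strict_verdict)  # True False
```

In this sketch a single slow response is enough to trigger the naive check, while repeating the probe exposes the lag. This mirrors the manual verification step described above, in which the reported vulnerability could not be reproduced.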
Update tests
The scan reported 52 vulnerabilities in the update tests module. All of them were detected by the auditor for blind SQLi using differential analysis, and all were categorised as false positives. Again, a first indication is that the normal SQLi auditor did not find any vulnerabilities. Furthermore, our manual verification showed that these results are indeed false positives. With differential analysis, Arachni compares the responses to two attacks and reports a potential vulnerability if they differ too much. However, responses can differ for many other reasons, so a difference is not necessarily caused by a real vulnerability. The differential analysis was likely too sensitive, treating two responses as different even if they differ only slightly [2].
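The effect of an overly sensitive comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Arachni's algorithm: the similarity measure (the ratio from the standard library's difflib.SequenceMatcher), the threshold values and the two example responses are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(response_a, response_b):
    """Similarity of two response bodies, between 0.0 and 1.0."""
    return SequenceMatcher(None, response_a, response_b).ratio()


def flags_vulnerability(response_a, response_b, min_similarity):
    """Hypothetical rule: report a potential vulnerability when the two
    responses are less similar than the configured minimum."""
    return similarity(response_a, response_b) < min_similarity


# Two made-up responses to different payloads. They differ only in a
# timestamp, which has nothing to do with SQL injection.
page_a = "<p>Record updated at 12:00:01</p>"
page_b = "<p>Record updated at 12:00:02</p>"

too_sensitive = flags_vulnerability(page_a, page_b, min_similarity=1.0)
more_tolerant = flags_vulnerability(page_a, page_b, min_similarity=0.9)
print(too_sensitive, more_tolerant)  # True False
```

With a threshold that demands identical responses, any dynamic page content leads to a report. A more tolerant threshold ignores such small differences, at the risk of missing real vulnerabilities whose effect on the response is equally small.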
Run times
As for the scan run times, the read module takes by far the longest to execute. This is expected, since it is the largest module, i.e. it contains the most tests. The create, update and delete modules each take roughly 10 to 25 minutes, which is reasonable. The injection module is extremely fast, which makes sense since it contains only two simple tests. The total execution time for the complete benchmark is below two and a half hours, which we think is reasonable considering the old hardware on which the analysis was performed. Note that these running times were obtained from a single run and are therefore not very reliable. This is not an issue, since they are not meant for comparison but only to give an indication of how long the analysis takes.
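As a quick arithmetic check, the run times in table 7.2 can be summed with a short Python sketch. It assumes that entries with only two fields are minutes and seconds; the helper name parse_duration is hypothetical. Under that assumption the sum reproduces the total listed in the table.

```python
from datetime import timedelta


def parse_duration(text):
    """Parse 'hh:mm:ss' or 'mm:ss' (assumed reading) into a timedelta."""
    parts = [int(part) for part in text.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing leading fields, e.g. the hours
    hours, minutes, seconds = parts
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Run times per submodule as listed in table 7.2 (Arachni).
arachni_times = {
    "Injection": "00:14",
    "Create": "13:26",
    "Read": "01:23:32",
    "Update": "23:30",
    "Delete": "18:17",
}

total = sum((parse_duration(t) for t in arachni_times.values()), timedelta())
read_share = parse_duration(arachni_times["Read"]) / total

print(total)                # 2:18:59
print(f"{read_share:.0%}")  # 60%
```

The read module thus accounts for roughly 60% of the total scan time in this single run.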
[2] Unfortunately, Arachni offers no option to configure the sensitivity, so we could not set it to a more sensible value.
Analysis with W3af
The results of the analysis of the SQLi benchmark with W3af are shown in table 7.3. As can be seen from this table, W3af found only nine vulnerabilities in total, of which two were anticipated true positives and seven were unexpected false positives. Again we discuss the results per submodule.
           Normal SQLi       Blind SQLi diff   Blind SQLi time   Combined
           Total   TP   FP   Total   TP   FP   Total   TP   FP   Total   TP   FP       Time
Injection      2    2    0       0    0    0       0    0    0       2    2    0      00:13
Create         0    0    0       0    0    0       0    0    0       0    0    0      11:27
Read           0    0    0       0    0    0       0    0    0       0    0    0      44:58
Update         0    0    0       7    0    7       0    0    0       7    0    7      29:24
Delete         0    0    0       0    0    0       0    0    0       0    0    0      09:35
Total          2    2    0       7    0    7       0    0    0       9    2    7   01:35:37
Table 7.3: Experimental results of the Ruby on Rails SQLi benchmark analysis using W3af.
Injection tests
W3af found two true positives in the injection tests module. Again, this was expected and indicates the benchmark and scanner are working correctly.
Update tests
According to W3af, the update tests module contains seven vulnerabilities; however, all were false positives. They are similar to the ones Arachni found and likely have the same cause, namely that the differential analysis was too sensitive. However, W3af reported far fewer false positives (7 against 52), which may indicate that it employs a slightly less thorough analysis than Arachni.
Run times
As for the run times, the read module takes the longest, followed by the update module. This is expected, since these two modules are the largest. The create and delete modules each take only about ten minutes, while the injection module is again extremely fast. The total execution time is slightly above an hour and a half, which we think is reasonable and is even shorter than Arachni's.
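The Combined columns of tables 7.2 and 7.3 can be summarised as a precision per scanner, i.e. the fraction of reported findings that are true positives. The Python sketch below is only an illustration: precision is not a metric used elsewhere in this analysis, and the helper name is hypothetical.

```python
def precision(true_positives, false_positives):
    """Fraction of reported findings that are real, or None without findings."""
    reported = true_positives + false_positives
    return true_positives / reported if reported else None


# (TP, FP) from the Combined columns of tables 7.2 and 7.3.
combined = {"Arachni": (2, 53), "W3af": (2, 7)}

for scanner, (tp, fp) in combined.items():
    print(f"{scanner}: {tp} of {tp + fp} findings confirmed, "
          f"precision {precision(tp, fp):.3f}")
# Arachni: 2 of 55 findings confirmed, precision 0.036
# W3af: 2 of 9 findings confirmed, precision 0.222
```

Since all true positives stem from the injection module, which is the self-sanity check, these numbers mainly reflect how many false alarms each scanner raised rather than how well it detects vulnerabilities.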
Analysis of Rails 3.2
At the time we started implementing the benchmark in Rails, the current version (Rails 4.0) had not been released yet. We therefore first implemented the SQLi benchmark in Rails 3.2 and only later decided to switch to Rails 4. As a result, we also have an almost complete implementation of the SQLi benchmark in Rails 3.2. We analysed this version of the benchmark as well, using the newest versions of the software available to us at that time, so the analysis environment was slightly different. We used Rails 3.2.14 in combination with Arel 3.0.2 and PG 0.17.0, PostgreSQL 9.1.9, Arachni 0.4.5.2 and W3af 1.6 (revision 3ef1aa4e9e). The results of the analysis are similar: the expected two true positives in the injection module and a number of false positives, but no real vulnerabilities. We mention these results only briefly, without discussing them in detail, because the main focus of this research is on version 4.0 of the Rails framework.
7.3.2 Results of the XSS benchmark
As discussed in section 6.3.2, the XSS benchmark was only analysed with Arachni. The results of this analysis are displayed in table 7.4. As can be seen in this table, a considerable number of vulnerabilities were found, especially considering that the XSS benchmark implementation is far from complete. We discuss the vulnerabilities per submodule.
Injection tests
Two of the vulnerabilities were caused by the injection tests module. This is the self sanity check and