4.1.5 Analyzing and Patching Open Source Applications:
We applied our analysis to three open source PHP web applications: (1) Webchess 0.9.0 (a server for playing chess over the internet), (2) EVE 1.0 (a player activity tracker for an online game), and (3) Faqforge 1.3.2 (a document management tool). The sizes of these applications are shown in Table 4.4. The applications were downloaded from SourceForge and analyzed directly, without any manual modification.
   Application      # of PHP files   Total LOC   # of sanitizers
                                                 XSS     SQLI
1  Webchess 0.9.0   23               3375        421     140
2  EVE 1.0          8                906         114     17
3  Faqforge 1.3.2   10               534         375     133
Table 4.4: The sizes of analyzed applications.
Table 4.5 and Table 4.6 summarize the results of our XSS and SQLI vulnerability analyses, respectively, together with the performance of signature generation. We omit pre-image computation results for multi-input sanitizers, since we do not generate vulnerability signatures for vulnerabilities in such sanitizers. We discovered 55 XSS and 61 SQLI vulnerabilities in these applications. The tuple (single, two, three) gives the number of detected vulnerabilities that have one, two, and three inputs, respectively. For example, all detected XSS vulnerabilities in Faqforge have a single input, denoted (20, 0, 0). That is, all sanitizers extracted from this application are single-input sanitizers.
   # of Vul.              Time (seconds)                  Memory (Kb)
   (single, two, three)   total    forward   backward     average
1  (24, 3, 0)             39.78    1.73      0.92         16850
2  (0, 0, 8)              160.7    6.80      −            125382
3  (20, 0, 0)             7.87     0.22      0.22         9948
Table 4.5: XSS vulnerability analysis results.
   # of Vul.              Time (seconds)                  Memory (Kb)
   (single, 2, 3, 4)      total    forward   backward     average
1  (43, 3, 1, 2)          72.67    4.87      12.039       136790
2  (8, 3, 0, 0)           18.7     1.5       8.47         17280
3  (0, 0, 0, 0)           6.7      −         −            < 1
Table 4.6: SQLI vulnerability analysis results.
As shown in Table 4.5 and Table 4.6, the analysis cost is affordable. The total time is the time to analyze all PHP files in an application from start to end, including extraction time and policy-based repair time (vulnerability analysis and vulnerability signature generation for single-input sanitizers); it ranges from about 7 seconds to 161 seconds. The forward time is the total time to detect vulnerabilities in all extracted sanitizers, including post-image computation using forward analysis (Algorithm 1) and intersection with the attack pattern. The backward time is the total time to generate vulnerability signatures for all detected vulnerabilities in single-input sanitizers, i.e., the time to compute the pre-image using backward analysis (Algorithm 2).
Mincut performance: The average time spent generating the alphabet-cut from the vulnerability signature automata for XSS (SQLI) was 0.05 (0) seconds per automaton for Faqforge and 0.06 (0.07) seconds per automaton for Webchess (we ignore EVE since it only contains multi-input sanitizers).
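To make the notion of an alphabet-cut concrete, the following is a minimal Python sketch. It assumes that an alphabet-cut is a set of characters whose transitions, once removed from the signature automaton, leave no path from the start state to an accepting state. The brute-force search, the function names, and the toy automaton are illustrative assumptions and not the mincut algorithm used in our implementation.

```python
from itertools import combinations

def accepting_reachable(transitions, start, accepting, removed):
    """Return True if an accepting state is reachable without the removed symbols."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        if state in accepting:
            return True
        for (src, symbol), dst in transitions.items():
            if src == state and symbol not in removed and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return False

def smallest_alphabet_cut(transitions, start, accepting):
    """Brute-force search for a smallest symbol set that disconnects start from accept."""
    alphabet = sorted({symbol for (_, symbol) in transitions})
    for size in range(len(alphabet) + 1):
        for cut in combinations(alphabet, size):
            if not accepting_reachable(transitions, start, accepting, set(cut)):
                return set(cut)
    return None  # only when the start state itself is accepting

# Toy signature automaton (hypothetical): accepts every string containing "<".
# Keys are (state, symbol) pairs; values are the next state.
toy_signature = {
    (0, "a"): 0, (0, "="): 0, (0, "<"): 1,
    (1, "a"): 1, (1, "="): 1, (1, "<"): 1,
}

cut = smallest_alphabet_cut(toy_signature, start=0, accepting={1})
```

For this toy automaton the search returns the single-character cut containing only <. The exhaustive search is exponential in the alphabet size and is meant only to illustrate the definition.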
All of the generated alphabet-cuts contain only a single character per input. For each XSS single-track automaton, the cut is only the character <, which is the optimum cut (and consequently the optimum sanitization with respect to the attack pattern). The automatically generated sanitization (replace) statements from our analysis were almost the same as the manually written ones, except that they delete the < character instead of replacing it with the HTML entity &lt;, as is typically done in manual sanitization. Likewise, for each SQLI single-track automaton, the cut is only the character =, which is the optimum cut (and consequently the optimum sanitization with respect to the attack pattern).
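The difference between the generated and the manual sanitization can be illustrated with a short Python sketch. The two function names are hypothetical and stand in for the generated PHP replace statement and a typical hand-written one.

```python
def sanitize_by_deletion(value: str) -> str:
    """Mimics the generated replace statement: drop every '<' character."""
    return value.replace("<", "")

def sanitize_by_entity(value: str) -> str:
    """Mimics typical manual sanitization: encode '<' as the HTML entity."""
    return value.replace("<", "&lt;")

payload = "<script>alert(1)</script>"
deleted = sanitize_by_deletion(payload)
encoded = sanitize_by_entity(payload)
```

Neither result contains a raw < character, so both satisfy an attack pattern that is blocked by the single-character cut. The entity form additionally preserves the original text when rendered in a browser.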
Match performance: We compared the overhead of running the generated match code, which simulates one of the vulnerability signature automata from Webchess, against a manually written PHP preg_match that performs the same task. Both preg_match and our stranger_match are written as C extensions to PHP and called from a PHP script on the same input. We ran this code on 10 sets of randomly generated strings, each containing 1000 strings of the same length. The lengths start at 100 characters per string for the first set and increase by 100 characters for each new set, up to 1000 characters per string for the last set. The results are shown in Figure 4.5. The automatically generated match causes no extra overhead compared to the manually written one: matching a 1000-character string against the vulnerability signature automaton takes less than 0.35 milliseconds.

Figure 4.5: Input matching overhead using stranger_match to simulate the vulnerability signature DFA.

How to use our analysis result: Our analysis produces two artifacts: a PHP extension that contains a number of patch functions FP where each function contains a stranger_match_*
We generate one patch function FP for each extracted sanitizer. In PHP, user input from sources such as $_GET and $_POST is always available at the first program point in the script. This means that if we want to sanitize the inputs, we need to do so at the first PHP line of the target script. Inserting calls to these patch functions can easily be automated, since the parsing phase gives us the file name for each input variable along with the variable's name. Note that we analyze PHP scripts statically in a sound manner, dealing with one script at a time along with all the files it includes.
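The placement of a patch call can be sketched as follows. This is Python standing in for PHP, and it rests on assumptions: that a patch function first matches the input against the vulnerability signature and, on a match, deletes the characters in the alphabet-cut. The names patch_input and handle_request, the regular expression used in place of the signature automaton, and the dictionary used in place of $_GET are all hypothetical.

```python
import re

# Stand-in for a vulnerability signature automaton: any value containing "<".
SIGNATURE = re.compile(r"<")
# Stand-in for the alphabet-cut computed for that signature.
CUT_CHARS = {"<"}

def patch_input(value: str) -> str:
    """Return the value unchanged unless it matches the signature."""
    if SIGNATURE.search(value):
        return "".join(ch for ch in value if ch not in CUT_CHARS)
    return value

def handle_request(get_params: dict) -> str:
    """Stand-in for a target script; get_params plays the role of $_GET."""
    # First statement of the script: patch every input before any other use.
    get_params = {name: patch_input(value) for name, value in get_params.items()}
    return "Hello, " + get_params.get("name", "")

page = handle_request({"name": "<b>guest"})
```

Because the patch runs before any other statement, every later use of the input in the script sees only the sanitized value.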
We used the result of our analysis to sanitize the three applications above by placing the automatically generated sanitization statements at the beginning of each vulnerable script. We then reran our forward vulnerability analysis, which reported zero vulnerabilities with respect to the attack pattern mentioned above.
    sanitizer isValidEmail(x) {
      x = trim(x);
      if (!(x matches
          /^[a-z0-9!#$%&'*+/=?^_`{|}~-]+
          (?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@
          (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+
          [a-z0-9](?:[a-z0-9-]*[a-z0-9])$/))
      {
        reject;
      }
      return x;
    }

Figure 4.6: An over-constrained validator that corresponds to the JavaScript function in Figure 1.7.
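The validator in Figure 4.6 can be approximated in Python to experiment with the inputs it accepts. This is a sketch under assumptions: trim is mapped to str.strip, the multi-line pattern is concatenated without whitespace, and reject is modeled as returning None. The name is_valid_email is hypothetical.

```python
import re

# The pattern from Figure 4.6, joined into a single expression.
EMAIL = re.compile(
    r"^[a-z0-9!#$%&'*+/=?^_`{|}~-]+"
    r"(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@"
    r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+"
    r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])$"
)

def is_valid_email(x: str):
    """Return the trimmed input if it matches the pattern, else None (reject)."""
    x = x.strip()
    if not EMAIL.search(x):
        return None
    return x

accepted = is_valid_email("  user@example.com ")
rejected_short_label = is_valid_email("a@b.c")
rejected_uppercase = is_valid_email("User@example.com")
```

Under this translation, the first input is accepted after trimming, while the other two are rejected: the final domain label must have at least two characters, and the pattern admits only lowercase letters.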