COMPARISON OF SQL INJECTION
DETECTION TECHNIQUES WHICH
USE THE CHI-SQUARE TEST
KHWAIRAKPAM AMITAB
Department of Computer Science and Engineering, PEC University of Technology, Chandigarh-160012, India
PADMAVATI
Department of Computer Science and Engineering, PEC University of Technology, Chandigarh-160012, India
Abstract:
Database-driven interactive web applications are at risk of SQL injection attacks (SQLIA) because they accept user input and use it to form SQL statements. During an SQL injection, the attacker supplies malicious SQL query segments that result in a different database request. SQLIA can be used to bypass authentication controls and to extract or modify valuable information. To counter such threats, researchers have proposed various techniques, but most of the implemented approaches that use an anomaly detection model have a very high false alert rate. In this paper we analyze existing detection techniques that use the chi-square test and evaluate them against SQLIA and normal requests.
Keywords: SQL injection attack; anomaly detection; chi-square test; false positive; true positive.
1. Introduction
The rapid growth of internet technology has led to a wide range of web-based services, such as financial transactions and social networking. These applications and their underlying databases often store confidential and sensitive data. Because such web applications usually accept data from users and use it to access the back-end database, they may be exposed to SQL injection attacks. According to the Open Web Application Security Project (2010), SQL injection is the most popular code injection technique used in system hacking or cracking to gain information or unauthorized access to a system [1].
SQLIA is a code injection technique in which the attacker attempts to change the logic or semantics of a legitimate SQL statement by inserting new SQL keywords or operators into it. These keywords are inserted through a user input field that has not been validated. The injection works by inserting SQL code that forms a syntactically correct SQL command when concatenated with the dynamic SQL command. Once the attacker successfully injects the attack into the database, the data may be altered, extracted or even dropped.
Consider an example query of the kind commonly used in an authentication form:
SELECT * FROM user WHERE name='+UName+' and password='+PWord+';
The values of “UName” and “PWord” are the values obtained from the username and password text boxes of the authentication form. The intent is to check whether the user table contains a matching username and password. If any rows are returned, the user is authenticated. However, if the web programmer uses this method and takes input from the form without properly checking it, an attacker may insert malicious code. For example, the attacker can specify a valid user name such as “admin” and then specify the password as “' OR '1'='1” in the form. The final SQL query that uses these values will be:
SELECT * FROM user WHERE name='admin' and password='' OR '1'='1';
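Because AND binds more tightly than OR and the condition '1'='1' is always true, the WHERE clause holds for every row, so the attacker is authenticated without knowing the password. There are different types of SQL injection attack; each type of SQLIA has a different way of inserting its manipulated statement, depending on the goal of the attacker. For details on the types of SQLIA, please refer to [2-6].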
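The authentication bypass described above can be reproduced with a minimal Python sketch. It uses the standard sqlite3 module and an in-memory table; the table contents and the helper names (build_login_query, login_rows, make_demo_db) are assumptions made only for illustration and are not part of the application evaluated in this paper.

```python
import sqlite3


def build_login_query(uname: str, pword: str) -> str:
    # Vulnerable on purpose: user input is concatenated straight into the SQL text.
    return ("SELECT * FROM user WHERE name='" + uname +
            "' AND password='" + pword + "';")


def login_rows(conn: sqlite3.Connection, uname: str, pword: str) -> list:
    # Any returned row would be treated as a successful login.
    return conn.execute(build_login_query(uname, pword)).fetchall()


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical in-memory table holding a single account.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (name TEXT, password TEXT)")
    conn.execute("INSERT INTO user VALUES ('admin', 's3cret')")
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(build_login_query("admin", "' OR '1'='1"))
    print(login_rows(conn, "admin", "wrong"))        # wrong password: no rows
    print(login_rows(conn, "admin", "' OR '1'='1"))  # tautology: row returned
```

With the injected password the query text ends in password='' OR '1'='1', so the row for “admin” is returned even though the supplied password is wrong.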
The problem of SQL injection lies in the vulnerable web application, not in the web server or the services running in the operating system. The common points of vulnerability are user input text boxes and the uniform resource locator (URL). This vulnerability can allow the attacker to gain complete access to the underlying database. To overcome this problem, the developer has to consider a range of security measures; in practice, however, this approach depends on humans and is prone to error. To address this, researchers have developed various detection [7-9] and prevention techniques [10-13]. One of the most promising approaches is anomaly detection, whose basic idea is that an attack pattern differs from normal behavior. However, most of these approaches have limitations and are not able to detect all types of SQLIA.
The aim of this paper is to introduce detection techniques that use the chi-square statistic and to comparatively evaluate them against SQLIA. The chi-square statistic [10] is a nonparametric statistical technique used to determine whether a distribution of observed frequencies differs from the theoretically expected frequencies.
The paper is organized as follows. In Section 2 we review existing SQLIA detection techniques. In Section 3 we evaluate these detection techniques against SQLIA types. Conclusions and future work are presented in Section 4.
2. SQL Injection Detection Techniques
The concept of using the chi-square statistic to detect SQLIA was introduced by Kruegel and Vigna [7], who proposed six different models to detect web-based attacks. One of these is the attribute character distribution model, which captures the concept of a 'normal' or 'regular' query parameter by looking at its character distribution. The character distribution of an attribute that is perfectly normal is termed the attribute's Idealized Character Distribution (ICD). The ICD is determined during the training phase: for each observed query attribute, its character distribution is stored, and the ICD is then approximated by averaging all stored character distributions. The task of the detection phase is to determine the probability that the character distribution of a query attribute is an actual sample drawn from its ICD. The characters are grouped into six groups and tested for anomaly using the chi-square ($\chi^2$) test [14,15].
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \qquad (1)$$
where $n$ denotes the number of groups, and $O_i$ and $E_i$ denote the observed frequency and the expected frequency of group $i$, respectively. The authors of that work observed that the character distribution model did not prove to be very useful in the detection of input validation attacks, the category of attacks under which SQL injection falls. We believe this limitation of the character distribution model arises from the inappropriate groupings formed by the model.
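As a rough illustration of Eqn. 1 and of the training and detection phases described above, the following Python sketch computes an ICD and an anomaly score. It assumes that a character distribution is the list of relative character frequencies sorted in descending order, and that the six groups cover ranks 0, 1-3, 4-6, 7-11, 12-15 and 16-255; neither detail is stated in this paper, and all function names are hypothetical.

```python
from collections import Counter

# Assumed boundaries of the six groups over the sorted character ranks.
BINS = [(0, 0), (1, 3), (4, 6), (7, 11), (12, 15), (16, 255)]


def sorted_counts(value: str) -> list:
    """Byte counts of `value`, sorted in descending order and padded to 256."""
    counts = sorted(Counter(value.encode("utf-8")).values(), reverse=True)
    return counts + [0] * (256 - len(counts))


def train_icd(samples: list) -> list:
    """Average the sorted relative frequencies of all non-empty training values."""
    samples = [s for s in samples if s]
    icd = [0.0] * 256
    for value in samples:
        counts = sorted_counts(value)
        total = sum(counts)
        for rank, count in enumerate(counts):
            icd[rank] += count / total / len(samples)
    return icd


def bin_sums(values: list) -> list:
    """Sum the per-rank values inside each of the six groups."""
    return [sum(values[lo:hi + 1]) for lo, hi in BINS]


def chi_square(observed: list, expected: list) -> float:
    """Eqn. 1. Groups with zero expected frequency are skipped (a simplification)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def icd_score(value: str, icd: list) -> float:
    """Chi-square score of one attribute value against the trained ICD."""
    observed = bin_sums(sorted_counts(value))
    length = len(value.encode("utf-8"))
    expected = [p * length for p in bin_sums(icd)]
    return chi_square(observed, expected)
```

A larger score indicates a larger deviation from the trained distribution; how the score is mapped to a probability or threshold is left open here.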
A similar character-based approach was developed by Mehdi Kiani et al. [16]. Their algorithm, named the same character comparison (SCC) model, intercepts an HTTP request, extracts the query section and calculates the frequency of each character in the query. Once all the queries have been processed, the cumulative character counts are calculated and placed into groups; regrouping is also performed so that the frequency of each group is greater than or equal to five. In the testing phase, the observed and expected frequencies ($O_i$ and $E_i$) are calculated in exactly the same manner as by Kruegel and Vigna. If the anomaly score is greater than a particular threshold, an alert is triggered. Their experiments focus mainly on UNION and tautology attacks; the results show that the SCC model is extremely effective in detecting UNION attacks, while the results for tautology attacks were not as good. They also compared their approach with ICD and showed that SCC is superior to ICD.
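The cumulative counting and regrouping steps can be sketched as follows. This is only an illustration under stated assumptions: the initial grouping used by SCC is not described here, so the sketch starts from one group per entry and merges adjacent groups until each reaches the minimum frequency of five; the function names are hypothetical.

```python
from collections import Counter


def cumulative_counts(queries: list) -> Counter:
    """Total count of every character over all training queries."""
    total = Counter()
    for query in queries:
        total.update(query)
    return total


def regroup(frequencies: list, minimum: float = 5.0) -> list:
    """Merge adjacent groups until every merged group sums to at least `minimum`.

    Returns a list of index lists; a too-small tail is folded into the last group.
    """
    groups, current, running = [], [], 0.0
    for index, freq in enumerate(frequencies):
        current.append(index)
        running += freq
        if running >= minimum:
            groups.append(current)
            current, running = [], 0.0
    if current:
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups
```

For example, the frequencies 6, 2, 2, 3, 1 are regrouped into two groups with sums 6 and 8, so that the usual minimum-frequency condition of the chi-square test holds for both.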
Another approach, proposed by Rajagopal [17], is the attribute character distribution (ACD). During the learning phase, it counts the number of alphabetic, numeric and special-symbol characters. These counts are binned into three groups, one each for alphabetic, numeric and special-symbol characters. At the end of the learning phase the system calculates the average of all the observed values, which is termed the ACD. During the detection phase, the algorithm determines the ACD for the user-supplied input and normalizes the result in order to obtain $O_i$. Similarly
moreover, the sample size is doubled, which gives a better result when using a statistical method. To determine the HCD, each character is converted into its hexadecimal code, the codes are grouped into four predefined groups, and the relative frequency of each group is determined. During the training phase, HCD is applied to the collected data to derive $E_i$, which represents normal inputs; in the testing phase, $O_i$ is calculated from the user-supplied input and $E_i$ is multiplied by the length of that input. The value of $\chi^2$ is then computed using Eqn. 1 and compared with the standard chi-square look-up table at three degrees of freedom. If the calculated $\chi^2$ is less than or equal to the table value, the user input is considered normal; otherwise it is considered anomalous. The experimental results show that HCD is able to detect common types of SQLIA and is superior to ACD.
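The training, scaling and table look-up steps can be illustrated with a short Python sketch. Because the four HCD groups are not specified here, the sketch uses the three ACD character classes (alphabetic, numeric, special) and scales the trained relative frequencies by the input length. The 5% significance level, the treatment of groups never seen in training, and all function names are assumptions. With three groups the test has two degrees of freedom; the four HCD groups give the three degrees of freedom mentioned above.

```python
# Upper 5% critical values of the chi-square distribution (standard table values).
CRITICAL_95 = {1: 3.841, 2: 5.991, 3: 7.815}


def class_counts(text: str) -> list:
    """Counts of alphabetic, numeric and special characters in `text`."""
    alpha = sum(ch.isalpha() for ch in text)
    digit = sum(ch.isdigit() for ch in text)
    return [alpha, digit, len(text) - alpha - digit]


def train_profile(samples: list) -> list:
    """Average relative frequency of each class over the non-empty training samples."""
    samples = [s for s in samples if s]
    profile = [0.0, 0.0, 0.0]
    for sample in samples:
        for i, count in enumerate(class_counts(sample)):
            profile[i] += count / len(sample) / len(samples)
    return profile


def is_anomalous(text: str, profile: list) -> bool:
    """Compare the chi-square value of `text` with the table value."""
    observed = class_counts(text)
    expected = [p * len(text) for p in profile]  # scale by input length
    chi2 = 0.0
    for o, e in zip(observed, expected):
        if e > 0:
            chi2 += (o - e) ** 2 / e
        elif o > 0:
            return True  # assumption: a class never seen in training is anomalous
    return chi2 > CRITICAL_95[len(profile) - 1]
```

As the evaluation below also notes, the outcome depends strongly on the training data: a profile trained only on alphanumeric strings flags every input containing a special character.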
3. Experiment And Result Analysis
In this section we comparatively evaluate the techniques presented in the previous section. To carry out the experiment, we developed a web application using ASP.NET and integrated the detection techniques under test, implemented in C#. A large number of user names and passwords were collected from the internet [18-21]. The same data are used for all the techniques to determine the expected frequencies. We observed that the results of all the techniques discussed in this paper depend on the training data used. The observed frequencies are derived during the testing phase from the supplied inputs. ICD is excluded from the comparison because SCC and ACD are superior to it. As all the techniques are based on the anomaly detection model, we evaluate the results in terms of the false positive and true positive rates.
Table 1. Sample of data used in the determination of the true positive rate.

| Sl. No. | User Name | Password | HCD | ACD | SCC |
|---|---|---|---|---|---|
| 1 | admin | 1 OR 1=1 | blocked | blocked | access |
| 2 | admin | 1' OR '1'='1 | blocked | blocked | access |
| 3 | 1' OR '1'='1 | 1' OR '1'='1 | blocked | blocked | blocked |
| 4 | abcd | 1 AND 1=1 | blocked | blocked | access |
| 5 | abcd | 1' AND 1=(SELECT COUNT(*) FROM tablenames); -- | blocked | blocked | blocked |
| 6 | Default | ' OR uname IS NOT NULL OR uname = ' | blocked | blocked | blocked |
| 7 | abcd | Exec(char(60)) exec(char(61)) | blocked | access | blocked |
| 8 | ') or ('1'='1 | Password | blocked | blocked | access |
| 9 | ABCD | ‘;drop table users; | blocked | access | blocked |
| 10 | username | Exec(char(60)) or Exec(char(60)) 1 Exec(char(60)) 1 | blocked | blocked | blocked |
| 11 | admin | ;1’=’1 | access | access | access |
| 12 | database | UNION ALL SELECT * from user-- | blocked | blocked | blocked |
| 13 | 12345 | ‘or’1’=’1 | blocked | blocked | access |
| 14 | exec(char(0x73687574646f776e)); | 123 | blocked | access | blocked |
| 15 | Admin;1’=’1 | 12345 | blocked | access | access |
| 16 | newuser | ‘or’a’=’a | blocked | access | access |
| 17 | UNION ALL SELECT * from user-- | password | blocked | blocked | blocked |
| 18 | ‘ UNION SELECT name, type, id FROM sysobject;-- | 12345 | blocked | blocked | blocked |
| 19 | abcd | ' UNION SELECT UserName, Password, Isadam FROM Users;-- | blocked | blocked | blocked |
| 20 | anomaly | '; DELETE Orders;-- | access | blocked | access |
$$\text{True Positive rate} = \frac{\text{Number of attacks correctly identified as attacks}}{\text{Total number of attacks}} \qquad (2)$$
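Eqn. 2 can be applied directly to the 20 rows of Table 1. The short Python sketch below lists, for each technique, the row numbers marked “access” (attacks that were not blocked) and derives the true positive rate from them; the variable and function names are chosen only for this illustration.

```python
TOTAL_ATTACKS = 20

# Row numbers of Table 1 where the attack was NOT blocked ("access").
MISSED_ROWS = {
    "HCD": [11, 20],
    "ACD": [7, 9, 11, 14, 15, 16],
    "SCC": [1, 2, 4, 8, 11, 13, 15, 16, 20],
}


def true_positive_rate(missed: list, total: int) -> float:
    """Eqn. 2: attacks correctly identified divided by all attacks."""
    return (total - len(missed)) / total


if __name__ == "__main__":
    for name, missed in MISSED_ROWS.items():
        print(name, true_positive_rate(missed, TOTAL_ATTACKS))
```

For the rows of Table 1 this gives 0.9 for HCD, 0.7 for ACD and 0.55 for SCC.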
Table 2. Samples of data used in the determination of the false positive rate.

| Sl. No. | User Name | Password | HCD | ACD | SCC |
|---|---|---|---|---|---|
| 1 | username | password | true | true | false |
| 2 | adsf | 1234 | true | true | false |
| 3 | login | hacker | true | true | true |
| 4 | antonio | cheers | true | true | true |
| 5 | haven | 12345 | true | true | true |
| 6 | mekster11 | program | true | true | false |
| 7 | jolie720 | 31904aaf | true | true | true |
| 8 | steven15 | 0010jg3522 | false | true | false |
| 9 | pual23 | candyy5 | true | true | true |
| 10 | heatrock1 | tomboy05 | true | true | true |
| 11 | crayon | snooke1 | true | true | false |
| 12 | friends | chris87 | true | true | true |
| 13 | assclowns! | 7/7/90k | true | false | true |
| 14 | jamarcus | 19875317 | true | true | true |
| 15 | kingston6 | abc123 | true | true | true |
| 16 | destiny22 | LUMIDEE4U | true | false | false |
| 17 | 2cent2 | schroeder;' | true | false | false |
| 18 | r.i.pmario | insane140 | true | true | true |
| 19 | alexis | justme7 | true | true | false |
| 20 | DAISY | jjjjjjj4 | false | false | true |
| 21 | warning1 | kobe12 | true | true | true |
| 22 | princess7 | watermelons | true | true | false |
| 23 | ricky | soccer12 | true | true | true |
| 24 | october | 3056299020 | false | true | false |

$$\text{False Positive rate} = \frac{\text{Number of normal requests incorrectly identified}}{\text{Total number of requests}} \qquad (3)$$

To determine the false positive rate we use normal (non-anomalous) strings and calculate the result using Eqn. 3. Table 2 shows the samples of user names and passwords used in the determination of the false positive rate. In this table, if a technique wrongly identifies a normal request as an attack, the entry is marked “false”; if the technique identifies the request correctly, the entry is marked “true”.

Figure. True positive and false positive rates of HCD, ACD and SCC (bar chart; rate axis from 0 to 1).

From the figure we can observe that HCD has a true positive rate of 0.9 and a false positive rate of 0.138, which shows that this technique has a very high detection rate and gives very few false alerts. ACD has a true positive rate of 0.7.
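Eqn. 3 can likewise be applied to the 24 sample rows of Table 2. The Python sketch below lists the row numbers marked “false” for each technique; the names are chosen only for this illustration. Because Table 2 contains only samples of the data, the resulting values need not match the rates reported in the figure.

```python
TOTAL_NORMAL_REQUESTS = 24

# Row numbers of Table 2 where a normal request was flagged as an attack ("false").
FLAGGED_ROWS = {
    "HCD": [8, 20, 24],
    "ACD": [13, 16, 17, 20],
    "SCC": [1, 2, 6, 8, 11, 16, 17, 19, 22, 24],
}


def false_positive_rate(flagged: list, total: int) -> float:
    """Eqn. 3: normal requests incorrectly identified divided by all requests."""
    return len(flagged) / total


if __name__ == "__main__":
    for name, flagged in FLAGGED_ROWS.items():
        print(name, round(false_positive_rate(flagged, TOTAL_NORMAL_REQUESTS), 3))
```

On these sample rows HCD has the lowest false positive rate (3 of 24 requests), followed by ACD (4 of 24) and SCC (10 of 24).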
4. Conclusions and future work
In this paper we investigated SQLIA detection techniques that use the chi-square test. We comparatively evaluated these techniques and found that the HCD model has the highest detection rate and gives a very low false alert rate.
In future work we aim to further explore the HCD approach and extend it to detect all kinds of web-based attacks.
References
[1] Open Web Application Security Project, “Top 10 Web application vulnerabilities for 2010”, http://www.owasp.org/index.php/
[2] Khwairakpam Amitab and Padmavati, “Hexadecimal conversion to detect SQL inject attack using chi square test”, 1st International Conference on Innovative Science and Engineering Technology (ICISET), 2011.
[3] Atefeh Tajpour and Maslin Massrum, “Comparison of SQL Injection Detection and Prevention Techniques”, 2nd International Conference on Education Technology and Computer (ICETC), 2010.
[4] V. Shanmughalaneethi, C. Emilin Shyni and S. Swamynathan, “SBSQLID: Securing Web Applications with Service Based SQL Injection Detection”, International Conference on Advances in Computing, Control, and Telecommunication Technologies, 2009.
[5] MeiJunjin, “An approach for SQL Injection vulnerability detection”, Sixth International Conference on Information Technology: New Generations, 2009.
[6] William G.J. Halfond, Jeremy Viegas and Alessandro Orso, “A Classification of SQL Injection Attacks and Countermeasures”, IEEE, 2006.
[7] Christopher Kruegel and Giovanni Vigna, “Anomaly Detection of Web based Attacks”, CCS’03.
[8] Frank S. Rietta, “Application Layer Intrusion detection system”.
[9] Elisa Bertino, Ashish Kamara and James P. Early, “Profiling Database Application to Detect SQL Injection Attacks”, 2007.
[10] Ke Wei, M. Muthuprasanna and Suraj Kothari, “Preventing SQL Injection Attacks in Stored Procedures”, Proceedings of the 2006 Australian Software Engineering Conference (ASWEC’06), IEEE, 2006.
[11] R. Ezumalai and G. Aghila, “Combinatorial Approach for Preventing SQL Injection Attacks”, International Advance Computing Conference (IACC 2009), IEEE, 2009.
[12] M. Muthuprasanna, Ke Wei and Suraj Kothari, “Eliminating SQL Injection Attacks - A Transparent Defense Mechanism”, Eighth IEEE International Symposium on Web Site Evolution (WSE 06), 2006.
[13] Jin-Cherng Lin and Jan-Min Chen, “The Automatic Defence Mechanism for Malicious Injection Attack”, Seventh International Conference on Computer and Information Technology, 2007.
[14] B. Weaver, “Assumptions/restrictions for use of chi square tests”, vol. 2007, 2006.
[15] M. Mamahlodi, “What is the chi-square statistic?”, vol. 2007, 2006.
[16] Mehdi Kiani, Andrew Clark and George Mohay, “Evaluation of Anomaly Based Character Distribution Models in the Detection of SQL injection Attacks”, The Third International Conference on Availability, Reliability and Security, 2008.
[17] Rajagopal G. Sriraghavan and Luca Lucchese, “Data processing and Anomaly Detection in Web-Based Applications”, 2008.
[18] Help Net Security, “Analysis of 32 million breached passwords”, 2010. http://www.net-security.org/secworld.php?id=8742
[19] Download torrent “RockYou.com UserAccount-Passwords”, http://thepiratebay.org/torrent/5232943/RockYou.com_UserAccount-passwords
[20] Torrent Hound, “45000 hacked myspace accounts (login and password)”, 2008. http://www.torrenthound.com/hash/5597f856d375092e683f26ff3cb2d2c411aaa118/torrent-info/45000-hacked-myspace-accounts-login-and-passwords-txt-3802623-
[21] Whats My Pass?, “The Top 500 Worst Passwords of All Time”, 2008.