COMPARISON OF SQL INJECTION DETECTION TECHNIQUES WHICH USES CHI-SQUARE TEST


KHWAIRAKPAM AMITAB

Department of Computer Science and Engineering PEC, University of Technology, Chandigarh-160012, India


PADMAVATI

Department of Computer Science and Engineering PEC, University of Technology, Chandigarh-160012, India


Abstract:

Database-driven interactive web applications are at risk of SQL Injection Attacks (SQLIA) because they accept user inputs and use them to form SQL statements. During an SQL injection, the attacker inputs malicious SQL query segments that result in a different database request. SQLIA can be used to bypass authentication controls and to extract and/or modify valuable information. Researchers have proposed different techniques to counter such threats, but most of the implemented approaches that use an anomaly detection model have a very high false alarm rate. In this paper we analyze existing detection techniques that use the chi-square test, and we evaluate these techniques against SQLIA and normal requests.

Keywords: SQL injection attack; Anomaly detection; Chi-square test; false positive; true positive.

1. Introduction

The rapid growth of Internet technology has led to a wide range of web-based services, for example financial transactions and social network services. These applications and their underlying databases often store confidential and sensitive data. Such web applications usually accept data from users and use that data to access the back-end database, which exposes them to SQL injection attacks. According to the Open Web Application Security Project (OWASP) Top 10 for 2010, SQL injection is the most popular code injection technique used in system hacking or cracking to gain information or unauthorized access to a system [1].

SQLIA is a code injection technique in which the attacker attempts to change the logic or semantics of a legitimate SQL statement by inserting new SQL keywords or operators into the statement. These keywords are inserted through a user input field that has not been validated. The injection works by inserting SQL code that forms a syntactically correct SQL command when concatenated with a dynamic SQL command. Once the attacker successfully injects the attack into the database, its data can be altered or extracted, and its tables can even be dropped.

Consider an example query that is commonly used in an authentication form:

SELECT * FROM user WHERE name='" + UName + "' and password='" + PWord + "';

The values of "UName" and "PWord" are the actual values obtained from the username and password text boxes of the authentication form. The intent is to check whether a matching username and password exist in the user table. If any rows are returned, the user is authenticated. However, if the web programmer uses this method and takes input from the form without properly checking it, a hacker can inject malicious code. The hacker can specify a valid user name such as "admin" and then specify the password as "' OR '1'='1" in the form. The final SQL query that uses these values will be:

SELECT * FROM user WHERE name='admin' and password='' OR '1'='1';
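The bypass can be reproduced with a minimal Python sketch. It assumes an in-memory SQLite database holding one hypothetical account (admin with password s3cret). The names make_db, build_query and authenticate are illustrative and are not part of the authors' ASP.NET application. The query string is concatenated exactly as in the example above. The injected quote therefore closes the password literal. Because AND binds more tightly than OR, the condition '1'='1' makes the WHERE clause true for every row.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding one hypothetical account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (name TEXT, password TEXT)")
    conn.execute("INSERT INTO user VALUES ('admin', 's3cret')")
    return conn


def build_query(uname: str, pword: str) -> str:
    # Vulnerable: user input is concatenated directly into the SQL text.
    return ("SELECT * FROM user WHERE name='" + uname +
            "' and password='" + pword + "';")


def authenticate(conn: sqlite3.Connection, uname: str, pword: str) -> bool:
    # The user counts as authenticated if any row is returned.
    return len(conn.execute(build_query(uname, pword)).fetchall()) > 0


if __name__ == "__main__":
    conn = make_db()
    print(build_query("admin", "' OR '1'='1"))
    print(authenticate(conn, "admin", "wrong"))        # False
    print(authenticate(conn, "admin", "' OR '1'='1"))  # True: login bypassed
```

The database receives only the finished SQL text, so it cannot distinguish the user's data from SQL code. This is why the detection techniques discussed in this paper inspect the characters of the input instead.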


injection attack; each type of SQLIA has a different way of inserting its manipulated statement, depending on the goal of the attacker. For details on the types of SQLIA, please refer to [2-6].

The SQL injection problem lies in vulnerable web applications, not in the web server or the services running on the operating system. The common points of vulnerability are user input text boxes and the uniform resource locator (URL). This vulnerability can give the attacker complete access to the underlying database. To overcome this problem, the developer has to consider a range of security measures. In practice, however, this approach is human-based and prone to error. To solve this problem, researchers have developed various detection [7-9] and prevention [10-13] techniques. One of the most promising approaches is anomaly detection, whose underlying idea is that attack patterns differ from normal behavior. However, most of these approaches have limitations and cannot detect all types of SQLIA.

The aim of this paper is to introduce detection techniques that use the chi-square statistic and to comparatively evaluate these techniques against SQLIA. The chi-square statistic [10] is a nonparametric statistical technique used to determine whether a distribution of observed frequencies differs from the theoretically expected frequencies.

The paper is organized as follows. In section 2 we review currently existing SQLIA detection techniques. In section 3 we evaluate SQL injection detection techniques against SQLIA types. Conclusion and future work are presented in section 4.

2. SQL Injection Detection Techniques

The concept of using the chi-square statistic to detect SQLIA was introduced by C. Kruegel [7], whose authors proposed six different models to detect web-based attacks. One of these is the attribute character distribution model. It captures the concept of a 'normal' or 'regular' query parameter by looking at its character distribution. The character distribution of an attribute that is perfectly normal is termed the attribute's Idealized Character Distribution (ICD), and it is determined during the training phase. For each observed query attribute, its character distribution is stored. The ICD is then approximated by averaging all stored character distributions. The task of the detection phase is to determine the probability that the character distribution of a query attribute is an actual sample drawn from its ICD. The characters are grouped into six groups, and anomalies are tested for using the chi-square (χ²) test [14,15].

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \qquad (1)$$

where n denotes the number of groups, and O_i and E_i denote the observed frequency and the expected frequency of group i, respectively. The authors of that work observed that the character distribution model did not prove very useful in detecting input validation attacks, which is the category of attacks that SQL injection falls under. We believe this limitation of the character distribution model arises from the inappropriate groupings formed by their model.
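Eq. (1) and the ICD comparison can be sketched in Python. This is a minimal sketch that assumes byte-level character frequencies. It also assumes six rank bins (0, 1-3, 4-6, 7-11, 12-15, 16-255) over the sorted frequencies, which is our understanding of what Kruegel and Vigna [7] use. The bin boundaries and the helper names (BINS, icd, icd_chi_square) are assumptions for illustration. The sketch stops at the χ² value. The original model additionally turns this value into a probability using the chi-square distribution.

```python
from collections import Counter

# Assumed rank bins over the 256 sorted relative frequencies (check against [7]).
BINS = [(0, 1), (1, 4), (4, 7), (7, 12), (12, 16), (16, 256)]


def chi_square(observed, expected):
    """Eq. (1): sum of (O_i - E_i)^2 / E_i over the n groups."""
    total = 0.0
    for o, e in zip(observed, expected):
        if e == 0:
            if o != 0:
                return float("inf")  # mass observed where none is expected
            continue
        total += (o - e) ** 2 / e
    return total


def sorted_distribution(s):
    """Relative byte frequencies of s, sorted in descending order, padded to 256."""
    data = s.encode("utf-8")
    if not data:
        return [0.0] * 256
    freqs = sorted((c / len(data) for c in Counter(data).values()), reverse=True)
    return freqs + [0.0] * (256 - len(freqs))


def icd(training):
    """Idealized Character Distribution: average of the training distributions."""
    dists = [sorted_distribution(s) for s in training]
    return [sum(col) / len(dists) for col in zip(*dists)]


def binned(values):
    """Collapse the 256 sorted slots into the six groups."""
    return [sum(values[lo:hi]) for lo, hi in BINS]


def icd_chi_square(model, attribute):
    """Chi-square of an attribute against the ICD, both scaled to its length."""
    n = len(attribute.encode("utf-8"))
    observed = binned([f * n for f in sorted_distribution(attribute)])
    expected = binned([f * n for f in model])
    return chi_square(observed, expected)


model = icd(["abab", "cdcd"])
print(icd_chi_square(model, "xyxy"))  # 0.0
print(icd_chi_square(model, "aaaa"))  # 4.0
```

Sorting the frequencies before binning means the model ignores *which* characters occur and keeps only how skewed their distribution is. This is consistent with the limitation noted above for SQL injection strings, which are built from ordinary characters.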

A similar character-analysis approach was developed by Mehdi Kiani [16]. Their algorithm is named the same character comparison (SCC) model. It intercepts an HTTP request, extracts the query section and calculates the frequency of each character in the query. Once all the queries have been processed, the model calculates the cumulative character count and places the characters into groups. It also regroups them so that the frequency of each group is at least five. In the testing phase, O_i and E_i are calculated in exactly the same manner as in Kruegel and Vigna's work. If the anomaly score is greater than a particular threshold, an alert is triggered. The experiments mainly focused on UNION and tautology attacks. The results show that the SCC model is extremely effective at detecting UNION attacks, while the results for tautology attacks were not as good. The authors also evaluated their approach against ICD and showed that SCC is superior to ICD.
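The SCC regrouping step can be illustrated with a minimal Python sketch. The description above does not fix the exact grouping, so this sketch makes three assumptions. Characters are ordered by their training frequency. Characters never seen in training fall into a final "other" group. Adjacent groups are pooled until each has an expected count of at least five. The names train, pool and scc_score are hypothetical. The score would then be compared with a threshold whose value is not given here.

```python
from collections import Counter


def chi_square(observed, expected):
    """Eq. (1); after pooling, every expected value is positive."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def train(queries):
    """Cumulative character counts over all training queries, as probabilities."""
    counts = Counter()
    for q in queries:
        counts.update(q)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def pool(observed, expected, min_expected=5.0):
    """Merge adjacent categories until each pooled group expects >= min_expected."""
    obs, exp = [], []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:
            obs.append(o_acc)
            exp.append(e_acc)
            o_acc = e_acc = 0.0
    if o_acc or e_acc:  # fold any remainder into the last group
        if exp:
            obs[-1] += o_acc
            exp[-1] += e_acc
        else:
            obs.append(o_acc)
            exp.append(e_acc)
    return obs, exp


def scc_score(model, query):
    """Chi-square score of one query; unseen characters go to an 'other' group."""
    order = sorted(model, key=lambda c: (-model[c], c))
    counts = Counter(query)
    n = len(query)
    observed = [counts[c] for c in order]
    observed.append(n - sum(observed))  # characters never seen in training
    expected = [model[c] * n for c in order] + [0.0]
    obs, exp = pool(observed, expected)
    if not exp or exp[0] <= 0:
        return 0.0  # degenerate case (empty query or untrained model)
    return chi_square(obs, exp)


model = train(["aaaaabbbbb"])
print(scc_score(model, "aaaaabbbbb"))  # 0.0
print(scc_score(model, "'" * 10))      # 10.0
```

One consequence of pooling is that a query whose total expected count is below five collapses into a single group and always scores 0. Very short inputs therefore carry little evidence under this scheme.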

Another approach, proposed by Rajagopal [17], is the attribute character distribution (ACD). During the learning phase, it counts the number of alphabetic, numeric and special-symbol characters. These counts are binned into three groups, one each for alphabetic, numeric and special-symbol characters. At the end of the learning phase, the system calculates the average of all the observed values, which is termed the ACD. During the detection phase, the algorithm determines the ACD for user-supplied inputs and normalizes the result to obtain O_i. Similarly


moreover, the sample size is doubled, which gives better results when using a statistical method. To determine the HCD, each character is converted into its hexadecimal code. The codes are grouped into four predefined groups, and the relative frequency of each group is determined. During the training phase, HCD is applied to the collected data to derive E_i, which represents normal inputs. In the testing phase, O_i is calculated from the user-supplied input, and E_i is multiplied by the length of the user-supplied input. The value of χ² is then computed using Eq. (1) and compared with the standard chi-square table value at three degrees of freedom. If the calculated χ² is less than or equal to the table value, the user input is considered normal; otherwise it is considered an anomaly. The experimental results show that HCD is able to detect common types of SQLIA and is superior to ACD.
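ACD and HCD follow the same train-then-test pattern and differ only in how characters are grouped, so both can be sketched together in Python. Several details are assumptions rather than the published methods:

- ACD uses ASCII letter and digit classes.
- HCD splits the 16 hex digits into four equal groups. The actual groups are defined in [2] and are not reproduced here.
- The critical values 5.991 (2 degrees of freedom, ACD) and 7.815 (3 degrees of freedom, HCD) assume a 0.05 significance level.
- For HCD, E_i is scaled by the number of hex digits so that the observed and expected totals match.

All helper names are hypothetical.

```python
import string

ACD_CRITICAL = 5.991  # chi-square, 2 degrees of freedom, alpha = 0.05 (assumed)
HCD_CRITICAL = 7.815  # chi-square, 3 degrees of freedom, alpha = 0.05 (assumed)

# Hypothetical HCD grouping: the 16 hex digits split into four equal ranges.
HEX_GROUPS = ["0123", "4567", "89ab", "cdef"]


def chi_square(observed, expected):
    """Eq. (1); a group with E_i = 0 but O_i > 0 is treated as maximally anomalous."""
    total = 0.0
    for o, e in zip(observed, expected):
        if e == 0:
            if o:
                return float("inf")
            continue
        total += (o - e) ** 2 / e
    return total


def acd_counts(s):
    """ACD groups: alphabetic, numeric and special-symbol characters (ASCII classes)."""
    alpha = sum(c in string.ascii_letters for c in s)
    digit = sum(c in string.digits for c in s)
    return [alpha, digit, len(s) - alpha - digit]


def hcd_counts(s):
    """HCD groups: convert to hex (two digits per byte), count digits per group."""
    hex_digits = s.encode("utf-8").hex()
    return [sum(h in g for h in hex_digits) for g in HEX_GROUPS]


def train(strings, counter):
    """Average relative frequency of each group over the training strings."""
    rows = []
    for s in strings:
        counts = counter(s)
        total = sum(counts)
        if total:
            rows.append([c / total for c in counts])
    return [sum(col) / len(rows) for col in zip(*rows)]


def is_anomalous(model, s, counter, critical):
    """Flag s when its chi-square value exceeds the table value."""
    observed = counter(s)
    n = sum(observed)  # input length (ACD) or number of hex digits (HCD)
    expected = [p * n for p in model]
    return chi_square(observed, expected) > critical


acd_model = train(["abc1", "xyz2"], acd_counts)
print(is_anomalous(acd_model, "abc1", acd_counts, ACD_CRITICAL))  # False
print(is_anomalous(acd_model, "a'1=", acd_counts, ACD_CRITICAL))  # True

hcd_model = train(["aaaa"], hcd_counts)
print(is_anomalous(hcd_model, "aaaa", hcd_counts, HCD_CRITICAL))  # False
print(is_anomalous(hcd_model, "'='", hcd_counts, HCD_CRITICAL))   # True
```

The hex conversion is what doubles the sample. Each ASCII character contributes two hex digits, so O_i and E_i are counted over twice as many symbols as in ACD.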

3. Experiment And Result Analysis

In this section we comparatively evaluate the techniques presented in the previous section. To carry out this experiment, we developed a web application using ASP.NET and integrated the detection techniques under test, which are implemented in C#. A large number of usernames and passwords were collected from the Internet [18-21]. The same data were used for all techniques to determine the expected frequencies. We observed that the results of all the techniques discussed in this paper depend on the training data used. The observed frequencies are derived from the supplied inputs during the testing phase. ICD is excluded from the comparison because SCC and ACD are superior to it. As all the techniques are based on an anomaly detection model, we evaluated the results by their true positive and false positive rates.

Table 1. Samples of data used to determine the true positive rate ("blocked" = attack detected; "access" = access granted, i.e. the attack was not detected).

Sl. No. | User name | Password | HCD | ACD | SCC
1 | admin | 1 OR 1=1 | blocked | blocked | access
2 | admin | 1' OR '1'='1 | blocked | blocked | access
3 | 1' OR '1'='1 | 1' OR '1'='1 | blocked | blocked | blocked
4 | abcd | 1 AND 1=1 | blocked | blocked | access
5 | abcd | 1' AND 1=(SELECT COUNT(*) FROM tablenames); -- | blocked | blocked | blocked
6 | Default | ' OR uname IS NOT NULL OR uname = ' | blocked | blocked | blocked
7 | abcd | Exec(char(60)) exec(char(61)) | blocked | access | blocked
8 | ') or ('1'='1 | Password | blocked | blocked | access
9 | ABCD | ';drop table users; | blocked | access | blocked
10 | username | Exec(char(60)) or Exec(char(60)) 1 Exec(char(60)) 1 | blocked | blocked | blocked
11 | admin | ;1'='1 | access | access | access
12 | database | UNION ALL SELECT * from user-- | blocked | blocked | blocked
13 | 12345 | 'or'1'='1 | blocked | blocked | access
14 | exec(char(0x73687574646f776e)); | 123 | blocked | access | blocked
15 | Admin;1'='1 | 12345 | blocked | access | access
16 | newuser | 'or'a'='a | blocked | access | access
17 | UNION ALL SELECT * from user-- | password | blocked | blocked | blocked
18 | ' UNION SELECT name, type, id FROM sysobject;-- | 12345 | blocked | blocked | blocked
19 | abcd | ' UNION SELECT UserName, Password, Isadam FROM Users;-- | blocked | blocked | blocked
20 | anomaly | '; DELETE Orders;-- | access | blocked | access

$$\text{True positive rate} = \frac{\text{Number of attacks correctly identified as attacks}}{\text{Total number of attacks}} \qquad (2)$$
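Eq. (2) can be checked against Table 1 with a short Python sketch. The row numbers below are transcribed from the "access" entries of Table 1, i.e. the attack strings each technique let through. The computed HCD and ACD rates (0.9 and 0.7) agree with the values reported in the text. The SCC rate (0.55) is derived from the table alone. The false positive rate of Eq. (3) is the same kind of ratio, computed over normal requests instead of attacks.

```python
# Row numbers from Table 1 where a technique returned "access",
# i.e. the attack string was NOT detected (transcribed from the table).
MISSED = {
    "HCD": [11, 20],
    "ACD": [7, 9, 11, 14, 15, 16],
    "SCC": [1, 2, 4, 8, 11, 13, 15, 16, 20],
}
TOTAL_ATTACKS = 20  # Table 1 lists 20 attack strings


def true_positive_rate(missed_rows, total=TOTAL_ATTACKS):
    """Eq. (2): attacks correctly identified as attacks / total attacks."""
    detected = total - len(missed_rows)
    return detected / total


for name, rows in MISSED.items():
    print(name, true_positive_rate(rows))
```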


Table 2. Samples of data used to determine the false positive rate.

Sl. No. | User name | Password | HCD | ACD | SCC
1 | username | password | true | true | false
2 | adsf | 1234 | true | true | false
3 | login | hacker | true | true | true
4 | antonio | cheers | true | true | true
5 | haven | 12345 | true | true | true
6 | mekster11 | program | true | true | false
7 | jolie720 | 31904aaf | true | true | true
8 | steven15 | 0010jg3522 | false | true | false
9 | pual23 | candyy5 | true | true | true
10 | heatrock1 | tomboy05 | true | true | true
11 | crayon | snooke1 | true | true | false
12 | friends | chris87 | true | true | true
13 | assclowns! | 7/7/90k | true | false | true
14 | jamarcus | 19875317 | true | true | true
15 | kingston6 | abc123 | true | true | true
16 | destiny22 | LUMIDEE4U | true | false | false
17 | 2cent2 | schroeder;' | true | false | false
18 | r.i.pmario | insane140 | true | true | true
19 | alexis | justme7 | true | true | false
20 | DAISY | jjjjjjj4 | false | false | true
21 | warning1 | kobe12 | true | true | true
22 | princess7 | watermelons | true | true | false
23 | ricky | soccer12 | true | true | true
24 | october | 3056299020 | false | true | false

$$\text{False positive rate} = \frac{\text{Number of normal requests incorrectly identified as attacks}}{\text{Total number of normal requests}} \qquad (3)$$

To determine the false positive rate, we use normal (non-attack) strings and calculate the result using Eq. (3). Table 2 shows samples of the usernames and passwords used to determine the false positive rate. If a technique wrongly identifies a normal request as an attack, the entry is marked "false"; if it correctly identifies the request as normal, the entry is marked "true".

Figure. True positive and false positive rates.
[Bar chart of the true positive and false positive rates of HCD, ACD and SCC; y-axis from 0 to 1.]

From the figure, we can observe that HCD has a true positive rate of 0.9 and a false positive rate of 0.138, which shows that this technique has a very high detection rate and produces few false alarms. ACD has a true positive rate of 0.7 and


4. Conclusions and future work

In this paper we investigated SQLIA detection techniques that use the chi-square test and comparatively evaluated them. We found that the HCD model has the highest detection rate and a low false alarm rate.

In future work we aim to further explore the HCD approach and to extend this work to detect all kinds of web-based attacks.

References

[1] Open Web Application Security Project, “Top 10 Web application vulnerabilities for 2010”, http://www.owasp.org/index.php/

[2] Khwairakpam Amitab and Padmavati, “Hexadecimal conversion to detect SQL inject attack using chi square test”, 1st International Conference on Innovative Science and Engineering Technology (ICISET), 2011.

[3] Atefeh Tajpour and Maslin Massrum, “Comparison of SQL Injection Detection and Prevention Techniques”, 2nd International Conference on Education Technology and Computer (ICETC), 2010.

[4] V. Shanmughalaneethi, C. Emilin Shyni and S. Swamynathan, “SBSQLID: Securing Web Applications with Service Based SQL Injection Detection”, International Conference on Advances in Computing, Control, and Telecommunication Technologies, 2009.

[5] Mei Junjin, “An approach for SQL Injection vulnerability detection”, Sixth International Conference on Information Technology: New Generations, 2009.

[6] William G. J. Halfond, Jeremy Viegas and Alessandro Orso, “A Classification of SQL Injection Attacks and Countermeasures”, IEEE, 2006.

[7] Christopher Kruegel and Giovanni Vigna, “Anomaly Detection of Web based Attacks”, CCS’03.

[8] Frank S. Rietta, “Application Layer Intrusion detection system”.

[9] Elisa Bertino, Ashish Kamara and James P. Early, “Profiling Database Application to Detect SQL Injection Attacks”, 2007.

[10] Ke Wei, M. Muthuprasanna and Suraj Kothari, “Preventing SQL Injection Attacks in Stored Procedures”, Proceedings of the 2006 Australian Software Engineering Conference (ASWEC’06), IEEE, 2006.

[11] R. Ezumalai and G. Aghila, “Combinatorial Approach for Preventing SQL Injection Attacks”, International Advance Computing Conference (IACC 2009), IEEE, 2009.

[12] M. Muthuprasanna, Ke Wei and Suraj Kothari, “Eliminating SQL Injection Attacks - A Transparent Defense Mechanism”, Eighth IEEE International Symposium on Web Site Evolution (WSE’06), 2006.

[13] Jin-Cherng Lin and Jan-Min Chen, “The Automatic Defence Mechanism for Malicious Injection Attack”, Seventh International Conference on Computer and Information Technology, 2007.

[14] B. Weaver, “Assumptions/restrictions for use of chi square tests”, vol. 2007, 2006.

[15] M. Mamahlodi, “What is the chi-square statistic?”, vol. 2007, 2006.

[16] Mehdi Kiani, Andrew Clark and George Mohay, “Evaluation of Anomaly Based Character Distribution Models in the Detection of SQL Injection Attacks”, Third International Conference on Availability, Reliability and Security, 2008.

[17] Rajagopal G. Sriraghavan and Luca Lucchese, “Data processing and Anomaly Detection in Web-Based Applications”, 2008.

[18] Help Net Security, “Analysis of 32 million breached passwords”, 2010. http://www.net-security.org/secworld.php?id=8742.

[19] “RockYou.com UserAccount-Passwords” (torrent download), http://thepiratebay.org/torrent/5232943/RockYou.com_UserAccount-passwords.

[20] Torrent Hound, “45000 hacked myspace accounts (login and password)”, 2008. http://www.torrenthound.com/hash/5597f856d375092e683f26ff3cb2d2c411aaa118/torrent-info/45000-hacked-myspace-accounts-login-and-passwords-txt-3802623-.

[21] What's My Pass?, “The Top 500 Worst Passwords of All Time”, 2008.