Web Application Security 101
Real-world examples, tools and
techniques for securing websites
A WhiteHat Security White Paper
Table of Contents
WEB APPLICATION SECURITY 101
THE BASICS
ANATOMY OF WEB REQUESTS AND URLS
HTTP AND HTTPS
WEB SERVER / DOMAIN NAME
DIRECTORY PATH
WEB APPLICATIONS
QUERY STRING
POST DATA
THREE PLACES TO ATTACK A WEBSITE
POST DATA
ARCHITECTURE SECURITY
SECURE SOFTWARE DEVELOPMENT PROGRAM
VULNERABILITY ASSESSMENT AND MANAGEMENT
Over 700 million people worldwide bank, shop, buy airline tickets, and perform research using the World Wide Web. With each transaction, private information, including names, addresses, phone numbers, credit card numbers, and passwords, is routinely transferred and stored in a variety of locations. Billions of dollars and millions of personal identities are at stake every day. In the past, security professionals thought firewalls, Secure Sockets Layer (SSL), patching, and privacy policies were enough to protect websites from hackers (see 5 Myths of Web Application Security [1]). Today, with prominent Web attacks taking place seemingly every week, the industry knows better.
The Web Application Security Consortium has identified twenty-four classes of Web attacks, including Cross-Site Scripting [2] (XSS) and SQL Injection [3], used to prey upon corporations, their customers, and educational institutions. These attacks are forcing many organizations to take a hard look at their existing web application security posture. In many cases, web application security is a new concept with many facets. This paper will examine the fundamental components of a website, entry points of web attacks, attack methodologies, and suggested preventive measures.
The best way to begin exploring web application security is by learning how the Web works. While most IT professionals are very comfortable with using a web browser to surf the Web, few of us look behind the application, at the client-server [4] structure that powers the Web. This structure governs the way web browsers (Firefox [5], Microsoft Internet Explorer [6]) must communicate with web servers [7] (Apache [8], Microsoft IIS [9]) to retrieve web pages [10]. To peer deeper into the world of the Web we’ll begin by looking at the web browser location bar (see Diagram 1).
All major web browsers possess a location bar that displays the web address [11] (URL) of the current web page. URL manipulation is one of
(location bars) are required to enable customers, partners, and hackers to view your website. URLs are used to uniquely identify the location of a web page or online resource. When traveling from one web page to the next, the displayed URL is updated. URLs, also referred to as links, are commonly embedded in web pages and can be clicked to visit other pages. URLs also tell us a lot about a website. They tell us what type of communication it expects, what type of operating system it runs, what type of web application code is being used, and more. We’ll explore the anatomy of URLs closely in the following section and look at how each part can be vulnerable to attack.
Diagram 1: Location Bar
A critical point to note here is the lack of firewall protection on the Web. When visiting any one of the millions of websites that exist today, it’s unlikely that you will encounter firewall protection. It’s not that firewalls are absent or useless. In fact, most websites have firewalls protecting them from network-based attacks (worms, viruses, hackers). But network-based attacks are fundamentally different from web-based attacks (XSS, SQL Injection), which are immune to a firewall’s defenses.
A firewall’s job is to prevent unauthorized connections to protected network devices. For instance, you probably do not want intruders connecting to internal databases, workstations, printers, and so forth. For a website to be accessible to the public, firewalls must allow traffic to the web server. If they did not, no one would be able to visit the website. This means that if a website has a vulnerability, firewalls are powerless, since web traffic must be allowed in. Web application security sits above the network layer, in a world of its own.
Basic Web Architecture
Now that we’ve established a basic understanding of Web communication, let’s dive deeper into the technology by analyzing the anatomy of a URL.
Anatomy of Web Requests and URLs
HTTP and HTTPS
http://example.com/path/to/application.cgi?param1=value1&param2=value2
At the beginning of a URL is the designated communication protocol (in this case “http”). The protocol designates how the web browser and the web server communicate with each other. HTTP [12] is a stateless protocol. This means that when a user wants something, a connection to the web server is established, a request is sent, and a response is received. Afterwards the connection is severed. HTTPS is another common protocol, which is HTTP wrapped in Secure Sockets Layer [13] (SSL) encryption.
SSL connections, indicated by a tiny lock symbol in the web browser window, ensure that information sent to and from a website is encrypted. Anyone monitoring the network traffic will not be able to read the data. This is great for protecting credit card numbers, social security numbers, and other forms of sensitive data traveling across the network. Contrary to popular belief, however, SSL does not secure a website. SSL only protects data in transit and does not protect information once it arrives and is stored. Sites using strong, 128-bit SSL have been hacked just as often as those that do not use it. When private data is stored on the web, the risk is at the server, not in between.
Web Server / Domain Name
http://example.com/path/to/application.cgi?param1=value1&param2=value2
The next section of the URL after the double forward slash, “example.com,” specifies the web server’s domain name. When we click on a URL, this is the web server to which the web browser will connect. Web servers handle the network communication between the website and the visitor.
Directory Path
http://example.com/path/to/application.cgi?param1=value1&param2=value2
Beginning with the forward slash after the domain name is the directory or file path. This points to the location on the web server where the resource is found. A resource could be an HTML file, web application, PowerPoint presentation, or almost any other file type. The URL tells the web server where to find it. Hackers are able to manipulate the directory path to find old files, such as files containing customer account numbers, that may have been superseded and forgotten on a server.
Web Applications
http://example.com/path/to/application.cgi?param1=value1&param2=value2
Web application security owes its name to the next section of the URL. The portion of the URL after the final forward slash and before the question mark is the “web application” [14]. Web applications are software that enables a website to serve up dynamic content. They can do just about anything and be programmed in just about any language. When they receive requests, these applications dynamically create web pages to return to the web browser. Web applications turn an ordinary web server into a Google (www.google.com), a Yahoo! (www.yahoo.com), an online auction, web bank, message board, blog, etc. Without web applications, the web would be filled with static content and none of the interactivity that drives innovation.
Query String
The last part of the URL, after the question mark, is referred to as the “query string,” which is filled with parameter name-value pairs. Web applications use parameter values as input to the program. For example, by reading the following URL, we can conclude that we are using http to connect to google.com, executing the web application “search,” and searching for “testing.”
If we were to change the “q” parameter value, we could search for “security.”
In more severe cases, hackers use parameter tampering to gain unauthorized access to customer order numbers and other private data.
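The URL anatomy described in the sections above can be sketched with Python's standard urllib.parse module. This is only an illustration: the helper names describe_url and with_parameter are hypothetical, and the search URL is an assumption reconstructed from the description in the text (http, google.com, the "search" application, and the "q" parameter).

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def describe_url(url):
    """Split a URL into the parts discussed in this section."""
    parts = urlsplit(url)
    # Everything up to the final slash is the directory path;
    # what follows it is the web application.
    directory, _, application = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "web_server": parts.netloc,
        "directory_path": directory + "/",
        "web_application": application,
        "query_string": parse_qs(parts.query),
    }


def with_parameter(url, name, value):
    """Return a copy of the URL with one query-string parameter changed."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    params[name] = [value]
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


example = describe_url(
    "http://example.com/path/to/application.cgi?param1=value1&param2=value2"
)
# Assumed search URL; changing "q" is the simplest form of parameter tampering.
search = with_parameter("http://google.com/search?q=testing", "q", "security")
```

Nothing in the URL is hidden from or protected against the person typing it, which is why every part returned by describe_url should be treated as attacker-controlled input.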
As mentioned earlier, HTTP is a stateless protocol. From one HTTP request to the next, the web server cannot determine if the second request is from the same person. Without the ability to make this connection, there is no way to track a user on a website and it’s difficult to maintain user login state. Cookies provide a mechanism to keep state.
A cookie is a small amount of data supplied by the web server and stored by the web browser. With each new request the browser sends, the cookies previously supplied by the web server are returned. This allows a user to be uniquely identified and state to be maintained. Since cookies are used to identify users, they are an attractive target for hackers. Once someone has a user’s cookie, he can effectively become that user.
HTTP Response
HTTP/1.1 200 OK
Date: Thursday, 01 Dec 2006 23:37:18 GMT
Server: Apache
Set-Cookie: Name=Value; path=/; expires=Thursday, 01-Dec-06 23:12:40 GMT
Content-Type: text/html
HTTP Request
GET http://www.whitehatsec.com HTTP/1.1
Host: www.whitehatsec.com
Cookie: Name=Value
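The cookie round trip shown above can be sketched with Python's standard http.cookies module. This is a minimal illustration, not a browser implementation: the function name cookie_header_from_response is hypothetical, and the Set-Cookie value is a simplified version of the one above with the expires attribute left out.

```python
from http.cookies import SimpleCookie


def cookie_header_from_response(set_cookie_value):
    """Build the Cookie request header a browser would send back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only the name=value pairs are returned to the server; attributes
    # such as path stay in the browser.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


header = cookie_header_from_response("Name=Value; path=/")
```

Because the server recognizes the user solely by the value that comes back, anyone who can read or guess that value can send the same header.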
Post Data
Post Data is another portion of an HTTP request and is typically populated with data from web forms. Post Data is most often used when larger amounts of data need to be sent from the browser to the web server. Post Data is more or less identical to a query string, except that it is located in the body of the request, below the headers.
GET http://www.whitehatsec.com HTTP/1.1
Host: www.whitehatsec.com
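To make the comparison with the query string concrete, here is a minimal Python sketch that builds a form-encoded POST request as text. The function name build_post_request, the path "/", and the field names are assumptions chosen for illustration (the field names are borrowed from the example URL used earlier); no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode


def build_post_request(host, path, form_fields):
    """Return the text of a form-encoded HTTP POST request."""
    # The body uses the same name=value&name=value encoding as a query string.
    body = urlencode(form_fields)
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    # A blank line separates the headers from the Post Data.
    return "\r\n".join(headers) + "\r\n\r\n" + body


request = build_post_request(
    "www.whitehatsec.com", "/", {"param1": "value1", "param2": "value2"}
)
body = request.split("\r\n\r\n", 1)[1]
```

Moving parameters from the URL into the body does not make them any harder to tamper with; the sender controls both.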
Three Places to Attack a Website
Statistically, over 90% of all websites have serious security issues, but the big question is “how are they attacked?” If we look at everything we’ve covered so far, there are basically only three places to attack a website: URLs, cookies, and Post Data. We’ll explore each of these attack points by analyzing real-world exploits. Recall that there are twenty-four classes of attack, and each of these points is vulnerable to all of them.
In July 2005, the University of Southern California (USC) was informed of a SQL Injection vulnerability found within one of its websites [15]. The website in question was used to accept applications from prospective students. According to sources, a lack of security checks on the login web-form text boxes allowed commands to be sent to the back-end database. These commands enabled public access to personal information including the names, birth dates, addresses, and Social Security numbers of up to 280,000 users. While the specific technical details remain undisclosed, here is a likely scenario of what took place.
When filling out the login web-form with a username (johndoe) and password (abc123), the web browser would generate a URL similar to the following:
The data from the HTTP request is passed into a database SQL statement of the web application “login.asp.” Login.asp is responsible for performing user authentication.
string strQry = "SELECT Count(*) FROM Users WHERE UserName='" + username.Text + "' AND Password='" + password.Text + "'";
The resulting SQL command is sent to the database:
SELECT Count(*) FROM Users WHERE UserName='johndoe' AND Password='abc123'
If the SQL command succeeds (the username/password combination is correct), a database record is returned and the user is logged in. A security issue arises if meta-characters are submitted into the web-form instead of the expected alphanumeric usernames and passwords. Specifically, we’re referring to single quotes and semi-colons.
http://victim.com/login.asp?username=';&password=';
The resulting SQL command:
SELECT Count(*) FROM Users WHERE UserName='';' AND Password='';'
The above SQL command produces a database error because its syntax is invalid. An error similar to the following is often displayed within the resulting web page and is a solid indication that a SQL Injection vulnerability exists.
Microsoft OLE DB Provider for SQL Server error '80040e14'
Unclosed quotation mark before the character string '; password.
/login.asp, line 39
In the case of the USC incident, here is what the hacker likely submitted into the web-form:
http://victim.com/login.asp?username='+OR+1=1&password='+OR+1=1;
Producing the following SQL command:
The previous SQL command always evaluates to true and returns the first record in the Users table. This allows authentication to be bypassed completely. From this point, the hacker is also able to send any valid SQL commands he can generate to pull information from the database.
This vulnerability could have been avoided with proper sanity checking of the incoming username and password values. If login.asp ONLY accepted alphanumeric characters, the incident would not have occurred.
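The difference between the vulnerable query and a checked one can be sketched in Python with the standard sqlite3 module. Everything here is an assumption made for illustration: the real site used login.asp and SQL Server, the function names are hypothetical, the 32-character limit is arbitrary, and the injected string is a commonly cited variant rather than the exact input used in the incident. The bound parameters in login_checked are an additional safeguard that the text above does not mention.

```python
import re
import sqlite3


def make_db():
    """Create an in-memory table shaped like the Users table in the example."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Users (UserName TEXT, Password TEXT)")
    # Plain-text password only to mirror the query in the text.
    db.execute("INSERT INTO Users VALUES ('johndoe', 'abc123')")
    return db


def login_vulnerable(db, username, password):
    # Mirrors the string concatenation shown above: input becomes SQL.
    query = ("SELECT Count(*) FROM Users WHERE UserName='" + username +
             "' AND Password='" + password + "'")
    return db.execute(query).fetchone()[0] > 0


ALPHANUMERIC = re.compile(r"[A-Za-z0-9]{1,32}")


def login_checked(db, username, password):
    # Reject anything that is not strictly alphanumeric.
    if not (ALPHANUMERIC.fullmatch(username) and ALPHANUMERIC.fullmatch(password)):
        return False
    # Bound parameters keep the input as data, never as SQL syntax.
    row = db.execute(
        "SELECT Count(*) FROM Users WHERE UserName=? AND Password=?",
        (username, password),
    ).fetchone()
    return row[0] > 0
```

With this sketch, login_vulnerable(db, "' OR '1'='1", "' OR '1'='1") reports a successful login without any valid credentials, while login_checked rejects the same input before it reaches the database.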
Just before Valentine’s Day 2003, FTD.com (a large online florist) received word that hackers could illegally access customer information by simply changing a particular number within a cookie [16]. The number was a customer identifier that could be easily guessed and changed to access the sensitive information of other users. This class of attack is generally referred to as Credential/Session Prediction [17]. Security experts confirmed the problem existed and that customer billing records, names, addresses, and phone numbers were exposed. Here is an example of what likely took place:
When users log in to FTD.com or add products to their shopping carts, they are given a cookie to track the current session. The cookie contains a unique customer identifier; in this case “CustomerID”, with a value of “1001.”
Set-Cookie: CustomerID=1001; path=/; expires=Thursday, 01-Dec-06 23:12:40 GMT
On subsequent requests, the user’s browser returns the cookie so that he or she may be properly identified.
At this point, an easy assumption is that the next user on the FTD.com website would be given a cookie with a customer identifier of “1002”. A hacker would simply edit the customer identifier in his own FTD.com cookie to match that of someone else who has already visited the website.
If successful, the hacker would automatically jump into another user’s session, with the ability to take over the account and access personal information.
In this case, the solution is to use random Customer Identifiers. FTD should have been using random, unique identifiers at least 10 to 12 characters long. This would prevent a Customer Identifier from being guessed, even after extended attempts.
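A minimal Python sketch of unpredictable identifiers, using the standard secrets module, is shown below. The function name new_customer_id is hypothetical, and the 16-byte default (32 hexadecimal characters) is an assumption that is deliberately longer than the 10 to 12 characters suggested above.

```python
import secrets


def new_customer_id(num_bytes=16):
    """Return an unpredictable identifier from the OS's secure random source."""
    # token_hex returns two hexadecimal characters per random byte.
    return secrets.token_hex(num_bytes)


# Unlike 1001, 1002, 1003, knowing one identifier reveals nothing about the next.
ids = {new_customer_id() for _ in range(1000)}
```

The important property is unpredictability rather than length alone: a long identifier produced by a counter or a timestamp would still be guessable.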
In October 2005, in an incident known as the Samy Worm, a hacker (Samy) used a common Cross-Site Scripting (XSS) vulnerability to exploit the MySpace social networking website [18]. Users’ web browsers
All it takes is a single security flaw, or one small oversight, for your company to make headlines. Experience tells us that no single protective measure is completely impenetrable. Everything has its weakness, and it’s only a matter of time before that weakness is found and exploited. With this real-world knowledge, most security experts subscribe to a philosophy called defense-in-depth to protect their systems. Defense-in-depth promotes a layered security approach, so that if any single control mechanism fails, other defensive measures are in place to ensure nothing is compromised.
Architecture Security
All secure e-commerce infrastructures must be built on a solid foundation. Without a solid foundation, no amount of security in web application code will be enough to defend a website. Below is a top-level checklist to use to assess your overall security. The Center for Internet Security [19] (CIS) has excellent resources for in-depth, system-specific knowledge of architecture security issues. The Payment Card Industry [20] (PCI) Data Security Standard is another resource for a comprehensive security program.
o Networks are properly segmented to separate public, semi-private, and private systems.
o Perimeter firewalls are in place between network segments to only allow a limited set of network services to communicate.
o Operating systems are hardened and patches are kept up-to-date.
o Web servers are properly patched, configured, and have any
Secure Software Development Program
Secure software is quality software. Vulnerabilities are nothing more than software bugs. And the best way to squash these bugs before they become real problems is to tightly integrate security consideration at all points of the software development life cycle (SDLC), from architecture design, to development releases, to quality assurance phases.
While there are copious amounts of data available covering software security best practices, the primary caution to developers is “DO NOT TRUST CLIENT-SIDE DATA.” Lack of proper input validation is the number-one cause of web application security issues. The Web is a hostile environment; therefore it’s absolutely critical to validate all data you plan to utilize, whether it’s from HTTP requests or the database. Here is a checklist for how to perform proper input validation in any programming language:
o Character-set: Ensure the data only contains characters you expect to receive.
o Length: Ensure the data falls within a restricted minimum and maximum number of bytes.
o Data Format: Ensure the structure of the data is consistent with what is expected. Phone numbers should look like phone numbers, email addresses should look like email addresses, etc.
o Escape: Before data is passed on to sub-systems, especially database or operating system calls, all special characters should be escaped so that none reach the sub-system unchecked.
o Filtering: Sanitize data to not include dangerous characters. Specifically, convert < and > characters into their equivalent HTML entities to prevent XSS issues.
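The checklist above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the allowed character set, the phone number pattern, and the length limits are all hypothetical and would need to match the data a real application expects. Escaping for a database or operating system call depends on the sub-system in use and is not shown.

```python
import html
import re

# Assumed formats for illustration only.
USERNAME_CHARS = re.compile(r"[A-Za-z0-9_]+")
PHONE_FORMAT = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def validate(value, allowed, min_len, max_len):
    """Apply the length check (in bytes) and the character-set/format check."""
    size = len(value.encode("utf-8"))
    # fullmatch requires the entire value to fit the pattern, not just a prefix.
    return min_len <= size <= max_len and allowed.fullmatch(value) is not None


def filter_for_html(value):
    """Convert <, >, &, and quote characters to HTML entities before echoing."""
    return html.escape(value, quote=True)
```

For example, validate("johndoe", USERNAME_CHARS, 3, 20) accepts the value, while a username containing a single quote or a semi-colon is rejected before it is used anywhere else.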
Vulnerability Assessment and Management
No matter how many defensive measures are piled onto a system, the only way to tell if risk is being mitigated is to adequately measure security. For web application security this means comprehensively assessing websites for vulnerabilities (using the WASC Threat Classification) and managing the remediation process when issues are found. To ensure security, this process should be conducted with each change to the web application code.
There are three key points to consider when assessing web applications:
1. Web applications are inherently unique.
As we’ve discussed, each website, whether e-commerce, online banking or health information, contains custom code. Off-the-shelf products cannot fully identify web application vulnerabilities in custom code, and many website vulnerabilities cannot be found in an automated manner. Since each site has specialized functionality, the methods to exploit that application may be just as specialized. Software cannot respond to that level of customization. Security expertise is required.
2. Assessing your production website is essential.
Hackers enter through holes in a live site, not the development or QA environment. Unseen flaws can appear between the time development is completed and production is started. The credit card companies have realized this important distinction. That is why PCI mandates scanning all custom web applications in the production environment.
3. Communication between the development and security teams is critical.
90% of all websites have security issues. That’s a fact. It will take time to fix all the vulnerabilities in a website, which is why teamwork is important. The security organization needs updates on remediation progress so that they can adequately protect corporate applications. The development organization needs to work with security to prioritize fixes so that the most dangerous issues are resolved quickly.
In the past, website vulnerability assessment was a time-consuming and often expensive process. Today, WhiteHat Security offers WhiteHat Sentinel [21], the only continuous vulnerability assessment and management service for web applications.
WhiteHat Sentinel identifies, manages, and recommends remediation for website vulnerabilities. We supply the information needed for organizations to protect corporate websites from attack.
WhiteHat Sentinel is a must-have to secure valuable customer data, comply with industry standards and maintain brand integrity. WhiteHat Sentinel is a crucial component in a website security program and makes web application security easy.
Why choose WhiteHat Sentinel?
WhiteHat Sentinel is a turnkey service. There’s nothing to install and no technology or personnel investment is required. We’ve simplified the complex world of web application security by delivering only actionable information to clients.
WhiteHat Sentinel is the only solution that finds all website flaws, both technical (SQL Injection and Cross-Site Scripting) and logical (Insufficient Authorization), in your website. It covers all twenty-four WASC classes of attack. And WhiteHat Sentinel can assess both production and development websites, an important requirement for PCI.
WhiteHat Sentinel continuously assesses websites and provides web-based reporting, delivering 24/7 access to vulnerability status for development and security personnel.
It’s difficult to quantify the total cost of a security breach, but we know it’s a combination of diminished customer confidence, exposed data, technical repairs, customer notification and brand damage, all of which can have significant financial impact.
knowledge to repair potential trouble spots before they can be exploited. In addition, our customers save time by replacing hard-to-use tools with an easy-to-access, web-based management console. They also save money over quarterly or annual web assessments by consultants because one year of WhiteHat Sentinel is approximately the same cost as a single consultant assessment. With WhiteHat Sentinel, corporate security has the ongoing, big-picture view of the security of all web applications necessary to protect business from the growing number of web-based attacks.
For more information about WhiteHat Sentinel or to find more white papers on web application security, please contact WhiteHat Security:
Email: WH-Info@whitehatsec.com
Website: www.whitehatsec.com
Telephone: (408) 492-1817
About WhiteHat Security, Inc.
Headquartered in Santa Clara, California, WhiteHat Security is a leading provider of web application security services. WhiteHat develops comprehensive, easy-to-use, cost-effective solutions that enable companies to secure valuable customer data, meet federal compliance standards, and maintain customer confidence. WhiteHat Sentinel, the company’s flagship service, provides continuous
[1] 5 Security Myths
[2] Cross-site Scripting (XSS) is an attack technique that forces a web site to echo
[3] SQL Injection is an attack technique used to exploit web sites that construct SQL statements from user-supplied input.
[4] A common form of distributed system in which software is split between server tasks and client tasks. A client sends requests to a server, according to some protocol, asking for information or action, and the server responds.
[5] A popular open-source web browser.
[6] Microsoft’s web browser.
[7] A general-purpose software application used to handle HTTP requests. A web server may utilize a web application for dynamic web page content. http://www.webappsec.org/projects/glossary/#WebServer
[8] A popular open-source web server by the Apache Software Foundation.
[9] Microsoft Internet Information Server (IIS)
[10] A document on the World Wide Web, consisting of an HTML file and any related files for scripts and graphics, and often hyperlinked to other documents on the Web. http://dictionary.reference.com/search?q=web%20page
[11] Uniform Resource Locator (URL). The location of an on-line web-based resource.
[12] Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. http://www.ietf.org/rfc/rfc2616.txt
[13] An industry standard public-key protocol used to create encrypted tunnels between two network-connected devices
[14] A software application, executed by a web server, which responds to dynamic web page requests over HTTP.
[15] Flawed USC admissions site allowed access to applicant data
[16] FTD.com hole leaks personal information
[17] Credential/Session Prediction is a method of hijacking or impersonating a web site user. Deducing or guessing the unique value that identifies a particular session or user accomplishes the attack. Also known as Session Hijacking, the consequences could allow attackers the ability to issue web site requests with the compromised
[18] Teen uses worm to boost ratings on MySpace.com
[19] The Center for Internet Security (CIS) is a non-profit enterprise whose mission is to help organizations reduce the risk of business and e-commerce disruptions resulting from inadequate technical security controls.
[20] Payment Card Industry (PCI) Data Security Requirements apply to all Members, merchants, and service providers that store, process or transmit cardholder data. http://usa.visa.com/business/accepting_visa/ops_risk_management/cisp.html
[21] WhiteHat Sentinel is the only continuous vulnerability assessment and management service for web applications. http://www.whitehatsec.com/services.shtml