HTML source code can contain numerous juicy tidbits of information.
HTML Comments The most obvious place attackers look is in HTML comments, special sections of source code where the authors often place informal remarks that can be quite revealing. The <!-- characters mark the start of all basic HTML comments.
HTML comments are a hit-or-miss prospect. They may be pervasive and uninformative, or they may be rare and contain descriptions of a database table for a subsequent SQL query, or worse yet, user passwords.
The next example shows how our getit.sh script can obtain the index.html file for a site, and then pipe it through the UNIX/Linux grep command to find HTML comments (you can use the Windows findstr command similarly to the grep command).
The ! character has special meaning on the Unix/Linux command line and will need to be escaped using "\" in grep searches.
[root@meddle ]# getit.sh www.victim.com /index.html | grep "<\!--"
www.victim.com [192.168.189.113] 80 (http) open
<!-- $Id: index.shtml,v 1.155 2002/01/25 04:06:15 hpa Exp $ -->
sent 17, rcvd 16417: NOTSOCK
At the very least, this example shows us that the index.html file is actually a link to index.shtml. The .shtml extension implies that parts of the page were created with Server Side Includes. Induction plays an important role when profiling the application, which is why it’s important to familiarize yourself with several types of web technologies.
Pop quiz: What type of program could be responsible for the information in the $Id shown in the previous example?
You can use this method (using our getit script or the automated web crawling tool of your choice) to dump the comments from the entire site into one file and then review that file for any interesting items. If you find something that looks promising, you can search the site for that comment to find the page it’s from and then carefully study that page to understand the context of the comment. This process can reveal even more interesting information, including:
• Filename-like comments You will typically see plenty of comments with template filenames tucked in them. Download them and review the template code. You never know what you might find.
• Old code Look for links that might be commented out. They could point to an old portion of the web site that could contain security holes. Or maybe the link points to a file that once worked, but now, when you attempt to access it, a very revealing error message is displayed.
• Auto-generated comments A lot of comments that you might see are generated automatically by web content software. Take the comment to a search engine and see what other sites turn up those same comments. Hopefully, you’ll discover what software generated the comments and learn useful information.
• The obvious We’ve seen things like entire SQL statements, database passwords, and actual notes left for other developers in files such as IRC chat logs within comments.
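For readers who prefer a script to grep, the comment-dumping step can be sketched in Python. This is a minimal illustration, not the getit.sh script itself: the function name extract_comments and the sample page are assumptions made up for the example, and a regular expression is only an approximation of how a browser parses comments.

```python
import re

# Matches <!-- ... --> across line breaks; non-greedy so that
# neighboring comments are returned separately.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def extract_comments(html):
    """Return the text of every HTML comment in the page, whitespace-trimmed."""
    return [body.strip() for body in COMMENT_RE.findall(html)]


# Hypothetical page content standing in for a downloaded index.html.
sample_page = """<html>
<!-- $Id: index.shtml,v 1.155 2002/01/25 04:06:15 hpa Exp $ -->
<body>
<!-- <a href="/old/admin.html">admin</a> -->
<p>Welcome</p>
</body></html>"""

for comment in extract_comments(sample_page):
    print(comment)
```

Run over every page of a mirrored site, the same function gives you one list of comments to review, and keeping the page name alongside each comment saves the later search for where it came from.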
Other HTML Source Nuggets Don’t stop at comment separators. HTML source has all kinds of hidden treasures. Try searching for a few of these strings:
SQL Select Insert #include #exec
Password Database Connect //
If you find SQL strings, thank the web hacking gods—the application may soon fall (although you still have to wait for Chapter 8 to find out why). The search for specific strings is always fruitful, but in the end, you will have to just open the file in Notepad or vi to get the whole picture.
When using the grep command, play around with the -i flag (ignore case), -AN flag (show N lines after the matching line), and -BN flag (show N lines before the matching line).
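The same case-insensitive search with surrounding context can be approximated in Python when grep is not at hand. The sketch below is only an illustration: find_keywords and its before/after parameters are names we made up to mirror the -B and -A flags, and the keyword list is the one given above.

```python
# Strings suggested in the text; matching is case-insensitive.
KEYWORDS = ["sql", "select", "insert", "#include", "#exec",
            "password", "database", "connect", "//"]


def find_keywords(source, keywords=KEYWORDS, before=1, after=1):
    """Return (line_number, keyword, context_lines) for each hit.

    `before` and `after` play the role of grep's -B and -A flags.
    """
    lines = source.splitlines()
    hits = []
    for index, line in enumerate(lines):
        lowered = line.lower()
        for keyword in keywords:
            if keyword in lowered:
                start = max(0, index - before)
                end = min(len(lines), index + after + 1)
                hits.append((index + 1, keyword, lines[start:end]))
    return hits


# Hypothetical page source used only to exercise the function.
sample = "<html>\n<!-- SELECT name FROM users -->\n<p>hi</p>\n</html>"
for line_number, keyword, context in find_keywords(sample):
    print(line_number, keyword, context)
```

Expect noise from the // pattern, since it also matches every absolute URL in the page; it is usually worth running that search separately.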
Once in a while, syntax errors creep into dynamic pages. Incorrect syntax may cause a file to execute partially, which could leave raw code snippets in the HTML source. Here is a snippet of code (from a web site) that suffered from a misplaced PHP tag:
Go to forum!\n"; $file = "http://www.victim.com/$subdir/list2.php? f=$num"; if (readfile($file) == 0) { echo "(0 messages so far)"; } ?>
Other interesting things to search for in HTML are markers that denote server-side execution, such as <? and ?> for PHP, and <% and %> or the runat=server attribute for ASP and ASP.NET pages. These can reveal interesting tidbits that the site developer never intended the public to see.
HTML source information can also provide useful information when combined with the power of Internet search engines like Google. For example, you might find developer names and e-mail addresses in comments. This bit of information by itself may not be that interesting, but what if you search on Google and identify that the developer posted multiple questions related to the development of his or her application? Now you suddenly have nice insight into how the application was developed. You could also assume that same information could be a username for one of the authenticated portions of the site and try brute-forcing passwords against that username.
In one instance, a Google search on a username that turned up in HTML comments identified several other applications that the developer had written that were downloadable from his web site. Looking through the code, we learned that his application uses configuration data on the developer’s own web site! With a bit more effort, we found a DES administrator password file within this configuration data. We downloaded this file and ran a password-cracking tool against it. Within an hour, we got the password and logged in as the administrator. All of this success came thanks to a single comment and a very helpful developer’s homepage.
Some final thoughts on HTML source-sifting: the rule of thumb is to look for anything that might contain information that you don’t yet know. When you see some weird-looking string of random numbers within comments on every page of the site, look into it. Those random numbers could belong to a media management application that might have a web-accessible interface. The tiniest amount of information in web assessments can bring the biggest breakthroughs. So don’t let anything slide by you, no matter how insignificant it may seem at first.
Forms
Forms are the backbone of any web application. How many times have you unchecked the box that says, “Do not uncheck this box to not receive SPAM!” every time you create an account on a web site? Even English majors’ in-boxes become filled with unsolicited e-mail due to confusing opt-out (or is it opt-in?) verification. Of course, there are more important, security-related parts of the form. You need to have this information, though, because the majority of input validation attacks are executed against form information.
When manually inspecting an application, note every page with an input field. You can find most of the forms by a click-through of the site. However, visual confirmation is not enough. Once again, you need to go to the source. For our command-line friends who like to mirror the entire site and use grep, start by looking for the simplest indicator of a form, its tag. Remember to escape the < character since it has special meaning on the command line:
[root@meddle]# getit.sh www.victim.com /index.html | grep -i \<form
www.victim.com [192.168.33.101] 80 (http) open
sent 27, rcvd 2683: NOTSOCK
<form name=gs method=GET action=/search>
Now you have the name of the form, gs; you know that it uses GET instead of POST; and it calls a script called “search” in the web root directory. Going back to the search for helper files, the next few files we might look for are search.inc, search.js, gs.inc, and gs.js. A lucky guess never hurts. Remember to download the HTML source of the /search file, if possible.
Next, find out what fields the form contains. Source-sifting is required at this stage, but we’ll compromise with grep to make things easy:
[root@meddle]# getit.sh www.victim.com /index.html | grep -i "input type"
www.victim.com [192.168.238.26] 80 (http) open
<input type="text" name="name" size="10" maxlength="15">
<input type="password" name="passwd" size="10" maxlength="15">
<input type=hidden name=vote value="websites">
This form shows three items: a login field, a password field, and the submit button with the text, “Login.” Both the username and password must be 15 characters or fewer (or so the application would like to believe). The HTML source reveals a fourth field, a hidden one called “vote.” An application may use hidden fields for several purposes, most of which seriously inhibit the site’s security. Session handling, user identification, passwords, item costs, and other sensitive information tend to be put in hidden fields. We know you’re chomping at the bit to actually try some input validation, but be patient. We have to finish gathering all we can about the site.
If you’re trying to create a brute-force script to perform FORM logins, you’ll want to enumerate all of the password fields (you might have to omit the \" characters):
[root@meddle]# getit.sh www.victim.com /index.html | \
> grep -i "type=\"password\""
www.victim.com [192.168.238.26] 80 (http) open
<input type="password" name="passwd" size="10" maxlength="15">
Tricky programmers might not use the password input type or have the words “password” or “passwd” or “pwd” in the form. You can search for a different string, although its hit rate might be lower. Newer web browsers support an autocomplete function that saves users from entering the same information every time they visit a web site. For example, the browser might save the user’s address. Then, every time the browser detects an address field (i.e., it searches for “address” in the form), it will supply the user’s information automatically. However, the autocomplete function is usually set to “off” for password fields:
[root@meddle]# getit.sh www.victim.com /login.html | \
> grep -i autocomplete
www.victim.com [192.168.106.34] 80 (http) open
<input type=text name="val2" size="12" autocomplete=off>
This might indicate that "val2" is a password field. At the very least, it appears to contain sensitive information that the programmers explicitly did not want the browser to store. In this instance, the fact that type="password" is not being used is a security issue, as the password will not be masked when a user enters her data into the field. So when inspecting a page’s form, make notes about all of its aspects:
• Method Does it use GET or POST to submit data? GET requests are easier to manipulate on the URL.
• Action What script does the form call? What scripting language was used (.pl, .sh, .asp)? If you ever see a form call a script with a .sh extension (shell script), mark it. Shell scripts are notoriously insecure on web servers.
• Maxlength Are input restrictions applied to the input field? Length restrictions are trivial to bypass.
• Hidden Was the field supposed to be hidden from the user? What is the value of the hidden field? These fields are trivial to modify.
• Autocomplete Is the autocomplete attribute applied? Why? Does the input field ask for sensitive information?
• Password Is it a password field? What is the corresponding login field?
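The checklist above lends itself to a small script that records the same attributes for every form on a page. What follows is a rough sketch using Python's standard html.parser module; the class name FormInventory and the sample markup (which combines the form tag and input fields from the earlier examples into one hypothetical page) are our own assumptions.

```python
from html.parser import HTMLParser


class FormInventory(HTMLParser):
    """Collect each form's method and action plus its input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "name": attrs.get("name"),
                # HTML defaults to GET when no method is given.
                "method": (attrs.get("method") or "GET").upper(),
                "action": attrs.get("action"),
                "inputs": [],
            })
        elif tag == "input" and self.forms:
            # Attach the field to the most recently opened form.
            self.forms[-1]["inputs"].append({
                "name": attrs.get("name"),
                "type": (attrs.get("type") or "text").lower(),
                "maxlength": attrs.get("maxlength"),
                "value": attrs.get("value"),
                "autocomplete": attrs.get("autocomplete"),
            })


# Hypothetical page assembled from the fragments shown earlier.
page = """<form name=gs method=GET action=/search>
<input type="text" name="name" size="10" maxlength="15">
<input type="password" name="passwd" size="10" maxlength="15">
<input type=hidden name=vote value="websites">
</form>"""

inventory = FormInventory()
inventory.feed(page)
for form in inventory.forms:
    print(form["method"], form["action"])
    for field in form["inputs"]:
        print("  ", field)
```

A parser is more forgiving than grep about attribute order and quoting, which is why the unquoted type=hidden field is picked up alongside the quoted ones.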
Query Strings and Parameters
Perhaps the most important part of a given URL is the query string, the part following the question mark (in most cases) that indicates some sort of arguments or parameters being fed to a dynamic executable or library within the application. An example is shown here:
http://www.site.com/search.cgi?searchTerm=test
This shows the parameter searchTerm with the value test being fed to the search.cgi executable on this site.
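When you are collecting many URLs, it helps to split them into path and parameters mechanically. Here is a minimal sketch with Python's standard urllib.parse module; the helper name split_url is ours, and it only handles the conventional ?name=value form described so far.

```python
from urllib.parse import urlsplit, parse_qsl


def split_url(url):
    """Return the path and a list of (name, value) query parameters."""
    parts = urlsplit(url)
    # keep_blank_values retains parameters such as "db=" with no value.
    params = parse_qsl(parts.query, keep_blank_values=True)
    return parts.path, params


path, params = split_url("http://www.site.com/search.cgi?searchTerm=test")
print(path)
for name, value in params:
    print(name, "=", value)
```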
Query strings and their parameters are perhaps the most important piece of information to collect because they represent the core functionality of a dynamic web application, usually the part that is the least secure because it has the most moving parts. You can manipulate parameter values to attempt to impersonate other users, obtain restricted data, run arbitrary system commands, or execute other actions not intended by the application developers. Parameter names may also provide information about the internal workings of the application. They may represent database column names, be obvious session IDs, or contain the username. The application manages these strings, although it may not validate them properly.
Fingerprinting Query Strings Depending on the application or how the application is tailored, parameters have a recognizable look and implementation that you should be watching for. As we noted earlier, usually anything following the ? in the query string includes parameters. In complex and customized applications, however, this rule does not always apply. So one of the first things that you need to do is to identify the paths, filenames, and parameters. For example, in the list of URLs shown in Table 2-3, spotting the parameters starts out easy and gets more difficult.
The method that we use to determine how to separate these parameters is to start deleting items from the URL. An application server will usually generate a standard error message for each part. For example, we may delete everything up to the slash from the URL, and an error message may be generated that says something like “Error Unknown Procedure.” We then continue deleting segments of the URL until we receive a different error. Once we reach the point of a 404 error, we can assume that the removed section was the file. And you can always copy the text from the error message and see if you can find any application documentation using Google.
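The deletion process can be made systematic by generating the shortened URLs up front and then requesting each one in turn. The sketch below only builds the list; the function name trimmed_urls is an assumption of ours, the sample URL is hypothetical, and we assume segments are removed from the right-hand end of the path one at a time.

```python
from urllib.parse import urlsplit, urlunsplit


def trimmed_urls(url):
    """Return the query-less URL, then versions with trailing path
    segments removed one at a time, ending at the web root."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    results = []
    for count in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:count])
        results.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return results


# Hypothetical URL in the style of the harder cases discussed here.
for candidate in trimmed_urls("http://www.site.com/folder/(SessionState)/file/paramvalue"):
    print(candidate)
```

Comparing the error returned for each candidate with the one before it shows where the response changes, which is the boundary you are looking for.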
In the upcoming section entitled “Common Web Application Profiles,” we’ll provide plenty of examples of query string structure fingerprints. We’ve shown a couple here to whet your appetite:
file.xxx?OpenDocument or even !OpenDatabase (Lotus Domino)
file.xxx?BV_SESSIONID=(junk)&BV_ENGINEID=(junk) (BroadVision)
Analyzing Query Strings and Parameters Collecting query strings and parameters is a complicated task that is rarely the same between two applications. As you collect the variable names and values, watch for certain trends. We’ll use the following example (again) to illustrate some of these important trends:
http://www.site.com/search.cgi?searchTerm=testing&resultPage=testing&db=/templates/db/archive.db
There are three interesting things about these parameters:
• The resultPage value is equal to the search term—anything that takes user input and does something other than what it was intended for is a good prospect for security issues.
• The name resultPage brings some questions to mind. If the value of this parameter does not look like a URL, perhaps it is being used to create a file or to tell the application to load a file named with this value.
• The thing that really grabs our attention, however, is db=/templates/db/archive.db, which we’ll discuss next.
Table 2-4 shows a list of things we would try within the first five minutes of seeing the db=/[path] syntax in the query string. Any application logic that uses the file system path as input is likely to have issues. These common attack techniques against web application file-path vulnerabilities will illustrate the nature of many of these issues.
We would also try all of these tactics on the resultPage parameter. If you want to really dig deeper, then do a search for search.cgi archive.db, or learn more about how the search engine works, or assume that “db” is the database that is being searched.
Query String                              Conclusion
/file.xxx?paramname=paramvalue            Simple, standard URL parameter structure.
/folder/filename/paramname=paramvalue     Filename here looks like a folder.
/folder/file/paramname&paramvalue         Equal sign is represented by &.
/folder/(SessionState)/file/paramvalue    Session state kept in the URL; it’s hard to determine where a file, folder, or parameter starts or ends.
Be creative. Perhaps you could guess at other hidden database names that might contain not-for-public-consumption information, for instance:
db=/templates/db/current.db
db=/templates/db/intranet.db
db=/templates/db/system.db
db=/templates/db/default.db
Here are some other common query string/parameter “themes” that might indicate potentially vulnerable application logic:
• User identification Look for values that represent the user. This could be a username, a number, the user’s social security number, or another value that appears to be tied to the user. This information is used for impersonation attacks. Relevant strings are userid, username, user, usr, name, id, uid. For example: /login?userid=24601.
Parameter                                      Implications
db=/../../../../etc/passwd                     File retrieval possible? Pass in boot.ini or some other file if it’s win32.
db=/templates/db/                              Can we get a directory listing or odd error?
db=/templates/db/%00                           Use the NULL byte trick to grab a directory listing or other odd errors.
db=/templates/db/junk.db                       What happens when we pass in an invalid database name?
db=|ls or db=|dir                              Attempt to use the old Perl pipe trick.
db=                                            Always try blank.
db=*                                           If we use *, will it search all the databases in the configuration?
db=/search.cgi                                 What happens if we give it an existing filename on the web site? Might dump source code?
http://www.site.com/templates/db/archive.db    Can we just download the DB file directly?
http://www.site.com/templates/db/              Can we retrieve a directory listing?
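Typing each of these values into the address bar gets tedious, so it can help to generate the full set of test URLs in one pass. The following is a sketch only: the function name variants is ours, the candidate list is copied from the db= rows of the table above, and the values are inserted raw (not URL-encoded) on the assumption that sequences such as %00 should reach the server exactly as written.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl

# Candidate values copied from the db= rows of the table above.
CANDIDATES = [
    "/../../../../etc/passwd",
    "/templates/db/",
    "/templates/db/%00",
    "/templates/db/junk.db",
    "|ls",
    "|dir",
    "",
    "*",
    "/search.cgi",
]


def variants(url, param, candidates=CANDIDATES):
    """Yield one URL per candidate with the value of `param` replaced.

    Other parameters are decoded by parse_qsl and written back as-is,
    which is fine for simple values like the ones in this example.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    for candidate in candidates:
        query = "&".join(
            f"{name}={candidate if name == param else value}"
            for name, value in pairs
        )
        yield urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


base = ("http://www.site.com/search.cgi"
        "?searchTerm=testing&resultPage=testing&db=/templates/db/archive.db")
for test_url in variants(base, "db"):
    print(test_url)
```

Passing "resultPage" instead of "db" as the second argument produces the matching set of URLs for that parameter.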
Don't be intimidated by hashed values for these user parameters. For instance, you may end up with a parameter that looks like this:
/login?userid=7ece221bf3f5dbddbe3c2770ac19b419
In reality, this is nothing more than the same userid value just shown but hashed with MD5. To exploit this issue, just increment the value to 24602 and MD5 that value and place it as the parameter value. A great tactic to use to identify these munged parameter values is to keep a database of hashes of commonly used values such as numbers, common usernames, common roles, and so on. Then, taking any MD5 that is found in the application and doing a simple comparison will catch simple hashing techniques like the one just