The structure of a web application usually provides a unique signature. Examining details as seemingly trivial as directory structure, file extensions, and the naming conventions used for parameter names or values can reveal clues that immediately identify which application is running (see the section “Common Web Application Profiles,” later in this chapter, for some crisp examples of this).
Obtaining the directory structure for the public portion of the site is trivial. After all, the application is designed to be surfed. However, don’t stop at the parts visible through the browser and the site’s menu selections. The web server may have directories for administrators, old versions of the site, backup directories, data directories, or other directories that are not referenced in any HTML code. Try to guess the mindset of the administrators and site developers. For example, if static content is in the /html directory and dynamic content is in the /jsp directory, then any CGI scripts may be in the /cgi directory.
Other common directories to check include these:
• Directories that have supposedly been secured, either through SSL, authentication, or obscurity: /admin/ /secure/ /adm/
• Directories that contain backup files or log files: /.bak/ /backup/ /back/ /log/ /logs/ /archive/ /old/
• Personal Apache directories: /~root/ /~bob/ /~cthulhu/
• Directories for include files: /include/ /inc/ /js/ /global/ /local/
• Directories used for internationalization: /de/ /en/ /1033/ /fr/
This list is incomplete by design. One application’s entire directory structure may be offset by /en/ for its English-language portion. Consequently, checking for /include/ will return a 404 error, but checking for /en/include/ will be spot on. Refer back to your list of known directories and pages documented earlier using manual inspection. In what manner have the programmers or system administrators laid out the site? Did you find the /inc/ directory under /scripts/? If so, try /scripts/js/ or /scripts/inc/js/ next.
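The idea of offsetting guesses by directories you have already found can be sketched in a few lines of Python. This is only an illustration, not part of the getit scripts: the function name candidate_paths and the sample prefixes are our own assumptions, and the directory names are taken from the list above.

```python
from itertools import product


def candidate_paths(prefixes, names):
    """Combine every known prefix with every guessed directory name.

    prefixes: directories already confirmed on the site ("" means the web root).
    names:    common directory names worth guessing under each prefix.
    Returns a de-duplicated list of paths, each with a trailing slash.
    """
    seen = set()
    paths = []
    for prefix, name in product(prefixes, names):
        parts = [p for p in (prefix.strip("/"), name.strip("/")) if p]
        if not parts:
            continue  # nothing to guess when both pieces are empty
        path = "/" + "/".join(parts) + "/"
        if path not in seen:
            seen.add(path)
            paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical findings from manual inspection: the root, /en/, and /scripts/.
    for path in candidate_paths(["", "/en/", "/scripts/"], ["include", "inc", "js"]):
        print(path)
```

Feeding newly discovered directories back in as prefixes extends the guesses one level deeper each round.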
Attempting to enumerate the directory structure can be an arduous process, but the getit scripts can help whittle down any directory tree. Web servers return a status code other than 404 when a GET request is made to a directory that exists on the server. The code might be 200, 302, or 401, but as long as it isn’t a 404, you’ve discovered a directory. The technique is simple:
[root@meddle]# getit.sh www.victim.com /isapi
www.victim.com [192.168.230.219] 80 (http) open
HTTP/1.1 302 Object Moved
Location: http://tk421/isapi/
Server: Microsoft-IIS/5.0
Content-Type: text/html
Content-Length: 148
<head><title>Document Moved</title></head>
<body><h1>Object Moved</h1>This document may be found <a HREF="http://tk421/isapi/">here</a></body>sent 22, rcvd 287: NOTSOCK
Using our trusty getit.sh script, we made a request for the /isapi/ directory; however, we omitted an important piece. The trailing slash was left off the directory name, causing the IIS server to redirect to the actual directory. As a by-product, the redirect also reveals the internal hostname or IP address of the server, even when it’s behind a firewall or load balancer. Apache is just as susceptible. It doesn’t reveal the internal hostname or IP address of the server, but it will reveal virtual servers:
[root@meddle]# getit.sh www.victim.com /mail
www.victim.com [192.168.133.20] 80 (http) open
HTTP/1.1 301 Moved Permanently
Date: Wed, 30 Jan 2002 06:44:08 GMT
Server: Apache/2.0.28 (Unix)
Location: http://dev.victim.com/mail/
Content-Length: 308
Connection: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://dev.victim.com/mail/">here</a>.</p>
<hr />
<address>Apache/2.0.28 Server at dev.victim.com Port 80</address>
</body></html>
sent 21, rcvd 533: NOTSOCK
That’s it! If the directory does not exist, then you will receive a 404 error. Otherwise, keep chipping away at that directory tree.
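For readers who prefer to script the check, here is a minimal Python sketch of the same technique. It is not the getit.sh script itself; the function names directory_exists, leaked_host, and probe are our own, and we assume plain HTTP on port 80 as in the transcripts above. The standard http.client module does not follow redirects, so the Location header can be inspected directly.

```python
import http.client
from urllib.parse import urlsplit


def directory_exists(status):
    """Apply the rule from the text: any status other than 404 counts as a hit."""
    return status != 404


def leaked_host(requested_host, location):
    """Return the host named in a redirect's Location header when it differs
    from the host we asked for (an internal name or a virtual server), else None."""
    if not location:
        return None
    host = urlsplit(location).hostname
    if host and host.lower() != requested_host.lower():
        return host
    return None


def probe(host, path, port=80, timeout=10):
    """Send one GET request without following redirects.

    Returns (status_code, location_header_or_None).
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

A call such as probe("www.victim.com", "/isapi") returns the status code and Location header; passing them to directory_exists and leaked_host applies the two observations made above. Only run it against systems you are authorized to test.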
Another tool that can reduce time and effort when traversing a web application for hidden folders is OWASP DirBuster. DirBuster is a multithreaded Java application that is designed to brute-force directories and files on a web server. Based on a user-supplied dictionary file, DirBuster will attempt to crawl the application and guess at non-linked directories and files with a specific extension. For example, if the application uses PHP, the user would specify “php” as a file extension and DirBuster would guess for a file named [dictionary word].php in every directory the crawler encounters (see Figure 2-4). DirBuster can recursively scan new directories that it finds, and its performance is adjustable. Recursive scanning with DirBuster generates a lot of traffic, so reduce the thread count in environments where an excessive number of requests is undesirable.
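To make the [dictionary word].php guessing concrete, the following Python sketch builds the list of file names such a tool would request. It illustrates the idea only and says nothing about how DirBuster is implemented internally; the function name file_guesses and the sample words are hypothetical.

```python
def file_guesses(directories, words, extension):
    """Build one guess per (directory, dictionary word) pair.

    directories: directories already discovered, for example by a crawler.
    words:       entries from a user-supplied dictionary file.
    extension:   file extension to append, with or without a leading dot.
    """
    ext = extension.lstrip(".")
    guesses = []
    for directory in directories:
        base = directory.strip("/")
        prefix = "/" + base + "/" if base else "/"
        for word in words:
            guesses.append(f"{prefix}{word}.{ext}")
    return guesses


if __name__ == "__main__":
    # Hypothetical input: two known directories and a two-word dictionary.
    for guess in file_guesses(["/", "/admin/"], ["login", "config"], "php"):
        print(guess)
```

The number of requests is the product of the directory count and the dictionary size, which is why recursive scans generate so much traffic.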