
■ Summary

■ Solutions Fast Track

Introduction

The abundance of Web sites has turned the Internet from a playground for text-obsessed geeks and academics into a multicoloured and attractive media mall where people can get information, exchange views, and do their shopping and banking. Among the side effects of the explosion in the use of the Internet and inter-connectivity levels is the proliferation of malicious software (malware) that gains access to computers via the World Wide Web (Web).

Hypertext Transfer Protocol (HTTP) and the Hypertext Markup Language (HTML) standard in combination comprise a major building block of Internet communication. It is therefore unsurprising that HTML is frequently used for distribution of malicious code, and thus that effective blocking of malicious HTML code is becoming more important. At the same time, the increasing effectiveness of anti-virus solutions in blocking Simple Mail Transfer Protocol (SMTP) threats (particularly mass mailers) means that the predominant malware deployment vector is moving from SMTP (e-mail) to HTTP (Web).

Here, Dr. Igor G. Muttik, a researcher of considerable reputation and long experience in the development and maintenance of top-flight antivirus solutions, takes an in-depth look at the Web as a vector for malware transmission, and considers technical approaches to detection, removal, and testing.

Attacks on the Web

There is a significant difference between malware distribution over SMTP (e-mail) and over HTTP. From the point of view of the average computer user, e-mails are received passively, having been “pushed” onto their systems from afar; e-mails simply come in without any user effort (apart from clicking on an e-mail client’s icon to start the program). It is very natural that users treat material received as, or attached to, unsolicited e-mail with more suspicion, especially after all the warnings they’ve received about attachments. At the same time, Web content is viewed as “pulled” by the users when they actively browse the Web and is thus presumed to be somehow safer. Browsing the Internet is not generally considered a dangerous activity. In the minds of many computer users, the worst that can happen is that they could accidentally stumble on some sites of an explicit nature.

Work by C. Wolak indicates that advertisements on Web sites are generally trusted much more than the same ads distributed via spamming. (Chaelynne Wolak, “Advertising on the Internet,” www.itstudyguide.com/papers/cwDISS890A3.pdf.) For this very reason, direct malware distribution via Web sites is likely to be more successful, in terms of the number of victims ensnared, than distributing to newsgroups, spamming executables, or even spamming out malicious Uniform Resource Locators (URLs) to potential victims. For people involved in the distribution of malware, it makes a lot more sense to direct or entice computer users to their Web sites than to use e-mail as a medium for direct malware transfer. This psychological reasoning drives attackers to use the Web for malware distribution.

Tools and Traps

Web Mail

We should include one or two caveats at this point:

■ On no account should you assume that e-mail is getting safer. While mass-mailer virus epidemics are now the exception rather than the rule, and replicative malware is a shrinking percentage of e-mail-borne malicious traffic, e-mail is still a significant malware transmission vector. At the time of writing, the so-called “Storm Worm” (actually a Trojan downloader) is using social engineering techniques very similar to those of old-time mass-mailers to lure e-mail recipients into opening an attachment. And, of course, the use of e-mail messages to lure the recipient to a malware-spiked URL is very common.

■ We should also remember that e-mail is often seen by Web-mail users as a purely Web-based application. Such users may be completely unaware of the underlying transport mechanisms. If the Web is seen as more trustworthy than mail, it may be that Web-mail is seen (in a sort of halo effect) as more trustworthy than mail received via a desktop e-mail client. In fact, the reverse is often true, depending in part on the particular e-mail service being used and how well protected it is.

(The term halo effect is used when the perception of a single positive or negative attribute has a disproportionate influence on our overall positive or negative perception of the object possessing that attribute.)

The antivirus research community feels that the attacks on the Internet over HTTP are already an established fact, and their ferocity is increasing. So far, we have observed at least five different kinds of attacks: hacking into Web sites, manipulation of search engines, DNS poisoning, domain hijacking, and exploiting common user mistakes (e.g., typing errors and misspellings). The defenses available to counter Web attacks are not as strong as they should be, however. An abundance of Web browser vulnerabilities means that users are really entering a minefield whenever they start to browse the Web intensively.

Now let’s look at three types of attacks from the point of view of distributing malicious code—hacking into Web sites, manipulation of search engines (also known as index hijacking) and DNS poisoning (also sometimes known as pharming).

Hacking into Web Sites

Imagine you’re a bad guy wanting to make sure your malicious code gets to be run by as many users as possible. You can post it on a Web site but, naturally, this will have very limited exposure, as users are not very likely to visit your Web site by accident or purely at random. This is really the same problem that legitimate businesses are facing: how do you make sure potential customers visit your Web site? The main difference is that the bad guys are clearly much less limited by ethical and legal boundaries in choosing the way they push malicious Web content onto Internet users.

There are several ways in which users can be diverted to a Web site of the attacker’s choice. One way is to modify a popular Web site so as to include malicious links, redirects, or pop-up and pop-down windows. Frequently, this attack is called “Web defacement” even though it does not necessarily involve a modification of how a Web site looks. Thus “a defacement” can be alien code (intrusive, unauthorized third-party code) implanted into a Web site and not visible to a user in a browser. It can also be an injected alien link, visible or invisible (we shall explain why links are important later). Defacement is only possible if an attacker has access (local or remote) to a Web site, or is able to hack into it.

NOTE

Popular Web sites are generally more carefully maintained and their integrity is checked more frequently, so such attacks are less likely to succeed. However, there do still exist records of such Web site attacks. For instance:

http://vil.nai.com/vil/content/v_100488.htm

www.lurhq.com/berbew.html

First, “defacement” attacks could be made using so-called “remote root” and “remote code execution” vulnerabilities. Web sites could be lacking recent security patches and might therefore be susceptible to such attacks. Secondly, bad management and/or practices can be exploited using open network shares, weak passwords, unprotected guest accounts, vulnerabilities in applications run by Web site administrators, and so on.

Effects similar to manipulation of Web sites can be achieved if a Web proxy is hacked into. The end users will receive modified content even though the original Web site content is unchanged. Obviously, a local malicious proxy or layered service provider (LSP) filter could have a similar effect. Even though some adware is known to have taken this approach, such an attack is beyond the scope of our discussion, as malicious modifications are made locally and not via the Internet. This proxy-hosted attack method is not yet common, because the number of users served from a single proxy is not usually high. In the future, however, it may grow as attempts to introduce proxy service on the Internet level increase (e.g., the Google Web Accelerator, www.windowsdevcenter.com/pub/a/windows/2005/05/24/google_accelerator.html).

There are additional risks in compromising Web sites that cache passwords: for instance, where users are allowed to access several bank accounts from a single page or several mail accounts.

It must be noted that subtle modifications made to a hacked Web site may go unnoticed for a very long time. The Webmaster may notice a malicious change as a result of performing an integrity check on the site’s contents, or by manual inspection, but many administrators don’t implement such countermeasures. After all, for big Web sites this can be a huge task. Another possible monitoring method would be inspection of the logs, but this is not in itself a foolproof way of finding unauthorized modifications, because log entries could have been edited out, or whole log files might have been deleted after a break-in. On the client side, where a PC has contracted something from a Web page, it may be difficult to trace a problem back to the source, because in any average Web session users frequently follow many links and visit many Web sites. Some defacement examples and advice on how to prevent defacements are given in http://cnscenter.future.co.kr/resource/security/application/deface.pdf, a presentation by Ryan C. Barnett.
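The integrity check mentioned above can be sketched in a few lines of Python. This is only an illustration, not a description of any particular product: the function names `snapshot` and `compare` are hypothetical, SHA-256 is an assumed choice of digest, and the sketch reads each file fully into memory, which would need rethinking for a very large site.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map every file under root (by relative path) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare(baseline, current):
    """Return (added, removed, modified) as sorted lists of relative paths."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    modified = sorted(
        path for path in set(baseline) & set(current)
        if baseline[path] != current[path]
    )
    return added, removed, modified
```

A Webmaster would take a snapshot of a known-good copy of the site and later compare it against a fresh snapshot of the live document root. The baseline should be kept somewhere the Web server cannot write to; otherwise an intruder who can edit pages can edit the baseline as well, just as log files can be edited after a break-in.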

We should also mention W32/CodeRed worms (http://vil.nai.com/vil/content/v_99142.htm). The first version (W32/CodeRed.a) of this very successful worm (in terms of being widespread) performed a visible defacement of a Web site, but a later variant (W32/CodeRed.c; see http://vil.nai.com/vil/content/v_99142.htm) silently installed a backdoor program on a server, avoiding the visibility of the original W32/CodeRed. Once a backdoor is successfully installed, a Web site is under the control of the attacker, who can modify its Web contents at will. The CodeRed story confirms that any zero-day Web server exploit has the potential to provide an attacker with many thousands of Web servers to manipulate.

Even for known exploits, restrictions on the speed at which patches can be deployed, especially in large organizations, give attackers a window of opportunity to achieve some distribution of malware before patches are universally applied.

Several viruses infect new targets by mass-mailing a link to a Web page that the virus has just created on a compromised computer: W32/Mydoom.ah, for example (http://vil.nai.com/vil/content/v_129631.htm). In the case of this Mydoom variant, the Web page was a simplistic HTTP server created for only one purpose: to run an exploit and infect another machine. But it would not be very difficult for the bad guys to expand this concept and make this Web page real. The question is then, how do you make sure that potential victims visit it?

In any case, adding alien modifications (that is, changes made by an unauthorized outsider) to legitimate sites can only have a temporary effect. If the bad guys want to sustain their business, they need to tap into the source and concentrate their efforts on systems over which they have lasting control. One of the best sources to tap is the Internet search engine.

Index Hijacking

The objective of this class of attack is to make sure that a Web site that hosts malware comes high up in the list of sites returned by an Internet search engine. This will ensure a steady supply of victims to the bad guys.

We first learned about this attack from a user who complained that Google had directed him to a malicious Web site. Google is very popular, so we concentrated our investigation specifically on that search engine. Google uses so-called “PageRank” values to determine the quality of any Web page.

NOTE

In the case of CodeRed, it was estimated that approximately 70,000 computers were compromised. See Dmitry Gryaznov’s article “Red Number Day,” published in Virus Bulletin’s issue of October 2001 (www.nai.com/common/media/vil/pdf/dgryaznov_VB_oct2001.pdf). In a sense, though, this number actually understates the extent of the damage. For instance, one organization with several thousand sites and around three million systems shut down Web services for several days while infected machines were traced and dealt with. (There was a consensus that it was better to suffer that inconvenience than to be a vector for further infection in and beyond the organization’s borders.)

Google has stated that PageRank (PR) is not the only criterion it uses to determine the position of a page in the search lists it displays, and that many other parameters are also used. Google has been cautious about revealing the details of its methodology, having stated that “Due to the nature of our business and our interest in protecting the integrity of our search results, this is the only information we make available to the public about our ranking system.” It is clear, however, that apart from PR, other important components in Google’s approach to ranking include page contents, text of the links, text around the link, contents of neighboring pages, page URL, filename, and title. Google has changed its ranking strategy several times, which has resulted in significant movement in the returned results, as reported by the Internet Search Engine Database (http://www.isedb.com/news/article/663). Nevertheless, PR remains the core of Google’s ranking system.

The PR values are determined by analyzing the graph representing the topology of all Web pages collected by Google’s crawler.

NOTE

The name “PageRank” is trademarked by Google, and the algorithm is patented by Stanford University. See the paper “The PageRank Citation Ranking: Bringing Order to the Web” by Larry Page, Sergey Brin, R. Motwani, and T. Winograd, at http://citeseer.ist.psu.edu/page98pagerank.html. The “Page” part of the name comes from Larry Page’s name, not from the fact that the algorithm deals with Web pages.

NOTE

The Google search engine ranks a page by interpreting links from other pages as “votes” by referring pages. The ranking is not, however, judged only by the volume of referring links a page receives, but by the popularity (or, in Googlespeak, the importance) of the page that “casts the vote.” Referring pages that are themselves “important” (that is, have lots of referring pages) carry more weight. Their links to other pages make those pages more “important.” More information can be found at www.google.com/technology/ and www.google.com/corporate/tech.html.

Even though computing PR values is a horrendously complex computational task, crawling the Web takes even more time. On average, Google manages to update their ranking rules approximately once per month. Figure 3.1 demonstrates the PR calculation method. Each “incoming” link is a “vote” for this page, and each such “vote” increases a page’s PR. Each outgoing link casts a vote for another page. Numbers near pages are PageRanks (PR); numbers near links are “PR vote” values. PR is a sum of “PR votes.” Two pages in the bottom right corner represent a “Rank Sink.”

NOTE

PRs are attributes of pages, not Web sites.

Figure 3.1 PR Calculation
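The “sum of votes” calculation can be illustrated with a short Python sketch. It assumes the simplified model from the published PageRank paper, in which a page splits its PR equally among its outgoing links, and it makes no claim about Google’s production system. The function names and the three-page graph are hypothetical, chosen only for illustration.

```python
def vote_values(links, ranks):
    """Value of the vote carried by each link: the source page's PR
    divided equally among that page's outgoing links (assumed model)."""
    return {
        (src, dst): ranks[src] / len(targets)
        for src, targets in links.items()
        for dst in targets
    }


def simple_pagerank(links, iterations=100):
    """Repeatedly set each page's PR to the sum of its incoming votes.

    links maps a page name to the list of pages it links to. Pages with
    no outgoing links cast no votes, so their rank is simply lost in this
    simplified, undamped model.
    """
    pages = sorted(set(links) | {dst for targets in links.values() for dst in targets})
    ranks = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_ranks = {page: 0.0 for page in pages}
        for src, targets in links.items():
            if not targets:
                continue
            share = ranks[src] / len(targets)
            for dst in targets:
                new_ranks[dst] += share
        ranks = new_ranks
    return ranks


# Hypothetical three-page graph: A links to B and C, B links to C, C links to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
example_ranks = simple_pagerank(example_links)
```

In this toy graph, page C collects votes from both A and B and ends up with the same rank as A, while B, which receives only half of A’s vote, ends up with half as much. A page with a PR of 100 and two outgoing links casts two votes of 50 each under this model.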

A vulnerability exists in the simplistic PR approach, called “a Rank Sink.” It occurs when the graph has a loop with no outgoing links. Google does have a method of handling this problem, but it still can be exploited to inflate PR values, by creating loops that have very few outgoing links. It can be proved that by adding good incoming links and reducing the number of visible outgoing links, you can increase the PR value of a page. This is trivial to do. Adding links to selected pages is easy, and hiding outgoing links can be done with obfuscated scripts, for example (instead of normal “href” links). There are commercial companies that specialize in manipulating Google search results. Examples include SubmitExpress and WebGuerilla. These are also known as search engine optimization (SEO) companies. The mere existence of such companies confirms that exploitation of the ranking methodology is possible and even routinely implemented.
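The effect of hiding an outgoing link can be seen in a small Python sketch. It uses the damped PageRank formulation found in published descriptions of the algorithm, with an assumed damping factor of 0.85; how Google actually handles rank sinks is not stated here, so this is an illustration of the principle rather than of Google’s method. The four-page graphs, the page names, and the function name are all hypothetical.

```python
def damped_pagerank(links, damping=0.85, iterations=100):
    """Damped PageRank: PR(p) = (1 - d)/N + d * (sum of incoming votes).

    links maps a page name to the list of pages it links to. Every page
    in the example graphs below has at least one outgoing link; a page
    without any would simply cast no votes in this sketch.
    """
    pages = sorted(set(links) | {dst for targets in links.values() for dst in targets})
    count = len(pages)
    ranks = {page: 1.0 / count for page in pages}
    for _ in range(iterations):
        new_ranks = {page: (1.0 - damping) / count for page in pages}
        for src, targets in links.items():
            if not targets:
                continue
            share = damping * ranks[src] / len(targets)
            for dst in targets:
                new_ranks[dst] += share
        ranks = new_ranks
    return ranks


# A and B stand for ordinary pages; X and Y belong to the attacker.
# In VISIBLE, Y links back out to A, so rank flows out of the X-Y loop.
VISIBLE = {"A": ["B", "X"], "B": ["A"], "X": ["Y"], "Y": ["X", "A"]}
# In HIDDEN, Y's link to A is concealed from the crawler (for example, by
# an obfuscated script), so the X-Y loop has no outgoing links at all.
HIDDEN = {"A": ["B", "X"], "B": ["A"], "X": ["Y"], "Y": ["X"]}

visible_x = damped_pagerank(VISIBLE)["X"]
hidden_x = damped_pagerank(HIDDEN)["X"]
print(f"PR of X with the outgoing link visible: {visible_x:.3f}")
print(f"PR of X with the outgoing link hidden:  {hidden_x:.3f}")
```

In this toy graph the score of page X rises from roughly 0.28 to roughly 0.42 once the loop stops passing rank back to the rest of the graph, even though the single incoming link from A is unchanged. Damping stops the loop from absorbing all of the rank, but the loop still gains.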

So, how are malicious attacks on Google triggered? One type of attack occurs when a user enters a phrase such as “Santa Trojan,” “Filmaker Trojan,” “Stinger Trojan,” “Skipping Christmas,” “Honda Vespa,” “crack CSS,” “Windows XP activation,” “adware Adaware,” “hacker tricks,” or “edonkey serverlist” into Google and finds that a number of very suspicious links are returned.

WARNING

Important note: these are all real examples, so be careful if you try any of them. Google has removed some malicious URLs from their search results, but new malware-related phrases and URLs appear all the time. Following most of these links might load your computer with malware.

Let’s follow a link like this. I had to go looking for a new one because Google suppressed all that I already knew about after we reported them. But it was not difficult at all to put two and two together and get a hit. For example, a search for “Christmas adware” returns a link (right after sponsored links, at the top) to http://spyware.qseek.info/adware-comparison-remover-spyware/ (see Figure 3.2).

The contents of the Web page accessed by the third of the above links are rather amusing and start with an obfuscated redirect. (Remember what we said above about hiding outgoing links to create “Rank Sink” loops.) This is followed by machine-generated text (nonsense, but on the topic), and then by a series of links.

The text on this Web site is clearly machine-generated, but in such a way that a cursory automated analysis will not be able to detect it as such. (There is proper HTML formatting, JPEG picture inclusion, links, and such.) I would be surprised if this HTML were not generated by a program that pulled most of the words from Google search results for the word “adware.” Note that the name of the link includes the keyword “adware-comparison-remover-spyware,” which makes Google interpret it as a very relevant hit.

In order to be effective, the phrases that are used to manipulate and trigger Google must not be too common, so as not to be lost among all the useful and reputable links. On the other hand, phrases should not be unique; otherwise, no user would ever look for them. Texts randomly assembled from words related to the topic of the page (“adware” in our case)
