2.2 The Evolution of Mitigation Techniques
2.2.1 Detecting Infections
One way of fighting large-scale cybercriminal operations is detecting infected computers before they can cause any harm. In this section I discuss the research that has been conducted to achieve this goal.
host. By looking at the binaries that get installed, one can try to infer whether they are malicious or not. Traditional antivirus products build signatures (e.g., regular expressions) from known malware, and look for those signatures in the binaries the user downloads [24]. This technique is not very robust: previous work showed that detection can be fooled by simple obfuscations such as inserting NOP instructions and performing code transpositions [46, 47].
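To illustrate why byte-level signatures are fragile, the following minimal Python sketch matches a regular-expression signature against a short x86 instruction sequence, and then pads the same instructions with NOPs. The signature and the instruction sequence are invented for this example; they are not taken from any real antivirus product or from the systems in [46, 47].

```python
import re

# Hypothetical signature: the raw bytes of three x86 instructions
# (xor eax, eax; push eax; ret) as they appear in a made-up known sample.
SIGNATURE = re.compile(rb"\x31\xc0\x50\xc3")

# The same instructions, kept separate so that padding can be inserted
# only at instruction boundaries.
INSTRUCTIONS = [b"\x31\xc0", b"\x50", b"\xc3"]
NOP = b"\x90"  # x86 no-operation instruction


def assemble(instructions, pad_with_nops=False):
    """Concatenate instructions, optionally placing a NOP between them."""
    separator = NOP if pad_with_nops else b""
    return separator.join(instructions)


def matches_signature(binary: bytes) -> bool:
    """Return True if the signature occurs anywhere in the binary."""
    return SIGNATURE.search(binary) is not None


plain = assemble(INSTRUCTIONS)
padded = assemble(INSTRUCTIONS, pad_with_nops=True)
print(matches_signature(plain), matches_signature(padded))
```

The padded variant executes the same operations, yet the signature no longer matches because the bytes it expects are no longer contiguous.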
Remaining in the field of static analysis, a better approach is to extract semantic information from known malware samples, and look for the same semantics in new samples at detection time [48, 47]. The issue here is that program equivalence is an undecidable problem. Therefore, even if the proposed systems can cover a number of variations that model the same behavior, they are not guaranteed to work for every possible sample. In addition, modern malware comes packed (i.e., encrypted) and decrypts itself at runtime, which makes static analysis difficult.
Dynamic analysis makes malware analysis easier, because one can look at the program once the decryption has happened. The techniques that have been proposed include modelling the behavior of a program based on the system calls it executes [82], monitoring programs that access sensitive information they should not access [153], and looking at the buffers allocated by a malware sample to reconstruct the C&C protocol it uses [39]. The problem with dynamic analysis is that running large numbers of malware samples takes time and resources, and that cybercriminals can have their malware detect that it is being run in an analysis environment and avoid performing any malicious activity [91].
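As a rough illustration of system-call-based behavior modelling, the sketch below compares sliding windows (n-grams) of system calls from a known malicious trace with those of a new sample. This is a deliberately simple stand-in and not the model used in [82]; the traces, function names, and window size are assumptions made for the example.

```python
from typing import List, Set, Tuple

Gram = Tuple[str, ...]


def syscall_ngrams(trace: List[str], n: int = 3) -> Set[Gram]:
    """Return the set of length-n windows over a system call trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def behavior_overlap(known_bad: List[str], sample: List[str], n: int = 3) -> float:
    """Fraction of the known-malicious n-grams that also occur in the sample."""
    model = syscall_ngrams(known_bad, n)
    if not model:
        return 0.0
    return len(model & syscall_ngrams(sample, n)) / len(model)


# Hypothetical traces, invented for this example.
KNOWN_BAD = ["open", "read", "connect", "send", "close"]
SUSPECT = ["stat", "open", "read", "connect", "send", "close", "exit"]
BENIGN = ["open", "read", "write", "close"]

print(behavior_overlap(KNOWN_BAD, SUSPECT))
print(behavior_overlap(KNOWN_BAD, BENIGN))
```

A sample would be flagged when its overlap with a known malicious trace exceeds a chosen threshold. Such a model only sees behavior that the sample actually exhibits during the run, which is why evasive samples that stay dormant in an analysis environment defeat it.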
Malicious web page detection. Another approach used by researchers is to look at malicious web pages that try to compromise the victim’s browser and automatically download a piece of malware (in a so-called drive-by download attack [108]). These attacks are typically performed by malicious JavaScript scripts. To detect such scripts, various approaches have been used:
• Using machine learning to distinguish legitimate from malicious web pages. The features that researchers leveraged include how many HTTP redirections the web page uses and whether the JavaScript code included in the web page is obfuscated. Detection can be performed either offline, by revisiting the page with an instrumented browser [52], or online, by instrumenting the victim’s browser and halting execution of the page once it is detected as malicious [53, 68, 131].
• Looking at the changes in the victim’s system when she visits a malicious web page. The creation of files or changes to registry keys are indicators of compromise [93, 108].
• Looking for typical attack patterns and flagging as malicious any script that exhibits them [113].
The main problem with these techniques is that they either rely on a static model to detect malicious scripts, or train on malicious behavior that might adapt and change over time [72]. Therefore, they could miss newer attacks that cybercriminals come up with.
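The two features named in the first approach above (number of HTTP redirections and JavaScript obfuscation) can be sketched as a small feature extractor. The obfuscation indicators used here (character entropy and a count of calls such as eval) are assumptions chosen for illustration, not the feature set of [52], [53], [68], or [131]; a real system would feed such a vector to a trained classifier.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical indicator: calls commonly seen in obfuscated JavaScript.
SUSPICIOUS_CALLS = re.compile(r"\b(eval|unescape|fromCharCode)\s*\(")


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for empty input."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(redirect_chain: List[str], script: str) -> Dict[str, float]:
    """Build a feature vector for one page visit.

    redirect_chain lists every URL visited, ending with the final page,
    so the number of redirections is its length minus one.
    """
    return {
        "num_redirects": max(len(redirect_chain) - 1, 0),
        "suspicious_calls": len(SUSPICIOUS_CALLS.findall(script)),
        "script_entropy": shannon_entropy(script),
    }


# Hypothetical page visit, invented for this example.
chain = ["http://a.example/", "http://b.example/", "http://c.example/"]
print(extract_features(chain, "eval(unescape('%61%6c'))"))
```

Because the indicators are fixed in advance, an attacker who learns them can restructure a script to stay below any threshold, which is the weakness of static models noted above.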
Network-based detection. Another vantage point that can be leveraged for detection is the network traffic generated by infected machines. By observing the malicious traffic generated by such machines, it is possible to learn important information about botnets and develop effective countermeasures.
A direction researchers looked at is detecting successful infections by monitoring network traffic [65]. In this work, infections are modeled as a set of flows that capture the different steps of the infection. Although interesting, this model can no longer be applied today. The reason is that, years ago, botnet infections followed a well-defined, worm-like behavior (i.e., scanning for victims, exploitation, download of an egg, connection to the command and control server), which is no longer widely used.
More recent research proposed to look at the correlation between C&C messages and malicious activity. The idea is that any time a bot receives a command, it performs a malicious activity. By looking at this correlation, it is possible to detect bots without any previous knowledge of the botnet [64]. The problem is how to identify C&C traffic. Older approaches looked for commonly misused, well-known protocols (e.g., IRC) [66]. However, this type of technique is no longer applicable, since most botnets have moved to proprietary protocols for their C&C traffic. More recent work looks for malicious activity first, and then looks for any interaction with external servers that happened before that activity to find the actual commands [147]. Zand et al. proposed a system to find strings that are typical of C&C commands, and leverage them for detection [155].
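The idea of starting from malicious activity and looking backwards for the interaction that triggered it can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of [147]: the record type, the fixed time window, and the voting scheme are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class InboundMessage:
    timestamp: float  # seconds since the start of the capture
    remote: str       # address of the external server


def candidate_cc_servers(
    messages: List[InboundMessage],
    malicious_times: Iterable[float],
    window: float = 60.0,
) -> Counter:
    """Count, per external server, how many malicious activities were
    preceded by a message from that server within `window` seconds."""
    votes: Counter = Counter()
    for t in malicious_times:
        preceding = {m.remote for m in messages if t - window <= m.timestamp < t}
        votes.update(preceding)
    return votes


# Hypothetical capture, invented for this example.
messages = [
    InboundMessage(10.0, "203.0.113.5"),
    InboundMessage(50.0, "198.51.100.9"),
    InboundMessage(100.0, "203.0.113.5"),
    InboundMessage(205.0, "203.0.113.5"),
]
print(candidate_cc_servers(messages, [20.0, 110.0, 210.0], window=15.0))
```

A server that precedes most malicious activities is a candidate C&C endpoint, while a server that merely appears somewhere in the capture collects no votes.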
DNS-based detection. Like most legitimate Internet services, botnets use the DNS infrastructure to easily connect the different components of their infrastructure (i.e., the bots and the C&C server). Therefore, by looking at the interaction between bots and DNS servers, researchers can learn important information about the botnet, such as which IP addresses are associated with infected machines. This can be done by sinkholing the domains used by a botnet’s C&C infrastructure. By doing this, the infected machines will contact the researchers instead of the botmaster, and it will be possible to enumerate them [55]. Another option is to look in local DNS servers for the presence of cached results associated with malicious domains [25]. If such records are found, that is an indicator of the presence of infected machines in the network.
Ramachandran et al. analyzed queries against a DNS blacklist (DNSBL) to reveal botnet membership [111]; the intuition behind their approach is that bots might check whether their own IP address is blacklisted by a given DNSBL. Such queries can be detected, which discloses information about infected machines.
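The self-lookup intuition can be sketched in a few lines of Python. A DNSBL is queried by reversing the octets of an IPv4 address and appending the blacklist zone, so a query log reveals which address each client asked about. The zone name dnsbl.example, the log format, and the function names are assumptions made for the example, and the sketch covers only the simple case described above, not the full heuristics of [111].

```python
import ipaddress
from typing import Iterable, Optional, Set, Tuple


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def queried_ip(qname: str, zone: str) -> Optional[str]:
    """Recover the IPv4 address a DNSBL query asks about, or None."""
    suffix = "." + zone
    if not qname.endswith(suffix):
        return None
    labels = qname[: -len(suffix)].split(".")
    if len(labels) != 4:
        return None
    try:
        return str(ipaddress.IPv4Address(".".join(reversed(labels))))
    except ValueError:
        return None


def self_lookups(query_log: Iterable[Tuple[str, str]], zone: str) -> Set[str]:
    """Return the source addresses that queried the DNSBL about themselves.

    query_log is a sequence of (source_ip, query_name) pairs.
    """
    return {src for src, qname in query_log if queried_ip(qname, zone) == src}


# Hypothetical query log, invented for this example.
ZONE = "dnsbl.example"
log = [
    ("192.0.2.10", dnsbl_query_name("192.0.2.10", ZONE)),   # asks about itself
    ("198.51.100.7", dnsbl_query_name("192.0.2.10", ZONE)),  # asks about another host
]
print(self_lookups(log, ZONE))
```

A legitimate mail server typically queries a DNSBL about the addresses of hosts that connect to it, so a source that asks about its own address stands out in the log.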
In Chapter 3 we present BOTMAGNIFIER, a system that can grow the set of known infected machines belonging to a botnet by observing the set of mail servers contacted by such machines over time. As we will see, BOTMAGNIFIER helps complement DNS blacklists, significantly improving the coverage offered by these services.