4.2 ROP Payload Example
4.3.4 ROP Chain Profiling
In the course of profiling individual gadgets, one should also track the stack offset required to reach the next candidate in a chain of gadgets, i.e., the stack pointer modifications caused by push, pop, and arithmetic instructions. Using this information, profile each gadget as in step 3, then select the next gadget using the stack offset produced by the previous gadget. Continue profiling gadgets in the chain until either an invalid gadget candidate or the end of the memory region containing the chain is encountered. Upon termination of a particular chain, the task is to determine whether it represents a malicious ROP payload or random (benign) data. In the former case, trigger an alert and provide diagnostic output; in the latter, return to step 1 and advance the sliding window by one byte.
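The chain-following loop described above can be sketched roughly as follows. This is a minimal illustration and not the implementation evaluated in this chapter: the names GadgetProfile, Profiler, readLe32, and walkChain are hypothetical, the per-gadget profiler is passed in as a callback rather than built on a disassembler, and stackDelta is assumed to be the distance in bytes from the stack slot holding one gadget's address to the slot holding the next. The sketch assumes 32-bit little-endian addresses and a C++17 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical per-gadget profile; field names are illustrative only.
struct GadgetProfile {
    std::uint32_t address;   // candidate gadget address read from the buffer
    std::size_t stackDelta;  // bytes from this gadget's slot to the next gadget's slot
};

// Hypothetical callback: yields a profile if `address` points at a valid gadget.
using Profiler = std::function<std::optional<GadgetProfile>(std::uint32_t)>;

// Read a 32-bit little-endian word (x86 byte order) at byte offset `off`.
std::uint32_t readLe32(const std::vector<std::uint8_t>& buf, std::size_t off) {
    return static_cast<std::uint32_t>(buf[off]) |
           static_cast<std::uint32_t>(buf[off + 1]) << 8 |
           static_cast<std::uint32_t>(buf[off + 2]) << 16 |
           static_cast<std::uint32_t>(buf[off + 3]) << 24;
}

// Follow one candidate chain that starts at byte offset `start` of `buffer`.
// Stops at the first invalid candidate or at the end of the memory region.
std::vector<GadgetProfile> walkChain(const std::vector<std::uint8_t>& buffer,
                                     std::size_t start,
                                     const Profiler& profile) {
    assert(profile && "a profiler callback is required");
    std::vector<GadgetProfile> chain;
    std::size_t offset = start;
    while (offset + 4 <= buffer.size()) {
        const std::optional<GadgetProfile> gadget = profile(readLe32(buffer, offset));
        // An invalid candidate, or one that would not advance the stack, ends the chain.
        if (!gadget || gadget->stackDelta < 4) break;
        chain.push_back(*gadget);
        offset += gadget->stackDelta;  // next gadget is selected by the stack offset
    }
    return chain;
}
```

A caller would invoke walkChain once per sliding-window position, classify the returned chain, and advance the start offset by one byte when the chain is judged benign.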
Unfortunately, the distinction between benign and malicious ROP chains is not immediately obvious. For example, contrary to the observations of Polychronakis and Keromytis (2011), there may be many valid ROP chains longer than 6 unique gadgets in benign application snapshots. Likewise, it is also possible for malicious ROP chains to have as few as 2 unique gadgets. One such example is a gadget that uses pop eax to load the value of an API call followed by a gadget that uses jmp eax to initiate the API call, with function parameters that follow. Similarly, a pop/call or pop/push chain of gadgets works equally well.
That said, chains of length 2 are difficult to use in real-world exploits. The difficulty arises because a useful ROP payload will need to call an API that requires a pointer parameter, such as a string pointer for the command to be executed in WinExec. Without additional gadgets to ascertain the current value of the stack pointer, the adversary would need to resort to hard-coded pointer addresses. However, these addresses would likely fail in the face of ASLR or heap randomization, unless the adversary could also leverage a memory disclosure vulnerability prior to launching the ROP chain. An alternative to the 2-gadget chain with hard-coded pointers is the pusha method of performing an API call, as illustrated in Figure 4.1. Such a strategy requires 5 gadgets (for the WinExec example) and enables a single pointer parameter to be used without hard-coding the pointer address.
The aforementioned ROP examples shed light on a common theme: malicious ROP payloads will at some point need to make use of an API call to interact with the operating system and perform some malicious action. At minimum, a ROP chain will need to first load the address of an API call into a register, then actually call the API. A gadget that loads a register with a value fits the LoadRegG profile, while a gadget that actually calls the API fits either the JumpG, CallG, PushAllG, or PushG profiles. Thus, the primary heuristic for distinguishing malicious ROP payloads from those that are benign is to identify chains that potentially make an API call, which is fully embodied by observing a LoadRegG, followed by any of the profiles in the CallG set. This intuitive heuristic is sufficient to reliably detect all known real-world malicious ROP chains. However, by itself, the above strategy would lead to false positives with very short chains, and hence one must apply a final filter. When the total number of unique gadgets is ≤ 2, one requires that the LoadRegG gadget loads the value of a system API function pointer. Assuming individual gadgets are discrete operations (as described in Chapter 2), there is no room for the adversary to obfuscate the API pointer value between the load and call gadgets. On the other hand, if the discrete operation assumption is incorrect, payloads consisting of only 1 or 2 unique gadgets may be missed, although no such payloads have been observed in the real world. Empirical results showing the impact of varying the criteria used in this heuristic on the false positive rate, especially with regard to the number of unique gadgets, are provided next.
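To make the heuristic concrete, the following is a minimal sketch of the decision rule under several simplifying assumptions. The Profile enumeration mirrors the profile names used above, but the Gadget structure, the loadsApiPointer flag, and the functions isCallLike and looksMalicious are hypothetical names introduced only for illustration. The sketch also assumes that "followed by" means "appears anywhere later in the chain" and does not track which register is loaded or called, which a real implementation would likely need to do.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Profile names follow the text; Other stands for any remaining profile.
enum class Profile { LoadRegG, JumpG, CallG, PushAllG, PushG, Other };

// Hypothetical gadget record; field names are illustrative only.
struct Gadget {
    std::uint32_t address;
    Profile profile;
    bool loadsApiPointer;  // only meaningful for LoadRegG gadgets
};

// True for the profiles the text groups into the CallG set.
bool isCallLike(Profile p) {
    return p == Profile::JumpG || p == Profile::CallG ||
           p == Profile::PushAllG || p == Profile::PushG;
}

// Flag a chain that potentially makes an API call: a LoadRegG gadget followed
// (assumed here: anywhere later) by a call-like gadget. For chains with at
// most 2 unique gadgets, the load must be of a system API function pointer.
bool looksMalicious(const std::vector<Gadget>& chain) {
    std::set<std::uint32_t> unique;
    for (const Gadget& g : chain) unique.insert(g.address);
    const bool shortChain = unique.size() <= 2;

    bool loadSeen = false;
    for (const Gadget& g : chain) {
        // Sanity check on the input: the flag may only be set on load gadgets.
        assert(!g.loadsApiPointer || g.profile == Profile::LoadRegG);
        if (loadSeen && isCallLike(g.profile)) return true;
        if (g.profile == Profile::LoadRegG && (!shortChain || g.loadsApiPointer)) {
            loadSeen = true;
        }
    }
    return false;
}
```

Under these assumptions, a pop/jmp pair is flagged only when the popped value is an API function pointer, whereas a longer chain is flagged as soon as any load is later followed by a call-like gadget.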
Steps 2 to 4 are implemented in 3803 lines of C++ code, not including a third-party disassembly library (libdasm).
4.4 Evaluation
This section presents the results of an empirical analysis where the static ROP chain profiling method is used to distinguish malicious documents from benign documents. The benign dataset includes a random subset of the Digital Corpora collection provided by Garfinkel et al. (2009). A total of 7,662 benign files are analyzed, comprising 1,082 Microsoft Office, 769 Excel, 639 PowerPoint, 2,866 Adobe Acrobat, and 2,306 HTML documents evaluated with Internet Explorer. The malicious dataset spans 57 samples that include the three ideal 2-gadget ROP payloads (e.g., pop/push, pop/jmp, and pop/call sequences) embedded in PDF documents exploiting CVE-2007-5659, the pusha example in Figure 4.1, 47 PDF documents collected in the wild that exploit CVE-2010-{0188,2883}, two payloads compiled using the JIT-ROP framework (see Chapter 3) from gadgets disclosed from a running Internet Explorer 10 instance, and four malicious HTML documents with embedded Flash exploiting CVE-2012-{0754,0779,1535} in Internet Explorer 8. The latter four documents are served via the Metasploit framework.
All experiments are performed on an Intel Core i7 2600 3.4GHz machine with 16GB of memory. All analyses are conducted on a single CPU.