
6.2 Application Case Studies

6.2.1 Email Case Study: Outlook Express

To demonstrate email application support, we extended Gyrus to support Outlook Express on Windows XP. Our modules for Outlook Express detect when a user is interacting with the application to send a message, then extract the message contents from memory and place them into a database of allowed messages. When an email is seen leaving the host, a transparent SMTP proxy checks that a message with matching content can be found in the database of authorized messages. This allows user email to pass unhindered, while blocking spam sent by malware on the host.

The implementation is divided into several components. First, the event testing module gets notification of hardware events, and decides if they represent a user sending an email. If so, the message’s contents are extracted and then validated using the screen capture, and the authorization creation module creates an authorization allowing the message. Finally, in the enforcement module, the outgoing message is extracted from the SMTP session by the proxy, and the subject, sender, and body are compared with the authorization database.

6.2.1.1 Event Testing

In the event handler, the spam blocker receives notification of all mouse clicks from the device model, as described in Chapter 5. Upon receiving a mouse click event, the window mapper is consulted to see if the user is clicking on the “Send” button of an Outlook Express message window. Both a “left button down” and “left button up” event on the send button are required, with no intervening mouse button events. If the user is clicking on the send button, the system moves on to extracting the message contents.
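The click test described above can be sketched as a small state machine. This is a minimal illustration, not the Gyrus implementation: the names `MouseEvent` and `SendClickDetector`, and the `on_send_button` callback that stands in for the window mapper query, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MouseEvent:
    button: str   # "left", "right", or "middle"
    action: str   # "down" or "up"
    x: int
    y: int


class SendClickDetector:
    """Reports a click only for left-down followed directly by left-up,
    with both events located on the Send button."""

    def __init__(self, on_send_button: Callable[[int, int], bool]):
        # on_send_button is a hypothetical stand-in for the window mapper.
        self._on_send_button = on_send_button
        self._armed = False

    def feed(self, ev: MouseEvent) -> bool:
        """Consume one mouse button event; return True when a click completes."""
        hit = bool(self._on_send_button(ev.x, ev.y))
        if ev.button == "left" and ev.action == "down" and hit:
            self._armed = True
            return False
        clicked = (self._armed and ev.button == "left"
                   and ev.action == "up" and hit)
        # Any other button event (or a release elsewhere) resets the state.
        self._armed = False
        return clicked
```

Only mouse button events are fed to the detector in this sketch, so pointer motion between the press and the release does not reset it.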

6.2.1.2 Authorization Creation

To create a message-specific authorization, the message content is retrieved from both memory and the screen capture. Using memory analysis, we traverse the internal data structures used to represent a message while it is being composed. By reverse engineering portions of Outlook Express, we determined that the message composition pane is actually an instance of the MSHTML rendering engine (called Trident), which is also used by Internet Explorer to render web pages. When a user enters text into the window, the MSHTML engine dynamically updates the parsed HTML tree in memory with the new text. When the message is sent, the rendering engine serializes this tree to HTML and sends it using SMTP.

The parsed HTML is stored in memory as a splay tree [159], which optimizes access to recently used nodes. The nodes of this tree are objects of type CTreePos, and each tree node represents an opening or closing HTML tag or a text string (for the textual content of the page markup). HTML tags are represented by CElement objects (which are accessible from the corresponding CTreePos), which store, among other things, the name of the tag and its HTML attributes. Text nodes have no associated CElement, and are represented by their length and a pointer into a document-wide gap buffer, a data structure commonly used to optimize interactive edits to a buffer. Our memory analysis code replicates the serialization process by traversing the tree and writing out the opening and closing tags, as well as the content of any text nodes. The same approach can be used to extract plain text email (by ignoring the HTML tags); however, we currently only implement the default case of HTML email. We also use memory analysis to retrieve the subject and recipients from the email client's "To" and "Subject" text boxes.
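The traversal described above can be illustrated with a simplified model. The sketch below makes several assumptions that are not taken from MSHTML itself: the classes `TreePos` and `GapBuffer` are hypothetical stand-ins for the in-memory structures, an in-order walk of the tree is assumed to yield nodes in document order, and escaping of special characters is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class GapBuffer:
    """Character storage with an unused gap at [gap_start, gap_end)."""
    storage: List[str]
    gap_start: int
    gap_end: int

    def read(self, offset: int, length: int) -> str:
        # Logical offsets skip over the gap in the physical storage.
        gap = self.gap_end - self.gap_start
        chars = []
        for i in range(offset, offset + length):
            chars.append(self.storage[i if i < self.gap_start else i + gap])
        return "".join(chars)


@dataclass
class TreePos:
    kind: str                                  # "begin", "end", or "text"
    tag: str = ""                              # tag name (begin/end nodes)
    attrs: Dict[str, str] = field(default_factory=dict)
    offset: int = 0                            # logical text offset (text nodes)
    length: int = 0                            # text length (text nodes)
    left: Optional["TreePos"] = None
    right: Optional["TreePos"] = None


def in_order(node: Optional[TreePos]) -> Iterator[TreePos]:
    """Yield nodes left to right (assumed here to be document order)."""
    if node is None:
        return
    yield from in_order(node.left)
    yield node
    yield from in_order(node.right)


def serialize(root: Optional[TreePos], buf: GapBuffer,
              plain_text: bool = False) -> str:
    """Write out tags and text; with plain_text=True, tags are ignored."""
    parts = []
    for pos in in_order(root):
        if pos.kind == "text":
            parts.append(buf.read(pos.offset, pos.length))
        elif plain_text:
            continue
        elif pos.kind == "begin":
            attrs = "".join(f' {k}="{v}"' for k, v in pos.attrs.items())
            parts.append(f"<{pos.tag}{attrs}>")
        else:
            parts.append(f"</{pos.tag}>")
    return "".join(parts)
```

Because the shape of a splay tree changes with access patterns, the sketch depends only on the left-to-right order of nodes, not on which node happens to be the root.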

Since an attacker can manipulate the message contents in memory, as depicted in Figure 21, we validate the memory contents using their on-screen appearance. To do this, we use the bounding boxes of the subject, recipient, and message text from the window mapper to crop the screen capture provided by Gyrus down to just the text we are interested in. Next, after upscaling and resampling the images to improve readability, we extract the text using the off-the-shelf Tesseract OCR software [160]. OCR is not completely accurate, however, so we use the Levenshtein edit distance [107] to compare the OCRed text with that retrieved from memory.

If the edit distance between the on-screen and in-memory strings exceeds a configurable threshold, the message validation fails and the message will not be placed into the authorization database. If the rate of OCR errors is sufficiently high, this could cause legitimate email to be rejected. In practice, we have found that setting an error threshold of 20% (relative to the length of the string) is sufficient to compensate for Tesseract's mistakes. Note that this creates little, if any, opportunity for an attacker to send spam, because any alteration to the email counts against the 20% in addition to the OCR errors. Since the OCR errors approach 20% before any malicious changes to the email, an attacker would be unable to make any meaningful changes to the email.
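The threshold test can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the 20% threshold is taken relative to the length of the in-memory string, which is one reading of "relative to the length of the string".

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def validate(memory_text: str, ocr_text: str, threshold: float = 0.20) -> bool:
    """Accept when the edit distance does not exceed threshold * length.

    The length used is that of the in-memory string (an assumption).
    """
    if not memory_text:
        return not ocr_text
    return levenshtein(memory_text, ocr_text) <= threshold * len(memory_text)
```

For example, a 24-character subject line tolerates up to four character edits under this rule, so a few typical OCR confusions (such as "0" for "o" or "rn" for "m") pass, while a replaced message does not.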

If greater accuracy is needed, we could turn to the work of Lasko et al. [105] and use a Bayesian measure to estimate the probability that discrepancies between the strings are a result of OCR errors. We also note that Tesseract is not optimized for use with screen captures. Using an OCR algorithm that takes advantage of the fact that the image is computer generated could improve accuracy (e.g., if the font is known, even simple image matching might suffice to recognize individual characters). Once the message content from memory is validated, it is placed into the authorization database, and the mouse click is passed to the User VM.

Our current implementation uses a variety of costly operations (e.g., memory analysis, screen capture, and OCR) whenever a user clicks the send button. In Section 6.4, we show that the combined time to complete these operations is approximately 2 seconds, and we discuss a variety of options for improving these numbers for greater user acceptability.

[Figure 21 diagram. Components shown: User VM; Outlook Express Email Client; User; Kernel; mshtml.dll; comctl32.dll; win32k; Authorization Creation Module; Window Mapper. Annotations: "Extracts information about windows, buttons, and other widgets from win32k." "Extracts DOM tree with email contents from mshtml." "Extracts email subject and recipients from UI components in comctl32." "Attacker Option #1: The attacker can modify data in the application context, but Gyrus validates this data against other sources (e.g., screen captures)." "Attacker Option #2: The attacker can modify kernel UI data, but such mods may crash user apps." The diagram also contains a dump of tree nodes whose text was not recoverable.]

Figure 21: Gyrus’ Outlook Express modules use VMI to extract data from the User VM memory. This data is interpreted and used to create an authorization for the user’s email message.

6.2.1.3 Enforcement

At some point later (possibly much later, if the message is composed and sent while the user is offline), the message will be sent via SMTP to a mail server. When this occurs, an iptables rule on the virtual network bridge redirects the network stream to the transparent SMTP proxy (we use proxsmtp [176]), which calls our enforcement script. The script parses the message according to RFCs 2822 [137] and 2045 [66] and consults the authorization database to find messages with a matching subject and recipient.
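The parsing and lookup step can be sketched with Python's standard `email` package. This is an illustration only: the function names and the dictionary layout of the authorization database are hypothetical, and the way proxsmtp hands the message to a script is not modeled here.

```python
from email import message_from_bytes, policy
from email.utils import getaddresses
from typing import Dict, List, Optional, Tuple

# Hypothetical database key: (subject, sorted lowercase recipient addresses).
Key = Tuple[str, Tuple[str, ...]]


def parse_outgoing(raw: bytes) -> Tuple[str, Tuple[str, ...], Optional[str]]:
    """Extract the subject, recipients, and HTML body of an outgoing message."""
    msg = message_from_bytes(raw, policy=policy.default)
    subject = str(msg["Subject"] or "")
    to_headers = [str(h) for h in msg.get_all("To", [])]
    recipients = tuple(sorted(
        addr.lower() for _, addr in getaddresses(to_headers) if addr))
    body = msg.get_body(preferencelist=("html",))
    html = body.get_content() if body is not None else None
    return subject, recipients, html


def candidates(db: Dict[Key, List[str]], subject: str,
               recipients: Tuple[str, ...]) -> List[str]:
    """Return the stored HTML bodies authorized for this subject/recipients."""
    return db.get((subject, recipients), [])
```

Keying the lookup on subject and recipients narrows the set of stored messages before the more expensive body comparison is attempted.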

Finally, the HTML part of the message is compared with the stored HTML from the database by recursively comparing each HTML node's content. The comparison between text nodes can be done exactly, because the copy in the database is extracted from memory and is not subject to OCR errors. Any message not found in the database is rejected with an SMTP reject (SMTP code 554: Transaction Failed). If the message is found and the contents match, it is allowed to be sent to the remote mail server. By placing authorizations in the database, we allow enforcement to occur at a time later than when the user sends the message.
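A recursive node-by-node comparison of this kind can be sketched with Python's standard `html.parser` module. The sketch is an assumption-laden illustration rather than the enforcement script itself: the `Node`, `TreeBuilder`, and `enforce` names are hypothetical, and the returned (code, text) pair is only a stand-in for however the proxy is told to accept or reject.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    def __init__(self, tag, attrs=()):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []          # each child is a Node or a str (text)


class TreeBuilder(HTMLParser):
    """Builds a simple tree of Node objects and text strings."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self._stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, if there is one.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        siblings = self._stack[-1].children
        if siblings and isinstance(siblings[-1], str):
            siblings[-1] += data    # merge adjacent text chunks
        else:
            siblings.append(data)


def parse(html_text):
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root


def same(a, b):
    """Exact recursive comparison of tags, attributes, and text."""
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    return (a.tag == b.tag and a.attrs == b.attrs
            and len(a.children) == len(b.children)
            and all(same(x, y) for x, y in zip(a.children, b.children)))


def enforce(outgoing_html, authorized_htmls):
    """Return a hypothetical (code, text) verdict for the outgoing body."""
    outgoing = parse(outgoing_html)
    if any(same(outgoing, parse(h)) for h in authorized_htmls):
        return (250, "OK")
    return (554, "Transaction Failed")
```

Comparing parsed trees rather than raw strings makes the check insensitive to tag-name case, since `HTMLParser` lowercases tag names, while text content must still match exactly.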

6.2.1.4 Discussion

By detecting when the user has clicked on the send button and validating the message content against what the user sees on screen, we have confidence that we captured the user's intent. It is reasonable to assume that the message on screen when the user clicks "Send" is what the user intended to send. As these are the only emails we allow the host to send, all spam is blocked from leaving the User VM when Gyrus is running. It is possible for some of the email text to be scrolled off the screen, in which case we are unable to completely verify the message contents in memory. We discuss the security implications of this situation in Section 6.3.

This general procedure can also be applied to web-based email. Using knowledge of the browser and webmail application semantics, we would use memory analysis to determine when the user clicks on the send button in the webmail client’s composition page. As with a standalone email client, VMI could be used to extract the message text, validate it using the on-screen display, and place it in the authorization database. When the message is sent, an HTTP (rather than SMTP) proxy would be used to filter outgoing webmail messages to ensure they were generated by a human.