
Enhanced Password Security on Mobile Devices

by

Dongtao Liu

Department of Computer Science
Duke University

Date:_______________________

Approved:

___________________________
Landon P. Cox, Supervisor

___________________________
Bruce M. Maggs

___________________________
Xiaowei Yang

___________________________
Romit Roy Choudhury

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University


Copyright by Dongtao Liu


Abstract

Sleek and powerful touchscreen devices with continuous access to high-bandwidth wireless data networks have transformed mobile into a first-class development platform. Many applications (i.e., "apps") written for these platforms rely on remote services such as Dropbox, Facebook, and Twitter, and require users to provide one or more passwords upon installation. Unfortunately, today's mobile platforms provide no protection for users' passwords, even as mobile devices have become attractive targets for password-stealing malware and other phishing attacks.

This dissertation explores the feasibility of providing strong protections for passwords input on mobile devices without requiring large changes to existing apps.

We propose two approaches to secure password entry on mobile devices: ScreenPass and VeriUI. ScreenPass is integrated with a device's operating system and continuously monitors the device's screen to prevent malicious apps from spoofing the system's trusted software keyboard. The trusted keyboard ensures that ScreenPass always knows when a password is input, which allows it to prevent apps from sending password data to untrusted servers. VeriUI relies on trusted hardware to isolate password handling from a device's operating system and apps. This approach allows VeriUI to prove to remote services that a relatively small and well-known code base


Dedication

To my dear maternal grandmother,


Contents

Abstract
List of Tables
List of Figures
Acknowledgements

1. Introduction
1.1 Booming Mobile and Cloud Services
1.2 New Security Challenges in the Mobile Era
1.3 Enhance Trusted UI in Touchscreen Mobile Devices
1.4 Dissertation Statement and Overview
1.5 Dissertation Outline

2. Background
2.1 Android System
2.1.1 Android Framework, Apps, and Services
2.1.2 Mobile Browser and WebView Widget
2.1.3 Display Subsystem
2.2 OAuth 2.0 Protocol
2.3 Taint-tracking and TaintDroid
2.4 Trust Computing and ARM TrustZone

3. ScreenPass: Secure Password Entry via OCR and Taint-tracking
3.1 Introduction
3.2 Trust and Threat Model
3.3 Approach Overview
3.4 ScreenPass
3.4.1 Capturing and Tainting Password
3.4.2 Prevent Spoofing Attacks
3.4.2.1 Screen Capture
3.4.2.2 OCR Analysis
3.5 Evaluation
3.5.1 Robustness to Spoofing Attacks
3.5.2 User Studies
3.5.2.1 Study Designs
3.5.2.2 Recruitment and Training
3.5.2.3 Results and analysis
3.5.3 OCR Performance
3.5.4 Sampling Rate
3.5.4.1 Frame Rate
3.5.4.2 Energy
3.5.5 App Study
3.6 Summary

4.1 Introduction
4.2 Trust and Threat Model
4.3 VeriUI Design
4.3.1 System Overview
4.3.2 SecureWebKit
4.3.3 Prevent Phishing Attacks
4.4 Implementation
4.4.1 Duo System Boot
4.4.2 Port GUI in TEE
4.4.3 Attestation
4.4.4 Demo App using VeriUI
4.5 Evaluation
4.5.1 App and SDK Study
4.5.2 UI responsiveness
4.5.3 Performance overhead
4.6 Summary

5. Related Work
5.1 Password Security and Anti-spoofing Techniques
5.2 Trust Computing Related

6.3 Evaluation results

Bibliography
Biography


List of Tables

Table 1: App workloads used for energy and performance experiments.

Table 2: App types grouped by their password handling (saving to the file system or sending over the network). App categories are shown in parenthesis.

Table 3: Problematic password handling practices in eight of the studied apps. One of the apps stored and sent passwords in plaintext.

Table 4: Popular third-party apps for Twitter and Facebook in the study.

Table 5: Summary and comparison of ScreenPass and VeriUI.


List of Figures

Figure 1: Overview of the ScreenPass architecture.

Figure 2: When a user taps on a password field, ScreenPass’s secure keyboard prompts the user to tag their password, and then applies the tag to each character using TaintDroid.

Figure 3: Static attack images tested: a. Scripted typeface, b. Italicized typeface, c. Narrowed typeface, d. Widened typeface, e. Rotated characters, f. Warped characters, g. Colored background, h. Colored characters, i. Blurred background, j. Blurred characters, k. Shiny characters, l. Random noise.

Figure 4: Sample frame from a dynamic-attack keyboard that ScreenPass could not detect by analyzing individual frames. ScreenPass detected the attack by analyzing a squashed image.

Figure 5: At the beginning, middle, and end of our initial user study, participants were asked to rate how much they agreed with statements that (1) they are worried that malware will steal their passwords, (2) they make sure SSL is enabled before logging into a website, and (3) logging into apps and websites on a smartphone is difficult. Pre-study responses reflect users’ experiences before participating in the Pre-study. Mid-Pre-study and post-study responses reflect users’ experiences during the study. The agreement scale ranges from one (“Strongly disagree”) to seven (“Strongly agree”). The top edge of each dark-gray box represents the 75th percentile response, the bottom edge of each dark-gray box represents the median response, and the bottom edge of the light-gray box represents the 25th percentile response. The top whisker extends to the maximum response, and the bottom whisker extends to the minimum response. Note that an absent light-gray box indicates that the 25th percentile response was equal to the median response.

Figure 6: When a participant tapped on a password field, we recorded the total time the keyboard was displayed (“Total time”), the time spent tagging the input (“Tag time”), and the time spent typing in a password (“Pwd time”). The top edge of each dark-gray box represents the 75th percentile time, the bottom edge of each dark-gray box represents the median time, and the bottom edge of the light-gray box represents the 25th percentile time. The top whisker extends to the maximum time, and the bottom whisker extends to the minimum time. Note that the top whiskers for “Total time” and “Pwd time” have been cut off to improve readability. The maximum login time was 117 seconds, and the maximum time to enter a password was 116 seconds. These results are for the initial study only.

Figure 7: The first bar (“Pre-study”) depicts data for the period starting with the beginning of the study and ending before the first mid-study survey was completed. The second bar (“Midstudy”) depicts data for the period starting with the completion of the first mid-study survey and ending before the first post-study survey was completed. The third bar (“Post-study”) shows data for the period beginning with the completion of the first post-study survey and ending with the end of the study. During the pre-study and mid-study periods, users were immediately prompted to tag their password. During post-study period, users were not prompted to tag their passwords. These results are for the initial study only.

Figure 8: This graph compares the tagging rates for the eight users who participated in our initial user study and our followup user study. The first three bars depict tagging rates for those users during the three phases of the first study, when the study keyboard did not include a password manager (“No PWM”). The last two bars depict the tagging rates for those users during the two phases of the follow-up study, when the study keyboard included a password manager (“w/ PWM”). Explicit prompting was turned off during the “Post-study” phase for both studies.

Figure 9: The average time taken by the FrameChecker to scan a single frame.

Figure 10: The average frame rates observed while running several applications.

Figure 11: The average energy consumed over one minute while running several applications.

Figure 12: Overview of the VeriUI architecture.

Figure 13: Domain selection UI of SecureWebKit.

Figure 14: Format of a remote attestation certificate.

Figure 15: Third-party app study for Facebook and Twitter - they use one of the three methods to handle user authentications: embedded WebView, mobile browser, or ask for user accounts directly.

Figure 17: Comparison of run time performance for VeriUI, embedded WebView, and mobile browser to start and load URL.


Acknowledgements

It is an interesting, rewarding, as well as hardworking journey to pursue a doctoral degree. Depressing and exciting moments alternated during the past five years. It would not have been possible to finish this dissertation without the guidance of my advisor and other committee members, help from colleagues and friends, and support from my family. Here are my sincere thanks to them for being with me.

First and foremost, I would like to express my gratitude to my advisor, Landon P. Cox, for helping me grow as a researcher. He has taught me a lot of things from how to find meaningful topics for research, how to think critically, how to make trade-offs, how to write logically and accurately, and how to give a clear and attractive presentation. I will always remember those days we were working together for paper deadlines.

I am also very thankful to my committee members: Bruce M. Maggs, Xiaowei Yang, and Romit Roy Choudhury. I consider myself very lucky for having such a strong committee. They have provided me with invaluable suggestions during my preliminary exam and the final defense.

I would like to thank Eduardo Cuervo and Peter Gilbert, for their help on my research. I would also like to thank Dong Liu, Yin Lin, Ryan Scudellari, Zhiqiu Kong, Bi Wu, Valentin Pistol, Xin Liu, Ang Li, Xin Wu and many more people I was lucky to meet at Duke. Without them life as a PhD student would be less enjoyable.

Most importantly, my greatest gratitude goes to my wife, Meng Zhang, for her love, care, support, encouragement, and companionship. I consider myself so lucky to have met her here at Duke, and words cannot express how grateful I am to her for sharing her life with me for more than five years. The long trip towards the PhD would have been a dull and miserable one if it were not for her, who has stood by me no matter how difficult things might seem. I would like to thank her for cheering me up every time I was in low spirits. She has offered me love, comfort and assurance when things did not look as promising, and she has been there to enjoy the better times with me. In short, I would not have survived graduate school without her support. My adorable and lovely little son, Grayson Liu, is the best and most precious gift of life, encouraging me to pursue my research even in the most difficult times. I would also like to thank my mother-in-law, Wei Xu, for her great help in taking care of my little son during the final year of my PhD life. Without her help, I would have had no way to complete this dissertation in time. Finally, I want to thank my mom, Lianying Liu, for her unconditional love and support throughout my life, which helped me evolve from an infant to a productive member of society. Her education in my early years kept me motivated through the PhD journey. I am forever indebted to all of them.


1. Introduction

1.1 Booming Mobile and Cloud Services

Mobile devices such as smartphones and tablets have become immensely popular in recent years. According to a 2012 survey, smartphone ownership has reached half of all U.S. mobile consumers [8]. In the second season of 2013 alone, nearly 180 million Android devices and over 30 million iOS devices were sold worldwide [6]. Users are attracted by the powerful and interesting apps on these devices and spend a lot of time on them. By 2013, Google Play had nearly 1 million applications available in the market with 50 billion downloads in total [7], versus 0.8 million applications and 40 billion downloads in the Apple iOS app store [2]. American users spend an average of 58 minutes per day on their mobile devices, 14% of it browsing mobile websites and the rest in mobile apps [11]. 27% of photos were taken with smartphones according to a survey in Dec. 2011 [16].

Meanwhile, many new cloud services are growing rapidly and reaching almost every aspect of people’s lives: social networks, instant messaging, video streaming, photo galleries, e-business, location-based services (LBS), online storage, etc. These cloud services have attracted thousands of millions of users worldwide. Facebook reached 1.11 billion monthly-active users all over the world as of May 2013 [4], who


monthly-active users of YouTube upload 100 hours of video every minute, and over 6 billion hours of video are watched each month [17]. More and more people access cloud services from their mobile devices. Facebook has 680 million mobile users [5]. Instagram has attracted 150 million monthly active users who generate 55 million photos per day through the mobile app [12]. 80% of smartphone owners use their mobile device to shop online, according to a study conducted in Sept. 2012 [3].

The reason for such a high level of participation in mobile and cloud services is obvious: they bring a lot of fun and convenience to users. People can look up a map on their smartphone when they are exploring a new city, or watch funny YouTube videos while waiting in the subway. Users can make video calls with colleagues no matter how far away they are, or immediately share photos taken on a trip with best friends all over the world through OSNs. As a result, the functionality provided by mobile devices and cloud services reshapes people’s daily lives and redefines how they interact with the whole world.

1.2 New Security Challenges in the Mobile Era

Mobile and cloud services bring convenience and fun to users. However, enhancing the security of mobile computing has become an important and pressing problem, because mobile devices not only store and generate huge amounts of personal or even sensitive data in local systems, but have also become portable and popular clients for accessing all kinds of cloud services which control the entrances to users’ cloud data.

First, smartphones and tablets contain a lot of personal information, a considerable part of which is highly private or sensitive. Mobile devices today are far more than simple phones for making calls. Instead, they are becoming important personal assistants and useful portable computers. Besides contacts and messages, these devices may also contain private data such as daily schedules, scratch notes, or even business presentations.

In addition, most modern smartphones and tablets are equipped with a number of sensors such as GPS, camera, microphone, accelerometer, gravimeter, and pressure and light sensors, so it is very easy for users to generate information-rich content through their mobile devices. The sensors on mobile devices can comprehensively sense the environment as well as the users themselves. As a result, content from user-centric sensing contains a lot of personal or even sensitive information about users, such as locations, life habits, health conditions, and social activities.

Lastly, more and more people routinely access all kinds of cloud services through their mobile devices because of availability and convenience. Due to the high portability of mobile devices and the wide availability of internet access, users are able


clients. It is very convenient to generate User Generated Content (UGC) on mobile devices and send it to cloud services. Because of the limited power, storage, and computation capacity of mobile devices, many mobile apps serve only as lightweight clients that display content and handle user interactions, and they rely on cloud servers to process and store huge amounts of data. Before mobile apps can access cloud services, users have to give them credentials for authentication with the servers. Credentials such as usernames and passwords are very sensitive because they are the keys to users’ property in the cloud. The cloud holds a lot of private and sensitive data, such as commercial emails, social network photos, instant messenger contacts, and online documents. Private data in the cloud will be in great danger if credentials are not properly handled on mobile devices and are stolen by malicious entities. In addition, users unavoidably perform other sensitive interactions with cloud services through mobile apps, such as online payment. Such sensitive interactions and operations must take place in a secure environment as well.

Since mobile devices such as smartphones and tablets contain a lot of private information and access all kinds of cloud services on behalf of users, they have undoubtedly become valuable targets for malware and attackers. Malicious apps may steal users’ credentials or fool users into paying online. Improper or buggy apps may have vulnerabilities in their code that put users’ sensitive information at high risk. Although centralized app markets and app review processes do help relieve these problems, they are not enough. Examining mobile apps before release cannot detect all of their problems. In addition, some mobile platforms, such as Android, allow installing apps from untrusted sources.

In this dissertation, we focus on protecting sensitive input data when users access cloud services through mobile apps and perform sensitive operations, such as authentication and online payment. When users interact with mobile apps and input sensitive information, they need to be sure that the mobile apps handle their sensitive operations properly and securely. Our goal is to enhance the security of users’ most important data, and also to relieve users’ concerns while they enjoy the fun and convenience provided by mobile apps.

Similar security problems have been studied in desktop computers. However, we are facing the following new challenges for touchscreen mobile devices:

Third-party apps: Third-party apps are becoming more and more popular in the mobile app market. Many cloud services open their platform APIs to third-party developers and encourage them to contribute to the ecosystem. For example, popular OSNs such as Facebook and Twitter allow third-party apps to access users’ OSN data once users grant them permission. These third-party apps greatly extended the services

party apps because they provide augmented functionality and integrated services, or just because a first-party app is unavailable in the market. However, when users access first-party cloud services through third-party mobile apps, they unavoidably have to give sensitive data to the third-party apps for authentication, online payment, etc. This causes serious security issues. Currently there is no enforcement mechanism to isolate third-party apps completely from the secrets shared between users and cloud services. There is also a lack of regulation mechanisms to monitor how sensitive data is handled by third-party apps so as to prevent improper behavior by buggy or even malicious code.

Software keyboard: The input entry is no longer trusted on current touchscreen devices. Most mobile devices are touchscreen devices, which are not equipped with hardware keyboards and only provide users with software input methods. On desktop computers with a hardware keyboard, the keyboard driver is part of the system code and handles user input directly. On touchscreen devices, however, only the touchscreen driver is part of the system code, and the input method works as an application that turns touch actions into keystrokes. Any mobile app can therefore implement a software keyboard within the app and translate user taps into character input all by itself, without calling the default input method service. As a result, the entry for inputting sensitive information is no longer trusted on touchscreen devices.

Embedded web UI: The web UI is not secure and can be manipulated by mobile apps. Besides mobile browsers, the mobile system also provides developers with a UI widget called WebView. WebView provides basic browser functionality and can be embedded in mobile apps to display web content. It is widely used in mobile apps because it is a convenient way to communicate with cloud services. Many mobile apps embed it to serve as a highly customized mini-browser. However, WebView also raises new security issues because it can be easily manipulated by its host mobile app. To help web applications integrate tightly with mobile apps, WebView provides a number of APIs for developers to register event listeners, inject JavaScript code, etc. The host app can easily hijack the web content and sniff user input in the WebView. As a result, WebView compromises the trust base for browsers. Unlike desktop browsers such as Chrome and Firefox, which people trust, a WebView is not trusted unless its host app is completely trusted and secured.

Small screen size: The much smaller screen size of mobile devices makes it even more difficult to protect users against social-engineering attacks than on desktop computers. Screen space is very precious on smartphones, so most mobile systems support a full-screen mode to maximize display utilization. There is no reserved area on the screen, and mobile apps are allowed to write arbitrarily to any part of the screen.

computers and browsers, do not work anymore on mobile devices. Even worse, a small screen that displays less content and fewer visual cues makes it more difficult for users to judge the authenticity of a UI and easier for them to be deceived. Mobile browsers, which usually hide the address bar to save screen space, draw little user attention to domain changes in URLs. All these factors combined make social-engineering attacks much more difficult to prevent.

1.3 Enhance Trusted UI in Touchscreen Mobile Devices

To address the new security challenges on touchscreen mobile devices, we propose to enforce trusted UI to guarantee the security of sensitive operations and protect users’ sensitive information against improper or even malicious mobile apps.

We follow three principles in designing and implementing trusted UI in our systems, which are also the three major challenges we must consider and solve:

Secure environment: Trusted UI must provide users with a secure environment to complete sensitive operations, such as authentication or online payment. It must be part of the trusted computing base (TCB) of the system, or at least controlled and monitored by the TCB. The display and operation of trusted UI must be handled by trusted code.

Usage enforcement: The usage of trusted UI by mobile apps must be enforced. In other words, there must be an enforcement mechanism to guarantee that mobile apps use trusted UI to handle sensitive user interactions and input. Whenever mobile apps ask users to perform sensitive operations, they must have no way to circumvent trusted UI and can only adopt it. At the very least, if an improper or malicious mobile app tries to use its own UI instead of trusted UI to handle user interactions, the TCB must detect and prevent it in time.

Anti-spoofing campaign: The authenticity of the trusted UI must be protected. Any security mechanism that handles user interactions must consider social-engineering attacks. Malicious apps can display an identical or highly similar UI to spoof users in order to steal sensitive information from them. The TCB must be able to detect and prevent phishing or spoofing attacks. More importantly, user responsibility must be minimized in the anti-spoofing campaign. We cannot rely on users to make critical judgments about whether or not they are interacting with the trusted UI. Users are not security experts and, according to previous studies, are prone to ignore warnings. They have less information than the system about what is happening on the device. If users have to be fully trained or educated to prevent spoofing attacks,

In our research, we have conducted two case studies, ScreenPass and VeriUI, to enhance the security of sensitive input through trusted UI. Both follow the three major principles in their designs and implementations.

1.4 Dissertation Statement and Overview

Security on mobile computing, especially the security of sensitive data input through mobile apps, is the focus of my dissertation. Sensitive input data includes but is not limited to: credentials such as passwords, financial accounts such as credit card numbers, personal identities such as SSN numbers, etc. My dissertation statement is as follows:

When users access cloud services from touchscreen mobile devices, it is feasible to enhance the security for sensitive input by enforcing trusted UI in mobile apps.

This dissertation validates the statement via the design, implementation, and experimental evaluation of two systems: ScreenPass and VeriUI.

We propose ScreenPass – a system that enhances password security on touchscreen devices through OCR and taint-tracking techniques. Passwords are highly sensitive data, and users need to ensure their passwords are handled properly by mobile apps. In order to provide users with trusted password entry, ScreenPass checks the authenticity of the software keyboard using optical character recognition (OCR) to validate that it is legitimately called through the system input method framework. ScreenPass provides a special UI to capture user intent explicitly when she inputs the password, and tags the password with the secure domain that the user selected. After the password is properly tagged, ScreenPass monitors password usage throughout the system using taint-tracking, and enforces security policies to protect the password against improper or even malicious mobile apps.
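The tagging and enforcement described above, as a toy model added here for illustration (not ScreenPass or TaintDroid code). Every name in it (`Tainted`, `network_send`, the sample password and hosts) is hypothetical, and it follows only explicit string operations inside one Python process.

```python
"""Toy model: per-character password tags checked at a network sink."""
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple


class PolicyViolation(Exception):
    """Tagged characters were about to leave for a host outside their domain."""


@dataclass(frozen=True)
class Tainted:
    """Immutable text carrying one tag (a domain, or None) per character."""
    chars: str
    tags: Tuple[Optional[str], ...]

    def __post_init__(self):
        if len(self.chars) != len(self.tags):
            raise ValueError("need exactly one tag per character")

    @classmethod
    def clean(cls, text: str) -> "Tainted":
        return cls(text, (None,) * len(text))

    @classmethod
    def from_keyboard(cls, text: str, domain: str) -> "Tainted":
        # Models the trusted keyboard: the domain the user selected is
        # applied to every character typed into the password field.
        return cls(text, (domain.lower(),) * len(text))

    def __add__(self, other):
        other = lift(other)
        return Tainted(self.chars + other.chars, self.tags + other.tags)

    def __radd__(self, other):            # handles  "literal" + tainted
        return lift(other) + self

    def __getitem__(self, key):           # slices and indexes keep their tags
        if isinstance(key, slice):
            return Tainted(self.chars[key], self.tags[key])
        return Tainted(self.chars[key], (self.tags[key],))

    def map_chars(self, fn: Callable[[str], str]) -> "Tainted":
        # Per-character transform; output that grows ("ß" -> "SS") repeats the tag.
        pieces = [(fn(c), t) for c, t in zip(self.chars, self.tags)]
        return Tainted("".join(p for p, _ in pieces),
                       tuple(t for p, t in pieces for _ in p))

    def domains(self) -> Set[str]:
        return {t for t in self.tags if t is not None}


def lift(value) -> Tainted:
    """Treat plain strings as untagged text."""
    if isinstance(value, Tainted):
        return value
    if isinstance(value, str):
        return Tainted.clean(value)
    raise TypeError(f"cannot combine Tainted with {type(value).__name__}")


def host_in_domain(host: str, domain: str) -> bool:
    # Exact host or a true subdomain; "evil-dropbox.com" is not in "dropbox.com".
    host = host.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def network_send(host: str, payload) -> int:
    """Network sink: refuse if any tag on the payload does not cover `host`."""
    payload = lift(payload)
    outside = sorted(d for d in payload.domains() if not host_in_domain(host, d))
    if outside:
        raise PolicyViolation(f"data tagged {outside} blocked from {host}")
    return len(payload.chars)             # characters "sent"


def demo() -> dict:
    pwd = Tainted.from_keyboard("hunter2!", "dropbox.com")   # constructed input
    body = "user=alice&pass=" + pwd
    results = {"own": network_send("api.dropbox.com", body)}
    attempts = [("leak", "ads.example.net", body),
                ("partial", "ads.example.net", "x=" + pwd[2:5]),
                ("lookalike", "evil-dropbox.com", body)]
    for label, host, data in attempts:
        try:
            network_send(host, data)
            results[label] = "sent"
        except PolicyViolation:
            results[label] = "blocked"
    results["clean"] = network_send("ads.example.net", body[:5])  # "user=" only
    return results


if __name__ == "__main__":
    print(demo())
```

Because the tag travels with each character, a three-character fragment of the password is blocked just like the whole string, while the untagged prefix of the same request body passes.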

We also propose VeriUI – a system that enhances security for sensitive operations on mobile devices using ARM TrustZone. VeriUI leverages ARM TrustZone to run two systems at the same time: a normal system with rich applications in the normal world, and a minimized secure system in the secure world in which users perform sensitive operations. A minimized and secure webkit runs in the secure world to provide a generalized service to third-party apps running in the normal world. The secure webkit sends requests to cloud servers together with attestations generated by the secure system, which prove the secure environment. First-party cloud services check and verify the remote attestations before they process the requests, so that third-party apps are required to use the trusted web UI in the secure world to handle sensitive user interactions with first-party cloud servers.

1.5 Dissertation Outline

The rest of this dissertation is organized as follows: Chapter 2 provides the background knowledge and techniques related to ScreenPass and VeriUI, including the Android system, the OAuth 2.0 protocol, taint-tracking, and trust computing. Chapter 3 and Chapter 4 describe the major contributions of this dissertation. Chapter 3 describes the design, implementation, and evaluation of ScreenPass, a system that enhances password security on touchscreen devices. ScreenPass verifies the authenticity of the software keyboard through OCR, and monitors the usage of passwords by mobile apps through taint-tracking. Chapter 4 presents the design, implementation, and evaluation of VeriUI, a system that enhances security for sensitive operations on mobile devices. VeriUI leverages ARM TrustZone to provide users with trusted web UI in the secure world, and enforces the usage of trusted UI by third-party apps through remote attestation. Chapter 5 presents related work on password security, anti-phishing techniques, trusted UI, trust computing, etc. It discusses the state of the art in these fields and compares it with ScreenPass and VeriUI respectively. Chapter 6 concludes the dissertation with a summary of our contributions. It also discusses our experiences in designing and implementing ScreenPass and VeriUI.

2. Background

This chapter provides background knowledge for the design and implementation of ScreenPass and VeriUI. We first briefly introduce some important components of the Android system in Section 2.1: the Android framework, the web browser, and the display subsystem. After that, we provide an overview of the OAuth 2.0 protocol for authorization and authentication in Section 2.2. In Section 2.3 we discuss the taint-tracking technique and an existing implementation for Android, TaintDroid. Finally, we present background knowledge related to Trust Computing, especially ARM TrustZone and remote attestation, in Section 2.4.

2.1 Android System

The Android mobile platform includes a Linux kernel, Java programming environment, and several layers of trusted middleware. We briefly describe the most relevant of these subsystems below.

2.1.1 Android Framework, Apps, and Services

Application framework: An Android app consists of three components. An activity defines the app’s UI, and controls all of an app’s widgets (called “views” in Android). Activities interact with background services and content providers. An application’s background service is a separate thread that contains the application’s core logic and soft state. An app’s content provider offers an interface to persistent storage such as the file system or a SQL database.

Dalvik virtual machine: App components are written in Java, and this code is compiled into Dalvik Execution (DEX) byte codes. The Android team developed their own Dalvik Virtual Machine (VM), which is optimized for mobile devices. Android apps execute within separate VM instances, and VM instances are managed as processes by the underlying Linux kernel. Apps use Android’s custom IPC protocol (Binder) to communicate with processes outside of their own.

Input method framework: The Input Method Framework (IMF) provides a way to invoke and manage input methods such as onscreen software keyboards. When a user selects a text-input box, the widget makes a request to the IMF for an Input Method Engine (IME) that can receive input from the user. The IME runs in a separate process from the requesting app and writes user input to a string field in the text-input widget via IPC.

2.1.2 Mobile Browser and WebView Widget

There are two types of web UI facilities on Android: the mobile browser and WebView widgets.

A WebView is a system UI widget that serves as an embedded browser for loading URLs within an app. Mobile developers can easily integrate WebView components in their apps to render web pages and display contents from cloud services. WebView supports basic browser functionality such as navigation and JavaScript execution. It also provides an API for mobile apps to interact with and customize WebView output. Because of their usefulness, flexibility, and ease of use, WebViews are widely used in mobile apps.

There are several ways that an app can monitor its in-app WebView components. Firstly, it can register event listeners through WebView APIs. With these event listeners, the app can hijack the web content, or even sniff everything the user does with the webpage. Secondly, the app can inject and execute JavaScript code into the web page. The injected JavaScript code can also invoke Java code to pass information back to the host app. Through these methods, the webpage loaded in WebView is tightly integrated with the mobile app and can be easily customized. However, these WebView APIs compromise the trustworthiness of the embedded browsers. When users interact with desktop browsers, they trust them because they are independent programs. But the WebView is only as trustworthy as its host app.

The mobile browser is a stand-alone app that supports the same full-fledged web features as its desktop counterparts, and can be called by other apps via IPC. Each Android app runs as a Linux process with its own unprivileged UID. An app can request the mobile browser and then run in the background. The mobile browser becomes the foreground app to interact with the user. After finishing its work, the mobile browser can be redirected back to the app. Unlike the embedded WebView, third-party apps have no control over the mobile browser.

2.1.3 Display Subsystem

Frame Buffer: The Linux frame buffer (FB) is an abstraction that provides access to the graphical output of the device. Android’s FB device is found at /dev/graphics/fb0 and can be accessed by only the root user and the graphics group. The FB has a front and back buffer: the front buffer contains the pixel data currently displayed on the screen, and the back buffer is used for composition by the SurfaceFlinger.

SurfaceFlinger: The SurfaceFlinger (SF) is Android’s system-wide surface composition engine, and is responsible for managing surfaces and the virtual frame buffer device. When an activity creates a widget, it asks the SF to allocate a new surface. The SF allocates a back and front buffer for the surface and returns a handle to the requesting process. As the owner of this surface, the requesting process is free to update the back buffer as needed. The back buffer is used by the activity for rendering new frames. Once a process has finished rendering, it swaps the front and back buffers and asks the SF to draw the new frame to the screen. The front buffer is used by the SF for composition.

2.2 OAuth 2.0 Protocol

OAuth 2.0 [9] is an open, standard resource-authorization protocol that is widely used by modern web services. It enables users to grant a third party limited access to their resources stored in the cloud without sharing their long-term credentials, such as usernames and passwords. OAuth allows a cloud service to generate a capability, called an OAuth token, for an app on the user’s behalf; the app later uses the token as a permit to access the user’s data. Every OAuth token corresponds to a list of resources in the cloud and has an expiration date. The user can also actively revoke an app's OAuth token before it expires through the cloud service. The OAuth protocol supports different types of third-party apps, such as websites, browser plug-ins, mobile apps, and desktop applications. OAuth is designed as an authorization protocol, but it is also widely used by third-party apps in single sign-on (SSO) authentication services.

The procedure for mobile apps to obtain an OAuth token from cloud services is as follows: when a mobile app first requests access to a resource in the cloud service, it redirects the user to the OAuth login website of the cloud service through a WebView or the default browser, together with a list of the resources it requires in the parameters. The user completes the authentication process through the web UI and approves the permissions required by the app. The cloud service will create an OAuth token, and a client secret for the app to retrieve the OAuth token later. Then it redirects the WebView or default browser back to the app, returning the generated client secret to it.

After presenting the client secret to the cloud service and receiving its OAuth token, the app can make API calls to the cloud service to access the corresponding user data by presenting the OAuth token. The mobile app usually stores the OAuth token on the client for future use, or even sends the token to the app’s own server.

OAuth 2.0 provides a standard solution for first-party cloud services to share data with third-party mobile apps, and it has been widely accepted and deployed. However, not all third-party apps implement the protocol properly and completely; a portion of third-party apps ask users for account information directly through their own UI and then complete the whole process themselves.

2.3 Taint-tracking and TaintDroid

ScreenPass relies on TaintDroid for taint-tracking. TaintDroid is a system-wide taint-tracking extension for Android [27]. TaintDroid associates tags with sensitive data that is released to untrusted code via taint sources such as requests for a device’s current location. Taint tags are 32-entry bit vectors that can be associated with program variables, IPC messages, and files. TaintDroid implements tag-propagation logic to capture data dependencies as the system executes.

2.4 Trusted Computing and ARM TrustZone

2.4.1 ARM TrustZone

TrustZone [19] is a set of security extensions for trusted computing supported by ARM processors since ARMv6, such as ARM Cortex A8, A9, and A15. TrustZone’s security architecture embodies a “two worlds” paradigm. One world is called the “normal world” or rich execution environment (REE), and the other is called the “secure world” or trusted execution environment (TEE). The two worlds are separated by introducing a special 33rd address line on the system bus. Hardware-based access control is enforced according to the state of the 33rd address line, so that untrusted code in the normal world cannot access the protected memory pages or peripheral registers of the secure world.

TrustZone supports running two operating systems at the same time without degrading system performance. The rich OS, such as an Android system, and all applications run in the normal world, whereas a secure OS, such as a trusted kernel, and secure applications run in the secure world. Software stacks in the two worlds can be bridged through a secure monitor call (SMC). Switching between the worlds is managed by the TrustZone monitor, which is part of the secure world. Once an SMC is executed, the hardware switches to the TrustZone monitor to perform a secure context switch into the secure world. Besides SMC, the TrustZone monitor can also be entered by a number of hardware interrupts.

TrustZone protects code and data integrity and confidentiality in the secure world by isolating untrusted code running in the normal world from protected resources. Secure boot ensures the system components enter trusted states. When the platform boots, it first boots into the secure world and the system firmware sets up the entire runtime environment for the secure world. After the secure OS boots, the secure world yields to the normal world by loading the bootloader for the rich OS. When the booting process finishes, the normal world must use SMC to call back into the secure world. As a result, even if the rich OS in the normal world is compromised, the data and code in the secure world are not tampered with.

2.4.2 Remote Attestation

Remote attestation is the process of making and verifying a claim about the properties of a target by providing evidence over the network for an authority to check. In system security, the evidence can be a hash of the system state, used to verify that the OS has not been tampered with before sensitive operations are performed. The attestation can be created by either trusted hardware or software.

Even though ARM TrustZone provides isolation between the normal world and the secure world, it does not by itself provide remote attestation capabilities. This requirement can be fulfilled through hardware or software solutions. In a hardware solution, an external TPM [18] or MTM module can be introduced into the platform to provide binding, sealing, and remote attestation. In a software solution, these capabilities can be implemented in a trusted kernel in the secure world. Since TrustZone is robust, vendors can build secure storage in the secure world. The keys and sensitive data can be stored in the TEE-only addressable area as protected resources, isolated from untrusted code. The trusted kernel can provide attestation to remote authorities.

3. ScreenPass: Secure Password Entry via OCR and Taint-tracking

In this chapter we describe ScreenPass [34] – a system that enhances password security on touchscreen devices through OCR and taint-tracking techniques.

Users routinely access cloud services through third-party apps on smartphones by giving apps login credentials (i.e., a username and password). Unfortunately, users have no assurance that their apps will properly handle this sensitive information. In this chapter, we describe the design and implementation of ScreenPass, which significantly improves the security of passwords on touchscreen devices. ScreenPass secures passwords by ensuring that they are entered securely, and uses taint-tracking to monitor where apps send password data. The primary technical challenge addressed by ScreenPass is guaranteeing that trusted code is always aware of when a user is entering a password. ScreenPass provides this guarantee through two techniques. First, ScreenPass includes a trusted software keyboard that encourages users to specify their passwords’ domains as they are entered (i.e., to tag their passwords). Second, ScreenPass performs optical character recognition (OCR) on a device’s screen buffer to ensure that passwords are entered only through the trusted software keyboard. We have evaluated ScreenPass through experiments with a prototype implementation, two in-situ user studies, and a small app study. Our prototype detected a wide range of dynamic and static keyboard-spoofing attacks and generated zero false positives. As long as a screen is off, not updated, or not tapped, our prototype consumes zero additional energy; in the worst case, when a highly interactive app rapidly updates the screen, our prototype under a typical configuration introduces only 12% energy overhead. Participants in our user studies tagged their passwords at a high rate and reported that tagging imposed no additional burden. Finally, a study of malicious and non-malicious apps running under ScreenPass revealed several cases of password mishandling.

3.1 Introduction

Users routinely access cloud services such as Facebook and Twitter via third-party applications (i.e., “apps”) on touchscreen devices. To interact with these services, an app must provide a user’s password to one or more remote servers. Passwords are highly sensitive data and handing them over to third-party apps raises the following question: how can users be sure that an app properly handles their passwords? The recent discovery of password-stealing apps and other vulnerabilities in Android demonstrates that users have reason to be concerned [31, 58].

A trusted information-flow monitor such as TaintDroid [27] can track the propagation of password data, but data must be tagged before it can be tracked. Past systems have tagged sensitive user input via a secure attention sequence (SAS), such as “@@”, to indicate the beginning of a password [41, 47]. Trusted software (i.e., a keyboard driver) must monitor the incoming character stream for the SAS and, when it appears, must treat subsequent characters as password data.

However, recognizing SASes in the text-input stream of a touchscreen device is difficult because of software keyboards. Trusted code such as the touchscreen driver and user-interface manager receives taps on a touchscreen as a set of coordinates, and can only understand the intended meaning of a user’s taps when it understands the content of the screen.

The platform could reserve part of the screen for secure gestures, but modern devices’ small screens make screen real estate a precious resource. Important third-party apps such as games, media players, and web browsers need write access to the entire screen.

Unfortunately, a malicious app can abuse this privilege to spoof a platform’s user interface, including its trusted software keyboard.

Under a spoofing attack, a user may input an SAS meant for the trusted platform without realizing that the input was delivered directly to a malicious app.

To ensure that passwords are input securely, we have developed a system called ScreenPass. ScreenPass provides a special-purpose software keyboard for entering sensitive text such as passwords that allows a user to tag her input with a domain (e.g., Google, Facebook, or Twitter). ScreenPass uses these tags to taint-track the user’s password data as it propagates through the app.

Tags can subsequently be used to enable a number of useful policies.

For example, the system may want to know when plaintext password data is written to disk or when password data is shared between apps via IPC. Similarly, if an app tries to write password data to the network, a guard can check the write’s safety by reasoning about features of the network endpoint (is the destination port associated with unencrypted traffic?), the taint tag’s domain (is the destination IP address in the appropriate domain?), or a password’s usage history (is the app adhering to the OAuth specification and only sending a password once?). If an action is considered unsafe, then the guard can either block the data from being released, or it can raise an alert.

ScreenPass’s primary goal is ensuring that password data is only entered through a trusted keyboard so that it can be tagged before it is given to an app. To achieve this goal, ScreenPass performs dynamic optical character recognition (OCR) on regions of the screen where users expect a software keyboard to appear. If text in this region of the screen is sufficiently similar to the “qwerty” text of a keyboard and the foreground app has not yielded control of the screen to the trusted keyboard, the OS can kill the foreground process or raise an alert.

• Previous secure UIs have restricted where untrusted code can write to the screen [29, 55], but ScreenPass is the first system designed specifically for the limited screen real estate of a mobile device; ScreenPass protects sensitive input by restricting what untrusted code may write to sensitive parts of the screen.

• ScreenPass is the first system to prevent UI spoofing through efficient and robust online computer vision. Software-only computer-vision techniques such as OCR minimize ScreenPass’s hardware requirements and allow our approach to generalize to any modern touchscreen device.

• We have implemented a ScreenPass prototype for Android and evaluated its robustness to attack as well as its energy and performance overheads. Our attack study found that ScreenPass is robust to a wide range of static and dynamic attacks while generating zero false positives. ScreenPass only failed to detect spoofed keyboards with noisy backgrounds that look significantly different than a standard system keyboard. ScreenPass consumes no additional energy when the screen is not updated or tapped. Under a typical configuration, ScreenPass introduces 12% energy overhead in the worst case. It consumes far less for apps that infrequently update the screen or require little user input. ScreenPass also had negligible impact on interactivity; under a typical configuration, no workload experienced a statistically significant drop in frame rate under ScreenPass.

• We have also conducted two in-situ user studies with ScreenPass-like software keyboards. Our initial 18-user, three-week user study showed that tagging passwords imposes little additional burden on users, and showed that users will tag passwords at a high rate when prompted. A smaller, follow-up study demonstrated that integrating a password manager with ScreenPass incentivizes users to tag their passwords at a high rate even when they are not prompted.

The rest of this chapter is organized as follows: Section 3.2 describes ScreenPass’s trust and threat model, Section 3.3 provides an overview of ScreenPass’s approach to securing passwords on mobile devices, Section 3.4 describes the design and implementation of ScreenPass, Section 3.5 describes our evaluation, and Section 3.6 gives our conclusions.

3.2 Trust and Threat Model

We assume that a user trusts the mobile platform running on her device, and relies on the operating system’s existing mechanisms to thwart attacks against the platform itself. We assume that the trusted computing base (TCB) consists of any code that sits behind the standard set of APIs on which a mobile app is implemented.

For example, the essential components of Android’s security model are a Linux kernel, user-space daemons called services running under privileged UIDs, and an IPC mechanism called Binder. Each Android app is signed by its developer and runs as a Linux process with its own unique, unprivileged UID. Apps access protected resources such as the software keyboard through Binder IPC calls. Apps and services can limit interactions with other code by specifying access-control policies to the Binder dispatcher.

An information-flow monitor such as TaintDroid cannot track important data unless it is properly tagged and cannot protect tagged data without release policies. However, even though these tags can allow a monitor to provide stronger security for a user’s passwords than is possible today, some classes of attacks are difficult or impossible to prevent with existing monitors.

For example, like many monitors, TaintDroid does not prevent data from leaking through covert channels such as a program’s control flow or timing information. Recent work on selectively tracking implicit flows using information from the symbolic execution of a program is promising [32], and such techniques may be applicable to existing monitors. We consider attacks against the taint-tracker to be outside the scope of ScreenPass; our goal is to ensure that password input is always tagged. Furthermore, ScreenPass is not designed to prevent passwords from leaking via covert channels like a device’s motion sensors [24, 40]. These vulnerabilities have straightforward solutions, such as disabling access to motion sensors whenever a password is input.

ScreenPass’s taint tags associate data with a coarse-grained domain. As a result, ScreenPass alone cannot prevent attacks in which passwords are leaked within a domain. For example, untrusted code could log into a service using credentials that are hardcoded into the app binary or accessed from an external server. Once logged in, the untrusted code could leak another user’s password through the original account (e.g., by writing the user’s password on an attacker’s Facebook wall). Alternatively, an untrusted app could encode a user’s Facebook password as a world-readable message on the user’s own wall, wait for an external machine to read the post, and then delete the post before the user noticed anything strange. These attacks are challenging, but allowing a

problems can be detected than if passwords continue to be shared without restriction. For example, many services maintain detailed records of all user actions. If a service detects a same-domain attack on mobile users’ passwords, then it may be able to use its logs and identify how many other users were affected.

Finally, this work assumes that passwords are input through a “qwerty” English keyboard, though we believe that our techniques can be generalized to more specialized keyboards, such as numeric keyboards or non-English keyboards. We assume that users will not trust apps that require passwords to be input through a keyboard with a non-standard layout or an unusual set of keys.

3.3 Approach Overview

ScreenPass must ensure that the TCB intercepts all password input so that it can be tagged and tracked by an information-flow monitor like TaintDroid. The original TaintDroid prototype [27] tracks sensitive data that is accessed through a small number of API calls with well-defined semantics. By interposing on these taint sources, TaintDroid can tag several important classes of sensitive data before they are accessed by untrusted code. Once this data has been released, TaintDroid tracks its propagation by integrating dynamic taint analysis (taint-tracking) into Android’s Dalvik VM, native system libraries, file system, and Binder IPC mechanism. Tagging sensitive data as it enters a program by interposing on API calls with well-defined semantics is sufficient for many classes of sensitive data (e.g., a device’s location and IMEI number), but this approach is not a good fit for passwords.

Apps access passwords through character-stream interfaces that do not distinguish between sensitive and non-sensitive data. Previous approaches to identifying password inputs have required users to explicitly identify their passwords through a secure attention sequence (SAS) [41, 47]. An SAS is a well-known sequence of characters (e.g., “@@”) that labels bytes in the stream as password data. As long as trusted code can identify individual characters, it can look for the SAS and tag password data. Unfortunately, on touchscreen devices with software keyboards, untrusted apps can circumvent the SAS by hiding text input from the TCB.

Software keyboards translate touchscreen gestures to characters by correlating the screen location where a user tapped or gestured with what was displayed at that location. Well-behaved apps allow trusted code to map gestures to characters by invoking the platform’s standard software keyboard. When invoked, the trusted software keyboard assumes control of the lower half of the screen, where it displays a virtual keypad. The keypad receives the coordinates of taps from the touchscreen driver, translates those inputs into characters, and returns the characters to the app. In addition,

boxes into “password” mode by manipulating its attributes. For example, Android apps can set the “android:password” attribute of a text-input widget to explicitly notify the TCB when input to the text box is a password.

Unfortunately, touchscreen platforms cannot force apps to yield control of the screen to the trusted UI. An untrusted app can bypass the trusted keyboard by retaining control of the screen, displaying a keypad that is visually indistinguishable from the trusted one, and translating touchscreen gestures to characters by itself.

Under such a spoofing attack, trusted code such as the touchscreen driver and UI manager would only see taps and swipes on the screen and cannot interpret the semantics of those gestures. Trusted code would have no way of knowing that a user’s taps and swipes were meant to be text inputs, and a user would have no way of knowing that the TCB was oblivious to her password input.

To address these challenges, we developed ScreenPass while keeping the following design considerations in mind:

Minimize hardware assumptions. ScreenPass should work on as many touchscreen devices as possible. There are hundreds of touchscreen devices with a wide range of hardware configurations. However, because of the minimalist industrial design of modern smartphones we conservatively assume that all interactions with a user occur through the touchscreen. Furthermore, ScreenPass should not re-purpose already overloaded inputs such as a home or power button for an SAS, or rely on a non-touchscreen output such as an external LED to signal the user.

Maximize display utilization. Screen real estate is a precious resource for mobile apps. Prior approaches to secure UIs [29, 55] have reserved part of the screen for messages from the TCB to the user, but this is inappropriate for consumer touchscreen devices. Android, iOS, and other mobile platforms provide a bar at the top of the screen for system information and notifications, but allow apps such as games, media players, and web browsers to hide this bar when in full-screen mode. To preserve the functionality of these important apps, ScreenPass should also support full-screen mode.

Minimize users’ responsibilities. ScreenPass should avoid burdening users with easy-to-ignore and error-prone new responsibilities. For example, ScreenPass could display a secret image, known only to ScreenPass and the user, whenever it detected password input. If the image was absent during password entry, a user would be expected to notice and not provide her password. We did not pursue this approach because (1) ScreenPass would have to rely on users to choose images that could not be guessed by untrusted code, and (2) the secret image would have to be managed through additional layers of UI (e.g., for setting and resetting), each of which would need to be secured.

Furthermore, Schechter’s 2007 study of site-authentication images (SAIs) found that these secrets provide very little security in practice [49]. This study found that 92% of participants logged in to a banking site when their secret image was replaced with a generic maintenance message. Schechter’s study also found that 100% of participants logged in when the secure-connection indicator (i.e., a red “https” string) was absent. Other studies of web-based systems have found that relying on users to heed warnings or notice the absence of a visual signal is often ineffective [25, 26, 57].

The main observation behind our approach to securing password input is that a keyboard consists of a unique and predictable sequence of characters. Thus, ScreenPass uses efficient optical character recognition (OCR) to search for text in the portion of the screen where a user would expect a keyboard (i.e., the lower half). If the OCR engine detects character sequences that are close to those of a keyboard (e.g., “qwerty” or “qvvrty”), it checks whether the trusted software keyboard has been invoked. If not, ScreenPass can take a variety of actions, such as killing the foreground process or warning the user.
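The check described above, sketched as an added example (not code from ScreenPass or from the dissertation): OCR output from the lower half of the screen is matched approximately against the rows of a qwerty keypad, and a frame is flagged only when the trusted keyboard has not been invoked. The function names, the per-row error tolerance, the two-row threshold, and the sample OCR lines are all assumptions made for illustration.

```python
import re

# Letter rows of a standard English "qwerty" keypad, as an OCR pass would emit them.
KEYBOARD_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
MAX_ERROR_RATE = 0.3    # assumed tolerance for OCR misreads, per row
MIN_ROWS_MATCHED = 2    # assumed: two near-matching rows count as a keypad

# Constructed OCR output for the lower half of two frames.
SPOOFED_FRAME = ["q v v r t y u i o p", "a s d f g h j k I", "z x c v b n m"]
BENIGN_FRAME = ["Sign in to continue", "Forgot password?"]


def approx_substring_distance(pattern, text):
    """Smallest edit distance between pattern and any substring of text.

    Standard approximate-matching DP: row 0 is all zeros, so a match may
    start anywhere in text, and the minimum of the last row lets it end anywhere.
    """
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, 1):
        cur = [i] + [0] * len(text)
        for j, tc in enumerate(text, 1):
            cur[j] = min(prev[j] + 1,               # pattern char missing from text
                         cur[j - 1] + 1,            # extra char in text
                         prev[j - 1] + (pc != tc))  # match or misread char
        prev = cur
    return min(prev)


def normalize(line):
    """Lowercase and keep letters only, so key spacing and borders drop out."""
    return re.sub(r"[^a-z]", "", line.lower())


def matched_rows(ocr_lines):
    """Return (row, distance) for every keypad row found within tolerance."""
    lines = [normalize(line) for line in ocr_lines]
    hits = []
    for row in KEYBOARD_ROWS:
        budget = int(len(row) * MAX_ERROR_RATE)
        best = min((approx_substring_distance(row, l) for l in lines),
                   default=len(row))
        if best <= budget:
            hits.append((row, best))
    return hits


def check_frame(ocr_lines, trusted_keyboard_active):
    """Classify one captured frame.

    'skip'            - the trusted keyboard owns the lower half of the screen
    'spoof-suspected' - keypad-like text drawn by the foreground app itself
    'ok'              - nothing keypad-like found
    """
    if trusted_keyboard_active:
        return "skip"
    if len(matched_rows(ocr_lines)) >= MIN_ROWS_MATCHED:
        return "spoof-suspected"
    return "ok"


if __name__ == "__main__":
    for name, frame in (("spoofed", SPOOFED_FRAME), ("benign", BENIGN_FRAME)):
        print(name, matched_rows(frame), check_frame(frame, False))
```

Matching each row as an approximate substring, rather than comparing whole lines, keeps the check tolerant of both OCR misreads and extra characters drawn around the keys.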

3.4 ScreenPass

ScreenPass is a realization of the approach to secure password entry for Android touchscreen devices outlined in Section 3.3.

3.4.1 Capturing and Tainting Password

Figure 1 shows a high-level overview of how ScreenPass captures and tags a user’s password. Arrows in the diagram represent IPC calls across components. All components in the figure are trusted except for the untrusted app in gray.

The first step in tagging a user’s password is for the untrusted app to request an IME from the IMF system service. The request contains attributes describing the target widget so that the software keyboard can be appropriately configured. For example, if a text field is an email address, the software keyboard can display an “@” key along with the usual alphabetical keys. The IMF assigns the app to the ScreenPass IME and passes along the target widget’s attributes. The IME checks these attributes to see if the widget is in password mode.

If a widget says that it is in password mode, the secure keyboard first contacts the ScreenPass system service to put the device in password mode. The ScreenPass system service acts as a central repository for ScreenPass state. IPCs to access this state can be controlled using Android’s code-signing framework. Only code that has been signed by a trusted entity is allowed to communicate with the ScreenPass system service.

A malicious app could claim that the target widget was not in password mode, while still obfuscating text input as a user would expect of a password field. If this happens, the ScreenPass IME will not prompt the user to tag their password, and the user must either (1) avoid entering their password, or (2) use an area at the bottom of the secure keyboard to actively tag their password and put the device in password mode. We call this a silent-widget attack. We will return to this attack shortly.

Once the device is in password mode, the keyboard makes a request to the Android Window Manager to display the secure keyboard and domain chooser. Figure 2 shows the software keyboard and chooser. The tag list allows a user to associate an administrative domain with her password. This association must be set before a user inputs her password or else characters may not be properly tagged. As a result, the software keyboard ignores input until the user has chosen a tag, and the secure keyboard immediately prompts the user for a domain when triggered. Figure 2 shows the prompt a user sees after the keyboard is displayed. ScreenPass uses an integrated password manager to incentivize users to tag their passwords when they are not explicitly prompted (e.g., during a silent-widget attack).

Figure 2: When a user taps on a password field, ScreenPass’s secure keyboard prompts the user to tag their password, and then applies the tag to each character

Once the user has chosen a domain, the keyboard looks up the associated taint tag and applies the tag to all subsequent character inputs (whether input by the user using the keypad or loaded from the password database). It should be noted that once a user has chosen a tag, there is no way to change the tags that have been previously applied. However, our prototype allows users to choose from a preset list of domains or to define a new tag.

As described earlier, TaintDroid’s taint tags are 32-bit bit vectors. The original TaintDroid design treats tags as bit vectors so that data derived from multiple sensitive sources can be represented. However, this approach leaves ScreenPass with too few bits to represent all of the services a user might use. As a result, ScreenPass reserves the highest 10 bits of each tag and treats those bits as a 10-bit namespace. Passwords do not need to be associated with multiple domains, and 1023 domains should be sufficient for most users.

When a user taps on the secure keyboard, it maps the coordinates of the taps to characters, and uses TaintDroid to tag the IPC message back to the untrusted app that contains password data. Once the untrusted app receives the tainted IPC message, TaintDroid propagates its tags as the app uses the password. When the app attempts to send a password over the network, to the file system, or over an IPC channel, TaintDroid inspects the buffer’s tags and can enforce a system-defined policy.
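The tag layout described above, combined with the kind of network-write checks listed in Section 3.1 (cleartext port, destination domain, usage history), as an added example that is not ScreenPass or TaintDroid code. The function and class names, the use of the low 22 bits as ordinary source bits, the cleartext port list, and the `.example` hosts are assumptions made for illustration.

```python
from dataclasses import dataclass, field

TAG_BITS = 32
DOMAIN_BITS = 10                                  # highest 10 bits: password domain
DOMAIN_SHIFT = TAG_BITS - DOMAIN_BITS             # 22
DOMAIN_MASK = ((1 << DOMAIN_BITS) - 1) << DOMAIN_SHIFT
SOURCE_MASK = (1 << DOMAIN_SHIFT) - 1             # assumed: remaining bits stay a bit vector
MAX_DOMAIN_ID = (1 << DOMAIN_BITS) - 1            # 1023; id 0 means "no password domain"

CLEARTEXT_PORTS = {21, 23, 80}                    # assumed list of unencrypted services

# Constructed domain table: namespace id -> host that owns the password.
DOMAIN_HOSTS = {1: "facebook.example", 2: "twitter.example"}


def make_tag(domain_id, source_bits=0):
    """Pack a password-domain id and ordinary source bits into one 32-bit tag."""
    if not 1 <= domain_id <= MAX_DOMAIN_ID:
        raise ValueError("domain id must be in 1..1023")
    if source_bits & ~SOURCE_MASK:
        raise ValueError("source bits overlap the domain namespace")
    return (domain_id << DOMAIN_SHIFT) | source_bits


def domain_of(tag):
    """Return the password-domain id carried by a tag (0 if none)."""
    return (tag & DOMAIN_MASK) >> DOMAIN_SHIFT


@dataclass
class NetworkGuard:
    """Decides what to do when a tagged buffer reaches a network sink."""
    domain_hosts: dict
    sent: set = field(default_factory=set)        # (app, domain id) pairs already sent

    def check_write(self, app, tag, host, port):
        domain = domain_of(tag)
        if domain == 0:
            return ("allow", "buffer carries no password tag")
        if port in CLEARTEXT_PORTS:
            return ("block", "password headed to a cleartext port")
        home = self.domain_hosts.get(domain)
        # Compare on a label boundary so "evilfacebook.example" does not pass.
        in_domain = home is not None and (host == home or host.endswith("." + home))
        if not in_domain:
            return ("block", "destination outside the password's domain")
        if (app, domain) in self.sent:
            return ("alert", "password sent more than once")
        self.sent.add((app, domain))
        return ("allow", "first send to the password's own domain")


if __name__ == "__main__":
    guard = NetworkGuard(DOMAIN_HOSTS)
    tag = make_tag(1)
    for host, port in (("graph.facebook.example", 80),
                       ("evilfacebook.example", 443),
                       ("graph.facebook.example", 443),
                       ("graph.facebook.example", 443)):
        print(host, port, guard.check_write("com.example.app", tag, host, port))
```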

3.4.2 Prevent Spoofing Attacks

The FrameChecker is responsible for detecting attempts to spoof the software keyboard. The FrameChecker is a thread running within the SF. The SF’s main thread composes the system’s surfaces together and posts the resulting frame to the frame buffer’s (FB’s) front buffer. Implementing the FrameChecker as a separate thread keeps spoof detection out of the critical path of the SF’s main work.

At a high level, a typical working cycle of the FrameChecker consists of three stages: (1) capturing the screen, (2) performing OCR to detect software keyboards, and (3) sleeping for a random, short amount of time. The random sleep time helps ScreenPass reduce its energy overhead to an acceptable level. As we will show in Section 3.5, performing OCR without resting could quickly drain a user’s battery under certain workloads. Our current implementation sleeps for a randomly chosen time between 0ms and 1,000ms; that is, the expected sleep time is 500ms.

3.4.2.1 Screen Capture

Upon waking up, the FrameChecker first checks with the ScreenPass system service to see if the device is in password mode, if the touchscreen has not been tapped, or if the screen has not changed since it went to sleep. If any of these conditions are true, then the FrameChecker does not need to analyze the screen and can go back to sleep.


the FrameChecker to avoid inspecting the FB if it has not changed since the last scan. In addition, a thread runs alongside the FrameChecker to detect touchscreen inputs. It detects all touch events by polling the device file /dev/input/event0 and maintains a status variable indicating whether a touch event occurred. After the FrameChecker wakes up, it checks this status variable.

The vast majority of the time, the screen will not have changed. Furthermore, only analyzing the screen while a user is interacting with it avoids performing analysis when the screen is updating but the user cannot be inputting her password. This is common for games that are primarily controlled by a device’s motion sensors and for watching videos.

If the FrameChecker finds that the device is not in password mode, that the screen has been updated, and that the user is interacting with it, then the FrameChecker must perform OCR. However, taking a single instantaneous screen capture would leave ScreenPass vulnerable to dynamic attacks. Under a dynamic attack, malicious code rapidly alternates between frames of a partial keyboard so that a user viewing the screen sees a complete keyboard, but no single frame contains one. To combat this attack, another thread in the SF computes a “squashed image” composed of all instantaneous screen captures between FrameChecker requests. The squashed image contains the average pixel values over a period of time, and, as we will see in Section 3.5, closely approximates what a user perceives under a dynamic attack.

To reduce the computational load of computing the squashed image, ScreenPass takes two steps. First, we only target the bottom two-fifths of the screen where a keyboard may appear. Our Nexus S prototype has a screen resolution of 800 × 480 pixels, and our squashed image is 320 × 480 pixels. This not only reduces the effort needed to compute the squashed image, but the work of the OCR engine as well. Second, we used the NEON 128-bit SIMD instruction available on the Nexus S’s ARM processor. This instruction allows us to add up to 16 8-bit integers in a single instruction, and greatly increases the efficiency of computing a squashed image.

One danger of computing squashed images is that, for periods containing a transition from a non-keyboard scene to a keyboard, the averages will be polluted by pre-keyboard pixels. However, as we will see in Section 3.5.2, users type passwords much more slowly than ScreenPass checks the screen. If a user starts to type her password, ScreenPass will have computed a “clean” squashed image well before she finishes.
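An added example, not ScreenPass's code, of the squashed-image computation in scalar C. The 8 × 2 strip, the gray levels, the threshold of 50 and all function names are assumptions; each of the two constructed frames shows half of the glyph columns, as in the dynamic attack described above, and the mean over 20 captures shows all of them.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#define W 8        /* constructed toy strip width, not the 320 x 480 region */
#define NPIX 16    /* 8 x 2 pixels */
#define BG 200     /* constructed key/background gray level */

typedef struct { uint32_t sum[NPIX]; uint32_t frames; } squash_t;

/* Accumulate one 8-bit grayscale capture (scalar form of the SIMD adds). */
static void squash_add(squash_t *s, const uint8_t *frame) {
    for (size_t i = 0; i < NPIX; i++) s->sum[i] += frame[i];
    s->frames++;
}

/* Per-pixel mean since the last request; returns 0 if nothing was captured. */
static int squash_mean(const squash_t *s, uint8_t *out) {
    if (s->frames == 0) return 0;
    for (size_t i = 0; i < NPIX; i++) out[i] = (uint8_t)(s->sum[i] / s->frames);
    return 1;
}

/* Count pixels darker than the background by more than delta. */
static size_t glyph_pixels(const uint8_t *img, int delta) {
    size_t n = 0;
    for (size_t i = 0; i < NPIX; i++) if (BG - img[i] > delta) n++;
    return n;
}

/* Constructed frame: dark glyph pixels in columns where col % 4 == phase. */
static void make_frame(uint8_t *f, int phase) {
    for (int i = 0; i < NPIX; i++) f[i] = ((i % W) % 4 == phase) ? 0 : BG;
}

/* Glyph pixels visible in one instantaneous capture. */
size_t single_capture_glyphs(void) {
    uint8_t a[NPIX];
    make_frame(a, 1);
    return glyph_pixels(a, 50);
}

/* Glyph pixels visible after squashing 20 alternating captures. */
size_t squashed_glyphs(void) {
    squash_t s; uint8_t f[NPIX], mean[NPIX];
    memset(&s, 0, sizeof s);
    for (int k = 0; k < 20; k++) { make_frame(f, (k % 2) ? 3 : 1); squash_add(&s, f); }
    return squash_mean(&s, mean) ? glyph_pixels(mean, 50) : 0;
}
int main(void) { return squashed_glyphs() == 8 ? 0 : 1; }
```

The averaged glyph pixels sit at half contrast (100 against a background of 200), so any threshold applied before OCR has to be chosen with that reduced contrast in mind.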

3.4.2.2 OCR Analysis


images of text into machine-encoded characters, and is widely used to digitize a number of paper-based data sources including books, documents, receipts, and checks. A typical OCR process integrates techniques from computer vision, pattern recognition, and artificial intelligence. Usually, the steps include locating and segmenting the characters from input images, preprocessing the images to remove noise, extracting patterns from the characters for classifiers, organizing the identified characters to reconstruct original words, and postprocessing to correct OCR errors by checking the context.

One of the advantages of using OCR to detect keyboards is that the analysis does not have to be completely accurate. The FrameChecker only needs to identify fragments of text that are sufficiently similar to the character sequences expected of a keyboard. Furthermore, attackers have no room to alter the character sequences on a spoofed keyboard, since a user will become instantly suspicious of a keyboard in which the keys have been moved around. A keyboard is usually divided into three areas: the characters, the keys, and the background. The keys cover the majority of this area. To improve character identification, we apply preprocessing to the screen captures to highlight the characters and reconcile the color of the key and background. The keys cover more pixels than the characters and the background, so we can determine the color of the keys by identifying the peak color for all pixels. Then we refill the background with the same color as the keys so that the characters are highlighted against a pure color background.


The preprocessing algorithm is fast and traverses all pixels twice. In the first traversal, we count the number of pixels in gray scale and identify the color for the peak. Then we change the color of the background pixels in the second traversal.
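An added example, not ScreenPass's code, of the two-traversal preprocessing in C. The page does not say how background pixels are told apart from characters; this code assumes the background is the second most frequent gray level and matches it within a tolerance `tol`, and the function names and the 4 × 10 sample strip are constructed.

```c
#include <stdint.h>
#include <stddef.h>

/* Pass 1: gray-level histogram; the most frequent level is the key color.
 * Pass 2: repaint background pixels with that color. Returns the key level.
 * Assumption (not from the page): the background is the second most frequent
 * level outside key +/- tol, and glyphs are not within tol of the background. */
uint8_t flatten_background(uint8_t *img, size_t n, int tol) {
    size_t hist[256] = {0};
    int key = 0, bg = -1;
    for (size_t i = 0; i < n; i++) hist[img[i]]++;
    for (int g = 1; g < 256; g++) if (hist[g] > hist[key]) key = g;
    for (int g = 0; g < 256; g++) {
        if (g >= key - tol && g <= key + tol) continue;  /* shades of the keys */
        if (hist[g] && (bg < 0 || hist[g] > hist[bg])) bg = g;
    }
    if (bg < 0) return (uint8_t)key;                     /* nothing to repaint */
    for (size_t i = 0; i < n; i++) {
        int d = (int)img[i] - bg;
        if (d >= -tol && d <= tol) img[i] = (uint8_t)key;
    }
    return (uint8_t)key;
}

/* Constructed 4x10 strip: white keys (255), black gaps (0), gray glyphs (60). */
static const uint8_t SAMPLE[40] = {
    0, 255, 255, 255, 255,   0, 255, 255, 255, 255,
    0, 255,  60,  60, 255,   0, 255,  60, 255, 255,
    0, 255,  60, 255, 255,   0, 255,  60,  60, 255,
    0, 255, 255, 255, 255,   0, 255, 255, 255, 255,
};
static uint8_t work[40];

static size_t count_level(uint8_t level) {
    size_t c = 0;
    for (size_t i = 0; i < 40; i++) if (work[i] == level) c++;
    return c;
}

/* Copy the sample into the work buffer and preprocess it. */
static uint8_t run_sample(void) {
    for (size_t i = 0; i < 40; i++) work[i] = SAMPLE[i];
    return flatten_background(work, 40, 8);
}
int demo_key_level(void) { return run_sample(); }
size_t demo_background_left(void) { run_sample(); return count_level(0); }
size_t demo_glyphs_kept(void) { run_sample(); return count_level(60); }

int main(void) { return demo_background_left() == 0 ? 0 : 1; }
```

If glyphs and background share a gray level, the second pass would erase the glyphs, which is the case the stated assumption excludes.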

It is also worth pointing out that TesseractOCR itself can tolerate some color differences between the keys and background. We apply preprocessing only to handle some extreme cases, such as a black background and white keys, and to better facilitate character recognition. Finally, we use the training data for the English language from the TesseractOCR official site. When the FrameChecker starts, it loads the training data and initiates the OCR engine.

3.5 Evaluation

To evaluate ScreenPass, we sought answers to the following questions:

• How robust is ScreenPass to spoofing attacks?

• What is the perceived burden of tagging passwords?

• How likely are users to tag their passwords?

• How often should ScreenPass check for spoofing attacks?

• What is the performance and energy overhead of ScreenPass?

• Can taint-tracking detect when apps mishandle passwords?


The first was a three-week, in-situ user study with 18 participants; the second was a one-week, follow-up study with eight of the initial participants. To answer the next two questions, we measured the performance and energy chara
