3.1 XDroid: An Android permission control using Hidden Markov Models
3.1.2 Background
3.1.2.3 Finding the optimal state sequence
One of the most common queries on an HMM is to ask for the most likely series of states given an observed series of outputs. In other words, we choose the state sequence that maximizes the likelihood for the given observation sequence. This problem is solved by the Viterbi algorithm [78]. The Viterbi algorithm is similar to the forward procedure, except that it tracks only the maximum probability instead of the total probability. Let $\delta_i^t$ be the maximal probability of state sequences of length $t$ that end in state $i$ and produce the first $t$ observations for the given model $\lambda$. That is,

$$\delta_i^t = \max_{Q_1, \cdots, Q_{t-1}} P(Q_1, \cdots, Q_{t-1}, Q_t = q_i, O_1, \cdots, O_t \mid \lambda) \tag{3.10}$$

The two differences between the Viterbi algorithm and the forward algorithm are: (1) it uses maximization in place of summation at the recursion and termination steps, and (2) it keeps track of the arguments that maximize $\delta_i^t$ for each $t$ and $i$, storing them in the $N \times T$ matrix $\psi$. This matrix is used to retrieve the optimal state sequence at the backtracking step.
We initialize the model as:

$$\delta_i^1 = \pi_i b_i(O_1) \tag{3.11}$$
$$\psi_i^1 = 0, \quad i = 1, \cdots, N \tag{3.12}$$

The recursion steps are:

$$\delta_j^t = \max_i [\delta_i^{t-1} a_{ij}] \, b_j(O_t) \tag{3.13}$$
$$\psi_j^t = \arg\max_i [\delta_i^{t-1} a_{ij}] \tag{3.14}$$

Finally, the probability $p(T)$ of the most probable sequence and the most probable last state $q(T)$ given $O$ are:

$$p(T) = \max_i [\delta_i^T] \tag{3.15}$$
$$q(T) = \arg\max_i [\delta_i^T] \tag{3.16}$$
The path (state sequence) is then recovered by backtracking through $\psi$:

$$q(t) = \psi_{q(t+1)}^{t+1}, \quad t = T-1, \cdots, 1$$
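The initialization, recursion, termination, and backtracking steps of Eqs. (3.11)-(3.16) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the XDroid implementation: the function name `viterbi`, the 0-based indexing, and the two-state toy parameters in the usage lines are hypothetical and chosen only to make the example self-contained.

```python
from typing import List, Sequence, Tuple


def viterbi(
    pi: Sequence[float],
    a: Sequence[Sequence[float]],
    b: Sequence[Sequence[float]],
    obs: Sequence[int],
) -> Tuple[float, List[int]]:
    """Return (probability, path) of the most probable state sequence.

    pi[i]   : initial probability of state i
    a[i][j] : transition probability from state i to state j
    b[i][k] : probability of emitting symbol k in state i
    obs     : observation sequence as 0-based symbol indices
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(pi)

    # Initialization, Eqs. (3.11)-(3.12).
    delta = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    psi = [[0] * n]

    # Recursion, Eqs. (3.13)-(3.14): keep the best predecessor of each state.
    for t in range(1, len(obs)):
        prev = delta[t - 1]
        row_delta, row_psi = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] * a[i][j])
            row_psi.append(best_i)
            row_delta.append(prev[best_i] * a[best_i][j] * b[j][obs[t]])
        delta.append(row_delta)
        psi.append(row_psi)

    # Termination, Eqs. (3.15)-(3.16).
    last = max(range(n), key=lambda i: delta[-1][i])
    prob = delta[-1][last]

    # Backtracking: follow the stored arguments from the last state to the first.
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return prob, path


# Hypothetical toy model: state 0 = normal, state 1 = malicious.
toy_pi = [0.8, 0.2]
toy_a = [[0.9, 0.1], [0.2, 0.8]]
toy_b = [[0.7, 0.3], [0.1, 0.9]]
toy_prob, toy_path = viterbi(toy_pi, toy_a, toy_b, [0, 1, 1, 1])
print(toy_path, toy_prob)
```

For long observation sequences the products underflow, so a practical variant would typically work with log-probabilities and replace the products by sums.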
3.1.3 System Design
The ultimate goal of XDroid is to monitor the behaviour of apps and alert users when suspicious app behaviours are detected. Figure 3.1.1 shows the architecture of XDroid. The system contains components on the server side and on the mobile device side. Each XDroid device contains an Interaction Portal and an Activity Logger. The interaction portal provides an interface for users to interact with the device. The activity logger monitors the activities of the apps. The server-side components include Risk Assessment, User Profiling, and Alert Customization. In the rest of this section, we describe the key features of these components.
[Figure 3.1.1: the XDroid Server (risk assessment, user profiling, alert customization), the Mobile Client (interaction portal, activity logger), and the Android App Markets from which apps are installed. Labeled flows: apps' risk levels; risk alerts <AppID, ResID, Risk>; app log; user's response <AppID, ResID, User ID>.]
Fig. 3.1.1.: XDroid system overview
[Figure 3.1.2: flowchart of permission request handling. Agents: App, XDroid, User, OS, Server. Nodes: app request; "risk threshold passed?" (yes/no); "does user block?" (yes/no); serve the request; block and tune the model; serve, record, and add the request to the trusted list; app is notified.]
3.1.3.1 Interaction Portal
The interaction portal facilitates the interaction between users and devices. Instead of sending requests to the Android system's legacy permission handler (e.g., Package Manager Service), XDroid handles the permission requests through the process illustrated in Figure 3.1.2. For example, when a user installs a popular messaging application and chooses to monitor its behaviour using XDroid, the requested resources are displayed along with their estimated risk levels (Figure 3.1.3(a)). The values in the screenshots are illustrative examples, not actual risk levels. The user can check the resources he/she wants to monitor. If a resource is monitored and suspicious activities involving it are detected, the user is informed through a dialog box (Figure 3.1.3(b)). The user can decide whether to block or allow the resource access based on the estimated risk suggested by XDroid.
The user can use the pre-installed XDroid application to view the list of apps that are under monitoring. If the user clicks on an app in the list, the set of requested resources is displayed (Figure 3.1.3(c)), where checked resources are monitored. By default, all sensitive resources are monitored; the user can change this selection.
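The request handling described above and in Figure 3.1.2 can be sketched as follows. This is one plausible reading of the flow under stated assumptions, not the XDroid implementation: the class `PermissionHandler`, the numeric `risk_threshold`, the `user_blocks` callback standing in for the dialog box, and the rule that an allowed request is added to the trusted list are all hypothetical. The recorded responses stand in for the data sent to the server to tune the model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Set, Tuple


class Decision(Enum):
    SERVE = "serve"
    BLOCK = "block"


@dataclass
class PermissionHandler:
    """Hypothetical client-side handler for resource access requests."""

    risk_threshold: float  # assumed to be a value in [0, 1]
    trusted: Set[Tuple[str, str]] = field(default_factory=set)
    responses: List[Tuple[str, str, bool]] = field(default_factory=list)

    def handle(
        self,
        app_id: str,
        res_id: str,
        risk: float,
        user_blocks: Callable[[str, str, float], bool],
    ) -> Decision:
        key = (app_id, res_id)
        # Trusted or low-risk requests are served without alerting the user.
        if key in self.trusted or risk < self.risk_threshold:
            return Decision.SERVE
        # Otherwise ask the user (stands in for the risk alert dialog).
        blocked = user_blocks(app_id, res_id, risk)
        # Record the response so that it can be reported to the server.
        self.responses.append((app_id, res_id, blocked))
        if blocked:
            return Decision.BLOCK
        # Allowed request: serve it and remember it as trusted.
        self.trusted.add(key)
        return Decision.SERVE


handler = PermissionHandler(risk_threshold=0.5)
print(handler.handle("puzzle_game", "sms", 0.9, lambda app, res, risk: True))
```

Keeping the user prompt behind a callback makes the decision logic testable without a user interface.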
3.1.3.2 Risk assessment
The purpose of the risk assessment is to provide a quantitative estimate of how likely a resource access by an app is to cause damage to users. For example, an SMS access from a puzzle game app may be malicious, and XDroid can pop up a reminder for the user to block it.
To assess the risk level of resource accesses, the activities of the apps are monitored by activity loggers and the data is sent to the server for analysis. Our risk assessment mechanism uses a Hidden Markov Model (HMM) to analyze the behaviour sequences (Section 3) and provides users with a risk level for the resource accesses involved.
[Figure 3.1.3: three screenshots of the XDroid application for the Telegram app. (a) "Resource Access Risk Levels": the requested resources (your personal information, camera, your messages, your location, network communication, storage, phone state and identity), each with a risk label (High, Med, or Low), and a "Check all to monitor" option. (b) "Risk Alert": "Telegram usage of: Phone state and identity. Estimated risk level: High. Do you allow it?", with Block and Allow buttons and a "Never ask me again for this resource" option. (c) "Permission Risks: Telegram": the same resource list with risk labels.]
Fig. 3.1.3.: User interfaces: (a) the computed risk levels for an app's requested resources; (b) a popup notifying the user of the risk level of a resource at runtime; (c) managing the permission policies after installation.
3.1.3.3 User Profiling
We are aware that users may have different tolerance levels for different resources. For example, user A is very concerned about leaking his/her location to a third party, while user B does not care about it. To provide customized risk estimation, we build user profiles. We assign each new user an initial tolerance model and update the user profile after receiving the user's permission control decisions. For example, if a user agrees to the GPS access every time it is requested, the tolerance level of that user with respect to GPS access is high.
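To make the idea of updating a profile from decisions concrete, the sketch below keeps a per-resource tolerance value that starts at an initial level and moves toward the fraction of requests the user has allowed. It is an illustration under our own assumptions, not the tolerance model used by XDroid: the class `UserProfile`, the smoothing parameter `prior_weight`, and the allow-ratio formula are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict


class UserProfile:
    """Hypothetical per-user tolerance profile, one value per resource."""

    def __init__(self, initial_tolerance: float = 0.5, prior_weight: float = 2.0) -> None:
        self.initial_tolerance = initial_tolerance  # tolerance before any decision
        self.prior_weight = prior_weight            # how strongly the initial value counts
        self._allowed: DefaultDict[str, int] = defaultdict(int)
        self._total: DefaultDict[str, int] = defaultdict(int)

    def record(self, resource: str, allowed: bool) -> None:
        """Store one block/allow decision for a resource."""
        self._total[resource] += 1
        if allowed:
            self._allowed[resource] += 1

    def tolerance(self, resource: str) -> float:
        """Smoothed fraction of allowed requests, in [0, 1]."""
        prior = self.initial_tolerance * self.prior_weight
        return (self._allowed[resource] + prior) / (self._total[resource] + self.prior_weight)


profile = UserProfile()
for _ in range(8):
    profile.record("gps", allowed=True)  # the user agrees to GPS access every time
print(profile.tolerance("gps"))
```

With this choice, a user who always allows GPS access ends up with a high GPS tolerance, while resources without any recorded decision keep the initial value.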
3.1.3.4 Alert Customization
The purpose of XDroid user profiling is to provide customized risk assessment to users. Upon installation of a new app, XDroid provides a risk level estimate by integrating the app's behaviours and the past responses of all users who responded to the same app requests. In this way, we help users decide whether or not to monitor the permission requests. If a user chooses to monitor an app's permission, then after capturing sufficient behaviour logs, XDroid computes the risk level of that permission by considering the new logs and alerts the user. Users' responses (block or allow) to the permission request alerts are integrated to provide customized alerts.
3.1.4 Model
In this work, we use a hidden Markov model (HMM) for risk assessment of malicious Android apps. We transform the app resource risk computation problem into an HMM problem with two states: malicious and normal. We map the app's behaviour into the HMM observations. To train the HMM, we capture the behaviours of both malicious and normal apps and use them to generate an initial trained HMM for risk estimation. In this section, we first present our HMM and then explain how we use it for customized permission risk level estimation.
3.1.4.1 Hidden Markov Model
An HMM is a statistical Markov model widely used in science, engineering, and many other areas (speech recognition, optical character recognition, machine translation, bioinformatics, computer vision, finance and economics, and social science) [42, 35]. Markov models provide a powerful abstraction for data expressed as time series, but cannot support reasoning about a series of states given only observations related to those states.
An HMM can be used to address this shortcoming. In this section we model the Android behaviour sequence as an HMM and compute the risk levels of a given resource usage.
The Hidden Markov Model is a variant of a finite state machine which can be represented by a set of hidden states $Q = \{q_1, q_2, \cdots, q_{|Q|}\}$, a set of observations $O = \{o_1, o_2, \cdots, o_{|O|}\}$, a set of transition probabilities $A = \{\alpha_{ij} = P(q_j^{t+1} \mid q_i^t)\}$, and a set of output (emission) probabilities $B = \{b_{ik} = P(o_k \mid q_i)\}$. In this notation, $P(a \mid b)$ denotes the conditional probability of event $a$ given event $b$; $t = 1, \cdots, T$ is the time; and $q_i^t$ denotes the event that the state is $q_i$ at time $t$. In other words, $\alpha_{ij}$ is the probability that the next state is $q_j$ given that the current state is $q_i$, and $b_{ik}$ is the probability that the output is $o_k$ given that the current state is $q_i$. The initial state probabilities are denoted by $\Pi = \{\pi_i = P(q_i^1) \mid \forall 1 \le i \le |Q|\}$, that is, the probabilities of the states at time 1 (the initial time). In an HMM the current state is not observable. However, each state produces an output with a certain probability (denoted by $B$). An HMM can also be represented using a compact triple $\lambda = (A, B, \Pi)$ [59]. Table 3.1.2 summarizes the notations and preliminaries.
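As an illustration of the triple $\lambda = (A, B, \Pi)$, the following sketch stores the three parameter sets, checks that every probability row is non-negative and sums to one, and samples a hidden state sequence together with its outputs. The class name `HMM`, the method `sample`, and the two-state parameter values are hypothetical and serve only to make the example runnable; they are not trained XDroid parameters.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HMM:
    """Compact triple lambda = (A, B, Pi) with 0-based state and symbol indices."""

    a: List[List[float]]   # a[i][j]: probability of moving from state i to state j
    b: List[List[float]]   # b[i][k]: probability that state i emits symbol k
    pi: List[float]        # pi[i]: probability that the first state is i

    def __post_init__(self) -> None:
        n = len(self.pi)
        if len(self.a) != n or len(self.b) != n or any(len(r) != n for r in self.a):
            raise ValueError("A must be n x n and B must have n rows")
        for row in [self.pi, *self.a, *self.b]:
            if any(p < 0 for p in row) or abs(sum(row) - 1.0) > 1e-9:
                raise ValueError("each probability row must be non-negative and sum to 1")

    def sample(self, length: int, rng: random.Random) -> Tuple[List[int], List[int]]:
        """Draw `length` hidden states and the symbols they emit."""
        n = len(self.pi)
        states: List[int] = []
        outputs: List[int] = []
        state = rng.choices(range(n), weights=self.pi)[0]
        for _ in range(length):
            states.append(state)
            symbols = range(len(self.b[state]))
            outputs.append(rng.choices(symbols, weights=self.b[state])[0])
            state = rng.choices(range(n), weights=self.a[state])[0]
        return states, outputs


# Hypothetical two-state model: state 0 = normal, state 1 = malicious.
model = HMM(a=[[0.9, 0.1], [0.2, 0.8]], b=[[0.7, 0.3], [0.1, 0.9]], pi=[0.8, 0.2])
hidden, observed = model.sample(5, random.Random(0))
print(hidden, observed)
```

Only `observed` would be available to an analyst; `hidden` is returned here to show what the model treats as unobservable.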
Table 3.1.2.: Notations

Notation    Description
$Q$         $Q = \{q_i\}, i = 1, \cdots, N$: set of $N$ hidden states
$A$         $A = \{\alpha_{ij} = P(q_j^{t+1} \mid q_i^t)\}$: transition probabilities
$O$         $O = \{o_k\}, k = 1, \cdots, M$: observations (symbols)
$B$         $B = \{b_{ik} = P(o_k \mid q_i)\}$: emission probabilities
$\Pi$       $\Pi = \{\pi_i = P(q_i^1) \mid \forall 1 \le i \le |Q|\}$: initial state probabilities
$O_t$       $O_t \in O$: observation at time $t$
$Q_t$       $Q_t \in Q$: state at time $t$
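Figure 3.1.4 shows the HMM for Android app behaviour modeling. The HMM consists of three states: Start, Normal (0), and Malicious (1). The set of observations is defined using the apps' behaviours during runtime (see Section 3.1.4.7). The HMM parameters are: the initial state probabilities $\Pi = [\pi_1, \pi_2]$, the state transition probabilities $A = [\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}]$, the malicious state emission probabilities $B_M = [b_{11}, b_{12}, \cdots, b_{1N}]$, and the normal state emission probabilities $B_N = [b_{01}, b_{02}, \cdots, b_{0N}]$.

[Figure 3.1.4: state diagram with a Start state and the two hidden states 1 (M) and 0 (N). The edges are labeled with the initial probabilities $\pi_1, \pi_2$, the transition probabilities $\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}$, and the emission probabilities $b_{11}, \cdots, b_{1N}$ and $b_{01}, \cdots, b_{0N}$ toward the observations $O_1, O_2, O_3, \cdots, O_N$.]
Fig. 3.1.4.: An overview of the proposed HMM model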
Two major challenges are how to determine the unknown parameters and how to find the optimal state sequence of the HMM given a sequence of observations. In the next sections we describe these two problems and their solutions.