
3.1 XDroid: An Android permission control using Hidden Markov Models

3.1.2 Background

3.1.2.3 Finding the optimal state sequence

One of the most common queries on an HMM is to ask for the most likely sequence of states given an observed sequence of outputs. In other words, we choose the state sequence that maximizes the likelihood for the given observation sequence. This problem is solved by the Viterbi algorithm [78]. The Viterbi algorithm is similar to the forward procedure, except that it tracks only the maximum probability instead of the total probability. Let $\delta_i^t$ be the maximal probability of state sequences of length $t$ that end in state $i$ and produce the first $t$ observations for the given model. That is,

$$\delta_i^t = \max_{Q_1, \cdots, Q_{t-1}} \{P(Q_1, \cdots, Q_{t-1}; O_1, \cdots, O_t \mid Q_t = q_i)\} \tag{3.10}$$

The Viterbi algorithm differs from the forward algorithm in two ways: (1) it uses maximization in place of summation at the recursion and termination steps, and (2) it keeps track of the arguments that maximize $\delta_i^t$ for each $t$ and $i$, storing them in the $N \times T$ matrix $\psi$. This matrix is used to retrieve the optimal state sequence at the backtracking step.

We initialize the model as:

$$\delta_i^1 = \pi_i b_i(O_1) \tag{3.11}$$

$$\psi_i^1 = 0, \quad i = 1, \cdots, N \tag{3.12}$$

The recursion steps are:

$$\delta_j^t = \max_i [\delta_i^{t-1} a_{ij}] \, b_j(O_t) \tag{3.13}$$

$$\psi_j^t = \arg\max_i [\delta_i^{t-1} a_{ij}] \tag{3.14}$$

Finally, the probability $p(T)$ of the most probable sequence and the most probable last state $q(T)$ given $O$ are:

$$p(T) = \max_i [\delta_i^T] \tag{3.15}$$

$$q(T) = \arg\max_i [\delta_i^T] \tag{3.16}$$

The path (state sequence) is then recovered by backtracking through $\psi$ from the last state:

$$q(t) = \psi_{q(t+1)}^{t+1}, \quad t = T-1, \cdots, 1$$
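The initialization, recursion, termination and backtracking steps can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in XDroid: the function name `viterbi`, the zero-based indexing of states and time steps, and the numbers in the two-state example are all assumptions made for the sketch.

```python
from typing import List, Sequence, Tuple


def viterbi(
    pi: Sequence[float],
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    obs: Sequence[int],
) -> Tuple[List[int], float]:
    """Return the most probable state path and its probability.

    pi[i]   : initial probability of state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability that state i emits observation symbol k
    obs     : observed symbol indices O_1, ..., O_T
    """
    if not obs:
        raise ValueError("obs must contain at least one observation")
    n, T = len(pi), len(obs)

    # delta[t][i]: best probability of any path ending in state i at step t.
    # psi[t][i]:   predecessor of state i on that best path (0 at t = 0).
    delta = [[0.0] * n for _ in range(T)]
    psi = [[0] * n for _ in range(T)]

    # Initialization, as in (3.11) and (3.12).
    for i in range(n):
        delta[0][i] = pi[i] * B[i][obs[0]]

    # Recursion, as in (3.13) and (3.14): maximize where the forward
    # algorithm would sum.
    for t in range(1, T):
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[t - 1][i] * A[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][obs[t]]

    # Termination, as in (3.15) and (3.16).
    last = max(range(n), key=lambda i: delta[T - 1][i])
    best_prob = delta[T - 1][last]

    # Backtracking: follow the stored predecessors from the last state.
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, best_prob


# Hypothetical two-state model (0 = normal, 1 = malicious), three symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path, prob = viterbi(pi, A, B, [0, 1, 2])  # path == [0, 0, 1]
```

For long observation sequences the products underflow, so a practical version would typically work with sums of log-probabilities instead.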

3.1.3 System Design

The ultimate goal of XDroid is to monitor the behaviour of apps and alert users when suspicious app behaviour is detected. Figure 3.1.1 shows the architecture of XDroid. The system has components on the server side and on the mobile device side. Each XDroid device contains an Interaction Portal and an Activity Logger. The Interaction Portal provides an interface through which users interact with the device. The Activity Logger monitors the activities of the apps. The server-side components are Risk Assessment, User Profiling, and Alert Customization. In the rest of this section, we describe the key features of these components.

[Diagram labels: XDroid Server (risk assessment, user profiling, alert customization); Mobile Client (interaction portal, activity logger); Android app markets (app installation); data exchanged: apps' risk levels, risk alerts <AppID, ResID, Risk>, app log, user's response <AppID, ResID, User ID>.]

Fig. 3.1.1.: XDroid system overview

[Flowchart labels: App request; decision "Risk threshold passed?" (Yes/No); decision "Does user block?" (Yes/No); actions: "Serve the request", "Block & tune the model", "Serve, record and add the request to Trusted list", "App is notified"; agents: App, XDroid, User, OS, Server.]

3.1.3.1 Interaction Portal

The interaction portal facilitates the interaction between users and devices. Instead of sending requests to the Android system's legacy permission handler (e.g. Package Manager Service), XDroid handles the permission requests through the process illustrated in Figure 3.1.2. For example, when a user installs a popular messaging application and chooses to monitor its behaviour using XDroid, the requested resources are displayed along with their estimated risk levels (Figure 3.1.3(a)). The numbers in the screenshots are generated as examples and are not actual risks. The user can check the resources he/she wants to monitor. If a resource is monitored and suspicious activity involving it is detected, the user is informed through a dialog box (Figure 3.1.3(b)). The user can then decide whether to block or allow the resource access based on the estimated risk suggested by XDroid.

The user can use the pre-installed XDroid application to view the list of apps that are being monitored. If the user clicks on an app in the list, its set of requested resources is displayed (Figure 3.1.3(c)), with the monitored resources checked. By default, all sensitive resources are monitored; the user can change this selection.
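One possible reading of the request-handling process in Figure 3.1.2 is sketched below. The branch structure (serve directly when the risk is below the threshold, otherwise ask the user) is an assumption inferred from the flowchart labels and the surrounding text, and the names `Outcome`, `handle_request` and `user_blocks` are hypothetical.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    SERVED = "serve the request"
    BLOCKED = "block and tune the model"
    SERVED_TRUSTED = "serve, record and add the request to the trusted list"


def handle_request(
    risk: float,
    threshold: float,
    user_blocks: Callable[[], bool],
) -> Outcome:
    """Decide what happens to one resource request from a monitored app.

    risk        : estimated risk of the request (assumed to be a number)
    threshold   : risk level at which the user is alerted (assumption)
    user_blocks : callback that shows the alert and returns True on "Block"
    """
    # Low-risk requests are served without bothering the user.
    if risk < threshold:
        return Outcome.SERVED
    # The threshold is passed: the user decides via the risk alert dialog.
    if user_blocks():
        return Outcome.BLOCKED
    return Outcome.SERVED_TRUSTED
```

Passing the user's decision as a callback keeps the sketch independent of any particular dialog implementation; the alert is only raised when the threshold is actually passed.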

3.1.3.2 Risk assessment

The purpose of the risk assessment is to provide a quantitative estimate of how likely a resource access from an app is to cause damage to users. For example, an SMS access from a puzzle game app may be malicious, and XDroid can pop up a reminder prompting the user to block it.

To assess the risk level of resource accesses, the activities of the apps are monitored by activity loggers and the data is sent to the server for analysis. Our risk assessment mechanism uses a Hidden Markov Model (HMM) to analyze the behaviour sequences (Section 3) and provides users with the risk level of the resource accesses involved.

[Screenshots of the XDroid application for the Telegram app. Panels (a) "Resource Access Risk Levels" and (c) "Permission Risks: Telegram" list the resources Your personal information, Camera, Your messages, Your location, Network communication, Storage, and Phone state and identity, each with a risk level (High, Med or Low); panel (a) also offers "Check all to monitor". Panel (b) is a "Risk Alert" dialog for Telegram's usage of "Phone state and identity" with estimated risk level High, the question "Do you allow it?", Block and Allow buttons, and the option "Never ask me again for this resource".]

Fig. 3.1.3.: User interfaces: (a) the computed risk levels for an app's requested resources; (b) a popup notifying the user of a resource's risk level at runtime; (c) managing the permission policies after installation.

3.1.3.3 User Profiling

We are aware that users may have different tolerance levels for different resources. For example, user A may be very concerned about leaking his/her location to a third party, while user B does not care about it. To provide customized risk estimation, we build user profiles. We assign each new user an initial tolerance model and update the user profile after receiving the user's permission control decisions. For example, if a user agrees to GPS access every time it is requested, the tolerance level of that user with respect to GPS access is high.
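As a rough illustration of how such a profile could be updated, the sketch below estimates a per-resource tolerance as a smoothed fraction of "allow" decisions. The text does not specify the update rule, so the rule used here, the class name `ToleranceProfile`, and the pseudo-count prior standing in for the initial tolerance model are all assumptions.

```python
from typing import Dict, Tuple


class ToleranceProfile:
    """Per-resource tolerance of one user, estimated from allow/block decisions."""

    def __init__(self, prior_allow: float = 1.0, prior_block: float = 1.0) -> None:
        # Pseudo-counts play the role of the initial tolerance model (assumption).
        self._prior: Tuple[float, float] = (prior_allow, prior_block)
        self._counts: Dict[str, Tuple[float, float]] = {}

    def record(self, resource: str, allowed: bool) -> None:
        """Update the profile with one permission control decision."""
        allow, block = self._counts.get(resource, self._prior)
        if allowed:
            allow += 1.0
        else:
            block += 1.0
        self._counts[resource] = (allow, block)

    def tolerance(self, resource: str) -> float:
        """Smoothed fraction of 'allow' decisions, between 0 and 1."""
        allow, block = self._counts.get(resource, self._prior)
        return allow / (allow + block)


profile = ToleranceProfile()
for _ in range(8):
    profile.record("GPS", allowed=True)  # the user always agrees to GPS access
gps_tolerance = profile.tolerance("GPS")  # moves from 0.5 towards 1.0
```

With the default prior, a resource the user has never been asked about starts at a neutral tolerance of 0.5 and moves towards 1 or 0 as decisions accumulate.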

3.1.3.4 Alert Customization

The purpose of XDroid user profiling is to provide customized risk assessment to users. Upon installation of a new app, XDroid provides a risk level estimate by integrating the app's behaviour and the past responses of all users who responded to the same app requests. This helps users decide whether or not to monitor the permission requests. If a user chooses to monitor an app's permission, then, after capturing sufficient behaviour logs, XDroid computes the risk level of that permission from the new logs and alerts the user. Users' responses (block or allow) to the permission request alerts are integrated to provide customized alerts.

3.1.4 Model

In this work, we use a hidden Markov model (HMM) for risk assessment of malicious Android apps. We transform the app resource risk computation problem into an HMM problem with two states: malicious and normal. We map the app's behaviour to the HMM observations. To train the HMM, we capture the behaviours of both malicious and normal apps and use them to generate an initial trained HMM for risk estimation. In this section we first present our HMM and then explain how we use it for customized permission risk level estimation.

3.1.4.1 Hidden Markov Model

An HMM is a statistical Markov model widely used in science, engineering and many other areas (speech recognition, optical character recognition, machine translation, bioinformatics, computer vision, finance and economics, and social science) [42, 35]. Markov models provide a powerful abstraction for data expressed as time series, but are unable to support reasoning about a series of states given some observations related to those states. An HMM can be used to address this shortcoming. In this section we model the Android behaviour sequence as an HMM and compute the risk levels of a given resource usage.

The Hidden Markov Model is a variant of a finite state machine that can be represented by a set of hidden states $Q = \{q_1, q_2, \cdots, q_{|Q|}\}$, a set of observations $O = \{o_1, o_2, \cdots, o_{|O|}\}$, a set of transition probabilities $A = \{\alpha_{ij} = P(q_j^{t+1} \mid q_i^t)\}$, and a set of output (emission) probabilities $B = \{b_{ik} = P(o_k \mid q_i)\}$. In this notation, $P(a \mid b)$ denotes the conditional probability of event $a$ given event $b$; $t = 1, \cdots, T$ is the time; and $q_i^t$ denotes the event that the state is $q_i$ at time $t$. In other words, $\alpha_{ij}$ is the probability that the next state is $q_j$ given that the current state is $q_i$, and $b_{ik}$ is the probability that the output is $o_k$ given that the current state is $q_i$. The initial state probabilities are denoted by $\Pi = \{\pi_i = P(q_i^1) \mid \forall\, 1 \le i \le |Q|\}$, that is, the probability of each state at time 1 (the initial time). In an HMM the current state is not observable. However, each state produces an output with a certain probability (given by $B$). An HMM can also be represented by the compact triple $\lambda = (A, B, \Pi)$ [59]. Table 3.1.2 summarizes the notation.

Table 3.1.2.: Notations

Notation   Description
Q          $Q = \{q_i\}, i = 1, \cdots, N$: set of N hidden states
A          $A = \{\alpha_{ij} = P(q_j^{t+1} \mid q_i^t)\}$: transition probabilities
O          $O = \{o_k\}, k = 1, \cdots, M$: observations (symbols)
B          $B = \{b_{ik} = P(o_k \mid q_i)\}$: emission probabilities
Π          $\Pi = \{\pi_i = P(q_i^1) \mid \forall\, 1 \le i \le |Q|\}$: initial state probabilities
O_t        $O_t \in O$: observation at time t
Q_t        $Q_t \in Q$: state at time t
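To make the triple λ = (A, B, Π) concrete, the following sketch stores the three parameter sets in a small Python container and checks that each is a valid probability distribution. The class name `HMM`, the `validate` method and the example numbers are hypothetical and serve only to illustrate the notation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HMM:
    """Compact triple lambda = (A, B, Pi) for N states and M symbols."""

    A: List[List[float]]   # A[i][j]: transition probability from q_i to q_j
    B: List[List[float]]   # B[i][k]: probability that state q_i emits o_k
    pi: List[float]        # pi[i]: probability of state q_i at time 1

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError unless the shapes agree and every row sums to 1."""
        n = len(self.pi)
        if n == 0:
            raise ValueError("the model needs at least one state")
        if len(self.A) != n or any(len(row) != n for row in self.A):
            raise ValueError("A must be an N x N matrix")
        if len(self.B) != n or len(self.B[0]) == 0:
            raise ValueError("B must be an N x M matrix with M >= 1")
        if any(len(row) != len(self.B[0]) for row in self.B):
            raise ValueError("all rows of B must have the same length M")
        for name, rows in (("pi", [self.pi]), ("A", self.A), ("B", self.B)):
            for row in rows:
                if any(p < 0.0 for p in row) or abs(sum(row) - 1.0) > tol:
                    raise ValueError(f"each row of {name} must be a distribution")


# Hypothetical two-state model: state 0 = normal, state 1 = malicious.
model = HMM(
    A=[[0.9, 0.1], [0.2, 0.8]],
    B=[[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    pi=[0.8, 0.2],
)
model.validate()
```

Each row of A and B is a distribution conditioned on the current state, which is why the rows, rather than the columns, must sum to one.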

Figure 3.1.4 shows the HMM for Android app behaviour modeling. The HMM consists of three states: Start, Normal (0) and Malicious (1). The set of observations is defined using the apps' behaviours at runtime (see Section 3.1.4.7). The HMM parameters are: the initial state probabilities $\Pi = [\pi_1, \pi_2]$, the state transition probabilities $A = [\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}]$, the malicious state emission probabilities $B_M = [b_{11}, b_{12}, \cdots, b_{1N}]$ and the normal state emission probabilities $B_N = [b_{01}, b_{02}, \cdots, b_{0N}]$.

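[Diagram: a Start state with outgoing arrows labelled π1 and π2 towards the states 1 (M) and 0 (N); transitions α00, α01, α10 and α11 between the two states; state 1 emits the observations O1, O2, O3, ..., ON with probabilities b11, b12, b13, ..., b1N and state 0 with probabilities b01, b02, b03, ..., b0N.]

Fig. 3.1.4.: An overview of the proposed HMM model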

Two major challenges are how to determine the unknown parameters and how to find the optimal state sequence of the HMM given a sequence of observations. In the next sections we describe these two problems and their solutions.
