defined as the set of devices $d$ that are encountered during time point $t$, i. e.,

$$D_t = \{\, d \mid \exists E_d \in \mathcal{E}_d \,\bigl(E_d = (b_0, b_1, \ldots, b_n) \wedge t(b_0) \le t \le t(b_n)\bigr) \,\} \qquad (2.9)$$
Example. For our example user, familiar devices typically would be the mobile phones of his family members or colleagues, as it is likely that these devices will be encountered more often than $f_{minenc} = 5$ times and the total duration of these encounters is likely to exceed $t_{minenc} = 30$ min.
2.3.3 Context Features
Based on the context model we define the following features, shown in Tab. 2.1, to be used by the machine learning model for context classification. The features are calculated by Profiler and labelled based on ground truth provided by user feedback. The feature values are forwarded to Classifier, which uses them to train a machine learning-based classifier for predicting security-relevant properties of the context as described in Sect. 2.4.3.
Features f1 to f4 model the location context of the user, measuring both the total visit time as well as the number of visits the user has paid to the CoIs of the current location context (f1 and f2 for GPS-based and f3 and f4 for WiFi-based CoIs), thus providing a measure for the familiarity of the place the user is currently located in. Features f5 to f8, on the other hand, aim at providing a characterisation of the social context of the user. They measure the number of Bluetooth devices observed in the context and how many out of those are familiar according to Def. 12. Features f7 and f8 measure the average encounter time as well as the average number of encounters the user has had with familiar devices in the current context. This aims at modelling how familiar the persons in the context are to the user.
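The social-context features f5 to f8 can be computed from recorded Bluetooth encounters as in the following added example, which is not code from the ConXsense implementation. It assumes each encounter is reduced to its first and last observation time, and that a device is familiar when it exceeds both thresholds from the example above (Def. 12 itself is not reproduced on this page); device names and data are constructed.

```python
from statistics import mean

F_MIN_ENC = 5         # f_minenc from the page's example
T_MIN_ENC = 30 * 60   # t_minenc = 30 min, in seconds

# Constructed sample data: device -> list of encounters, each reduced to
# (t(b0), t(bn)) in seconds. Device names are hypothetical.
ENCOUNTERS = {
    "family-phone": [(i * 1000, i * 1000 + 600) for i in range(5)] + [(9800, 10400)],
    "colleague-phone": [(i * 1200, i * 1200 + 900) for i in range(7)] + [(9500, 10400)],
    "old-friend": [(i * 1000, i * 1000 + 600) for i in range(6)],
    "stranger": [(9900, 10100)],
}


def device_context(encounters, t):
    """D_t of Eq. (2.9): devices with an encounter whose time span contains t."""
    return {d for d, encs in encounters.items()
            if any(start <= t <= end for start, end in encs)}


def is_familiar(encs):
    """Assumed familiarity test: more than F_MIN_ENC encounters and a total
    encounter duration above T_MIN_ENC (Def. 12 itself is not on this page)."""
    return len(encs) > F_MIN_ENC and sum(e - s for s, e in encs) > T_MIN_ENC


def social_features(encounters, t):
    """Return (f5, f6, f7, f8) for time t; f7 and f8 are 0 when no familiar
    device is present (a choice made for this example)."""
    d_t = device_context(encounters, t)
    familiar = [encounters[d] for d in d_t if is_familiar(encounters[d])]
    f5, f6 = len(d_t), len(familiar)
    f7 = mean(sum(e - s for s, e in encs) for encs in familiar) if familiar else 0
    f8 = mean(len(encs) for encs in familiar) if familiar else 0
    return f5, f6, f7, f8
```
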
2.4 Evaluation

2.4.1 Data Collection
To evaluate the ConXsense framework on real empirical data capturing the contexts and perceptions of real smartphone users, we implemented a Context Collector app for Android that was installed on the smartphones of test users. The Context Collector app recorded the position and context data once every 60 seconds, which was a reasonable trade-off between sufficient granularity of context information and battery lifetime of the used smartphones. Our target was to achieve a battery lifetime of at least 12 hours (a full working day). The collected context data included the GPS position of the device, nearby Bluetooth devices and WiFi access points, acceleration sensor readings, as well as information about user actions and her interactions with apps on the smartphone.
The Context Collector app also provided a GUI, shown in Fig. 2.2, for collecting ground truth information about the user’s perceived risk of device misuse and privacy exposure. The users were asked to use the Context Collector GUI to regularly label the current context they were in as “safe”, i. e., having a low risk of device misuse, or “unsafe”, i. e., having a high risk of device misuse. To capture the aspect of privacy exposure, users were also asked to label the current context with one of the labels “home”, “work” or “public” to capture whether the privacy exposure of the context was due to private (home) or confidential (work) information in the context, or whether the context was not considered to expose privacy-sensitive information (public). The meanings of these labels and how to apply them were given to the test participants beforehand in order to avoid misunderstandings. The users were encouraged to provide feedback regularly, especially when entering or leaving contexts. To facilitate a more convenient user interaction for providing feedback, we also provided users with Near-Field Communication (NFC)-based context labels labelled ’Private’, ’Work’ and ’Public’ that users could attach to objects in contexts they frequently visited. By touching the NFC tag with the user’s NFC-enabled smartphone, the GUI of the Context Collector app was automatically brought to the foreground and the corresponding feedback about the privacy sensitivity level recorded. A prompt as shown in Fig. 2.2b additionally requested the user to provide feedback about the level of risk of device misuse for the current context.

Table 2.1: Context features derived based on the context model

Feature   Description
f1        max total visit time of any GPS-based CoI in $L_t$
f2        number of visits to GPS-based CoI in $L_t$ with max total visit time
f3        max total visit time of any WiFi-based CoI in $L_t$
f4        number of visits to WiFi-based CoI in $L_t$ with max total visit time
f5        number of Bluetooth devices in $D_t$
f6        number of familiar Bluetooth devices in $D_t$
f7        average total encounter time of familiar devices in $D_t$
f8        average number of total encounters with familiar devices in $D_t$

$L_t$: Location context (Def. 10) at time $t$
$D_t$: Device context (Def. 13) at time $t$
2.4 evaluation 24
(a) GUI for providing explicit user feedback using feedback buttons.
(b) Feedback response dialog after using a ’Private’ NFC tag for providing user feedback. The user is asked to provide additional information about the level of risk of device misuse.
Figure 2.2: GUI of the Context Collector app used for collecting ground truth information about the risk of device misuse and privacy sensitivity of the context. ’Safe’ indicates a context with low risk of device misuse, whereas ’Unsafe’ indicates high risk. ’Private’ and ’Work’ indicate contexts with high privacy sensitivity, and ’Public’ indicates a context not considered sensitive from a pri-
2.4.2 Dataset
We collected context and ground truth data from a set of 15 users coming from technical and non-technical backgrounds. Users provided data over a period of 68 days, on 56 days per user on average, resulting in a dataset containing data from 844 distinct user days. On average users provided ground truth feedback labels for contexts on 46 days, resulting in a ground truth dataset containing 3757 labelled data points. Each user provided at least 50 feedback labels. In a deployment setting, this would roughly correspond to 2-3 feedback labels given per day over a period of three weeks, which seems like a manageable burden for users. After this initial training period the need for further user feedback would significantly diminish, and further user interaction could be limited to giving occasional corrective feedback in cases where Classifier would provide incorrect predictions about the security and privacy properties of particular contexts.

2.4.3 Context Classification
To implement the Classifier component for context classification we used the Weka data mining toolkit [50], and its k-Nearest-Neighbours (kNN), Naïve Bayes and Random Forest classifier implementations. The kNN classifier is an instance-based learning algorithm that bases its prediction on comparing the testing data point to the k nearest labelled data points in the training dataset. The predicted label is the most frequently occurring class label in this set. The Naïve Bayes classifier is a simple probabilistic model that tries to estimate the most likely class label given a set of input features. Naïve Bayes classifiers have been successfully applied, e. g., in e-mail spam detection, where the task is to distinguish spam e-mail from benign mails [119]. The Random Forest classifier [16] is an ensemble learning method based on training a number of different decision trees by randomly sampling the training data and using voting to determine the most common prediction result of the decision trees as the output of the classifier.
For each test participant we trained two binary Classifier instances using the user’s labelled context data vectors as input. One was used for predicting the risk of device misuse in the current context (i. e., is the context “safe” or “unsafe”) and the other one for predicting the sensitivity of the context (i. e., is the current context “sensitive” or “public”). As we assume that the mobile device by default applies the most restrictive enforcement mechanisms to protect user data and privacy by using a very short time-out for the device lock and blocking third-party apps’ access to environmental sensors of the device, the task of the Classifier component therefore is to identify such contexts and situations in which these restrictive enforcement measures can be relaxed without increasing the risk of device misuse or privacy compromise too much in order to increase the usability of the device.

Even though it would have been preferable to evaluate the classification results by directly obtaining user feedback on concrete classification decisions in real-time, we had to resort to off-line evaluation, as we wanted to have the opportunity to experiment with different machine-learning algorithms for context classification, and staging the experiment separately for each of the algorithms was not possible due to resource constraints.

[Plot: TP rate against FP rate, both from 0 to 1, with curves for Random Forest, kNN and Naive Bayes]

Figure 2.3: Average ROC curves showing performance of classifying contexts with a low risk of device misuse, considering those users who provided at least five ground truth feedbacks per context class.
2.4.3.1 Device Misuse Protection
The average performance of the classifiers in identifying contexts with a low risk of device misuse is shown in Fig. 2.3. It shows the true positive rate (TPR) and false positive rate (FPR) for different confidence values required for a measurement to be classified as “safe” by the classifier. The TPR denotes here the fraction of measurements in contexts labelled as “safe” that are correctly classified as “safe”, whereas FPR denotes the fraction of measurements in contexts labelled as “unsafe” that are falsely classified as “safe”.
All three classifiers perform reasonably well in identifying “safe” contexts. The classifiers achieve a TPR of 70 % at a moderate FPR of 10 %. This would mean that our approach could be used to reduce about 70 % of unnecessary authentication prompts in contexts with low probability of device misuse, whereas only once in ten cases the device locking mechanism would be relaxed while the device is in a context with a high probability of device misuse. This would mean that, e. g., a potential thief would have a success probability of merely 10 % in finding a device in an unlocked state, in a context with high device misuse probability. This is significantly better than the current situation in which many people choose to use no device lock at all, due to the usability penalties imposed by the continuous burden of having to enter a PIN code or password each time the user wants to use the device. The TPR of 70 % also clearly outperforms recent proposals such as the one by Riva et al. [115] utilising progressive authentication. They report only a reduction of 42 % in the amount of unnecessary authentication prompts presented to the user when applying their approach.

[Plot: TP rate against FP rate, both from 0 to 1, with curves for Random Forest, kNN and Naive Bayes]

Figure 2.4: ROC curves of classifier performance in identifying public contexts in which sensory malware protection can be relaxed.
2.4.3.2 Protection Against Sensory Malware
In our scenario we assume that access to the ambient context sensors of the mobile device can be granted to third-party applications in contexts that do not contain sensitive information about the user. In our test setting, such contexts were labelled as “public”, whereas in contexts labelled as “sensitive” the access should be denied due to the potential danger of sensory malware leaking sensitive information contained in the context to unauthorised external parties. The task of the classifiers in this scenario is therefore to distinguish between these two types of contexts. We therefore measure the TPR as the fraction of measurements labelled as belonging to the “public” class that are also classified as such. Accordingly, FPR in this setting denotes the fraction of measurements performed in “sensitive” contexts (like home or work) that are incorrectly classified as “public”. The performance of the classifiers in dependence of the confidence level required to label a context as “private” is shown in Fig. 2.4.
In this use case, there is a clear difference between the performance of the classifiers. The Random Forest and k-Nearest-Neighbours classifiers achieve a TPR of approximately 70 % with a very low simultaneous FPR of 2 % to 3.5 %. This means that access to context sensors would be (erroneously) relaxed in less than 3.5 % of the cases when the device is situated in a sensitive context. This would consequently severely limit the ability of a sensory malware to exfiltrate potentially sensitive information about the user from contexts with high privacy exposure.
The TPR of 70 % implies that in 70 % of the cases when the mobile device is located in a public place with low privacy exposure the system would automatically relax the privacy protections and permit access for third-party apps to sensory information, thereby improving the user experience and utility of such applications. The remaining 30 % of cases, in which the privacy protections are not automatically relaxed by the system, can be handled by offering the user a manual override functionality for temporarily granting access to apps for sensory context information. A significant number of use cases in which contextual information is used by third-party mobile apps involves the active involvement of the user, e. g., when a navigation app is used to find directions to a particular location. Override mechanisms for granting access to sensory data are therefore relatively straightforward to integrate in the application usage flow, as the attention of the user is already focused on the interaction with the particular application. In addition, such override events can effectively be used as additional ground truth feedback when re-training the classification model for improving classification accuracy.
2.5 Enforcement
Enforcement of access control decisions made by the ConXsense framework is realised by the FlaskDroid architecture [18], as described in detail in [90]. FlaskDroid is a fine-grained mandatory access control framework for the Android mobile operating system. FlaskDroid works by instrumenting components that provide access to sensitive resources like the SensorService, which controls access to on-board sensors, as User Space Object Managers (USOMs). USOMs grant or deny requests to the resources they protect based on pre-defined access control rules. However, FlaskDroid also supports conditional access control rules with the help of ContextProviders that evaluate the current context at runtime and enable or disable rules according to context-specific conditions.
ConXsense is integrated with FlaskDroid with a ConXsense-specific ContextProvider that uses the classification result and associated confidence values provided by the Classifier component of ConXsense (cf. Fig. 2.1) to activate or deactivate specific access control rules. Such rules can be used, e. g., to either grant or deny third-party applications access to environmental sensors in order to protect against the threat posed by sensory malware.
The activation and deactivation of access control rules can be fine-tuned by specifying rule-specific thresholds for the confidence of context classification provided by Classifier. The thresholds can be selected, e. g., by specifying an upper acceptable bound for the false positive rate of the system (cf. Figs. 2.3 and 2.4) and using the corresponding classification confidence value achieving this FPR performance as the confidence threshold. It is to be noted that thresholds can be sensor- or resource-specific. This means that for sensors providing particularly sensitive information like, e. g., the GPS sensor that provides information about the detailed position of the user, a higher threshold on the classification confidence could be applied in order to protect the privacy of the user, whereas for sensors like a magnetometer that provides less sensitive information about the user, a lower confidence level would be sufficient.
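The threshold selection described above can be sketched as follows. This is an added example, not FlaskDroid or ConXsense code: the confidence scores, labels, per-sensor FPR bounds and all function names are constructed for illustration, and a rule stays disabled when no threshold meets its bound.

```python
# Constructed validation data: classifier confidence that a context is
# "public", paired with the ground-truth label (True = "public").
SCORES = [0.95, 0.92, 0.90, 0.85, 0.80, 0.75, 0.70, 0.55, 0.40, 0.30,
          0.88, 0.65, 0.60, 0.50, 0.45, 0.35, 0.25, 0.20, 0.15, 0.10]
LABELS = [True] * 10 + [False] * 10

# Hypothetical upper FPR bounds: stricter for the more sensitive sensor.
FPR_BOUNDS = {"gps": 0.10, "magnetometer": 0.30}


def roc_points(scores, labels):
    """Return (threshold, TPR, FPR) for every distinct confidence value.
    A context is classified "public" when its score >= threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and not y)
        points.append((thr, tp / pos, fp / neg))
    return points


def threshold_for_fpr(scores, labels, max_fpr):
    """Lowest threshold whose FPR stays within max_fpr, which gives the
    highest TPR under that bound. None means no threshold qualifies."""
    ok = [p for p in roc_points(scores, labels) if p[2] <= max_fpr]
    return min(ok, key=lambda p: p[0]) if ok else None


def sensor_thresholds(scores, labels, bounds):
    """Map each sensor to its (threshold, TPR, FPR) operating point."""
    return {s: threshold_for_fpr(scores, labels, b) for s, b in bounds.items()}


def access_allowed(confidence, chosen):
    """Relax the rule only at or above the chosen threshold; with no
    qualifying threshold the restrictive default stays in force."""
    return chosen is not None and confidence >= chosen[0]
```
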
2.6 Related Work
2.6.1 Use of Contextual Data for User Profiling
Context sensing has been used in numerous tasks for profiling users and their relevant life circumstances. For example, Madan et al. [78] investigated the use of context measurements sensed with mobile devices for predicting the health status of individuals carrying mobile devices. Factors incorporated in their model included nearby Bluetooth devices, WiFi access points as well as traces about users’ communication behaviour. Eagle and Pentland [33] pioneered a study using Bluetooth sensing as a means for establishing social proximity for the purpose of allowing users to discover or be introduced to other users sharing common interests or similar properties in their user profiles. In a similar way, ConXsense uses Bluetooth for proximity sensing of the social situation, with the target of distinguishing situations in which only well-known persons are present from presence in public places in which often a number of unknown users are present.
2.6.2 Context-Aware Access Control
Context awareness has been included in a number of access control models. Covington et al. [26] introduced CASA (Context-Aware Security Architecture), which models security-relevant context or state of the ambient environment with the help of environment roles, which is a component of the Generalized Role-Based Access Control (GRBAC) model [98], an extension to the traditional Role-Based Access Control (RBAC) models [120]. Another extension of RBAC by Damiani et al. [27], entitled GEO-RBAC, uses spatial roles to model location as a factor for making access control decisions. In contrast to ConXsense, however, all of these approaches require the user to specify the details of used policies including contextual parameters defining the environmental roles or locations comprising the spatial roles used for enforcement.
Hull et al. [56] address the problem of policy specification in their Houdini framework, in which they propose to use policy templates to ease the burden of policy definition for regular non-expert users. However, even in their system the final decision about concrete policy settings remains with the user. In a similar effort to improve the accuracy of policy sets defined by end-users, Sadeh et al. [118] developed an approach in which the process of specifying user policies is facilit-