Obtaining CopperDroid Behaviours - Analysis and Classification of Android Malware

In this section we provide an overview of our approach for the multi-class classification of Android malware. We first summarize the behaviour reconstruction aspects of the CopperDroid platform and present our approach for classifying malware into classes using support vector machines (SVMs). We then discuss a strategy for improving SVM-based decisions using conformal prediction (CP). Figure 4.3, discussed in detail below, gives an overview of our classifier (later named DroidScribe [54]) in relation to CopperDroid. Generating and processing behaviours was solely the author's work. The standard SVM was applied by a collaborator, and the novel hybrid component was a joint effort: the collaborator calculated p-values from the SVM results, while the author applied selective CP, computed the new results, and analysed the improvements.

4.4.1 System Architecture

The first stage of our methodology, as seen in Figure 4.3, is data acquisition. By submitting the samples in the Malware Genome Project dataset to CopperDroid, we were able to evaluate our machine learning methods on over 1,200 malware samples from 49 malware families in 2015. Our results on a larger, more recent dataset were accepted at MoST 2016 [54]. As described in previous chapters, CopperDroid reconstructs traditional OS behaviours (e.g., process creation, file creation) and Android-specific behaviours (e.g., SMS send, IMEI access, Intent communications) from detailed system call traces. This includes the complex, in-depth reconstruction of IPC Binder transactions, which are normally carried out via ioctl system calls. CopperDroid also identifies sequences of related system calls to derive single, high-level behaviours such as network access.

[Figure 4.3: Overview of our classifier in relation to CopperDroid. Stages shown: Dynamic Analysis (Dataset, CopperDroid, Behaviour Profiles: OS-Specific and Android-Specific); Profile Preprocessing; Define Feature Sets; Feature Extraction; Normalize Feature Vector; Combine Feature Vectors (Sample 1 ... Sample n); Training and Test; SVM and SVM+CP; Compare Results.]

CopperDroid gave the author considerable flexibility in choosing features for classification at multiple levels of abstraction. As we possess both system call traces and behaviour profiles for every sample, we were able to experiment with feature sets at multiple levels of granularity (Section 4.6), from bare-bones system calls to high-level actions such as sendText. This has been key in demonstrating the advantages of the author's behaviour reconstruction over traditional system call traces.

4.4.2 Modes and Thresholds

As discussed in Section 3.6.1, malware samples do not always exhibit behaviours when running in the CopperDroid emulator. This can be due to incompatibility (e.g., the wrong API level), the wrong stimuli, or malware evasion (see Section 2.3). In these cases, CopperDroid occasionally outputs behavioural profiles containing few or no behaviours. We therefore give the option to filter out such samples at the start of our analyses, through an optional threshold on the number of behaviours per sample. In Section 4.6 we experiment with this threshold and discuss the trade-offs.
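The optional behaviour threshold amounts to a simple filter over the behaviour profiles. The sketch below assumes, hypothetically, that each profile has already been reduced to a mapping from sample ID to its list of reconstructed behaviour names; the function name and data are illustrative only.

```python
from typing import Dict, List

# Hypothetical representation: sample ID -> names of the behaviours
# CopperDroid reconstructed for that sample.
Profiles = Dict[str, List[str]]

def filter_by_behaviour_count(profiles: Profiles, min_behaviours: int) -> Profiles:
    """Keep only samples whose profile holds at least `min_behaviours` behaviours.

    A threshold of 0 disables the filter, so every sample is kept.
    """
    return {
        sample: behaviours
        for sample, behaviours in profiles.items()
        if len(behaviours) >= min_behaviours
    }

profiles = {
    "a.apk": ["network_access", "file_access", "sendText"],
    "b.apk": [],                      # sparse profile: nothing observed
    "c.apk": ["file_access"],
}
print(sorted(filter_by_behaviour_count(profiles, 1)))  # ['a.apk', 'c.apk']
```

Raising the threshold removes more sparse profiles but also shrinks the dataset, which is the trade-off examined in Section 4.6.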

Our second level of optional filtering is the number of samples per family. As this is classification, as opposed to clustering (see Section 2.2.6), all samples are already correctly labelled. Hence, we know before analysis how many samples exist per malware family, or class label. Filtering out families with very few samples can then improve accuracy. Histograms of the number of samples per malware family class can be seen in Figure 4.4: subfigure (a) covers all our samples, while (b) applies the cutoff of twenty samples per family used in several of our experiments.
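Because the family labels are known in advance, the per-family cutoff can be applied by counting labels first and then discarding members of small families. This is a minimal sketch assuming samples are given as hypothetical `(sample_id, family)` pairs; the default of 20 mirrors the cutoff mentioned above.

```python
from collections import Counter
from typing import List, Tuple

def filter_small_families(
    labelled: List[Tuple[str, str]], min_samples: int = 20
) -> List[Tuple[str, str]]:
    """Drop every sample whose family has fewer than `min_samples` members."""
    counts = Counter(family for _, family in labelled)
    return [(sample, family) for sample, family in labelled
            if counts[family] >= min_samples]

# Toy data with hypothetical family names.
data = [("s1", "FamilyA"), ("s2", "FamilyA"), ("s3", "FamilyB")]
print(filter_small_families(data, min_samples=2))
# [('s1', 'FamilyA'), ('s2', 'FamilyA')]
```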

Figure 4.4 (a): Histogram of malware family classes and the number of samples each has (0-300 samples).

Figure 4.4 (b): Histogram of malware family classes with 20 or more samples (20-300 samples).

Although applying these thresholds reduces the training set (from 49 to 14 families), our hybrid solution can still provide accurate classification. As discussed previously, keeping these sparse profiles (i.e., outliers) would cause issues for traditional classifiers such as SVMs, which are forced to choose a single class in every case. However, by combining SVM with conformal prediction to predict the class from a set of top-matching classes, we can still noticeably improve on traditional SVM accuracy without enforcing family-size or behaviour-count thresholds.

Once we have selected a training set (identical to the full dataset if no filtering is applied), we can begin analysis. As mentioned previously, since we have both behavioural profiles and the matching system call traces, we can perform classification with different feature sets. These are discussed further in Section 4.5; in essence, the available modes are system call level analysis (binary or frequency of each call) and behaviour reconstruction, with or without arguments and with or without system call frequencies included.
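To make the idea of predicting "a set of top-matching classes" concrete, the sketch below shows generic label-conditional inductive conformal prediction. It is not the selective CP procedure used in this work. It assumes, hypothetically, that a nonconformity score is available per candidate class (for example, the negated SVM decision value for that class) and that each class has a held-out calibration set of such scores; all names and numbers are illustrative.

```python
from typing import Dict, List

def conformal_p_value(calibration_scores: List[float], test_score: float) -> float:
    """Share of calibration nonconformity scores at least as large as the
    test score, with the +1 correction used in inductive conformal prediction."""
    n_geq = sum(1 for a in calibration_scores if a >= test_score)
    return (n_geq + 1) / (len(calibration_scores) + 1)

def prediction_set(
    test_scores: Dict[str, float],        # test nonconformity under each candidate label
    calibration: Dict[str, List[float]],  # calibration nonconformity scores per class
    epsilon: float,                       # significance level
) -> List[str]:
    """Return every label whose p-value exceeds epsilon."""
    p_values = {
        label: conformal_p_value(calibration[label], score)
        for label, score in test_scores.items()
    }
    return sorted(label for label, p in p_values.items() if p > epsilon)

# Hypothetical scores (lower = more typical of the class).
calibration = {"FamilyA": [-2.0, -1.5, -1.0, -0.5],
               "FamilyB": [-1.8, -1.2, -0.9, -0.3]}
test_scores = {"FamilyA": -1.2, "FamilyB": 0.5}

print(prediction_set(test_scores, calibration, epsilon=0.25))  # ['FamilyA']
print(prediction_set(test_scores, calibration, epsilon=0.1))   # ['FamilyA', 'FamilyB']
```

Unlike a plain SVM, which must output exactly one label, the set can hold several candidate families when the evidence is weak, and shrinks as the significance level is relaxed.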

4.4.3 Parsing CopperDroid JSONs and Meta data

For each analysed sample, CopperDroid outputs a JSON (JavaScript Object Notation) file storing behaviours, a directory of recreated resources, and a file of system call data. To reduce the size of the system call trace, which is often 200 MB, this third file holds only the frequency of each system call name in the trace. While this excludes parameter values, it is sufficient for our two system call modes of analysis (i.e., binary or frequency).
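The two system call modes can both be derived from the stored frequencies. The following sketch assumes the frequency file has been loaded into a hypothetical `{call_name: count}` dictionary and that a fixed vocabulary of call names defines the vector positions; normalising counts to relative frequencies is an illustrative choice, not necessarily the exact normalisation used here.

```python
from typing import Dict, List

def syscall_vector(freqs: Dict[str, int], vocabulary: List[str], mode: str) -> List[float]:
    """Build a feature vector over a fixed system call vocabulary.

    mode == "binary":    1.0 if the call occurs at all in the trace, else 0.0
    mode == "frequency": relative frequency of each call within the vocabulary
    """
    if mode == "binary":
        return [1.0 if freqs.get(call, 0) > 0 else 0.0 for call in vocabulary]
    if mode == "frequency":
        total = sum(freqs.get(call, 0) for call in vocabulary) or 1  # avoid division by zero
        return [freqs.get(call, 0) / total for call in vocabulary]
    raise ValueError(f"unknown mode: {mode}")

vocab = ["open", "read", "connect", "ioctl"]
freqs = {"open": 3, "read": 1}
print(syscall_vector(freqs, vocab, "binary"))     # [1.0, 1.0, 0.0, 0.0]
print(syscall_vector(freqs, vocab, "frequency"))  # [0.75, 0.25, 0.0, 0.0]
```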

From each JSON file the framework can read all reconstructed high-level behaviours. Furthermore, each high-level behaviour retains its corresponding low-level events, i.e., system calls, parameters, and return values. The categories and subcategories of behaviour we extract from these JSON files are described below. Once the behaviours of all samples have been translated into feature vectors, some additional data are generated. First, network traffic size is summed across all network behaviours per sample. Second, directories holding reconstructed files are searched and analysed, for example to determine each file's type from its content instead of trusting its file extension.
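Both post-processing steps are easy to sketch. The JSON layout below (`"behaviours"`, `"type"`, `"bytes"`) is a hypothetical simplification, not CopperDroid's actual schema. The file-type check inspects well-known magic numbers (ELF, ZIP, which also covers APKs, DEX, PNG) rather than the extension.

```python
import json

# Well-known leading bytes; an APK is a ZIP archive, so it starts with "PK\x03\x04".
MAGIC = [
    (b"\x7fELF", "elf"),
    (b"PK\x03\x04", "zip/apk"),
    (b"dex\n", "dex"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]

def sniff_file_type(data: bytes) -> str:
    """Guess a reconstructed file's type from its content, ignoring its extension."""
    for magic, name in MAGIC:
        if data.startswith(magic):
            return name
    return "other"

def total_network_bytes(profile_json: str) -> int:
    """Sum traffic size over all network behaviours of one sample.

    HYPOTHETICAL schema: {"behaviours": [{"type": "network", "bytes": int}, ...]}
    """
    profile = json.loads(profile_json)
    return sum(b.get("bytes", 0) for b in profile.get("behaviours", [])
               if b.get("type") == "network")

example = json.dumps({"behaviours": [
    {"type": "network", "bytes": 512},
    {"type": "file", "path": "/sdcard/photo.jpg"},
    {"type": "network", "bytes": 1024},
]})
print(total_network_bytes(example))              # 1536
print(sniff_file_type(b"PK\x03\x04rest-of-apk"))  # zip/apk
```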

4.4.4 Behaviour Extraction

In our analyses, behaviour extraction occurs only in the rec * modes, which are four of our six analysis modes. Moreover, only two of those four rec * modes analyse the reconstructed parameter and return values of the system calls belonging to a behaviour.

Feature Set | Contained Details
S1 Network Access | IP address, port, network traffic size
S2 File Access | file type, file name, regular expression
S3 Binder Methods | method name and parameters
S4 Execute File | file type, user permissions, arguments

Table 4.1: Extracted CopperDroid behaviour classes and details for subcategorizing.

These reconstructed values include the arguments of methods invoked remotely via IPC Binder. The high-level behaviours, which we divide into behavioural feature sets, are network accesses, file accesses, binder methods, and file execution (see Figure 3.10 and Table 4.1). It is important to note that the features and sub-features were defined prior to classification, mitigating overfitting. Several of the behaviour sets detailed below represent a series of system calls: CopperDroid uses value-based data dependencies to group system calls based on file descriptors (see Section 3.5.1).
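As a rough illustration of grouping system calls by file descriptor, the sketch below treats a successful `open`, `openat`, or `socket` as the definition of a descriptor and later calls on that descriptor as its uses, closing the group on `close`. The `(name, fd, return_value)` event tuple is a hypothetical simplification; CopperDroid's actual def-use analysis (Section 3.5.1) is richer than this.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL simplified event: (syscall_name, fd_argument, return_value).
# For calls that create a descriptor, the fd argument is unused (-1).
Event = Tuple[str, int, int]

def group_by_fd(trace: List[Event]) -> List[List[str]]:
    """Group system calls that share a file descriptor into one behaviour."""
    open_groups: Dict[int, List[str]] = {}
    finished: List[List[str]] = []
    for name, fd, ret in trace:
        if name in ("open", "openat", "socket") and ret >= 0:
            open_groups[ret] = [name]            # definition of a new descriptor
        elif fd in open_groups:
            open_groups[fd].append(name)         # use of an existing descriptor
            if name == "close":
                finished.append(open_groups.pop(fd))
    finished.extend(open_groups.values())        # groups never explicitly closed
    return finished

trace = [("open", -1, 3), ("socket", -1, 4), ("read", 3, 100),
         ("connect", 4, 0), ("sendto", 4, 64), ("close", 3, 0)]
print(group_by_fd(trace))
# [['open', 'read', 'close'], ['socket', 'connect', 'sendto']]
```

Popping a group on `close` means a descriptor number reused later by the kernel starts a fresh behaviour rather than being merged with the old one.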

Although multiple system calls are condensed into single behaviours, all parameter and return values are retained. Using this data, we were able to break frequent behaviours (e.g., filesystem accesses, which accounted for 50% of all sample behaviours) into subcategories for a more fine-grained behaviour feature set (e.g., type of file created). For example, by examining the parameter values of execution system calls, we can separate silent installations of APKs from other file execution behaviours such as shell scripts.

While there are many additional ways to split behaviours into finer categories, we found experimentally (by measuring the incremental accuracy gain per category) that this feature set best captures the different behavioural patterns of malware families. These detailed behaviour-based feature sets, S1 to S4, were constructed from the CopperDroid JSON files. More in-depth feature statistics for the dataset can be found in Section 4.6.

S1 Network Access: Roughly 66% of our malware samples regularly made network connections to external entities. Each network access behaviour represents a sequence of system calls, normally beginning with connect followed by sendto calls. By analysing their parameters, we were able to add granularity to our feature set (see Section 4.6) by creating subcategories based on IP address and traffic size.
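One way to turn a network access into a subcategory feature is to combine a coarse property of the destination address with a traffic-size bucket. In this sketch both the private/public split and the byte thresholds are hypothetical choices for illustration; the feature key format is likewise invented.

```python
import ipaddress

def network_feature(ip: str, traffic_bytes: int,
                    size_edges: tuple = (1_024, 102_400)) -> str:
    """Map one network access behaviour to a subcategory label.

    HYPOTHETICAL scheme: private vs public destination plus a size bucket.
    """
    scope = "private" if ipaddress.ip_address(ip).is_private else "public"
    small, medium = size_edges
    if traffic_bytes < small:
        size = "small"
    elif traffic_bytes < medium:
        size = "medium"
    else:
        size = "large"
    return f"net:{scope}:{size}"

print(network_feature("10.0.0.2", 500))   # net:private:small
print(network_feature("8.8.8.8", 2048))   # net:public:medium
```

Counting how often each label occurs per sample then yields additional columns for the feature vector.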

S2 File Access: The second most popular behaviour in our dataset (see Table 4.2, page 107) is filesystem access. This behaviour is reconstructed from system calls using def-use chains of file names and file descriptors. As mentioned previously (see Section 3.5.1 for details), CopperDroid uses these chains of system calls to fully recreate any file that is actually created, so that it may be analysed, or even executed, depending on the file type. The author implemented a file extension analysis and a filename character-class mapping with three classes (all letters, all digits, and mixed), along the lines of previous work that modelled system call arguments [152].
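The extension and character-class features can be sketched as follows. Reading the three classes as letters only, digits only, or mixed, and stripping the extension before classifying, are assumptions made for this illustration; the function name is hypothetical.

```python
import os

def filename_features(path: str) -> tuple:
    """Return (extension, character_class) for a reconstructed file name."""
    stem, ext = os.path.splitext(os.path.basename(path))
    if stem.isalpha():
        char_class = "alpha"      # only letters
    elif stem.isdigit():
        char_class = "digit"      # only digits
    else:
        char_class = "mixed"      # anything else
    return ext.lower().lstrip("."), char_class

print(filename_features("/sdcard/update.apk"))         # ('apk', 'alpha')
print(filename_features("/data/local/tmp/12345.jar"))  # ('jar', 'digit')
```

Randomly generated names (for example, all-digit names for dropped payloads) then collapse into a single feature instead of producing one feature per distinct file name.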

S3 Binder Methods: CopperDroid effectively reconstructs binder communications from the ioctl system calls. Since binder communications are the principal means of inter-process/inter-component communication, they are the gateway to services from the Android system. Consequently, monitoring binder communications and identifying the invoked method is crucial to modelling the behaviour of a malware sample. When modelling all binder communications, we found that getDeviceID and getSubscriberID were the methods most frequently invoked by our malware dataset. For many of these "get" methods, we are less interested in analysing the parameters, as they should return predictable data values, but for methods such as SMS sendText the parameters (e.g., destination) tend to be more interesting.
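The distinction above, where arguments matter for some methods and not others, can be expressed as a feature-key function. Which methods keep their arguments, and the key format itself, are hypothetical choices in this sketch.

```python
from typing import Dict

# HYPOTHETICAL: methods whose arguments are folded into the feature key.
# For "get" methods such as getDeviceID only the method name is used.
ARG_METHODS = {"sendText"}

def binder_feature(method: str, args: Dict[str, str]) -> str:
    """Turn one reconstructed binder call into a feature key."""
    if method in ARG_METHODS:
        parts = ",".join(f"{k}={v}" for k, v in sorted(args.items()))
        return f"binder:{method}({parts})"
    return f"binder:{method}"

print(binder_feature("getDeviceID", {}))                     # binder:getDeviceID
print(binder_feature("sendText", {"destination": "1234"}))  # binder:sendText(destination=1234)
```

Sorting the argument names keeps the key stable regardless of the order in which arguments were recorded.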

S4 Execute: Various files may be executed within the Android system to run exploits, install apps silently, etc. CopperDroid reconstructs all such behaviours and we model them within our feature vector. To differentiate between file executions, we broke these behaviours down by analysing their parameters. For example, if the parameters include pm followed eventually by install and a file name, this indicates that an app is being installed silently without the user's permission. Furthermore, as there are multiple ways to execute the same file (i.e., the same app installation can be done with different arguments), being able to group them all as the same behaviour with the same outcome is advantageous and makes our method less susceptible to misdirection (see Figure 3.11).
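The `pm ... install <file>` rule can be sketched as an argument check that deliberately ignores flags and the program's path, so that different invocations of the same installation map to one label. The shell-script rule and the label names are hypothetical additions for illustration.

```python
import shlex

def classify_exec(command: str) -> str:
    """Label an execute behaviour by inspecting its arguments.

    `pm` followed eventually by `install` and a file name is treated as a
    silent installation; the shell-script rule is an assumption.
    """
    argv = shlex.split(command)
    if not argv:
        return "exec:empty"
    prog = argv[0].rsplit("/", 1)[-1]                  # "/system/bin/pm" -> "pm"
    if prog == "pm" and "install" in argv[1:]:
        rest = argv[argv.index("install") + 1:]
        if any(not a.startswith("-") for a in rest):   # a file name after any flags
            return "exec:silent_install"
    if prog == "sh" or argv[0].endswith(".sh"):
        return "exec:shell_script"
    return f"exec:{prog}"

# Two different invocations, one behaviour label.
print(classify_exec("pm install -r /sdcard/payload.apk"))           # exec:silent_install
print(classify_exec("/system/bin/pm install /data/local/tmp/a.apk"))  # exec:silent_install
```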

While we use these behaviour sets to classify malware, they could easily be applied to malware detection (i.e., binary classification). Furthermore, there are several additional behaviour features that we may use when implementing two-class rather than multi-class classification. Such behaviours would be equally common amongst all malware families, but exhibited only by malware. For instance, a user-level application directly altering network configuration files violates Android's discretionary access control and would be a strong indicator of malware, but not necessarily of a particular family, since similar behaviours are shared across families. This concept is explored further in Chapter 5.
