
Thesis Goals and Contributions

In this chapter we studied multiple Android malware analysis and detection frameworks and illustrated trends in the state-of-the-art systems. We also analysed the evolution of mobile malware as it adapts to obstruct analysis and avoid detection. By analysing both threats and solutions, we have identified several areas that require further research and development. The author’s contributions in the following chapters aim to meet the following goals, which were shaped by the discoveries in this chapter.

Through our analysis, and by laying out all these Android studies in the extensive tables in Appendix A, we saw the need to develop more effective methods for low-level dynamic analysis to counter the high number of studies unable to analyse native code, dynamically loaded code, etc. From there, we also saw further opportunities to develop novel malware classifiers using our unique behavioural profiles.

In general, the research goals the author set out to fulfil, based on the evidence procured from surveying the current body of work, are as follows. Each goal refers to a research gap identified in the previous sections, and will be used to evaluate the success and novelty of the author’s contributions in the following chapters.

Goal 1 Analyse network traffic, native code, encryption, etc., for code coverage as discussed in Sections 2.1.4 and 2.5.2. This goal requires some level of dynamic analysis.

Goal 2 Gain rich and thorough behaviour profiles without modifying the Android VM, OS, or applications, as discussed in Sections 2.1.4, 2.2.2.2, and 2.4.2. This adds robustness against changes in the Android OS and could be adapted to other platforms.

Goal 3 Scale computations, e.g. analysis and classification, to large malware datasets, as discussed in Section 2.2.6.

Goal 4 Overcome as many malware anti-analysis techniques as possible, e.g. the obfuscation techniques described in Section 2.3, to enable the accurate analysis of sophisticated malware, such as that described in Section 2.1.1.

Automatic Reconstruction of Android Behaviours

Contents

3.1 Introduction
3.2 Relevant Background Information
    3.2.1 Android Applications
    3.2.2 Inter-Process Communications and Remote Procedure Calls
    3.2.3 Android Interface Definition Language
    3.2.4 Native Interface
3.3 Overview of CopperDroid
    3.3.1 Independent of Runtime Changes
    3.3.2 Tracking System Call Invocations
3.4 Automatic IPC Unmarshalling
    3.4.1 AIDL Parser
    3.4.2 Concept for Reconstructing IPC Behaviours
    3.4.3 Unmarshalling Oracle
    3.4.4 Recursive Object Exploration
    3.4.5 An Example of Reconstructing IPC SMS
3.5 Observed Behaviours
    3.5.1 Value-Based Data Flow Analysis
    3.5.2 App Stimulation
3.6 Evaluation
    3.6.1 Effectiveness
    3.6.2 Performance
3.7 Limitations and Threat to Validity
3.8 Related Work
3.9 Summary

3.1 Introduction

As illustrated in previous chapters, the popularity of Android has unavoidably attracted cybercriminals and increased malware in app markets at an alarming rate. To better understand this slew of threats, the author augmented a base CopperDroid framework, an automatic VMI-based dynamic analysis system, to reconstruct Android malware behaviours. The novelty of the author’s work lies in its runtime-agnostic approach to the reconstruction of interesting behaviours at different levels, by observing and dissecting system calls. The on-going CopperDroid project is therefore resistant to the multitude of alterations, or replacements (i.e., ART), the Android runtime is subjected to over its life-cycle. Moreover, CopperDroid can adapt to changes in the system call table.

The improved CopperDroid automatically and accurately reconstructs events of interest that describe not only well-known process-OS interactions (e.g., file and process creation), but also complex intra- and inter-process communications (e.g., send SMS), whose semantics are typically contextualized through complex Android objects. Because of this, CopperDroid can capture actions initiated from both Java and native code execution, unlike many related static and dynamic works. Thus the improved CopperDroid’s analysis generates detailed behavioural profiles that abstract a large stream of low-level (often uninteresting) events into concise, high-level semantics, which are well suited to providing insightful behavioural traits and open possibilities for further research directions. In the following chapter, Chapter 4, we test the usefulness of these profiles by utilizing them to scalably classify malware into known families.

Unfortunately, the nature of Android makes it difficult to rely on standard, traditional, dynamic system call malware analysis systems as is. While Android apps are generally written in the Java programming language and executed on top of the Dalvik virtual machine [35], native code execution is possible via the Java Native Interface. This mixed execution model has persuaded other researchers (see Appendix A) to reconstruct, and keep in sync, different semantics through virtual machine introspection (VMI) [85] for both the OS and Dalvik views [233]. Zhang et al. further stressed this concept by claiming that traditional system call analysis is ill-suited to characterize the behaviours of Android apps as it misses high-level Android-specific semantics and fails to reconstruct inter-process communications (IPC) and remote procedure call (RPC) interactions, which are essential to understanding Android app behaviours [241].

In a significantly different line of reasoning from [75, 241], we observed that system call invocations remain central to both low-level OS-specific and high-level Android-specific behaviours. However, as mentioned previously, a traditional or simplistic analysis of system calls would lack the rich semantics of Android-specific behaviours.
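To make this gap concrete, the following minimal sketch shows how an IPC payload that looks like an opaque byte buffer at the system call level can be turned into a named behaviour once its layout is known. It is an illustration only and not the author’s implementation: the payload layout (an interface name followed by two string arguments, each encoded as a length-prefixed, 4-byte-aligned UTF-16 string) is a deliberately simplified assumption, real Binder transactions carry additional header fields and typed objects, and all function names and sample values are hypothetical.

```python
import struct
from typing import Dict, Tuple


def write_string16(text: str) -> bytes:
    """Marshal text as: int32 length in UTF-16 code units, UTF-16LE data,
    a 2-byte terminator, then zero padding up to a 4-byte boundary."""
    encoded = text.encode("utf-16-le")
    body = struct.pack("<i", len(encoded) // 2) + encoded + b"\x00\x00"
    return body + b"\x00" * (-len(body) % 4)


def read_string16(buf: bytes, offset: int) -> Tuple[str, int]:
    """Unmarshal one string and return it with the offset of the next field."""
    (length,) = struct.unpack_from("<i", buf, offset)
    start = offset + 4
    end = start + length * 2
    text = buf[start:end].decode("utf-16-le")
    consumed = 4 + length * 2 + 2   # length field + data + terminator
    consumed += -consumed % 4       # skip alignment padding
    return text, offset + consumed


def unmarshal_send_text(payload: bytes) -> Dict[str, str]:
    """Turn an opaque payload into a named, high-level behaviour record.
    Assumed (simplified) layout: interface name, destination, message."""
    interface, offset = read_string16(payload, 0)
    destination, offset = read_string16(payload, offset)
    message, offset = read_string16(payload, offset)
    return {"interface": interface, "destination": destination, "message": message}


# What a plain system call tracer would record: an opaque byte buffer.
# The interface name and argument values below are illustrative only.
raw_payload = (
    write_string16("com.android.internal.telephony.ISms")
    + write_string16("12345")
    + write_string16("hi")
)
behaviour = unmarshal_send_text(raw_payload)
print(behaviour)
```

In this sketch the parser is hand-written for a single message shape. Writing such a parser by hand for every Android interface and object type would not scale, which is the manual effort that an automatic approach has to avoid.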

This is where the novelty and real value of CopperDroid lies; the author’s contributions enable seamless and automatic dissection of complex IPC messages from system calls, resulting in the deserialization of complex Android objects. This is achievable with the unmarshalling Oracle, developed by the author and showcased in [202]. It is this Oracle that enables the reconstruction of Android app behaviours at multiple levels of abstraction from a single point of observation (i.e., system calls). Equally important, this approach makes the analysis agnostic to the runtime, allowing our techniques to work transparently with all Android OS versions. For instance, we have successfully run CopperDroid on Froyo, Gingerbread, Jelly Bean, KitKat, and the newest Lollipop (i.e., Android 5.x running ART) versions with no modification to Android and minimal configuration changes for CopperDroid. In summary, we present the following three contributions resulting from the author’s research efforts.

1. Automatic IPC Unmarshalling: We introduce CopperDroid as a base, dynamic, system call collector (i.e., no analysis included), and present the design and implementation of a novel, practical, oracle-based technique to automatically and seamlessly reconstruct Android-specific objects involved in system call-related IPC/ICC and RPC interactions. The author’s approach avoids manual development efforts and transparently addresses the challenge of dealing with the ever increasing number of complex Android objects introduced in new Android releases. The Oracle addition allows CopperDroid to perform large-scale, automatic, and faithful reconstruction of Android app behaviours (Section 3.4), suitable to enable further research, including Android malware classification and detection.

2. Value-based Data Flow Analysis: To abstract sequences of related low-level system calls to higher-level semantics (e.g., network communications, file creation) and enrich our reconstructed behavioural profiles, the author wrote a tool to automatically build data dependency graphs over traces of observed system calls and perform value-based forward slicing to cluster data-dependent system calls. Moreover, this gives CopperDroid the ability to automatically recreate file resources associated with a data dependency graph or “chain”. This compression of system call sequences into behavioural profiles summarizes each action’s semantics and, during file system accesses, can provide access to reconstructed resources. These files may be further inspected dynamically or statically by complementary systems or, if an APK, be fed back to CopperDroid.

3. Behaviour Reconstruction and Stimulation: We provide a thorough evaluation of CopperDroid’s behavioural reconstruction capability on more than 2,900 Android malware samples provided by three sources [52,141,249]. Furthermore, our experiments show how a simple yet effective malware stimulation strategy allows us to disclose an average of 25% of additional behaviours on more than 60% of the analysed samples, qualitatively improving our behavioural reconstruction capabilities with minimal effort and negligible overhead (Section 3.6). We also experiment with incremental stimuli for a more fine-grained analysis.
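The value-based forward slicing described in the second contribution can be illustrated with a small sketch. It assumes a hypothetical, simplified trace format in which each system call records the values it consumes and produces (here, file descriptors); the record layout, function names, and sample trace are assumptions made for illustration and do not reproduce the author’s tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Syscall:
    name: str
    uses: Tuple[int, ...] = ()     # values consumed, e.g. a file descriptor
    defines: Tuple[int, ...] = ()  # values produced, e.g. a returned descriptor
    data: bytes = b""              # buffer contents for write-like calls


def build_dependencies(trace: List[Syscall]) -> Dict[int, List[int]]:
    """Link each call to the later calls that use a value it defined."""
    edges: Dict[int, List[int]] = {i: [] for i in range(len(trace))}
    live: Dict[int, int] = {}  # value -> position of its most recent definition
    for pos, call in enumerate(trace):
        for value in call.uses:
            if value in live:
                edges[live[value]].append(pos)
        if call.name == "close":
            # A closed descriptor may be reused, so end its current chain.
            for value in call.uses:
                live.pop(value, None)
        for value in call.defines:
            live[value] = pos
    return edges


def forward_slice(edges: Dict[int, List[int]], start: int) -> List[int]:
    """Collect every call that transitively depends on the call at `start`."""
    visited = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in visited:
            visited.add(node)
            stack.extend(edges[node])
    return sorted(visited)


def reconstruct_file(trace: List[Syscall], chain: List[int]) -> bytes:
    """Rebuild file contents by concatenating the chain's written buffers."""
    return b"".join(trace[i].data for i in chain if trace[i].name == "write")


trace = [
    Syscall("open", defines=(5,)),            # 0: returns descriptor 5
    Syscall("socket", defines=(6,)),          # 1: unrelated network chain
    Syscall("write", uses=(5,), data=b"PK"),  # 2
    Syscall("sendto", uses=(6,), data=b"x"),  # 3
    Syscall("write", uses=(5,), data=b"\x03\x04"),  # 4
    Syscall("close", uses=(5,)),              # 5
    Syscall("open", defines=(5,)),            # 6: descriptor 5 is reused
    Syscall("write", uses=(5,), data=b"zz"),  # 7
]
edges = build_dependencies(trace)
chain = forward_slice(edges, 0)
recovered = reconstruct_file(trace, chain)
print(chain, recovered)
```

Matching on values rather than on positions in the trace is what keeps the interleaved socket calls out of the file chain, and ending a chain at `close` keeps a reused descriptor from merging two unrelated files.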

Through our examination of other works on Android malware analysis, see Section 3.8 and Chapter 2, it is our belief that CopperDroid’s unified reconstruction significantly contributes to the state-of-the-art reconstruction of Android malware behaviour.