Obfuscation Fingerprinting in Android Binaries

A Project

By Matthew Philip Van Veldhuizen

Presented to the Faculty of the University of Alaska Fairbanks in Partial Fulfillment of the Requirements of

MASTERS IN COMPUTER SCIENCE

Fairbanks, Alaska, April 2015

A Project

By Matthew Philip Van Veldhuizen

RECOMMENDED:

Advisory Committee Chair

Department Head, Computer Science Department

APPROVED:

Dean, College of Science, Engineering, and Mathematics

Date

Abstract

There are many ways to protect code from reverse engineering. One such way is to obfuscate the source code, machine code, or bytecode. Obfuscating Android applications not only makes them harder to reverse engineer, it can also speed up execution by reducing the size of the application and removing unnecessary code. Obfuscation can be performed manually or with an obfuscation program. However, it may become necessary to reverse obfuscation, because of the loss of source code or when investigating malware, trojans, or other harmful applications. This process is called deobfuscation. Once an application has been obfuscated, performing deobfuscation is a tedious task, and knowing how the application was obfuscated would increase the probability of correctly reversing the obfuscation. By examining four Android application obfuscators, I successfully identified distinct fingerprints within each of the obfuscated binaries by building a simple Android application, obfuscating it, and then comparing obfuscated and unobfuscated bytecode. Using these fingerprints, I was able to associate each obfuscator with an approximate probability that it was used to perform the obfuscation.

Acknowledgments

I thank Dr. Brian Hay, Department of Computer Science, University of Alaska Fairbanks; Dr. Orion Lawlor, Department of Computer Science, University of Alaska Fairbanks; Dr. Kara Nance, Department of Computer Science, University of Alaska Fairbanks; and Dr. Jon Genetti, Chair of the Department of Computer Science, University of Alaska Fairbanks, for their guidance, technical knowledge, and support. I would also like to thank my fellow graduate student Karl Ott for his technical knowledge and support, and my parents, Robert Van Veldhuizen and Patricia Holloway, for proofreading my paper.

Table of Contents

Abstract
Acknowledgments
1 Introduction
2 Related Work
3 Background Information
  3.1 Terminology
    3.1.1 Android Application Package (APK)
    3.1.2 Dalvik Virtual Machine
    3.1.3 Deobfuscation
    3.1.4 dex
    3.1.5 bytecode
    3.1.6 Fingerprints
    3.1.7 Obfuscation
    3.1.8 Obfuscator
    3.1.9 Software Development Kit (SDK)
  3.2 Android Platform
  3.3 Android Obfuscation
    3.3.1 Identifier renaming
    3.3.2 Junk Byte Insertion
    3.3.3 Obfuscated or Encrypted Strings
    3.3.4 Dynamic Loading of Code
    3.3.5 Dynamic Code Modification
    3.3.6 Call Graph Obfuscation
    3.3.7 Manifest Obfuscation
4 Tools
  4.1 Android Obfuscators
    4.1.1 Proguard
    4.1.2 Java Archive Grinder
    4.1.3 Zelix KlassMaster
    4.1.4 Allatori
  4.2 Android Obfuscators Considered But Not Used
    4.2.1 ClassEncrypt
    4.2.2 Java ByteCode Obfuscator
    4.2.3 Java Optimize and Decompile Environment
  4.3 Tools Used
    4.3.1 Apache Ant
    4.3.2 dex2jar
    4.3.3 dexdump
    4.3.4 jar2dex
    4.3.5 unzip
    4.3.6 xxd
5 Building Android Applications with Obfuscation
  5.1 Simple Android Application
    5.1.1 Main Activity
    5.1.2 String Encryption Activity
    5.1.3 Fibonacci Calculator Activity
    5.1.4 Web Page Activity
  5.2 Apache Ant Build Process
  5.3 Proguard Obfuscation
  5.4 Java Archive Grinder Obfuscation
  5.5 Zelix KlassMaster Obfuscation
  5.6 Allatori Obfuscation
6 Android Obfuscation Fingerprints
  6.1 Proguard Fingerprints
  6.2 Java Archive Grinder Fingerprints
  6.3 Zelix KlassMaster Fingerprints
  6.4 Allatori Fingerprints
7 Android Obfuscation Fingerprinting Tool
8 Conclusion
9 Further Research
References
A Custom Android Application
  A.1 manifest.xml
  A.2 strings.xml
  A.3 TestOneActivity.java
  A.4 activity_test_one.xml
  A.5 DisplayMessageActivity.java
  A.6 SimpleCrypto.java
  A.7 activity_display_message.xml
  A.8 DisplayMathsActivity.java
  A.9 activity_display_maths.xml
  A.10 DisplayWebActivity.java
  A.11 activity_display_web_page.xml
B Obfuscator Settings
  B.1 proguard-project.txt
  B.3 ZKMScript.txt
  B.4 allatori.xml
C dexdump Outputs
  C.1 Unobfuscated
  C.2 Proguard
  C.3 Java Archive Grinder
  C.4 Zelix KlassMaster
  C.5 Allatori
D aof.py
E Android Obfuscation Fingerprinter Results
  E.1 Simple Android Application
    E.1.1 Proguard Obfuscation
    E.1.2 Java Archive Grinder Obfuscation
    E.1.3 Zelix KlassMaster Obfuscation
    E.1.4 Allatori Obfuscation
  E.2 Google Play Store
    E.2.1 Google Chrome [16]
    E.2.2 Digitalchemy Calculator [9]
    E.2.3 Facebook [11]
    E.2.4 LED Flashlight [17]
    E.2.5 Amazon Kindle [2]
    E.2.6 Google My Business [13]
    E.2.7 Instagram [18]
    E.2.8 Netflix [22]
    E.2.9 Pandora Radio [24]
    E.2.10 Clash of Clans [32]

List of Figures

1 Different Steps Between Compiling Dalvik and Java Bytecode
2 Java Source Code
3 Java Source Code with Rewritten Identifiers
4 Disassembly with Detection of Junk Bytes
5 Linear Sweep with dexdump Fails Due to Junk Bytes
6 Recursive Traversal Fails Due to Conditional Branches
7 Java Source with Unencrypted Strings
8 Java Source with Encrypted Strings
9 Call Graph Obfuscation
10 Manifest Obfuscation Example
11 Main Activity Screenshot
12 String Encryption Activity Screenshot
13 Fibonacci Calculator with and without Overflow Activity Screenshot
15 Comparison Between dexdump Output Showing the Class #0
16 Comparison Between dexdump Outputs Showing source_file_idx, annotations_off, and Variable Renaming
17 Comparison Between dexdump Outputs Showing the Removal of Positions and Locals Information
18 Comparison Between dexdump Outputs Showing source_file_idx, annotations_off, Variable Renaming
19 Comparison Between dexdump Outputs Showing the Removal of Positions and Locals Information
20 Comparison Between dexdump Outputs for Class #0
21 Obfuscated dexdump Output Showing Class and Variable Renaming, source_file_idx and Annotation Obfuscation
22 Obfuscated dexdump Output Showing Flow Control Obfuscation
23 Obfuscated dexdump Output Showing String Encryption
24 Obfuscated dexdump Output Showing source_file_idx, annotation_off, Variable and Class Name Obfuscation
25 Obfuscated dexdump Output Showing String Encryption
26 Comparison Between dexdump Outputs Showing Positions Obfuscation and the

1 Introduction

The Android smart phone is one of the best-selling smart phone platforms in the world [31], with more than 1.5 billion Android smart phones sold between 2010 and 2013 [40]. In addition, the Android smart phone has the largest install base of any mobile or non-mobile operating system, and since 2013 there have been more Android devices sold than Windows, iOS, and Mac OS X devices combined [40]. With this many Android devices in use today, security is very important.

There are a multitude of options when it comes to security in Android devices, and most methods include a combination of physical, web, or intellectual processes [10, 31]. However, security begins with the developer. For example, suppose a developer has created an application for the Android platform with an algorithm that will outperform all competitors. The new algorithm may be protected from being reverse engineered by using a method called obfuscation [33]. Obfuscation is the process of making source code, machine code, or bytecode difficult for humans to understand, while still being able to run normally. Obfuscated binary code prevents analysis from developing timely, actionable insights by increasing code complexity and reducing the effectiveness of existing tools [5]. While obfuscation is great for hiding the developer's new algorithm, it can also be used by individuals who wish to harm users. For instance, developers of malware, trojans, or viruses can use obfuscation to make it difficult for computer security professionals to discover what a potentially harmful application is doing [28, 29].

Regardless of the intent of the developer, there are two methods that can perform obfuscation [30]. One method is to use a tool or application that will obfuscate the Android application automatically. The developer only needs to set up the obfuscation settings and run the obfuscator on the original application. The other method is to manually obfuscate the application. While this method is more time consuming and difficult, it does produce a unique application in the end. The process of reversing obfuscation is called deobfuscation. Deobfuscation means taking obfuscated source code, bytecode, or machine code that is difficult to understand, and transforming it back into code that can be understood by humans [30]. Performing deobfuscation is a tedious task, and knowing how the application was obfuscated would increase the probability of correctly reversing the obfuscation [33].

The intent of this project is to address the following research questions related to the obfuscation of Android applications:

• Within each obfuscation, whether it was done manually or with an obfuscation program, are there any differences between them that could be used as reliable markers to correctly identify how it was obfuscated?
• If such markers are found, would it be possible to generate a distinct fingerprint for each of the obfuscation programs that would correctly identify it?

The objectives of this project were to examine a small subset of the automatic obfuscation programs, each with different methods for performing obfuscation, and identify how an application was obfuscated. My goal was to determine if it is possible to identify which obfuscation program was used on any Android application, based on a distinct fingerprint for each obfuscation program.

2 Related Work

Dr. Yiannis Pavlosoglou researched this same topic, but for Java applications. He developed a program called elucidate [25], which is a Java obfuscator fingerprinter and cracking tool. Given a jar or class file, this tool was able to identify the obfuscator, recover known strings within the file, give an estimate of the complexity, and provide a map of the application. Dr. Pavlosoglou provided me with a set of lecture slides and sources but was unable to recover the original elucidate tool. This tool was only able to look at Java bytecode and not Dalvik bytecode. Most of Dr. Pavlosoglou's research was targeted at Java. While Java and Dalvik are similar, the Dalvik Virtual Machine has a completely different set of bytecode instructions. Consequently, the obfuscation tools Dr. Pavlosoglou mentioned in his presentation are either not relevant to Android or no longer available.

3 Background Information

3.1 Terminology

3.1.1 Android Application Package (APK)

Android Application Package is the package file format used to distribute and install application software and middleware onto the Android Operating System. It is based on the ZIP file archive structure [38].

3.1.2 Dalvik Virtual Machine

The Dalvik Virtual Machine is a process virtual machine on the Android Operating System that executes applications written for Android. Programs are commonly written in Java, compiled into Java bytecode, and then translated into Dalvik bytecode [38]. There are several reasons the Dalvik Virtual Machine is used over the Java Virtual Machine for smart phones. One such reason is that the Virtual Machine was slimmed down to use less space. In addition, the Dalvik bytecode instruction set is more suited for register-based machines, which lowers the total number of instructions and raises the instruction interpreter speed. Another reason is that the Dalvik Virtual Machine has been optimized heavily to work with low memory [34].
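As a rough illustration of why a register-based instruction set can need fewer instructions, the following Python sketch evaluates a = b + c on two toy interpreters. The instruction names and tuple encodings are invented for this example and are not actual Java or Dalvik bytecode.

```python
def run_stack(program, variables):
    """Tiny stack machine: operands are pushed, combined, then stored."""
    stack = []
    for op, *args in program:
        if op == "load":
            stack.append(variables[args[0]])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "store":
            variables[args[0]] = stack.pop()
    return variables


def run_register(program, registers):
    """Tiny register machine: one instruction names its destination and sources."""
    for op, dst, src1, src2 in program:
        if op == "add":
            registers[dst] = registers[src1] + registers[src2]
    return registers


# a = b + c, written for each toy machine (hypothetical encodings).
stack_program = [("load", "b"), ("load", "c"), ("add",), ("store", "a")]
register_program = [("add", "a", "b", "c")]

stack_result = run_stack(stack_program, {"b": 2, "c": 3})
register_result = run_register(register_program, {"b": 2, "c": 3})
```

Both programs compute the same value, but the stack version needs four instructions where the register version needs one, which is the effect the paragraph above attributes to the Dalvik instruction set.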

3.1.3 Deobfuscation

Deobfuscation is the process of converting a program that is difficult to understand (in other words, a program that has been obfuscated) into one that is simple to understand [20].

3.1.4 dex

Dalvik Executable, otherwise known as dex, is the bytecode to be executed on the Dalvik Virtual Machine [7].

3.1.5 bytecode

Bytecode is a form of instruction set designed for efficient execution by a software interpreter. As long as there is an interpreter, the bytecode can be run on any hardware and operating system configuration.

3.1.6 Fingerprints

Fingerprints are distinct identifiers within the application that can be used to identify various characteristics about the application, such as how it was obfuscated, how the application was written, or what machine compiled the application [39].

3.1.7 Obfuscation

Obfuscation is the deliberate act of creating obfuscated code, such that the source or machine code is difficult for humans to understand but will still work properly when executed or compiled [20].

3.1.8 Obfuscator

An obfuscator is an application that generates an obfuscated version of the target application.

3.1.9 Software Development Kit (SDK)

A Software Development Kit is typically a set of software development tools that allow for the creation of applications for a certain software package, framework, hardware, or operating system [41].

3.2 Android Platform

The Android Platform is an open source operating system designed for mobile, embedded, and wearable devices. The operating system is based on the Linux kernel and is currently being developed by Google. The operating system uses the Dalvik Virtual Machine with a Just-In-Time compiler to execute Android applications. Android applications are written in Java and then compiled into Dalvik bytecode, or dexcode. Figure 1 shows the differences between the steps in which Java and Dalvik bytecode are compiled [6].

Java: Java Source Code -> Java Compiler -> Java Bytecode -> Java Virtual Machine
Dalvik: Java Source Code -> Java Compiler -> Java Bytecode -> Dex Compiler -> Dalvik Bytecode -> Dalvik Executable -> Dalvik Virtual Machine

Figure 1: Different Steps Between Compiling Dalvik and Java Bytecode.

During the build process, Android applications are encapsulated in an APK archive. The APK is a compressed file, very similar to a compressed ZIP archive [31], that contains the class bytecode, all the application resources (icons, sounds, etc.), and any binary native files. Each Android application requires a manifest file, which defines the metadata for the application. The metadata includes the requested permissions, registered services, and activities. Activities are screens that allow the user to interact with the program.
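Because an APK follows the ZIP structure, a general-purpose ZIP library can list its members. The sketch below uses Python's zipfile module on a small in-memory archive whose entry names mimic an APK; the entry contents are placeholders, and the resource path is an assumption chosen for illustration.

```python
import io
import zipfile


def build_toy_apk():
    """Create an in-memory ZIP with placeholder entries named like APK members."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as apk:
        apk.writestr("AndroidManifest.xml", b"placeholder manifest")
        apk.writestr("classes.dex", b"placeholder bytecode")
        apk.writestr("res/drawable/icon.png", b"placeholder icon")
    buffer.seek(0)
    return buffer


def list_entries(apk_file):
    """Return the sorted member names of a ZIP-based archive."""
    with zipfile.ZipFile(apk_file) as apk:
        return sorted(apk.namelist())


entries = list_entries(build_toy_apk())
```

Passing the path of a real APK to list_entries instead of the toy buffer should work the same way, since zipfile accepts both paths and file-like objects.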

3.3 Android Obfuscation

3.3.1 Identifier renaming

Identifier renaming is one of the simplest methods of obfuscation. It operates by changing the names of variables, functions, or methods from their original identifier names to names that are less meaningful or harder to understand. Figures 2 and 3 show an example of identifier renaming [21].

public class Base64 {
    public String decode( String input ) { . . . }
    public String encode( String input ) { . . . }
}

Figure 2: Java Source Code

public class a {
    public String a ( String a ) { . . . }
    public String b ( String b ) { . . . }
}

Figure 3: Java Source Code with Rewritten Identifiers
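The renaming idea can be sketched in Python as a mapping from original identifiers to generated short names. This is a simplified illustration, not the algorithm of any obfuscator discussed here: it draws all names from a single sequence, whereas Figure 3 reuses the name a for a class, a method, and a parameter.

```python
import itertools
import string


def short_names():
    """Yield a, b, ..., z, aa, ab, ... in order."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def build_rename_map(identifiers):
    """Assign each distinct identifier the next unused short name."""
    names = short_names()
    mapping = {}
    for identifier in identifiers:
        if identifier not in mapping:
            mapping[identifier] = next(names)
    return mapping


rename_map = build_rename_map(["Base64", "decode", "encode", "input"])
```

A fingerprinting tool can look for the reverse signal: a binary whose class, method, and field names are almost all one or two lowercase letters has very likely been through a renaming pass.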

3.3.2 Junk Byte Insertion

Inserting junk bytes into the software complicates the analysis of the binary. Two conditions have to be met. First, the inserted instructions have to be incorrect, which is achieved by using incomplete instructions. This produces a "red herring" for disassemblers. Second, the incomplete junk byte instructions must never be reached during normal execution. This is achieved by placing an unconditional jump before the inserted junk byte instructions, or a conditional jump if the result is known and predictable. Figure 4 shows that the integer 6 is returned and that, due to the unconditional branch at address 0x3be, the inserted junk bytes at 0x3c2 and 0x3c4 will never be executed.

Figure 5 is the same code as Figure 4, but analyzed using a linear sweep algorithm, which fails to disassemble the junk bytes correctly. Linear sweep starts at the first byte of the binary's text segment and proceeds from there, decoding one instruction at a time. Linear sweep disassemblers are prone to errors that result from data embedded in the instruction stream, which is why errors occur when trying to decode the junk bytes that were inserted [19].

0003bc 1250      | 0000 const/4 v0, #int 5
0003be 2900 0400 | 0001 goto/16 0005
0003c2 0001      | 0003 <junkbytes>
0003c4 0000      | 0004 <junkbytes>
0003c6 d800 0000 | 0005 add-int/lit8 v0, v0,
0003ca 0f00      | 0007 return v0

Figure 4: Disassembly with Detection of Junk Bytes

0003bc: 1250                | 0000: const/4 v0, #int 5
0003be: 2900 0400           | 0001: goto/16 0005
0003c2: 0001 0000 d800 0001 | 0003: dummy-function
0003ca: 0f00                | 0007: return v0

Figure 5: Linear Sweep with dexdump Fails Due to Junk Bytes.

Figure 6 is the same as Figure 4, but it was analyzed using a recursive traversal algorithm, and the conditional branch led to a failure of the disassembler. Recursive traversal works by following the control flow of the program, which makes it possible for the disassembler to skip any junk byte data that has an unconditional branch before it. However, the control flow cannot always be reconstructed precisely, because it can be influenced by external factors such as the run time state or other conditional branches. These factors are not available to the disassembler during static analysis, only at run time. When the algorithm cannot determine the jump location statically, for instance in the case of an indirect jump, it fails to analyze parts of the program's code [19].

0003bc: 1250                | 0000: const/4 v0, #int 5
0003be: 2900 0400           | 0001: if-gtz v0, 0005
0003c2: 0001 0000 d800 0001 | 0003: dummy-function
0003ca: 0f00                | 0007: return v0

Figure 6: Recursive Traversal Fails Due to Conditional Branches.
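The difference between the two disassembly strategies can be sketched on a toy instruction set. Everything below is an assumption made for illustration: the opcodes, their lengths, and the byte sequence are invented and are not real Dalvik encodings. The program loads 5, jumps over one junk byte (an incomplete const), adds 1, and returns.

```python
# Toy instruction set, invented for this sketch (not real Dalvik encodings):
# opcode -> (mnemonic, total instruction length in bytes)
OPCODES = {0x01: ("const", 2), 0x02: ("goto", 2), 0x03: ("add", 2), 0x04: ("return", 1)}

# const 5; goto 5; one junk byte (an incomplete "const"); add 1; return
CODE = bytes([0x01, 0x05, 0x02, 0x05, 0x01, 0x03, 0x01, 0x04])


def decode(code, pc):
    """Decode one instruction at pc as (mnemonic, operand, length)."""
    mnemonic, length = OPCODES.get(code[pc], ("<unknown>", 1))
    if pc + length > len(code):
        return "<truncated>", None, len(code) - pc
    operand = code[pc + 1] if length == 2 else None
    return mnemonic, operand, length


def linear_sweep(code):
    """Decode every byte in address order, trusting each decoded length."""
    listing, pc = [], 0
    while pc < len(code):
        mnemonic, operand, length = decode(code, pc)
        listing.append((pc, mnemonic, operand))
        pc += length
    return listing


def recursive_traversal(code, entry=0):
    """Decode only addresses reachable by following control flow from entry."""
    listing, seen, worklist = [], set(), [entry]
    while worklist:
        pc = worklist.pop()
        if pc in seen or pc >= len(code):
            continue
        seen.add(pc)
        mnemonic, operand, length = decode(code, pc)
        listing.append((pc, mnemonic, operand))
        if mnemonic == "goto":
            worklist.append(operand)      # unconditional: only the target is reachable
        elif mnemonic != "return":
            worklist.append(pc + length)  # fall through to the next instruction
    return sorted(listing)


sweep_listing = linear_sweep(CODE)
traversal_listing = recursive_traversal(CODE)
```

In this toy case the linear sweep decodes the junk byte as a const, swallows the opcode of the real add, and never recovers the add or the return, while the recursive traversal follows the goto and recovers both. If the goto were a conditional branch, the traversal would also have to follow the fall-through path into the junk byte, which is the situation Figure 6 describes.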

3.3.3 Obfuscated or Encrypted Strings

Encryption renders strings unreadable. Normally, strings are stored as clear text inside the compiled Android application, which makes extraction trivial. To obfuscate strings, they are encrypted before being stored inside the application. When the strings are needed at run time, they are decoded or decrypted. Once decoded, they can be used normally inside the application. Figure 7 shows a Java method with unencrypted strings and Figure 8 shows the same method with encrypted strings [21].

public void init() {
    String host = "www.example.com";
    String username = "secretuser";
    String password = "secretpass";
}

Figure 7: Java Source with Unencrypted Strings.

public void init() {
    String host = decrypt("b4177923565cfbe84eae33e4efdb637a");
    String user = decrypt("a58be63b1602ab2a6ac24d9a4689d278");
    String pass = decrypt("a0133dc939c4f54571faf329a904a3ec");
}

Figure 8: Java Source with Encrypted Strings.
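The store-encrypted, decrypt-at-run-time pattern of Figure 8 can be sketched as follows. The repeating-key XOR and the key value are stand-ins chosen to keep the example short; they are assumptions, not the scheme used by any of the obfuscators examined, and they offer no real security.

```python
KEY = b"demo-key"  # hypothetical key; a real obfuscator would hide or derive its key


def _xor(data, key):
    """XOR each byte of data with the repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encrypt(plain_text, key=KEY):
    """Build-time step: turn a clear-text string into a hex blob."""
    return _xor(plain_text.encode("utf-8"), key).hex()


def decrypt(hex_blob, key=KEY):
    """Run-time step: recover the original string from the hex blob."""
    return _xor(bytes.fromhex(hex_blob), key).decode("utf-8")


stored = encrypt("www.example.com")  # what would be embedded in the binary
host = decrypt(stored)               # what the application sees when it runs
```

Only the hex blob and the call to decrypt appear in the compiled code, so a search for the clear-text host name in the binary finds nothing.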

3.3.4 Dynamic Loading of Code

With dynamic code loading, a running program loads and executes code from a remote location during execution. On Android, fetching, embedding, and, if necessary, unpacking or decrypting the remote code parts can be done with readily available library classes such as java.net.URL and javax.crypto.Cipher. Both loading and execution are possible through the standard DexFile class, which makes it possible to load a dex file into the memory of the currently running process [21].

3.3.5 Dynamic Code Modification

Dynamic code modification increases the difficulty of static analysis, especially when employing multiple layers of modification. There are two different ways to accomplish dynamic code modification [21].

The first method is to modify the Dalvik bytecode itself. The Dalvik bytecode has a limited instruction set, meaning that it is not possible to alter the bytecode dynamically without the use of an external helper. Using the Java Native Interface [26], it is possible to execute native code in the context of the currently running process, and therefore the native code can access memory. This native code has to be loaded and called by the Dalvik bytecode, and it produces bytecode that will then be executed by the Dalvik Virtual Machine.

The second method is to execute native code directly on the processor. There are several differences between the instruction sets of Intel x86 and ARM, but dynamic code manipulation is very similar to the well-known and much-discussed techniques on x86 machines [31]. This technique is not considered for the purposes of this project.

3.3.6 Call Graph Obfuscation

Every Android application begins as a fork of the Android zygote process. This zygote process has a set of libraries and the Android framework preloaded. This obfuscation method works by including classes in the APK that have the same name as one of the preloaded system libraries. The resulting Dalvik bytecode will point to the internal library definition, but at run time it will use the preloaded library [21]. Figure 9 shows how Call Graph Obfuscation works.

Figure 9: Call Graph Obfuscation.
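The name-shadowing effect can be modeled as two lookups that disagree. This is a deliberately simplified model under stated assumptions: the two dictionaries stand in for the zygote's preloaded classes and the APK's own classes, the application class name is hypothetical, and real class loading is more involved than a dictionary lookup.

```python
# Classes assumed to be preloaded by the zygote (illustrative entry).
PRELOADED = {"android.util.Log": "framework implementation"}

# Classes shipped inside the APK; the first deliberately reuses a framework name.
APK_CLASSES = {
    "android.util.Log": "decoy implementation bundled in the APK",
    "com.example.app.MainActivity": "application code",
}


def resolve_static(class_name):
    """What a static tool sees if it only inspects the APK contents."""
    return APK_CLASSES.get(class_name)


def resolve_runtime(class_name):
    """What runs on the device in this model: preloaded classes take precedence."""
    if class_name in PRELOADED:
        return PRELOADED[class_name]
    return APK_CLASSES.get(class_name)


static_view = resolve_static("android.util.Log")
runtime_view = resolve_runtime("android.util.Log")
```

The call graph built from the static view passes through the decoy class, while the running application never executes it.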

3.3.7 Manifest Obfuscation

Every Android application has a manifest file that defines the application's metadata. This metadata stores information about the requested permissions, registered services, and activities. An activity is an application component that provides a screen that the user interacts with, such as dialing the phone, taking a photo, sending an email, or viewing a map. Each activity is given a window that can either fill the screen or be a smaller, floating window on top of other windows. Android itself looks up certain attributes by a numeric identifier instead of a name. However, static analysis tools drop the attribute ID and instead leave the attribute name intact. This can be exploited by including an attribute with an invalid ID (0x00000000) in the application's manifest file [21]. Android will ignore the attribute since it is invalid, but static analysis tools will drop the ID when decoding the Android manifest file and only consider the attribute name. This means that such an entry in the manifest will be ignored by Android at run time, but static analysis tools will attempt to decode it, fail, and report either a bad manifest file or a corrupted application package.

<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    android:sharedUserId="string"
    android:sharedUserLabel="string resource"
    android:versionCode="integer"
    0x00000000="string" >
</manifest>

Figure 10: Manifest Obfuscation Example.

4 Tools

4.1 Android Obfuscators

4.1.1 Proguard

Proguard [8] is an Open Source Java and Android obfuscation tool that can shrink, optimize, obfuscate, and preverify Java classes. It ships with the Android SDK and is preconfigured for most purposes. It can:

• Detect and remove unused classes, fields, methods, and attributes.
• Optimize bytecode and remove unused instructions.
• Rename classes, fields, and methods using short meaningless names.
• Preverify the processed code for Java 6 and higher.

4.1.2 Java Archive Grinder

Java Archive Grinder [23] is an Open Source Java optimizer, obfuscator, shrinker, and reducer. It can:

• Remove unused fields, methods, classes, and interfaces, as well as debug information.
• Rename fields, methods, classes, and interfaces.
• Optimize Java bytecode, such as removing NOP instructions and compressing local variable slots.

4.1.3 Zelix KlassMaster

Zelix KlassMaster [42] is a commercial Java and Android bytecode obfuscator. It can:

• Rename methods, fields, and classes.
• Implement flow control, exception, and string obfuscation.
• Integrate with the Apache Ant build system to work with Dalvik bytecode obfuscation.

4.1.4 Allatori

Allatori [1] is a commercial Java and Android obfuscator. It can:

• Perform name and flow control obfuscation.
• Obfuscate debug information and encrypt strings.
• Work with the Apache Ant build system to work with Dalvik bytecode obfuscation.

4.2 Android Obfuscators Considered But Not Used

The following obfuscation applications were considered for this project, but in the course of examination they were discounted due to one or more factors that made their use impractical. These obfuscators were considered due to their past popularity with Java obfuscation and their stated ability to work properly when obfuscating Android applications.

4.2.1 ClassEncrypt

ClassEncrypt [37] is an Open Source Java obfuscator that can encrypt class files to prevent malicious users from stealing the source code. This obfuscator was not used because every time it was executed, it produced empty Java class files and gave no errors, even with different settings applied.

4.2.2 Java ByteCode Obfuscator

The Java ByteCode Obfuscator [3] is an Open Source Java bytecode obfuscator that uses Soot's advanced typing and flow analysis framework to perform obfuscation. It can:

• Operate on Java class files, producing obfuscated Baf, Jasmin, or class files.
• Add dead-code switch statements, disobey constructor conventions, obfuscate flow control, and pack local variables into bitfields.
• Rename class, method, field, or variable names.

This obfuscator was not used because it was not compatible with Android bytecode, even when it was converted to Java bytecode before running the obfuscator. It required that a Main function be defined in order to properly obfuscate, and Android does not use Main to start the application.

4.2.3 Java Optimize and Decompile Environment

The Java Optimize and Decompile Environment [15] is an Open Source Java decompiler and optimizer. It can:

• Rename class, method, field, and local names to shorter, obfuscated, or unique names, or according to a given translation table.
• Remove debugging information, dead code, and constant fields.
• Optimize local variable allocation.

This obfuscator was not used because every time it was run, it produced different exception errors, even when the same input was given and the same settings were applied.

4.3 Tools Used

4.3.1 Apache Ant

Apache Ant [12] is a software tool for automating the software build process for Java and Android. It uses a set of XML files to describe the build process and its dependencies. Apache Ant was designed to be modular, so that changing or adding steps in the build process, usually called Ant Tasks, is simple and efficient. Several of the obfuscators mentioned include Ant Tasks to be used when building the Android application with their obfuscator.

4.3.2 dex2jar

The dex2jar tool [27] is a Unix and Windows command line tool that can translate Dalvik bytecode to Java bytecode.

4.3.3 dexdump

The dexdump tool is a Unix command line tool that is bundled with the Android SDK. It is a static disassembler for Dalvik bytecode, and it provides information about an Android application by examining the classes.dex file inside the APK.

4.3.4 jar2dex

The jar2dex tool [27] is a Unix and Windows command line tool that can translate Java bytecode to Dalvik bytecode.

4.3.5 unzip

The unzip tool [14] is a Unix command line tool that will list, test, or extract files from ZIP archives.

4.3.6 xxd

The xxd tool [36] is a Unix command line tool that creates a hex dump of a given file or standard input. It can also convert a hex dump back to its original binary form.
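For readers without xxd at hand, the two directions it supports can be approximated in a few lines of Python. The output layout below is loosely modeled on a hex dump with an offset, two-byte groups, and an ASCII column; it is a sketch and does not reproduce xxd's exact column padding or options.

```python
def hex_dump(data, width=16):
    """Format bytes as offset, two-byte hex groups, and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        pairs = [chunk[i:i + 2].hex() for i in range(0, len(chunk), 2)]
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {' '.join(pairs)}  {text}")
    return lines


def reverse_hex_dump(lines):
    """Rebuild the original bytes from the hex column of hex_dump output."""
    out = bytearray()
    for line in lines:
        hex_column = line.split(": ", 1)[1].split("  ", 1)[0]
        out += bytes.fromhex(hex_column.replace(" ", ""))
    return bytes(out)


DEX_MAGIC = b"dex\n035\x00"  # first eight bytes of a version 035 classes.dex file
dump = hex_dump(DEX_MAGIC)
```

Dumping the first bytes of a classes.dex file in this way is a quick check that the file extracted from an APK really is a Dalvik executable.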

5 Building Android Applications with Obfuscation

The first step in finding obfuscation fingerprints was to look at the differences between an unobfuscated Android application and an obfuscated Android application. To make this comparison easier, a simple Android application was written. Each time the Android application was compiled and built, a different obfuscator was used. After all of the applications were obfuscated, each application was disassembled and compared against the unobfuscated disassembly.
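The comparison step can be sketched with Python's difflib module. The two listings below are made-up lines loosely shaped like dexdump class headers; the package name, index values, and field names are assumptions for illustration and are not taken from the outputs in Appendix C.

```python
import difflib

# Illustrative lines only, loosely shaped like dexdump output.
UNOBFUSCATED = [
    "Class descriptor  : 'Lcom/example/TestOneActivity;'",
    "source_file_idx   : 12 (TestOneActivity.java)",
    "name              : 'message'",
]
OBFUSCATED = [
    "Class descriptor  : 'Lcom/example/a;'",
    "source_file_idx   : -1 (unknown)",
    "name              : 'a'",
]


def changed_lines(before, after):
    """Return (removed, added) lines between two disassembly listings."""
    removed, added = [], []
    for line in difflib.ndiff(before, after):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


removed, added = changed_lines(UNOBFUSCATED, OBFUSCATED)
```

Lines that change in the same way for every class, such as a source file index that is always cleared, are the kind of difference that can serve as a candidate fingerprint.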

5.1

Simple Android Application

The A n d ro id application th at was w ritten included four parts: th e m ain activity, string encryption activity, Fibonacci calculator activity, a n d get web page activity. Each o f th e activities attem pted to achieve som e functionality th at could be fo u n d in m an y A n d ro id applications. This w ould result in seeing obfuscation th at could be use d to identify their correlating fingerprints. Each activity also included extraneous a n d u n u s e d variables a n d classes that w ould be affected by each o f the obfuscators. The source code for this application is located in A ppendix A.

5.1.1 Main Activity

The main activity included three input entry boxes with accompanying Send buttons. When input was entered and the Send button was pushed, the corresponding activity was called and executed. The main activity can be seen in Figure 11. The Java code for this activity is located in Appendix A.3, while the activity XML is located in Appendix A.4.

5.1.2 String Encryption Activity

The string encryption activity used the text entered into the main activity and encrypted it. It would then display the encrypted text in a new activity window. The string encryption activity can be seen in Figure 12. The Java code for the string encryption activity is located in Appendix A.5, the encryption code is located in Appendix A.6, and the activity XML is located in Appendix A.7.


Figure 11: Main Activity Screenshot.

Figure 12: String Encryption Activity Screenshot.

5.1.3 Fibonacci Calculator Activity

The Fibonacci calculator activity took input from the main activity as an integer and calculated the Fibonacci number up to the given number. Two versions of the Fibonacci calculator were written: one that correctly handled very large numbers, and one that did not, which caused integer overflow (see Figure 13). The Java code is located in Appendix A.8 and the activity XML is located in Appendix A.9.
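The difference between the two calculator versions can be illustrated with a short Python sketch. This is not the Java code from Appendix A.8; the function names are hypothetical, and the 32-bit wrap-around is simulated by hand on the assumption that the overflowing version used Java's int type.

```python
def fib_exact(n: int) -> int:
    """Return the n-th Fibonacci number using arbitrary-precision integers."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def to_int32(value: int) -> int:
    """Wrap a value into the signed 32-bit range, as Java int arithmetic does."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def fib_int32(n: int) -> int:
    """Return the n-th Fibonacci number with every addition wrapped to 32 bits."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, to_int32(a + b)
    return a


if __name__ == "__main__":
    # The two versions agree up to n = 46 and diverge from n = 47 onward.
    for n in (46, 47):
        print(n, fib_exact(n), fib_int32(n))
```

The 47th Fibonacci number is the first that no longer fits in a signed 32-bit integer, so the wrapped version returns a negative value at that point.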


(a) With Overflow. (b) Without Overflow.

Figure 13: Fibonacci Calculator with and without Overflow Activity Screenshot.

5.1.4 Web Page Activity

The web page activity used input from the main activity in the form of a URL and displayed the associated web page. This activity was not fully implemented. It was able to make a connection to the specified URL, but it was not able to render the web page correctly. Having full functionality here was not important: the main point of this activity was to implement network activity, not to display web pages properly. This activity can be seen in Figure 14. The Java code is located in Appendix A.10 and the activity XML is located in Appendix A.11.

Figure 14: Web Page Activity Screenshot.

5.2 Apache Ant Build Process

Every application was built in release mode, digitally signed, and byte aligned. Release mode directs the build script to build the Android application so that it is ready for distribution to end users.


This mode removes some information from the resulting APK and requires that it be signed with a keystore. Therefore, a signing key was generated and included along with each application. After the application had been signed it was byte aligned, meaning the APK archive was aligned to ensure that all uncompressed data started with a particular alignment relative to the start of the file. This alignment allows the uncompressed data in the APK to be accessed efficiently.
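The alignment property described above can be checked with a few lines of Python. The following is only a sketch: the function name is hypothetical, and the 4-byte boundary is an assumption (a commonly used value), since the text says only "a particular alignment".

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIZE = 30  # fixed-size part of a ZIP local file header


def unaligned_stored_entries(apk_bytes: bytes, alignment: int = 4) -> list:
    """Return names of uncompressed entries whose data is not aligned.

    The 4-byte default is an assumption made for illustration.
    """
    unaligned = []
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        for info in archive.infolist():
            if info.compress_type != zipfile.ZIP_STORED:
                continue  # only uncompressed data needs to be aligned
            # File name length and extra field length sit at offsets 26 and 28
            # of the local file header.
            name_len, extra_len = struct.unpack_from(
                "<HH", apk_bytes, info.header_offset + 26
            )
            data_offset = (
                info.header_offset + LOCAL_HEADER_SIZE + name_len + extra_len
            )
            if data_offset % alignment != 0:
                unaligned.append(info.filename)
    return unaligned
```

An empty result would suggest that every uncompressed entry starts on the chosen boundary relative to the start of the file.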

5.3 Proguard Obfuscation

Since Proguard is the default obfuscator released with the Android SDK, enabling Proguard was a simple task. Most of the Proguard settings enabled by default were acceptable to use when obfuscating. However, the following settings were also enabled to ensure strict obfuscation. For a full list of the settings Proguard used, see Appendices B.1 and B.2.

overloadaggressively

The overloadaggressively setting allows multiple fields and/or methods to use the same name. Any variable in one class could be named the same as a variable in a different class.

flattenpackagehierarchy

The flattenpackagehierarchy setting will repackage all the generated or included packages into a single parent APK package.

5.4 Java Archive Grinder Obfuscation

The Java Archive Grinder did not function properly as an Ant Task; therefore, the obfuscation had to occur after the APK was generated. This task was accomplished by opening the APK archive and finding the Dalvik bytecode, the classes.dex file. The classes.dex file was then converted to Java bytecode using the tool dex2jar. Once the Dalvik bytecode was converted, Java Archive Grinder could then be used. The Java Archive Grinder was set to obfuscate as strictly as possible. Once the obfuscator completed its task, the obfuscated Java bytecode was converted back into Dalvik bytecode using the tool jar2dex and repackaged into the original APK, replacing the old classes.dex with the obfuscated version.

When set to obfuscate as strictly as possible, the Java Archive Grinder failed while attempting bytecode obfuscation. Therefore, the obfuscator was run with the -nobco (no bytecode obfuscation) setting. This failure was possibly due to a conversion issue: when the application was converted from dexcode to Java bytecode, some of the bytecode may not have been converted correctly. Alternatively, it may have been a problem with the version of the Java bytecode. The Java Archive Grinder obfuscator may have been written using an older version of Java, and could not understand some of the newer bytecode instructions.
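The unpacking and repackaging steps of this workflow can be sketched in Python with the standard zipfile module. The function names are hypothetical, and the dex2jar, Java Archive Grinder, and jar2dex invocations are deliberately left out because their command lines are not given here.

```python
import zipfile

DEX_NAME = "classes.dex"


def extract_dex(apk_path: str) -> bytes:
    """Read the Dalvik bytecode file out of an APK archive."""
    with zipfile.ZipFile(apk_path) as apk:
        return apk.read(DEX_NAME)


def replace_dex(apk_in: str, apk_out: str, new_dex: bytes) -> None:
    """Copy an APK to apk_out, substituting new_dex for classes.dex."""
    with zipfile.ZipFile(apk_in) as src, zipfile.ZipFile(apk_out, "w") as dst:
        for info in src.infolist():
            if info.filename == DEX_NAME:
                data = new_dex
            else:
                data = src.read(info.filename)
            # Passing the ZipInfo keeps each entry's name and compression type.
            dst.writestr(info, data)
```

Because the signature files inside an APK cover the contents of classes.dex, an archive rewritten this way would presumably need to be signed and aligned again before it could be installed.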

5.5 Zelix KlassMaster Obfuscation

The Zelix KlassMaster obfuscator was compatible with the Ant build system. The easiest method of using KlassMaster was to replace the call to the Proguard program with a call to the KlassMaster program. This would force Ant to use KlassMaster instead of Proguard when obfuscation was enabled. KlassMaster functioned by reading in a script file that has the obfuscation settings defined for the Android application. This settings file was generated with a tool that was provided along with KlassMaster that took the Proguard settings and generated an appropriate script file that KlassMaster could understand. After the script file was generated, additional settings were applied to make the obfuscation as strict as possible. The full script is located in Appendix B.3. The following settings were included in the script.

obfuscateFlow

The obfuscateFlow setting will make slight changes to the bytecode that will obscure the control flow without changing the code functionality at run time.

exceptionObfuscation

The exceptionObfuscation setting is similar to the obfuscateFlow setting. It changes the exception handling of the bytecode.

encryptStringLiterals

The encryptStringLiterals setting encrypts all strings found in the bytecode constant pools. It adds code to the bytecode that will decrypt the strings at run time.

mixedCaseClassNames

The mixedCaseClassNames setting allows for any class to be named with a random set of characters of any case, upper or lower.

aggressiveMethodRenaming

The aggressiveMethodRenaming setting allows for multiple identifiers to be renamed with the same name.

5.6 Allatori Obfuscation

Allatori worked with Ant by using an Ant Task. The Ant Task replaced the obfuscation call that would have executed Proguard, so Ant executed Allatori instead when the obfuscation step was processed. Allatori ran with a script that defined the settings for obfuscation. The full script is located in Appendix B.4. The script was changed to make the obfuscation as strict as possible. The following settings were added to the base script.

string encryption

This setting found all string data and encoded it, and also added in code to allow for the decryption of strings at run time.

control flow obfuscation

This setting changed the standard Java constructions (loops, conditionals, and branching instructions) where possible, and altered the commands such that decompilation was more difficult.

line number obfuscation

This setting changed the line numbers.

remove toString


6 Android Obfuscation Fingerprints

After generating an obfuscated application using each of the obfuscators, the resulting APKs were unpacked to access the Dalvik bytecode file, classes.dex. Once the classes.dex files were extracted, each was opened with dexdump. This provided a static disassembly of the obfuscated dexcode. Each of the resulting dexdump outputs was then compared against the unobfuscated version of the same Android application. All of these dexdump outputs are located in Appendix C. Any difference between the unobfuscated and obfuscated dexdump outputs was considered to be a part of the obfuscator's fingerprint. The following fingerprints were found for each of the obfuscators.
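A comparison of this kind can be automated with a text diff. The sketch below uses Python's difflib on two small dexdump-style excerpts; the function name and the sample strings are illustrative only, and nothing here is claimed to be how the comparison was actually carried out.

```python
import difflib


def dexdump_diff(unobfuscated: str, obfuscated: str) -> list:
    """Return unified-diff lines between two dexdump text outputs."""
    return list(
        difflib.unified_diff(
            unobfuscated.splitlines(),
            obfuscated.splitlines(),
            fromfile="unobfuscated",
            tofile="obfuscated",
            lineterm="",
            n=0,  # no context lines, only the differences
        )
    )


# Illustrative excerpts in the style of dexdump output.
SAMPLE_UNOBFUSCATED = "\n".join([
    "access_flags        : 1 (0x0001)",
    "source_file_idx     : 13",
    "annotations_off     : 6296 (0x001898)",
    "name                : 'fibCache'",
])
SAMPLE_OBFUSCATED = "\n".join([
    "access_flags        : 1 (0x0001)",
    "source_file_idx     : -1",
    "annotations_off     : 0 (0x000000)",
    "name                : 'a'",
])

if __name__ == "__main__":
    for line in dexdump_diff(SAMPLE_UNOBFUSCATED, SAMPLE_OBFUSCATED):
        print(line)
```

Lines prefixed with a minus sign appear only in the unobfuscated output and lines prefixed with a plus sign appear only in the obfuscated output, which makes candidate fingerprints easy to spot.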

6.1 Proguard Fingerprints

The first difference that was discovered between the obfuscated and unobfuscated files was the removal of the BuildConfig class. Figure 15 shows that the first class to be found in the obfuscated version was DisplayMathsActivity. No reference to BuildConfig was found in the obfuscated version.

Field                   Unobfuscated                            Obfuscated
Class #0 header:
class_idx               16                                      15
access_flags            17 (0x0011)                             1 (0x0001)
superclass_idx          46                                      2
interfaces_off          0 (0x000000)                            0 (0x000000)
source_file_idx         10                                      -1
annotations_off         0 (0x000000)                            0 (0x000000)
class_data_off          11630 (0x002d6e)                        6515 (0x001973)
static_fields_size      1                                       1
instance_fields_size    0                                       0
direct_methods_size     1                                       3
virtual_methods_size    0                                       2
Class #0
Class descriptor        'Lcom/uaf/matt/testone/BuildConfig;'    'Lcom/uaf/matt/testone/DisplayMathsActivity;'
Access flags            0x0011 (PUBLIC FINAL)                   0x0001 (PUBLIC)
Superclass              'Ljava/lang/Object;'                    'Landroid/app/Activity;'
Interfaces
Static fields

Figure 15: Comparison Between dexdump Output Showing the Class #0.

Secondly, the source_file_idx value was set to an invalid ID, and all annotation information was removed. The source_file_idx normally holds the index into the string_ids list of the name of the source file. An invalid ID is set to -1 [4]. The annotations_off value normally holds the file offset of the annotations for that class; if annotations have been removed, it is set to 0. In addition, I discovered that Proguard renamed each of the variables to a lower case letter of the alphabet, restarting from the beginning of the alphabet at the start of each class. Figure 16 shows a comparison between the obfuscated and unobfuscated application.
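The two header fields discussed above can be read directly from a classes.dex file. The following sketch assumes the standard little-endian layout of the Dalvik Executable format, in which class_defs_size and class_defs_off sit at offsets 0x60 and 0x64 of the header and each class_def_item consists of eight 32-bit fields; the function name is hypothetical and the code is not taken from any existing tool.

```python
import struct

NO_INDEX = 0xFFFFFFFF        # the "invalid ID" value, displayed by dexdump as -1
CLASS_DEFS_SIZE_OFF = 0x60   # header field class_defs_size, followed by class_defs_off
CLASS_DEF_ITEM_SIZE = 32     # eight unsigned 32-bit fields per class_def_item


def class_def_flags(dex: bytes) -> list:
    """Return (source_file_removed, annotations_removed) for every class."""
    count, offset = struct.unpack_from("<II", dex, CLASS_DEFS_SIZE_OFF)
    results = []
    for i in range(count):
        fields = struct.unpack_from("<8I", dex, offset + i * CLASS_DEF_ITEM_SIZE)
        # Field order: class_idx, access_flags, superclass_idx, interfaces_off,
        # source_file_idx, annotations_off, class_data_off, static_values_off.
        source_file_idx, annotations_off = fields[4], fields[5]
        results.append((source_file_idx == NO_INDEX, annotations_off == 0))
    return results
```

A class that yields (True, True) shows both the invalid source_file_idx and the zeroed annotations_off described in this section.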

The final difference between the obfuscated and unobfuscated files was that any reference to positions or locals was removed from the obfuscated version. Most of this information was to aid in debugging, so its removal did not affect the application. Figure 17 shows the difference between the unobfuscated and obfuscated outputs of the DisplayMathsActivity class.

Field                   Unobfuscated (Class #1)                           Obfuscated (Class #0)
Class header:
class_idx               17                                                15
access_flags            1 (0x0001)                                        1 (0x0001)
superclass_idx          3                                                 2
interfaces_off          0 (0x000000)                                      0 (0x000000)
source_file_idx         13                                                -1
annotations_off         6296 (0x001898)                                   0 (0x000000)
Class:
Class descriptor        'Lcom/uaf/matt/testone/DisplayMathsActivity;'     'Lcom/uaf/matt/testone/DisplayMathsActivity;'
Access flags            0x0001 (PUBLIC)                                   0x0001 (PUBLIC)
Superclass              'Landroid/app/Activity;'                          'Landroid/app/Activity;'
Interfaces              -                                                 -
Static fields           -                                                 -
#0                      (in Lcom/uaf/matt/testone/DisplayMathsActivity;)  (in Lcom/uaf/matt/testone/DisplayMathsActivity;)
  name                  'fibCache'                                        'a'
  type                  'Ljava/util/ArrayList;'                           'Ljava/util/ArrayList;'
  access                0x000a (PRIVATE STATIC)                           0x000a (PRIVATE STATIC)
Instance fields         -                                                 -

Figure 16: Comparison Between dexdump Outputs Showing source_file_idx, annotations_off, and Variable Renaming.

Unobfuscated
catches       : (none)
positions     :
  0x0000 line=47
  0x0003 line=48
  0x0008 line=50
  0x000c line=51
  0x0013 line=53
  0x0018 line=54
  0x001d line=55
  0x0037 line=57
  0x003a line=58
locals        :
  0x000c - 0x003b reg=0 intent Landroid/content/Intent;
  0x0013 - 0x003b reg=1 number I
  0x0018 - 0x003b reg=2 textView Landroid/widget/TextView;
  0x0000 - 0x003b reg=5 this Lcom/uaf/matt/testone/DisplayMathsActivity;
  0x0000 - 0x003b reg=6 savedInstanceState Landroid/os/Bundle;

Obfuscated
catches       : (none)
positions     :
locals        :

Figure 17: Comparison Between dexdump Outputs Showing the Removal of Positions and Locals Information.

In summary, no BuildConfig class was found, and all source_file_idx values were set to an invalid ID. The annotations_off value was set to zero, indicating that all annotation information was removed for every class. No positions or locals were defined for any of the classes, and every variable was renamed. The full dexdump of the Proguard obfuscation is located in Appendix C.2.


6.2 Java Archive Grinder Fingerprints

The first set of differences between the obfuscated and unobfuscated files was that the source_file_idx value was set to an invalid ID and that all annotation information was removed. In addition, variables were renamed. Java Archive Grinder renamed each of the variables to a lower case letter of the alphabet. At the start of each class it restarted the variable names at the beginning of the alphabet. Figure 18 shows a comparison between the obfuscated and unobfuscated applications.

Field                   Unobfuscated (Class #1)                           Obfuscated (Class #0)
Class header:
class_idx               17                                                16
access_flags            1 (0x0001)                                        1 (0x0001)
superclass_idx          3                                                 2
interfaces_off          0 (0x000000)                                      0 (0x000000)
source_file_idx         13                                                -1
annotations_off         6296 (0x001898)                                   0 (0x000000)
Class:
Class descriptor        'Lcom/uaf/matt/testone/DisplayMathsActivity;'     'Lcom/uaf/matt/testone/DisplayMathsActivity;'
Access flags            0x0001 (PUBLIC)                                   0x0001 (PUBLIC)
Superclass              'Landroid/app/Activity;'                          'Landroid/app/Activity;'
Interfaces              -                                                 -
Static fields           -                                                 -
#0                      (in Lcom/uaf/matt/testone/DisplayMathsActivity;)  (in Lcom/uaf/matt/testone/DisplayMathsActivity;)
  name                  'fibCache'                                        'a'
  type                  'Ljava/util/ArrayList;'                           'Ljava/util/ArrayList;'
  access                0x000a (PRIVATE STATIC)                           0x000a (PRIVATE STATIC)
Instance fields         -                                                 -

Figure 18: Comparison Between dexdump Outputs Showing source_file_idx, annotations_off, and Variable Renaming.

Finally, any reference to positions or locals was removed from the obfuscated version. Figure 19 shows the difference between the unobfuscated and obfuscated outputs of the DisplayMathsActivity class.

Unobfuscated
catches       : (none)
positions     :
  0x0000 line=47
  0x0003 line=48
  0x0008 line=50
  0x000c line=51
  0x0013 line=53
  0x0018 line=54
locals        :
  0x000c - 0x003b reg=0 intent Landroid/content/Intent;
  0x0013 - 0x003b reg=1 number I
  0x0018 - 0x003b reg=2 textView Landroid/widget/TextView;
  0x0000 - 0x003b reg=5 this Lcom/uaf/matt/testone/DisplayMathsActivity;
  0x0000 - 0x003b reg=6 savedInstanceState Landroid/os/Bundle;

Obfuscated
catches       : (none)
positions     :
locals        :

Figure 19: Comparison Between dexdump Outputs Showing the Removal of Positions and Locals Information.


In summary, all source_file_idx values were set to an invalid ID. The annotations_off value was set to zero, indicating that all annotation information was removed for every class. No positions or locals were defined for any of the classes, and every variable was renamed. The full dexdump of the Java Archive Grinder obfuscation is located in Appendix C.3.

6.3 Zelix KlassMaster Fingerprints

The first difference between the obfuscated and unobfuscated files was that there was no reference to the BuildConfig class. Figure 20 shows that the first class found in the obfuscated version was 'La'. No reference to BuildConfig was found in the obfuscated version, possibly due to class renaming.

Field                   Unobfuscated                            Obfuscated
Class #0 header:
class_idx               16                                      (not shown)
access_flags            17 (0x0011)                             0 (0x0000)
superclass_idx          46                                      6
interfaces_off          0 (0x000000)                            0 (0x000000)
source_file_idx         10                                      -1
annotations_off         0 (0x000000)                            0 (0x000000)
class_data_off          11630 (0x002d6e)                        9619 (0x002593)
static_fields_size      1                                       0
instance_fields_size    0                                       0
direct_methods_size     1                                       1
virtual_methods_size    0                                       2
Class #0
Class descriptor        'Lcom/uaf/matt/testone/BuildConfig;'    'La;'

Figure 20: Comparison Between dexdump Outputs for Class #0.

Secondly, the source_file_idx value was set to an invalid ID and all annotation information was removed. In addition, each of the variables was renamed to a lower case letter of the alphabet. At the start of each class the variable names restarted at the beginning of the alphabet. Figure 21 shows the obfuscated version. Because of how it was obfuscated, it was impossible to find its unobfuscated counterpart. Additionally, any reference to positions or locals was removed from the obfuscated version.

The next difference between the obfuscated and unobfuscated files was that control flow obfuscation was implemented on some of the classes. The example in Figure 22 shows that a group of goto instructions were added to the end of the class.

Finally, every string in the obfuscated version was encrypted, and there was a call to a function that would decrypt the string when it was going to be used. Figure 23 shows the instructions for creating a new string, setting that new string to some encrypted set of characters, and then calling the function to decrypt the string.


Class #1 header:
class_idx            : 16
access_flags         : 1 (0x0001)
superclass_idx       : 3
interfaces_off       : 0 (0x000000)
source_file_idx      : -1
annotations_off      : 0 (0x000000)
class_data_off       : 9637 (0x0025a5)
static_fields_size   : 2
instance_fields_size : 0
direct_methods_size  : 2
virtual_methods_size : 2
Class #1
Class descriptor     : 'Lb;'
Access flags         : 0x0001 (PUBLIC)
Superclass           : 'Landroid/app/Activity;'
Interfaces           -
Static fields        -
#0                   : (in Lb;)
  name               : 'a'
  type               : 'I'
  access             : 0x0009 (PUBLIC STATIC)
#1                   : (in Lb;)
  name               : 'z'
  type               : '[Ljava/lang/String;'
  access             : 0x001a (PRIVATE STATIC FINAL)

Figure 21: Obfuscated dexdump Output Showing Class and Variable Renaming, source_file_idx and Annotation Obfuscation.

000c10: 0e00                                   |0070: return-void
000c12: 0176                                   |0071: move v6, v7
000c14: 28b0                                   |0072: goto 0022 // -0050
000c16: 1306 7d00                              |0073: const/16 v6, #int 125 // #7d
000c1a: 28ad                                   |0075: goto 0022 // -0053
000c1c: 0186                                   |0076: move v6, v8
000c1e: 28ab                                   |0077: goto 0022 // -0055
000c20: 1306 6d00                              |0078: const/16 v6, #int 109 // #6d
000c24: 28a8                                   |007a: goto 0022 // -0058
000c26: 0175                                   |007b: move v5, v7
000c28: 28d7                                   |007c: goto 0053 // -0029
000c2a: 1305 7d00                              |007d: const/16 v5, #int 125 // #7d
000c2e: 28d4                                   |007f: goto 0053 // -002c
000c30: 0185                                   |0080: move v5, v8
000c32: 28d2                                   |0081: goto 0053 // -002e
000c34: 1305 6d00                              |0082: const/16 v5, #int 109 // #6d
000c38: 28cf                                   |0084: goto 0053 // -0031
000c3a: 0132                                   |0085: move v2, v3
000c3c: 28aa                                   |0086: goto 0030 // -0056
000c3e: 0000                                   |0087: nop // spacer
000c40: 0001 0400 0000 0000 5300 0000 5500 ... |0088: packed-switch-data (12 units)
000c58: 0001 0400 0000 0000 2c00 0000 2e00 ... |0094: packed-switch-data (12 units)

Figure 22: Obfuscated dexdump Output Showing Flow Control Obfuscation.

In summary, no reference to the BuildConfig class was found and all source_file_idx values were set to an invalid ID. The annotations_off value was set to zero, indicating that all annotation information was removed for every class. No positions or locals were defined for any of the classes, and every variable and class was renamed. Lastly, every string was encrypted. The full dexdump of the KlassMaster obfuscation is located in Appendix C.4.

000b3c: 121c           |0006: const/4 v12, #int 1 // #1
000b3e: 1203           |0007: const/4 v3, #int 0 // #0
000b40: 1220           |0008: const/4 v0, #int 2 // #2
000b42: 230a 3300      |0009: new-array v10, v0, [Ljava/lang/String; // type@0033
000b46: 1a00 4c00      |000b: const-string v0, "[ D5RGx h ,ARCnAXR~" // string@004c
000b4a: 6e10 4800 0000 |000d: invoke-virtual {v0}, Ljava/lang/String;.toCharArray:()[C
000b50: 0c00           |0010: move-result-object v0
000b52: 2101           |0011: array-length v1, v0
000b54: 36c1 7300      |0012: if-gt v1, v12, 0085 // +0073

Figure 23: Obfuscated dexdump Output Showing String Encryption.

6.4 Allatori Fingerprints

Allatori is a commercial product, and the version that was used to obfuscate the Android application was a trial version. Therefore, throughout the dexdump output some, but not all, variable names were set to ALLATORIxDEMO. This would not happen if the Android application was obfuscated with a licensed version of Allatori. Because of this, any identifier named ALLATORIxDEMO was not considered a part of the fingerprint.

The first set of differences between the obfuscated and unobfuscated versions was that the source_file_idx value was set to 140, a value not in the string_ids array. All annotation information was also set to a value where no annotation information was located. In addition, the variables and classes were renamed. Allatori renamed each of the variables to a capital letter of the alphabet. Each class name started with 'Lcom/package/' followed by a capital letter. At the start of each class it restarted the variable names at the beginning of the alphabet. In addition, all of the variable access types changed to become a synthetic of that type. Figure 24 shows the obfuscated application.

Class #1 header:
class_idx            : 17
access_flags         : 17 (0x0011)
superclass_idx       : 45
interfaces_off       : 0 (0x000000)
source_file_idx      : 140
annotations_off      : 6548 (0x001994)
Class #1
Class descriptor     : 'Lcom/package/G;'
Access flags         : 0x0011 (PUBLIC FINAL)
Superclass           : 'Ljava/lang/Object;'
Interfaces           -
Static fields        -
#0                   : (in Lcom/package/G;)
  name               : 'ALLATORIxDEMO'
  type               : 'I'
  access             : 0x1019 (PUBLIC STATIC FINAL SYNTHETIC)
#1                   : (in Lcom/package/G;)
  name               : 'H'
  type               : 'I'
  access             : 0x1019 (PUBLIC STATIC FINAL SYNTHETIC)

Figure 24: Obfuscated dexdump Output Showing source_file_idx, annotations_off, Variable and Class Name Obfuscation.
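The synthetic marking in the access values of Figure 24 can be read from the individual flag bits. The sketch below decodes a subset of the Dalvik access flags; the table is intentionally incomplete and the function names are hypothetical.

```python
# A subset of the Dalvik access flag bits, enough to decode the values that
# appear in the figures of this section.
ACCESS_FLAGS = (
    (0x0001, "PUBLIC"),
    (0x0002, "PRIVATE"),
    (0x0004, "PROTECTED"),
    (0x0008, "STATIC"),
    (0x0010, "FINAL"),
    (0x1000, "SYNTHETIC"),
)


def decode_flags(value: int) -> list:
    """Return the names of the known flag bits set in value."""
    return [name for bit, name in ACCESS_FLAGS if value & bit]


def is_synthetic(value: int) -> bool:
    """Return True if the SYNTHETIC bit (0x1000) is set."""
    return bool(value & 0x1000)


if __name__ == "__main__":
    for flags in (0x000A, 0x0011, 0x1019):
        print(hex(flags), " ".join(decode_flags(flags)), is_synthetic(flags))
```

Decoding 0x1019 yields PUBLIC STATIC FINAL SYNTHETIC, matching the field access values in the Allatori output, whereas 0x000a from the unobfuscated application decodes to PRIVATE STATIC.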


Secondly, every string in the obfuscated version was encrypted, and there was a call to a function that would decrypt the string when it was going to be used. Figure 25 shows the instructions for creating a new string, setting that new string to some encrypted set of characters, and then calling the function to decrypt the string.

000f8a: 1a01 0a00      |0001: const-string v1, "Bq" // string@000a
000f8e: 7110 2000 0100 |0003: invoke-static {v1}, Lcom/package/e;.d:(Ljava/lang/String;)Ljava/lang/String; // method@0020
000f94: 0c01           |0006: move-result-object v1
000f96: 7110 6b00 0100 |0007: invoke-static {v1}, Ljava/security/MessageDigest;.getInstance:(Ljava/lang/String;)Ljava/security/MessageDigest; // method@006b

Figure 25: Obfuscated dexdump Output Showing String Encryption.

Finally, in the obfuscated version any reference to positions or locals was changed to different numbers that no longer followed the bytecode specification. Changing the numbers does not change how the application runs. In the unobfuscated version, the position line numbers increased by one, or by some small amount, from one entry to the next. Figure 26 shows the difference between the unobfuscated and obfuscated outputs.

Unobfuscated
catches       : 1
  0x0000 - 0x0038
    Ljava/io/IOException; -> 0x0039
positions     :
  0x0000 line=19
  0x0008 line=20
  0x000c line=21
  0x001a line=22
  0x001c line=23
  0x0022 line=24
  0x0039 line=26
  0x003a line=27
  0x003d line=29
locals        :
  0x000c - 0x0039 reg=0 conn Ljava/net/URLConnection;
  0x001c - 0x0039 reg=2 inputLine Ljava/lang/String;
  0x001a - 0x0039 reg=3 reader Ljava/io/BufferedReader;
  0x0008 - 0x0039 reg=4 url Ljava/net/URL;
  0x003a - 0x003d reg=1 e Ljava/io/IOException;
  0x0000 - 0x003f reg=8 this Lcom/uaf/matt/testone/getURLData;
  0x0000 - 0x003f reg=9 urls [Ljava/lang/String;

Obfuscated
catches       : 1
  0x0000 - 0x0038
    Ljava/io/IOException; ->
positions     :
  0x0000 line=182
  0x0008 line=92
  0x000c line=42
  0x001a line=188
  0x001c line=70
  0x0022 line=44
  0x0039 line=28
  0x003a line=104
  0x003d line=122
locals        :
  0x0000 - 0x003f reg=5 this Lcom/package/
  0x0000 - 0x003f reg=6 arg0 [Ljava/lang/String;

Figure 26: Comparison Between dexdump Outputs Showing Positions Obfuscation and the Removal of Locals.

In summary, all source_file_idx values were obfuscated, as was the annotations_off value. Every variable and class was renamed, the variable access types were changed to become synthetic, and all line numbers were obfuscated. Lastly, every string was encrypted. The full dexdump of the Allatori obfuscation is located in Appendix C.5.


7 Android Obfuscation Fingerprinting Tool

In order to automate the process of finding which obfuscator was used on the Android application, a Python program was developed. This program, titled aof.py [35], can be seen in Appendix D. The aof.py tool takes a valid APK and checks for a valid classes.dex file. Depending on what settings the user applies, the aof.py tool will look through the classes.dex file for the following obfuscations:

Source File IDX Obfuscation

Looks in the class header definitions for the source_file_idx field and checks to see if it was set to an invalid ID.

Removal of the BuildConfig Class

Looks in the strings section of the classes.dex file for any reference to BuildConfig.

Removal of Annotations

Looks in the class header definitions for the annotations_off field and checks to see if it was set to zero.

Removal of Debug Information

Looks in each class header definition for any debug information.

Class Name Obfuscation

Looks in the strings section of the classes.dex file for any strings that match any of these patterns: 'Lcom/package/[A-Z]', 'L[a-z]' or 'L[A-Z]'.

Variable Name Obfuscation

Looks in the strings section of the classes.dex file for any strings that match any of these patterns: '[a-z]' or '[A-Z]'.

Removal of Positions and Locals

Looks in every method for the offset to the location of the positions and locals information. If the offset is set to zero, then no positions or locals were defined for that method.

String Encryption

Looks in the strings section of the classes.dex file and, for every string, counts the non-printable characters and compares the count to the total string size. If more than 20% of the characters are non-printable, it is assumed that the string has been encrypted.
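Two of the checks listed above, class name matching and the string encryption heuristic, can be sketched as follows. This is not the code of aof.py from Appendix D; the function names are hypothetical, the patterns are anchored to the whole string with an optional trailing semicolon as an assumption, and "printable" is taken to mean membership in Python's string.printable.

```python
import re
import string

# Patterns from the Class Name Obfuscation check. Anchoring to the full string
# and allowing a trailing ';' are assumptions made for this sketch.
CLASS_NAME_PATTERNS = (r"Lcom/package/[A-Z];?", r"L[a-z];?", r"L[A-Z];?")

PRINTABLE = set(string.printable)


def looks_renamed_class(descriptor: str) -> bool:
    """Return True if a class descriptor matches one of the renaming patterns."""
    return any(re.fullmatch(p, descriptor) for p in CLASS_NAME_PATTERNS)


def looks_encrypted(text: str, threshold: float = 0.20) -> bool:
    """Return True if more than threshold of the characters are non-printable."""
    if not text:
        return False
    non_printable = sum(1 for ch in text if ch not in PRINTABLE)
    return non_printable / len(text) > threshold


if __name__ == "__main__":
    for name in ("La;", "Lcom/package/G;", "Lcom/uaf/matt/testone/BuildConfig;"):
        print(name, looks_renamed_class(name))
    for text in ("hello", "\x01\x02abc"):
        print(repr(text), looks_encrypted(text))
```

Keeping the threshold as a parameter makes it easy to see how sensitive the string encryption guess is to the 20% cutoff.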

The aof.py tool checks these obfuscations to develop an educated guess on which obfuscator (Proguard, Java Archive Grinder, Zelix KlassMaster, or Allatori) was most likely used. This guess is displayed as a percentage of how many of the obfuscations matched those that were discovered. While developing the aof.py tool, it was tested against the simple Android application that was developed and several Android applications that were downloaded from the Google Play store. Table 1 shows the resulting output of the aof.py tool examining the simple Android application that was written. The full results are located in Appendix E.1.


                    Proguard      JARG          KlassMaster   Allatori
proguard.apk        99.99%        87.49%        62.49%        24.99%
jarg.apk            87.49%        99.99%        49.99%        12.49%
klassmaster.apk     74.99%        62.49%        99.99%        37.49%
allatori.apk        24.99%        12.49%        37.49%        99.99%

Table 1: Summary of Results From the aof.py Tool When Run Against the Simple Android Application.
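The percentages in Table 1 are consistent with eight equally weighted checks, each worth 12.5%, with the result held just below a whole percentage. The sketch below shows one way such a score could be computed. It is an assumption made for illustration and not the algorithm of aof.py: the check names, the agreement rule, and the 0.01 adjustment are all hypothetical.

```python
# Hypothetical names for the eight checks performed by the tool.
CHECKS = (
    "source_file_idx",
    "buildconfig_removed",
    "annotations_removed",
    "debug_info_removed",
    "class_names",
    "variable_names",
    "positions_locals_removed",
    "string_encryption",
)


def match_percentage(observed: set, expected: set) -> float:
    """Score how well the observed checks agree with an expected fingerprint.

    A check agrees when it is either present in both sets or absent from both.
    Subtracting 0.01 keeps a perfect score at 99.99 rather than 100.
    """
    agreeing = sum(
        1 for check in CHECKS if (check in observed) == (check in expected)
    )
    return max(0.0, round(100.0 * agreeing / len(CHECKS) - 0.01, 2))


if __name__ == "__main__":
    # Hypothetical fingerprint and observation, for illustration only.
    expected = set(CHECKS)
    observed = set(CHECKS) - {"string_encryption"}
    print(match_percentage(observed, expected))
```

Under this rule a perfect agreement yields 99.99 and each disagreeing check lowers the score by 12.5. The rule is symmetric, whereas the off-diagonal values in Table 1 are not, so the actual tool evidently weighs the checks differently.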

The aof.py output shows that every one of the obfuscated Android applications matched its corresponding obfuscator at 99.99%. In addition, the obfuscated applications matched the other obfuscators to a lesser degree because of overlapping fingerprints: some of the obfuscators, but not all of them, produce the same obfuscation in the application. It is important to note that since several obfuscation methods can be done by hand, the aof.py tool will not report a 100% match to any of the obfuscators; the highest value it reports is 99.99%. Table 2 shows results from running the aof.py tool against Android applications from the Google Play Store. The full results are located in Appendix E.2.

                      Proguard      JARG          KlassMaster   Allatori
Google Chrome         37.49%        24.99%        37.49%        74.99%
Calculator            49.99%        37.49%        49.99%        62.49%
Facebook              37.49%        24.99%        37.49%        74.99%
LED Flashlight        62.49%        49.99%        62.49%        62.49%
Amazon Kindle         24.99%        12.49%        37.49%        87.49%
Google My Business    27.49%        24.99%        37.49%        87.49%
Instagram             37.49%        24.99%        24.99%        74.99%
Netflix               37.49%        24.99%        24.99%        74.99%
Pandora Radio         37.49%        24.99%        37.49%        87.49%
Clash of Clans        62.49%        49.99%        49.99%        49.99%

Table 2: Summary of Results from the aof.py Tool When Run Against the Android Applications from Google Play Store.

The Google Chrome, Digitalchemy Calculator, Facebook, Amazon Kindle, Google My Business, Instagram, Netflix, and Pandora Radio applications all matched most closely to the Allatori obfuscator, the LED Flashlight application matched equally to every obfuscator except Java Archive Grinder, and the Clash of Clans application matched most closely to Proguard. This result was expected because, out of all of the obfuscators, Allatori is the most developed and widely used. It is also important to note that none of the applications matched any of the obfuscators completely. This result was also expected. While it is not known at this point how any of the applications were obfuscated, it is assumed that all of them were. The obfuscation program used in each application can be predicted from the evidence of obfuscation in the application.


8 Conclusion

Using a simple Android application and building the application with each of the four obfuscation programs, Proguard, Java Archive Grinder, Zelix KlassMaster, and Allatori, it was possible to find distinct fingerprints for each obfuscator. I found that each of these obfuscators had similar obfuscation rules that ended with similar results. The fingerprints for each obfuscator are summarized here:

Proguard’s fingerprint included removing the BuildConfig class, setting source_file_idx values to an invalid ID, removing annotations, positions, and locals, and renaming variables to lower case letters.

Java Archive Grinder’s fingerprint included setting source_file_idx values to an invalid ID, and removing annotations, positions, and locals from each of the classes.

Zelix KlassMaster’s fingerprint included removing the BuildConfig class, positions, and locals, renaming classes and variables to lower case letters, and encrypting all string literals.

Allatori’s fingerprint included changing the access type of classes, fields, and methods to a synthetic version, and renaming classes and variables with upper case letters. In addition, Allatori was able to change the source_file_idx, annotations_off, positions, and locals to values that no longer correlate to the Dalvik bytecode standard.

Proguard, Java Archive Grinder, and Zelix KlassMaster had similar results for a few of the obfuscations. For instance, they all renamed every variable to a lower case letter starting at the beginning of the alphabet. All three programs managed to remove all information from the locals and positions for each class. Even though there are similarities among the obfuscators, there was enough of a difference to be able to identify which obfuscator was used.
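The overlap among these fingerprints can be made concrete with a short Python sketch. It encodes each summarized fingerprint as a set of traits and ranks the obfuscators by Jaccard similarity to a set of observed traits. This is an assumption-laden illustration and not how aof.py computes its percentages: the trait labels, `FINGERPRINTS`, `jaccard`, and `rank` are hypothetical names, and the trait sets follow the per-obfuscator summaries above.

```python
# Hypothetical trait labels derived from the fingerprint summaries in this section.
FINGERPRINTS = {
    "Proguard": {
        "buildconfig_removed", "invalid_source_file_idx", "annotations_removed",
        "positions_removed", "locals_removed", "lowercase_variable_names",
    },
    "Java Archive Grinder": {
        "invalid_source_file_idx", "annotations_removed",
        "positions_removed", "locals_removed",
    },
    "Zelix KlassMaster": {
        "buildconfig_removed", "positions_removed", "locals_removed",
        "lowercase_class_names", "lowercase_variable_names", "strings_encrypted",
    },
    "Allatori": {
        "synthetic_access_flags", "uppercase_class_names", "uppercase_variable_names",
        "nonstandard_source_file_idx", "nonstandard_annotations_off",
        "nonstandard_positions", "nonstandard_locals",
    },
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(observed):
    """Rank obfuscators from most to least similar to the observed traits."""
    scores = {name: jaccard(observed, fp) for name, fp in FINGERPRINTS.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Traits observed in a binary that shows exactly the Proguard fingerprint.
    for name, score in rank(FINGERPRINTS["Proguard"]):
        print(f"{name}: {score:.2f}")
```

Jaccard similarity is used here because the Java Archive Grinder traits are a subset of the Proguard traits, and a measure that only counted matched fingerprint traits would score both as a complete match.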

When running the aof.py tool against Android applications from the Google Play Store, it was found that Google Chrome, Digitalchemy Calculator, Facebook, Amazon Kindle, Google My Business, Instagram, Netflix, and Pandora Radio all matched to the Allatori Obfuscator while the LED Flashlight matched to every obfuscator except Java Archive Grinder, and Clash of Clans matched to Proguard. These results show that most of the Android applications used obfuscation that is similar to what Allatori can do.

The obfuscator fingerprints that were discovered could have been manually implemented instead of using one of the obfuscator programs. When a fingerprint was found, there was a chance that it was written by hand, which would result in a false positive. This means that the Android Obfuscation Fingerprint Tool will always return the best educated guess, that is, whichever obfuscator fingerprint was the most similar.

9 Further Research

The results of this project showed that the four obfuscators examined have overlapping fingerprints. However, there are many other obfuscators that need to be examined. Further studies could show that there are other obfuscators that have the same or unique fingerprints. This project would benefit from more testing with a broader range of obfuscators.


Most of the fingerprint identifiers were located in the class headers. While this discovery gives an indication of what obfuscator was used, looking more in depth at the bytecode could provide more detailed information about what obfuscator was used. This could possibly include what string decryption method was called or if any of the obfuscators used different
