(1)

Automated Analysis and Deobfuscation of Android Apps & Malware

Jurriaan Bremer (@skier_t)

Freelance Security Researcher

(2)

Introduction

I Who am I?

I Student (University of Amsterdam)

I Freelance Security Researcher

I Cuckoo Sandbox Developer (Malware Analysis System)

(4)

Introduction

Android?

I Smartphones

I Runs custom Linux

I Millions of Devices

I Hundreds of thousands of applications

I etc..

(6)

Android Applications

Android Applications?

I Application Package File (APK)

I Download from Google Play

I Zip file

I Some Metadata (Manifest, Images, ..)

I classes.dex

I All your code are belong to classes.dex

I More on this later.

I Resources

I Images

I Data files

I Native libraries
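
Since an APK is an ordinary zip file, its layout can be inspected with a few lines of Python. The sketch below is only an illustration: the grouping into dex, manifest, native and other entries is our own choice, and the demo archive is built in memory with placeholder bytes rather than taken from a real application.

```python
import io
import zipfile


def summarize_apk(apk_bytes):
    """Group the entries of an APK (a zip archive) by their role."""
    summary = {"dex": [], "manifest": [], "native": [], "other": []}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            if name.endswith(".dex"):
                summary["dex"].append(name)
            elif name == "AndroidManifest.xml":
                summary["manifest"].append(name)
            elif name.startswith("lib/") and name.endswith(".so"):
                summary["native"].append(name)
            else:
                summary["other"].append(name)
    return summary


def build_demo_apk():
    """Create a tiny in-memory zip that mimics an APK layout (placeholder data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("AndroidManifest.xml", b"<binary xml>")
        z.writestr("classes.dex", b"dex\n035\x00")
        z.writestr("res/drawable/icon.png", b"\x89PNG")
        z.writestr("lib/armeabi/libnative.so", b"\x7fELF")
    return buf.getvalue()


if __name__ == "__main__":
    print(summarize_apk(build_demo_apk()))
```

For a real sample, pass the bytes of the .apk file to summarize_apk instead of the demo archive.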

(12)

Running Code on Android

There are two ways.

I Running native libraries

I Extremely awesome

I This talk does not focus on native

I Running Dalvik Bytecode

I Dalvik is Compiled Java

I Dalvik != Java

I classes.dex

I (More on this later)

(14)

Dex File Format (I)

I Dalvik Executable Format

I classes.dex

I Container format to store Dalvik Bytecode with Metadata

I Various Data Pools

I Strings "Hello World"

I Classes Ljava/lang/String;

I Fields Ljava/lang/String;->value

I Prototypes (I)Ljava/lang/String;

I Lots of headers

I Complex Cross-references between fields and headers

I The Classname is a String

I A Prototype has a String as return value

I A method links to a Prototype, etc..
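
The "lots of headers" start with one fixed 0x70-byte file header that records the size and offset of each data pool. The sketch below unpacks that header with Python's struct module. The field names follow the public Dex format documentation; the demo header is synthetic and only there to make the example runnable. This is not the parser used by readdex.

```python
import struct

# Little-endian: 8-byte magic, u32 checksum, 20-byte SHA-1, then 20 u32 fields.
HEADER_FORMAT = "<8sI20s" + "I" * 20
HEADER_FIELDS = (
    "magic", "checksum", "signature", "file_size", "header_size", "endian_tag",
    "link_size", "link_off", "map_off",
    "string_ids_size", "string_ids_off",
    "type_ids_size", "type_ids_off",
    "proto_ids_size", "proto_ids_off",
    "field_ids_size", "field_ids_off",
    "method_ids_size", "method_ids_off",
    "class_defs_size", "class_defs_off",
    "data_size", "data_off",
)


def parse_dex_header(data):
    """Return the Dex file header as a dict of field name to value."""
    values = struct.unpack_from(HEADER_FORMAT, data, 0)
    return dict(zip(HEADER_FIELDS, values))


def make_demo_header():
    """Build a synthetic header (not a valid Dex file) for demonstration."""
    values = [b"dex\n035\x00", 0, b"\x00" * 20, 0x70, 0x70, 0x12345678,
              0, 0, 0, 3, 0x70] + [0] * 12
    return struct.pack(HEADER_FORMAT, *values)


if __name__ == "__main__":
    header = parse_dex_header(make_demo_header())
    for name in ("magic", "header_size", "string_ids_size", "string_ids_off"):
        print(name, header[name])
```

Each *_ids_off value points at a table of fixed-size entries, and those entries in turn hold indices into the other pools, which is where the cross-references come from.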

(16)

Dex File Format (II)

(17)

Dalvik Bytecode Example

public static void hello() {
    System.out.println("Hello AthCon");
}

->

sget-object v0, Ljava/lang/System;->out:Ljava/io/PrintStream;
const-string v1, "Hello AthCon"
invoke-virtual v0, v1, Ljava/io/PrintStream;->println(Ljava/lang/String;)V
return-void
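
The method reference in the invoke-virtual line packs class, method name, parameter types and return type into one string. A small sketch of how such a reference can be taken apart is shown here; the function names are ours and do not come from any existing tool.

```python
def split_types(descriptor):
    """Split a run of type descriptors such as 'ILjava/lang/String;[B'."""
    types, i = [], 0
    while i < len(descriptor):
        start = i
        while descriptor[i] == "[":      # array dimensions prefix the element type
            i += 1
        if descriptor[i] == "L":         # a class type runs up to the next ';'
            i = descriptor.index(";", i) + 1
        else:                            # a primitive type is a single character
            i += 1
        types.append(descriptor[start:i])
    return types


def parse_method_ref(ref):
    """Split 'Lcls;->name(params)ret' into its four components."""
    cls, rest = ref.split("->", 1)
    name, proto = rest.split("(", 1)
    params, ret = proto.split(")", 1)
    return {"class": cls, "name": name,
            "params": split_types(params), "return": ret}


if __name__ == "__main__":
    print(parse_method_ref("Ljava/io/PrintStream;->println(Ljava/lang/String;)V"))
```

The same splitting is what makes prototype-based matching possible later on, for instance recognising every method that takes one byte and returns a String.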

(22)

What’s your point?

I Decompiling is mostly trivial

I JEB - http://android-decompiler.com/

I Smali/Baksmali allows you to quickly modify code

I Based on .smali files, a wrapper around Dalvik bytecode

I Free and Open Source

https://code.google.com/p/smali/

(24)

Let’s welcome Obfuscators

I Commercial solutions

I Make Reverse Engineering harder

I Make automated analysis harder (what to look at?)

I What can we do..?

I Deobfuscate the obfuscated code!

But first..

(28)

Introduction to Our Tools

readdex(1)

I Custom utility to read .dex files

I Not very strict

I Works in cases where traditional tools fail

I E.g., dexdump, dex2jar, sometimes even JEB

I (Will report JEB bugs later)

I Handles the following cases correctly

I Invalid checksum hashes (fails dexdump)

I Unused opcodes (fails dex2jar/dexdump)

I Invalid Data Pool Indices (dexdump/dex2jar)

I Unicode function names (IDA Pro?!)

I Etc..
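
"Not very strict" can be pictured as collecting warnings instead of aborting. The sketch below checks the two integrity fields of the Dex header (Adler-32 checksum at offset 8, SHA-1 signature at offset 12) and reports mismatches without refusing to continue. It illustrates the idea only and is not the readdex implementation; the demo blob is synthetic.

```python
import hashlib
import struct
import zlib


def check_dex_integrity(data):
    """Return a list of warnings rather than raising on a bad checksum or hash."""
    warnings = []
    stored_checksum = struct.unpack_from("<I", data, 8)[0]
    stored_signature = data[12:32]
    # Adler-32 covers everything after the magic and the checksum field itself.
    if zlib.adler32(data[12:]) & 0xFFFFFFFF != stored_checksum:
        warnings.append("checksum mismatch")
    # SHA-1 covers everything after the magic, checksum and signature fields.
    if hashlib.sha1(data[32:]).digest() != stored_signature:
        warnings.append("signature mismatch")
    return warnings


def make_demo_blob():
    """Build a synthetic blob with consistent checksum and signature fields."""
    body = b"\x00" * 80
    signature = hashlib.sha1(body).digest()
    checksum = zlib.adler32(signature + body) & 0xFFFFFFFF
    return b"dex\n035\x00" + struct.pack("<I", checksum) + signature + body


if __name__ == "__main__":
    good = make_demo_blob()
    bad = good[:8] + b"\xff\xff\xff\xff" + good[12:]
    print(check_dex_integrity(good))
    print(check_dex_integrity(bad))
```

A strict tool would stop at the first mismatch; a lenient reader notes it and keeps parsing, which is what matters for deliberately malformed samples.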

(30)

Introduction to Our Libraries

I Dalvik Disassembler

I Basic Dalvik Emulator

I Supports most Dalvik Instructions

I Supports simple Java Classes (Strings, etc.)

I Dex File Parser

I Dex File Creator is Work in Progress

I Totalling more than 5kloc C (including readdex)

I Not to mention basic Python wrappers

I All of it will be Open Source soon (TM)

(36)

What’s next? This stuff is actually useful?

(37)
(38)

Class & Function Name Obfuscation

Used by, for example, Dexguard and Freedom.apk.

Welcome to China..

(41)
(42)

China?

I Unreadable identifiers

I Problematic when Modifying Dalvik Code (.smali)

I unchina.py to the rescue!

(44)

unchina.py

I Walks the Dex file

I Enumerates all classes and methods

I Replaces Chinese names with something readable

I "zmagic_" + number

I (For now, can be changed of course..)

I Simple Python script using some hacky functionality

I Rewrites parts of the Dex file as needed

I Writes a new Dex file (still kind of experimental)

I Sounds easier than it is!
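
The renaming step itself can be sketched independently of the Dex rewriting. The code below builds a mapping from unreadable identifiers to "zmagic" plus a number. The readability test (printable ASCII only) and the underscore in the prefix are assumptions made for this illustration; the actual unchina.py may decide differently.

```python
def is_readable(name):
    """Treat a name as readable when it only uses printable ASCII characters."""
    return all(0x20 < ord(ch) < 0x7F for ch in name)


def build_rename_map(names, prefix="zmagic_"):
    """Map every unreadable name to prefix + running number, in order of appearance."""
    mapping = {}
    for name in names:
        if not is_readable(name) and name not in mapping:
            mapping[name] = f"{prefix}{len(mapping)}"
    return mapping


if __name__ == "__main__":
    names = ["onCreate", "\u4e2d\u6587", "<init>", "\u4e2d\u6587", "\u9b54"]
    for old, new in build_rename_map(names).items():
        print(repr(old), "->", new)
```

The hard part mentioned on the slide is not this mapping but writing the new names back: the renamed strings have different lengths, so offsets in the Dex file have to be adjusted.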

(46)

unchina.py Demo

Demo of Unchina.py..

(48)

Obfuscated Strings (I)

Used by, for example, Dexguard, Whatsapp.apk, and Freedom.apk

I Instead of using Hardcoded Strings

I Build strings up at runtime

I Makes it harder to analyze

I Strings usually have meaningful information

I (Function names, Debug information, URLs, etc.)

I More code in the binary

I Normally one string

I Now entire functions for decoding, function calls, etc..

(52)

Obfuscated Strings (II)

We want to reconstruct the obfuscated strings

I Use our Simple Dalvik Emulator

I Combined with some heuristics (in the future)

I For now a bit hardcoded..

(53)

Three different String Obfuscation examples

I Whatsapp.apk

I Freedom.apk

I Dexguard (obfuscated Cyanide.apk)

(54)

Whatsapp (I)

#1 - Whatsapp.apk

I Defines <clinit> for lots of classes

I Class Initialization function

I Called when the class is being loaded

(55)
(56)

Whatsapp (III)

I We emulate the method

I Intercept the sput-object instruction

I sput-object v0, mb->z:Ljava/lang/String;

I "Assign Static Class Variable"

I We now have the deobfuscated string

I (or multiple strings, in some cases)

I Roughly 5000 strings deobfuscated!
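
The idea of emulating a method and hooking sput-object can be shown with a toy emulator. Everything below is a simplification made up for illustration: instructions are Python tuples rather than real bytecode, only a handful of opcodes are handled, and the decoding routine (a repeating xor key) and the method name mb->decode are hypothetical stand-ins, not the scheme Whatsapp uses. Only the field name mb->z is taken from the slide.

```python
def emulate(instructions, methods):
    """Run a tiny subset of Dalvik and capture every value stored by sput-object."""
    registers, result, captured = {}, None, {}
    for ins in instructions:
        op = ins[0]
        if op == "const-string":
            _, reg, value = ins
            registers[reg] = value
        elif op == "invoke-static":
            _, target, arg_regs = ins
            result = methods[target](*(registers[r] for r in arg_regs))
        elif op == "move-result-object":
            registers[ins[1]] = result
        elif op == "sput-object":
            _, reg, field = ins
            captured[field] = registers[reg]    # the hook: record the static store
        elif op == "return-void":
            break
        else:
            raise NotImplementedError(op)
    return captured


def xor_decode(text, key=(0x13, 0x37)):
    """Hypothetical decoder: xor every character with a repeating key."""
    return "".join(chr(ord(c) ^ key[i % len(key)]) for i, c in enumerate(text))


DECODE = "mb->decode(Ljava/lang/String;)Ljava/lang/String;"   # hypothetical name
CLINIT = [
    ("const-string", "v0", xor_decode("Hello AthCon")),  # xor is its own inverse
    ("invoke-static", DECODE, ["v0"]),
    ("move-result-object", "v0"),
    ("sput-object", "v0", "mb->z:Ljava/lang/String;"),
    ("return-void",),
]

if __name__ == "__main__":
    print(emulate(CLINIT, {DECODE: xor_decode}))
```

The emulator never needs to understand the decoding routine; it only has to execute it and watch what ends up in the static field.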

(59)

Freedom (I)

#2 - Freedom.apk

I Has xor decryption methods

(60)

Freedom (II)

(62)

Freedom (III)

I The xor decryption methods have a specific signature

I Their prototype is always (B)Ljava/lang/String;

I (Accepts an 8-bit integer, returns a String.)

I We scan every method in the Dex file

I Function Call to Decryption Method -> Decrypt the String

I Roughly 600 strings deobfuscated!
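
The scan can be sketched as follows: walk the instructions of a method, remember constants loaded into registers, and whenever a call targets a method with prototype (B)Ljava/lang/String;, decrypt the string for that argument. The tuple-based instruction format, the class and method names, and the decryptor (one xor key byte applied to a blob selected by index) are assumptions for this example; the real Freedom.apk routine is not reproduced here.

```python
DECRYPT_PROTO = "(B)Ljava/lang/String;"


def xor_decrypt(index, blobs, key):
    """Hypothetical decryptor: pick a blob by index and xor it with one key byte."""
    return bytes(b ^ key for b in blobs[index]).decode("ascii")


def recover_strings(instructions, blobs, key):
    """Decrypt the argument of every call whose prototype matches DECRYPT_PROTO."""
    constants, found = {}, []
    for idx, ins in enumerate(instructions):
        op = ins[0]
        if op in ("const/4", "const/16"):
            constants[ins[1]] = ins[2]           # remember constant register values
        elif op == "invoke-static":
            _, target, arg_regs = ins
            if target.endswith(DECRYPT_PROTO) and arg_regs[0] in constants:
                plain = xor_decrypt(constants[arg_regs[0]], blobs, key)
                found.append((idx, plain))
    return found


KEY = 0x5A
BLOBS = [bytes(c ^ KEY for c in b"Hello"), bytes(c ^ KEY for c in b"AthCon")]
METHOD = [
    ("const/4", "v0", 1),
    ("invoke-static", "La;->a(B)Ljava/lang/String;", ["v0"]),
    ("move-result-object", "v1"),
    ("const/4", "v0", 0),
    ("invoke-static", "La;->a(B)Ljava/lang/String;", ["v0"]),
    ("invoke-static", "La;->b(I)V", ["v0"]),     # different prototype, ignored
]

if __name__ == "__main__":
    print(recover_strings(METHOD, BLOBS, KEY))
```

The instruction index returned with each string is what a later patching step would need in order to replace the call with a plain const-string.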

(64)

Dexguard (I)

#3 - Dexguard is a Commercial Obfuscator

As an example we use an obfuscated Cyanide.apk

I Root exploit for some Motorola device

I (Thanks to Justin Case for the sample)

(65)
(66)

Dexguard (III)

I Dexguard initializes a lookup table on <clinit>

I Decrypts strings using this lookup table

I One dedicated decryption method

I Signature (III)Ljava/lang/String;

(68)

Dexguard (IV)

I Dexguard is a combination of Whatsapp and Freedom

I (With regards to techniques)

I First emulate <clinit>

I To obtain the lookup table

I Then scan every method in the Dex file

I Find function calls to the decryption method

I Decrypt strings!
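
The two stages can be sketched together: first run <clinit> to capture the lookup table, then scan for calls with the (III)Ljava/lang/String; signature and decrypt using that table. As before, the tuple instruction format is a stand-in for real bytecode, the class, method and field names are placeholders, and the decryption (offset, length and xor key into the table) is a hypothetical scheme chosen to keep the example short; it is not Dexguard's actual algorithm.

```python
DECRYPT_PROTO = "(III)Ljava/lang/String;"


def emulate_clinit(instructions):
    """Stage 1: run <clinit> and capture the byte arrays stored in static fields."""
    registers, statics = {}, {}
    for ins in instructions:
        op = ins[0]
        if op == "fill-array-data":
            registers[ins[1]] = bytes(ins[2])
        elif op == "sput-object":
            statics[ins[2]] = registers[ins[1]]
    return statics


def decrypt(table, offset, length, key):
    """Hypothetical scheme: xor a slice of the lookup table with a key byte."""
    return bytes(b ^ key for b in table[offset:offset + length]).decode("ascii")


def recover_strings(method, table):
    """Stage 2: decrypt every call to the (III) method with constant arguments."""
    constants, found = {}, []
    for ins in method:
        if ins[0] == "const":
            constants[ins[1]] = ins[2]
        elif ins[0] == "invoke-static" and ins[1].endswith(DECRYPT_PROTO):
            if all(r in constants for r in ins[2]):
                found.append(decrypt(table, *(constants[r] for r in ins[2])))
    return found


KEY = 0x21
TABLE_FIELD = "La;->t:[B"                        # placeholder field name
CLINIT = [
    ("fill-array-data", "v0", [c ^ KEY for c in b"HelloAthCon"]),
    ("sput-object", "v0", TABLE_FIELD),
]
CALL = "La;->d(III)Ljava/lang/String;"           # placeholder method name
METHOD = [
    ("const", "v0", 0), ("const", "v1", 5), ("const", "v2", KEY),
    ("invoke-static", CALL, ["v0", "v1", "v2"]),
    ("const", "v0", 5), ("const", "v1", 6),
    ("invoke-static", CALL, ["v0", "v1", "v2"]),
]

if __name__ == "__main__":
    table = emulate_clinit(CLINIT)[TABLE_FIELD]
    print(recover_strings(METHOD, table))
```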

(70)

Rewriting the Dex file (I)

Rewriting Whatsapp, Freedom and Dexguarded Cyanide.apk

I We have the decrypted strings

I Obfuscated code always takes more instructions than deobfuscated code

I Patching time..!
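
Because the obfuscated sequence is longer than its replacement, a patch can overwrite it in place and pad the rest with nops, so no branch offsets need to change. The sketch below does this on a raw byte string of method code. The opcode values and the 21c layout of const-string come from the Dalvik bytecode documentation; the function name and the choice to pad with nops rather than shrink the method are ours, and the caller is assumed to pass a region that starts and ends on instruction boundaries.

```python
import struct

OP_CONST_STRING = 0x1A   # format 21c: const-string vAA, string@BBBB


def patch_with_const_string(code, offset, length, register, string_idx):
    """Overwrite `length` bytes at `offset` with const-string plus nop padding."""
    if length < 4 or length % 2:
        raise ValueError("region must hold at least two 16-bit code units")
    if not 0 <= register <= 0xFF or not 0 <= string_idx <= 0xFFFF:
        raise ValueError("operand does not fit const-string (format 21c)")
    # First code unit: opcode in the low byte, register in the high byte.
    patch = struct.pack("<BBH", OP_CONST_STRING, register, string_idx)
    patch += b"\x00\x00" * ((length - 4) // 2)   # nop is the code unit 0x0000
    return code[:offset] + patch + code[offset + length:]


if __name__ == "__main__":
    original = bytes(range(1, 15))               # placeholder bytes, not real code
    patched = patch_with_const_string(original, 2, 10, register=1, string_idx=0x0123)
    print(patched.hex())
```

String indices above 0xFFFF would need const-string/jumbo (opcode 0x1B, six bytes), which this sketch rejects instead of handling.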

(71)

Rewriting the Dex file (II)

Some problems..

I We have to introduce new strings

I Extend the String Data Pool

(74)

Rewriting the Dex file (III)

Some problems..

I We have to introduce new strings

I Extend the String Data Pool

I Shuffle around half the Dex..

I Patch Dalvik instructions (straightforward)

I Remove obsolete functions

I String Decryption Methods are now unused

I Quite painful.. Dex file-wise

I *Work in Progress*

(75)

Rewriting the Dex file (IV)

I We move all strings to EOF

I We fixup other data structures
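
A minimal sketch of the "append at EOF, then fix up" idea follows. It encodes one ASCII string as a string_data_item, appends it to the file, and refreshes the three header fields that depend on the file contents (file_size, SHA-1 signature, Adler-32 checksum). It deliberately stops there: a real rewrite also has to add a string id entry pointing at the new offset, keep the string id table sorted, and update the map list and data section size, none of which is shown. The function names are ours, and the demo input is a blank header-sized buffer rather than a real Dex file.

```python
import hashlib
import struct
import zlib


def uleb128(value):
    """Encode an unsigned integer as ULEB128, as used for Dex string lengths."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def string_data_item(text):
    """Encode an ASCII string: ULEB128 length, the bytes, and a trailing NUL."""
    if "\x00" in text:
        raise ValueError("embedded NUL needs MUTF-8 handling, not covered here")
    return uleb128(len(text)) + text.encode("ascii") + b"\x00"


def append_string(dex, text):
    """Append a string at EOF and refresh file_size, signature and checksum."""
    offset = len(dex)
    out = bytearray(dex) + string_data_item(text)
    struct.pack_into("<I", out, 32, len(out))                  # file_size
    out[12:32] = hashlib.sha1(bytes(out[32:])).digest()        # signature
    checksum = zlib.adler32(bytes(out[12:])) & 0xFFFFFFFF
    struct.pack_into("<I", out, 8, checksum)                   # checksum
    return bytes(out), offset


if __name__ == "__main__":
    new_dex, where = append_string(bytes(0x70), "Hello AthCon")
    print(hex(where), new_dex[where:])
```

The order of the fix-ups matters: the signature covers file_size, and the checksum covers the signature, so they have to be recomputed in that sequence.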

(76)

Rewriting the Dex file (V)

Demo of reconstructing Dexguarded Cyanide.apk

(78)

How do we go from here?

I Generic Deobfuscation

I Based on Heuristics with Prototypes etc

I Classification based on stripped down binaries

I One binary can have many obfuscated representations

I Deobfuscate to something like the original binary

I Allows more accurate classification

I Did I mention plaintext strings?

I Plaintext Strings!

(81)

Automated Malware Analysis!

Yesterday a new malware was found in the wild..

http://www.securelist.com/en/blog/8106/The_most_sophisticated_Android_Trojan

(82)

High Expectations Asian Dad strikes again!

(83)

Backdoor.AndroidOS.Obad.a

I Seems like a pretty advanced Android malware

I Multiple obfuscation layers (for strings)

I Got a start, but far from complete..

I *Quick Demo*

I Some Plaintext Strings..

I Tries to enable Bluetooth

I getSimSerialNumber

I ..

(84)

Questions?

Any questions?

Cheers to..

p1ra, nex‘, rep, blasty, thuxnder, diff-, jcase, George, jduck, ..

Interested in Android Security?

Join #droidsec on irc.freenode.org (thanks jduck!)
