Malware:
Prevenire (e curare) con strumenti open-source
Roberto Paleari
Linux Day 2011 Modena, 22 ottobre 2011
Who am I?
PhD
Research activities I Malware analysis
I Systems & applications security
Laboratorio di Sicurezza e Reti (LaSeR)
Senior security consultant Penetration testing Reverse engineering
The rise of malicious code
2003 2004 2005 2006 2007 2008 2009 2010 3,500,000 7,000,000 10,500,000 14,000,000 17,500,000 21,000,000 Numb er of new signatures PeriodThe rise of malicious code
Today malware is a very lucrative activity
The rise of malicious code
Who lasts longer earns the most . . .
Long lasting malware
Spread/replicate fast
Hide the presence on the system
Obfuscate the code (e.g., encryption,
polymorphism, metamorphism)
Traditional signature-based approaches arenot effective anymore!
Long lasting malware
Spread/replicate fast
Hide the presence on the system
Obfuscate the code (e.g., encryption,
polymorphism, metamorphism)
Traditional signature-based approaches arenot effective anymore!
Wait. . .
+
??
Wait. . .
+
??
Why malware infects ?
Wait. . .
+
??
Wait. . .
+
??
So, why should we care about malware?
Linux isnot free from malware!
The majority of embedded devices runs Linux
Many Linux users use Windows emulators (e.g., WINE) or virtual machines
How malware is detected today?
Application code
+ +
+ +
How malware is detected today?
Application code
+ +
+ +
+ +
A signature is a sequence of bytes that identifies a malicious sample
How malware is detected today?
Application code
+ +
+ +
+ +
Anti-malware tools are shipped with a database of known signatures
How malware is detected today?
+ +
+ +
+ +
When a signature is found, the application is considered to be infected
Why signature-based detection is not effective anymore?
E.g., Packing
Malicious code is hidden behind 1+ compression layers
Unpacking is performed at run-time
Malicious code
>80% malware samples are packed
Why signature-based detection is not effective anymore?
E.g., Packing
Malicious code is hidden behind 1+ compression layers
Unpacking is performed at run-time Malicious
code
>80% malware samples are packed
∼200 packer families, with 2000 variants
Why signature-based detection is not effective anymore?
E.g., Packing
Malicious code is hidden behind 1+ compression layers
Unpacking is performed at run-time
Malicious code Malicious code Unpacking routine
>80% malware samples are packed
Why signature-based detection is not effective anymore?
E.g., Packing
Malicious code is hidden behind 1+ compression layers
Unpacking is performed at run-time
Malicious code Malicious code Unpacking routine Unpacking routine
>80% malware samples are packed
∼200 packer families, with 2000 variants
Why signature-based detection is not effective anymore?
E.g., Packing
Malicious code is hidden behind 1+ compression layers
Unpacking is performed at run-time
Malicious code Malicious code Unpacking routine Unpacking routine
>80% malware samples are packed
Current trend for malware analysis and detection
Static analysis is either too onerous or impossible
(malware is obfuscated & self-modifying)
Dynamic, behavior-based malware analysis
Run the suspicious program and monitor its execution
Does itbehavemaliciously?
Current trend for malware analysis and detection
Static analysis is either too onerous or impossible
(malware is obfuscated & self-modifying)
Dynamic, behavior-based malware analysis
Run the suspicious program and monitor its execution
Dynamic analysis
Run-time monitoring of closed-source applications
Suspicious applications should be alwaysexecuted inside a
controlled environment!
What kind of events can be observed?
Interaction with the environment(e.g., files, networking
activity, library functions)
Interaction with the OS (e.g., system calls)
Low-level information(e.g., registers and memory values)
Dynamic analysis
Run-time monitoring of closed-source applications
Suspicious applications should be alwaysexecuted inside a
controlled environment!
What kind of events can be observed?
Interaction with the environment(e.g., files, networking
activity, library functions)
Interaction with the OS(e.g., system calls)
Dynamic analysis
Tools
All Linux distribution include several run-time monitoring tools I lsof,netstat,/proc/<pid>/*,tcpdump, . . .
I ltrace
I strace
Dynamic analysis
Tools
All Linux distribution include several run-time monitoring tools I lsof,netstat,/proc/<pid>/*,tcpdump, . . .
I ltrace
I strace
$ ltrace /bin/ls 2>&1 | head -n 20
__libc_start_main(0x804e5f0, 1, ... <unfinished...>
setlocale(6, ‘‘) = it_IT.ISO-8859-15@euro
bindtextdomain(coreutils, /usr/share/locale) = /usr/share/locale
textdomain(coreutils) = coreutils __cxa_atexit(0x8051860, 0, 0, 0xb7f01ff4, 0xbf95cf98) = 0 isatty(1) = 0 getenv(QUOTING_STYLE) = NULL ... getenv(BLOCK_SIZE) = NULL getenv(COLUMNS) = NULL
Dynamic analysis
Tools
All Linux distribution include several run-time monitoring tools I lsof,netstat,/proc/<pid>/*,tcpdump, . . .
I ltrace I strace
$ strace /bin/ls 2>&1 | head -n 10
execve(/bin/ls, [/bin/ls], [/* 34 vars */]) = 0
brk(0) = 0x8e93000
access(/etc/ld.so.nohwcap, F_OK) = -1 ENOENT (No such file)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xb7f74000
access(/etc/ld.so.preload, R_OK) = -1 ENOENT (No such file)
open(/etc/ld.so.cache, O_RDONLY) = 3
fstat64(3, st_mode=S_IFREG|0644, st_size=58489, ...) = 0 mmap2(NULL, 58489, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f65000
close(3) = 0
access(/etc/ld.so.nohwcap, F_OK) = -1 ENOENT (No such file)
...
Dynamic analysis
Tools
All Linux distribution include several run-time monitoring tools I lsof,netstat,/proc/<pid>/*,tcpdump, . . .
I ltrace I strace
. . . and what about ?
Most of these tools have a counterpart for
For a nifty Windows system call tracer, try WUSSTrace :-)
Dynamic analysis
Do you want something more?
Introducing ProcessTap
Adynamic tracing framework
Leverages dynamic binary instrumentation to intercept the events of interest
ProcessTap scripts are written in Python!
Dynamic analysis
Do you want something more?
Introducing ProcessTap
Adynamic tracing framework
Leverages dynamic binary instrumentation to intercept the events of interest
Dynamic analysis
ProcessTap: An example
What ProcessTap scripts look like?
1 #!/usr/bin/env processtap 2 3 include("stdlib.h") 4 5 @function_entry(function_name == "malloc") 6 def malloc_entry(ctx):
7 print "[F] >>> %s called from %.8x with argument %u" % \
8 (ctx.function_name, ctx.caller, ctx.args[0])
9
10 @syscall_exit(syscall_name >> ["open", "close"])
11 def fexit(ctx):
12 print "[S] <<< %s returning to %.8x" % (ctx.syscall_name, ctx.regs.RIP)
Dynamic analysis
ProcessTap: An example
What ProcessTap scripts look like?
1 $ ./examples/malloctrace.pcap -- /usr/bin/id
2 [*] Started parser (listening on 45352)
3 [*] Executable file: /usr/bin/id
4 [*] PTAP file: ./examples/malloctrace.pcap
5 [*] Loaded 344 system calls
6 [*] Parsing ’stdlib.h’ (123 functions)
7 [*] Loaded probes:
8 [*] function_entry
9 [+] (function_name == @malloc) malloc_entry
10 [*] syscall_exit
11 [+] ((syscall_name == 5) | (syscall_name == 6)) fexit
12 [*] Parsing ’/usr/bin/id’ [0000000008048114-000000000804f544]
13 [*] Parsing ’/lib/i386-linux-gnu/ld-2.13.so’ [00000000b5792000-00000000b57ae908]
14 ...
15 [S] <<< open returning to b771aef4
16 [S] <<< close returning to b771af2d
17 [*] Parsing ’/lib/i386-linux-gnu/libselinux.so.1’ [00000000b52ef000-00000000b530dc3c]
18 [*] Parsing ’/lib/i386-linux-gnu/i686/cmov/libc-2.13.so’ [00000000b5112000-00000000b526b978]
19 [*] Parsing ’/lib/i386-linux-gnu/i686/cmov/libdl-2.13.so’ [00000000b4bf6000-00000000b4bf9078]
20 ...
Try it!
Limitations of dynamic approaches
Dynamic solutions are not free from limitations, especially when dealing with malicious applications
Limitations of dynamic approaches
User-level approaches are not enough
often includes kernel-level components In these situations, user-level anti-malware solutions are ineffective
Non-transparency The analysis tool can be detected
If detects the analyzer, it behaves like
Limitations of dynamic approaches
User-level approaches are not enough
often includes kernel-level components In these situations, user-level anti-malware solutions are ineffective
Non-transparency The analysis tool can be detected
Limitations of dynamic approaches
High run-time overhead
End hosts have strict real-time constraints If the analysis takes too much, the detector assumes a suspicious program is
How to remediate an infected system? Remediation procedures included in
anti-malware tools are often impecise Can we generate these procedures automatically?
Limitations of dynamic approaches
High run-time overhead
End hosts have strict real-time constraints If the analysis takes too much, the detector assumes a suspicious program is
How to remediate an infected system? Remediation procedures included in
anti-malware tools are often impecise Can we generate these procedures automatically?
Transparent and efficient analysis
How to monitor the execution of a suspicious program?
(worst-case scenario: kernel-level malware)
Kernel-based analysis Out-of-the-box analysis
Transparent and efficient analysis
How to monitor the execution of a suspicious program?
(worst-case scenario: kernel-level malware)
Transparent and efficient analysis
How to monitor the execution of a suspicious program?
(worst-case scenario: kernel-level malware)
Kernel-based analysis Out-of-the-box analysis Roberto Paleari Malware: Prevenire (e curare) con strumenti open-source 15
Kernel-based approaches
The analysis tool is implemented as a kernel module
To analyze kernel-level code, these approaches leverage another kernel-level module . . .
Kernel-based approaches
The analysis tool is implemented as a kernel module
To analyze kernel-level code, these approaches leverage another kernel-level module . . .
. . . it is like a dog chasing its tail!
Out-of-the-box approaches
The analyzer leveragesVM-introspection techniques
The target system must be already running inside a VM
can detect when it is running inside a VM!
Out-of-the-box approaches
The analyzer leveragesVM-introspection techniques
The target system must be already running inside a VM
can detect when it is running inside a VM!
Source: A fistful of red-pills (R. Paleari, L. Martignoni, G. Fresi Roglia, D. Bruschi) Roberto Paleari Malware: Prevenire (e curare) con strumenti open-source 17
How to analyze kernel-level malware?
Kernel-based analysis Out-of-the-box analysis
?
How to analyze kernel-level malware?
Exploit hardware support for virtualization to
achieve both
efficiency
and
transparency
Hardware-assisted virtualization in a nutshell (Intel VT-x)
App App App Kernel R3 R0 App App App Kernel Hypervisor R3 R0 Ro ot mo deHardware-assisted virtualization in a nutshell (Intel VT-x)
App App App Kernel R3 R0 App App App Kernel Hypervisor R3 R0 Ro ot mo deHardware-assisted virtualization in a nutshell (Intel VT-x)
App App App Kernel R3 R0 App App App Kernel Hypervisor R3 R0 Ro ot mo deThe OS needs not to be modified
The hardware guarantees transparency & isolation Minimal overhead
HyperDbg
An overview
Operating system kernel
User mode Kernel mode User process User process Non-root mode Root mode Framework Analysis tool Non-root mode Root mode Framework Analysis tool Exit Insp ect
HyperDbg
An overview
Operating system kernel
User mode Kernel mode User process User process Non-root mode Root mode Framework Analysis tool Non-root mode Root mode Framework Analysis tool Exit Insp ect
HyperDbg
An overview
Operating system kernel
User mode Kernel mode User process User process Non-root mode Root mode Framework Analysis tool Non-root mode Root mode Framework Analysis tool Exit Insp ect
The analyzed OS needs not to be modifiedat all
(i.e., the approach can be applied to closed-source OSes)
HyperDbg
An overview
Operating system kernel
User mode Kernel mode User process User process Non-root mode Root mode Framework Analysis tool Non-root mode Root mode Framework Analysis tool Exit Insp ect
HyperDbg
An overview
Operating system kernel
User mode Kernel mode User process User process Non-root mode Root mode Framework Analysis tool Non-root mode Root mode Framework Analysis tool Exit Insp ect
At the end of the analysis, the infrastructure
can be removed on-the-fly
HyperDbg
Implementation
User interface
We cannot rely on the guest OS graphic libraries
A small VGA driver to interact with the system’s video card The driver is neither OS nor hardware dependent
User interaction
An user can activate HyperDbg by pressing an hot-key
In non-root mode keystrokes are intercepted by leveraging VT-x
functionalities(i.e., IOOperationPort events)
HyperDbg
Implementation
User interface
We cannot rely on the guest OS graphic libraries
A small VGA driver to interact with the system’s video card The driver is neither OS nor hardware dependent
User interaction
An user can activate HyperDbg by pressing an hot-key
In non-root mode keystrokes are intercepted by leveraging VT-x
functionalities(i.e., IOOperationPort events)
In root mode a simple driver reads the keystrokes
HyperDbg
Try it!
http://code.google.com/p/hyperdbg
Remediation of a system infected by a malware
Remediation of a system infected by a malware
To reinstall the system or to remediate the infection?
A sample malware
1. Generates random file and key names
I file = po + rand() + rand() + rand() + .exe
I key = rand() % 2 ? qv : vq
2. Drop malicious exe
I c:\windows\ + file
3. Start the new exe at boot
I KEY LOCAL MACHINE + ...\Windows\CurrentVersion\Run\ +
key, file
4. Infect system dll
I user32.dll
5. Hijack network traffic
I c:\windows\system32\drivers\etc\hosts, www.google.com, www.citibank.com
Sound and complete remediation of malware infection
1. Generates random file and key names
I file = po + rand() + rand() + rand() + .exe
I key = rand() % 2 ? qv : vq
2. Drop malicious exe Remove malicious exe
I c:\windows\ + file
3. Start the new exe at boot Remove registry key
I KEY LOCAL MACHINE + ...\Windows\CurrentVersion\Run\ + key, file
4. Infect system dll Restore original dll
I user32.dll
5. Hijack network traffic Remove malicious mappings
I c:\windows\system32\drivers\etc\hosts,
www.google.com, www.citibank.com
6. Delete main exe
Automatic generation of remediation procedures
Behavior monitoring High-level behavior analysis S 1 S 2 S 3 S 4 Behavior clustering B1 B2 B3 B4 Cluster generalization Remediation procedure generation C1 C2 C3 C1 C2 C3Automatic generation of remediation procedures
Behavior monitoring High-level behavior analysis S 1 S 2 S 3 S 4 Behavior clustering B1 B2 B3 B4 Cluster generalization Remediation procedure generation C1 C2 C3 C1 C2 C3Source: Automatic Generation of Remediation Procedures for Malware Infections (R. Paleari et al.)
Automatic generation of remediation procedures
Behavior monitoring High-level behavior analysis S 1 S 2 S 3 S 4 Behavior clustering B1 B2 B3 B4 Cluster generalization Remediation procedure generation C1 C2 C3 C1 C2 C3Automatic generation of remediation procedures
Behavior monitoring High-level behavior analysis S 1 S 2 S 3 S 4 Behavior clustering B1 B2 B3 B4 Cluster generalization Remediation procedure generation C1 C2 C3 C1 C2 C3Source: Automatic Generation of Remediation Procedures for Malware Infections (R. Paleari et al.)