Drilling down to attacks on the binary code itself is the next stop. A debugger is a piece of software that takes control of another program and allows things like stopping at certain points in the execution, changing variables, and, in some cases, even changing the machine code on the fly. How much the debugger can do may depend on whether the symbol table is attached to the executable (for most binary-only files, it won’t be). Without symbols, the debugger may still be able to perform some functions, but you may have to do a bunch of manual work, like setting breakpoints on memory addresses rather than function names.
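To make "taking control of another program" a little more concrete, here is a minimal sketch of the mechanism a debugger such as gdb builds on under Linux, the ptrace(2) system call. It assumes a Linux system where ptrace is permitted; the variable `secret` and the function `run_and_patch` are hypothetical names invented for this illustration, and a real debugger does far more than this.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical variable that the "debugged" child reports as its exit status. */
static volatile long secret = 1;

/* Fork a child, pause it, overwrite `secret` in the child's memory, then
 * let it continue. Returns the child's exit status, or -1 on any error. */
int run_and_patch(long new_value)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced, then stop so the parent gets control. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(255);
        raise(SIGSTOP);
        _exit((int)secret);      /* reached only after the parent resumes us */
    }

    /* Parent: wait until the child has stopped. */
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    /* After fork(), `secret` sits at the same address in both processes,
     * so the parent's address can be used to write into the child. */
    if (ptrace(PTRACE_POKEDATA, pid, (void *)&secret, (void *)new_value) == -1)
        return -1;

    /* Resume the child and collect its exit status. */
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1)
        return -1;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status);
}
```

Left alone, the child would exit with status 1. Called as `run_and_patch(42)`, the parent changes the variable while the child is paused, so the child should exit with 42 instead (only the low 8 bits of an exit status are reported). This is the same idea as changing a variable from a debugger prompt, minus the symbol lookup.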
A decompiler (the term is used loosely here to include disassemblers) is a program that takes binary code and turns it into something more readable. Strictly speaking, a disassembler produces assembly language, while a decompiler tries to produce a higher-level language. Some tools can generate rudimentary C code, but the result ends up being pretty rough. A decompiler attempts to deduce some of the original source code from the binary (object) code, but a lot of information that programmers rely on during development, such as variable names, is lost during the compilation process. Often, a decompiler can only give variables meaningless numeric names, unless the symbol tables are there.
In practice, this more or less means that you have to be able to read assembly code for a decompiler to be useful to you. Having said that, let’s take a look at an example or two of what a decompiler produces.
One commercial decompiler for Windows that has a good reputation is IDA Pro, from DataRescue (shown in Figure 4.2). It’s capable of decompiling code for a large number of processor families, including the Java Virtual Machine.
We’ve had IDA Pro disassemble pbrush.exe (Paintbrush) here, and we’ve scrolled to the section where IDA Pro has identified the external functions that pbrush.exe calls. For OSs that support shared libraries (like Windows and all the modern UNIX variants), an executable program has to keep a list of the libraries it will need. This list is usually human readable if you look inside the binary file. The OS needs this list so it can load the libraries for the program’s use. Decompilers take advantage of this, and in most cases are able to insert the names into the code to make it easier for people to read.
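The claim that the library list is human readable can be checked with a few lines of code. The following is a rough sketch, in the spirit of the UNIX strings utility, that scans a buffer for runs of printable characters ending in a given suffix such as ".dll". The function name `find_library_names` and its parameters are made up for this example, and the match is deliberately naive: it is case sensitive and does not parse the executable's real import table.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count runs of printable ASCII in buf that end with suffix (for example
 * ".dll"). The first matching run is copied into first, truncated to
 * first_sz - 1 characters, if first is non-NULL. */
size_t find_library_names(const unsigned char *buf, size_t len,
                          const char *suffix,
                          char *first, size_t first_sz)
{
    size_t count = 0;
    size_t suffix_len = strlen(suffix);
    size_t start = 0;               /* where the current printable run began */
    size_t i;

    for (i = 0; i <= len; i++) {
        size_t run_len;

        /* Keep extending the run while we see printable bytes. */
        if (i < len && isprint(buf[i]))
            continue;

        /* The run ended, either at a non-printable byte or at end of buffer. */
        run_len = i - start;
        if (suffix_len > 0 && run_len >= suffix_len &&
            memcmp(buf + i - suffix_len, suffix, suffix_len) == 0) {
            if (count == 0 && first != NULL && first_sz > 0) {
                size_t n = run_len < first_sz - 1 ? run_len : first_sz - 1;
                memcpy(first, buf + start, n);
                first[n] = '\0';
            }
            count++;
        }
        start = i + 1;
    }
    return count;
}
```

Reading an executable into memory and passing it to this function would be one way to list candidate library names without a decompiler. A tool like IDA Pro goes further and ties each imported function name to the places in the code that call it.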
We don’t have the symbol table for pbrush.exe, so most of this file is unnamed assembly code. A short, limited trial version of IDA Pro is available for download at:
www.datarescue.com/idabase/ida.htm
A very popular debugger for Windows is SoftICE, from Numega. Information about that product can be found at:
www.numega.com/drivercentral/default.asp
To contrast, I’ve prepared a short C program (the classic “Hello World”) that I’ve compiled with symbols, to use with the GNU Debugger (gdb). Here’s the C code:
#include <stdio.h>

int main ()
{
    printf ("Hello World\n");
    return (0);
}
Then, I compile it with the debugging information turned on (the -g option):
[ryan@rh ryan]$ gcc -g hello.c -o hello
[ryan@rh ryan]$ ./hello
Hello World
I then run it through gdb. Comments inline:
[ryan@rh ryan]$ gdb hello
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) break main
I set a breakpoint at the “main” function. As soon as the program enters main, execution pauses, and I get control. Note that the breakpoint is set before the program is run.
Breakpoint 1 at 0x80483d3: file hello.c, line 5.
(gdb) run
Running the program.
Starting program: /home/ryan/hello
Breakpoint 1, main () at hello.c:5
5           printf ("Hello World\n");
(gdb) disassemble
Program execution pauses, and I issue the “disassemble” command.
Dump of assembler code for function main:
0x80483d0 <main>:       push %ebp
0x80483d1 <main+1>:     mov %esp,%ebp
0x80483d3 <main+3>:     push $0x8048440
0x80483d8 <main+8>:     call 0x8048308 <printf>
0x80483dd <main+13>:    add $0x4,%esp
0x80483e0 <main+16>:    xor %eax,%eax
0x80483e2 <main+18>:    jmp 0x80483e4 <main+20>
0x80483e4 <main+20>:    leave
0x80483e5 <main+21>:    ret
End of assembler dump.
This is what “hello world” looks like in x86 Linux assembly. Examining your own programs in a debugger is a good way to get used to disassembly listings.
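For a feel of what the disassemble command is doing behind the scenes, here is a toy decoder for a handful of the instructions in the listing above. It is only a sketch: the opcode bytes are my reconstruction from the standard 32-bit x86 encodings (they are consistent with the address gaps in the listing, but were not read from the actual hello binary), the function name `decode_one` is hypothetical, and a real disassembler handles hundreds of instruction forms rather than six.

```c
#include <stddef.h>
#include <stdio.h>

/* Decode one instruction from a tiny, hand-picked subset of 32-bit x86.
 * Writes AT&T-style text into out and returns the number of bytes
 * consumed, or 0 if the bytes are not in the subset. */
size_t decode_one(const unsigned char *p, size_t n, char *out, size_t out_sz)
{
    /* Single-byte instructions. */
    if (n >= 1 && p[0] == 0x55) { snprintf(out, out_sz, "push %%ebp"); return 1; }
    if (n >= 1 && p[0] == 0xc9) { snprintf(out, out_sz, "leave");      return 1; }
    if (n >= 1 && p[0] == 0xc3) { snprintf(out, out_sz, "ret");        return 1; }

    /* Two-byte register-to-register forms (opcode plus ModR/M byte). */
    if (n >= 2 && p[0] == 0x89 && p[1] == 0xe5) {
        snprintf(out, out_sz, "mov %%esp,%%ebp");
        return 2;
    }
    if (n >= 2 && p[0] == 0x31 && p[1] == 0xc0) {
        snprintf(out, out_sz, "xor %%eax,%%eax");
        return 2;
    }

    /* push imm32: opcode 0x68 followed by a little-endian 32-bit value. */
    if (n >= 5 && p[0] == 0x68) {
        unsigned int imm = (unsigned int)p[1]
                         | (unsigned int)p[2] << 8
                         | (unsigned int)p[3] << 16
                         | (unsigned int)p[4] << 24;
        snprintf(out, out_sz, "push $0x%x", imm);
        return 5;
    }

    return 0;   /* not in our subset */
}
```

Feeding it the bytes 55 89 e5 68 40 84 04 08, one call at a time and advancing by the returned length, should give back the first three lines of main shown above. Note how the return value explains the address column: each instruction starts where the previous one ended.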
(gdb) s
printf (format=0x8048440 "Hello World\n") at printf.c:30
printf.c: No such file or directory.
I then “step” (the s command) into the next line, which is the printf call. gdb indicates that it doesn’t have the printf source code, so it can’t give any further details.
(gdb) s
31      in printf.c
(gdb) s
Hello World
35      in printf.c
(gdb) c
Continuing.