Understanding Shellcode
Solutions in this Chapter:
■ Overview of Shellcode
■ The Addressing Problem
■ The Null Byte Problem
■ Implementing System Calls
■ Remote Shellcode
■ Local Shellcode
Introduction
Writing shellcode requires an in-depth understanding of assembly language for the target architecture. Usually, different shellcode is required for each version of each operating system on each hardware architecture. This is why public exploits tend to take advantage of a vulnerability on a highly specific target system and why a long list (albeit usually very incomplete) of target version/OS/hardware combinations is included in these exploits. Within shellcode, system calls are used to perform actions. Therefore, most shellcode is operating-system dependent, because most operating systems use different system calls. Reusing code from the program into which the shellcode is injected is possible but difficult, and not often seen. As you saw in the previous chapter, it is recommended that you first write the shellcode in C using system calls only and then write it in assembly. This forces you to think about the system calls used and makes translating the C program into assembly easier.
After an overview of the assembly programming language, this chapter looks at two common problems that shellcode must overcome: the addressing problem and the null byte problem. It concludes with some examples on writing both remote and local shellcode for the 32-bit Intel Architecture (IA32) platform (also referred to as x86).
An Overview of Shellcode
Shellcode is the code executed when a vulnerability has been exploited. Shellcode is usually restricted by size constraints, such as the size of a buffer sent to a vulnerable application, and is written to perform a highly specific task as efficiently as possible. Depending on the goal of the attacker, efficiency, such as the minimum number of bytes sent to the target application, may be traded off for the versatility of having a system call proxy, the added obfuscation of having polymorphic shellcode, the additional security of establishing an encrypted tunnel, or a combination of these and/or other properties.
From the hacker’s point of view, having accurate and reliable shellcode is a requirement for performing any real-world exploitation of a vulnerability. If the shellcode isn’t reliable, the remote application or host could crash. An administrator almost certainly will wonder why a full system crash occurred and will attempt to track down the problem; this is certainly not ideal for anonymous or stealth testing of a vulnerability. Furthermore, unreliable shellcode or an unreliable exploit could corrupt the memory of the application in such a way that the application remains running but must be restarted before the attacker can exploit the vulnerability again. In production environments, this restart could take place months later during a scheduled downtime or application upgrade. This upgrade could fix the vulnerability and thus remove the attacker’s access to the organization.
From a security point of view, accurate and reliable shellcode is just as critical. In legitimate penetration testing scenarios, it is a requirement because a customer would certainly be unhappy if a production system or critical application were to crash during testing.
The Tools
During the shellcode development process, you will need to make use of many tools to write, compile, convert, test, and debug the shellcode. Understanding how these tools work will help you become more efficient in creating shellcode. The following is a list of the most commonly used tools, with pointers to more information and downloads.
■ NASM The NASM package contains an assembler named nasm and a disassembler named ndisasm. The nasm assembly syntax is easy to read and understand and is therefore often preferred over the AT&T syntax. More information and NASM downloads can be found on their homepage at http://nasm.sourceforge.net/.
■ GDB GDB is the GNU debugger. In this chapter, we will mainly use it to analyze core dump files. GDB can also disassemble functions of compiled code by just using the command disassemble <function name>. This can be very useful if you want to have a look at how to translate your C code to assembly language. More information about GDB can be found on the GNU Web site at www.gnu.org/.
■ ObjDump ObjDump is a tool used to disassemble files and obtain important information from them. Even though we don’t use it in the shellcode archive, it deserves some attention because it can be very useful during shellcode development. More information about ObjDump can be found on the GNU Web site at www.gnu.org/software/binutils/.
■ Ktrace The ktrace utility, available on *BSD systems only, enables kernel trace logging. The tool creates a file named ktrace.out, which can be viewed by using the kdump utility. Ktrace allows you to see all system calls a process is using. This can be very useful for debugging shellcode because ktrace also shows when a system call execution fails. More information about ktrace can be found on most *BSD-based operating systems by using the command man ktrace.
■ Strace The strace program is very similar to ktrace: it can be used to trace all system calls a program is issuing. strace is installed on most Linux systems by default and can also be found for other operating systems such as Irix. The strace homepage can be found at www.liacs.nl/~wichert/strace/.
■ Readelf readelf is a program that allows you to get all kinds of information about an ELF binary. In this chapter, we will use readelf to locate a variable in a binary and then use that variable within shellcode. This program is (like objdump) part of the GNU binutils package. More information about that package is available at www.gnu.org/software/binutils/.
The Assembly Programming Language
Every processor comes with an instruction set that can be used to write executable code for that specific processor type. Using this instruction set, you can assemble a program that can be executed by the processor. The instruction sets are processor-type dependent; you cannot, for example, use the assembly source of a program that was written for an Intel Pentium processor on a Sun Sparc platform. Because assembly is a very low-level programming language, you can write very tiny and fast programs. In this chapter, we will demonstrate this by writing a 23-byte piece of executable code that executes a file. If you write the same code in C, the end result will be hundreds of times bigger because of all the extra code and data added by the compiler and linker.
Also note that the lowest-level parts of most operating systems are written in assembly. If you take a look at the Linux and FreeBSD source code, you will find that the system call entry paths and other architecture-specific routines are written in assembly.
Writing programs in assembly code can be very efficient, but it also has many disadvantages. Large programs get very complex and hard to read. Also, because the assembly code is processor-dependent, you can’t port it easily to other platforms. It’s difficult to port assembly code not only to different processors but also to different operating systems running on the same processor. This is because programs written in assembly code often contain hard-coded system calls (functions provided by the operating system, or OS), and these differ a lot with each OS.
Assembly is very simple to understand and instruction sets of processors are often well documented. Example 2.1 illustrates a loop in assembly language.
Example 2.1 Looping in Assembly Language
1 start:
2 xor ecx,ecx
3 mov ecx,10
4 loop start
Analysis
■ Within assembly, you can label a block of code using a word. We did this at line 1.
■ At line 2, we xor ecx with ecx. As a result of this instruction, ecx will become 0. This is a common way to clear a register before using it.
■ At line 3, we store the value 10 in our clean ecx register.
■ At line 4, we execute the loop instruction. This instruction subtracts 1 from the value of the ecx register. If the result of this subtraction is not equal to 0, a jump is made to the label that was given as the argument of the instruction. Note that because the start label sits above the instructions that set ecx, this particular snippet reloads ecx with 10 on every pass and never terminates; in a real program the label would be placed after the initialization.
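The decrement-then-test behavior of the loop instruction described above can be sketched in C. This is only a model for illustration, not code from this chapter: the names loop_taken and count_iterations are hypothetical, and the sketch assumes the counter is set once before the label, so the loop terminates.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the x86 `loop` instruction: decrement the counter first,
 * then report whether the jump back to the label is taken. */
static int loop_taken(uint32_t *ecx)
{
    *ecx -= 1;          /* loop always decrements ecx before testing it */
    return *ecx != 0;   /* the jump is taken while ecx is still non-zero */
}

/* Count how many times a loop body runs when ecx is initialised once,
 * before the label, and `loop` closes the body. */
unsigned count_iterations(uint32_t initial_ecx)
{
    uint32_t ecx = initial_ecx;
    unsigned passes = 0;

    /* An initial value of 0 would wrap around to 0xffffffff on the
     * first decrement, so this model rejects it. */
    assert(initial_ecx != 0);

    do {
        passes++;                   /* the loop body would go here */
    } while (loop_taken(&ecx));

    return passes;
}
```

Calling count_iterations(10) models a loop whose counter starts at 10: the body runs once per decrement until ecx reaches 0.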
The jmp instructions are also very useful in assembly. You can jump to a label or to a specified offset, as shown in Example 2.2.
Example 2.2 Jumping in Assembly Language
1 jmp start
Analysis
The first jump goes to the place where the start label is defined, while the second jump moves by a fixed offset of 2 bytes relative to the jmp instruction. Using a label is highly recommended because the assembler will calculate the jump offsets for you, which saves a lot of time.
To make executable code from a program written in assembly, you need an assembler. The assembler takes the assembly code and translates it into machine code that the processor understands. To be able to execute the output as a program, you need to use a linker such as ld to create an executable file. The following is the “Hello, world” program in C:
#include <stdlib.h>
#include <unistd.h>

int main() {
    write(1,"Hello, world !\n",15);
    exit(0);
}
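Example 2.3 shows the assembly code version of the C program.
Example 2.3 The Hello, World Program in Assembly Language
global _start
_start:
xor eax,eax             ; eax = 0

jmp short string        ; jump down to the call that precedes the string
code:
pop esi                 ; esi = address of the string (pushed by call)
push byte 15            ; write() argument 3: length of the string
push esi                ; write() argument 2: pointer to the string
push byte 1             ; write() argument 1: file descriptor 1 (stdout)
mov al,4                ; system call 4 = write
push eax                ; extra dword expected by the FreeBSD calling convention
int 0x80                ; invoke the kernel

xor eax,eax
push eax                ; exit() argument 1: status 0
push eax                ; extra dword, as above
mov al,1                ; system call 1 = exit
int 0x80

string:
call code               ; pushes the address of the string and jumps back
db 'Hello, world !',0x0a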
Analysis
Because we want the end result to be an executable for FreeBSD, we have added a label named “_start” at the beginning of the instructions in Example 2.3. FreeBSD executables use the ELF format, and to make an ELF file, the linker program looks for “_start” in the object that was created by the assembler. The “_start” label indicates where execution has to start. For now, don’t worry too much about the rest of the code. It is explained in more detail later in this chapter.
To make an executable from the assembly code, create an object file first using the nasm tool and then make an ELF executable using the linker ld. The following commands can be used to do this:
bash-2.05b$ nasm -f elf hello.asm
bash-2.05b$ ld -s -o hello hello.o
The nasm tool reads the assembly code and generates an object file of the type “elf” that will contain the machine code. The object file, which automatically gets the .o extension, is then used as input for the linker to make the executable. After executing the commands, you will have an executable named “hello”. You can execute it to see the result:
bash-2.05b$ ./hello
Hello, world !
bash-2.05b$
The following example uses a different method to test the shellcode/assembly examples. That C program reads the output file of nasm into a memory buffer and executes this buffer as though it is a function. So why not use the linker to make an executable? Well, the linker adds a lot of extra code and headers to the machine code in order to turn it into an executable program. This makes it harder to convert the machine code into a shellcode string that can be used in example C programs, which will prove critical later on.
Have a look at how much the file sizes differ between the C Hello World example and the assembly example:
Example 2.4 Differing File Sizes
bash-2.05b$ gcc -o hello_world hello_world.c
bash-2.05b$ ./hello_world
Hello, world !
bash-2.05b$ ls -al hello_world
-rwxr-xr-x 1 nielsh wheel 4558 Oct 2 15:31 hello_world
bash-2.05b$ vi hello.asm
bash-2.05b$ ls
bash-2.05b$ nasm -f elf hello.asm
bash-2.05b$ ld -s -o hello hello.o
bash-2.05b$ ls -al hello
-rwxr-xr-x 1 nielsh wheel 436 Oct 2 15:33 hello
As you can see, the difference is huge. The file compiled from our C example is more than ten times bigger. If we only want the raw machine code, which can be executed and converted to a string by our custom utility, we should use different commands:
Example 2.5 Using Different Commands
bash-2.05b$ nasm -o hello hello.asm
bash-2.05b$ s-proc -p hello

/* The following shellcode is 43 bytes long: */

char shellcode[] =
"\x31\xc0\xeb\x13\x5e\x6a\x0f\x56\x6a\x01\xb0\x04\x50\xcd\x80"
"\x31\xc0\x50\x50\xb0\x01\xcd\x80\xe8\xe8\xff\xff\xff\x48\x65"
"\x6c\x6c\x6f\x2c\x20\x77\x6f\x72\x6c\x64\x20\x21\x0a";

bash-2.05b$ nasm -o hello hello.asm
bash-2.05b$ ls -al hello
-rwxr-xr-x 1 nielsh wheel 43 Oct 2 15:42 hello
bash-2.05b$ s-proc -p hello

char shellcode[] =
"\x31\xc0\xeb\x13\x5e\x6a\x0f\x56\x6a\x01\xb0\x04\x50\xcd\x80"
"\x31\xc0\x50\x50\xb0\x01\xcd\x80\xe8\xe8\xff\xff\xff\x48\x65"
"\x6c\x6c\x6f\x2c\x20\x77\x6f\x72\x6c\x64\x20\x21\x0a";

bash-2.05b$ s-proc -e hello
Calling code ...
Hello, world !
bash-2.05b$
So, the eventual shellcode is 43 bytes long and we can print it using our tool, s-proc, with the -p parameter and execute it using s-proc with the -e parameter. You’ll learn how to use this tool later in the chapter.
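To make the printing step concrete, here is a minimal C sketch of how raw bytes can be turned into the escaped form shown in the listing. It is not the s-proc source, which is covered later in the chapter; the function name format_escaped and its buffer-size contract are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write `len` raw bytes into `out` as C escape sequences ("\x31\xc0...").
 * Returns the number of characters written, excluding the terminator.
 * The caller must supply room for 4 characters per byte plus the '\0'. */
size_t format_escaped(const unsigned char *bytes, size_t len,
                      char *out, size_t out_size)
{
    size_t used = 0;

    assert(out_size >= len * 4 + 1);

    for (size_t i = 0; i < len; i++) {
        /* Each byte becomes a backslash, an 'x' and two hex digits. */
        used += (size_t)snprintf(out + used, out_size - used,
                                 "\\x%02x", bytes[i]);
    }
    out[used] = '\0';   /* also covers the len == 0 case */
    return used;
}
```

Feeding the function the first four bytes of the assembled file (0x31, 0xc0, 0xeb, 0x13) fills the output buffer with the same sixteen characters that open the shellcode string above.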
Windows vs. Unix Assembly
Writing shellcode for Windows differs a lot from writing shellcode for Unix systems. In Windows, you have to use functions that are exported by libraries, while in Unix you can just use system calls. This means that in Windows you need exact pointers to the functions in order to use them and don’t have the luxury of calling a function by using a number, as is done in Unix.
Hardcoding the function addresses in the Windows shellcode is possible but not recommended. Minor changes to the system’s configuration may cause the shellcode (and thus your exploit) to fail. Windows shellcode writers have to use lots of tricks to get function addresses dynamically. Writing Windows shellcode is thus harder to do and often results in a very large piece of shellcode.
The Addressing Problem
Normal programs refer to variables and functions using pointers that are often defined by the compiler or retrieved from a function such as malloc, which is used to allocate memory and returns a pointer to this memory. When you write shellcode, you often need to refer to a string or other variable. For example, when you write execve shellcode, you need a pointer to the string that contains the program you want to execute. Since shellcode is injected into a program at run time, its load address is not known in advance, so the shellcode itself has to determine the memory addresses it needs while it is executing. For example, if the code contains a string, it will have to determine the memory address of the string before it can use it.
This is a big issue, because if you want your shellcode to use system calls that require pointers to arguments, you will have to know where in memory your argument values are located. The first solution to this issue is to have the location of your data placed on the stack by using the “call” and “jmp” instructions. The second solution is to push your arguments on the stack and then store the value of the stack pointer ESP. We’ll discuss both solutions.
Using the “call” and “jmp” Trick
The Intel “call” instruction may look like a “jmp”, but it does more. When “call” is executed, it pushes the return address (the address of the instruction that follows the call) on the stack and then jumps to the function it received as an argument. The function that was called can then use “ret” to let the program continue where it stopped when it used call. The “ret” instruction takes the return address put on the stack by “call” and jumps to it. Example 2.6, Call and Ret, illustrates how “call” and “ret” are used in assembly programs.
Example 2.6 Call and Ret
1 main:
2
3 call func1
4 …
5 …
6 func1:
7 …
8 ret
Analysis
When the func1 function is called at line 3, the return address (the address of the instruction at line 4) is pushed on the stack and a jump is made to the func1 function.
When the func1 function is done, the ret instruction pops the return address from the stack and jumps to this address. This will cause the program to execute the instructions at line 4 and so on.
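The call and ret behavior just described can be modeled with a toy machine in C. This is a sketch for illustration only: the struct toy_cpu and its helper functions are hypothetical names, the stack is a small array rather than real memory, and the instruction length is passed in by the caller (a near call with a 32-bit displacement is 5 bytes long on IA32).

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 16

/* Toy machine state: an instruction pointer and a small stack that
 * grows downward, as the IA32 stack does. */
struct toy_cpu {
    uint32_t eip;                 /* address of the current instruction */
    uint32_t stack[STACK_SLOTS];
    int      sp;                  /* index of the top slot; STACK_SLOTS means empty */
};

void toy_init(struct toy_cpu *cpu, uint32_t start)
{
    cpu->eip = start;
    cpu->sp = STACK_SLOTS;
}

/* call: push the address of the NEXT instruction, then jump to target. */
void toy_call(struct toy_cpu *cpu, uint32_t target, uint32_t insn_len)
{
    assert(cpu->sp > 0);                          /* room left on the stack */
    cpu->stack[--cpu->sp] = cpu->eip + insn_len;  /* return address */
    cpu->eip = target;
}

/* ret: pop the saved return address and jump to it. */
void toy_ret(struct toy_cpu *cpu)
{
    assert(cpu->sp < STACK_SLOTS);                /* something to pop */
    cpu->eip = cpu->stack[cpu->sp++];
}
```

With the machine started at address 0x1000, toy_call(&cpu, 0x2000, 5) leaves 0x1005 on top of the stack, and a following toy_ret(&cpu) resumes execution there, which corresponds to continuing at line 4 of Example 2.6.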
Okay, time for a practical example. Let us say we want our shellcode to use a system call that requires a pointer to a string as an argument and we want this string to be “Burb”. We can get the memory address of the string (the pointer) using the code in Example 2.7.
Example 2.7 Jmp
1 jmp short data
2 code:
3 pop esi
4 ;
5 data:
6 call code
7 db 'Burb'
Analysis
On line 1, we jump to the “data” section, and within the data section, we call the “code” function (line 6). The call pushes its return address on the stack, and because the string Burb immediately follows the call instruction, that return address is the memory location of the string.
On line 3, we pop that address off the stack and store it in the ESI register. This register now contains the pointer to our data.
You’re probably wondering how jmp knows where “data” is located. Well, jmp and call work with offsets. The assembler will translate “jmp short data” into something like “jmp short 0x4”. The 0x4 represents the number of bytes that have to be jumped.
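The offset arithmetic can be checked against the 43-byte Hello, World listing shown earlier, where the bytes eb 13 encode the short jump and e8 e8 ff ff ff encode the call. The following C sketch reproduces those numbers; the function names are hypothetical, and the byte positions (jmp at offset 2, pop esi at offset 4, call at offset 23, string at offset 28) are read off that listing.

```c
#include <assert.h>
#include <stdint.h>

enum {
    JMP_SHORT_LEN = 2,   /* opcode byte + 8-bit displacement  */
    CALL_NEAR_LEN = 5    /* opcode byte + 32-bit displacement */
};

/* Displacement stored in "jmp short": the distance from the end of
 * the jmp instruction to the target. */
int8_t jmp_short_disp(uint32_t jmp_offset, uint32_t target_offset)
{
    int32_t disp = (int32_t)target_offset
                 - (int32_t)(jmp_offset + JMP_SHORT_LEN);
    assert(disp >= -128 && disp <= 127);   /* must fit in one byte */
    return (int8_t)disp;
}

/* Displacement stored in a near "call": same rule, 32 bits wide. */
int32_t call_near_disp(uint32_t call_offset, uint32_t target_offset)
{
    return (int32_t)target_offset
         - (int32_t)(call_offset + CALL_NEAR_LEN);
}

/* Address pushed by the call: the byte right after the instruction,
 * which is where the string lives in the jmp/call layout. */
uint32_t call_return_address(uint32_t call_offset)
{
    return call_offset + CALL_NEAR_LEN;
}
```

For the Hello, World bytes, jmp_short_disp(2, 23) gives 0x13 and call_near_disp(23, 4) gives -24, which is 0xffffffe8 as a 32-bit value and appears little-endian as e8 ff ff ff after the call opcode. call_return_address(23) gives 28, the offset of the first character of the string.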
Pushing the Arguments
The jmp/call trick to get the memory location of your data works great but makes your shellcode pretty big. Once you have struggled with a vulnerable program that only uses very small memory buffers, you’ll understand that the smaller the shellcode, the better. In addition to making the shellcode smaller, pushing the arguments will also make the shellcode more efficient.
Let’s say we want to use a system call that requires a pointer to a string as