5. Semi-Structured Interviews with Subject Matter Experts

5.6 Procedural Aspects of Understanding Programs

5.7.2 Specialized Knowledge

background knowledge. One point that many of the SMEs stressed was the vast amount of knowledge required to be good at reverse engineering programs. The SMEs reported that reverse engineers need knowledge from most areas of computer science. The primary knowledge areas identified from analysis of the interview responses are presented in Table 13.

Apart from the general knowledge involved in reverse engineering, the SMEs also indicated specialized knowledge which they believe separates experts from novices. These areas of domain-specific expertise are presented in Table 14 and the findings from the SME interviews related to these knowledge areas are discussed in the rest of this section.

5.7.2.1 Translating from Assembly Into Higher-Level Languages. The SMEs identified knowledge of and facility with assembly language as one of the most important components of a reverse engineer’s practical knowledge. Sequences of assembly language instructions comprise the major data representation that reverse engineers deal with. Expertise in assembly language allows reverse engineers to see common patterns in program code and quickly translate these patterns into higher-level representations.

The SMEs reported a fluency with this process that they gained through experience reverse engineering code. Experts have a built-up mental repository of patterns or “plans” which help them connect a sequence of assembly language instructions in the task environment to a corresponding representation in a higher-level programming language.

Understanding how to translate from assembly language to a higher-level representation also requires understanding the target central processing unit (CPU) architecture in depth. One must know the instruction set, understand the common uses of different instructions and opcodes, and be able to notice when a pattern represents a compiler optimization or something anomalous in the code.
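As a small illustration of recognizing an instruction pattern, the sketch below scans a byte string for the standard 32-bit x86 function prologue (push ebp; mov ebp, esp). The MOV instruction has two valid encodings, and different toolchains tend to favor different ones, so both are listed. The function name, the pattern table, and the sample bytes are hypothetical and chosen only for illustration. A naive byte scan like this can also match inside data or in the middle of an instruction, so it is a sketch of the idea rather than a reliable detector.

```python
# Two byte encodings of the 32-bit x86 prologue: push ebp ; mov ebp, esp
PROLOGUES = {
    bytes.fromhex("5589e5"): "push ebp; mov ebp, esp (MOV opcode 89 form)",
    bytes.fromhex("558bec"): "push ebp; mov ebp, esp (MOV opcode 8B form)",
}


def find_prologues(code: bytes) -> list[tuple[int, str]]:
    """Return (offset, description) for each prologue pattern found in code."""
    hits = []
    for pattern, description in PROLOGUES.items():
        start = 0
        while (offset := code.find(pattern, start)) != -1:
            hits.append((offset, description))
            start = offset + 1
    return sorted(hits)


# Hypothetical code bytes: nop, nop, prologue, sub esp 0x10, leave, ret,
# then a second function that uses the other MOV encoding.
sample = bytes.fromhex("9090" "5589e5" "83ec10" "c9c3" "558bec" "5dc3")

for offset, description in find_prologues(sample):
    print(f"0x{offset:02x}: {description}")
```

An experienced analyst performs this kind of matching mentally and with far richer context, but the table-of-patterns structure mirrors the repository of “plans” described above.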

Table 14 Specialized Knowledge Areas.

Knowledge Area

• Translating from assembly language into higher-level languages

• System API functionality

• System internals knowledge (processes, I/O, synchronization, etc.)

• How compilers generate machine code

• Classes of vulnerabilities and exploits

• Knowledge of and recognition of malware

Knowledge about computer architecture theory and basics can be gained through advanced undergraduate and graduate computer science courses. Sometimes the courses include hands-on coursework (often with simpler fixed-length reduced instruction set architectures). More detailed knowledge is specific to a particular processor, so many reverse engineers learn this by studying reference manuals for the processor of interest, such as the Intel Architecture Manuals [97], while reverse engineering programs. The SMEs mentioned that another way to gain this pattern recognition capability was to write small programs in a higher-level language, compile them, disassemble the compiled code, and read through the assembly-level translations while comparing them to what was written.
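The compile-and-compare exercise the SMEs described can be tried in miniature with Python’s built-in dis module. Note the substitution: dis lists CPython bytecode, not native machine code, so this is only an analogy for the native workflow of compiling a program and running a disassembler over the result. The function sum_to is a hypothetical example.

```python
import dis


def sum_to(n):
    """Sum the integers 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


# Collect the instruction names the compiler produced for the function.
opnames = [ins.opname for ins in dis.get_instructions(sum_to)]

# Print the full listing to read side by side with the source above.
dis.dis(sum_to)
```

Reading the listing next to the source shows how the for loop becomes an iterator setup, a loop instruction, and a backward jump, which is the same kind of source-to-instruction mapping that the exercise is meant to build for native code. The exact instruction names vary between Python versions.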

5.7.2.2 System API Functionality. Knowledge of a system’s application programming interface (API) is essential to understanding the behaviors of a program. Nearly all programs use the operating system’s API at some level to access the input and output (I/O) functionality of the system. It is the system’s API that allows programs to display graphics and message box windows on the screen, perform file operations, invoke security functions, create processes, and more.
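To make the layering concrete, the sketch below writes a file through Python’s os.open, os.write, and os.close, which are thin wrappers over the operating system’s file API, and then reads it back with the higher-level open call. The helper name and the file name are hypothetical. It is a minimal sketch of the idea that high-level I/O ultimately passes through the system API, not a description of any particular operating system’s interface.

```python
import os
import tempfile


def write_via_os_api(path: str, data: bytes) -> int:
    """Write data using the low-level os wrappers around the OS file API."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        return os.write(fd, data)  # returns the number of bytes written
    finally:
        os.close(fd)


payload = b"hello from the OS API"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    written = write_via_os_api(path, payload)
    with open(path, "rb") as f:  # the high-level call reaches the same file
        roundtrip = f.read()

print(written, roundtrip)
```

When reverse engineering a compiled program, it is calls at roughly this level (open, write, close and their platform-specific counterparts) that appear in the disassembly, which is why familiarity with them helps in inferring what a program does.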

An operating system’s API is specific to that operating system architecture, and college courses in computer science or computer engineering do not usually equip a person with this knowledge. This type of knowledge is also gained through experience, or by reading specialized texts in software development and performing the exercises found in those texts, such as Petzold [149]. The SMEs reported gaining knowledge of the operating system APIs through reverse engineering programs that use system calls, or reverse engineering the operating system functions themselves to verify what operations they perform.

5.7.2.3 System Knowledge. System knowledge consists of knowledge about the operating system internals and software architecture of a system. It includes an understanding of how the entire ecosystem surrounding the target program works. This knowledge encompasses an understanding of the internal structures and functions of the operating system, and how the heap, stack and individual stack frames are laid out. It also includes knowledge of the location of different kernel data structures in memory and how to access their contents. System knowledge includes an understanding of how the processor fetches and executes instructions, how the processor implements its functionality and how the program loader works to read the program into memory.

The SMEs reported that expert reverse engineers should understand how function callbacks, asynchronous events, and thread execution work “under the hood” rather than just the name of the system function that implements them. The SMEs also indicated that skilled reverse engineers have detailed knowledge of how processes and threads work in the operating system, of the user and kernel levels in the operating system, and of how the protection rings provided by the processor are implemented.

The theoretical component of this knowledge can be acquired through upper-level undergraduate or graduate computer science courses in operating systems and computer architecture. However, more detailed knowledge is specific to a processor or operating system and is gained through experience working in or reverse engineering the operating system’s kernel. The SMEs also mentioned studying books like Russinovich and Solomon [167] to understand the design and architecture of the operating system the programs run in.

5.7.2.4 How Compilers Generate Machine Code. Another important aspect of domain-specific knowledge is how programs are compiled into assembly instructions. The assembly instructions investigated by reverse engineers have been through the process of compilation from source code into machine code, and then for analysis have been converted back into assembly instructions by a disassembler. The SMEs said that knowledge about how compilers generate machine instructions can help someone recognize the difference between a compiler optimization and an anomalous or malicious code segment.

Additionally, compilers manipulate, parse, and arrange instructions differently, which makes the layout of assembly instructions differ from one program to another. The choice of compiler can also change other assumptions, such as the function calling convention that applies to the program. The SMEs reported learning the theoretical component of compiler knowledge from compiler textbooks like Aho et al. [1] and from college computer science or computer engineering courses. They described gaining more applied knowledge of how each compiler works from experience compiling their own programs with one or more compilers and reading the assembly code that each compiler generates.

5.7.2.5 Classes of Vulnerabilities and Exploits. Vulnerability knowledge includes understanding the different types of vulnerabilities that can exist in each piece of the computing infrastructure. This knowledge consists of knowledge about vulnerability classes, knowledge about how to develop exploits, and knowledge of the ways that different exploits can be leveraged on a system.

Understanding vulnerability classes can mean understanding the different phases in which vulnerabilities are generated in system development, what types of systems they affect, what types of errors lead to vulnerabilities, what attack scenarios use them, how they are exploited, and several other facets [122]. In particular, the SMEs identified that reverse engineers must understand in great detail how memory corruption vulnerabilities (like buffer overflows, integer overflows and underflows, null pointer dereferences, heap corruption, format string vulnerabilities, and so on) occur and how to prevent them.
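As one illustration of the integer overflow class mentioned above, the sketch below models a size calculation of the form count * elem_size carried out in a 32-bit unsigned value. Python integers do not overflow, so the wraparound is simulated with a mask. The 32-bit width, the function names, and the input values are assumptions made for this example.

```python
UINT32_MASK = 0xFFFFFFFF


def alloc_size_u32(count: int, elem_size: int) -> int:
    """Model count * elem_size as computed in a 32-bit unsigned value."""
    return (count * elem_size) & UINT32_MASK


def is_overflowing(count: int, elem_size: int) -> bool:
    """True when the real product does not fit in 32 bits."""
    return count * elem_size > UINT32_MASK


count, elem_size = 0x40000001, 4  # hypothetical attacker-controlled count

print(hex(alloc_size_u32(count, elem_size)))  # 0x4
print(is_overflowing(count, elem_size))       # True
```

Here the real product is 0x100000004, but only the low 32 bits survive, so a program would allocate 4 bytes and then copy over a billion elements into it. A check like is_overflowing before the multiplication is the usual shape of the fix, and its absence is the pattern an analyst looks for.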

The SMEs mentioned that expert reverse engineers working in vulnerability discovery or malicious software analysis should know how to craft an exploit which takes advantage of a vulnerability. This can be as simple as the ability to generate a malicious input for a user prompt or as complicated as crafting a document which allows an attacker to gain elevated remote access when a user opens it in a document reader. The knowledge of how to exploit a system is important both for developing proof-of-concept exploits and for knowing what constitutes an exploitable vulnerability rather than just a bug. The SMEs mentioned that it often takes developing a proof-of-concept exploit before the sponsor will accept that the system is, in fact, vulnerable to attack.

Knowledge of vulnerability classes also includes understanding the ways that attacks are carried out on different types of systems. This involves understanding how attackers identify systems to attack, how they use vulnerabilities to craft exploits, and how they use exploits to attack the systems. It also involves understanding what type of advantage each type of attack gains an attacker.

Finally, understanding vulnerability classes involves understanding the systems in which the vulnerabilities are found. Hardware vulnerabilities often involve misconfigurations and improper assumptions made during the design of a hardware component which allow attackers to gain access to write to or read from protected devices or segments of memory. Operating system vulnerabilities involve misplaced assumptions in the design of the operating system software or any of the software that the operating system puts trust in. Application vulnerabilities involve ways in which applications can be made to perform operations that violate the interests of users of these applications or system administrators. Web-based vulnerabilities involve software implementation flaws where web-exposed code with logic errors can allow a person to access information and gain unauthorized privileges on a web server. Though all of these vulnerability types involve unauthorized access and control, each requires its own extensive domain knowledge for a person to be an expert at finding and analyzing these vulnerabilities.

5.7.2.6 Knowledge and Recognition of Malware. Reverse engineers working in malware analysis rely on a wide range of knowledge about the functionality malware can exhibit. The SMEs described both knowing what malware does at a high level and being able to recognize and interpret malicious behaviors when they are seen in a program.

Malware knowledge involves understanding the behaviors, mechanisms, and manifestations of how rootkits, worms, viruses, Trojan horses, botnets, and other types of malicious software work. It also involves understanding how different classes of malware are implemented on the target operating system and processor.

The SMEs also made reference to knowledge and use of good “lab practices.” Best practices in analyzing malware involve knowing and being able to apply memory forensics to extract malicious software from a computer system without tainting the trail of evidence or losing essential data. It also means understanding the effects malware can have on a host system and what precautions must be taken to protect the reverse engineers’ systems and networks from the effects of the malicious software. To the SMEs, understanding malware also means understanding how the malware is protected, and being able to get around those protections to analyze the program.

5.7.2.7 Knowledge of Software Protection Techniques and How They Work. Programs employ software protections to prevent reverse engineers from achieving their analysis goals. The SMEs referenced several instances of encountering software protections while analyzing malicious software or when facing an industrial protection employed to prevent piracy, tampering, or reverse engineering. An important element of reverse engineers’ specialized knowledge is understanding software protections, how they work, how they can be defeated, and what other ways exist to perform the same tasks when they cannot be defeated.

Protection knowledge referenced in the interviews involved understanding the different types of protections, which among others can include:

• Static analysis protections,

• Dynamic analysis protections,

• Obfuscations,

• System hardening protections,

• Virtualization-based protections,

• Packing, and

• Encryption.

Reverse engineers who analyze malware or software protections need to know about these areas and how to break or circumvent these protections when they stand in the way of analysis. If breaking a protection is not feasible, the reverse engineer must know what kinds of actions these protections inhibit so they can generate alternate actions to accomplish roughly the same things.
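A very simple instance of defeating an obfuscation is recovering a string hidden with a single-byte XOR key. The sketch below is not a technique the SMEs described; it is a minimal illustration that assumes a one-byte key and a known plaintext fragment, and the function names, the key 0x5A, and the example URL are all hypothetical.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (encoding and decoding are identical)."""
    return bytes(b ^ key for b in data)


def recover_keys(blob: bytes, known_fragment: bytes) -> list[int]:
    """Try all 256 keys and keep those whose output contains the known fragment."""
    return [k for k in range(256) if known_fragment in xor_bytes(blob, k)]


# Hypothetical obfuscated string as it might sit in a program's data section.
secret = b"http://example.invalid/update"
hidden = xor_bytes(secret, 0x5A)

candidates = recover_keys(hidden, b"http")
for key in candidates:
    print(hex(key), xor_bytes(hidden, key))
```

Real protections such as packers and virtualization-based schemes are far more involved, but the workflow is similar in spirit: form a hypothesis about how the protection transforms the program, test it cheaply, and fall back to an alternate approach if it fails.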

5.7.3 Automaticity and Tacit Knowledge. The SMEs were prompted for

In document Understanding How Reverse Engineers Make Sense of Programs from Assembly Language Representations (Page 146-153)