4.2 Parameter Collection
4.3.4 Data Decoding
4.3.4.1 ASCII Strings
The parser evaluates each byte to confirm if a raw data value is an ASCII string. We assume that ASCII strings only contain printable characters (i.e., decimal value 32 to 127). If any byte is found to be outside of this range, the parser aborts string parsing, and moves on to another data type.
If 4-byte alignment isTrue, then the appropriate number of NULL bytes following the
string are removed from the record.
4.3.4.2 Integers
The default integer reconstruction for the parser is an unsigned 32-bit little endian integer. Known exceptions are listed below.
Anonymous DBMS#1. A signed 32-bit little endian integer is reconstructed.
Anonymous DBMS#2. A zero-compression key is used to reconstruct the number. In
this case, there are three main components to integer reconstruction:
1. The first byte represents the number of bytes needed to store the integer.
2. The second byte is a zero-compression key. For example, the value 192 means a 1- digit integer, 193 means a 2-digit integer, 194 means a 4-digit integer, and 195 means a 6-digit integer. Each time the zero-compression key increases by (past 193), the
number of digits for the integer increases by two. The integer zero uses the value 128 for the zero-compression key.
3. Any remaining bytes represent two digits of the integer, but 1 must be subtracted from that value. For example, the values 12 and 12 represent the integer 1111. If the resulting integer does not have a number of digits that corresponds to the zero- compression key, the the integer is padded to the right with the appropriate number of zeros.
4.3.4.3 Other Datatypes
Besides strings and integers, other datatypes often require DBMS-specific functions. A decoding function can be created by collecting the byte storage representation for a set of known synthetic values and determining the encoding. Once a decoding function is created for a specified DBMS datatype, an IF condition is added to the carver to detect the datatype; the DBMS is known since this information is included in the parameter file. When the IF condition is met by the carver, the bytes are passed to the respective decoding function.
Chapter 5
Standard Storage Format
5.1
Introduction
Database forensic carving tools have been proposed [33, 17, 86, 87, 91, 65], but incorporat- ing their output into an investigation remains difficult to impossible. The storage disparity between DBMSes and operating systems may well in fact be the main culprit for the stunted growth and limited applications of database forensics. We identified two major pieces cur- rently missing from the field of database forensics that have prevented its involvement in forensic investigations: 1) a standardized storage format, and 2) a toolkit to view and search database forensic artifacts.
Standard Storage Format A standard database forensic storage format would abstract
the specifics of DBMS storage engines for users unfamiliar with DBMS internals and guide the development of database carving tools. All DBMSes use their own storage engine. A standard storage format would allow users to view and search database forensic artifacts, generate reports, and develop advanced analytic tools without knowledge of storage engine specifics for any given DBMS. A uniform output for database carving tools would also allow these tools to be compared and tested against each other.
View and Search Toolkit A toolkit to view and search reconstructed DBMS artifacts
would allow investigators to easily interpret the artifacts. While database data is stored and queried through records in tables, the records alone do not accurately represent the forensic state of a database since this data is accompanied by a variety of metadata (e.g., byte offset of the record). Investigators need a way to view how the metadata and table records are interconnected.
In this chapter, we describe a comprehensive framework to represent and search database forensic artifacts. A preliminary version of this framework was implemented for this chap-
Tool A Tool B
DB3F
DF-Toolkit End User
Advanced Applications Database Carving Tool C Digital Evidence Disk Images RAM Snapshots
Storage & Filtering
Figure 5.1: The role of DB3F and DF-Toolkit in database forensics.
ter, which includes a format specification document and an evidence querying application. Section 5.2 considers the work related to our framework, and Section 5.3 defines our frame- work requirements. Next, Sections 5.4 and 5.5 present the two main contributions of the research work presented in this chapter, which are the following:
1. We define a storage format and data abstraction for database forensic artifacts called the Database Forensic File Format (DB3F). Section 5.4 provides a detailed descrip- tion of DB3F. The DB3F definition can be downloaded from our research group website:
http://dbgroup.cdm.depaul.edu/DF-Toolkit.html
2. We describe a toolkit called the Database Forensic Toolkit (DF-Toolkit) to view and search data stored in DB3F. Along with a description of DF-Toolkit, Section 5.5 presents a user interface that implements DF-Toolkit. This user interface can be downloaded from our research group website:
http://dbgroup.cdm.depaul.edu/DF-Toolkit.html
Figure 5.1 displays how DB3F and DF-Toolkit are used in database forensic analysis. Database carving tools return output in DB3F. DB3F files are filtered and searched us- ing DF-Toolkit, which stores filtered results in DB3F. DB3F files are then either directly reported to the end user or passed to further advanced analytic applications.
The introduction of a standardized intermediate format and a comprehensive toolkit for database forensics benefits the community in two important ways. First, it streamlines the addition of new tools on either side of the flow chart in Figure 5.1. With the introduction of a new database carving tool (e.g., Tool D), users would benefit from all available advanced applications that support DB3F. Similarly, any newly developed advanced application can trivially process output from any carving tool that supports DB3F output. This intermedi- ary approach is conceptually similar to Low Level Virtual Machine (LLVM) [44], a collec- tion of reusable compiler technologies that defines a set of common language-independent
primitives. The second benefit is the explicit documentation and built-in reproducibility of the analyses process and outcomes, bringing a scientific approach to digital forensics. Garfinkel [24] emphasized the lack of scientific rigor and reproducibility within the field; although in [24] he focused on developing standard corpora, a standard storage format as well as a querying and viewing mechanism is also necessary to achieve these goals. Rather than building custom analytic tools (e.g., [85]), DF-Toolkit’s approach will offer a well-documented querying mechanism based on defined standard fields in DB3F. Any query report can be easily reproduced by another party or re-tested via a different database carver.
This chapter serves as the foundation for a vision of a complete system with full sup- port for database forensics and integration with other forensic tools. Section 5.6 discusses planned improvements for future developments to our framework, including advanced an- alytic applications.