Page Header - Parameter Collection - Auditing database systems through forensic analysis

4.2 Parameter Collection

4.2.1 Page Header

The page header stores general information about the page and its data. User data is not stored in the page header. This section discusses how to deconstruct page header metadata that is commonly used by RDBMSes, which includes the general page identifier (Section 4.2.1.1), structure identifier (Section 4.2.1.2), unique page identifier (Section 4.2.1.3), and record count (Section 4.2.1.4). Other metadata may exist in a page header, such as a free space pointer, checksum, or logical timestamp, but we believe this metadata has little forensic significance or is not commonly used by RDBMSes. Experience has demonstrated that all page header metadata is located within the first 2% of a page. Therefore, when a page is evaluated in this section, we only consider the first 2% of the page's bytes.

Page header metadata is generally deconstructed in two steps:

1. Locate the metadata, typically using similarities or differences between pages.

2. Record the address of the metadata as a parameter.

4.2.1.1 General Page Identifier

The general page identifier is a sequence of non-NULL bytes shared by all pages in a particular DBMS version. The general page identifier is used by our parser to initially search for pages within a file. We make two assumptions about the general page identifier: 1) it must be between two and three bytes in length and 2) at least one byte must be non-NULL, meaning a decimal value not equal to zero. In addition to extracting the general page identifier as a parameter, the address of the general page identifier is also determined for use as a parameter. These parameters must be non-NULL for parsing to be performed.

The following steps summarize how we determine the general page identifier and general page identifier address parameters:

1. Byte commonalities are found for all data pages containing synthetic data. Byte commonalities are found as follows:

(a) A dictionary is created for each page. The address of a byte within the page and the decimal value of a byte are stored as the key-value pairs.

(b) All dictionaries are then compared and the results are written to a new dictionary, called the results dictionary. If the values across all dictionaries match for a particular key, the byte value and its address are recorded in the results dictionary; otherwise, the address and the value -1 are added to the results dictionary.

2. Using the results dictionary, all sequences of two to three contiguous bytes are considered.

3. The longest sequence with the minimum number of NULL characters is returned as the general page identifier, and the address of this sequence from the beginning of the page is returned as the general page identifier address. In case of a tie, the first sequence is chosen.
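The steps above can be sketched in Python as follows. This is a minimal illustration, not the parser's actual implementation: the function names (`header_window`, `byte_commonalities`, `general_page_identifier`) are hypothetical, pages are assumed to be equal-sized `bytes` objects, and the ranking treats sequence length as the primary criterion and NULL count as the secondary one, which is only one possible reading of step 3.

```python
from typing import Dict, List, Optional, Tuple

MISMATCH = -1  # marker for addresses whose byte value differs between pages


def header_window(page: bytes, percent: int = 2) -> bytes:
    """Return the leading bytes of a page (the first 2% by default)."""
    return page[: max(1, len(page) * percent // 100)]


def byte_commonalities(pages: List[bytes]) -> Dict[int, int]:
    """Build the results dictionary: address -> shared byte value, or -1."""
    windows = [header_window(p) for p in pages]
    length = min(len(w) for w in windows)
    results: Dict[int, int] = {}
    for addr in range(length):
        values = {w[addr] for w in windows}
        results[addr] = values.pop() if len(values) == 1 else MISMATCH
    return results


def general_page_identifier(
    results: Dict[int, int],
) -> Optional[Tuple[bytes, int]]:
    """Pick the longest 2-3 byte shared run with the fewest NULL bytes."""
    best = None      # (identifier, address)
    best_key = None  # (length, -null_count); larger is better
    for addr in sorted(results):
        for length in (2, 3):
            seq = [results.get(addr + i, MISMATCH) for i in range(length)]
            if MISMATCH in seq or not any(seq):
                continue  # not shared by every page, or entirely NULL
            key = (length, -seq.count(0))
            if best_key is None or key > best_key:  # strict: first wins ties
                best, best_key = (bytes(seq), addr), key
    return best
```

Returning `None` when no candidate qualifies mirrors the requirement that both parameters be non-NULL before parsing can proceed.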

4.2.1.2 Structure Identifier

The structure identifier is a 16-bit or 32-bit integer that is shared by all pages that comprise an individual structure (e.g., table or index); this identifier is unique across structures. The structure identifier is ascertained by the structure identifier address and structure identifier size parameters. These parameters can be NULL. Note that most DBMSes store a structure identifier in the page header. However, if a structure identifier is not stored in the page header, one may be stored with each record in the row data of the page. In this instance, the structure identifier would be considered a piece of row data metadata.

In determining where the structure identifier is located, only the low byte of the identifier is considered. Although more rigorous comparisons could be utilized, such a process would require creating at least 256 structures and comparing them. Experience has shown that such a robust process for this identification is unnecessary. The following steps summarize how we determine the structure identifier address and structure identifier size:

1. Byte commonalities are found using the process described in Section 4.2.1.1; however, only pages from the same table are used to determine candidate identifiers.

2. The first byte of each common sequence is compared across structures. If the bytes differ and either that byte or the next one is non-NULL, then the byte position is returned.

3. When sequences of bytes across all pages are found to meet the conditions, the address of these sequences is returned as the structure identifier address, and the number of high bytes plus the low byte is returned as the structure identifier size.
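A rough Python sketch of this search is shown below. Several details are assumptions rather than statements from the text: the names `common_bytes` and `structure_identifier` are hypothetical, the low byte is assumed to be stored first (little-endian), and the size is guessed as 4 bytes only when four consecutive bytes are stable within every structure, otherwise 2 bytes.

```python
from typing import Dict, List, Optional, Tuple

MISMATCH = -1  # marker for addresses whose byte differs within a structure


def common_bytes(pages: List[bytes], window: int) -> List[int]:
    """Per-address byte shared by all pages of one structure, else -1."""
    return [
        pages[0][a] if all(p[a] == pages[0][a] for p in pages) else MISMATCH
        for a in range(window)
    ]


def structure_identifier(
    structures: Dict[str, List[bytes]], window: int
) -> Optional[Tuple[int, int]]:
    """Return (address, size) of a candidate structure identifier, or None."""
    common = [common_bytes(pages, window) for pages in structures.values()]
    for addr in range(window - 1):
        low = [c[addr] for c in common]
        nxt = [c[addr + 1] for c in common]
        if MISMATCH in low or MISMATCH in nxt:
            continue  # not stable within at least one structure
        if len(set(low)) != len(low):
            continue  # low byte does not distinguish the structures
        if not all(l != 0 or n != 0 for l, n in zip(low, nxt)):
            continue  # both the byte and its successor are NULL somewhere
        # Count the low byte plus following bytes that stay stable
        # within every structure (treated here as the high bytes).
        size = 1
        while (size < 4 and addr + size < window
               and all(c[addr + size] != MISMATCH for c in common)):
            size += 1
        return addr, (4 if size == 4 else 2)
    return None
```

Here `structures` maps a table or index name to the list of its pages, and `window` is the number of header bytes inspected (the first 2% of the page in this section).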

4.2.1.3 Unique Page Identifier

The unique page identifier is a 32-bit integer that is unique for each page across the entire database or within a file. The unique page identifier is located using the unique page identifier address parameter, which is the address of the unique page identifier. This parameter can be NULL in the case there is no unique page identifier.

The unique page identifier address is determined as follows. Byte differences are found across all data pages that comprise a single synthetic table (since the unique page identifier need only be unique per structure). To reduce runtime, we currently consider differences for the first three bytes and assume the fourth byte. A fourth byte can be confirmed, but $256^3$ (roughly 16M) pages are needed. Currently, then, each three-byte sequence is tested for uniqueness among all of the data pages. Additionally, the identifier must increase as the page address within the file increases. When a sequence of bytes across all pages is found to meet these conditions, the address of these sequences is returned as the unique page identifier address.
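The test described above might be sketched as follows. The function name `unique_page_id_address` is hypothetical, the pages are assumed to be supplied in file order, and little-endian byte order is an assumption that can be changed through the `byteorder` argument. The check that the first byte varies between pages reflects the "byte differences" starting point and prevents a constant field that merely precedes the identifier from matching.

```python
from typing import List, Optional


def unique_page_id_address(
    pages: List[bytes], window: int, byteorder: str = "little"
) -> Optional[int]:
    """Find the address of a 3-byte value that is unique and increasing.

    `pages` must be ordered by their offset within the file.
    """
    for addr in range(window - 2):
        # Start only where the byte actually differs between pages.
        if len({p[addr] for p in pages}) < 2:
            continue
        ids = [int.from_bytes(p[addr:addr + 3], byteorder) for p in pages]
        unique = len(set(ids)) == len(ids)
        increasing = all(a < b for a, b in zip(ids, ids[1:]))
        if unique and increasing:
            return addr
    return None  # no unique page identifier found; the parameter stays NULL
```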

4.2.1.4 Record Count

The record count is a 16-bit number that represents the number of active records stored within the page. The record count is located using the record count address.

The record count address is determined using the following process. Each page containing synthetic table data is searched for records. In particular, the synthetic data contains unique string data that conforms to a specific pattern, e.g., ‘Curly0001’, ‘Curly0002’, .... The count of these strings indicates the number of records on that page. The page header is then searched for a 16-bit number representing that count. This process is repeated on several table data pages in order to confirm that the location of the record count is consistent across pages, thereby confirming that the actual position of the record count has been located.
