4 Determining Metadata based Associations in Digital Evidence
4.2 f-FIA: Functional Forensic Integration Architecture
The functional Forensic Integration Architecture (f-FIA) is illustrated in Figure 4.1 and its layers are as follows:
1. Digital Evidence Layer;
2. Digital Artifact Traversal & Metadata Parser Layer; and
3. Evidence Composition Layer.
f-FIA is component-oriented and multi-layered (refer to the subsections below). It is designed to integrate the functionality provided by contemporary forensic and analysis tools to examine heterogeneous sources of digital evidence, and to identify associations within and across those sources during analysis.
Figure 4.1 Block schematic of the functional Forensic Integration Architecture (f-FIA)
The architecture of f-FIA is consistent with forensic principles (maintaining data integrity and read-only access during the examination) and lends itself naturally to the automation of forensic examination, while seamlessly integrating forensic examination with analysis. Its layered architecture is designed to allow scope for future extensions based on technological advances. As my work focuses on identifying associations among digital artifacts within and across sources of digital evidence, I concentrate on the Evidence Composition Layer and on methods to group related digital artifacts. I describe the different layers of the f-FIA in the sequel.
4.2.1 Digital Evidence Layer
[Figure 4.1 labels: evidence sources #1, #2, #3, … and the f-FIA Repository; Digital Evidence Layer; Digital Artifact Traversal and Metadata Parser Layer (digital artifact traversal, metadata parsers); Evidence Composition Layer: Metadata Equivalence and Association (Cross Referencing sub-layer, Knowledge Representation sub-layer). Legend: focus of this thesis; developed by integrating existing technology.]
The Digital Evidence layer provides binary abstractions of the digital evidence sources that are part of an investigation. The media operated on by this layer must comply with read-only semantics to maintain the integrity of data during an investigation. The functionality of this layer can be likened to the binary (and possibly hexadecimal) data support that EnCase, FTK and Sleuthkit extend to forensic image formats such as dd, EWF and AFF.
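The read-only, binary view described above can be sketched as follows. This is a minimal illustration and not the f-FIA implementation: the class name `EvidenceSource`, its methods and the helper `hexdump` are hypothetical, and an in-memory byte stream stands in for a raw (dd-style) image.

```python
import hashlib
import io


class EvidenceSource:
    """Hypothetical read-only binary view of one evidence source."""

    def __init__(self, stream):
        # Assumption: `stream` is a seekable binary stream opened for reading.
        # The class deliberately exposes no write operations.
        self._stream = stream

    @classmethod
    def from_path(cls, path):
        """Open an image file in binary read-only mode."""
        return cls(open(path, "rb"))

    def read_at(self, offset, length):
        """Return `length` raw bytes starting at byte `offset`."""
        self._stream.seek(offset)
        return self._stream.read(length)

    def sha256(self, chunk_size=1 << 20):
        """Hash the whole source so its integrity can be re-checked later."""
        digest = hashlib.sha256()
        self._stream.seek(0)
        for chunk in iter(lambda: self._stream.read(chunk_size), b""):
            digest.update(chunk)
        return digest.hexdigest()


def hexdump(data):
    """Render bytes as space-separated hexadecimal pairs."""
    return " ".join(f"{byte:02x}" for byte in data)


# Illustrative data only: a few bytes standing in for a forensic image.
demo = EvidenceSource(io.BytesIO(b"\x00\x01EVIDENCE\xff"))
baseline = demo.sha256()
header = demo.read_at(0, 2)
```

Recomputing `demo.sha256()` after any amount of reading and comparing it with `baseline` is one simple way to confirm that the examination left the source unchanged.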
4.2.2 Digital Artifact Traversal & Metadata Parser Layer
The Digital Artifact Traversal and Metadata Parser layer provides access to the artifacts in digital evidence and to their metadata. It supplies the file system or schema support appropriate to each source: it interprets the files in forensic disk images, the processes in memory dumps, the individual log records in log files and the individual packets in network packet captures. Succinctly, this layer is responsible for providing suitable abstractions of the digital artifacts and their corresponding metadata on each source, and for building indices over them that the upper layer can use to determine associations in digital evidence.
To parse metadata, the layer can determine an artifact’s application type, which in turn determines the metadata that can be extracted. For example, in hard disk images, files and metadata carry their usual meaning. In log file and packet capture sources, the records and packets take on their attributes as metadata, in addition to inheriting the metadata of the log file or network capture file. The functionality of this layer also includes the development of source traversal algorithms and metadata parsers according to the source and the specific application types. The output of the metadata extraction and indexing processes feeds into the repository, which is then used by the upper layer during analysis.
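The traversal, metadata inheritance and indexing described above might be sketched as follows. All names here (`Artifact`, `parse_files`, `parse_log`, `build_index`) and the space-separated log format are assumptions made for illustration; they are not part of f-FIA as described.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical uniform abstraction of one digital artifact."""
    source: str
    kind: str
    ident: str
    metadata: dict = field(default_factory=dict)


def parse_files(source, entries):
    """Yield one artifact per file entry; file metadata carries its usual meaning."""
    for entry in entries:
        yield Artifact(source, "file", f"{source}:{entry['path']}", dict(entry))


def parse_log(source, file_metadata, lines):
    """Yield one artifact per log record.

    Each record takes its own attributes as metadata and also inherits
    the metadata of the containing log file.
    Assumed record format: "<timestamp> <user> <action>".
    """
    for number, line in enumerate(lines, start=1):
        timestamp, user, action = line.split(" ", 2)
        metadata = dict(file_metadata)
        metadata.update({"timestamp": timestamp, "user": user, "action": action})
        yield Artifact(source, "log_record", f"{source}#{number}", metadata)


def build_index(artifacts):
    """Index artifacts by (metadata name, value) for use by the upper layer."""
    index = defaultdict(set)
    for artifact in artifacts:
        for name, value in artifact.metadata.items():
            index[(name, value)].add(artifact.ident)
    return index


# Illustrative inputs only.
artifacts = list(parse_files("disk1", [{"path": "/home/x/report.doc", "owner": "x"}]))
artifacts += list(parse_log("auth.log", {"log_owner": "root"}, ["2008-06-10T14:05:00 x login"]))
index = build_index(artifacts)
```

Keeping every artifact, whatever its source, in the same `Artifact` shape is what lets a single index serve files, log records and packets alike.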
4.2.3 Evidence Composition Layer
The Evidence Composition layer is responsible for integrating information from various sources of evidence and composing the components into consistent and comprehensive evidentiary material to present to a forensics examiner. I achieve evidence integration at two levels. At the first level, related evidence artifacts are determined from value matches and grouped together during analysis; at the second level, the consistency of the grouped artifacts is validated to determine relevant evidence. This layer is therefore composed of two sub-layers: the Cross Referencing sub-layer and the Knowledge Representation and Reasoning sub-layer.
The cross referencing sub-layer correlates content and metadata from the digital artifacts in the repository. The repository is capable of supporting data from multiple sources of digital evidence (digital artifacts, indexed content and metadata). The repository can also support information gathered from external sources (e.g., identity related databases such as social security, bank accounts, driver’s license database, etc.) that are deemed to be relevant to the investigation. The internal architecture of the evidence composition layer is illustrated in Figure 4.2.
Figure 4.2 Internal architecture of the Evidence Composition Layer
4.2.3.1 Cross-Referencing Sub-layer
The Cross Referencing sub-layer is responsible for cross referencing content, including metadata, within and across digital evidence sources. It is the responsibility of this sub-layer to utilize the indices provided by the immediate lower layer to identify associations both on the same source as well as across multiple sources. Since the immediate lower layer abstracts each source by its artifacts and associated metadata, the cross-referencing sub-layer accesses each artifact through its metadata and determines value matches across artifacts, irrespective of the type of artifact. The functionality of this sub-layer is conceived in a technology-agnostic manner, in order to scale across arbitrary sets of digital evidence sources. The resulting sets of associated artifacts are stored in the repository for subsequent analysis. The cross referencing sub-layer can access data on the repository from multiple heterogeneous sources that can be deemed related to an investigation and consists of algorithms that aid in discovering the associations.
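As a rough sketch of value matching across artifacts irrespective of their type, the function below groups artifacts that share any metadata value. The function name, the artifact identifiers and the choice to ignore metadata field names when matching are all assumptions made for this example, not the algorithms developed in this thesis.

```python
from collections import defaultdict


def value_matches(artifacts):
    """Return {value: artifact ids} for values carried by two or more artifacts.

    `artifacts` maps an artifact identifier to its metadata dictionary.
    Matching is on the value alone, so a file's author can match an
    email's sender even though the field names differ.
    """
    by_value = defaultdict(set)
    for ident, metadata in artifacts.items():
        for value in metadata.values():
            by_value[value].add(ident)
    return {value: ids for value, ids in by_value.items() if len(ids) > 1}


# Illustrative artifacts from three heterogeneous sources.
artifacts = {
    "disk:/docs/plan.pdf": {"author": "alice", "modified": "2008-06-10T14:30"},
    "mail:msg-17": {"sender": "alice", "attachment": "plan.pdf"},
    "pcap:pkt-902": {"src_ip": "10.0.0.5"},
}
matches = value_matches(artifacts)
```

Here the file and the email are grouped through the shared value `"alice"`, while the packet, which shares no value with the others, stays unassociated.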
4.2.3.2 Knowledge Representation & Reasoning Sub-layer
[Figure 4.2 labels: Digital Artifact Traversal and Metadata Parser Layer (digital artifact traversal, metadata parsers); Evidence Composition Layer: Metadata Equivalence and Association (Cross Referencing sub-layer, Knowledge Representation sub-layer); Metadata Association Repository; Digital Artifact & Metadata Repository.]
The Knowledge Representation and Reasoning sub-layer is concerned with the logical validity of the digital evidence, based on the associations discovered. This sub-layer is responsible for determining causal relationships between one or more assertions that can be established based on the artifacts that are associated. Establishing causal relationships between artifacts can help in the identification of relevant evidence for user activity reconstruction as part of the forensic analysis. For instance, consider the evidence that an email was sent by user X with a file attachment. This introduces three distinct predicates as listed below.
1. User X was logged into that system when the email was sent.
2. User X was logged into the email account when the email was sent.
3. The file existed on the system from which the email was sent.
While each predicate can be independently verified based on the evidence available, it may be necessary to identify the associations between the following.
1. The email and the file attachment.
2. The email and the email server logs for user X’s email login.
3. The system and the system access logs for user X’s system login.
This requires that evidence be considered across heterogeneous sources and associated to establish causation and relevance during an investigation. To achieve this, this sub-layer consolidates the syntactic metadata associations and metadata equivalence relationships across sources to derive semantic inferences on sets of associated artifacts. A few examples are listed below.
1. If digital image files in the evidence match on one or more of their technical metadata, like EXIF metadata, then one may infer that the images were digital photographs captured with the same make and model of digital still camera.
2. If web browser logs indicate a file download whose metadata matches against a file in the user’s hard drive, one can infer that the file was not authored by the user.
3. If there are two records in a mail server for the same user at time instant T to indicate simultaneous logins from both Sydney and Melbourne, the information leads to two mutually-exclusive assertions “The user was in Sydney at time T” and “The user was in Melbourne at time T”.
In the last case, the login attempts in themselves cannot be treated as incriminating evidence. However, the assertions warrant further scrutiny since it is impossible that an individual was in both Sydney and Melbourne at the same time. Resolution of such conflicting assertions requires that the examiner take recourse to other (related) external databases, perhaps flight details and passenger manifest databases, to determine if the individual travelled between the two cities immediately prior to time T. Correctly ordered timestamps become especially useful in such cases. Validating the correctness and accuracy of timestamps obtained from metadata on digital artifacts, including related sources beyond the evidence, is an example of the reasoning process.
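The Sydney and Melbourne case can be illustrated with a small consistency check that flags mutually exclusive location assertions for the examiner. This is only a sketch under simplifying assumptions: login records are reduced to hypothetical `(user, instant, city)` tuples, and two logins conflict only when their timestamps are identical.

```python
from collections import defaultdict
from itertools import combinations


def conflicting_locations(logins):
    """Return (user, instant, city_a, city_b) for each pair of assertions
    that place the same user in two different cities at the same instant."""
    seen = defaultdict(set)
    for user, instant, city in logins:
        seen[(user, instant)].add(city)
    conflicts = []
    for (user, instant), cities in sorted(seen.items()):
        for first, second in combinations(sorted(cities), 2):
            conflicts.append((user, instant, first, second))
    return conflicts


# Illustrative mail server records only.
logins = [
    ("userX", "2008-06-10T14:00", "Sydney"),
    ("userX", "2008-06-10T14:00", "Melbourne"),
    ("userY", "2008-06-10T14:00", "Sydney"),
]
conflicts = conflicting_locations(logins)
```

The check only surfaces the contradiction; deciding which assertion holds still requires the corroborating sources discussed above.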
This sub-layer allows recording assertions and validating them by corroborating them within the scope of the data sources provided. Any evidence to the contrary is flagged and presented to the examiner. Besides assertion validation, often examiners need to repeatedly query the sources of digital evidence for information. Examples of such queries are “list the set of files that were modified on June 10th 2008 between 2:00 PM and 6:00 PM” or “list all HTTP sessions on the network capture with IP address X as the source”. This sub-layer’s architecture enables an examiner to query the digital evidence in this way and determine evidence associated with the query results simultaneously, without having to search for it.
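The first example query, together with the retrieval of associated evidence, could look roughly like the sketch below. The record layout, the function names and the `associations` mapping (standing in for association sets already stored in the repository) are hypothetical.

```python
from datetime import datetime


def modified_between(files, start, end):
    """Return paths of files whose modification time lies in [start, end]."""
    return [f["path"] for f in files if start <= f["modified"] <= end]


def with_associations(paths, associations):
    """Pair each query hit with the artifacts already associated with it."""
    return {path: sorted(associations.get(path, ())) for path in paths}


# Illustrative file metadata and previously discovered associations.
files = [
    {"path": "/a.txt", "modified": datetime(2008, 6, 10, 13, 59)},
    {"path": "/b.doc", "modified": datetime(2008, 6, 10, 15, 30)},
    {"path": "/c.jpg", "modified": datetime(2008, 6, 11, 15, 0)},
]
associations = {"/b.doc": {"mail:msg-17"}}

hits = modified_between(files, datetime(2008, 6, 10, 14, 0), datetime(2008, 6, 10, 18, 0))
related = with_associations(hits, associations)
```

Because the associations are looked up from an existing mapping rather than searched for at query time, each result arrives with its related evidence attached.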
To utilize such a framework, it is necessary to understand the implications of determining metadata based associations in digital evidence and develop a model to represent such associations for analysis. However, when heterogeneous sources need to be correlated for analysis, it is essential to characterize the homogeneity of a single source of digital evidence. This topic is addressed in the sequel.