2.3 Forensic tools
2.3.3 CFIT
CFIT1 is an integrated computer forensics tool developed by the DSTO (Defence Science and Technology Organisation), Department of Defence, Australia [48]. CFIT1 provides efficient and flexible automated forensic methods for analyzing the content of data streams such as disk drives, network data, and telecommunications call data, enabling investigators to discard data that are peripheral to their investigation. CFIT1 provides a forensic problem-solving environment that integrates tools in a visual framework for investigating the unauthorized use of computer and network facilities. The main advantages of CFIT1 are (1) the ability to integrate multiple interactive forensic tools into a common, easy-to-use visual framework; (2) the facility for adding new specialized forensic tools to the framework; and (3) the ability to capture the history of an investigation in a simple visual manner.
The basic investigative environment in CFIT1 is the case, within which investigators can work individually or as a team to solve one or more criminal cases. Multiple networked investigators can use CFIT1 to work on the same case at the same time. The CFIT1 platform provides case management, forensic data stream access and manipulation, data visualization, and forensic processing. CFIT1 incorporates a two-dimensional visual language environment, called Picasso, for graphically expressing a forensic case on a visual framework, or workbench. Forensic tools that analyze the case data can be dragged and dropped onto the workbench, interconnected, and executed. Investigators use the interactive visual workbench to conduct an investigation and share their results with other investigators working on the same case.
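A workbench of interconnected tools behaves like a small dataflow pipeline. The sketch below is a minimal illustration under simplifying assumptions: tools form a linear chain rather than an arbitrary graph, each tool declares one input and one output data type, and a connection is refused when the types do not match. `Tool`, `Workbench`, and the two example tools are hypothetical names, not part of CFIT1 or Picasso.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Tool:
    """Hypothetical forensic tool node: consumes one data type, produces another."""
    name: str
    input_type: str
    output_type: str
    run: Callable[[Any], Any]


@dataclass
class Workbench:
    """Hypothetical workbench: tools connected by flows into a linear chain."""
    chain: list[Tool] = field(default_factory=list)

    def connect(self, tool: Tool) -> None:
        # Refuse a flow whose input type differs from the upstream tool's output type.
        if self.chain and self.chain[-1].output_type != tool.input_type:
            upstream = self.chain[-1]
            raise TypeError(
                f"cannot connect {upstream.name} ({upstream.output_type}) "
                f"to {tool.name} ({tool.input_type})"
            )
        self.chain.append(tool)

    def execute(self, data: Any) -> Any:
        # Pass the data stream through each connected tool in order.
        for tool in self.chain:
            data = tool.run(data)
        return data


# Illustrative tools: a toy "file system analyzer" that splits an image on NUL
# bytes, and a term search that keeps files mentioning "transfer".
extract_files = Tool("file system analyzer", "disk_image", "files",
                     lambda img: img.split(b"\x00"))
term_search = Tool("term search", "files", "hits",
                   lambda files: [f for f in files if b"transfer" in f])

bench = Workbench()
bench.connect(extract_files)
bench.connect(term_search)
print(bench.execute(b"bank transfer\x00hello\x00wire transfer"))
# prints [b'bank transfer', b'wire transfer']
```

The type check at connection time is one simple way to reject semantically incompatible flows before anything is executed.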
The forensic tools included in CFIT1 are a hard disk analyzer, a file system analyzer (currently ext2 and FAT), a log extractor, an ontological search engine, an unallocated space extractor, a time event resolver, and a time-lining tool. Investigators can interconnect these tools using flows within Picasso. However, not every tool can be connected to every other tool, because some tools produce or consume semantically incompatible data types. CFIT1 also ensures that time is interpreted consistently across evidence from computers that run different operating systems (which may interpret time in different ways), are located in different countries, and possibly span multiple time zones. It does this by associating each piece of case evidence, or metadata generated by forensic tools, with a time reference defined by the investigator. These time references are then automatically mapped onto a common UTC timeframe (see Section 6.4 for further information).
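The mapping of investigator-defined time references onto UTC can be sketched as follows. The sketch assumes that each time reference reduces to a fixed UTC offset plus an optional clock-skew correction. `TimeReference` and `to_utc` are hypothetical names, not CFIT1's interface. Fixed offsets also ignore daylight-saving transitions, which a real tool would have to handle with a time-zone database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TimeReference:
    """Hypothetical investigator-defined time reference for one evidence source."""
    label: str
    utc_offset_hours: float          # offset of the source's local clock from UTC
    clock_skew_seconds: float = 0.0  # assumed: how far the source clock ran ahead


def to_utc(local_timestamp: datetime, ref: TimeReference) -> datetime:
    """Map a naive local timestamp from an evidence source onto the UTC timeline."""
    # Remove the known clock skew, attach the source's fixed offset, convert to UTC.
    corrected = local_timestamp - timedelta(seconds=ref.clock_skew_seconds)
    source_tz = timezone(timedelta(hours=ref.utc_offset_hours), ref.label)
    return corrected.replace(tzinfo=source_tz).astimezone(timezone.utc)


# Two made-up log entries from machines in different time zones.
sydney = TimeReference("Sydney workstation", 10)
london = TimeReference("London server", 0, clock_skew_seconds=120)
print(to_utc(datetime(2004, 3, 1, 9, 0, 0), sydney))    # 2004-02-29 23:00:00+00:00
print(to_utc(datetime(2004, 2, 29, 23, 2, 0), london))  # 2004-02-29 23:00:00+00:00
```

Once every item carries a UTC timestamp, events from both machines can be placed on a single timeline and compared directly.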
An example of a computer forensic tool available in CFIT1 is the Ferret Discovery Engine, a tool for textual concept ontology generation, navigation, and searching. It can be used to search files or documents for particular concepts and to identify documents that might have forensic significance. It is particularly useful for searching text-based files, though it can also search for text in nonprintable files such as binaries (executable files) and even network packets. Ferret allows the investigator to
• Discover suspicious text byte streams, such as files/documents from one or more file systems;
• Establish the inherent relationships between the streams based on a set of concepts.
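Searching nonprintable data such as binaries or packets first requires pulling readable text out of raw bytes. The sketch below uses a common approach, similar in spirit to the Unix `strings` utility, which is not necessarily what Ferret does. It extracts runs of at least four printable ASCII bytes and flags the streams whose runs mention any search term. The function names and the four-byte minimum are assumptions.

```python
import re

# Runs of at least 4 printable ASCII bytes (space through tilde); the minimum
# length is an assumption borrowed from the default of the Unix `strings` tool.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def text_runs(data: bytes) -> list[str]:
    """Extract printable text runs from any byte stream (file, binary, or packet)."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def suspicious_streams(streams: dict[str, bytes], terms: set[str]) -> dict[str, list[str]]:
    """Hypothetical helper: per stream, the text runs mentioning any term (case-insensitive)."""
    lowered = {t.lower() for t in terms}
    hits = {}
    for name, data in streams.items():
        matched = [run for run in text_runs(data)
                   if any(t in run.lower() for t in lowered)]
        if matched:
            hits[name] = matched
    return hits


# Made-up streams: a fragment of a binary and a plain text file.
streams = {
    "payload.exe": b"\x7fELF\x00\x01transfer funds to account\x00\xff\xfe",
    "notes.txt": b"meeting at noon",
}
print(suspicious_streams(streams, {"transfer"}))
# prints {'payload.exe': ['transfer funds to account']}
```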
An ontology is a domain of discourse in which one or more keywords (terms, in Ferret terminology) are organized as a domain-specific, graph-based concept structure that best describes knowledge or information about a given domain. A concept is a set of one or more terms together with their relationships to other concepts (relationships are defined later). In Ferret, a concept is initially restricted to a single term (e.g., money, transfer, and account) and its set of relationships with other concepts.
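A concept structure of this kind maps naturally onto a graph with typed arcs. The sketch below follows Ferret's initial restriction of one term per concept. It stores each relationship together with its inverse, so that navigation works from either end. The class and method names, the direction conventions, and the use of "holonymy" as the inverse of meronymy are assumptions made for illustration. The relationship kinds themselves are the ones named in this section.

```python
from collections import defaultdict

# Relationship kinds and their inverses. The kinds come from the text; the
# direction conventions and "holonymy" as the inverse of meronymy are assumptions.
INVERSE = {
    "generality": "specificity",   # a is more general than b
    "specificity": "generality",   # a is more specific than b
    "synonymy": "synonymy",        # symmetric
    "meronymy": "holonymy",        # a is a part of b
    "holonymy": "meronymy",        # a has b as a part
    "trigger": "trigger",          # symmetric, data-driven co-occurrence
}


class ConceptGraph:
    """Hypothetical concept graph: one term per concept, typed arcs between concepts."""

    def __init__(self) -> None:
        # term -> list of (relationship kind, related term), in insertion order
        self.arcs: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def relate(self, a: str, kind: str, b: str) -> None:
        """Add an arc a -[kind]-> b and its inverse b -[inverse kind]-> a."""
        if kind not in INVERSE:
            raise ValueError(f"unknown relationship: {kind}")
        self.arcs[a].append((kind, b))
        self.arcs[b].append((INVERSE[kind], a))

    def neighbours(self, term: str) -> list[tuple[str, str]]:
        """Concepts directly related to `term`, with the kind of each relationship."""
        return list(self.arcs.get(term, []))


g = ConceptGraph()
g.relate("money", "generality", "cash")
g.relate("money", "synonymy", "funds")
g.relate("money", "trigger", "transfer")
g.relate("account", "meronymy", "bank")
print(g.neighbours("money"))
print(g.neighbours("cash"))  # prints [('specificity', 'money')]
```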
The most basic function available in Ferret is searching for a set of terms in input data streams defined by upstream forensic tools in CFIT1. This is most useful when searching unallocated space and hidden space on disks. Investigators can select a set of search options (such as stop terms, allowable errors in each term, and case sensitivity), subselect data streams, and run multiple concurrent searches. Figure 2.4 shows the results of a search operation for a single term on eight input data streams (in this case, the streams are Linux log files). The terms kernel and apollo have been found in four of the data streams and can be viewed in the messages log file in the lower panel.
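The search options described above can be sketched as a multi-stream term search. The sketch makes one interpretive assumption: "allowable errors in each term" is read as a maximum Levenshtein (edit) distance between a search term and a word in the stream, which may not match Ferret's definition. The function names and the sample log lines are made up for illustration.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping one row of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def search(streams, terms, max_errors=0, case_sensitive=False, stop_terms=()):
    """Hypothetical search: {stream name: sorted matched terms} for streams with hits."""
    norm = (lambda s: s) if case_sensitive else str.lower
    stops = {norm(s) for s in stop_terms}
    wanted = [norm(t) for t in terms if norm(t) not in stops]
    results = {}
    for name, text in streams.items():
        words = {norm(w) for w in re.findall(r"\w+", text)} - stops
        found = {t for t in wanted
                 if any(edit_distance(t, w) <= max_errors for w in words)}
        if found:
            results[name] = sorted(found)
    return results


# Made-up log lines standing in for Linux log files.
logs = {
    "messages": "Mar 1 apollo kernel: eth0 link up",
    "secure": "Mar 1 apolo sshd[211]: session opened",
    "boot.log": "Mar 1 syslog startup succeeded",
}
print(search(logs, ["kernel", "apollo"]))  # prints {'messages': ['apollo', 'kernel']}
print(search(logs, ["kernel", "apollo"], max_errors=1))
# prints {'messages': ['apollo', 'kernel'], 'secure': ['apollo']}
```

Allowing one error catches the misspelled "apolo". This illustrates why tolerant matching can help when searching fragmentary data such as unallocated space.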
The term nodes in the graph-based concept structure are usually related to each other by one or more relationships (represented as the arcs of the graph), such as generality, specificity, synonymy, and meronymy. Semantic relationships may also arise from the language model applied to the text. The language model captures and characterizes the regularities in the natural language used in the text stream. For example, short- and long-distance textual information such as N-grams and triggers describes the underlying associative relationships in a text document. An N-gram [63] is a sequence of contiguous words in a text stream. Its significance in that text stream is defined by the conditional probability of one word given the preceding sequence of words. N-grams therefore capture short-term, local dependencies in the text stream well. Unfortunately, N-grams can also capture nonsensical text frames that are unrelated to their linguistic role. A trigger is a pair of terms that co-occur in the text stream, usually within a fixed word window. Triggers are effectively long-distance bigrams (2-grams) capable of extracting relationships from a large-window document history [64]. Triggers have been shown to be effective in capturing semantic information over small-to-medium text stream window sizes (distances up to 5) [65]. Ferret uses triggers to extract semantic relationships within text documents.
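The two language-model ideas can be contrasted in a few lines. The sketch below shows two textbook constructions, not Ferret's implementation:

- an unsmoothed maximum-likelihood bigram estimate of P(next word | previous word), the simplest N-gram;
- a trigger count of unordered term pairs that co-occur within a window of 5 words, matching the distances quoted above.

The function names and the sample sentence are illustrative assumptions.

```python
from collections import Counter


def bigram_probability(words, prev, nxt):
    """Unsmoothed maximum-likelihood estimate of P(nxt | prev) from bigram counts."""
    pairs = Counter(zip(words, words[1:]))
    history = Counter(words[:-1])  # how often each word is followed by another word
    return pairs[(prev, nxt)] / history[prev] if history[prev] else 0.0


def triggers(words, window=5):
    """Count unordered term pairs co-occurring at distances 1..window."""
    counts = Counter()
    for i, w in enumerate(words):
        for v in words[i + 1:i + 1 + window]:
            if v != w:
                counts[tuple(sorted((w, v)))] += 1
    return counts


words = "transfer the money to the account then transfer money again".split()
print(bigram_probability(words, "transfer", "money"))  # prints 0.5
print(triggers(words)[("money", "transfer")])          # prints 3
print(triggers(words)[("again", "to")])                # prints 0 (distance 6 > window)
```

The bigram sees "transfer money" only where the words are adjacent. The trigger count also links "money" and "transfer" across the intervening words, which is the longer-range association that triggers are meant to capture.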
The ability to describe semantic relationships using a concept graph allows the investigator to visualize the concept domain of the case under investigation much more succinctly. The graph combines language semantic relationships with data-driven semantic relationships (e.g., triggers). This allows the investigator to navigate the concept domain and possibly discover new relationships. Figure 2.5 displays the concept graph within the Ferret concept browser window. It shows the concepts, derived from the input data streams selected in Figure 2.4, that are related to the central concept apollo. Some of these concepts are related to apollo semantically, by generality and specificity (e.g., agency and supernatural). Others are derived from data stream triggers (e.g., kernel, entry, and succeeded) and are shown together with their associated mutual information index values (a measure of the strength of association between two terms).
Figure 2.5 Concept graph using Ferret.
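The text does not give Ferret's exact mutual information formula. As an assumed stand-in, the sketch below uses pointwise mutual information (PMI), a standard association score. Term probabilities are estimated as the fraction of text segments (here, made-up log lines) containing each term, and candidate concepts are ranked by their association with a central concept. `pmi` and `related_to` are hypothetical names. Pairs that never co-occur are dropped, because their PMI would be minus infinity.

```python
import math


def pmi(segments, a, b):
    """Pointwise mutual information, in bits, of terms a and b over text segments.

    P(x) is estimated as the fraction of segments containing x. Returns None
    when a and b never co-occur, since PMI would then be minus infinity.
    """
    sets = [set(s.lower().split()) for s in segments]
    n = len(sets)
    p_a = sum(a in s for s in sets) / n
    p_b = sum(b in s for s in sets) / n
    p_ab = sum(a in s and b in s for s in sets) / n
    if p_ab == 0:
        return None
    return math.log2(p_ab / (p_a * p_b))


def related_to(segments, centre, candidates):
    """Rank candidate concepts by PMI with a central concept, strongest first."""
    scores = {c: pmi(segments, centre, c) for c in candidates}
    return sorted(((c, s) for c, s in scores.items() if s is not None),
                  key=lambda cs: cs[1], reverse=True)


lines = [
    "apollo kernel entry succeeded",
    "apollo kernel module loaded",
    "apollo sshd session opened",
    "backup job succeeded",
]
print(related_to(lines, "apollo", ["succeeded", "kernel", "backup"]))
```

A positive score means the two terms co-occur more often than independence would predict, and a negative score means less often. Here kernel scores about 0.415 bits, while succeeded, which also appears without apollo, scores about -0.585.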
In summary, CFIT1 is an easy-to-use forensic investigation environment that provides the ability to integrate multiple interactive forensic tools into a common visual framework. It is still under development: it needs support for more file systems (NTFS support, for example, is currently being added) and an improved reporting facility.