Disk and file imaging and analysis

2.2 Electronic evidence

2.2.3 Disk and file imaging and analysis

In cases where the investigation involves actual seizure of a computer, disk imaging takes place once the computer has been seized according to warrant and is properly in custody. Once the imaging is complete, further analysis can then take place on the duplicate(s) with an assurance that the integrity of the original evidence is not compromised.

In some situations, imaging of disks or files will take place at the site of the investigation, for instance, if the case involves an individual within an organization whose employers sanction the imaging of that person’s desktop. In such situations, the question arises as to whether the entire disk is to be physically imaged (disk imaging) or whether instead the investigator should identify the required files and copy just those for later analysis in the laboratory. In the past, and still today in many cases, the entire disk is imaged in order to allow subsequent analysis at both physical and logical levels. This is known also as a bit-by-bit or sector-by-sector or, increasingly, a bit-stream image or duplicate. There is also partition aligned imaging, which allows each separate partition to be duplicated to a separate image; this is particularly useful if the intention is to use Linux to mount such a partition image (i.e., the file to which it has been imaged) in loopback mode, as mentioned previously, to access the file system it constitutes.

Increases in disk size and in particular the increased use of RAID disk technology, with its ability to stripe a file logically across multiple physical disk drives, have seen a recent change in that imaging an entire disk can in such circumstances be impracticable as one requires an identical RAID system for the image. As a result, there is a trend to duplicate files at the site of the investigation rather than the entire disk. This is sometimes referred to as file-by-file imaging or logical imaging. While in some circumstances inevitable, this carries with it the disadvantage that subsequent analysis at the physical level (of the entire disk image) is not possible.

Disk image analysis at the physical level allows an analysis of the disk image, sector-by-sector, and by definition provides for analysis of each disk partition in the image and possibly of nonpartition areas also. Analysis at the physical level is important in situations in which there is the likelihood of forensically valuable information appearing on the disk in ambient data areas. In this case, a detailed physical analysis of the disk image needs to be carried out, either at the site of the investigation or later on off-site. Information that is residual on or hidden in ambient data areas will otherwise be overlooked.

Before leaving the topic of disk imaging, and given that there are a number of variations on the theme, it is useful to note the related terms or phrases used to describe these variations. The terms are not orthogonal:

1. Sector-by-sector: The image disk comprises the same sectors in the same sequence as does the original disk.

2. Disk mirroring: Sammes and Jenkinson [19] argue that this term should be avoided, as it implies a target disk that is a physical replica of the original disk from which it has been duplicated, which is typically not the case. The point here is that the physical CHS geometries of disks vary: some have more cylinders and fewer heads (recording surfaces), others have fewer cylinders and more heads. For this and other hardware-related reasons, disk input or output involves what is known as CHS translation and, as a result, while one can make a faithful sector-by-sector copy of one disk to a target disk with a different CHS geometry, it will not be a mirror image of the source disk with its different physical CHS geometry. Sammes and Jenkinson [19] provide a very thorough explanation of this and many other hardware-related aspects of the forensics of personal computers.

3. Physical imaging: This is a commonly used term to signify a sector-by-sector duplicate, which is what is needed and acceptable in court.

4. Bit-by-bit imaging: This is an ambiguous term; it can be used as a synonym for disk mirroring, or it can mean a sector-by-sector duplicate.

5. Partition imaging: This is sector-by-sector imaging of a single partition.

6. Bit-stream duplicate and qualified bit-stream duplicate: These terms [21] have been defined by the NIST Computer Forensics Tool Testing (CFTT) project in developing their standard requirements and terminology for disk imaging tools. The first, bit-stream duplicate, is defined as “a bit-for-bit digital copy of a digital original document, file, partition, graphic image, entire disk, or similar object.” The second, qualified bit-stream duplicate, is a bit-stream duplicate but allowing for identified portions of the bit-stream which differ.
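The sector-by-sector and partition imaging variations in the list above can be sketched in a few lines of Python. This is a minimal illustration only, not a substitute for trusted imaging software: the function name `image_sectors`, its parameters, and the 512-byte sector size are assumptions made for the example, and the source would in practice be reached through a write blocker. Passing a start sector and a sector count gives the effect of imaging a single partition.

```python
SECTOR_SIZE = 512  # assumed sector size; real devices may differ


def image_sectors(src_path, dst_path, start_sector=0, num_sectors=None,
                  sector_size=SECTOR_SIZE, sectors_per_read=2048):
    """Copy sectors from src to dst in their original order.

    With the defaults the whole source is copied; start_sector and
    num_sectors restrict the copy to one contiguous region, such as a
    partition. Returns the number of bytes copied.
    """
    copied = 0
    remaining = None if num_sectors is None else num_sectors * sector_size
    # The source is opened read-only ("rb"); nothing is ever written to it.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(start_sector * sector_size)
        while remaining is None or remaining > 0:
            want = sectors_per_read * sector_size
            if remaining is not None:
                want = min(want, remaining)
            chunk = src.read(want)
            if not chunk:
                break  # end of the source reached
            dst.write(chunk)
            copied += len(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return copied
```

A call such as `image_sectors("source.dd", "partition1.img", start_sector=63, num_sectors=1024)` would copy 1024 sectors beginning at sector 63; the file names and numbers here are hypothetical.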

2.2.3.2 File imaging and authentication

If disk imaging of an entire disk is not contemplated, either because seizure of the entire computer is for some reason not possible, perhaps not permissible by warrant, or because of the intractable size of the disk, then it is up to the investigator to identify the file or files that need to be examined, and to image or duplicate just those files for subsequent analysis. Whether the file imaging is carried out at the site of the investigation or in a police laboratory, in either case, as with entire disk images, the subsequent analysis must take place on a duplicate, not the original. Furthermore, in either case, whether it is a file image or a disk image that is to be analyzed, both the original and the copy or copies of the original must be authenticated. That is, there must be assurance of the continued integrity of the original and it must be demonstrable that the copies are exact, bona fide copies of the original, so that any conclusions drawn from analysis of the copies are valid. This involves what amounts to computing a kind of checksum [see one-way hash function (OWHF) later] on the original file or disk at the time the copies are made, and an assurance that the copying process is noninvasive, that is, that it leaves the original in pristine condition, unchanged in any way, including in respect of the modification, access and creation (MAC) date and timestamps, which are discussed in more detail later in this chapter.

The imaging or duplication operation will, if possible, use a set of procedures and utilities for making an image of the disk or file(s) without relying upon invoking the system’s native operating system program. This is typically accomplished by using the secure boot and write blocking measures discussed earlier and trusted imaging software. As alluded to earlier, disk imaging may in some circumstances be impossible, for instance in the case of servers running RAID arrays. In this case, the best that can be done is to do file-by-file imaging.

Use of a read-only medium onto which to copy an image of the original disk or file(s), such as write-once CDs (recordable CDs, or CD-Rs) or DVD-Rs, provides an assurance that the image retains its integrity throughout successive steps of analysis. It is not uncommon in the interests of performance for forensic teams to use very large SCSI disks (e.g., 182 GB) for the images, whether they are files or disks or disk partitions. These then may be write protected using write blocker technology, or by physically setting a jumper on the SCSI drive, or alternatively, under Linux/UNIX, the partition/drive can be mounted read-only. Archive copies of the images will also be made, typically to CD-R/DVD-R.

Most of the commonly used forensic toolsets provide a capability for file image authentication and disk image authentication. This capability typically makes use of what is known as a cryptographic OWHF [22]. An OWHF uses a cryptographic algorithm to compute a fingerprint or digest (typically a value represented by a 128-bit string or longer) of a file or disk. This value is sensitive to the change of even a single bit in the original file or disk and so provides an authenticator for that file or disk. It is necessary to fingerprint the seized files or disk at the earliest possible point (e.g., before or at imaging time), so that later on it is possible to compute the OWHF for the copy or copies and by comparison demonstrate that the copy or copies are faithful. This assists in demonstrating integrity of the evidence and that the chain of evidence has been maintained, but in turn relies upon the assurance that the original OWHF value(s) were calculated on the uncompromised original data. The most commonly used OWHF technologies are SHA-1 [23] and MD5 [22].
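The fingerprint-then-compare procedure described above might look roughly as follows using Python's standard `hashlib` module. The function names `fingerprint` and `is_faithful_copy` are hypothetical, and the default of SHA-256 is an assumption of this sketch; the text names SHA-1 and MD5, which can be selected by passing `"sha1"` or `"md5"` as the algorithm where the local Python build provides them.

```python
import hashlib


def fingerprint(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file or image at path, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so that very large images fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_faithful_copy(recorded_digest, copy_path, algorithm="sha256"):
    """Compare a digest recorded at imaging time with a fresh digest of a copy."""
    return fingerprint(copy_path, algorithm) == recorded_digest
```

The digest of the original would be recorded once, at or before imaging time, and each working copy checked against that recorded value before analysis begins. A copy that differs from the original in even a single bit yields a different digest, so the comparison fails.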

We note that another application of OWHF which occurs in a related area is that of integrity checkers for operational file systems. The best known such integrity tool is Tripwire originally from Purdue University and now marketed by Tripwire Inc. [24]. Such integrity checkers detect changes in designated files or directories and notify the administrator when this occurs.

For example, SMART Watch from WetStone Technologies Inc. [25] provides real-time notification of unexpected file changes to alert the administrator of unexpected system behavior. Hash databases (repositories of computed OWHF values for commonly deployed files) are discussed in Section 2.3, and also in Chapter 3. Section 2.3 also examines the hash capabilities of EnCase and ILook Investigator.

2.2.3.3 Hidden areas

Physical analysis will identify forensically useful information which is not part of any file listed in a directory or in the filemap, information which has persisted accidentally as the residue from previously deleted files, or which occurs as a result of a deliberate attempt to hide information in an unexpected disk location. Deliberate hiding of data can be accomplished in a variety of ways so that its presence may not be apparent from a simple listing of all the directories in each of the disk partitions. One means of hiding data on disk, as noted earlier, is by manipulating disk configuration information, such as partition tables. See for example PartitionMagic [26], a popular partition management tool that allows users to configure their partition tables. A potential variation on this is noted by Sanderson [27]: it involves exploitation of a characteristic of devices compatible with the ANSI AT Attachment Interface specification (ATA-4 and beyond), which allows disks to be configured with system areas that are then not visible to applications [28]. Data can also be hidden by using a steganographic file system as in [29]. Other, simpler ways of hiding data on a disk include giving files and directories names made up of nonprintable characters and, in a UNIX file system, mounting another disk over the directory that contains the hidden files. Reference [20] gives a detailed breakdown of the many ways in which information can be hidden on a disk.

The term ambient data refers to those areas on disk that are not accessible at the logical or application level. The term actually encompasses a number of separate areas on disk where forensically useful information may reside and from which it may be recoverable. One of the most important is the so-called file slack space, which refers to the space left over in the last cluster allocated to a file. Residual information appearing in file slack space is not accessible using standard file processing utilities that are designed to prevent a user from reading past the end of file in order to avoid the processing of meaningless data. Nonetheless, data residing in file slack space is potentially of forensic value. For example, if the cluster size is 8 sectors, a file of size 11.5 sectors will result in 4.5 sectors of file slack space which may contain data useful to the investigator. Figure 2.2 shows this concept using the file NotTwoBig.

Figure 2.2 Ambient data areas on disk.

There are in fact two kinds of slack space at the end of a bona fide file: the unused space within the last sector of the file (assuming the file size is not an exact multiple of sector size), and the wholly unused sectors (if any) in the file’s last cluster. If files are written to disk from a RAM system buffer one sector at a time, the last sector of the file (the 12th sector of the file in Figure 2.2) will comprise some bona fide file content and some residual information from the RAM system buffer, while the four entirely unused sectors may represent residual information from a file (now deleted), which had previously occupied the cluster. Both of these may be forensically useful. In the case of the latter, it may simply reflect residual data or it may reflect a deliberate attempt to hide data. An implementation of a MS-DOS program to store encrypted data in file slack space is provided by Johnson [30].
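The slack-space arithmetic in the example above can be checked with a short calculation. The sketch below assumes 512-byte sectors (the text gives sizes only in sectors) and uses a hypothetical function name, `slack_space`; it reports the two kinds of slack separately.

```python
def slack_space(file_size_bytes, sector_size=512, sectors_per_cluster=8):
    """Break down the slack at the end of a file of the given size."""
    cluster_size = sector_size * sectors_per_cluster
    # Ceiling division: clusters and sectors are allocated in whole units.
    clusters = -(-file_size_bytes // cluster_size)
    sectors_used = -(-file_size_bytes // sector_size)
    return {
        "clusters_allocated": clusters,
        # Unused bytes inside the file's last, partly filled sector.
        "slack_in_last_sector_bytes": sectors_used * sector_size - file_size_bytes,
        # Wholly unused sectors at the end of the last cluster.
        "unused_sectors_in_last_cluster": clusters * sectors_per_cluster - sectors_used,
        "total_slack_bytes": clusters * cluster_size - file_size_bytes,
    }


# A file of 11.5 sectors with 8-sector clusters, as in the text's example.
example = slack_space(int(11.5 * 512))
```

For the 11.5-sector file this gives two allocated clusters, half a sector (256 bytes) of slack in the 12th sector, four wholly unused sectors, and 2304 bytes (4.5 sectors) of slack in total, which agrees with the figures quoted in the text.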

Unallocated disk sectors too may contain unintended residual data or be used to hide data deliberately, although concealing data there is somewhat more problematic than in file slack space, as the operating system may well overwrite such hidden data when allocating clusters for new files. Deleting a file removes the filemap reference to the file and marks the no longer needed clusters as deallocated. However, this leaves the original data in the clusters where it was previously resident and, unless the user makes special provisions, the system does not actually overwrite that data until and unless it allocates those clusters at some future time to a new file. So until that happens, the information is no longer easily accessible but it is still there. As a result, and unless those clusters have been reallocated in the meantime, the original information may be recovered during physical analysis by the use of specialized software tools that search the disk at a sector level rather than relying upon the filemap. One can reconstitute a previously deleted file by searching deallocated clusters (e.g., looking for text string matches) and piecing together the pieces of the jigsaw. Of course, not all the pieces will necessarily still exist, which makes the task more difficult. Some operating systems will leave obsolete, possibly fragmented, filemap records lying around and, if located, these will assist the task enormously.
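A sector-level search of the kind described above, one that ignores the filemap and scans the raw image for a text string, might be sketched as follows. The function name `find_keyword`, the 512-byte sector size and the chunk size are assumptions of this example; real forensic tools do considerably more, such as handling different character encodings.

```python
def find_keyword(image_path, keyword, sector_size=512, chunk_sectors=2048):
    """Return (sector_number, offset_within_sector) for each keyword hit.

    The image is scanned as one raw byte stream. A short tail of each
    chunk is carried over so that hits straddling two reads are found.
    """
    if not keyword:
        raise ValueError("keyword must not be empty")
    hits = []
    overlap = len(keyword) - 1
    tail = b""
    base = 0  # absolute offset in the image of the start of `window`
    with open(image_path, "rb") as img:
        while True:
            chunk = img.read(chunk_sectors * sector_size)
            if not chunk:
                break
            window = tail + chunk
            pos = window.find(keyword)
            while pos != -1:
                absolute = base + pos
                hits.append((absolute // sector_size, absolute % sector_size))
                pos = window.find(keyword, pos + 1)
            # Keep fewer bytes than the keyword length, so no hit repeats.
            tail = window[-overlap:] if overlap else b""
            base += len(window) - len(tail)
    return hits
```

Because the scan works on physically adjacent bytes, it finds strings in slack space and deallocated clusters, but only when the bytes of the string are physically contiguous in the image.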

System swap-files (known as page-files in Microsoft Windows NT onwards) too will contain residual data, which is of potential value. Swap-files will include residual information, such as from previously opened files and print spooling, and the more sophisticated forensic tools allow an investigator to check such areas.

Latent data and fragile data are two related terms of interest. Latent data reflects the fact that disk information, even after being overwritten one or more times, can be recovered using specialized techniques including magnetic force microscopy (MFM). This has obvious implications for forensic investigations and data recovery, and has in fact spawned a whole mini-industry devoted to the development of so-called safe file deletion software (safe in the sense of not recoverable). Section 2.2.4 deals further with this topic. The term fragile data is intended to emphasize the ease with which digital data (e.g., disk or file images) can be altered and the extent to which it is thus vulnerable to claims in court of having been mishandled, which is exactly why integrity and chain of evidence considerations are particularly vital in the context of electronic evidence.

There are a host of highly respected and widely used computer forensics toolsets or systems which carry out disk analysis at the physical level and which inter alia recover unallocated space for analysis. Some comprise both hardware and software; for instance, DIBS [31] comes as a configured workstation. Most are software tools, such as EnCase from Guidance Software [32] (discussed in Section 2.3), the Law Enforcement Computer Evidence Suite from New Technologies Inc. (NTI) [33], The Coroner’s Toolkit (TCT) [14] developed by Dan Farmer and Wietse Venema, The Forensic Toolkit™ (FTK™) from AccessData [34], DataTrail FacTracker from Ontrack Data International [35], and GenX from Vogon International Ltd. [36].

2.2.3.4 Logical analysis (file-by-file analysis)

Logical analysis, sometimes known as file-by-file analysis, analyzes disk files at the application level, something that is far more convenient and efficient, and in some ways more effective, than physical analysis. Logical analysis investigates the contents of a file using the application that produced the file or an application-specific tool designed to read files produced by the application. This is the natural way of accessing and inspecting a file, and provides two benefits:

1. It overcomes the principal shortcoming of physical-only analysis, which by its nature will overlook search strings split across two logically consecutive sectors of a file if they happen to be not physically consecutive on disk.

2. It provides a high-level or semantic view of the file contents. For instance, inspecting a file directory is much simpler using a Microsoft Windows “dir” or UNIX “ls” command than using a simple text editor. A similar and more powerful example occurs in the case of inspecting the Microsoft Windows NT/2000/XP Registry, which is much facilitated by use of the program regedit.
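The first benefit in the list above can be illustrated with a toy model. The sketch below simulates a disk with deliberately tiny 16-byte sectors and a two-sector file stored in non-adjacent sectors; the keyword `WAREHOUSE`, the sector numbers and all function names are invented for the illustration. A raw scan of the disk misses the keyword because its two halves are not physically adjacent, whereas reading the sectors in the file's logical order finds it.

```python
SECTOR_SIZE = 16  # deliberately tiny sectors, for illustration only


def write_fragmented(disk, content, sector_chain, sector_size=SECTOR_SIZE):
    """Store content on the simulated disk in the listed sectors, in order."""
    for i, sector in enumerate(sector_chain):
        piece = content[i * sector_size:(i + 1) * sector_size]
        start = sector * sector_size
        disk[start:start + len(piece)] = piece


def read_logical(disk, sector_chain, length, sector_size=SECTOR_SIZE):
    """Reassemble a file by following its sector chain, as a filemap would."""
    data = b"".join(bytes(disk[s * sector_size:(s + 1) * sector_size])
                    for s in sector_chain)
    return data[:length]


def demo():
    disk = bytearray(b"." * (8 * SECTOR_SIZE))
    # The keyword occupies bytes 12..20, so it straddles the sector boundary.
    content = b"x" * 12 + b"WAREHOUSE" + b"x" * 11
    chain = [2, 6]  # logically consecutive, physically far apart
    write_fragmented(disk, content, chain)
    physical_hit = bytes(disk).find(b"WAREHOUSE")
    logical_hit = read_logical(disk, chain, len(content)).find(b"WAREHOUSE")
    return physical_hit, logical_hit
```

Calling `demo()` returns `(-1, 12)`: the physical scan reports no match, while the logical read finds the keyword at byte 12 of the file.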

The files may be identified by an investigator by filename or extension type or both, with appropriate listing and visualization utilities employed, following which the investigation will employ key word or key phrase searches. File extensions can, however, be deliberately corrupted in order to attempt to confuse an investigation based simply on file extensions. For example, a JPEG image file may be given a .doc extension in order to attempt to hide the existence of an image. The commonly available forensic packages deal with situations in which extensions have been deliberately corrupted, and report on files that do not match their extension, for example, a .gif picture file stored as an .xls spreadsheet file. In UNIX systems one can use the “file” command to check the magic number at the start of the file to attempt to determine the file type, while in Microsoft systems file identification is normally based upon hexadecimal file header information. Of course, magic numbers could also be corrupted.
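A check for files whose content does not match their extension, in the spirit of the magic number test described above, could be sketched as follows. The signature table is a small illustrative subset chosen for this example, not the database used by the “file” command or by any forensic package, and the function names are hypothetical.

```python
import os

# Leading bytes ("magic numbers") of a few well-known formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF": "pdf",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # legacy .doc/.xls container
}

# The content type each extension is expected to have.
EXTENSION_TYPES = {
    ".jpg": "jpeg", ".jpeg": "jpeg", ".gif": "gif", ".png": "png",
    ".pdf": "pdf", ".doc": "ole2", ".xls": "ole2",
}


def sniff_type(path):
    """Return the type implied by the file's leading bytes, or None."""
    with open(path, "rb") as f:
        header = f.read(8)
    for signature, file_type in SIGNATURES.items():
        if header.startswith(signature):
            return file_type
    return None


def extension_mismatch(path):
    """True when a recognized header disagrees with a recognized extension."""
    expected = EXTENSION_TYPES.get(os.path.splitext(path)[1].lower())
    actual = sniff_type(path)
    return expected is not None and actual is not None and expected != actual
```

A JPEG image renamed with a .doc extension would be flagged by `extension_mismatch`, since its header says JPEG while its extension implies the legacy Office container format. Files with unrecognized headers or extensions are not flagged in this sketch, and a file whose header has itself been altered would defeat this simple check.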

Logical analysis may lead to the discovery of encrypted files, which brings its own set of challenges. This is addressed further in Chapter 7.
