A multimedia database system for 3D crime scene representation and analysis

(1)

A multimedia database system for 3D crime scene

representation and analysis

Marcin Kwietnewski

1

_{, Stephanie Wilson}

1

_{, Anna Topol}

1

_{, Sunbir Gill}

1

_,

Jarek Gryz

1

_{, Michael Jenkin}

1

_{, Piotr Jasiobedzki}

2

_{and Ho-Kong Ng}

2 1

_{York University, Department of Computer Science and Engineering}

4700 Keele St., Toronto, ON, Canada, M3J 1P3

2

_{MDA Corporation, 9445 Airport Rd., Brampton, ON, Canada, L6S 4J3}

ABSTRACT

The development of sensors capable of obtaining 3D scans of crime scenes is revolutionizing the ways in which crime scenes can be analyzed and at the same time is driving the need for the development of sophisticated tools to repre-sent and store this data. Here we describe the design of a multimedia database suitable for representing and reason-ing about crime scene data. The representation is grounded in the physical environment that makes up the crime scene and provides mechanisms for representing both traditional (forms-based) data as well as audiovisual (3D scan and other complex spatial data). Some initial results obtained with tools designed to seed the dataset are reported as well as some components of the final database system.

1. INTRODUCTION

Crime scene investigation requires the collection and anal-ysis of vast amounts of data. This data comes in a variety of formats, such as traditional paper forms collected by offi-cers at the scene, results of scientific evaluations in the crime scene lab, fingerprint data, high resolution still imagery, and video and audio recordings. Representing this information in an effective manner in order to enhance later investigation of the crime is a complex task. This task becomes even more difficult as advanced technologies such as laser (e.g., [5]) or optical scanners (e.g., [10, 9]) become available which allow for full 3D scans of the scene and objects within the scene to be collected and made available for investigators. It is im-portant for investigators to be able to access this large and varied collection of data in an intuitive and efficient manner. This paper describes the design of a multimedia database to address the problem of representing and manipulating the complex datasets associated with crime scene analy-sis. Given the importance of spatial relationships in these databases, a central concept in the resulting representation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

CVDB 2007 June 10, Beijing, P. R. China

is that entities in the database have a specific pose (or col-lection of poses) that they occupy in a spatial representation of the scene. A second underlying principle in the represen-tation is the hierarchical nature of data collected at crime scenes. In general, entities are associated with larger struc-tures: They may contain one another (e.g., the gun is in the drawer) or the process of data collection enforces specific relationships between entities (e.g., the coroner’s report is associated with the body of the victim). One final property of crime scene data, especially the data associated with 3D scene acquisition and other sensor systems, is that the data may be available at a variety of different resolutions. The use of different resolutions may be related to performance issues (e.g., it is more efficient to render the 3D represen-tation of the crime scene with fewer polygons), or it may be related to issues related to the desirability of viewing the crime scene with certain objects removed or obscured (e.g., to protect the identity of the victim).

Given the nature of the data to be represented the multi-media database design presented here relies on the use of a hierarchical tree structure. This tree structure encapsulates both the property that entities may ‘contain’ other entities (both in a spatial as well as a logical sense), as well as the property that entities may be represented at different ‘res-olutions’. Different representations of each object can be stored.

The user interacts with the database through a user inter-face that provides them with a 3D view of the crime scene. The user can ‘walk’ through the resulting 3D model query-ing objects that are encountered. Interactions – either tra-ditional graphical operations or more complex queries of the database – are processed by the underlying database engine that determines what should be displayed as well as retriev-ing the set of actions that can be performed on the displayed objects.

This paper provides a description of the basic structure of the database system being constructed, as well as planned user interactions. The paper begins with an introduction to the problem of crime scene investigation from a data pro-cessing viewpoint. The basic underlying data flows within the database system and preliminary tools that have been developed in order to populate the database with 3D mod-els of crime scenes are then described. Finally, the paper concludes with a description of planned development and extensions to the current system.

(2)

Figure 1: A 3D view of a simulated crime scene. Data within the crime scene may be collected over a wide geographical area, by many different people, and much of the data will be sent to external ex-perts to provide additional information about the data collected.

2. CRIME SCENE DATA

Crime scenes provide a vast range of complex data, col-lected by a number of different personnel, over a lengthy time period. Data is collected on-site, in laboratory or fin-gerprint analysis, by the coroner, and by a myriad of other personnel (see Figure 1). All of this data must be amalga-mated into a single coherent representation so that it can be used by investigators as well as later in courts by both the prosecution and defense.

Traditionally this data is collected and managed using pa-per forms. Each item that is collected is assigned a unique identifying label. Investigating officers choose identifiers and assign textual descriptions to the data collected based on experience gleaned in the field. As the complexity of crime scenes increases, due in part to the use of more complex scientific tests and the presence of evidence that is not well described by a free form textural description, limitations of the existing data collection and representational mecha-nisms have become apparent. As a result, there has been an interest in the development of crime scene database sys-tems. Although these representational systems have their advantages over the traditional pen-and-paper note taking mechanisms typically deployed in the field, they are not ca-pable of dealing with the complex datasets that are asso-ciated with 3D scanners and chemical, nuclear, radiological and biological sensors.

Such sensors typically obtain large 3D representations of the spatial data. For example, the simulated crime scene shown in Figure 1 is actually a rendering of the 3D dataset obtained with the iSM sensor[10]. The raw data obtained with this sensor can be used to construct a 3D textured tri-angular mesh which can then be manipulated in order to view it from different directions. The mesh can be decom-posed into different objects, and the mesh along with both traditional and non-traditional data must be represented in a manner that supports crime scene investigation.

Figure 2: Object hierarchy example. All objects in the database exist within this hierarchy. All objects represent a bounding volume which contains all of its children. The bounding volume of siblings may overlap. Each node contains an attribute that de-scribes how it is to be displayed in terms of the user interface, and a node may have multiple display at-tributes. See text for details.

3. DATABASE DESIGN

A database for crime scene investigation provides dual functionality. It provides persistent storage of the raw data acquired from the crime scene and a method of analysis of the scene data for crime scene investigators. The first func-tion is important for providing evidence in a court of law. Here, the data must be represented in an unaltered state so that the court can view the evidence as it was when it was recorded. The second function is important to aid the investigators by providing the data they require in a quick and easy to understand manner. Firstly, it is important to be able to find/identify objects of interest in the scene. Secondly, the representation must provide straightforward access to the spatial and other relationships that exist be-tween entities. For example, it is important to be able to identify entities that are in close proximity to each other and to represent the property that certain entities belong to oth-ers. This spatial hierarchical property of the data suggests a hierarchical representation for the database itself. At the top level there is the overall crime scene. The children of the root-level may include such things as a chair or a table in the room. The children of the table may include a box resting on top of it. The children of the box may include a pencil inside the box, and so on (see Figure 2). In order to simplify establishing relationships between objects, each object is assigned a bounding volume and children of a node must exist within this volume. Two children of a given node may have intersecting bounding volumes.

In addition to providing a mechanism for representing the property that some objects are contained within others, we need a way to reflect the fact that certain objects in a crime scene may have significant information associated with them that was collected at different sites ( e.g., at the crime lab or in coroner’s lab) or at a different time. Those represen-tations or descriptions can be treated as separate ’scenes’ in terms of processing and each node in the hierarchy will be associated with a set of them. This will provide a straight-forward mechanism to generate views of the scene in the browser application, e.g., it will be able to query for 3D

(3)

rep-resentations or low quality reprep-resentations of a subset of the object hierarchy it wants to render.

Objects within the hierarchy may have a range of different primitive type representations associated with them, includ-ing

• ‘Traditional’ descriptions like standard paper forms

• Multimedia objects such as video or audio clips that describe (or encode) an object located at a specific location.

• Textured 3D meshes that describe the surfaces of ob-jects and locations within the crime scene that can be easily to or removed from a 3D view of a crime scene

• Spatial distributions such as levels of radiation, tem-perature, or gas concentrations.

Given the highly visual nature of much of the data that is encoded in a crime scene dataset, and the needs of crime scene investigators, the ability to visualize the dataset in various ways is a critical requirement. This raises a number of interesting problems related to both performance (at what resolution should large triangular meshes be rendered) and also to the security of views of the crime scene. It may be desirable, for example, to display portions of the crime scene in such a manner as to obscure certain facts that are being withheld during an ongoing investigation. Thus objects in the representation, either external or internal nodes in the hierarchy, may have a range of visualization attributes that may be accessed by the database.

Another important aspect of the database design is effi-cient storage of user-generated objects. A user may retrieve an object from a database and modify it in various ways (for example, by taking a 2D snapshot, changing resolution, removing certain objects from it, etc.). Such modifications may be time consuming and may require the use of com-plex multidimensional indexing methods[2]. Thus it may make sense to store these newly generated “redundant” ob-jects for future use. This database functionality is similar to storing materialized views[3], which from a conceptual point of view are also redundant objects in a database. The task of storing the redundant objects in our environment is easier in the sense that we do not expect updates to our original data, hence do not need to solve the maintenance problem of the user-generated objects. On the other hand, we do not have a formal language like SQL to describe precisely to the user how the objects are related to each other. If the num-ber of such objects becomes large, the task of representing efficiently and clearly their relationships may be a challenge.

4. DATA EXCHANGE

The end-user application and the database must be able to unobtrusively exchange information about the objects, their representations, their relationship with other child/parent objects, and available object manipulation actions. The Ex-tensible Markup Language (XML) is well suited for sharing the information organized in a tree-based structure between systems and decouples the database interface and the user interface implementations. XML is used to describe queries and responses to the database as well as encoding the set of possible interactions the user can make with respect to a specific object in the database.

Figure 3: Data exchange diagram: communication between the application and the database.

Associated with every object in the database is the collec-tion of possible interaccollec-tions that the user can have with the object. When an object is retrieved from the database the set of possible interactions and the code necessary to process these interactions is retrieved from the database as an XML description. For example, if the object is displayable at dif-ferent resolutions, then the available commands to choose these resolutions are encoded in the XML description as is the set of actions that must be executed in order to cause the user interface to display the object.

When a user selects an object of interest in the scene the set of possible actions associated with that object (as iden-tified in its XML description) are displayed. The manipu-lations may include viewing the object at a finer resolution, opening associated reports or audio files, or even removing the object from the scene altogether. Each such action will trigger the interaction of the application with the database, which in turn will produce a new XML formatted descrip-tion of the scene and the object according to the latest query (Fig. 3). The application will then use this information to return the results to the user. This may include redrawing the scene or launching a document viewer or a media player. This approach of separating the database interface and the user interface implementations provides greater poten-tial for future extendibility. By using the hierarchical ob-ject association and the XML to return the query results we can communicate arbitrarily complicated scenes between the database and the application that avoids any direct mapping to the database tables.

5. ACTIONS AND USER INTERACTION

In order to ensure high-performance for rendering and in-teraction with a large number of 3D model, audio, and video datasets the end-user client application is written natively utilizing operating-system supported hardware accelerated libraries (see Fig. 4). The task of decoupling interactions from the client application thus requires a mechanism at

(4)

Figure 4: User Interface: The client application viewing a sample crime scene model.

run-time to bridge the gap between the XML response from the database and the desired result in the client application. This is achieved using Python[6], a dynamic, scriptable, multiplatform, freely available object oriented programming language[6], as seen in Fig. 5.

Aforementioned, the XML response from the database to the client is an arbitrary length list of available interactions. These interactions are defined by two fields: a UI description tag and an inline Python script. The UI description tag is used by the client application to describe the interaction to the user. This is normally text used to populate the list of interactions displayed to the user but is produced in HTML markup to support more advanced UI features in the future. The corresponding Python script is what drives the client application.

When a user selects an interaction to perform, the native client application executes the Python script. The script is capable of doing a variety of tasks: primarily the generation of database queries, but also executing natively compiled code for UI-related convenience tasks such as toggling of 3D rendering options for an object, and finally updating its in-ternal global state. Upon startup of the application a Python session is created with a persistent global state that is acces-sible by the native client application. The client application determines what to display strictly by reading what infor-mation is present within this global state. This simplifies the role of the native client end-user application to the pre-sentation of multimedia and UI while leaving the details of user interaction and database querying to an extensible and flexible scripting system.

6. DATA ACQUISITION

Data within the database comes from many sources. Many of these sources are “traditional” in that this is textual in-formation entered by crime scene investigators either in the field or in the office based on notes taken in the field. Al-though this is crucial information in the database, perhaps the most interesting data that must be acquired and repre-sented in the database is the collection of large 3D struc-tural elements that describe the scene and objects within the scene.

These large 3D structural descriptions are collected us-ing a vision-based stereo reconstruction system1_{. Data is} 1

Both MDA’s iSM[10] and the AQUASensor[9] have been

Figure 5: Data exchange diagram: adding data to the database.

collected using a standalone hardware platform. It consists of a Bumblebee Stereo Camera[7] to extract depth infor-mation from images of the environment (see [8] for product information). Egomotion is extracted from the stereo image sequences and this information is used to project the data to a common reference frame. This depth information is then used to create three-dimensional models of the scene (see [10] for a detailed description of this system). These models are high-resolution dense uniform meshes textured with the actual real-world video capture making them very accurate and realistic representations of the crime scene environment. A sample 3D reconstruction is shown in Figure 1.

7. 3D SCENE SEGMENTATION

Devices such as iSM provide high resolution 3D scans of salient scene structure. It is necessary for the user to be able to manipulate individual objects within the scene, in addition to the overall scene itself. The task of manually isolating and segmenting these objects can be daunting. It would require the user to indicate which polygons are part of the object to be isolated and which are not. This would be very time consuming. A semi-automatic tool to enable object segmentation to be a quick and easy operation is therefore required.

3D Lazy snapping is a technique for assisted segmentation of 3D images based on the 2D Lazy Snapping approach[4]. The user indicates the 3D object to be segmented simply by marking sample foreground and background elements of the scene. The system then selects those polygons it considers part of the object and rejects those it considers part of the background. The user can then either augment the set of marked foreground and background polygons, allowing the system to recompute the selection, or can manually adjust any of the polygons which have been misclassified. Due to the system classifying the majority of the polygons auto-matically, the demands on the user are drastically reduced. The problem of segmenting the object from the 3D scene is considered as a graph cut problem, with the cut dividing the foreground regions from the background regions. Here a segmentation is described based on a voxel representation of the scene, although a polygon-mesh based representation segmentation is similar. A graph is constructed with each voxel in the scene representing a node in the graph. Edges

used, although the details of the iSM system are described here.

(5)

(a) (b) (c)

(d) (e)

Figure 6: Scene segmentation. (a) a raw scan. (b) a voxelized version of the scene. (c) voxels identified by the user as foreground(red) and background(blue). Note that this is the extent of the user interaction. (d) the voxel segmentation performed by the system. (e) the resulting foreground object.

connect adjacent nodes when the corresponding voxels are adjacent in the scene. These edges are assigned weights so that an edge between two similarly coloured voxels has a high weight, while an edge between two dissimilarly coloured voxels would have a low weight. Source and sink nodes are added and have a connecting edge to every other node in the graph. For each internal node (i.e. nodes other than the source or sink), the minimum distance from its colour to the foreground and background colours is computed. When an internal node is similar to the foreground colours, the edge weight between it and the source node is high. When it is similar to the background colours, the edge weight between it and the sink node is high. When the colors are dissimilar, the edge weights are low. By applying a min-cut/max-flow algorithm [1] to the graph, the source-connected nodes in the resulting cut are the foreground of the object and the sink-connected nodes are the background. The segmented mesh can then be stored into the database as a separate child object of the crime scene object. Figure 6 shows the process of segmenting an object (here a person) from the 3D scan.

8. DISCUSSION

The database requirements for representing non homoge-neous spatial datasets such as those required for crime scene investigation are novel and complex. Data must be repre-sented at a variety of resolutions (including some that are deliberately obfuscating). Data must be grounded in an un-derlying spatial representation. It must also be possible to

associate data in a hierarchical manner to objects within the scene.

Here we have described the design of a visual database system designed to provide an effective and efficient repre-sentation of crime scene data. Components of the system have been tested in isolation, and work is ongoing in inte-grating these components into a coherent whole.

Acknowledgements

This work was supported by NSERC, MDA and the Chemi-cal, BiologiChemi-cal, Radiological-Nuclear and Explosives Research and Technology Initiative (CRTI Project 05-0122TD).

(6)

9. REFERENCES

[1] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. InEnergy Minimization Methods in Computer Vision and Pattern Recognition, pages 359–374, 2001.

[2] V. Gaede and O. G¨unther. Multidimensional access methods.ACM Computing Surveys, 30(2):170–231, 1998.

[3] H. Gupta and I. S. Mumick. Selection of views to materialize in a data warehouse.IEEE Trans. Knowl. Data Eng., 17(1):24–43, 2005.

[4] Y. Li, J. Sun, C. Tang, and H. Shum. Lazy snapping.

ACM Trans. Graph., 23(3):303–308, 2004. [5] Polhemus. Polhemus fastscan.

http://www.fastscan3d.com, March 2005. [6] Python. The python programming language.

http://www.python.org, 1990.

[7] P. G. Research. Bumblebee: Digitial stereo vision camera. Product data sheet, 2005.

[8] P. G. Research. Point grey research. http://www.ptgrey.com, 2006.

[9] J. M. Saez, A. Hogue, F. Escolano, and M. Jenkin. Underwater 3d slam with entropy minimization. In

Proceedings of IEEE International Conference on Robotics and Automation, ICRA2006, May 15-19 2006.

[10] S. Se and P. Jasiobedzki. Photorealistic 3d model reconstruction. InProceedings of IEEE International Conference on Robotics and Automation, ICRA2006, pages 3076–3082, 2006.