Analysis of XBOX Kinect Sensor Data for Use on Construction Sites: Depth Accuracy and Sensor Interference Assessment

(1)

Analysis of XBOX Kinect Sensor Data for Use on Construction Sites: Depth Accuracy and Sensor Interference Assessment

Nima RAFIBAKHSH1, Jie GONG2, Mohsin K. SIDDIQUI3, Chris GORDON4, and H. Felix LEE5

1_{Graduate Student, Department of Industrial and Manufacturing Engineering, SIUE, Edwardsville, IL} 62025; email: [email protected]

2_{Corresponding author, Assistant Professor, Dept. of Construction, Southern Illinois University} Edwardsville, IL 62025, Phone +1 618-650-2498; email: [email protected]

3_{Assistant Professor, Construction Engineering and Management Department, King Fahd University of} Petroleum and Minerals (KFUPM), Phone: +966-3-860-2325, E-mail: [email protected]

4_{Assistant Professor and Chair, Dept. of Construction, Southern Illinois University Edwardsville, IL} 62025, Phone +1 618-650-2867; email: [email protected]

5_{Professor and Director, Department of Industrial and Manufacturing Engineering, SIUE,} Edwardsville, IL 62025; email: [email protected]

ABSTRACT

Sensing and site data acquisition are active areas of research for the construction engineering and management domain. A number of research initiatives around the globe are focused on novel sensing applications for managing site safety, productivity improvements, progress monitoring, site layout planning, and for innovative approaches to supply chain management. Time of Flight cameras and laser scanners are the tools of choice for real time and near real time decision making on jobsites. However, most of these applications are limited to academic research and limited field trials have been carried out. A number of operational decisions are necessary before sensing equipment can be deployed. These decisions are challenging for companies and researchers alike as there is limited test data available regarding the performance characteristics for the various equipment options. This paper reports on the performance of Xbox Kinect Sensors for spatial modeling on construction sites. Designed experiments were conducted to characterize the accuracy and resolution of Xbox Kinect sensors as well as the interference between multiple Xbox Kinect sensors. The experiments provided quantitative knowledge about the performance of XBOX Kinect sensors in terms of spatial modeling, therefore establishing a baseline for Kinect performance expectations in similar modeling applications.

INTRODUCTION

The dynamics of workspace and mixed usage of moving equipment and personnel often pose significant safety concerns to construction workers. Common causes for construction accidents include falls, struck-by, caught-in-between, and electrical shock (OSHA 2000; Bureau 2009). Many of these causes are rooted in the victim’s unawareness of the impeding dangers upon performing a certain action,such as entering the blind side of equipment or stepping too close to openings or edges.In addition, accidents are also caused

(2)

when the worker is unaware of the risks that his actions may poseto his co-workers; actions such as operating equipment in unclear situations or failure to put warning signs on dangerous area.

For the past forty years, technologies have been continuously developed and applied to shield workers from dangerous activities. For instance, automation in general has decreased the necessity of manual work in construction. However, the shift from labor intensive to capital intensive paradigms raises safety concernsthat have not been adequately addressed for the construction industry. One of the major causes of the high accident rate in construction continues to be the lack of proactive workspace safety features. Workspaces on construction jobsites are dynamic and often includemoving machinery and personnel. A potential approach to address the safety issues in a construction environment is to develop a virtual safety network which continuously senses the workspace, observes and analyzes object interactions, and predicts imminent dangers based on collected sensing data.

To enable such real time and near real time analysis, range cameras in particular have attracted considerable attention from the construction research community. Preliminary research studies have shown the potential and limitations of commercially available range cameras for real-time site spatial modeling applications (Gong and Caldas 2008; Teizer et al. 2007). Compared to the range sensors evaluated in these studies, the recently released XBox Kinect sensors provide improved performance at a fraction of cost of the earlier sensors. The performance of the Kinect sensors, in particular their depth accuracy and resolution, can be impacted by a variety of factors such as level of ambient light, type of materials, material color, and the presence of infrared light emitted by other Kinect sensors. Experimental evaluationof depth accuracy and resolution of these Kinect sensors is scarce; yet this is critical to develop the proposed construction jobsite safety applications. This paper reports on a research effort on the use of Kinect sensors on construction jobsite for possible safety applications. A series of experiments were conducted in various controlled conditions. The experiment results were analyzed to characterize the performance of Kinect sensors, including depth accuracy, resolution, and interference among multiple Kinect sensors. The research findings enhance our understanding of the potential of XBox Kinect Sensors for real-time spatial information acquisition, modeling, and monitoring. Such understanding is critical to establishing an intelligent workspace, where the spatial information of materials, tools, equipment and people can be continuously sensed, and the technology can be further developed to obviate the danger of fall, struck-by, and caught-in-between.

OVERVIEW

A hybrid technology with low resolution and high scanning frequency (30 frames per second) has emerged relatively recently,and devices with such capabilities are often collective termed under 3D range cameras. Examples of such include D-Imager EKL3104, MESA Swiss Ranger, and Canesta. These cameras emit infrared light and measure the time of flight if the light is bounced back from objects. As a result, these cameras can operate in completely dark environments.The cost of these cameras is typically in the range of several thousand dollars. The XBox Kinect senor is a motion sensing input device for Microsoft Xbox 360. It is capable of providing streaming depth information and color information at the resolution of 640x480 and

(3)

the rate of 30 frames per second. Despite of its initial purpose for gaming, Kinect has been found to be an effective solution for spatial sensing (Figure 1a). An example 3D scene that was captured and modeled by a Kinect sensor is shown in Figure 1b. A comparison of the performance of these range cameras in terms of resolution, speed, distance, and field of view is shown in Table 1. Indeed, the XBOX Kinect sensor, whose cost is just a fraction of others, represents considerable improvement over other types of range cameras.

(a) (b) Figure 1 (a)The XBOX Kinect Sensors; (b) A 3D Scene acquired from a Kinect Sensor

Table 1. Comparison of 3D Range Cameras

Model XBOX Kinect D-Imager EKL3104 MESA SR-3000

Resolution 640 x 480 160 x 120 176 x 144 Speed 30 20, 25, 30 25 Distance 1.2 m to10 m 1.2m to 9m 7.5 m Field of View Horizontal 57 60 47.5 Vertical 43 44 39.6

Physical tilt range (-27 ,+27) N/A N/A

Price $150 >$5,000 >$5,000

RELATED WORK

Due to their near real-time capability in modeling workspace and tracking moving objects in a three-dimensional space, 3D range sensors have been proposed and studied for time critical applications on construction sites, such as collision avoidance and construction robotics (Chi et al. 2008; Son et al. 2010). Most 3D range sensors have limited detection range due to their size constraints. Construction sites are often cluttered, lacking clear line of sight, and occupy much larger space than the detection range of a single range camera. Use of multiple range sensors is possible but with ascending cost due to the high price of real-time 3D range sensors (>$5000 per unit). This situation has been changed since the release of Microsoft Kinect sensors (~$120 per unit). There is a large amount of Kinect research studies published since the release of Kinect in November 2010. Some of these studies have focused on introducing, calibrating and investigating accuracy and precision in various environments (Stone and Skubic 2011, Berger 2011, Dutta 2011, Khoshelham2011, Stoyanov 2011). Meanwhile, Kinect sensors have been experimented for applications

(4)

such as falling detection (Stone and Skubic 2011) and rehabilitation(Chang 2011). Evaluation of Kinect performance in construction environments has not been reported in previous research studies. Of particular note is that Kinect is designed for gaming; its accuracy in order to being used as a 3D range sensor for construction modeling applications needs to be evaluated. Also, the potential interference among multiple Kinect sensors needs to be characterized since multiple Kinect sensors with overlapping coverage would be needed in large scale environments such as construction sites. The overall purpose of this research is to characterize the accuracy of Xbox Kinect sensors, the resolution of Xbox Kinect sensors, and the interference among multiple Xbox Kinect sensors in a simulated construction environment.

EXPERIMENT SETUP AND RESULTS Calibration of the Kinect Sensors

We calibrated the Kinect Sensors used in our experiments for obtaining accurate sensor parameters for spatial sensing. In previous studies, different methods have been used to calibrate Kinect sensors and all of them are based on depth values returned from the sensor (Khoshelham2011). In this research, the Kinect sensors were calibrated using an A4 checkerboard as shown in Figure 2. A set of 30 different images had been captured by the Kinect sensors from different angles. The calibration process eliminated lens noises from cameras and determined the mapping between depth and color images such that the color image could be overlaid on top of the depth image (Dutta 2011). The reprojection error found from this method is 0.955 pixels, which is in line with the typical reprojection error achieved in similar Kinect studies (< 1 pixel).

(a) (b) Figure 2.(a) Image captured before calibration; (b) Image after calibration process

Investigation of Kinect Resolution and Depth Accuracy

To understand the resolution and depth accuracy of Kinect sensors, we used Kinect sensors and a High-Definition terrestrial laser scanner to scan an indoor experiment scene which includes common construction site objects such as metal doors and pipes, lumber, and so on (Figure 3). The captured point clouds from both types of sensors were compared in terms of resolution and depth accuracy. We used a Faro Focus3D scanner as the High-Definition terrestrial laser scanner. This scanner has a maximum error of 2mm at 10 and 25 meters, and has noise of 0.6mm at 10

(5)

meter and 0.95mm at 25 meters. Because of its high accuracy, we used the point clouds from Focus3D as the ground truth measurement for the indoor scene. The scene was scanned using a common indoor 10m scanning setting available on the scanner. The setting includes 3x quality and 1/8 resolution, measuring 5,120 points per 360 degree. The quality and resolution settings are adjustable in the range of 1x-8x and 1/1-1/32, respectively. The quality setting controls the number of points captured in each second (1x – highest capturing speed), and the resolution determines point density (1/1 - the highest density.

Figure 3. The Indoor Experiment Scene

A detailed workflow used in sensor data processing and analysis is shown in Figure 4. More specifically, the point clouds from a Kinect sensor are registered to the Focus3D point clouds, and the average error distance between two types of clouds is used to determine the relative depth accuracy of Kinect sensors. Registration spheres were placed in the scene for point cloud registration purposes. The numbers of points captured on these spheres were also used to compare the resolution of these two types of scanners.

Figure 4. Sensor Data Processing and Analysis

Figure 5 shows two registered point clouds from a Kinect sensor and the Focus3D scanner. After rough registration using targets and refined registration using the Iterative Closest Point (ICP) method, the average distance error between two aligned point clouds is 3.49 cm. This result indicated that the depth accuracy of Kinect sensors is considerably lower than the high-end terrestrial laser scanner. Following the depth accuracy analysis, the numbers of scanning points on the sphere targets are retrieved to analyze the resolution of Kinect sensor data. The results are summarized in Table 2. It is clear that at each distance in the Table 2, the resolution

(6)

of Faro Focus3D (with the setting of 1/8 resolution and 3x quality) is 4 times higher than that of Kinect sensors. However, it is worthy to mention that it took 2 minutes and 51 seconds for the Faro Focus3D to scan the scene, while the Kinect sensor can scan the scene 30 time per second.

(a) (b)

(c)

Figure 5.(a) A Kinect Sensor Point Cloud; (b) A Focus3D Point Cloud; (c) The Registered Point Clouds

Table 2. Resolution Data based onScan Points on Spheres

Distance Scan Points(Kine ct) Scan Points(Faro) Resolution (Kinect in points/cm2₎ Resolution (Faro in points/cm2₎ Sphere 1 1.7 m 1486 5611 4.5 17 Sphere 2 2.9 m 705 3450 2.1 10.5 Sphere 3 2.8 m 774 3679 2.3 11.1 Sphere 4 3.4 m 610 2852 1.8 8.6

Note: Scanned sphere area = 2 * 3.14 * 0.07252_{= 0.033 m}2_{= 330 cm}2 Investigation of Kinect Interference

To characterize the interference between multiple Kinect sensors, we put two Kinect sensors facing each other, and varied the distances and angles between them to

(7)

quantify the interference effects by examining the resolution loss (Figure 6). Table 3 shows the distances and angles we have experimented with. The generated point clouds from the Kinect senor facing the scene were imported into MeshLab to examine the number of captured scan points. The interference between multiple Kinect sensors is apparent in situations where the Kinect sensors directly face each other (Figure 7). In each position, the size of generated point cloud is computed, and the data for all the positions is shown in Table 3.

Figure 6. The Kinect Inference Experiment Setup

(a) _(b) (c)

(8)

Figure 7.Point Clouds at Varying Angles (8 feet distance) (a) Angle (0); (b) Angle (5o); (c) Angle (10o); (d) Angle (15o); (e) Angle (20o); (f) Angle (25o)

Table 3. Kinect Interference Data

Angle (in degrees)

0 5 10 15 20 Distance (in ft) 6 148,768 139,916 45,388 44,062 164,994 8 153,268 126,398 179,627 176,654 172,883 10 165,389 184,970 197,005 192,542 192,470 12 151,516 172,435 190,056 189,461 93,328 25 30 35 40 45 6 123,087 185,398 179,246 191,770 196,901 8 195,576 155,852 186,847 191,353 202,184 10 174,397 185,533 198,911 191,480 199,963 12 180,143 194,193 190,579 195,681 204,369 Further analysis of the Kinect interference data revealed that the angle between Kinect sensors has significant influences on the Kinect data resolution, as shown in Figure 8. In general, as the linear regression analysis indicates, when the angle between multiple Kinect sensors increases, the performance of Kinect sensors in terms of resolution improves. It is recommended a minimum angle of 35 degree between two Kinect sensors should be maintained in order to minimize interferences.

Figure 8. The Plot of Relationship between Angle and Kinect Data Resolution CONCLUSION

In this study, a series of experiments were conducted to study the resolution, depth accuracy, and sensor interference of XBox Kinect sensors. Kinect sensors have great potential for spatial sensing and modeling applications on construction sites due to its lower cost and real-time sensing ability. However, the eventual applicability of such sensors for construction applications is closely related to their spatial sensing

R² = 0.3614 0 50000 100000 150000 200000 250000 0 5 10 15 20 25 30 35 40 45 Si ze of P oi nt C lou ds

Angle between Kinect Sensors

6 feet 8 feet 10 feet 12 feet

(9)

performance in terms of accuracy, resolution, and robustness. In this study, we found that the point clouds generated from Kinect sensors have considerable lower resolution and depth accuracy than high-end terrestrial laser scanners, and the interference among multiple Kinect sensors is evident in situations where these sensors were not carefully positioned. However, given the price of Kinect sensors is 1/100th or lesser than terrestrial laser scanners, Kinect sensors could be a cost effective site spatial sensing device in situations such as workspace modeling and visualization. To ensure better performance, the Kinect sensors should not face each other. Further, a minimum of 35 degree angle between two Kinect sensors should be maintained. Overall, the result of this research suggests that Kinect sensors are best for real-time applications that do not require sub-centimeter accuracy, and have modest requirement on resolutions (<= 3 points/cm2 at 2-4 meters). Use of multiple Kinect sensors for real-time spatial modeling on construction sites is promising but requires further research, in particular on the performance of the sensors in outdoor conditions.

REFERENCES

Berger K., Ruhl K. Schroeder Y., Bruemmer C., Scholz A., Magnor M. (2011). “Markerless Motion Capture using multiple Color-Depth Sensors”, Vision, Modeling, and Visualization (VMV), Berlin, Germany, Oct. 4–6, 2011, pp. 317-324.

Bureau, Health and Safety Statistics. (2010). U.S. Department of Labor, Bureau of Labor Statistics, <http://www.Bureau-gov/iaghome.htm>, 2010.

Chang Y.J., Chen S.F. and Huang J.D, (2011). “A Kinect-Based System for Physical Rehabilitation: A Pilot Study for Young Adults with Motor Disabilities”,

Research in Developmental Disabilities, 32 (2011) 2566-2570.

Chi, S., Caldas, C., and Gong, J. (2008). "A crash avoidance framework for heavy equipment control systems using 3D imaging sensors." ITCON, 13, 118-133. Dutta T. (2011). “Evaluation of the Kinect_ sensor for 3-D kinematic measurement in

the workplace”, Applied Ergonomics, (2011) 1-5.

Gong, J., and Caldas, C. H. (2008). "Data processing for real-time construction site spatial modeling." Automation in Construction, 17(5), 526-535.

Khoshelham K. (2011) “Accuracy Analysis of Kinect Depth Data”, International Society for Photogrammetry and Remote Sensing (ISPRS), Zurich, Switzerland, Aug. 25-Sep. 1, 2011.

Stone, E., and Skubic, M. (2011) “Evaluation of an Inexpensive Depth Camera for Passive In-Home Fall Risk Assessment”, Proceedings, Pervasive Health Conference, Dublin, Ireland, May 21-23, 2011, Best Paper Award.

Son, H., Kim, C., and Choi, K. (2010). "Rapid 3D object detection and modeling using range data from 3D range imaging camera for heavy equipment operation." Automation in Construction, 19(7), 898-906.

(10)

Stoyanov T., Louloudi A., Andreasson H. and Lilienthal A.J. (2011) “Comparative Evaluation of Range Sensor Accuracy in Indoor Environments”, European Conference on Mobile Robots (ECMR), Orebro, Sweden, Sep. 7-10, 2011. Teizer, J., Caldas, C. H., and Haas, C. T. (2007)."Real-time three-dimensional

occupancy grid modeling for the detection and tracking of construction resources."ASCE Journal of Construction Engineering and Management, 133, 880.