Human Detection - Kinect Pipeline - Autonomous Operation and Human-Robot Interaction on an Indo

4.4 Kinect Pipeline

4.4.3 Human Detection

One of the key objectives of this project is to enhance MARVIN’s ability to interact with humans, see section 1.2. To enable this, MARVIN must be capable of detecting and tracking humans. This is achieved by processing the Kinect’s depth images, as shown in figure 4.24.

Figure 4.24: Block diagram of the human detection algorithm.

To determine if and where humans exist inside the depth image, it under- goes a number of processing steps. The pixels associated with the ground plane are first filtered out, see section 4.4.1. The image is inverted and sharpened to enhance the edges for the segmentation stage. The output of this stage is shown in figure 4.25b.

Segmentation separates the foreground objects so they don’t touch each other, which is required for the blob detection stage. The image is thresholded so that only the background pixels remain. These background pixels are the pixels that have been removed from the image by the kinect pipeline. This is stored as a separate binary image, shown in figure 4.25c. A dis- tance transform is applied to the binary image resulting in figure 4.25d. This image is thresholded once again, resulting in figure 4.25e. This has effectively expanded the background regions of the image. The original input depth image is multiplied by the expanded background binary image, resulting in figure 4.25f. This resulting image has been segmented, with

large separations between the foreground objects.

The OpenCV blob detection algorithm [72] is applied to the segmented image. The parameters of the blob detector have been set to look for human-like blobs. This includes; blobs that have an area within a particular range, are less than 2 m away and have an inertia less than 0.3 (where an inertia of 1.0 is a circle and 0.0 is a straight line). The blobs that are considered human by this algorithm are shown as red circles in figure 4.25f. The value from the centre pixel of detected human blob is converted to a range and bearing using the same technique for converting the depth image into a laser scan message, as described in section 4.4.1. The range and bearing are published as a transform from MARVIN’s origin to the centre of the detected human.

(a) Input (b) Inverted and Sharpened

(e) Threshold (f) Blob Detection

This section shows a viable technique for human detection from still frames using only 2D image processing techniques. It has a relatively low com- putation overhead resulting in good real time performance. However, there are a number of areas where this method could be extended. This technique is reliable at detecting a single human, but can struggle when multiple humans are in frame. The algorithm currently handles this prob- lem by only returning the position of the closest human blob. It also has a relatively high false positive rate when looking at vertical surfaces that result in vaguely human shaped blobs. The multiple human issue can be resolved by tracking the humans, rather than only performing frame by frame detection. This is typically achived using some form of Kalman filter, such as the uncentred Kalman filter approach described by [68].

The full code for the human detection node can be found in Appendix C.

4.5 Summary

This chapter discussed the algorithms used to process the sensor data, accounting for errors introduced by the Segway’s dynamic pitch and converting it to a format the navigation stack can use. It covered the interfaces between the sensor’s hardware and ROS, the steps taken to process the laser rangefinder and ultrasonic network, the novel approach used to re- move the ground and ceiling planes from the Kinect’s dept images and the process developed to detect humans.

The performance of the Kinect pipeline is evaluated by measuring its frame rate as additional layers of processing are applied. This is shown in table 4.2.

Table 4.2: Kinect Pipeline Performance in Frames per Second (FPS)

Processing Ave. Min Max Std. Dev.

No Filtering 30.00 23.72 40.92 1.453

Depth Removal 30.00 21.74 46.50 3.020

Depth Removal & Human Detection

30.00 19.13 58.23 3.617

From this table it can be seen that the additional processing layers do not impact the average performance of the Kinect; however, it can increase the variation of the frame rate. If the average frame rate drops, it would suggest that the Kinect pipeline requires more processing power than is available. This demonstrates that the code is sufficiently transparent and has no negative affect on the performance of MARVIN’s other systems. This processed data is used by the navigation stack for localisation and obstacle detection (as discussed in Chapter 5) and for human-robot interaction (detailed in Chapter 7).

Navigation Stack

As discussed in section 1.2, MARVIN must be capable of autonomously operating in an environment designed for, and shared with, humans. To achieve this, a robust navigation stack is required. Rather than redeveloping existing solutions, the ROS navigation package is implemented. This is a “2D navigation stack that takes information from odometry, sensor streams and a goal pose, and outputs safe velocity commands that are sent to a mobile base” [73]. Safe velocity commands are defined as commands that will not result in the robot colliding with an obstacle. This navigation package has been shown to work effectively (for example) in Willow Garage’s paper [25]. Figure 5.1 shows how the navigation package is implemented in MARVIN.

Figure 5.1: Block diagram of the Navigation Stack.

The navigation stack must be capable of three major tasks; Localisation, Obstacle Detection and Motion Planning. Localisation allows MARVIN to estimate where it is in a known map. This is achieved by the amcl node, explained in section 5.1. Obstacle detection informs MARVIN about the obstacles immediately around MARVIN, allowing it to navigate without colliding with them. This is achieved by the costmap 2d node, explained in section 5.2. Motion planning calculates a path from the position estimate to a goal location taking into account any detected obstacles and the known map. This is achieved by the move base group of nodes, explained in section 5.3. The generation of the known map is achieved using simultaneous localisation and mapping (SLAM) techniques and is supplied to the other nodes using the map server node, explained in section 5.4.

5.1 Localisation

MARVIN needs to know its current pose (position and rotation) to allow it to navigate. The Segway measures the amount each of its wheels ro- tates over time. From these measurements we can estimate the change in MARVIN’s pose relative to its starting pose. This is known as the robot’s odometry. Odometry can be used to estimate MARVIN’s pose, but it prone to errors. These errors can be caused by the wheel radius changing (due to flat tires), by the wheels slipping or by stochastic errors in the wheel rotation measurements. Over time these errors accumulate providing a less certain estimate. This is known as odometry drift. Because of this, MAR- VIN needs additional information to correctly estimate its pose. MARVIN’s operating environment is confined to known areas, as discussed in section 1.1.1, so a pre-built map can be used. We can compare what MARVIN is currently sensing to the map to narrow down the list of possible poses. If we then combine this list of possible poses with the odometry and any previous pose estimates, a much more certain pose estimate can be made. This process is known as probabilistic localisation.

A well known and effective probabilistic localisation method is Monte Carlo Localisation (MCL). MCL uses a particle filter to localise, where the particles represent pose estimates. It works by incrementally updating the weights of each particle based on the sensor and odometry data, then redistributing the particle pose estimates based on those weights. Over time the distribution of the particles should converge on the robot’s ac- tual pose. The returned pose estimate from MCL is the pose at the peak of the probability density function (PDF) created by the particles. MCL becomes more effective (more likely to produce a correct pose estimate in fewer steps) the larger the number of particles, however this also makes it more computationally intensive. MCL’s effectiveness is therefore inversely proportional to its computational efficiency. One method to reduce this

trade-off is to adaptively change the number of particles. For example, when the pose estimate is less certain it may be desirable to have more particles to increase the chance of generating a more accurate estimate. When the pose is more certain, the number of particles can be reduced to improve the computational efficiency. This is known as Adaptive Monte Carlo Localisation (AMCL). AMCL is implemented in MARVIN using the ROS amcl package [74] which implements the approach as described by Dieter Fox in [75]. Figure 5.2 shows how the amcl package is integrated in MARVIN.

Figure 5.2: Block diagram of the amcl node.

The amcl uses the /map topic as the known map, the /odom topic as odometry and the /laser/scan/processed topic as the sensor update topic. The laser scanner topic is the only sensor topic the node is subscribed to as it is the same sensor used to generate the map, and therefore should produce the most consistent results. The primary output of the amcl node is a transform from the map frame to the base link frame (where the base link frame is MARVIN’s origin, as explained in Chapter 4). This transform is used to link the global (map) coordinates to the local (robot) coordinates.

In document Autonomous Operation and Human-Robot Interaction on an Indoor Mobile Robot (Page 107-117)