Solution - Visual Perception For Robotic Spatial Understanding

We call our solution the multi-sensor graph calibration (MSG-Cal) framework. MSG-Cal handles cameras, 2D laser range finders, and 3D sensors that output point clouds. It also handles any number of sensors, with only two constraints: (1) every sensor's field of view must overlap at least one other sensor's field of view enough for both to see the calibration object simultaneously, and (2) every 2D laser must interact with at least one other modality (camera or 3D).³ These constraints are very reasonable, but they do imply that MSG-Cal cannot handle the extrinsic calibration of systems with non-overlapping fields of view. However, MSG-Cal is highly modular and can accommodate new sensors and data collection schemes simply by initializing the graph with the previous calibration and collecting calibration object observations from the new sensor in concert with the existing system.

³ 6-DoF 2D laser to 2D laser extrinsic pose calibration is difficult (at least when using a simple planar calibration target). Since the laser intersects the plane as a line, a single observation does not fully constrain the pose of the plane; the plane remains free to rotate around the line. Since we assume we know nothing about the transform between the sensors (indeed, this is what we want to determine) and we cannot fix the plane pose relative to one of the sensors, we cannot use multiple observations of the plane to constrain the pose (i.e., we have fewer constraints/equations than unknowns).

MSG-Cal conceptually simplifies the problem by interpreting sensor readings as observations of one of two kinds of geometry: planes and lines. Cameras can detect planes through the use of fiducial markers that have a known size (e.g., AprilTags [155] or checkerboards), and 3D lasers and other point cloud sensing devices can detect planes directly by grouping together points that support a specific plane model. 2D lasers, on the other hand, observe the line formed where the target plane intersects the plane of the laser scan. If we can associate the observations of a single plane across two or more sensors, and we have multiple observations of this plane in different poses, then we can compute the relative poses of the pairs of sensors observing it. We do not know the actual pose of the target object, but this is not required; all we need to determine is the transform between a pair of sensors. Each observation usually provides a new constraint on the relationship between the sensors; the exception is an observation that is removed as an outlier during the RANSAC-based pairwise calibration procedure.
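The two kinds of constraint described above can be written down compactly. The following is a minimal sketch, not the MSG-Cal implementation: it assumes a plane is stored as a unit normal n and offset d with n . x = d, and that points map between sensor frames as x_b = R x_a + t. All function names and the example numbers are hypothetical.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def transform_plane(R, t, n_a, d_a):
    """Map the plane {x : n_a . x = d_a} from frame A into frame B,
    assuming points transform as x_b = R x_a + t."""
    n_b = mat_vec(R, n_a)
    d_b = d_a + dot(n_b, t)
    return n_b, d_b


def plane_to_line_residuals(R, t, line_points_a, n_b, d_b):
    """Signed distances to the plane seen in frame B of 2D-laser line
    points measured in frame A, after mapping them into frame B."""
    residuals = []
    for p in line_points_a:
        q = [c + s for c, s in zip(mat_vec(R, p), t)]
        residuals.append(dot(n_b, q) - d_b)
    return residuals


# Hypothetical setup: frame B is rotated 90 degrees about z and shifted.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.5, 0.0, 0.2]

# Plane-to-plane: the target plane x = 2 seen in frame A, predicted in frame B.
n_a, d_a = [1.0, 0.0, 0.0], 2.0
n_b, d_b = transform_plane(R, t, n_a, d_a)

# Plane-to-line: a 2D laser in frame A scanning z = 0 sees the line x = 2, z = 0.
line_points_a = [[2.0, -1.0, 0.0], [2.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
residuals = plane_to_line_residuals(R, t, line_points_a, n_b, d_b)

# A wrong translation guess leaves a non-zero residual to be minimized.
bad_residuals = plane_to_line_residuals(R, [0.5, 0.1, 0.2], line_points_a, n_b, d_b)
```

With the correct transform every residual vanishes. A calibration routine would search for the R and t that drive these residuals toward zero over many target poses.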

MSG-Cal also provides methods and tools that make it easy to configure a system for calibration, collect data, and run the calibration process. The algorithm proceeds in three stages: data collection, target detection, and calibration. The first stage is straightforward, but MSG-Cal provides two features that make it more robust for multiple-sensor, multiple time-scale configurations: background subtraction and manual triggers. Background subtraction is used for the 2D and 3D lasers and any other sensor that produces point clouds. This greatly simplifies the data association problem at very little cost: a small portion of the initial data collection is used to capture the background (meaning the calibration object should not be in view), and the robot and background should then stay relatively static for the remainder of the collection process. Some dynamic objects are tolerated, both because we assume they will not be as planar as the target object and because random sample consensus (RANSAC) rejects outliers in the pairwise calibration stage. Triggers are a particularly practical way to tell the system that it is time to collect data across sensors with varying time scales. The trigger indicates that the calibration object is ready to be recorded by all sensors, and the system captures the most recent complete observation from every sensor into a single sensor frame.
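The two data collection aids can be illustrated with a short sketch. This is not the MSG-Cal code: it assumes a 2D scan is a list of per-beam ranges in metres and that each sensor buffers (timestamp, data) tuples. The function names, the 0.10 m margin, and the sample values are hypothetical.

```python
def build_background(scans):
    """Per-beam background range: the minimum range seen while the
    calibration target is out of view."""
    return [min(beam) for beam in zip(*scans)]


def subtract_background(scan, background, margin=0.10):
    """Indices of beams that return closer than the background by more
    than `margin` metres, i.e. candidate target beams."""
    return [i for i, (r, b) in enumerate(zip(scan, background)) if r < b - margin]


def capture_frame(buffers, trigger_time):
    """On a trigger, keep for each sensor the latest observation that was
    stamped at or before the trigger time."""
    frame = {}
    for sensor, observations in buffers.items():
        ready = [obs for obs in observations if obs[0] <= trigger_time]
        if ready:
            frame[sensor] = max(ready, key=lambda obs: obs[0])
    return frame


# Background collected with the target absent, then one scan with the target.
background = build_background([
    [4.00, 4.00, 4.00, 4.00, 4.00],
    [4.02, 3.98, 4.01, 4.00, 3.99],
])
target_beams = subtract_background([4.0, 2.5, 2.4, 2.5, 4.0], background)

# Sensors running at different rates, sampled by one manual trigger.
buffers = {
    "camera": [(0.90, "img_a"), (1.00, "img_b"), (1.10, "img_c")],
    "laser": [(0.50, "scan_a"), (1.02, "scan_b")],
}
frame = capture_frame(buffers, trigger_time=1.05)
```

Taking the per-beam minimum is one simple way to model a static background; a real system might instead average ranges or use an occupancy structure for 3D clouds.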

The second phase is target detection. In this phase, each frame collected previously is processed to detect and compute the geometric properties of the target for each sensor (plane or line), determining the sets of valid pairs for each frame. By default, imaging sensors use the AprilTag detector to compute a plane estimate, 2D laser scanners subtract the background from the scan to yield a candidate line, and 3D lasers and other point cloud sensors run a standard RANSAC-based plane estimation process after filtering the background-subtracted cloud to reduce noise and lower the point count. The output of this phase is an organized set of pairwise detections consisting of plane-to-plane observations or plane-to-line observations.
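For readers unfamiliar with RANSAC-based plane estimation, the following is a minimal pure-Python sketch of the general technique. It is not the detector used by MSG-Cal and it omits the filtering step mentioned above. The threshold, iteration count, seed, and synthetic cloud are all assumptions chosen for illustration.

```python
import random


def plane_from_points(p, q, r):
    """Plane through three points as (unit normal n, offset d) with
    n . x = d, or None if the points are (nearly) collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-9:
        return None
    n = [c / norm for c in n]
    return n, sum(n[i] * p[i] for i in range(3))


def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Return the plane supported by the most points, plus those inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        inliers = [p for p in points
                   if abs(sum(n[i] * p[i] for i in range(3)) - d) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Synthetic background-subtracted cloud: a 5 x 5 patch of the plane z = 1
# plus three stray points that do not lie on it.
cloud = [(0.25 * i, 0.25 * j, 1.0) for i in range(5) for j in range(5)]
cloud += [(0.3, 0.2, 1.8), (1.1, 0.4, 0.1), (0.7, 1.5, 2.5)]
(normal, offset), inliers = ransac_plane(cloud)
```

The sign of the recovered normal is arbitrary, so downstream code would typically orient it toward the sensor before pairing it with another sensor's detection.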

The third and final calibration phase is itself split into two sub-phases. The first sub-phase uses each set of pairwise detections to estimate the sensor-to-sensor transform, using RANSAC to filter outliers. This occurs for each pair of sensors that had a sufficient number of overlapping observations of the target object. The second sub-phase constructs a global hypergraph consisting of the estimated sensor poses, edges from sensors to observations, and constraints between cliques of sensors that observed the target object at the same time. We use the graph to construct and solve a non-linear optimization problem that yields optimized poses that minimize the error over all sensors.
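Before any pairwise estimate is attempted, one has to decide which sensor pairs share enough observations and whether those pairs link every sensor into one graph. The sketch below shows one way to perform that bookkeeping. It is an illustration only: the minimum of three shared observations, the sensor names, and the function names are hypothetical and are not taken from MSG-Cal.

```python
from collections import Counter, deque
from itertools import combinations


def pair_counts(frames):
    """Count, for every sensor pair, the frames in which both sensors
    detected the target. Each frame is the set of detecting sensors."""
    counts = Counter()
    for detected in frames:
        for a, b in combinations(sorted(detected), 2):
            counts[(a, b)] += 1
    return counts


def calibratable_pairs(frames, min_observations=3):
    """Sensor pairs with at least `min_observations` shared detections."""
    return {pair for pair, n in pair_counts(frames).items()
            if n >= min_observations}


def is_connected(sensors, pairs):
    """True if every sensor can be reached from the first one by
    following calibratable pairs (breadth-first search)."""
    sensors = list(sensors)
    neighbours = {s: set() for s in sensors}
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, queue = {sensors[0]}, deque([sensors[0]])
    while queue:
        for nxt in neighbours[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(sensors)


frames = [
    {"cam", "lidar3d"},
    {"cam", "lidar3d", "laser2d"},
    {"cam", "lidar3d"},
    {"cam", "laser2d"},
    {"cam", "laser2d"},
]
pairs = calibratable_pairs(frames)
connected = is_connected(["cam", "laser2d", "lidar3d"], pairs)
```

In this example the 2D laser and the 3D lidar share only one frame, so they are not calibrated directly; they remain linked through the camera, which is the kind of indirect relationship a global graph optimization can exploit.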
