Evaluation of RGB-D SLAM Data - Interest Point Additions and Testing

Chapter 4 Interest Point Additions and Testing

4.10 Evaluation of RGB-D SLAM Data

Quantitative evaluation of the feature detection, extraction, and matching functions in the RGB-D SLAM System is accomplished in this thesis through the comparison of the estimated camera trajectory to a set of ground truth poses.

4.10.1 Trajectory Evaluation

Two methods were developed for the comparison of the estimated trajectory obtained from the RGB-D SLAM System to the ground truth trajectory. The absolute trajectory error (ATE) [62] examines the difference in location and orientation between matching poses in the aligned trajectory estimate and the ground truth. The relative motion error (RME) [63] compares the relative motion between poses in the ground truth and trajectory estimate.

4.10.1.1 Goal and Notation

There are two collections of timestamped poses, $\grave{D}$ and $E$. The sequence $\grave{D} = \{\grave{d}_1, \grave{d}_2, \ldots, \grave{d}_i, \ldots, \grave{d}_n\}$ is the collection of ground truth poses and timestamps, and $E = \{e_1, e_2, \ldots, e_j, \ldots, e_m\}$ is the set of discrete pose estimates and their timestamps. The ground truth sequence describes the position and orientation of a camera throughout a video sequence. The estimated trajectory is the output of a SLAM algorithm that estimates the position and orientation of that camera throughout the same video sequence.

• The set of ground truth timestamped poses $\grave{D}$ is recorded at a rate of 100 Hz, while the set of estimated timestamped poses $E$ is recorded asynchronously and much less frequently.

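• Each ground truth timestamped pose $\grave{d}_i = (\grave{t}_{d,i}, \grave{\bar{J}}_i)$ consists of a timestamp $\grave{t}_{d,i}$ and a pose $\grave{\bar{J}}_i$ describing the rigid transformation from the camera coordinate frame at time $\grave{t}_{d,i}$ to $G$, the ground truth's world frame.

• This transformation $\grave{\bar{J}}_i$ can be expressed as a translation vector $\grave{\rho}_{d,i}$ and a quaternion $\grave{Q}_{d,i}$.

$$\grave{\bar{J}}_i \Leftrightarrow \langle \grave{Q}_{d,i}, \grave{\rho}_{d,i} \rangle$$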

• Each estimated timestamped pose $e_j = (t_{e,j}, \bar{T}_j)$ gives the estimated rigid transformation $\bar{T}_j$ from the camera coordinate frame at time $t_{e,j}$ to $W$, the estimate's world frame.

• This transformation $\bar{T}_j$ can be expressed as a translation vector $\rho_{e,j}$ and a quaternion $Q_{e,j}$.

$$\bar{T}_j \Leftrightarrow \langle Q_{e,j}, \rho_{e,j} \rangle$$

• The timestamps for both sets are synchronized, giving the time since Unix epoch.

4.10.1.2 Alignment in Time

Both ATE and RME require alignment of the higher-rate ground truth timestamped poses with the asynchronous trajectory estimate. This is achieved by interpolating a ground truth pose for every estimated pose, giving a set of interpolated ground truth poses and timestamps $D = \{d_1, d_2, \ldots, d_j, \ldots, d_m\}$. The process, using linear interpolation on the translation terms and spherical linear interpolation (Slerp) [35] on the rotational terms, is as follows:

For each timestamped pose ej ∈ E:

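1. Find the two sequential timestamped poses $\grave{d}_i$ and $\grave{d}_{i+1}$ such that $\grave{t}_{d,i} < t_{e,j} \le \grave{t}_{d,i+1}$.

2. Compute the interpolation weight $u$.

$$u = \frac{t_{e,j} - \grave{t}_{d,i}}{\grave{t}_{d,i+1} - \grave{t}_{d,i}}$$

3. Obtain $\rho_{d,j}$ via linear interpolation.

$$\rho_{d,j} = \grave{\rho}_{d,i} + u(\grave{\rho}_{d,i+1} - \grave{\rho}_{d,i})$$

4. Obtain $Q_{d,j}$ via Slerp.

$$Q_{d,j} = \frac{\sin(\gamma(1 - u))}{\sin(\gamma)} \grave{Q}_{d,i} + \frac{\sin(u\gamma)}{\sin(\gamma)} \grave{Q}_{d,i+1}, \qquad \gamma = \arccos(\grave{Q}_{d,i} \cdot \grave{Q}_{d,i+1})$$

5. Construct $d_j = (t_{e,j}, \bar{J}_j)$.

$$\bar{J}_j \Leftrightarrow \langle Q_{d,j}, \rho_{d,j} \rangle$$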
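The interpolation steps above can be sketched in Python as follows. This is a minimal sketch, not the thesis implementation: the function names (lerp, slerp, interpolate_ground_truth), the (w, x, y, z) quaternion ordering, and the tuple layout (timestamp, translation, quaternion) are assumptions made for illustration. It also adds two safeguards that the steps do not mention: the second quaternion is negated when the dot product is negative so that the shorter arc is taken, and nearly identical rotations skip the division by sin(γ).

```python
import math
from bisect import bisect_left

def lerp(p0, p1, u):
    """Linear interpolation between two translation vectors (step 3)."""
    return tuple(a + u * (b - a) for a, b in zip(p0, p1))

def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions (step 4)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # Assumption: q and -q encode the same rotation, so flip to the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    gamma = math.acos(min(1.0, dot))
    if gamma < 1e-8:
        # Assumption: the rotations are effectively identical; avoid dividing by sin(gamma).
        return tuple(q0)
    s = math.sin(gamma)
    w0 = math.sin(gamma * (1.0 - u)) / s
    w1 = math.sin(u * gamma) / s
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))

def interpolate_ground_truth(gt, t_e):
    """Interpolate a ground truth pose at the estimate timestamp t_e.

    gt is a list of (timestamp, translation, quaternion) sorted by timestamp.
    Returns (t_e, translation, quaternion).
    """
    times = [g[0] for g in gt]
    k = bisect_left(times, t_e)  # first index whose timestamp is >= t_e
    if k == 0 or k == len(gt):
        raise ValueError("t_e is not bracketed by two ground truth poses")
    t0, rho0, q0 = gt[k - 1]
    t1, rho1, q1 = gt[k]
    u = (t_e - t0) / (t1 - t0)  # interpolation weight (step 2)
    return (t_e, lerp(rho0, rho1, u), slerp(q0, q1, u))
```

Estimate timestamps that fall outside the ground truth recording raise an error here; dropping such poses instead would be an equally reasonable choice.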

4.10.1.3 Absolute Trajectory Error

The absolute trajectory error is similar to the error metric used in [1]. The error is measured at each matching pose pair $({}^{G}\bar{T}_i, \bar{J}_i)$. Here, ${}^{G}\bar{T}_i$ is the estimated timestamped pose at $i$ expressed in terms of the ground truth world frame $G$.

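In order to calculate the error, the rigid transformation $T$ from $W$ to $G$ must be found. This transformation is given using a quaternion as $\langle Q, \rho \rangle$ and using a rotation matrix as $\langle R, \rho \rangle$.

$$T \Leftrightarrow \langle Q, \rho \rangle \Leftrightarrow \langle R, \rho \rangle$$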

The transformation T is estimated by computing the rigid transformation between the translation terms of matching poses (ρ_e,j, ρ_d,j) as in [44]. For a more complete documentation of this process refer to Section 2.9.

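The approach estimates $T$ as the rigid transformation $\hat{T}$ that minimizes the sum of squared distances between the estimated pose translation terms, transformed according to $T$, and the corresponding ground truth pose translation terms.

$$\hat{T} \Leftrightarrow \langle \hat{Q}, \hat{\rho} \rangle \Leftrightarrow \langle \hat{R}, \hat{\rho} \rangle$$

$$\langle \hat{R}, \hat{\rho} \rangle = \operatorname*{argmin}_{R,\rho} \sum_{i=1}^{m} \| R\rho_{e,i} + \rho - \rho_{d,i} \|^2$$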
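One way to solve this least-squares problem is sketched below in Python. It is an illustration under stated assumptions, not the procedure of [44] or Section 2.9, which may differ: the sketch uses Horn's closed-form quaternion solution, in which the optimal rotation is the eigenvector of a symmetric 4x4 matrix belonging to its largest eigenvalue, and it finds that eigenvector with a shifted power iteration so that only the standard library is needed. The function names (align, quat_to_matrix, rotate, centroid), the iteration count, and the (w, x, y, z) quaternion ordering are hypothetical choices.

```python
import math

def centroid(pts):
    n = len(pts)
    return tuple(sum(p[k] for p in pts) / n for k in range(3))

def quat_to_matrix(q):
    """Rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def rotate(R, p):
    return tuple(sum(R[r][c] * p[c] for c in range(3)) for r in range(3))

def align(est, gt, iters=500):
    """Return (Q, rho) minimising sum ||R est_i + rho - gt_i||^2."""
    ce, cd = centroid(est), centroid(gt)
    # Cross-covariance S[a][b] = sum_i est'_a * gt'_b of the centred points.
    S = [[0.0] * 3 for _ in range(3)]
    for e, d in zip(est, gt):
        for a in range(3):
            for b in range(3):
                S[a][b] += (e[a] - ce[a]) * (d[b] - cd[b])
    (Sxx, Sxy, Sxz), (Syx, Syy, Syz), (Szx, Szy, Szz) = S
    # Horn's symmetric 4x4 matrix; its top eigenvector is the optimal quaternion.
    N = [
        [Sxx + Syy + Szz, Syz - Szy, Szx - Sxz, Sxy - Syx],
        [Syz - Szy, Sxx - Syy - Szz, Sxy + Syx, Szx + Sxz],
        [Szx - Sxz, Sxy + Syx, -Sxx + Syy - Szz, Syz + Szy],
        [Sxy - Syx, Szx + Sxz, Syz + Szy, -Sxx - Syy + Szz],
    ]
    # Shift by a Gershgorin bound so the largest eigenvalue is also largest in magnitude.
    shift = max(sum(abs(v) for v in row) for row in N)
    if shift == 0.0:
        raise ValueError("degenerate point sets: rotation is undetermined")
    for k in range(4):
        N[k][k] += shift
    q = [1.0, 0.5, 0.25, 0.125]  # arbitrary non-zero starting vector
    for _ in range(iters):
        q = [sum(N[r][c] * q[c] for c in range(4)) for r in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    # Translation follows from the centroids once the rotation is known.
    Rce = rotate(quat_to_matrix(q), ce)
    rho = tuple(cd[k] - Rce[k] for k in range(3))
    return tuple(q), rho
```

Power iteration converges slowly when the trajectory is nearly collinear, because the two largest eigenvalues then almost coincide; a library eigen-solver or an SVD-based solution would be the more robust choice in practice.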

The estimated poses in the ground truth world frame are given as:

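$${}^{G}\bar{T}_i \Leftrightarrow \left\langle \hat{Q} Q_{e,i},\; \hat{Q} \begin{bmatrix} 0 \\ \rho_{e,i} \end{bmatrix} \hat{Q}^{-1} + \begin{bmatrix} 0 \\ \hat{\rho} \end{bmatrix} \right\rangle, \quad i = 1 \ldots m$$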

ATE evaluates how closely the poses in the estimated trajectory match the ground truth poses. The error is measured using the translational error and the rotational error terms. The translational error is given by Equation (4.18). The rotational error is given by Equation (4.19).

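$${}^{G}\bar{T}_i \Leftrightarrow \langle {}^{G}Q_{e,i}, {}^{G}\rho_{e,i} \rangle$$

$$\epsilon_i = \| {}^{G}\rho_{e,i} - \rho_{d,i} \| \tag{4.18}$$

$$\phi_i = 2 \arccos\left( \left| \mathrm{Re}\{ ({}^{G}Q_{e,i})^{-1} Q_{d,i} \} \right| \right) \tag{4.19}$$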

Here, $\mathrm{Re}\{Q\}$ is the real, scalar component of the quaternion $Q$ and $Q^{-1}$ is the inverse of the quaternion $Q$.
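The two ATE terms can be computed per pose pair as in the following Python sketch. It assumes unit quaternions stored as (w, x, y, z), poses stored as (quaternion, translation) tuples, and an alignment transformation that has already been estimated; the function names (qmul, qinv, qrotate, ate) are hypothetical. The clamp before the arccosine is an added guard against floating-point values slightly above 1.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def qinv(q):
    """Inverse of a unit quaternion, which equals its conjugate."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def qrotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q: q [0, v] q^-1."""
    return qmul(qmul(q, (0.0,) + tuple(v)), qinv(q))[1:]

def ate(est_pose, gt_pose, T_hat):
    """Translational and rotational ATE for one matched pose pair.

    est_pose and gt_pose are (quaternion, translation) tuples;
    T_hat is the estimated alignment from W to G in the same layout.
    """
    Q_hat, rho_hat = T_hat
    Q_e, rho_e = est_pose
    Q_d, rho_d = gt_pose
    # Express the estimated pose in the ground truth world frame G.
    Q_g = qmul(Q_hat, Q_e)
    rho_g = tuple(r + t for r, t in zip(qrotate(Q_hat, rho_e), rho_hat))
    # Translational error, Equation (4.18).
    eps = math.sqrt(sum((a - b) ** 2 for a, b in zip(rho_g, rho_d)))
    # Rotational error, Equation (4.19), clamped for numerical safety.
    re = abs(qmul(qinv(Q_g), Q_d)[0])
    phi = 2.0 * math.acos(min(1.0, re))
    return eps, phi
```

Taking the absolute value of the real part makes the angle independent of the sign of either quaternion, since a quaternion and its negation describe the same rotation.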

4.10.1.4 Relative Motion Error

The relative motion error metric is an error measurement inspired by [63]. It evaluates how closely the motion between poses in the estimated trajectory matches that of the ground truth. The motion is recovered by finding the rigid transformation between sequential poses for both datasets. The angular difference between the two rotations gives the rotational error and the difference in translation terms gives the translational error.

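The relative motion $\tilde{T}_i$ between two estimated poses $\bar{T}_i$ and $\bar{T}_{i+1}$ is given by:

$$\tilde{T}_i = \bar{T}_{i+1} \bar{T}_i^{-1}$$

$$\tilde{T}_i \Leftrightarrow \langle \tilde{Q}_{e,i}, \tilde{\rho}_{e,i} \rangle$$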

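If the SLAM algorithm has estimated the poses correctly, this motion should be the same as the ground truth motion $\tilde{J}_i$ between the ground truth poses $\bar{J}_i$ at $d_i$ and $\bar{J}_{i+1}$ at $d_{i+1}$.

$$\tilde{J}_i = \bar{J}_{i+1} \bar{J}_i^{-1}$$

$$\tilde{J}_i \Leftrightarrow \langle \tilde{Q}_{d,i}, \tilde{\rho}_{d,i} \rangle$$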

The difference in angle of rotation and translation between the two transformations $\tilde{J}_i$ and $\tilde{T}_i$ indicates how much angular and positional error was introduced by the motion from $\bar{T}_i$ to $\bar{T}_{i+1}$.

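The angular error $\phi_i$ can be found as in Equation (4.20).

$$\phi_i = 2 \arccos\left( \mathrm{Re}\{ \tilde{Q}_{e,i} \tilde{Q}_{d,i}^{-1} \} \right) \quad \text{for } i = 1 \ldots m \tag{4.20}$$

The translational error is found by taking the Euclidean distance between the two translation terms.

$$\epsilon_i = \| \tilde{\rho}_{e,i} - \tilde{\rho}_{d,i} \| \quad \text{for } i = 1 \ldots m$$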
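The relative motion error can be sketched in Python as follows, again assuming unit quaternions stored as (w, x, y, z), poses stored as (quaternion, translation) tuples, and two trajectories that are already matched pose for pose. The function names (relative_motion, rme) are hypothetical. Two details are assumptions of this sketch rather than part of the text: the absolute value of the real part is taken before the arccosine, mirroring Equation (4.19), so that a quaternion and its negation give the same angle, and the loop stops at the last pose that has a successor.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def qinv(q):
    """Inverse of a unit quaternion, which equals its conjugate."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def qrotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q."""
    return qmul(qmul(q, (0.0,) + tuple(v)), qinv(q))[1:]

def relative_motion(pose_a, pose_b):
    """Rigid motion pose_b * pose_a^-1 between two sequential poses."""
    Q_a, rho_a = pose_a
    Q_b, rho_b = pose_b
    Q_rel = qmul(Q_b, qinv(Q_a))
    # Composing T_b with T_a^-1 leaves the translation rho_b - Q_rel rho_a Q_rel^-1.
    rho_rel = tuple(b - r for b, r in zip(rho_b, qrotate(Q_rel, rho_a)))
    return Q_rel, rho_rel

def rme(est, gt):
    """Return a list of (translational error, angular error) per motion."""
    errors = []
    for i in range(len(est) - 1):
        Q_e, rho_e = relative_motion(est[i], est[i + 1])
        Q_d, rho_d = relative_motion(gt[i], gt[i + 1])
        # Angular error as in Equation (4.20); abs() and the clamp are added here.
        re = abs(qmul(Q_e, qinv(Q_d))[0])
        phi = 2.0 * math.acos(min(1.0, re))
        eps = math.sqrt(sum((a - b) ** 2 for a, b in zip(rho_e, rho_d)))
        errors.append((eps, phi))
    return errors
```

A trajectory of m poses yields m - 1 relative motions in this sketch, so the returned list is one entry shorter than the input trajectories.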

In document From interest points to map transformation: a discussion of RGB-D SLAM and its applications (Page 108-112)