4.5 Asynchronous Sensor Fusion
4.5.2 Proposed Asynchronous Sensor-Fusion
Achieving and maintaining synchronization between disparate modules of a distributed system is a daunting task in practice. In this thesis, a sensor-fusion scheme is developed to lighten the load of maintaining synchronization. The scheme is based on the fusion algorithm outlined in Section 4.4.2. To generalize the scheme to asynchronous measurements, the HMM framework on which it is based is augmented with switching coefficients that track the possible temporal misalignments between the different sensing modalities.
The resultant algorithm, shown in Algorithm 9, is able to handle minor misalignments between the inertial and visual measurements, which might accrue due to variable-rate sensors or errors during algorithm setup, such as inaccurate estimates of the sensor rates. The algorithm corrects temporal misalignments in real-time, which is important for systems that must provide immediate feedback to users on the tracking performance of the sensor-fusion scheme.
The asynchronous sensor-fusion scheme maintains, in real-time, multiple candidates for the alignment of the inertial and visual sub-systems. The different alignments, and the correspondingly different state-transition matrices, are selected based on the switching coefficients. A B-Best MM filter, outlined in Algorithm 8, is integrated into the sensor-fusion Algorithm 7.
The B-Best filter is a type of hard-decision MM filter which sets an upper bound (i.e. B) on the number of model candidates. By selecting only the top B model candidates at each time-step (i.e. position measurement), the B most probable alignments and their corresponding mean and covariance matrices are maintained. Theoretically, all possible alignments could be maintained and carried over for each time-step; however, the total number of candidates rises exponentially, and in practice a restriction on the number of candidates is necessary for computational reasons.
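The pruning step described above can be illustrated with a minimal sketch. It assumes each surviving hypothesis has already been expanded into its candidate models and that a scalar weight is available for every candidate; the function name `select_b_best` and the example weights are hypothetical and are not taken from Algorithm 8.

```python
from typing import List, Tuple


def select_b_best(weights: List[float], b: int) -> List[Tuple[int, float]]:
    """Keep the b highest-weight candidates and renormalise their weights.

    Returns (candidate_index, normalised_weight) pairs, best first.
    """
    if b <= 0:
        raise ValueError("b must be positive")
    # Rank candidate indices by weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept = ranked[:b]
    total = sum(weights[i] for i in kept)
    if total <= 0.0:
        raise ValueError("kept candidates must carry positive weight")
    return [(i, weights[i] / total) for i in kept]


# Hypothetical example: B = 2 hypotheses, each expanded by M = 3 models,
# giving B * M = 6 candidates of which only the best 2 are carried forward.
candidate_weights = [0.02, 0.30, 0.08, 0.10, 0.45, 0.05]
survivors = select_b_best(candidate_weights, b=2)
```

Without the cap, M models per step would produce M^t alignment histories after t steps; with it, at most B * M candidates are evaluated per visual measurement.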
The asynchronous sensor-fusion scheme maintains multiple alignments, represented as delay-shifts ($\tau$) between the inertial and visual measurement series. The delay-shifts are quantized as integer multiples of the smallest sampling period of the system (i.e. that of the accelerometer measurements). The asynchronous scheme assumes the different sensor sub-systems are started at approximately the same instant, to within a sampling period of the highest-rate sensor. The ability of the scheme to handle initial synchronization errors greater than this amount requires further research. After each visual measurement, the scheme entertains the possibility of M different configurations with corresponding delay-shifts. Each of the M configurations can be represented by the latent inertial (acceleration and bias) states considered during the higher-level KF steps, which take into account the visual measurements. The configurations are described in Table 4.2.
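As a small illustration of the quantization described above, the following sketch expresses a time offset as an integer number of sampling periods of the fastest sensor and then enumerates candidate delay-shifts. The function names, the 34 ms example offset, and the per-model offsets (-1, 0, +1) are assumptions made for illustration only; the actual offsets are those defined in Table 4.2.

```python
from typing import List, Sequence


def quantize_delay(delay_s: float, fastest_rate_hz: float) -> int:
    """Express a time offset as an integer count of fastest-sensor sample periods."""
    if fastest_rate_hz <= 0.0:
        raise ValueError("rate must be positive")
    # One sample period lasts 1 / fastest_rate_hz seconds.
    return round(delay_s * fastest_rate_hz)


def candidate_shifts(tau_prev: int, offsets: Sequence[int] = (-1, 0, 1)) -> List[int]:
    """Delay-shift of each candidate model, given the previous shift.

    The default offsets are an assumption, not the contents of Table 4.2.
    """
    return [tau_prev + offset for offset in offsets]


# Hypothetical example: a 34 ms offset with a 100 Hz accelerometer.
tau0 = quantize_delay(0.034, 100.0)
shifts = candidate_shifts(tau0)
```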
The delay-shifts are inferred based on the compatibility of the position measurements (via the visual sub-system) and the state-transition process applied to the shifted inertial measurements for each configuration. After successive steps, the asynchronous sensor-fusion scheme is able to handle significant synchronization errors (i.e. spanning tens of sampling
Input: initial model PMF $\boldsymbol{\theta}_0$, homogeneous model transition probabilities $\pi$, and observations $y_t$
Data:
    $\pi_{ij} = p(\theta_t^i \mid \theta_{t-1}^j)$
    $N$ is the length of the time-sequence
    $M$ is the total number of candidate models
    $B$ is the maximum number of best models
Result: hard-decision multiple model scheme for estimating $\hat{x}_t = E\{x_t \mid Y_{1:t}\}$
/* Initialization */
Given $\{\boldsymbol{\mu}^i_{0|0}, \Sigma^i_{0|0}\}_{i=1}^{M}$;
for $t \leftarrow 0$ to $N$ do
    /* Kalman Filter Time-Update and Measurement-Update */
    $\{\hat{\boldsymbol{\mu}}^i_{t-1|t-1}, \hat{\Sigma}^i_{t-1|t-1}\}_{i=1,\dots,B} \rightarrow \{\hat{\boldsymbol{\mu}}^{i,j}_{t|t}, \hat{\Sigma}^{i,j}_{t|t}\}_{i=1,\dots,B}$
Algorithm 8: B-Best Multiple Model [113]
periods), which may have accumulated as a result of erroneous sensor-rate estimates. Evidently, larger synchronization errors may warrant correspondingly large delays between the inertial and visual measurements; hence, the concept of real-time, which is synonymous with the tracker processing being applied within the span of adjacent visual measurements, becomes conditional on the availability of the required inertial measurements. The level of compatibility is based on a prior for model transitions as well as the likelihood of the shift given the visual measurement. The scheme can be likened to the integration of a DTW routine into the visual-inertial sensor fusion for real-time processing of asynchronous measurements.
4.5.3 Simulation Results
The asynchronous sensor-fusion scheme is designed for improving fusion results when the multiple sensing modalities are not aligned. Hence, in the case of visual-inertial sensor
Model Type Model Index Offset to delay-shift Inertial State Indices
Table 4.2: Descriptions of each of the model configurations. Includes the model index ($\psi$), the incremental effect of the model on the delay-shift (i.e. offset), and the indices of the inertial states used during the higher-level KF. The indices of the inertial states are implicitly offset by the delay-shift.
fusion, the asynchronous fusion scheme accounts for the (possibly time-varying) delay which persists between the acceleration estimates from the IMU and the position estimates from the visual-tracking sub-modules. This scheme works in real-time, providing a causal fusion estimate. In order to evaluate the performance of the proposed asynchronous sensor-fusion, the test scenarios include IMU data and visual data having sample rates which are non-integer multiples of each other, as well as a scenario with visual data captured at a variable rate.
As evidenced by the sensor-fusion trials of Table 4.1, the fusion algorithm has difficulty outperforming the visual-tracking algorithm, in terms of positional precision, if the cameras have a clear line-of-sight view of the target object (no occlusions). The high degree of precision from the visual-tracking algorithm in these clear situations is difficult to improve upon given the model uncertainty present in the fusion scheme. The occlusion test scenarios demonstrate the efficacy of the fusion algorithms, as the cameras lose line-of-sight and hence offer room for an improved position estimate. In effect, the asynchronous sensor-fusion algorithm aligns the IMU information with the visual information in preparation for portions of the target trajectory (in a possibly cluttered environment) that require inertial assistance (e.g. occlusions or tracking loss).
The results of the proposed asynchronous sensor-fusion scheme are detailed in Table 4.4, and the test definitions are summarized in Table 4.3. The sensor-fusion scheme presented in Section 4.4.2 is shown to be a special case of the asynchronous implementation, with the transition matrix set to model-1-only, which means the probability of remaining in the first model is 1. In order to test the asynchronous operation of the fusion scheme, a
[Plot: x-pos., y-pos., and z-pos. panels versus Sample Index; series: Sensor Fusion, Ground-truth, Visual-Tracker]
Figure 4.19: Illustrates the advantage of the sensor-fusion algorithm for target-tracking in the presence of occlusion regions. Plots of navigation frame position (m). The sensor-fusion results are from the asynchronous (uninformative prior) algorithm with no IMU resampling. The IMU runs at 100Hz and the camera at 20Hz.
Figure 4.20: Illustrates the proposed asynchronous sensor-fusion scheme for handling a variable-rate (asynchronous) camera frame rate. Plots of navigation frame position (m). The IMU runs at 100Hz and the camera at a variable rate between 15 and 16Hz.
uniform transition matrix is selected, which means the probability of transitioning to any model is equal.
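The two transition matrices contrasted here can be written out explicitly. The sketch below uses the convention that entry (i, j) is the probability of moving from model j to model i, so each column sums to one. Sending every column of the model-1-only matrix to the first model is an assumption about the columns the text does not describe, and M = 3 is a hypothetical model count.

```python
from typing import List

Matrix = List[List[float]]


def uniform_transition(m: int) -> Matrix:
    """Uninformative prior: every model is equally likely at the next step."""
    return [[1.0 / m for _ in range(m)] for _ in range(m)]


def model_1_only_transition(m: int) -> Matrix:
    """All probability mass goes to the first model (row 0), whatever the origin."""
    return [[1.0 if i == 0 else 0.0 for _ in range(m)] for i in range(m)]


def columns_sum_to_one(pi: Matrix, tol: float = 1e-12) -> bool:
    """Check that each column (origin model j) is a valid distribution."""
    m = len(pi)
    return all(abs(sum(pi[i][j] for i in range(m)) - 1.0) < tol for j in range(m))


M = 3  # hypothetical number of models
pi_uniform = uniform_transition(M)
pi_sync = model_1_only_transition(M)
```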
The asynchronous and synchronous operation of the proposed sensor-fusion algorithm is benchmarked in Figs. 4.22, 4.23, and 4.24. Fig. 4.22 demonstrates operation without any resampling of the IMU data. As expected, the synchronous operation performs better for test scenarios R0, which correspond to an IMU sample rate of 100Hz that is an integer multiple of the camera frame rate. The asynchronous operation performs slightly worse there because of its weaker prior (uninformative transition matrix). The situation reverses, with the asynchronous operation outperforming the synchronous operation, for test scenarios R1 and R2, which correspond to scenarios where the camera frame rate is either a fixed or a variable non-integer divisor of the IMU rate. The synchronous operation clearly fails to provide a useful estimate, as shown in Fig. 4.21. In contrast, the asynchronous operation does not suffer from the drift error seen previously, and is able to maintain an estimate in the event of visual tracking loss, as shown in Fig. 4.20.
Figure 4.21: Illustrates the proposed synchronous sensor-fusion scheme when handling a variable-rate (asynchronous) camera frame rate. Plots of navigation frame position (m). The IMU runs at 100Hz and the camera at a variable rate between 15 and 16Hz.
4.6 Conclusion
A simulation environment is proposed for exploring visual-inertial sensor fusion algorithms by generating coupled visual information in the form of camera frames with inertial information in the form of nine-axis IMU measurements. The environment provides the ability to inject various noise sources into the two paired sensing modalities. A sensor-fusion scheme is then introduced which takes a loosely-coupled approach to fusing visual and inertial data. The scheme is shown to be advantageous over purely vision-based systems in the presence of significant occlusions, which will arise in a sparse camera array setup.
The sensor-fusion scheme is extended to handle asynchronous measurements by combining ideas from DTW as well as multi-rate data fusion in order to align visual and inertial information in real-time. The asynchronous sensor-fusion is demonstrated to be beneficial for variable-rate sensors or in scenarios where the sensor-rate is known only approximately. The real-time capabilities of the scheme are important for human motion capture cases where the application experts may want immediate feedback on the quality of the target tracking.
The sensor-fusion schemes proposed can be expanded to offline methods, if the real-time
[Plot: World-Domain Root Mean-Square Error (RMSE), log scale, versus Test Scenario (CG0I0R0, CG0I0R1, CG0I0R2, CG0I1R0, CG0I1R1, CG0I1R2, OC2I0R0, OC2I0R1, OC2I1R0, OC2I1R1); series: Sensor-Fusion: M0RS0, Sensor-Fusion: M1RS0]
Figure 4.22: Comparison of world-domain RMSE (m) performance for the proposed asynchronous and synchronous sensor-fusion algorithms on control-group and occlusion test scenarios.
[Plot: World-Domain Root Mean-Square Error (RMSE), log scale, versus Test Scenario (same scenarios as Figure 4.22); series: Sensor-Fusion: M0RS1, Sensor-Fusion: M1RS1]
Figure 4.23: Comparison of world-domain RMSE (m) performance for the proposed asynchronous and synchronous sensor-fusion algorithms on control-group and occlusion test scenarios. The IMU acceleration data has been resampled to 105Hz.
[Plot: World-Domain Root Mean-Square Error (RMSE), log scale, versus Test Scenario (same scenarios as Figure 4.22); series: Sensor-Fusion: M0RS2, Sensor-Fusion: M1RS2]
Figure 4.24: Comparison of world-domain RMSE (m) performance for the proposed asynchronous and synchronous sensor-fusion algorithms on control-group and occlusion test scenarios. The IMU acceleration data has been resampled to 96Hz.
requirement is relaxed, in which case smoothing filters such as the Rauch-Tung-Striebel (RTS) smoother can be applied to the entire motion capture sequence in a non-causal manner [24].
The fusion scheme also leaves room for velocity information, inferred from the motion-blur of fast-moving targets, to be incorporated into the HMM framework [40, 158].
4.7 Acknowledgements
This chapter is, in part, a reproduction of material accepted for publication: Mehdi P. Stapleton, Md. Zulfiquar A. Bhotto, and Ivan V. Bajić, "Simulation Environment for Visual-Inertial Sensor Fusion", IEEE CCECE 2016. I was the primary investigator and author for this paper.
Input:
    $y^p_t$ = visual subsystem position estimates
    $y^a_t$ = inertial subsystem acceleration estimates
    $N_a$ = number of inertial measurements per visual measurement
    $M$ = number of different models
    $\{\boldsymbol{\mu}^{k,l}_{i,j}, \Sigma^{k,l}_{i,j}\}$ = mean and covariance of the multivariate Gaussian posterior distribution for the $i$th position, $j$th velocity, $k$th acceleration, and $l$th acceleration-bias. The acceleration and bias indices reset to 0 at each new position index. See Fig. 4.7.
    $\pi$ = model transition matrix ($\pi_{ij}$ = probability of transitioning from the $j$th to the $i$th model)
Output:
    $\hat{p}_t$ = sensor-fusion position estimate
Data:
    $n_{Vis}$ is the length of the visual measurement time-sequence
    $n_{Acc}$ is the length of the inertial measurement time-sequence
    $B$ is the maximum number of model candidates maintained
    $N_{y^p}$ and $N_{y^a}$ are the measurement covariance matrices of the visual position estimates (see Section 4.4.2) and inertial acceleration estimates, respectively
    $\tau_t$ indicates the estimate of the time-shift between visual and inertial measurements
    $\psi_t$ model index
    $K_a$ order of the acceleration autoregressive model
Result: Hidden Markov Model scheme for estimating $\hat{p}_t = E\{p_t \mid Y^p_{1:t}, Y^a_{1:t}\}$ despite asynchronous visual and inertial sensor measurements
for $t \leftarrow 1$ to $n_{Vis}$ do
    for $b \leftarrow 1$ to $B$ do
        /* Select one of the corresponding B-best models */
        $\{\boldsymbol{\mu}, \Sigma, \tau, \psi\} \leftarrow \{\boldsymbol{\mu}^b, \Sigma^b, \tau^b_{t-1}, \psi^b_{t-1}\}$;
        /* Update acceleration and bias states up to next visual sample */
        for $j \leftarrow 1$ to $N_a$ do
            /* State-update with acceleration and bias, measurement-update with accelerometer reading */
            $\{\boldsymbol{\mu}, \Sigma\}^{(-K_a:j),(-1:j)}_{(t-1),(t-1)} \leftarrow \mathrm{KalmanFilter}(\{\boldsymbol{\mu}, \Sigma\}^{(-K_a:j-1),(-1:j-1)}_{(t-1),(t-1)},\; y^a_{(t-2)\times(N_a-1)+\tau+1+j})$
        end
        /* Lookahead prediction */
        $\{\boldsymbol{\mu}, \Sigma\}^{(-K_a:N_a+1),(-1:N_a)}_{(t-1),(t-1)} \leftarrow \mathrm{KalmanFilter}(\{\boldsymbol{\mu}, \Sigma\}^{(-K_a:N_a),(-1:N_a)}_{(t-1),(t-1)},\; y^a_{(t-1)\times(N_a-1)+\tau+2})$;
        /* Compute state-transition coefficients for position and velocity based on Simpson's Rule: $\{f_p(\cdot), f_v(\cdot)\}$ */
        /* Compute Jacobians of visual measurement functions: $H_{y^p}$ */
        for $m \leftarrow 1$ to $M$ do
            /* Time-update of position */
            $p_{t|t-1} = f_p(p_{t-1}, v_{t-1}, a_{t-1,\,-K_a:N_a+1})$;
            /* Time-update of velocity */
            $v_{t|t-1} = f_v(v_{t-1}, a_{t-1,\,-K_a:N_a+1})$;
            /* Compute mixture component model weight */
            $L^m_t = \mathcal{N}(p;\; \boldsymbol{\mu}^p_{t|t-1},\; H_{y^p} \Sigma^{(-K_a:N_a+1),(-1:N_a)}_{(t|t-1),(t|t-1)} (H_{y^p})^T + N_{y^p})$;
            $\omega^m = \omega\, \pi_{m\psi}\, L^m_t$;
            /* Measurement-update with visual sample $y^x_t$ */
            $\{\boldsymbol{\mu}, \Sigma\}^{(-K_a:N_a+1),(-1:N_a)}_{(t|t),(t|t-1)}$;
            /* Update time-shift based on model: $\tau^m_t = \tau_{t-1}(\pm 1)$ */
            /* Marginalize posterior distribution */
            $\{\boldsymbol{\mu}, \Sigma\}^{(N_a-K_a:N_a+1),(N_a-1:N_a)}_{(t|t),(t|t-1)}$;
        end
        /* Select best B models based on weights */
        /* Normalize model weights */
        /* Update sensor-fusion position estimate */
        $\hat{x}_t = \sum_{b=1}^{B} \boldsymbol{\mu}^b_t\, \omega^b$;
    end
end
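The mixture-weight step of the algorithm above multiplies the weight of the parent candidate by the transition probability into each model and by the Gaussian likelihood of the visual measurement under that model's prediction. The following is a minimal scalar sketch of that product, assuming a one-dimensional position and a single parent candidate; all names and numbers are hypothetical, and the innovation variance stands in for the projected state covariance plus the visual measurement covariance.

```python
import math
from typing import List, Sequence


def gaussian_pdf(y: float, mean: float, var: float) -> float:
    """Density of a scalar Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def model_weights(w_prev: float, pi_column: Sequence[float], y: float,
                  pred_means: Sequence[float],
                  innov_vars: Sequence[float]) -> List[float]:
    """Unnormalised weight of each model: parent weight * transition * likelihood."""
    return [w_prev * p * gaussian_pdf(y, mu, s2)
            for p, mu, s2 in zip(pi_column, pred_means, innov_vars)]


def normalise(ws: Sequence[float]) -> List[float]:
    total = sum(ws)
    return [w / total for w in ws]


# Hypothetical example: three delay-shift models predict slightly different
# positions; the visual measurement agrees best with the middle one.
weights = normalise(model_weights(
    w_prev=1.0,
    pi_column=[1.0 / 3.0] * 3,        # uniform (uninformative) transition prior
    y=1.02,                           # visual position measurement (m)
    pred_means=[0.90, 1.00, 1.10],    # predicted position under each model (m)
    innov_vars=[0.01, 0.01, 0.01],    # innovation variance under each model
))
```

With a uniform prior the ranking is decided by the likelihood alone; an informative transition matrix would bias the weights towards the models it favours.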
| Abbreviation | Description |
|---|---|
| (CG-#1-I#2 / OC-#1-I#2)-R#3 | Control group/Occlusions: the identifier R#3 indicates the sampling rate combination simulated: 0 - IMU at 100Hz, camera at 20Hz; 1 - IMU at 100Hz, camera at 15Hz; 2 - IMU at 100Hz, camera at variable sampling rate (linear ramp) 15-16Hz. |
| M#1-RS#2 | Sensor-fusion mode: the identifier M#1 indicates the transition matrix $\pi$ used in the fusion algorithm: 0 - uniform transition probabilities indicating an uninformative prior, 1 - model-1-only indicating the fusion algorithm stays in the first model throughout the process. The secondary identifier RS#2 indicates whether the IMU information has been resampled from 100Hz: 0 - no resampling, 1 - resampled to 105Hz, 2 - resampled to 96Hz. |

Table 4.3: Summary of test scenario abbreviations and sensor fusion modes. The table addresses new tags not covered in the previous Table 3.1.
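The rate combinations in Table 4.3 can be checked for whether the IMU rate is an integer multiple of the camera rate, which is the condition the synchronous mode relies on. The sketch below is only arithmetic on the nominal rates listed in the table; the variable-rate scenario R2 is left out because it has no single ratio, and the dictionary and function names are hypothetical.

```python
from fractions import Fraction

# Nominal rates from Table 4.3 (Hz).
IMU_RATES_HZ = {"RS0": 100, "RS1": 105, "RS2": 96}
CAMERA_RATES_HZ = {"R0": 20, "R1": 15}


def samples_per_frame(imu_hz: int, camera_hz: int) -> Fraction:
    """Exact number of inertial samples per camera frame."""
    return Fraction(imu_hz, camera_hz)


ratios = {
    (rs, r): samples_per_frame(imu_hz, cam_hz)
    for rs, imu_hz in IMU_RATES_HZ.items()
    for r, cam_hz in CAMERA_RATES_HZ.items()
}

# Combinations where a fixed integer number of IMU samples spans one frame.
integer_pairs = sorted(pair for pair, ratio in ratios.items() if ratio.denominator == 1)
```

Only two of the six fixed-rate combinations give an integer ratio; in the others a fixed integer sample count per frame cannot match the true ratio, so the misalignment grows with every frame.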
| Test | Visual-Tracker | M0RS0 | M1RS0 | M0RS1 | M1RS1 | M0RS2 | M1RS2 |
|---|---|---|---|---|---|---|---|
| CG0I0R0 | 0.00168 | 0.00503 | 0.00282 | 0.02274 | 0.36004 | 0.10311 | 0.30184 |
| CG0I0R1 | 0.00206 | 0.06569 | 0.37492 | 0.00564 | 0.00357 | 0.03317 | 0.43660 |
| CG0I0R2 | 0.00204 | 0.05623 | 0.46181 | 0.01322 | 0.15485 | 0.02062 | 0.39860 |
| CG0I1R0 | 0.00168 | 0.01066 | 0.00679 | 0.03779 | 0.36204 | 0.11908 | 0.30081 |
| CG0I1R1 | 0.00206 | 0.05189 | 0.37756 | 0.01238 | 0.00767 | 0.03945 | 0.43842 |
| CG0I1R2 | 0.00204 | 0.04197 | 0.46368 | 0.02163 | 0.15367 | 0.03265 | 0.40138 |
| OC2I0R0 | 0.01532 | 0.00687 | 0.00423 | 0.01917 | 0.36157 | 0.10404 | 0.30232 |
| OC2I0R1 | 0.01428 | 0.06316 | 0.37588 | 0.00905 | 0.00462 | 0.04138 | 0.43810 |
| OC2I1R0 | 0.01532 | 0.01273 | 0.00717 | 0.02486 | 0.36361 | 0.11910 | 0.30120 |
| OC2I1R1 | 0.01428 | 0.06054 | 0.37864 | 0.01289 | 0.00755 | 0.04093 | 0.44001 |

Table 4.4: The RMSE (m) performance of the proposed asynchronous sensor-fusion scheme is evaluated on various test scenarios: control group (CG) and occlusion (OC). The sensor-fusion results are separated into pairs of columns, each representing a specific resampling mode (see Table 4.3). The sensor fusion mode (asynchronous vs. synchronous) with the lower RMSE is underlined.
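The pairwise comparison described in the caption of Table 4.4 can be reproduced directly from the tabulated values. The sketch below assumes that the last six values in each row are the fusion columns M0RS0 to M1RS2, in that order, and that the first value in each row belongs to a separate column which is ignored here; this reading of the flattened table is an assumption.

```python
MODES = ["M0RS0", "M1RS0", "M0RS1", "M1RS1", "M0RS2", "M1RS2"]

# RMSE (m) of the six fusion columns, transcribed from Table 4.4.
RMSE = {
    "CG0I0R0": [0.00503, 0.00282, 0.02274, 0.36004, 0.10311, 0.30184],
    "CG0I0R1": [0.06569, 0.37492, 0.00564, 0.00357, 0.03317, 0.43660],
    "CG0I0R2": [0.05623, 0.46181, 0.01322, 0.15485, 0.02062, 0.39860],
    "CG0I1R0": [0.01066, 0.00679, 0.03779, 0.36204, 0.11908, 0.30081],
    "CG0I1R1": [0.05189, 0.37756, 0.01238, 0.00767, 0.03945, 0.43842],
    "CG0I1R2": [0.04197, 0.46368, 0.02163, 0.15367, 0.03265, 0.40138],
    "OC2I0R0": [0.00687, 0.00423, 0.01917, 0.36157, 0.10404, 0.30232],
    "OC2I0R1": [0.06316, 0.37588, 0.00905, 0.00462, 0.04138, 0.43810],
    "OC2I1R0": [0.01273, 0.00717, 0.02486, 0.36361, 0.11910, 0.30120],
    "OC2I1R1": [0.06054, 0.37864, 0.01289, 0.00755, 0.04093, 0.44001],
}


def winners(row):
    """For each (M0, M1) column pair, return the mode with the lower RMSE."""
    result = []
    for k in range(0, len(MODES), 2):
        best = k if row[k] < row[k + 1] else k + 1
        result.append(MODES[best])
    return result


best_modes = {scenario: winners(row) for scenario, row in RMSE.items()}

# Number of column pairs in which the asynchronous mode (M0) has the lower RMSE.
async_wins = sum(mode.startswith("M0")
                 for modes in best_modes.values() for mode in modes)
```

The synchronous mode (M1) is only ahead for the unresampled R0 scenarios and for the R1 scenarios resampled to 105Hz, which are the two combinations where the IMU rate is an integer multiple of the camera rate.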
Chapter 5
Real-World Experiment
5.1 Introduction
A prominent application of visual-inertial sensor-fusion would be in the motion capture industry, where the fusion of inertial and visual measurements will provide a system with robustness as well as precision, as alluded to in Chapter 1. The scenario envisioned would have the target (e.g. human, animal, etc.) outfitted with an inertial measurement suit (e.g. Xsens suit [160]) while being under observation by a sparse array of cameras, thereby maintaining a certain level of portability akin to the strictly inertial approach.
A real-world experiment is performed as a proof-of-concept for the algorithms proposed herein. The camera calibration, proposed visual-tracking algorithms, as well as proposed fusion techniques are put to the test in a real-world setting.
This chapter is organized as follows. Firstly, Section 5.2 will summarize the experimental procedure and preparation, including camera calibration. The experimental results will be discussed in the following Section 5.3. Lastly, Section 5.4 will address any concluding remarks and notes for future work.
5.2 Experiment Setup
In order to side-step the data-association problem with tracking multiple similar objects (i.e. the IMUs), the experiment will use a single Xsens [160] MTx-series IMU device.
The sparse camera array will be provided by a single Point Grey Research XB3 [121] stereo-camera; however, all visual-tracking techniques developed thus far have assumed an arbitrary number of cameras (i.e. they generalize to larger arrays).
A difficulty not found in simulations [137] is the need for ground-truth data in order to benchmark the various tracking and fusion algorithms proposed. In a real-world setting, ground-truth data is impossible to acquire and the experimenter must accommodate certain levels of measurement uncertainty in the chosen substitute. In the following experiment, "ground-truth" information is provided by a Qualisys [116] system fitted with 8 Oqus cameras. The system is designed to track retro-reflective dots attached to the targets of interest using the IR-sensitive cameras, which also emit IR light onto the scene using an array of IR-LEDs.
Another complication of performing experiments consisting of multiple different sensing modalities (i.e. a distributed system) is synchronization, which ensures all measurements start at a common time and maintain a constant sample rate from that point onwards. Section 5.2.1 will elaborate on the various sensors used as well as explain the attempt at overcoming synchronization issues.