Gaussian processes - Parameter and function estimation

Parameter and function estimation

5.3 Gaussian processes

case the used model is not in the conjugate-exponential family, there is a need for further approximations [55].

A drawback with VBEM is that, even for fairly simple models, the deriva-tions required for the analytical approximation of the posterior density can be troublesome. However, the resulting VBEM algorithm is computationally efficient compared to many alternative methods that rely on Markov Chain Monte Carlo sampling [56].

5.3 Gaussian processes

With the focus on learning unknown functions based on data, in this section we introduce the concept of Gaussian processes [57, 58]. This is a Bayesian non-parametric method that can be used to describe a distribution over func-tions. A major advantage with this approach is that Gaussian processes are very flexible and are capable to describe very complex functions. The follow-ing discussion is based on the definition of a Gaussian process:

Definition: “A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.”

For illustration, we consider an unknown function f , with argument x ∈ Rⁿ and scalar output y. In the discussion on Bayesian filtering, many of the involved densities are assumed to be Gaussian, characterized by its mean and covariance. Similarly, a prior for the function f can be defined by a Gaussian process that is specified by its mean function and covariance function. This prior is denoted by

f (x)∼ GP m(x), k(x, x^′)

, (5.22)

where the mean and covariance functions are defined as m(x) = E

By choosing the mean and covariance functions, any knowledge regarding the function can be incorporated into the prior. Based on this prior and a set of training samplesD = {x⁽ⁱ⁾, y⁽ⁱ⁾}^Ni=1 that are related through f , the objective in this section is to find a description of f (x_∗), where x_∗is an arbitrary vector in Rⁿ.

As often in reality, the observations are affected by noise. This noise is modelled as Gaussian such that each training sample is

y⁽ⁱ⁾ = f (x⁽ⁱ⁾) + w⁽ⁱ⁾, (5.24)

Chapter 5. Parameter and function estimation

where w⁽ⁱ⁾ ∼ N (0, σn²). From the definition of a Gaussian process, the noisy samples from the process are jointly Gaussian, implying that

y ∼ N m(x), K + σn²IN×N

Given the training samples and the prior knowledge regarding the function, the output for an arbitrary input vector can be estimated. Denoting a set of M input vectors by x_∗ = [(x⁽¹⁾_∗ )^T, . . . , (x^{(M )}_∗ )^T]^T, we wish to find the corresponding outputs y_∗ = [y_∗⁽¹⁾, . . . , y^{(M )}_∗ ]^T. From the definition of the Gaussian process, the vector of training outputs, y, and the sought outputs y_∗ are jointly Gaussian. That is,

In [58], it is shown that the conditional distribution p(y_∗|y) is given as p(y_∗|y) = N m(x∗) + K^T_∗K⁻¹(y− m(x)), K∗∗− K^T_∗K⁻¹K_∗

. (5.28) To illustrate the discussed theory, the section ends with an example.

Example 5.2 (Gaussian processes)

In this example we consider an unknown function f , such that y = f (x), where both x and y are scalars. A prior of f can be specified as

f ∼ GP(m(x), k(x, x^′)) (5.29)

where m(x) and k(x, x^′) shall be chosen to summarize our prior knowledge regarding the function. Assume that we know that the function is fairly smooth and varies around y = 0. These properties are captured by

m(x) = 0 (5.30)

where L = 3. Changing L affects the smoothness of the function.

5.3 Gaussian processes

For illustration, we study the function on a limited interval by construct-ing a vector x that consists of n equally spaced values on [−30, 30]. A sample from the process in (5.29) is a function [58]:

fs(x) = m(x) +√

Figure 5.2: Samples from the prior process.

Assume that we observe the function for 7 different values of x. For i = 1, 2, . . . , 7, these samples can be described as

y⁽ⁱ⁾ = f (x⁽ⁱ⁾) + w⁽ⁱ⁾, (5.33) where w(i) ∼ N (0, σn²) is the observation noise. Depending on the noise level, the information regarding f obtained from the samples will differ. For illustration, we consider two scenarios. First, the noise variance is σ_n² = 0, i.e., we observe the true function values, and second, σ_n² = 1.

The estimated functions, including uncertainties, are shown in Figure 5.3. Worth noting in 5.3(a) is that when the samples are noise-free, the true function is observed and consequently there are no uncertainties regarding the function at these points. This can be compared to the case where the samples are noisy. Then, as shown in Figure 5.3(b), the samples provide less information about the underlying function. Finally, in both figures it is clear that in unobserved intervals, such as x ∈ [20, 30], the posterior function is very similar to the prior. The reason is that the available samples do not provide much information regarding the function on these intervals.

Chapter 5. Parameter and function estimation

-30 -20 -10 0 10 20 30

-3 -2 -1 0 1 2 3

Samples

Posterior mean function

± 2 standard deviations

(a) Posterior mean function including uncertainties for σ_n² = 0.

-30 -20 -10 0 10 20 30

-3 -2 -1 0 1 2 3

Samples

Posterior mean function

± 2 standard deviations

(b) Posterior mean function including uncertainties for σ²_n= 1.

Figure 5.3: The posterior functions for training samples with two different noise levels.

Bibliography

[1] F. Gustafsson, “Automotive Safety Systems,” IEEE Signal Processing Magazine, vol. 26, pp. 32–47, July 2009.

[2] M. Bertozzi, A. Broggi, and A. Fascioli, “Vision-based intelligent ve-hicles: State of the art and perspectives,” Robotics and Autonomous Systems, vol. 32, no. 1, pp. 1 – 16, 2000.

[3] K. Kozak, J. Pohl, W. Birk, J. Greenberg, B. Artz, M. Blommer, L. Cathey, and R. Curry, “Evaluation of lane departure warnings for drowsy drivers,” in Proceedings of the human factors and ergonomics society annual meeting, vol. 50, pp. 2400–2404, Sage Publications, 2006.

[4] E. Coelingh, A. Eidehall, and M. Bengtsson, “Collision Warning with Full Auto Brake and Pedestrian Detection - a practical example of Au-tomatic Emergency Braking,” in 13th International IEEE Conference on Intelligent Transportation Systems (ITSC), pp. 155–160, September 2010.

[5] M. Distner, M. Bengtsson, T. Broberg, and L. Jakobsson, “City safety – a system addressing rear-end collisions at low speeds,” in Proc. 21st International Technical Conference on the Enhanced Safety of Vehicles, no. 09-0371, 2009.

[6] P. Doll´ar, C. Wojek, B. Schiele, and P. Perona, “Pedestrian Detection:

An Evaluation of the State of the Art,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, pp. 743–761, April 2012.

[7] D. Geronimo, A. Lopez, A. Sappa, and T. Graf, “Survey of Pedestrian Detection for Advanced Driver Assistance Systems,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, pp. 1239–1258, July 2010.

BIBLIOGRAPHY

[8] D. Forslund and J. Bj¨arkefur, “Night Vision Animal Detection,” in Pro-ceedings of IEEE Intelligent Vehicles Symposium, pp. 737–742, June 2014.

[9] “Preliminary Statement of Policy Concerning Automated Vehicles.”

http://www.nhtsa.gov, May 2013.

[10] A. Eugensson et al., “Environmental, safety, legal and societal implica-tions of autonomous driving systems,” in The 23rd Enhancement Safety of Vehicles Conference, 2013.

[11] C. Urmson et al., “Autonomous driving in urban environments: Boss and the Urban Challenge,” Journal of Field Robotics, vol. 25, no. 8, pp. 425–466, 2008.

[12] S. Kammel et al., “Team AnnieWAY’s autonomous system for the 2007 DARPA Urban Challenge,” Journal of Field Robotics, vol. 25, no. 9, pp. 615–639, 2008.

[13] E. Guizzo, “How Google’s self-driving car works,” IEEE Spectrum On-line, October, vol. 18, 2011.

[14] J. Ziegler et al., “Making Bertha Drive – An Autonomous Journey on a Historic Route,” IEEE Intelligent Transportation Systems Magazine, vol. 6, pp. 8–20, Summer 2014.

[15] G. Brooker, M. Bishop, and S. Scheding, “Millimetre waves for robotics,”

in Australian Conference for Robotics and Automation, 2001.

[16] M. I. Skolnik, Radar handbook. New York: McGraw-Hill, 2008.

[17] J. Hasch, E. Topak, R. Schnabel, T. Zwick, R. Weigel, and C. Wald-schmidt, “Millimeter-wave technology for automotive radar sensors in the 77 GHz frequency band,” IEEE Transactions on Microwave Theory and Techniques, vol. 60, no. 3, pp. 845–860, 2012.

[18] W. Fleming, “New Automotive Sensors – A Review,” IEEE Sensors Journal, vol. 8, pp. 1900–1921, November 2008.

[19] B. Hofmann-Wellenhof, H. Lichtenegger, and E. Wasle, GNSS – global navigation satellite systems: GPS, GLONASS, Galileo, and more.

Springer Science & Business Media, 2007.

BIBLIOGRAPHY

[20] R. Kalman, “A New Approach to Linear Filtering and Prediction Prob-lems,” Transactions of the ASME–Journal of Basic Engineering, vol. 82 (Series D), pp. 35–45, 1960.

[21] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with applica-tions to tracking and navigation: theory algorithms and software. John Wiley & Sons, 2001.

[22] B. Bell and F. Cathey, “The iterated Kalman filter update as a Gauss-Newton method,” IEEE Transactions on Automatic Control, vol. 38, pp. 294–297, February 1993.

[23] G. Sibley, G. Sukhatme, and L. Matthies, “The Iterated Sigma Point Kalman Filter with Applications to Long Range Stereo.,” in Robotics:

Science and Systems, vol. 8, pp. 235–244, 2006.

[24] S. J. Julier and J. K. Uhlmann, “New Extension of the Kalman Filter to Nonlinear Systems,” Proc. SPIE, vol. 3068, pp. 182–193, 1997.

[25] I. Arasaratnam and S. Haykin, “Cubature Kalman Filters,” IEEE Transactions on Automatic Control, vol. 54, pp. 1254–1269, June 2009.

[26] N. Gordon, D. Salmond, and A. Smith, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation,” IEE Proceedings of Radar and Signal Processing, vol. 140, pp. 107–113, April 1993.

[27] M. S. Arulampalam, S. Maskell, and N. Gordon, “A tutorial on parti-cle filters for online nonlinear/non-Gaussian Bayesian tracking,” IEEE Ttransactions on Signal Processing, vol. 50, pp. 174–188, 2002.

[28] A. Doucet, S. Godsill, and C. Andrieu, “On sequential Monte Carlo sam-pling methods for Bayesian filtering,” Statistics and computing, vol. 10, no. 3, pp. 197–208, 2000.

[29] Y. Bar-Shalom and E. Tse, “Tracking in a Cluttered Environment with Probabilistic Data Association,” Automatica, vol. 11, no. 5, pp. 451 – 460, 1975.

[30] T. Fortmann, Y. Bar-Shalom, and M. Scheffe, “Sonar Tracking of Multi-ple Targets Using Joint Probabilistic Data Association,” IEEE Journal of Oceanic Engineering, vol. 8, pp. 173 – 184, July 1983.

[31] D. B. Reid, “An Algorithm for Tracking Multiple Targets,” IEEE Trans-actions on Automatic Control, vol. 24, no. 6, pp. 843–854, 1979.

BIBLIOGRAPHY

[32] R. P. Mahler, Statistical Multisource-Multitarget Information Fusion.

Artech House, Inc., 2007.

[33] R. Mahler, “Multitarget Bayes Filtering via First-Order Multitarget Moments,” IEEE Transactions on Aerospace and Electronic Systems, vol. 39, no. 4, pp. 1152–1178, 2003.

[34] O. Erdinc, P. Willett, and Y. Bar-Shalom, “Probability Hypothesis Den-sity Filter for Multitarget Multisensor Tracking,” in 8th International Conference on Information Fusion, vol. 1, July 2005.

[35] R. Mahler, “PHD Filters of Higher Order in Target Number,” IEEE Transactions on Aerospace and Electronic Systems, vol. 43, no. 4, 2007.

[36] B.-N. Vo and W.-K. Ma, “The Gaussian mixture probability hypothesis density filter,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4091–4104, 2006.

[37] B.-T. Vo, B.-N. Vo, and A. Cantoni, “Analytical Implementations of the Cardinalized Probability Hypothesis Density Filter,” IEEE Trans-actions on Signal Processing, vol. 55, no. 7, 2007.

[38] R. Mahler, “PHD Filters for Nonstandard Targets, I: Extended Tar-gets,” in 12th International Conference on Information Fusion, 2009.

FUSION ’09., pp. 915 –921, July 2009.

[39] K. Granstr¨om and U. Orguner, “A PHD Filter for Tracking Multiple Ex-tended Targets Using Random Matrices,” IEEE Transactions on Signal Processing, vol. 60, pp. 5657–5671, November 2012.

[40] C. Lundquist, K. Granstr¨om, and U. Orguner, “An Extended Target CPHD Filter and a Gamma Gaussian Inverse Wishart Implementa-tion.,” Journal of Selected Topics in Signal Processing, vol. 7, no. 3, pp. 472–483, 2013.

[41] C. Lundquist, L. Hammarstrand, and F. Gustafsson, “Road Intensity Based Mapping Using Radar Measurements With a Probability Hypoth-esis Density Filter,” IEEE Transactions on Signal Processing, vol. 59, pp. 1397–1408, April 2011.

[42] M. Adams, J. Mullane, E. Jose, and B.-N. Vo, Robotic Navigation and Mapping with Radar. Artech House, 2012.

BIBLIOGRAPHY

[43] J. Mullane, B.-N. Vo, M. Adams, and B.-T. Vo, “A Random-Finite-Set Approach to Bayesian SLAM,” IEEE Transactions on Robotics, vol. 27, pp. 268–282, April 2011.

[44] R. Mahler, B.-T. Vo, and B.-N. Vo, “CPHD Filtering With Unknown Clutter Rate and Detection Profile,” IEEE Transactions on Signal Pro-cessing, vol. 59, pp. 3497 –3513, August 2011.

[45] D. Schuhmacher, B.-T. Vo, and B.-N. Vo, “A Consistent Metric for Performance Evaluation of Multi-Object Filters,” IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3447–3457, 2008.

[46] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum Likelihood From Incomplete Data Via the EM Algorithm,” Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.

[47] K. P. Murphy, Machine Learning: a Probabilistic Perspective. MIT press, 2012.

[48] S. Borman, “The Expectation Maximization Algorithm – a Short Tuto-rial.” http://www.seanborman.com/publications/EM algorithm.pdf.

[49] C. M. Bishop, Pattern recognition and machine learning, vol. 1.

Springer, New York, 2006.

[50] M. J. Beal, Variational Algorithms For Approximate Bayesian Inference.

PhD thesis, University of London, 2003.

[51] M. J. Beal and Z. Ghahramani, “The variational Bayesian EM algo-rithm for incomplete data: with application to scoring graphical model structures,” Bayesian statistics, vol. 7, pp. 453–464, 2003.

[52] S. Kullback and R. A. Leibler, “On Information and Sufficiency,” The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.

[53] T. Minka et al., “Divergence measures and message passing,” tech. rep., Microsoft Research, 2005.

[54] Z. Ghahramani and M. J. Beal, “Propagation algorithms for variational Bayesian learning,” Advances in neural information processing systems, pp. 507–513, 2001.

[55] C. Wang and D. M. Blei, “Variational Inference in Nonconjugate Mod-els,” The Journal of Machine Learning Research, vol. 14, no. 1, pp. 1005–

1031, 2013.

BIBLIOGRAPHY

[56] D. M. Blei, M. I. Jordan, et al., “Variational inference for Dirichlet process mixtures,” Bayesian analysis, vol. 1, no. 1, pp. 121–143, 2006.

[57] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. The MIT Press, 2006.

[58] C. E. Rasmussen, “Gaussian processes in machine learning,” in Advanced lectures on machine learning, pp. 63–71, Springer, 2004.

In document Bayesian filtering for automotive applications (Page 61-70)