Several techniques exist for estimating unknown parameters from given observations; one of the best known is least-squares estimation. Least-squares estimation is a very effective numerical method and leads to best unbiased estimators for linear relationships and observations disturbed by Gaussian noise.
In this section, we briefly introduce the Gauss–Markov model, which combines a functional and a stochastic model to describe the observation process. The functional model specifies the assumed relation between the acquired observations and the unknown parameters as an explicit function, which usually results from physical or geometrical laws. The stochastic model specifies the statistical properties of the observation process and is assumed to be sufficiently described by the first and second moments of a normal distribution. The Gauss–Markov model covers many practical estimators, including maximum likelihood (ML) and maximum a posteriori (MAP) estimators. For a detailed introduction to estimation theory with emphasis on least-squares estimation, please refer to the books of Koch (1999, Chap. 3) or Förstner and Wrobel (2016, Chap. 4).
2.3.1 Estimation with Non-linear Gauss–Markov Model
The Gauss–Markov model starts from $N$ observations $l = [l_n]$, $n = 1, \dots, N$, which are assumed to be a sample of a multivariate Gaussian distribution $\mathcal{N}(\tilde{l}, \Sigma_{ll})$ around a true but unknown observation vector $\tilde{l}$ with a symmetric and positive definite covariance matrix $\Sigma_{ll}$.
Due to the noise induced by the observation process, there are in general no parameters $x$ for which the functional model $f(x) = l$ holds. Therefore the goal is to find corrections $\hat{v}$ for the observations $l$ and best estimates $\hat{x}$ such that the relation
$$f(\hat{x}) = l + \hat{v} = \hat{l} \tag{2.24}$$
between the fitted observations $\hat{l} = l + \hat{v}$ and the estimated parameters $\hat{x}$ holds and the weighted sum of the squared residuals
$$\Omega(\hat{x}) = \hat{v}^\mathsf{T} \Sigma_{ll}^{-1} \hat{v} \tag{2.25}$$
is minimal.
The optimization problem therefore reads as
$$\hat{x} = \operatorname{argmin}_{x} \, (f(x) - l)^\mathsf{T} \Sigma_{ll}^{-1} (f(x) - l), \tag{2.26}$$
which leads to estimated parameters $\hat{x}$ that have minimal variance, i.e., are best. For a nonlinear function $f(x)$ the solution is iterative. Starting from initial values $\hat{x}^{(\nu=0)}$
for the estimated parameters $\hat{x}$ in the first iteration $\nu = 0$, we determine updates $\widehat{\Delta x}^{(\nu=0)}$ and set
$$\hat{x}^{(\nu+1)} = \hat{x}^{(\nu)} + \widehat{\Delta x}^{(\nu)}. \tag{2.27}$$
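Each following iteration solves for the updates $\widehat{\Delta x}^{(\nu)}$ of the linearized function
$$l + \hat{v}^{(\nu)} = f(\hat{x}^{(\nu)}) + A \, \widehat{\Delta x}^{(\nu)} \tag{2.28}$$
with Jacobian matrix
$$A = \left. \frac{\partial f(x)}{\partial x} \right|_{x = \hat{x}^{(\nu)}} \tag{2.29}$$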
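evaluated at the current parameter estimates $\hat{x}^{(\nu)}$. With the reduced observations
$$\Delta l^{(\nu)} = l - f(\hat{x}^{(\nu)}) \tag{2.30}$$
we can determine the unknown parameter updates $\widehat{\Delta x}^{(\nu)}$ from the normal equation system
$$A^\mathsf{T} \Sigma_{ll}^{-1} A \, \widehat{\Delta x}^{(\nu)} = A^\mathsf{T} \Sigma_{ll}^{-1} \Delta l^{(\nu)}, \tag{2.31}$$
for example with Cholesky factorization (Golub and Van Loan, 1996, Sec. 4.2).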
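A single update step can be sketched in Python with NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `solve_normal_equations` and the straight-line toy problem are assumptions made for the example, and the reduced observations are taken as $l - f(\hat{x}^{(\nu)})$, the sign convention under which the normal equations above yield the update directly.

```python
import numpy as np

def solve_normal_equations(A, Sigma_ll, dl):
    """Solve A^T W A dx = A^T W dl with W = Sigma_ll^{-1} via Cholesky."""
    W = np.linalg.inv(Sigma_ll)      # weight matrix
    N_mat = A.T @ W @ A              # normal equation matrix, positive definite if A has full column rank
    h = A.T @ W @ dl                 # right-hand side
    L = np.linalg.cholesky(N_mat)    # N_mat = L L^T with lower-triangular L
    # np.linalg.solve is a general solver; a dedicated triangular solver
    # would exploit the structure of L, but the result is the same.
    z = np.linalg.solve(L, h)        # solve L z = h
    return np.linalg.solve(L.T, z)   # solve L^T dx = z

# Hypothetical toy problem: straight line l = a + b*t, observed without noise.
t = np.array([0.0, 1.0, 2.0, 3.0])
l = 1.0 + 2.0 * t
A = np.column_stack([np.ones_like(t), t])   # Jacobian of f(x) = a + b*t
Sigma_ll = 0.1**2 * np.eye(t.size)          # assumed observation covariance
x0 = np.zeros(2)                            # initial values
dl = l - A @ x0                             # reduced observations l - f(x0)
dx = solve_normal_equations(A, Sigma_ll, dl)
x1 = x0 + dx                                # parameter update
```

Because the toy model is linear, a single step already reproduces the line parameters; for a nonlinear $f$ the step is repeated with a re-evaluated Jacobian.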
The corrections of the observations can be determined linearly after each iteration by
$$\hat{v}^{(\nu)} = A \, \widehat{\Delta x}^{(\nu)} - \Delta l^{(\nu)}, \tag{2.32}$$
which after convergence are equal to the non-linearly determined corrections
$$\hat{v} = f(\hat{x}) - l. \tag{2.33}$$
We arrive at $\hat{x} := \hat{x}^{(\nu)}$ in case of convergence, i.e., $\widehat{\Delta x} \to 0$. Convergence is achieved if all updates of the parameters $\hat{x}$ are small compared to their standard deviations, $|\widehat{\Delta x}_u / \sigma_{x_u}| < T_c$, e.g. with a threshold $T_c = 0.01$, requiring the updates to be less than 1 % of their standard deviation.
The full covariance matrix of the estimated parameters is obtained by
$$\Sigma_{\hat{x}\hat{x}} = \hat{\sigma}_0^2 \, (A^\mathsf{T} \Sigma_{ll}^{-1} A)^{-1} \tag{2.34}$$
with the estimated variance factor
$$\hat{\sigma}_0^2 = \frac{\hat{v}^\mathsf{T} \Sigma_{ll}^{-1} \hat{v}}{R} \tag{2.35}$$
and the redundancy $R = N - U$ of the optimization problem, where $N$ is the number of observations, i.e. the dimension of the vector $l$, and $U$ is the number of unknown parameters, i.e. the dimension of the vector $x$.
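The complete iterative procedure of this section can be sketched as below. The function name `gauss_newton`, the exponential toy model, and the deterministic pseudo-noise are assumptions introduced for illustration only. Two further choices are assumptions as well: the reduced observations are taken as $l - f(\hat{x}^{(\nu)})$, and the convergence test uses standard deviations from $(A^\mathsf{T} \Sigma_{ll}^{-1} A)^{-1}$, since the estimated variance factor is only available after convergence.

```python
import numpy as np

def gauss_newton(f, jac, l, Sigma_ll, x0, Tc=0.01, max_iter=50):
    """Iterative weighted least squares; returns x_hat, v_hat, s0_sq, Sigma_xx."""
    W = np.linalg.inv(Sigma_ll)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        A = jac(x)                                   # Jacobian at current estimates
        dl = l - f(x)                                # reduced observations
        N_mat = A.T @ W @ A
        dx = np.linalg.solve(N_mat, A.T @ W @ dl)    # normal equations
        x = x + dx                                   # parameter update
        sigma_x = np.sqrt(np.diag(np.linalg.inv(N_mat)))
        if np.all(np.abs(dx) / sigma_x < Tc):        # updates below Tc * std. dev.
            break
    A = jac(x)                                       # Jacobian at the final estimates
    v = f(x) - l                                     # non-linearly determined corrections
    R = l.size - x.size                              # redundancy R = N - U
    s0_sq = float(v @ W @ v) / R                     # estimated variance factor
    Sigma_xx = s0_sq * np.linalg.inv(A.T @ W @ A)    # covariance of the estimates
    return x, v, s0_sq, Sigma_xx

# Hypothetical toy model: exponential decay f(x) = x[0] * exp(-x[1] * t).
t = np.linspace(0.0, 4.0, 9)

def f(x):
    return x[0] * np.exp(-x[1] * t)

def jac(x):
    e = np.exp(-x[1] * t)
    return np.column_stack([e, -x[0] * t * e])

# Deterministic pseudo-noise keeps the example reproducible.
noise = 0.01 * np.array([1, -1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
l = f(np.array([2.0, 0.5])) + noise
Sigma_ll = 0.01**2 * np.eye(t.size)
x_hat, v_hat, s0_sq, Sigma_xx = gauss_newton(f, jac, l, Sigma_ll, x0=[1.5, 0.4])
```

In this sketch the explicit matrix inverses are kept for readability; for larger problems a factorization of the normal equation matrix would usually be reused instead.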
2.3.2 Robust Estimation
The presented least-squares estimation is highly sensitive to outliers in the observations, as the weighted sum of squared residuals is minimized. Observations are usually considered outliers if the realized measurement lies significantly outside the dispersion range of the expected value. Within an estimation procedure, outliers can be detected based on the magnitude of a computed residual $\hat{v}_n$. Following Baarda (1967), for uncorrelated observations the test value
$$T_n = \frac{\hat{v}_n}{\sigma_{\hat{v}_n}} \tag{2.36}$$
with
$$\Sigma_{\hat{v}\hat{v}} = \Sigma_{ll} - A \Sigma_{\hat{x}\hat{x}} A^\mathsf{T} \tag{2.37}$$
follows the standard normal distribution, $T_n \sim \mathcal{N}(0, 1)$, if there are no gross errors in the observations. Assuming all observations to have an equally high influence on the parameter vector, one could use $\sigma_{l_n}$ instead of $\sigma_{\hat{v}_n}$ in Eq. (2.36). If $|T_n|$ is significantly larger than expected under the standard normal distribution, the corresponding observation can be assumed to be an outlier and should thus be eliminated from the estimation process. Rigorous testing for outliers by means of hypothesis testing is treated by Koch (1999).
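The test values of Eqs. (2.36) and (2.37) can be computed as in the following sketch. The straight-line data, the injected gross error, and the critical value `crit = 3.29` (roughly the two-sided 0.1 % bound of the standard normal distribution) are illustrative assumptions, as is the use of an a priori variance factor of 1, so that the parameter covariance reduces to $(A^\mathsf{T} \Sigma_{ll}^{-1} A)^{-1}$.

```python
import numpy as np

# Hypothetical example: straight line l = a + b*t with one gross error.
t = np.arange(6, dtype=float)
A = np.column_stack([np.ones_like(t), t])
sigma_l = 0.1
Sigma_ll = sigma_l**2 * np.eye(t.size)       # uncorrelated observations
l = 1.0 + 2.0 * t
l[3] += 0.5                                  # injected gross error of 5 sigma

W = np.linalg.inv(Sigma_ll)
Sigma_xx = np.linalg.inv(A.T @ W @ A)        # assumes a priori variance factor 1
x_hat = Sigma_xx @ A.T @ W @ l               # linear model: one step suffices
v_hat = A @ x_hat - l                        # residuals v = f(x_hat) - l

Sigma_vv = Sigma_ll - A @ Sigma_xx @ A.T     # covariance of the residuals, Eq. (2.37)
T = v_hat / np.sqrt(np.diag(Sigma_vv))       # test values, Eq. (2.36)

crit = 3.29                                  # illustrative critical value
worst = int(np.argmax(np.abs(T)))            # observation with the largest |T_n|
is_outlier = bool(np.abs(T[worst]) > crit)
```

Only the observation with the largest test value is examined here, because a gross error also leaks into the residuals of the other observations; after removing it, the estimation and the test would be repeated.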
Alternatively, the influence of large residuals on the cost function can be reduced by robust estimation techniques such as reweighting procedures, which can be directly incorporated into the iterative estimation procedure of non-linear least squares. Assuming again stochastically uncorrelated observations, Eq. (2.25) can be rewritten, up to a constant factor of 1/2 that does not change the minimizer, as
$$\Omega(x) = \sum_n \frac{1}{2} \left( \frac{v_n}{\sigma_{l_n}} \right)^2 = \sum_n \rho(y_n) \tag{2.38}$$
with normalized residuals $y_n = v_n / \sigma_{l_n}$ and the influence function
$$\rho(y_n) = \frac{1}{2} y_n^2. \tag{2.39}$$
To arrive at a robust estimation procedure, Huber (1981) proposes using a probability density function for the observations which consists of a normal distribution in the middle and of a Laplace distribution at the ends. This way the density function has more probability mass at the ends and thus allows a certain amount of gross errors in the observations to be modeled. The modified influence function $\rho_H(y_n)$ is defined as
$$\rho_H(y_n) = \begin{cases} \frac{1}{2} y_n^2 & \text{for } |y_n| \le k, \\ k \left( |y_n| - \frac{k}{2} \right) & \text{otherwise}, \end{cases} \tag{2.40}$$