ANZIAM J. 45 (E) ppC232–C247, 2004 C232
V -invariant methods, generalised least squares problems, and the Kalman filter
M. R. Osborne
∗Inge S¨ oderkvist
†(Received 8 August 2003; revised 31 March 2004)
Abstract
V -invariant methods for the generalised least squares problem ex- tend the techniques based on orthogonal factorization for ordinary least squares to problems with multiscaled, even singular covariances.
These methods are summarised briefly here, and the ability to han- dle multiple scales indicated. An application to a class of Kalman filter problems derived from generalised smoothing splines is consid- ered. Evidence of severe illconditioning of the covariance matrices is demonstrated in several examples. This suggests that this is an appropriate application for the V -invariant techniques.
∗School of Mathematical Sciences, ANUmailto:[email protected]
†Department of Mathematics, Lule˚a University of Technology
See http://anziamj.austms.org.au/V45/CTAC2003/Osbo for this article, c Aus- tral. Mathematical Soc. 2004. Published April 23, 2003, amended May 11, 2004. ISSN 1446-8735
ANZIAM J. 45 (E) ppC232–C247, 2004 C233
Contents
1 Introduction C233
2 V -invariant transformation C235
3 Solution of the GLSQ problem C238
4 g-splines and the Kalman filter C241
References C246
1 Introduction
The generalised least squares problem is
min
xr
TV
−1r ; r = Ax − b , (1) where A : R
p→ R
n, V : R
n→ R
n. It will be assumed that A has its full rank p < n , but only that V is positive semi-definite. Typically, in data analytic situations, the covariance matrix V has the dimension n of the data set and is often large, while p, the number of independent variables in the fitted model, is relatively small. Our aim here is to present an application of V -invariant methods appropriate to a class of Kalman filter problems which have distinctly illconditioned covariances and both stable and unstable dynamics. This application forces well defined sparse structures on both the design A and the covariance V .
A class of V -invariant algorithms has been introduced by Gulliksson and
Wedin [3]. Here we follow their basic development closely. Their problem
of particular interest is equality constrained least squares which can be ex-
pressed formally as a generalised least squares problem with singular (and
1 Introduction C234
diagonal) V . This class of problems provide a particular example of the abil- ity of V -invariant algorithms to support a form of multi-scaling. However, there are some costs, and they point out the importance of column pivoting in this application. This proves to be collateral damage occasioned by the multiple scale capability.
S¨ oderkvist [9] considered the Kalman Filter in the generalised least squares form [1] for the particular case of a diagonal covariance matrix V and was able to demonstrate superior numerical performance of his V -invariant methods for problems in which V possessed several distinct scales. The restriction to diagonal V is important in developing algorithms, and he has experimented with methods for reducing the problem covariance to one having this diago- nal form employing both an eigenvalue decomposition using Jacobi’s method (V = QΛQ
T) and a rank revealing Cholesky factorization with diagonal piv- oting (P V P
T= LDL
T) [10]. In this latter case he was concerned about the influence of possible errors in small elements of D. Recently, Osborne [7] has argued that provided the number of small elements in D is k ≤ p then their effect is benign so that errors in them due to the rank revealing factorization are insignificant.
The plan of this paper is as follows. First the main results of [3] are
summarised and an approach to deriving stability results in the case of ill-
conditioned V outlined. A class of Kalman filter problems which can occur
in computing smoothing splines and related objects is sketched. The struc-
ture of the generalised least squares form of this problem is then exploited in
implementing V -invariant factorization algorithms. Finally, results showing
that the relevant covariances can be very illconditioned are presented.
1 Introduction C235
2 V -invariant transformation
We say that the transformation matrix J : R
n→ R
nis V -invariant if
J V J
T= V . (2)
Let J
1and J
2be V -invariant. Then
• J
1−1, J
2−1, J
1J
2and J
2J
1are V -invariant,
• J
1T, J
2Tare V
−1-invariant (V nonsingular).
If V is singular then it is assumed that it has the reduced form V = 0 0
0 V
2. (3)
In this case J is V -invariant iff J = J
110
J
21J
22, J
22V
2J
22T= V
2, and J
11, J
22are nonsingular.
The ordinary least squares problem provides the best known example of V -invariance. Here J is orthogonal and the invariance condition becomes:
V = I ⇒ J IJ
T= I .
The V -invariant analogue of the Aitken-Householder elementary orthogonal transformation is the elementary reflector
J = I − 2 V vv
Tv
TV v , J
2= I . (4)
2 V -invariant transformation C236
To use (4) in matrix reduction to upper triangular form let u stand for the current pivotal column in the partially reduced matrix and u
2for the subvector to be reduced. Then v defining the current transformation step must be found such that
J u
1u
2=
u
1γe
1.
It follows from (4) that the scale of v is not important, and it is convenient to set
V v = s
0
u
2− γe
1, (5)
where s is a scale factor. The computation of v in a standard reduction to upper triangular form requires solution of this equation. However, it has a complexity comparable with the original problem unless V is readily invertible! The important special case corresponds to V = D diagonal with elements ordered in increasing magnitude. To complete the specification of v in this case let dim(u
1) = j − 1 ,
D = diag{d
j, j = 1, . . . , n} = diag{ε
1, ε
2, . . . , ε
k, ν
k+1, . . . , ν
n} , ε
1≤ ε
2≤ · · · ≤ ε
kν
k+1≤ · · · ≤ ν
n,
v =
0 v
2= D
j0
D
−12(u
2− γe
1)
.
Here s = d
jand the specified ordering results in the effective diagonal matrix N
j= d
jdiag{d
j, d
j+1, . . . , d
n}
−1having elements ≤ 1 . The importance of the ordering of the elements of D is shown also in the calculation of γ. This computation uses V -invariance:
u
1u
2 TJ
TD
−1J u
1u
2=
u
1γe
1 TD
−1u
1γe
1⇒ u
T2D
−12u
2= γ
2e
T1D
2−1e
1. (6)
There are two cases to consider in (6) depending on the relative values of j
and k:
2 V -invariant transformation C237
j ≤ k
γ
2= (u
2)
2j+
k
X
s=j+1
ε
jε
s(u
2)
2s+
n
X
s=k+1
ε
jν
s(u
2)
2s, and j > k
γ
2= (u
2)
2j+
n
X
s=j+1
ν
jν
s(u
2)
2s.
Note that there is multiple scale behavior when j ≤ k and that the limit ε → 0 can be defined, for example, by setting ε
s/ε
j= 1 , s > j . Also γ is the column length in the re-scaled metric defined by N
j.
If elements of J are large then this is an indicator of possible stability problems [3]! Let
J = I − 2cd
Tbe an elementary V -invariant reflector. Then
kJk
2= η + p
η
2− 1 , η = kck
2kdk
2. Here
η =
D
2−1(u
2− γe
1)
ku
2− γe
1k (u
2− γe
1)
TD
−12(u
2− γe
1) ,
⇒ kJk ≥ η ≥ ku
2k 2γ . That is kJ
jk will be large if
D
ju
T2D
2−1u
2ku
2k . The limit ε → 0 gives η large if
ku
εk ku
2k . (7)
Here u
εcorresponds to {ε
j, . . . , ε
k} in D. The role of column pivoting is
important in avoiding just this case.
2 V -invariant transformation C238
Example 1 The inequality (7) is a good guide to instability in V -invariant factorizations. Let
D =
ε
ε 1
, w =
α β ν
. The transformation taking w to γe
1in the limit ε → 0 is
I − π
α β ν
+ θ(π1)e
1
α β 0
+ θ(π1)e
1
T
,
where
π = 1
π1π2 , θ = sgn(α) , π1 = (α
2+ β
2)
1/2,
π2 = |α| + (α
2+ β
2)
1/2.
Note that the term πν would be large if α, β ν violating the stability condition (7).
3 Solution of the GLSQ problem
In the spirit of the Gauss-Markov theorem write the solution of (1) as a linear function of the data x = T b where T generates the best linear unbiassed estimator and satisfies
T Λ
D A
A
T0
= 0 I ,
3 Solution of the GLSQ problem C239
where V = D diagonal. Let J A = [R 0]
T, and J V -invariant. Under this transformation the system becomes
h h
T e
1T e
2i Λ
i
D
ε0 0 D
210
0 D
22
R 0
R
T0
0
= 0 I .
The solution of the generalised least squares problem can now be written down. This gives:
T e
1= R
−1, T e
2= 0 , Λ = R
−1D
ε0
0 D
21R
−T, x = R
−10 Jb .
The connection with the solution of the ordinary least squares problem will be recognised immediately.
If V is not diagonal then start with an LDL
Tfactorization of V and rewrite the problem (1) by setting L
−1r = e r = D
1/2s to obtain
min
xs
Ts ; D
1/2s = L
−1Ax − L
−1b .
This has been implemented by making a rank-revealing Cholesky factoriza- tion of V :
P V P
T→ Ldiag {d
n, d
n−1, . . . , d
1} L
T,
where the the lower triangular matrix L has unit diagonal, and diagonal pivoting ensures both
d
n≥ d
n−1≥ · · · ≥ d
1(8)
and
L
ij≤ 1 , i > j . (9)
3 Solution of the GLSQ problem C240
Our code terminates if a sufficiently small (or negative) d
iis encountered.
A key point is that any illconditioning in V is largely forced into D by this factorization as a consequence of the bound (9) on the subdiagonal elements of L. Conditions for success [4, e.g.] correspond to the assumptions made already on D. However, a further permutation step is needed to reverse the order of the elements in the computed D to stabilize the V -invariant transformation. This means the diagonal elements computed last in the Cholesky algorithm enter first into the factorization process. It would be expected that these small d
icould have high relative error. It is an important result that these errors are not significant. This is demonstrated in [7]. The result can be made plausible by considering the case
D = diag {0, . . . , 0, d
k+1, . . . , d
n} , k < p , which gives the equality constrained problem
min
xs
Ts ; 0 D
1/22s = A
1A
2x − b
1b
2. This is the limiting problem associated with the penalised objective
min
xr
T2D
−12r
2+ λr
T1r
1; r = A
1A
2x − b
1b
2, which has the alternative form
min
xs
Ts ; λ
−1/2I D
1/22s = A
1A
2x − b
1b
2. From the theory of penalty functions [2] expect
kx (λ) − x (∞)k = O (1/λ) , λ → ∞ .
The argument is developed by showing that a certain Jacobian matrix
corresponding to the λ = ∞ limit is nonsingular, and that the large λ case
is a small perturbation of this.
3 Solution of the GLSQ problem C241
4 g-splines and the Kalman filter
Generalised smoothing splines having the representation E {h
Tx(t) | y
1, . . . , y
n} can be defined by means of the class of stochastic differential equations:
dx = M x dt + σ √
λb dw , (10)
where M : R
m→ R
mand w is a unit scale Wiener process, in conjunction with the observation process
h
Tx(t
i) + ε
i= y
i, ε
i∼ N (0, σ
2) , (11) where the errors ε
iare assumed independent. Here the vector h must satisfy identifiability conditions and, together with b, determines the smoothness of the resulting spline [6]. Let the fundamental matrix X(t, ξ) be defined by
dX
dt = M X , X(ξ, ξ) = I .
Variation of parameters gives the discrete dynamics equation x
i+1= X
ix
i+ σ √
λu
i, (12)
where
u
i= Z
ti+1ti
X(t
i+1, s)b dw ds ds , u
i∼ N (0, σ
2λR
i)) .
This system—dynamics + observation equations—leads via the Kalman filter subject to a diffuse prior to the generalised least squares problem [1]
min
xr
T1R
−1r
1+ r
T2V
−1r
2, (13)
4 g-splines and the Kalman filter C242
where
r
1r
2=
−X
1I
−X
2I . ..
−X
n−1I h
Th
T. ..
h
T
x −
0 y
, (14)
R = σ
2λdiag{R
1, R
2, . . . , R
n−1} , V = σ
2I , (15) and
R
i= Z
ti+1ti
X(t
i+1, s)bb
TX(t
i+1, s)
Tds . (16) Paige and Saunders [8] suggested reducing this problem to an ordinary least squares problem by making a Cholesky factorization of the covariance blocks R
i. These are then used to rescale the dynamics equations to produce a problem with unit diagonal covariance that can be reduced by orthogonal transforma- tions. This leads to their famous information filter. However, S˝ oderkvist [9]
showed there are potential problems arising from illconditioned R
i. The al-
ternative we consider is to use a V -invariant factorization starting with a
rank revealing Cholesky applied to the R
icorresponding to successive col-
umn blocks, followed by within column block sorting of the diagonal ele-
ments d
iand division by potentially much better behaved lower triangular
matrices L
i. Straight forward application of this procedure under the or-
dering given in (14) results in accumulating fill. This is seen readily by
4 g-splines and the Kalman filter C243
considering the first few steps. These give:
−L
−11X
1L
−11−L
−12X
2−L
−12.. . .. . .. . h
Th
Th
T
→
U
1W
1−L
−12X
2L
−12.. . .. . .. .
z
T11h
Th
T
→
U
1W
1U
2W
2.. . .. . .. .
z
T21z
T22h
T
.
The ordering used in the Paige and Saunders information filter generates
less direct fill. However, fill can be controlled to a total of m + 1 rows in
the next to pivotal block column by orthogonal transformations which are in
this context V -invariant as the blocks affected are associated with covariances
proportional to the unit matrix being derived from the second term in (13).
4 g-splines and the Kalman filter C244
The orthogonal transformations are first applied at step m
. . . . . . . . .
U
mW
m−L
−1m+1X
m+1L
−1m+1. . . . . . . . .
z
Tm1z
Tm2.. . z
Tmmh
T
→
. . . . . . . . .
U
mW
m−L
−1m+1X
m+1L
−1m+1. . . . . . . . .
Z
m0
,
where Z
m: R
m→ R
m. It is convenient to carry out the transformation in two steps in order to compute auxiliary quantities such as the innovations x
i|i−1. These are:
z
Ti1.. . z
Timh
T
→ " Z
i1h
T#
→ Z
i0
.
Our initial implementation has proved satisfactory in early experiments.
In particular, there has been good control over the magnitudes occurring.
The information filter has been used previously in problems of this kind [5], but it is not able to cope with the severe illconditioning that can occur in the R
imatrices [9]. This illconditioning is directly related to the smoothness of the spline. This is illustrated in the following examples.
Example 2 Quintic splines. This corresponds to the case
M =
0 1 0 0 0 1 0 0 0
, h =
1 0 0
, b =
0 0 1
,
4 g-splines and the Kalman filter C245
with h and b chosen for maximum smoothness. The covariance matrix blocks are readily computed:
R
i= δ
δ4 20
δ3 8
δ3 6 δ3
8 δ2
3 δ 2 δ3
6 δ
2
1
.
The rank revealing Cholesky gives
P R
iP
T= δ
1
δ
2
1
δ2
6
−
δ21
1
δ2 12
δ4 720