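where $M(\psi, P)$ has been defined in (5.12). Thus, Problem 5.3.1 can be formalized as:
$$
\hat{\psi}, \hat{P} = \arg\min_{\psi, P} \left\| \psi - P \hat{w}^y_B \right\|^2
\quad \text{s.t.} \quad M(\psi, P) \geq 0, \quad \operatorname{Tr}(P) = n, \quad P = P^\top \geq 0
\tag{5.13}
$$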
where the constraint $\operatorname{Tr}(P) = n$ is added to improve the numerical conditioning; see Miller and de Callafon (2013) for further details.
The solution $\hat{w}^y$ of Problem 5.3.1 is finally computed as:
$$
\hat{w}^y = \hat{P}^{-1} \hat{\psi}
\tag{5.14}
$$
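Once (5.13) has been solved, step (5.14) only requires solving a linear system. The following minimal NumPy sketch assumes that $\hat{P}$ and $\hat{\psi}$ are already available. The random values below are hypothetical placeholders, not the output of (5.13). The sketch recovers $\hat{w}^y$ with a linear solve rather than an explicit inverse.

```python
import numpy as np

def recover_predictor(P_hat: np.ndarray, psi_hat: np.ndarray) -> np.ndarray:
    """Return w_hat = P_hat^{-1} psi_hat, as in (5.14), via a linear solve."""
    # Solving P w = psi avoids forming P^{-1} explicitly.
    return np.linalg.solve(P_hat, psi_hat)

# Hypothetical placeholders: a random symmetric positive definite P with Tr(P) = n.
rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
P_hat = B @ B.T + n * np.eye(n)      # symmetric positive definite
P_hat *= n / np.trace(P_hat)         # normalize so that Tr(P) = n
psi_hat = rng.standard_normal(n)

w_hat = recover_predictor(P_hat, psi_hat)

# Rescaling psi and P by the same positive factor leaves w_hat unchanged.
w_scaled = recover_predictor(3.0 * P_hat, 3.0 * psi_hat)
print(np.allclose(w_hat, w_scaled))  # True
```

The final check shows that $\hat{w}^y$ is unchanged when $\hat{\psi}$ and $\hat{P}$ are rescaled by a common positive factor.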
In the remainder of the chapter, the model $\hat{G}(z)$ obtained by plugging into (5.6) the estimators $\hat{w}^y$ and $\hat{w}^u_{\mathrm{EB}}$, obtained respectively from (5.14) and from the EB procedure in (5.5), will be called the “LMI” model.
5.4 Stabilization via Penalty Function
The second stabilization technique acts directly within the Gaussian regression procedure. As discussed in Section 2.4, a crucial step is the estimation of the hyperparameter vector $\eta$, which can be done, e.g., through marginal likelihood optimization (2.48). It turns out that some hyperparameters $\eta$ may lead to estimators (5.5) which do not correspond to stable models $\hat{G}(z)$ and $\hat{H}(z)$. Thus, one possible remedy is to restrict the set of admissible hyperparameters to a subset $\Omega_S$ which leads to stable models. This is not entirely trivial, as the estimators (and thus the set $\Omega_S$) depend on the measured data $Y$, $U$. Accordingly, Problem 5.2.1 can be formulated as follows.
Problem 5.4.1 (Reformulation). Estimate the hyperparameters $\eta$ restricting the search to the set $\Omega_S = \{\eta \mid \hat{A}(z) \text{ stable}\}$, i.e., the set of hyperparameters which lead to stable models $\hat{G}(z)$, $\hat{H}(z)$:
$$
\hat{\eta} = \arg\max_{\eta \in \Omega_S} p_\eta(Y) = \arg\min_{\eta \in \Omega_S} -\ln p_\eta(Y)
\tag{5.15}
$$
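The objective in (5.15) is the negative log marginal likelihood. The sketch below shows one way to evaluate it, assuming a generic linear-Gaussian setting $Y = \Phi w + e$ with $w \sim \mathcal{N}(0, K_\eta)$ and $e \sim \mathcal{N}(0, \sigma^2 I)$. The names `Phi`, `K`, `sigma2` and the one-parameter kernel `lam * I` are illustrative assumptions, not the chapter's kernel. Restricting the search to $\Omega_S$ would additionally require discarding every candidate $\eta$ whose model is unstable.

```python
import numpy as np

def neg_log_marglik(Y, Phi, K, sigma2):
    """-ln p(Y) for Y = Phi w + e with w ~ N(0, K) and e ~ N(0, sigma2 * I)."""
    N = Y.shape[0]
    Sigma = Phi @ K @ Phi.T + sigma2 * np.eye(N)  # covariance of Y
    _, logdet = np.linalg.slogdet(Sigma)          # Sigma is SPD, so the sign is +1
    v = np.linalg.solve(Sigma, Y)                 # Sigma^{-1} Y without an explicit inverse
    return 0.5 * (Y @ v + logdet + N * np.log(2.0 * np.pi))

# Hypothetical usage: eta is a single kernel scale lam, with K_eta = lam * I.
rng = np.random.default_rng(1)
Phi = rng.standard_normal((50, 5))
Y = Phi @ np.array([0.5, -0.3, 0.2, 0.1, 0.05]) + 0.1 * rng.standard_normal(50)
scores = {lam: neg_log_marglik(Y, Phi, lam * np.eye(5), 0.01) for lam in (0.01, 0.1, 1.0)}
best = min(scores, key=scores.get)  # unconstrained choice; no stability check here
```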
Since the set $\Omega_S$ cannot be determined a priori, because it is data-dependent, a
102 Enforcing Model Stability in Nonparametric Gaussian Regression
interpreted as a barrier to push the estimate $\hat{\eta}$ into $\Omega_S$ or, equivalently, to keep $\hat{\eta}$ away from the set of hyperparameters $\eta$ which lead to an unstable $A(z)$.
Denote with $A_\eta(z)$ the polynomial $A(z)$ in (5.7) built with the estimator
$$
\hat{w}^y_\eta := \mathbb{E}_\eta[w^y \mid Y],
\tag{5.16}
$$
where the subscript indicates that $\hat{w}^y_\eta$ is obtained with the specific hyperparameters $\eta$. Define the modulus of the dominant root of $A_\eta(z)$ as $\bar{\rho}_\eta := \max |\sigma(A_\eta(z))|$, where $\sigma(A(z))$ denotes the set of roots of the polynomial $A(z)$.
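The estimator (5.16) and the definition of $\bar{\rho}_\eta$ can be sketched in NumPy under two stated assumptions. First, a linear-Gaussian model $Y = \Phi w + e$ with $w \sim \mathcal{N}(0, K)$ and $e \sim \mathcal{N}(0, \sigma^2 I)$, so that the posterior mean is $K\Phi^\top(\Phi K \Phi^\top + \sigma^2 I)^{-1}Y$. Second, the predictor form $A(z) = 1 - \sum_{k=1}^{n} w^y_k z^{-k}$; (5.7) is not reproduced in this section, so that form is a hypothesis. In the chapter's setting, only the $w^y$ entries would be passed to `dominant_root`. The function names are illustrative.

```python
import numpy as np

def posterior_mean(Y, Phi, K, sigma2):
    """E[w | Y] for Y = Phi w + e, w ~ N(0, K), e ~ N(0, sigma2 * I)."""
    Sigma = Phi @ K @ Phi.T + sigma2 * np.eye(Y.shape[0])
    return K @ Phi.T @ np.linalg.solve(Sigma, Y)

def dominant_root(w_y):
    """max |root| of the ASSUMED polynomial A(z) = 1 - sum_{k=1}^n w_y[k-1] z^{-k}."""
    # The roots of A(z) are taken as those of the monic polynomial
    # z^n A(z) = z^n - w_1 z^{n-1} - ... - w_n.
    coeffs = np.concatenate(([1.0], -np.asarray(w_y, dtype=float)))
    return float(np.max(np.abs(np.roots(coeffs))))

def is_stable(w_y):
    """Membership test for Omega_S under the same assumption: all roots inside |z| = 1."""
    return dominant_root(w_y) < 1.0
```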
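Next, the penalty function $J(\bar{\rho}_\eta)$ can be defined:
$$
J(\bar{\rho}_\eta) = \frac{1}{\left(\alpha(\delta - \bar{\rho}_\eta)\right)^{\alpha}} - \frac{1}{(\alpha\delta)^{\alpha}}
\tag{5.17}
$$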
where $\delta \geq 1$ is a scalar which determines the point at which the function becomes infinite, and $\alpha > 0$ is a scalar which adjusts the steepness of the function.
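A direct transcription of (5.17) is sketched below. Returning $+\infty$ for $\bar{\rho}_\eta \geq \delta$, where (5.17) is undefined, is an implementation choice made here. It lets the function serve as a hard barrier inside an optimizer.

```python
import math

def penalty(rho_bar: float, alpha: float, delta: float) -> float:
    """Penalty J(rho_bar) of (5.17); returns +inf once rho_bar reaches delta."""
    if rho_bar >= delta:
        # Outside the domain of (5.17): treated here as an infinite barrier.
        return math.inf
    return (alpha * (delta - rho_bar)) ** (-alpha) - (alpha * delta) ** (-alpha)

print(penalty(0.0, alpha=1.0, delta=1.0))  # 0.0
print(penalty(0.5, alpha=1.0, delta=1.0))  # 1.0
```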
Figure 5.1: Representation of the penalty function $J(\bar{\rho}_\eta)$. The red bullet represents the value of the penalty function associated with a specific $\bar{\rho}$ in an illustrative example of an unstable polynomial $A_\eta(z)$. The blue arrows show the effects of the penalty function on $\bar{\rho}$ while estimating the hyperparameters. The black arrows show the effects of changing the parameters $\alpha$ and $\delta$.
diverges ($J(\bar{\rho}_\eta) \to \infty$) when $\bar{\rho}_\eta \to \delta$, and $J(\bar{\rho}_\eta) \to 0$ when $\bar{\rho}_\eta \to 0$. Thus, when (5.17) is added to the minimization problem (5.15), the effect is to penalize the solutions $\eta$ which yield $\bar{\rho}_\eta$ outside the stability region.
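The limits stated above follow directly from (5.17), and so does the behavior for $\alpha \to 0$ noted next. For fixed $0 \le \bar{\rho}_\eta < \delta$, write $x = \delta - \bar{\rho}_\eta > 0$:
$$
J(0) = \frac{1}{(\alpha\delta)^{\alpha}} - \frac{1}{(\alpha\delta)^{\alpha}} = 0,
\qquad
\lim_{\bar{\rho}_\eta \to \delta^-} J(\bar{\rho}_\eta) = \lim_{x \to 0^+} (\alpha x)^{-\alpha} - (\alpha\delta)^{-\alpha} = +\infty \quad (\alpha > 0).
$$
For $\alpha \to 0^+$ with $x > 0$ fixed,
$$
(\alpha x)^{-\alpha} = \exp\left(-\alpha\ln\alpha - \alpha\ln x\right) \to e^{0} = 1,
\qquad \text{since } \alpha\ln\alpha \to 0,
$$
and likewise $(\alpha\delta)^{-\alpha} \to 1$. Hence $J(\bar{\rho}_\eta) \to 0$ for every $\bar{\rho}_\eta < \delta$, while $J(\bar{\rho}_\eta) \to +\infty$ as $\bar{\rho}_\eta \to \delta^-$ for every $\alpha > 0$.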
As shown in Algorithm 11, the two parameters $\alpha$ and $\delta$ are iteratively adjusted until the estimated hyperparameters lead to a stable forward model which solves the constrained problem (5.15).
Note that when $\alpha \to 0$, $J(\bar{\rho}_\eta)$ gives no penalty for $\bar{\rho}_\eta < \delta$ and an infinite penalty for $\bar{\rho}_\eta \geq \delta$. Elaborating upon the intuition above, it is easy to prove that the solution of Problem 5.4.1 can be found by the algorithm described below:
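Algorithm 11 Stabilization via Penalty Function
1: Init:
2: Compute $\eta_0$ through marginal likelihood maximization (Section 2.4.3),
3: Compute the predictor impulse response $\hat{w}^y_{\eta_0}$ using (5.16),
4: Compute $A_{\eta_0}(z)$ and $\bar{\rho}_{\eta_0}$ associated with $\hat{w}^y_{\eta_0}$,
5: Set $\alpha = 1$.
6: while $\bar{\rho}_{\eta_k} \geq 1$ do
7:     Set $\delta = \bar{\rho}_{\eta_k}(1 + \,)$,
8:     Compute $\eta_k = \arg\min_{\eta} \, -\ln p_\eta(Y) + J(\bar{\rho}_\eta)$ (5.18) and the associated $\bar{\rho}_{\eta_k}$,
9:     if $-\ln p_{\eta_k}(Y) + J(\bar{\rho}_{\eta_k}) = -\ln p_{\eta_{k-1}}(Y) + J(\bar{\rho}_{\eta_{k-1}})$ then
10:        $\alpha = \alpha - \Delta\alpha$, with $\Delta\alpha$ sufficiently small,
11:        $\delta = \delta - \Delta\delta$, with $\Delta\delta$ sufficiently small,
12: Set $\alpha = $ and $\delta = 1$.
13: The solution of Problem 5.4.1 is given by:
$$
\hat{\eta} = \arg\min_{\eta} \, -\ln p_\eta(Y) + J(\bar{\rho}_\eta)
\tag{5.19}
$$
$$
\hat{w}^y_{\hat{\eta}} = \mathbb{E}_{\hat{\eta}}[w^y \mid Y], \qquad \hat{w}^u_{\hat{\eta}} = \mathbb{E}_{\hat{\eta}}[w^u \mid Y]
\tag{5.20}
$$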
In the remainder of the chapter, the model obtained from (5.6) using (5.20) will be called the “ML + PF” model.
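The toy sketch below in plain Python makes the control flow of Algorithm 11 concrete. Everything problem-specific is hypothetical: `nll` stands in for $-\ln p_\eta(Y)$, `rho_bar` stands in for $\bar{\rho}_\eta$, and a grid search over a scalar $\eta$ replaces a real optimizer. The values `eps` (step 7) and `alpha_final` (step 12) are assumptions, because the corresponding symbols are not legible in the listing. The `max_iter` guard and the lower bound on $\alpha$ are also additions.

```python
import math

# Hypothetical stand-ins, NOT taken from the chapter: a scalar hyperparameter eta,
# a surrogate for -ln p_eta(Y), and a surrogate map eta -> rho_bar_eta.
def nll(eta: float) -> float:
    return (eta - 2.0) ** 2

def rho_bar(eta: float) -> float:
    return 0.5 * eta

def penalty(rho: float, alpha: float, delta: float) -> float:
    """J of (5.17), taken as +inf for rho >= delta (implementation choice)."""
    if rho >= delta:
        return math.inf
    return (alpha * (delta - rho)) ** (-alpha) - (alpha * delta) ** (-alpha)

# A fixed grid search over eta in [0, 4] replaces a real optimizer.
GRID = [4.0 * i / 4000 for i in range(4001)]

def argmin(f):
    return min(GRID, key=f)

def stabilize(eps=0.01, d_alpha=0.05, d_delta=0.01, alpha_final=0.01, max_iter=50):
    """Toy version of Algorithm 11; eps and alpha_final are assumed values."""
    eta = argmin(nll)                        # step 2: unpenalized estimate
    rho = rho_bar(eta)                       # steps 3-4
    alpha, prev_obj, k = 1.0, None, 0        # step 5
    while rho >= 1.0 and k < max_iter:       # step 6 (max_iter is an added guard)
        delta = rho * (1.0 + eps)            # step 7

        def obj(e, a=alpha, d=delta):        # penalized objective of (5.18)
            return nll(e) + penalty(rho_bar(e), a, d)

        eta = argmin(obj)                    # step 8
        rho, cur_obj = rho_bar(eta), obj(eta)
        if prev_obj is not None and math.isclose(cur_obj, prev_obj):  # step 9
            alpha = max(alpha - d_alpha, 1e-3)   # step 10 (floor keeps alpha > 0)
            delta -= d_delta                     # step 11, as listed
        prev_obj, k = cur_obj, k + 1
    alpha, delta = alpha_final, 1.0          # step 12
    eta_hat = argmin(lambda e: nll(e) + penalty(rho_bar(e), alpha, delta))  # (5.19)
    return eta_hat, rho_bar(eta_hat), k

eta_hat, rho_hat, iterations = stabilize()
print(eta_hat, rho_hat, iterations)
```

With these toy functions, the unpenalized estimate $\eta = 2$ lies exactly on the stability boundary ($\bar{\rho} = 1$). One penalized pass with $\alpha = 1$ moves it well inside. The final pass with $\delta = 1$ and a small $\alpha$ lets it move back toward the boundary while keeping $\bar{\rho} < 1$.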
Remark 5.4.2. Notice that the iterative procedure which updates $\delta$ and $\alpha$ is needed because, in general, there is no guarantee that one can find an initial value of $\eta \in \Omega_S$.
Note also that the set $\Omega_S$ is always non-empty provided the hyperparameter vector $\eta$ includes a scaling factor for the kernel, i.e., a scalar variable which multiplies the kernel. In fact, if this is the case, there exist values of $\eta$ which lead to an estimator $\hat{w}^y = O_{n \times 1}$ which,