CHAPTER 3

RECURSIVE ESTIMATION FOR LINEAR MODELS

•Organization of chapter in ISSO

–Linear models

•Relationship between least-squares and mean-square

–LMS and RLS estimation

•Applications in adaptive control

–LMS, RLS, and Kalman filter for time-varying solution

–Case study: Oboe reed data

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall


Basic Linear Model and MSE

• Consider estimation of vector θ in model that is linear in θ

• Model has classical linear form

    $z_k = h_k^T \theta + v_k$,

  where z_k is kth measurement, h_k is corresponding “design vector,” and v_k is unknown noise value

• Model used extensively in control, statistics, signal processing, etc.

• Many estimation/optimization criteria based on “squared-error”-type loss functions

– Leads to criteria that are quadratic in θ

– Unique (global) estimate

• “Ideal” metric is MSE:

    $L(\theta) = \frac{1}{2n} \sum_{k=1}^{n} E\left[(z_k - h_k^T \theta)^2\right]$

• Special case of above is when data pairs {h_k, z_k} are i.i.d. across k; then $L(\theta) = \tfrac{1}{2} E\left[(z_k - h_k^T \theta)^2\right]$

Least-Squares Estimation

• MSE (previous slide) usually infeasible

• Most common practical method for estimating θ in linear model is by method of least squares

• Criterion (loss function) has form

    $\hat{L}(\theta) = \frac{1}{2n} \sum_{k=1}^{n} (z_k - h_k^T \theta)^2 = \frac{1}{2n} (Z_n - H_n \theta)^T (Z_n - H_n \theta)$,

  where Z_n = [z₁, z₂,…, z_n]^T and H_n is n × p concatenated matrix of h_k^T row vectors

• Classical batch least-squares estimate is

    $\hat{\theta}^{(n)} = (H_n^T H_n)^{-1} H_n^T Z_n$

• Popular recursive estimates (LMS, RLS, Kalman filter) may be derived from batch estimate

– To be discussed below
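The batch least-squares estimate above can be sketched numerically. This is a minimal illustration on synthetic data; the dimensions, noise level, and `theta_true` are assumptions made for the example, not values from ISSO.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
theta_true = np.array([1.0, -2.0, 0.5])   # hypothetical "true" parameters

H = rng.normal(size=(n, p))               # rows are the design vectors h_k^T
v = 0.1 * rng.normal(size=n)              # noise values v_k
Z = H @ theta_true + v                    # z_k = h_k^T theta + v_k

# Batch estimate from the normal equations (H^T H) theta = H^T Z.
theta_normal = np.linalg.solve(H.T @ H, H.T @ Z)

# Same estimate from a least-squares solver, which avoids forming H^T H.
theta_lstsq, *_ = np.linalg.lstsq(H, Z, rcond=None)

def ls_loss(theta):
    """Least-squares criterion (1/(2n)) * sum_k (z_k - h_k^T theta)^2."""
    r = Z - H @ theta
    return 0.5 * (r @ r) / n

print(theta_normal, ls_loss(theta_normal))
```

Solving the normal equations mirrors the formula on the slide; a solver such as `np.linalg.lstsq` is usually preferred numerically when H_n is poorly conditioned.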


Geometric Interpretation of Least-Squares
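One standard geometric reading of least squares, assuming H_n has full column rank: the fitted vector H_n θ̂ is the orthogonal projection of Z_n onto the column space of H_n, so the residual is orthogonal to every column of H_n. The sketch below checks this on synthetic data (all names and sizes are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
H = rng.normal(size=(n, p))      # synthetic design matrix
Z = rng.normal(size=n)           # synthetic measurements

theta_hat, *_ = np.linalg.lstsq(H, Z, rcond=None)
fitted = H @ theta_hat           # point in the column space of H closest to Z
residual = Z - fitted

# Orthogonal projector onto the column space of H.
proj = H @ np.linalg.solve(H.T @ H, H.T)

print(H.T @ residual)            # numerically zero: residual is orthogonal to columns of H
```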


Recursive Estimation and Online Training

• Above provides framework for special linear case of “online training” (see Example 3 in slides for Chap. 1)

– online = recursive (same meaning)

• If have many i.i.d. input-output data pairs {h_k, z_k} (i.e., large n), then minimum of $\hat{L}(\theta)$ close to minimum of L(θ) = ½ MSE

• Note that $\hat{L}(\theta)$ sometimes called “empirical risk function” (ERF) in machine learning literature (but usually for nonlinear models)

• If process data one at a time, then have on-line training such as stochastic gradient algorithm

– Note that gradient of summand in ERF, $\partial[\tfrac{1}{2}(z_k - h_k^T \theta)^2]/\partial\theta$, represents “noisy” value of true gradient $\partial L/\partial\theta$ (nearly unbiased estimate when n is large)

• Contrast is batch training where all data are processed at each iteration via sum of squared errors
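The "noisy gradient" remark can be illustrated with a small sketch. It assumes synthetic i.i.d. data with E[h h^T] = I, so that the true gradient of L(θ) is θ − θ_true; that setup, and every name below, is an assumption for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5000, 2
theta_true = np.array([0.5, -1.0])     # hypothetical "true" parameters
H = rng.normal(size=(n, p))            # i.i.d. design vectors with E[h h^T] = I
Z = H @ theta_true + 0.2 * rng.normal(size=n)

theta = np.zeros(p)                    # point at which gradients are evaluated

# Gradient of each summand (1/2)(z_k - h_k^T theta)^2: h_k (h_k^T theta - z_k).
per_sample = H * (H @ theta - Z)[:, None]      # shape (n, p), one noisy gradient per row

# Gradient of the ERF is the average of the per-sample gradients (batch training).
batch_grad = H.T @ (H @ theta - Z) / n

# True gradient of L(theta) under the synthetic setup above.
true_grad = theta - theta_true

print(per_sample[0], batch_grad, true_grad)
```

A single row of `per_sample` is what an on-line (stochastic gradient) step would use; `batch_grad` is what one batch iteration would use.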


Recursive Estimation: LMS

•Batch form not convenient in many applications

– E.g., data arrive over time and want “easy” way to update estimate at time k to estimate at time k+1

•Least-mean-squares (LMS) method is very popular recursive method

– Stochastic analogue of steepest descent algorithm

•LMS recursion:

    $\hat{\theta}_{k+1} = \hat{\theta}_k - a\, h_{k+1} (h_{k+1}^T \hat{\theta}_k - z_{k+1}), \quad a > 0$

•Convergence theory based on stochastic approximation (e.g., Ljung, et al., 1992; Gerencsér, 1995)

– Less rigorous theory based on connections to steepest descent (ignores noise) (Widrow and Stearns, 1985; Haykin, 1996)
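The LMS recursion can be sketched in a few lines. The function name `lms`, the gain a = 0.05, and the synthetic data are assumptions chosen for illustration; a suitable gain in practice depends on the scale of h_k.

```python
import numpy as np

def lms(H, Z, a, theta0):
    """Run theta_{k+1} = theta_k - a * h_{k+1} * (h_{k+1}^T theta_k - z_{k+1})."""
    theta = np.asarray(theta0, dtype=float).copy()
    for h, z in zip(H, Z):             # process one data pair at a time
        theta = theta - a * h * (h @ theta - z)
    return theta

rng = np.random.default_rng(3)
n, p = 2000, 3
theta_true = np.array([1.0, -0.5, 2.0])      # hypothetical "true" parameters
H = rng.normal(size=(n, p))
Z = H @ theta_true + 0.1 * rng.normal(size=n)

theta_lms = lms(H, Z, a=0.05, theta0=np.zeros(p))
theta_batch, *_ = np.linalg.lstsq(H, Z, rcond=None)   # batch estimate for comparison
print(theta_lms, theta_batch)
```

With a constant gain the LMS estimate keeps fluctuating around the batch solution rather than settling exactly on it; a smaller gain reduces the fluctuation at the cost of slower adaptation.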


LMS in Closed-Loop Control

•Suppose process is modeled according to autoregressive (AR) form:

    $x_{k+1} = \beta_0 x_k + \beta_1 x_{k-1} + \cdots + \beta_m x_{k-m} + \gamma u_k + w_k$,

where x_k represents state, γ and β_i are unknown parameters, u_k is control, and w_k is noise

•Let target (“desired”) value for x_k be d_k

•Optimal control law known (minimizes mean-square tracking error):

    $u_k = \dfrac{d_{k+1} - \beta_0 x_k - \beta_1 x_{k-1} - \cdots - \beta_m x_{k-m}}{\gamma}$

•Certainty equivalence principle justifies substitution of parameter estimates for unknown true parameters

– LMS used to estimate γ and β_i in closed-loop mode
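A minimal closed-loop sketch for the first-order case (m = 0), where the process is x_{k+1} = β x_k + γ u_k + w_k. The parameter values, square-wave target, LMS gain, and the floor on the γ estimate are all assumptions added for the illustration; the floor is a simple safeguard against dividing by a near-zero estimate and is not part of the control law above.

```python
import numpy as np

rng = np.random.default_rng(4)
beta, gamma = 0.5, 1.0            # "true" process parameters (hypothetical)
a = 0.05                          # LMS gain (hypothetical)
n_steps = 2000

theta_hat = np.array([0.0, 1.5])  # initial guesses for [beta, gamma]
x = np.zeros(n_steps + 1)
# Square-wave target sequence d_k switching between +1 and -1 every 50 steps.
d = np.where((np.arange(n_steps + 1) // 50) % 2 == 0, 1.0, -1.0)

for k in range(n_steps):
    beta_hat = theta_hat[0]
    gamma_hat = max(theta_hat[1], 0.1)            # safeguard (assumption): keep divisor away from 0
    # Certainty-equivalence control: plug estimates into the optimal control law.
    u = (d[k + 1] - beta_hat * x[k]) / gamma_hat
    x[k + 1] = beta * x[k] + gamma * u + 0.05 * rng.normal()
    # LMS update with h_{k+1} = [x_k, u_k] and z_{k+1} = x_{k+1}.
    h = np.array([x[k], u])
    theta_hat = theta_hat - a * h * (h @ theta_hat - x[k + 1])

tracking_error = x[1:] - d[1:]
mse_tail = np.mean(tracking_error[-500:] ** 2)    # tracking MSE over the last 500 steps
print(theta_hat, mse_tail)
```

Because the data are generated in closed loop, good tracking does not by itself imply that the estimates have reached the true β and γ.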


LMS in Closed-Loop Control for


Recursive Least Squares (RLS)

•Alternative to LMS is RLS

– Recall LMS is stochastic analogue of steepest descent (“first order” method)

– RLS is stochastic analogue of Newton-Raphson (“second-order” method) ⇒ faster convergence than LMS in practice

– Next slide discusses second-order interpretation

•RLS algorithm (2 recursions):

    $P_{k+1} = P_k - \dfrac{P_k h_{k+1} h_{k+1}^T P_k}{1 + h_{k+1}^T P_k h_{k+1}}$

    $\hat{\theta}_{k+1} = \hat{\theta}_k - P_{k+1} h_{k+1} (h_{k+1}^T \hat{\theta}_k - z_{k+1})$

•Need P₀ and $\hat{\theta}_0$ to initialize RLS recursions
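The two RLS recursions can be sketched as follows. The function name `rls`, the initialization P₀ = 100·I with a zero initial estimate, and the synthetic data are assumptions for the example.

```python
import numpy as np

def rls(H, Z, theta0, P0):
    """Run the two RLS recursions over the rows of H and entries of Z."""
    theta = np.asarray(theta0, dtype=float).copy()
    P = np.asarray(P0, dtype=float).copy()
    for h, z in zip(H, Z):
        Ph = P @ h
        # P_{k+1} = P_k - P_k h h^T P_k / (1 + h^T P_k h); P_k is symmetric.
        P = P - np.outer(Ph, Ph) / (1.0 + h @ Ph)
        # theta_{k+1} = theta_k - P_{k+1} h (h^T theta_k - z)
        theta = theta - P @ h * (h @ theta - z)
    return theta, P

rng = np.random.default_rng(5)
n, p = 300, 3
theta_true = np.array([1.0, -0.5, 2.0])      # hypothetical "true" parameters
H = rng.normal(size=(n, p))
Z = H @ theta_true + 0.1 * rng.normal(size=n)

theta_rls, P_n = rls(H, Z, theta0=np.zeros(p), P0=100.0 * np.eye(p))
theta_batch, *_ = np.linalg.lstsq(H, Z, rcond=None)
print(theta_rls, theta_batch)
```

With a large P₀ the RLS estimate after n points is close to the batch least-squares estimate; the small gap comes from the P₀ initialization, which acts like a weak prior on the initial estimate.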

Interpretation of RLS as Stochastic Newton-Raphson Method

• RLS has close connection to Newton-Raphson algorithm in Section 1.4 (see Subsection 3.2.5)

• Connection based on two assumptions:

– $\hat{\theta}_k$ minimizes cumulative sum of squares through time k

– P_k is small (in matrix sense) relative to P₀

• Above assumptions reasonable in many applications: First assumption means that $\hat{\theta}_k$ is “good” estimate and second assumption consistent with cumulative sum (3.12) in ISSO

• Then,

    $\hat{\theta}_{k+1} = \hat{\theta}_k - P_{k+1} h_{k+1} (h_{k+1}^T \hat{\theta}_k - z_{k+1}) \approx \hat{\theta}_k - [\mathrm{Hessian}_{k+1}]^{-1} \dfrac{\partial \hat{L}(\hat{\theta}_k)}{\partial \theta}$,

where [Hessian_{k+1}] denotes Hessian matrix of cumulative least-squares sum through time k+1
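The two assumptions can be checked numerically in a small sketch. It takes the cumulative criterion to be ½ Σ (z_i − h_i^T θ)² and uses the standard identity P_{k+1}^{-1} = P₀^{-1} + Σ h_i h_i^T; the data, P₀, and variable names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(6)
k, p = 100, 3
H = rng.normal(size=(k + 1, p))              # design vectors h_1, ..., h_{k+1}
Z = H @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=k + 1)

# Assumption 1: theta_k minimizes the cumulative sum of squares through time k.
theta_k, *_ = np.linalg.lstsq(H[:k], Z[:k], rcond=None)

h_new, z_new = H[k], Z[k]
hessian = H.T @ H                            # Hessian of (1/2) sum_{i<=k+1} (z_i - h_i^T theta)^2
gradient = H.T @ (H @ theta_k - Z)           # gradient of the same sum at theta_k

# Newton-Raphson step from theta_k on the cumulative sum through k+1.
newton = theta_k - np.linalg.solve(hessian, gradient)

# RLS step, with P_{k+1}^{-1} = P_0^{-1} + Hessian_{k+1} (assumption 2: P_0 large).
P0 = 1e3 * np.eye(p)
P_next = np.linalg.inv(np.linalg.inv(P0) + hessian)
rls_step = theta_k - P_next @ h_new * (h_new @ theta_k - z_new)

print(newton, rls_step)
```

Since the first k terms have zero gradient at their minimizer, the gradient of the cumulative sum reduces to the single new term h_{k+1}(h_{k+1}^T θ̂_k − z_{k+1}), which is the term appearing in the RLS update.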


Recursive Methods for Estimation of

Time-Varying Parameters

•It is common to have the underlying true θ evolve in time (e.g., target tracking, adaptive control, sequential experimental design, etc.)

– Time-varying parameters implies θ replaced with θ_k

•Consider modified linear model

    $z_k = h_k^T \theta_k + v_k$

•Prototype recursive form for estimating θ_k is

    $\hat{\theta}_{k+1} = A_k \hat{\theta}_k - \Phi_k h_{k+1} (h_{k+1}^T A_k \hat{\theta}_k - z_{k+1})$,

where choice of A_k and Φ_k depends on specific algorithm


Three Important Algorithms for Estimation

of Time-Varying Parameters

•LMS

– Goal is to minimize instantaneous squared-error criteria across iterations

– General form for evolution of true parameters θ_k

•RLS

– Goal is to minimize weighted sum of squared errors

– Sum criterion creates “inertia” not present in LMS

– General form for evolution of θ_k

•Kalman filter

– Minimizes instantaneous squared-error criteria

– Requires precise statistical description of evolution of θ_k via state-space model
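As one hedged illustration of the Kalman filter case, the sketch below assumes a random-walk state equation θ_{k+1} = θ_k + w_k with known noise variances q and r. This is a common textbook instance, not necessarily the exact form used in ISSO, and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 1000, 2
q, r = 1e-3, 1e-2                 # process / measurement noise variances (assumed known)

theta = np.zeros(p)               # true time-varying parameters
theta_hat = np.zeros(p)           # Kalman filter estimate
P = np.eye(p)                     # error covariance of the estimate
sq_err_kf, sq_err_static = [], []

for k in range(n):
    theta = theta + np.sqrt(q) * rng.normal(size=p)     # random-walk evolution
    h = rng.normal(size=p)
    z = h @ theta + np.sqrt(r) * rng.normal()           # z_k = h_k^T theta_k + v_k

    P_pred = P + q * np.eye(p)                          # prediction step (state matrix = I)
    gain = P_pred @ h / (h @ P_pred @ h + r)            # Kalman gain
    theta_hat = theta_hat - gain * (h @ theta_hat - z)  # measurement update
    P = P_pred - np.outer(gain, h @ P_pred)             # covariance update

    sq_err_kf.append(np.sum((theta_hat - theta) ** 2))
    sq_err_static.append(np.sum(theta ** 2))            # error of never updating from zero

kf_mse = np.mean(sq_err_kf)
static_mse = np.mean(sq_err_static)
print(kf_mse, static_mse)
```

The update has the same shape as a gain-times-prediction-error recursion; here the gain is computed from the assumed state-space model rather than chosen as a constant (LMS) or from a sum-of-squares criterion (RLS).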


Case Study: LMS and RLS with Oboe Reed Data

…an ill wind that nobody blows good.

- Comedian Danny Kaye, speaking of the oboe in “The Secret Life of Walter Mitty” (1947)

•Section 3.4 of ISSO reports on linear and curvilinear models for predicting quality of oboe reeds

– Linear model has 7 parameters; curvilinear has 4 parameters

•This study compares LMS and RLS with batch least-squares estimates

– 160 data points for fitting models (reeddata-fit); 80 (independent) data points for testing models (reeddata-test)

– reeddata-fit and reeddata-test data sets available from ISSO Web site

Comparison of Fitting Results for reeddata-fit and reeddata-test

• To test similarity of fit and test data sets, performed model fitting using test data set

• This comparison is for checking consistency of the two data sets; not for checking accuracy of LMS or RLS estimates

• Compared model fits for parameters in

– Basic linear model (eqn. (3.25) in ISSO) (p = 7)

– Curvilinear model (eqn. (3.26) in ISSO) (p = 4)

• Results on next slide for basic linear model

Comparison of Batch Parameter Estimates for Basic Linear Model. Approximate 95% Confidence Intervals Shown in [·, ·]

Parameter              reeddata-fit              reeddata-test
Constant, θ_const      -0.156 [-0.52, 0.21]      -0.240 [-0.75, 0.28]
Top close, θ_T          0.102 [0.01, 0.19]        0.067 [-0.12, 0.25]
Appearance, θ_A         0.055 [-0.08, 0.19]       0.178 [-0.03, 0.39]
Ease of Gouge, θ_E      0.175 [0.05, 0.30]        0.095 [-0.15, 0.34]
Vascular, θ_V           0.044 [-0.08, 0.17]       0.125 [-0.06, 0.31]
Shininess, θ_S          0.056 [-0.06, 0.17]       0.066 [-0.13, 0.26]
First blow, θ_F         0.579

Comparison of Batch and RLS with

Oboe Reed Data

• Compared batch and RLS using 160 data points in

reeddata-fit and 80 data points for testing models

in reeddata-test

• Two slides to follow present results

– First slide compares parameter estimates in pure linear model

– Second slide compares prediction errors for linear and curvilinear models


Batch and RLS Parameter Estimates for Basic Linear Model (Data from reeddata-fit)

Parameter              Batch Estimates    RLS Estimates
Constant, θ_const      -0.156             -0.079
Top close, θ_T          0.102              0.101
Appearance, θ_A         0.055              0.046
Ease of Gouge, θ_E      0.175              0.171
Vascular, θ_V           0.044              0.043
Shininess, θ_S          0.056              0.056

Mean and Median Absolute Prediction Errors for the Linear and Curvilinear Models (Model fits from reeddata-fit; Prediction Errors from reeddata-test)

          Batch linear    RLS linear    Batch curvilinear    RLS curvilinear
          model           model         model                model
Mean      0.242           0.242         0.235                0.235
Median    0.243           0.250         0.227                0.224

• Ran matched-pairs t-test on linear versus curvilinear models. Used one-sided test.

– P-value for Batch/linear versus Batch/curvilinear is 0.077

– P-value for RLS/linear vs. RLS/curvilinear is 0.10
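The statistic behind a matched-pairs t-test can be sketched as below. The function name `paired_t_statistic` is hypothetical, and the error arrays are randomly generated placeholders, not the oboe reed data, so the printed value has no relation to the P-values reported above.

```python
import numpy as np

def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees of freedom) for the paired differences errors_a - errors_b."""
    d = np.asarray(errors_a, dtype=float) - np.asarray(errors_b, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))   # mean difference over its standard error
    return t, n - 1

rng = np.random.default_rng(8)
# Placeholder absolute prediction errors on 80 test points (NOT the reed data).
abs_err_linear = np.abs(rng.normal(0.0, 0.30, size=80))
abs_err_curvi = np.abs(abs_err_linear + rng.normal(-0.01, 0.05, size=80))

t_stat, dof = paired_t_statistic(abs_err_linear, abs_err_curvi)
print(t_stat, dof)
```

For a one-sided test of whether the first model has larger errors, the P-value is the upper-tail probability of a t distribution with n − 1 degrees of freedom at `t_stat`, which can be read from tables or a statistics library.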