15. Multi-objective least squares

(1)

(2)

Outline

Multi-objective least squares problem

Control

Estimation and inversion

Regularized data fitting

(3)

Multi-objective least squares

• goal: choose $n$-vector $x$ so that $k$ norm squared objectives

$$J_1 = \|A_1 x - b_1\|^2, \quad \ldots, \quad J_k = \|A_k x - b_k\|^2$$

are all small

• $A_i$ is an $m_i \times n$ matrix, $b_i$ is an $m_i$-vector, $i = 1, \ldots, k$

• $J_i$ are the objectives in a multi-objective optimization problem (also called a multi-criterion problem)

• could choose $x$ to minimize any one $J_i$, but we want one $x$ that makes them all small

(4)

Weighted sum objective

• choose positive weights $\lambda_1, \ldots, \lambda_k$ and form weighted sum objective

$$J = \lambda_1 J_1 + \cdots + \lambda_k J_k = \lambda_1 \|A_1 x - b_1\|^2 + \cdots + \lambda_k \|A_k x - b_k\|^2$$

• we’ll choose $x$ to minimize $J$

• we can take $\lambda_1 = 1$, and call $J_1$ the primary objective

• interpretation of $\lambda_i$: how much we care about $J_i$ being small, relative to primary objective

• for a bi-criterion problem, we will minimize $J_1 + \lambda J_2$

(5)

Weighted sum minimization via stacking

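• write weighted-sum objective as

$$J = \left\| \begin{bmatrix} \sqrt{\lambda_1}\,(A_1 x - b_1) \\ \vdots \\ \sqrt{\lambda_k}\,(A_k x - b_k) \end{bmatrix} \right\|^2$$

• so we have $J = \|\tilde A x - \tilde b\|^2$, with

$$\tilde A = \begin{bmatrix} \sqrt{\lambda_1}\, A_1 \\ \vdots \\ \sqrt{\lambda_k}\, A_k \end{bmatrix}, \qquad \tilde b = \begin{bmatrix} \sqrt{\lambda_1}\, b_1 \\ \vdots \\ \sqrt{\lambda_k}\, b_k \end{bmatrix}$$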
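The stacking step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper name `stack_weighted`, the random data, and the weights are hypothetical and not part of the slides. It builds $\tilde A$ and $\tilde b$ and checks numerically that the weighted sum and the stacked norm squared agree.

```python
import numpy as np

def stack_weighted(As, bs, lambdas):
    """Build A_tilde and b_tilde from blocks sqrt(lambda_i)*A_i and sqrt(lambda_i)*b_i."""
    roots = np.sqrt(np.asarray(lambdas, dtype=float))
    A_tilde = np.vstack([r * A for r, A in zip(roots, As)])
    b_tilde = np.concatenate([r * b for r, b in zip(roots, bs)])
    return A_tilde, b_tilde

# Hypothetical data: k = 2 objectives, each with a 10 x 5 matrix.
rng = np.random.default_rng(0)
As = [rng.standard_normal((10, 5)) for _ in range(2)]
bs = [rng.standard_normal(10) for _ in range(2)]
lambdas = [1.0, 2.5]

A_tilde, b_tilde = stack_weighted(As, bs, lambdas)

# The weighted sum and the stacked norm squared agree at any point x.
x = rng.standard_normal(5)
J_weighted = sum(lam * np.sum((A @ x - b) ** 2) for lam, A, b in zip(lambdas, As, bs))
J_stacked = np.sum((A_tilde @ x - b_tilde) ** 2)

# Minimizing J is then an ordinary least squares problem.
x_hat, *_ = np.linalg.lstsq(A_tilde, b_tilde, rcond=None)
```

Once the problem is in stacked form, any ordinary least squares solver applies; `np.linalg.lstsq` is used here only for convenience.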

(6)

Weighted sum solution

• assuming columns of $\tilde A$ are independent,

$$\hat x = (\tilde A^T \tilde A)^{-1} \tilde A^T \tilde b = (\lambda_1 A_1^T A_1 + \cdots + \lambda_k A_k^T A_k)^{-1} (\lambda_1 A_1^T b_1 + \cdots + \lambda_k A_k^T b_k)$$

• can compute $\hat x$ via QR factorization of $\tilde A$
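As a rough check of the two expressions for $\hat x$, the sketch below computes the solution once via a QR factorization of the stacked matrix and once via the weighted normal equations. The function names and the random test data are assumptions made for illustration; in practice the QR route is usually preferred numerically.

```python
import numpy as np

def weighted_sum_solution_qr(As, bs, lambdas):
    """x_hat from the reduced QR factorization A_tilde = Q R."""
    roots = np.sqrt(np.asarray(lambdas, dtype=float))
    A_tilde = np.vstack([r * A for r, A in zip(roots, As)])
    b_tilde = np.concatenate([r * b for r, b in zip(roots, bs)])
    Q, R = np.linalg.qr(A_tilde)               # Q is tall, R is n x n upper triangular
    return np.linalg.solve(R, Q.T @ b_tilde)   # solves R x = Q^T b_tilde

def weighted_sum_solution_normal(As, bs, lambdas):
    """x_hat from (sum_i lambda_i A_i^T A_i) x = sum_i lambda_i A_i^T b_i."""
    G = sum(lam * (A.T @ A) for lam, A in zip(lambdas, As))
    h = sum(lam * (A.T @ b) for lam, A, b in zip(lambdas, As, bs))
    return np.linalg.solve(G, h)

# Hypothetical data: three objectives with different numbers of rows.
rng = np.random.default_rng(0)
As = [rng.standard_normal((m, 4)) for m in (6, 8, 5)]
bs = [rng.standard_normal(m) for m in (6, 8, 5)]
lambdas = [1.0, 0.5, 3.0]

x_qr = weighted_sum_solution_qr(As, bs, lambdas)
x_normal = weighted_sum_solution_normal(As, bs, lambdas)
```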

(7)

Optimal trade-off curve

• bi-criterion problem with objectives $J_1$, $J_2$

• let $\hat x(\lambda)$ be minimizer of $J_1 + \lambda J_2$

• called Pareto optimal: there is no point $z$ that satisfies

$$J_1(z) < J_1(\hat x(\lambda)), \qquad J_2(z) < J_2(\hat x(\lambda))$$

i.e., no other point $x$ beats $\hat x$ on both objectives
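One way to trace the trade-off numerically is to sweep $\lambda$ over a logarithmic grid and record both objectives at each minimizer. The sketch below assumes random $10 \times 5$ data and hypothetical helper names. As $\lambda$ grows, $J_2(\hat x(\lambda))$ should not increase and $J_1(\hat x(\lambda))$ should not decrease.

```python
import numpy as np

def objectives(A1, b1, A2, b2, x):
    """Return (J1, J2) evaluated at x."""
    return np.sum((A1 @ x - b1) ** 2), np.sum((A2 @ x - b2) ** 2)

def bicriterion_minimizer(A1, b1, A2, b2, lam):
    """Minimize J1 + lam * J2 by stacking."""
    A_tilde = np.vstack([A1, np.sqrt(lam) * A2])
    b_tilde = np.concatenate([b1, np.sqrt(lam) * b2])
    x_hat, *_ = np.linalg.lstsq(A_tilde, b_tilde, rcond=None)
    return x_hat

# Hypothetical data (random, not the data behind the slides' plots).
rng = np.random.default_rng(1)
A1, A2 = rng.standard_normal((10, 5)), rng.standard_normal((10, 5))
b1, b2 = rng.standard_normal(10), rng.standard_normal(10)

lams = np.logspace(-4, 4, 41)
curve = np.array([
    objectives(A1, b1, A2, b2, bicriterion_minimizer(A1, b1, A2, b2, lam))
    for lam in lams
])
J1_vals, J2_vals = curve[:, 0], curve[:, 1]   # points on the trade-off curve
```

Plotting `J2_vals` against `J1_vals` gives a sampled version of the trade-off curve for this data.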

(8)

Example

$A_1$ and $A_2$ both $10 \times 5$

[Figure: components $\hat x_1(\lambda), \ldots, \hat x_5(\lambda)$ versus $\lambda$, for $\lambda$ from $10^{-4}$ to $10^{4}$]

(9)

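Objectives versus $\lambda$ and optimal trade-off curve

[Figure, left: $J_1(\lambda)$ and $J_2(\lambda)$ versus $\lambda$, for $\lambda$ from $10^{-4}$ to $10^{4}$. Right: optimal trade-off curve of $J_1(\lambda)$ and $J_2(\lambda)$, with the points $\lambda = 0.1$, $\lambda = 1$, $\lambda = 10$ marked]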

(10)

Using multi-objective least squares

• identify the primary objective

  – the basic quantity we want to minimize

• choose one or more secondary objectives

  – quantities we’d also like to be small, if possible

  – e.g., size of $x$, roughness of $x$, distance from some given point

• tweak/tune the weights until we like (or can tolerate) $\hat x(\lambda)$

• for bi-criterion problem with $J = J_1 + \lambda J_2$:

  – if $J_2$ is too big, increase $\lambda$


(12)

Control

• $n$-vector $x$ corresponds to actions or inputs

• $m$-vector $y$ corresponds to results or outputs

• inputs and outputs are related by affine input-output model

$$y = Ax + b$$

• $A$ and $b$ are known (from analytical models, data fitting, ...)

• the goal is to choose $x$ (which determines $y$), to optimize multiple objectives

(13)

Multi-objective control

• typical primary objective: $J_1 = \|y - y^{\mathrm{des}}\|^2$, where $y^{\mathrm{des}}$ is a given desired or target output

• typical secondary objectives:

  – $x$ is small: $J_2 = \|x\|^2$

(14)

Product demand shaping

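• we will change prices of $n$ products by $n$-vector $\delta^{\mathrm{price}}$

• this induces change in demand $\delta^{\mathrm{dem}} = E^{\mathrm{d}} \delta^{\mathrm{price}}$

• $E^{\mathrm{d}}$ is the $n \times n$ price elasticity of demand matrix

• we want $J_1 = \|\delta^{\mathrm{dem}} - \delta^{\mathrm{tar}}\|^2$ small

• and also, we want $J_2 = \|\delta^{\mathrm{price}}\|^2$ small

• so we minimize $J_1 + \lambda J_2$, and adjust $\lambda > 0$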
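A small numerical sketch of demand shaping follows. The $3 \times 3$ elasticity matrix, the target demand change, and the function name `shape_demand` are all made up for illustration. With $A_1 = E^{\mathrm{d}}$, $b_1 = \delta^{\mathrm{tar}}$, $A_2 = I$, $b_2 = 0$, the problem is a bi-criterion least squares problem and can be solved by stacking.

```python
import numpy as np

def shape_demand(E_d, delta_tar, lam):
    """Price change minimizing ||E_d p - delta_tar||^2 + lam * ||p||^2."""
    n = E_d.shape[1]
    A_tilde = np.vstack([E_d, np.sqrt(lam) * np.eye(n)])
    b_tilde = np.concatenate([delta_tar, np.zeros(n)])
    delta_price, *_ = np.linalg.lstsq(A_tilde, b_tilde, rcond=None)
    return delta_price

# Hypothetical elasticity matrix and target demand change for 3 products.
E_d = np.array([[-0.40, 0.10, 0.05],
                [0.10, -0.50, 0.10],
                [0.05, 0.10, -0.30]])
delta_tar = np.array([0.05, -0.02, 0.03])

# Larger lam favors small price changes over hitting the demand target.
lam_values = (0.01, 0.1, 1.0)
price_changes = {lam: shape_demand(E_d, delta_tar, lam) for lam in lam_values}
demand_errors = {lam: np.linalg.norm(E_d @ p - delta_tar) for lam, p in price_changes.items()}
```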

(15)

Robust control

• we have $K$ different input-output models (a.k.a. scenarios)

$$y^{(k)} = A^{(k)} x + b^{(k)}, \quad k = 1, \ldots, K$$

• these represent uncertainty in the system

• $y^{(k)}$ is the output with input $x$, if system model $k$ is correct

• average cost across the models:

$$\frac{1}{K} \sum_{k=1}^{K} \|y^{(k)} - y^{\mathrm{des}}\|^2$$
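Minimizing the average cost is itself a stacked least squares problem, since the factor $1/K$ does not change the minimizer. The sketch below assumes randomly perturbed scenarios around a nominal model; the scenario data and helper names are hypothetical. It compares the scenario-averaged design with one that trusts a single scenario.

```python
import numpy as np

def robust_input(A_list, b_list, y_des):
    """Minimize sum_k ||A_k x + b_k - y_des||^2 by stacking the scenarios."""
    A_stack = np.vstack(A_list)
    rhs = np.concatenate([y_des - b for b in b_list])
    x_hat, *_ = np.linalg.lstsq(A_stack, rhs, rcond=None)
    return x_hat

def average_cost(A_list, b_list, y_des, x):
    """(1/K) * sum_k ||A_k x + b_k - y_des||^2."""
    return np.mean([np.sum((A @ x + b - y_des) ** 2) for A, b in zip(A_list, b_list)])

# Hypothetical scenarios: random perturbations of a nominal model.
rng = np.random.default_rng(3)
m, n, K = 8, 4, 5
A_nom, b_nom = rng.standard_normal((m, n)), rng.standard_normal(m)
A_list = [A_nom + 0.3 * rng.standard_normal((m, n)) for _ in range(K)]
b_list = [b_nom + 0.3 * rng.standard_normal(m) for _ in range(K)]
y_des = np.ones(m)

x_robust = robust_input(A_list, b_list, y_des)           # uses all K scenarios
x_single = robust_input(A_list[:1], b_list[:1], y_des)   # trusts scenario 1 only

cost_robust = average_cost(A_list, b_list, y_des, x_robust)
cost_single = average_cost(A_list, b_list, y_des, x_single)
```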


(17)

Estimation

• measurement model: $y = Ax + v$

• $n$-vector $x$ contains parameters we want to estimate

• $m$-vector $y$ contains the measurements

• $m$-vector $v$ are (unknown) noises or measurement errors

• $m \times n$ matrix $A$ connects parameters to measurements

• basic least squares estimation: assuming $v$ is small (and $A$ has independent columns), we estimate $x$ by minimizing $J_1 = \|Ax - y\|^2$

(18)

Regularized inversion

• can get far better results by incorporating prior information about $x$ into estimation, e.g.,

  – $x$ should be not too large

  – $x$ should be smooth

• express these as secondary objectives:

  – $J_2 = \|x\|^2$ (‘Tikhonov regularization’)

  – $J_2 = \|Dx\|^2$

• we minimize $J_1 + \lambda J_2$

• adjust $\lambda$ until you like the results

• curve of $\hat x(\lambda)$ versus $\lambda$ is called regularization path
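A regularization path can be sketched by solving the stacked problem over a grid of $\lambda$ values. Everything specific here is an assumption for illustration: the random wide matrix $A$, the smooth test signal, the noise level, and the helper names. Here $D$ is taken to be the first-difference matrix, one common choice of smoothness penalty.

```python
import numpy as np

def difference_matrix(n):
    """(n-1) x n matrix D with (D x)_i = x_{i+1} - x_i."""
    return np.diff(np.eye(n), axis=0)

def regularized_estimate(A, y, lam, D=None):
    """Minimize ||A x - y||^2 + lam * ||D x||^2 (D = I gives Tikhonov)."""
    n = A.shape[1]
    if D is None:
        D = np.eye(n)
    A_tilde = np.vstack([A, np.sqrt(lam) * D])
    b_tilde = np.concatenate([y, np.zeros(D.shape[0])])
    x_hat, *_ = np.linalg.lstsq(A_tilde, b_tilde, rcond=None)
    return x_hat

# Hypothetical problem with fewer measurements than parameters (m < n).
rng = np.random.default_rng(4)
n, m = 30, 20
x_true = np.sin(np.linspace(0, np.pi, n))        # smooth signal
A = rng.standard_normal((m, n))
y = A @ x_true + 0.05 * rng.standard_normal(m)

lams = np.logspace(-3, 3, 13)
D = difference_matrix(n)
path_tikhonov = np.array([regularized_estimate(A, y, lam) for lam in lams])
path_smooth = np.array([regularized_estimate(A, y, lam, D=D) for lam in lams])
```

Each row of `path_tikhonov` or `path_smooth` is $\hat x(\lambda)$ for one value of $\lambda$. With $D = I$ and $\lambda > 0$ the stacked matrix has independent columns even though $A$ is wide here.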

(19)

Image de-blurring

• $x$ is an image

• $A$ is a blurring operator

• $y = Ax + v$ is a blurred, noisy image

• least squares de-blurring: choose $x$ to minimize

$$\|Ax - y\|^2 + \lambda \left( \|D_{\mathrm{v}} x\|^2 + \|D_{\mathrm{h}} x\|^2 \right)$$

$D_{\mathrm{v}}$, $D_{\mathrm{h}}$ are vertical and horizontal differencing operations
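One possible construction of the differencing operations is sketched below. It assumes the $M \times N$ image is stored as a vector column by column; this storage convention, the helper names, and the toy image are assumptions, not taken from the slides. Under that convention the two operators are Kronecker products of identity matrices with a 1-D first-difference matrix.

```python
import numpy as np

def image_difference_operators(M, N):
    """D_v, D_h for an M x N image stored column by column as a vector of length M*N."""
    D_M = np.diff(np.eye(M), axis=0)   # (M-1) x M first differences
    D_N = np.diff(np.eye(N), axis=0)   # (N-1) x N first differences
    D_v = np.kron(np.eye(N), D_M)      # differences of vertically adjacent pixels
    D_h = np.kron(D_N, np.eye(M))      # differences of horizontally adjacent pixels
    return D_v, D_h

def deblur_objective(A, y, x, lam, D_v, D_h):
    """||A x - y||^2 + lam * (||D_v x||^2 + ||D_h x||^2)."""
    fit = np.sum((A @ x - y) ** 2)
    rough = np.sum((D_v @ x) ** 2) + np.sum((D_h @ x) ** 2)
    return fit + lam * rough

# Toy 4 x 6 image, vectorized column by column.
M, N = 4, 6
X = np.arange(M * N, dtype=float).reshape(M, N) ** 2
x = X.flatten(order="F")
D_v, D_h = image_difference_operators(M, N)
```

For realistic image sizes these matrices would be stored in a sparse format; dense arrays are used here only to keep the sketch short.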

(20)

Example

(21)

Regularization path

(22)

Regularization path

(23)

Tomography

• $x$ represents values in region of interest of $n$ voxels (pixels)

• $y = Ax + v$ are measurements of integrals along lines through region

$$y_i = \sum_{j=1}^{n} A_{ij} x_j + v_i$$

• $A_{ij}$ is the length of the intersection of the line in measurement $i$ with voxel $j$

[Figure: grid of voxels $x_1, x_2, \ldots$ crossed by the line in measurement $i$]

(24)

Least squares tomographic reconstruction

• primary objective is $\|Ax - y\|^2$

• regularization terms capture prior information about $x$

• for example, if $x$ varies smoothly over region, use Dirichlet energy for graph

(25)

Example

• left: 4000 lines (100 points, 40 lines per point)

(26)

Regularized least squares reconstruction

[Figure: reconstructions for $\lambda = 10^{-2}$, $\lambda = 10^{-1}$, $\lambda = 1$]


(28)

Motivation for regularization

• consider data fitting model (of relationship $y \approx f(x)$)

$$\hat f(x) = \theta_1 f_1(x) + \cdots + \theta_p f_p(x)$$

with $f_1(x) = 1$

• $\theta_i$ is the sensitivity of $\hat f(x)$ to $f_i(x)$

• so large $\theta_i$ means the model is very sensitive to $f_i(x)$

• $\theta_1$ is an exception, since $f_1(x) = 1$ never varies

(29)

Regularized data fitting

• suppose we have training data $x^{(1)}, \ldots, x^{(N)}$, $y^{(1)}, \ldots, y^{(N)}$

• express fitting error on data set as $A\theta - y$

• regularized data fitting: choose $\theta$ to minimize

$$\|A\theta - y\|^2 + \lambda \|\theta_{2:p}\|^2$$

• $\lambda > 0$ is the regularization parameter

• for regression model $\hat y = X^T \beta + v\mathbf{1}$, we minimize $\|X^T \beta + v\mathbf{1} - y\|^2 + \lambda \|\beta\|^2$
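The point that the constant coefficient $\theta_1$ is left out of the penalty can be sketched as follows. The polynomial basis, the synthetic data, and the function name `regularized_fit` are assumptions for illustration only; they are not the model or data used in the slides' example.

```python
import numpy as np

def regularized_fit(A, y, lam):
    """Minimize ||A theta - y||^2 + lam * ||theta_{2:p}||^2 (theta_1 is not penalized)."""
    p = A.shape[1]
    S = np.eye(p)[1:, :]                        # selects theta_2, ..., theta_p
    A_tilde = np.vstack([A, np.sqrt(lam) * S])
    b_tilde = np.concatenate([y, np.zeros(p - 1)])
    theta, *_ = np.linalg.lstsq(A_tilde, b_tilde, rcond=None)
    return theta

# Hypothetical data and basis: f_1(x) = 1, f_i(x) = x^(i-1) for i = 2, ..., 5.
rng = np.random.default_rng(5)
x_train = rng.uniform(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(10)
A_train = np.vander(x_train, 5, increasing=True)   # columns 1, x, x^2, x^3, x^4

theta_small = regularized_fit(A_train, y_train, 1e-5)   # close to plain least squares
theta_large = regularized_fit(A_train, y_train, 1e10)   # theta_{2:p} driven toward 0

rms_small = np.sqrt(np.mean((A_train @ theta_small - y_train) ** 2))
rms_large = np.sqrt(np.mean((A_train @ theta_large - y_train) ** 2))
```

For very large $\lambda$ the fit collapses to the constant model, with $\theta_1$ near the mean of the training outputs.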

(30)

Example

[Figure: signal and data points versus $x$; legend: Train, Test]

• solid line is signal used to generate synthetic (simulated) data

• 10 blue points are used as training set; 20 red points are used as test set

• we fit a model with five parameters $\theta_1, \ldots, \theta_5$: $\sum^{4} \ldots$

(31)

Result of regularized least squares fit

[Figure, left: RMS error versus $\lambda$ (Train, Test). Right: coefficients $\theta_1, \ldots, \theta_5$ versus $\lambda$. Both for $\lambda$ from $10^{-5}$ to $10^{5}$]