Nonlinear Regression:


A Powerful Tool With Considerable Complexity

Half-Day 1: Estimation and Standard Inference

Andreas Ruckstuhl

Institut für Datenanalyse und Prozessdesign

Zürcher Hochschule für Angewandte Wissenschaften

WBL Statistik 2016 — Nonlinear Regression

Engineering

IDP Institute of Data Analysis and Process Design


Outline:

Half-Day 1

Estimation and Standard Inference

The Nonlinear Regression Model

Iterative Estimation - Model Fitting

Inference Based on Linear Approximations

Half-Day 2

Improved Inference and Visualisation

Likelihood Based Inference

Profile t Plot and Profile Traces

Parameter Transformations

Half-Day 3

Bootstrap, Prediction and Calibration

Bootstrap

Prediction

Calibration

Outlook

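1 The Nonlinear Regression Model

The regression model

$$Y_i = h\langle x_i^{(1)}, \ldots, x_i^{(m)};\ \theta_1, \theta_2, \ldots, \theta_p\rangle + E_i \quad \text{with } E_i \text{ indep. } \sim \mathcal{N}\langle 0, \sigma^2\rangle$$

In case of the linear regression model

$$h\langle x_i^{(1)}, \ldots, x_i^{(m)};\ \theta_1, \theta_2, \ldots, \theta_p\rangle = \theta_1 \cdot 1 + \theta_2 x_i^{(2)} + \ldots + \theta_p x_i^{(p)} \quad \text{(i.e., } m = p\text{)}$$

Examples of nonlinear regression functions:

$$h\langle x_i; \theta\rangle = \frac{\theta_1 x_i^{\theta_3}}{\theta_2 + x_i^{\theta_3}}$$

$$h\langle x_i; \theta\rangle = \theta_1 \exp\langle \theta_2 / x_i \rangle$$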

$$h\langle x_i; \theta\rangle = \exp\langle \theta_1 (x_i^{(1)})^{\theta_3} \exp\langle -\theta_2 x_i^{(2)} \rangle\rangle$$

Example: Puromycin

The Michaelis-Menten model for enzyme kinetics relates the initial "velocity" of an enzymatic reaction to the substrate concentration.

[Figure: Velocity versus Concentration; legend: • treated with Puromycin, △ not treated]

$$Y_i = \frac{\theta_1 x_i}{\theta_2 + x_i} + E_i \quad \text{with } E_i \text{ i.i.d.} \sim \mathcal{N}\langle 0, \sigma^2\rangle \qquad \text{(Michaelis-Menten model)}$$

where $x$ is the substrate concentration [ppm].

Example: Biochemical Oxygen Demand (BOD)

Biochemical oxygen demand of stream water

[Figure: Oxygen demand (mg/l) versus Time (days)]

$$Y_i = \theta_1 \cdot \left(1 - e^{-\theta_2 x_i}\right) + E_i \quad \text{with } E_i \text{ i.i.d.} \sim \mathcal{N}\langle 0, \sigma^2\rangle,$$

where $Y$ is the biochemical oxygen demand (BOD) [mg/l] and $x$ the incubation time [days].

Example: Cellulose Membrane

Ratio of protonated to deprotonated carboxyl groups within the pores of cellulose membranes versus pH value $x$ of the bulk solution

[Figure: (a) y (= chem. shift) versus x (= pH); (b) y versus x]

Theoretically, this relation is described by the Henderson-Hasselbalch equation,

$$Y_i = \frac{\theta_1 + \theta_2 \cdot 10^{\theta_3 + \theta_4 x_i}}{1 + 10^{\theta_3 + \theta_4 x_i}} + E_i, \quad i = 1, \ldots, n, \quad \text{with } E_i \text{ i.i.d.} \sim \mathcal{N}\langle 0, \sigma^2\rangle$$

Transformably Linear Models

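Example: $h\langle x, \theta\rangle = \theta_1 \cdot \exp\langle \theta_2 / x \rangle$

Applying the log-transformation, we obtain

$$\log\langle h\langle x, \theta\rangle\rangle = \log\langle \theta_1 \cdot \exp\langle \theta_2 / x\rangle\rangle = \log\langle\theta_1\rangle + \log\langle \exp\langle \theta_2 / x \rangle\rangle = \log\langle\theta_1\rangle + \theta_2 \cdot \frac{1}{x}$$

Hence $\log\langle h\langle x, \theta\rangle\rangle = \vartheta_1 + \vartheta_2 \tilde{x}$, with $\vartheta_1 = \log\langle\theta_1\rangle$, $\vartheta_2 = \theta_2$ and $\tilde{x} = 1/x$.

The "complete" transformably linear model is

$$\log\langle Y_i \rangle = \vartheta_1 + \vartheta_2 \tilde{x}_i + E_i, \quad E_i \text{ i.i.d.} \sim \mathcal{N}\langle 0, \sigma^2\rangle$$

The error term is additive.

In the original representation, the model transforms to

$$Y_i = \exp\langle \vartheta_1 + \vartheta_2 \tilde{x}_i + E_i \rangle = \theta_1 \cdot \exp\langle \theta_2 / x_i \rangle \cdot \tilde{E}_i, \quad \tilde{E}_i = \exp\langle E_i \rangle,$$

i.e., $\tilde{E}_i$ is log-normally distributed and the error is multiplicative.

Conclusion: Transform to a linear model only if required by the error structure. ⇒ Check assumptions on the error term by residual analysis.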

If there is a deterministic model $y = \theta_1 \cdot x^{\theta_2}$, the random component may be either additive or multiplicative. The Tukey-Anscombe plot of the fitted model will show clearly which model is more adequate for the data.

[Figure: four Tukey-Anscombe plots; panel labels: lm(log(y) ~ log(x)), nls(y ~ a * x^b), y = a * x^b + E, ln(y) = ln(a) + b*ln(x) + E]
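The comparison on this slide can be tried out with simulated data. The following is only a sketch: the true values a = 2 and b = 1.5, the noise levels, and the sample size are assumptions chosen for illustration and do not come from the slides.

```r
# Simulated data; a = 2, b = 1.5 and both noise levels are illustrative assumptions.
set.seed(1)
n <- 80
x <- seq(1, 10, length.out = n)
y_mult <- 2 * x^1.5 * exp(rnorm(n, sd = 0.3))  # multiplicative error
y_add  <- 2 * x^1.5 + rnorm(n, sd = 2)         # additive error

# Multiplicative error: linear model on the log-log scale.
fit_lm <- lm(log(y_mult) ~ log(x))

# Additive error: nonlinear least squares on the original scale.
# The log-log fit above is reused only to get rough starting values.
start <- list(a = exp(coef(fit_lm)[[1]]), b = coef(fit_lm)[[2]])
fit_nls <- nls(y_add ~ a * x^b, start = start)

# Tukey-Anscombe plots (residuals versus fitted values), drawn only in an
# interactive session.
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  plot(fitted(fit_lm), resid(fit_lm), main = "lm(log(y) ~ log(x))",
       xlab = "Fitted values", ylab = "Residuals")
  plot(fitted(fit_nls), resid(fit_nls), main = "nls(y ~ a * x^b)",
       xlab = "Fitted values", ylab = "Residuals")
  par(op)
}
```

Fitting the "wrong" model to each data set (for example `nls` on `y_mult`) and looking at the same plot should show a funnel shape or curvature in the residuals.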

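A selection of transformably linear models

$$\begin{array}{lcl}
h\langle x, \theta\rangle = 1/(\theta_1 + \theta_2 \exp\langle -x \rangle) & \longleftrightarrow & 1/h\langle x, \theta\rangle = \theta_1 + \theta_2 \exp\langle -x \rangle \\[4pt]
h\langle x, \theta\rangle = \theta_1 x/(\theta_2 + x) & \longleftrightarrow & 1/h\langle x, \theta\rangle = 1/\theta_1 + (\theta_2/\theta_1) \cdot (1/x) \\[4pt]
h\langle x, \theta\rangle = \theta_1 x^{\theta_2} & \longleftrightarrow & \ln\langle h\langle x, \theta\rangle\rangle = \ln\langle\theta_1\rangle + \theta_2 \ln\langle x \rangle \\[4pt]
h\langle x, \theta\rangle = \theta_1 \exp\langle \theta_2\, g\langle x \rangle\rangle & \longleftrightarrow & \ln\langle h\langle x, \theta\rangle\rangle = \ln\langle\theta_1\rangle + \theta_2\, g\langle x \rangle \\[4pt]
h\langle x, \theta\rangle = \exp\langle -\theta_1 x^{(1)} \exp\langle -\theta_2 / x^{(2)} \rangle\rangle & \longleftrightarrow & \ln\langle \ln\langle h\langle x, \theta\rangle\rangle\rangle = \ln\langle -\theta_1 \rangle + \ln\langle x^{(1)} \rangle - \theta_2 / x^{(2)} \\[4pt]
h\langle x, \theta\rangle = \theta_1 (x^{(1)})^{\theta_2} (x^{(2)})^{\theta_3} & \longleftrightarrow & \ln\langle h\langle x, \theta\rangle\rangle = \ln\langle\theta_1\rangle + \theta_2 \ln\langle x^{(1)} \rangle + \theta_3 \ln\langle x^{(2)} \rangle
\end{array}$$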

2 Model Fitting Using an Iterative Algorithm

The method of least squares: Find the minimum of

$$S\langle\theta\rangle = \sum_{i=1}^{n} \left(y_i - \eta_i\langle\theta\rangle\right)^2 \quad \text{with } \eta_i\langle\theta\rangle = h\langle x_i; \theta\rangle.$$

Key steps for minimising:

Approximate the surface $\eta\langle\theta\rangle$ at a temporarily best value $\theta^{(\ell)}$ by a tangent plane, where $\eta\langle\theta^{(\ell)}\rangle$ is the point of contact.

Search for the point on the plane which is closest to $Y$ (that is a linear regression fitting problem).

The new point lies on the plane but not on the surface. However, it defines a parameter vector $\theta^{(\ell+1)}$ which will be used in the next iteration step.

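Algebraically formulated

1. Linear approximation of $\eta\langle\theta\rangle$ at $\theta^{(m)}$:

$$\eta\langle\theta\rangle \approx \eta\langle\theta^{(m)}\rangle + A^{(m)} \left(\theta - \theta^{(m)}\right),$$

where $A^{(m)} = A\langle\theta^{(m)}\rangle$ is the derivative matrix of $\eta\langle\theta\rangle$ at $\theta^{(m)}$ in the $m$-th iteration step.

2. (Local) linear model:

$$\tilde{Y}^{(m)} \approx A^{(m)} \beta^{(m)} + E \quad \text{where } \tilde{Y}^{(m)} = Y - \eta\langle\theta^{(m)}\rangle \text{ and } \beta^{(m)} = \theta - \theta^{(m)}$$

3. Least-squares estimation for $\beta^{(m)}$ → $\hat\beta^{(m)}$. Set $\theta^{(m+1)} = \theta^{(m)} + \hat\beta^{(m)}$.

4. Repeat steps 1 to 3 until the procedure converges.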
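The four steps can be sketched in a few lines of R for the Michaelis-Menten function. This is an illustration under assumptions, not the algorithm as implemented in `nls()`: it uses the `Puromycin` data frame shipped with R (treated runs only), hand-coded derivatives, the starting values 196 and 0.048 quoted in the Puromycin starting-value example, and a simple relative-step stopping rule with no step halving.

```r
# Treated runs of the Puromycin data from R's datasets package (assumed to
# correspond to the data used on the slides).
d <- subset(Puromycin, state == "treated")
x <- d$conc
y <- d$rate

# Regression function eta<theta> and its n x p derivative matrix A<theta>.
eta <- function(theta) theta[1] * x / (theta[2] + x)
A_mat <- function(theta) cbind(
  x / (theta[2] + x),                # d eta / d theta1
  -theta[1] * x / (theta[2] + x)^2   # d eta / d theta2
)

theta <- c(196, 0.048)               # starting values theta^(0)
for (m in 1:50) {
  A <- A_mat(theta)                  # step 1: linearise at theta^(m)
  y_tilde <- y - eta(theta)          # step 2: response of the local linear model
  beta_hat <- qr.solve(A, y_tilde)   # step 3: least-squares estimate of beta^(m)
  theta <- theta + beta_hat          #         theta^(m+1) = theta^(m) + beta_hat
  # step 4: stop when the update is negligible (hypothetical stopping rule)
  if (sqrt(sum(beta_hat^2)) < 1e-8 * sqrt(sum(theta^2))) break
}
theta
```

`qr.solve()` returns the least-squares solution for the overdetermined system, so each pass is exactly one linear regression of the current residuals on the columns of the derivative matrix.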


Starting Values

interpret the behaviour of the regression function in terms of the

parameter analytically or graphically

transform the regression function to obtain simpler, preferably linear,

behaviour

use your knowledge from previous or similar experiments

Example Puromycin (2) - using transformation

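$$y \approx h\langle x, \theta\rangle = \frac{\theta_1 x}{\theta_2 + x}$$

transform to linearity

$$\tilde{y} = \frac{1}{y} \approx \frac{1}{h\langle x, \theta\rangle} = \frac{\theta_2}{\theta_1} \cdot \frac{1}{x} + \frac{1}{\theta_1}, \quad \text{that is} \quad \tilde{y} \approx \beta_1 \tilde{x} + \beta_0 \quad \text{with } \tilde{x} = 1/x$$

linear regression ⇒ $\hat\beta = (\hat\beta_0, \hat\beta_1)^T = (0.005, 0.00025)^T$

starting values:

$$\theta_1^{(0)} = \frac{1}{\hat\beta_0} \approx 196, \qquad \theta_2^{(0)} = \frac{\hat\beta_1}{\hat\beta_0} \approx 0.048$$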
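The transformation recipe above can be reproduced in R. The sketch assumes that the treated runs of the `Puromycin` data frame shipped with R are the data behind the slide; the parameter names `theta1` and `theta2` are chosen here for illustration.

```r
# Treated runs of the Puromycin data (assumption: same data as on the slide).
d <- subset(Puromycin, state == "treated")

# Linear regression of 1/y on 1/x.
fit_lin <- lm(I(1 / rate) ~ I(1 / conc), data = d)
beta0 <- coef(fit_lin)[[1]]   # intercept, estimates 1 / theta1
beta1 <- coef(fit_lin)[[2]]   # slope, estimates theta2 / theta1

# Back-transform to starting values for the nonlinear fit.
theta_start <- c(theta1 = 1 / beta0, theta2 = beta1 / beta0)
theta_start

# Use them as starting values in nls().
fit <- nls(rate ~ theta1 * conc / (theta2 + conc),
           data = d, start = as.list(theta_start))
coef(fit)
```

The starting values are only a by-product of the linear fit; the final estimates come from the least-squares fit on the original scale.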

Example Puromycin (3)

[Figure, left panel: 1/Velocity versus 1/Concentration; right panel: Velocity versus Concentration]

Left: Regression line used for determining the starting values $\theta_1$ and $\theta_2$.

Right: Regression function $h\langle x; \theta\rangle$ based on the starting values $\theta = \theta^{(0)}$ (dashed line) and based on the least-squares estimation $\theta = \hat\theta$ (solid line), respectively.

Example: Cellulose membrane (2) - starting values

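$$h\langle x; \theta\rangle = \frac{\theta_1 + \theta_2 \cdot 10^{\theta_3 + \theta_4 x}}{1 + 10^{\theta_3 + \theta_4 x}} \quad \text{with } \theta_4 < 0$$

We know:

$$h\langle x; \theta\rangle \longrightarrow \theta_1 \text{ for } x \to \infty, \qquad h\langle x; \theta\rangle \longrightarrow \theta_2 \text{ for } x \to -\infty$$

From data, we obtain $\theta_1^{(0)} = 163.7$ and $\theta_2^{(0)} = 159.5$.

Let

$$\tilde{y}_i = \log_{10}\left\langle \frac{\theta_1^{(0)} - y_i}{y_i - \theta_2^{(0)}} \right\rangle, \quad \text{hence} \quad \tilde{y}_i = \theta_3 + \theta_4 x_i.$$

Simple linear regression results in starting values for both $\theta_3$ and $\theta_4$: $\theta_3^{(0)} = 1.83$ and $\theta_4^{(0)} = -0.36$.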

Example: Cellulose membrane (3)

[Figure: (a) y versus x (= pH); (b) y (= chem. shift) versus x (= pH)]

(a) Regression line used for determining the starting values $\theta_3$ and $\theta_4$.

(b) Regression function $h\langle x; \theta\rangle$ based on the starting values $\theta = \theta^{(0)}$.

Self-Starter Function

For repeated use of the same nonlinear regression model ⇒ use an automated way of providing starting values.

Basically, collect all the manual steps which are necessary to obtain the initial values for a nonlinear regression model into a function.

Self-starter functions are specific to a given mean function and calculate starting values for a given dataset.

If SSmicmen() (cf. next slide) is a self-starter function, then you can run the fitting process as

    nls(rate ~ SSmicmen(conc, Vm, K), data=D.minor)

How to write your own self-starter functions: see help or, e.g., Ritz & Streibig (2008), Sec 3.2.

With the standard installation of R, several self-starter functions are already provided.

Self-Starter Functions in the Standard Installation

| Model | Mean Function | Name of Self-Starter Function |
|---|---|---|
| Biexponential | $A_1 \cdot e^{-x \cdot e^{lrc1}} + A_2 \cdot e^{-x \cdot e^{lrc2}}$ | SSbiexp(x, A1, lrc1, A2, lrc2) |
| Asymptotic regression | $Asym + (R_0 - Asym) \cdot e^{-x \cdot e^{lrc}}$ | SSasymp(x, Asym, R0, lrc) |
| Asymptotic regression with offset | $Asym \cdot (1 - e^{-(x - c0) \cdot e^{lrc}})$ | SSasympOff(x, Asym, lrc, c0) |
| Asymptotic regression (c0 = 0) | $Asym \cdot (1 - e^{-x \cdot e^{lrc}})$ | SSasympOrig(x, Asym, lrc) |
| First-order compartment | $x_1 \cdot \dfrac{e^{lKe + lKa - lCl}}{e^{lKa} - e^{lKe}} \cdot \left(e^{-x_2 \cdot e^{lKe}} - e^{-x_2 \cdot e^{lKa}}\right)$ | SSfol(x1, x2, lKe, lKa, lCl) |
| Gompertz | $Asym \cdot e^{-b2 \cdot b3^{x}}$ | SSgompertz(x, Asym, b2, b3) |
| Logistic | $A + \dfrac{B - A}{1 + e^{(xmid - x)/scal}}$ | SSfpl(x, A, B, xmid, scal) |
| Logistic (A = 0) | $\dfrac{Asym}{1 + e^{(xmid - x)/scal}}$ | SSlogis(x, Asym, xmid, scal) |
| Michaelis-Menten | $\dfrac{Vm \cdot x}{K + x}$ | SSmicmen(x, Vm, K) |
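A minimal usage sketch for one of these self-starters, `SSmicmen()`. It assumes the `Puromycin` data frame shipped with R (treated runs) in place of the data frame `D.minor` used on the slides, whose contents are not shown here.

```r
# Treated runs of the Puromycin data; stands in for D.minor (assumption).
treated <- subset(Puromycin, state == "treated")

# No 'start' argument is needed: SSmicmen() computes starting values itself.
fit_ss <- nls(rate ~ SSmicmen(conc, Vm, K), data = treated)
coef(fit_ss)

# The starting values produced by the self-starter can be inspected separately.
init <- getInitial(rate ~ SSmicmen(conc, Vm, K), data = treated)
init
```

The parameter names `Vm` and `K` in the formula become the names of the fitted coefficients.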

3 Inference Based on Linear Approximations

A look at the summary output of the example "Cellulose Membrane" shows that it looks very similar to the summary output of a fitted linear regression model:

    Formula: delta ~ (T1 + T2 * 10^(T3 + T4 * pH))/(10^(T3 + T4 * pH) + 1)

    Parameters:
          Value    Std. Error   t value   Pr(>|t|)
    θ1   163.706   0.1262       1297.26   < 2e-16  ***
    θ2   159.785   0.1594       1002.19   < 2e-16  ***
    θ3     2.675   0.3813          7.02   3.65e-08 ***
    θ4    -0.512   0.0703         -7.28   1.66e-08 ***

    Residual standard error: 0.293137 on 35 degrees of freedom
    Number of iterations to convergence: 7

The Asymptotic Properties

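This approach is based on the local linearization of the model (cf. iterative estimation procedure)

$$Y \approx \eta\langle\theta\rangle + A \beta + E,$$

where $A = A\langle\theta\rangle$ is the $n \times p$ matrix of partial derivatives.

If the estimation procedure has converged, then $\hat\beta = 0$.

Asymptotic Distribution of the Least Squares Estimator

$$\hat\theta \overset{\text{as}}{\sim} \mathcal{N}\langle \theta, V\langle\theta\rangle \rangle$$

with asymptotic covariance matrix

$$V\langle\theta\rangle = \sigma^2 \left(A\langle\theta\rangle^T A\langle\theta\rangle\right)^{-1}$$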

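Application in Practice

To explicitly determine the covariance matrix $V\langle\theta\rangle$, we plug in estimates instead of true parameters:

$A\langle\theta\rangle$ is calculated using $\hat\theta$ ⇒ $\hat{A}$.

For the error variance $\sigma^2$ we plug in the usual estimator.

Hence,

$$\hat{V} = \hat\sigma^2 \left(\hat{A}^T \hat{A}\right)^{-1}$$

where

$$\hat\sigma^2 = \frac{S\langle\hat\theta\rangle}{n - p} = \frac{1}{n - p} \sum_{i=1}^{n} \left(y_i - \eta_i\langle\hat\theta\rangle\right)^2 \quad \text{and} \quad \hat{A} = A\langle\hat\theta\rangle.$$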
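The plug-in covariance matrix can be assembled by hand from a fitted `nls` object and compared with what `vcov()` reports. This is a sketch under assumptions: it uses the treated runs of the `Puromycin` data frame shipped with R, and it takes the derivative matrix from `fit$m$gradient()`, an accessor of the fitted model object.

```r
treated <- subset(Puromycin, state == "treated")
fit <- nls(rate ~ theta1 * conc / (theta2 + conc), data = treated,
           start = list(theta1 = 196, theta2 = 0.048))

A_hat <- fit$m$gradient()          # n x p derivative matrix at theta-hat
n <- nrow(A_hat)
p <- ncol(A_hat)

# Usual variance estimator: residual sum of squares over n - p.
sigma2_hat <- sum(resid(fit)^2) / (n - p)

# V-hat = sigma2-hat * (A-hat' A-hat)^(-1)
V_hat <- sigma2_hat * solve(crossprod(A_hat))

# Standard errors: square roots of the diagonal elements.
se_hat <- sqrt(diag(V_hat))
se_hat
```

Up to numerical precision, `V_hat` should agree with `vcov(fit)` and `se_hat` with the "Std. Error" column of `summary(fit)`.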

Approximate 95%-confidence interval

Hence, an approximate 95% confidence interval for $\theta_k$ is

$$\hat\theta_k \pm \mathrm{se}\langle\hat\theta_k\rangle \cdot q^{t_{n-p}}_{0.975},$$

where $\mathrm{se}\langle\hat\theta_k\rangle$ is the square root of the $k$th diagonal element of $\hat{V}$.

Example "Cellulose Membrane"

From the summary output

    Parameters:
          Value    Std. Error   t value   Pr(>|t|)
    θ1   163.706   0.1262       1297.26   < 2e-16  ***
    θ2   159.785   0.1594       1002.19   < 2e-16  ***
    θ3     2.675   0.3813          7.02   3.65e-08 ***
    θ4    -0.512   0.0703         -7.28   1.66e-08 ***

    Residual standard error: 0.293137 on 35 degrees of freedom

we can calculate the 95% confidence interval for $\theta_1$:

$$163.71 \pm 0.13 \cdot q^{t_{35}}_{0.975}$$
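The interval can be evaluated numerically with the t quantile function. The sketch below only uses the estimate, standard error, and degrees of freedom printed in the summary output above; the variable names are chosen for illustration.

```r
est <- 163.706   # estimate of theta1 from the summary output
se  <- 0.1262    # its standard error
df  <- 35        # residual degrees of freedom

q  <- qt(0.975, df)             # 0.975 quantile of the t distribution
ci <- est + c(-1, 1) * q * se   # lower and upper limit
round(q, 3)
round(ci, 2)
```

With 35 degrees of freedom the quantile is only slightly larger than the normal value 1.96, so the interval is roughly the estimate plus or minus two standard errors.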

Example: Puromycin - back to the initial data set

The Michaelis-Menten model for enzyme kinetics relates the initial "velocity" of an enzymatic reaction to the substrate concentration.

[Figure: Velocity versus Concentration; legend: • treated with Puromycin, △ not treated]

$$Y_i = \frac{\theta_1 x_i}{\theta_2 + x_i} + E_i \quad \text{with } E_i \text{ i.i.d.} \sim \mathcal{N}\langle 0, \sigma^2\rangle \qquad \text{(Michaelis-Menten model)}$$

where $x$ is the substrate concentration [ppm].

Example: Puromycin (4)

Model:

$$Y_i = \frac{\theta_1 x_i}{\theta_2 + x_i} + E_i.$$

Model with and without treatment (all data):

$$Y_i = \frac{(\theta_1 + \theta_3 z_i)\, x_i}{\theta_2 + \theta_4 z_i + x_i} + E_i, \quad \text{where } z_i = \begin{cases} 1 & \text{for "with"} \\ 0 & \text{for "without"} \end{cases}$$

Working hypothesis: Only the asymptotic velocity $\theta_1$ is influenced by adding Puromycin. Hence

Null hypothesis: $\theta_4 = 0$

R output for the example Puromycin

    Parameters:
          Value    Std. Error   t value   Pr(>|t|)
    θ1   160.286   6.8964       23.24     2.04e-15
    θ2     0.048   0.0083        5.76     1.50e-05
    θ3    52.398   9.5513        5.49     2.71e-05
    θ4     0.016   0.0114        1.44     0.167

    Residual standard error: 10.4 on 19 df

Since the P-value of 0.167 is larger than the level of 5%, the null hypothesis is not rejected at the 5% level.

95% confidence interval for $\theta_4$:

$$0.016 \pm 0.0114 \cdot q^{t_{19}}_{0.975}$$
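The model for all data can be fitted along the following lines. The sketch assumes the `Puromycin` data frame shipped with R is the data set behind the output above; the parameter names `T1` to `T4`, the indicator column `z`, and the starting values are illustrative choices.

```r
# Indicator z: 1 for treated runs, 0 for untreated runs.
d <- transform(Puromycin, z = as.numeric(state == "treated"))

# Rough starting values (assumptions read off the scale of the data).
fit_all <- nls(rate ~ (T1 + T3 * z) * conc / (T2 + T4 * z + conc),
               data = d,
               start = list(T1 = 160, T2 = 0.05, T3 = 40, T4 = 0))
tab <- summary(fit_all)$coefficients
tab

# Approximate 95% confidence interval for T4 from the linear approximation.
est   <- tab["T4", "Estimate"]
se    <- tab["T4", "Std. Error"]
ci_T4 <- est + c(-1, 1) * qt(0.975, df.residual(fit_all)) * se
ci_T4
```

An interval for `T4` that covers zero is in line with not rejecting the null hypothesis at the 5% level.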

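Inference for the expected value $E\langle Y \mid x_o \rangle = h\langle x_o; \theta\rangle$ at $x_o$:

Linear Regression

$h\langle x_o, \beta\rangle = x_o^T \beta$ is estimated by $\hat\eta_o = x_o^T \hat\beta$.

The $(1 - \alpha) \cdot 100\%$ confidence interval for $h\langle x_o, \beta\rangle$ is

$$\hat\eta_o \pm q^{t_{n-p}}_{1-\alpha/2} \cdot \mathrm{se}\langle\hat\eta_o\rangle \quad \text{with} \quad \mathrm{se}\langle\hat\eta_o\rangle = \hat\sigma \sqrt{x_o^T (X^T X)^{-1} x_o}$$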
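For the linear case, the interval formula can be checked against `predict()`. The sketch below uses the `cars` data frame shipped with R and the point speed = 15; both are assumptions chosen for illustration and do not appear in the slides.

```r
fit <- lm(dist ~ speed, data = cars)
X  <- model.matrix(fit)          # n x p design matrix (intercept, speed)
x0 <- c(1, 15)                   # x_o: intercept term and speed = 15

eta0      <- sum(x0 * coef(fit))                 # x_o' beta-hat
sigma_hat <- summary(fit)$sigma                  # residual standard error
se_eta0   <- sigma_hat *
  sqrt(drop(t(x0) %*% solve(crossprod(X)) %*% x0))

alpha <- 0.05
ci <- eta0 + c(-1, 1) * qt(1 - alpha / 2, df.residual(fit)) * se_eta0

# Reference: the same interval as computed by predict().
ref <- predict(fit, newdata = data.frame(speed = 15),
               interval = "confidence", level = 1 - alpha)
c(eta0, ci)
ref
```

The hand-computed values and the `fit`, `lwr`, `upr` columns returned by `predict()` should coincide.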