Academic year: 2021

(1)

Nonlinear Regression:

A Powerful Tool With Considerable Complexity

Half-Day 1: Estimation and Standard Inference

Andreas Ruckstuhl

Institut für Datenanalyse und Prozessdesign

Zürcher Hochschule für Angewandte Wissenschaften

WBL Statistik 2016 — Nonlinear Regression

Engineering

IDP Institute of Data Analysis and Process Design

(2)

The Nonlinear Regression Model Model Fitting Inference Based on Linear Approximations

Outline:

Half-Day 1

Estimation and Standard Inference

The Nonlinear Regression Model

Iterative Estimation - Model Fitting

Inference Based on Linear Approximations

Half-Day 2

Improved Inference and Visualisation

Likelihood Based Inference

Profile t Plot and Profile Traces

Parameter Transformations

Half-Day 3

Bootstrap, Prediction and Calibration

Bootstrap

Prediction

Calibration

Outlook

(3)

1 The Nonlinear Regression Model

The regression model

$$Y_i = h\langle x_i^{(1)}, \dots, x_i^{(m)};\ \theta_1, \theta_2, \dots, \theta_p \rangle + E_i$$

with $E_i$ indep. $\sim N\langle 0, \sigma^2 \rangle$.


In the case of the linear regression model,

$$h\langle x_i^{(1)}, \dots, x_i^{(m)};\ \theta_1, \theta_2, \dots, \theta_p \rangle = \theta_1 \cdot 1 + \theta_2 x_i^{(2)} + \dots + \theta_p x_i^{(p)}$$

(i.e., $m = p$).

Examples of nonlinear regression functions:

$$h\langle x_i; \theta \rangle = \frac{\theta_1 x_i^{\theta_3}}{\theta_2 + x_i^{\theta_3}}$$

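$$h\langle x; \theta \rangle = \theta_1 \exp\langle \theta_2 x_i \rangle$$

$$h\langle x; \theta \rangle = \exp\langle \theta_1 \, (x_i^{(1)})^{\theta_3} \exp\langle \theta_2 x_i^{(2)} \rangle \rangle$$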

(4)


Example: Puromycin

The Michaelis-Menten model for enzyme kinetics relates the initial “velocity” of an

enzymatic reaction to the substrate concentration

[Figure: velocity versus concentration; the plotting symbols distinguish enzyme treated with Puromycin from enzyme not treated.]

$$Y_i = \frac{\theta_1 \cdot x_i}{\theta_2 + x_i} + E_i \quad \text{with } E_i \overset{\text{i.i.d.}}{\sim} N\langle 0, \sigma^2 \rangle$$

(Michaelis-Menten model), where $x$ is the substrate concentration [ppm].
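As a short worked check (standard properties of this function, added for orientation rather than taken from the slide), the two parameters can be read directly off the curve:

```latex
\lim_{x \to \infty} \frac{\theta_1 x}{\theta_2 + x} = \theta_1,
\qquad
h\langle \theta_2; \theta \rangle
  = \frac{\theta_1 \theta_2}{\theta_2 + \theta_2}
  = \frac{\theta_1}{2}.
```

So $\theta_1$ is the asymptotic velocity, and $\theta_2$ is the concentration at which half of that velocity is reached.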

(5)

Example: Biochemical Oxygen Demand (BOD)

Biochemical oxygen demand of stream water

[Figure: oxygen demand (mg/l) versus time (days).]

$$Y_i = \theta_1 \cdot \left( 1 - e^{-\theta_2 \cdot x_i} \right) + E_i \quad \text{with } E_i \overset{\text{i.i.d.}}{\sim} N\langle 0, \sigma^2 \rangle,$$

where $Y$ is the biochemical oxygen demand (BOD) [mg/l] and $x$ the incubation time [days].

(6)


Example: Cellulose Membrane

Ratio of protonated to deprotonated carboxyl groups within the pores of a cellulose membrane versus the pH value $x$ of the bulk solution.

[Figure: panels (a) and (b); y (= chem. shift) versus x (= pH).]

Theoretically, this relation is described by the Henderson-Hasselbalch equation,

$$Y_i = \frac{\theta_1 + \theta_2 \cdot 10^{\theta_3 + \theta_4 x_i}}{1 + 10^{\theta_3 + \theta_4 x_i}} + E_i, \quad i = 1, \dots, n,$$

with $E_i \overset{\text{i.i.d.}}{\sim} N\langle 0, \sigma^2 \rangle$.

(7)

Transformably Linear Models

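Example: $h\langle x, \theta \rangle = \theta_1 \cdot \exp\langle \theta_2 / x \rangle$

Applying the log-transformation, we obtain

$$\log\langle h\langle x, \theta \rangle \rangle = \log\langle \theta_1 \cdot \exp\langle \theta_2 / x \rangle \rangle = \log\langle \theta_1 \rangle + \log\langle \exp\langle \theta_2 / x \rangle \rangle = \log\langle \theta_1 \rangle + \theta_2 \cdot \frac{1}{x}$$

Hence $\log\langle h\langle x, \theta \rangle \rangle = \vartheta_1 + \vartheta_2 \tilde{x}$, with $\vartheta_1 = \log\langle \theta_1 \rangle$, $\vartheta_2 = \theta_2$ and $\tilde{x} = 1/x$.

The "complete" transformably linear model is

$$\log\langle Y_i \rangle = \vartheta_1 + \vartheta_2 \tilde{x}_i + E_i, \quad E_i \overset{\text{i.i.d.}}{\sim} N\langle 0, \sigma^2 \rangle$$

The error term is additive.

In the original representation, the model transforms to

$$Y_i = \exp\langle \vartheta_1 + \vartheta_2 \tilde{x}_i + E_i \rangle = \theta_1 \cdot \exp\langle \theta_2 / x_i \rangle \cdot \tilde{E}_i,$$

i.e., $\tilde{E}_i = \exp\langle E_i \rangle$ is log-normally distributed and the error is multiplicative.

Conclusion:

Transform to a linear model only if required by the error structure.

$\Longrightarrow$ Check assumptions on the error term by residual analysis.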

(8)


If there is a deterministic model $y = \theta_1 \cdot x^{\theta_2}$, the random component may be either additive or multiplicative. The Tukey-Anscombe plot of the fitted model will show clearly which model is more adequate for the data.

[Figure: four Tukey-Anscombe plots (residuals versus fitted values). Panel labels: fits lm(log(y) ~ log(x)) and nls(y ~ a * x^b); models y = a * x^b + E and ln(y) = ln(a) + b*ln(x) + E.]
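The comparison in the figure can be sketched with simulated data. The block below is a minimal illustration, not the code behind the slide: the true values `a_true` and `b_true`, the noise level and the design points are all hypothetical choices. It generates data with a multiplicative error, fits both `lm(log(y) ~ log(x))` and `nls(y ~ a * x^b)`, and draws the two Tukey-Anscombe plots.

```r
set.seed(1)
a_true <- 2; b_true <- 1.5            # hypothetical true parameter values
x <- seq(1, 50, length.out = 80)      # hypothetical design points

# Multiplicative (log-normal) error: additive on the log scale
y <- a_true * x^b_true * exp(rnorm(length(x), sd = 0.3))

# Fit on the log scale (transformably linear model)
fit_lm <- lm(log(y) ~ log(x))

# Fit on the original scale; the log-scale fit supplies starting values
start <- list(a = unname(exp(coef(fit_lm)[1])),
              b = unname(coef(fit_lm)[2]))
fit_nls <- nls(y ~ a * x^b, start = start)

# Tukey-Anscombe plots: residuals versus fitted values
op <- par(mfrow = c(1, 2))
plot(fitted(fit_lm), resid(fit_lm), main = "lm(log(y) ~ log(x))",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
plot(fitted(fit_nls), resid(fit_nls), main = "nls(y ~ a * x^b)",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)
```

With data generated this way, the residuals of the `nls()` fit should fan out as the fitted values grow, while those of the log-scale fit should look homogeneous. Replacing the error by an additive `rnorm()` term would be expected to reverse the picture.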

(9)

A selection of transformably linear models

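$h\langle x, \theta \rangle = 1 / (\theta_1 + \theta_2 \exp\langle -x \rangle)$  $\longleftrightarrow$  $1 / h\langle x, \theta \rangle = \theta_1 + \theta_2 \exp\langle -x \rangle$

$h\langle x, \theta \rangle = \theta_1 x / (\theta_2 + x)$  $\longleftrightarrow$  $1 / h\langle x, \theta \rangle = \dfrac{1}{\theta_1} + \dfrac{\theta_2}{\theta_1} \cdot \dfrac{1}{x}$

$h\langle x, \theta \rangle = \theta_1 x^{\theta_2}$  $\longleftrightarrow$  $\ln\langle h\langle x, \theta \rangle \rangle = \ln\langle \theta_1 \rangle + \theta_2 \ln\langle x \rangle$

$h\langle x, \theta \rangle = \theta_1 \exp\langle \theta_2 \, g\langle x \rangle \rangle$  $\longleftrightarrow$  $\ln\langle h\langle x, \theta \rangle \rangle = \ln\langle \theta_1 \rangle + \theta_2 \, g\langle x \rangle$

$h\langle x, \theta \rangle = \exp\langle -\theta_1 x^{(1)} \exp\langle -\theta_2 / x^{(2)} \rangle \rangle$  $\longleftrightarrow$  $\ln\langle \ln\langle h\langle x, \theta \rangle \rangle \rangle = \ln\langle -\theta_1 \rangle + \ln\langle x^{(1)} \rangle - \theta_2 / x^{(2)}$

$h\langle x, \theta \rangle = \theta_1 \, (x^{(1)})^{\theta_2} (x^{(2)})^{\theta_3}$  $\longleftrightarrow$  $\ln\langle h\langle x, \theta \rangle \rangle = \ln\langle \theta_1 \rangle + \theta_2 \ln\langle x^{(1)} \rangle + \theta_3 \ln\langle x^{(2)} \rangle$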

(10)


2 Model Fitting Using an Iterative Algorithm

The method of least squares: find the minimum of

$$S\langle \theta \rangle = \sum_{i=1}^{n} \left( y_i - \eta_i\langle \theta \rangle \right)^2 \quad \text{with } \eta_i\langle \theta \rangle = h\langle x_i; \theta \rangle .$$

Key steps for minimising:

Approximate the surface $\eta\langle \theta \rangle$ at a temporarily best value $\theta^{(\ell)}$ by a tangent plane, where $\eta\langle \theta^{(\ell)} \rangle$ is the point of contact.

Search for the point on the plane that is closest to $Y$ (this is a linear regression fitting problem).

The new point lies on the plane but not on the surface. However, it defines a parameter vector $\theta^{(\ell+1)}$, which is used in the next iteration step.

(11)

Algebraically formulated

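1. Linear approximation of $\eta\langle \theta \rangle$ at $\theta^{(m)}$:

$$\eta\langle \theta \rangle \approx \eta\langle \theta^{(m)} \rangle + A^{(m)} \left( \theta - \theta^{(m)} \right),$$

where $A^{(m)} = A\langle \theta^{(m)} \rangle$ is the derivative matrix of $\eta\langle \theta \rangle$ at $\theta^{(m)}$ in the $m$-th iteration step.

2. (Local) linear model:

$$\tilde{Y}^{(m)} \approx A^{(m)} \beta^{(m)} + E, \quad \text{where } \tilde{Y}^{(m)} = Y - \eta\langle \theta^{(m)} \rangle \text{ and } \beta^{(m)} = \theta - \theta^{(m)}.$$

3. Least-squares estimation for $\beta^{(m)}$ $\Longrightarrow$ $\hat{\beta}^{(m)}$. Set $\theta^{(m+1)} = \theta^{(m)} + \hat{\beta}^{(m)}$.

4. Repeat steps 1 to 3 until the procedure converges.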
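The four steps can be sketched directly in R. This is a minimal, undamped Gauss-Newton loop for the Michaelis-Menten model, assuming that R's built-in `Puromycin` data (treated subset) matches the slides' example. The starting values 196 and 0.048 are the ones derived on the starting-value slide; the helper names `eta` and `A`, the stopping rule and the iteration cap are illustrative choices. `nls()` itself adds safeguards such as step halving.

```r
# Treated subset of R's built-in Puromycin data (assumed to match the slides)
d <- subset(Puromycin, state == "treated")
x <- d$conc
y <- d$rate

# eta<theta>: vector of model values at all x_i
eta <- function(theta) theta[1] * x / (theta[2] + x)

# A<theta>: n x p derivative matrix of eta with respect to theta
A <- function(theta) {
  cbind(x / (theta[2] + x),                  # d eta / d theta1
        -theta[1] * x / (theta[2] + x)^2)    # d eta / d theta2
}

theta <- c(196, 0.048)                       # starting values theta^(0)
for (m in 1:50) {
  y_tilde  <- y - eta(theta)                 # step 2: working response
  beta_hat <- qr.solve(A(theta), y_tilde)    # step 3: least squares for beta
  theta    <- theta + beta_hat               # step 3: update to theta^(m+1)
  # step 4: stop once the increment is negligible relative to theta
  if (sqrt(sum(beta_hat^2)) < 1e-8 * sqrt(sum(theta^2))) break
}
theta
```

The result can be compared with `coef(nls(rate ~ T1 * conc / (T2 + conc), data = d, start = list(T1 = 196, T2 = 0.048)))`, which should agree to several digits.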

(12)


Starting Values

Interpret the behaviour of the regression function in terms of the parameters, analytically or graphically.

Transform the regression function to obtain simpler, preferably linear, behaviour.

Use your knowledge from previous or similar experiments.

Example Puromycin (2) - using transformation

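$$y \approx h\langle x, \theta \rangle = \frac{\theta_1 \cdot x}{\theta_2 + x}$$

Transform to linearity:

$$\tilde{y} = \frac{1}{y} \approx \frac{1}{h\langle x, \theta \rangle} = \frac{\theta_2}{\theta_1} \cdot \frac{1}{x} + \frac{1}{\theta_1},$$

that is, $\tilde{y} \approx \beta_1 \tilde{x} + \beta_0$ with $\tilde{x} = 1/x$.

Linear regression $\Longrightarrow$ $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1)^T = (0.005, 0.00025)^T$

Starting values:

$$\hat{\theta}_1^{(0)} = \frac{1}{\hat{\beta}_0} \approx 196, \qquad \hat{\theta}_2^{(0)} = \frac{\hat{\beta}_1}{\hat{\beta}_0} \approx 0.048$$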
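The transformation above can be reproduced with a few lines of R. The sketch assumes that the treated subset of R's built-in `Puromycin` data is the data behind the slide; the object names `lb`, `beta_hat` and `theta0` are illustrative.

```r
# Treated subset of R's built-in Puromycin data (assumed to match the slides)
d <- subset(Puromycin, state == "treated")

# Linear regression of 1/y on 1/x
lb <- lm(I(1 / rate) ~ I(1 / conc), data = d)
beta_hat <- unname(coef(lb))          # (beta0_hat, beta1_hat)

# Back-transform to starting values for the nonlinear fit
theta0 <- c(theta1 = 1 / beta_hat[1],
            theta2 = beta_hat[2] / beta_hat[1])

signif(beta_hat, 3)
signif(theta0, 3)
```

These values can then be passed on as `start = list(T1 = theta0[["theta1"]], T2 = theta0[["theta2"]])` in a call to `nls()`.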

(13)

Example Puromycin (3)

[Figure, left panel: 1/Velocity versus 1/Concentration. Right panel: Velocity versus Concentration.]

Left: Regression line used for determining the starting values for $\theta_1$ and $\theta_2$.

Right: Regression function $h\langle x; \theta \rangle$ based on the starting values $\theta = \theta^{(0)}$ and based on the least-squares estimate $\theta = \hat{\theta}$ (solid line), respectively.

(14)


Example: Cellulose membrane (2) - starting values

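$$h\langle x; \theta \rangle = \frac{\theta_1 + \theta_2 \cdot 10^{\theta_3 + \theta_4 x}}{1 + 10^{\theta_3 + \theta_4 x}} \quad \text{with } \theta_4 < 0$$

We know:

$h\langle x; \theta \rangle \longrightarrow \theta_1$ for $x \to \infty$

$h\langle x; \theta \rangle \longrightarrow \theta_2$ for $x \to -\infty$

From the data, we obtain $\theta_1^{(0)} = 163.7$ and $\theta_2^{(0)} = 159.5$.

Let

$$\tilde{y}_i = \log_{10}\left\langle \frac{\theta_1^{(0)} - y_i}{y_i - \theta_2^{(0)}} \right\rangle, \quad \text{hence } \tilde{y}_i \approx \theta_3 + \theta_4 x_i .$$

Simple linear regression results in starting values for both $\theta_3$ and $\theta_4$:

$\theta_3^{(0)} = 1.83$ and $\theta_4^{(0)} = -0.36$.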
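Why this transformation is linear in $x$ can be checked in a few lines. The following derivation is added for clarity; the abbreviation $T$ is introduced here and is not part of the slides.

```latex
\text{Let } T = 10^{\theta_3 + \theta_4 x}, \text{ so that } h = \frac{\theta_1 + \theta_2 T}{1 + T}. \text{ Then}
\\[4pt]
\theta_1 - h = \frac{(\theta_1 - \theta_2)\, T}{1 + T},
\qquad
h - \theta_2 = \frac{\theta_1 - \theta_2}{1 + T},
\\[4pt]
\frac{\theta_1 - h}{h - \theta_2} = T
\quad\Longrightarrow\quad
\log_{10}\left\langle \frac{\theta_1 - h}{h - \theta_2} \right\rangle = \theta_3 + \theta_4 x .
```

Replacing $h$ by the observations $y_i$ and $\theta_1, \theta_2$ by their starting values gives $\tilde{y}_i$, which is why the relation holds only approximately for the data.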

(15)

Example: Cellulose membrane (3)

[Figure, panel (a): y versus x (= pH). Panel (b): y (= chem. shift) versus x (= pH).]

(a) Regression line used for determining the starting values for $\theta_3$ and $\theta_4$.

(b) Regression function $h\langle x; \theta \rangle$ based on the starting values $\theta = \theta^{(0)}$.

(16)


Self-Starter Function

For repeated use of the same nonlinear regression model, use an automated way of providing starting values.

Basically, collect all the manual steps that are necessary to obtain the initial values for a nonlinear regression model into a function.

Self-starter functions are specific to a given mean function and calculate starting values for a given dataset.

If SSmicmen() (cf. next slide) is a self-starter function, then you can run the fitting process as

nls(rate ~ SSmicmen(conc, Vm, K), data=D.minor)

How to write your own self-starter functions: see help or, e.g., Ritz & Streibig (2008), Sec 3.2.
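A runnable version of such a call is sketched below. The data frame `D.minor` from the slide is not available here, so the treated subset of R's built-in `Puromycin` data is used as an assumed stand-in; `SSmicmen()` and `getInitial()` are part of the standard `stats` package.

```r
# Treated subset of R's built-in Puromycin data (stand-in for D.minor)
d <- subset(Puromycin, state == "treated")

# No 'start' argument is needed: SSmicmen() computes starting values itself
fit <- nls(rate ~ SSmicmen(conc, Vm, K), data = d)
coef(fit)

# The starting values that the self-starter supplies can be inspected directly
getInitial(rate ~ SSmicmen(conc, Vm, K), data = d)
```

The same pattern applies to the other `SS...()` functions that ship with R.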

With the standard installation of R, the self-starter functions listed on the next slide are available.

(17)

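Self-Starter Functions in the Standard Installation

Model | Mean Function | Name of Self-Starter Function
Biexponential | $A_1 \cdot e^{-x \cdot e^{\mathrm{lrc1}}} + A_2 \cdot e^{-x \cdot e^{\mathrm{lrc2}}}$ | SSbiexp(x, A1, lrc1, A2, lrc2)
Asymptotic regression | $\mathrm{Asym} + (R_0 - \mathrm{Asym}) \cdot e^{-x \cdot e^{\mathrm{lrc}}}$ | SSasymp(x, Asym, R0, lrc)
Asymptotic regression with offset | $\mathrm{Asym} \cdot (1 - e^{-(x - c_0) \cdot e^{\mathrm{lrc}}})$ | SSasympOff(x, Asym, lrc, c0)
Asymptotic regression (c0 = 0) | $\mathrm{Asym} \cdot (1 - e^{-x \cdot e^{\mathrm{lrc}}})$ | SSasympOrig(x, Asym, lrc)
First-order compartment | $x_1 \cdot \dfrac{e^{\mathrm{lKe} + \mathrm{lKa} - \mathrm{lCl}}}{e^{\mathrm{lKa}} - e^{\mathrm{lKe}}} \cdot \left( e^{-x_2 e^{\mathrm{lKe}}} - e^{-x_2 e^{\mathrm{lKa}}} \right)$ | SSfol(x1, x2, lKe, lKa, lCl)
Gompertz | $\mathrm{Asym} \cdot e^{-b_2 b_3^{x}}$ | SSgompertz(x, Asym, b2, b3)
Logistic | $A + \dfrac{B - A}{1 + e^{(\mathrm{xmid} - x)/\mathrm{scal}}}$ | SSfpl(x, A, B, xmid, scal)
Logistic (A = 0) | $\dfrac{\mathrm{Asym}}{1 + e^{(\mathrm{xmid} - x)/\mathrm{scal}}}$ | SSlogis(x, Asym, xmid, scal)
Michaelis-Menten | $\dfrac{V_m \cdot x}{K + x}$ | SSmicmen(x, Vm, K)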

(18)


3 Inference Based on Linear Approximations

A look at the summary output of the example "Cellulose Membrane" shows that it looks very similar to the summary output of a fitted linear regression model:

Formula: delta ~ (T1 + T2 * 10^(T3 + T4 * pH))/(10^(T3 + T4 * pH) + 1)

Parameters:
      Value     Std. Error   t value    Pr(>|t|)
θ1    163.706   0.1262       1297.26    < 2e-16    ***
θ2    159.785   0.1594       1002.19    < 2e-16    ***
θ3      2.675   0.3813          7.02    3.65e-08   ***
θ4     -0.512   0.0703         -7.28    1.66e-08   ***

Residual standard error: 0.293137 on 35 degrees of freedom

Number of iterations to convergence: 7

(19)

The Asymptotic Properties

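This approach is based on the local linearization of the model (cf. iterative estimation procedure)

$$Y \approx \eta\langle \hat{\theta} \rangle + A\langle \hat{\theta} \rangle \, \beta + E,$$

where $A\langle \theta \rangle$ is the $n \times p$ matrix of partial derivatives.

If the estimation procedure has converged, then $\hat{\beta} = 0$.

Asymptotic Distribution of the Least Squares Estimator

$$\hat{\theta} \overset{\text{as.}}{\sim} N\langle \theta, V\langle \theta \rangle \rangle$$

with asymptotic covariance matrix $V\langle \theta \rangle = \sigma^2 \left( A\langle \theta \rangle^T A\langle \theta \rangle \right)^{-1}$.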

(20)


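Application in Practice

To explicitly determine the covariance matrix $V\langle \theta \rangle$, we plug in estimates instead of the true parameters:

$A\langle \theta \rangle$ is calculated using $\hat{\theta}$ $\Longrightarrow$ $\hat{A}$.

For the error variance $\sigma^2$ we plug in the usual estimator.

Hence,

$$\hat{V} = \hat{\sigma}^2 \left( \hat{A}^T \hat{A} \right)^{-1}$$

where

$$\hat{\sigma}^2 = \frac{S\langle \hat{\theta} \rangle}{n - p} = \frac{1}{n - p} \sum_{i=1}^{n} \left( y_i - \eta_i\langle \hat{\theta} \rangle \right)^2 \quad \text{and} \quad \hat{A} = A\langle \hat{\theta} \rangle .$$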
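The plug-in formula can be checked numerically against what `vcov()` reports for an `nls` fit. The sketch below assumes the Michaelis-Menten fit to the treated subset of R's built-in `Puromycin` data; the derivative matrix is written out analytically, and the names `A_hat`, `sigma2_hat` and `V_hat` are illustrative.

```r
# Michaelis-Menten fit to the treated subset of R's built-in Puromycin data
d   <- subset(Puromycin, state == "treated")
fit <- nls(rate ~ T1 * conc / (T2 + conc), data = d,
           start = list(T1 = 196, T2 = 0.048))

th <- unname(coef(fit))               # theta_hat
x  <- d$conc

# A_hat: derivative matrix of eta<theta> evaluated at theta_hat
A_hat <- cbind(x / (th[2] + x),
               -th[1] * x / (th[2] + x)^2)

n <- nrow(d)
p <- length(th)
sigma2_hat <- sum(resid(fit)^2) / (n - p)          # S<theta_hat> / (n - p)
V_hat <- sigma2_hat * solve(t(A_hat) %*% A_hat)    # plug-in covariance matrix

sqrt(diag(V_hat))                                  # standard errors
all.equal(V_hat, vcov(fit), check.attributes = FALSE, tolerance = 1e-4)
```

The standard errors printed by `summary(fit)` are the square roots of the diagonal of this matrix.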

(21)

Approximate 95%-confidence interval

Hence, an approximate 95%-confidence interval for $\theta_k$ is

$$\hat{\theta}_k \pm \widehat{\mathrm{se}}\langle \hat{\theta}_k \rangle \cdot q^{t_{n-p}}_{0.975},$$

where $\widehat{\mathrm{se}}\langle \hat{\theta}_k \rangle$ is the square root of the $k$-th diagonal element of $\hat{V}$.

Example "Cellulose Membrane"

From the summary output

Parameters:
      Value     Std. Error   t value    Pr(>|t|)
θ1    163.706   0.1262       1297.26    < 2e-16    ***
θ2    159.785   0.1594       1002.19    < 2e-16    ***
θ3      2.675   0.3813          7.02    3.65e-08   ***
θ4     -0.512   0.0703         -7.28    1.66e-08   ***

Residual standard error: 0.293137 on 35 degrees of freedom

we can calculate the 95% confidence interval for $\theta_1$: $163.71 \pm 0.13 \cdot q^{t_{35}}_{0.975}$
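The arithmetic can be carried out with `qt()`. This short sketch only uses the numbers from the summary output above; the variable names are illustrative.

```r
est <- 163.706   # estimate of theta1 from the summary output
se  <- 0.1262    # its standard error
df  <- 35        # residual degrees of freedom

q  <- qt(0.975, df)               # 0.975 quantile of the t distribution
ci <- est + c(-1, 1) * q * se     # approximate 95% confidence interval
round(ci, 2)
```

The quantile is about 2.03, so the interval is roughly 163.45 to 163.96. With a fitted object at hand, `confint.default(fit)` gives intervals of the same linear-approximation type, but it uses normal rather than t quantiles.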

(22)


Example: Puromycin - back to the initial data set

The Michaelis-Menten model for enzyme kinetics relates the initial “velocity” of an

enzymatic reaction to the substrate concentration

[Figure: velocity versus concentration; the plotting symbols distinguish enzyme treated with Puromycin from enzyme not treated.]

$$Y_i = \frac{\theta_1 \cdot x_i}{\theta_2 + x_i} + E_i \quad \text{with } E_i \overset{\text{i.i.d.}}{\sim} N\langle 0, \sigma^2 \rangle$$

(Michaelis-Menten model), where $x$ is the substrate concentration [ppm].

(23)

Example: Puromycin (4)

Model: $Y_i = \dfrac{\theta_1 x_i}{\theta_2 + x_i} + E_i$.

Model with and without treatment (all data):

$$Y_i = \frac{(\theta_1 + \theta_3 z_i) \, x_i}{\theta_2 + \theta_4 z_i + x_i} + E_i,$$

where $z_i = 1$ for "with" and $z_i = 0$ for "without" treatment.

Working hypothesis: Only the asymptotic velocity $\theta_1$ is influenced by adding Puromycin. Hence, null hypothesis: $\theta_4 = 0$.

R output for the example Puromycin

Parameters:
      Value     Std. Error   t value   Pr(>|t|)
θ1    160.286   6.8964       23.24     2.04e-15
θ2      0.048   0.0083        5.76     1.50e-05
θ3     52.398   9.5513        5.49     2.71e-05
θ4      0.016   0.0114        1.44     0.167

Residual standard error: 10.4 on 19 df

Since the P-value of 0.167 is larger than the level of 5%, the null hypothesis is not rejected at the 5% level.

95% confidence interval for $\theta_4$: $0.016 \pm 0.0114 \cdot q^{t_{19}}_{0.975}$
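The fit and the interval can be reproduced as follows. This is a sketch assuming that R's built-in `Puromycin` data (both treatment states) is the data set behind the output. The indicator `z`, the parameter names `T1` to `T4` and the starting values are illustrative choices.

```r
# All Puromycin data, with indicator z = 1 for treated, 0 for untreated
d <- transform(Puromycin, z = as.numeric(state == "treated"))

fit <- nls(rate ~ (T1 + T3 * z) * conc / (T2 + T4 * z + conc), data = d,
           start = list(T1 = 160, T2 = 0.05, T3 = 50, T4 = 0.01))

tab <- coef(summary(fit))             # estimates, std. errors, t and P values
tab

# Approximate 95% confidence interval for theta4
est <- tab["T4", "Estimate"]
se  <- tab["T4", "Std. Error"]
ci  <- est + c(-1, 1) * qt(0.975, df.residual(fit)) * se
ci
```

The interval should contain zero, in line with the test not rejecting $\theta_4 = 0$ at the 5% level.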

(24)


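Inference for the expected value $E\langle Y \mid x_o \rangle = h\langle x_o; \theta \rangle$ at $x_o$:

Linear Regression

$h\langle x_o, \beta \rangle = x_o^T \beta$ is estimated by $\hat{\eta}_o = x_o^T \hat{\beta}$.

The $(1 - \alpha) \cdot 100\%$ confidence interval for $h\langle x_o, \beta \rangle$ is

$$\hat{\eta}_o \pm q^{t_{n-p}}_{1-\alpha/2} \cdot \mathrm{se}\langle \hat{\eta}_o \rangle \quad \text{with} \quad \mathrm{se}\langle \hat{\eta}_o \rangle = \hat{\sigma} \sqrt{x_o^T (X^T X)^{-1} x_o}$$

Nonlinear Regression

$h\langle x_o, \theta \rangle$ is estimated by $\hat{\eta}_o = h\langle x_o, \hat{\theta} \rangle$.

The $(1 - \alpha) \cdot 100\%$ confidence interval for $h\langle x_o, \theta \rangle$ is

$$h\langle x_o, \hat{\theta} \rangle \pm q^{t_{n-p}}_{1-\alpha/2} \cdot \mathrm{se}\langle \hat{\eta}_o \rangle \quad \text{with} \quad \mathrm{se}\langle \hat{\eta}_o \rangle = \hat{\sigma} \sqrt{\hat{a}_o^T \left( \hat{A}^T \hat{A} \right)^{-1} \hat{a}_o}$$

and

$$\hat{a}_o = \left. \frac{\partial h\langle x_o, \theta \rangle}{\partial \theta} \right|_{\theta = \hat{\theta}} .$$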
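For the BOD model, the nonlinear interval can be sketched as below, assuming that R's built-in `BOD` data is the data set of the example. The grid `x_o`, the starting values and the object names are illustrative. Since `vcov(fit)` already equals $\hat{\sigma}^2 (\hat{A}^T \hat{A})^{-1}$, the standard error reduces to $\sqrt{\hat{a}_o^T \hat{V} \hat{a}_o}$.

```r
# Fit theta1 * (1 - exp(-theta2 * x)) to R's built-in BOD data
fit <- nls(demand ~ T1 * (1 - exp(-T2 * Time)), data = BOD,
           start = list(T1 = 20, T2 = 0.5))
th <- unname(coef(fit))

x_o   <- seq(0.5, 8, by = 0.5)                  # points at which to predict
eta_o <- th[1] * (1 - exp(-th[2] * x_o))        # h<x_o, theta_hat>

# a_o: gradient of h with respect to theta at theta_hat, one row per x_o
a_o <- cbind(1 - exp(-th[2] * x_o),
             th[1] * x_o * exp(-th[2] * x_o))

# se<eta_o> = sqrt(a_o^T V_hat a_o), computed row by row
se_o <- sqrt(rowSums((a_o %*% vcov(fit)) * a_o))

q <- qt(0.975, df.residual(fit))
band <- data.frame(x = x_o, fit = eta_o,
                   lower = eta_o - q * se_o,
                   upper = eta_o + q * se_o)
band
```

Plotting `lower` and `upper` against `x` gives a pointwise confidence band of the kind shown on the next slide.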

(25)

Confidence Band

Left: Confidence band (i.e., pointwise confidence intervals) for a fitted straight line (linear regression model).

Right: Confidence band for the fitted curve $h\langle x, \theta \rangle$ of the example "Biochemical Oxygen Demand".

[Figure, left panel: log(PCB Concentration) versus Years^(1/3). Right panel: Oxygen Demand versus Days.]

(26)


Variable Selection

How about variable selection in nonlinear regression?

There is no one-to-one correspondence between predictor variables and parameters as in linear regression. Hence, the number of variables may differ from the number of parameters.

There are hardly ever problems in which some of the variables are in question (the model is derived from subject matter theory).

However, there are problems where a submodel (a model nested within the full model) may be adequate to describe the data; cf. Example Puromycin, Slide 17, Half-Day 1.

If we have a collection of candidate models
- which need not be submodels of each other, and
- the subject matter is somehow indifferent to these models,
- but we want to find the most appropriate model for the data,

one can use Akaike's information criterion (AIC) to select the best model (and/or run a residual analysis).
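Both ideas can be tried on the Puromycin example. The sketch below assumes R's built-in `Puromycin` data and compares the full four-parameter model with the submodel that sets $\theta_4 = 0$; the object names and starting values are illustrative. `AIC()` does not require the models to be nested, whereas the F-test from `anova()` does.

```r
d <- transform(Puromycin, z = as.numeric(state == "treated"))

# Full model: treatment may change both parameters
fit_full <- nls(rate ~ (T1 + T3 * z) * conc / (T2 + T4 * z + conc), data = d,
                start = list(T1 = 160, T2 = 0.05, T3 = 50, T4 = 0.01))

# Submodel: treatment changes only the asymptotic velocity (theta4 = 0)
fit_sub <- nls(rate ~ (T1 + T3 * z) * conc / (T2 + conc), data = d,
               start = list(T1 = 160, T2 = 0.05, T3 = 50))

aic <- AIC(fit_sub, fit_full)         # smaller AIC indicates the preferred model
aic

cmp <- anova(fit_sub, fit_full)       # F-test, valid because the models are nested
cmp
```

Residual plots of both fits, for example `plot(fitted(fit_sub), resid(fit_sub))`, remain the complementary check mentioned above.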

(27)

Take Home Message Half-Day 1

In nonlinear regression,

$$Y_i = h\langle x_i, \theta \rangle + E_i,$$

functions $h$ are analysed which are not linear functions of the unknown parameters $\theta$.

Such models are often derived from the subject matter theory.

The flexibility of this model class is bought by a more complex estimation and inference theory.
- Parameter estimation is done by an iterative procedure which needs appropriate starting values.
- Inference is based on an asymptotic theory. For finite sample size the results just hold approximately.
