Nonlinear Regression:
A Powerful Tool With Considerable Complexity
Half-Day 1: Estimation and Standard Inference
Andreas Ruckstuhl
Institut für Datenanalyse und Prozessdesign
Zürcher Hochschule für Angewandte Wissenschaften
WBL Statistik 2016 — Nonlinear Regression
Engineering
IDP Institute of Data Analysis and Process Design
The Nonlinear Regression Model Model Fitting Inference Based on Linear Approximations
Outline:
Half-Day 1
Estimation and Standard Inference
The Nonlinear Regression Model
Iterative Estimation - Model Fitting
Inference Based on Linear Approximations
Half-Day 2
Improved Inference and Visualisation
Likelihood Based Inference
Profile t Plot and Profile Traces
Parameter Transformations
Half-Day 3
Bootstrap, Prediction and Calibration
Bootstrap
Prediction
Calibration
Outlook
1 The Nonlinear Regression Model
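The regression model
$$Y_i = h\langle x_i^{(1)}, \ldots, x_i^{(m)};\ \theta_1, \theta_2, \ldots, \theta_p \rangle + E_i \quad \text{with } E_i \text{ indep. } \sim \mathcal{N}\langle 0, \sigma^2 \rangle$$
In the case of the linear regression model,
$$h\langle x_i^{(1)}, \ldots, x_i^{(m)};\ \theta_1, \theta_2, \ldots, \theta_p \rangle = \theta_1 \cdot 1 + \theta_2 x_i^{(2)} + \ldots + \theta_p x_i^{(p)} \quad \text{(i.e., } m = p\text{)}$$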
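Examples of nonlinear regression functions:
$$h\langle x_i; \theta \rangle = \frac{\theta_1 x_i^{\theta_3}}{\theta_2 + x_i^{\theta_3}}$$
$$h\langle x_i; \theta \rangle = \theta_1 \exp\langle \theta_2 x_i \rangle$$
$$h\langle x_i; \theta \rangle = \exp\left\langle \theta_1 \left(x_i^{(1)}\right)^{\theta_3} \exp\langle -\theta_2 x_i^{(2)} \rangle \right\rangle$$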
Example: Puromycin
The Michaelis-Menten model for enzyme kinetics relates the initial “velocity” of an
enzymatic reaction to the substrate concentration
[Figure: velocity versus concentration, with separate symbols for enzyme treated with Puromycin and not treated]
$$Y_i = \frac{\theta_1 x_i}{\theta_2 + x_i} + E_i \quad \text{with } E_i \text{ i.i.d. } \sim \mathcal{N}\langle 0, \sigma^2 \rangle \qquad \text{(Michaelis-Menten model)}$$
where $x$ is the substrate concentration [ppm].
Example: Biochemical Oxygen Demand (BOD)
Biochemical oxygen demand of stream water
[Figure: oxygen demand (mg/l) versus time (days)]
$$Y_i = \theta_1 \cdot \left(1 - e^{\theta_2 \cdot x_i}\right) + E_i \quad \text{with } E_i \text{ i.i.d. } \sim \mathcal{N}\langle 0, \sigma^2 \rangle,$$
where $Y$ is the biochemical oxygen demand (BOD) [mg/l] and $x$ the incubation time [days].
Example: Cellulose Membrane
Ratio of protonated to deprotonated carboxyl groups within the pores of a cellulose membrane versus the pH value $x$ of the bulk solution
[Figure: y (= chem. shift) versus x (= pH)]
Theoretically, this relation is described by the Henderson-Hasselbalch equation,
$$Y_i = \frac{\theta_1 + \theta_2 \cdot 10^{\theta_3 + \theta_4 x_i}}{1 + 10^{\theta_3 + \theta_4 x_i}} + E_i, \quad i = 1, \ldots, n,$$
with $E_i$ i.i.d. $\sim \mathcal{N}\langle 0, \sigma^2 \rangle$.
Transformably Linear Models
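Example: $h\langle x, \theta \rangle = \theta_1 \cdot \exp\langle \theta_2 / x \rangle$
Applying the log-transformation, we obtain
$$\log\langle h\langle x, \theta \rangle\rangle = \log\langle \theta_1 \cdot \exp\langle \theta_2 / x \rangle\rangle = \log\langle \theta_1 \rangle + \log\langle \exp\langle \theta_2 / x \rangle\rangle = \log\langle \theta_1 \rangle + \theta_2 \cdot \frac{1}{x}$$
Hence $\log\langle h\langle x, \theta \rangle\rangle = \vartheta_1 + \vartheta_2 \tilde{x}$ with $\vartheta_1 = \log\langle \theta_1 \rangle$, $\vartheta_2 = \theta_2$ and $\tilde{x} = 1/x$.
The "complete" transformably linear model is
$$\log\langle Y_i \rangle = \vartheta_1 + \vartheta_2 \tilde{x}_i + E_i, \quad E_i \text{ i.i.d. } \sim \mathcal{N}\langle 0, \sigma^2 \rangle$$
The error term is additive.
In the original representation, the model transforms to
$$Y_i = \exp\langle \vartheta_1 + \vartheta_2 \tilde{x}_i + E_i \rangle = \theta_1 \cdot \exp\langle \theta_2 / x_i \rangle \cdot \tilde{E}_i,$$
i.e., $\tilde{E}_i = \exp\langle E_i \rangle$ is log-normally distributed and the error is multiplicative.
Conclusion:
Transform to a linear model only if required by the error structure.
⇒ Check the assumptions on the error term by residual analysis.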
If there is a deterministic model $y = \theta_1 \cdot x^{\theta_2}$, the random component may be either additive or multiplicative. The Tukey-Anscombe plot of the fitted model will show clearly which model is more adequate for the data.
[Figure: Tukey-Anscombe plots of the fits lm(log(y) ~ log(x)) and nls(y ~ a * x^b) for the two error models y = a * x^b + E and ln(y) = ln(a) + b*ln(x) + E]
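The comparison in the figure can be imitated with simulated data. The following is only a sketch under stated assumptions: the true values `a_true` and `b_true`, the sample size, the error standard deviation and the seed are all made up for illustration, and only the multiplicative case is simulated. The correlation between absolute residuals and fitted values is used here as a crude numerical stand-in for reading the Tukey-Anscombe plot.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n      <- 200
a_true <- 2                       # hypothetical true parameters
b_true <- 1.5
x <- runif(n, 1, 50)

# Data with multiplicative (log-normal) error: ln(y) = ln(a) + b*ln(x) + E
y <- a_true * x^b_true * exp(rnorm(n, sd = 0.3))

# Fit 1: linear model on the log scale (matches the error structure)
fit_lm <- lm(log(y) ~ log(x))

# Fit 2: nonlinear least squares on the original scale (assumes additive error);
# the log-scale estimates serve as starting values
start  <- list(a = unname(exp(coef(fit_lm)[1])), b = unname(coef(fit_lm)[2]))
fit_nls <- nls(y ~ a * x^b, start = start)

# Crude check of the residual spread: does |residual| grow with the fitted value?
spread_lm  <- cor(abs(resid(fit_lm)),  fitted(fit_lm))
spread_nls <- cor(abs(resid(fit_nls)), fitted(fit_nls))
c(lm = spread_lm, nls = spread_nls)
```

Plotting `resid(fit_nls)` against `fitted(fit_nls)` gives the corresponding Tukey-Anscombe plot; for data of this kind one would expect a funnel shape for the `nls` fit and a roughly constant spread for the `lm` fit.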
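A selection of transformably linear models
$$h\langle x, \theta \rangle = 1 / (\theta_1 + \theta_2 \exp\langle -x \rangle) \quad\longleftrightarrow\quad 1 / h\langle x, \theta \rangle = \theta_1 + \theta_2 \exp\langle -x \rangle$$
$$h\langle x, \theta \rangle = \theta_1 x / (\theta_2 + x) \quad\longleftrightarrow\quad 1 / h\langle x, \theta \rangle = 1/\theta_1 + \frac{\theta_2}{\theta_1} \cdot \frac{1}{x}$$
$$h\langle x, \theta \rangle = \theta_1 x^{\theta_2} \quad\longleftrightarrow\quad \ln\langle h\langle x, \theta \rangle\rangle = \ln\langle \theta_1 \rangle + \theta_2 \ln\langle x \rangle$$
$$h\langle x, \theta \rangle = \theta_1 \exp\langle \theta_2\, g\langle x \rangle\rangle \quad\longleftrightarrow\quad \ln\langle h\langle x, \theta \rangle\rangle = \ln\langle \theta_1 \rangle + \theta_2\, g\langle x \rangle$$
$$h\langle x, \theta \rangle = \exp\langle -\theta_1 x^{(1)} \exp\langle -\theta_2 / x^{(2)} \rangle\rangle \quad\longleftrightarrow\quad \ln\langle \ln\langle h\langle x, \theta \rangle\rangle\rangle = \ln\langle -\theta_1 \rangle + \ln\langle x^{(1)} \rangle - \theta_2 / x^{(2)}$$
$$h\langle x, \theta \rangle = \theta_1 \left(x^{(1)}\right)^{\theta_2} \left(x^{(2)}\right)^{\theta_3} \quad\longleftrightarrow\quad \ln\langle h\langle x, \theta \rangle\rangle = \ln\langle \theta_1 \rangle + \theta_2 \ln\langle x^{(1)} \rangle + \theta_3 \ln\langle x^{(2)} \rangle$$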
2 Model Fitting Using an Iterative Algorithm
The method of least squares:
Find the minimum of
$$S\langle \theta \rangle = \sum_{i=1}^{n} \left(y_i - \eta_i\langle \theta \rangle\right)^2 \quad \text{with } \eta_i\langle \theta \rangle = h\langle x_i; \theta \rangle.$$
Key steps for minimising:
- Approximate the surface $\eta\langle \theta \rangle$ at a temporarily best value $\theta^{(\ell)}$ by a tangent plane, where $\eta\langle \theta^{(\ell)} \rangle$ is the point of contact.
- Search for the point on the plane which is closest to $Y$ (that is a linear regression fitting problem).
- The new point lies on the plane but not on the surface. However, it defines a parameter vector $\theta^{(\ell+1)}$ which will be used in the next iteration step.
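Algebraically formulated
1. Linear approximation of $\eta\langle \theta \rangle$ at $\theta^{(m)}$:
$$\eta\langle \theta \rangle \approx \eta\langle \theta^{(m)} \rangle + A^{(m)} \left(\theta - \theta^{(m)}\right),$$
where $A^{(m)} = A\langle \theta^{(m)} \rangle$ is the derivative matrix of $\eta\langle \theta \rangle$ at $\theta^{(m)}$ in the $m$-th iteration step.
2. (Local) linear model
$$\tilde{Y}^{(m)} \approx A^{(m)} \beta^{(m)} + E \quad \text{where } \tilde{Y}^{(m)} = Y - \eta\langle \theta^{(m)} \rangle \text{ and } \beta^{(m)} = \theta - \theta^{(m)}$$
3. Least-squares estimation for $\beta^{(m)}$ → $\hat{\beta}^{(m)}$.
Set $\theta^{(m+1)} = \theta^{(m)} + \hat{\beta}^{(m)}$.
4. Repeat steps 1 to 3 until the procedure converges.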
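Steps 1 to 4 can be traced by hand for the Michaelis-Menten model. The sketch below is illustrative only, not the algorithm as implemented in `nls()`: it assumes the treated group of R's built-in `Puromycin` data, starting values of roughly 196 and 0.048, and a simple step-halving safeguard and stopping rule that are not part of the four steps above.

```r
# Treated group of the built-in Puromycin data (columns conc, rate, state)
d <- subset(Puromycin, state == "treated")
x <- d$conc
y <- d$rate

# Mean function eta(theta) and its n x 2 derivative matrix A(theta)
eta <- function(theta) theta[1] * x / (theta[2] + x)
A   <- function(theta) cbind(x / (theta[2] + x),
                             -theta[1] * x / (theta[2] + x)^2)
S   <- function(theta) sum((y - eta(theta))^2)   # sum of squares S(theta)

theta <- c(196, 0.048)                           # starting values theta^(0)
for (m in 1:50) {
  y_tilde  <- y - eta(theta)                     # step 2: response of the local linear model
  beta_hat <- unname(lm.fit(A(theta), y_tilde)$coefficients)  # step 3: linear least squares
  # Safeguard (an assumption of this sketch): halve the step while S would increase
  step <- 1
  while (S(theta + step * beta_hat) > S(theta) && step > 1e-10) step <- step / 2
  theta_new <- theta + step * beta_hat
  converged <- sqrt(sum((theta_new - theta)^2)) < 1e-8 * sqrt(sum(theta^2))
  theta <- theta_new
  if (converged) break                           # step 4: stop once the update is negligible
}
theta
```

The result can be compared with `coef(nls(rate ~ SSmicmen(conc, Vm, K), data = d))`.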
Starting Values
- Interpret the behaviour of the regression function in terms of the parameters, analytically or graphically.
- Transform the regression function to obtain simpler, preferably linear, behaviour.
- Use your knowledge from previous or similar experiments.
Example Puromycin (2) - using transformation
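$$y \approx h\langle x, \theta \rangle = \frac{\theta_1 \cdot x}{\theta_2 + x}$$
Transform to linearity:
$$\tilde{y} = \frac{1}{y} \approx \frac{1}{h\langle x, \theta \rangle} = \frac{\theta_2}{\theta_1} \cdot \frac{1}{x} + \frac{1}{\theta_1}, \quad \text{that is } \tilde{y} \approx \beta_1 \tilde{x} + \beta_0$$
Linear regression ⇒ $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1)^T = (0.005, 0.00025)^T$
Starting values:
$$\theta_1^{(0)} = \frac{1}{\hat{\beta}_0} \approx 196, \qquad \theta_2^{(0)} = \frac{\hat{\beta}_1}{\hat{\beta}_0} \approx 0.048$$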
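The same starting values can be reproduced in a few lines. This sketch assumes that the slide's data are the treated group of R's built-in `Puromycin` data set; the object names `lw`, `beta` and `theta0` are chosen here for illustration.

```r
# Treated group of the built-in Puromycin data
d <- subset(Puromycin, state == "treated")

# Linear regression of 1/y on 1/x (transformation to linearity)
lw   <- lm(I(1 / rate) ~ I(1 / conc), data = d)
beta <- unname(coef(lw))          # beta[1] = intercept beta_0, beta[2] = slope beta_1

# Back-transform to starting values for theta_1 and theta_2
theta0 <- c(theta1 = 1 / beta[1], theta2 = beta[2] / beta[1])
round(theta0, 3)
```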
Example Puromycin (3)
[Figure. Left: 1/Velocity versus 1/Concentration. Right: Velocity versus Concentration.]
Left: Regression line used for determining the starting values $\theta_1$ and $\theta_2$.
Right: Regression function $h\langle x; \theta \rangle$ based on the starting values $\theta = \theta^{(0)}$ and based on the least-squares estimate $\theta = \hat{\theta}$ (solid line), respectively.
Example: Cellulose membrane (2) - starting values
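$$h\langle x; \theta \rangle = \frac{\theta_1 + \theta_2 \cdot 10^{\theta_3 + \theta_4 x}}{1 + 10^{\theta_3 + \theta_4 x}} \quad \text{with } \theta_4 < 0$$
We know:
$h\langle x; \theta \rangle \longrightarrow \theta_1$ for $x \to \infty$
$h\langle x; \theta \rangle \longrightarrow \theta_2$ for $x \to -\infty$
From the data, we obtain $\theta_1^{(0)} = 163.7$ and $\theta_2^{(0)} = 159.5$.
Let
$$\tilde{y}_i = \log_{10}\left\langle \frac{\theta_1^{(0)} - y_i}{y_i - \theta_2^{(0)}} \right\rangle, \quad \text{hence } \tilde{y}_i \approx \theta_3 + \theta_4 x_i.$$
Simple linear regression results in starting values for both $\theta_3$ and $\theta_4$:
$\theta_3^{(0)} = 1.83$ and $\theta_4^{(0)} = -0.36$.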
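The transformation above follows from solving the regression function for the power of ten. As a short derivation, with the abbreviation $u_i = 10^{\theta_3 + \theta_4 x_i}$ introduced here only for readability:

```latex
\begin{align*}
y_i &\approx \frac{\theta_1 + \theta_2 u_i}{1 + u_i}
  && \text{(regression function, error term ignored)} \\
y_i (1 + u_i) &\approx \theta_1 + \theta_2 u_i \\
u_i (y_i - \theta_2) &\approx \theta_1 - y_i \\
u_i &\approx \frac{\theta_1 - y_i}{y_i - \theta_2} \\
\theta_3 + \theta_4 x_i = \log_{10}\langle u_i \rangle
  &\approx \log_{10}\left\langle \frac{\theta_1 - y_i}{y_i - \theta_2} \right\rangle
\end{align*}
```

The logarithm is only defined for observations with $\theta_2^{(0)} < y_i < \theta_1^{(0)}$, so observations outside this range cannot enter the auxiliary linear regression.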
Example: Cellulose membrane (3)
[Figure. (a) y versus x (= pH). (b) y (= chem. shift) versus x (= pH).]
(a) Regression line used for determining the starting values $\theta_3$ and $\theta_4$.
(b) Regression function $h\langle x; \theta \rangle$ based on the starting values $\theta = \theta^{(0)}$.
Self-Starter Function
For repeated use of the same nonlinear regression model
⇒ use an automated way of providing starting values.
Basically, collect all the manual steps which are necessary to obtain the initial values for a nonlinear regression model into a function.
Self-starter functions are specific to a given mean function and calculate starting values for a given dataset.
If SSmicmen() (cf. next slide) is a self-starter function, then you can run the fitting process as
nls(rate ~ SSmicmen(conc, Vm, K), data=D.minor)
For how to write your own self-starter functions, see the help pages or, e.g., Ritz & Streibig (2008), Sec 3.2.
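A minimal usage sketch follows. The data frame `D.minor` from the slide is not available here, so the treated group of R's built-in `Puromycin` data is used as a stand-in; that substitution is an assumption of this example.

```r
# Stand-in for D.minor: treated group of the built-in Puromycin data
d <- subset(Puromycin, state == "treated")

# Starting values computed by the self-starter, without fitting the model
init <- getInitial(rate ~ SSmicmen(conc, Vm, K), data = d)

# Fit without a start= argument; the self-starter supplies the starting values
fit <- nls(rate ~ SSmicmen(conc, Vm, K), data = d)
coef(fit)
```

Because `SSmicmen()` carries its own initialisation routine, the call needs no `start` argument, which is the main convenience when the same model is fitted to many data sets.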
The standard installation of R already provides a number of self-starter functions.
Self-Starter Functions in the Standard Installation
Model | Mean Function | Name of Self-Starter Function
Biexponential | $A_1 \cdot e^{-x \cdot e^{lrc1}} + A_2 \cdot e^{-x \cdot e^{lrc2}}$ | SSbiexp(x, A1, lrc1, A2, lrc2)
Asymptotic regression | $Asym + (R_0 - Asym) \cdot e^{-x \cdot e^{lrc}}$ | SSasymp(x, Asym, R0, lrc)
Asymptotic regression with offset | $Asym \cdot (1 - e^{-(x - c0) \cdot e^{lrc}})$ | SSasympOff(x, Asym, lrc, c0)
Asymptotic regression (c0 = 0) | $Asym \cdot (1 - e^{-x \cdot e^{lrc}})$ | SSasympOrig(x, Asym, lrc)
First-order compartment | $x_1 \cdot \frac{e^{lKe + lKa - lCl}}{e^{lKa} - e^{lKe}} \cdot (e^{-x_2 \cdot e^{lKe}} - e^{-x_2 \cdot e^{lKa}})$ | SSfol(x1, x2, lKe, lKa, lCl)
Gompertz | $Asym \cdot e^{-b2 \cdot b3^{x}}$ | SSgompertz(x, Asym, b2, b3)
Logistic | $A + \frac{B - A}{1 + e^{(xmid - x)/scal}}$ | SSfpl(x, A, B, xmid, scal)
Logistic (A = 0) | $\frac{Asym}{1 + e^{(xmid - x)/scal}}$ | SSlogis(x, Asym, xmid, scal)
Michaelis-Menten | $\frac{Vm \cdot x}{K + x}$ | SSmicmen(x, Vm, K)
3 Inference Based on Linear Approximations
A look at the summary output of the example "Cellulose Membrane" shows that it looks very similar to the summary output of a fitted linear regression model:
Formula: delta ~ (T1 + T2 * 10^(T3 + T4 * pH))/(10^(T3 + T4 * pH) + 1)
Parameters:
      Value     Std. Error   t value   Pr(>|t|)
θ1    163.706   0.1262       1297.26   < 2e-16    ***
θ2    159.785   0.1594       1002.19   < 2e-16    ***
θ3      2.675   0.3813          7.02   3.65e-08   ***
θ4     -0.512   0.0703         -7.28   1.66e-08   ***
Residual standard error: 0.293137 on 35 degrees of freedom
Number of iterations to convergence: 7
The Asymptotic Properties
This approach is based on the local linearization of the model (cf. iterative estimation procedure)
$$Y \approx \eta\langle \theta \rangle + A\langle \theta \rangle\, \beta + E,$$
where $A\langle \theta \rangle$ is the $n \times p$ matrix of partial derivatives.
If the estimation procedure has converged, then $\hat{\beta} = 0$.
Asymptotic Distribution of the Least Squares Estimator
$$\hat{\theta} \overset{\text{as.}}{\sim} \mathcal{N}\langle \theta, V\langle \theta \rangle \rangle$$
with asymptotic covariance matrix
$$V\langle \theta \rangle = \sigma^2 \left(A\langle \theta \rangle^T A\langle \theta \rangle\right)^{-1}$$
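Application in Practice
To explicitly determine the covariance matrix $V\langle \theta \rangle$, we plug in estimates instead of the true parameters:
- $A\langle \theta \rangle$ is calculated using $\hat{\theta}$ ⇒ $\hat{A}$.
- For the error variance $\sigma^2$ we plug in the usual estimator.
Hence,
$$\hat{V} = \hat{\sigma}^2 \left(\hat{A}^T \hat{A}\right)^{-1}$$
where
$$\hat{\sigma}^2 = \frac{S\langle \hat{\theta} \rangle}{n - p} = \frac{1}{n - p} \sum_{i=1}^{n} \left(y_i - \eta_i\langle \hat{\theta} \rangle\right)^2 \quad \text{and} \quad \hat{A} = A\langle \hat{\theta} \rangle.$$
Approximate 95%-confidence interval
Hence, an approximate 95%-confidence interval for $\theta_k$ is
$$\hat{\theta}_k \pm se\langle \hat{\theta}_k \rangle \cdot q_{0.975}^{t_{n-p}},$$
where $se\langle \hat{\theta}_k \rangle$ is the square root of the $k$th diagonal element of $\hat{V}$.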
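The plug-in formulas can be checked against what R reports. The sketch below assumes the treated group of the built-in `Puromycin` data and the Michaelis-Menten model; the parameter names `T1` and `T2` and the starting values are choices made for this illustration.

```r
# Michaelis-Menten fit on the treated group of the built-in Puromycin data
d   <- subset(Puromycin, state == "treated")
fit <- nls(rate ~ T1 * conc / (T2 + conc), data = d,
           start = c(T1 = 196, T2 = 0.048))
th <- coef(fit)
n  <- nrow(d)
p  <- length(th)

# A_hat: n x p matrix of partial derivatives of eta, evaluated at theta_hat
A_hat <- cbind(T1 = d$conc / (th[["T2"]] + d$conc),
               T2 = -th[["T1"]] * d$conc / (th[["T2"]] + d$conc)^2)

# Usual variance estimator and plug-in covariance matrix
sigma2_hat <- sum(resid(fit)^2) / (n - p)
V_hat      <- sigma2_hat * solve(crossprod(A_hat))   # sigma^2 * (A^T A)^(-1)

# Approximate 95% confidence intervals based on the t quantile with n - p df
se <- sqrt(diag(V_hat))
ci <- cbind(lower = th - qt(0.975, n - p) * se,
            upper = th + qt(0.975, n - p) * se)
ci
```

`V_hat` should agree with `vcov(fit)` up to numerical error, and `se` with the "Std. Error" column of `summary(fit)`.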
Example “Cellulose Membrane”
From the summary output
Parameters:
      Value     Std. Error   t value   Pr(>|t|)
θ1    163.706   0.1262       1297.26   < 2e-16    ***
θ2    159.785   0.1594       1002.19   < 2e-16    ***
θ3      2.675   0.3813          7.02   3.65e-08   ***
θ4     -0.512   0.0703         -7.28   1.66e-08   ***
Residual standard error: 0.293137 on 35 degrees of freedom
we can calculate the 95% confidence interval for $\theta_1$:
$$163.71 \pm 0.13 \cdot q_{0.975}^{t_{35}}$$
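The interval can be evaluated directly from the numbers in the summary output. This small sketch uses the unrounded table values 163.706 and 0.1262, so the last digit may differ slightly from a calculation with the rounded values 163.71 and 0.13.

```r
est <- 163.706    # estimate of theta_1 from the summary output
se  <- 0.1262     # its standard error
df  <- 35         # residual degrees of freedom

q  <- qt(0.975, df)               # 0.975 quantile of the t distribution with 35 df
ci <- est + c(-1, 1) * q * se     # lower and upper limit
round(ci, 2)
```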
Example: Puromycin - back to the initial data set
The Michaelis-Menten model for enzyme kinetics relates the initial “velocity” of an
enzymatic reaction to the substrate concentration
[Figure: velocity versus concentration, with separate symbols for enzyme treated with Puromycin and not treated]
$$Y_i = \frac{\theta_1 x_i}{\theta_2 + x_i} + E_i \quad \text{with } E_i \text{ i.i.d. } \sim \mathcal{N}\langle 0, \sigma^2 \rangle \qquad \text{(Michaelis-Menten model)}$$
where $x$ is the substrate concentration [ppm].
Example: Puromycin (4)
Model:
$$Y_i = \frac{\theta_1 x_i}{\theta_2 + x_i} + E_i.$$
Model with and without treatment (all data):
$$Y_i = \frac{(\theta_1 + \theta_3 z_i)\, x_i}{\theta_2 + \theta_4 z_i + x_i} + E_i,$$
where $z_i = 1$ for "with" and $z_i = 0$ for "without" treatment.
Working hypothesis: Only the asymptotic velocity $\theta_1$ is influenced by adding Puromycin. Hence
Null hypothesis: $\theta_4 = 0$
R output for the example Puromycin
Parameters:
      Value     Std. Error   t value   Pr(>|t|)
θ1    160.286   6.8964       23.24     2.04e-15
θ2      0.048   0.0083        5.76     1.50e-05
θ3     52.398   9.5513        5.49     2.71e-05
θ4      0.016   0.0114        1.44     0.167
Residual standard error: 10.4 on 19 df
Since the P-value of 0.167 is larger than the level of 5%, the null hypothesis is not rejected at the 5% level.
95% confidence interval for $\theta_4$:
$$0.016 \pm 0.0114 \cdot q_{0.975}^{t_{19}}$$
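The output above can be reproduced along the following lines. This is a sketch under assumptions: it uses R's built-in `Puromycin` data (all 23 observations), codes the indicator `z` by hand, names the parameters `T1` to `T4`, and takes rough starting values chosen for illustration.

```r
# Indicator z: 1 for "with" (treated), 0 for "without" (untreated)
d <- transform(Puromycin, z = as.numeric(state == "treated"))

# Full model: both parameters may change with treatment
fit <- nls(rate ~ (T1 + T3 * z) * conc / (T2 + T4 * z + conc),
           data = d,
           start = c(T1 = 160, T2 = 0.05, T3 = 50, T4 = 0.01))
est <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
est

# Approximate 95% confidence interval for theta_4
ci_T4 <- est["T4", "Estimate"] +
  c(-1, 1) * qt(0.975, df.residual(fit)) * est["T4", "Std. Error"]
ci_T4
```

If the interval contains zero, this agrees with not rejecting the null hypothesis $\theta_4 = 0$ at the 5% level.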
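Inference for the expected value $E\langle Y \mid x_o \rangle = h\langle x_o; \theta \rangle$ at $x_o$:
Linear Regression
$h\langle x_o, \beta \rangle = x_o^T \beta$ is estimated by $\hat{\eta}_o = x_o^T \hat{\beta}$.
The $(1 - \alpha) \cdot 100\%$ confidence interval for $h\langle x_o, \beta \rangle$ is
$$\hat{\eta}_o \pm q_{1-\alpha/2}^{t_{n-p}} \cdot se\langle \hat{\eta}_o \rangle \quad \text{with} \quad se\langle \hat{\eta}_o \rangle = \hat{\sigma} \sqrt{x_o^T \left(X^T X\right)^{-1} x_o}$$
Nonlinear Regression
$h\langle x_o, \theta \rangle$ is estimated by $\hat{\eta}_o = h\langle x_o, \hat{\theta} \rangle$.
The $(1 - \alpha) \cdot 100\%$ confidence interval for $h\langle x_o, \theta \rangle$ is
$$h\langle x_o, \hat{\theta} \rangle \pm q_{1-\alpha/2}^{t_{n-p}} \cdot se\langle \hat{\eta}_o \rangle \quad \text{with} \quad se\langle \hat{\eta}_o \rangle = \hat{\sigma} \sqrt{\hat{a}_o^T \left(\hat{A}^T \hat{A}\right)^{-1} \hat{a}_o}$$
and
$$\hat{a}_o = \left. \frac{\partial h\langle x_o, \theta \rangle}{\partial \theta} \right|_{\theta = \hat{\theta}}.$$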
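As an illustration of the nonlinear case, the following sketch computes a pointwise 95% confidence interval for the Michaelis-Menten curve at a single point. It assumes the treated group of the built-in `Puromycin` data, hypothetical parameter names `T1` and `T2`, and an arbitrarily chosen point `x0 = 0.4`. It relies on `vcov(fit)` being the plug-in matrix $\hat{\sigma}^2 (\hat{A}^T \hat{A})^{-1}$.

```r
d   <- subset(Puromycin, state == "treated")
fit <- nls(rate ~ T1 * conc / (T2 + conc), data = d,
           start = c(T1 = 196, T2 = 0.048))
th <- coef(fit)

x0 <- 0.4                                     # point of interest (arbitrary choice)

# Estimated expected value eta_hat_o = h(x0, theta_hat)
eta0 <- th[["T1"]] * x0 / (th[["T2"]] + x0)

# a_hat_o: gradient of h(x0, theta) with respect to theta at theta_hat
a0 <- c(x0 / (th[["T2"]] + x0),
        -th[["T1"]] * x0 / (th[["T2"]] + x0)^2)

# se(eta_hat_o) = sigma_hat * sqrt(a0' (A'A)^(-1) a0) = sqrt(a0' V_hat a0)
se0 <- sqrt(drop(t(a0) %*% vcov(fit) %*% a0))

ci <- eta0 + c(-1, 1) * qt(0.975, df.residual(fit)) * se0
ci
```

Repeating the calculation over a grid of `x0` values yields the pointwise confidence band for the fitted curve.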
Confidence Band
Left: Confidence band (i.e., pointwise confidence intervals) for a fitted straight line (linear regression model).
Right: Confidence band for the fitted curve $h\langle x, \theta \rangle$ of the example 'Biochemical Oxygen Demand'.
[Figure. Left: log(PCB Concentration) versus Years^(1/3). Right: Oxygen Demand versus Days.]
Variable Selection
How about variable selection in nonlinear regression?
- There is no one-to-one correspondence between predictor variables and parameters as in linear regression! Hence, the number of variables may differ from the number of parameters.
- There are hardly ever problems where some of the variables are in question (the model is derived from subject matter theory!).
- However, there are problems where a submodel (a submodel is nested within the full model) may be adequate to describe the data; cf. Example Puromycin, Slide 17, Half-Day 1.
- If we have a collection of candidate models
  - which need not be submodels of each other,
  - where the subject matter is somehow indifferent to these models,
  - but where we want to find the most appropriate model for the data,
  one can use Akaike's information criterion (AIC) to select the best model (and/or run a residual analysis).
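Both routes can be tried on the Puromycin example. The sketch below assumes R's built-in `Puromycin` data, a hand-coded indicator `z`, hypothetical parameter names `T1` to `T4`, and rough starting values; it compares the full model with the submodel in which $\theta_4 = 0$.

```r
d <- transform(Puromycin, z = as.numeric(state == "treated"))

# Full model: treatment may change both parameters
full <- nls(rate ~ (T1 + T3 * z) * conc / (T2 + T4 * z + conc), data = d,
            start = c(T1 = 160, T2 = 0.05, T3 = 50, T4 = 0.01))

# Submodel (theta_4 = 0): treatment changes only the asymptotic velocity
sub <- nls(rate ~ (T1 + T3 * z) * conc / (T2 + conc), data = d,
           start = c(T1 = 160, T2 = 0.05, T3 = 50))

aic <- AIC(sub, full)        # smaller AIC indicates the preferred model
aic

cmp <- anova(sub, full)      # F test for the nested comparison
cmp
```

For nested models such as these, the F test from `anova()` is available in addition to AIC; for candidates that are not nested, only the AIC comparison applies.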
Take Home Message Half-Day 1
In nonlinear regression,
$$Y_i = h\langle x_i, \theta \rangle + E_i,$$
functions $h$ are analysed which are not linear functions of the unknown parameters $\theta$.
Such models are often derived from the subject matter theory.
The flexibility of this model class is bought by a more complex estimation and inference theory.
- Parameter estimation is done by an iterative procedure which needs appropriate starting values.