Relationship between Two Variables

Linear Regression Models

2.1 Relationship between Two Variables

In this section, we describe the basic method of explicating the relationship between a variable that represents the outcome of a phenomenon and a variable suspected of affecting this outcome, based on observed data.

The relationship used in our example is actually Hooke’s well-known law of elasticity, which states, essentially, that a spring changes shape under an applied force and that within the spring’s limit of elasticity the change is proportional to the force.

Table 2.1 The length of a spring under different weights.

x (g)    5     10    15    20    25    30    35    40    45    50
y (cm)   5.4   5.7   6.9   6.4   8.2   7.7   8.4   10.1  9.9   10.5

2.1.1 Data and Modeling

Table 2.1 shows ten observations obtained by measuring the length of a spring (y cm) under different weights (x g). The data are plotted in Figure 2.1. The plot suggests a straight-line relationship between the two variables of spring length and suspended weight. If the measurements were completely free from error, all of the data points would lie on a straight line. As shown in Figure 2.1, however, measurement data generally include errors, commonly referred to as noise, and modeling is therefore required to explicate the relationship between variables. To find the relationship between spring length (y) and weight (x) from data that include measurement errors, let us attempt the modeling based on an initially unknown function y = u(x).

We first consider a more specific expression for the unknown function u(x) that represents the true structure of the spring phenomenon. The data plot, as well as our a priori knowledge that the function should be linear, suggests that the function should describe a straight line. We therefore adopt a linear model as our specified model, so that

\[ y = u(x) = \beta_0 + \beta_1 x. \tag{2.1} \]

We then attempt to apply this linear model in order to explicate the relationship between the spring length (y) and the weight (x) as a physical phenomenon.

If there were no errors in the data shown in Table 2.1, then all 10 data points would lie on a straight line with an appropriately selected intercept (β₀) and slope (β₁). Because of measurement errors, however, many of the actual data points will depart from any straight line. To account for this departure (ε) from a straight line by data points obtained with different weights, we therefore assume that they satisfy the relation

\[ \text{Spring length} = \beta_0 + \beta_1 \times \text{Weight} + \text{Error}. \tag{2.2} \]

For the individual data points, we then have 5.4 = β₀ + β₁·5 + ε₁, · · ·, 8.2 = β₀ + β₁·25 + ε₅, · · ·. Figure 2.2 illustrates the relationship considering the fifth data point (25, 8.2).

Figure 2.1 Data obtained by measuring the length of a spring (y cm) under different weights (x g). [Plot not reproduced; horizontal axis: weight x, vertical axis: spring length y.]

Table 2.2 The n observed data.

No.                      1     2     · · ·   i     · · ·   n
Experiment points (x)    x_1   x_2   · · ·   x_i   · · ·   x_n
Observed data (y)        y_1   y_2   · · ·   y_i   · · ·   y_n

In general, let us assume that measurements are performed for n experiment points, as in Table 2.2, and that a measurement at a given experiment point x_i is y_i. The general model corresponding to (2.2) is then

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \cdots, n, \tag{2.3} \]

where β₀ and β₁ are regression coefficients, ε_i is the error term, and the equation in (2.3) is called the linear regression model. The variable y, which represents the length of the spring in the above experiment, is the response variable and the variable x, which represents the weight in that experiment, is the predictor variable. Variables y and x are also often referred to as the dependent variable and the independent (or explanatory) variable, respectively.

Figure 2.2 The relationship between the spring length (y) and the weight (x). [Plot not reproduced; it shows the line y = β₀ + β₁x with weight x on the horizontal axis and spring length y on the vertical axis.]
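To make the roles of the terms in the linear regression model concrete, the following minimal sketch generates artificial observations from it. The function name `simulate_spring`, the parameter values, and the choice of Gaussian noise are all assumptions made here for illustration; none of them comes from the measured data.

```python
import random


def simulate_spring(weights, beta0, beta1, sigma, seed=0):
    """Generate y_i = beta0 + beta1 * x_i + eps_i with Gaussian noise eps_i."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in weights]


weights = list(range(5, 55, 5))  # experiment points x_1, ..., x_n in grams

# Hypothetical "true" parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = 4.7, 0.12, 0.4
lengths = simulate_spring(weights, BETA0, BETA1, SIGMA)

for x, y in zip(weights, lengths):
    eps = y - (BETA0 + BETA1 * x)  # error term: departure from the line
    print(f"x = {x:2d} g   y = {y:5.2f} cm   eps = {eps:+.2f}")
```

Setting `sigma` to zero places every generated point exactly on the line, which corresponds to the error-free case described earlier.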

This brings us to the question of how to fit a straight line to observed data in order to obtain a model that appropriately expresses the data. It is essentially a question of how to determine the regression coefficients β₀ and β₁. Various model estimation procedures can be used to determine the appropriate parameter values. One of these is the method of least squares.

2.1.2 Model Estimation by Least Squares

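The underlying concept of the linear regression model (2.3) is that the true value of the response variable at the i-th point x_i is β₀ + β₁x_i and that the observed value y_i includes the error ε_i. The method of least squares consists essentially of finding the values of the regression coefficients β₀ and β₁ that minimize the sum of squared errors

\[ S(\beta_0, \beta_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \{ y_i - (\beta_0 + \beta_1 x_i) \}^2. \tag{2.4} \]

Differentiating (2.4) with respect to the regression coefficients β₀ and β₁, and setting the resulting derivatives equal to zero, we have

\[ \frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \{ y_i - (\beta_0 + \beta_1 x_i) \} = 0, \qquad \frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i \{ y_i - (\beta_0 + \beta_1 x_i) \} = 0. \tag{2.5} \]

The regression coefficients that minimize the sum of squared errors can be obtained by solving the above simultaneous equations. The solutions are called the least squares estimates and are denoted by β̂₀ and β̂₁. The equation

\[ y = \hat{\beta}_0 + \hat{\beta}_1 x, \tag{2.6} \]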

having its coefficients determined by the least squares estimates, is the estimated linear regression model. We can thus find the model that best fits the data by minimizing the sum of squared errors.
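For reference, the two equations obtained by setting the derivatives of the sum of squared errors to zero can be solved in closed form. The following is the standard solution, written with the sample means and the shorthand S_xx and S_xy, which are notation introduced here rather than taken from the text. It assumes that the x_i are not all equal, so that S_xx > 0.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \\
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \\
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}}, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.
\end{aligned}
```

The expression for the intercept follows from the first equation (the derivative with respect to β₀), and substituting it into the second gives the slope.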

The value ŷ_i = β̂₀ + β̂₁x_i at each x_i (i = 1, 2, · · · , n) is called the predicted value. The difference between the observed value y_i at x_i and this predicted value, e_i = y_i − ŷ_i, is called the residual, and the sum of the squares of the residuals is given by Σ_{i=1}^{n} e_i² (Figure 2.3).

Example 2.1 (Hooke’s law of elasticity) For the data shown in Table 2.1, the sum of squared errors in the linear regression model is S(β₀, β₁) = {5.4 − (β₀ + β₁·5)}² + {5.7 − (β₀ + β₁·10)}² + · · · + {10.5 − (β₀ + β₁·50)}², in which S(β₀, β₁) is a function of the regression coefficients β₀, β₁. The least squares estimates that minimize this function are β̂₀ = 4.65 and β̂₁ = 0.12, and the estimated linear regression model is therefore y = 4.65 + 0.12x. In this way, by modeling from a set of observed data, we have derived in approximation a physical law representing the relationship between the weight and the spring length.
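The calculation in this example can be sketched in a few lines of Python. This is one possible implementation using the standard closed-form least squares solution for a straight line. The function names `least_squares_line` and `sum_squared_errors` are chosen here for illustration, and the data are the Table 2.1 values as transcribed in this section.

```python
# Table 2.1: weights x (g) and measured spring lengths y (cm).
x = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
y = [5.4, 5.7, 6.9, 6.4, 8.2, 7.7, 8.4, 10.1, 9.9, 10.5]


def least_squares_line(x, y):
    """Return (b0, b1) minimizing sum((y_i - (b0 + b1 * x_i)) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


def sum_squared_errors(b0, b1, x, y):
    """Evaluate S(b0, b1) for the given data."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0_hat, b1_hat = least_squares_line(x, y)
predicted = [b0_hat + b1_hat * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]

print(f"intercept = {b0_hat:.3f}, slope = {b1_hat:.4f}")
print(f"sum of squared residuals = {sum_squared_errors(b0_hat, b1_hat, x, y):.3f}")
```

On the values as transcribed here, the slope comes out at about 0.117, which agrees with the quoted 0.12 after rounding, and the intercept at about 4.69, slightly above the quoted 4.65.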

2.1.3 Model Estimation by Maximum Likelihood

In the least squares method, the regression coefficients are estimated by minimizing the sum of squared errors. Maximum likelihood estimation is an alternative method for the same purpose in which the regression coefficients are determined so as to maximize the probability of getting the observed data, for which it is assumed that y_i observed at x_i emerges in accordance with some type of probability distribution.

Figure 2.3 Linear regression and the predicted values and residuals. [Plot not reproduced; labels: linear regression model estimated by the least squares method, predicted value, residual.]

Figure 2.4 (a) shows a histogram of 80 measured values obtained while repeatedly suspending a load of 25 g from one end of a spring. Figure 2.4 (b) represents the errors (i.e., noise) contained in these measurements in the form of a histogram having its origin at the mean value of the measurements. This histogram clearly shows a region containing a high proportion of the obtained measured values. A mathematical model that approximates a histogram showing the probabilistic distribution of a phenomenon is called a probability distribution model.

Of the various distributions that may be adopted in probability distribution models, the most representative is the normal distribution (Gaussian distribution), which is expressed in terms of mean μ and variance σ² and denoted by N(μ, σ²). In the normal distribution model, the observed value y_i at x_i is regarded as a realization of the random variable Y_i, and Y_i is normally distributed with mean μ_i and variance σ², with probability density function

\[ f(y_i \mid x_i; \mu_i, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i - \mu_i)^2}{2\sigma^2} \right\}, \tag{2.7} \]

where μ_i for a given x_i is the conditional mean value (true value) E[Y_i | x_i] = u(x_i) = μ_i of the random variable Y_i. In the normal distribution, as may be clearly seen in Figure 2.4 (a), the proportion of measured values may be expected to decline sharply with increasing distance from the true value.

Figure 2.4 (a) Histogram of 80 measured values obtained while repeatedly suspending a load of 25 g and its approximated probability model. (b) The errors (i.e., noise) contained in these measurements in the form of a histogram having its origin at the mean value of the measurements and its approximated error distribution.
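The sharp decline of the normal density away from the true value can be checked numerically. The sketch below evaluates the density of N(μ, σ²) at a few points. The function name `normal_pdf` and the values of the mean and variance are hypothetical, chosen only to illustrate the shape of (2.7).

```python
import math


def normal_pdf(y, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at y."""
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


# Hypothetical true value (cm) and error variance, for illustration only.
mu, sigma2 = 8.0, 0.16

points = (8.0, 8.4, 8.8, 9.2)  # 0, 1, 2 and 3 standard deviations from mu
densities = [normal_pdf(y, mu, sigma2) for y in points]

for y, d in zip(points, densities):
    print(f"y = {y:.1f}   density = {d:.4f}")
```

Each step of one standard deviation away from the mean lowers the density, and by a larger factor each time, because the exponent grows with the square of the distance.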

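In the linear regression model, it is assumed that the true values μ₁, μ₂, · · ·, μ_n at the various data points lie on a straight line, and it follows that μ_i = β₀ + β₁x_i at each point x_i. Substituting this into (2.7) gives

\[ f(y_i \mid x_i; \beta_0, \beta_1, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{\{ y_i - (\beta_0 + \beta_1 x_i) \}^2}{2\sigma^2} \right]. \tag{2.8} \]

This function decreases with increasing deviation of the observed value y_i from the true value β₀ + β₁x_i. Assuming that the observed data y_i around the true value β₀ + β₁x_i at x_i thus follow the probability distribution f(y_i | x_i; β₀, β₁, σ²), the value of this function is then an expression of the plausibility or certainty of the occurrence of a given value of y_i, called the likelihood of y_i.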

Assuming that the observed data y₁, y₂, · · ·, y_n are mutually independent, with errors ε_i that are independent and identically distributed (i.i.d.), the likelihood of the n data, and thus the plausibility of the n specific data, is given by the product of the likelihoods of all observed data:

\[ \prod_{i=1}^{n} f(y_i \mid x_i; \beta_0, \beta_1, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \{ y_i - (\beta_0 + \beta_1 x_i) \}^2 \right] \equiv L(\beta_0, \beta_1, \sigma^2). \tag{2.9} \]

Given the data {(x_i, y_i); i = 1, 2, · · · , n} in (2.9), the function L(β₀, β₁, σ²) of the parameters β₀, β₁, σ² is then the likelihood function. Maximum likelihood is a method of finding the parameter values that maximize this likelihood function, and the resulting estimates are called the maximum likelihood estimates. For ease of calculation, the maximum likelihood estimates are usually obtained by maximizing the log-likelihood function

\[ \ell(\beta_0, \beta_1, \sigma^2) \equiv \log L(\beta_0, \beta_1, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \{ y_i - (\beta_0 + \beta_1 x_i) \}^2. \tag{2.10} \]

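The parameter values β̂₀, β̂₁, σ̂² that maximize the log-likelihood function ℓ(β₀, β₁, σ²) are thus obtained by solving the equations

\[ \frac{\partial \ell(\beta_0, \beta_1, \sigma^2)}{\partial \beta_0} = 0, \qquad \frac{\partial \ell(\beta_0, \beta_1, \sigma^2)}{\partial \beta_1} = 0, \qquad \frac{\partial \ell(\beta_0, \beta_1, \sigma^2)}{\partial \sigma^2} = 0. \tag{2.11} \]

Specific solutions will be given in Section 2.2.2.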

The first term of the log-likelihood function defined in (2.10) does not depend on β₀, β₁, and the second term is never positive since σ² > 0. Accordingly, the values of the regression coefficients β₀ and β₁ that maximize the log-likelihood function are those that minimize

\[ \sum_{i=1}^{n} \{ y_i - (\beta_0 + \beta_1 x_i) \}^2. \tag{2.12} \]

With the assumption of a normal distribution model for the data, the maximum likelihood estimates of the regression coefficients are thus equivalent to the least squares estimates of the regression coefficients, that is, the minimizer of (2.4).
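This equivalence can be checked numerically on the spring data. The sketch below evaluates the Gaussian log-likelihood at the least squares fit and at nearby perturbed parameter values. It is an illustration only: the function names are chosen here, and the variance estimate used, the residual sum of squares divided by n, is the standard maximum likelihood result for this model rather than something derived in this section.

```python
import math

# Table 2.1 data as transcribed in this section.
x = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
y = [5.4, 5.7, 6.9, 6.4, 8.2, 7.7, 8.4, 10.1, 9.9, 10.5]
n = len(x)


def rss(b0, b1):
    """Sum of squared errors for coefficients (b0, b1)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def log_likelihood(b0, b1, sigma2):
    """Gaussian log-likelihood of the straight-line model."""
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - rss(b0, b1) / (2.0 * sigma2)


# Least squares estimates from the closed-form solution.
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1_ls = s_xy / s_xx
b0_ls = y_bar - b1_ls * x_bar

# Assumed variance estimate: residual sum of squares divided by n.
sigma2_ml = rss(b0_ls, b1_ls) / n

best = log_likelihood(b0_ls, b1_ls, sigma2_ml)

# Perturb every parameter slightly; the log-likelihood should not increase.
steps = (-0.05, 0.05)
neighbours = [
    log_likelihood(b0_ls + d0, b1_ls + d1, sigma2_ml * (1.0 + ds))
    for d0 in steps for d1 in steps for ds in steps
]

print(f"log-likelihood at the least squares fit: {best:.4f}")
print(f"largest value among perturbed points:    {max(neighbours):.4f}")
```

None of the perturbed points reaches the log-likelihood attained at the least squares coefficients, which is consistent with the two estimation methods selecting the same straight line.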

From: Sadanori Konishi, Introduction to Multivariate Analysis: Linear and Nonlinear Modeling (pages 42-49).
