
ECE 3040

Lecture 18: Curve Fitting by Least-Squares-Error Regression

© Prof. Mohamad Hassoun

This lecture covers the following topics:

• Introduction
• Linear least-squares-error (LSE) regression: the straight-line model
• Linearization of nonlinear models
• General linear LSE regression and the polynomial model
• Polynomial regression with Matlab: polyfit
• Non-linear LSE regression
• Numerical solution of the non-linear LSE optimization problem: gradient search and Matlab's fminsearch function
• Solution of differential equations based on LSE minimization
• Appendix: Explicit matrix formulation for the quadratic regression problem

Introduction

In the previous lecture, polynomial and cubic spline interpolation methods were introduced for estimating a value between a given set of precise data points. The idea was to interpolate, that is, to "fit" a function to the data points so that it passes exactly through all of them. Many engineering and scientific observations are made by conducting experiments in which physical quantities are measured and recorded as inexact (noisy) data points. In this case, the objective is to find the best-fit analytic curve (model) that approximates the underlying functional relationship present in the data set. Here, the best-fit curve is not required to pass through the data points, but it is required to capture the shape (general trend) of the data. This curve-fitting problem is referred to as regression. The following sections present formulations for the regression problem and provide solutions.

The following figure compares two polynomials fit to the same data points. The blue curve is the solution to the interpolation problem. The green curve is the solution we seek here: the linear regression fit.


Linear Least-Squares-Error (LSE) Regression:

The Straight-Line Model

The regression problem will first be illustrated for fitting the linear model (straight-line), y(x) = a_1 x + a_0, to a set of n paired experimental observations: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n). So, the idea here is to position the straight-line (i.e., to determine the regression coefficients a_0 and a_1) so that some error measure of fit is minimized. A common error measure is the sum-of-the-squares (SSE) of the residual errors e_i = y_i − y(x_i),

E(a_0, a_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} [y_i - y(x_i)]^2 = \sum_{i=1}^{n} [y_i - (a_1 x_i + a_0)]^2

The residual error e_i is the discrepancy between the measured value, y_i, and the approximate value y(x_i) = a_0 + a_1 x_i predicted by the straight-line regression model. The residual error for the i-th data point is depicted in the following figure.

A solution can be obtained for the regression coefficients, {a_0, a_1}, that minimizes E(a_0, a_1). This criterion, E, called the least-squares-error (LSE) criterion, has a number of advantages, including that it yields a unique line for a given data set. Differentiating E(a_0, a_1) with respect to each of the unknown regression coefficients and setting the results to zero leads to a system of two linear equations,

\frac{\partial E(a_0, a_1)}{\partial a_0} = 2 \sum_{i=1}^{n} (y_i - a_1 x_i - a_0)(-1) = 0

\frac{\partial E(a_0, a_1)}{\partial a_1} = 2 \sum_{i=1}^{n} (y_i - a_1 x_i - a_0)(-x_i) = 0

After expanding the sums, we obtain

-\sum_{i=1}^{n} y_i + \sum_{i=1}^{n} a_0 + \sum_{i=1}^{n} a_1 x_i = 0

-\sum_{i=1}^{n} x_i y_i + \sum_{i=1}^{n} a_0 x_i + \sum_{i=1}^{n} a_1 x_i^2 = 0

Now, realizing that \sum_{i=1}^{n} a_0 = n a_0, and that multiplicative quantities that do not depend on the summation index i can be brought outside the summation (i.e., \sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i), we may rewrite the above equations as

n a_0 + \left(\sum_{i=1}^{n} x_i\right) a_1 = \sum_{i=1}^{n} y_i

\left(\sum_{i=1}^{n} x_i\right) a_0 + \left(\sum_{i=1}^{n} x_i^2\right) a_1 = \sum_{i=1}^{n} x_i y_i

These are called the normal equations. We can solve for a_1 using Cramer's rule and for a_0 by substitution (Your turn: Perform the algebra) to arrive at the following LSE solution:

a_1^* = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}

a_0^* = \frac{\sum_{i=1}^{n} y_i}{n} - a_1^* \frac{\sum_{i=1}^{n} x_i}{n}

The value E(a_0^*, a_1^*) represents the LSE value; it will be referred to as E_LSE and expressed as

E_{LSE} = \sum_{i=1}^{n} (y_i - a_1^* x_i - a_0^*)^2

Any other straight-line will lead to an error E(a_0, a_1) > E_{LSE}.

Let the sum of the squares of the differences between the y_i values and their average value, \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, be

E_M = \sum_{i=1}^{n} (y_i - \bar{y})^2

Then, the (positive) difference E_M − E_LSE represents the improvement (the smaller E_LSE is, the better) due to describing the data in terms of a straight-line, rather than as an average value (a straight-line with zero slope and y-intercept equal to \bar{y}). The coefficient of determination, r^2, is defined as the relative error between E_M and E_LSE,

๐‘Ÿ2 = ๐ธ๐‘€ โˆ’ ๐ธ๐ฟ๐‘†๐ธ

๐ธ๐‘€ = 1 โˆ’ ๐ธ๐ฟ๐‘†๐ธ

๐ธ๐‘€

For a perfect fit, where the regression line goes through all data points, E_LSE = 0 and r^2 = 1, signifying that the line explains 100% of the variability in the data. On the other hand, for E_M = E_LSE we obtain r^2 = 0, and the fit represents no improvement over a simple average. A value of r^2 between 0 and 1 represents the extent of improvement. So, r^2 = 0.8 indicates that 80% of the original uncertainty has been explained by the linear model. Using the above expressions for E_LSE, E_M, a_0^* and a_1^*, one may derive the following formula for the correlation coefficient, r (Your turn: Perform the algebra):

r = \sqrt{\frac{E_M - E_{LSE}}{E_M}} = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{n \sum x_i^2 - (\sum x_i)^2}\;\sqrt{n \sum y_i^2 - (\sum y_i)^2}}

Example. Fit a straight-line to the data provided in the following table. Find r^2.

x 1 2 3 4 5 6 7

y 2.5 7 38 55 61 122 110

Solution. The following Matlab script computes the linear regression coefficients, a_0^* and a_1^*, for a straight-line employing the LSE solution.

x=[1 2 3 4 5 6 7];
y=[2.5 7 38 55 61 122 110];
n=length(x);
a1=(n*sum(x.*y)-sum(x)*sum(y))/(n*sum(x.^2)-(sum(x)).^2)
a0=sum(y)/n-a1*sum(x)/n

The solution is a_1^* = 20.5536 and a_0^* = −25.7143. The following plot displays the data and the regression model, y(x) = 20.5536x − 25.7143.

The following script computes the correlation coefficient, r.

x=[1 2 3 4 5 6 7];
y=[2.5 7 38 55 61 122 110];
n=length(x);
r=(n*sum(x.*y)-sum(x)*sum(y))/((sqrt(n*sum(x.^2)- ...
  (sum(x))^2))*(sqrt(n*sum(y.^2)-(sum(y))^2)))

The script returns r = 0.9582 (so, r^2 = 0.9181). These results indicate that about 92% of the variability in the data has been explained by the linear model.

A word of caution: Although the coefficient of determination provides a convenient measure of the quality of fit, you should be careful not to rely on it completely. It is possible to construct data sets that yield similar r^2 values even though the regression line is poorly positioned for some of them. A good practice is to visually inspect the plot of the data along with the regression curve. The following example illustrates these ideas.

Example. Anscombe's quartet comprises four data sets that have r^2 ≅ 0.666, yet appear very different when graphed. Each data set consists of eleven (x_i, y_i) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties. Notice that if we ignore the outlier point in the third data set, then the regression line would be perfect, with r^2 = 1.

Your turn: Employ linear regression to generate the above plots and determine r^2 for each of Anscombe's data sets.

Linearization of Nonlinear Models

The straight-line regression model is not always suitable for curve fitting. The choice of regression model is often guided by a plot of the available data, or by knowledge of the physical behavior of the system that generated the data. In general, polynomial or other nonlinear models are more suitable. A nonlinear regression technique (introduced later) is available to fit complicated nonlinear equations to data. However, some basic nonlinear functions can be readily transformed into linear functions in their regression coefficients (we will refer to such functions as transformable or linearizable). Here, we can take advantage of the LSE regression formulas, which we have just derived, to fit the transformed equations to the data.

One example of a linearizable nonlinear model is the exponential model, y(x) = αe^{βx}, where α and β are constants. This equation is very common in engineering (e.g., capacitor transient voltage) and science (e.g., population growth or radioactive decay). We can linearize this equation by simply taking its natural logarithm to yield ln(y) = ln(α) + βx. Thus, if we transform the y_i values in our data by taking their natural logarithms, and define a_0 = ln(α) and a_1 = β, we arrive at the equation of a straight-line (of the form Y = a_0 + a_1 x). Then, we can readily use the formulas for the LSE solution, (a_0^*, a_1^*), derived earlier. The final step is to set α = e^{a_0^*} and β = a_1^* and arrive at the regression solution,

y(x) = αe^{βx} = (e^{a_0^*}) e^{a_1^* x}

A second common linearizable nonlinear regression model is the power model, y(x) = αx^β. We can linearize this equation by taking its natural logarithm to yield ln(y) = ln(α) + β ln(x) (which is a linear model of the form Y = a_0 + a_1 X). In this case, we need to first transform the y_i and x_i values into ln(y_i) and ln(x_i), respectively, and then apply the LSE solution to the transformed data.

Other useful linearizable models include the logarithmic function, y(x) = α ln(x) + β, the reciprocal function, y(x) = 1/(αx + β), and the saturation-growth-rate function, y(x) = αx/(β + x). The following tables list models that are linearizable and provide their corresponding linearized form and their change-of-variable formulas, respectively.

Example. Fit the exponential model and the power model to the data in the following table. Compare the fit quality to that of the straight-line model.

x 1 2 3 4 5 6 7 8
y 2.5 7 38 55 61 122 83 143

Solution. Matlab script (linear.m) for the linear model, y = a_0 + a_1 x:

x=[1 2 3 4 5 6 7 8];
y=[2.5 7 38 55 61 122 83 143];
n=length(x);
a1=(n*sum(x.*y)-sum(x)*sum(y))/(n*sum(x.^2)-(sum(x)).^2);
a0=sum(y)/n-a1*sum(x)/n;
r=(n*sum(x.*y)-sum(x)*sum(y))/((sqrt(n*sum(x.^2)-(sum(x))^2))*( ...
  sqrt(n*sum(y.^2)-(sum(y))^2)));
a1, a0, r^2

Result 1. Linear model solution: y = 19.3036x − 22.9286, r^2 = 0.8811.

Matlab script (exponential.m) for the exponential model, y = αe^{βx}:

x=[1 2 3 4 5 6 7 8];
y=[2.5 7 38 55 61 122 83 143];
ye=log(y);
n=length(x);
a1=(n*sum(x.*ye)-sum(x)*sum(ye))/(n*sum(x.^2)-(sum(x)).^2);
a0=sum(ye)/n-a1*sum(x)/n;
r=(n*sum(x.*ye)-sum(x)*sum(ye))/((sqrt(n*sum(x.^2)-(sum(x))^2)) ...
  *(sqrt(n*sum(ye.^2)-(sum(ye))^2)));
alpha=exp(a0), beta=a1, r^2

Result 2. Exponential model solution: y = 3.4130e^{0.5273x}, r^2 = 0.8141.

Matlab script (power_eq.m) for the power model, y = αx^β:

x=[1 2 3 4 5 6 7 8];
y=[2.5 7 38 55 61 122 83 143];
xe=log(x);
ye=log(y);
n=length(x);
a1=(n*sum(xe.*ye)-sum(xe)*sum(ye))/(n*sum(xe.^2)-(sum(xe)).^2);
a0=sum(ye)/n-a1*sum(xe)/n;
r=(n*sum(xe.*ye)-sum(xe)*sum(ye))/((sqrt(n*sum(xe.^2)- ...
  (sum(xe))^2))*(sqrt(n*sum(ye.^2)-(sum(ye))^2)));
alpha=exp(a0), beta=a1, r^2

Result 3. Power model solution: y = 2.6493x^{1.9812}, r^2 = 0.9477.

From the results for r^2, the power model has the best fit. The following graph compares the three models. By visually inspecting the plot we see that, indeed, the power model (red; r^2 = 0.9477) is a better fit compared to the linear model (blue; r^2 = 0.8811) and to the exponential model (green; r^2 = 0.8141). Also, note that the straight-line fits the data better than the exponential model.

Your turn: Repeat the above regression problem employing: (a) the logarithmic function, y = α ln(x) + β; (b) the reciprocal function, y = 1/(αx + β); (c) the saturation-growth-rate function, y(x) = αx/(β + x).

General Linear LSE Regression and the Polynomial Model

For some data sets, the underlying model cannot be captured accurately with a straight-line, exponential, logarithmic, or power model. A model with a higher degree of nonlinearity (i.e., with added flexibility) is required. There are a number of higher-order functions that can be used as regression models. One important regression model is the polynomial. A general LSE formulation is presented next. It extends the earlier linear regression analysis to a wider class of nonlinear functions, including polynomials. (Note: when we say linear regression, we are referring to a model that is linear in its regression parameters, a_i, not in x.)

Consider the general function in z,

y = a_m z_m + a_{m-1} z_{m-1} + \cdots + a_1 z_1 + a_0     (1)

where each z_i represents a basis function in x. It can be easily shown that if the basis functions are chosen as z_i = x^i, then the above model is that of an m-degree polynomial,

y = a_m x^m + a_{m-1} x^{m-1} + \cdots + a_1 x + a_0

There are many classes of functions that can be described by the above general function in Eqn. (1). Examples include:

y = a_0 + a_1 x,    y = a_0 + a_1 \cos(x) + a_2 \sin(2x),    and    y = a_0 + a_1 x + a_2 e^{-x^2}

One example of a function that cannot be represented by the above general function is the radial-basis-function (RBF)

y = a_0 + a_1 e^{a_2 (x - a_3)^2}

In other words, this latter function is not transformable into a linear regression model, as was the case (say) for the exponential function, y = αe^{βx}. Regression with such non-transformable functions is known as nonlinear regression and is considered later in this lecture.

In the following formulation of the LSE regression problem we restrict the regression function to the polynomial model

y = a_m x^m + a_{m-1} x^{m-1} + \cdots + a_1 x + a_0

Given n data points {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} we want to determine the regression coefficients by solving the system of n equations in m + 1 unknowns:

a_m x_1^m + a_{m-1} x_1^{m-1} + \cdots + a_1 x_1 + a_0 = y_1
a_m x_2^m + a_{m-1} x_2^{m-1} + \cdots + a_1 x_2 + a_0 = y_2
\vdots
a_m x_n^m + a_{m-1} x_n^{m-1} + \cdots + a_1 x_n + a_0 = y_n

The above system can be written in matrix form as Za = y,

\begin{bmatrix} x_1^m & x_1^{m-1} & \cdots & x_1 & 1 \\ x_2^m & x_2^{m-1} & \cdots & x_2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{n-1}^m & x_{n-1}^{m-1} & \cdots & x_{n-1} & 1 \\ x_n^m & x_n^{m-1} & \cdots & x_n & 1 \end{bmatrix} \begin{bmatrix} a_m \\ a_{m-1} \\ \vdots \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}

where Z is an n x (m+1) rectangular matrix formed from the x_i data values with z_{ij} = x_i^{m-j+1} (i = 1, 2, ..., n; j = 1, 2, ..., m+1), a is an (m+1)-element column vector of unknown regression coefficients, and y is an n-element column vector of the y_i data values. For regression problems, the above system is over-determined (n > m + 1) and, therefore, there is generally no solution a that satisfies Za = y. So, we seek the LSE solution, a*: the solution which minimizes the sum-of-squared-error (SSE) criterion

E(a) = ||y - Za||^2 = \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{m+1} z_{ij}\, a_{m-j+1} \right)^2

where ||·|| denotes the vector norm. As we did earlier in deriving the straight-line regression coefficients a_0 and a_1, we set all partial derivatives ∂E(a)/∂a_i to zero and solve the resulting system of (m + 1) equations:

\frac{\partial E}{\partial a_0} = -2 \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) x_i^0 = 0
\frac{\partial E}{\partial a_1} = -2 \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) x_i^1 = 0
\frac{\partial E}{\partial a_2} = -2 \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) x_i^2 = 0
\vdots
\frac{\partial E}{\partial a_m} = -2 \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) x_i^m = 0

This system can be rearranged as (note: x_i^0 = 1)

\sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) = 0
\sum_{i=1}^{n} \left( x_i y_i - x_i \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) = 0
\sum_{i=1}^{n} \left( x_i^2 y_i - x_i^2 \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) = 0
\vdots
\sum_{i=1}^{n} \left( x_i^m y_i - x_i^m \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) = 0

or,

\sum_{i=1}^{n} \left( \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) = \sum_{i=1}^{n} y_i
\sum_{i=1}^{n} x_i \left( \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) = \sum_{i=1}^{n} x_i y_i
\sum_{i=1}^{n} x_i^2 \left( \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) = \sum_{i=1}^{n} x_i^2 y_i
\vdots
\sum_{i=1}^{n} x_i^m \left( \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) = \sum_{i=1}^{n} x_i^m y_i

which, by setting z_{ij} = x_i^{m-j+1} (i = 1, 2, ..., n; j = 1, 2, ..., m+1), can be expressed in matrix form as (Your turn: derive it)

Z^T(Za) = Z^T y    or    (Z^T Z)a = Z^T y

(Refer to the Appendix for an explicit representation of the above equation for the case of a quadratic regression polynomial.)

The matrix Z^T Z is an (m+1) x (m+1) square matrix (recall that m is the degree of the polynomial model being used). Generally speaking, the inverse of Z^T Z exists for the above regression formulation. Multiplying both sides of the equation by (Z^T Z)^{-1} leads to the LSE solution for the regression coefficient vector a,

Ia* = a* = [(Z^T Z)^{-1} Z^T] y

where I is the identity matrix. Matlab offers two ways of solving the above system of linear equations: (1) using the left-division operator, a = (Z'*Z)\(Z'*y), where ' is the Matlab transpose operator, or (2) using a = pinv(Z)*y, where pinv is the built-in pseudo-inverse function.

The coefficient of determination, r^2, for the above polynomial regression formulation is given by (for n ≫ m)

r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}

where \hat{y}_i is the i-th component of the prediction vector Za*, and \bar{y} is the mean of the y_i values. Matlab can conveniently compute r^2 as

1-sum((y-Z*a).^2)/sum((y-mean(y)).^2)

Example. Employ the polynomial LSE regression formulation to solve for a cubic curve fit for the following data set. Also, compute r^2.

x 1 2 3 4 5 6 7 8
y 2.5 7 38 55 61 122 83 145

Solution. A cubic function is a third-order polynomial (m = 3) with the four coefficients a_0, a_1, a_2 and a_3. The number of data points is 8; therefore, n = 8 and the Z matrix is n x (m+1) = 8 x 4. The matrix formulation (Za = y) for this linear regression problem is

\begin{bmatrix} 1 & 1 & 1 & 1 \\ 8 & 4 & 2 & 1 \\ 27 & 9 & 3 & 1 \\ 64 & 16 & 4 & 1 \\ 125 & 25 & 5 & 1 \\ 216 & 36 & 6 & 1 \\ 343 & 49 & 7 & 1 \\ 512 & 64 & 8 & 1 \end{bmatrix} \begin{bmatrix} a_3 \\ a_2 \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} 2.5 \\ 7 \\ 38 \\ 55 \\ 61 \\ 122 \\ 83 \\ 145 \end{bmatrix}

Solving using the pinv function, a = pinv(Z)*y, (y must be a column vector)

Alternatively, we may use the left-division operator and obtain the same result:
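A minimal sketch of these two computations (not necessarily the original session; variable names are illustrative):

x = [1 2 3 4 5 6 7 8]';            % data as column vectors
y = [2.5 7 38 55 61 122 83 145]';
Z = [x.^3 x.^2 x ones(size(x))];   % 8x4 Z matrix for the cubic model
a = pinv(Z)*y                      % pseudo-inverse solution, a = [a3; a2; a1; a0]
a = (Z'*Z)\(Z'*y)                  % left-division solution (same result)
r2 = 1-sum((y-Z*a).^2)/sum((y-mean(y)).^2)   % coefficient of determination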

Therefore, the cubic fit solution is y = 0.029x^3 − 0.02x^2 + 17.6176x − 19.2857, whose plot is shown below. Note that the contribution of the cubic and quadratic terms is very small compared to the linear part of the solution for 0 ≤ x ≤ 8. That is why the plot of the cubic fit model is close to linear.

The computed coefficient of determination indicates that the cubic regression model explains about 88% of the variability in the data. This result has a similar quality to that of the straight-line regression model (computed in an earlier example).

The following is a snapshot of a session with the "Basic Fitting tool" (introduced in the previous lecture) applied to the data in the above example. It computes and compares a cubic fit to 5th-degree and 6th-degree polynomial fits.

Your turn. Employ the linear LSE regression formulation to fit the following data set employing the model y(x) = a_0 + a_1 cos(x) + a_2 sin(2x). Also, determine r^2 and plot the data along with y(x). Hint: First determine the 10 x 3 matrix Z required for the Za = y formulation.

x 1 2 3 4 5 6 7 8 9 10

Polynomial Regression with Matlab: polyfit

The Matlab polyfit function was introduced in the previous lecture for solving polynomial interpolation problems (m + 1 = n, same number of equations as unknowns). This function can also be used for solving m-degree polynomial regression given n data points (m + 1 < n, more equations than unknowns). The syntax of the polyfit call is p=polyfit(x,y,m), where x and y are the vectors of the independent and dependent variables, respectively, and m is the degree of the regression polynomial. The function returns a row vector, p, that contains the polynomial coefficients.

Example. Here is a solution to the straight-line regression problem (first example encountered in this lecture):
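The session output itself is not reproduced here; a minimal sketch of the call, using the data from that first example, is:

x = [1 2 3 4 5 6 7];
y = [2.5 7 38 55 61 122 110];
p = polyfit(x,y,1)    % expect p ≈ [20.5536 -25.7143], i.e., [a1 a0] from the LSE formulas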

Example. Use polyfit to solve for the cubic regression model encountered in the example from the previous section.

Solution:
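Again, the original session is not shown in these notes; one possible call, using the cubic example's data, is:

x = [1 2 3 4 5 6 7 8];
y = [2.5 7 38 55 61 122 83 145];
p = polyfit(x,y,3)    % expect p ≈ [0.029 -0.02 17.6176 -19.2857]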

Note that this solution is identical to the one obtained using the pseudo-inverse-based solution.


Non-Linear LSE Regression

In some engineering applications, nonlinear models must be used to fit a given data set. The above general linear regression formulation can handle such regression problems as long as the nonlinear model is transformable into an equivalent linear function in the unknown coefficients. However, in some cases, the models are not transformable. In this case, we have to come up with an appropriate set of equations whose solution leads to the LSE solution.

As an example, consider the nonlinear model y(x) = a_0(1 − e^{a_1 x}). This equation cannot be manipulated into a linear regression formulation in the a_0 and a_1 coefficients. The LSE formulation (for this model with n data points) takes the form

๐ธ(๐‘Ž0, ๐‘Ž1) = โˆ‘(๐‘ฆ๐‘– โˆ’ ๐‘ฆ(๐‘ฅ๐‘–))2 ๐‘› ๐‘–=1 = โˆ‘(๐‘ฆ๐‘– โˆ’ ๐‘Ž0(1 โˆ’ ๐‘’๐‘Ž1๐‘ฅ๐‘–))2 ๐‘› ๐‘–=1 ๐œ• ๐œ•๐‘Ž0๐ธ(๐‘Ž0, ๐‘Ž1) = 2 โˆ‘(๐‘ฆ๐‘– โˆ’ ๐‘Ž0(1 โˆ’ ๐‘’ ๐‘Ž1๐‘ฅ๐‘–)) ๐‘› ๐‘–=1 (โˆ’1 + ๐‘’๐‘Ž1๐‘ฅ๐‘–) = 0 ๐œ• ๐œ•๐‘Ž1๐ธ(๐‘Ž0, ๐‘Ž1) = 2 โˆ‘(๐‘ฆ๐‘– โˆ’ ๐‘Ž0(1 โˆ’ ๐‘’ ๐‘Ž1๐‘ฅ๐‘–)) ๐‘› ๐‘–=1 (๐‘Ž0๐‘ฅ๐‘–๐‘’๐‘Ž1๐‘ฅ๐‘–) = 0

This set of two nonlinear equations needs to be solved for the two coefficients, a_0 and a_1. Numerical algorithms such as Newton's iterative method for solving a set of two nonlinear equations, or Matlab's built-in fsolve and solve functions, can be used to solve this system of equations, as shown in the next example.

Example. Employ the regression function y(x) = a_0(1 − e^{a_1 x}) to fit the following data.

x −2 0 2 4

Here, we have n = 4, and the system of nonlinear equations to be solved is given by

\sum_{i=1}^{4} \left( y_i - a_0(1 - e^{a_1 x_i}) \right)(-1 + e^{a_1 x_i}) = 0

\sum_{i=1}^{4} \left( y_i - a_0(1 - e^{a_1 x_i}) \right)(a_0 x_i e^{a_1 x_i}) = 0

Substituting the data point values in the above equations leads to

(1 โˆ’ ๐‘Ž0(1 โˆ’ ๐‘’โˆ’2๐‘Ž1))(โˆ’1 + ๐‘’โˆ’2๐‘Ž1) + (โˆ’4 โˆ’ ๐‘Ž0(1 โˆ’ ๐‘’2๐‘Ž1))(โˆ’1 + ๐‘’2๐‘Ž1) + (โˆ’12 โˆ’ ๐‘Ž0(1 โˆ’ ๐‘’4๐‘Ž1))(โˆ’1 + ๐‘’4๐‘Ž1) = 0

(1 โˆ’ ๐‘Ž0(1 โˆ’ ๐‘’โˆ’2๐‘Ž1))(โˆ’2๐‘Ž0๐‘’โˆ’2๐‘Ž1) + (โˆ’4 โˆ’ ๐‘Ž0(1 โˆ’ ๐‘’2๐‘Ž1))(2๐‘Ž0๐‘’2๐‘Ž1) + (โˆ’12 โˆ’ ๐‘Ž0(1 โˆ’ ๐‘’4๐‘Ž1))(4๐‘Ž0๐‘’4๐‘Ž1) = 0

After expansion and combining terms, we get

15 + (e^{-2a_1} - 4e^{2a_1} - 12e^{4a_1}) + a_0(3 + e^{-4a_1} - 2e^{-2a_1} - 2e^{2a_1} - e^{4a_1} + e^{8a_1}) = 0

2a_0[-a_0 e^{-4a_1} + (a_0 - 1)e^{-2a_1} - (a_0 + 4)e^{2a_1} - (a_0 + 24)e^{4a_1} + 2a_0 e^{8a_1}] = 0

Matlab solution using the function solve:

syms a0 a1
f1=15+(exp(-2*a1)-4*exp(2*a1)-12*exp(4*a1))+a0*(3+exp(-4*a1)...
   -2*exp(-2*a1)-2*exp(2*a1)-exp(4*a1)+exp(8*a1));
f2=2*a0*(-a0*exp(-4*a1)+(a0-1)*exp(-2*a1)-(a0+4)*exp(2*a1)...
   -(a0+24)*exp(4*a1)+2*a0*exp(8*a1));
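The call to solve is not shown in these notes; a minimal sketch (each symbolic expression is treated as an equation set equal to zero) is:

S = solve(f1, f2, a0, a1);   % solve f1 = 0 and f2 = 0 for a0 and a1
[S.a0, S.a1]                 % list the candidate solutions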

Matlab returns a set of four solutions to the above minimization problem. The first thing we notice is that for nonlinear regression, minimizing LSE may lead to multiple solutions (multiple minima). The solutions for this particular problem are:

1. a_0 = a_1 = 0, which leads to y = 0(1 − e^0) = 0, or y = 0 (the x-axis).
2. a_0 = 0 and a_1 ≅ −1.3610 + 1.5708i, which leads to y = 0.
3. a_0 = 0 and a_1 ≅ 0.1186 + 1.5708i, which leads to y = 0.
4. a_0 ≅ 2.4979 and a_1 ≅ 0.4410, which leads to y(x) = 2.4979(1 − e^{0.441x}).

The solutions y(x) = 0 and y(x) = 2.4979(1 − e^{0.441x}) are plotted below. It is obvious that the optimal solution (in the LSE sense) is y(x) = 2.4979(1 − e^{0.441x}).

Your turn: Solve the above system of two nonlinear equations employing Matlab's fsolve.

Your turn: Fit the exponential model, y = αe^{βx}, to the data in the following table employing nonlinear least-squares regression. Then, linearize the model and determine the model coefficients by employing linear least-squares regression (i.e., use the formulas derived in the first section, or polyfit). Plot the solutions.

x 0 1 2 3 4
y 1.5 2.5 3.5 5.0 7.5

Ans. Nonlinear least-squares fit: y = 1.61087e^{0.38358x}

Numerical Solution of the Non-Linear LSE Optimization Problem: Gradient Search and Matlab's fminsearch Function

In the above example, we were lucky in the sense that the (symbolic-based) solve function returned the optimal solution for the optimization problem at hand. In more general nonlinear LSE regression problems the models employed are complex and normally have more than two unknown coefficients. Here, solving (symbolically) for the partial derivatives of the error function becomes tedious and impractical. Therefore, one would use numerically-based multi-variable optimization algorithms to minimize E(a) = E(a_0, a_1, a_2, ...), which are extensions of the ones considered in Lectures 13 and 14.

One method would be to extend the gradient-search minimization function grad_optm2 to handle a function of two variables. Recall that this version of the function approximates the gradients numerically; therefore, there is no need to determine analytical expressions for the derivatives. For the case of two variables a_0 and a_1, the gradient-descent equations are

๐‘Ž0(๐‘˜ + 1) = ๐‘Ž0(๐‘˜) + ๐‘Ÿ ๐œ•

๐œ•๐‘Ž0๐ธ(๐‘Ž0, ๐‘Ž1)

๐‘Ž1(๐‘˜ + 1) = ๐‘Ž1(๐‘˜) + ๐‘Ÿ ๐œ•

๐œ•๐‘Ž1๐ธ(๐‘Ž0, ๐‘Ž1)

where โˆ’1 < ๐‘Ÿ < 0. Upon using the simple backward finite-difference approximation for the derivatives, we obtain

๐‘Ž0(๐‘˜ + 1) = ๐‘Ž0(๐‘˜) + ๐‘Ÿ๐ธ[๐‘Ž0(๐‘˜), ๐‘Ž1(๐‘˜)] โˆ’ ๐ธ[๐‘Ž0(๐‘˜ โˆ’ 1), ๐‘Ž1(๐‘˜)] ๐‘Ž0(๐‘˜) โˆ’ ๐‘Ž0(๐‘˜ โˆ’ 1)

๐‘Ž1(๐‘˜ + 1) = ๐‘Ž1(๐‘˜) + ๐‘Ÿ๐ธ[๐‘Ž0(๐‘˜), ๐‘Ž1(๐‘˜)] โˆ’ ๐ธ[๐‘Ž0(๐‘˜), ๐‘Ž1(๐‘˜ โˆ’ 1)] ๐‘Ž1(๐‘˜) โˆ’ ๐‘Ž1(๐‘˜ โˆ’ 1)

The following is a Matlab implementation (function grad_optm2d) of these iterative formulas.
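The original listing is not reproduced in these notes. The following is a minimal sketch consistent with the update equations above; the function signature, the initial perturbation, and the fixed iteration count are assumptions, and no safeguard against a zero step is included.

function [a0, a1, E] = grad_optm2d(f, a0, a1, r, niter)
% Gradient-descent search over two variables using backward
% finite-difference gradient estimates (sketch).
%   f      - handle to the error function E(a0,a1)
%   a0, a1 - initial guesses
%   r      - negative step-size parameter, e.g. r = -1e-4
%   niter  - number of iterations
da = 1e-6;                        % perturbation used for the first differences
a0p = a0 - da;  a1p = a1 - da;    % "previous" iterates
for k = 1:niter
    g0 = (f(a0,a1) - f(a0p,a1))/(a0 - a0p);   % estimate of dE/da0
    g1 = (f(a0,a1) - f(a0,a1p))/(a1 - a1p);   % estimate of dE/da1
    a0p = a0;  a1p = a1;
    a0 = a0 + r*g0;               % descent steps (r < 0)
    a1 = a1 + r*g1;
end
E = f(a0, a1);
end

For the regression problem above it would be called with an error function such as E = @(a0,a1) sum((yd - a0*(1-exp(a1*xd))).^2), where xd and yd hold the data vectors.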

The above function [with a_0(0) = 2, a_1(0) = 0.5 and r = −10^{-4}] returns the same solution that was obtained above with solve [with a minimum error value of E(2.4976, 0.4410) = 0.4364]:

Matlab has an important built-in function for numerical minimization of nonlinear multivariable functions. The function name is fminsearch. The (basic) function call syntax is [a,fa] = fminsearch(f,a0), where f is an anonymous function and a0 is a vector of initial values. The function returns a solution vector a and the value of the function at that solution, fa. Here is an application of fminsearch to solve the above nonlinear regression problem [note how the unknown coefficients are represented as the elements of the vector, a_0 = a(1), a_1 = a(2), and are initialized at [0 0] for this problem].

A more proper way to select the initial search vector a = [a_0  a_1] for the above optimization problem is to solve a set of k nonlinear equations obtained by forcing the model to go through k points (selected randomly from the data set). Here, k is the number of unknown model parameters. For example, for the above problem, we solve the set of two nonlinear equations

y_i - a_0(1 - e^{a_1 x_i}) = 0

y_j - a_0(1 - e^{a_1 x_j}) = 0

where (๐‘ฅ๐‘–, ๐‘ฆ๐‘–) and (๐‘ฅ๐‘—, ๐‘ฆ๐‘—) are two distinct points selected randomly from the set of points being fitted. A numerical nonlinear equation solver can be used, say Matlabโ€™s fsolve, as shown below [here, the end points (โˆ’2,1) and (4, โˆ’12) were selected].


Your turn: The height of a person at different ages is reported in the following table.

x (age) 0 5 8 12 16 18

y (in) 20 36.2 52 60 69.2 70

Determine the parameters a, b and c so that the following regression model is optimal in the LSE sense:

y(x) = \frac{a}{1 + b e^{-cx}}

Ans. y(x) = \frac{74.321}{1 + 2.823 e^{-0.217x}}

Your turn: Employ nonlinear LSE regression to fit the function

y(x) = \frac{K}{\sqrt{x^4 + (a^2 - 2b)x^2 + b^2}}

to the data

x 0 0.5 1 2 3
y 0.95 1.139 0.94 0.298 0.087

Plot the data points and your solution for x ∈ [0, 6].

Ans. y(x) = 0.888

As mentioned earlier, different initial conditions may lead to different local minima of the nonlinear function being minimized. For example, consider the following function of two variables, which exhibits multiple minima (refer to the plot):

f(x, y) = -0.02 \sin(x + 4y) - 0.2 \cos(2x + 3y) - 0.3 \sin(2x - y) + 0.4 \cos(x - 2y)

A contour plot can be generated as follows (the local minima are located at the center of the blue contour lines):
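The plotting commands are not included in these notes; a minimal sketch (the plotting window and the number of contour levels are assumptions) is:

f = @(x,y) -0.02*sin(x+4*y) - 0.2*cos(2*x+3*y) - 0.3*sin(2*x-y) + 0.4*cos(x-2*y);
[X, Y] = meshgrid(-3:0.05:3, -3:0.05:3);   % assumed plotting window
contour(X, Y, f(X,Y), 30)                  % 30 contour levels
xlabel('x'), ylabel('y'), colorbar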

The following are the local minima discovered by function grad_optm2d for the indicated initial conditions:

The same local minima are discovered by fminsearch when starting from the same initial conditions:
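The session output is not shown here; the call pattern is simply fminsearch applied to f from different starting points (the starting points below are illustrative, not necessarily those used in the original session):

f = @(x,y) -0.02*sin(x+4*y) - 0.2*cos(2*x+3*y) - 0.3*sin(2*x-y) + 0.4*cos(x-2*y);
[p1, fval1] = fminsearch(@(p) f(p(1),p(2)), [0 0])     % one starting point
[p2, fval2] = fminsearch(@(p) f(p(1),p(2)), [0 -2])    % another starting point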

Note that, for this limited set of searches, the solution with the smallest function value is (x*, y*) = (0.0441, −1.7618), which is the best of the local minima found.

Your turn (Email your solution to your instructor one day before Test 3). Fit the following data

x 1 2 3 4 5 6 7 8 9 10
y 1.17 0.93 −0.71 −1.31 2.01 3.42 1.53 1.02 −0.08 −1.51

employing the model

y(x) = a_0 + a_1 \cos(x + b_1) + a_2 \cos(2x + b_2)

This problem can be solved employing nonlinear regression (think solution via fminsearch), or it can be linearized, which allows you to use linear regression (think solution via pseudo-inverse). Hint: cos(x + b) = cos(x)cos(b) − sin(x)sin(b). Plot the data points and y(x) on the same graph.

In practice, a nonlinear regression model can have hundreds or thousands of coefficients. Examples of such models are neural networks and radial-basis-function models, which often involve fitting multidimensional data sets, where each y value depends on many variables, y(x_1, x_2, x_3, ...). Numerical methods such as gradient-based optimization methods are often used to solve for the regression coefficients associated with those high-dimensional models. For a reference, check Chapters 5 and 6 in the following textbook:

Solution of Differential Equations Based on LSE Minimization

Consider the second-order, time-varying-coefficient differential equation

\ddot{y}(x) + \frac{1}{5}\dot{y}(x) + 9x^2 y(x) = 0,    with y(0) = 1 and \dot{y}(0) = 2,

defined over the interval x ∈ [0, 1].

We seek a polynomial \tilde{y}(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 that approximates the solution, y(x). In general, \tilde{y}(x) does not have to be a polynomial. By applying the initial conditions to \tilde{y}(x) we can solve for a_0 and a_1:

\tilde{y}(0) = a_0 = y(0) = 1    and    \frac{d\tilde{y}(0)}{dx} = a_1 = \dot{y}(0) = 2

Now, we are left with the problem of estimating the remaining polynomial coefficients a_2, a_3, a_4 such that the residual

f(x, a_2, a_3, a_4) = \frac{d^2\tilde{y}(x)}{dx^2} + \frac{1}{5}\frac{d\tilde{y}(x)}{dx} + 9x^2 \tilde{y}(x)

is as close to zero as possible for all x ∈ [0, 1]. We will choose to minimize the integral of the squared residual,

I(a_2, a_3, a_4) = \int_0^1 [f(x, a_2, a_3, a_4)]^2 \, dx

First, we compute the derivatives d\tilde{y}(x)/dx and d^2\tilde{y}(x)/dx^2:

\frac{d\tilde{y}(x)}{dx} = 2 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3

\frac{d^2\tilde{y}(x)}{dx^2} = 2a_2 + 6a_3 x + 12a_4 x^2

๐‘“(๐‘ฅ, ๐‘Ž2, ๐‘Ž3, ๐‘Ž4) = ๐‘‘ 2๐‘ฆฬƒ(๐‘ฅ) ๐‘‘๐‘ฅ2 + 1 5 ๐‘‘๐‘ฆฬƒ(๐‘ฅ) ๐‘‘๐‘ฅ + 9๐‘ฅ 2๐‘ฆฬƒ(๐‘ฅ) = 2๐‘Ž2 + 6๐‘Ž3๐‘ฅ + 12๐‘Ž4๐‘ฅ2 +1 5(2 + 2๐‘Ž2๐‘ฅ + 3๐‘Ž3๐‘ฅ 2 + 4๐‘Ž 4๐‘ฅ3) + 9๐‘ฅ2(1 + 2๐‘ฅ + ๐‘Ž2๐‘ฅ2 + ๐‘Ž3๐‘ฅ3 + ๐‘Ž4๐‘ฅ4) = 2 5 + 9๐‘ฅ 2 + 18๐‘ฅ3 + ๐‘Ž 2(2 + 2 5๐‘ฅ + 9๐‘ฅ 4) + ๐‘Ž 3(6๐‘ฅ + 3 5๐‘ฅ 2 + 9๐‘ฅ5) + ๐‘Ž4(12๐‘ฅ2 +4 5๐‘ฅ 3 + 9๐‘ฅ6) or, ๐‘“(๐‘ฅ, ๐‘Ž2, ๐‘Ž3, ๐‘Ž4) = 2 5 + 9๐‘ฅ 2 + 18๐‘ฅ3 + ๐‘Ž 2(2 + 2 5๐‘ฅ + 9๐‘ฅ 4) + ๐‘Ž3(6๐‘ฅ +3 5๐‘ฅ 2 + 9๐‘ฅ5) + ๐‘Ž 4(12๐‘ฅ2 + 4 5๐‘ฅ 3 + 9๐‘ฅ6)

The following Matlab session shows the results of using fminsearch to solve for the coefficients a_2, a_3, a_4 that minimize the error function

I(a_2, a_3, a_4) = \int_0^1 [f(x, a_2, a_3, a_4)]^2 \, dx
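The session itself is not reproduced in these notes; a minimal sketch of one way to set it up is given below. The function handles are illustrative, and the coefficient vector a carries an unused first element so that a(2), a(3), a(4) correspond to a_2, a_3, a_4.

% residual of the ODE for the polynomial trial solution
f = @(x,a) 2/5 + 9*x.^2 + 18*x.^3 + a(2)*(2 + (2/5)*x + 9*x.^4) ...
         + a(3)*(6*x + (3/5)*x.^2 + 9*x.^5) + a(4)*(12*x.^2 + (4/5)*x.^3 + 9*x.^6);
I = @(a) integral(@(x) f(x,a).^2, 0, 1);   % integral of the squared residual
a = fminsearch(I, [0 0 0 0])               % a(1) is unused (see the note below)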

Note: the first component a(1) of the solution vector a is redundant; it is not used in the function I.

Therefore, the optimal solution is

The following plot compares the "direct" numerical solution (red trace) to the minimum-residual solution (blue trace). We will study the very important topic of numerical solution of differential equations in Lecture 22 (e.g., employing ode45).

Your turn: Consider the first-order, nonlinear, homogeneous differential equation with varying coefficient \dot{y}(x) + (2x - 1)y^2(x) = 0, with y(0) = 1 and x ∈ [0, 1]. Employ the method of minimizing the squared residual to solve for the approximate solution

\tilde{y}(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4

over the interval x ∈ [0, 1]. Plot \tilde{y}(x) and the exact solution y(x) given by

y(x) = \frac{1}{x^2 - x + 1}

Ans. \tilde{y}(x) = 1 + 0.964x + 0.487x^2 - 2.903x^3 + 1.452x^4

Your turn: Determine the parabola y(x) = ax^2 + bx + c that approximates the cubic g(x) = 2x^3 - x^2 + x + 1 (over the interval x ∈ [0, 2]) in the LSE sense. In other words, determine the coefficients a, b and c such that the following error function is minimized:

E(a, b, c) = \int_0^2 [g(x) - y(x)]^2 \, dx

Solve the problem in two ways: (1) analytically; and (2) employing fminsearch after evaluating the integral. Plot g(x) and y(x) on the same set of axes.

Appendix: Explicit Matrix Formulation for the Quadratic Regression Problem

Earlier in this lecture we derived the m-degree polynomial LSE regression formulation as follows:

\sum_{i=1}^{n} \left( \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) = \sum_{i=1}^{n} y_i
\sum_{i=1}^{n} x_i \left( \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) = \sum_{i=1}^{n} x_i y_i
\sum_{i=1}^{n} x_i^2 \left( \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) = \sum_{i=1}^{n} x_i^2 y_i
\vdots
\sum_{i=1}^{n} x_i^m \left( \sum_{j=1}^{m+1} x_i^{m-j+1} a_{m-j+1} \right) = \sum_{i=1}^{n} x_i^m y_i

Now, setting m = 2 (designating a quadratic regression model) leads to three equations:

\sum_{i=1}^{n} \left( \sum_{j=1}^{3} x_i^{3-j} a_{3-j} \right) = \sum_{i=1}^{n} y_i
\sum_{i=1}^{n} x_i \left( \sum_{j=1}^{3} x_i^{3-j} a_{3-j} \right) = \sum_{i=1}^{n} x_i y_i
\sum_{i=1}^{n} x_i^2 \left( \sum_{j=1}^{3} x_i^{3-j} a_{3-j} \right) = \sum_{i=1}^{n} x_i^2 y_i

It can be shown (Your turn) that the above equations can be cast in matrix form as

\begin{bmatrix} n & \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i^3 \\ \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i^3 & \sum_{i=1}^{n} x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} x_i^2 y_i \end{bmatrix}

This is a 3x3 linear system that can be solved using the methods of Lectures 15 & 16. With this type of formulation, care must be taken to employ numerical solution methods that can handle ill-conditioned coefficient matrices; note the dominance of the (all positive) coefficients in the last row of the matrix.

The following two-part video (part1, part2) derives the above result (directly) from basic principles. Here is an example of quadratic regression: Part 1 Part 2.

Example. Employ the above formulation to fit a parabola to the following data.

x 0 5 8 12 16 18
y 20 36.2 52 60 69.2 70

The code that generated the above result is shown below.
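A minimal sketch along these lines (not necessarily the original listing; variable names are illustrative), which builds the 3x3 normal-equation system above directly from the data:

x = [0 5 8 12 16 18];  y = [20 36.2 52 60 69.2 70];  n = length(x);
A = [n          sum(x)     sum(x.^2);
     sum(x)     sum(x.^2)  sum(x.^3);
     sum(x.^2)  sum(x.^3)  sum(x.^4)];
b = [sum(y); sum(x.*y); sum(x.^2.*y)];
a = A\b        % a = [a0; a1; a2] of the parabola y = a0 + a1*x + a2*x^2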

Your turn: Verify the above solution employing polyfit. Repeat employing the pseudo-inverse solution a = pinv(Z)*y applied to the formulation

Za = \begin{bmatrix} x_1^m & x_1^{m-1} & \cdots & x_1 & 1 \\ x_2^m & x_2^{m-1} & \cdots & x_2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{n-1}^m & x_{n-1}^{m-1} & \cdots & x_{n-1} & 1 \\ x_n^m & x_n^{m-1} & \cdots & x_n & 1 \end{bmatrix} \begin{bmatrix} a_m \\ a_{m-1} \\ \vdots \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}

Your turn: Extend the explicit matrix formulation of this appendix to a cubic function f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 and use it to determine the polynomial coefficients for the data set from the last example. Compare (by plotting) your solution to the following solution that was obtained using nonlinear regression,

y(x) = \frac{74.321}{1 + 2.823 e^{-0.217x}}

Verify your solution employing polyfit.

