Predicting numeric values: regression
8.2 Locally weighted linear regression
One problem with linear regression is that it tends to underfit the data. It gives us the lowest mean-squared error for unbiased estimators, but with the model underfit, we aren’t getting the best predictions. There are a number of ways to reduce this mean-squared error by adding some bias into our estimator.
One way to reduce the mean-squared error is a technique known as locally weighted linear regression (LWLR). In LWLR we give each data point a weight according to how near it lies to our data point of interest; then we compute a least-squares regression similar to section 8.1. This type of regression uses the dataset each time a calculation is needed, similar to kNN. The solution is now given by

$$\hat{w} = (X^T W X)^{-1} X^T W y$$

where W is a matrix that’s used to weight the data points.

LWLR uses a kernel, something like the kernels demonstrated in support vector machines, to weight nearby points more heavily than other points. You can use any kernel you like. The most common kernel to use is a Gaussian. The kernel assigns a weight given by

$$w(i,i) = \exp\left(\frac{\lVert x^{(i)} - x \rVert^2}{-2k^2}\right)$$

This builds the weight matrix W, which has only diagonal elements. The closer the data point x^(i) is to the point of interest x, the larger w(i,i) will be. There also is a user-defined constant k that determines how much to weight nearby points. This is the only parameter that we have to worry about with LWLR. You can see how different values of k change the weights matrix in figure 8.4.
Figure 8.4 Plot showing the original data in the top frame and the weights applied to each piece of data (if we were forecasting the value at x=0.5). The second frame shows that with k=0.5, most of the data is included, whereas the bottom frame shows that if k=0.01, only a few local points will be included in the regression.
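The effect of k on the weights can also be checked numerically. The following is a minimal sketch, not code from the book: the function name gaussian_weights and the evenly spaced x_train values are assumptions made for illustration, and the kernel is the squared-distance Gaussian with the same two k values as figure 8.4.

```python
import numpy as np

def gaussian_weights(x_query, x_train, k):
    """Diagonal entries w(i,i) of W for one query point (1-D inputs)."""
    diff = np.asarray(x_train, dtype=float) - x_query
    return np.exp(diff ** 2 / (-2.0 * k ** 2))

# Hypothetical inputs: 101 evenly spaced points on [0, 1], query at x = 0.5.
x_train = np.linspace(0.0, 1.0, 101)
counts = {}
for k in (0.5, 0.01):
    w = gaussian_weights(0.5, x_train, k)
    counts[k] = int(np.sum(w > 0.01))  # points with a non-negligible weight
    print("k=%.2f: %d of %d points have weight above 0.01"
          % (k, counts[k], x_train.size))
```

The 0.01 threshold is arbitrary; it only serves to count how many points still contribute noticeably for each k.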
To see this in action, open your text editor and add the code from the following listing to regression.py.
# Relies on the NumPy names (mat, eye, shape, exp, linalg, zeros) already
# imported at the top of regression.py.
def lwlr(testPoint, xArr, yArr, k=1.0):
    xMat = mat(xArr); yMat = mat(yArr).T
    m = shape(xMat)[0]
    weights = mat(eye((m)))              # (B) create diagonal matrix
    for j in range(m):                   # (C) populate weights with exponentially decaying values
        diffMat = testPoint - xMat[j,:]
        weights[j,j] = exp(diffMat*diffMat.T/(-2.0*k**2))
    xTx = xMat.T * (weights * xMat)
    if linalg.det(xTx) == 0.0:
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T * (weights * yMat))
    return testPoint * ws

def lwlrTest(testArr, xArr, yArr, k=1.0):
    m = shape(testArr)[0]
    yHat = zeros(m)
    for i in range(m):                   # one lwlr() call per test point
        yHat[i] = lwlr(testArr[i], xArr, yArr, k)
    return yHat
The code in listing 8.2 is used to generate a yHat estimate for any point in the x space. The function lwlr() creates matrices from the input data similar to the code in listing 8.1; then it creates a diagonal weights matrix called weights (B). The weights matrix is a square matrix with as many rows and columns as there are data points, so it assigns one weight to each data point. The function next iterates over all of the data points and computes a value that decays exponentially as you move away from the testPoint (C). The input k controls how quickly the decay happens. After you’ve populated the weights matrix, you can find an estimate for testPoint similar to standRegres().

The other function in listing 8.2 is lwlrTest(), which calls lwlr() for every point in testArr. This is helpful for evaluating the size of k.
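For readers working with plain NumPy arrays rather than the matrix type, the same calculation can be sketched without building the full m-by-m weights matrix. This is an illustrative variant, not the book's listing: the name lwlr_point and the toy data are assumptions, and it swaps the determinant check and explicit inverse for np.linalg.lstsq, which returns a least-squares solution even when the weighted system is singular.

```python
import numpy as np

def lwlr_point(test_point, X, y, k=1.0):
    """Locally weighted linear regression estimate for one test point."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    test_point = np.asarray(test_point, dtype=float)
    diff = X - test_point                        # (m, n) differences
    # Diagonal of W: Gaussian kernel on the squared distance to test_point
    w = np.exp(np.sum(diff ** 2, axis=1) / (-2.0 * k ** 2))
    XtW = X.T * w                                # same result as X.T @ diag(w)
    # Solve (X^T W X) ws = X^T W y in the least-squares sense
    ws = np.linalg.lstsq(XtW @ X, XtW @ y, rcond=None)[0]
    return test_point @ ws

# Hypothetical toy data lying exactly on y = 1 + 2x;
# the first column is the constant 1.0, as in the book's data layout.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
estimate = lwlr_point([1.0, 1.5], X, y, k=0.5)
print(estimate)
```

Because the toy data is exactly linear, every choice of weights recovers the same line, so the estimate at x=1.5 should sit on that line regardless of k.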
Let’s see this in action. After you’ve entered the code from listing 8.2 into regression.py, save it and type the following in the Python shell:
>>> reload(regression)
<module 'regression' from 'regression.py'>
If you need to reload the dataset, you can type in
>>> xArr,yArr=regression.loadDataSet('ex0.txt')
You can estimate a single point with the following:
>>> yArr[0]
3.1765129999999999
>>> regression.lwlr(xArr[0],xArr,yArr,1.0)
matrix([[ 3.12204471]])
>>> regression.lwlr(xArr[0],xArr,yArr,0.001)
matrix([[ 3.20175729]])
Listing 8.2 Locally weighted linear regression function
B Create diagonal matrix
C Populate weights with exponentially decaying values
To get an estimate for all the points in our dataset, you can use lwlrTest():
>>> yHat = regression.lwlrTest(xArr, xArr, yArr,0.003)
You can inspect yHat. Now let’s plot these estimates with the original values. The line plot needs the data to be sorted by x, so let’s sort xArr:
>>> xMat=mat(xArr)
>>> srtInd = xMat[:,1].argsort(0)
>>> xSort=xMat[srtInd][:,0,:]
Now you can plot this with Matplotlib:
>>> import matplotlib.pyplot as plt
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.plot(xSort[:,1],yHat[srtInd])
[<matplotlib.lines.Line2D object at 0x03639550>]
>>> ax.scatter(xMat[:,1].flatten().A[0], mat(yArr).T.flatten().A[0] , s=2, c='red')
<matplotlib.collections.PathCollection object at 0x03859110>
>>> plt.show()
You should see something similar to the plot in the bottom frame of figure 8.5. Figure 8.5 has plots for three different values of k. With k=1.0, the weights are so large that they appear to weight all the data equally, and you have the same best-fit line as using standard regression. Using k=0.01 does a much better job of capturing the underlying pattern in the data. The bottom frame in figure 8.5 has k=0.003. This fit is too noisy because the line follows the individual data points too closely. The bottom panel is an example of overfitting, whereas the top panel is an example of underfitting. You’ll see how to quantitatively measure overfitting and underfitting in the next section.
Figure 8.5 Plot showing locally weighted linear regression with three smoothing values. The top frame has a smoothing value of k=1.0, the middle frame has k=0.01, and the bottom frame has k=0.003. The top value of k is no better than least squares. The middle value captures some of the underlying data pattern. The bottom frame fits the best-fit line to noise in the data and results in overfitting.
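The trend across the three frames can be reproduced on synthetic data. The sketch below is an assumption-laden illustration rather than the book's experiment: the data is a made-up line with a sinusoidal wiggle and Gaussian noise (not ex0.txt), the helper name lwlr_fit is hypothetical, and the error is measured on the same points used for fitting.

```python
import numpy as np

def lwlr_fit(x, y, k):
    """Predict every training point with locally weighted linear regression."""
    X = np.column_stack([np.ones_like(x), x])    # constant column plus feature
    y_hat = np.empty_like(y)
    for i in range(len(x)):
        w = np.exp((x - x[i]) ** 2 / (-2.0 * k ** 2))   # Gaussian kernel weights
        XtW = X.T * w                                   # X^T W without forming W
        ws = np.linalg.lstsq(XtW @ X, XtW @ y, rcond=None)[0]
        y_hat[i] = X[i] @ ws
    return y_hat

# Synthetic stand-in data (an assumption, not the book's dataset)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = 3.0 + 1.7 * x + 0.1 * np.sin(30.0 * x) + rng.normal(0.0, 0.06, x.size)

rss = {}
for k in (1.0, 0.01, 0.003):
    residual = y - lwlr_fit(x, y, k)
    rss[k] = float(np.sum(residual ** 2))        # residual sum of squares
    print("k=%g  training RSS=%.4f" % (k, rss[k]))
```

The training error keeps shrinking as k gets smaller, which is why a low error on the fitted data can't by itself distinguish a good fit from an overfit one.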
Example: predicting the age of an abalone
One problem with locally weighted linear regression is that it involves numerous computations. You have to use the entire dataset to make one estimate. Figure 8.5 demonstrated that using k=0.01 gave you a good estimate of the data. If you look at the weights for k=0.01 in figure 8.4, you’ll see that they’re near 0 for most of the data points. You could save a lot of computing time by leaving those points out of the calculation.
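One way to act on this observation is to drop points whose weight falls below a small cutoff before solving. The following is a rough sketch under stated assumptions: the function name lwlr_truncated, the cutoff parameter, and the synthetic data are all hypothetical and don't come from the book.

```python
import numpy as np

def lwlr_truncated(test_point, X, y, k=1.0, cutoff=1e-6):
    """LWLR estimate that ignores points whose kernel weight is below cutoff."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    test_point = np.asarray(test_point, dtype=float)
    w = np.exp(np.sum((X - test_point) ** 2, axis=1) / (-2.0 * k ** 2))
    keep = w >= cutoff                     # boolean mask of the local points
    Xk, yk, wk = X[keep], y[keep], w[keep]
    XtW = Xk.T * wk                        # X^T W restricted to the kept points
    ws = np.linalg.lstsq(XtW @ Xk, XtW @ yk, rcond=None)[0]
    return float(test_point @ ws), int(keep.sum())

# Hypothetical noise-free data for the comparison
x = np.linspace(0.0, 1.0, 200)
X = np.column_stack([np.ones_like(x), x])
y = 3.0 + 1.7 * x + 0.1 * np.sin(30.0 * x)

full, n_full = lwlr_truncated(X[100], X, y, k=0.01, cutoff=0.0)   # keeps every point
fast, n_fast = lwlr_truncated(X[100], X, y, k=0.01, cutoff=1e-6)  # keeps local points only
print("points used: %d vs %d" % (n_full, n_fast))
print("difference in estimate: %.2e" % abs(full - fast))
```

The weights are still computed for every point here, so only the solve gets cheaper. Avoiding the weight calculation as well would need some way to look up nearby points directly, which this sketch doesn't attempt.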
Now that you’ve seen two methods of finding best-fit lines, let’s put them to use predicting the age of an abalone.