The bivariate LISA is a straightforward extension of the LISA functionality to two different variables, one for the location and another for the average of its neighbors. Invoke this function from the menu as Space > Multivariate LISA, as in Figure 21.11 on p. 163, or click the matching toolbar button. This brings up the same variable settings dialog as before. Select A987 as the y-variable and A988 as the x-variable, in the same way as depicted in Figure 21.3 (p. 157). Note that the y-variable is the one with the spatial lag (i.e., the average for the neighbors). As before, select ozrook.GAL as the spatial weights file (see Figure 21.4 on p. 157).
Figure 21.10: Moran scatter plot matrix for ozone in 987 and 988.
In the results window dialog, check Cluster Map, as in Figure 21.12 on p. 163. Click OK to generate the bivariate LISA cluster map, shown in Figure 21.13 on p. 164. Note that this map shows local patterns of spatial correlation between ozone at a location in August 1998 and the average for its neighbors in July 1998. Switching the selection of y and x variables in the variable settings dialog would create a LISA map of ozone at a location in July 1998 and the average for its neighbors in August 1998. The interpretation is similar to that of a space-time scatter plot. Compare the two bivariate LISA maps to their cross-sectional counterparts.
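As an illustration of the statistic behind the cluster map, the following minimal Python sketch computes a bivariate local Moran value for each location as the product of the standardized x-variable at the location and the spatial lag of the standardized y-variable. It assumes row-standardized weights and uses a made-up chain of four locations with invented values, not the ozone data. All function and variable names are hypothetical, and the permutation-based significance that determines which locations are colored in the map is left out.

```python
import numpy as np

def row_standardize(w):
    """Scale each row of a binary contiguity matrix so that it sums to one."""
    w = np.asarray(w, dtype=float)
    row_sums = w.sum(axis=1, keepdims=True)
    # Rows without neighbors stay all-zero instead of dividing by zero.
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

def standardize(v):
    """Convert a variable to z-scores (mean zero, unit variance)."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def bivariate_local_moran(x, y, w):
    """Return z_x[i] * (W z_y)[i] for every i, plus the two components."""
    zx = standardize(x)                            # value at the location
    lag_zy = row_standardize(w) @ standardize(y)   # average of the neighbors
    return zx * lag_zy, zx, lag_zy

def quadrant_labels(zx, lag_zy):
    """Classify each location by the sign of its value and of its lag."""
    high = np.where(lag_zy >= 0, "High-High", "High-Low")
    low = np.where(lag_zy >= 0, "Low-High", "Low-Low")
    return np.where(zx >= 0, high, low)

# Toy example (assumption): four locations on a line, rook-style neighbors.
w_rook = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
x_later = [1.0, 2.0, 3.0, 4.0]      # invented values at the location
y_earlier = [10.0, 20.0, 30.0, 40.0]  # invented values, lagged over neighbors

local_i, zx, lag_zy = bivariate_local_moran(x_later, y_earlier, w_rook)
labels = quadrant_labels(zx, lag_zy)
for i, (stat, label) in enumerate(zip(local_i, labels)):
    print(i, round(float(stat), 3), label)
```

Swapping the x and y arguments in this sketch mirrors switching the y and x selections in the dialog: the lag is then taken over the other variable.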
Figure 21.11: Bivariate LISA function.
Figure 21.12: Bivariate LISA results window options.
21.5 Practice
Several of the sample data sets contain variables observed at multiple points in time. Apart from the many measures included in the Los Angeles ozone data set, these include the St Louis homicide data (stl_hom.shp with FIPSNO as the Key), the SIDS data (sids2.shp with FIPSNO as the Key), and the Ohio lung cancer data (ohlung.shp with FIPSNO as the Key). Alternatively, consider a bivariate analysis between two variables that do not contain a time dimension.
Figure 21.13: Bivariate LISA cluster map for ozone in 988 on neighbors in 987.
Exercise 22
Regression Basics
22.1 Objectives
This exercise begins the review of spatial regression functionality in GeoDa, starting with basic concepts. Methodological background for multivariate regression analysis can be found in many econometrics texts and will not be covered here. The discussion of regression diagnostics and specific spatial models is left for Exercises 23 to 25.
At the end of the exercise, you should know how to:
• set up the specification for a linear regression model
• run ordinary least squares estimation (OLS)
• save OLS output to a file
• add OLS predicted values and residuals to the data table
• create maps with predicted values and residuals
More detailed information on these operations can be found in the Release Notes, pp. 45–56.
22.2 Preliminaries
In this exercise, we will use the classic Columbus neighborhood crime data (Anselin 1988, pp. 188–190) contained in the columbus.shp sample data set (use POLYID as the Key). The base map should be as in Figure 22.1 on p. 166.
Figure 22.1: Columbus neighborhood crime base map.
Figure 22.2: Regression without project.
Figure 22.3: Regression inside a project.
In GeoDa, the regression functionality can be invoked without opening a project. This is particularly useful in the analysis of large data sets (10,000 observations and more), when it is better to avoid the overhead of linking and brushing the data table. To start a regression from the GeoDa opening screen (Figure 1.1 on p. 2), select Methods > Regress, as in Figure 22.2. Alternatively, when a project is open (i.e., after a shape file has been loaded), invoke the Regress command directly from the main menu bar, as in Figure 22.3. This brings up the default regression title and output dialog, shown in Figure 22.4 on p. 167.
There are two important aspects to this dialog, the output file name and the options for output. The Report Title can be safely ignored as it is not currently used. The file specified in Output file name will contain the regression results in a rich text format (RTF) file in the current working directory.¹ The default is Regression.OLS, which is usually not very meaningful as a file name. Instead, enter something that gives a hint about the type of analysis, such as columbus.rtf, shown in Figures 22.5 and 22.6.²
¹ A rich text format file is a text file with additional formatting commands. It is often used as a file interchange format for Microsoft Word documents. It can be opened by many simple text editors as well, such as Wordpad, but not Notepad.
Figure 22.4: Default regression title and output dialog.
Figure 22.5: Standard (short) output option.
Figure 22.6: Long output options.
The dialog also contains a number of check boxes to specify long output options. The default is to leave them unchecked, as in Figure 22.5. Long output is created by checking the respective boxes, such as Predicted Value and Residual and Coefficient Variance Matrix in Figure 22.6.³
The option for predicted values and residuals should be used with caution, especially for large data sets. It adds two vectors to the regression output window and file whose length equals the number of observations. This can quickly get out of hand, even for medium-sized data sets.
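To make the size of this extra output concrete, here is a minimal Python sketch of what the two vectors are: the OLS predicted values and the residuals, one entry per observation. It is an illustration with simulated data and hypothetical names, not a description of how GeoDa computes or formats its output.

```python
import numpy as np

def ols_fit(x_vars, y):
    """Fit y on a constant plus the columns of x_vars by least squares."""
    X = np.column_stack([np.ones(len(y)), x_vars])   # constant term first
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    predicted = X @ beta
    residuals = y - predicted
    return beta, predicted, residuals

# Simulated data (assumption): n observations, two explanatory variables.
rng = np.random.default_rng(0)
n = 50
x_vars = rng.normal(size=(n, 2))
y = 1.0 + x_vars @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

beta, predicted, residuals = ols_fit(x_vars, y)

# The long output lists both vectors, so it grows by 2 * n numbers.
extra_numbers = predicted.size + residuals.size
print("coefficients:", np.round(beta, 3))
print("extra numbers in the long output:", extra_numbers)
```

With 10,000 observations the same option would add 20,000 numbers to the report, which is why the short output is the safer default.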
² When you run several regressions, make sure to specify a different output file for each analysis; otherwise all the results will be written to the same file (usually the default).
Figure 22.7: Regression model specification dialog.
³ The coefficient variance matrix provides not only the variance of the estimates (on the diagonal) but also all covariances. This matrix can be used to carry out customized tests of constraints on the model coefficients outside of GeoDa. If this is not of interest, this option can safely be left off. Also, it is important to note that the long output options do not need to be checked in order to add predicted values or residuals to the data table (see Section 22.4.1). They only affect what is listed in the output window and file.
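As a hedged illustration of such a customized test, the Python sketch below builds the OLS coefficient variance matrix for simulated data and uses it in a Wald statistic for one linear constraint, here the hypothetical restriction that two slope coefficients are equal. The data, the constraint, and all names are assumptions made for the example. With output saved from GeoDa, the estimates and the variance matrix would be read from the output file rather than recomputed.

```python
import numpy as np

def ols_with_covariance(X, y):
    """Return OLS estimates and their variance matrix s^2 (X'X)^-1."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)      # unbiased error variance estimate
    return beta, sigma2 * xtx_inv

def wald_statistic(beta, cov, R, r):
    """Wald statistic (R b - r)' [R V R']^-1 (R b - r) for H0: R beta = r."""
    R = np.atleast_2d(R).astype(float)
    diff = R @ beta - np.atleast_1d(r)
    return float(diff @ np.linalg.solve(R @ cov @ R.T, diff))

# Simulated data (assumption): constant plus two explanatory variables.
rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.5, 1.5]) + rng.normal(size=n)

beta, cov = ols_with_covariance(X, y)

# H0: the two slope coefficients are equal, i.e. beta[1] - beta[2] = 0.
wald = wald_statistic(beta, cov, R=[0.0, 1.0, -1.0], r=0.0)

# For a single constraint this equals the squared t-ratio of the difference,
# which needs the covariance term and not only the diagonal variances.
var_diff = cov[1, 1] + cov[2, 2] - 2.0 * cov[1, 2]
t_squared = (beta[1] - beta[2]) ** 2 / var_diff
print("Wald statistic:", round(wald, 4))
```

The statistic would then be compared with a chi-squared critical value with one degree of freedom, or treated as an F statistic with 1 and n - k degrees of freedom.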
Click on the OK button in the title and output dialog to bring up the regression model specification dialog, shown in Figure 22.7.
Figure 22.8: Selecting the dependent variable.