CHAPTER 6: ANALYSIS AND RESULTS

6.5 Survival Analysis Results

6.5.3 Hedonic Regression Results: Micro-Scale

A sample of 158 micro-scale parcels (greater than 2 acres and less than 10 acres) sold in 2000 was used to estimate a hedonic regression model of the same type, and with the same specification, as the macro-scale model (see Section 6.5.1). The results of the initial model are presented in Table 6.14. In the macro-scale model, stream frontage, slope, shopping potential, and parcel size were the only significant predictors of sales price. At the micro-scale, however, residential zoning designation (p < 0.001) and per capita income (p < 0.01) are highly significant and distance to the nearest freeway access point is significant at the 0.05 level, in addition to parcel size (p < 0.001). Standardized regression coefficients indicate that the most important predictors, in descending order, are residential zoning designation, parcel size, per capita income, and distance to the nearest freeway access point. Variance inflation factors were calculated for each independent variable, and none approached the adopted threshold (the maximum value was 3.59). The normal quantile-quantile plot in the first panel of Figure 6.15 indicates that the residuals are not normally distributed, and the residuals plot (response minus fitted values) in the second panel suggests the presence of potential outliers.
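The two diagnostics cited above, standardized coefficients for ranking predictors and variance inflation factors for multicollinearity, can be sketched in Python with NumPy. This is a minimal illustration on synthetic data, not the software used in the study; the function names are hypothetical, and the chapter's adopted VIF threshold is not restated in this section.

```python
import numpy as np


def _ols(X, y):
    """Fit y on X plus an intercept; return (intercept, slopes, fitted values)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:], A @ beta


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all remaining columns (with an intercept)."""
    X = np.asarray(X, dtype=float)
    vifs = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        _, _, fitted = _ols(others, target)
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - (1.0 - ss_res / ss_tot))
    return vifs


def standardized_coefficients(X, y):
    """Slopes rescaled to standard-deviation units: b_j * sd(x_j) / sd(y)."""
    X = np.asarray(X, dtype=float)
    _, slopes, _ = _ols(X, y)
    return slopes * X.std(axis=0, ddof=1) / np.std(y, ddof=1)


# Synthetic illustration (not the study's data): x3 is nearly a copy of x1.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
y = 2.0 * x1 + 0.5 * x2 + rng.normal(size=200)
betas = standardized_coefficients(np.column_stack([x1, x2]), y)
print(vifs.round(1), betas.round(2))
```

A VIF of 1 means a predictor is not linearly predictable from the others; the nearly duplicated column x3 in the example produces very large VIFs for both x1 and x3, whereas the study's maximum of 3.59 indicates only modest overlap among the predictors.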


A total of six observations (right panel of Figure 6.15) were selected for further investigation because their DFFITS values exceeded the threshold of 2√(p/n). These anomalous parcels ranged in sales price from $385,000 to $2.5 million, and the property records yielded no unusual information that would justify their removal from the sample. A Breusch-Pagan test revealed evidence of heteroskedasticity (χ² = 46.93, p < 0.0001), although this conclusion is complicated by a few key considerations. First, the residuals plot in the second panel of Figure 6.15 does not fit any of the classic archetypes associated with non-constant error variance: aside from the cluster of observations in the bottom center of the graph, it looks reasonably healthy. As a precaution, the seven observations with residuals less than -2 (left panel of Figure 6.15) were examined further. Not surprisingly, these parcels sold for between $1,000 and $16,000, and their negative residuals imply that the model overestimated their value. According to the property records, two of these parcels were transferred from a developer to a homeowners association, two represent intra-family transactions, one was sold to the county, one is now an office building, and the remaining parcel was sold to a privately owned water utility. Unfortunately, there were no discernible patterns in the independent variable values of these observations that would suggest adding a specific (omitted) covariate to alleviate heteroskedasticity, if it in fact exists in the data.
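The DFFITS screen and the Breusch-Pagan test described above can be sketched as follows. The sketch assumes an OLS fit with an intercept, computes DFFITS from the hat matrix instead of refitting the model n times, and uses the studentized (Koenker) form of the Breusch-Pagan statistic, n·R² from regressing the squared residuals on the predictors. The chapter does not say which variant produced χ² = 46.93, so that choice is an assumption, and the function names are hypothetical.

```python
import numpy as np


def _design(X):
    """Prepend an intercept column to the predictor matrix."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


def dffits(X, y):
    """Return (DFFITS values, 2*sqrt(p/n) cutoff) for an OLS fit with intercept."""
    A = _design(X)
    n, p = A.shape
    H = A @ np.linalg.pinv(A)            # hat matrix A (A'A)^-1 A'
    h = np.diag(H)                       # leverages h_ii
    e = y - H @ y                        # ordinary residuals
    sse = e @ e
    # Residual variance with observation i deleted, without refitting n models.
    s2_del = (sse - e**2 / (1.0 - h)) / (n - p - 1)
    t_ext = e / np.sqrt(s2_del * (1.0 - h))   # externally studentized residuals
    return t_ext * np.sqrt(h / (1.0 - h)), 2.0 * np.sqrt(p / n)


def breusch_pagan_studentized(X, y):
    """Koenker's studentized Breusch-Pagan statistic: n * R^2 from regressing
    squared OLS residuals on the same predictors. Returns (statistic, df)."""
    A = _design(X)
    n, p = A.shape
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    e2 = (y - A @ beta) ** 2
    gamma, *_ = np.linalg.lstsq(A, e2, rcond=None)
    r2 = 1.0 - np.sum((e2 - A @ gamma) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2, p - 1


# Synthetic data (not the study's): error spread grows with x; one gross outlier.
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, size=300)
y = 1.0 + 2.0 * x + x * rng.normal(size=300)
y_out = y.copy()
y_out[0] += 200.0
d, cutoff = dffits(x[:, None], y_out)
bp_stat, bp_df = breusch_pagan_studentized(x[:, None], y)
```

If p counts the intercept, the micro-scale model has p = 17 (16 predictors plus the intercept) and n = 158, giving a cutoff of 2√(17/158) ≈ 0.66. The Breusch-Pagan statistic is compared with a χ² distribution on df degrees of freedom, for example via scipy.stats.chi2.sf(stat, df).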

Table 6.14: Micro-Scale OLS Parameter Estimates (N = 158).

Measure                        Estimate  Std. Error  Pr(>|t|)   Signif.
(Intercept)                     10.2569      0.9690    0.0000   ***
Physical Characteristics
  Water body frontage            0.3606      0.5269    0.4949
  Stream frontage               -0.3929      0.2099    0.0633   .
  Slope                         -0.1235      0.0759    0.1060
  Poor soils                    -0.4013      0.2107    0.0589   .
  Wetlands                      -0.6932      0.6868    0.3146
  Forest cover                  -0.0331      0.2492    0.8947
  Parcel size                    0.8027      0.2036    0.0001   ***
Accessibility
  Employment potential           0.8724      2.2811    0.7027
  Shopping potential             1.1042      0.8913    0.2174
  Distance to freeway ramp      -0.2604      0.1158    0.0261   *
Policy Context
  School district³²             -0.1567      0.2143    0.4658
  Tax rate                       1.1523      0.7960    0.1499
  Unincorporated                 0.0989      0.3344    0.7680
  Zoning                        -1.1375      0.2122    0.0000   ***
Demographics
  Population density            -0.0570      0.0531    0.2848
  Income                         0.0221      0.0079    0.0056   **
Adjusted R-squared: 0.3758
F-statistic: 6.908 on 16 and 141 DF
Significance codes: ‘***’ 0.001, ‘**’ 0.01, ‘*’ 0.05, ‘.’ 0.1

³² Districts where high schools did not meet state performance guidelines during the 2001-2002 year (i.e.,

Figure 6.15: Diagnostic Plots For Micro-Scale Regression.

As shown in Figure 6.16, the two sets of parcels identified as potentially problematic (left and right panels of Figure 6.15) are located primarily in the southern portion of the county and within incorporated areas. This clustering is notable because, within a regression context, the effects of spatial autocorrelation are similar to those of heteroskedasticity (Kelejian and Robinson, 2004: 80). In a linear regression, spatial autocorrelation is problematic for statistical inference because “the estimators are inefficient, and the variance estimator is downwards biased, thereby inflating the observed value of R²” (Cliff and Ord, 1981: 199). In fact, spatial processes often induce heteroskedasticity, either when appropriate measures capturing the non-stationarity are omitted from the model or through non-stationarity of functional forms or parameters in space (Anselin, 1988: 119). Based on simulation studies, Anselin (1988: 121) concludes that “when the error terms fail to be independent, the distributional properties of several parametric tests for heteroskedasticity are no longer valid” and “the Breusch-Pagan test in particular is sensitive to this.”

In light of this warning, a heteroskedasticity-consistent covariance matrix³³ was estimated for the model; the results are presented in Table 6.15. Under this specification, the poor soils (soil drainage capacity) variable becomes significant at the 0.05 level, but the overall inference is essentially unchanged. These results, coupled with the lack of a clear pattern in the residuals plot (Figure 6.15), lead to the conclusion that the departure from normality and the potential presence of spatial autocorrelation are more legitimate concerns than heteroskedasticity.
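A heteroskedasticity-consistent ("sandwich") covariance matrix keeps the OLS coefficients and replaces only their standard errors, which is why the estimates in Tables 6.14 and 6.15 are identical while the standard errors differ. The sketch below is a minimal NumPy version on synthetic data. The specific estimator used in the study is not identified in this excerpt, so the HC0, HC1, and HC3 options are offered as common alternatives, not as the study's choice; the function name is hypothetical.

```python
import numpy as np


def hc_standard_errors(X, y, kind="HC0"):
    """OLS coefficients with sandwich standard errors
    (A'A)^-1 A' diag(omega) A (A'A)^-1, where A = [1, X]."""
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, p = A.shape
    bread = np.linalg.inv(A.T @ A)
    beta = bread @ A.T @ y
    e = y - A @ beta
    h = np.einsum("ij,jk,ik->i", A, bread, A)   # leverages h_ii
    if kind == "HC0":          # White's estimator: squared residuals
        omega = e**2
    elif kind == "HC1":        # degrees-of-freedom rescaling of HC0
        omega = e**2 * n / (n - p)
    elif kind == "HC3":        # leverage-adjusted squared residuals
        omega = e**2 / (1.0 - h) ** 2
    else:
        raise ValueError(f"unsupported kind: {kind}")
    meat = A.T @ (omega[:, None] * A)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


# Synthetic data (not the study's): error spread grows with x.
rng = np.random.default_rng(2)
x = rng.uniform(1.0, 10.0, size=158)
y = 1.0 + 0.5 * x + 0.3 * x * rng.normal(size=158)
beta0, se0 = hc_standard_errors(x[:, None], y, "HC0")
beta1, se1 = hc_standard_errors(x[:, None], y, "HC1")
beta3, se3 = hc_standard_errors(x[:, None], y, "HC3")
```

Whichever variant is used, the coefficient vector is the same; HC3 inflates each squared residual by its leverage and is therefore never smaller than HC0.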

The standard test for spatial autocorrelation in OLS residuals is the Moran’s I statistic, which is asymptotically normally distributed (Cliff and Ord, 1981: 205); for regression residuals, the mean and variance used in this approximation are derived under the assumption of normally distributed errors. In the present case, however, the normal quantile-quantile plot and a Shapiro-Wilk test (W = 0.9189, p < 0.0001) provide strong evidence that the residuals are not normal.

Fortunately, Tiefelsdorf and Boots (1995) derived a modified version of Moran’s I that is robust to non-normality. First, however, the spatial weights matrix must be selected using the procedure described above for the macro-scale model. The procedure begins by plotting a correlogram of the dependent variable (logged sales price) for the micro-scale sample, as shown in Figure 6.17.
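For reference, the conventional normal-approximation Moran's I test for OLS residuals mentioned above can be sketched as follows, using the mean and variance of I under normally distributed errors. This is not the Tiefelsdorf and Boots (1995) procedure adopted in the chapter; it is the standard test described in the preceding paragraph. Function and variable names are hypothetical, the data are synthetic, and the p-value is one-sided for positive autocorrelation.

```python
import math

import numpy as np


def moran_residual_test(X, y, W):
    """Moran's I for OLS residuals with its mean and variance under normal
    errors. Returns (I, E[I], z, one-sided upper-tail p-value)."""
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, k = A.shape
    W = np.asarray(W, dtype=float)
    M = np.eye(n) - A @ np.linalg.pinv(A)    # residual-maker matrix
    e = M @ y                                # OLS residuals
    s0 = W.sum()
    moran_i = (n / s0) * (e @ W @ e) / (e @ e)
    MW = M @ W
    tr_mw = np.trace(MW)
    expected = (n / s0) * tr_mw / (n - k)
    second_moment = (n / s0) ** 2 * (
        np.trace(MW @ M @ W.T) + np.trace(MW @ MW) + tr_mw**2
    ) / ((n - k) * (n - k + 2))
    z = (moran_i - expected) / math.sqrt(second_moment - expected**2)
    p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))
    return moran_i, expected, z, p_upper


# Synthetic example: 60 observations on a line, each linked to its neighbors.
n = 60
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
W /= W.sum(axis=1, keepdims=True)            # row-standardize (W-coding)
rng = np.random.default_rng(3)
x = rng.normal(size=n)
# A smooth spatial trend left out of the model yields autocorrelated residuals.
y = 1.0 + 2.0 * x + 2.0 * np.sin(np.arange(n) / 4.0) + 0.1 * rng.normal(size=n)
moran_i, expected, z, p_upper = moran_residual_test(x[:, None], y, W)
```

Note that E[I] is slightly negative for OLS residuals even with no spatial process, because fitting the regression itself induces a small negative dependence among residuals.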


Table 6.15: Micro-Scale Heteroskedasticity-Consistent Parameter Estimates (N = 158).

Measure                        Estimate  Std. Error  Pr(>|t|)   Signif.
(Intercept)                     10.2569      0.9103    0.0000   ***
Physical Characteristics
  Water body frontage            0.3606      0.7203    0.6170
  Stream frontage               -0.3929      0.2132    0.0670   .
  Slope                         -0.1235      0.0738    0.0960   .
  Poor soils                    -0.4013      0.2023    0.0490   *
  Wetlands                      -0.6932      1.1627    0.5520
  Forest cover                  -0.0331      0.3007    0.9130
  Parcel size                    0.8027      0.1811    0.0000   ***
Accessibility
  Employment potential           0.8724      2.8537    0.7600
  Shopping potential             1.1042      1.1085    0.3210
  Distance to freeway ramp      -0.2604      0.1152    0.0250   *
Policy Context
  School district³⁴             -0.1567      0.2464    0.5260
  Tax rate                       1.1523      0.8680    0.1860
  Unincorporated                 0.0989      0.3615    0.7850
  Zoning                        -1.1375      0.2143    0.0000   ***
Demographics
  Population density            -0.0570      0.0827    0.4920
  Income                         0.0221      0.0075    0.0040   **
Adjusted R-squared: 0.4394
F-statistic: 11.46 on 16 and 141 DF
Significance codes: ‘***’ 0.001, ‘**’ 0.01, ‘*’ 0.05, ‘.’ 0.1

³⁴ Districts where high schools did not meet state performance guidelines during the 2001-2002 year (i.e.,


Figure 6.17: Correlogram Of Logged Sales Price At Micro-Scale.

The correlogram of the dependent variable falls below zero at approximately 4.9 miles, rises, declines again at roughly 6.25 miles, and finally falls below zero at the 9-mile mark. Next, as the first phase of the spatial weights matrix selection process, a series of binary connectivity matrices was created using distance thresholds from 1 to 12 miles in 1-mile increments. The same eigenvector extraction and stepwise regression procedure used at the macro-scale was then applied, and the results³⁵ are presented in Table 6.16.

³⁵ Threshold distances of less than three miles resulted in unconnected observations (zero neighbors).
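The first phase of weights-matrix construction (distance-band connectivity, with a check for the zero-neighbor problem noted in footnote 35) and the weighting and coding options compared in Table 6.17 can be sketched in NumPy. The sketch assumes distinct point coordinates such as parcel centroids, uses the conventional definitions of the B, C, W, and S coding schemes, and extracts eigenvectors of MCM (with M = I − 11ᵀ/n) as candidate spatial filters, which is one common form of eigenvector extraction. The Griffith and Lagona (1998) weighting function is not reproduced, and all function names are hypothetical.

```python
import numpy as np


def distance_band_connectivity(coords, threshold):
    """Binary connectivity: c_ij = 1 when d_ij <= threshold and i != j.
    Assumes distinct point coordinates (e.g., parcel centroids)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    conn = ((dist <= threshold) & ~np.eye(len(coords), dtype=bool)).astype(float)
    if (conn.sum(axis=1) == 0).any():
        raise ValueError("threshold leaves at least one observation with zero neighbors")
    return conn, dist


def weight_and_code(conn, dist, weighting="binary", coding="B"):
    """Apply a distance-decay weighting to existing links, then a coding scheme."""
    if weighting == "binary":
        v = conn.copy()
    elif weighting in ("IDW", "IDW2"):
        power = 1.0 if weighting == "IDW" else 2.0
        # Put 1.0 where there is no link so the division is always defined.
        v = conn / np.where(conn > 0, dist, 1.0) ** power
    else:
        raise ValueError(f"unsupported weighting: {weighting}")
    n = v.shape[0]
    if coding == "B":    # unscaled weights
        return v
    if coding == "C":    # globally standardized: all weights sum to n
        return v * n / v.sum()
    if coding == "W":    # row-standardized: each row sums to 1
        return v / v.sum(axis=1, keepdims=True)
    if coding == "S":
        # Variance-stabilizing: scale each row by its root sum of squares,
        # then rescale so that all weights sum to n.
        q = v / np.sqrt((v**2).sum(axis=1, keepdims=True))
        return q * n / q.sum()
    raise ValueError(f"unsupported coding: {coding}")


def mcm_eigenvectors(conn):
    """Eigenvalues and eigenvectors of M C M, M = I - 11'/n, largest first."""
    n = conn.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    sym = (conn + conn.T) / 2.0
    values, vectors = np.linalg.eigh(centering @ sym @ centering)
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]


# Illustration on a 5 x 5 grid of points one unit apart (not the study's data).
grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
conn, dist = distance_band_connectivity(grid, threshold=1.0)
w_idw = weight_and_code(conn, dist, weighting="IDW", coding="W")
values, vectors = mcm_eigenvectors(conn)
```

Because MCM annihilates the constant vector, every eigenvector with a nonzero eigenvalue is orthogonal to it and sums to zero; these vectors are the candidates a stepwise regression would choose among.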



Table 6.16: Selection Procedure For Micro-Scale Connectivity Matrices.

Distance     Raw AIC      AICc
1 Mile             ─         ─
2 Miles            ─         ─
3 Miles      -102.19    352.59
4 Miles       -49.20    161.42
5 Miles       -90.83    366.31
6 Miles      -195.46    476.50
7 Miles      -125.59    281.50
8 Miles      -131.96    394.30
9 Miles        -8.05    174.68
10 Miles      -91.75    127.53
11 Miles     -141.66    394.12
12 Miles      -66.76    443.48
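Selecting among the candidate thresholds amounts to computing the corrected AIC and keeping the minimum. The sketch below shows the standard small-sample correction, AICc = AIC + 2k(k + 1)/(n − k − 1), and applies a minimum-criterion rule to the AICc column transcribed from Table 6.16. How the study counted parameters k for each filtered model is not stated in this section, so the aicc helper is generic; the names are hypothetical.

```python
def aicc(aic, n_obs, n_params):
    """Small-sample corrected AIC: AICc = AIC + 2k(k + 1) / (n - k - 1)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    return aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)


def select_min(criterion_by_candidate):
    """Return the candidate with the smallest criterion, skipping missing values."""
    available = {c: v for c, v in criterion_by_candidate.items() if v is not None}
    return min(available, key=available.get)


# AICc values transcribed from Table 6.16, keyed by threshold in miles
# (None marks thresholds that left observations unconnected).
table_6_16_aicc = {
    1: None, 2: None, 3: 352.59, 4: 161.42, 5: 366.31, 6: 476.50,
    7: 281.50, 8: 394.30, 9: 174.68, 10: 127.53, 11: 394.12, 12: 443.48,
}
best_threshold_miles = select_min(table_6_16_aicc)
```

Applied to the table, the rule returns the 10-mile threshold; the same rule applied to Table 6.17 picks the smallest AICc among the weighting and coding combinations.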

Based on the corrected AIC values, the 10-mile threshold is the most plausible of the candidates examined (it minimizes the criterion), and this conclusion is consistent with the correlogram of the dependent variable presented in Figure 6.17. Next, the same set of weighting and coding schemes was evaluated during phase two of the spatial weights matrix selection process; the results are presented in Table 6.17.

Table 6.17: Micro-Scale Selection Procedure For Weighting And Coding Schemes.

Weighting Scheme            Coding Scheme      AICc
IDW                         B                615.52
IDW²                        B                 40.92
Griffith and Lagona (1998)  B                  8.73
IDW                         C                  2.54
IDW²                        C                  1.70
Griffith and Lagona (1998)  C                  1.52
IDW                         W                  3.92
IDW²                        W                 21.41
Griffith and Lagona (1998)  W                  1.51
IDW                         S                  2.91
IDW²                        S                 38.08

Based on the corrected AIC values, the Griffith and Lagona (1998) weighting function, applied to the 10-mile connectivity threshold with the W-coding scheme, was selected as the spatial weights matrix for the micro-scale model. A Moran’s I test using the selected matrix provided evidence of a very small degree of positive spatial autocorrelation in the residuals (I = 0.062, p = 0.001). A thorough analysis of the macro-scale and micro-scale regression models led to the conclusion that the OLS estimates are acceptable in both cases.

However, a more practical way to evaluate the predictive capacity of the OLS parameters is out-of-sample prediction. A second sample of vacant land parcels sold in 2001 was identified, and the same set of independent variables was prepared for these parcels. The parameter estimates from Tables 6.6 and 6.14 were used to predict the logged sales price of each parcel in the new 2001 sample, and the predictive capability of each model was assessed using the root mean squared error (RMSE) between the actual and predicted logged sales prices³⁶. The out-of-sample RMSE was 0.972 at the micro-scale and 1.213 at the macro-scale; as a point of reference, the RMSE for the original hedonic regression sample was 1.047 at the micro-scale and 0.746 at the macro-scale. The micro-scale estimates therefore performed better than the macro-scale estimates in the out-of-sample application. This limited test of predictive capacity should be interpreted with caution, but it indicates a slight decrease in error at the micro-scale and an increase in error at the macro-scale relative to the initial hedonic regression models.

The parameter estimates were then used to predict the sales price (market value) of each parcel in the macro-scale and micro-scale samples at the beginning of the study period, and these predicted values were used as an independent variable in the discrete-time hazard models. For an individual land parcel, predicted values ranged from $13,435.17 to $35,627,592.49 at the macro-scale and from $14,183.22 to $4,622,692.98 at the micro-scale.

³⁶ RMSE was calculated as the square root of the variance of the residuals and can be interpreted in the same
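Footnote 36 describes RMSE as the square root of the residual variance. For out-of-sample predictions the distinction matters: the mean squared error equals the (population) error variance plus the squared mean error, so a variance-based figure does not register a systematic over- or under-prediction. The pure-Python sketch below, with hypothetical function names and toy numbers, computes a linear prediction on the log scale and both quantities.

```python
import math
import statistics


def predict_log_price(intercept, coefficients, predictors):
    """Linear predictor for one parcel on the log scale: b0 + sum(b_j * x_j)."""
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted)^2))."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def residual_sd(actual, predicted):
    """Square root of the sample variance of the prediction errors."""
    return statistics.stdev([a - p for a, p in zip(actual, predicted)])


# Toy numbers (not from the study): every prediction is 1 log unit too low.
actual = [10.0, 11.0, 12.0]
predicted = [9.0, 10.0, 11.0]
print(rmse(actual, predicted), residual_sd(actual, predicted))
```

In the toy example every prediction misses by the same amount, so the RMSE is 1.0 while the square root of the error variance is 0.0.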

In document Scale effects and the determinants of parcel subdivision : a discrete-time hazard analysis (Page 158-168)