2.6 Example 2: The Effect of Median Income on State-Level Voting in U.S.
2.6.2 Results
Figure 2.5 shows the estimated effect of income on state-level voting in each election. The open dots represent the OLS coefficient point estimates for each year, and the vertical line through each point represents the 95% confidence interval for that estimate. The solid line represents the point estimates produced by BEER, and the shaded region indicates the 95% confidence intervals for these estimates.
Figure 2.5: OLS and BEER Estimates of the Between Effect of Median State Income.
[Figure 2.5 axes. x: Year; y: Effect of Median Income on Democrat Statewide Voteshare]
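The open dots and vertical bars in figure 2.5 are per-election OLS slopes with their 95% confidence intervals. A minimal sketch of that calculation for a single regressor is given here. The function name `ols_slope_ci`, the toy data, and the use of a normal quantile in place of a t quantile are all assumptions made for illustration; none of them comes from the original analysis.

```python
from math import sqrt
from statistics import NormalDist, mean


def ols_slope_ci(x, y, level=0.95):
    """Slope of y on x (with an intercept) and a normal-approximation CI."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares, then the usual standard error of the slope.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    # Assumption: normal quantile; a t quantile would give a wider interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, slope - z * se, slope + z * se


# Invented toy cross-sections (NOT the data behind figure 2.5):
# year -> (income values, Democratic vote shares)
toy = {
    1964: ([1.0, 2.0, 3.0, 4.0, 5.0], [50.0, 49.0, 51.0, 50.0, 50.5]),
    2004: ([1.0, 2.0, 3.0, 4.0, 5.0], [44.0, 47.0, 49.0, 52.0, 56.0]),
}

# One independent regression per election, as in the OLS series.
estimates = {year: ols_slope_ci(x, y) for year, (x, y) in toy.items()}
for year, (b, lower, upper) in sorted(estimates.items()):
    print(year, round(b, 3), round(lower, 3), round(upper, 3))
```

Each election is estimated on its own here, so nothing learned in one year narrows the interval in another; that independence is what the BEER estimates relax.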
The individual regressions confirm the finding of Gelman et al. that no significant effect of income is visible until later in the time series: considered individually, they do not show a significant result until the 1992 election. Taken together, however, the point estimates describe income moving from a very small effect in the 1960s and 1970s to a much larger positive effect by the 2004 election, and since 1980 the between effect of income has grown steadily larger. The individual regressions are therefore too conservative, since none of them on its own confirms the larger story that their point estimates tell in sequence. BEER leverages this information and shows a significant and positive effect of income dating back to the 1968 election. BEER estimates that this effect grew between the 1980 and 1984 elections and has grown monotonically since 1988.
If we apply a cluster analysis to the similarity matrix derived by BEER for the time points and take the five-group solution, the clusters are as listed in table 2.5. As was the case with the regional authority example in section 2.5, even though the similarity scores do not consider how proximate two time points are when calculating their similarity, the groups derived from the data largely contain consecutive time points. The only exception is 1972, which aligns more closely with 1988, 1992, and 1996. The fact that consecutive time points are also the most similar is evidence for the theory that the causal process which translates states' differences in income into differences in voting is changing over this time frame, slowly and deterministically. In other words, income is becoming more important over time as a factor that explains how the states vote differently.

Table 2.5: Groups from Average Linkage Cluster Analysis on Year Dissimilarity, Five Group Solution

Group  Years
1      1964, 1968
2      1976, 1980
3      1984
4      1972, 1988, 1992, 1996
5      2000, 2004
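The grouping step can be illustrated with a small average linkage routine that works directly on a dissimilarity matrix. This is a sketch under stated assumptions: the function `average_linkage`, the six hypothetical time points, and their invented scores are for illustration only and do not reproduce the BEER similarity matrix or table 2.5.

```python
from itertools import combinations


def average_linkage(dissim, n_groups):
    """Agglomerative clustering with average linkage on a square
    dissimilarity matrix; stops when n_groups clusters remain."""
    clusters = [[i] for i in range(len(dissim))]

    def avg_dist(a, b):
        # Mean of all pairwise dissimilarities between two clusters.
        return sum(dissim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_groups:
        # Find the pair of clusters with the smallest average dissimilarity.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Invented one-dimensional scores for six hypothetical time points.
scores = [0.0, 0.1, 1.0, 1.2, 3.0, 3.1]
dissim = [[abs(a - b) for b in scores] for a in scores]

groups = average_linkage(dissim, 3)
print(groups)
```

The routine only ever reads entries of `dissim`; it has no notion of which time points are adjacent. Any tendency for consecutive time points to land in the same group therefore comes from the dissimilarities themselves, which is the point made in the text.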
The effect of income on state-level voting within regions
A subsequent research question may ask whether the relationship between income and voting really exists between states, or if this effect is manifest more accurately between
Figure 2.6: OLS and BEER Estimates of the Between Effect of Median State Income and Updated Sample Sizes.
[Figure 2.6 panels. Top row: Results for the Northeast Only; Results for the South Only. Bottom row: Updated N for the Northeast; Updated N for the South.]
Note: the non-updated sample size for the Northeast is 9, including CT, ME, MA, NH, NJ, NY, PA, RI, and VT. The non-updated sample size for the South is 10, including AL, AR, GA, KY, LA, MS, NC, SC, TN, and VA.
to both income and voting, but there may not be a real relationship between income and voting within these regions. We can begin to account for this possibility by including fixed effects for regions in the model. In that case, the model is now asking whether
richer states vote for the Democrat in greater proportions than the average for the region. Some regions, however, have very few states. California makes up most of what we usually think of as the west coast region, with Oregon, Washington, and maybe Arizona included. With so few observations in a region, the available degrees of freedom are quickly used up. Taking this further, we might want to interact the income effect with the regions if we posit that income has a greater effect in some regions than in others. We may even want to run the regression only on the observations within a particular region. Figure 2.6 illustrates the results of two such analyses. On the left, the model is run just for the 9 states in the Northeast, and on the right the model is run just for the 10 states in the South.
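The difference between a pooled slope and a slope estimated with region fixed effects can be seen in a toy example. With a single regressor, including a dummy for every region is equivalent to subtracting each region's mean from both variables before regressing. The data, region labels, and function names below are invented for illustration and are not drawn from the state-level data.

```python
from collections import defaultdict
from statistics import mean


def slope(x, y):
    """OLS slope of y on x with an intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (within transformation)."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - means[g] for v, g in zip(values, groups)]


# Invented toy data: region B is both richer and more Democratic,
# but within each region richer states are slightly less Democratic.
region = ["A", "A", "A", "B", "B", "B"]
income = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
vote = [10.0, 9.0, 8.0, 20.0, 19.0, 18.0]

pooled = slope(income, vote)  # ignores regions
within = slope(demean_by_group(income, region),
               demean_by_group(vote, region))  # region fixed effects

print(round(pooled, 3), round(within, 3))
```

In this toy case the pooled slope is positive while the within-region slope is negative, which is the pattern that would arise if the relationship held between regions but not between states inside a region.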
Within each time point, there is a sample size of 9 for the Northeast and 10 for the South. These sample sizes are too small to distinguish a real effect from a null effect in most cases, so the individual regressions are not very informative. BEER, however, uses information from proximate and similar time points to update the N within each time point. The individual OLS point estimates and 95% confidence intervals are shown in the top two graphs of figure 2.6, along with the BEER point estimates and 95% confidence intervals. BEER yields narrower confidence intervals but does not, in either region, return the positive and significant result that was derived for all 50 states and DC. In fact, BEER estimates the effect of income on voting in the South to be slightly but significantly negative since 1988. Since the theory is not supported within these regions even when more of the available information is used, we can be more confident that the relationship actually exists between regions rather than between states. It is less likely that the null results are due simply to a lack of data.
The bottom two graphs in figure 2.6 show how BEER generates more power for the estimates by leveraging information from other years in the data. For the Northeast, although each cross-section has only 9 observations, the updated sample sizes range from approximately 40 to 80. In the South, the 10 observations are updated to sample sizes ranging from about 30 to about 70. These graphs demonstrate how BEER allows researchers with a small number of cases over time to derive between effects despite the small sample size in each cross-section.
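One way to build intuition for an updated sample size is to treat it as a weighted count of observations, where each other election contributes in proportion to a weight between 0 and 1. The sketch below is only that intuition: the weighted-sum formula, the function `updated_n`, and the weights are assumptions for illustration, not BEER's actual updating rule, which this section does not spell out.

```python
def updated_n(weights, n_per_year):
    """Hypothetical effective sample size: weighted sum of yearly Ns."""
    return sum(w * n for w, n in zip(weights, n_per_year))


# Invented weights for one target election across 11 elections;
# the target election itself gets weight 1.0.
weights = [0.2, 0.3, 0.5, 0.8, 1.0, 0.8, 0.5, 0.3, 0.2, 0.1, 0.1]
n_per_year = [9] * len(weights)  # 9 states per cross-section

n_eff = updated_n(weights, n_per_year)

# Rough rule of thumb (assumes a fixed residual variance): interval
# width scales with 1 / sqrt(N), so this ratio is the relative width.
relative_width = (n_per_year[4] / n_eff) ** 0.5

print(round(n_eff, 1), round(relative_width, 3))
```

Under this assumed rule, a cross-section of 9 can never be updated beyond 9 times the number of elections, and it stays at 9 if no other election receives any weight.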