CZ-Level Data - Data and Measure of Intergenerational Mobility

1.2 Data and Measure of Intergenerational Mobility

1.2.2 CZ-Level Data

From the 1910-1940 linked sample, estimates of intergenerational mobility can then be constructed at the CZ level.14,15 I focus on a measure of absolute upward mobility rather than relative mobility. The former may be of greater normative interest as the latter could be driven by worse outcomes for the rich (Chetty et al. 2014). To be precise, I define upward mobility for each CZ as the average occupation income rank of sons who grew up in that particular CZ and who have fathers from the bottom

11 Bailey et al. (2018) show that standardizing names with the NYSIIS algorithm increases false matches by 22 percent on average as it removes variation that could have been useful in differentiating between close matches.

12 I discuss the results based on a more conservative version of Abramitzky et al.’s (2012, 2014, 2017) linking method in a subsequent footnote.

13 My match rate is comparable to other work using similar iterative procedures to generate historical linked samples. Abramitzky et al. (2012), for example, match 29 percent of Norwegian-born men from the 1865 Norwegian census to the 1900 Norwegian and US censuses. Abramitzky et al. (2017), also focusing on Norwegian men, obtain success rates of 23.4 and 10.7 percent going from the 1910 US census to the 1900 and 1865 Norwegian censuses respectively. Velasco (2018) links 29 percent of males residing in rural parts of the US in 1920 to the 1940 complete counts.

14 CZs are clusters of counties characterized by strong within-CZ commuting ties and weak between-CZ commuting ties, based on commuting data from the 1990 census. They were conceptualized by Tolbert and Sizer (1996) and popularized in the labor economics literature by Autor and Dorn (2013) and Dorn (2009). Using 1990-based CZs does not imply that these were the areas within which people commuted to work during the early 20th century. They simply allow for a direct comparison to be made with Chetty et al.’s (2018) data. Reassuringly, similar spatial patterns of upward mobility are observed if the microdata are aggregated to the county level instead.

15 Appendix A.2 details the procedures for aggregating the microdata to the CZ level. To improve the accuracy of my CZ estimates, I drop CZs that contain fewer than 250 effective linked sons with fathers from the bottom half of the national occupation income distribution. Chetty et al. (2014) set a threshold of 250 linked children when studying the 1980-1982 cohorts across all households. Chetty et al. (2018) use a different threshold that only affects the sample of CZs for blacks.

half of the national occupation income distribution:

$$\text{Mobility}_c = \frac{1}{N_c} \sum_{i=1}^{N_c} \text{Rank}^{\text{Son}}_{i,j,c} \,\Big|\, \text{Rank}^{\text{Father}} \leq \text{Median} \tag{1}$$

where i, j, and c refer to the individual, his birth year cohort, and the CZ where he was raised respectively.16 $N_c$ is the number of sons who grew up in CZ c with below-median income fathers, $\text{Rank}^{\text{Son}}_{i,j,c}$ is the national occupation income rank of sons when they are adults in 1940 relative to other native-born white sons from the same cohort, and $\text{Rank}^{\text{Father}}$ is the national occupation income rank of fathers in 1910 relative to other fathers with sons of the same age.17 Sons and fathers are ranked by their occupation income scores because total income is not recorded prior to the 1950 census.18 The baseline analysis uses the occupation scores that are provided by IPUMS. These are computed as the median income of all persons in a given occupation based on the 1950 census.19 Individuals with missing occupations are dropped, but those with non-occupational responses (such as being at school, retired,

16 The CZ of residence in 1910 is taken to be where the child was raised. This differs from Chetty et al. (2018), who assign children to CZs in proportion to the number of years they spent in each CZ before the age of 23. Their approach is not feasible here due to the limitations of the historical data. In a later footnote, I discuss some indirect evidence that most children are likely to have spent the majority of their childhoods in one particular CZ.

17 I use the terms “below-median income”, “poor”, and “with fathers from the bottom half of the occupation income distribution” interchangeably in this chapter. Similarly, “above-median income”, “rich”, and “with fathers from the top half of the occupation income distribution” are used interchangeably to refer to those from the top half of households.

18 The 1940 census records wages but not income from businesses or other sources. Non-wage income is particularly important for farmers and self-employed persons.

19 More precisely, IPUMS computes occupation scores by taking a weighted average of the median

or unemployed) are kept.20,21
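Definition (1) can be illustrated with a minimal sketch, assuming a linked sample with one row per son. The column names (`cz`, `cohort`, `son_score`, `father_score`) and the function name are hypothetical and not taken from the chapter. The sketch also makes two simplifying assumptions: ranks are computed within the supplied sample, and the 250-son threshold is applied to raw counts rather than to effective (weighted) counts.

```python
import pandas as pd


def upward_mobility_by_cz(df: pd.DataFrame, min_sons: int = 250) -> pd.Series:
    """Average rank of sons with below-median fathers, by CZ (definition (1)).

    Hypothetical columns: cz, cohort, son_score, father_score.
    """
    out = df.copy()
    # Percentile rank of each son among sons from the same birth cohort.
    out["rank_son"] = out.groupby("cohort")["son_score"].rank(pct=True) * 100
    # Fathers are ranked against fathers whose sons are the same age,
    # which is the same grouping as the son's birth cohort.
    out["rank_father"] = out.groupby("cohort")["father_score"].rank(pct=True) * 100
    # Keep sons whose fathers are in the bottom half of the distribution.
    poor = out[out["rank_father"] <= 50]
    grouped = poor.groupby("cz")["rank_son"].agg(["mean", "count"])
    # Assumption: raw counts stand in for the chapter's effective counts.
    kept = grouped.loc[grouped["count"] >= min_sons, "mean"]
    return kept.rename("mobility")
```

Called on a DataFrame with those four columns, the function returns one mobility estimate per CZ that clears the threshold. Sons with an occupation score of zero (non-occupational responses) can simply be left in the input, which mirrors the sample rule described above.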

My historical measure of upward mobility is comparable with that in Chetty et al. (2018). Chetty et al. (2018) define upward mobility as the expected income rank of sons with parents at the 25th percentile of the national income distribution:22

$$\text{Mobility}_c = E\left[\text{Rank}^{\text{Son}}_{i,j,c} \,\middle|\, \text{Rank}^{\text{Parents}} = p25\right] \tag{2}$$

This is computed for each CZ by regressing the national income ranks of children on their parents’ ranks, and then using the estimated constant and slope to predict the rank of sons whose parents are at the 25th percentile. Chetty et al.’s (2018)

20 Individuals who are not in the universe of the occupation question are also recorded as having missing occupations. The universe of the occupation question varies by census. Occupations are asked of all persons in 1910, but in 1940 they are only reported for those aged 14 and older, who are in the labor force, who are not institutional inmates, and who are not new workers (IPUMS-USA 2017). One might thus wonder if such inconsistencies could have distorted my estimates of upward mobility. To check this, instead of excluding all individuals with missing occupations, I drop only those who indicate being employed but who do not have a valid occupation. Such persons may be regarded as workers with true missing occupations, while all other individuals with missing occupations will be assigned an occupation score of zero. The latter may include those who are not in the labor force. Re-computing upward mobility for CZs with this adjustment and comparing them to the original measures yields a high correlation of 0.971, based on a sample of 630 common CZs.

21 IPUMS assigns persons with non-occupational responses an occupation score of zero. Including these individuals in the sample is partially consistent with Chetty et al. (2018), who keep children with zero income during adulthood, although they do require the 5-year mean income of children’s parents to be strictly positive. A valid concern is whether including those with non-occupational responses distorts the accuracy of my mobility estimates, since such responses may reflect transitory shocks rather than the true permanent income of individuals. To show that this is unlikely to pose a problem, I drop fathers and sons with non-occupational responses in addition to those with missing occupations, re-rank the remaining populations, and reconstruct the historical estimates of upward mobility. The resulting measures are almost perfectly correlated with the original estimates.

22 Using the income of fathers when constructing the historical measures of upward mobility instead of the combined income of both parents (when both are present), as in Chetty et al. (2018), is unlikely to introduce meaningful differences between the two sets of data. This is because among white mothers who are the spouse of a household head and who have at least one child in the household, only 4.38 percent are in the formal labor force and 4.19 percent are employed in 1910 (author’s calculation based on the 1910 1 percent IPUMS sample with sample weights). Furthermore, for the 1980-1982 cohorts, Chetty et al. (2014) document that using just the earnings of the parent with a higher mean income produces estimates of upward mobility that are perfectly correlated with measures based on the income of both parents (Online Appendix Table VII in their paper). Chetty et al. (2014) note that “most of the variation in parent income across households is not due to differences in marital status.”

measure can be interpreted as definition (1) because the relationship between the income ranks of sons and parents is likely to be linear in each CZ.23 The historical and contemporary estimates of upward mobility can thus be directly compared.24 How similar or different are the historical and contemporary data along other dimensions? Table 1.1 lists several features of the 1910-1940 and Chetty et al. (2018) samples. In terms of demographic characteristics, the underlying populations of the two datasets are reasonably comparable.25 Where they differ more significantly is in

23 Chetty et al. (2014) document a linear relationship between children and parent ranks at the CZ level, and Chetty et al. (2018) further verify this by race. Neither paper explicitly mentions linearity at the CZ level by gender. Nonetheless, the reasonably high correlation of 0.680 between the mobility estimates for white males and females in Chetty et al. (2018) suggests that linearity at the CZ level may also extend to both genders separately. In more recent work, CFHJP suggest that there may in fact be some non-linearities in the relationship between the income ranks of children and parents. However, the close-to-perfect correlation between the estimates in CFHJP and Chetty et al. (2018) mentioned in an earlier footnote indicates that a linear approximation is still reasonably accurate.

24 One might wonder why I do not directly use Chetty et al.’s (2018) method when estimating upward mobility in the historical data. Practically, using Chetty et al.’s (2018) technique produces mobility estimates that are almost perfectly correlated with my baseline measures (a correlation of 0.994 based on 625 common CZs). Conceptually, however, Chetty et al.’s (2018) approach is less appealing with the historical data because the limited variation in occupation scores and the significant number of farmers imply that percentile ranks are substantially coarser for below-median income households. The notion of a linear relationship is thus less meaningful. Figure A.1 in Appendix A.3 illustrates this for the 20 largest CZs based on the 1910-1940 linked data.

25 Three differences in demographic characteristics are worth discussing briefly. First, the age at which income is measured in the 1910-1940 linked sample includes an older range relative to Chetty et al. (2018). Chetty et al. (2014) show that the relationship between child and parent ranks is relatively stable when the child is in his or her 30s and early 40s, although their data do not allow them to extend the range further. Using a long series of administrative income data from Sweden, Nybom and Stuhler (2017) plot the rank-rank correlations between sons and fathers, based on the annual income of sons at different ages and fathers’ income at age 45. They find that the correlations are reasonably stable when the son’s income is measured during his 30s and 40s, though there is a gradual decline when the son reaches his late 40s. My chapter also follows Chetty et al. (2018) in ranking each person against others from the same cohort, potentially reducing the importance of lifecycle bias at older ages when constructing estimates of upward mobility for each CZ. Taken together, the evidence suggests that the difference in age range between the two datasets is unlikely to compromise their comparability. Second, Chetty et al. (2018) require the parent to be 15-50 years old at the time of the child’s birth. They do this to eliminate instances of siblings or grandparents filing tax claims for the child, which would have resulted in them being incorrectly identified as the child’s parents. This is not necessary with the historical censuses as each person’s relationship to the household head is recorded. Third, I do not follow Chetty et al. (2018) in requiring parents to have 5-year average incomes that are positive. The economic status of fathers in my sample is only observed for a single census year and it is unclear if those with occupation scores of zero also had zero earnings in the adjacent years. In any case, as mentioned in an earlier footnote, excluding sons

the quality of matching and the measure of income used to compute upward mobility. Since unique identifiers for individuals are not available in the historical censuses, it is not possible to achieve a match rate that is as high as Chetty et al.’s (2018). With regard to the measure of income, however, it is less clear if the use of occupation scores is inferior to actual earnings. While occupation scores are imperfect measures of earnings in a given year, they may be better proxies for permanent income, which is the income concept of interest when studying intergenerational mobility. I return to the issues of match quality and income measurement in the robustness sections below – neither is likely to compromise the baseline results in the next section.
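The regression-based measure in definition (2), and the reason it lines up with definition (1) under linearity, can be sketched as follows. This is an illustration under stated assumptions and not the procedure used by Chetty et al. (2018) in detail: the function names are hypothetical, ranks are on a 0-100 scale, and the equivalence relies on below-median parents in the CZ having an average rank of about 25. If son ranks are linear in parent ranks, the mean son rank among below-median parents equals the fitted line evaluated at that average parent rank.

```python
import numpy as np


def predicted_rank_at_p25(parent_rank, son_rank):
    """Fit son_rank = a + b * parent_rank by OLS and predict at rank 25."""
    x = np.asarray(parent_rank, dtype=float)
    y = np.asarray(son_rank, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 least-squares fit
    return intercept + slope * 25.0


def mean_rank_below_median(parent_rank, son_rank):
    """Mean son rank among sons whose parents rank at or below the median."""
    x = np.asarray(parent_rank, dtype=float)
    y = np.asarray(son_rank, dtype=float)
    return y[x <= 50].mean()


# Simulated CZ with an exactly linear rank-rank relationship (made-up numbers).
parent = np.arange(0.5, 100.0, 1.0)  # evenly spread parent ranks
son = 30.0 + 0.4 * parent            # hypothetical intercept and slope

regression_based = predicted_rank_at_p25(parent, son)
average_based = mean_rank_below_median(parent, son)
```

In this simulated case the two numbers coincide because the relationship is exactly linear and parent ranks are evenly spread. With coarse ranks, as in the historical occupation scores, the fitted line is less informative, which is the conceptual concern raised in footnote 24.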

In document Three lessons for labor economics from history (Page 33-37)