Chapter 2: Challenges of Studying Socioeconomic and Neighborhood Effects on
2.2 Overcoming the Obstacles of Validity, Ecological Fallacy, and Other
can determine whether differences across areas are due to differences in the areas themselves or to differences between the types of people living in them. Multilevel studies can also evaluate the role of confounders and effect modifiers, and they can distinguish the effects of context (neighborhood characteristics) from those of composition (individual characteristics). Multilevel analysis includes group- and individual-level variables simultaneously in a regression model. This allows potential confounders to be controlled. It also allows within- and between-neighborhood variability in outcomes to be analyzed, along with the extent to which individual or group factors account for that variability. Because variables from two levels enter the same model, multilevel studies also bring additional challenges with regard to fallacies and confounding.
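As a minimal sketch of how group- and individual-level variables can enter one regression model, a common two-level random-intercept specification is written below. The notation is assumed for illustration and is not taken from a particular study cited here. X is an individual-level (compositional) variable and Z a neighborhood-level (contextual) variable. The two variance terms correspond to between- and within-neighborhood variability.

```latex
% Y_{ij}: outcome for individual i in neighborhood j
% X_{ij}: individual-level variable; Z_j: neighborhood-level variable
% u_j: neighborhood random effect; e_{ij}: individual-level residual
Y_{ij} = \beta_0 + \beta_1 X_{ij} + \beta_2 Z_j + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)

% Share of total outcome variance lying between neighborhoods
% (intraclass correlation):
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```

Comparing the estimate of \sigma_u^2 before and after individual-level variables are added gives one rough indication of how much between-neighborhood variation reflects composition rather than context.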
The most commonly discussed fallacy is the ecological fallacy. Fallacies of this kind result from drawing inferences at one level of aggregation from data collected at another level. The ecological fallacy is drawing inferences at the individual level from group-level data. The symmetrical fallacy, also called the individualistic or atomistic fallacy, is drawing inferences at the group level from individual-level data. The ecological fallacy has two primary sources. The first is the absence of information on individual-level confounders or effect modifiers, which may vary from group to group. The second is the presence of contextual effects of derived variables, in which the aggregate measure (for example, the proportion of poor residents in a neighborhood) has an effect over and above that of the corresponding individual-level measure (Diez-Roux, A.V. 1998). Both fallacies can be avoided by using multilevel analysis, because individual- and group-level data are evaluated together, and by ensuring that inferences are drawn at the appropriate level.
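The contextual-effect source of the ecological fallacy can be illustrated with a small, entirely hypothetical simulation. Each resident's outcome is generated as 1.0 times their own exposure plus 2.0 times their neighborhood's mean exposure. Both coefficients, and the data, are arbitrary assumptions for illustration. A regression on neighborhood means recovers the combined slope of 3.0. A within-neighborhood regression recovers the individual-level slope of 1.0. Reading the group-level slope as an individual effect would therefore overstate it.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs (one predictor)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical coefficients, chosen only for illustration.
INDIVIDUAL_EFFECT = 1.0   # effect of a resident's own exposure
CONTEXTUAL_EFFECT = 2.0   # additional effect of the neighborhood mean exposure

# Five hypothetical neighborhoods with three residents each.
neighborhoods = []
for j in range(5):
    xbar = float(j)                              # neighborhood mean exposure
    xs = [xbar - 1.0, xbar, xbar + 1.0]          # residents' own exposures
    ys = [INDIVIDUAL_EFFECT * x + CONTEXTUAL_EFFECT * xbar for x in xs]
    neighborhoods.append((xs, ys))

# Group-level (ecological) analysis: regress mean outcome on mean exposure.
ecological_slope = ols_slope([mean(xs) for xs, _ in neighborhoods],
                             [mean(ys) for _, ys in neighborhoods])

# Within-neighborhood analysis: center each resident on their neighborhood
# means, which removes all between-neighborhood differences.
cx, cy = [], []
for xs, ys in neighborhoods:
    mx, my = mean(xs), mean(ys)
    cx.extend(x - mx for x in xs)
    cy.extend(y - my for y in ys)
within_slope = ols_slope(cx, cy)

print(f"ecological slope: {ecological_slope:.2f}")        # 3.00
print(f"within-neighborhood slope: {within_slope:.2f}")   # 1.00
```

The within-neighborhood estimate isolates the individual effect here only because the contextual variable is constant within each neighborhood. With real data, a model that includes both the individual exposure and its neighborhood mean would be needed to estimate the two effects together.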
Two other fallacies to consider in multilevel analysis are the psychologistic fallacy and the sociologistic fallacy. The psychologistic fallacy is ignoring relevant group-level variables in a study of individual-level associations. The sociologistic fallacy is ignoring relevant individual-level factors when studying groups. Both can be thought of as sources of confounding, because each leaves relevant variables out of the statistical model. This is a concern in multilevel modeling because an apparent group-level effect may actually reflect an omitted individual-level variable that is related to the group-level variable and predictive of the outcome. Omitting relevant causal variables from a model is, however, a problem in all epidemiological studies, whether they are individual-level, group-level, or a combination of both. In multilevel analysis, omitting certain individual-level variables produces a confounded estimate of the group-level effect. Conversely, overcontrolling for potential confounders can make a real group-level effect disappear. One example is adjusting for an individual-level variable that lies on the causal pathway between the neighborhood and the outcome. Such overcontrolling wrongly suggests that the group level does not matter.
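A second hypothetical sketch shows how omitting an individual-level variable can produce a spurious group-level effect. In the made-up data below, the outcome depends only on individual income. Neighborhoods with higher deprivation scores contain lower-income residents, and there is no true contextual effect. Regressing the outcome on deprivation alone yields a nonzero coefficient. That coefficient falls to zero once individual income is added. The variable names and coefficients are illustrative assumptions. For brevity, ordinary least squares with two predictors stands in for a full multilevel model.

```python
from statistics import mean

def centered(values):
    """Return the values minus their mean."""
    m = mean(values)
    return [v - m for v in values]

def cross(a, b):
    """Sum of products of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical data: 4 neighborhoods (deprivation 0-3), 3 residents each.
deprivation, income, outcome = [], [], []
for z in range(4):
    for d in (-1, 0, 1):                  # within-neighborhood income spread
        x = 10 - 2 * z + d                # poorer residents in deprived areas
        deprivation.append(float(z))
        income.append(float(x))
        outcome.append(50.0 - 2.0 * x)    # outcome depends on income only

z_c, x_c, y_c = centered(deprivation), centered(income), centered(outcome)

# Model 1: deprivation only (individual income omitted).
b_dep_alone = cross(z_c, y_c) / cross(z_c, z_c)

# Model 2: deprivation and income, solved from the 2x2 normal equations.
s_zz, s_xx, s_zx = cross(z_c, z_c), cross(x_c, x_c), cross(z_c, x_c)
s_zy, s_xy = cross(z_c, y_c), cross(x_c, y_c)
det = s_zz * s_xx - s_zx ** 2
b_dep_adj = (s_xx * s_zy - s_zx * s_xy) / det
b_inc_adj = (s_zz * s_xy - s_zx * s_zy) / det

print(f"deprivation, unadjusted: {b_dep_alone:.2f}")          # 4.00
print(f"deprivation, adjusted for income: {b_dep_adj:.2f}")   # 0.00
print(f"income, adjusted for deprivation: {b_inc_adj:.2f}")   # -2.00
```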
Controlling for potential confounders is also a problem in individual-level studies, where too many, too few, or the wrong variables may be entered into the regression model. What multilevel analysis adds is confounding at the group level, both through omitted variables and through overcontrolling. Multilevel analysis therefore requires that potential confounders at both the individual and the group level be placed in the model appropriately (Diez-Roux, A.V. 2001; Diez-Roux, A.V. 1998). Analyses can also be stratified by levels of a variable such as social position to address confounding (Rauh, V., Andrews, H., and Garfinkel, R. 2001; Pearl, M., Braveman, P., and Abrams, B. 2001).
Multicollinearity can also arise: certain predictors are so closely interrelated that it is very difficult to separate their effects statistically. Few studies using multilevel analysis discuss multicollinearity. Depending on the specific research question, separating the individual effects of such closely correlated variables may not be meaningful enough to justify the difficulty of doing so (Diez-Roux, A.V. 1998).
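One common diagnostic for the multicollinearity described above is the variance inflation factor (VIF). For a model with exactly two predictors, it reduces to 1 / (1 - r²), where r is their Pearson correlation. The sketch below computes it for two made-up neighborhood measures. The data are hypothetical, and the cutoff in the comment is a widely cited rule of thumb rather than a fixed standard.

```python
from math import sqrt
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

def vif_two_predictors(a, b):
    """VIF for either predictor in a model with exactly two predictors."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical neighborhood-level measures (percent).
poverty_rate = [5, 10, 15, 20, 25, 30]
unemployment_rate = [3, 5, 8, 10, 13, 15]

vif = vif_two_predictors(poverty_rate, unemployment_rate)
# Rule of thumb: values above about 10 are often flagged as problematic.
print(f"VIF = {vif:.1f}")   # VIF = 309.2
```

With more than two predictors, each VIF is computed from the R² of regressing that predictor on all the others, so this two-variable shortcut no longer applies.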
It is important to keep in mind that the definition of “neighborhood” can and will vary between studies. Some studies try to capture the commonly accepted notion of a neighborhood, while others use school districts, fire districts, census block groups, or census tracts. There is no accepted standard for what a neighborhood should be, because the appropriate unit depends on the purpose of the study. In a study of crime, police districts could be a suitable “neighborhood” unit. In a study of the best location for a free-care hospital or clinic, however, police districts would not be an appropriate choice of boundaries. What matters when choosing the area that represents a neighborhood is that the people within it are alike with respect to what is being studied. Many studies examine the social impact of residential segregation and poverty, so census block groups or census tracts are commonly used as neighborhoods in studies of social and socioeconomic factors. An added benefit of census areas is that block groups and tracts are designed to contain relatively homogeneous populations. Their boundaries are sometimes redrawn at a new census to keep them that way. Federal, state, and local agencies also use the census tract to determine eligibility for real-world programs. Examples include the designation of medically underserved populations and of qualified census tracts for low-income housing tax credits (Krieger, N. et al. 2005). Census tracts typically contain around 4,000 people, while census block groups typically contain around 1,000. Ideally, each neighborhood unit should contain enough people, and the study should include enough neighborhoods, to allow within- and between-area variability in the outcomes to be determined.
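The following sketch makes the point about within- and between-area variability concrete. It uses the classical one-way ANOVA estimator for a balanced design, in which every neighborhood has the same number of residents. The estimator splits outcome variance into between- and within-neighborhood components and reports the intraclass correlation. It is a simplification chosen for brevity; multilevel software typically uses maximum likelihood or REML instead. The data are hypothetical.

```python
from statistics import mean

def variance_components(groups):
    """One-way ANOVA variance components for a balanced design.

    groups: list of equal-length lists of outcomes, one list per neighborhood.
    Returns (between_var, within_var, icc).
    """
    k = len(groups)            # number of neighborhoods
    n = len(groups[0])         # residents per neighborhood
    if any(len(g) != n for g in groups):
        raise ValueError("balanced design required")
    grand = mean([y for g in groups for y in g])
    group_means = [mean(g) for g in groups]
    ss_between = n * sum((m - grand) ** 2 for m in group_means)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ms_between = ss_between / (k - 1)          # k - 1 degrees of freedom
    ms_within = ss_within / (k * (n - 1))      # k(n - 1) degrees of freedom
    # The estimate can be negative by chance; truncate at zero by convention.
    between_var = max((ms_between - ms_within) / n, 0.0)
    icc = between_var / (between_var + ms_within)
    return between_var, ms_within, icc

# Hypothetical outcome scores for three neighborhoods of three residents each.
neighborhoods = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
between, within, icc = variance_components(neighborhoods)
print(f"between = {between:.3f}, within = {within:.3f}, ICC = {icc:.3f}")
# between = 8.667, within = 1.000, ICC = 0.897
```

The degrees of freedom in the comments show why both parts of the design matter. The between-neighborhood estimate rests on k - 1 degrees of freedom, so a study with few neighborhoods estimates it imprecisely however many residents each one contains. The within-neighborhood estimate depends mainly on the number of residents per area.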
Another challenge is that neighborhoods are always in flux. Residents can move into or out of poverty, yet the census records their circumstances only at a single point in time. Residential mobility is also a concern, but people who move most likely move to a neighborhood of similar socioeconomic status. So although census data capture only a moment in time, they generally give an accurate picture of the neighborhood at that moment (Krieger, N. 1992; Diez-Roux, A.V. 2001).
The most challenging aspect of multilevel analysis is that, because macro- and micro-level variables are integrated, a theory of causation must contain and explain the interactions between the levels, i.e., how individuals interact with the neighborhood environment. Most likely, neighborhood and individual characteristics affect each other. For example, individuals may have poor nutrition because high-quality foods are scarce in the grocery stores around their neighborhood (Diez-Roux, A.V. 2001). Fertility is an excellent example of how individual choice interacts with the social environment and its norms. When to have a child is an individual’s choice. If it is socially acceptable in the neighborhood to wait until one’s late 30s, however, that social norm may persuade a woman to wait. If enough women follow the norm, it is reinforced and social change does not occur. These methodological issues in multilevel analysis are not yet completely resolved. Even so, the methods currently available have been used and tested in many studies and have been shown to be effective and useful.
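In statistical terms, one common way to represent an interaction between levels is a cross-level interaction term, in which the effect of an individual-level variable is allowed to depend on a neighborhood-level variable. The sketch below extends a random-intercept model and uses assumed notation: individual i in neighborhood j, an individual-level variable X, and a neighborhood-level variable Z.

```latex
% beta_3 is the cross-level interaction: how much the individual-level
% slope for X changes per unit of the neighborhood variable Z.
Y_{ij} = \beta_0 + \beta_1 X_{ij} + \beta_2 Z_j + \beta_3 X_{ij} Z_j + u_j + e_{ij}

% Equivalently, the slope of X within neighborhood j is
\frac{\partial Y_{ij}}{\partial X_{ij}} = \beta_1 + \beta_3 Z_j
```

A nonzero \beta_3 describes how the two levels combine in the data; it does not by itself establish the causal mechanism behind that combination.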