METHODOLOGICAL APPROACH
2. Any chronic condition or cancer: dichotomous variable indicating the presence or absence of any health attention received in the past year for a chronic condition or cancer.
included in this research are presented in Table 5.1. The indicators used to establish the differences in the SDH and health status between the study groups (the immigrants versus the Chilean-born and the immigrants versus the missing values) are presented in Appendix 5.2.
Table 5.1 Brief description of each independent variable: the Social Determinants of Health (SDH) as measured in the CASEN survey 2006
Set of SDH / Variable | Type of variable

Migration status
International immigrant | Dichotomous variable [yes/no]
Preferred not to report migration status (missing values of this question) | Dichotomous variable [yes/no]
International immigrant offspring | Dichotomous variable [yes/no]

Demographics
Age | Continuous variable
Age group | Categorical: <16, 16-65, >65 years old
Sex | Dichotomous variable [male/female]
Marital status | Categorical: single, married, divorced, widow
Belong to any minority ethnic group | Dichotomous variable [yes/no]
Type of minority ethnic group | Categorical: Aymara, Atacamenho, Mapuche and other
Zone* | Dichotomous variable [urban/rural]
Number of household members | Count variable

Socioeconomic determinants
Individual income per month | Continuous (in Chilean pesos and USD)
Household income per month per capita | Continuous (in Chilean pesos and USD)
Household income quintile per month per capita* | Categorical: five quintiles of household income distribution
Current employment status | Dichotomous variable [employed/unemployed]
Contractual status | Dichotomous variable [has contract/does not have contract]
Type of contract | Dichotomous variable [permanent/temporary]
Job status | Dichotomous variable [full time/part time]
Type of occupation | Categorical: executive, private sector, public sector, self-employed, domestic service
Economically inactive among those 16-65 years old* | Dichotomous variable [yes/no]
Reasons for being inactive | Categorical: student, housewife, retired, ill
Unemployed among those 16-65 years old* | Dichotomous variable [yes/no]
Reasons for being unemployed | Categorical: found a job and starts soon, can’t find a job, don’t want to work, has intermittent informal job, other not stated
Educational level | Categorical: none, primary school, high school, technical level, university level
Household assets | Each of them dichotomous: home access to Internet [yes/no], mobile [yes/no]
Quality of the Housing Index (index combining quality of ceiling, floors and walls)*
Townsend definitionΨ | Dichotomous [no overcrowding/overcrowding]
Sanitary index (index combining access to clean water and public sewage system)* | Dichotomous [acceptable/deficient]

Access to health care
Health care provision | Categorical (multinomial): none, public 100% free of charge, public with co-payment, private, other not stated
Use of cervical screening programme | Dichotomous variable [yes/no]
Number of preventive health care attentions received in the past 3 months | Count variable
Type of preventive health care attention received in the past 3 months | Categorical (multinomial): well baby care, antenatal care, chronic disease, gynaecological, preventive adult/elderly, other, don’t remember
Any mental health care attention received in the past 3 months | Dichotomous variable [yes/no]
Number of mental health care attentions received in the past 3 months | Count variable
Any dental health care attention received in the past 3 months | Dichotomous variable [yes/no]
Number of dental health care attentions received in the past 3 months | Count variable
Any specialty health care attention received in the past 3 months | Dichotomous variable [yes/no]
Number of specialty health care attentions received in the past 3 months | Count variable
*These variables were coded and analysed as recommended by the principal investigators of the CASEN survey 2006, Ministry of Planning in Chile (MIDEPLAN 2006). These are fully described in Appendix 5.1 in the Appendix book.
ΨAs defined by the Townsend score, see Appendix 5.2 in the Appendix book.
5.3.7 Data analysis
This section provides an overview of the methods used in this thesis. It describes the rationale used to analyse the dataset and the post-estimation tests included to assess the quality of the estimates obtained. The subsequent chapters describe the specific methods used in each of them: Chapter 6 includes spatial analysis, Chapter 7 includes cluster analysis and principal components analysis (PCA), Chapter 8 includes multinomial regression, Chapter 9 includes Poisson regression and Chapter 10 includes exploratory factor analysis (EFA). Chapter 11 uses most of these approaches to analyse the living conditions and health of those preferring not to report their migration status. Because of the wide range of methods used in this thesis, the general research methodology is outlined in this chapter, and other specific statistical techniques are described where appropriate in the second part of the thesis.
5.3.7.a) Exploratory review of the database and use of weighted sample
An exploratory review of the database was carried out through summary statistics of each variable included in the study. Data imputation was not considered for this study, as almost all variables had a very low rate of missing values (below 0.005%). Notably, the single variable with a higher rate of missing values was international migration status, for which 0.67% of values were missing.
The CASEN dataset was downloaded in December 2008 from a secure web page of the responsible institution directly into Stata version 10.0, to ensure data quality. In addition, almost all the analyses were conducted with a weighted sample in order to attain population-based estimates, as the survey had a complex multistage sampling strategy. The dataset was therefore analysed using the survey commands of Stata 10.0, which yielded a weighted population size of 16 130 743 individuals, a close representation of the national population of the country according to the latest census (INE, 2008). Demographic characteristics of the total population in Chile according to the CASEN survey 2006 were also a close representation of the population in the country according to the latest census (INE, 2008).
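The role of the weights can be illustrated with a minimal sketch. It is written in Python rather than Stata, the four records and their expansion weights are invented for illustration, and it computes point estimates only; design-based standard errors for a multistage sample require survey software and are not shown.

```python
# Minimal sketch: expansion weights turn sample records into population estimates.
# All records and weights below are hypothetical, not CASEN data.

def weighted_total(weights):
    """Estimated population size: the sum of the expansion weights."""
    return sum(weights)

def weighted_proportion(values, weights):
    """Weighted share of records coded 1 (e.g. immigrant = yes)."""
    return sum(w for v, w in zip(values, weights) if v == 1) / sum(weights)

def weighted_mean(values, weights):
    """Weighted mean of a continuous variable (e.g. age)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Four hypothetical respondents, each representing `weight` people.
is_immigrant = [1, 0, 0, 0]
age = [30, 45, 12, 70]
weight = [50, 150, 100, 200]

population = weighted_total(weight)                 # 500 people represented
share = weighted_proportion(is_immigrant, weight)   # 50 / 500
mean_age = weighted_mean(age, weight)               # 23450 / 500
```

In this toy example the unweighted share of immigrants would be 1 in 4, whereas the weighted share is 1 in 10, which is why unweighted estimates from a complex sample can mislead.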
Descriptive analysis was conducted for each variable under study, by the estimation of means and standard deviation if the variables were continuous. Categorical variables were described by estimating proportions with corresponding 95% confidence intervals. In addition, stratified analysis for each dependent variable was conducted, by each set of Social Determinants of Health.
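As a hedged illustration of these descriptive statistics, the sketch below computes a mean with its standard deviation and a proportion with a 95% confidence interval. It assumes simple random sampling and a normal (Wald) approximation, which is a simplification of the design-based intervals produced by survey software; the input values are invented.

```python
import math

def mean_sd(values):
    """Sample mean and standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)

def proportion_ci(successes, n, z=1.96):
    """Proportion with a Wald 95% confidence interval (assumes simple random sampling)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Hypothetical inputs, for illustration only.
mean, sd = mean_sd([2, 4, 4, 4, 5, 5, 7, 9])
p, lower, upper = proportion_ci(50, 200)
```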
5.3.7.c) Analysis of association between dependent and independent variables
The crude association between the independent variables and the dependent variables was studied using the chi-square test (categorical variables), Pearson correlation (continuous variables), and the t-test (one binary and one continuous variable) when required. If an association was found, its crude direction and magnitude were estimated through simple logistic regression (binary variables), simple linear regression (continuous variables), Poisson regression (count variables), or multinomial regression (multinomial variables) with a 95% confidence level. This provided a crude coefficient, Odds Ratio (OR), Incidence Rate Ratio (IRR) or Relative Risk Ratio (RRR) and its corresponding 95% confidence interval.
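For a binary outcome and a single binary covariate, the crude OR from a simple logistic regression equals the cross-product ratio of the 2x2 table, so the crude step can be sketched without a regression routine. The sketch below is illustrative only: the counts are invented, the confidence interval uses the Woolf (log-based) approximation, and sampling weights are ignored.

```python
import math

def pearson_chi2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (1 degree of freedom)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude OR with a Woolf 95% CI.

    a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only.
chi2 = pearson_chi2(20, 80, 10, 90)
odds_ratio, lower, upper = crude_odds_ratio(20, 80, 10, 90)
```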
Multivariable regressions (logistic, multiple linear, Poisson, or multinomial according to the type of dependent variable) were also conducted to analyse the relationship between the dependent variables (mostly health outcomes, but also access to health care in Chapter 8) and different sets of covariates (different sets of SDH). This allowed observation of the covariate (regarding migration status and the SDH) which was most strongly associated with the dependent variable (health outcomes) in the presence of other covariates. This analysis was conducted to compare the immigrant and the Chilean-born populations and also within two main subgroups: by age group (under 16, 16-65, over 65 years old) and by sex (male versus female).
The modelling strategy most frequently used in this study was the following:
1) Multiple regression models were used to estimate the relationship between a health outcome and a single set of SDH (i.e. demographic, socioeconomic, material, access to health care, and migration-related).
2) Multiple regression models to estimate the relationship between a health outcome and a combination of sets of SDH. For this, a new set of SDH was added progressively to the model, in order to assess changes in magnitude and significance of the associations at every stage.
3) Multiple regression models to estimate the relationship between a health outcome and all sets of SDH. Because of multi-collinearity observed in the models, most fully adjusted models first included all covariates with a statistically significant association with the dependent variable. P-values, adjusted R-squared values and goodness of fit of these full models were assessed at this stage (details in the following paragraph).
4) Finally, all covariates from demographic, socioeconomic and migration sets of SDH were then added to the full model in order to assess if the addition of one or more of these key SDH would improve the fit of the models. The final regression models were then obtained and are presented in each results chapter.
5) Results presented in Chapters 6 to 11 describe findings from the final adjusted models, after conducting all the analyses described in points 1 to 4.
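The sequence of model specifications in this strategy can be sketched as follows. This is a schematic illustration only: the covariate names are hypothetical, the order in which sets are added is an assumption, and no model is actually fitted.

```python
# Hypothetical covariate names grouped by set of SDH (illustrative only).
SDH_SETS = {
    "migration": ["immigrant"],
    "demographic": ["age", "sex", "zone"],
    "socioeconomic": ["income_quintile", "education", "employed"],
    "material": ["overcrowding", "housing_quality", "sanitary_index"],
    "access": ["health_care_provision"],
}

def single_set_specs(sets):
    """Point 1: one model specification per set of SDH."""
    return {name: list(covariates) for name, covariates in sets.items()}

def progressive_specs(sets, order):
    """Points 2 and 3: add one set at a time; the last specification holds all sets."""
    specs, current = [], []
    for name in order:
        current = current + sets[name]
        specs.append((name, list(current)))
    return specs

# Assumed order of entry; the source does not fix one.
order = ["migration", "demographic", "socioeconomic", "material", "access"]
step1 = single_set_specs(SDH_SETS)
step2 = progressive_specs(SDH_SETS, order)
full_model = step2[-1][1]
```

Each specification would then be passed to the regression type that matches the outcome, and the changes in magnitude and significance compared from one stage to the next.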
Additionally, when theoretically relevant, the correlation between different health outcomes and different covariates was tested and assessed as to whether linear combinations of these covariates would estimate more parsimonious models, through exploratory factor analysis (EFA, considered the same as factor analysis, FA, in this thesis) and principal component analysis (PCA). These methods are detailed in the following results chapters. With regard to post-estimation tests of the models, p-values, the amount of variance explained by the Adjusted R-squared value (R2, multiple regressions) or Adjusted McFadden pseudo R-squared value (R2, logistic, Poisson and multinomial regression) and the goodness of fit (GOF) of the models were analysed. Specific GOF tests, the Akaike Information Criterion (AIC, Bozdogan, 1987 and 2000) or simply the F value of the model were also estimated.
The most parsimonious model to explain the different health outcomes was then presented (i.e. the one with the highest R-squared or F value, lowest AIC). Joint (Wald) tests were carried out to test the significance of categorical variables with more than 2 categories (significance of the trends). Because of the availability of a large number of potential explanatory variables, testing of variables and interaction terms was primarily guided by theories on the SDH and their effects on the health of the immigrant population. Rather than testing all possible interactions, these were restricted to terms of scientific interest and according to previous literature to support their exploration, which is presented at the beginning of each chapter on results (Chapters 6 to 11).
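The two likelihood-based criteria mentioned above can be written out explicitly. The sketch below uses the standard definitions AIC = 2k - 2 lnL and adjusted McFadden pseudo R-squared = 1 - (lnL_model - k) / lnL_null, where k is the number of estimated parameters (software differs on whether the intercept is counted, so this is an assumption). The log-likelihood values are invented for illustration.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * k - 2 * log_likelihood

def adjusted_mcfadden_r2(ll_model, ll_null, k):
    """Adjusted McFadden pseudo R-squared: penalises the k estimated parameters."""
    return 1 - (ll_model - k) / ll_null

# Hypothetical fits: (log-likelihood, number of parameters).
ll_null = -100.0
candidates = {"model_a": (-80.0, 3), "model_b": (-79.0, 6)}

scores = {
    name: (aic(ll, k), adjusted_mcfadden_r2(ll, ll_null, k))
    for name, (ll, k) in candidates.items()
}
# Pick the candidate with the lowest AIC.
best = min(scores, key=lambda name: scores[name][0])
```

Here the three extra parameters of the second model improve the log-likelihood by only one unit, so both criteria favour the smaller model.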
Social and health inequalities were conceptualised as the difference in social characteristics and health outcomes between two compared groups, those of the international immigrant population and the Chilean-born population. Health inequalities between these two main comparison groups were examined particularly in terms of social position (household income, educational level, and type of occupation) and material living standards (overcrowding, quality of the housing and sanitary conditions). Health inequalities were quantified as the differences between the extreme (the highest compared to the lowest) household income quintiles, educational levels or types of occupation, within both the immigrant and the Chilean-born population; and between two equal categories across populations.
5.3.7.e) Confounding analysis
Each exploration of the association between health and the different SDH accounted for the potential confounding effect of theoretically relevant variables through a regression model (logistic, multiple linear, Poisson, etc.). Throughout this analysis, the confounding-adjusted coefficient or OR with its 95% confidence interval was obtained when pertinent. In epidemiology, confounding is defined as a distortion in the estimated exposure effect that results from differences in risk between the exposed and unexposed that are not due to exposure (Rothman & Greenland, 1998; Greenland, Robins and Pearl, 1999). Confounding then occurs when an observed association is in fact due to a mixing of the exposure, the disease and a third factor (Hennekens, Buring and Maurent, 1987). The criteria I used for identifying a confounding variable were the following: the factor should be associated with the exposure under study, should be a cause or a risk factor for the outcome, should be associated with the outcome even in the presence of the exposure, and should not be an intermediate variable in the path between the exposure and the event (Glymour et al., 2005; Hernan et al., 2002; Moyses, 2000). Any change of direction and of magnitude over 10% in the crude estimated OR was considered a confounding effect (Hernan et al., 2008).
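The change-in-estimate rule can be expressed as a small check. The sketch below is one possible reading of the rule, flagging confounding when the adjusted OR differs from the crude OR by more than 10% or lies on the other side of the null value of 1; treating these as alternative conditions is an assumption, and the OR values are invented.

```python
def confounding_check(crude_or, adjusted_or, threshold=0.10):
    """Flag confounding under a change-in-estimate rule.

    Returns (flag, relative_change), where relative_change is the absolute
    change in the OR relative to the crude estimate.
    """
    relative_change = abs(adjusted_or - crude_or) / crude_or
    # A change of direction means the two ORs fall on opposite sides of 1.
    direction_changed = (crude_or - 1) * (adjusted_or - 1) < 0
    return relative_change > threshold or direction_changed, relative_change

# Hypothetical crude and adjusted ORs, for illustration only.
flag_large, change_large = confounding_check(2.0, 1.7)     # 15% change
flag_small, change_small = confounding_check(2.0, 1.9)     # 5% change
flag_flip, change_flip = confounding_check(1.05, 0.98)     # crosses the null
```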
5.3.7.f) Interaction analysis
The possible interaction effects existing between the independent variables on the dependent variables were studied through regression models that included multiplicative interaction effects. With these analyses the estimated adjusted coefficient or OR for each dependent variable was obtained, with the corresponding 95% confidence interval. As stated by Stronegger, Berghold and Seeber (1998), there are different ways to define interaction between factors in epidemiological studies. In their standard form, methods of event data
empirical investigations as well as causal models of disease aetiology, e.g. the simple independent action model of Finney (Finney, 1971) or the sufficient-component-causes model of Rothman (Rothman, 1976; Hogan et al., 1978; Walker, 1981), suggest additive or other kinds of non-multiplicative concepts of interaction. Beyond this discussion, the epidemiological study design as well as the underlying causal model are determinants of the interaction structure of the data and should be considered in the model selection process.
Using generalized linear models with different parametric link functions depending on the type of dependent variable under study, the existence of multiplicative interaction structures in the CASEN 2006 data was explored. The underlying theoretical model used throughout the analysis was the latest model on the SDH presented in Chapter 4.
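For a binary outcome, a binary exposure and a binary effect modifier, the exponentiated product term of a saturated logistic model equals the ratio of the stratum-specific ORs, so multiplicative interaction can be illustrated directly from two 2x2 tables. The sketch below uses invented counts and ignores sampling weights and further covariates.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a, b = exposed with/without outcome; c, d = unexposed."""
    return (a * d) / (b * c)

def multiplicative_interaction(table_m0, table_m1):
    """Ratio of ORs across the two strata of the modifier.

    A value of 1 indicates no interaction on the multiplicative scale.
    """
    return odds_ratio(*table_m1) / odds_ratio(*table_m0)

# Hypothetical counts per stratum of the modifier, for illustration only.
stratum_0 = (20, 80, 10, 90)   # OR = 2.25
stratum_1 = (40, 60, 10, 90)   # OR = 6.0
ratio = multiplicative_interaction(stratum_0, stratum_1)
```

A ratio above 1 means the association between exposure and outcome is stronger in the second stratum than the product of the separate effects would predict.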
5.3.7.g) Spatial analysis
To provide further description of the findings, spatial analysis was used to describe the living conditions and health status of the international immigrant population and the Chilean-born.
Spatial descriptive analyses were performed using the MapWindow programme (see Chapter 6).
5.4.1 Access to the survey
The CASEN survey dataset was downloaded from the responsible institution's web pages in December 2008, after an access request was approved (http://www.mideplan.cl/final/categoria.php?secid=25&catid=124).
The dataset provided to the chief investigator was anonymous. It only included an individual and household identification that had no correlation with the Chilean personal identification card. Therefore, the anonymous status of participants has been respected for this secondary data analysis. No data have personal identifiers, making it highly unlikely that these data could be used to trace the identity of participants.
5.4.2 Potential risks
During the analysis, the CASEN survey dataset was only accessible by a password securely kept in the ARRC server (ARRC data analysis cluster) at the University of York, in my personal account. This avoided the potential risk of other persons accessing and misusing the data. Data analysis took place on a single computer provided by the University of York for personal use from August 2009 to June 2011. Additionally, individual data will never be published. Only aggregated results from these datasets are presented in this thesis.
5.4.3 Vulnerable groups
The CASEN survey covers children under 18, those with learning disability, people with mental illness, people with dementia, and adults who were unable to provide consent, who lived at home at the time of the data collection. These groups were not directly interviewed.
Their demographic and health information was collected by the head of the household or their proxy and these participants were anonymous. In addition, respondents could refuse to answer any question at any stage of the interview.
5.4.4 Potential benefits
From a general viewpoint, there is an equity concern about how to maximise benefits of migration to every country and community around the world (distributive justice). As shown by the understanding of migration through Globalisation, poor countries should receive even greater returns, because of the loss of their active labour, young, highly educated population.
The relationship between migration and social disparities needs further understanding, as social class might determine the probability of movement and its future consequences.
require greater multilateral attention. Moreover, several health inequalities might be growing among these populations because of migration.
This research contributes to further understanding of migration in Chile and Latin America and its repercussions for health outcomes. Potential benefits are related to the future development of policy strategies to improve Social Determinants of Health among both the migrant population and, secondarily, the Chilean population in the country. It should stimulate future development of research in this field in Chile. It could motivate other Latin American countries to start research on health inequalities and immigrants’ health. It will also provide relevant evidence for political international relationships among the Andean Community and other Latin American committees. Findings from this study will be disseminated to local authorities in Chile through the following institutions: the Chilean Initiative for Health Equity, the International Organization for Migration in Chile and the Chilean Ministry of Health.
5.4.5 Ethics Committee Approval
This research obtained approval from the Health Sciences Research Governance Committee (HSRGC) at the University of York in July 2009. The aim of this committee is to ensure that research in the Department of Health Sciences meets stringent standards of governance.
5.5 SURVEY LIMITATIONS
Some limitations of the CASEN dataset were noted, prior to analysis:
1. The CASEN survey’s coverage was national, except for some remote and inaccessible locations, and institutionalised people (hospitalised, imprisoned, elderly living in institutions), and these populations are therefore not included in this study’s analyses. The Chilean population interviewed in the survey might not entirely represent those who refused participation, or those not invited to participate. Consequently, some of those excluded, such as people in hospitals, prison or the elderly living in institutions, may have poor health and are not adequately represented in this dataset.
2. The use of weighted measures in all the analyses assumes independence between individuals and that those individuals included in the study are representative of the section where they live.
3. Overall, 15.20% of the Chilean population refused participation. No data were collected from them, so it was not possible to determine whether they differed significantly from those who accepted participation, or to establish their reasons for not wanting to participate in the survey.
4. The cross-sectional design of the survey did not allow analysis of causal pathways between exposure (i.e. Social Determinants of Health) and health outcomes. However, a theoretical framework of the understanding of the relationship between structural and intermediate determinants and health (from the latest model by the Commission on the SDH by the WHO, 2008) was used to support the hypothesis and data analysis.
5. Of special interest was potential bias concerning international immigration status, as 0.67% of values were missing for this variable in the dataset. It has also been reported in the past that undocumented immigrants might refuse participation more frequently than the local population. They would probably represent those living in poorer conditions and with a higher chance of presenting with a health problem. Therefore, the health status and Social Determinants of Health of undocumented immigrants might be underrepresented in this study. This aspect is further discussed in Chapter 11.