3.4 STATISTICAL METHODS
3.4.2 Principal Components Analysis
In chapter four we calculate an index of household wealth that discriminates between households according to their assets, materials and facilities. In the previous chapter we noted that Principal Component Analysis is a widely used technique for constructing this type of measure.
Principal component analysis (PCA) is a method that reduces the dimensionality of the data by creating a new set of uncorrelated variables (principal components), each of which is a linear combination of the original variables (Everitt, 1991). By construction, the first principal component accounts for the largest proportion of the variance in the data, and it therefore summarises and represents the original data. The first principal component is expected to behave as a weighted sum of the original variables. According to Everitt and Dunn (1991), the first principal component, as a linear combination of the original variables, can be represented by:
$$z_1 = a_{11}x_1 + a_{12}x_2 + \dots + a_{1p}x_p$$
where $x_1, \dots, x_p$ represent the selected household assets, materials and facilities, and the coefficients $a_{11}, \dots, a_{1p}$ are chosen to maximise the variance of $z_1$ subject to $a_{11}^2 + \dots + a_{1p}^2 = 1$. The mathematical derivation of the eigenvalues, the eigenvectors and the proportion of the variance accounted for by each principal component, as well as numerical examples, can be consulted in Everitt and Dunn (1991). The software STATA calculates the principal components using the command pca.
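As an illustration of the calculation described above, the following Python sketch computes the first principal component of a set of household indicators from the eigen-decomposition of their correlation matrix. It is a minimal sketch, not the STATA implementation used in the thesis. It assumes that the indicators are standardized (that is, PCA on the correlation matrix) and that no indicator is constant, and it flips the sign of the loadings so that they sum to a positive value, a convention chosen here so that higher scores mean more assets. The function name `first_principal_component` and the six-household data are hypothetical.

```python
import numpy as np


def first_principal_component(X):
    """Loadings, scores and variance share of the first principal component.

    X: (n_households, p_indicators) array. The indicators are standardized,
    so this is PCA on the correlation matrix (an assumption of this sketch).
    Constant indicators must be dropped beforehand (their std is zero).
    """
    X = np.asarray(X, dtype=float)
    # Standardize each indicator to mean 0 and standard deviation 1
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    R = np.corrcoef(X, rowvar=False)          # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigenvalues in ascending order
    a1 = eigvecs[:, -1]                       # unit-length loadings of the largest eigenvalue
    if a1.sum() < 0:                          # sign convention: more assets -> higher score
        a1 = -a1
    z1 = Z @ a1                               # index score for each household
    share = eigvals[-1] / eigvals.sum()       # proportion of total variance explained
    return a1, z1, share


# Hypothetical 0/1 indicators for six households: electricity, fridge, car
X = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]])
a1, z1, share = first_principal_component(X)
print(np.round(a1, 3), round(share, 3))
```

The scores have variance equal to the largest eigenvalue and the loadings have unit length, so `share` is the proportion of the total variance accounted for by the first component.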
3.5 Summary
In this chapter we described the data used in this thesis: three nationally representative surveys and two indices published as official statistics. Additionally, we presented the main statistical methods used in the thesis and how they were applied.
The NHS-2000 is used in this study to explore the relationship between SES and diabetes prevalence. Its main advantage is the inclusion of capillary blood tests, which allow the detection of adults with undiagnosed diabetes. Another advantage is its large sample, in which all 32 states are represented. A disadvantage is that the survey includes only one adult per household.
The MxFLS-2002 and MxFLS-2005 are used to analyze the incidence of diabetes and to explore employment status and changes in waist circumference. This survey is the first nationally representative longitudinal survey in Mexico. However, the follow-up period is short: the survey was planned for only three waves, and the third wave was not available when our analyses started. Although not all states are represented, the five regions of Mexico are, as well as primary sampling units (PSUs) representative of three socioeconomic strata. Even though only self-reported diabetes is recorded and health is not the main focus of the survey, it includes anthropometric measurements and biomedical indicators for all household members. One significant advantage is that adults were followed even if they moved to another household or to the US.
The ENIGH-2000 is used to construct and validate an index of household wealth. This survey includes household assets, materials and facilities as well as income and expenditure information, which enable the validation of wealth indices.
The Deprivation Index and the Human Development Index are used as measures of municipality SES. The Deprivation Index differentiates municipalities according to the degree to which basic needs are unmet. The Human Development Index (HDI) at the municipality level is based on indicators of health, education and income.
4 CALCULATION AND VALIDATION OF AN INDEX OF HOUSEHOLD WEALTH IN THE ENIGH-2000
4.1 Introduction
The analysis of diabetes in the NHS-2000 requires a measure of SES at the household level. Household income and consumption expenditure can be used as SES measures. However, the NHS-2000 did not collect expenditure information, and income was missing for 8% of the households. Furthermore, income was not measured thoroughly, since it was assessed with only two questions: one on the main income and another on additional incomes (such as transfers). In chapter 2 we mentioned that income presents other problems: underreporting, seasonal variability, measurement bias, and difficulty of measurement among the self-employed and in rural areas.
Three main points emerge from section 2.5. First, to deal with these problems in health surveys, indices of household wealth based on household assets, materials and facilities are commonly calculated. Second, PCA is a popular technique for constructing indices of household wealth, although there is no general consensus on which indicators to include in the index. Third, consumption expenditure is regarded as one of the preferred measures of living standards; consequently, a wealth index is expected to be closely associated with it.
Therefore, to obtain a measure of SES in the NHS-2000, we build an index of household wealth based on household assets, materials and facilities using PCA. In addition, we use an auxiliary survey, the ENIGH-2000, to select the indicators associated with expenditure, to categorize the index and to validate it.
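One way to relate candidate indicators to expenditure, in the spirit of the linear regression model for the rank of expenditure listed for Section 4.3, is to regress the fractional rank of household expenditure on the indicators by ordinary least squares. The sketch below is a minimal illustration under that assumption only; the exact specification and the rule used to select indicators are those reported in Section 4.3. The function `rank_regression` and the data are hypothetical, and ties in expenditure are broken by position, which assumes expenditure is effectively continuous.

```python
import numpy as np


def rank_regression(X, expenditure):
    """OLS of the fractional rank of expenditure on household indicators.

    Returns the intercept, the indicator coefficients and the R-squared.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(expenditure, dtype=float)
    n = len(y)
    # Fractional rank in (0, 1]; ties are broken by position (assumes no ties)
    rank = (np.argsort(np.argsort(y)) + 1) / n
    design = np.column_stack([np.ones(n), X])    # intercept + indicators
    coef, *_ = np.linalg.lstsq(design, rank, rcond=None)
    fitted = design @ coef
    r2 = 1 - np.sum((rank - fitted) ** 2) / np.sum((rank - rank.mean()) ** 2)
    return coef[0], coef[1:], r2


# Hypothetical 0/1 indicators (electricity, fridge, car) and monthly expenditure
X = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]])
expenditure = np.array([900, 700, 400, 150, 1200, 300])
intercept, slopes, r2 = rank_regression(X, expenditure)
print(np.round(slopes, 3), round(r2, 3))
```

Using the rank rather than the level of expenditure makes the outcome insensitive to the strong right skew typical of expenditure data.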
The aim of this chapter is to construct and validate, in the ENIGH-2000, an index that serves as a proxy for long-run household wealth, based on the household assets, materials and facilities included in both the ENIGH-2000 and the NHS-2000.
Section 4.2 presents the data and indicators, the calculation of income and expenditure, and the statistical methods used. Section 4.3 reports the descriptive statistics, the linear regression model for the rank of expenditure, the calculation of the index, its categorization, and the percentiles of income and expenditure by category.
Finally, section 4.4 reports the conclusions.
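The categorization step and the percentiles of expenditure by category can be sketched as follows. For illustration only, this assumes the index is cut into quintiles; the categorization actually used is the one reported in Section 4.3. The functions `categorize_index` and `percentiles_by_group` and the simulated data are hypothetical.

```python
import numpy as np


def categorize_index(index, n_groups=5):
    """Assign each household to a quantile group 1..n_groups of the index."""
    index = np.asarray(index, dtype=float)
    probs = np.linspace(0, 1, n_groups + 1)[1:-1]  # interior cut points, e.g. 0.2..0.8
    cuts = np.quantile(index, probs)
    return np.digitize(index, cuts) + 1            # 1 = lowest group


def percentiles_by_group(values, groups, q=(25, 50, 75)):
    """Percentiles of `values` (e.g. expenditure) within each group."""
    values = np.asarray(values, dtype=float)
    return {int(g): np.percentile(values[groups == g], q) for g in np.unique(groups)}


# Simulated, hypothetical data: index scores and expenditure loosely tied to them
rng = np.random.default_rng(0)
index = rng.normal(size=1000)
expenditure = np.exp(7 + 0.5 * index + rng.normal(scale=0.5, size=1000))
groups = categorize_index(index)
table = percentiles_by_group(expenditure, groups)
for g, (p25, p50, p75) in table.items():
    print(g, round(p25), round(p50), round(p75))
```

If the index is a good proxy for living standards, the expenditure percentiles should rise from the lowest to the highest category.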