
CHAPTER 3 : DATA AND METHODOLOGY

3.4 METHODOLOGY

3.4.4 Multivariate Analysis

According to the conceptual framework for child nutritional status used in this study, many factors play a role in a child’s nutritional status. These factors are inter-linked and operate at different levels. Variables that are statistically significant in the preliminary bivariate analysis are included in the multivariate analysis. Multivariate analysis is therefore employed to identify the factors that remain associated with stunting and underweight after controlling for the other factors that were significantly associated with these outcomes in the bivariate analysis. All explanatory variables are entered into the model simultaneously, and variables that are not significant are then dropped from the model. The significance level is 0.05 for all analyses except those that use pooled data sets, where it is 0.01.

Because the data sets are hierarchical and a good percentage of households have more than one child, the next step in the multivariate analysis is logistic multilevel modelling of the factors associated with stunting and underweight. Multilevel modelling ensures that estimates are robust: where there is significant variation at a particular level, i.e. household or community, the fixed effects estimates are interpreted at the group level rather than at the population level. Multilevel modelling is also useful for exploring how the inclusion or exclusion of specific variables affects the variance in stunting and underweight at different levels. It therefore indicates which variables vary significantly across a particular level, which is vital for making appropriate policy recommendations.

Both the household level variance and the community level variance are explored in the study of the determinants of child under-nutrition in Malawi, the study of the levels and patterns of child under-nutrition in Malawi and the study of household and community socio-economic factors. The analysis of household behavioural factors, however, uses age group specific multivariate analysis, and therefore only the community level variance is explored. In cases where neither the household level nor the community level variance is significant, a design weighted model in Stata is used. Logistic multilevel modelling is performed in either Stata or MLwiN. MLwiN is software designed specifically for analysing hierarchical data and is quick to run, which is necessary when both the household level and community level variances need to be estimated. There were no significant differences between the variance estimates from the multilevel models fitted in Stata and those fitted in MLwiN.

MLwiN, however, has a limitation: its weighting techniques are not developed well enough to provide robust estimates for data from complex survey designs like those used in this thesis. Ignoring survey design may lead to biased estimates and underestimated standard errors in the analysis of child nutritional status, as reported by Madise et al. (2003) in their study of five African countries. However, as observed by Madise et al. (2003) and Pfeffermann et al. (1998), including survey design variables as explanatory variables reduces the bias in the parameter estimates. The multivariate analysis in this thesis therefore includes survey design variables such as region and residence as explanatory variables. Initial multivariate analysis is done in Stata, accounting for survey design, so that the estimates can be compared with those from MLwiN. The findings show that although the standard errors differ between the Stata design weighted estimates and the MLwiN estimates at level one and at community level, the differences are not large enough to change the significance of any variable: the same variables are significant, or not significant, in all three models. An example of this comparison, based on the stunting model using the IHS 2 data, is shown in appendix 2.
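The point that different standard errors need not change which variables are significant can be illustrated with a minimal Python sketch. It computes two-sided Wald p-values for a single coefficient under two different standard errors. The coefficient and both standard errors are hypothetical values chosen for illustration, not estimates from this thesis or from appendix 2, and the function names are assumptions of the sketch.

```python
import math

def wald_p_value(beta, se):
    """Two-sided p-value for H0: beta = 0, using the normal approximation."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = abs(beta / se)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))

def is_significant(beta, se, alpha=0.05):
    """True when the Wald test rejects H0 at level alpha."""
    return wald_p_value(beta, se) < alpha

# Hypothetical log-odds coefficient with two hypothetical standard errors,
# standing in for a design weighted fit and a multilevel fit.
beta = 0.40
se_design_weighted = 0.12
se_multilevel = 0.10

p_design = wald_p_value(beta, se_design_weighted)
p_multilevel = wald_p_value(beta, se_multilevel)
same_conclusion = (is_significant(beta, se_design_weighted)
                   == is_significant(beta, se_multilevel))
print(p_design, p_multilevel, same_conclusion)
```

With these illustrative inputs the two p-values differ, but both fall below 0.05, so the conclusion about the variable is the same under either standard error.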

The modelling framework

Since the response variable is binary, we are interested in the probability of success, $\pi_i$, which lies between 0 and 1. The child level model for stunting and underweight may be represented as follows:

$$\operatorname{logit}(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi}$$

where $\pi_i/(1-\pi_i)$ is the odds of success.

We use the logit link, $\operatorname{logit}(\pi_i)$, which is a function of the probability that a child is stunted ($y_i = 1$) in the stunting model, and of the probability that the child is underweight ($y_i = 1$) in the underweight model.

$\beta_0$ is the constant, and $\beta_1$ to $\beta_p$ are the coefficients for explanatory variables $x_{1i}$ to $x_{pi}$ for child i. The explanatory variables $x_{1i}$ to $x_{pi}$ may be factors directly related to the child, like age and sex, factors related to the mother, father and household, as well as factors at the community level.
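As a minimal sketch of the child level model, the Python functions below convert between a probability, its odds and its log-odds, and evaluate the linear predictor for one child. The function names, the coefficient values and the covariate values are hypothetical and serve only to show the arithmetic; they are not estimates from this study.

```python
import math

def odds(p):
    """Odds of success p / (1 - p) for a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p / (1.0 - p)

def logit(p):
    """Logit link: the natural log of the odds."""
    return math.log(odds(p))

def inv_logit(eta):
    """Inverse of the logit link, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def linear_predictor(beta0, betas, xs):
    """Constant plus the sum of coefficient * covariate for one child."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    return beta0 + sum(b * x for b, x in zip(betas, xs))

# Hypothetical constant, coefficients and covariate values for one child.
beta0 = -1.0
betas = [0.5, -0.3]
child = [1.0, 2.0]

eta = linear_predictor(beta0, betas, child)  # log-odds of the outcome
p = inv_logit(eta)                           # probability of the outcome
print(eta, p)
```

Because `inv_logit` undoes `logit`, any value of the linear predictor maps back to a probability between 0 and 1, which is why the logit link suits a binary response.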

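The model may be extended to include a level 2, giving the following random intercept model for child i within household j or community j:

$$\operatorname{logit}(\pi_{ij}) = \log\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) = \beta_{0j} + \beta_1 x_{1ij} + \dots + \beta_p x_{pij}$$

where

$$\beta_{0j} = \beta_0 + u_{0j}$$

$\beta_{0j}$ is the random intercept and $u_{0j}$ is the random effect at level 2. The random effect represents the variation in nutritional status between children from different communities. As in a standard logistic regression, we can obtain average predicted probabilities of a child i within level 2 unit j being stunted or underweight using the formula:

$$\hat{\pi}_{ij} = \frac{1}{1 + \exp\left(-\left(\hat{\beta}_0 + \hat{\beta}_1 x_{1ij} + \dots + \hat{\beta}_p x_{pij}\right)\right)}$$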
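The predicted probability in a random intercept model can be sketched in Python as follows. The sketch assumes the probability is obtained by applying the inverse logit to the fixed part of the model, with an optional level 2 random effect `u` added to the intercept (zero by default). All numeric values and names are hypothetical, chosen only to show how the random effect shifts the prediction.

```python
import math

def inv_logit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def predicted_probability(beta0, betas, xs, u=0.0):
    """Predicted probability for one child in a level 2 unit.

    u is the level 2 random effect added to the intercept; u = 0
    corresponds to a unit whose intercept equals the overall constant.
    """
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    eta = beta0 + sum(b * x for b, x in zip(betas, xs)) + u
    return inv_logit(eta)

# Hypothetical estimates and covariate values for one child.
beta0 = -0.8
betas = [0.6, -0.4]
child = [1.0, 1.0]

p_typical = predicted_probability(beta0, betas, child)         # u = 0
p_higher = predicted_probability(beta0, betas, child, u=0.5)   # hypothetical u
print(p_typical, p_higher)
```

Two children with identical covariates can therefore have different predicted probabilities if they belong to level 2 units with different random effects.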

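Assuming a three level model in which level 2 is the household level and level 3 is the community level, the model may be represented as follows for a child i within household j in community k:

$$\operatorname{logit}(\pi_{ijk}) = \log\left(\frac{\pi_{ijk}}{1-\pi_{ijk}}\right) = \beta_{0jk} + \beta_1 x_{1ijk} + \dots + \beta_p x_{pijk}$$

where

$$\beta_{0jk} = \beta_0 + v_{0k} + u_{0jk}$$

$\beta_{0jk}$ is the random intercept, $v_{0k}$ is the random effect for community k and $u_{0jk}$ is the random effect for household j within community k.

Average predicted probabilities for a child i within household j in community k are obtained using the following formula:

$$\hat{\pi}_{ijk} = \frac{1}{1 + \exp\left(-\left(\hat{\beta}_0 + \hat{\beta}_1 x_{1ijk} + \dots + \hat{\beta}_p x_{pijk}\right)\right)}$$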
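To make the nesting in the three level model concrete, the sketch below simulates binary outcomes for children within households within communities. It assumes normally distributed random effects at the household and community levels and a single binary covariate. The parameter values, the covariate and the function names are hypothetical, and the simulation is only an illustration of the model structure, not a reproduction of the thesis data.

```python
import math
import random

def inv_logit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def simulate_three_level(n_communities, n_households, n_children,
                         beta0, beta1, sd_community, sd_household, seed=0):
    """Simulate children (level 1) in households (level 2) in communities (level 3)."""
    rng = random.Random(seed)
    records = []
    for k in range(n_communities):
        v_k = rng.gauss(0.0, sd_community)        # shared by the whole community
        for j in range(n_households):
            u_jk = rng.gauss(0.0, sd_household)   # shared by the whole household
            for i in range(n_children):
                x = rng.randint(0, 1)             # hypothetical binary covariate
                eta = beta0 + beta1 * x + v_k + u_jk
                y = 1 if rng.random() < inv_logit(eta) else 0
                records.append({"community": k, "household": j, "child": i,
                                "x": x, "v": v_k, "u": u_jk, "eta": eta, "y": y})
    return records

# Hypothetical parameter values.
data = simulate_three_level(n_communities=3, n_households=2, n_children=2,
                            beta0=-1.0, beta1=0.5,
                            sd_community=0.6, sd_household=0.4, seed=1)
print(len(data), sum(r["y"] for r in data))
```

Children in the same household share both random effects, and households in the same community share the community effect, which is the source of the within-group correlation that the multilevel model accounts for.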

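The multilevel model may be extended to include a random coefficient at a level higher than the child level, i.e. household or community, where the effect of a factor varies significantly across households or communities. For child i within household j or community j, this is represented as follows:

$$\operatorname{logit}(\pi_{ij}) = \log\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) = \beta_{0j} + \beta_{1j} x_{1ij} + \beta_2 x_{2ij} + \dots + \beta_p x_{pij}$$

where

$$\beta_{0j} = \beta_0 + u_{0j}, \qquad \beta_{1j} = \beta_1 + u_{1j}$$

$\beta_{0j}$ is the random intercept and $u_{0j}$ is its random error at level 2; $\beta_{1j}$ is the random coefficient and $u_{1j}$ is its random error at level 2.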
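A random coefficient means that the effect of a factor differs from one level 2 unit to another. The Python sketch below shows this for two hypothetical communities: each has its own intercept deviation and slope deviation, so the odds ratio for the factor is community-specific. All values and names are hypothetical and are used only to illustrate the arithmetic.

```python
import math

def community_log_odds(beta0, beta1, x, u0=0.0, u1=0.0):
    """Log-odds for a child with covariate x in a community whose intercept
    deviates by u0 and whose slope deviates by u1 from the overall values."""
    return (beta0 + u0) + (beta1 + u1) * x

def community_odds_ratio(beta1, u1=0.0):
    """Odds ratio for a one-unit increase in x within one community."""
    return math.exp(beta1 + u1)

# Hypothetical overall constant and coefficient.
beta0, beta1 = -1.0, 0.5

# Hypothetical (u0, u1) deviations for two communities.
communities = {"A": (0.2, 0.3), "B": (-0.1, -0.4)}

odds_ratios = {name: community_odds_ratio(beta1, u1)
               for name, (u0, u1) in communities.items()}
print(odds_ratios)
```

In this illustration the factor raises the odds more strongly in community A than in community B, while a model with only a random intercept would force the same odds ratio in both.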

Models with random effects at level 2 or level 3 have a group-specific interpretation, i.e. the coefficients are interpreted at either the household level or the community level.

Model results are presented as odds ratios. Predicted probabilities are also presented to show the variation of different factors by child’s age.
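An odds ratio is the exponential of a logit coefficient. The short Python sketch below converts a coefficient and its standard error into an odds ratio with an approximate 95% Wald confidence interval. The interval, the multiplier 1.96, the function name and the input values are assumptions made for illustration and are not taken from the thesis results.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for a logit coefficient.

    The limits are computed on the log-odds scale as beta -/+ z * se and
    then exponentiated; z = 1.96 gives an approximate 95% interval.
    """
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient (log of 2, i.e. a doubling of the odds) and
# hypothetical standard error.
or_, lower, upper = odds_ratio_with_ci(math.log(2.0), 0.25)
print(or_, lower, upper)
```

An odds ratio above 1 indicates higher odds of the outcome, below 1 lower odds, and an interval that excludes 1 corresponds to a coefficient that differs significantly from zero at the matching level.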


In document Modelling under-nutrition in under-five children in Malawi (Page 74-78)