Statistical Methods

Fig. 2.6 The three delays model of maternal mortality

Part 5 - Data and Methods

5.3 Statistical Methods

The statistical models used in this study are a multilevel linear model and multilevel logistic models. Most of the data analyzed in this study are hierarchical in nature, so multilevel models are used to take the data structure into account. In the case of the determinants of late initiation of antenatal care among young women, the phenomenon was also described by means of a multiple correspondence analysis. Moreover, one of the objectives of this study is to establish the contextual determinants of the use of healthcare facilities during pregnancy, delivery and the postnatal period. For this reason, the context is also considered as the geographic background in which a young woman (the unit of analysis) has grown up, including, for example, the availability of certain kinds of healthcare services. To take the geographic aspect into account, some procedures available in the ArcGIS software (ESRI) were employed to create contextual variables, using as data sources the 2010 KSPA and the Country Lists of Health Facilities provided by the Ministry of Health. These models and procedures are described in the following sections.

15 For more information concerning the sample design, see par. 1.8 of Kenya National Bureau of Statistics (KNBS) and ICF Macro, 2010. Kenya Demographic and Health Survey 2008-09. Calverton, Maryland: KNBS and ICF Macro.

5.3.1 Multilevel Models

To test the research hypotheses, statistical modeling is used. Given the hierarchical structure of the data, multilevel models are used to estimate the influence of individual, household and community variables on young women’s reproductive behavior. Many kinds of data, such as those of the 2008-09 Kenya Demographic and Health Survey, have a hierarchical or clustered structure. For instance, children born to the same mother tend to be more alike in their physical and genetic characteristics than children born to different mothers. Similarly, women living in the same locality are likely to exhibit relatively similar behavior, since they share the same values and are likely to experience similar conditions regarding the availability and accessibility of healthcare services within the community. Ignoring this relationship among units grouped at different levels risks overlooking the importance of group effects, and may invalidate many of the traditional statistical techniques used to study relationships in the data, because these assume that observations are independent; in particular, standard errors tend to be underestimated.

In this study, therefore, the population is structured as women grouped within communities, and this structure is itself treated as an effective instrument for explaining the phenomenon under study (Goldstein, 1999).

5.3.2 Multilevel linear models

A simple single-level regression model is expressed as:

$$y_i = \beta_0 + \beta_1 x_i + e_i$$

Where: $y_i$ is the response variable; the subscript $i$ refers to the $i$-th unit; $\beta_0$ is the intercept; $\beta_1$ is the slope of the regression line; $x_i$ is the explanatory variable; and $e_i$ is the residual for the $i$-th unit.
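As an illustration of the single-level model, the intercept and slope can be obtained by ordinary least squares, using the closed-form estimates $\hat\beta_1 = \sum_i (x_i-\bar x)(y_i-\bar y) / \sum_i (x_i-\bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The Python sketch below uses made-up data (not KDHS data), and the function names are hypothetical.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals e_i."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope of the regression line
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


def residuals(x, y, b0, b1):
    """e_i = y_i - (b0 + b1 * x_i) for each unit i."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_ols(x, y)
print(round(b0, 2), round(b1, 2))
```

With an intercept in the model, the OLS residuals always sum to zero, which is a quick sanity check on the fit.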

If the level 1 units are nested within level 2 units, as in the first model employed in this study, which analyzes the risk of late initiation of antenatal care among young women (level 1 units are women aged 15-25 years, nested in communities, which form the second level), the relationships for several level 2 units, $j$, can be described simultaneously as:

$$y_{ij} = \beta_{0j} + \beta_1 x_{ij} + e_{ij}$$

Whenever an item has an $i$ subscript, it varies between level 1 units within a level 2 unit; where an item has a $j$ subscript only, it varies across level 2 units but has the same value for all level 1 units within a level 2 unit.

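This is called a random intercept model, in which only the intercepts, and not the slopes, are allowed to vary randomly at level 2. It is specified as follows:

$$\beta_{0j} = \beta_0 + u_j$$

then

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}$$

Where:

$$\operatorname{var}(u_j) = \sigma_u^2, \qquad \operatorname{var}(e_{ij}) = \sigma_e^2$$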
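To make the random intercept structure concrete, the following Python sketch simulates data from this model: every woman in community $j$ shares the same community effect $u_j$, while $e_{ij}$ varies from woman to woman. The parameter values, the covariate and the function name are hypothetical choices for illustration, not quantities estimated in this study.

```python
import random


def simulate_random_intercept(n_groups, n_per_group, beta0, beta1,
                              sigma_u, sigma_e, seed=1):
    """Draw (j, x_ij, y_ij) rows from y_ij = beta0 + beta1*x_ij + u_j + e_ij."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        u_j = rng.gauss(0.0, sigma_u)  # community effect, shared by all women in j
        for _ in range(n_per_group):
            x_ij = rng.uniform(15, 25)      # hypothetical covariate (e.g. age)
            e_ij = rng.gauss(0.0, sigma_e)  # woman-level residual
            rows.append((j, x_ij, beta0 + beta1 * x_ij + u_j + e_ij))
    return rows


# Hypothetical values: 200 communities of 20 women, sigma_u^2 = 1, sigma_e^2 = 4.
data = simulate_random_intercept(200, 20, beta0=2.0, beta1=0.1,
                                 sigma_u=1.0, sigma_e=2.0)
print(len(data), data[0])
```

Setting `sigma_e` to zero makes the residuals of all women in a community identical (all equal to $u_j$), which is the within-community dependency that the random intercept is designed to capture.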

In the random intercept model, the total variance is partitioned into two components, corresponding to the two levels of the model. In this way, the correlation between level 1 units clustered in the same level 2 unit is taken into account and corrected for, and it becomes possible to establish what part of the variability between women is due to the community and what part is due to the women themselves. This can be measured by the intraclass correlation coefficient (ICC), which indicates the correlation between observations of women belonging to the same community or, in other words, the degree of dependency of women within communities. The ICC is defined as the variance between communities divided by the total variance, where the total variance is the sum of the variance between communities and the variance within communities:

$$\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$$

Where:

$\rho$ measures the degree of homogeneity within level 2 units;

$\sigma_u^2$ is the variance at level 2 (between communities);

$\sigma_e^2$ is the variance at level 1 (within communities).

Hence, for a given between-community variance, the smaller the variance within communities, the greater the ICC: women belonging to the same community have very similar characteristics, so their correlation is high. Equivalently, for a fixed total variance, minimizing the within-group variance maximizes the between-group variance and therefore the ICC (Goldstein, 1999).
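The ICC formula can be computed directly once the two variance components are available. As a rough cross-check on balanced data, the Python sketch below also includes the classical one-way ANOVA (method-of-moments) estimator of the ICC. This is a textbook estimator added for illustration, not the IGLS/RIGLS procedure used by MLwiN, and the function names are hypothetical.

```python
from statistics import mean


def icc(sigma2_u, sigma2_e):
    """rho = sigma_u^2 / (sigma_u^2 + sigma_e^2)."""
    total = sigma2_u + sigma2_e
    if total <= 0:
        raise ValueError("total variance must be positive")
    return sigma2_u / total


def icc_anova(groups):
    """One-way ANOVA ICC estimate for k groups (communities) of equal size m."""
    k, m = len(groups), len(groups[0])
    if k < 2 or m < 2 or any(len(g) != m for g in groups):
        raise ValueError("sketch assumes k >= 2 groups of equal size m >= 2")
    grand = mean(v for g in groups for v in g)
    means = [mean(g) for g in groups]
    # Between-group and within-group mean squares.
    msb = m * sum((mj - grand) ** 2 for mj in means) / (k - 1)
    msw = sum((v - mj) ** 2 for g, mj in zip(groups, means) for v in g) / (k * (m - 1))
    denom = msb + (m - 1) * msw
    if denom == 0:
        raise ValueError("all observations are identical")
    return (msb - msw) / denom


# Same between-community variance, shrinking within-community variance:
print(icc(1.0, 4.0), icc(1.0, 1.0))
```

Unlike the variance-component formula, the ANOVA estimate can be negative when communities are less alike than chance would suggest; in practice this is read as an ICC of about zero.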

“In a multilevel analysis a correction is made for communities, which means that the information provided by a woman belonging to the same community does not give 100 per cent new information but less. The magnitude of the new information provided by each individual woman depends on the magnitude of the intraclass correlation coefficient (ICC).

The higher the ICC, the less new information is provided by a woman belonging to the same community.” (Twisk, 2006).
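Twisk's point can be quantified with the Kish design effect, a standard survey-sampling formula that the source does not itself use: with communities of equal size $m$ and intraclass correlation $\rho$, the sampling variance of a mean is inflated by $1 + (m-1)\rho$, so $n$ clustered women carry roughly the information of $n / [1 + (m-1)\rho]$ independent women. A minimal Python sketch with hypothetical numbers:

```python
def design_effect(m, rho):
    """Kish design effect for clusters of equal size m with ICC rho."""
    return 1 + (m - 1) * rho


def effective_sample_size(n, m, rho):
    """Independent observations carrying the same information (for a mean)."""
    return n / design_effect(m, rho)


# Hypothetical: 4000 women in communities of 20 women each.
for rho in (0.0, 0.05, 0.1, 0.2):
    print(rho, round(effective_sample_size(4000, 20, rho)))
```

At $\rho = 0$ every woman contributes fully new information; at $\rho = 1$ each community counts as a single observation.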

When the model contains many covariates, the equation can be expressed in matrix notation as follows:

$$y_{ij} = X_{ij}\beta + Z_{ij}u_j + e_{ij}$$

Where:

$y_{ij}$ is the response for the $i$-th level 1 unit in the $j$-th level 2 unit;

$X_{ij}$ is the matrix of covariates corresponding to the $i$-th level 1 unit in the $j$-th level 2 unit;

$\beta$ is the associated vector of fixed parameters;

$Z_{ij}$ is a matrix of covariates (usually a subset of $X_{ij}$) whose effects vary randomly at level 2;

$u_j$ is a vector of level 2 random effects;

$e_{ij}$ is the random effect (residual) associated with the level 1 units.
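The matrix form reduces to the random intercept model when $Z_{ij}$ contains only the constant 1. The Python sketch below evaluates $y_{ij} = X_{ij}\beta + Z_{ij}u_j + e_{ij}$ for a single woman using plain lists. The numbers are hypothetical, and the random slope case is shown only to illustrate the general notation, since the models described here are random intercept models.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum(ai * bi for ai, bi in zip(a, b))


def response(x_row, beta, z_row, u_j, e_ij):
    """y_ij = X_ij beta + Z_ij u_j + e_ij for a single level 1 unit."""
    return dot(x_row, beta) + dot(z_row, u_j) + e_ij


beta = [2.0, 0.5]  # hypothetical fixed intercept and slope
x_ij = 3.0         # hypothetical covariate value
X_ij = [1.0, x_ij]

# Random intercept model: Z_ij holds only the constant 1.
y_ri = response(X_ij, beta, [1.0], [0.4], -0.1)
# General case: Z_ij = X_ij, so intercept and slope both vary by community.
y_rs = response(X_ij, beta, X_ij, [0.4, 0.2], -0.1)
print(y_ri, y_rs)
```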

The multilevel linear regression models are used in the analysis of late antenatal care visits presented in Chapter 6.

5.3.3 Multilevel logistic model

The models developed in Chapter 7 and Chapter 8 assume that the response variable is dichotomous, since the aim of the second and third models is to describe the factors associated with the risk for a young Kenyan woman of delivering in a health facility and the risk of being visited by a health care professional after delivery. As in many other applications in social statistics, the responses in these two cases are categorical, and multilevel logistic regression models are therefore employed.

The two-level random intercept logistic model is of the form:

$$\operatorname{logit}(\pi_{ij}) = X_{ij}\beta + u_j, \qquad u_j \sim N(0, \sigma_u^2)$$

Where:

$\pi_{ij}$ is the probability of the event occurring for the $i$-th level 1 unit in the $j$-th level 2 unit, and $\operatorname{logit}(\pi_{ij}) = \log[\pi_{ij}/(1-\pi_{ij})]$ is its log-odds;

$X_{ij}$ is the matrix of fixed (observed) covariates corresponding to the $i$-th level 1 unit in the $j$-th level 2 unit;

$\beta$ is the associated vector of parameters for the effects of the fixed covariates;

$u_j$ is the level 2 random effect, which represents unobserved level 2 characteristics.
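On the probability scale, the model is read through the inverse logit, $\pi_{ij} = \exp(\eta_{ij}) / [1 + \exp(\eta_{ij})]$ with $\eta_{ij} = X_{ij}\beta + u_j$. The Python sketch below, with hypothetical coefficients and community effects, shows how the same woman's predicted probability changes across communities with different values of $u_j$. The function name `event_probability` is illustrative only.

```python
import math


def inv_logit(eta):
    """pi = exp(eta) / (1 + exp(eta)), computed without overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def logit(p):
    """Log-odds log(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def event_probability(x_row, beta, u_j):
    """pi_ij for one woman: inverse logit of X_ij beta + u_j."""
    eta = sum(xk * bk for xk, bk in zip(x_row, beta)) + u_j
    return inv_logit(eta)


# Hypothetical: intercept and one binary covariate (1 = has the characteristic).
beta = [-0.5, 0.8]
woman = [1.0, 1.0]
for u_j in (-1.0, 0.0, 1.0):  # below-average, average, above-average community
    print(u_j, round(event_probability(woman, beta, u_j), 3))
```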

The multilevel analyses were carried out using the MLwiN software (Multilevel Modeling for Windows).

Regarding estimation, the procedures available in MLwiN are used: iterative generalized least squares (IGLS) and restricted iterative generalized least squares (RIGLS). For normally distributed responses, IGLS yields maximum likelihood estimates, which may be biased (the random parameters tend to be underestimated), especially when the sample size is small, whereas RIGLS, which corresponds to restricted maximum likelihood, adjusts for this bias (Goldstein, 1999).¹⁶
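The bias that RIGLS corrects can be illustrated in the simplest possible setting, a single-level model with only a mean, where maximum likelihood divides the sum of squares by $n$ and restricted maximum likelihood divides it by $n-1$. The Python simulation below is a hypothetical illustration of that analogue, not MLwiN's IGLS/RIGLS algorithm; with $n = 5$ the ML estimate should average about $(n-1)/n = 0.8$ of the true variance.

```python
import random


def sum_of_squares(sample):
    """Sum of squared deviations from the sample mean."""
    m = sum(sample) / len(sample)
    return sum((v - m) ** 2 for v in sample)


def ml_variance(sample):
    """ML estimate: divides by n, ignoring that the mean was estimated."""
    return sum_of_squares(sample) / len(sample)


def reml_variance(sample):
    """REML estimate when only the mean is estimated: divides by n - 1."""
    return sum_of_squares(sample) / (len(sample) - 1)


def average_estimates(n, true_var=4.0, reps=20000, seed=7):
    """Average ML and REML variance estimates over simulated samples of size n."""
    rng = random.Random(seed)
    sd = true_var ** 0.5
    ml_total = reml_total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sd) for _ in range(n)]
        ml_total += ml_variance(sample)
        reml_total += reml_variance(sample)
    return ml_total / reps, reml_total / reps


ml, reml = average_estimates(5)
print(f"ML mean estimate {ml:.2f}, REML mean estimate {reml:.2f}, true value 4.00")
```

The gap shrinks as $n$ grows, which is why the choice between the two estimators matters most for small samples.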

For categorical response models, the estimation procedure in multilevel analysis is based on a linearization of the model equation, which involves two choices of approximation (Twisk, 2006): the order of the linearization and the type of quasi-likelihood.

First, one can choose between a first-order approximation, which is the simplest and most robust type of linearization, and a second-order approximation, which gives more accurate estimates. Second, for predicting the probability ($\pi_{ij}$) in a multilevel logistic model, as in the models of Chapters 7 and 8 of this study, predictive quasi-likelihood (PQL) is more suitable, since it is more accurate than marginal quasi-likelihood (MQL). Following the recommendations of Twisk (2006) and Hox (2010) on the estimation of multilevel logistic models, this study employs a second-order approximation with predictive quasi-likelihood (PQL).
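The role of the two approximations can be sketched by linearizing the inverse-logit link with a Taylor expansion. In this simplified, hypothetical illustration (not MLwiN's actual algorithm), an "MQL-like" expansion is taken around the fixed part $X_{ij}\beta$ alone, as if $u_j = 0$, while a "PQL-like" expansion is taken around the fixed part plus a current estimate of $u_j$; the second-order version adds the quadratic term. All numeric values are made up.

```python
import math


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def taylor_inv_logit(eta, eta0, order):
    """Taylor approximation of inv_logit(eta) around eta0 (order 1 or 2)."""
    p = inv_logit(eta0)
    d1 = p * (1 - p)                # first derivative of inv_logit at eta0
    d2 = p * (1 - p) * (1 - 2 * p)  # second derivative at eta0
    h = eta - eta0
    approx = p + d1 * h
    if order >= 2:
        approx += 0.5 * d2 * h * h
    return approx


fixed_part = 0.5  # hypothetical X_ij beta
u_j = 1.2         # hypothetical true community effect
u_hat = 1.0       # hypothetical current estimate of u_j
exact = inv_logit(fixed_part + u_j)

errors = {}
for label, eta0 in (("MQL-like", fixed_part), ("PQL-like", fixed_part + u_hat)):
    for order in (1, 2):
        approx = taylor_inv_logit(fixed_part + u_j, eta0, order)
        errors[(label, order)] = abs(approx - exact)
        print(f"{label}, order {order}: absolute error {errors[(label, order)]:.4f}")
```

Expanding around a point that already includes an estimate of $u_j$, and keeping the quadratic term, both reduce the approximation error, mirroring the accuracy ordering described above. In MLwiN these linearizations are applied inside the iterative estimation algorithm rather than to a single prediction.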

16 For more details about estimation procedures and multilevel modeling methods, see Goldstein, 1999; Snijders and Bosker, 1999; Twisk, 2006; Hox, 2010.

Significance tests in the linear multilevel models are based on the Wald test (a $Z$ statistic, with $Z \sim N(0,1)$ under the null hypothesis), applied separately to each parameter, and on the likelihood ratio test. Again following Twisk (2006), Wald tests are used to test the significance of the coefficients of the multilevel logistic models; since the quasi-likelihood (MQL/PQL) estimates do not provide a true likelihood, likelihood ratio tests are not appropriate for these models. In this study, results are presented as odds ratios with 95 per cent confidence intervals, or 90 per cent where necessary.¹⁷
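For a coefficient $\beta$ of a logistic model with standard error $SE$, the Wald statistic is $Z = \beta / SE$, the odds ratio is $\exp(\beta)$, and its confidence interval is $\exp(\beta \pm z_{1-\alpha/2}\,SE)$. The Python sketch below computes these quantities for a hypothetical coefficient at both 95 and 90 per cent; the values are not results of this study, and `wald_summary` is an illustrative name.

```python
from math import exp
from statistics import NormalDist


def wald_summary(beta, se, level=0.95):
    """Wald Z test, odds ratio and confidence interval for a logit coefficient."""
    std_normal = NormalDist()  # Z ~ N(0, 1) under the null hypothesis
    z = beta / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))        # two-sided
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95 per cent
    return {
        "z": z,
        "p": p_value,
        "odds_ratio": exp(beta),
        "ci": (exp(beta - z_crit * se), exp(beta + z_crit * se)),
    }


# Hypothetical coefficient and standard error, not estimates from this study.
res95 = wald_summary(0.60, 0.25)
res90 = wald_summary(0.60, 0.25, level=0.90)
print(res95)
print(res90["ci"])
```

Because the interval is built on the log-odds scale and then exponentiated, it is asymmetric around the odds ratio, and a 95 per cent interval that excludes 1 corresponds to a two-sided Wald p-value below 0.05.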

In document Reproduction and maternal health care among young women in Kenya: geographic and socio-economic determinants (Page 81-86)
