PRESENTATION OF STATISTICAL TOOLS

This section presents the statistical tools used in the analysis, that is, the existing statistical methods used to estimate adult mortality in general and maternal mortality in particular, to analyse its determinants and to project its levels.

3.3.1 Chi-square

Let us consider a dependent qualitative variable Y with η₁ categories (Y₁, Y₂, ..., Y_j, ..., Y_η₁) and a predictor X with η₂ categories (X₁, X₂, ..., X_i, ..., X_η₂), both observed on a sample of n observations. The Chi-square test of independence allows us to reject, or fail to reject, a hypothesis called the null hypothesis (H₀).

We define the hypotheses of the Chi-square test as follows:

H₀: The two variables are independent, that is, there is no association between them.

H₁: The two variables are statistically dependent or associated.

The opposite of the null hypothesis is denoted H₁ and called the alternative hypothesis. The observed value at a given point (I, J), denoted O(I, J), is the number of people belonging to both categories X_I and Y_J. The expected value at the same point, denoted E(I, J), is the number of people who would be expected to belong to both categories X_I and Y_J if H₀ were true. In other words, E(I, J) is the number of observations in the sample multiplied by the probability, under independence, that any observation belongs to both X_I and Y_J; in practice it is estimated by (total of category X_I × total of category Y_J) / n.

By definition, the number of degrees of freedom is:

$$df = (\eta_1 - 1)(\eta_2 - 1) \tag{3.7}$$

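The value of the Chi-squared test statistic is given by:

$$\chi^2 = \sum_{I=1}^{\eta_2} \sum_{J=1}^{\eta_1} \frac{\left(O(I,J) - E(I,J)\right)^2}{E(I,J)} \tag{3.8}$$

The value of the Chi-squared test statistic with Yates's continuity correction is given by:

$$\chi^2_y = \sum_{I=1}^{\eta_2} \sum_{J=1}^{\eta_1} \frac{\left(\left|O(I,J) - E(I,J)\right| - 0.5\right)^2}{E(I,J)} \tag{3.9}$$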
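As an illustration, the calculation of the expected values, the degrees of freedom and the two statistics can be sketched in Python. This is a minimal sketch and not the procedure actually run in the study: the function name chi_square_independence and the 2 × 2 table of counts are hypothetical, and the expected values are estimated from the row and column totals, as is usual under H₀.

```python
def chi_square_independence(observed, yates=False):
    """Return (statistic, degrees of freedom, expected counts) for a contingency table."""
    n_rows, n_cols = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(row[j] for row in observed) for j in range(n_cols)]
    n = sum(row_totals)

    # Under H0 (independence): E(I, J) = row total * column total / n
    expected = [[row_totals[i] * col_totals[j] / n for j in range(n_cols)]
                for i in range(n_rows)]

    statistic = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            diff = abs(observed[i][j] - expected[i][j])
            if yates:
                # Yates's continuity correction; never correct past zero
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected[i][j]

    df = (n_rows - 1) * (n_cols - 1)
    return statistic, df, expected


# Hypothetical 2 x 2 table: rows = categories of X, columns = categories of Y
table = [[30, 70],
         [10, 90]]
chi2, df, expected = chi_square_independence(table)
chi2_yates, _, _ = chi_square_independence(table, yates=True)
print(f"chi2 = {chi2:.3f}, with Yates = {chi2_yates:.3f}, df = {df}")
```

The continuity correction is normally reserved for 2 × 2 tables; the sketch leaves that choice to the caller.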

The chi-square distribution table gives the critical values according to the degrees of freedom and the level of significance. We can then compare the calculated value of the Chi-square statistic with the critical value in order to draw conclusions. The probability density function (pdf), denoted f, of the Chi-square distribution is:

$$f(x) = \frac{x^{\frac{\upsilon}{2} - 1} e^{-\frac{x}{2}}}{2^{\frac{\upsilon}{2}} \, \Gamma\left(\frac{\upsilon}{2}\right)} \tag{3.10}$$

where x > 0 and υ is the mean of the distribution and represents the degrees of freedom of the Chi-square test of independence. The Gamma function is defined as follows:

$$\Gamma(x) = \int_0^{\infty} e^{-y} y^{x-1} \, dy \tag{3.11}$$
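The density and the Gamma function can be checked numerically. The short Python sketch below, given only as an illustration, evaluates the pdf with the standard library function math.gamma and verifies by Simpson integration that the area under the density is close to 1 and that the mean is close to υ. The helper names chi2_pdf and integrate, and the choice υ = 4, are assumptions made for the example.

```python
import math


def chi2_pdf(x, df):
    """Density of the Chi-square distribution with df degrees of freedom, for x > 0."""
    half = df / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))


def integrate(func, lower, upper, steps=20000):
    """Composite Simpson rule (steps must be even)."""
    h = (upper - lower) / steps
    total = func(lower) + func(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(lower + k * h)
    return total * h / 3.0


df = 4  # hypothetical number of degrees of freedom
# The lower bound is slightly above 0 because the density is defined for x > 0
area = integrate(lambda x: chi2_pdf(x, df), 1e-12, 80.0)
mean = integrate(lambda x: x * chi2_pdf(x, df), 1e-12, 80.0)
print(f"area under the density = {area:.4f}, mean = {mean:.4f}")
```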

Based on the pdf, we can build the chi-square distribution table of critical values ˘χ²_df with the associated probabilities (or levels of significance). Some degree of association can generally be observed between two variables; what differs is the level of significance of that association. In this study, we chose α = 0.05 = 5%.

In other words, we accept only relationships that are significant at the 1 − α = 95% confidence level or above. We consider every association significant at less than 95% to be too weak to be acceptable, since it could be due to chance. In such a situation, we consider that there is no statistically significant association.

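After that, we determined the p_value of the test. The p_value of the χ² test (computed by software or read from statistical probability tables) is the probability, when H₀ is true, of obtaining a statistic at least as large as the calculated one. It is therefore the risk of a type I error that would be taken by rejecting H₀ on the basis of the calculated statistic:

p_value = P(χ²_df ≥ calculated value of the statistic when H₀ is true)

The level of significance α is itself the probability of a type I error, fixed in advance:

α = P(Type I error)

= P(rejecting H₀ when H₀ is true)

= P(value of the test statistic is in CR when H₀ is true)

= P(χ²_df > ˘χ²_df), with CR in one tail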

where CR is the critical region, or rejection region, in the Chi-square distribution plot. Thus, the p_value is the probability that a χ²_df variable exceeds the calculated value of the statistic. The following conclusion is drawn from the p_value:

p_value < α ⟹ Reject H₀

p_value > α ⟹ Accept H₀

When H₀ is rejected, the variables under test are statistically significantly associated or dependent. Said another way, there is statistical evidence that the observed association between the two variables is not due to chance. When H₀ is accepted (more precisely, not rejected), the variables are not statistically significantly associated, that is, there is no statistical evidence of an association between the two variables tested. The observed association may then well be due to chance.
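The decision rule can be illustrated with a short Python sketch. It is only one possible way of obtaining the p_value without a table: the function chi2_sf is a hypothetical helper that computes the upper-tail probability of the Chi-square distribution from the series expansion of the regularized incomplete Gamma function, and the calculated statistic (12.5 with 1 degree of freedom) is an invented value.

```python
import math


def chi2_sf(x, df, terms=500):
    """P(Chi-square with df degrees of freedom > x), from the series of the
    regularized lower incomplete Gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # first term of sum_{n>=0} z^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, terms):
        term *= z / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def decide(statistic, df, alpha=0.05):
    """Return the p_value and the conclusion of the test at level alpha."""
    p_value = chi2_sf(statistic, df)
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    return p_value, decision


p_value, decision = decide(12.5, 1)  # hypothetical calculated statistic
print(f"p_value = {p_value:.6f}: {decision}")
```

The series converges quickly for the moderate values of the statistic met in practice; for very large values a dedicated statistical library would be a safer choice.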

3.3.2 Logistic regression

This technique of analysis is applicable when the dependent variable is dichotomous, that is, when it has two categories, 0 and 1. Category 1 represents the event under study (the cases), while category 0 represents the opposite event, or control group. For each independent variable, one category should be identified as the reference category, which serves as the basis for comparing the risk of people in the other categories of facing the event under study. In other words, logistic regression helps us to compare the risk of exposure to the event under study between each category and the reference category of a given independent variable in the model.

The equation of a multiple logistic regression can be presented as follows:

Y = E(Y ) + ε_i (3.12)

where Y ∼ B(p) is a Bernoulli random variable with probability of success p. In logistic regression, the dependent variable Y is a dummy variable which can take only two values, Y = 1 or Y = 0, where:

P(Y = 1) = p

P(Y = 0) = 1 − p

p = probability that the dependent variable is 1, or proportion of people in category 1 of the dependent variable.

1 − p = proportion of people in category 0 of the dependent variable.

The second equation of the logistic model can also be written as follows:

$$\text{logit}(E(Y)) = \text{logit}(p) = \beta_0 + \sum_{i} \beta_i X_i$$

The above equation can be written in the form:

$$\log\left(\frac{p}{1 - p}\right) = \beta_0 + \sum_{i} \beta_i X_i \tag{3.16}$$

In equation 3.16, let us fix a specific value of the index, i = I. The categorical variable X_I may have several categories (possible values). Let us consider X_I as a (0, 1) variable, with X_I = 1 for the category of interest and X_I = 0 for the reference category.

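The regression coefficient β_I can be determined as follows, where p̂₁ and p̂₂ are the estimated probabilities of the event when the variable X_I takes the values a and a + 1 respectively (here a = 0), the other predictors being held fixed, and odds_k = p̂_k / (1 − p̂_k):

$$\text{logit}(\hat{p}_2) = \beta_0 + \beta_I \times 1 + \sum_{i \neq I} \beta_i X_i \tag{3.21}$$

$$\text{logit}(\hat{p}_1) = \beta_0 + \beta_I \times 0 + \sum_{i \neq I} \beta_i X_i \tag{3.22}$$

$$\text{logit}(\hat{p}_2) - \text{logit}(\hat{p}_1) = \beta_I \tag{3.23}$$

$$\log(\text{odds}_2) - \log(\text{odds}_1) = \beta_I \tag{3.24}$$

$$\log\left(\frac{\text{odds}_2}{\text{odds}_1}\right) = \beta_I \tag{3.25}$$

$$\log(OR) = \beta_I \tag{3.26}$$

$$OR = e^{\beta_I} \tag{3.27}$$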

where OR is the odds ratio. In logistic regression, the exponential of the regression coefficient of a given variable X_I is the odds ratio between category 1 and the reference category 0 (Kleinbaum et al., 2002). It gives us the ratio of the odds of the event between two categories of a given predictor X.
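The link between the coefficient and the odds ratio in equations (3.23) to (3.27) can be verified on a small example. The Python sketch below assumes a model with a single binary predictor X_I and no other covariates, in which case the fitted probabilities coincide with the observed proportions in each category. The counts are hypothetical and do not come from the data sets of this study.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Probability corresponding to a log-odds value eta."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical counts: (events Y = 1, non-events Y = 0) in each category of X_I
events_ref, non_events_ref = 10, 90   # reference category, X_I = 0
events_exp, non_events_exp = 30, 70   # category of interest, X_I = 1

p1 = events_ref / (events_ref + non_events_ref)   # P(Y = 1 | X_I = 0)
p2 = events_exp / (events_exp + non_events_exp)   # P(Y = 1 | X_I = 1)

beta_0 = logit(p1)               # intercept: log-odds in the reference category
beta_I = logit(p2) - logit(p1)   # equations (3.23) to (3.26)
odds_ratio = math.exp(beta_I)    # equation (3.27)

print(f"beta_I = {beta_I:.4f}, OR = {odds_ratio:.4f}")
print(f"P(Y = 1 | X_I = 1) recovered from the model = {inverse_logit(beta_0 + beta_I):.2f}")
```

The odds ratio obtained this way equals the cross-product ratio of the 2 × 2 table, here (30 × 90) / (70 × 10).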

The analysis of the logistic regression is based on a number of statistics which it is important to clarify.

Deviance analysis of model fit

The deviance statistics are an alternative to R² which enable one to measure the model fit. The difference between the null deviance (D²_Null) and the residual deviance (D²_residual) follows a Chi-squared distribution with a number of degrees of freedom (ddf) equal to the difference between the degrees of freedom of the null and the residual deviance: dD² ∼ χ²_ddf, where dD² = D²_Null − D²_residual and ddf = df_Null − df_residual. The p_value = Prob(χ²_ddf > dD²) allows us to conclude about the significance (or fit) of the model (Sheather, 2009).
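A minimal Python sketch of this deviance comparison is given below for illustration. It assumes a logistic model with one binary predictor, so that ddf = 1 and the upper-tail Chi-square probability can be written exactly with math.erfc. The counts and the helper name binomial_deviance are hypothetical.

```python
import math


def binomial_deviance(groups, probabilities):
    """-2 * log-likelihood of individual 0/1 outcomes, summarised by group.

    groups: list of (number of events, number of non-events)
    probabilities: fitted P(Y = 1) in each group
    """
    log_lik = 0.0
    for (events, non_events), p in zip(groups, probabilities):
        log_lik += events * math.log(p) + non_events * math.log(1.0 - p)
    return -2.0 * log_lik


groups = [(10, 90), (30, 70)]                 # hypothetical counts per category of X
n = sum(e + m for e, m in groups)             # number of observations
overall = sum(e for e, _ in groups) / n       # P(Y = 1) under the null model

# Null model: intercept only (df = n - 1)
null_deviance = binomial_deviance(groups, [overall, overall])
# Fitted model: intercept and one coefficient (df = n - 2)
residual_deviance = binomial_deviance(groups, [e / (e + m) for e, m in groups])

d_dev = null_deviance - residual_deviance
ddf = (n - 1) - (n - 2)
p_value = math.erfc(math.sqrt(d_dev / 2.0))   # P(chi-square with 1 df > d_dev)

print(f"dD2 = {d_dev:.4f}, ddf = {ddf}, p_value = {p_value:.6f}")
```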

Akaike information criterion (AIC).

In general, AIC = 2δ − 2 ln(ML), where δ is the number of parameters in the model and ML is the maximum value of the likelihood function. AIC is used to select the best model among a set of candidate models fitted to the same data: the model with the smallest AIC should be preferred.
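The selection rule can be sketched in a few lines of Python. The maximized log-likelihood values and the model labels below are hypothetical and serve only to show the comparison.

```python
def aic(log_likelihood, n_parameters):
    """AIC = 2 * (number of parameters) - 2 * ln(maximum likelihood)."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


# Hypothetical maximized log-likelihoods of two candidate models on the same data
candidates = {
    "intercept only": aic(-100.08, 1),
    "intercept + X_I": aic(-93.59, 2),
}

best = min(candidates, key=candidates.get)  # smallest AIC is preferred
for name, value in candidates.items():
    print(f"{name}: AIC = {value:.2f}")
print(f"preferred model: {best}")
```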

Odds ratio (OR = exp(β))

For each category of each predictor, except the reference category, the model provides an odds ratio, exp(β) = OR, and an associated level of significance (p_value). When the p_value is less than 0.05, we comment on the odds ratio, which tells us about the ratio of the odds of the event between that category and the reference category. For a given category, when the odds ratio is greater than one (exp(β) > 1), we conclude that people in this category are exp(β) times more likely (or more at risk) to face the event under study (maternal death in our case) than those in the reference category.

When exp(β) < 1, we conclude that the individuals of the category concerned are (1 − exp(β)) × 100 percent less likely, or less at risk, to experience the event under study than those in the reference category. The p_value associated with the odds ratio allows us to conclude about the degree of certainty of the odds ratio value.
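The reading rules above can be summarised in a small Python helper. It is a sketch of the interpretation only: the function name interpret_odds_ratio, the wording of the messages and the coefficients passed to it are hypothetical.

```python
import math


def interpret_odds_ratio(beta, p_value, alpha=0.05):
    """Turn a logistic coefficient and its p_value into a short interpretation."""
    odds_ratio = math.exp(beta)
    if p_value >= alpha:
        return f"OR = {odds_ratio:.2f}: not statistically significant at alpha = {alpha}"
    if odds_ratio > 1.0:
        return f"OR = {odds_ratio:.2f}: {odds_ratio:.2f} times the odds of the reference category"
    reduction = (1.0 - odds_ratio) * 100.0
    return f"OR = {odds_ratio:.2f}: odds {reduction:.0f}% lower than in the reference category"


print(interpret_odds_ratio(math.log(2.0), 0.01))
print(interpret_odds_ratio(math.log(0.75), 0.01))
print(interpret_odds_ratio(0.5, 0.20))
```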

3.3.3 Time series: ARIMA(p,d,q)

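The time series model used in this study is the AutoRegressive Integrated Moving Average (ARIMA) model. The ARIMA(p,d,q) model depends on three non-negative integer parameters, p, d and q. The model is a combination of three components, with one parameter related to each of them: the AutoRegressive component (AR(p)), the Integrated component (I(d)) and the Moving Average component (MA(q)).

ARIMA (p,d,q)

The general formula of ARIMA (p,d,q)⁵ is:

$$\phi(L)(1 - L)^d Y_t = \theta(L)\varepsilon_t$$

where L is the lag operator (L Y_t = Y_{t−1}), ϕ(L) = 1 − ϕ₁L − ... − ϕ_pL^p, θ(L) = 1 + θ₁L + ... + θ_qL^q and ε_t is a white noise error term.

AR(p) or ARMA (p,0) or ARIMA (p,0,0)

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) Y_t = \varepsilon_t$$

MA(q) or ARMA (0,q) or ARIMA (0,0,q)

$$Y_t = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) \varepsilon_t$$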
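The three components of the model can be illustrated with a short Python sketch. It assumes the usual sign convention, in which the autoregressive terms are added to the current value and the moving average polynomial is written with plus signs. The function names difference and simulate_arma, the coefficients and the input series are hypothetical, and pre-sample values are set to zero.

```python
def difference(series, d=1):
    """Apply the operator (1 - L)^d, that is, difference the series d times."""
    for _ in range(d):
        series = [series[t] - series[t - 1] for t in range(1, len(series))]
    return list(series)


def simulate_arma(phi, theta, shocks):
    """Y_t = sum_i phi_i Y_{t-i} + e_t + sum_j theta_j e_{t-j}, zero pre-sample values."""
    y = []
    for t, e_t in enumerate(shocks):
        ar_part = sum(phi[i] * y[t - 1 - i]
                      for i in range(len(phi)) if t - 1 - i >= 0)
        ma_part = sum(theta[j] * shocks[t - 1 - j]
                      for j in range(len(theta)) if t - 1 - j >= 0)
        y.append(ar_part + e_t + ma_part)
    return y


shocks = [1.0, 0.0, 0.0, 0.0]                              # a single unit shock
ar1 = simulate_arma(phi=[0.5], theta=[], shocks=shocks)    # AR(1): slow decay
ma1 = simulate_arma(phi=[], theta=[0.4], shocks=shocks)    # MA(1): one-period memory
trend_removed = difference([1, 4, 9, 16, 25], d=2)         # I(2): quadratic trend removed

print(ar1, ma1, trend_removed)
```

The unit shock shows the difference between the components: its effect decays geometrically in the AR(1) series and disappears after one period in the MA(1) series.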

3.4 SYNTHESIS AND PARTIAL CONCLUSION

Chapter one presented the research problem in terms of the social, economic and demographic importance of the subject. It also presented the national and international context in which the study is situated. Chapter two summarized previous research on the subject and highlighted its strengths as well as its insufficiencies. This was followed by a delineation of the problem of the study, the gaps in previous research and the scientific importance of the problem highlighted. The originality of this study in its specific context is well captured.

The chapter on methodology presented the data used, analytical methods and procedures of analysis.

Three data sets were used in this study: the 2006 census data, the 2010 DHS data and the 2010 EMOC data. The DHS and census data were collected from the population in households, while the EMOC data were collected from patients at health facilities. The target population of all analyses undertaken in this study is composed of two population groups: maternal deaths and maternal survivors. The maternal deaths were the population of interest, whilst the survivors were used as controls. At the descriptive level of analysis, a Chi-squared test and a Wilcoxon-Mann-Whitney test were used. At the multivariate level of analysis, a logistic regression model was fitted at both national and regional scales. The analyses were done using the software R.

The study focussed on the assessment of the maternal mortality level provided by the 2006 census. It assessed the method developed during the 2006 census to adjust the observed information and compared the findings with existing estimates at national and regional levels. The forecasting of maternal mortality levels from 2006 to 2050 was also part of the objectives of the study. For this purpose, a mathematical model based on the ARIMA model, a component method based on the LiST model incorporated in the software SPECTRUM, and designed regression models were used.

MORTALITY

This chapter addresses the issue of maternal mortality determinants. It seeks to identify the socio-economic and demographic factors which significantly influence the phenomenon at national and regional scales.

To reach this target, both descriptive and multivariate approaches have been used. Because of the lack and deficiencies of data on maternal mortality, different data sets from different sources have been used to cover most of the factors found in the literature, to enable the confrontation and comparison of results, and to explore the regional disparities of the phenomenon. Each section of this chapter presents the results of the analyses for one of the data sets used. Presented below, in turn, are the outcomes of the analyses of the census data¹ 2006, the Emergency Obstetric and Neonatal Care (EMOC) data² and the demographic and health survey (DHS)³ 2010. For each data set, a descriptive analysis and an inferential analysis were performed, as explained in chapter three, devoted to the methodology.

In document Methodological approach of the spatial distribution of maternal mortality in Burkina Faso and explanatory factors associated (Page 95-103)