
CHAPTER TWO

2.12 Statistical Method

In this thesis, two separate regression models have been used, chosen according to the nature of the outcome variable: Logistic Regression and the general Linear Regression Model, also called Ordinary Least Squares (OLS) regression. Two models are applied instead of a single one because the two outcome variables considered in this study differ in nature. The performance of the individual student has been measured on the basis of the ASER score, the details of which are discussed in chapter five of this thesis. The ASER score is dichotomous, indicating the performance of the student as either pass (coded '1') or fail (coded '0'). This dichotomous outcome variable measures performance at the individual level. To explore its association with a number of school-related, teacher-related, and family-related factors, a logistic regression model has been applied.

The other outcome variable is the pass percentage of the school, based on the ASER scores achieved by the students in each school. This outcome variable is continuous and interval-scaled. To identify the statistically significant factors associated with the performance of the school, a general linear regression model has been fitted, with its parameters estimated by the ordinary least squares (OLS) method. This general linear regression model is also known as the OLS model. It is useful for identifying the most significant factors associated with the performance of the school, which may help in making future plans for the improvement of school performance.

The specification and the functional form of the Logistic regression model and the OLS regression are explained in the following sections.

2.12.1 Logistic Regression Model for Individual Level Analysis

Both theoretical and empirical considerations suggest that when the response variable is binary, the shape of the response function will frequently be curvilinear. The response functions are shaped either as a tilted S or a reverse tilted S, and are approximately linear except at the ends. These response functions are often referred to as sigmoidal. They have asymptotes at 0 and 1, and this automatically meets the constraints on E{Y}: since the response function represents probabilities when the outcome variable is a 0,1 indicator variable, the mean response should be constrained as follows:

$$0 \le E\{Y\} = \pi \le 1 \tag{1}$$

The tilted S shaped response functions are called logistic response functions and are expressed mathematically as follows in vector notation.

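$$E\{Y\} = \frac{\exp(\beta'X)}{1 + \exp(\beta'X)} \tag{2}$$

where $\beta'X = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$ and, for the $i$-th observation, $\beta'X_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip}$.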
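The logistic response function in (2) can be sketched numerically as follows. This is only an illustration, not part of the analysis in this thesis: the function name `logistic_response` and the coefficient values are assumptions chosen for the example. It shows that the mean response stays strictly between 0 and 1 and moves in one direction as the predictor changes.

```python
import math


def logistic_response(beta, x):
    """Return E{Y} = exp(b'x) / (1 + exp(b'x)) for one observation.

    beta is [b0, b1, ..., bp]; x is [x1, ..., xp]. The intercept b0 is
    added here, so x does not need a leading 1.
    """
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    # Algebraically equal to exp(eta) / (1 + exp(eta)); the two branches
    # only avoid overflow in math.exp for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


# Hypothetical coefficients (illustrative only, not estimates from the study).
beta_example = [-1.0, 0.8]
x_values = [-10.0, -2.0, 0.0, 2.0, 10.0]
probs = [logistic_response(beta_example, [x]) for x in x_values]

for x, p in zip(x_values, probs):
    print(f"X = {x:6.1f}  E{{Y}} = {p:.6f}")
```

With a positive slope coefficient the printed values rise with X; changing the sign of the slope in `beta_example` reverses the direction.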

A logistic response function is either monotonic increasing or monotonic decreasing, depending on the sign of $\beta_1$. Another interesting property of a logistic response function is that it can be easily linearized. Let us denote E{Y} by π, since the mean response is the probability when the response variable is a 0,1 indicator variable. Then if the transformation is taken as:

$$\pi' = \log_e\left(\frac{\pi}{1 - \pi}\right) \tag{3}$$

we can obtain from (2):

$$\pi' = \beta'X \tag{4}$$

The transformation (3) is called the logit transformation of the probability π.

The ratio π/(1-π) in the logit transformation is called the odds. Computing the odds is a commonly used technique for interpreting probabilities (Fleiss et al., 2003). The transformed response function (4) is referred to as the logit response function, and π' is called the logit mean response; it ranges from -∞ to ∞ as X ranges from -∞ to ∞.
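The relationship between a probability, its odds, and its logit can be checked with a short sketch. The helper names `odds`, `logit`, and `inverse_logit` are assumptions introduced for this illustration; the point is simply that the logit transformation undoes the logistic response function.

```python
import math


def odds(p):
    """Odds pi / (1 - pi) for a probability strictly between 0 and 1."""
    return p / (1.0 - p)


def logit(p):
    """Logit transformation: natural log of the odds."""
    return math.log(odds(p))


def inverse_logit(eta):
    """Logistic response function, mapping a logit back to a probability."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


for p in (0.1, 0.5, 0.75, 0.9):
    print(f"pi = {p:.2f}  odds = {odds(p):.4f}  logit = {logit(p):+.4f}")
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0; probabilities above 0.5 give positive logits and those below 0.5 give negative logits.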

Hence, under this logistic response function, Y takes the values 1 and 0 with probabilities π and (1-π), respectively; that is, Y is a Bernoulli random variable with parameter E{Y} = π.


The logistic regression model in the usual form will be:

$$Y_i = E\{Y_i\} + \varepsilon_i$$

Since the distribution of the error term εi depends on the Bernoulli distribution of the response Yi, the multiple logistic regression equation would be formulated as:

$$\pi' = \log_e\left(\frac{\pi}{1 - \pi}\right) = \beta'X \tag{5}$$

This equation is in the same form as the multiple linear regression equation, and the coefficients in the equation can be interpreted as regression coefficients. The right-hand side of equation (5) is called the linear predictor. The log-odds transformation on the left-hand side, which links the mean response to the linear predictor, is called the link function of the logistic regression.

Note that the fundamental assumption in logistic regression analysis is that the log odds are linearly related to the independent variables. No assumptions are made regarding the distributions of the X variables (Afifi et al., 2004). One of the advantages of the logistic regression model is that the predictor variables X may be discrete or continuous.
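The linearity assumption can be illustrated with a small sketch. It follows directly from equation (5) that raising one predictor by one unit, with the others held fixed, adds the corresponding coefficient to the log odds and therefore multiplies the odds by the exponential of that coefficient. The coefficient values and the names `linear_predictor`, `x_a`, and `x_b` below are hypothetical and are not estimates from this study.

```python
import math


def linear_predictor(beta, x):
    """Return b0 + b1*x1 + ... + bp*xp, the right-hand side of (5)."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


# Hypothetical coefficients: intercept, then two predictors.
beta_example = [-0.5, 0.4, 1.2]

# Two observations that differ by one unit in the first predictor only.
x_a = [2.0, 0.0]
x_b = [3.0, 0.0]

eta_a = linear_predictor(beta_example, x_a)
eta_b = linear_predictor(beta_example, x_b)

logit_change = eta_b - eta_a                  # equals b1
odds_ratio = math.exp(eta_b) / math.exp(eta_a)  # equals exp(b1)

print(f"change in log odds: {logit_change:.4f}")
print(f"ratio of odds:      {odds_ratio:.4f}")
```

The same change in log odds would be obtained from any starting value of the first predictor, which is what linearity on the logit scale means.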

2.12.2 Parameter Estimation in the Logistic Regression Model

Our response variable Yi is an ordinary Bernoulli random variable where:

$$\text{Prob}(Y_i = 1) = \pi_i$$

$$\text{Prob}(Y_i = 0) = 1 - \pi_i$$

Its probability distribution can be represented by:

$$f_i(Y_i) = \pi_i^{Y_i}(1 - \pi_i)^{1 - Y_i}, \quad Y_i = 0, 1; \; i = 1, \dots, n$$
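As a quick check of the Bernoulli probability distribution above, the following sketch evaluates it at both possible values of the response. The function name `bernoulli_pmf` and the probability 0.3 are assumptions made for illustration only.

```python
def bernoulli_pmf(y, pi):
    """Return pi**y * (1 - pi)**(1 - y) for y equal to 0 or 1."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    return (pi ** y) * ((1.0 - pi) ** (1 - y))


pi_example = 0.3  # hypothetical probability of a pass
p_pass = bernoulli_pmf(1, pi_example)  # reduces to pi
p_fail = bernoulli_pmf(0, pi_example)  # reduces to 1 - pi

print(f"Prob(Y = 1) = {p_pass:.2f}")
print(f"Prob(Y = 0) = {p_fail:.2f}")
```

The single expression covers both cases because the exponent that does not apply becomes zero, leaving a factor of 1.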

Since the Yi observations are independent, their logarithm of the joint probability function will be:

i i

n

Y 1-Y