DISTRICT UMERKOT

5.1. ESTIMATION METHOD

The data we use to estimate the stochastic frontier production function are a

panel with six years of observations for each of the 23 education districts. The

stochastic frontier production function has generally been estimated by Maximum

Likelihood (ML). This provides correct results provided the

likelihood function is correctly specified and, more importantly, there are no

measurement errors in the variables. When there are errors in the variables

and these are not taken into account in estimation, the coefficient estimates will

be biased. It is therefore important to use estimation methods that are robust to the existence of

measurement errors if one suspects the existence of such errors. There are

many capital inputs, such as libraries and laboratory equipment, that we are not

able to account for. Finally, the output variables are measured as gross

enrolment while the inputs are aggregates of all school-, teacher- and

student-specific variables. A major problem with production functions is the

possibility that the quantities of inputs used by districts are themselves

endogenous, especially in countries where local districts determine school

inputs through local property taxes. Failure to account for the endogeneity of

these inputs will result in inconsistent or, at best, biased coefficient estimates.

The usual way of solving this problem is to estimate a simultaneous system of

equations that includes the output equation and a set of input demand equations

in order to account for the endogeneity of inputs. In the case of Sindh, there are

reasons to suspect that the problem of endogenously determined inputs may not arise. First, the inputs we use (teachers and physical capital) are generally provided by the provincial government through the department of

education. While there exist student-teacher associations which may provide

additional resources, these are generally very small relative to overall

resources of the school and they never involve teachers or the construction of

physical facilities. Second, for political reasons, the provincial government

cannot provide school resources that vary systematically across districts or

regions, especially when measured on a per-student basis. This implies that we

could, potentially, treat the school inputs as exogenously determined. It is

possible, however, that some school districts get relatively large resources because the school's administration is more successful at lobbying the provincial

administration for resources or because government decision makers come

from this particular constituency.
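The measurement-error concern raised at the start of this section can be illustrated with a small simulation. The sketch below is not the estimation used in this study: it uses purely synthetic data and a simple OLS regression, and the names `x_true`, `x_observed` and `ols_slope` are hypothetical. It assumes classical measurement error (noise that is independent of the true input) and shows that the estimated slope shrinks toward zero when the input is mismeasured.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

rng = np.random.default_rng(0)
n = 5000
true_slope = 1.0

# Synthetic, correctly measured input and the outcome it generates.
x_true = rng.normal(0.0, 1.0, n)
y = true_slope * x_true + rng.normal(0.0, 0.5, n)

# The same input recorded with classical measurement error.
# Signal and noise have equal variance, so the reliability ratio is 1 / (1 + 1) = 0.5
# and the slope is expected to shrink to roughly half its true value.
x_observed = x_true + rng.normal(0.0, 1.0, n)

slope_clean = ols_slope(x_true, y)
slope_noisy = ols_slope(x_observed, y)

print(f"slope with correctly measured input: {slope_clean:.3f}")
print(f"slope with mismeasured input:        {slope_noisy:.3f}")
```

The size of the shrinkage depends on the ratio of signal variance to total variance, which is unknown in practice; this is why the text favours estimation methods that are robust to such errors.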

To capture the part of heterogeneity that is unobserved, we exploit the panel

structure of the data by including school-type, year, and district fixed

effects in the estimation. We observe three different school levels (primary, middle

and secondary). District fixed effects are included, as there is a large

dispersion in school enrolments across districts. School resources are

summarized in terms of four relatively homogeneous categories: (1)

management personnel; (2) teaching personnel; (3) supporting personnel; and (4) material supplies. 'Capital', in the sense of school building infrastructure, is not accounted for owing to data constraints. Most of the expenditure goes to

teaching personnel, followed by material, management personnel, and, lastly,

supporting personnel. Besides teaching, a teacher has some management and

administrative responsibilities. The different tasks within one function are not

officially reported. We assume that there is a homogeneous distribution of these

different tasks within one function, between teachers both within one school

and across districts.
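As a minimal sketch of how school-type, year and district fixed effects enter a regression, the example below builds indicator (dummy) variables and estimates a plain least-squares model on synthetic data. It is a simplified stand-in, not the stochastic frontier estimator used in this study. The panel dimensions (23 districts, 6 years, 3 school levels) follow the text; every other name and number (`log_teachers`, `log_enrol`, `true_elasticity`, the simulated effects) is an assumption made purely for illustration.

```python
import numpy as np

def dummies(codes):
    """Indicator columns for integer codes, dropping the first level as the reference."""
    levels = np.unique(codes)
    return (codes[:, None] == levels[None, 1:]).astype(float)

rng = np.random.default_rng(1)
n_district, n_year, n_level = 23, 6, 3

# One observation per district, year and school level (a balanced panel).
district, year, level = (
    a.ravel()
    for a in np.meshgrid(
        np.arange(n_district), np.arange(n_year), np.arange(n_level), indexing="ij"
    )
)
n_obs = district.size

# Synthetic fixed effects and a hypothetical input elasticity.
district_fx = rng.normal(0.0, 0.5, n_district)
year_fx = rng.normal(0.0, 0.1, n_year)
level_fx = np.array([0.0, -1.0, -1.5])
true_elasticity = 0.6

log_teachers = rng.normal(7.0, 0.5, n_obs)
log_enrol = (
    5.0
    + true_elasticity * log_teachers
    + district_fx[district]
    + year_fx[year]
    + level_fx[level]
    + rng.normal(0.0, 0.1, n_obs)
)

# Design matrix: intercept, the input, and one set of dummies per fixed effect.
X = np.column_stack(
    [np.ones(n_obs), log_teachers, dummies(district), dummies(year), dummies(level)]
)
coef, *_ = np.linalg.lstsq(X, log_enrol, rcond=None)
estimated_elasticity = float(coef[1])

print(f"observations: {n_obs}, regressors: {X.shape[1]}")
print(f"estimated elasticity of enrolment with respect to teachers: {estimated_elasticity:.3f}")
```

Dropping one category from each set of dummies avoids perfect collinearity with the intercept; the omitted district, year and school level serve as the reference group.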

5.1.1. Log-linear regression model

The choice between a linear regression model and a log-linear regression model is a perennial question in empirical analysis (Gujarati, 2003, p. 282). An

equation that specifies a linear relationship among the variables gives an

approximate description of some economic behavior. An alternative approach

is to consider a linear relationship between log-transformed variables; this is a

log-log model where the dependent variable as well as all explanatory

variables are expressed in logarithms. Empirical work often makes use

of variables measured in logarithms: sometimes the dependent variable,

sometimes both dependent and independent variables. Wooldridge (2009) notes that using the "double log" transformation (of both Y and X) we can turn a multiplicative relationship, such as a Cobb-Douglas production function, into a

linear relation in the (natural) logs of output and the factors of production.

Different functional forms give parameter estimates that have different

economic interpretations. The parameters of the linear model are interpreted

as marginal effects, so the implied elasticities vary depending on the

data. In contrast, the parameters of the log-log model are interpreted as

elasticities, so the log-log model assumes a constant elasticity over all values

of the data set (see footnote 11).
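The algebra behind these statements can be sketched as follows. The notation is assumed here for illustration and does not come from the text: Y is output, X_k are the inputs, A is a scale constant, beta_k and gamma_k are coefficients, and epsilon and u are error terms.

```latex
\begin{align*}
\text{Cobb-Douglas:}\quad
  & Y = A \prod_{k} X_k^{\beta_k} \, e^{\varepsilon} \\
\text{log-log form:}\quad
  & \ln Y = \ln A + \sum_{k} \beta_k \ln X_k + \varepsilon \\
\text{elasticity (log-log):}\quad
  & \frac{\partial \ln Y}{\partial \ln X_k} = \beta_k
    \quad \text{(constant over the data)} \\
\text{linear model:}\quad
  & Y = \alpha + \sum_{k} \gamma_k X_k + u \\
\text{marginal effect (linear):}\quad
  & \frac{\partial Y}{\partial X_k} = \gamma_k \\
\text{elasticity (linear):}\quad
  & \frac{\partial Y}{\partial X_k} \cdot \frac{X_k}{Y} = \gamma_k \frac{X_k}{Y}
    \quad \text{(varies with } X_k \text{ and } Y\text{)}
\end{align*}
```

Taking logarithms of the multiplicative form is what makes the model linear in its parameters, so it can be estimated with linear regression methods.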

Regression diagnostics provide a way to verify whether the data meet the assumptions

of linear regression. Here, we focus on the issue of normality. Some

researchers believe that linear regression requires that the outcome

(dependent) and predictor variables be normally distributed. We examined

the distribution of the outcome variable, primary enrolment, by making a histogram

11 http://shazam.econ.ubc.ca/intro/olslog.htm (accessed August 24, 2013).

Figure 5-1: Histogram and Kernel density of primary enrolment distribution

Source: Author's compilation

[Figure 5-1 panels: (a) histogram of penrol; (b) kernel density estimate of penrol with normal density overlaid. x-axis: penrol (0 to 400000); y-axis: Density. kernel = epanechnikov, bandwidth = 1.6e+04]

of the variable penrol, and also by a kernel density plot, which approximates

the probability density of the variable. Both the histogram and the kernel

density plot indicate that penrol is not normally distributed (Figure 5-1).

Hence, penrol was transformed to log form. Figure 5-2 indicates that the

log transformation makes the variable more normally distributed: the

transformed variable, lpenrol, looks approximately normal. The same was true of

middle and secondary enrolment, so the middle and secondary enrolment

variables were likewise transformed into logarithmic form. According to

Wooldridge (2009), variables such as population, total number of employees

and number of schools often appear in logarithmic form; these have the common

feature of being large integer values. In a heterogeneous analysis of this

nature, heteroscedasticity is one of the major problems, and the log-log model helps

to mitigate it by transforming both dependent and independent variables into

logarithms to scale down the variation.
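The effect of the log transformation on the shape of a right-skewed variable can be checked numerically as well as visually. The sketch below uses synthetic data, not the enrolment data of this study: it draws a hypothetical right-skewed series standing in for penrol (138 values, matching 23 districts over 6 years, with a location and spread chosen only to resemble the scale in the figures) and compares sample skewness before and after taking logs. A skewness near zero is consistent with a symmetric, roughly normal shape.

```python
import numpy as np

def skewness(x):
    """Sample skewness: the third standardized moment (0 for a symmetric distribution)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(2)

# Hypothetical stand-in for penrol: a right-skewed, strictly positive variable.
penrol = rng.lognormal(mean=11.7, sigma=0.5, size=138)

# Log-transformed counterpart, named lpenrol as in Figure 5-2.
lpenrol = np.log(penrol)

skew_raw = skewness(penrol)
skew_log = skewness(lpenrol)

print(f"skewness of penrol:  {skew_raw:.2f}")
print(f"skewness of lpenrol: {skew_log:.2f}")
```

Skewness is only one summary of shape; a formal normality test or the kernel density plots described above would complement it.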

For this study, the econometric approach uses panel data on education

along with some macro data, with the model specification based on the district

situation and the literature. The approach shows how public spending and other

Figure 5-2: Histogram and Kernel density of primary enrolment (log) distribution

Source: Author's compilation

[Figure 5-2 panels: (a) histogram of lpenrol; (b) kernel density estimate of lpenrol with normal density overlaid. x-axis: lpenrol (10 to 13); y-axis: Density. kernel = epanechnikov, bandwidth = 0.1493]

interventions have influenced education enrolment through the years.

Education is a multifaceted phenomenon. As such, a wide variety of methods

are used to measure different aspects of education. These include, among

others, ratios, education attainment indicators, quality of education indicators

and measures of absolute and relative dispersion of education. I find two

important aspects in this scenario. Against this background, I use student,

school and teacher characteristics data along with geographical area and

education expenditure to measure the efficiency of public sector education

programs in terms of the impact of education expenditure on the actual number

of students in a district's schools. Data are analyzed for all 23 districts

for six years from 2005-06 to 2010-11. The sample period is constrained by the

availability of consistent data series for all the variables considered in this

model.