DISTRICT UMERKOT
5.1. ESTIMATION METHOD
We estimate the stochastic frontier production function on panel data
covering 6 years for each of the 23 education districts. The stochastic
frontier production function has generally been estimated by Maximum
Likelihood (ML). This gives correct results provided the likelihood function
is correctly specified and, more importantly, there are no
measurement errors in the variables. When there are errors in the variables
and these are not taken into account in estimation, the coefficient estimates will
important to use estimation methods that are robust to measurement errors
if one suspects that such errors exist. There are
many capital inputs, such as libraries and laboratory equipment, that we are not
able to account for. Finally, the output variables are measured as gross
enrolment, while the inputs are aggregates of all school, teacher and student
specific variables. A major problem with production functions is the
possibility that the quantities of inputs used by districts are themselves
endogenous, especially in countries where local districts determine school
inputs through local property taxes. Failure to account for the endogeneity of
these inputs will result in inconsistent or, at best, biased coefficient estimates.
The usual way of solving this problem is to estimate a simultaneous system of
equations that includes the output equation and a set of input demand equations
in order to account for the endogeneity of inputs. In the case of Sindh, there are
reasons to suspect that the problem of endogenously determined inputs may not
arise. First, the inputs we use (teachers and physical capital) are generally
provided by the provincial government through the department of
education. While student-teacher associations exist and may provide
additional resources, these are generally very small relative to the overall
resources of the school, and they never involve teachers or the construction of
physical facilities. Second, for political reasons, the provincial government
cannot provide school resources that vary systematically across districts or
regions, especially when measured on a per-student basis. This implies that we
could, potentially, treat the school inputs as exogenously determined. It is
possible that some school districts may get relatively large resources because
the school's administration is more successful at lobbying the provincial
administration for resources or because government decision makers come
from this particular constituency.
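
The measurement-error concern raised earlier in this section can be illustrated with a small simulation. The following is a minimal sketch in Python on synthetic data, not the estimation carried out in this study; the function names (`ols_slope`, `simulate`) and all parameter values are hypothetical. It covers only the textbook case of classical measurement error in a single regressor, where the OLS slope is pulled toward zero.

```python
import random


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=5000, true_slope=2.0, noise_sd=1.0, seed=0):
    """Return (slope using the true regressor, slope using the mismeasured one)."""
    rng = random.Random(seed)
    # True regressor with unit variance (synthetic, for illustration only).
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Outcome generated from the true regressor plus a small disturbance.
    y = [true_slope * x + rng.gauss(0.0, 0.5) for x in x_true]
    # Observed regressor = true regressor + classical measurement error.
    x_obs = [x + rng.gauss(0.0, noise_sd) for x in x_true]
    return ols_slope(x_true, y), ols_slope(x_obs, y)


clean_slope, noisy_slope = simulate()
print("slope with true regressor:       ", round(clean_slope, 3))
print("slope with mismeasured regressor:", round(noisy_slope, 3))
```

Because the true regressor and the measurement error are given equal variances here, the slope on the mismeasured regressor should come out at roughly half the true slope, while the slope on the true regressor stays close to it.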
To capture the part of heterogeneity that is unobserved, we exploit the panel
structure of the data by including school type, year, and district fixed
effects in the estimation. We observe 3 different school levels (primary, middle
and secondary). District fixed effects are included, as there is a large
dispersion in school enrolments across districts. School resources are
summarized in terms of four relatively homogenous categories: (1)
management personnel; (2) teaching personnel; (3) supporting personnel; and
(4) material supplies. 'Capital', in the sense of school building
infrastructure, is not accounted for owing to data constraints. Most of the
expenditure goes to
teaching personnel, followed by material, management personnel, and, lastly,
supporting personnel. Besides teaching, a teacher has some management and
administrative responsibilities. The different tasks within one function are not
officially reported. We assume that these different tasks within one function
are distributed homogenously between teachers, both within one school
and across districts.
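
The role of the district fixed effects described above can be sketched with the within (demeaning) transformation. This is a minimal Python illustration on made-up numbers with a single regressor, not the specification estimated in this study (which also includes school type and year effects); the function names and the two districts "A" and "B" are hypothetical.

```python
from collections import defaultdict


def pooled_slope(x, y):
    """OLS slope of y on x, ignoring group membership."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def within_slope(groups, x, y):
    """Fixed-effects slope: demean x and y within each group, then run OLS."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum_x, sum_y, count]
    for g, xi, yi in zip(groups, x, y):
        totals[g][0] += xi
        totals[g][1] += yi
        totals[g][2] += 1
    sxy = 0.0
    sxx = 0.0
    for g, xi, yi in zip(groups, x, y):
        sum_x, sum_y, count = totals[g]
        dx = xi - sum_x / count  # deviation from the group mean of x
        dy = yi - sum_y / count  # deviation from the group mean of y
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx


# Two hypothetical districts with the same slope (0.5) but different intercepts.
groups = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10.5, 11.0, 11.5, 4.0, 4.5, 5.0]

fe_slope = within_slope(groups, x, y)
naive_slope = pooled_slope(x, y)
print("within (fixed-effects) slope:", fe_slope)
print("pooled slope:", naive_slope)
```

In this constructed example the pooled regression picks up the difference in district levels and returns a negative slope, whereas the within estimator recovers the common slope. This is the kind of unobserved level difference across districts that the fixed effects are meant to absorb.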
5.1.1. Log-linear regression model
The choice between a linear regression model and a log-linear regression model
is a perennial question in empirical analysis (Gujarati, 2003, p. 282). An
equation that specifies a linear relationship among the variables gives an
approximate description of some economic behavior. An alternative approach
is to consider a linear relationship between log-transformed variables; this is a
log-log model where the dependent variable as well as all explanatory
of variables measured in logarithms: sometimes the dependent variable,
sometimes both dependent and independent variables. Wooldridge (2009) notes
that using the "double log" transformation (of both Y and X) we can turn a
multiplicative relationship, such as a Cobb-Douglas production function, into a
linear relation in the (natural) logs of output and the factors of production.
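
The double-log transformation can be written out for a generic two-input Cobb-Douglas function. The symbols below ($Y$ for output, $X_1$ and $X_2$ for inputs, $A$ for a scale constant, $\beta_1$ and $\beta_2$ for the exponents, $\varepsilon$ for the error term) are illustrative assumptions and do not reproduce the specification estimated in this study.

```latex
% Multiplicative (Cobb-Douglas) form with a multiplicative error term
Y = A \, X_1^{\beta_1} X_2^{\beta_2} \, e^{\varepsilon}

% Taking natural logs of both sides gives a relation that is linear in the logs
\ln Y = \ln A + \beta_1 \ln X_1 + \beta_2 \ln X_2 + \varepsilon

% Each coefficient is the elasticity of output with respect to that input
\beta_j = \frac{\partial \ln Y}{\partial \ln X_j}, \qquad j = 1, 2
```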
Different functional forms give parameter estimates that have different
economic interpretations. The parameters of the linear model are
interpreted as marginal effects, so the implied elasticities vary depending
on the data. In contrast, the parameters of the log-log model are interpreted as
elasticities, so the log-log model assumes a constant elasticity over all values
of the data set11.
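
The difference in interpretation can be made concrete with a short numerical sketch. The Python code below uses made-up data generated from a constant-elasticity relationship (elasticity 0.7, an arbitrary illustrative value) and fits both a linear and a log-log model by simple OLS; the helper `ols` and all variable names are hypothetical and unrelated to the study's data.

```python
import math


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic data from y = 3 * x^0.7, i.e. a constant elasticity of 0.7.
x = [float(v) for v in range(1, 21)]
y = [3.0 * v ** 0.7 for v in x]

# Log-log model: the slope is the (constant) elasticity.
a_log, b_log = ols([math.log(v) for v in x], [math.log(v) for v in y])

# Linear model: the slope is a constant marginal effect dy/dx.
a_lin, b_lin = ols(x, y)


def linear_elasticity(x0):
    """Elasticity implied by the linear fit at x0: (dy/dx) * x0 / fitted y."""
    return b_lin * x0 / (a_lin + b_lin * x0)


elasticity_low = linear_elasticity(x[0])
elasticity_high = linear_elasticity(x[-1])
print("log-log slope (elasticity):", b_log)
print("linear-model elasticity at smallest x:", elasticity_low)
print("linear-model elasticity at largest x: ", elasticity_high)
```

The log-log slope recovers the single elasticity built into the data, while the elasticity implied by the linear fit changes with the point at which it is evaluated.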
Regression diagnostics provide a way to verify whether the data meet the
assumptions of linear regression. Here, we focus on the issue of normality. Some
researchers believe that linear regression requires that the outcome
(dependent) and predictor variables be normally distributed. We examined
the distribution of the outcome variable, primary enrolment, by making a histogram
11 http://shazam.econ.ubc.ca/intro/olslog.htm (accessed August 24, 2013).
Figure 5-1: Histogram and kernel density of primary enrolment distribution
[Figure: histogram of penrol and kernel density estimate with normal density overlay; x-axis: penrol (0 to 400000), y-axis: Density; kernel = epanechnikov, bandwidth = 1.6e+04]
Source: Author's compilation
of the variable penrol, and also by the kernel density plot, which approximates
the probability density of the variable. Both the histogram and the kernel
density plot indicate that the variable penrol does not look normal (Figure 5-1).
Hence, penrol was transformed to log form. Figure 5-2 indicates that the
log transformation makes penrol more normally distributed: the
transformed variable looks quite normal. The same was the case for middle and
secondary enrolment, so the middle and secondary education
variables were likewise transformed into logarithmic form. According to
Wooldridge (2009), variables such as population, total number of employees
and number of schools often appear in logarithmic form; these have the common
feature of being large integer values. In a heterogeneous analysis of this
nature, heteroscedasticity is one of the major problems, and the log-log model
mitigates it by transforming both dependent and independent variables into
logarithms to scale down the variation.
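
The effect of the log transformation on the shape of a right-skewed variable can be checked numerically as well as visually. The sketch below is a hedged Python illustration on synthetic log-normal draws, not on the study's enrolment data; the parameters (11.5 and 0.6) and the function name `skewness` are arbitrary assumptions chosen only to produce large, right-skewed positive values.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(42)
# Synthetic right-skewed variable (hypothetical stand-in for an enrolment count).
raw = [rng.lognormvariate(11.5, 0.6) for _ in range(2000)]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print("skewness before log transform:", round(skew_raw, 3))
print("skewness after log transform: ", round(skew_log, 3))
```

A skewness close to zero after the transformation is consistent with a more symmetric, closer-to-normal distribution, which is the pattern the histograms and kernel density plots are used to assess.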
For this study, the econometric approach uses panel data on education
along with some macro data, with the model specification based on the district
situation and the literature. The approach shows how public spending and other
Figure 5-2: Histogram and kernel density of primary enrolment (log) distribution
[Figure: histogram of lpenrol and kernel density estimate with normal density overlay; x-axis: lpenrol (10 to 13), y-axis: Density; kernel = epanechnikov, bandwidth = 0.1493]
Source: Author's compilation
interventions have influenced education enrolment through the years.
Education is a multifaceted phenomenon. As such, a wide variety of methods
are used to measure different aspects of education. These include, among
others, ratios, educational attainment indicators, quality of education indicators
and measures of absolute and relative dispersion of education. I find two
important aspects in this scenario. Against this background, I use student,
school and teacher characteristics data, along with geographical area and
education expenditure, to measure the efficiency of public sector education
programs in terms of the impact of education expenditure on the actual number
of students in a district's schools. Data are analyzed for all twenty-three districts
for six years, from 2005-06 to 2010-11. The sample period is constrained by the
availability of consistent data series for all the variables considered in this
model.