CHAPTER 4: GENDER DIFFERENCES IN EDUCATIONAL ATTAINMENT
4.2 EDUCATION AND THE EDUCATION SYSTEM IN TURKEY
4.3.2 Econometric Issues
It is important to consider some econometric issues while investigating the highest level of educational attainment to obtain unbiased estimation results. These include the censoring problem of the final attainment of enrolled children, intrafamily correlation among siblings and selection bias related to children currently residing in the household.
4.3.2.1 The Censoring Problem
As stated above, the existing literature has generally investigated educational attainment using current school enrolment and/or the highest level of education attained. Current school enrolment is typically analysed with a probit model, and this approach is fairly standard. Estimating the highest level of educational attainment, however, is more complicated and raises several econometric problems. First, children who are enrolled at school at the time of the survey pose an important problem because it is unknown whether they will complete their current level of education or drop out of school. Consequently, their highest educational attainment is unknown. Such observations are right censored: the survey used in the analysis ends before the event of interest (completed education) has occurred. Ignoring this may lead to biased estimation results (King and Lillard, 1983 and 1987; Behrman and Knowles, 1999; Maitra, 2003). Several approaches have been used in the existing literature to deal with this problem.
The first method is to exclude currently enrolled children (i.e., the censored observations) and to estimate the model only for individuals who have completed their education (Lazear, 1977; DeTray, 1978). Excluding the censored observations, however, may result in sample selection bias, since the remaining sample over-represents older individuals and individuals with lower levels of education (Lillard and King, 1984). An alternative is to restrict the sample to older individuals who have completed their education, rather than excluding currently enrolled children from the sample (Tansel, 1997; Holmes, 2003). Kalmijn (1994), for example, investigated the relationship between mothers’ socio-economic status and children’s schooling in the US using the National Survey of Families and Households (NSFH) 1987/1988. The author estimated logistic regression models only for individuals aged 24 or older at the time of the survey to overcome the censoring problem. This method, however, may significantly reduce the sample size. It may also be of limited use for developing countries whose education systems are changing constantly, since an analysis of more recent periods may be more relevant for policy. In addition, the impact of family background may become harder to ascertain, since older individuals are less likely to live with their parents and household surveys generally do not provide information on the childhood environment of adults (Holmes, 2003).
Another way to deal with the censoring problem in the existing literature is to include age and its squared term as regressors in the model (Behrman and Wolfe, 1987). Including the age of the individual as a covariate may account for differences in educational attainment between young, enrolled children and older individuals who have completed their education. It is not, however, an effective way to overcome the censoring problem, as it cannot distinguish between completers and enrolees59 (Lillard and King, 1984). The method is further complicated in underdeveloped countries, where late entry to school and grade repetition are frequent and reduce the power of age as a predictor of educational attainment (Holmes, 2003).
Another method to overcome the censoring problem is to construct an ‘age and sex specific education index’, computed as the ratio of the current educational attainment of child j in family i, of age m and sex k, to the mean observed educational attainment of children of age m and sex k in the sample60. This index has been used as a dependent variable by Rosenzweig (1978), Birdsall (1982) and Wolfe and Behrman (1986). For enrolled children, however, the index does not account for the censoring problem. It indicates only whether the child lags behind his/her cohort (through, for example, grade repetition, a late start to school or temporary leave) or is ahead of it (Lillard and King, 1984).
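The construction of such an index can be illustrated with a minimal sketch. The record layout (the keys `age`, `sex` and `years`) and the sample values are hypothetical and are not taken from any of the studies cited above.

```python
from collections import defaultdict

def education_index(children):
    """Divide each child's current attainment by the mean attainment
    of children of the same age and sex in the sample."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for child in children:
        key = (child["age"], child["sex"])
        totals[key] += child["years"]
        counts[key] += 1
    index = []
    for child in children:
        key = (child["age"], child["sex"])
        cell_mean = totals[key] / counts[key]
        # The ratio is undefined when the cell mean is zero
        index.append(child["years"] / cell_mean if cell_mean > 0 else float("nan"))
    return index

# Hypothetical records: years of schooling completed so far
sample = [
    {"age": 10, "sex": "F", "years": 4},
    {"age": 10, "sex": "F", "years": 2},
    {"age": 10, "sex": "M", "years": 3},
]
ratios = education_index(sample)
```

A value above one marks a child who is ahead of his/her age and sex cohort and a value below one marks a child who lags behind it. The index is silent on how much further an enrolled child will go, which is why it does not resolve the censoring problem.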
Other studies in the existing literature have used different methods to address the censoring problem. Chernichovsky (1985) used OLS to examine the socio-economic and demographic correlates of school enrolment and educational attainment in rural Botswana using the Rural Income Distribution Survey 1974/1975. The author addressed the censoring problem by examining separately the factors associated with the demand for education for enrolled and non-enrolled children aged 6-18, an age group which includes both primary and secondary school children. Barros and Lam (1996), on the other hand, estimated OLS and 2SLS models to examine the determinants of educational attainment at the household level for 14-year-olds only in urban regions of São Paulo, Brazil, using the 1982 Brazilian Annual Household Survey. They used this age group because children of this age should have completed compulsory education in Brazil. However, as Holmes (2003) argued, it is not clear whether using only one ‘schooling cohort’ is a good predictor of the completed level of education.
59 Consider two individuals of the same age, one of whom is enrolled while the other has already dropped out of school. In this case, including age as a covariate is not an effective way to deal with the censoring problem, since both children are treated as identical in the estimation (Lillard and King, 1984; Holmes, 2003).
60 In other words, an ‘age and sex specific education index’ indicates the ratio of child schooling to the
A further alternative for circumventing the censoring problem is the ordered probit model that takes right censoring into account explicitly. It was proposed by King and Lillard (1983 and 1987) and later used by Glewwe and Jacoby (1992), Behrman et al. (1997) and Holmes (2003). In this censored ordered probit model, it is assumed that individuals will at least complete their last grade, and this assumption is incorporated into the likelihood function (Lillard and Willis, 1994; Glick and Sahn, 2000).
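The way censoring enters the likelihood can be sketched as follows. This is an illustration under stated assumptions rather than the specification of any cited study: the latent index `xb`, the thresholds `cuts` and the function names are hypothetical, the error term is taken to be standard normal, and an enrolled child is assumed to attain at least the level recorded for him/her. A completer contributes the probability of falling in the observed category, whereas an enrolled child contributes the probability of reaching that category or a higher one.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def loglik_contribution(level, enrolled, xb, cuts):
    """Log-likelihood contribution of one individual.

    level    -- observed attainment category, 0 .. len(cuts)
    enrolled -- True if the observation is right censored
    xb       -- value of the linear index (hypothetical)
    cuts     -- increasing threshold parameters (hypothetical)
    """
    bounds = [-math.inf] + list(cuts) + [math.inf]
    lower = norm_cdf(bounds[level] - xb)
    if enrolled:
        # Censored: final attainment is at least the observed level
        prob = 1.0 - lower
    else:
        # Completed: final attainment equals the observed level
        prob = norm_cdf(bounds[level + 1] - xb) - lower
    return math.log(prob)

# The same observed level carries different information
# depending on enrolment status
completed = loglik_contribution(1, False, xb=0.0, cuts=[-1.0, 0.0, 1.0])
censored = loglik_contribution(1, True, xb=0.0, cuts=[-1.0, 0.0, 1.0])
```

Summing such contributions over all individuals gives the sample log-likelihood, which would then be maximised over the coefficients and thresholds.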
4.3.2.2 Intrafamily Correlations in Educational Attainment
Another econometric issue in the educational attainment literature relates to the family characteristics shared by children from the same household, since household surveys generally collect multiple observations per family. Children belonging to the same household are likely to share characteristics that are unobserved by the researcher and that affect their performance in school and their demand for education in a similar way. In this case, the residual terms are unlikely to be independent: for children from the same household, they will be correlated through a common household-level component. It can therefore be argued that the highest levels of educational attainment of siblings may be correlated, as siblings have common or highly correlated values of the regressors (Glick and Sahn, 2000; Lillard and King, 1984). Failure to account for this problem may lead to substantially underestimated standard errors. One way to deal with the problem is to select a single child from each household. For example, Parish and Willis (1993), in their analysis for Taiwan, used a single child from each household because they claimed that including more than one child per household may result in over-representation of large families. However, the authors did not state any selection criteria for households with multiple eligible children. Moreover, this method may severely reduce the sample size.
An alternative method is to allow for intrafamily correlation explicitly in the estimation model. In this context, King and Lillard (1983 and 1987) and Lillard and King (1984) proposed an ordered probit model which allows for such correlations through random effects. In other words, the error term in the index function for educational attainment in the ordered probit model is assumed to be composed of a common household heterogeneity component and an idiosyncratic individual error (Glick and Sahn, 2000). The random effects ordered probit model, which extends the ordered probit model by allowing for a household-specific random component in the error term, has been used in only a few educational attainment studies, particularly for underdeveloped or developing countries (see, for example, Lillard and King, 1984 for the Philippines; Glick and Sahn, 2000 for Guinea).
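The error structure described above can be written out as a short sketch. The notation is assumed here for illustration and is not taken from the cited studies: child j in household i has latent attainment index y*_ij, the household component is u_i and the idiosyncratic error is v_ij, and both are assumed to be normal and mutually independent.

```latex
\begin{align}
  y^{*}_{ij} &= x_{ij}'\beta + \varepsilon_{ij},
  \qquad \varepsilon_{ij} = u_i + v_{ij}, \\
  u_i &\sim N(0, \sigma_u^2), \qquad
  v_{ij} \sim N(0, \sigma_v^2), \qquad
  u_i \perp v_{ij}, \\
  \operatorname{Corr}(\varepsilon_{ij}, \varepsilon_{ik})
  &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_v^2},
  \qquad j \neq k .
\end{align}
```

Under these assumptions, the share of the error variance attributable to the household component equals the correlation between the errors of any two siblings, and the standard ordered probit model corresponds to the special case in which the household variance is zero.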
4.3.2.3 The Potential Selection Problem
In general, educational attainment studies use the child as the unit of analysis, mainly to explore the association between parental characteristics and child schooling. Holmes (2003) emphasised the importance of child-specific research as follows: “Using children as the unit of observation permits the use of information about current parental, household and community characteristic, and, thus, the environment in which the schooling decisions are made” (p. 252). Furthermore, child-specific research is particularly important for developing countries and more relevant to policy, since many developing countries have been attempting to restructure their education systems and children are the group most affected by these changes.
Studies that use children as the unit of observation, however, often include only children who live with their parents, because most surveys do not provide information on children who do not reside in their parents’ house. Another econometric issue is therefore the potential selection problem: children leave their parents’ household after a certain age, and those who are observed still living in the household (i.e., home-resident children) may form an unrepresentative, non-random sample (Tansel, 2002). It should be acknowledged that there may be a close association between leaving home and educational attainment. For example, if the least capable or least supported children in the household leave home at an early age and are less likely to obtain higher levels of education, then the correlation between the error terms of the leaving-home and educational attainment equations may result in sample selection bias when educational attainment is estimated only for children living with their parents (Holmes, 2003). There is only limited discussion of this potential selection problem in the existing literature. Some studies have information on all living children, estimate their models for all children and thus do not face this problem (see, for example, Glewwe and Jacoby, 1992 for Ghana; Tansel, 1997 for Côte d’Ivoire and Ghana). The majority of educational attainment studies, however, do not mention the possibility of this form of bias, probably because information on children not living with their parents is unavailable (see, for example, Birdsall, 1980 for Colombia; Handa, 1996 for Jamaica). An exception is Holmes (2003), who investigated the determinants of educational attainment in Pakistan by specifically examining two potential sources of bias: censoring bias and sample selection bias. She estimated censored ordered probit models for all children and for home-resident children only, using the 1991 Pakistan Integrated Household Survey. The results indicated that the sample used for the estimation of educational attainment can alter the estimation results and, in particular, that samples including only home-resident children lead to biased estimates of the demand for education. Although this is one of the few studies that consider sample selection bias in the analysis, it has an important limitation: the author did not account for the bias resulting from intrafamily correlation among children from the same household.
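The mechanism behind this selection problem can be illustrated with a small simulation. It is a sketch under hypothetical assumptions and does not reproduce the analysis of Holmes (2003): the unobserved determinant of attainment and the propensity to remain in the parental home are drawn as standard normal variables with correlation `rho`, and a child is observed only if the propensity is positive. All names and parameter values are invented for illustration.

```python
import math
import random
import statistics

def simulate_selection(n=20000, rho=0.6, seed=0):
    """Return (mean over all children, mean over home-resident children)
    of the unobserved component of attainment."""
    rng = random.Random(seed)
    everyone, residents = [], []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)  # unobserved determinant of attainment
        w = rng.gauss(0.0, 1.0)  # independent noise in the staying decision
        # Hypothetical propensity to stay at home, correlated with e
        stay = rho * e + math.sqrt(1.0 - rho ** 2) * w
        everyone.append(e)
        if stay > 0:             # only home-resident children are observed
            residents.append(e)
    return statistics.mean(everyone), statistics.mean(residents)

full_mean, resident_mean = simulate_selection()
```

With a positive `rho`, children with unfavourable unobservables are more likely to have left home, so the mean of the unobserved component among home-resident children lies above the mean for all children. Setting `rho=0.0` removes the association, and the two means then differ only by sampling noise.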