
Productive Efficiency: Estimation Methods

2.6. Estimating Efficiency

2.6.4. Fixed and Random Effects

The central feature of the Battese and Coelli estimator is a fixed effects linear regression model. It is argued that this approach brings gains in statistical efficiency while obviating assumptions about the distribution of technical inefficiency.

However, heterogeneous production environments, which are not under the producer's control, may influence the production process and the costs incurred. These differences, when observed or measured by observable proxies, can be incorporated in the estimation methods. One of the most important issues in stochastic frontier models is adjusting for unobserved heterogeneity among producers operating in different production environments. Individual producers face different external factors that could influence their production costs but are not under their control. Some of these factors are observed and can be controlled for in the analysis. However, in many cases data are not available for all these variables. Moreover, the relevant factors are often too complex to be quantified by simple indicators. In panel data, where an individual producer is observed several times, the producer-specific unobserved variations can also be taken into account through fixed or random effects⁵¹.

The first use of panel data models in stochastic frontier analysis goes back to Pitt and Lee (1981), who interpreted the panel data random effects as inefficiency rather than heterogeneity. This tradition continued with Schmidt and Sickles (1984), who applied a similar interpretation to a panel data model with fixed effects. The basic panel data formulation, introduced by Schmidt and Sickles (1984), is a model in which the producer-specific stochastic term is interpreted as inefficiency. This term can alternatively be identified as a fixed intercept for each producer (FE model) or as an iid random term (RE model). Where the unobserved heterogeneity is correlated with some of the explanatory variables, the random effects estimators can be biased, while the fixed effects model may overestimate the inefficiency scores.

51 Panel data may have group effects, time effects, or both. These effects can be treated as either fixed or random. Consequently, panel data are analyzed to investigate group and time effects using fixed effect and random effect models.

A main shortcoming of these models is that any unobserved, time-invariant, producer-specific heterogeneity is treated as inefficiency. In more recent papers, the random effects model has been extended to include time-variant inefficiency (Cornwell, Schmidt and Sickles, 1990; Battese and Coelli, 1992). However, in both these models the producer-specific effects are still considered as inefficiency. Another problem arises when the producer-specific effects are correlated with the explanatory variables. A common feature of all these models is that they do not fully separate the sources of heterogeneity and inefficiency at the producer level. An alternative approach is to consider two separate stochastic terms for inefficiency and producer-specific heterogeneity.

Basically, there are two methods of estimation in the literature. In the first, the parameters of the production frontier are estimated conditionally on fixed values of the u_i's, which leads to the fixed effects model and the within estimator of the frontier coefficients. In the second, the estimation is carried out marginally with respect to the producer-specific effects u_i, which leads to the random effects model and either Generalised Least Squares (GLS) or maximum likelihood (ML) estimation of the parameters (Puig-Junoy, 2001)⁵².

52 As described in Coelli (1996), the imposition of one or more restrictions upon this model formulation can provide a number of the special cases of this particular model which have appeared in the literature. Setting η to zero provides the time-invariant model set out in Battese, Coelli and Colby (1989). Furthermore, restricting the formulation to a full (balanced) panel of data gives the production function assumed in Battese and Coelli (1988). The additional restriction of µ equal to zero reduces the model to Model One in Pitt and Lee (1981). One may add a fourth restriction of T = 1 to return to the original cross-sectional, half-normal formulation of Aigner, Lovell and Schmidt (1977). Obviously, a large number of permutations exist. For example, if all these restrictions except µ = 0 are imposed, the model suggested by Stevenson (1980) results. Furthermore, if the cost function option is selected, we can estimate the model specification in Schmidt and Lovell (1979), which assumed allocative efficiency. These latter two specifications are the cost function analogues of the production functions in Battese and Coelli (1988) and Aigner, Lovell and Schmidt (1977), respectively.

A fixed effect model (Schmidt and Sickles, 1984) examines whether intercepts vary across groups or time periods, whereas a random effect model (Pitt and Lee, 1981) explores differences in error variances⁵³. The fixed effect model asks how group and/or time affect the intercept, while the random effect model analyzes error variance structures affected by group and/or time. Slopes are assumed unchanged in both fixed effect and random effect models. Table 2.2 compares the fixed effect and random effect models.

53 In both cases, inefficiency effects are assumed to be time invariant.

Table 2.2. Fixed Effect and Random Effect Models

                   Fixed Effect Model                      Random Effect Model
Functional form    y_it = (α + µ_i) + X_it′β + v_it        y_it = α + X_it′β + (µ_i + v_it)
Intercepts         Varying across groups and/or times      Constant
Error variances    Constant                                Varying across groups and/or times
Slopes             Constant                                Constant
Estimation         LSDV, within effect method              GLS, FGLS
Hypothesis test    Incremental F test                      Breusch-Pagan LM test

Notes:
1. The parameter estimate of a dummy variable is part of the intercept in a fixed effect model and a component of the error in a random effect model. Slopes remain the same across groups or time periods.
2. v_it ~ iid(0, σ_v²) indicates that the errors are independent and identically distributed.
Source: Park (2009), p. 2
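Table 2.2 lists the Breusch-Pagan LM test as the standard check for random effects. The sketch below computes the one-way statistic for a balanced panel from pooled OLS residuals, using the form commonly reported in textbooks (e.g. Greene). The function name `breusch_pagan_lm` and the simulated data are illustrative assumptions, not part of the original analysis; the p-value uses the identity that a χ²(1) tail probability equals erfc(√(LM/2)).

```python
import math
import numpy as np


def breusch_pagan_lm(resid):
    """Breusch-Pagan LM statistic for H0: Var(mu_i) = 0 (no random group effect).

    resid: N x T array of pooled OLS residuals from a balanced panel
    (rows = groups, columns = periods). Returns (LM, p-value); under H0
    LM is asymptotically chi-squared with one degree of freedom.
    """
    e = np.asarray(resid, dtype=float)
    n, t = e.shape
    ratio = np.sum(e.sum(axis=1) ** 2) / np.sum(e ** 2)
    lm = n * t / (2.0 * (t - 1)) * (ratio - 1.0) ** 2
    # chi-squared(1) survival function: P(Z^2 > lm) = erfc(sqrt(lm / 2))
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Hypothetical simulated panel with genuine group effects mu_i
rng = np.random.default_rng(0)
n, t = 50, 5
x = rng.normal(size=(n, t))
mu = rng.normal(scale=1.0, size=(n, 1))
y = 1.0 + 0.5 * x + mu + rng.normal(scale=0.5, size=(n, t))

# Pooled OLS ignoring the panel structure, residuals reshaped to N x T
X = np.column_stack([np.ones(n * t), x.ravel()])
beta, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
resid = (y.ravel() - X @ beta).reshape(n, t)
lm, p = breusch_pagan_lm(resid)
print(f"LM = {lm:.1f}, p = {p:.3g}")
```

A large LM (small p-value) rejects the pooled model in favour of random group effects; the incremental F test in the fixed effect column plays the analogous role for the dummy-variable intercepts.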

Group effect models create dummies using grouping variables (e.g., country, producer, and race). A one-way model includes only one set of dummy variables (e.g., producer) and is called a one-way fixed or random group effects model, while a two-way model has two sets of dummy variables, one for a grouping variable and the other for a time variable (e.g., producer and year).
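The difference between one-way and two-way dummy sets can be made concrete by building the design columns directly. This is a minimal sketch under stated assumptions: `dummy_design` is a hypothetical helper, the panel is a toy example, and dropping the first year is one common way (not the only one) to avoid perfect collinearity between the producer and year dummies.

```python
import numpy as np


def dummy_design(producer, year, two_way=True):
    """Dummy-variable columns for a one-way (producer) or two-way (producer + year) model.

    All producer dummies are kept and no separate constant is used, so each producer
    dummy coefficient plays the role of alpha + mu_i. In the two-way case the first
    year is dropped as the reference period; otherwise the year dummies would sum to
    the same column as the producer dummies (perfect collinearity).
    """
    producer = np.asarray(producer)
    year = np.asarray(year)
    producer_levels = np.unique(producer)
    D = (producer[:, None] == producer_levels[None, :]).astype(float)
    if two_way:
        year_levels = np.unique(year)[1:]              # drop the reference year
        Yd = (year[:, None] == year_levels[None, :]).astype(float)
        D = np.hstack([D, Yd])
    return D


# Hypothetical balanced panel: 2 producers observed in 3 years
producer = ["A", "A", "A", "B", "B", "B"]
year = [2000, 2001, 2002, 2000, 2001, 2002]
one_way = dummy_design(producer, year, two_way=False)
two_way = dummy_design(producer, year)
print(one_way.shape, two_way.shape)                        # (6, 2) (6, 4)
print(np.linalg.matrix_rank(two_way) == two_way.shape[1])  # True
```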

This variation has two important restrictions. First, any time-invariant heterogeneity will be pushed into α_i and ultimately into û_i. Second, the model assumes that inefficiency is time invariant. For short time intervals this may be a reasonable assumption, but over longer periods it becomes questionable. Both of these restrictions can be relaxed by placing country-specific constant terms in the stochastic frontier model; we call this a ‘true’ fixed effects model:

$$y_{it} = \alpha_i + x_{it}'\beta + v_{it} - u_{it} \qquad (2.33)$$

where u_it has the stochastic specifications noted earlier for the stochastic frontier model.

In the fixed effects model, the production function is denoted:

$$y_{it} = \alpha + x_{it}'\beta + v_{it} - u_i \qquad (2.34)$$

where y_it is the (log of the) output of the system, x_it is the set of (logs of) inputs, v_it is the random component representing stochastic elements as well as any country- (and time-) specific heterogeneity, u_i is the inefficiency in the system, and i and t denote country and year, respectively.

Assuming that u_i > 0, the equation is rewritten as:

$$y_{it} = \alpha_i + x_{it}'\beta + v_{it}, \qquad \alpha_i = \alpha - u_i \qquad (2.35)$$

As long as v_it is uncorrelated with the other components of the model, the parameters can be estimated by least squares, using the "within", or dummy variable, estimator. The country-specific constants embody the technical inefficiency. The inefficiencies are estimated in turn by shifting the function upward so that each constant term is measured as a deviation from the benchmark level:

$$\hat{u}_i = \max_j(\hat{\alpha}_j) - \hat{\alpha}_i \qquad (2.36)$$

[…] This feature leads to the extension of the model to a truncated normal model by allowing the mean of u_i to be nonzero (Stevenson, 1980). The major shortcoming here is that this strict assumption suppresses individual heterogeneity in inefficiency that is allowed, for example, by the fixed effects formulation.
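The within (dummy variable) estimator and the benchmark shift described above can be sketched in a few lines: slopes come from least squares on deviations from producer means, the producer constants are recovered from those means, and each producer's inefficiency is its distance below the largest constant. This is an illustrative sketch on hypothetical simulated data (balanced panel, time-invariant half-normal inefficiency assumed for the simulation only); `schmidt_sickles` is a name chosen here, not a library function.

```python
import numpy as np


def schmidt_sickles(y, X):
    """Fixed effects (within / LSDV) frontier with time-invariant inefficiency.

    y: N x T array of log outputs; X: N x T x K array of log inputs (balanced panel).
    Returns slope estimates, producer intercepts alpha_i and u_hat_i = max_j alpha_j - alpha_i.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, t, k = X.shape
    # Within transformation: deviations from producer means remove alpha_i
    y_w = (y - y.mean(axis=1, keepdims=True)).ravel()
    X_w = (X - X.mean(axis=1, keepdims=True)).reshape(n * t, k)
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    # Recover the producer-specific constants from the producer means
    alpha = y.mean(axis=1) - X.mean(axis=1) @ beta
    # Benchmark shift: the producer with the largest intercept is treated as fully efficient
    u_hat = alpha.max() - alpha
    return beta, alpha, u_hat


# Hypothetical simulated panel: 30 producers, 8 years, 2 inputs
rng = np.random.default_rng(42)
n, t, k = 30, 8, 2
beta_true = np.array([0.6, 0.3])
u_true = np.abs(rng.normal(scale=0.3, size=n))      # time-invariant inefficiency
X = rng.normal(size=(n, t, k))
y = 1.0 + X @ beta_true - u_true[:, None] + rng.normal(scale=0.1, size=(n, t))

beta_hat, alpha_hat, u_hat = schmidt_sickles(y, X)
te_hat = np.exp(-u_hat)                             # technical efficiency scores in (0, 1]
```

Because the scores are relative to the best producer in the sample, at least one producer is always rated fully efficient, and any time-invariant heterogeneity ends up in û_i, which is exactly the first restriction discussed above.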

Superficially, the ‘true’ fixed effects model (2.33) amounts simply to adding a full set of country dummy variables to the stochastic frontier model. The model is still fit by maximum likelihood, not least squares.

The true fixed effects model places the unmeasured heterogeneity in the production function: with a loglinear model, it produces a neutral shift of the function, specific to each country. One might, instead, have the heterogeneity reside in the inefficiency distribution. This could be accomplished with the formulation:

$$\mu_i = \delta_{0i} + h_i'\delta \qquad (2.37)$$

that is, by placing the country-specific dummy variables in the mean of the truncated normal distribution rather than in the production function. Once again, in a moderate-sized sample, this is a minor reformulation of the familiar model.

Although fixed effects models have the advantage of allowing for correlation between the inefficiency term and the independent variables, and of requiring no distributional assumption on efficiency, the results should be interpreted carefully.

The possibility that the producer-specific effects include the influence of variables that vary across producers but are invariant over time cannot be ruled out.

Simar (1992) has shown that the fixed effects model appears to provide poor estimates of the intercepts and slope coefficients of frontier production functions and, consequently, unreasonable measures of technical efficiency.

On the other hand, as noted in Greene (2003a, b), the random effects model is obtained by assuming that u_i is time invariant and also uncorrelated with the variables included in the model:

$$y_{it} = \alpha + X_{it}'\beta + (\mu_i + v_{it}) \qquad (2.38)$$

In the linear regression case, the parameters are estimated by two-step generalized least squares (Greene, 2003a, b). The random effects model has a significant drawback: there is no implied estimator of inefficiency in this model, that is, no estimator of TE_i as in the fixed effects case.
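The two-step GLS estimator for the linear random effects model in equation (2.38) can be sketched as follows: the variance components are first estimated from within and between regressions, and the slopes are then obtained by OLS on quasi-demeaned data, with θ = 1 − √(σ_v² / (σ_v² + Tσ_µ²)). Using the within and between residuals for the first step is a Swamy-Arora type choice assumed here; other first-step estimators exist. The function name and simulated data are hypothetical.

```python
import numpy as np


def re_two_step_gls(y, X):
    """Two-step (feasible) GLS for y_it = a + x_it'b + mu_i + v_it in a balanced panel.

    y: N x T array; X: N x T x K array. Returns (coefficients [a, b...], theta, s2_v, s2_mu).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, t, k = X.shape
    # Step 1a: within regression gives sigma_v^2
    y_w = (y - y.mean(axis=1, keepdims=True)).ravel()
    X_w = (X - X.mean(axis=1, keepdims=True)).reshape(n * t, k)
    b_w, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    e_w = y_w - X_w @ b_w
    s2_v = e_w @ e_w / (n * t - n - k)
    # Step 1b: between regression on producer means estimates sigma_mu^2 + sigma_v^2 / T
    Xb = np.column_stack([np.ones(n), X.mean(axis=1)])
    b_b, *_ = np.linalg.lstsq(Xb, y.mean(axis=1), rcond=None)
    e_b = y.mean(axis=1) - Xb @ b_b
    s2_b = e_b @ e_b / (n - k - 1)
    s2_mu = max(s2_b - s2_v / t, 0.0)
    # Step 2: OLS on quasi-demeaned data; the constant column becomes (1 - theta)
    theta = 1.0 - np.sqrt(s2_v / (s2_v + t * s2_mu))
    y_q = (y - theta * y.mean(axis=1, keepdims=True)).ravel()
    X_q = (X - theta * X.mean(axis=1, keepdims=True)).reshape(n * t, k)
    Z = np.column_stack([np.full(n * t, 1.0 - theta), X_q])
    coef, *_ = np.linalg.lstsq(Z, y_q, rcond=None)
    return coef, theta, s2_v, s2_mu


# Hypothetical panel: 100 producers, 5 periods, regressors independent of mu_i
rng = np.random.default_rng(7)
n, t = 100, 5
X = rng.normal(size=(n, t, 1))
mu = rng.normal(scale=0.5, size=(n, 1))
y = 1.0 + 0.8 * X[..., 0] + mu + rng.normal(scale=0.3, size=(n, t))
coef, theta, s2_v, s2_mu = re_two_step_gls(y, X)
```

Note that the routine returns variance components and frontier coefficients but no producer-specific score, which illustrates the drawback stated above: the random effect is integrated out rather than estimated.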

Pitt and Lee (1981) showed how the time-invariant composed error model could be extended to a panel data version of the stochastic frontier model. The direct extension would be of limited usefulness here, first because of the assumption that u_i and x_i are uncorrelated and, second, because of the assumption that inefficiency is time invariant. The first of these can be remedied in the same fashion as suggested earlier. Estimation of the random effects model with heterogeneity in E[u_i] is straightforward.

The heterogeneity may also enter the distribution of u_it, which can, as before, have mean µ_i or, in principle, even µ_it with time variation in the covariates. Country-specific estimates of inefficiency are computed using the Jondrow et al. (1982) formulation, though simulation methods are needed to integrate out the unmeasured random effects.

In the random effects model, the stochastic nature of the efficiency effects is explicitly taken into account in the estimation process. GLS estimation provides consistent and unbiased estimates of the parameters if the regressors x_it are not correlated with the technical efficiency effects u_it. A major advantage of the GLS estimator relative to the within estimator is its flexibility to include time-invariant regressors. In the fixed effects model, the coefficients of time-invariant regressors, even though these regressors may vary across producers, cannot be estimated because they are eliminated by the within transformation, as shown in the equation:

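$$(y_{it} - \bar{y}_i) = (x_{it} - \bar{x}_i)'\beta + (v_{it} - \bar{v}_i) \qquad (2.39)$$

where ȳ_i, x̄_i and v̄_i denote producer-specific means over time.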

In this case, the producer-specific technical efficiency effects will include the influence of all variables that are time invariant at the producer level within the sample. This would make technical efficiency comparisons difficult unless the excluded fixed regressors influence all producers in the sample equally (Kumbhakar, 1987).
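A small numerical check makes the point of equation (2.39) concrete: subtracting producer means leaves a time-varying regressor intact but turns a time-invariant one into a column of zeros, so the transformed design matrix loses rank and that coefficient is not identified. The data and the "time-invariant regressor" are hypothetical.

```python
import numpy as np


def within(a):
    """Within transformation: subtract each producer's time mean (rows = producers)."""
    a = np.asarray(a, dtype=float)
    return a - a.mean(axis=1, keepdims=True)


rng = np.random.default_rng(1)
n, t = 4, 3
x_varying = rng.normal(size=(n, t))                        # time-varying regressor
z_fixed = np.repeat(rng.normal(size=(n, 1)), t, axis=1)    # hypothetical time-invariant regressor

W = np.column_stack([within(x_varying).ravel(), within(z_fixed).ravel()])
print(np.abs(W[:, 1]).max() < 1e-12)   # True: the time-invariant column is wiped out
print(np.linalg.matrix_rank(W))        # 1: its coefficient cannot be identified
```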

Summarising, with fixed effects models statistical inference can only be made about the cross-section units used for estimation. In other words, the findings from a fixed effects model cannot be generalised beyond the sample. An alternative is the random effects model, in which the error components are assumed to be random variables drawn from a normal distribution, independently and identically distributed, and uncorrelated with the explanatory variables.

In the RE framework, it is assumed that the producer-specific effects are uncorrelated with the explanatory variables in the model. Therefore, all extensions of the RE model are prone to heterogeneity bias when such correlation is present. However, refining the model to separate different sources of heterogeneity may improve its performance, especially regarding the inefficiency estimates.

The core difference between fixed and random effect models lies in the role of dummy variables. If dummies are considered part of the intercept, the model is a fixed effect model; in a random effect model, the dummies act as an error term. A fixed group effect model examines group differences in intercepts, assuming the same slopes and constant variance across entities or subjects. Since a group (individual-specific) effect is time invariant and considered part of the intercept, u_i is allowed to be correlated with the other regressors.

A random effect model, by contrast, estimates variance components for groups (or times) and errors, assuming the same intercept and slopes. Here u_i is part of the errors and thus should not be correlated with any regressor; otherwise, a core OLS assumption is violated. The difference among groups (or time periods) lies in the variance of their error term, not in their intercepts.

A random effect model is estimated by generalized least squares (GLS) when the Ω matrix, the variance structure among groups, is known. The feasible generalized least squares (FGLS) method is used to estimate the variance structure when Ω is not known. A typical example is the groupwise heteroscedastic regression model (Greene, 2003a, b). There are various estimation methods for FGLS, including maximum likelihood and simulation (Baltagi and Cheng, 1994).
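For the groupwise heteroscedastic example, FGLS replaces the unknown Ω with group variances estimated from OLS residuals and then reweights the data. The sketch below is a plain two-step version under stated assumptions (no degrees-of-freedom correction in the variance estimates, no iteration to convergence); the function name and data are hypothetical.

```python
import numpy as np


def groupwise_fgls(y, X, groups):
    """Two-step FGLS for a groupwise heteroscedastic regression, Var(e) = sigma_g^2 in group g.

    Step 1: OLS; step 2: estimate sigma_g^2 from the OLS residuals of each group;
    step 3: weighted least squares with weights 1 / sigma_g.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    labels, idx = np.unique(np.asarray(groups), return_inverse=True)
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b_ols
    s2 = np.array([np.mean(e[idx == g] ** 2) for g in range(len(labels))])
    w = 1.0 / np.sqrt(s2[idx])                      # row weights 1 / sigma_g
    b_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return b_fgls, s2


# Hypothetical data: three groups with very different error variances
rng = np.random.default_rng(3)
groups = np.repeat([0, 1, 2], 200)
sd = np.array([0.1, 1.0, 3.0])[groups]
x = rng.normal(size=groups.size)
X = np.column_stack([np.ones(groups.size), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=groups.size) * sd
b_fgls, s2 = groupwise_fgls(y, X, groups)
```

Observations from low-variance groups receive more weight, which is why FGLS is more efficient than pooled OLS when the variances really differ by group.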

Fixed effects models are not without drawbacks. They may frequently have so many cross-sectional units of observation that too many dummy variables are required for their specification. Too many dummy variables may sap the model of a sufficient number of degrees of freedom for adequately powerful statistical tests.

Moreover, a model with many such variables may be plagued with multicollinearity, which increases the standard errors and thereby drains the model of statistical power to test parameters. If these models contain variables that do not vary within the groups, parameter estimation may be precluded. Although the model residuals are assumed to be normally distributed and homogeneous, there could easily be country-specific (groupwise) heteroskedasticity or autocorrelation over time that would further plague estimation (Yaffee, 2003).

The one big advantage of the fixed effects model is that the individual effects may be correlated with the regressors. If group effects are uncorrelated with the group means of the regressors, it would probably be better to employ a more parsimonious parameterization of the panel model.

Conventional panel data models such as fixed-effects or random-effects models can be employed to account for unobserved heterogeneity (Pitt and Lee, 1981; Schmidt and Sickles, 1984). A major limitation of these models is the treatment of the inefficiency term as time-invariant, which raises a fundamental identification problem. Not only must the model distinguish noise from the inefficiency effects, but also the unobserved, time-invariant, producer-specific heterogeneity becomes difficult to distinguish from the inefficiency component (Greene, 2005). Some authors have extended the random-effects model to include time-variant inefficiency (Cornwell et al., 1990; Battese and Coelli, 1992, 1995). However, a drawback in these models is that the producer-specific effects are still considered as inefficiency, which may result in biased estimates (Greene, 2005). Moreover, when producer-specific effects are correlated with the explanatory variables, the random-effects estimators are affected by heterogeneity bias. As pointed out by Greene (2002b), while fixed-effects estimators are still consistent with regard to the production frontier slopes, inefficiency variations are overestimated. Thus, an obvious drawback of all these models is their inability to separate fully the sources of heterogeneity and inefficiency at the producer level.

In a recent development, Greene (2005) demonstrated how a stochastic frontier model can be extended to panel data models by including a random effect in the model. He refers to this extension as the ‘true’ random-effects model.

The ‘true’ random-effects model is basically a random-constant frontier model that is obtained by combining a conventional random-effects model with a skewed stochastic term representing inefficiency.

However, since most of the unobserved factors, in particular those relating to efficiency explanatory conditions, are most likely to be correlated with the output and some of the explanatory variables, the ‘true’ random-effects estimators of the production function coefficients could still be biased.

2.7. Concluding Remarks

The discussion concerning the measurement of productivity and efficiency in the economic literature started with the contemporaneous papers of Debreu (1951) and Koopmans (1951), which made the first systematic efforts to investigate efficiency and its measurement. However, the standard efficiency measurement literature was started by Farrell (1957), who built upon Debreu (1951) and Koopmans (1951). Farrell (1957) proposed to measure the efficiency of a productive unit in terms of the realized deviations from an idealized frontier isoquant.

The empirical identification of such a benchmark is the main issue in the literature on efficiency measurement. Farrell (1957) extended this work in an attempt to operationalize the measurement of productivity and efficiency. Following Farrell's work, we define the productivity of an economic agent as the scalar ratio of the outputs to the inputs used by the agent in its production process. Finally, in the 1970s, with the seminal papers of Aigner et al. (1977) and Meeusen and van den Broeck (1977), econometricians developed a statistically and theoretically sound method for measuring efficiency, now known as stochastic frontier analysis. In this approach, the stochastic frontier is defined as the locus of the best-performing agents within a data set.

The other data points of the other producers are located "below" this estimated frontier. The relative distance measured between this best performance and the other data points is interpreted as inefficiency.

The approach to frontier estimation proposed by Farrell (1957) was also considered by Shephard (1970) and Afriat (1972), who suggested mathematical programming methods that could achieve frontier estimation, but the method did not receive wide attention until the paper by Charnes, Cooper and Rhodes (CCR) (1978), in which the term DEA was first presented. Charnes, Cooper and Rhodes (1978) proposed a model that had an input orientation and assumed constant returns to scale (CRS).

Subsequently, Färe and Logan (1983) and Banker, Charnes and Cooper (BCC) (1984) proposed models with variable returns to scale (VRS). The term DEA and the CCR model were first introduced in 1978 (Charnes et al., 1978) and were followed by a phenomenal expansion of DEA in terms of its theory, methodology and application over the last few decades (Førsund and Sarafoglou, 2003; Seiford, 1996; Charnes et al., 1994).

Charnes et al. (1978) and Banker et al. (1984) extended Farrell's ideas by imposing returns to scale properties. The nonparametric approach relies on a production frontier defined as the geometrical locus of optimal production plans (Simar and Wilson, 1998, 2007). The production frontier can be estimated nonparametrically from a set of observed production units, based on different envelopment techniques. A common nonparametric method is Data Envelopment Analysis (DEA). Work on nonparametric DEA shows how simulation methods can be applied to conduct statistical inference and obtain more reliable and robust results. In DEA, inefficiency is defined as the distance from the frontier of a convex envelope of the data; therefore, due to the convexity assumption, a company might be compared to an unobservable and fictitious linear combination of efficient observations (Coelli et al., 2005). Thus, the efficiency score corresponds to the point on the frontier characterized by the level of inputs that should be reached to be efficient (Simar and Wilson, 1998, 2007).
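The input-oriented, constant-returns-to-scale CCR model can be written as one linear programme per unit: minimise θ such that a nonnegative combination of observed units uses at most θ times the unit's inputs while producing at least its outputs. The sketch below uses this textbook envelopment form with SciPy's `linprog` (HiGHS solver); it is an illustration under stated assumptions, not the procedure used in this study, and `dea_crs_input` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def dea_crs_input(X, Y):
    """Input-oriented CRS (CCR) DEA efficiency scores.

    X: n x m inputs, Y: n x s outputs. For each unit o solve
        min theta  s.t.  sum_j lam_j x_j <= theta x_o,  sum_j lam_j y_j >= y_o,  lam >= 0.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        c = np.r_[1.0, np.zeros(n)]                    # variables: [theta, lam_1..lam_n]
        A_in = np.hstack([-X[o][:, None], X.T])        # sum_j lam_j x_ij - theta x_io <= 0
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])    # -sum_j lam_j y_rj <= -y_ro
        res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.r_[np.zeros(m), -Y[o]],
                      bounds=[(None, None)] + [(0, None)] * n, method="highs")
        if not res.success:
            raise RuntimeError(res.message)
        scores[o] = res.x[0]
    return scores


# Hypothetical one-input, one-output example: units 1 and 3 lie on the CRS frontier
X = [[2.0], [4.0], [4.0]]
Y = [[2.0], [2.0], [4.0]]
print(np.round(dea_crs_input(X, Y), 3))   # [1.  0.5 1. ]
```

Adding the constraint that the λ_j sum to one turns this into the variable-returns-to-scale (BCC) model of Banker, Charnes and Cooper (1984).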

The analysis then proceeds to deterministic production frontiers, where deviations of a producer from the theoretical maximum are attributed exclusively to inefficiency, and to stochastic production frontiers, where the deviation from the frontier is decomposed into stochastic noise and technical inefficiency in production. Chapter 2 analyses the

In document An industry and country analysis of technical efficiency in the European Union, 1980-2005 (Page 117-129)