
Methodology – Panel Data Analysis

2.6 Methodology

2.6.3 Sample Data and Methodological approach

2.6.3.2 Methodology – Panel Data Analysis

A panel data set contains a set of cross-sectional units, i.e. countries in our case, which are observed over several time periods. In line with the existing literature, we denote the number of cross-sectional units by N and the number of time periods over which the individuals are observed by T. The use of panel data helps to account for individual differences, or heterogeneity. In a panel that is “long and narrow”, i.e. one with only a few individuals observed over a long period, the seemingly unrelated regression model is more frequently employed. In a “short and wide” panel, i.e. one with many individuals and relatively few time-series observations, the fixed effects model is more useful; it can be applied to panels with any number of individuals.

Consider a flexible linear regression model as follows

$$y_{it} = \beta_{1i} + \beta_2 x_{2it} + \beta_3 x_{3it} + e_{it}, \quad t = 1, \dots, T \qquad (2.1)$$

By averaging the data across time periods, we obtain the following

$$\bar{y}_i = \beta_{1i} + \beta_2 \bar{x}_{2i} + \beta_3 \bar{x}_{3i} + \bar{e}_i \qquad (2.2)$$

The “bar” notation indicates that the values of a variable have been averaged over time, so the time subscript t is dropped. Because the intercept $\beta_{1i}$ does not vary over time, subtracting (2.2) from (2.1) eliminates it:

$$y_{it} - \bar{y}_i = \beta_2 (x_{2it} - \bar{x}_{2i}) + \beta_3 (x_{3it} - \bar{x}_{3i}) + (e_{it} - \bar{e}_i) \qquad (2.3)$$

Therefore, the least squares estimates of the parameters $\beta_2$ and $\beta_3$ obtained from (2.3) are identical to those from the more cumbersome least squares dummy variable model, which includes a separate indicator variable for each individual.
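The equivalence between the demeaned regression (2.3) and the least squares dummy variable model can be checked numerically. The following is a minimal sketch on simulated data; the panel dimensions, the coefficient values and the helper name `demean` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 4, 6  # hypothetical panel: 4 individuals, 6 time periods

ids = np.repeat(np.arange(N), T)            # individual index for each row
x2 = rng.normal(size=N * T)
x3 = rng.normal(size=N * T)
beta1 = np.array([1.0, -2.0, 0.5, 3.0])     # assumed individual intercepts
y = beta1[ids] + 0.8 * x2 - 1.5 * x3 + rng.normal(scale=0.1, size=N * T)

def demean(v, ids):
    """Subtract each individual's time average from its observations."""
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - means[ids]

# Within (fixed effects) estimator: least squares on equation (2.3)
X_within = np.column_stack([demean(x2, ids), demean(x3, ids)])
b_within, *_ = np.linalg.lstsq(X_within, demean(y, ids), rcond=None)

# Least squares dummy variable model: one indicator column per individual
D = (ids[:, None] == np.arange(N)[None, :]).astype(float)
X_lsdv = np.column_stack([D, x2, x3])
b_lsdv, *_ = np.linalg.lstsq(X_lsdv, y, rcond=None)

# The slope estimates agree up to floating-point error
print(b_within)
print(b_lsdv[N:])
```

The first N entries of `b_lsdv` are the estimated individual intercepts; only the last two entries are compared with the within estimates.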

The random effects model is built on the idea that the individuals in a panel data set may be drawn randomly from a larger population. In the fixed effects model introduced earlier, all individual differences are assumed to be captured by differences in the intercept parameter, $\beta_{1i}$. The random effects model makes the same assumption that all cross-sectional differences are accommodated by the intercept parameters, but it treats the individual differences as random rather than fixed. To achieve this, we let the intercept parameter $\beta_{1i}$ consist of a fixed component, $\bar{\beta}_1$, which denotes the population average, and random individual differences, represented by $u_i$:

$$\beta_{1i} = \bar{\beta}_1 + u_i \qquad (2.4)$$

where $u_i$ is the random effect. We assume that $u_i$ has zero mean, is uncorrelated across individuals and has constant variance $\sigma_u^2$. Substituting (2.4) into (2.1) and rearranging, we obtain the familiar regression below

$$y_{it} = \bar{\beta}_1 + \beta_2 x_{2it} + \beta_3 x_{3it} + (e_{it} + u_i) \qquad (2.5)$$

The composite error term $e_{it} + u_i$ is denoted by $v_{it}$. We assume that the error term $e_{it}$ has zero mean, constant variance $\sigma_e^2$, and is uncorrelated over time. Furthermore, we assume that the individual effects $u_i$ are uncorrelated with the errors $e_{it}$. Consequently, $v_{it}$ has zero mean and constant variance $\sigma_v^2 = \sigma_u^2 + \sigma_e^2$. However, $v_{it}$ is serially correlated: because every error for individual i contains the same component $u_i$, the errors for a given individual in different time periods are correlated, with correlation coefficient $\rho = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$. In this situation the least squares estimator is still unbiased and consistent, but it is no longer efficient, i.e. its variance is not the minimum attainable, and the standard errors obtained from the usual least squares formulas are incorrect. To address this problem, the generalized least squares estimator (the minimum variance estimator) is used to estimate the random effects model. It can be obtained by applying least squares to a transformed model as follows

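$$y_{it}^* = \bar{\beta}_1 x_{1it}^* + \beta_2 x_{2it}^* + \beta_3 x_{3it}^* + v_{it}^* \qquad (2.6)$$

where

$$y_{it}^* = y_{it} - \alpha \bar{y}_i \qquad (2.7)$$

$$x_{1it}^* = 1 - \alpha \qquad (2.8)$$

$$x_{2it}^* = x_{2it} - \alpha \bar{x}_{2i} \qquad (2.9)$$

$$x_{3it}^* = x_{3it} - \alpha \bar{x}_{3i} \qquad (2.10)$$

$$v_{it}^* = v_{it} - \alpha \bar{v}_i \qquad (2.11)$$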

The transformation parameter α is given by

$$\alpha = 1 - \frac{\sigma_e}{\sqrt{T \sigma_u^2 + \sigma_e^2}}$$

It can be further shown that the transformed error $v_{it}^*$ has constant variance $\sigma_e^2$ and is serially uncorrelated.
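The error structure and the effect of the transformation can be verified for one individual in matrix form. The sketch below assumes illustrative values for T, $\sigma_u^2$ and $\sigma_e^2$ (they are not estimates from our data) and writes $v_{it}^* = v_{it} - \alpha \bar{v}_i$ as multiplication by the matrix `P`.

```python
import numpy as np

T = 5
sigma_u2, sigma_e2 = 2.0, 1.0   # hypothetical variance components

# Covariance matrix of (v_i1, ..., v_iT) for one individual:
# sigma_u^2 + sigma_e^2 on the diagonal, sigma_u^2 elsewhere
J = np.ones((T, T))
omega = sigma_e2 * np.eye(T) + sigma_u2 * J

# Implied correlation between errors of the same individual
rho = sigma_u2 / (sigma_u2 + sigma_e2)
corr = omega / (sigma_u2 + sigma_e2)

# Transformation parameter alpha
alpha = 1.0 - np.sqrt(sigma_e2) / np.sqrt(T * sigma_u2 + sigma_e2)

# v*_it = v_it - alpha * vbar_i, written as v* = P v
P = np.eye(T) - alpha * J / T
omega_star = P @ omega @ P.T

print(rho, corr[0, 1])          # the off-diagonal correlation equals rho
print(np.round(omega_star, 10)) # sigma_e^2 times the identity matrix
```

Under these assumptions the transformed covariance matrix is diagonal with $\sigma_e^2$ on the diagonal, which is why least squares applied to (2.6) is efficient. In applied work the variance components are unknown and must be estimated first.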

In the previous section we introduced the fixed effects and random effects models for panel data. In practice, the latter is often preferred because it has several advantages over the former. First, the random effects estimator accounts for the random sampling process by which the data are collected. Second, it allows the effects of variables that are constant over time for each individual to be estimated. Third, as it is essentially a generalized least squares estimator, in large samples it has a smaller variance than the least squares estimator. However, a potential difficulty in applying the random effects estimator is that the random error $v_{it} = e_{it} + u_i$ may be correlated with one or more of the explanatory variables on the right-hand side of the regression. In such a situation, both the least squares and the generalized least squares estimators are biased and inconsistent, and the fixed effects estimator, which removes the random effect $u_i$ together with any other time-invariant components, provides a good alternative.
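A small simulation illustrates the problem. In the sketch below the data-generating process, the sample sizes and the helper names `slope` and `demean` are all assumptions made for illustration: the regressor is constructed to be correlated with the individual effect, and the true slope is set to 2. For brevity it compares pooled least squares with the fixed effects (within) estimator only.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 2000, 5                       # hypothetical "short and wide" panel
ids = np.repeat(np.arange(N), T)

u = rng.normal(size=N)               # individual effects u_i
x = u[ids] + rng.normal(size=N * T)  # regressor correlated with u_i
e = rng.normal(size=N * T)
y = 1.0 + 2.0 * x + u[ids] + e       # true slope is 2

def slope(x, y):
    """Least squares slope from a regression of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)

def demean(v, ids):
    """Subtract each individual's time average from its observations."""
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - means[ids]

b_pooled = slope(x, y)                        # ignores the correlation with u_i
b_fe = slope(demean(x, ids), demean(y, ids))  # removes u_i before estimating

print(b_pooled, b_fe)
```

In this design the pooled estimate settles near 2.5 rather than 2, because the omitted $u_i$ is absorbed by the regressor, whereas the within estimate stays close to the true value.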

To examine whether the random effect $u_i$ is correlated with the explanatory variables in a random effects regression, and thus to decide on the most appropriate estimation procedure, we use a Hausman test. The test compares the coefficient estimates obtained from the random effects model with those obtained from the fixed effects model. If $u_i$ is not correlated with the explanatory variables $x_{kit}$, both the random effects and the fixed effects estimators are consistent and thus converge to the true parameter values in large samples, so the two sets of estimates should be similar. However, if $u_i$ and $x_{kit}$ are correlated, the random effects estimator is no longer consistent while the fixed effects estimator remains consistent, and the two sets of estimates will tend to differ. In practice, the Hausman test can be carried out for an individual coefficient using a Student's t test, with the null hypothesis that there is no correlation between $u_i$ and any of the explanatory variables. If the null cannot be rejected, one should use the random effects estimator, since it tends to have a smaller variance than its competitor. If the null is rejected, the fixed effects estimator should be preferred because it remains consistent.
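For a single coefficient, the t version of the test can be sketched as follows. The formula used here is the common textbook form, in which the difference between the two estimates is divided by the square root of the difference between their squared standard errors; it is an assumption that this is the form applied in our analysis. The function name `hausman_t` and the numerical inputs are purely hypothetical.

```python
import math

def hausman_t(b_fe, se_fe, b_re, se_re):
    """t statistic comparing one fixed effects and one random effects estimate.

    Under the null hypothesis the random effects estimator is efficient,
    so the variance of the difference is se_fe**2 - se_re**2.
    """
    var_diff = se_fe**2 - se_re**2
    if var_diff <= 0:
        # Can happen in finite samples; the statistic is then undefined
        raise ValueError("se_fe must exceed se_re for the statistic to be defined")
    return (b_fe - b_re) / math.sqrt(var_diff)

# Hypothetical estimates and standard errors for one coefficient
t = hausman_t(b_fe=0.5, se_fe=0.05, b_re=0.4, se_re=0.03)
print(round(t, 3))
```

With these made-up inputs the statistic is about 2.5, which would lead to rejection of the null at the 5% level when compared with the usual large-sample critical value of 1.96, and hence to a preference for the fixed effects estimator.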