CHAPTER 4: GOVERNANCE VARIABLES AND FIRM PERFORMANCE
4.3 Sample Selection and Data Collection
4.3.1 Introduction to Panel Data Analysis
As stated above, panel data are frequently used in the prior corporate finance literature. Panel data estimation is often considered an efficient analytical method for handling econometric data (Asteriou & Hall, 2011), and its advantages have made it popular in social science research, particularly in corporate finance. According to Asteriou and Hall (2011), the main advantage of panel data is that it allows the researcher to include data for N cross-sections (e.g., individuals, households, firms, or countries) observed over T time periods (e.g., years, quarters, or months). Stacking the time series of each cross-sectional member into a single data set increases the number of observations, up to N × T in a complete panel, and opens up a wider variety of estimation methods.
Another important reason for the growing interest in panel data is that it can substantially reduce the impact of omitted variables on statistical inference, because it allows the researcher to control for omitted, unobservable, or hard-to-measure firm-specific effects that may directly or indirectly affect the selected variables (Dougherty, 2011). For example, O’Connell (2007, p. 372) stated that “In a study of the determinants of corporate failure, appropriate panel data modelling and estimation permits the researcher to control for unobservable firm-specific effects which can have a major impact on the probability that an individual firm will fail but are nevertheless difficult to measure”. This implies that using panel data makes it possible to control for unobservable individual-specific effects, which in turn enhances the reliability of the estimator and increases the robustness and validity of the findings.
A further attraction of panel data sets is that they provide more informative data, more variability, less collinearity among the selected variables, more degrees of freedom, and more efficiency (Baltagi & Chang, 1994). In contrast to time-series analysis, which is often plagued by multicollinearity, panel data estimation allows researchers to test hypotheses using a large number of observations on a range of cross-sections (e.g., firms, individuals, or countries), giving more informative data that can produce more reliable parameter estimates (O’Connell, 2007). It is worth noting that analysing a small sample of observations through time-series analysis can run into several problems, such as difficulty in obtaining reliable t-ratios or F-statistics from regressions. Panel data sets can mitigate this problem by pooling the data into a ‘panel’ of time series from different cross-sectional units. Pooling also makes it possible to include dummy variables that capture the systematic differences among panel units, an approach known as the fixed-effects model (Asteriou & Hall, 2011).
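The fixed-effects idea described above can be illustrated with a minimal sketch. It relies on the standard result that, for the slope coefficients, including one dummy variable per unit gives the same estimates as subtracting each unit's own mean from every variable (the within transformation). The firm labels, the data values, the single-regressor setup, and the function names below are all hypothetical and chosen only for illustration; they are not drawn from the sample used in this study.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on a constant and x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx


def demean_by_unit(units, values):
    """Subtract each unit's own mean from its observations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for unit, value in zip(units, values):
        totals[unit] += value
        counts[unit] += 1
    return [value - totals[unit] / counts[unit]
            for unit, value in zip(units, values)]


def within_slope(units, x, y):
    """Fixed-effects (within) slope: OLS on unit-demeaned data."""
    return ols_slope(demean_by_unit(units, x), demean_by_unit(units, y))


# Hypothetical panel: two firms, three periods each.
# Both firms share a true slope of 2 but have different intercepts
# (10 for firm A, 0 for firm B), i.e. an unobserved firm-specific effect.
firms = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 14.0, 16.0, 8.0, 10.0, 12.0]

pooled = ols_slope(x, y)                    # ignores the firm effect
fixed_effects = within_slope(firms, x, y)   # controls for the firm effect

print(f"pooled OLS slope:    {pooled:.3f}")
print(f"fixed-effects slope: {fixed_effects:.3f}")
```

In this constructed example the pooled regression produces a negative slope because the firm-specific intercepts are correlated with x, whereas the within estimator recovers the common slope of 2. This is the omitted-variable problem that the fixed-effects model is intended to address.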
In terms of classification, a panel is described as ‘balanced’ if it contains the same number of time observations for every cross-sectional unit and every variable. If some observations are missing, or if the number of time observations differs across variables or units, the panel is described as ‘unbalanced’ (Dougherty, 2011; Asteriou & Hall, 2011).
Despite the advantages discussed above, panel data models can still be affected by a number of problems. One of the most serious, as identified in prior empirical investigations, is multicollinearity.
In practice, multicollinearity refers to a high degree of correlation among the model’s explanatory variables. According to Wooldridge (2010), there are two main types of multicollinearity: perfect and near multicollinearity. Perfect multicollinearity occurs when there is an exact linear relationship between explanatory variables, whereas near multicollinearity occurs when the relationship between explanatory variables is non-negligible but not exact. The latter type is the one likely to occur in practice. Multicollinearity can sometimes be serious and may reduce the validity and reliability of the research results (Hsiao, 2007). A high degree of multicollinearity can lead to large standard errors and thus to imprecise coefficient estimates.
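As an illustration of the perfect case, suppose that one explanatory variable is an exact multiple of another. The two-regressor model and the factor of 2 are assumptions chosen purely for exposition:

$$x_{2} = 2x_{1}, \qquad y = \beta_0 + \beta_1 x_{1} + \beta_2 x_{2} + u = \beta_0 + (\beta_1 + 2\beta_2)\,x_{1} + u$$

Only the combination $\beta_1 + 2\beta_2$ can be estimated here; $\beta_1$ and $\beta_2$ cannot be identified separately, so the regression cannot be computed unless one of the variables is dropped. Under near multicollinearity the individual coefficients remain estimable, but with less precision.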
Consequently, the prior empirical literature suggests several methods for assessing the degree of multicollinearity among the model’s explanatory variables. One of the most widely used is the variance inflation factor (VIF). The VIF measures the degree to which each independent variable is explained by the other independent variables, and so helps to identify collinear variables. In other words, when explanatory variables are highly collinear, the effect of a change in one variable is difficult to separate from the effects of the others, which makes its estimated coefficient unstable. According to Hair et al. (2010), a VIF above 10 indicates that the model suffers from a serious multicollinearity problem. Other authors use simpler rules to detect multicollinearity in empirical work. For example, Gujarati and Porter (2009) suggest that if the correlation between the independent variables is less than 0.80, there is no need to be concerned about serious multicollinearity. The VIF is defined as:
$$\mathrm{VIF}(\beta_j) = \frac{1}{1 - R_j^2}$$
where $R_j^2$ is the coefficient of determination from a regression of the explanatory variable $X_j$ on a constant and the remaining explanatory variables. The VIF is the ratio of the actual variance of the estimated coefficient $\beta_j$ to what that variance would have been in the absence of multicollinearity, that is, with $R_j^2$ equal to zero. Hence, the higher the VIF, the higher the degree of multicollinearity.
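The VIF calculation can be sketched directly from this definition: regress each explanatory variable on a constant and the other explanatory variables, take the resulting R², and apply 1 / (1 − R²). The sketch below is dependency-free and is only one possible way to carry out the computation; in applied work a statistical package would normally be used. The variable names and data values are hypothetical and are not taken from the sample used in this study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def r_squared(y, regressors):
    """R^2 from an OLS regression of y on a constant and the regressors."""
    rows = [[1.0] + list(values) for values in zip(*regressors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(columns):
    """Map each variable name to VIF_j = 1 / (1 - R_j^2)."""
    vifs = {}
    for name, values in columns.items():
        others = [v for other, v in columns.items() if other != name]
        if not others:
            vifs[name] = 1.0  # a single regressor cannot be collinear
            continue
        vifs[name] = 1.0 / (1.0 - r_squared(values, others))
    return vifs


# Hypothetical explanatory variables whose pairwise correlation is 0.8.
data = {
    "board_size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "independent_directors": [2.0, 1.0, 4.0, 3.0, 5.0],
}
vifs = variance_inflation_factors(data)
for name, value in vifs.items():
    print(f"{name}: VIF = {value:.2f}")  # both are about 2.78
```

With only two explanatory variables, $R_j^2$ equals the squared pairwise correlation, so a correlation of 0.80 corresponds to a VIF of about 2.78, while the threshold of 10 corresponds to $R_j^2 = 0.90$. The two rules of thumb cited above are therefore not equivalent in the two-variable case, and the VIF threshold is the more lenient of the two.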
Although multicollinearity can lead to unreliable and unstable estimates of regression coefficients, there are several situations in which it can safely be ignored (Allison, 2012). Further, Brooks (2014) states that multicollinearity can be ignored if the model is otherwise adequate and robust, since the presence of multicollinearity does not affect the best linear unbiased estimator (BLUE) properties of the regression. Additionally, Brooks (2014) argues that, to control for multicollinearity, one can simply remove highly correlated variables from the model.