4. Research design
4.2. Multiple regression techniques
In general, there are different techniques for conducting multiple regression analysis. Very
common in business and management studies is ordinary least squares (OLS) regression.
OLS is a linear modelling technique in which the relationship between the dependent variable Y and the independent variable(s) X is represented by a line of best fit, where X predicts Y at least to some extent (Moutinho & Hutcheson, 2011). The line of best fit provides “the smallest squared difference on average between the predicted values of Y and the actual values of Y” (Scherbaum & Schockley, 2015). Comparing the actual and predicted values of the dependent variable makes it possible to assess the model fit. Moreover, the deviations between these values, the residuals, indicate how well the model predicts each observation (Moutinho & Hutcheson, 2011). However, two major limitations of this technique are that it is sensitive to outliers and that it does not resolve the issue of reverse causation.
Studies examining the determinants of corporate social disclosure with OLS regression include Patten (1991), Cormier et al. (2005), Elfeky (2017), Hackston & Milne (1996), Gamerschlag et al. (2011), Sukcharoensin (2012), Elijido-Ten (2004), Naser et al. (2006) and Roberts (1992), among others. In order to explore the determinants of corporate environmental disclosure in Germany, Cormier et al. (2005) conduct country-specific OLS regressions for two separate periods, where the dependent variable is a standardized environmental disclosure score and the independent variables are proxies for economic pressures and public pressures, together with control variables. In another study focused on CSR disclosure in German companies, Gamerschlag et al. (2011) apply OLS regressions to examine the determinants of three CSR disclosure levels: total CSR disclosure, environmental disclosure and social disclosure. Naser et al. (2006) and Sukcharoensin (2012) perform OLS regressions in models testing the relationship between a CSR disclosure index and the explanatory variables in a developing-country context.
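The idea of a line of best fit, its residuals and the resulting model fit can be illustrated with a minimal sketch for a single predictor. The data and the variable names (`firm_size`, `disclosure_score`) are hypothetical and are not taken from any of the cited studies; the sketch only shows the closed-form least squares solution for one independent variable.

```python
# Hypothetical illustration data (not taken from any cited study).
firm_size = [1.0, 2.0, 3.0, 4.0, 5.0]          # X, e.g. a size proxy
disclosure_score = [2.1, 3.9, 6.2, 7.8, 10.1]  # Y, e.g. a disclosure index


def fit_ols(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_ols(firm_size, disclosure_score)

# Predicted values lie on the line of best fit; residuals are the deviations
# of the actual values from these predictions.
predicted = [intercept + slope * xi for xi in firm_size]
residuals = [yi - pi for yi, pi in zip(disclosure_score, predicted)]

# R-squared compares the residual variation with the total variation in Y.
mean_y = sum(disclosure_score) / len(disclosure_score)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - mean_y) ** 2 for yi in disclosure_score)
r_squared = 1.0 - ss_res / ss_tot

print(f"intercept={intercept:.3f}, slope={slope:.3f}, R^2={r_squared:.4f}")
```

Because the residuals enter the criterion in squared form, a single extreme observation can shift the fitted line noticeably, which is the sensitivity to outliers mentioned above.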
Logistic regression is another type of regression applied in studies examining the determinants of CSR disclosure. It has a categorical (non-metric) dependent variable whose values correspond to separate groups, and it aims to predict the probability that an observation belongs to one of these groups (Hair et al., 2010). This regression type is relatively simple but offers relatively low prediction accuracy (Park et al., 2017). Roberts (1992) conducts a logistic regression using measures of stakeholder power, economic performance and strategic posture towards social responsibility in order to estimate the variations in CSR disclosure levels, with values of 0 standing for “poor”, 1 for “good” and 3 for “excellent”. Wuttichindanon (2017) performs a logistic regression to test the relationship between firm-specific factors and the report choice companies make when disclosing social information, which enters the model as a dichotomous dependent variable.
Probit regression, like logistic regression, examines categorical dependent variables. Although the two models are quite alike, there are some important differences. For example, probit regression assumes that the error terms follow a standard normal distribution, while logistic regression assumes a logistic distribution (Hoffmann, 2016; Liu, 2015). In addition, odds-ratio interpretations are only possible for logistic regression (Smithson & Merkle, 2013). According to Vogt (2011), some drawbacks of logit and probit regression models are the unobserved heterogeneity in the categories of predictor variables and the residual variation. In addition, there is as yet no suitable equivalent of the R-squared used in OLS. Chih et al. (2009) use probit regression to examine whether companies’ engagement in CSR is influenced by financial and institutional factors, including measures of financial performance, competition, legal environment and economic environment, among others. The authors classify the studied companies into two groups depending on whether they are listed in the Dow Jones Sustainability World Index (CSR Group) or not (non-CSR Group). Accordingly, the dependent variable in their model takes the value of 1 if the firm belongs to the CSR Group and 0 if the firm belongs to the non-CSR Group. Gamerschlag et al. (2011) apply probit regression to test the probability that a company provides a separate CSR report (value of 1) or not (value of 0), contingent upon measures of company visibility, profitability, size, shareholder structure and relationship to US stakeholders.
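The difference between the two models, and the reason why odds ratios are tied to the logistic model, can be illustrated with a short sketch. The coefficient `beta` and the baseline values of the linear predictor are hypothetical numbers chosen for illustration; no model is estimated here.

```python
import math


def logistic_cdf(z):
    """Probability implied by a logit model for linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Probability implied by a probit model for linear predictor z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def odds(p):
    return p / (1.0 - p)


def odds_ratio(cdf, baseline, beta):
    """Ratio of the odds after and before a one-unit increase in a predictor
    whose (hypothetical) coefficient is beta."""
    return odds(cdf(baseline + beta)) / odds(cdf(baseline))


beta = 0.8  # hypothetical coefficient

# Logit: the odds ratio equals exp(beta) whatever the baseline is.
logit_or_low = odds_ratio(logistic_cdf, -0.5, beta)
logit_or_high = odds_ratio(logistic_cdf, 1.0, beta)

# Probit: the same calculation depends on the baseline, so a single
# odds ratio per coefficient is not available.
probit_or_low = odds_ratio(normal_cdf, -0.5, beta)
probit_or_high = odds_ratio(normal_cdf, 1.0, beta)

print(logit_or_low, logit_or_high, math.exp(beta))
print(probit_or_low, probit_or_high)
```

Both link functions map a linear predictor of zero to a probability of 0.5; they differ mainly in the tails of the distribution.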
Other regression methods applied in research on social disclosure determinants include quantile regression (e.g. Ortas et al., 2015), pooled regression analysis (e.g. Cormier et al., 2005), panel data regression (Mahoney, 2012; Inchausti, 1997), random effect panel regression (Welbeck et al., 2017) and stepwise regression (Inchausti, 1997). Quantile regression is a method that models conditional quantiles as functions of predictor variables without making distributional assumptions (Hao & Naiman, 2007; Olsen et al., 2012). It is considered a more flexible alternative to the OLS model when the assumptions of the latter are not met (Le Cook & Manning, 2013). As stated by Koenker & Bassett (1978), who first introduced the model, the regression median is more efficient than the least squares estimator for distributions in which the median is a more efficient estimator than the mean. However, quantile regression requires sufficient data and is computationally more intensive (Rodriguez & Yao, 2017).
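How a quantile is obtained without distributional assumptions can be sketched with the asymmetric "check" loss that quantile regression minimises. The sketch below fits only a constant (no predictors) to hypothetical data containing one outlier; restricting the search to the observed values is a simplification that works for this case, because a minimiser of the check loss is always attained at a data point.

```python
def check_loss(residual, tau):
    """Asymmetric absolute loss: positive residuals are weighted by tau,
    negative residuals by (1 - tau)."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(values, tau):
    """Observed value that minimises the total check loss for quantile tau."""
    return min(values, key=lambda c: sum(check_loss(v - c, tau) for v in values))


# Hypothetical disclosure scores with one extreme observation.
scores = [1.0, 2.0, 3.0, 4.0, 100.0]

median_fit = best_constant(scores, 0.5)           # tau = 0.5 gives the median
lower_quartile_fit = best_constant(scores, 0.25)  # a lower conditional quantile
mean_fit = sum(scores) / len(scores)              # least squares fit of a constant

print(median_fit, lower_quartile_fit, mean_fit)
```

In this example the median fit stays in the middle of the bulk of the data, whereas the mean, which is the least squares solution, is pulled towards the outlier.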
Pooled regression, as applied by Cormier et al. (2005), combines time-series and cross-sectional data for the modeling. While time-series data refers to observations made at multiple points in time, cross-sectional data is collected at one point in time and is made up of observations from individuals, groups, companies or other units (Pal & Prakash, 2017). Panel data regression is very similar to the pooled time-series cross-sectional regression, since it also observes units over time. However, panel data consists of observations on the same units, while pooled regression may use different samples from the same group or population. As already noted, Welbeck et al. (2017) use a random effect panel regression because, in contrast to a fixed effects model, it allows the inclusion of time-invariant variables as explanatory variables. In addition, this approach assumes that the unobserved unit effects, or the variation across units, are random and uncorrelated with the predictor variables included in the model (Torres-Reyna, 2007). The application of a random effect panel regression allows for making inferences beyond the particular sample used in the model.
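Why a fixed effects model cannot accommodate time-invariant explanatory variables can be seen from the within transformation that such models commonly rely on, in which each firm's own time average is subtracted from its observations. The following sketch uses a hypothetical two-firm panel; the variable names (`listed_abroad`, `profitability`) are illustrative assumptions and do not come from Welbeck et al. (2017).

```python
def within_transform(values, firm_ids):
    """Subtract each firm's own time average from its observations."""
    totals, counts = {}, {}
    for value, firm in zip(values, firm_ids):
        totals[firm] = totals.get(firm, 0.0) + value
        counts[firm] = counts.get(firm, 0) + 1
    return [value - totals[firm] / counts[firm]
            for value, firm in zip(values, firm_ids)]


# Hypothetical panel: two firms observed over three years each.
firm_ids = ["A", "A", "A", "B", "B", "B"]
listed_abroad = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]  # time-invariant within a firm
profitability = [4.0, 5.0, 6.0, 1.0, 2.0, 3.0]  # varies over time

demeaned_listing = within_transform(listed_abroad, firm_ids)
demeaned_profit = within_transform(profitability, firm_ids)

# The time-invariant variable is wiped out (all zeros), so its coefficient
# cannot be estimated after the transformation; the time-varying variable
# keeps its within-firm variation.
print(demeaned_listing)
print(demeaned_profit)
```

A random effect specification does not demean the data in this way, which is why it retains time-invariant regressors, at the cost of the assumption that the unit effects are uncorrelated with the predictors.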
Another method applied in the CSR disclosure literature is stepwise regression. Inchausti (1997) uses a stepwise regression with a forward selection procedure to explore which of the independent variables best explain the dependent variable. For this purpose, the researcher takes the independent variable with the highest partial correlation coefficient and tests whether it should be entered into the model based on a pre-specified criterion, such as an F-test or a t-test. The same procedure is then conducted with the variable with the second highest partial correlation, and so forth. The procedure stops as soon as the t-test or F-test shows that the last entered variable is insignificant (Wang & Lain, 2003). Stepwise regression is a suitable approach when there is a large number of potential independent variables. However, one disadvantage is that it presupposes a single “best” subset of independent variables (Krishnaswamy et al., 2006).
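The forward selection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used by Inchausti (1997): the data, the variable names and the entry threshold `f_to_enter=4.0` are hypothetical, and the partial correlations are obtained by removing the influence of the already entered variables from both the dependent variable and the remaining candidates.

```python
def center(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(v, z):
    """Remove from v the part that is linearly explained by z."""
    b = dot(v, z) / dot(z, z)
    return [a - b * c for a, c in zip(v, z)]


def forward_selection(y, candidates, f_to_enter=4.0):
    """Enter variables one at a time by highest squared partial correlation,
    stopping when the partial F statistic falls below f_to_enter."""
    n = len(y)
    y_res = center(y)
    remaining = {name: center(col) for name, col in candidates.items()}
    selected = []
    while remaining:
        df = n - len(selected) - 2  # residual degrees of freedom after entry
        if df <= 0:
            break
        ss_y = dot(y_res, y_res)
        r2 = {}
        for name, col in remaining.items():
            ss_x = dot(col, col)
            r2[name] = dot(y_res, col) ** 2 / (ss_x * ss_y) if ss_x > 1e-12 else 0.0
        best = max(r2, key=r2.get)
        f_stat = r2[best] * df / (1.0 - r2[best])
        if f_stat < f_to_enter:
            break
        z = remaining.pop(best)
        selected.append(best)
        # Partial out the entered variable from y and the other candidates.
        y_res = residualize(y_res, z)
        remaining = {name: residualize(col, z) for name, col in remaining.items()}
    return selected


# Hypothetical data: disclosure depends on size and visibility, not leverage.
candidates = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "leverage": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "visibility": [1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0],
}
disclosure = [3.8, 5.3, 4.6, 6.1, 8.7, 10.9, 15.2, 17.4]

selected = forward_selection(disclosure, candidates)
print(selected)
```

With these hypothetical data, size is entered first, visibility second, and the procedure stops when leverage fails the entry test. A different threshold or test could lead to a different subset, which relates to the criticism that the method presupposes a single “best” subset.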