3. The research methodology
3.3. The link between management practices and financial performance
3.3.3. Assessing the size of effect of management practices on financial performance
In a second step, a regression model was set up in order to assess the magnitude of the impact of these practices on financial performance, following the classical equation structure:
$$Y_i = \beta_0 + \sum_{k=1}^{K} \beta_k X_{ik} + e_i$$
with $Y_i$ being the dependent variable, $\beta_0$ the intercept with the Y-axis, $X_{ik}$ the independent variables, $\beta_k$ the parameters and $e_i$ the error terms.
The purpose of a regression analysis is to estimate relationships between variables. It is often applied to predict or forecast outcomes and/or effects of events. The econometric model used to link management practices and financial performance can be specified with the following equation:
$$FPI_i = \beta_0 + \sum_{k=1}^{K} \beta_k X_{ik} + e_i$$
where $FPI_i$ is the financial performance indicator for farm $i$ and $FPI_i \in \{RoS_i, ATO_i, RoA_i, RoE_i\}$; $X_{ik}$ are the explanatory variables such as formal business planning, cash flow planning, benchmarking, education, accessing advice, size, tenure, age and having a working spouse⁹; $\beta_k$ are the coefficients to be estimated; and $K$ is the number of coefficients.
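To make the indexing in this equation concrete, the following minimal Python sketch builds the row $[1, X_{i1}, \dots, X_{iK}]$ for a single farm and evaluates the systematic part of the model. The variable names, their coding (dummy versus numeric) and the coefficient values are hypothetical illustrations and are not taken from the study's data.

```python
# Hypothetical names and coding for the K = 9 explanatory variables.
VARIABLES = [
    "business_plan", "cash_flow_plan", "benchmarking", "education_years",
    "uses_advice", "size_ha", "owner_occupied", "age", "working_spouse",
]


def design_row(farm):
    """Return [1, X_i1, ..., X_iK]; the leading 1 multiplies the intercept."""
    return [1.0] + [float(farm[name]) for name in VARIABLES]


def linear_predictor(beta, row):
    """beta_0 + sum_k beta_k * X_ik, i.e. FPI_i without the error term."""
    return sum(b * x for b, x in zip(beta, row))


# One made-up farm record (illustrative values only).
farm = {
    "business_plan": 1, "cash_flow_plan": 1, "benchmarking": 0,
    "education_years": 12, "uses_advice": 1, "size_ha": 250,
    "owner_occupied": 1, "age": 48, "working_spouse": 0,
}

# Made-up coefficients: intercept, two non-zero effects, the rest zero.
beta = [0.05, 0.02, 0.01] + [0.0] * 7

row = design_row(farm)
fitted_fpi = linear_predictor(beta, row)
print(len(row), fitted_fpi)
```

With these illustrative numbers only the intercept and the two planning dummies contribute to the fitted value; in the estimated model every $\beta_k$ is obtained from the data.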
There are several econometric models that can be used for this study (Carter-Hill et al., 2011). When the dependent variable is limited (binary, categorical or ordered), discrete choice models need to be used, such as:
• The probit and logit models, which are used when the dependent variable takes only two values (0 or 1). This would be the case if, for example, the farms were classified into successful/unsuccessful categories and the regression were run on this categorisation (e.g. top 50% and bottom 50% of the sample);
• The multivariate probit model, which estimates the joint relationship between several dependent variables and independent variables; and
• The ordered probit/logit model, which is used when the dependent variable has more than 2 ordered options, which would be the case if the ratios were classified into categories (for example 20% percentile bands, giving 5 categories).
When the dependent variable is continuous, there are, among others, the following options:
• Simple linear regression, when there is only one independent variable;
• Multiple linear regression, when there are at least 2 independent variables; and
• Tobit (Heckit) model, when the dependent variable is truncated or censored.
The section below describes the model used in this research, namely a multiple linear regression model. The discrete models were tested and rejected, as too much detail was lost when financial performance was converted into successful and unsuccessful farms, resulting in models with few significant variables. For RoS, RoA and RoE, a multiple linear regression model was estimated, as the ratios are continuous and not truncated, and there are several independent variables. For ATO, the lower limit of the data is 0, as sales are generally not negative. These data are, however, not truncated or censored, so a multiple linear regression model is also appropriate.
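The loss of detail caused by converting a continuous ratio into categories can be illustrated with a short Python sketch. The RoS values below are hypothetical, and the two helper functions are only one possible way to form the top 50%/bottom 50% split and the 20% percentile bands mentioned above.

```python
import statistics


def median_split(values):
    """1 for observations above the sample median (top 50%), else 0."""
    med = statistics.median(values)
    return [1 if v > med else 0 for v in values]


def percentile_band(values, bands=5):
    """Assign each value to one of `bands` equal-sized rank groups (1 = lowest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order):
        out[i] = rank * bands // len(values) + 1
    return out


# Hypothetical return-on-sales ratios for ten farms.
ros = [-0.10, 0.02, 0.05, 0.08, 0.12, 0.15, 0.20, 0.25, 0.31, 0.40]

binary = median_split(ros)       # dependent variable for a probit/logit model
quintile = percentile_band(ros)  # dependent variable for an ordered model

print(binary)
print(quintile)
```

In this toy sample a farm with a RoS of 0.15 and a farm with a RoS of 0.40 both receive the value 1 under the median split, which is the kind of information a continuous dependent variable retains.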
3.3.4. The multiple linear regression model
There are six assumptions underpinning the multiple linear regression model (Carter-Hill et al., 2011):
1. There is a linear relationship between the dependent variable and independent variables¹⁰:
$$y_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + e_i$$
2. The expected value of the error term is 0 (exogeneity):
$$E(e_i) = 0 \Leftrightarrow E(y_i) = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik}$$
3. The variance of the error term is constant (homogeneity of variance or homoscedasticity):
$$\operatorname{var}(e) = \sigma^2 I_n = \operatorname{var}(y)$$
with $I_n$ an n x n identity matrix.
4. The covariance between errors is zero, i.e. the errors of one observation are not correlated with the errors of other observations (independence, or no correlation):
$$\operatorname{cov}(e_i, e_j) = \operatorname{cov}(y_i, y_j) = 0 \quad \text{for } i \neq j$$
5. The variables $x_{ik}$ are not random, and are not exact linear functions of the other explanatory or independent variables (no linear dependence or collinearity).
6. The values of $e$ are normally distributed about their mean: $e \sim N(0, \sigma^2)$
¹⁰ If a non-linear relationship exists, transformation of the dependent and/or independent variables is sometimes feasible in order to achieve a linear relationship. These include a log-log model, a log-linear model and a linear-log model, in which the dependent and/or independent variables are transformed.
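As an illustration of the transformation mentioned in the footnote, the sketch below takes a hypothetical power relationship $y = 2x^{1.5}$, which is non-linear in levels, and shows that it becomes linear after taking logs of both variables (a log-log model). The data are generated without an error term purely for clarity, and `simple_ols` is an illustrative helper rather than the estimation routine used in the study.

```python
import math


def simple_ols(x, y):
    """Closed-form OLS for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical non-linear relationship y = 2 * x^1.5 (no error term).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * xi ** 1.5 for xi in x]

# Log-log model: ln(y) = ln(2) + 1.5 * ln(x), which is linear in the logs.
log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]
b0, b1 = simple_ols(log_x, log_y)

print(b0, b1)  # intercept is approximately ln(2), slope approximately 1.5
```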
If the assumptions stated above hold, the parameters are estimated according to the ordinary least squares (OLS) principle, which minimises the sum of squared residuals and results in the best linear unbiased estimators (BLUE) of the parameters $\beta$. That is, if $SSE = \sum_{i=1}^{n} \hat{e}_i^2$ is the sum of squared residuals of the OLS estimates and $SSE^* = \sum_{i=1}^{n} \hat{e}_i^{*2}$ that of any other set of estimates, then $SSE < SSE^*$. The OLS estimator is calculated with the following formula:
$$\hat{\beta} = (X'X)^{-1} X'Y$$
with $SSE$ being the sum of squares due to the error, and $\hat{\beta}$ the best linear unbiased estimator according to the Gauss-Markov theorem (Carter-Hill et al., 2011).
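The OLS formula can be sketched in a few lines of plain Python by solving the normal equations $(X'X)\hat{\beta} = X'Y$. The data below are made up (an intercept column, one dummy and one continuous regressor), and the helper names are illustrative; the study itself would have used a statistical package rather than hand-written linear algebra.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: check for exact collinearity")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(X, y):
    """Return beta_hat solving (X'X) beta = X'y; rows of X are observations."""
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    return solve_linear_system(xtx, xty)


def sse(X, y, beta):
    """Sum of squared residuals for a given coefficient vector."""
    return sum(
        (yi - sum(b * x for b, x in zip(beta, row))) ** 2
        for row, yi in zip(X, y)
    )


# Made-up data: columns are intercept, a 0/1 dummy and a size variable.
X = [[1.0, 0.0, 1.0], [1.0, 1.0, 2.0], [1.0, 0.0, 3.0],
     [1.0, 1.0, 4.0], [1.0, 0.0, 5.0], [1.0, 1.0, 6.0]]
noise = [0.10, -0.20, 0.05, 0.10, -0.10, 0.05]
y = [2.0 + 3.0 * r[1] + 0.5 * r[2] + e for r, e in zip(X, noise)]

beta_hat = ols(X, y)
other = [b + 0.1 for b in beta_hat]  # any other set of estimates

print(beta_hat)
print(sse(X, y, beta_hat), sse(X, y, other))
```

The last line shows the least squares property directly: the sum of squared residuals at $\hat{\beta}$ is smaller than at the perturbed coefficients.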
When the variance of the errors is not constant (there is heteroskedasticity) or when the errors are correlated, resulting in a violation of the assumptions mentioned above, parameters can be estimated through the Generalised Least Squares (GLS) method (Cameron & Trivedi, 2010). The covariance matrix of the errors then takes the general form:
$$V(e \mid X) = \sigma^2 \Omega$$
with $\Omega$ being a symmetric, positive definite n x n matrix. In this case, the Gauss-Markov theorem no longer holds for OLS, and the OLS estimates of $\beta$ are inefficient, even though they remain unbiased.
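The GLS method will overcome this issue, and will result in efficient estimates, with smaller standard errors and larger t-statistics:
$$\hat{\beta}_{GLS} = (X^{*\prime} X^{*})^{-1} X^{*\prime} Y^{*} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}Y$$
where $X^{*}$ and $Y^{*}$ are the data transformed so that the errors of the transformed model satisfy the OLS assumptions. When $\Omega$ is equal to $I_n$, $\hat{\beta}_{GLS}$ will be equal to the OLS estimator $\hat{\beta}$.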
Special cases of GLS are weighted least squares (WLS) and feasible generalised least squares (FGLS). WLS is used when there is heteroskedasticity in the model but no correlation between the errors, while FGLS is used when there is both heteroskedasticity and serial correlation (Greene, 2003).
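The heteroskedasticity-only case can be sketched as follows for a model with an intercept and a single regressor, where $\Omega$ is diagonal and assumed known. The data and the assumed variance pattern are hypothetical, and the function name is illustrative. The sketch does not cover FGLS, in which $\Omega$ would first have to be estimated.

```python
def gls_diagonal(x, y, omega):
    """GLS for y = b0 + b1 * x when Omega is diagonal with entries omega[i].

    Solves the 2 x 2 system (X' Omega^-1 X) beta = X' Omega^-1 y in closed form.
    """
    w = [1.0 / o for o in omega]  # diagonal of Omega^-1 (the WLS weights)
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wy = sum(wi * yi for wi, yi in zip(w, y))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s_w * s_wxx - s_wx ** 2
    b0 = (s_wxx * s_wy - s_wx * s_wxy) / det
    b1 = (s_w * s_wxy - s_wx * s_wy) / det
    return b0, b1


# Made-up data for five observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Omega = I_n: GLS reproduces the OLS estimates.
ols_fit = gls_diagonal(x, y, [1.0] * len(x))

# Assumption for illustration: var(e_i) is proportional to x_i squared.
wls_fit = gls_diagonal(x, y, [xi ** 2 for xi in x])

print(ols_fit)
print(wls_fit)
```

Multiplying every entry of $\Omega$ by the same constant leaves the estimates unchanged, since the scale is absorbed by $\sigma^2$.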
Goodness of fit of the model is tested with R-squared ($R^2$), which indicates how much of the variation in the sample is explained by the regression model:
$$R^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} = 1 - \frac{SSE}{SST}$$
with $SST$ the total sum of squares for $y$ and $SSE$ the sum of squares due to the residuals. When none of the variance in the dataset is explained by the model, the R-squared is 0, while an R-squared of 1 means that the model is a perfect fit (all variance in the sample is explained by the model). The rule, therefore, is the higher the R-squared, the better.
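A small numerical check of the two expressions for R-squared is sketched below. The five observations are made up, and the fitted values come from the OLS line $\hat{y} = 0.05 + 1.99x$ for these data. The two forms coincide here because the model is estimated by OLS and includes an intercept.

```python
def r_squared(y, y_hat):
    """Return (explained / total, 1 - SSE / SST) for observed and fitted values."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    return ssr / sst, 1.0 - sse / sst


# Made-up observations and their OLS fitted values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [0.05 + 1.99 * xi for xi in x]

r2_explained, r2_residual = r_squared(y, y_hat)
print(r2_explained, r2_residual)
```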
Goodness of fit can also be assessed through Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC), which are useful when the R-squared is not available, and given that there are issues with the interpretation of Pseudo R-squared calculations (Williams, 2015). They allow for a comparison between different model estimates, rather than measuring the absolute deviation of the observed data from a model.
They are calculated as follows:
$$AIC = DEV_M + 2P$$
and
$$BIC = DEV_M + \ln(N) \cdot P$$
with $P$ being the number of estimated parameters, $N$ the number of observations, and $DEV_M$ equal to $-2 \times$ the log-likelihood of the model ($LL_M$). For both criteria, the rule is the smaller the result, the better the fit.
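The two criteria can be computed directly from a model's log-likelihood, as in the sketch below. The log-likelihoods, parameter counts and sample size are hypothetical values chosen for illustration and do not correspond to any model estimated in this study.

```python
import math


def deviance(log_likelihood):
    """DEV_M = -2 * LL_M."""
    return -2.0 * log_likelihood


def aic(log_likelihood, n_params):
    """AIC = DEV_M + 2P."""
    return deviance(log_likelihood) + 2 * n_params


def bic(log_likelihood, n_params, n_obs):
    """BIC = DEV_M + ln(N) * P."""
    return deviance(log_likelihood) + math.log(n_obs) * n_params


# Two hypothetical models fitted to the same N = 100 observations.
n_obs = 100
model_a = {"ll": -120.5, "p": 4}
model_b = {"ll": -118.0, "p": 7}

aic_a, aic_b = aic(model_a["ll"], model_a["p"]), aic(model_b["ll"], model_b["p"])
bic_a = bic(model_a["ll"], model_a["p"], n_obs)
bic_b = bic(model_b["ll"], model_b["p"], n_obs)

print(aic_a, aic_b)
print(bic_a, bic_b)
```

In this example model B has the higher log-likelihood, but model A has the smaller AIC and BIC once the three additional parameters are penalised, so model A would be preferred on both criteria.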