
CHAPTER 3: RESEARCH METHODOLOGY

3.4. Quantitative Techniques of Firm Failure

3.4.3 Cross-Section Logistic Regression Models

As discussed in Sections 2.4.1 and 3.4.1, logistic regression, in its binary form, has been a popular analytical tool in firm failure studies. Although one of the main analytical tools of this study is panel data ordered regression on firm-year observations, the key properties of the simple logistic regression and of its ordered extension are presented here because this methodology can serve as a baseline benchmark for firm-level results.


The logistic regression model with the traditional binary dependent variable specification has been used extensively in applications where the dependent variable takes the value 0 or 1. As discussed in Section 3.3, this binary classification does not mean that the numbers themselves have a particular interpretation; it is the attributes attached to them that matter. Logistic regression (logit) models have been particularly popular in the literature. One can broadly argue that logit effectively replaced multiple discriminant analysis as the benchmark technique in the failure prediction literature, while it has also been widely used in non-prediction studies.

The logistic regression model (logit) produces estimated probabilities that lie between 0 and 1, which addresses a traditional weakness of the linear regression model estimated by ordinary least squares, whose fitted values can fall outside this range. In the firm failure literature, 0 is usually used to denote non-failed firms and 1 to denote failed firms. Some of the key advantages of the logistic regression model are that it is relatively easy to implement, that it makes no assumptions about multivariate normality or about the equality of the variance-covariance matrices between the groups analysed, and that its statistical tests are straightforward to carry out (Hair et al., 2006). The logistic function effectively transforms the traditional regression model so that its outcomes are bounded within the (0,1) interval (Brooks, 2008).

The form of the logistic model would be

$$P_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + u_i)}}$$

where $P_i$ is the probability and $e$ is the exponential constant. Using the firm failure studies as an example, $P_i$ will be the probability that a firm fails, the $x$'s in the parentheses of the denominator represent potential determinants of firm failure, and $u_i$ is the error term.
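The bounding property of the logistic function can be illustrated with a minimal sketch. The coefficient values and the single determinant `x2` below are hypothetical, chosen only for illustration, and are not estimates from this study; the error term is omitted. The sketch shows that the linear index can take any real value, while the transformed probability stays inside the unit interval.

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + e^(-z)), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients, for illustration only (not estimates from this study).
BETA_1 = -2.0  # intercept
BETA_2 = 1.5   # coefficient on a single hypothetical determinant x_2


def linear_index(x2):
    """Linear part beta_1 + beta_2 * x_2; it can take any real value."""
    return BETA_1 + BETA_2 * x2


def failure_probability(x2):
    """Logit probability for a firm with determinant value x2 (error term omitted)."""
    return logistic(linear_index(x2))


for x2 in (-4.0, 0.0, 2.0, 6.0):
    print(f"x2 = {x2:5.1f}  index = {linear_index(x2):6.2f}  P = {failure_probability(x2):.4f}")
```

The index values printed for the extreme inputs fall well outside the unit interval, whereas the corresponding probabilities do not, which is the property that motivates the logit over a linear specification for a binary outcome.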

The logit model is, therefore, not a linear model, and as a result the maximum likelihood estimator is usually used for its estimation. The principle of the maximum likelihood estimator is that the parameters are chosen to jointly maximise the log-likelihood function (Brooks, 2008), which means that the logistic regression aims to maximise the likelihood that an event occurs (Hair et al., 2006). In order to assess the goodness of fit of a logit model, one can examine the predictive accuracy of the model (Hair et al., 2006). The likelihood value is the best measure of how well the model fits the data, but pseudo $R^2$ measures can also be used. For the likelihood value, a better-fitting model is the one with the lower -2 log likelihood (-2LL); the minimum value of -2LL is 0, which corresponds to a perfect fit (Hair et al., 2006). In order to assess the accuracy of the model, one can use the classification matrix, which measures how well the model allocates observations to specific groups (for example, allocation between the failed and non-failed groups of companies).
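The estimation and fit measures described above can be sketched on synthetic data. Everything in the example is an assumption made for illustration: the data are simulated rather than drawn from this study, the model has one hypothetical determinant plus an intercept, Newton-Raphson is used as one common way of maximising the log-likelihood, McFadden's measure is taken as one example of a pseudo $R^2$, and the classification matrix uses a 0.5 cut-off.

```python
import math
import random


def logistic(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of a logit with intercept b0 and slope b1."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(logistic(b0 + b1 * x), 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logit(xs, ys, iterations=25):
    """Maximise the log-likelihood by Newton-Raphson (intercept and one slope)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # score with respect to the intercept
            g1 += (y - p) * x      # score with respect to the slope
            h00 += w               # elements of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^(-1) times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Synthetic sample: 1 = failed, 0 = non-failed (hypothetical data-generating values).
random.seed(0)
TRUE_B0, TRUE_B1 = -1.0, 1.5
xs = [random.gauss(0.0, 1.0) for _ in range(500)]
ys = [1 if random.random() < logistic(TRUE_B0 + TRUE_B1 * x) else 0 for x in xs]

b0_hat, b1_hat = fit_logit(xs, ys)
ll_model = log_likelihood(b0_hat, b1_hat, xs, ys)

# Intercept-only (null) model: its fitted probability is the sample share of failures.
share_failed = sum(ys) / len(ys)
ll_null = log_likelihood(math.log(share_failed / (1.0 - share_failed)), 0.0, xs, ys)

minus_2ll = -2.0 * ll_model              # lower values indicate a better fit
pseudo_r2 = 1.0 - ll_model / ll_null     # McFadden's pseudo R-squared

# Classification matrix at a 0.5 cut-off: rows are actual groups, columns predicted groups.
matrix = [[0, 0], [0, 0]]
for x, y in zip(xs, ys):
    predicted = 1 if logistic(b0_hat + b1_hat * x) >= 0.5 else 0
    matrix[y][predicted] += 1
hit_rate = (matrix[0][0] + matrix[1][1]) / len(ys)

print(f"estimates: b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}")
print(f"-2LL = {minus_2ll:.2f}, pseudo R2 = {pseudo_r2:.3f}, hit rate = {hit_rate:.3f}")
print("classification matrix [actual][predicted]:", matrix)
```

In applied work a statistical package would normally perform these steps; the hand-written routine is used here only so that each quantity discussed in the text is visible in the code.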

The principles that apply to binary logistic regression models can be generalised to ordered logistic regression models. In the corporate failure literature, Johnsen and Melicher (1994) and Tsai (2013) are among the few studies that have used an ordered response logit model. These studies suggested that, between non-failed firms and liquidated firms, there is an important interim stage of firms in financial distress that is not captured in a binary specification, and that significant information about the determinants of the three stages is therefore lost.

A key advantage of the ordered logistic regression is that it can accommodate more than two outcomes and can therefore enable the researcher to explain cases that fall between the two outcomes of the binary classification. It is important to note that the ordered logistic regression makes no assumption about the spacing between the responses (Harrell, 2015). This means that where the outcome takes the values 0, 1 and 2, there is no assumption that the distance between 0 and 1 is the same as that between 1 and 2. However, whereas in the binary model we observe $y_i = 1$ when $y_i^* > 0$, the ordered response logit model needs to generalise this concept by introducing multiple thresholds for the alternative states of the dependent variable (Baum, 2006). For example, when there are three potential outcomes in the dependent variable, there will be two thresholds. The generalisation of this notion is:

$$\Pr(y_j = i) = \Pr(K_{i-1} < x_j\beta + u_j < K_i)$$

This means that the probability that individual $j$ takes outcome $i$ depends on $x_j\beta + u_j$ falling between the cutpoints $K_{i-1}$ and $K_i$. This represents the generalisation, with regard to the dependent variable, of the binary model, which has a single threshold at zero (Baum, 2006).
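The role of the cutpoints can be made concrete with a short sketch. It assumes that the error term follows the standard logistic distribution, so that the probability of each outcome is the difference between the logistic distribution function evaluated at the two adjacent cutpoints, with the lowest and highest cutpoints taken to be minus and plus infinity. The cutpoint values, the index values and the coding of the outcomes (0, 1, 2) are hypothetical and serve only as an illustration.

```python
import math


def logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def ordered_logit_probs(xb, cutpoints):
    """Probabilities of outcomes 0, 1, ..., len(cutpoints) for a linear index xb.

    Each probability is Pr(K_lower < xb + u < K_upper) with u standard logistic.
    """
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities Pr(xb + u <= K) at each cutpoint, padded with 0 and 1
    # for the implicit cutpoints at minus and plus infinity.
    cumulative = [0.0] + [logistic_cdf(k - xb) for k in cutpoints] + [1.0]
    return [cumulative[i + 1] - cumulative[i] for i in range(len(cumulative) - 1)]


# Three outcomes require two cutpoints (hypothetical values).
CUTPOINTS = [-1.0, 1.5]

for xb in (-2.0, 0.0, 3.0):
    probs = ordered_logit_probs(xb, CUTPOINTS)
    print(f"xb = {xb:4.1f}  " + "  ".join(f"Pr(y={i}) = {p:.3f}" for i, p in enumerate(probs)))

# With a single cutpoint at zero the model collapses to the binary logit.
binary = ordered_logit_probs(0.8, [0.0])
print(f"binary case: Pr(y=1) = {binary[1]:.4f}, logistic(0.8) = {logistic_cdf(0.8):.4f}")
```

Because only the ordering of the cutpoints is constrained, the gaps between them are free to differ, which reflects the absence of any assumption about the spacing between the responses.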
