Discriminant Analysis - Statistical Methods of Analysis in Chapter 5

Chapter 3 – Data, Measurement and Methods

3.5 Statistical Methods of Analysis in Chapter 5

3.5.2 Discriminant Analysis

Identifying differences between groups in data is increasingly recognized as an important analytical task. Different selection parameters are employed to find patterns in the data on the basis of covariates. Inspired by the seminal work of Fisher (1936) on discriminant analysis, researchers began with two-group classification models. Rao (1948) extended Fisher's two-group approach to multiple-group classification. Others refined the idea of Fisher in the 1940s and introduced important extensions of this concept (Huberty and Olejnik, 2006; Kendall, 1957; McLachlan, 2004; Tatsuoka, 1969; Tatsuoka and Tiedeman, 1954; Webb and Copsey, 2011; William and Lohnes, 1962).

Early applications of discriminant analysis were limited to medicine and biology; later methodological developments made it suitable for use in business, education and psychology (Rencher, 2002). Discriminant analysis is a multivariate inferential statistical technique traditionally used to classify observations of unknown group membership into a set of groups decided in advance (Klecka, 1980). It has also been used to test whether the pre-decided groups in the data are statistically significantly different from each other. The technique organizes the data so as to minimize within-group differences and maximize between-group variation.

The three forms of discriminant analysis are linear discriminant analysis, canonical discriminant analysis (CDA) and quadratic discriminant analysis. CDA is the most general approach: it uses different combinations of the covariates to find the minimum variation within groups and the maximum variation between groups (Friedman, 1989). Linear discriminant analysis, by contrast, uses the distance between group centroids in place of within-group and between-group variation. Quadratic discriminant analysis is the most complex technique; it uses quadratic methods to find groups with minimum misclassification (Han et al., 2012).

CDA is used in this chapter because it is the most relevant to the objective of testing whether the five groups of firms, formed on the basis of their responses on components of the entrepreneurial ecosystem in Pakistan, are significantly different from each other. Minimum within-group differentiation and maximum between-group differentiation ensure that components of the entrepreneurial ecosystem with similar effects are grouped in one entrepreneurial ecosystem. It is also important to establish whether the identified entrepreneurial ecosystems are significantly different from each other. CDA therefore ensures that only statistically significantly different entrepreneurial ecosystems in Pakistan are identified.

The five groups identified by the cluster analysis are used to find the entrepreneurial ecosystems existing within Pakistan. These five groups enter the CDA, which constructs a between-group scatter matrix ($S_B$) and a within-group scatter matrix ($S_W$) while reducing the mean difference within the groups. The two matrices are then used to generate eigenvalues as follows:

$$S_W^{-1} S_B \mathbf{w} = j\,\mathbf{w} \qquad (3.17)$$

In equation 3.17, multiplying the inverse of the within-group scatter matrix, $S_W^{-1}$, by the between-group scatter matrix, $S_B$, ensures that firms within one group are similar to each other and dissimilar to the firms in other groups. Here $\mathbf{w}$ is an eigenvector that gives the weights of the covariates in a discriminant function, and $j$ is the corresponding eigenvalue: the larger $j$ is, the greater the between-group variation relative to the within-group variation along $\mathbf{w}$.
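The step from the scatter matrices to equation 3.17 can be sketched as follows. The text does not write out $S_W$ and $S_B$, so the definitions below are the usual textbook ones and are an assumption about how they were computed here. The symbols $\mathbf{x}_{ki}$ (covariate vector of firm $i$ in group $k$), $\mathbf{m}_k$ (mean of group $k$), $\mathbf{m}$ (overall mean) and $n_k$ (size of group $k$) are introduced only for this illustration, and $S_W$ is assumed to be invertible.

```latex
% Standard scatter matrices for five groups (assumed definitions)
S_W = \sum_{k=1}^{5} \sum_{i=1}^{n_k}
      (\mathbf{x}_{ki} - \mathbf{m}_k)(\mathbf{x}_{ki} - \mathbf{m}_k)^{\top},
\qquad
S_B = \sum_{k=1}^{5} n_k\,
      (\mathbf{m}_k - \mathbf{m})(\mathbf{m}_k - \mathbf{m})^{\top}

% Fisher's criterion: between-group relative to within-group variation along w
J(\mathbf{w}) = \frac{\mathbf{w}^{\top} S_B \mathbf{w}}
                     {\mathbf{w}^{\top} S_W \mathbf{w}}

% Setting the gradient of J to zero gives the eigenvalue problem of equation 3.17
\nabla_{\mathbf{w}} J = 0
\;\Longrightarrow\;
S_B \mathbf{w} = J(\mathbf{w})\, S_W \mathbf{w}
\;\Longrightarrow\;
S_W^{-1} S_B \mathbf{w} = j\,\mathbf{w},
\qquad j = J(\mathbf{w})
```

Under these definitions, $S_B$ is built from five group means measured around their common mean, so its rank is at most four. This is consistent with the four discriminant functions reported for the five groups.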

The CDA produces four discriminant functions, each based on a combination of the components of the entrepreneurial ecosystem. These discriminant functions are statistically significantly different from each other. The discriminant score for each discriminant function can be calculated using the following equation:

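$$D_{xi} = \pm d_{1i}\,\mathit{af}_i \pm d_{2i}\,\mathit{reg}_i \pm d_{3i}\,\mathit{infras}_i \pm d_{4i}\,\mathit{corr}_i \pm d_{5i}\,\mathit{pol}_i \pm d_{6i}\,\mathit{inf}_i \pm d_{7i}\,\mathit{wk}_i \pm d_{8i}\,\mathit{tax}_i \pm d_{9i}\,\mathit{elec}_i \qquad (3.18)$$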

where $D_{xi}$ is the discriminant score of each firm and $d_{1i}, d_{2i}, \dots, d_{9i}$ are the discriminant coefficients (also called factor loadings) of the covariates in each discriminant function.

The definitions and measurements of access to finance (af), government regulations (reg), infrastructure (infras), corruption (corr), political instability (pol), practices of the informal sector (inf), the non-availability of an educated workforce (wk), tax rate and administration (tax), and electricity (elec) are explained in Table 3.1.
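As a minimal sketch of equation 3.18, the score for one firm on one discriminant function is the sum of each coefficient multiplied by the firm's value on the corresponding covariate. The covariate abbreviations come from the text. The function name `discriminant_score` and all numeric values are hypothetical and chosen only for illustration; they are not results from the study.

```python
# Covariate abbreviations as used in the text.
COVARIATES = ["af", "reg", "infras", "corr", "pol", "inf", "wk", "tax", "elec"]


def discriminant_score(coefficients, firm):
    """Return the sum of coefficient * observed value over the nine covariates.

    Each coefficient carries its own sign, which plays the role of the
    plus/minus in equation 3.18.
    """
    return sum(coefficients[name] * firm[name] for name in COVARIATES)


# Hypothetical coefficients for one discriminant function (illustration only).
example_coefficients = {
    "af": 0.5, "reg": -0.25, "infras": 0.0, "corr": 0.75, "pol": 0.0,
    "inf": 0.0, "wk": -0.5, "tax": 0.0, "elec": 0.25,
}

# Hypothetical responses of one firm on the nine components (illustration only).
example_firm = {
    "af": 2, "reg": 4, "infras": 1, "corr": 2, "pol": 3,
    "inf": 0, "wk": 1, "tax": 2, "elec": 4,
}

score = discriminant_score(example_coefficients, example_firm)
print(score)
```

Storing the coefficients by covariate name rather than by position keeps each term of the sum tied to the variable it belongs to, which mirrors the term-by-term layout of the equation.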

Finally, the factor loadings are used to determine which components contribute significantly to which discriminant function. According to Comrey and Lee (1992), McLachlan (2004) and Tabachnick and Fidell (2007), a factor loading of more than 0.4 indicates a statistically significant contribution of a factor to its function. This criterion is therefore used to determine which entrepreneurial ecosystem components contribute significantly to a discriminant function. Since these discriminant functions are composed of different combinations of institutional and physical conditions, they can be called entrepreneurial ecosystems.
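The 0.4 criterion can be illustrated with a short sketch. The loadings below are hypothetical, as is the function name `significant_components`. Comparing the absolute value of a loading with the cut-off is an assumption made here so that large negative loadings are also retained; the text states only "more than 0.4".

```python
THRESHOLD = 0.4  # cut-off cited in the text


def significant_components(loadings, threshold=THRESHOLD):
    """Return the components whose absolute loading exceeds the threshold.

    Using the absolute value is an assumption of this sketch.
    """
    return [name for name, value in loadings.items() if abs(value) > threshold]


# Hypothetical loadings for one discriminant function (illustration only).
example_loadings = {
    "af": 0.62, "reg": -0.45, "infras": 0.10, "corr": 0.38, "elec": 0.40,
}

selected = significant_components(example_loadings)
print(selected)
```

With a strict inequality, a loading of exactly 0.40 is excluded, which matches the wording "more than 0.4".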

The factor loadings of the components of the entrepreneurial ecosystems are then used as weights to calculate a discriminant score for each firm. Each discriminant score is the sum of the products of the factor loadings and the observed values. This interactive and interdependent index, based on components of institutional framework conditions and physical conditions, represents the entrepreneurial ecosystem of Pakistan. We then apply a regression technique to estimate the effect of the entrepreneurial ecosystem (based on the index value for each firm) on the performance of SMEs.

In document The effect of entrepreneurial ecosystems on performance of SMEs in low middle income countries with a particular focus on Pakistan (Page 141-144)