Chapter 3 – Data, Measurement and Methods
3.5 Statistical Methods of Analysis in Chapter 5
3.5.2 Discriminant Analysis
Identifying differences between groups in data is increasingly recognized as an important analytical task. Various criteria are used to find patterns that separate the groups on the basis of a set of covariates. Inspired by the seminal work of Fisher (1936) on discriminant analysis, researchers began in this direction with two-group classification models. Rao (1948) extended Fisher's two-group classification approach to multiple-group classification. Others subsequently refined Fisher's idea and introduced important extensions of the concept (Huberty and Olejnik, 2006; Kendall, 1957; McLachlan, 2004; Tatsuoka, 1969; Tatsuoka and Tiedeman, 1954; Webb and Copsey, 2011; William and Lohnes, 1962).
The early applications of discriminant analysis were limited to medicine and biology; later methodological developments, however, made it suitable for use in business, education and psychology (Rencher, 2002). Discriminant analysis is a multivariate inferential statistical technique that has traditionally been used to classify observations of unknown group membership into a set of groups defined in advance (Klecka, 1980). It has also been used to test whether predefined groups in the data are statistically significantly different from each other. The technique seeks combinations of the variables that minimize within-group differences while maximizing between-group variation.
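For reference, the aim of minimizing within-group variation while maximizing between-group variation is usually formalized through Fisher's criterion. The display below is the standard textbook form rather than a formula taken from this study; $\mathbf{w}$ is a vector of weights on the covariates, and $S_B$ and $S_W$ denote the between-group and within-group scatter matrices:

$$J(\mathbf{w}) = \frac{\mathbf{w}^{\top} S_B \, \mathbf{w}}{\mathbf{w}^{\top} S_W \, \mathbf{w}}$$

Maximizing $J(\mathbf{w})$ amounts to solving the generalized eigenvalue problem $S_B \mathbf{w} = \lambda S_W \mathbf{w}$, and the maximized ratio equals the largest eigenvalue $\lambda$.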
The three forms of discriminant analysis are linear discriminant analysis, canonical discriminant analysis (CDA) and quadratic discriminant analysis. CDA is the most general approach: it uses different combinations of the covariates to find the minimum variation within groups and the maximum variation between groups (Friedman, 1989). Linear discriminant analysis, on the other hand, uses the distances between group centroids instead of the within-group and between-group variations. Quadratic discriminant analysis is the most complex technique; it allows each group its own covariance matrix and uses quadratic functions of the covariates to assign observations to groups with minimum misclassification (Han et al., 2012).
CDA is used in this chapter because it is the most relevant to the objective of testing whether the five groups of firms, formed on the basis of their responses on the components of the entrepreneurial ecosystem in Pakistan, are significantly different from each other. Minimizing within-group differentiation and maximizing between-group differentiation ensures that components of the entrepreneurial ecosystem with similar effects are grouped into one entrepreneurial ecosystem. It is also important to establish whether the identified entrepreneurial ecosystems are significantly different from each other. CDA therefore ensures that only statistically significantly different entrepreneurial ecosystems in Pakistan are identified.
The five groups identified by the cluster analysis were used to find the entrepreneurial ecosystems existing within Pakistan. In the CDA, these five groups are used to construct a within-group scatter matrix ($S_W$), based on the deviations of each firm's values from its group mean, and a between-group scatter matrix ($S_B$), based on the deviations of the group means from the overall mean. These matrices are then used to generate the eigenvalues as follows:
$$S_W^{-1} S_B \, W_j = \lambda_j \, W_j \tag{3.17}$$
In equation 3.17, multiplying the inverse of the within-group scatter matrix, $S_W^{-1}$, by the between-group scatter matrix, $S_B$, yields functions for which firms within one group are similar to each other and dissimilar to firms in other groups. Here $W_j$ is the $j$-th eigenvector, whose elements weight the covariates in the $j$-th discriminant function, and $\lambda_j$ is the corresponding eigenvalue, which measures the ratio of between-group to within-group variation along that function. The eigenvector with the largest eigenvalue therefore gives the combination that maximizes between-group differences relative to within-group differences.
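The computation behind equation 3.17 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the software used in this study: the function names `scatter_matrices` and `canonical_functions` are hypothetical, firms are the rows of a matrix `X` with one column per ecosystem component, `labels` holds each firm's cluster membership, and the data in the usage lines are synthetic.

```python
import numpy as np


def scatter_matrices(X, labels):
    """Return the within-group (S_W) and between-group (S_B) scatter matrices.

    X: array of shape (n_firms, n_components); labels: group of each firm.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall_mean = X.mean(axis=0)
    p = X.shape[1]
    S_W = np.zeros((p, p))
    S_B = np.zeros((p, p))
    for g in np.unique(labels):
        X_g = X[labels == g]
        mean_g = X_g.mean(axis=0)
        dev = X_g - mean_g                        # deviations from the group mean
        S_W += dev.T @ dev
        diff = (mean_g - overall_mean)[:, None]   # group mean vs overall mean
        S_B += len(X_g) * (diff @ diff.T)         # weighted by group size
    return S_W, S_B


def canonical_functions(X, labels):
    """Solve S_W^{-1} S_B W_j = lambda_j W_j (equation 3.17).

    Returns the eigenvalues in descending order and the matching eigenvectors
    as columns, keeping at most min(n_groups - 1, n_components) functions.
    """
    X = np.asarray(X, dtype=float)
    S_W, S_B = scatter_matrices(X, labels)
    M = np.linalg.solve(S_W, S_B)                 # S_W^{-1} S_B without an explicit inverse
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1]        # largest between/within ratio first
    n_functions = min(len(np.unique(labels)) - 1, X.shape[1])
    keep = order[:n_functions]
    # The eigenvalues are real in theory; .real drops rounding-level imaginary parts.
    return eigvals.real[keep], eigvecs.real[:, keep]


# Usage with synthetic data: 5 groups of 20 firms, 9 components.
rng = np.random.default_rng(0)
centres = rng.normal(scale=3.0, size=(5, 9))
X = np.vstack([rng.normal(loc=c, size=(20, 9)) for c in centres])
labels = np.repeat(np.arange(5), 20)
eigenvalues, weights = canonical_functions(X, labels)
print(eigenvalues)  # one eigenvalue per discriminant function
```

`np.linalg.solve` avoids forming $S_W^{-1}$ explicitly. Because $S_B$ and $S_W$ are symmetric and $S_W$ is positive definite, SciPy's `scipy.linalg.eigh(S_B, S_W)` would solve the same generalized problem with guaranteed real output (eigenvalues returned in ascending order); NumPy alone is used here to keep dependencies minimal.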
The CDA produces four discriminant functions, each a combination of the components of the entrepreneurial ecosystem; with five groups and nine components, CDA can extract at most min(5 − 1, 9) = 4 functions. These discriminant functions are statistically significantly different from each other. The discriminant score for each discriminant function can be calculated using the following equation:
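$$D_{xi} = \pm d_{1i}\,\mathit{af}_x \pm d_{2i}\,\mathit{reg}_x \pm d_{3i}\,\mathit{infras}_x \pm d_{4i}\,\mathit{corr}_x \pm d_{5i}\,\mathit{pol}_x \pm d_{6i}\,\mathit{inf}_x \pm d_{7i}\,\mathit{wk}_x \pm d_{8i}\,\mathit{tax}_x \pm d_{9i}\,\mathit{elec}_x \tag{3.18}$$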
Where $D_{xi}$ is the discriminant score of firm $x$ on discriminant function $i$, and $d_{1i}, d_{2i}, \ldots, d_{9i}$ are the discriminant coefficients (also called factor loadings) of the covariates in discriminant function $i$.
The definitions and measurements of access to finance (af), government regulations (reg), infrastructure (infras), corruption (corr), political instability (pol), practice of the informal sector (inf), non-availability of an educated workforce (wk), tax rate and administration (tax), and electricity (elec) are explained in Table 3.1.
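Equation 3.18 is a weighted sum, which the short Python sketch below evaluates for one firm. The function name `discriminant_scores` and the coefficient values in the usage example are hypothetical, chosen only for illustration; the signs of the coefficients carry the ± of the equation.

```python
# Component order follows equation 3.18 (abbreviations as in Table 3.1).
COMPONENTS = ["af", "reg", "infras", "corr", "pol", "inf", "wk", "tax", "elec"]


def discriminant_scores(firm, coefficients):
    """Return one score D_xi per discriminant function i for a single firm x.

    firm: dict mapping each component to the firm's observed value.
    coefficients: list with one dict per function, mapping component -> d_ki
    (signed, so the +/- in equation 3.18 is carried by the coefficient).
    """
    return [sum(d[c] * firm[c] for c in COMPONENTS) for d in coefficients]


# Hypothetical firm and coefficients for two functions (illustrative only).
firm = {c: 1.0 for c in COMPONENTS}
firm["af"] = 2.0
coefficients = [
    {c: 0.5 for c in COMPONENTS},
    {c: (-0.25 if c in ("corr", "pol") else 0.0) for c in COMPONENTS},
]
print(discriminant_scores(firm, coefficients))  # [5.0, -0.5]
```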
Finally, the factor loadings will be used to determine which component contributes significantly to which discriminant function. According to Comrey and Lee (1992), McLachlan (2004) and Tabachnick and Fidell (2007), a factor loading greater than 0.4 indicates a statistically significant contribution of a variable to its function. This criterion will therefore be used to determine which entrepreneurial ecosystem components contribute significantly to a discriminant function. Since the discriminant functions are composed of different combinations of institutional and physical conditions, they can be called entrepreneurial ecosystems.
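The 0.4 cut-off can be applied mechanically, as in the Python sketch below. Two assumptions are made that the text does not state: the cut-off is applied to the absolute value of each loading, since loadings can be negative, and a component may qualify for more than one function. The function name `significant_components` and the loading values are hypothetical.

```python
def significant_components(loadings, threshold=0.4):
    """Map each discriminant function to the components whose loading is
    greater than the threshold in absolute value (absolute value is an assumption)."""
    return {
        function: [comp for comp, value in comps.items() if abs(value) > threshold]
        for function, comps in loadings.items()
    }


# Hypothetical loadings for two of the four functions (illustrative only).
loadings = {
    "function 1": {"af": 0.62, "reg": 0.15, "corr": -0.48},
    "function 2": {"af": 0.10, "infras": 0.71, "elec": 0.40},
}
print(significant_components(loadings))
# {'function 1': ['af', 'corr'], 'function 2': ['infras']}
```

A loading of exactly 0.40 (as for `elec` above) is excluded because the criterion is "more than 0.4".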
The factor loadings of the components of the entrepreneurial ecosystems are then used as weights to calculate a discriminant score for each firm: each score is the sum of the products of the factor loadings and the firm's observed values of the corresponding components. This index, which captures the interaction and interdependence of the components of institutional framework conditions and physical conditions, represents the entrepreneurial ecosystem of Pakistan. We then apply regression analysis to estimate the effect of the entrepreneurial ecosystem (based on the index value for each firm) on the performance of SMEs.
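As a minimal sketch of the final step, the Python function below fits a simple bivariate ordinary least squares regression of a performance measure on the ecosystem index. This is an assumption made for illustration only: the specification actually estimated may include other regressors, and the function name `ols_simple` and the data are hypothetical.

```python
def ols_simple(x, y):
    """Fit y = a + b * x by ordinary least squares and return (a, b).

    x: ecosystem index value for each firm; y: performance of each firm.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the index has no variation across firms")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope: effect of a one-unit change in the index
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical index values and performance scores (illustrative only).
index = [1.0, 2.0, 3.0, 4.0]
performance = [3.0, 5.0, 7.0, 9.0]
print(ols_simple(index, performance))  # (1.0, 2.0)
```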