4.5 At the Method and Technique Levels
4.5.7 Data Analysis Procedures and Measurements
Data analysis procedures consist of organising, categorising, tabulating, and examining raw data and transforming them into “a body of facts that are in a format suitable for decision making” and hypothesis testing (Emory & Cooper 1991; Zikmund 2000, p.416; Davis 2005; Burns & Bush 2006). A survey questionnaire forms the empirical basis for this research study, and two approaches to its analysis are discussed. The simple approach applies basic statistical measures to descriptive information on the different innovation practices. The advanced approach works with latent variables and derived importance: it uses a structural equation modelling technique to link innovation practices with their antecedents and with their impact on business growth performance, and to assess the fit of a model with multiple relationships (Burns & Bush 2006; Manning & Munro 2006; Martensen et al. 2007).
4.5.7.1 Data Processing Procedures and Statistical Programs
Data collected through survey-based business research is suited to computer analysis because the large amount of raw data gathered during the study requires editing, sorting, coding, error checking, and mathematical calculation (Ticehurst & Veal 2000; Stevens 2002; Cooper & Schindler 2003; Zikmund 2003; Davis 2005; Neuman 2006). Raw data is edited and coded so that errors can be detected and corrected before the statistical analysis is conducted (Cooper & Schindler 2003; Zikmund 2003). The data editing process checks and adjusts data for omissions, reliability, and consistency before coding and later transfer to data storage (Emory & Cooper 1991; Ticehurst & Veal 2000; Sekaran 2003; Zikmund 2003; Malhotra 2004). On receipt of each survey questionnaire, the investigator checks its completeness and the eligibility of the respondent. Then, the data coding process identifies and classifies each response with numerical scores and symbols (Ticehurst & Veal 2000; Zikmund 2003). After that, the data is cleaned and screened, which requires it to be coded, consistent, and checked for missing values (Tabachnick & Fidell 2001; Hair et al. 2006; Manning & Munro 2006). Lastly, the data is entered into statistical software programs (i.e. SPSS 20.0 and Smart-PLS 2.0 M3) to obtain descriptive and inferential statistical analyses, to summarise information and data, and to examine the research questions and hypothesised conceptual model (Barclay, Higgins & Thompson 1995; Tabachnick & Fidell 2001; Ringle, Wende & Will 2005; Manning & Munro 2006).
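The coding and screening steps described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study, which relied on SPSS. The five-point coding scheme, the respondent identifiers, the sample answers, and the 20 per cent missing-value cut-off are all assumptions made for illustration.

```python
# Hypothetical five-point Likert coding scheme (an assumption for illustration).
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def code_response(raw):
    """Return the numeric code for a raw answer, or None if missing or invalid."""
    if raw is None:
        return None
    return LIKERT_CODES.get(raw.strip().lower())


def code_and_screen(questionnaires, max_missing_share=0.2):
    """Code every questionnaire, then split cases into usable and rejected.

    A case is rejected when its share of missing or invalid answers
    exceeds max_missing_share (an assumed cut-off, not a fixed rule).
    """
    usable, rejected = [], []
    for case_id, answers in questionnaires.items():
        coded = [code_response(answer) for answer in answers]
        missing_share = coded.count(None) / len(coded)
        if missing_share > max_missing_share:
            rejected.append(case_id)
        else:
            usable.append((case_id, coded))
    return usable, rejected


# Invented raw answers for three respondents.
raw = {
    "R001": ["Agree", "Strongly agree", "Neutral", "agree ", "Disagree"],
    "R002": ["Agree", None, "", "n/a", "Agree"],
    "R003": ["Neutral", "Agree", None, "Agree", "Strongly Agree"],
}
usable, rejected = code_and_screen(raw)
```

In this sketch, inconsistent capitalisation and stray spaces are normalised during coding, unrecognised answers are treated as missing values, and cases with too many missing values are set aside before any statistical analysis.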
4.5.7.2 Descriptive Statistics
Raw data and information about basic patterns in the population and sample can be described and summarised, and so understood and interpreted, with simple statistics, namely descriptive statistics (Gay & Diehl 1992; Davis 2005; Burns & Bush 2006; Manning & Munro 2006; Neuman 2006). These measures indicate frequency distribution, central tendency, and dispersion, and include the mean, median, mode, standard deviation, and standard error (Render & Stair 1994; Petrie & Sabin 2000; Tabachnick & Fidell 2001). Further, cross-tabulation is commonly used to arrange data in a table format by counting the frequency of responses and classifying the data against other data sets (Cooper & Schindler 2003). Descriptive statistics are useful for gaining a better understanding of data but are not suited to providing information on research situations that involve multiple relationships between many latent variables (Sekaran 2003).
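The descriptive measures named above can be computed with Python's standard library, as the following minimal sketch shows. The item scores and the firm-size categories are invented for illustration and are not data from this study.

```python
import statistics
from collections import Counter
from math import sqrt

# Invented coded responses to one five-point item (illustration only).
scores = [4, 5, 3, 4, 2, 4, 5, 3]

mean = statistics.mean(scores)        # central tendency
median = statistics.median(scores)
mode = statistics.mode(scores)
sd = statistics.stdev(scores)         # sample standard deviation (n - 1)
se = sd / sqrt(len(scores))           # standard error of the mean
frequencies = Counter(scores)         # frequency distribution

# Cross-tabulation: count responses for each pair of categories.
# Each tuple is (firm size, whether an innovation practice was adopted).
cases = [
    ("small", "yes"),
    ("small", "no"),
    ("large", "yes"),
    ("small", "yes"),
    ("large", "yes"),
]
crosstab = Counter(cases)

for (size, adopted), count in sorted(crosstab.items()):
    print(f"{size:>5} / {adopted:<3}: {count}")
```

A `Counter` returns zero for a combination that never occurs, so empty cells of the cross-tabulation need no special handling.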
4.5.7.3 Inferential Statistics
Inference and judgment about the population, at a given level of confidence and on the basis of a sample, can be made with inferential statistics (Gay & Diehl 1992; Sekaran 2003; Neuman 2006). Inferential statistics are useful for testing hypotheses and conceptual models about relationships in the population on the basis of measurements made on samples (Ticehurst & Veal 2000; Sekaran 2003). They test the relationships between questions/items/indicators (i.e. manifest variables) and their corresponding constructs (i.e. latent variables), as well as the relationships between constructs. Further, the statistical tests are selected on the basis of the format of data, level of measurement, and number of constructs and variables (Sekaran 2003), enabling the investigator to evaluate the reliability and validity of the measured questions and items (measurement model) and to assess the relationships between the constructs and latent variables (structural model). The simple and advanced approaches to inferential data analysis used to evaluate the research questions, hypotheses, and conceptual model are classified into two techniques: first-generation and second-generation (Haenlein & Kaplan 2004).
4.5.7.3.1 First-Generation Technique
The first-generation statistical technique comprises regression-based approaches (i.e. Multiple Regression Analysis, Multiple Discriminant Analysis, Logistic Regression Analysis, Conjoint Analysis, Canonical Correlation Analysis, and Analysis of Variance) and factor-based approaches (i.e. Exploratory Factor Analysis, Confirmatory Factor Analysis, and Cluster Analysis), which form the core set of statistical instruments used to either identify or confirm theoretical hypotheses on the basis of empirical data analysis (Haenlein & Kaplan 2004). The first-generation statistical tests for the data analysis process are based on the following techniques: testing outlier assumptions and normality distributions, evaluating reliability (Cronbach’s alpha) and validity (homogeneity and internal consistency), conducting factor analysis, looking for cause-and-effect relationships between latent variables, and testing hypothesised conceptual model relationships.
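As an illustration of the reliability evaluation mentioned above, Cronbach's alpha for a scale of k items is k / (k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the respondents' total scores. The sketch below applies this standard formula in Python. The function name and the item scores are assumptions for illustration and do not reproduce the SPSS output of this study.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    items: one inner list of scores per item, with respondents in the
    same order in every inner list and no missing values.
    """
    k = len(items)
    sum_item_variances = sum(statistics.variance(item) for item in items)
    # Total score of each respondent across all items.
    totals = [sum(responses) for responses in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Invented scores of five respondents on three items of one construct.
item_1 = [4, 5, 3, 4, 2]
item_2 = [4, 4, 3, 5, 2]
item_3 = [5, 5, 2, 4, 3]

alpha = cronbach_alpha([item_1, item_2, item_3])
print(f"Cronbach's alpha = {alpha:.3f}")
```

Sample variances (dividing by n - 1) are used for both the items and the total score, so the ratio inside the formula is unaffected by the choice of divisor as long as it is applied consistently.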
According to Haenlein & Kaplan (2004), the first-generation techniques have three limitations: the assumption of a simple model structure consisting of one independent and one dependent latent variable, which restricts the analysis of more complex models with mediating or intervening latent variables; the assumption that all latent variables are observable; and the assumption that all latent variables are measured without error. To overcome these limitations, Structural Equation Modelling, a second-generation technique, is used as an alternative to analyse more than one layer of linkage between the independent and dependent latent variables. This second-generation technique is “an alternative to other multivariate techniques [Regression Analysis], which are limited to representing only a single relationship between the independent and dependent variables” (Cooper & Schindler 2003, p.623). For example, this research study requires assessing the effect of independent latent variables (i.e. external and internal factors) on an intervening latent variable (i.e. innovation practices) while at the same time assessing the effect of innovation practices on a dependent latent variable (i.e. business growth performance).
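The two layers of linkage in this example can be written as a pair of structural equations. This is a sketch only. The symbols are assumed notation rather than notation taken from this study, and only the paths named in the paragraph above are shown.

```latex
\begin{align}
\eta_{1} &= \gamma_{11}\,\xi_{1} + \gamma_{12}\,\xi_{2} + \zeta_{1} \\
\eta_{2} &= \beta_{21}\,\eta_{1} + \zeta_{2}
\end{align}
```

Here $\xi_{1}$ and $\xi_{2}$ stand for the external and internal factors, $\eta_{1}$ for innovation practices, $\eta_{2}$ for business growth performance, $\gamma_{11}$, $\gamma_{12}$, and $\beta_{21}$ for the path coefficients, and $\zeta_{1}$ and $\zeta_{2}$ for the residual terms. A single regression equation represents only one of these two relationships at a time, whereas a structural equation model estimates both together.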
4.5.7.3.2 Second-Generation Technique
The second-generation statistical technique can be described as a component-based approach to Structural Equation Modelling (SEM), known as Partial Least Squares (PLS) path modelling, which is also a variance-based approach for testing a priori theoretical and measurement assumptions against empirical data (Wold 1985; Haenlein & Kaplan 2004; Vinzi et al. 2010). The PLS modelling technique “focuses on maximizing the variance of the dependent variables explained by the independent ones instead of reproducing the empirical covariance matrix” (Haenlein & Kaplan 2004, p.209) and is an alternative estimation technique to traditional SEM (Fornell & Cha 1994; Chin 1998a; Hair et al. 2006). PLS studies variables simultaneously rather than partially, so that the problem becomes more structured and easier to understand, and the improved behaviours depend on the end-goal (Haenlein & Kaplan 2004; Martensen et al. 2007). It has been used by a growing number of researchers from various disciplines, including strategic management (Hulland 1999), innovation management (Martensen et al. 2007; Vieites & Calvo 2011), organisational behaviour (Higgins, Duxbury & Irving 1992), marketing (Reinartz, Krafft & Hoyer 2004; Tenenhaus et al. 2005), and consumer behaviour (Fornell & Robinson 1983).
Prior to the testing of the conceptual model, the latent variables and their items need to be specified. The relationship between a latent variable and its items can be modelled with either reflective (effect) or formative (cause or induced) indicators (Vinzi et al. 2010). For example, if the latent variable is considered as “giving rise to something observed”, such as an individual trait or attitude, reflective indicators should be used, whereas formative indicators are employed if the latent variable is considered as “being explanatory combinations of items”, such as individual health and life stress (Haenlein & Kaplan 2004, p.289). This research study has a straightforward conceptual model in which no latent variable is both a cause and an effect of another latent variable; such a model is described as recursive (Schumacker & Lomax 1996; Byrne 2001; Hair et al. 2006). All items are modelled as reflective indicators because they are viewed as effects (not causes) of latent variables (Bollen & Lennox 1991) and depend on their latent variables (Haenlein & Kaplan 2004; Diamantopoulos & Siguaw 2006; Vinzi et al. 2010). The technique follows a two-step method to test a measurement (outer) model and a structural (inner) model, which PLS estimates simultaneously (Haenlein & Kaplan 2004; Vinzi et al. 2010). Thus, the second-generation statistical (PLS) tests for the data analysis process are based on the following techniques: reliability (individual item, Cronbach’s alpha, and composite), average variance extracted, validity (convergent and discriminant), Squared Multiple Correlations, Goodness-of-Fit, the Stone-Geisser test, path coefficients, testing for cause-and-effect relationships among latent variables, and testing for hypothesised relationships in a conceptual innovation-based model.
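Several of the measurement-model criteria listed above can be derived directly from standardised outer loadings. The Python sketch below is illustrative only. It assumes reflective constructs with standardised indicators, uses invented loadings, correlation, and R-squared values, and does not reproduce the Smart-PLS output of this study. The goodness-of-fit index is computed as the square root of the mean average variance extracted multiplied by the mean R-squared, which is one common formulation and is stated here as an assumption.

```python
from math import sqrt


def average_variance_extracted(loadings):
    """Mean squared standardised loading of one construct's items."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Squared sum of loadings over itself plus summed error variances."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


def fornell_larcker_holds(ave_a, ave_b, correlation_ab):
    """Discriminant validity check: the square root of each construct's
    AVE should exceed the correlation between the two constructs."""
    return min(sqrt(ave_a), sqrt(ave_b)) > abs(correlation_ab)


def goodness_of_fit(aves, r_squared_values):
    """Assumed global index: sqrt(mean AVE * mean R-squared)."""
    mean_ave = sum(aves) / len(aves)
    mean_r2 = sum(r_squared_values) / len(r_squared_values)
    return sqrt(mean_ave * mean_r2)


# Invented standardised loadings for two reflective constructs.
innovation_loadings = [0.8, 0.7, 0.9]
growth_loadings = [0.6, 0.8, 0.7]

ave_innovation = average_variance_extracted(innovation_loadings)
ave_growth = average_variance_extracted(growth_loadings)
cr_innovation = composite_reliability(innovation_loadings)
cr_growth = composite_reliability(growth_loadings)

# Invented construct correlation and R-squared values.
discriminant_ok = fornell_larcker_holds(ave_innovation, ave_growth, 0.55)
gof = goodness_of_fit([ave_innovation, ave_growth], [0.40, 0.30])
```

Commonly cited rules of thumb treat an average variance extracted of at least 0.5 and a composite reliability of at least 0.7 as acceptable. In this invented example the second construct falls marginally below the first of these thresholds, which shows how the criteria flag weak items.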
The Partial Least Squares path modelling technique was selected for this research study mainly because of its ability to deal with violations of normality (i.e. multivariate normality): it does not require strict assumptions about the distributional properties of raw data. Other rationales include the following: PLS guards against improper solutions by removing factor indeterminacy; PLS is robust in dealing with data noise and missing data; PLS applies many parameters in a complex model with normal residual distributions; PLS handles collinearity among the independent latent variables; PLS has more statistical power than a maximum-likelihood covariance-based SEM method and is a prediction-oriented technique that maximises the variance explained in the latent variables; PLS allows simultaneous modelling of the relations among latent variables; PLS combines regression and factor analysis within the measurement model in each run; PLS is more advantageous in the case of new and refined measures; and PLS does not require a large sample size (for example, 200 or fewer cases) (Fornell & Bookstein 1982; Falk & Miller 1992; Johansson & Yip 1994; Barclay, Higgins & Thompson 1995; Cassel, Hackl & Westlund 1999; Chin 2002; Gustafsson & Johnson 2004; Haenlein & Kaplan 2004; Henseler, Ringle & Sinkovics 2009; Ronkko & Evermann 2013). The PLS technique is suitable for research studies when the phenomenon under study is new or changing, the model is relatively complex (i.e. large number of items and variables), and the data does not satisfy the requirements of normality and large sample size (Chin & Newsted 1999).