
Before analyzing their data, researchers should also examine response patterns. In doing so, they look for a pattern often described as straight lining.

Straight lining is when a respondent marks the same response for a high proportion of the questions. For example, if a 7-point scale is used to obtain answers and the response pattern is all 4s (the middle response), then that respondent should in most cases be removed from the data set. Similarly, if a respondent selects only 1s or only 7s, then that respondent should in most cases be removed.
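One way to screen for straight lining is to compute, for each respondent, the share of answered items that carry that respondent's most frequent answer. The sketch below does this in plain Python. The 0.9 cutoff for "a high proportion" is our assumption (the text does not fix a number), and the respondent IDs and answers are made up for illustration.

```python
from collections import Counter

def straight_lining_share(responses):
    """Share of answered items that carry the respondent's most frequent answer.

    `responses` is one respondent's answers; None marks a missing answer.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return 0.0
    _, count = Counter(answered).most_common(1)[0]
    return count / len(answered)

def flag_straight_liners(data, threshold=0.9):
    """Return IDs of respondents whose share of identical answers >= threshold.

    `data` maps respondent ID -> list of answers. threshold=0.9 is an assumed cutoff.
    """
    return [rid for rid, answers in data.items()
            if straight_lining_share(answers) >= threshold]

# Hypothetical 7-point-scale answers for three respondents
survey = {
    101: [4, 4, 4, 4, 4, 4, 4, 4, 4, 4],  # all middle responses
    102: [5, 6, 4, 7, 5, 6, 3, 5, 6, 6],  # varied responses
    103: [7, 7, 7, 7, 7, 7, 7, 7, 7, 6],  # nearly all 7s
}
print(flag_straight_liners(survey))  # [101, 103]
```

Flagged respondents are candidates for removal, not automatic deletions; a short scale on which uniform answers are plausible may warrant a higher cutoff.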

Inconsistency in answers may also need to be addressed before analyzing your data. Many surveys start with one or more screening questions. The purpose of a screening question is to ensure that only individuals who meet the prescribed criteria complete the survey. For example, a survey of mobile phone users may screen for individuals who own an Apple iPhone. If a question posed later in the survey reveals that the individual is an Android user, this respondent would need to be removed from the data set. Surveys often ask the same question with slight variations, especially when reflective measures are used. If a respondent gives a very different answer to the same question asked in a slightly different way, this too raises a red flag and suggests that the respondent was not reading the questions closely or was simply marking answers to complete and exit the survey as quickly as possible.
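Both kinds of inconsistency can be checked mechanically. The sketch below flags respondents who passed an iPhone screener but later report Android, and respondents whose answers to two wordings of the same question differ by more than a set number of scale points. The field names (`owns_iphone`, `platform`, `sat_1`, `sat_1r`) and the 3-point gap are hypothetical choices for illustration.

```python
def failed_screening(resp):
    """True if a respondent who passed the iPhone screener later reports Android.

    The field names 'owns_iphone' and 'platform' are hypothetical.
    """
    return resp["owns_iphone"] and resp["platform"] == "Android"

def inconsistent_pairs(resp, item_pairs, max_gap=3):
    """Return item pairs (same question, slightly reworded) whose answers
    differ by more than max_gap scale points. max_gap=3 is an assumed cutoff."""
    return [(a, b) for a, b in item_pairs if abs(resp[a] - resp[b]) > max_gap]

respondents = [
    {"id": 1, "owns_iphone": True, "platform": "iOS", "sat_1": 6, "sat_1r": 6},
    {"id": 2, "owns_iphone": True, "platform": "Android", "sat_1": 5, "sat_1r": 5},
    {"id": 3, "owns_iphone": True, "platform": "iOS", "sat_1": 7, "sat_1r": 1},
]
pairs = [("sat_1", "sat_1r")]  # hypothetical pair of reworded items

to_remove = [r["id"] for r in respondents
             if failed_screening(r) or inconsistent_pairs(r, pairs)]
print(to_remove)  # [2, 3]
```

If one wording of a pair is reverse-worded, recode it (for example, 8 minus the answer on a 7-point scale) before comparing; otherwise consistent respondents would be flagged.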

Outliers

An outlier is an extreme response to a particular question, or extreme responses to all questions. The first step in dealing with outliers is to identify them. Many statistical software packages have options to help identify outliers. For example, IBM SPSS Statistics has an option called Explore that produces box plots and stem-and-leaf plots, which facilitate the identification of outliers by respondent number (Mooi & Sarstedt, 2011).
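Box plots flag outliers using fences placed a multiple of the interquartile range (IQR) beyond the quartiles. The sketch below applies Tukey's fences (1.5 × IQR) in plain Python and reports respondent IDs. It assumes quartiles from `statistics.quantiles` with the inclusive method; SPSS computes its box-plot hinges in its own way, so borderline cases may differ. The scores are invented for illustration.

```python
import statistics

def tukey_outliers(scores, k=1.5):
    """Return IDs whose score lies outside [Q1 - k*IQR, Q3 + k*IQR].

    `scores` maps respondent ID -> numeric answer. k=1.5 gives the usual
    box-plot outlier fences; k=3 is commonly used for "extreme" values.
    """
    q1, _, q3 = statistics.quantiles(list(scores.values()), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [rid for rid, x in scores.items() if x < low or x > high]

# Hypothetical answers keyed by respondent number
scores = {1: 3, 2: 4, 3: 4, 4: 5, 5: 5, 6: 5, 7: 6, 8: 6, 9: 7, 10: 30}
print(tukey_outliers(scores))  # [10]
```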

Once the outliers are identified, the researcher must decide what to do with them. If there are only a few outliers, the approach most often followed is simply to remove them from the data set. As the number of outliers increases, however, the researcher must at some point decide whether the outlier group represents a distinct and unique subgroup of the sample.

There are two approaches to deciding whether a unique subgroup exists. First, a subgroup can be identified based on prior knowledge. For example, if the research examines binge drinking among college students, the researcher should know (from previous studies) the proportion of students who are binge drinkers as well as how binge drinkers should be defined. If several respondents indicate that their alcohol consumption is much higher than known patterns, these individuals likely should be removed from the data set.

Second, if such prior information is not available, researchers can turn to data-driven approaches to identify distinct subgroups. In the context of PLS-SEM, the finite mixture PLS (FIMIX-PLS) approach, which can be used to identify (latent) subgroups of respondents, has gained prominence. We will discuss the FIMIX-PLS approach (Ringle, Wende, & Will, 2010; Sarstedt & Ringle, 2010) in the context of unobserved heterogeneity in greater detail in Chapter 8.

54 A Primer on Partial Least Squares

Data Distribution

PLS-SEM is a nonparametric statistical method. Unlike maximum likelihood (ML)-based CB-SEM, it does not require the data to be normally distributed. Nevertheless, it is important to verify that the data are not too far from normal, as extremely non-normal data prove problematic when assessing the significance of the parameters. Specifically, extremely non-normal data inflate standard errors obtained from bootstrapping (see Chapter 5 for more details) and thus decrease the likelihood that some relationships will be assessed as significant (Hair, Ringle, & Sarstedt, 2011; Henseler et al., 2009).
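A bootstrap standard error is the standard deviation of an estimate across many samples drawn with replacement from the original data. PLS-SEM software re-estimates the full path model on each bootstrap sample (Chapter 5); to keep the idea visible, this sketch bootstraps a simple Pearson correlation between two hypothetical indicators instead. The data, `n_boot=5000`, and the seed are illustrative assumptions.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def bootstrap_se(x, y, n_boot=5000, seed=42):
    """Standard deviation of the correlation across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw cases with replacement
        bx = [x[i] for i in idx]
        by = [y[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # skip degenerate resamples where a variable has no variance
        estimates.append(pearson(bx, by))
    return statistics.stdev(estimates)

# Hypothetical 7-point answers to two indicators
x = [3, 4, 5, 5, 6, 6, 7, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7, 7, 3, 5, 5]
print(round(pearson(x, y), 3), round(bootstrap_se(x, y), 3))
```

The estimate divided by this standard error gives the t value used to judge significance, which is why inflated standard errors make significant results less likely.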

The Kolmogorov-Smirnov test and the Shapiro-Wilk test are designed to test normality by comparing the data to a normal distribution with the same mean and standard deviation as in the sample (Mooi & Sarstedt, 2011). However, both tests indicate only whether the null hypothesis of normally distributed data should be rejected or not. Because the bootstrapping procedure performs fairly robustly when data are non-normal, these tests provide only limited guidance when deciding whether the data are too far from being normally distributed. Instead, researchers should examine two measures of distributions: skewness and kurtosis.
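The comparison behind the Kolmogorov-Smirnov test can be sketched as the largest vertical gap between the data's empirical cumulative distribution and a normal cumulative distribution with the sample's mean and standard deviation. The sketch below computes only that statistic and deliberately omits the p-value: because the mean and standard deviation are estimated from the sample, the classical KS critical values no longer apply (the Lilliefors variant addresses this). The example answers are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def ks_statistic(x):
    """Largest gap between the empirical CDF of x and a normal CDF whose
    mean and standard deviation are estimated from x (no p-value)."""
    data = sorted(x)
    n = len(data)
    ref = NormalDist(fmean(data), stdev(data))  # fails if all values are equal
    d = 0.0
    for i, v in enumerate(data, start=1):
        f = ref.cdf(v)
        # The empirical CDF jumps from (i-1)/n to i/n at v; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical answers to one 7-point item
item = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7, 7, 7]
print(round(ks_statistic(item), 3))
```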

Skewness assesses the extent to which a variable's distribution is symmetrical. If the distribution of responses for a variable stretches toward the right or left tail of the distribution, the distribution is characterized as skewed. Kurtosis is a measure of whether the distribution is too peaked (a very narrow distribution with most of the responses in the center) or too flat; as reported by most statistical packages, it is expressed relative to the normal distribution, which has a kurtosis of zero. When both skewness and kurtosis are close to zero (a situation that researchers are very unlikely to ever encounter), the pattern of responses is considered a normal distribution. A general guideline for skewness is that a value greater than +1 or lower than -1 indicates a substantially skewed distribution. For kurtosis, the general guideline is that a value greater than +1 indicates a distribution that is too peaked. Likewise, a kurtosis of less than -1 indicates a distribution that is too flat. Distributions exhibiting skewness and/or kurtosis that exceed these guidelines are considered non-normal.
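The guidelines above can be applied with a few lines of code. This sketch uses the plain moment-based skewness and excess kurtosis (g1 and g2) and then applies the ±1 cutoffs. Packages such as SPSS apply small-sample adjustments to these formulas, so their reported values will differ somewhat, especially for small samples.

```python
import statistics

def skew_kurtosis(x):
    """Moment-based skewness (g1) and excess kurtosis (g2, zero for a normal distribution)."""
    n = len(x)
    m = statistics.fmean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    if m2 == 0:
        raise ValueError("constant variable: skewness and kurtosis are undefined")
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

def normality_flags(x, cutoff=1.0):
    """Apply the +/-1 guidelines for skewness and kurtosis."""
    skew, kurt = skew_kurtosis(x)
    flags = []
    if abs(skew) > cutoff:
        flags.append("substantially skewed")
    if kurt > cutoff:
        flags.append("too peaked")
    elif kurt < -cutoff:
        flags.append("too flat")
    return skew, kurt, flags

print(normality_flags([1, 2, 3, 4, 5])[2])                 # ['too flat']
print(normality_flags([1, 1, 1, 1, 1, 1, 1, 1, 2, 7])[2])  # ['substantially skewed', 'too peaked']
```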

Serious effort, considerable amounts of time, and a high level of caution are required when collecting and analyzing the data that you need for carrying out multivariate techniques. Always remember the garbage in, garbage out rule: all your analyses are meaningless if your data are inappropriate. Exhibit 2.12 summarizes some key guidelines you should consider when examining your data and preparing them for PLS-SEM. For more detail on examining your data, see Chapter 2 of Hair et al. (2010).

CASE STUDY ILLUSTRATION: SPECIFYING THE PLS-SEM MODEL

The most effective way to learn how to use a statistical method is to actually apply the method to a set of data. Throughout this book, we use a single example that enables you to do that. We start the example with a simple model, and in Chapter 5, we expand that same model to a much broader, more complex model. For our initial model, we hypothesize a path model to estimate the relationships between corporate reputation, customer satisfaction, and customer loyalty. The example will provide insights into (1) how to develop the structural model representing the underlying concepts/theory, (2) the setup of measurement models for the latent variables, and (3) the structure of the empirical data used. Then, our focus shifts to setting up the SmartPLS software (Ringle et al., 2005) for PLS-SEM.

Exhibit 2.12 Guidelines for Examining Data Used With PLS-SEM

Missing data must be identified. When missing data for an observation exceed 15%, the observation should be removed from the data set.

Other missing data should be dealt with before running a PLS-SEM analysis. When less than 5% of values per indicator are missing, use mean replacement. Otherwise, use casewise deletion, but make sure that the deletion of observations did not occur systematically and that enough observations remain for the analysis. Also consider using more complex imputation procedures.

Straight lining and inconsistent response patterns typically justify removing a response from the data set.

Outliers should be identified before running a PLS-SEM analysis, and in most instances, the offending responses should be removed from the data set. Subgroups that are substantial in size should be identified based on prior knowledge or by statistical means (e.g., FIMIX-PLS).

Lack of normality in variable distributions can distort the results of multivariate analysis. This problem is much less severe with PLS-SEM, but researchers should still examine PLS-SEM results carefully when distributions deviate substantially from normal. Absolute skewness and/or kurtosis values of greater than 1 are indicative of highly non-normal data.
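The missing-data rules in Exhibit 2.12 can be sketched as a small routine: drop observations with more than 15% missing values, use mean replacement for indicators with less than 5% missing, and apply casewise deletion for the rest. The sketch assumes observations are dicts with `None` for missing values, and it computes the per-indicator shares after the first removal step, which is our choice. It does not check whether deletion occurred systematically or whether enough observations remain; the exhibit asks you to verify both separately.

```python
import statistics

def clean_missing(rows, items, obs_cutoff=0.15, mean_cutoff=0.05):
    """Apply the Exhibit 2.12 missing-data rules to a list of observations.

    rows: list of dicts mapping indicator name -> value (None = missing).
    Returns a new list; the input rows are not modified.
    """
    # Rule 1: remove observations with more than 15% missing values.
    kept = [dict(r) for r in rows
            if sum(r[k] is None for k in items) / len(items) <= obs_cutoff]
    if not kept:
        return []

    casewise = []
    for k in items:
        observed = [r[k] for r in kept if r[k] is not None]
        share_missing = 1 - len(observed) / len(kept)
        if share_missing < mean_cutoff:
            # Rule 2a: fewer than 5% missing -> mean replacement.
            mean = statistics.fmean(observed)
            for r in kept:
                if r[k] is None:
                    r[k] = mean
        else:
            casewise.append(k)

    # Rule 2b: casewise deletion for indicators with 5% or more missing.
    return [r for r in kept if all(r[k] is not None for k in casewise)]

items = [f"x{i}" for i in range(1, 9)]               # 8 hypothetical indicators
rows = [{k: 4 for k in items} for _ in range(40)]
rows[0]["x1"] = None                                   # x1: ~2.6% missing -> mean replacement
rows[1]["x2"] = rows[2]["x2"] = rows[3]["x2"] = None   # x2: ~7.7% missing -> casewise deletion
rows[4]["x3"] = rows[4]["x4"] = None                   # 25% of this observation missing -> removed
cleaned = clean_missing(rows, items)
print(len(cleaned))  # 36
```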


Application of Stage 1: Structural Model