3.6 Assumptions Associated with SEM

As with all multivariate analyses, a number of assumptions must be met before conducting any analysis. The first is that the data follow a multivariate normal distribution. A second basic assumption in SEM is that all relations are linear (Kline, 2005); it is recommended that scatterplots be inspected for obvious curvilinearity in the data. SEM also carries assumptions about sample size. Like exploratory factor analysis, structural equation modelling is regarded as a large-sample technique. One of the key questions facing researchers is therefore how large a sample is needed, a question that has been described as one of the most deceptively difficult to answer (Jackson, 2003). One suggested approach is based on the number of parameters to be estimated, with a higher ratio of cases to parameters being preferable. Recommended ratios have ranged from as high as 10:1 to as low as 5:1 (Kline, 2005). Minimum sample sizes have also been proffered, and samples under 100 are generally thought to be unsound. A final assumption relates to missing data. There are numerous issues a researcher must consider when dealing with missing data, and for this reason missing data is discussed in detail below.
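The sample size guidelines above can be combined into a simple check. The following is a minimal sketch only: the function names and the 30-parameter model are hypothetical, and the 5:1 and 10:1 ratios and the floor of 100 cases are simply the rules of thumb quoted in the text, not fixed requirements.

```python
def required_sample_size(n_parameters, ratio=10, floor=100):
    """Smallest N that satisfies both a cases-to-parameters ratio and an absolute floor.

    ratio and floor are the rules of thumb quoted in the text (assumptions,
    not hard requirements).
    """
    return max(n_parameters * ratio, floor)


def meets_guideline(n_cases, n_parameters, ratio=5, floor=100):
    """True if the sample reaches the chosen ratio and the minimum size."""
    return n_cases >= required_sample_size(n_parameters, ratio, floor)


# Hypothetical model with 30 free parameters.
lenient_n = required_sample_size(30, ratio=5)   # 5:1 guideline
strict_n = required_sample_size(30, ratio=10)   # 10:1 guideline
small_model_n = required_sample_size(12, ratio=5)  # the floor of 100 applies here
```

For small models the absolute floor, rather than the ratio, tends to determine the minimum sample.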

3.6.1 Missing Data

A key assumption of SEM is that it requires a complete data set. Unfortunately, social science data is often hampered by missing responses. To cope with this, statisticians have provided a number of methods for handling non-response. These include listwise deletion, pairwise deletion, mean substitution and maximum likelihood approaches. Until recently, listwise deletion, pairwise deletion and mean substitution were the most popular; however, maximum likelihood methods have gained popularity now that many SEM packages provide more advanced imputation methods. Before deciding on the most appropriate way of handling missing data, the researcher needs to determine its nature. Missing data is classified as either missing at random (MAR) or missing completely at random (MCAR). If an individual does not respond to a particular question, and the lack of response is thought to be related to another question in the study, then the data are considered MAR. An example of MAR would be where older people are less likely to answer a salary question. MCAR, on the other hand, assumes that a subject's non-response does not depend on any other question in the study or on the question itself. It is important that researchers explore their data before deciding on the most appropriate method of handling missing responses.
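The distinction between the two mechanisms can be illustrated with a small simulation. This is a sketch under invented assumptions: the age range, the cut-off of 50 years and the non-response probabilities are hypothetical values chosen only to mirror the salary example.

```python
import random


def simulate_missingness(n=20000, seed=1):
    """Return (age, mcar_missing, mar_missing) for n simulated respondents."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.randint(20, 69)
        # MCAR: every respondent skips the salary item with the same probability.
        mcar_missing = rng.random() < 0.20
        # MAR: the chance of skipping depends on age, which is observed,
        # and not on the unreported salary itself.
        mar_missing = rng.random() < (0.40 if age >= 50 else 0.05)
        rows.append((age, mcar_missing, mar_missing))
    return rows


def missing_rate(rows, column, older):
    """Share of missing salary answers in the older (50+) or younger group."""
    group = [r for r in rows if (r[0] >= 50) == older]
    return sum(r[column] for r in group) / len(group)


rows = simulate_missingness()
# Difference in non-response rate between older and younger respondents.
mcar_gap = missing_rate(rows, 1, True) - missing_rate(rows, 1, False)
mar_gap = missing_rate(rows, 2, True) - missing_rate(rows, 2, False)
```

Under MCAR the non-response rate is roughly the same in both age groups, whereas under MAR it differs markedly between them. This is the kind of pattern a researcher would look for when exploring the data.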

Next, the various missing data methods are discussed along with their assumptions. Finally, the missing data method used here is presented.

If a case is missing data on some variables in the analysis, listwise deletion involves discarding the entire response set for that case. The main virtue of this method is its simplicity (Schafer and Graham, 2002). However, the use of listwise deletion has declined in recent years owing to a number of serious shortcomings. Firstly, it can lead to bias, as the researcher implicitly assumes that the deleted cases do not differ from those with complete data (Malhotra, 1987; Schafer and Olsen, 1998). Secondly, it can lead to large amounts of data being discarded, which results in inefficient parameter estimates (Enders and Bandalos, 2001). When data are missing completely at random (MCAR), however, listwise deletion has been found to yield unbiased parameter estimates and can thus be used for SEM.
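Listwise deletion can be sketched in a few lines. The survey records below are invented purely for illustration, and `listwise_delete` is a hypothetical helper, not a routine from any statistical package.

```python
def listwise_delete(cases):
    """Keep only the cases with no missing (None) value on any variable."""
    return [case for case in cases if all(v is not None for v in case.values())]


# Hypothetical survey responses; None marks a non-response.
cases = [
    {"age": 34, "salary": 41000, "satisfaction": 4},
    {"age": 58, "salary": None, "satisfaction": 3},
    {"age": 45, "salary": 52000, "satisfaction": None},
    {"age": 29, "salary": 38000, "satisfaction": 5},
    {"age": 61, "salary": None, "satisfaction": 2},
]

complete = listwise_delete(cases)
retained = len(complete) / len(cases)  # share of the sample that survives
```

Although each variable here has at most two missing values, only two of the five cases survive. This shows how quickly listwise deletion can erode the sample.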

With pairwise deletion, cases are omitted only from those calculations involving variables for which they have missing data (Enders and Bandalos, 2001). Pairwise deletion is no longer a popular method, as it can produce a covariance matrix that cannot be inverted, and inversion is an essential step in structural equation modelling (Acock, 2005). The final less sophisticated missing data method is mean substitution, which involves calculating the average response from the available data and replacing missing values with this mean (Tabachnick and Fidell, 2007). Research has found that this method distorts distributions, reduces variance (Schafer and Graham, 2002), underestimates correlations and β weights (Acock, 2005) and redefines the scale (Schafer and Graham, 2002). Owing to these serious drawbacks, authors have strongly objected to mean substitution. Indeed, Pallant (2004, p. 119) is so adamant that she states that mean substitution ‘should NEVER be used as it can severely distort the results of your analysis’ (original emphasis).
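Returning to pairwise deletion, the following minimal sketch shows how it can yield a matrix that no complete dataset could have produced. The data are deliberately contrived, and `pairwise_corr` is a hypothetical helper. Each correlation is computed from a different subset of cases, and the resulting correlation matrix has a negative determinant, meaning it is not positive definite and is unusable for estimation.

```python
from math import sqrt


def pairwise_corr(a, b):
    """Pearson correlation using only the cases observed on both variables."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    n = len(pairs)
    mean_a = sum(u for u, _ in pairs) / n
    mean_b = sum(v for _, v in pairs) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in pairs)
    saa = sum((u - mean_a) ** 2 for u, _ in pairs)
    sbb = sum((v - mean_b) ** 2 for _, v in pairs)
    return sab / sqrt(saa * sbb)


M = None  # marker for a missing value
# Contrived data: every pair of variables is observed on a different set of cases.
x = [1, 2, 3, 1, 2, 3, M, M, M]
y = [1, 2, 3, M, M, M, 1, 2, 3]
z = [M, M, M, 1, 2, 3, 3, 2, 1]

r_xy = pairwise_corr(x, y)
r_xz = pairwise_corr(x, z)
r_yz = pairwise_corr(y, z)

# Determinant of the 3 x 3 correlation matrix built from the pairwise values.
det = 1 + 2 * r_xy * r_xz * r_yz - r_xy ** 2 - r_xz ** 2 - r_yz ** 2
```

Here x correlates perfectly and positively with both y and z, yet y and z correlate perfectly and negatively. This is a logical impossibility that arises only because each coefficient rests on different cases.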

More recent missing data procedures are considerably more advanced and are available in many software programs such as SPSS, SAS and PRELIS (Acock, 2005). These methods are based on maximum likelihood procedures and are thus more sophisticated. They are advantageous in that they require only the less demanding assumption of MAR. Their general premise is that they examine the pattern of responses and substitute values for missing items based on the patterns found in the observed responses.

One such maximum likelihood method available in LISREL is the Expectation-Maximisation (EM) algorithm. This procedure was first introduced by Dempster et al. (1977) and heralded a fundamental shift in the way statisticians viewed missing data. Through the EM algorithm, researchers can use observed values as a basis for making inferences about unobserved ones. EM forms a covariance matrix by assuming a distribution for the missing data and is an iterative procedure with two steps: expectation and maximisation (Tabachnick and Fidell, 2007). The first step is the expectation stage, where the conditional expectation of the missing data is found given the observed values and the current parameter estimates. The missing data are then replaced with these expectations. The second stage, maximisation, performs the maximum likelihood estimation as though the data were complete. The two steps are repeated until convergence is achieved, at which point the EM variance-covariance matrix can be saved as a separate data set.
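The two steps can be sketched for the simplest possible case. This is an illustrative simplification and not the routine implemented in LISREL or SPSS. It assumes two variables that are bivariate normal, with x fully observed and only y containing missing values. The function name, the simulated data and the missingness rule are all hypothetical. In the E-step the sketch also carries the residual variance of each filled-in value, which is what keeps the estimated variance of y from being understated.

```python
import random


def em_bivariate(x, y, n_iter=500, tol=1e-10):
    """EM estimates of the means and covariance matrix of (x, y).

    x is fully observed; entries of y may be None (missing).
    """
    n = len(x)
    obs = [v for v in y if v is not None]
    # x is complete, so its ML mean and variance are fixed throughout.
    mx = sum(x) / n
    sxx = sum((v - mx) ** 2 for v in x) / n
    # Starting values for y taken from the available data.
    my = sum(obs) / len(obs)
    syy = sum((v - my) ** 2 for v in obs) / len(obs)
    sxy = 0.0
    for _ in range(n_iter):
        # E-step: expected y and y^2 for each case, given x and current estimates.
        slope = sxy / sxx
        resid_var = syy - sxy * slope
        ey, eyy = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                pred = my + slope * (xi - mx)
                ey.append(pred)
                eyy.append(pred ** 2 + resid_var)
            else:
                ey.append(yi)
                eyy.append(yi ** 2)
        # M-step: ML updates computed from the expected sufficient statistics.
        new_my = sum(ey) / n
        new_syy = sum(eyy) / n - new_my ** 2
        new_sxy = sum(xi * e for xi, e in zip(x, ey)) / n - mx * new_my
        change = abs(new_my - my) + abs(new_syy - syy) + abs(new_sxy - sxy)
        my, syy, sxy = new_my, new_syy, new_sxy
        if change < tol:
            break
    return (mx, my), ((sxx, sxy), (sxy, syy))


# Hypothetical data: the true mean of y is 2 and the true covariance is 1.5.
rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(2000)]
y_full = [2 + 1.5 * xi + rng.gauss(0, 1) for xi in x]
# MAR: y is missing whenever the observed x exceeds 0.5.
y = [None if xi > 0.5 else yi for xi, yi in zip(x, y_full)]

(mean_x, mean_y), cov = em_bivariate(x, y)
observed_y = [v for v in y if v is not None]
complete_case_mean = sum(observed_y) / len(observed_y)
```

In this simulated MAR setting, the mean of y computed from the complete cases alone is biased downwards, whereas the EM estimate recovers a value close to the true mean.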

For this research, the dataset was first explored to determine whether a pattern existed in the missing data. This was done using Little’s MCAR test (Tabachnick and Fidell, 2007). Little’s test examines the pattern of the missing data and provides a statistic which, if non-significant, indicates that the data are consistent with being missing completely at random. For the current dataset, a Little’s test value of 0.232 was found, which is non-significant at the conventional .05 level and signifies that the missing data pattern was indeed MCAR. A dataset in which the missing data falls into the MCAR category is quite advantageous, as it allows the researcher to choose among the missing data treatments discussed above. The EM algorithm was chosen over more elementary techniques such as listwise and pairwise deletion, given its sophistication and ease of use in SPSS.