MISSING DATA AND COMPUTING SCORES TO FORM NEW MEASURES

Further steps with SPSS 8 for Windows

As we have seen, the satis1 score for the first participant and the satis1 and satis2 scores for the second participant are missing. In research, it is quite common for some scores to be missing. Participants may omit to answer questions, they may circle two different answers to the same question, the experimenter may forget to record a response, and so on. It is important to consider carefully how you are going to deal with missing data. If many of the data for one particular variable are missing, this suggests that there are problems with its measurement which need to be sorted out. Thus, for example, it may be a question which does not apply to most people, in which case it is best omitted. If many scores for an individual are missing, it is most probably best to omit this person from the sample since there may be problems with the way in which these data were collected. Thus, for example, it could be that the participant was not paying attention to the task at hand.

Box 3.8 Compute Variable dialog box

Table 3.3 Case Summaries output showing values of satis1, rsatis2, satis3, rsatis4 and satis

Where data for an index such as job routine are missing for some individuals, it is not appropriate to use the sum of the responses as an index since the total score will not reflect the same number of responses. For example, someone who answers ‘strongly agree’ (coded 5) to all four job routine items will have a total score of 20 whereas someone who strongly agrees with all items but who, for some reason, did not give an answer to one of them will have a total score of only 15. In other words, when we have missing data for items that constitute an index we need to take account of the missing data. In this situation a more appropriate index is the mean score of the non-missing values which would be 5 for the first (20/4=5) and for the second (15/3=5) individual. Another advantage of using the mean score for a scale such as job routine is that the mean score now corresponds to the answers to the individual items, so that an average score of 4.17 indicates that that person generally answers ‘agree’ to those items.
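The arithmetic in this example can be checked with a short sketch. Python is used here purely for illustration (it is not SPSS syntax), and the names total and mean_valid are hypothetical helpers, with None standing for a missing answer.

```python
# Responses to the four job routine items, coded 1-5; None marks a missing answer.
complete = [5, 5, 5, 5]        # strongly agrees with all four items
one_missing = [5, 5, 5, None]  # strongly agrees, but one answer is missing


def total(scores):
    """Sum of the non-missing scores."""
    return sum(s for s in scores if s is not None)


def mean_valid(scores):
    """Mean of the non-missing scores, or None if every score is missing."""
    valid = [s for s in scores if s is not None]
    return sum(valid) / len(valid) if valid else None


print(total(complete), total(one_missing))            # totals differ: 20 and 15
print(mean_valid(complete), mean_valid(one_missing))  # means agree: 5.0 and 5.0
```

The totals differ only because the number of answered items differs, whereas the mean of the valid answers is the same for both respondents.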

However, we would not generally want to derive an average score for someone who has a relatively large amount of data missing. A criterion sometimes applied is that if more than 10 per cent of the data for an index are missing, the index itself is defined as missing for that participant. If we applied this principle to the two participants in the Job Survey with missing satisfaction scores, no score for job satisfaction would be computed for them, although they would have scores for job routine and autonomy.

To compute a mean score we use the MEAN(numexpr,numexpr,…) function in the Compute Variable dialog box. If we want to specify a minimum number of values that must be non-missing for the mean to be produced, we type a full stop after MEAN, followed by the minimum number. We will use the four satis items to illustrate how this is done. With only four items, we cannot use a cut-off point of more than 10 per cent for exclusion as missing, since a single missing item already amounts to 25 per cent. Therefore, we will adopt a more lenient criterion of 50 per cent or more. If 50 per cent or more (i.e. 2 or more) of the scores for the job satisfaction items are missing, we will code that variable as missing for that participant. In other words, the minimum number of values that must be non-missing for the mean to be computed is 3. As before, the new variable is called satis but the numeric expression in the Numeric Expression: box is MEAN.3(satis1, rsatis2, satis3, rsatis4). If we examine the new values of satis in the Data Editor we see that it is 3.00 for the first case (9.00/3=3.00), a full stop (.), which denotes a missing value, for the second case (since there are only two valid values) and 3.75 for the third case (15.00/4=3.75).
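The logic of a minimum-valid-count mean can be sketched outside SPSS as follows. This is an illustrative Python sketch, not SPSS syntax: mean_min_valid is a hypothetical helper, None stands for a missing value, and the individual item values are assumptions chosen only so that the sums match those quoted in the text (9 over three items, 15 over four items).

```python
def mean_min_valid(scores, min_valid):
    """Mean of the non-missing scores, or None when fewer than min_valid are present."""
    valid = [s for s in scores if s is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)


# Hypothetical item values (satis1, rsatis2, satis3, rsatis4) for three cases.
case1 = [None, 3, 3, 3]     # one item missing, valid sum 9
case2 = [None, None, 4, 2]  # two items missing, so below the minimum of 3
case3 = [4, 4, 4, 3]        # no items missing, valid sum 15

for case in (case1, case2, case3):
    print(mean_min_valid(case, min_valid=3))
```

With a minimum of 3 valid values, the first and third cases receive a mean and the second is left missing.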

To display the mean value as a zero when it is missing, we use the Recode into Same Variables procedure in which we select System-missing in the box called Old Value and enter 0 in the box called Value: (in the New Value section) and select Add.
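The effect of this recode can be pictured with a small sketch in Python (used only for illustration, not SPSS syntax). The helper recode_missing is hypothetical, and None stands for a system-missing value.

```python
def recode_missing(value, new_value=0):
    """Return new_value in place of a missing (None) value; leave other values as they are."""
    return new_value if value is None else value


# Mean satisfaction scores for three cases; the second is missing.
satis = [3.0, None, 3.75]
recoded = [recode_missing(v) for v in satis]
print(recoded)
```

Only the missing entry changes; valid means are passed through untouched.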

To convert this mean score back into a total score (which takes into account numbers of valid scores that might vary between three and four), we simply multiply this mean by the maximum number of items which is 4. To do this we use the Compute Variable procedure where the new variable is still called satis and where the numeric expression is satis * 4.
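As a rough illustration of this conversion, the sketch below multiplies each mean by the number of items. It is written in Python rather than SPSS syntax; prorated_total and N_ITEMS are hypothetical names, and a missing mean (None) is assumed to stay missing.

```python
N_ITEMS = 4  # maximum number of items in the job satisfaction scale


def prorated_total(mean_score, n_items=N_ITEMS):
    """Convert a mean item score into a total on the full scale; missing stays missing."""
    if mean_score is None:
        return None
    return mean_score * n_items


means = [3.0, None, 3.75]
totals = [prorated_total(m) for m in means]
print(totals)
```

A case with three valid items and a mean of 3.00 is therefore scored as if all four items had been answered at that average level.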

Since we wish to form three new variables (job satisfaction, job autonomy and job routine), we have to repeat this Compute Variable procedure for the job routine and job autonomy items. Although we know there are no missing values for these two sets of variables, it does no harm to be cautious and assume there may be some as we have done. To determine if there are any rogue or missing values in data sets with which we are unfamiliar, we use the Frequencies procedure (see Chapter 5).

Aggregate measures of job satisfaction, job autonomy and job routine used in subsequent chapters have been based on summing the four items within each scale and assigning the summed score as missing where more than 10 per cent of the items were missing. Since 2 of the 70 cases in the Job Survey had one or two of the answers to the individual job satisfaction items missing, the number of cases for whom a summed job satisfaction score could be computed is 68. The summed scores for job satisfaction, job autonomy and job routine have been called satis, autonom and routine respectively. To do this for satis, for example, we first compute the mean score with the numeric expression MEAN.4(satis1 to satis4) and then convert this to a total score with the numeric expression satis * 4.
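The stricter rule described here, in which a summed score exists only when all four items are present, can be sketched as below. The code is illustrative Python, not SPSS syntax; summed_score is a hypothetical helper, and the five cases are invented for the example rather than taken from the Job Survey.

```python
def summed_score(items):
    """Mimic a mean requiring all items, then multiply by the item count; None if any item is missing."""
    if any(v is None for v in items):
        return None
    mean = sum(items) / len(items)
    return mean * len(items)


# Five hypothetical cases with four items each; None marks a missing answer.
cases = [
    [None, 3, 3, 3],
    [None, None, 4, 2],
    [4, 4, 4, 3],
    [2, 3, 2, 2],
    [5, 4, 4, 5],
]

scores = [summed_score(c) for c in cases]
n_valid = sum(s is not None for s in scores)
print(scores)
print(n_valid)
```

In the same way that two incomplete cases reduce the usable sample here, the two Job Survey cases with missing satisfaction items reduce the number of summed scores from 70 to 68.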

When we have a data set which has a large number of variables which we may not need for a particular set of analyses, we may find it more convenient to create a new file which contains only those variables that we are going to examine. For example, if we intend to analyze the aggregate variables of satis, routine and autonom and not the individual items that constitute them, then we can create a new file which holds these new variables (together with any of the other variables we need) but which does not contain the individual items. We delete the individual items by selecting the variable names satis1 to routine4 in the Data Editor and selecting Edit and Cut. We then save the data in the Data Editor in a new file which we will call ‘jssd.sav’ (for job scored survey data) and which we will use in subsequent analyses.

EXERCISES

1. What is the SPSS procedure for selecting men and women of African origin in the Job-Survey data?

2. Write the conditional expression for SPSS to select women of Asian or West Indian origin who are 25 years old or younger in the Job-Survey data.

3. What is the conditional expression for selecting participants who had no missing job satisfaction scores in the Job-Survey data?

4. What is the SPSS procedure for recoding the Job-Survey variable skill into the same variable but with only two categories (unskilled/semi-skilled vs fairly/highly skilled)?

5. What is the SPSS procedure for recoding the variable income into the new variable of incomec comprising three groups of those earning less than £5,000, between £5,000 and under £10,000, and £10,000 and over, and where missing values are assigned as zero?

6. Using the arithmetic operator *, express the variable weeks as days. In other words, convert the number of weeks into the number of days.
