

7.1. Generic changes in skill requirement

7.1.4. Impact of non-response

In the case of HILDA, non-response can be broken into three categories: attrition, where respondents drop out of the survey altogether; item non-response, where respondents complete most of the questionnaire but fail to answer individual questions; and SCQ non-response, where respondents complete the interview but fail to return the SCQ.

The extent of attrition has been discussed in subsection 4.3.2. The top line in Table 7.5 below, which is derived from counts included in the data file for each wave, illustrates its impact on the number of respondents who provided data for the interview questionnaire in each wave. This shows a loss of individual responding persons considerably less drastic than the overall loss of households, especially between Waves 1 and 2, where it amounted to 6.6% of responding persons as against 13% of households.

Item non-response among those who actually returned the SCQ appears to be little more than trivial. As the final column of Table 7.4 above shows, the number of respondents who provided data over all waves for WORKFLOW, the most frequently answered of the six key questions, exceeds by fewer than 50 the number who answered all three questions required for a score on the skill-intensity scale. A count in individual waves shows that, of those SCQ respondents who reported themselves or were presumed by the coders to be in paid employment, a maximum of 62 in any year failed to complete all three of these questions.

This leaves the issue of SCQ non-response as the most important contributor to sample loss. Table 7.5 below shows how this affected both the overall sample and the sample of employed respondents in each wave.

Wave                       1       2       3       4       5       6
Interviewed            13969   13041   12728   12408   12759   12905
No SCQ                   911    1403     981    1012    1294    1196
% no SCQ                6.52   10.76    7.71    8.16   10.14    9.27
Employed                8525    8088    7991    7822    8247    8357
Employed, no SCQ         525     885     589     625     857     797
% employed, no SCQ      6.16   10.94    7.37    7.99   10.39    9.54

Table 7.5 Interviewed respondents who failed to return the SCQ

These figures show that SCQ non-response peaked in Waves 2 and 5, at over 10% of the interviewed sample. In those two waves (and in Wave 6) the proportional impact was slightly higher for respondents who reported in the interview that they were employed at the time of survey. Between Waves 1 and 2 the rate of SCQ non-response for employed persons who completed the interview jumped by almost 78% (from 6.16% to 10.94%), amplifying the impact of overall sample attrition to reduce the achieved sample for the key questions by just under 800, or 10%. The rise in the non-return rate exceeded 30% again in Wave 5, but this time it was offset by a rise in the number of employed respondents interviewed, producing an achieved sample for this sequence around 200 greater than in Wave 4. How far this affected the reliability of the Wave 5 findings relative to the previous year depends on how closely any non-response bias affecting the SCQ in the two waves mirrored that affecting the interview.
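The arithmetic behind these comparisons can be reproduced directly from the counts in Table 7.5. The short Python sketch below is illustrative only: the variable names are not taken from the HILDA data file, and the "rise" figures are computed as relative changes in the non-return rate (the ratio of successive rates), which is how the percentages quoted above appear to have been derived.

```python
# Counts transcribed from Table 7.5 (HILDA, Waves 1-6).
interviewed = [13969, 13041, 12728, 12408, 12759, 12905]
no_scq = [911, 1403, 981, 1012, 1294, 1196]
employed = [8525, 8088, 7991, 7822, 8247, 8357]
employed_no_scq = [525, 885, 589, 625, 857, 797]


def rate(part, whole):
    """Percentage of `whole` represented by `part` (unrounded)."""
    return 100 * part / whole


# Non-return rates by wave: the two "% no SCQ" rows of the table.
rate_all = [rate(n, d) for n, d in zip(no_scq, interviewed)]
rate_emp = [rate(n, d) for n, d in zip(employed_no_scq, employed)]

# Employed respondents who did return the SCQ (achieved sample for the key items).
achieved_emp = [e - n for e, n in zip(employed, employed_no_scq)]

# Relative rise in the employed non-return rate, as a percentage of the earlier rate.
rise_w1_w2 = 100 * (rate_emp[1] / rate_emp[0] - 1)
rise_w4_w5 = 100 * (rate_emp[4] / rate_emp[3] - 1)

# Loss of interviewed persons between Waves 1 and 2.
interview_loss_w1_w2 = rate(interviewed[0] - interviewed[1], interviewed[0])

print([round(r, 2) for r in rate_all])  # [6.52, 10.76, 7.71, 8.16, 10.14, 9.27]
print([round(r, 2) for r in rate_emp])  # [6.16, 10.94, 7.37, 7.99, 10.39, 9.54]
print(achieved_emp)                     # [8000, 7203, 7402, 7197, 7390, 7560]
print(round(rise_w1_w2, 1), round(rise_w4_w5, 1))  # 77.7 30.1
print(round(interview_loss_w1_w2, 1))   # 6.6
```

Note the distinction between relative and absolute change: the Wave 1 to Wave 2 jump is under 5 percentage points (6.16 to 10.94) but close to 78% of the Wave 1 rate. The achieved employed sample falls by 797 (8000 to 7203) between Waves 1 and 2 and rises by 193 (7197 to 7390) between Waves 4 and 5.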

Given that these were the two years in which the sharpest change occurred in the aggregate means, this evidence provides some reason for caution in treating the whole of the change in each year as genuine. While nothing can be demonstrated conclusively, it seems prudent to assume that at least part of the drop in mean scores between Waves 1 and 2 was an artefact of non-response bias. Taken together with the evidence found earlier of a response effect influencing the movement in scores over the same pair of years, this possibility strengthens the argument against reading too much into that apparent shift.

As an additional rough check against systematic bias, tests were applied to see whether the same effects appeared in different parts of the questionnaire. Some kinds of response bias, if they occurred, might be expected to affect the results across most or all areas of the questionnaire, including questions unrelated to the subject matter of the variables studied in this research: an example would be the apparent tendency of respondents to experiment with more extreme scores the first time they saw or heard the questions, but revert to more conservative scores on subsequent occasions. Unexplained consistency across the responses on different topics, in different parts of the questionnaire, might be evidence of this kind of systematic bias. Other kinds of bias might arise from unobserved factors specific to particular parts of the questionnaire, e.g. the way particular sequences of questions were asked, the position of individual questions within a sequence or of that sequence within the questionnaire, respondent fatigue setting in towards the end of either questionnaire, or differences in the way respondents answered questions in the interview and the SCQ. These would show up as otherwise unexplained discrepancies between the response trends for the key variables of interest and those for questions in other sequences, or in the interview questionnaire, which covered similar topics.

To test these possibilities, five control variables were selected and subjected to the same tests as the main variables of interest. All represent subjective ratings of aspects of job quality, but with different emphases, and are asked in different contexts:

• Overall job satisfaction is a summary rating given by respondents at the end of a sequence of questions on individual aspects of their job (pay, work-life balance, etc.). It occurs in the interview questionnaire and uses an 11-point response scale. It correlates modestly with task discretion (around .26 in Wave 5), but only weakly (though still significantly) with skill-intensity;

• Chance of losing job in next 12 months is a percentage rating given by respondents in the interview questionnaire. Besides using a different kind of response scale, it differs from the main variables studied here in that it rates the probability of an event rather than expressing a simple opinion on a qualitative aspect of the respondent’s job. It might intuitively be expected to move in line with the state of the labour market, with the mean falling as the economy approaches full employment. Its correlations with the key variables of interest are statistically significant but very weak (<.05);

• Chance of voluntarily leaving job in next 12 months is the adjacent question in the interview questionnaire and uses the same response scale. However, it differs in measuring the respondents’ own intentions, and thus in indicating the extent to which respondents might be prepared to act on their views about the quality of their present job. In fact, it correlates very strongly (around .7) with the previous question, and more strongly with skill-intensity than with task discretion, though both of the latter correlations are well below .2;

• FAIRPAY (“I get paid fairly for the things I do in my job”) is part of the same sequence in the SCQ as the main variables of interest, but loaded on a different factor in the PCA. It correlates modestly with task discretion (.2) but non-significantly with skill-intensity;

• FUTURE (“I worry about the future of my job”) is also part of the same sequence but loaded on an unrelated factor, job stress. Its correlations with skill-intensity and task discretion are statistically significant but well below .1. It might be expected to correlate closely with “chance of losing job”, which addresses job insecurity in the interview, but the correlation between the two is only a little below .2, suggesting that respondents answered SCQ questions differently from questions on the same general topic in the interview.

On the repeated-measures ANOVA, all five control variables show very different patterns from those for the variables of primary interest. Job satisfaction shows no significant variation whatever, either between waves or over the full period, while the variation in FAIRPAY is significant at the .05 level over the full period, but not between any two waves. On none of the remaining three does any significant change appear either between Waves 1 and 2 or between Waves 4 and 5. On “chance of voluntarily leaving job” the only significant differences are between Waves 4 and 6, while on “chance of losing job” they appear between Wave 1 and Waves 3-6. The same divide occurs for FUTURE, although significant differences also appear between Wave 2 and the later waves. This suggests that, despite the relatively weak recorded correlation, the pattern of response has been essentially the same for the two questions that cover the same underlying construct in the interview and the SCQ.
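For readers less familiar with the technique, the core of a one-way repeated-measures ANOVA can be sketched in a few lines of Python. The sketch assumes a balanced panel (only respondents with a score in every wave are kept, as in the analysis described in this section), partitions the total sum of squares into wave, respondent and residual components, and returns the omnibus F statistic. The data and the helper names `balanced_panel` and `rm_anova` are hypothetical; the sketch applies no sphericity correction and does not perform the pairwise wave-by-wave comparisons reported above, so it is not a reconstruction of the procedure actually used here.

```python
from statistics import mean


def balanced_panel(responses, waves):
    """Keep only respondents with a score in every wave (listwise deletion)."""
    return {
        rid: [scores[w] for w in waves]
        for rid, scores in responses.items()
        if all(w in scores for w in waves)
    }


def rm_anova(rows):
    """One-way repeated-measures ANOVA on rows of k wave scores per respondent.

    Returns (F, df_waves, df_error). No sphericity correction is applied.
    """
    n, k = len(rows), len(rows[0])
    grand = mean(x for row in rows for x in row)
    wave_means = [mean(row[j] for row in rows) for j in range(k)]
    subj_means = [mean(row) for row in rows]

    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_waves = n * sum((m - grand) ** 2 for m in wave_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)
    # Residual variation left after removing wave and stable person effects.
    ss_error = ss_total - ss_waves - ss_subjects

    df_waves, df_error = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_waves / df_waves) / (ss_error / df_error)
    return f_stat, df_waves, df_error


# Hypothetical scores on a 1-7 item for five respondents over three waves.
responses = {
    "r1": {1: 5, 2: 4, 3: 4},
    "r2": {1: 6, 2: 5, 3: 5},
    "r3": {1: 4, 2: 4, 3: 3},
    "r4": {1: 7, 2: 5, 3: 6},
    "r5": {1: 3, 3: 2},  # missed Wave 2, so dropped from the balanced panel
}
rows = list(balanced_panel(responses, waves=[1, 2, 3]).values())
f_stat, df1, df2 = rm_anova(rows)
print(len(rows), df1, df2, round(f_stat, 2))  # 4 2 6 6.0
```

Removing each respondent's own average level (the `ss_subjects` term) is what makes the design sensitive to within-person change. The p-value for the returned F can be obtained from the F distribution with (df_waves, df_error) degrees of freedom, for example with `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available.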

Turning to the aggregate scores, the marked drop between the first two waves reappears for FUTURE and “chance of losing job” but not for the other three control variables, while the three variables that relate to expected job change all show a rise between Waves 4 and 5. The fishtail pattern shown in Figure 7.4 recurs in the responses across waves for those three variables where the response scale permits such an analysis (job satisfaction, FAIRPAY and FUTURE), again converging on the aggregate mean trendline rather than the midpoint of the scale, though the convergence is more gradual and the break between Waves 1 and 2 less sharp in the case of FAIRPAY.
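One simple descriptive check for the fishtail pattern is the share of valid responses lying on either end point of the scale in each wave: a sharp fall after Wave 1 that recurs across unrelated questions would be consistent with the kind of response effect discussed above. The Python sketch below assumes a 1-7 agreement scale by default (an 11-point item such as job satisfaction would use `lo=0, hi=10`), uses invented responses, and names a hypothetical helper `endpoint_share`; it is not the procedure used to produce Figure 7.4.

```python
def endpoint_share(scores, lo=1, hi=7):
    """Share of non-missing responses on either end point of a lo..hi scale."""
    valid = [s for s in scores if s is not None]  # None marks item non-response
    if not valid:
        return float("nan")
    return sum(1 for s in valid if s in (lo, hi)) / len(valid)


# Hypothetical responses to one SCQ item in two consecutive waves.
wave_1 = [1, 7, 7, 4, 1, 6, 2, None]
wave_2 = [2, 6, 7, 4, 3, 5, 2, 4]

for label, scores in (("Wave 1", wave_1), ("Wave 2", wave_2)):
    print(label, round(endpoint_share(scores), 3))
# Wave 1 0.571
# Wave 2 0.125
```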

The evidence provided by these control variables is incomplete and impressionistic. Nevertheless, it gives some cause for confidence that HILDA respondents can and do discriminate in their responses between questions in the same sequence covering different topics, and hence that the response patterns on the skill-related variables are indeed specific to the aspects of skill to which they refer rather than manifestations of some more broadly applying artefact, such as might result from panel conditioning, sequence effects or non-response bias. This applies in particular to the problematic breaks in the trendline at Waves 2 and 5, at least so far as the aggregate means are concerned, though this evidence is not sufficient by itself to demonstrate that they represent actual change. At the same time, the broad similarity in response patterns between the two control variables that refer to the construct of perceived job insecurity in the interview and the SCQ respectively suggests that SCQ non-response need not be biasing the findings as badly as might be feared. On the other hand, the fishtail effect, where the distribution of responses shifts sharply away from the extreme points on the scale after the initial wave, appears to apply across both questionnaires and a range of topics, and hence must remain under consideration as a plausible explanation for at least part of the first of those breaks.

7.1.5. Summary

This critical review of the key trends in the data has been more exhaustive than would have been necessary had those trends been more pronounced or consistent, or had there been a longer run of data within which to locate them and assess their ecological significance. It remains ultimately inconclusive. Each of the three methods applied to measure the change has its own strengths and weaknesses as a basis for valid inference to the population, and none emerges as the most methodologically compelling on all criteria. Their findings conflict on some matters that are critical to making sense of the data. The choice of which method to prefer in each case, or how best to reconcile the inconsistencies between their findings, must ultimately be a matter of judgement rather than the unequivocal outcome of formal analysis. To summarise their strengths and weaknesses:

• The aggregate means have the advantages of staying closest to the recorded results and maximising the sample available for analysis in any one wave. However, they are susceptible to bias from random (or, worse still, non-random) variations in the composition of the sample from wave to wave due to differential rates of non-response. In this respect they undermine many of the arguments for using a panel sample. The fluctuations in sample size across the waves are so substantial by comparison with the actual movements in the indicators that in some waves at least – notably those where the greatest average change appears in the data – they could quite feasibly account for all of that change.

• The binned scores provide some insurance against misinterpreting purely random variations in individuals’ scores across waves for the same level of agreement, if one assumes that such random movement is most likely to occur around the middle of the scale, where the choice is a matter of greatest indifference. However, while this assumption is intuitively persuasive, there is nothing in the data to prove or disprove whether it actually applies in this instance. Similarly, they compensate for the possibility that some respondents will use central scores as a substitute for the missing “Don’t know/not applicable” response category; but this (if it in fact occurs) is more likely to be a problem with the interview questions than with the SCQ, where respondents who genuinely cannot commit to an opinion have the option of leaving the question unanswered without risk of embarrassment. Perhaps the strongest argument for their use is that they compensate for the main inferential problem arising from the use of a response scale with no verbal anchors for the intermediate points, namely that no two respondents can be guaranteed to perceive the same distance between the same two points on what is, after all, an ordinal scale. A score towards one end or the other, though it cannot be confidently assumed to represent the same intensity of opinion for all the respondents who give it, can at least be taken as representing a clear preference one way or the other. From this point of view the binned scores are useful for extracting strong or unequivocal trends over this specific period. In a longer-term perspective, however, they may equally conceal more pervasive trends that emerge only gradually and have their main impact on respondents whose opinion lies around the centre of the distribution.

• The use of repeated-measures ANOVA on the set of respondents who answered in all waves provides the most rigorous method of formally estimating the statistical significance of recorded changes from wave to wave, and eliminates any contribution of unintended sample variation. Restricting the analysis to identifiable changes from individuals’ Wave 1 responses can be seen as enhancing the accuracy of the findings, because the Wave 1 sample was the closest to the original designed sample and hence can be assumed to be the most representative of the population. From another perspective, though, it represents a weakness: given the known high levels of non-response, the set of respondents who answered all the questions in all waves can reasonably be expected to differ from less conscientious respondents on some dimensions that critically influence their expected scores. Though the results may be highly accurate for the specific population they represent, it is less certain that they can be accurately extrapolated to the broader population of interest.

More sophisticated modelling might go some way further towards resolving these uncertainties, but the only sure remedy, as has already been stressed several times, is a longer run of data. Pending this, a number of interim conclusions can be drawn, perhaps not altogether safely, but with sufficient confidence to form a basis for further analyses. The closest thing to a certain trend that appears in all three methods is the decline in the average skill-intensity of Australian jobs between 2001 and 2006. While small, this trend appears to be statistically significant and strong enough to offset the rise in mean scores that occurred in Wave 5, at least over the period for which data are so far available. The main uncertainty attaching to this trend is the degree to which it depends on the fall in means between the first two waves, which more than accounts for the full difference between the first and latest waves on both composite scales. Without this movement, the true size of which is open to doubt because of the apparent contributions of sample variability and panel conditioning, the picture would look very different.

The direction of overall movement on task discretion is more equivocal, but in any case the movement so far appears to have been quite small, though statistically significant. The two individual variables that appear to stand out against the trend by showing a net gain in aggregate mean scores over the five years are COMPLEX and WORKFLOW. However, the failure of this countervailing pattern to show up in the binned scores for either variable suggests that most of the gain is taking place around the middle of the response scale, where it could include a large element of random variation in individuals’ scores.
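The reasoning about COMPLEX and WORKFLOW can be illustrated with invented data: if responses move only within the middle of the scale, the mean rises while the shares in the top and bottom bins do not change at all. The Python sketch below assumes a 1-7 agreement scale with a bottom bin of 1-2 and a top bin of 6-7; these cut points, the helper `binned_shares` and the data are assumptions for illustration, not necessarily the bins used for the binned scores in this thesis.

```python
from statistics import mean


def binned_shares(scores, low_max=2, high_min=6):
    """Shares of responses in the bottom bin (<= low_max) and top bin (>= high_min).

    The default cut points assume a 1-7 agreement scale and are illustrative only.
    """
    n = len(scores)
    low = sum(1 for s in scores if s <= low_max) / n
    high = sum(1 for s in scores if s >= high_min) / n
    return round(low, 3), round(high, 3)


# Hypothetical responses in two waves: four respondents move up one point,
# all within the middle of the scale (3-5), so no one crosses a bin boundary.
wave_a = [1, 2, 3, 3, 4, 4, 5, 6, 7, 7]
wave_b = [1, 2, 4, 4, 5, 5, 5, 6, 7, 7]

print(mean(wave_a), mean(wave_b))                    # 4.2 4.6
print(binned_shares(wave_a), binned_shares(wave_b))  # (0.2, 0.3) (0.2, 0.3)
```

The converse also holds: a movement confined to the middle of the scale that is real, rather than random, would be just as invisible in the binned scores, which is the longer-term risk noted for that method.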

It also appears reasonably clear that, for whatever reason, representative scores (including those for some negative indicators of job quality) rose across the board in Wave 5, though on two of the three measures used in this section the rise does not appear to have been sustained. The lack of any obvious external explanation makes it necessary to treat this finding too with caution, especially as it also occurred in a year with an unusually high proportion of missing SCQs. It may also be due at least in part to a response effect caused by the addition of nine new relevant variables to the sequence in that year and, by analogy with what appears to have happened in Wave 2, this could also account for some of the drop in scores in the following wave. Nonetheless it appears sufficiently robust to be treated as genuine until clearer evidence emerges to disprove it. Some of the evidence from the ANOVA opens the possibility that, on some variables at least, it might signal the beginning of a more sustained upward trend. If this proves to be the case as more waves of data become available, it will require a thorough revision of many of the tentative interpretations placed on the data in this thesis.

On present indications, however, it can at least be said with reasonable confidence that the null hypothesis is not supported, since on the best available evidence the change in both the skill-intensity and the task discretion dimensions of skill was statistically significant over this period, albeit neither consistent nor steady. With equal confidence it can be said that the change was neither as marked nor as rapid as might be inferred from public discussion of a skills crisis over these years. Indeed, most or all of the change occurred in the opposite direction to what might have been expected.

7.2. Contributions of generic and compositional change

In document Australia’s national skilling system and its trajectory: A model and analysis for the period 2001-2006 (Page 184-190)