
Research questions and methodology

6.3. Measuring the three processes

Using the same dataset to investigate three distinct processes can pose methodological problems. First and foremost, the assumed reliability of the data will vary according to the process which is the subject of an individual analysis. Change in the aggregate indicators which results from variability in the occupational composition of the panel over successive waves must be treated as error when the purpose is to track generic skill change (process 1), but where the objective is to track compositional change (process 3), it constitutes valid and relevant data – subject to the important assumption that these changes in panel composition accurately mirror changes in the occupational composition of the employed workforce as a whole. More generally, the process which is the object of inquiry determines whether it is more appropriate to treat the data as true longitudinal panel data (i.e. following individual members of the panel across waves) or as a series of cross-sections.

As a general consideration, panel data permit the identification of two types of change. One is cross-sectional change, where the statistic of interest for the sample, either as a whole or disaggregated according to selected classificatory parameters, changes from wave to wave. For example, aggregate mean scores on a skill-intensity scale may vary from year to year, as may the distribution of scores on the scale across industries and occupations; the probability of an individual reporting a certain score in a given year can be calculated according to the category to which she belongs, but the data do not directly map individual trajectories. The other is gross change (Kalton, Kasprzyk and McMillen 1989: 264), which occurs at the level of individuals, or groups or cohorts of individuals, from wave to wave; by extrapolating from these individual trajectories it may be possible to arrive at rules or hypotheses predicting the future behaviour or experience of specified categories within the population, even if their actual membership is unknown. Ordinary time-series data sources, which draw a fresh sample for each time period, can shed light only on the former.
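The distinction between cross-sectional and gross change can be made concrete with a toy unbalanced panel. The sketch below is a minimal Python illustration using invented scores and hypothetical names (`panel`, `cross_sectional_means`, `gross_changes`); it does not use HILDA data. The wave means capture cross-sectional change, while the per-person differences capture gross change and exist only for respondents observed in both waves.

```python
from statistics import mean

# Invented toy panel: person id -> {wave: score on a skill-intensity scale}.
# Person 3 drops out after wave 1 and person 4 joins in wave 2 (unbalanced panel).
panel = {
    1: {1: 3, 2: 4},
    2: {1: 5, 2: 4},
    3: {1: 2},
    4: {2: 5},
}

def cross_sectional_means(panel, waves):
    """Cross-sectional change: the sample mean in each wave, whoever answered."""
    return {w: mean(r[w] for r in panel.values() if w in r) for w in waves}

def gross_changes(panel, w0, w1):
    """Gross change: per-person differences for respondents present in both waves."""
    return {pid: r[w1] - r[w0] for pid, r in panel.items() if w0 in r and w1 in r}

print(cross_sectional_means(panel, [1, 2]))
print(gross_changes(panel, 1, 2))
```

In this toy example the sample mean rises by one point between waves, even though one continuing respondent's score rose and the other's fell; the net rise comes from a low scorer leaving and a high scorer joining, a compositional effect that a series of cross-sections cannot separate from change within individuals.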

For many purposes the two can be treated as more or less interchangeable. For example, the experience or performance of an age cohort over time can be tracked either by selecting those individuals who reached the specified age in the first wave and tracking their individual or group progress across waves, or by taking the set of sample members in each wave who fall within the appropriate age range for that year; either approach is capable of producing statistically reliable findings. However, the strength of gross change is that it provides accurate information about flows, e.g. between programs, between employment and unemployment, from casual to permanent employment, or between industries or occupations. It can also be more informative about causal mechanisms, especially those which take effect over several years. Each type of change is suited to particular kinds of inferential analysis, and each is vulnerable to different kinds of error. Both have a role in tracking each of the three processes, but their relative importance varies from one process to the next.
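Flows of the kind just mentioned are usually summarised as a transition table that cross-classifies each respondent's state in one wave by their state in the next. A minimal Python sketch follows, with invented states and hypothetical function names (`stock_counts`, `flows`), contrasting the stock counts that a cross-section yields with the gross flows available only from matched records.

```python
from collections import Counter

# Invented labour-force states, keyed by a hypothetical respondent id.
states_w1 = {101: "casual", 102: "permanent", 103: "unemployed", 104: "casual"}
states_w2 = {101: "permanent", 102: "permanent", 103: "casual", 104: "casual", 105: "casual"}

def stock_counts(states):
    """Cross-sectional view: how many respondents are in each state in one wave."""
    return Counter(states.values())

def flows(before, after):
    """Gross flows: count (origin, destination) pairs for people seen in both waves."""
    return Counter((before[p], after[p]) for p in before.keys() & after.keys())

print(stock_counts(states_w1), stock_counts(states_w2))
print(flows(states_w1, states_w2))
```

Here the stock of casual workers rises from two to three, but the flow table shows one casual worker moving into permanent employment and one unemployed person entering casual work, while respondent 105, not observed in wave 1, appears only in the second cross-section.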

A generic shift in the skill content of work resulting from process 1 will generally be reflected in cross-sectional change at the aggregate level, as a residual once the influence of the changing balance of employment across industries and occupations has been controlled for. Thus it is useful to look first for changes at the aggregate level from wave to wave. Even if the aggregates do not change significantly because the impact of process 1 has been offset by contrary trends in the other two processes, its net contribution can often still be identified by controlling for their impact through regression models.
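The text proposes regression models for netting out the effect of a changing employment mix. A simpler related device, swapped in here purely for illustration and not the method used in this study, is a shift-share decomposition: the change in an aggregate mean is split into a within-occupation component (change in skill scores weighted by two-wave average employment shares) and a between-occupation component (change in shares weighted by two-wave average scores). The occupation labels, shares and scores are invented, and the sketch assumes the same set of occupations appears in both waves.

```python
# Invented occupation-level inputs: (employment share, mean skill score) per wave.
wave0 = {"managers": (0.2, 4.0), "clerical": (0.5, 3.0), "labourers": (0.3, 2.0)}
wave1 = {"managers": (0.3, 4.2), "clerical": (0.4, 3.1), "labourers": (0.3, 2.0)}

def aggregate(wave):
    """Aggregate mean score: employment-share-weighted average of occupation means."""
    return sum(share * score for share, score in wave.values())

def shift_share(w0, w1):
    """Split aggregate change into within- and between-occupation components.

    change = sum(avg_share * d_score) + sum(avg_score * d_share), exactly,
    when averages are taken over the two waves (no interaction residual).
    """
    within = sum((w0[k][0] + w1[k][0]) / 2 * (w1[k][1] - w0[k][1]) for k in w0)
    between = sum((w0[k][1] + w1[k][1]) / 2 * (w1[k][0] - w0[k][0]) for k in w0)
    return within, between

within, between = shift_share(wave0, wave1)
print(aggregate(wave1) - aggregate(wave0), within, between)
```

With these invented figures the aggregate mean rises by 0.2, of which about 0.095 is within-occupation change (a candidate for process 1) and about 0.105 reflects the shift of employment towards the higher-scoring managerial group (process 3). A regression with occupation controls, as the text proposes, generalises the same idea to several controls at once.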

However, a more sensitive indication of the incidence of process 1 can be achieved, albeit at the cost of some sample loss, by concentrating the analysis on those members of the panel who continued to occupy the same job across waves. Strictly speaking, the purpose of such an analysis is to track the same jobs rather than the same individuals; the individual identity is simply the only reliable marker available for the identity of the job. Generic changes can be identified by paired-samples t-tests on those respondents who answered the same question in adjacent waves, or by one-way repeated-measures ANOVA for longer periods, excluding from the analysis those who changed jobs between surveys. (Those who were unemployed or not in the labour force in the previous wave are automatically excluded by the requirement for an answer in both waves, since the questions of primary interest are asked only of respondents who are currently employed.) The results will reveal the extent to which changes in aspects of skill affected jobs already in existence. To the extent that the panel is representative of the employed population, the results of this analysis will be predictive of the experience that an average member of that population can expect in comparable employment circumstances.
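For the job-stayer analysis, a paired-samples t-test on wave-to-wave differences can be sketched in plain Python. The record layout (two scores plus a hypothetical same-job flag) and the data are invented; a statistics package would also report the p-value (in Python, `scipy.stats.ttest_rel` does this), which the sketch omits.

```python
import math
from statistics import mean, stdev

# Invented records: (score in wave t, score in wave t+1, stayed in same job?).
records = [
    (3, 4, True), (2, 3, True), (4, 4, True), (3, 5, True),
    (5, 2, False),    # changed jobs between surveys: excluded
    (4, None, True),  # no answer in the second wave: excluded
]

def paired_t(records):
    """Paired-samples t statistic and degrees of freedom for job stayers."""
    # The filter runs before b - a is evaluated, so None values never reach it.
    diffs = [b - a for a, b, same_job in records
             if same_job and a is not None and b is not None]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

print(paired_t(records))
```

The filter mirrors the exclusions described above: job changers and anyone without an answer in both waves drop out before the differences are taken.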

For process 2, on the other hand, cross-sectional analyses for each wave will be more informative. The objective here is to determine whether and how the skill profile of each industry or occupation changes from year to year, regardless of who occupies the jobs or whether they are the same jobs as in the previous year. Even if there is substantial turnover between waves among the individuals making up the sample, the results will remain reliable so long as the sample is equally representative in both waves – in effect, so long as the variability in panel composition is random. Strictly speaking, that part of the change in each industry or occupational profile which results from the upgrading of current jobs constitutes noise, or at any rate double-counting if the same analysis is intended to shed light simultaneously on processes 1 and 2. However, some of this confusion can be avoided by concentrating on the movements in the skill intensity of different industries and occupations relative to one another, on the assumption that any growth in the overall skill content of work will be manifested in a rise in the base level of skill across all categories.

Identifying the impacts of process 3, i.e. true compositional change, strictly requires the reverse of the approach taken for process 1. Theoretically, the most valuable informants in a longitudinal sense are those respondents who have changed jobs at some time over the six waves, moving to a different occupation and/or industry, and have experienced a change in the skill demands of their jobs as a result. However, the size of the HILDA sample offers little potential for tracking these cases at any useful level of disaggregation. Cross-sectional analysis can still provide part of the picture so long as it is possible to link any changes in the overall indicators of skill intensity to changes in the industry/occupational profile of the sample – assuming, once again, that the latter accurately mirrors changes in that of the workforce.
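The relative-movement approach suggested above for process 2 can be sketched by centring each occupation's mean score on the overall mean for its wave, so that a uniform rise in the base level of skill cancels out. The data and names are invented, and the sketch ignores survey weights, which a real analysis would need to apply.

```python
from collections import defaultdict
from statistics import mean

# Invented cross-sections: wave -> list of (occupation, skill score) pairs.
waves = {
    1: [("managers", 4), ("managers", 5), ("clerical", 3), ("clerical", 2)],
    2: [("managers", 5), ("managers", 5), ("clerical", 3), ("clerical", 3)],
}

def relative_profile(rows):
    """Each occupation's mean score minus the overall (unweighted) mean for the wave."""
    by_occ = defaultdict(list)
    for occ, score in rows:
        by_occ[occ].append(score)
    overall = mean(score for _, score in rows)
    return {occ: mean(scores) - overall for occ, scores in by_occ.items()}

for wave, rows in waves.items():
    print(wave, relative_profile(rows))
```

In this example both occupations gain half a point in absolute terms, but their positions relative to the wave mean are unchanged, so the centred profiles register no relative movement between occupations.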

As a first step towards carrying out longitudinal analyses of gross change, and to simplify the specification of cross-sectional analyses which need to be carried out over multiple waves, a short longitudinal file was created using the SPSS MERGE FILES procedure. Source files for this procedure were the short working files constructed for each wave, each of which retained only between 85 and 135 relevant variables from the responding persons (Rperson) file for the corresponding wave. These single-wave working files were merged with a master file for all waves supplied as part of the Wave 6 data release, with individual respondent records sorted and matched on the cross-wave identifier included among the unit record variables for each wave. This operation was based on the syntax set out on page 25 of the 2008 HILDA Manual for creating an unbalanced wide longitudinal file. The resulting file contains the relevant variables for each year in which the respondent was interviewed or returned the self-completion questionnaire (SCQ), depending on the variable. Variables are ordered by wave: the full set of selected variables for one year, followed by the full set for the next. Responses to an individual variable in different years can be distinguished by the wave-identifying character with which each variable name begins.
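The logic of the unbalanced wide merge can be illustrated outside SPSS with a small Python sketch: each single-wave file is outer-joined on the cross-wave identifier, and because every variable name carries its wave prefix, columns from different waves never collide. The identifier name `xwaveid`, the prefix letters and the records are assumptions made for illustration; this is not the syntax from the HILDA Manual.

```python
# Invented single-wave working files: wave letter -> list of person records.
# Variable names carry the wave prefix (e.g. ahgsex, bhgsex); "xwaveid" is
# assumed here as the name of the cross-wave identifier.
wave_files = {
    "a": [{"xwaveid": "0100001", "ahgsex": 1, "ahhiage": 25},
          {"xwaveid": "0100002", "ahgsex": 2, "ahhiage": 40}],
    "b": [{"xwaveid": "0100001", "bhgsex": 1, "bhhiage": 26},
          {"xwaveid": "0100003", "bhgsex": 2, "bhhiage": 31}],
}

def merge_wide(wave_files, key="xwaveid"):
    """Outer-join single-wave files on the cross-wave id: one wide record per person.

    Unbalanced: a person is included if interviewed in any wave, and simply lacks
    the variables for waves in which they were not interviewed.
    """
    wide = {}
    for letter in sorted(wave_files):  # process waves in order a, b, c, ...
        for rec in wave_files[letter]:
            row = wide.setdefault(rec[key], {key: rec[key]})
            for var, value in rec.items():
                if var != key:
                    row[var] = value  # fields accumulate wave by wave
    return [wide[k] for k in sorted(wide)]

for row in merge_wide(wave_files):
    print(row)
```

Respondent 0100002 carries only wave-a variables and 0100003 only wave-b variables, which is what makes the file unbalanced; within each record the fields appear wave by wave, matching the ordering described above.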

The resulting file was then checked for accuracy of cross-wave respondent matching by drawing a small random sample of respondents and visually comparing their recorded responses over all waves on two variables, Sex (_hgsex) and Age at last birthday before the date of interview (_hhiage). This check was backed up by taking two subsets of Wave 1 respondents, those who gave their sex in that year as male and those who gave their age as 25, and running frequency counts on their responses to the same question in each subsequent year. All cases matched perfectly across all waves on Sex, while the mean age advanced by one year for each successive wave with a range not exceeding 0.2 years, a discrepancy explainable by differences in the time of year at which respondents were interviewed in adjacent waves.
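The two consistency checks can be expressed as simple rules over the wide file: a respondent's recorded sex should take a single value across all waves, and the mean age of the Wave 1 cohort aged 25 should rise by about one year per wave. The sketch below reads the 0.2-year range as the maximum permitted deviation of each wave-to-wave increment from one year, which is an interpretation; the records, wave letters and function names are invented.

```python
from statistics import mean

WAVES = "abc"  # hypothetical: three waves with prefixes a, b, c

# Invented wide records; the last one has an inconsistent sex code,
# which would point to a faulty cross-wave match.
rows = [
    {"ahgsex": 1, "bhgsex": 1, "chgsex": 1, "ahhiage": 25, "bhhiage": 26, "chhiage": 27},
    {"ahgsex": 2, "bhgsex": 2, "ahhiage": 25, "bhhiage": 26},
    {"ahgsex": 1, "bhgsex": 1, "chgsex": 1, "ahhiage": 25, "bhhiage": 26, "chhiage": 27},
    {"ahgsex": 1, "bhgsex": 2, "ahhiage": 30, "bhhiage": 31},
]

def sex_consistent(row):
    """True if every recorded sex value for this respondent is the same."""
    values = {row[f"{w}hgsex"] for w in WAVES if f"{w}hgsex" in row}
    return len(values) == 1

def cohort_mean_ages(rows, base_age=25):
    """Mean recorded age in each wave for respondents aged base_age in wave a."""
    cohort = [r for r in rows if r.get("ahhiage") == base_age]
    return {w: mean(r[f"{w}hhiage"] for r in cohort if f"{w}hhiage" in r) for w in WAVES}

def increments_ok(means, tol=0.2):
    """Check that each wave-to-wave rise in mean age is within tol of one year."""
    ages = [means[w] for w in WAVES]
    return all(abs((later - earlier) - 1) <= tol for earlier, later in zip(ages, ages[1:]))

print([sex_consistent(r) for r in rows])
print(increments_ok(cohort_mean_ages(rows)))
```

Respondents missing from a wave (the second record has no wave-c data) are simply left out of that wave's mean, as they would be in a frequency count on the unbalanced file.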

6.4. Construction of composite scales for the two dimensions