Pilot studies to establish data validity - Creation and validation of the SLCS

Chapter 4 Construction and validation of the survey instrument

4.3 Creation and validation of the SLCS

4.3.4 Pilot studies to establish data validity

The two pilot studies conducted in the fourth step of scale development aimed to test the feasibility of the survey and to identify modifications needed before conducting a larger study. This included testing the logistical components of the survey, such as participant recruitment, return rates and completion rates, and the effectiveness of the online portal. Two key goals of the pilot studies were to test the reliability and validity of the data collected using the scale, and to check and rehearse the statistical and analytical processes to determine their efficacy in establishing the construct validity of the survey instrument.

Invitations to take part in the first pilot of the survey instrument were sent via email to the principals, deputy principals, head teachers, assistant principals and class teachers of five primary schools and five high schools in the NSW public system. For this pilot, forty-three responses were obtained within a month, and of these responses, forty provided complete datasets. Different opinions were found in the research literature about a suitable size for a pilot study. Hertzog (2008) advises that there is no simple or straightforward formula for the number of samples required for a pilot study, as different types of studies are influenced by different factors. However, Connelly (2008) maintains that a pilot study sample should be at least ten per cent of the larger parent study sample, and others suggest that ten to thirty participants would be sufficient (Hill, 1998; Julious, 2005; Van Belle, 2002).

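It was estimated that the actual survey would capture between 300 and 400 responses. Ten per cent of that range is 30 to 40 responses; therefore, the forty complete datasets obtained in this pilot (N = 40) were considered an adequate sample, as they met Connelly's (2008) guideline even at the upper estimate.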
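The sample-size reasoning above can be checked with a few lines of arithmetic. The following is a minimal sketch only: the function name `minimum_pilot_size` is hypothetical, and the ten per cent figure is simply Connelly's (2008) guideline as quoted above.

```python
import math


def minimum_pilot_size(parent_sample_size, percent=10):
    """Smallest pilot size that meets a percent-of-parent-sample guideline."""
    return math.ceil(parent_sample_size * percent / 100)


# Expected range of responses for the main survey, as estimated in the text.
expected_low, expected_high = 300, 400
# Complete datasets obtained in the first pilot.
pilot_complete_responses = 40

required_low = minimum_pilot_size(expected_low)
required_high = minimum_pilot_size(expected_high)
meets_guideline = pilot_complete_responses >= required_high

print(required_low, required_high, meets_guideline)  # 30 40 True
```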

Statistical analysis of data captured from the first pilot

Factor analysis is a statistical technique commonly used to reduce data to a smaller set of summary variables and to detect structure in the relationships between variables. In the pilot, internal consistency was checked using inter-item correlations computed with the Statistical Package for the Social Sciences (SPSS). Reliability was estimated using Cronbach’s alpha, which is used for estimating reliability for item-specific variance in a unidimensional test (Cortina, 1993). Principal Axis Factoring (PAF) was used to extract factors, followed by an oblique rotation (Promax) that generates a pattern matrix showing which items load on which factors (Field, 2011).

Data from the ‘importance rating’ scale

In this first pilot study, an inter-item correlation matrix was generated for each capability set for both the importance rating and the strength rating, which indicated high internal consistency. The pilot statistics are given in detail in Appendix 2.3.
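For readers unfamiliar with the technique, an inter-item correlation matrix can be sketched as follows. This is an illustration only, not the SPSS procedure used in the study: the response data are invented, and the helper names `pearson` and `inter_item_correlations` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


def inter_item_correlations(responses):
    """Correlate every item with every other item.

    `responses` holds one row per participant and one column per item.
    """
    items = list(zip(*responses))  # transpose rows into item columns
    count = len(items)
    return [[pearson(items[i], items[j]) for j in range(count)]
            for i in range(count)]


# Invented Likert responses: five participants, three items.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
matrix = inter_item_correlations(responses)
for row in matrix:
    print([round(value, 2) for value in row])
```

Consistently high off-diagonal values in such a matrix are what the text describes as high internal consistency.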

The raw scores collected from the importance rating scale indicated a high tendency to rate most items in all five capability sets as (1) ‘Very Important’ or (2) ‘Important’. Statistically, the data collected showed internal consistency, suggesting that the items within each set measured the same construct, with Cronbach’s alpha scores of between 0.87 and 0.94 across the sets (Set 1: 0.874; Set 2: 0.922; Set 3: 0.919; Set 4: 0.938; Set 5: 0.923). Cronbach’s alpha quantifies the degree of internal consistency (reliability), and a score higher than 0.7 is considered reliable (Field, 2011).
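Cronbach’s alpha for a set of k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below applies that standard formula to invented data; it is not the study's SPSS output, and the function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of participants and columns of items."""
    items = list(zip(*responses))  # transpose rows into item columns
    k = len(items)
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Invented Likert responses: five participants, three items.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(responses)
is_reliable = alpha > 0.7  # threshold cited from Field (2011)
print(round(alpha, 3), is_reliable)
```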

An Exploratory Factor Analysis (EFA) technique, specifically Principal Axis Factoring (PAF), was used to explore the relationships among the items in the five capability sets and to identify underlying factors and dimensions. For each of the five sets, two items were removed, either because most participants did not rate them as very important or important, or because their item-total correlations were lower than .50. The aim of this step in factor extraction was to retain factors with eigenvalues greater than 1 and to remove weakly related items (Costello & Osborne, 2005; Field, 2011). More detail of the analysis is provided in Appendix 2.4.
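The item-total screening described above can be illustrated with a short sketch. It assumes the corrected form of the item-total correlation (each item correlated with the sum of the remaining items), which the text does not specify; the data are invented and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


def corrected_item_total(responses):
    """Correlate each item with the total of all the other items."""
    items = list(zip(*responses))
    correlations = []
    for index, item in enumerate(items):
        rest_totals = [sum(row) - row[index] for row in responses]
        correlations.append(pearson(item, rest_totals))
    return correlations


def items_below(correlations, threshold=0.50):
    """Indices of items whose item-total correlation is under the cut-off."""
    return [i for i, r in enumerate(correlations) if r < threshold]


# Invented responses: the fourth item is only weakly related to the others.
responses = [
    [1, 2, 1, 3],
    [2, 2, 3, 3],
    [3, 4, 3, 2],
    [4, 4, 5, 4],
    [5, 5, 4, 3],
]
correlations = corrected_item_total(responses)
flagged = items_below(correlations)
print([round(r, 2) for r in correlations], flagged)
```

In this invented example only the fourth item (index 3) falls below .50 and would be a candidate for removal.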

Data from the ‘strength rating’ scale

An analysis of the strength rating was conducted on the remaining eight items within each set. Raw scores collected from the ‘strength rating’ scale showed a more evenly distributed selection across the five Likert choices and indicated a progressively larger standard deviation across the sets, as shown in Table 4.5 below.

As with the ‘importance rating’ scale, data collected from the ‘strength rating’ scale showed internal consistency, with Cronbach’s alpha scores of between 0.70 and 0.95 for all five sets (Set 1: 0.74; Set 2: 0.89; Set 3: 0.92; Set 4: 0.91; Set 5: 0.93), indicating acceptable internal consistency for all leadership sets.

A separate EFA was conducted for each of the five leadership sets. This was considered more suitable than conducting one single factor analysis across all five sets simultaneously, because the survey was not intended to be used to assess correlations between sets, but instead individuals’ strengths within each set. For each factor analysis, PAF was conducted, a method used to identify the true underlying factor structure of a scale. An oblique rotation via the Promax method was chosen to examine the correlation matrix, as this produces factor structures that are easily interpretable (Field, 2011).
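The core idea of PAF, replacing the diagonal of the correlation matrix with communality estimates and re-estimating them until they stabilise, can be sketched for the simplest case of a single factor. This is a simplified illustration, not the SPSS routine used in the study: it extracts one factor only (so no Promax rotation is involved), it starts from each item's largest absolute correlation rather than the squared multiple correlation, and the function names and the example matrix are hypothetical.

```python
import math


def dominant_eigenpair(matrix, iterations=200):
    """Largest-magnitude eigenvalue and unit eigenvector, by power iteration."""
    size = len(matrix)
    vector = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(size))
                   for i in range(size)]
        norm = math.sqrt(sum(value * value for value in product))
        vector = [value / norm for value in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(size))
               for i in range(size)]
    eigenvalue = sum(v * p for v, p in zip(vector, product))
    return eigenvalue, vector


def paf_single_factor(correlations, max_rounds=100, tolerance=1e-6):
    """Iterated principal axis factoring for one factor; returns loadings."""
    size = len(correlations)
    # Starting communalities: largest absolute off-diagonal correlation per item.
    communalities = [max(abs(correlations[i][j]) for j in range(size) if j != i)
                     for i in range(size)]
    loadings = [0.0] * size
    for _ in range(max_rounds):
        # Reduced matrix: correlations with communalities on the diagonal.
        reduced = [row[:] for row in correlations]
        for i in range(size):
            reduced[i][i] = communalities[i]
        eigenvalue, vector = dominant_eigenpair(reduced)
        loadings = [math.sqrt(max(eigenvalue, 0.0)) * v for v in vector]
        updated = [loading * loading for loading in loadings]
        change = max(abs(a - b) for a, b in zip(updated, communalities))
        communalities = updated
        if change < tolerance:
            break
    return loadings


# Invented correlation matrix: three items that all correlate at .50.
example = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
loadings = paf_single_factor(example)
print([round(value, 3) for value in loadings])
```

With this invented matrix each item loads at about .707 on the single factor, since .707 squared reproduces the common correlation of .50.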

The pilot provided basic indications of the instrument’s construct validity and enabled the extraction of factors for future analysis. To further validate the instrument, a Confirmatory Factor Analysis (CFA) was conducted after data were collected from the main survey. This is further explained in Section 4.3.5.

Convergent and discriminant validity

Convergent and discriminant validity are interlocking and essential aspects of construct validity. Convergent validity is established by evidence that different indicators of theoretically similar or corresponding constructs are strongly interrelated, while discriminant validity is demonstrated by results showing that indicators of theoretically distinct constructs are not highly interrelated (Brown, 2006).

To establish convergent validity for this study, the Big Five Inventory (BFI) created and validated by John and Srivastava (1999) was employed. This 44-item inventory is widely used and recognised. This scale was chosen because it is a 5-factor model, and shares some common descriptions of behaviour with the survey instrument constructed for this study, the SLCS.

To establish discriminant validity, the Schutte Self-Report Emotional Intelligence Test (SSEIT) was used. This 33-item self-report measure of emotional intelligence, which was developed and validated by Schutte, Malouff, Hall, Haggerty, Cooper, Golden, et al. (1998), assesses emotional intelligence in three aspects: 1. appraisal and expression of emotion; 2. regulation of emotion; and 3. utilisation of emotion. Although some aspects of emotional intelligence are covered in the SLCS, the SSEIT is not a tool to measure leadership capability. Both the BFI and the SSEIT were also chosen for their ease of access. The instruments were obtained from Statistics Solutions (http://www.statisticssolutions.com). The results of the establishment of convergent and discriminant validity are discussed in Section 4.3.5.

Construction of a new survey

Responses gained from the pilot showed a distinct pattern: all forty participants had rated the forty items from the SLCS as either ‘important’ or ‘very important’. This led to the decision that the study would not pursue further investigation into the question of how school leaders in NSW public schools view the importance of these capability sets, and the ‘importance rating’ part of the survey was therefore discarded. One of the key reasons for this decision was that if the section was retained, with the addition of the two new sets of measurement scales, the survey would become too long.

Using data generated from the pilot study, the items in each of the five capability sets were reduced from ten to eight, making a total of forty. This newly constructed scale was used only to measure participants’ strength ratings.

With the addition of 44 items from the BFI (John & Srivastava, 1999) and 33 items from the SSEIT (Schutte et al., 1998), the new survey comprised 117 items, which was only 17 more than the original 100 items, 50 of which were from the ‘importance rating’ scale and 50 from the ‘strength rating’ scale.

Second pilot

Permission to make the necessary alterations was sought and gained from the Western Sydney University Ethics Committee, and the new survey was posted online for a second pilot using Qualtrics. Fifteen participants were recruited via email invitations to trial the new survey; the small sample size was due to the unavailability of volunteers. The group comprised one primary school principal, one primary school deputy principal, three primary school assistant principals, two high school head teachers and eight classroom teachers. They were asked to complete the new online survey, and verbal feedback was obtained face-to-face or over the phone. The feedback was positive, and no participant reported ‘fatigue’ caused by responding to the survey. The response time averaged 40–45 minutes. Members of the focus group made insightful comments on the new SLCS and suggested two changes. The first change was to the wording of the Likert scale. At the lower end of the scale, ‘weak’ was changed to ‘minimal strength’ and ‘very weak’ was changed to ‘no strength’, because some participants felt a tendency to avoid admitting to being ‘weak’ or ‘very weak’. It was suggested that the new wording would elicit more accurate responses; therefore, it was adopted. The second change was to the order of the five-point Likert scale. In the revised version, the order of the strength rating was reversed to: 1 = no strength; 2 = minimal strength; 3 = moderately strong; 4 = strong; and 5 = very strong. This matched the five-point Likert scales of the other two instruments.
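If responses recorded on the earlier ordering ever needed to be compared with responses on the revised scale, the reversal could be expressed as a simple recoding. The sketch below rests on an assumption that the text implies but does not state: that the earlier strength rating ran from 1 (strongest) to 5 (weakest). The names `REVISED_LABELS` and `reverse_code` are hypothetical.

```python
# Revised five-point strength scale, as described in the text.
REVISED_LABELS = {
    1: "no strength",
    2: "minimal strength",
    3: "moderately strong",
    4: "strong",
    5: "very strong",
}


def reverse_code(score, scale_min=1, scale_max=5):
    """Map a score on a reversed scale onto the revised ordering."""
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - score


# Assumption: on the earlier ordering, 1 was the strongest rating.
earlier_scores = [1, 2, 3, 4, 5]
recoded = [reverse_code(score) for score in earlier_scores]
print(recoded)  # [5, 4, 3, 2, 1]
print(REVISED_LABELS[reverse_code(1)])  # very strong
```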

The survey was divided into three parts. Part A presented the five capability sets (Leading Self, Leading Others, Leading Other Leaders, Leading the Organisation, and Leading the Community), all clearly labelled with explanations of what each set is about. Part B (the SSEIT) was presented as a survey of participants’ emotional intelligence, and Part C (the BFI) was presented as a personality survey (Appendix 2.1, p. 318). Upon the approval of the supervising panel of this study, the survey was ready to be administered online.
