Development of the CUPIT - The Cannabis Use Problems Identification Test (CUPIT) : development

Following hierarchical steps in authoritative guidelines (Clark & Watson, 1995; Comrey & Lee, 1992; Floyd & Widaman, 1995; Gorsuch, 1983; Kline, 1998; Nunnnally & Bernstein, 1994; Pett et al., 2003; Streiner & Norman, 1995), screen development procedures incorporated preliminary item analysis (IA) followed by Principal Component Analysis (PCA) for item selection and reduction. In tandem, this analytic strategy aimed to: (a) eliminate from initial analyses poorly distributed or otherwise unsuitable items (b) establish the underlying structure of surviving items (c) extract maximum variance from the items (d) identify the most parsimonious solution (fewest possible constructs/factors needed) to reproduce the original data, and (e) select homogeneous, discriminating items (empirical indicators) most related to the factor/s. The psychometric properties of the component scales were then determined.

Exploratory Factor Analysis (EFA): A rationale

As earlier noted, in addition to an (optimal) heterogeneous sample, this research also met both sample and minimum subject/variables ratio (STV 5:1) required (see MacCallum et al., 1999; Velicer & Fava, 1998). Without a priori model specification, PCA is typically employed for data condensation and prediction purposes, representing all the variance of the observed variables while addressing collinearity (Cortina, 1993; Fabrigar et al., 1999; Floyd & Widaman, 1995; Gorsuch, 1997; Hair et al., 1998; Kline, 2000; Nunnally, 1978). Despite considerable debate over which exploratory factor model (component vs. common factor) is the more appropriate (Cattell, 1965; Gorsuch, 1997; Fabrigar et al., 1999; Mulaik, 1990; Pedhazur & Schmelkin, 1991) both arrive at essentially identical substantive conclusions in most applications using 30+ variables (Nunnally, 1978; Stevens, 2002; Velicer & Jackson, 1990). Moreover, factor ‘indeterminancy’ and other complications associated with CFA rendered PCA the appropriate approach (Comrey & Lee, 1992; Hair et al., 1998; Nunnally, 1978). Some (e.g., Cattell, 1965; Gorsuch, 1997; Pedhazur & Schmelkin, 1991) claim PCA ‘inflates’ correlations among variables by utilizing all variance available. Albeit, in this research the different direction of skewness for different variables (some + some -) results in a

weakened analysis due to attenuated correlations (Gorsuch, 1997; Tabachnick & Fidell, 2001).

A schism also characterizes choice of orthogonal vs. oblique rotation (see Cattell, 1978; Fabrigar et al., 1999; Gorsuch, 1997; Pedhazur & Schmelkin, 1991). In orthogonal rotations the axes maintain their 90-degree separations and their independence after rotation, maximizing simplicity and conceptual clarity (Nunnally, 1978). Determinate component scores can be computed for each case and used as IVs (predictors) or DVs (e.g., diagnostic group membership) in regression and other prediction models, or factor structure in groups compared (Floyd & Widaman, 1995; Tabachnick & Fidell, 2001). In oblique rotations, axes are rotated separately without regard to maintaining independence, and can be rotated through their own angle. This makes possible a greater number of “cleaner” (zero) loadings. Simple structure (Thurstone, 1947) is easier to achieve. Interpretability is diminished, however, and factors may be sample- specific (Floyd & Widaman, 1995; Tabachnick & Fidell, 2001). Again, if factors are truly uncorrelated, orthogonal and oblique rotation produce nearly identical results (Floyd & Widaman, 1995; Gorsuch, 1983; Costello & Osborne, 2005; Tabachnick & Fidell, 2001). Accordingly, as recommended both techniques were used in tandem and solutions compared to ascertain any altered interpretation (Cortina, 1993; Floyd & Widaman, 1995; Tabachnick & Fidell, 2001).

Screening

As earlier noted, there was no data missing on the CUPIT. Z-scores generated for each response identified only a few (n=14) scores exceeding the ‘outlier’ 3.29 criterion (Tabachnick & Fidell, 2001). These ranged between 3.34 and 3.67 with only one exceeding 4.0, presenting no concerns (Hair et al., 1998). A few outliers are expected in larger (200+) community samples. Six cases were identified as multivariate outliers using the p<.001 criterion for Mahalanobis distances. Five of these (four 13 to 16 year- old males, and one 32 year-old female) had extreme scores on consumption variables and items indicative of compulsive use. The remaining outlier (female, also 32 years) had extreme negative scores on these variables, indicative of recent abstinence/problem minimization. All six were deleted from impending analyses, leaving 205 cases.

Several positively skewed and kurtosed continuous items evidenced some violation of the assumptions of multivariate normality, homoscedasticity, and linearity, commonly encountered in real world samples (Hair et al., 1998; Tabachnick & Fidell, 2001). However, for interpretive reasons, transformations are not universally recommended (Orr, Sackett, & DuBois, 1991; Tabachnick & Fidell, 2001) and logarithmic transformations are preferable (Bland & Altman, 1996; Osborne, 2002). Twenty-five items manifesting varying degrees of (predominantly moderate positive) skewness and kurtosis were transformed, using the appropriate square root or logarithmic (reflected where negatively skewed) transformation (Tabachnick & Fidell, 2001). Transformations reducing skewness to nearest zero were selected for use in factor analyses to compare with analyses using only original, unadulterated variables.

Item analysis

Item response distributions were examined (Table 6.2). Dichotomous (and trichotomous) variables are “dubious” (Kim & Mueller, 1978), “fraught with problems” (Floyd & Widaman, 1995), best avoided (Comrey & Lee, 1992; Gorsuch, 1983; Nunnally & Bernstein, 1994), or even proscribed (Streiner, 1994), while truncated Likert-scaled distributions are unsuitable, for factor analyses. Accordingly, items 9 (family drug problems), 24 (felt sick), 29 (been injured), 32 (others expressed concern), 36 (lost friends/partners), 38 (been arrested), were removed prior to PCA. The ‘Adjunct’ items (39-43), included to assess participants’ insight into cannabis’ harm potential and not intended for FA, were also omitted. The two multi-response dichotomously scored items (Reasons for Use, Hazardous Use) were transformed into composite continuous variables. HazardUse (q. 28) was created by combining the physically hazardous activities (driven, operated machinery, played sports, unprotected sex, used a weapon). Reasons for Use (q. 1) was computed into 5 variables (ReasFun=fun, pleasure; ReasSens=sensory enhancement; ReasHealth= health maintenance; Reashelp= social facilitation; Reaspsych=can’t stop, to avoid negative mood states) by grouping together progressively pathological reasons for using cannabis. Raw scores were inverted on item 12 (‘have you been able to stop using cannabis when you wanted to’), scored negatively to reduce variance related to response bias, ensuring all items were scored in

the same positive direction. Deletions and creation of composite variables resulted in 39 surviving items of appropriate metric scaling for factor analyses.

Principal Components Analysis (PCA)

The factorability of the data was first assessed by examination of the correlation matrix for sufficient correlations of .3 and above (Comrey & Lee, 1992; Nunnally, 1978), Bartlett’s test of sphericity, and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. A significant Bartlett’s test (χ2, 741df=40199. 38, p=<.001) and satisfactory KMO statistic (.90) indicated data factorability. Analyses then proceeded in two stages: (1) PCA was conducted on the original (no transformed) 39 variables, first with Varimax, then Oblique, rotation; (2) this process was then repeated using 14 original variables together with the 25 transformed variables. At each stage, PCA began with the statistical criterion specified at .30, (then .35, .40, .45, .50, .55, .60, .65), suppressing lower loadings for simplification and specifying 2, 3, 4, then 5 factors, in successive runs. Results were then compared.

Heuristics from the ‘rules of thumb’ rubric (Gorsuch, 1983; Kim & Mueller, 1978) applied for factor retention and rotation included Cattell’s (1966) scree test, substantive importance (% variance accounted for), comprehensibility, and number of variables with significant loadings on a factor. A minimum of 3 highly-loading indicators per factor are necessary for over-determined, stable factors (Fabrigar et al., 1999; Kim & Mueller, 1978; Nunnally, 1978). More improve factor stability (Floyd & Widaman, 1995). Since either over- or under-factoring is to be avoided (Comrey & Lee, 1992; Fabrigar et al., 1999) the Kaiser-Guttman criterion (eigenvalue >1.00), which can greatly overestimate the optimal number to retain (Gorsuch, 1983), was bypassed as it often militates against the most parsimonious solution.

The correlation matrix of the 39 items is displayed in Appendix 25. Results of the scree test matched the eigenvalue criteria. Early indications were that solutions using a >.60 statistical criterion and submitting 2 factors to orthogonal rotation produced the ‘cleanest’, univocal (simple) structure with no cross-loadings, and more than 3 high- loading variables on both factors. Sixteen items loaded on the two components,

explaining 38.62% of the total variance. A lower statistical criterion (.55) and 2 factors produced several ambiguously loaded variables (orthogonal rotation) or increased complexity (cross-loadings in oblique rotation). Of note, as the literature suggests results were remarkably similar whether the original variables only, or the transformed- original combination, were used, and whether followed by orthogonal, or oblique, rotation. As the factor correlation matrices revealed, however, factors consistently correlated between .25 and .29, representing less than 10% overlap in variance. Hence, following guidelines for identifying the solution with the greatest scientific utility, consistency, replicability, and least complexity apropos of research objectives (e.g., Comrey & Lee, 1992; Fabrigar et al., 1999; Nunnally, 1978; Tabachnick & Fidell, 2001), the orthogonal solution using the original (untransformed) variables was selected for interpretation and further analyses.

This decision was based on several empirical and pragmatic considerations. First, despite the improved normality (hence correlations) of the transformed variables, interpretability of the transformed-original variable merger was at best compromised and at worst, unfeasible. After rotation, moreover, several transformed items had substantial negative loadings, the antithesis of the “positive manifold” (Comrey & Lee, 1992) characteristic of good structure. Second, Gorsuch (1983) observed that in actuality, normalizing data does not greatly increase many correlation coefficients: “it appears that normalizing is not needed as a standard procedure for estimates of factor loadings” (p. 302). Third, small factor correlations (<.32) warrant the assumption of orthogonality (Gorsuch, 1983; Nunnally, 1978; Tabachnick & Fidell, 2001). Fourth, the primary objective of item reduction and prediction analyses (and not exploration of underlying latent constructs) required determinate factor scores derived from orthogonal components. Finally, and perhaps most compellingly, given the virtually identical results generated by the alternative variable-technique combinations, only minor differences (if any) would ultimately emerge in subsequent analyses, whichever the solution used. Clear evidence of the stability of the solution manifested in its constant appearance, regardless of which extraction technique was employed (Tabachnick & Fidell, 2001).

Table 6.6 presents the 16 items and their loadings on the two primary components after rotation, component eigenvalues, and amount of variance explained. As evident, several variables loaded above that deemed “excellent” (>.71, denoting 50% variance shared with that component) while most of the remaining loadings met the criterion (>.63, reflecting 40% shared variance) considered “very good” (see Comrey & Lee, 1992). Given the .30 (Comrey & Lee, 1992) or .35 (Nunnally & Bernstein, 1994) cut-off typically recommended as the minimum loading for variables used to define a factor, all these representative variables qualified for interpretive purposes. Component 1 had significant loadings of ten items: five of these were consumption variables, and the remaining 5 suggested ‘compulsivity’ or ‘impaired control’ over use. Component 2 comprised six items reflecting consequences of, or problems caused by, cannabis use. From this point on, the component scales will be known by these factor labels.

Table 6.6: Impaired Control and Problems subscales after Orthogonal Rotation: Eigenvalues, Percentage of Variance Explained, and Item Loadings

Item Impaired Control Problems

Number of days used past 12 months Number of days used past 3 months Times used on a typical day of use Times used first thing in the morning Hours a day stoned Would find it difficult to stop using altogether Longest time without cannabis Felt the need for cannabis Able to stop using when wanted to Found it difficult to get through a day without cannabis Cannabis interfered with work at school, job, or home Lacked energy to get things done Given up important/enjoyable things for cannabis Things planned or expected not happened after using cannabis Concentration and memory problems Used after deciding not to Eigenvalue Percentage of Variance Explained .789 .758 .748 .726 .706 .679 .648 .625 .612 .608 .753 .731 .706 .669 .649 .606 8.199 6.861 21.024 17.592

Psychometric properties of the component scales Internal consistency reliability

The internal consistency of the components was first examined. Table 6.7 displays the descriptive statistics and standardized reliability estimates for the two component scales for the whole sample, and for age-groups. As shown, Cronbach’s alphas (Cronbach, 1951) ranged from .79 to .92. A benchmark of .80 (Nunnally, 1978), or .70 for exploratory research (Nunnally, 1978; Hair et al., 1998), is commonly recommended. Coefficient alpha is a function of the number of items, their average inter-tem correlations and dimensionality, however, and not a measure of unidimensionality (Cortina, 1993; Schmitt, 1996). To ensure univocal components, items should be only moderately (average .15 -.50) intercorrelated (Clark & Watson, 1995; Cortina, 1993; Streiner & Norman, 1995). Differentiated items yield far more information. As shown, mean inter-item correlations in both components ranged from .40 to .59. Thus, across the whole sample and age groups these results were (more than) adequate on both indices. The favourable internal consistency results suggest proper sampling of content domains. In addition, the moderate significant component scale intercorrelation (r=.47, p=<.001) suggests that the components, with good discriminant validity (share 22% of their variance) are separate (unidimensional), but related, axis of an overarching construct (Cronbach, 1990; Dawe et al., 2002; Schmitt, 1996).

Table 6.7: Reliability Estimates and Descriptives for the CUPIT subscales

Group/Variable Impaired Control Problems

Whole sample (n=212) Mean

Mean inter-item correlation

Age groups

Adolescents (n=138) Mean

Mean inter-item correlation

Adults (n=74) Mean SD

Mean inter-item correlation

.92 28.14 11.78 .53 .92 26.98 11.40 .55 .92 30.30 12.23 .52 .83 6.96 4.58 .46 .79 7.36 4.51 .40 .90 6.20 4.63 .59

Construct validity

Component scores were generated for all respondents using the simple summation of raw scores method (see Comrey & Lee, 1992; Floyd & Widaman, 1995; Pedhazur & Schmelkin, 1991). This technique generates readily interpretable scores (cf. estimates) that are comparable across studies (Pett et al., 2003). The universally high loadings, item-total correlations, Likert-type scale metric, and mean inter-correlations of items made differential proportional weighting unnecessary (Floyd & Widaman, 1995; Gorsuch, 1983; Guilford, 1954; Nunnally, 1978; Streiner & Norman, 1995). As Nunnally (1978) points out, the correlation of weighted and unweighted items is generally so high that “it is almost always a waste of time to seek differential weights for items” (p. 296). Scores were then used in correlation analyses with other key outcome measures to establish the construct validity of component items.

Pearson correlations first assessed the relationship between the components and age as a continuous variable. While age was significantly correlated with Impaired Control, age was not significantly associated with Problems. Descriptive statistics revealed age was significantly skewed and kurtosed. A Student’s t-test comparing age-groups (adolescents/adults) on the components revealed significant differences on Impaired Control, t (201) = -2.13, p. 03, and Problems, t (201) = 2.00, p. 047. Adults had significantly higher mean scores on Impaired Control, while adolescents scored more highly on Problems. Thus, where indicated, use of age-groups in subsequent validation analyses was warranted.

Convergent/discriminant validity

Constituent scales of a measure should possess good convergent and discriminatory validity (Campbell & Fiske, 1959). Following satisfactory testing of multivariate assumptions, Pearson correlations were calculated to assess the relationship between the components and the 12-month CIDI-Auto 2.1 (WHO, 1997), the criterion standard; the Timeline Follow Back (TLFB; Sobell & Sobell, 1992) measures of 90- and 30-day cannabis consumption; the 6-month Severity of Dependence Scale (SDS; Gossop et al., 1995), Cannabis Problems Questionnaire for adults (CPQ; Copeland et al., 2001a, 2005) and adolescents (CPQ-A; Martin et al., 2006); and the BSI-18 (Derogatis, 2000)

measure of psychological distress. Pearson correlations were first computed for the sample, followed by a t-test by age-groups. If significantly different, analyses were repeated by age-groups. Results appear in Table 6.8.

Table 6.8: Correlation between the CUPIT subscales and Key Validation Measures

Measure/Variable

Impaired Control Sample Adolescents Adults (n=211) (n=138) (n=73)

Problems Sample Adolescents Adults (n=211) (n=138) (n=73) CIDI-Auto Number of DSM/ICD symptoms Severity of Dependence Scale (SDS) TLFB interview: Days used past 90 days Cones used past 90 days Days used past 30 days Cones used past 30 days CPQ for Adults Core scale (n=76) Spouse scale (n=26) Children scale (n=21) Work scale (n=38) CPQ for Adolescents Core scale (n=135) Parent scale (n=129) Partner scale (n=64) School scale (n=117) Work scale (n=26) BSI-18 GSI score .63*** .68*** .51*** . 71*** .72*** .70*** .80*** .81*** .76*** .60*** .60*** .65*** .77*** .80*** .72*** .60*** .55*** .67*** .53*** .26 .43 .46** .59*** .35*** .46*** .54*** .46* .18* .21* .09 .57*** .61*** .50***. .59*** .61*** .63*** .25** .42** .08 .19** .23** .27* .24*** .40** .10 .19** .21* .28* .62*** .36 . -.04 .48** .63*** .26** .25* .48*** .47* .42*** .36*** .56***

*p<.05, **p<.01, *** p<.001, all tests two-tailed.

Mean scores and standard deviations for the key validation measures tabulated have been detailed (section one, this chapter). As reported there, adults scored significantly higher than their younger counterparts (<.001) on all four TLFB consumption variables. Adults also scored significantly more highly on the BSI (<.05). T-tests revealed no significant age-group differences on the SDS or total number of symptoms elicited in the CIDI-Auto interview. Nonetheless, given the mixed patterns that emerged,

component correlations with all key validation measures are tabulated by age groups for the interested reader.

As expected, both components exhibited highly significant, conceptually consistent moderate to strong positive correlations with total DSM-IV/ICD-10 symptoms, the SDS, and all subscales of the CPQ for adolescents. Both components also showed varying moderate to strong correlations with the adult CPQ core and work scale. The weak nonsignificant relationships with the spouse and children subscales were largely explained by their limited applicability to respondents (Copeland et al., 2005). Impaired Control was also strongly correlated with all four TLFB consumption measures. Problems showed somewhat weaker significant correlations with these measures, most of which was accounted for by the adolescents. An interesting pattern of association emerged for the GSI (BSI). This measure’s significant but weak correlation with Impaired Control over the whole sample was again largely accounted for by the adolescents. In contrast, the GSI’s highly significant, moderate positive correlation with Problems held over the whole sample.

Concurrent/criterion-related validity

Three independent DSM-IV/ICD-10 diagnostic groups were created from the CIDI- Auto algorithim output: no diagnosis (n=17), abuse and/or harmful use (n=36), and dependence, with or without abuse/harmful use diagnoses (n=158). A one-way ANOVA was conducted to test for any differences in component scores as a function of diagnostic group. Component scores for adolescents/adults in the diagnostic groups were then compared. Distributions revealed adult diagnostic groups were unequal (1; 12; 60), abrogating between-group ANOVAs. Hence, Student’s t-tests were conducted to compare adult ‘abuse/harmful use’ and ‘dependence’ groups (only). With their adequate group distributions (16; 24; 98), ANOVAS were appropriate to test for adolescent diagnostic group differences. Given the importance of consumption measures in drug screening, ANOVAs were also used to investigate any significant differences on the TLFB variables across sample diagnostic groups, and then by age groups (t-tests for adults). All ANOVAs were followed by post hoc pairwise comparisons using Tukey’s HSD. Results appear in Table 6.9.

Table 6.9: CUPIT subscale and Consumption scores by DSM-IV/ICD-10 Diagnostic Group

Variable

Diagnostic Group

No Diagnosis Abuse/Harmful Use Dependence

Mean SD Mean SD Mean SD F Significance

Impaired Control Whole sample (n=211) Adolescents (n=138) Adults (n=73) Problems Whole sample (n=211) Adolescents (n=138) Adults (n=73) Cannabis Consumption Whole sample (n=211) Days used past 90 days Cones used past 90 days Days used past 30 days Cones used past 30 days

Adolescents (n=138) Days used past 90 days Cones used past 90 days Days used past 30 days Cones used past 30 days

Adults (n=73) Days used past 90 days Cones used past 90 days Days used past 30 days Cones used past 30 days

14.12 7.32 17.75 7.03 32.08 10.37 51.74 <.001 13.13 6.27 19.04 6.85 31.18 10.07 37.07 <.001 15.17 6.95 33.55 10.76 - 5.666* <.001 3.18 2.77 4.56 2.80 7.91 4.68 16.10 <.001 3.19 2.86 4.75 2.89 8.68 4.38 19.04 <.001 4.17 2.69 6.63 4.90 -1.689* .096 15.71 19.57 29.89 23.71 61.94 27.40 40.16 <.001 69.35 73.63 139.94 145.99 617.13 752.70 11.56 <.001 5.06 6.92 10.53 9.15 20.86 9.71 34.54 <.001 21.71 22.80 50.67 53.78 215.16 271.34 10.77 <.001 11.31 7.66 30.00 22.75 55.92 26.72 28.81 <.001 57.25 55.91 140.33 116.05 461.16 472.90 11.10 <.001 3.50 2.66 10.17 8.26 18.66 9.54 25.47 <.001 18.81 20.06 50.50 49.32 160.04 192.87 7.98 <.01 29.67 26.58 71.77 25.79 -5.14* <.001 139.17 199.15 871.87 1016.64 -2.47* <.05 11.25 11.09 24.45 8.94 -4.48* <.001 51.00 64.16 305.18 348.54 -2.50* <.05

* Student’s t-test, 70df. All tests two-tailed.

As Table 6.9 shows, across the whole sample and for adolescents only, mean scores on both components were significantly different across diagnostic groups, and in the expected direction (no diagnosis mean scores < abuse/harmful use mean scores < dependence scores). That is, mean scores plotted against diagnostic groups displayed an

In document The Cannabis Use Problems Identification Test (CUPIT) : development and psychometrics : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Psychology at Massey University, Palmerston North, New Zealand (Page 190-200)