7.3 Reliability and Validity
7.3.1 Reliability
Reliability is defined as “the degree to which measures are free from random error and
therefore yield consistent results” (Zikmund, 2003, p. 330). In simple terms, reliability refers
to the degree to which a scale produces stable and consistent results upon repeated
the lower the reliability (Hair et al., 2010). Therefore, the main objective of reliability is to
minimize the errors and biases in research (Yin, 1994).
Reliability can be assessed through three approaches – test-retest, alternative-form and
internal consistency reliability (Netemeyer et al., 2003). Test-retest reliability is assessed by
administering the same instrument to the same respondents on two different occasions under
equivalent conditions. A correlation coefficient is then calculated to
reveal the degree of similarity between the two tests. However, the initial test can influence
respondents’ responses on the second administration (Malhotra, 1996). For instance,
respondents may perform better on the second test because of what they learned from the first.
Furthermore, respondents’ attitudes may change if the interval between the two tests is too
long. Hence, the longer the time allowed between the tests, the lower the reliability. These limitations stated
by Malhotra (1996) and Zikmund (2003) make test-retest reliability unsuitable for use in this
study.
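As an illustration of the correlation step described above, the following minimal Python sketch computes a Pearson correlation between two administrations of the same scale. The scores are hypothetical and the function name `pearson_r` is introduced only for this example; it is not part of the analysis reported in this thesis.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a sample has no variance")
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scale totals for five respondents on two occasions
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 3))
```

A coefficient close to 1 would indicate that respondents kept roughly the same relative standing across the two occasions.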
The alternative-form method “is used when two alternative instruments are designed to be as
equivalent as possible” (Zikmund, 2003, p. 331). Two equivalent forms measuring the same
construct are administered to the same group of respondents. The higher the correlation
between the two forms, the more reliable the scale is (Zikmund, 2003). However, it is
not always possible to create two equivalent forms of the same instrument.
Test-retest reliability and alternative-form reliability are mostly used for longitudinal studies.
They are not considered appropriate for use in this thesis because of the abovementioned
reliability, is “used to assess the reliability of a summated scale where several items are
summed to form a total score” (Malhotra, 1996, p. 305). In this case, a scale is considered
reliable when all the items are consistent in their indication of the concept being
measured. There are three methods used to measure internal consistency (Hair et al., 2010).
The first is split-half reliability, which requires dividing a multi-item measurement into two
halves and then comparing the results obtained from the first half of the scale items against
the results from the other half. The weakness of this method is that the results vary
depending on how the items are divided. The second method is Cronbach’s (1951) coefficient
alpha, one of the most widely used methods in estimating reliability (Nunnally, 1978;
Sekaran, 2000). This method estimates the extent to which the items in the scale are
representative of the domain of the construct being measured. Cronbach’s alpha should be
used as the first measure to assess the reliability of a measurement scale (Nunnally, 1978; Churchill, 1979). Moreover, Cronbach’s alpha is particularly useful for multi-point scale
items, e.g., the 7-point Likert scales used in this thesis (Sekaran, 2000). Therefore, Cronbach’s alpha is considered appropriate to assess the reliability of the measures used in
this thesis.
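For illustration, coefficient alpha can be sketched in a few lines of Python using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of the summed scale). The responses below are hypothetical 7-point ratings, and the function name `cronbach_alpha` is an assumption made for this example only.

```python
from statistics import variance

def cronbach_alpha(items):
    """Coefficient alpha for a summated scale.

    items: list of k lists, each holding one item's scores
    for the same n respondents (same respondent order).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("every item needs one score per respondent")
    # Summated scale score for each respondent
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical 7-point Likert responses: 3 items, 5 respondents
item_1 = [5, 6, 3, 7, 4]
item_2 = [4, 6, 2, 7, 5]
item_3 = [5, 7, 3, 6, 4]

alpha = cronbach_alpha([item_1, item_2, item_3])
print(round(alpha, 3))
```

Because alpha uses all items at once, it avoids the split-half problem of results depending on how the items happen to be divided.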
Furthermore, as suggested by Fornell and Larcker (1981), composite reliability (CR) and average
variance extracted (AVE) were calculated in order to assess reliability. This
approach is widely used in marketing research (e.g., De Wulf et al., 2001; Hsieh et al., 2005;
Bove and Johnson, 2006). Composite reliability (also called construct reliability) measures
the overall reliability of the construct in the aggregate (Holmes-Smith et al., 2006) and is
calculated as follows:
\[ \mathrm{CR} = \frac{\left(\sum \lambda_i\right)^2}{\left(\sum \lambda_i\right)^2 + \sum \varepsilon_i} \]
Where,
CR: Composite reliability
λi: The standardized loading
εi: The measurement error for each indicator
It is generally recommended that CR should be equal to or greater than .70 (Nunnally, 1978).
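A minimal Python sketch of the composite reliability calculation is given below as an illustration. The loadings are hypothetical. When no error variances are supplied, the sketch assumes a completely standardized solution in which each indicator's error variance equals 1 minus its squared loading; that default is an assumption of this example, not something stated in the text.

```python
def composite_reliability(loadings, error_variances=None):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    if not loadings:
        raise ValueError("at least one loading is required")
    if error_variances is None:
        # Assumption for this sketch: standardized solution,
        # so error variance = 1 - loading^2 for each indicator.
        error_variances = [1 - loading ** 2 for loading in loadings]
    if len(error_variances) != len(loadings):
        raise ValueError("one error variance per loading is required")
    squared_sum = sum(loadings) ** 2
    return squared_sum / (squared_sum + sum(error_variances))

# Hypothetical standardized loadings for a three-item construct
loadings = [0.8, 0.7, 0.9]
cr = composite_reliability(loadings)
print(round(cr, 3), "meets .70 threshold" if cr >= 0.70 else "below .70 threshold")
```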
The average variance extracted (AVE) is another reliability measure and is “a summary of
convergence among a set of items representing a latent construct. It is the average percentage of variation explained among the items” (Hair et al., 2006, p. 773). The AVE reflects the
overall amount of variance explained by the latent construct and
is calculated from the formula given below (Fornell and Larcker, 1981):
\[ \mathrm{AVE} = \frac{\sum \lambda_i^2}{\sum \lambda_i^2 + \sum \varepsilon_i} \]
Where,
AVE: The average variance extracted
λi: The standardized loading
εi: The measurement error for each indicator
The AVE should be equal to or greater than .50 to indicate that the observable variables truly
reflect the construct in question and ensure the validity of the scale under investigation (Chin,
1998).
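The AVE calculation can be sketched in the same way. As before, the loadings are hypothetical, and the default error variances (1 minus the squared loading) are an assumption that holds only for a completely standardized solution.

```python
def average_variance_extracted(loadings, error_variances=None):
    """AVE = sum of squared loadings / (sum of squared loadings + sum of error variances)."""
    if not loadings:
        raise ValueError("at least one loading is required")
    squared = [loading ** 2 for loading in loadings]
    if error_variances is None:
        # Assumption for this sketch: standardized solution,
        # so error variance = 1 - loading^2 for each indicator.
        error_variances = [1 - s for s in squared]
    if len(error_variances) != len(loadings):
        raise ValueError("one error variance per loading is required")
    return sum(squared) / (sum(squared) + sum(error_variances))

# Hypothetical standardized loadings for a three-item construct
loadings = [0.8, 0.7, 0.9]
ave = average_variance_extracted(loadings)
print(round(ave, 3), "meets .50 threshold" if ave >= 0.50 else "below .50 threshold")
```

Under the standardized-solution assumption the denominator equals the number of indicators, so the AVE reduces to the mean of the squared loadings.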
In this thesis, CR and AVE were calculated separately for each multiple-item construct
because AMOS does not compute these two measures directly (Hair et al., 2010). Cronbach’s
alpha, CR, and AVE were employed to ensure that the specified items are sufficient in their
representation of the underlying constructs.
7.3.2 Validity
Reliability alone is not enough to determine that an instrument is adequate (Churchill, 1979;
Anderson and Gerbing, 1988; Dunn et al., 1994; Hair et al., 2010). Therefore, validity was also assessed for the constructs of this thesis. Validity refers to “the ability of a scale to
measure what [is] intended to be measured” (Zikmund, 2003, p. 331). It is believed that the better
the fit between the conceptual and operational definitions, the greater the measurement validity
(Hair et al., 2010). Convergent validity, discriminant validity and nomological validity are
required to be investigated in the validation of a construct (Peter, 1981). To support
the generalisability of the research findings, these three forms of validity were assessed in this
7.3.2.1 Convergent Validity
Convergent validity indicates the degree to which the latent variable correlates with the
pre-specified indicators intended to measure the same construct (Anderson and Gerbing, 1988; Gerbing
and Anderson, 1988; Steenkamp and Van Trijp, 1991). Convergent validity of the constructs
in this thesis was first investigated by assessing the reliabilities of all the constructs. Then
the factor loadings of each construct were estimated to ensure that they are statistically
significant. Finally, composite reliability (CR) and the average variance extracted (AVE)
were used for evaluating convergent validity (Fornell and Larcker, 1981; Anderson and
Gerbing, 1988). According to Fornell and Cha (1994), convergent validity is supported
if the value of the average variance extracted (AVE) is equal to or greater than .50 and
composite reliability (CR) is greater than the AVE.
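The two conditions stated above can be expressed as a simple check. The following Python sketch is illustrative only: the construct names and the CR and AVE values are hypothetical, and the function name is introduced for this example.

```python
def convergent_validity_supported(cr, ave, ave_threshold=0.50):
    """Apply the two stated conditions: AVE >= .50 and CR > AVE."""
    return ave >= ave_threshold and cr > ave

# Hypothetical CR and AVE values for three constructs
results = {
    "construct_a": {"cr": 0.84, "ave": 0.65},
    "construct_b": {"cr": 0.72, "ave": 0.46},
    "construct_c": {"cr": 0.91, "ave": 0.77},
}

summary = {
    name: convergent_validity_supported(values["cr"], values["ave"])
    for name, values in results.items()
}
for name, supported in summary.items():
    print(name, "supported" if supported else "not supported")
```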
7.3.2.2 Discriminant Validity
Discriminant validity refers to the extent to which one construct is distinct from other similar
constructs (Hair et al., 2006). High discriminant validity indicates that a construct is unique
and captures some phenomena that other measures do not. The main aim of discriminant
validity is to confirm that internal consistency is greater than external consistency. This
research used the method suggested by Fornell and Larcker (1981) to evaluate discriminant
validity. In this case, the average variance extracted (AVE) was compared with the square of
the correlation estimate between the constructs. The AVE for each construct should be
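The comparison of AVE with squared correlations can be sketched in Python as follows. The sketch assumes the usual form of the Fornell and Larcker (1981) criterion, in which each construct's AVE should exceed its squared correlation with every other construct. The construct names, AVE values and correlations are hypothetical.

```python
def fornell_larcker_violations(ave, correlations):
    """Return construct pairs whose squared correlation is not below both AVEs.

    ave: dict mapping construct name -> AVE
    correlations: dict mapping (construct, construct) -> correlation estimate
    """
    violations = []
    for (first, second), r in correlations.items():
        shared_variance = r ** 2
        if shared_variance >= ave[first] or shared_variance >= ave[second]:
            violations.append((first, second))
    return violations

# Hypothetical AVE values and inter-construct correlations
ave = {"construct_a": 0.62, "construct_b": 0.58, "construct_c": 0.55}
correlations = {
    ("construct_a", "construct_b"): 0.60,
    ("construct_a", "construct_c"): 0.45,
    ("construct_b", "construct_c"): 0.78,
}

violations = fornell_larcker_violations(ave, correlations)
print(violations if violations else "no violations found")
```

An empty list would indicate that every pair of constructs satisfies the criterion under the stated assumption.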
7.3.2.3 Nomological Validity
Nomological validity refers to the investigation of the hypothesized relationships as well as
the empirical relationship between the constructs (Hair et al., 2010). In this thesis,
nomological validity was first assessed by checking whether correlations between the constructs were in
accordance with the theory specified (Hair et al., 2006). Then the structural model was used
to assess nomological validity of the correlated constructs as suggested by Schumacker and
Lomax (2004).
7.4 Experiment Procedure
The experiment was a 2 (high versus low brand-cause fit) x 2 (ongoing cause versus natural
disaster cause) factorial design. As a result, there were four questionnaires. A sample
questionnaire is shown in Appendix 9. The questionnaires were distributed to
undergraduate students in lectures, seminars, undergraduate common rooms, and libraries
on the university campus. A prize draw of £100 was offered to encourage the students to fill out
the questionnaires. Each participant was assigned randomly to a questionnaire. The random
assignment was facilitated by sorting the four sets of questionnaires into a systematic order prior
to distribution. To be able to conduct the prize draw and to ensure that each student filled out
only one questionnaire, the respondents were asked to leave their contact numbers or emails