7.3 Reliability and Validity

7.3.1 Reliability

Reliability is defined as “the degree to which measures are free from random error and

therefore yield consistent results” (Zikmund, 2003, p. 330). In simple terms, reliability refers

to the degree to which a scale produces stable and consistent results upon repeated

the lower the reliability (Hair et al., 2010). Therefore, the main objective of reliability is to

minimize the errors and biases in research (Yin, 1994).

Reliability can be assessed through three approaches – test-retest, alternative-form and

internal consistency reliability (Netemeyer et al., 2003). Test-retest reliability is assessed by

administering the same instrument to the same respondents on two different occasions under

equivalent conditions. A correlation coefficient is then calculated to

reveal the degree of similarity between the two tests. However, the initial test can influence respondents’ responses on the second administration (Malhotra, 1996). For instance,

respondents may perform better on the second test because of what they learned from the first. Furthermore, respondents’ attitudes

may change if the interval between the two tests is too long. Hence, the

longer the time allowed between the tests, the lower the reliability. These limitations, noted

by Malhotra (1996) and Zikmund (2003), make test-retest reliability unsuitable for use in this

study.
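
The correlation coefficient mentioned above can be illustrated with a minimal Python sketch. It assumes the Pearson coefficient is the one intended (the text does not name a specific coefficient), and the scores and the function name `pearson_r` are hypothetical, not data or code from this thesis.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (n >= 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical total scores for the same five respondents on two occasions.
first_test = [21, 25, 30, 18, 27]
second_test = [22, 24, 31, 17, 28]
test_retest_r = pearson_r(first_test, second_test)
print(f"test-retest correlation: {test_retest_r:.3f}")
```

A coefficient close to 1 would indicate that the two administrations rank respondents in nearly the same way.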

The alternative-form method “is used when two alternative instruments are designed to be as

equivalent as possible” (Zikmund, 2003, p.331). Two different items measuring the same

construct are administered to the same group of respondents. The higher the correlation

between the two forms, the more reliable the scale is (Zikmund, 2003). However, it is

difficult to create two truly equivalent forms of the same instrument.

Test-retest reliability and alternative-form reliability are mostly used for longitudinal studies.

They are not considered appropriate for use in this thesis because of the abovementioned

reliability, is “used to assess the reliability of a summated scale where several items are

summed to form a total score” (Malhotra, 1996, p. 305). In this case, a scale has proven

reliability when all the items show consistency in their indication of the concept being

measured. There are three methods used to measure internal consistency (Hair et al., 2010).

The first is split-half reliability, which requires dividing a multi-item measurement into two

halves and then comparing the results obtained from the first half of the scale items against

the results from the other half. The weakness of this method is that the results vary

depending on how the items are divided. The second method is Cronbach’s (1951) coefficient

alpha, one of the most widely used methods in estimating reliability (Nunnally, 1978;

Sekaran, 2000). This method estimates the extent to which the items in the scale are

representative of the domain of the construct being measured. Cronbach’s alpha should be

used as the first measure to assess the reliability of a measurement scale (Nunnally, 1978; Churchill, 1979). Moreover, Cronbach’s alpha is important in measuring multi-point scale

items, e.g., the 7-point Likert scales used in this thesis (Sekaran, 2000). Therefore, Cronbach’s alpha is considered appropriate to assess the reliability of the measures used in

this thesis.
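
As an illustration of how coefficient alpha is typically obtained, the sketch below uses the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the summed scale). The response matrix and the function name `cronbach_alpha` are hypothetical; this is not the software or the data used in this thesis.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of the summated scale score across respondents.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical answers of five respondents to three 7-point Likert items.
responses = [
    [6, 7, 6],
    [4, 5, 4],
    [5, 5, 6],
    [2, 3, 2],
    [7, 6, 7],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Alpha rises when the items vary together, because the variance of the summed score then exceeds the sum of the individual item variances.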

Furthermore, as suggested by Fornell and Larcker (1981), composite reliability and average

variance extracted were also used to assess reliability. This

approach is widely used in marketing research (e.g., De Wulf et al., 2001; Hsieh et al., 2005;

Bove and Johnson, 2006). Composite reliability (also called construct reliability) measures

the overall reliability of the construct in the aggregate (Holmes-Smith et al., 2006) and is

calculated as follows:

$$\mathrm{CR} = \frac{\left(\sum \lambda_i\right)^2}{\left(\sum \lambda_i\right)^2 + \sum \varepsilon_i}$$

Where,

CR: Composite reliability

λi: The standardized loading

εi: The measurement error for each indicator

It is generally recommended that CR should be equal to or greater than .70 (Nunnally, 1978).

The average variance extracted (AVE) is another reliability measure and is “a summary of

convergence among a set of items representing a latent construct. It is the average percentage of variation explained among the items” (Hair et al., 2006, p. 773). The AVE reflects the

overall amount of variance explained by the latent construct and

is calculated from the formula given below (Fornell and Larcker, 1981):

(

λ

ᵢ²)

AVE = ─────────── (

λ

ᵢ²) + ∑ Ɛᵢ

169

AVE: The average variance extracted λi: The standardized loading

εi: The measurement error for each indicator

The AVE should be equal to or greater than .50 to indicate that the observable variables truly

reflect the construct in question and ensure the validity of the scale under investigation (Chin,

1998).

In this thesis, CR and AVE were calculated separately for each multiple-item construct

because AMOS does not compute these two measures directly (Hair et al., 2010). Cronbach’s

alpha, CR, and AVE were employed to ensure that the specified items are sufficient in their

representation of the underlying constructs.
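
Because CR and AVE have to be computed outside AMOS, a small script is one way to do so. The sketch below applies the two formulas given in this section to standardized loadings. It assumes that, where error variances are not supplied, each one can be taken as 1 − λi² (a common convention for standardized solutions, not something stated in the text). The loadings and function names are hypothetical.

```python
def composite_reliability(loadings, errors=None):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    if errors is None:
        # Assumption: standardized solution, so error variance = 1 - loading^2.
        errors = [1 - l ** 2 for l in loadings]
    squared_sum = sum(loadings) ** 2
    return squared_sum / (squared_sum + sum(errors))

def average_variance_extracted(loadings, errors=None):
    """AVE = sum of squared loadings / (sum of squared loadings + sum of error variances)."""
    if errors is None:
        # Same assumption as above.
        errors = [1 - l ** 2 for l in loadings]
    sum_of_squares = sum(l ** 2 for l in loadings)
    return sum_of_squares / (sum_of_squares + sum(errors))

# Hypothetical standardized loadings for a three-item construct.
loadings = [0.8, 0.7, 0.9]
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(f"CR = {cr:.3f}, AVE = {ave:.3f}")
```

The results can then be compared with the .70 (CR) and .50 (AVE) thresholds cited above.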

7.3.2 Validity

Reliability alone is not enough to determine that an instrument is adequate (Churchill, 1979;

Anderson and Gerbing, 1988; Dunn et al., 1994; Hair et al., 2010). Therefore, the validity of the constructs in this thesis was also assessed. Validity refers to “the ability of a scale to

measure what [is] intended to be measured” (Zikmund, 2003, p. 331). It is believed that the better

the fit between the conceptual and operational definitions, the greater the measurement validity

(Hair et al., 2010). Convergent validity, discriminant validity and nomological validity

need to be investigated in the validation of a construct (Peter, 1981). To support

the generalisability of the research findings, these three validations were conducted in this

7.3.2.1 Convergent Validity

Convergent validity indicates the degree to which the latent variable correlates to pre-

specified indicators to measure the same construct (Anderson and Gerbing, 1988; Gerbing

and Anderson, 1988; Steenkamp and Van Trijp, 1991). Convergent validity of the constructs

in this thesis was first investigated by assessing the reliabilities of all the constructs. Then

the factor loadings of each construct were estimated to ensure that they are statistically

significant. Finally, composite reliability (CR) and the average variance extracted (AVE)

were used for evaluating convergent validity (Fornell and Larcker, 1981; Anderson and

Gerbing, 1988). According to Fornell and Cha (1994), convergent validity can be guaranteed

if the value of the average variance extracted (AVE) is equal to or greater than .50 and

composite reliability (CR) is greater than the AVE.

7.3.2.2 Discriminant Validity

Discriminant validity refers to the extent to which one construct is distinct from other similar

constructs (Hair et al., 2006). High discriminant validity indicates that a construct is unique

and captures some phenomena that other measures do not. The main aim of discriminant

validity is to confirm that internal consistency is greater than external consistency. This

research used the method suggested by Fornell and Larcker (1981) to evaluate discriminant

validity. In this case, the average variance extracted (AVE) was compared with the square of

the correlation estimate between the constructs. The AVE for each construct should be
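
The Fornell and Larcker (1981) comparison described above can be sketched in Python. The sketch assumes the usual reading of the criterion, namely that each construct's AVE is expected to exceed its squared correlation with every other construct. The construct labels, AVE values, correlations and the function name are hypothetical and are not results from this thesis.

```python
def fornell_larcker_failures(ave, correlations):
    """Return the construct pairs whose squared correlation is not below both AVEs.

    ave: dict mapping construct name -> AVE
    correlations: dict mapping (construct_a, construct_b) -> correlation estimate
    """
    failures = []
    for (a, b), r in correlations.items():
        squared_r = r ** 2
        # Flag the pair if either construct's AVE does not exceed the squared correlation.
        if ave[a] <= squared_r or ave[b] <= squared_r:
            failures.append((a, b))
    return failures

# Hypothetical AVE values and inter-construct correlations.
ave = {"A": 0.65, "B": 0.58, "C": 0.52}
correlations = {("A", "B"): 0.60, ("A", "C"): 0.45, ("B", "C"): 0.75}
failures = fornell_larcker_failures(ave, correlations)
print("pairs lacking discriminant validity:", failures)
```

An empty list would suggest that every pair of constructs passes the comparison.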


7.3.2.3 Nomological Validity

Nomological validity refers to the investigation of the hypothesized relationships as well as

the empirical relationship between the constructs (Hair et al., 2010). In this thesis,

nomological validity was first supported by confirming that correlations between the constructs were in

accordance with the theory specified (Hair et al., 2006). Then the structural model was used

to assess nomological validity of the correlated constructs as suggested by Schumacker and

Lomax (2004).

7.4 Experiment Procedure

The experiment was a 2 (high versus low brand-cause fit) x 2 (ongoing cause versus natural

disaster cause) factorial design. As a result, there were four questionnaires. A sample

questionnaire is shown in Appendix 9. The questionnaires were distributed to

undergraduate students in lectures, seminars, undergraduate common rooms, and libraries

on the university campus. A prize draw of £100 was offered to encourage the students to fill out

the questionnaires. Each participant was assigned randomly to a questionnaire. The random

assignment was facilitated by sorting the four sets of questionnaires into a systematic order prior

to distribution. To be able to conduct the prize draw and to ensure that each student filled out

only one questionnaire, the respondents were asked to leave their contact numbers or emails
