Quality Checks - The intention to cycle : a comparative study of the perceptions and attitudes

Descriptive statistics were used to summarise the data and to provide a basic understanding from which decisions about data processing and further analysis can be made. There were four areas of focus at this stage:

• Data quality

• Representativeness

• Normality

• Response patterns

There were also a number of validity tests which were applied to the PLS-SEM models (Chapter 6). These are described in Section 3.9.

3.6.1 Data Quality

There may be data quality issues, such as straight-lining with the Likert-scale questions, which could require responses to be removed (Nancarrow and Tapp, 2014). To help reduce these issues at the data collection stage, any respondent who took less than 6 minutes to complete the survey was automatically rejected. The minimum time was set at 6 minutes after reviewing cases of poor-quality data from the soft launch. The online survey tool used (Survey Gizmo) assigns every respondent a ‘dirty data’ score from 0 to 100, where 100 indicates the poorest quality data. The tool allows the user to specify which issues should be included when determining the score. As only ‘straight-lining/patterned responses’ was appropriate for this survey, this issue was selected and given the maximum weight (10). The tool indicated that the majority of responses showed no suspicious answer patterns. Responses that showed any suspicious pattern, indicated by a score of 1 or more, were rejected. Through this process 10% of the responses were removed. The order in which the Likert-scale questions were presented was randomised to reduce bias, which means that the order in which any individual respondent was shown the questions is not known. Because of this, post-hoc analysis would not be able to detect patterned responses, so the data quality scores allocated by the survey software at the time of completion were used (Survey Gizmo, 2015).

A further check was made to ensure that the data for analysis was provided by genuine UK respondents. Using the Internet Protocol (IP) address of the respondents, only responses recorded as originating in the United Kingdom were retained. The postcode data provided by the respondent was then matched to the self-reported urban/rural description of the local area and the region in which they lived, and cases where there appeared to be mismatches were also removed.

Following these data quality checks 88% of the data was retained for analysis.
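
As an illustration only, the screening rules described in this section could be expressed as a simple filter. This is a minimal sketch and not the procedure actually run in the study: the field names, the `Response` class, the `"GB"` country code and the `postcode_consistent` flag are all assumptions, since the export format of the survey tool is not described here. In the study itself the completion-time rule was applied automatically at the data collection stage.

```python
from dataclasses import dataclass
from typing import List

MIN_DURATION_SECONDS = 6 * 60  # respondents taking less than 6 minutes were rejected
MAX_DIRTY_SCORE = 0            # any 'dirty data' score of 1 or more was rejected


@dataclass
class Response:
    # Hypothetical field names; the real export columns are not given in the text.
    respondent_id: str
    duration_seconds: int
    dirty_data_score: int      # 0 to 100, assigned by the survey tool
    ip_country: str            # assumed ISO country code derived from the IP address
    postcode_consistent: bool  # postcode agrees with self-reported area type and region


def passes_quality_checks(r: Response) -> bool:
    """Return True when a response satisfies all four screening rules."""
    return (
        r.duration_seconds >= MIN_DURATION_SECONDS
        and r.dirty_data_score <= MAX_DIRTY_SCORE
        and r.ip_country == "GB"
        and r.postcode_consistent
    )


def screen(responses: List[Response]) -> List[Response]:
    """Keep only the responses that pass every check, preserving their order."""
    return [r for r in responses if passes_quality_checks(r)]


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        Response("r1", 420, 0, "GB", True),
        Response("r2", 300, 0, "GB", True),   # completed too quickly
        Response("r3", 600, 10, "GB", True),  # flagged as patterned
    ]
    print([r.respondent_id for r in screen(sample)])
```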

3.6.2 Shapiro-Wilk Normality Test

The attitudinal statements were tested to see whether they could be assumed to fit within the normal distribution. Within frequentist statistics there are two approaches: parametric and non-parametric. Many researchers see parametric methods as preferable since they have greater statistical power, allowing small differences between groups to be revealed with more confidence (Bryman, 2016). However, the application of these methods requires more assumptions to be made regarding the nature of the data. Parametric statistics require assumptions which non-parametric methods do not: (i) that the data are based on an interval level of measurement or above and (ii) that they are well described by the normal distribution. Non-parametric methods provide a more appropriate option if these assumptions cannot be met. If parametric methods are used when the required assumptions are not met, this can lead to Type I error, in which differences are accepted which are not really present within the data.

Within the range of non-parametric methods there is still a range of assumptions which, depending on the specific test, may need to be met regarding skewness or kurtosis within the data.

Thus Shapiro-Wilk normality tests were applied to the data. Where p < 0.05 the null hypothesis (that the data were normally distributed) was rejected.
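
A normality check of this kind might be sketched as follows, assuming SciPy is available. The software actually used for the analysis is not stated in this section, and the function name and the example responses are hypothetical.

```python
from scipy import stats

ALPHA = 0.05  # significance threshold stated in the text


def shapiro_wilk_item(values, alpha=ALPHA):
    """Run a Shapiro-Wilk test on the responses to one attitudinal statement.

    Returns the W statistic, the p-value, and whether the null hypothesis
    of normality is rejected at the given alpha.
    """
    w, p = stats.shapiro(values)
    return w, p, bool(p < alpha)


if __name__ == "__main__":
    # Made-up responses on a five-point Likert scale, for illustration only.
    responses = [1] * 80 + [2] * 60 + [3] * 30 + [4] * 20 + [5] * 10
    w, p, rejected = shapiro_wilk_item(responses)
    print(f"W = {w:.3f}, p = {p:.3g}, normality rejected: {rejected}")
```

Responses restricted to a small number of ordered categories, as in this example, would be expected to depart clearly from normality once the sample is reasonably large.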

3.6.3 Representativeness

In addition to the precautions described in Section 3.6.1, the data was compared to nationally available data regarding age and gender distribution within the population. This analysis is reported in Section 4.4.

3.7 Hypothesis Testing

Non-parametric tests were applied when dealing with responses to the attitudinal statements. In addition, non-parametric tests were preferred when data were not ratio or interval level.

3.7.1 Mann-Whitney U Test

In cases where averages were to be compared across nominal data with two levels, the Mann-Whitney U test was used. The Mann-Whitney U test is a non-parametric test similar to the t-test. While it does not assume a normal distribution, it does require that the distribution of both samples is the same. The null hypothesis for this test is that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample (Sheskin, 2003).

Where p < 0.05 the null hypothesis was rejected.
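
A two-group comparison of this kind could be sketched as below, assuming SciPy is available. The function name, the two-sided alternative and the example scores are assumptions made for illustration and are not taken from the study.

```python
from scipy import stats

ALPHA = 0.05  # significance threshold stated in the text


def compare_two_groups(group_a, group_b, alpha=ALPHA):
    """Two-sided Mann-Whitney U test between two independent samples.

    Returns the U statistic, the p-value, and whether the null hypothesis
    is rejected at the given alpha.
    """
    u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    return u, p, bool(p < alpha)


if __name__ == "__main__":
    # Made-up Likert scores for two groups, for illustration only.
    group_a = [4, 5, 5, 4, 5, 4, 5, 5, 4, 5]
    group_b = [1, 2, 2, 1, 3, 2, 1, 2, 2, 1]
    u, p, rejected = compare_two_groups(group_a, group_b)
    print(f"U = {u}, p = {p:.3g}, null hypothesis rejected: {rejected}")
```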

3.7.2 Kruskal-Wallis Test

In cases where averages were to be compared across nominal data with more than two levels, the Kruskal-Wallis test was used. This is an extension of the Mann-Whitney U test and thus does not assume a normal distribution but does assume that the samples originate from the same distribution. The parametric equivalent of this test is analysis of variance (ANOVA). The test is used to test the null hypothesis that none of the samples stochastically dominates any of the others. As with ANOVA, it does not reveal which samples are significantly different from each other (Sheskin, 2003).
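
A comparison across more than two groups might be sketched in the same way, again assuming SciPy is available. The function name and the example scores are hypothetical, and the 0.05 threshold is carried over from the tests described earlier in this chapter.

```python
from scipy import stats

ALPHA = 0.05  # threshold carried over from the other tests in this chapter


def compare_multiple_groups(*groups, alpha=ALPHA):
    """Kruskal-Wallis H test across two or more independent samples.

    Returns the H statistic, the p-value, and whether the null hypothesis
    is rejected at the given alpha.
    """
    h, p = stats.kruskal(*groups)
    return h, p, bool(p < alpha)


if __name__ == "__main__":
    # Made-up Likert scores for three groups, for illustration only.
    group_1 = [1, 1, 2, 2, 1, 2, 1, 2]
    group_2 = [3, 3, 2, 3, 3, 4, 3, 3]
    group_3 = [5, 4, 5, 5, 4, 5, 5, 4]
    h, p, rejected = compare_multiple_groups(group_1, group_2, group_3)
    print(f"H = {h:.2f}, p = {p:.3g}, null hypothesis rejected: {rejected}")
```

A significant result here indicates only that at least one group differs; identifying which groups differ would require separate pairwise comparisons.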
