CREATION AND MAINTENANCE OF
6.3 Sample Data, Instrument Reliability, Outliers and Distribution Testing
6.3.1 Sample Data
A total of 2,425 responses were received. However, after checking for outliers, one response was found to be inconsistent and was therefore removed entirely from the dataset as detailed in section 6.3.2. This left a DGS respondent dataset of 2,424, of which 2,392 provided their ages. A comparison of the broad characteristics of the DGS Dataset with the comparator datasets for statistical testing is shown below in Table 6.2.
                                                SWEMWBS    Age
Understanding Society Wave 7 Dataset 2015-16
  Mean                                          25.38      48.11
  N                                             37,469     37,469
  Standard Deviation34                          4.91       18.45
  Minimum                                       7          16
  Maximum                                       35         101
  Median                                        25         46
DGS Participants Dataset 2019
  Mean                                          28.21      54.57
  N                                             2,392      2,392
  Standard Deviation                            3.94       14.54
  Minimum                                       7          14
  Maximum                                       35         90
  Median                                        28         57
Table 6.2 Comparison of characteristics of National and DGS datasets
Table 6.2 shows that, as would be expected in large datasets, the range of SWEMWBS scores is the same for both datasets, spanning the lowest (7) and highest (35) possible SWEMWBS scores. The mean age is slightly higher for DGS participants (54.57 compared with 48.11), although both groups contain a broad range of ages. This is in line with the normality testing for the ages of both the qualitative interview participants (as shown in Chapter 5 Figure 5.4) and the wider DGS survey participants (Appendix N), which also found a skew towards the higher age groups. An independent t-test between the mean ages of the two groups found that the difference in the ages of respondents was significant (p<.05). However, scatter-graphs showed no significant correlation between age and SWEMWBS in either dataset, with an R² of 0.005 (age accounting for about 0.5% of the variance in SWEMWBS scores), as shown in Figure 6.1, enabling the data to be used as a comparative dataset.
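The age comparison can be approximately reproduced from the summary statistics in Table 6.2 alone. The following is a minimal sketch rather than the analysis actually run for this study: it assumes Welch's unequal-variance form of the independent t-test and, because both samples are large, approximates the p-value with the standard normal distribution. The function name is hypothetical.

```python
import math


def welch_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return Welch's t statistic and an approximate two-sided p-value."""
    # Standard error of the difference between two independent means
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    t = (mean_a - mean_b) / standard_error
    # Assumption: with samples this large the t distribution is very close
    # to the standard normal, so the normal tail is used for the p-value.
    p_two_sided = math.erfc(abs(t) / math.sqrt(2))
    return t, p_two_sided


# Age mean, standard deviation and N from Table 6.2
# (DGS participants first, Understanding Society Wave 7 second)
t_stat, p_value = welch_t_from_summary(54.57, 14.54, 2392, 48.11, 18.45, 37469)
print(f"t = {t_stat:.2f}, significant at p < .05: {p_value < 0.05}")
```

With samples of this size even a modest difference in mean age is statistically significant, which is why the relationship between age and SWEMWBS is examined separately.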
34 Standard Deviation is a measure that tells you the dispersion of a dataset relative to its mean value. The higher the standard deviation, the further the data points are spread out from the mean and the greater the variation in the data values.
Figure 6.1 Comparison of scatterplots with line of best fit between comparative national dataset ‘Understanding Society Wave 7’ and DGS participants’ dataset, indicating no significant correlation between age and SWEMWBS
There was no individual age data for the loneliness measures recorded in the Community Life Survey. Instead, ages are banded into either 25-64 or 16-24 and 65+. The comparison between the DGS dataset and the national Community Life dataset is shown below in Figure 6.2, indicating similar percentages for both age bandings, with a difference of 4.2 percentage points between them. In the national dataset 34.5% of the respondents were 16-24 or 65+, as opposed to 30.3% of the DGS respondents, 4.2 percentage points fewer. Those aged 25-64 represented 65.5% of the national dataset, and the DGS dataset was 4.2 percentage points higher at 69.7%.
[Figure 6.2: bar chart of percentage of respondents by age band. National Data (n = 10,167): 16-24 or 65+ 34.5%, 25-64 65.5%. DGS Participants (n = 2,389): 16-24 or 65+ 30.3%, 25-64 69.7%.]
Figure 6.2 Comparison of age bandings percentage respondents for DGS and national datasets
In terms of gender, 86.7% of participants were men and 13.1% women, with 6 participants or 0.2% not indicating a gender.
Figure 6.3 shows around one third of participants in the DGS dataset across all forms of participation were retired or semi-retired.
Figure 6.3 DGS participants’ retirement status by form of participation
Figure 6.4 shows where respondents to the questionnaire resided. It can be seen that 74.1% of participants lived in rural areas (rural and villages) and 25.9% of participants lived in urban areas (towns and cities).
Figure 6.4 Current residence of questionnaire respondents (percentages)
[Figure 6.3: bar chart, Retired/semi-retired percentages by group of respondents. Series: Retired or semi-retired; Not retired or semi-retired. Groups: Full Dataset; Regular Beaters and Pickers-up; Regular Paying Guns; Syndicate Members (paying). Axis: Percentage of participants.]
[Figure 6.4: chart of the type of area in which respondents resided.]
[Data labels surviving from Figures 6.3 and 6.4: 32.9, 43.8]
The background and current residence of each of the 2,424 respondents were also analysed using a cross-tabulation. The results are shown in Table 6.3 below.
[Table 6.3: cross-tabulation of current area of residence by area grew up, each classed as Rural (rural or village) or Urban (town or city)]
Table 6.3 Comparison of current residence and where participants grew up
Further calculations using the data from Table 6.3, as detailed in Table 6.4, show that 85.6% of participants in DGS have rural residence links, meaning they either currently live in a rural area or were brought up in a rural area, whilst only 14.4% have no rural connections in terms of former or current residence, indicating the strong rural link apparent within DGS.
                                                 Total    % of total participants (n=2424)
Rural residence links
  Rural Dweller, Rural Heritage35       1264
  Rural Dweller, Urban Heritage36       533
  Urban Dweller, Rural Heritage         278      2075     85.6%
No rural residence links
  Urban Dweller, Urban Heritage         349      349      14.4%
Table 6.4 Summary of participants’ rural residence links (currently or when growing up)
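The percentages in Table 6.4 follow directly from the four cross-tabulated counts. As a minimal sketch of that arithmetic (the dictionary layout and the helper name `share` are illustrative assumptions, not part of the original analysis):

```python
# Counts from Table 6.4, keyed by (current residence, area grew up)
counts = {
    ("rural", "rural"): 1264,  # Rural Dweller, Rural Heritage
    ("rural", "urban"): 533,   # Rural Dweller, Urban Heritage
    ("urban", "rural"): 278,   # Urban Dweller, Rural Heritage
    ("urban", "urban"): 349,   # Urban Dweller, Urban Heritage
}


def share(predicate):
    """Percentage of all participants whose category satisfies the predicate."""
    total = sum(counts.values())
    matching = sum(n for key, n in counts.items() if predicate(*key))
    return round(100 * matching / total, 1)


# A rural residence link means living in, or having grown up in, a rural area
rural_links = share(lambda now, grew_up: "rural" in (now, grew_up))
no_rural_links = share(lambda now, grew_up: now == grew_up == "urban")
rural_residents = share(lambda now, grew_up: now == "rural")

print(rural_links, no_rural_links, rural_residents)
```

The same counts also recover the share of participants currently living in rural areas reported for Figure 6.4.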
Rural identity, its importance to participants and links to social capital within DGS are explored in Chapter 7, section 7.1.
35 Rural Heritage = grew up in a rural area (village or rural)
36 Urban Heritage = grew up in an urban area (town or city)
The interview data analysed in Chapter 5 was from England only. The DGS questionnaire responses received were spread widely across the UK as shown in Figures 6.5 and 6.6 below, with numbers of responses increasing in density in rural areas and areas such as Exmoor and Yorkshire, where there is a greater incidence of DGS.
Figure 6.5 Density of responses UK-wide (2 Map.co.uk, 2019)
Figure 6.6 Distribution of responses UK-wide (Doogal, 2019)
A summary of the key features of the DGS participant dataset is shown in Table 6.5 below.
DGS participants sample: key features
Overall sample size          2,424
AGE                          n = 2392 (age not given by 32 participants, representing 1.3% of total sample)
  Age Range                  14 to 90
  Standard Deviation37       14.54
  Mean                       54.57
  Median                     57
SWEMWBS
  Mean                       28.21
  Standard Deviation         3.94
  Median                     28
Other key data
  Retirement status          32.9% Retired or Semi-retired; 67.1% Not Retired
  Gender                     n = 2418 (gender not given by 6 participants, 0.2%); 86.7% Male; 13.1% Female
  Current residence          74.1% Rural (rural or village); 25.9% Urban (town or city)
  Rural connection           85.6% Rural connection (current residence or residence where grew up); 14.4% No rural connection
Table 6.5 Key features of wider questionnaire DGS participants’ sample
37 Standard Deviation is a measure that tells you the dispersion of a dataset relative to its mean value. The higher the standard deviation, the further the data points are spread out from the mean and the greater the variation in the data values.
6.3.2 Outliers
An outlier is an observation that falls outside the overall pattern of a distribution (Moore and McCabe, 1999). A summary of outliers identified in the key datasets for statistical analysis of age and SWEMWBS is shown in Table 6.6 below. These were reviewed within the dataset and a single respondent entry was found to be an outlier for both age and SWEMWBS (extreme outlier38). Further investigation found that this multivariate outlier had inconsistencies within the responses, so it was removed from the dataset39. Other respondent entries were reviewed and, although their SWEMWBS responses were low, the other responses appeared genuine and consistent across the questionnaire so were retained.
These outliers represented less than 1% of the dataset and to remove them could falsely represent the data. It has been noted by other researchers that, whilst the removal of obviously false outliers (caused by respondent error or impossible value entry for example) is recommended (Orr, Sackett and Dubois, 1991; Osborne and Overhay, 2004), on average, 1% of subjects will be outliers (Osborne and Overhay, 2004) and in large datasets, outliers have not been found to be a substantial source of validity variance (Orr, Sackett and Dubois, 1991).
In this study it was particularly important not to introduce bias into the dataset, given the controversial nature of the topic. If the lower SWEMWBS scores had been removed from the results, the study could have been accused of falsely overstating the well-being levels within the study population. For outlier summary/boxplots see Appendix K.
Table 6.6 Summary of outliers identified, removed and retained for age and SWEMWBS
38 In SPSS boxplots, cases falling over 1.5 box lengths from the lower or upper hinge of the box are identified as ‘standard’ outliers; ‘extreme’ outliers are identified when they fall over 3 box lengths from either hinge. The box length is the central 50% of cases dispersed around the median value.
39 The single, multivariate outlier was removed because the responses given did not make sense. The respondent noted that they walk 0 km as a beater, which is not possible. Although they indicated in the ‘beaters and pickers-up’ section that they did not also shoot, they stated that they were a member of a syndicate shooting 100 days a season at a total cost of £6500. This number of days and cost is unlikely, if not impossible. The age, ethnicity, type of area they grew up in and current home location were also inconsistent, particularly in light of other responses within the questionnaire.
40 The outlier removed for both SWEMWBS & age is the single response referred to in section 7.3.2.
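The boxplot rule described in footnote 38 can be illustrated with a short sketch. This is an approximation under stated assumptions rather than a reproduction of the SPSS output: the hinges are approximated by quartiles from Python's `statistics.quantiles`, which SPSS may calculate slightly differently, and the function name and example scores are hypothetical.

```python
import statistics


def classify_outliers(values):
    """Split values into 'standard' and 'extreme' boxplot outliers."""
    # Assumption: quartiles stand in for the lower and upper hinges
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1  # spread of the central 50% of cases
    result = {"standard": [], "extreme": []}
    for value in values:
        # Distance beyond the nearer hinge (negative when inside the box)
        distance = max(q1 - value, value - q3)
        if distance > 3 * box_length:
            result["extreme"].append(value)
        elif distance > 1.5 * box_length:
            result["standard"].append(value)
    return result


# Hypothetical SWEMWBS-style scores, not taken from the DGS dataset
example_scores = [7, 19, 26, 27, 28, 29, 30, 31, 32]
print(classify_outliers(example_scores))
```

In this hypothetical example the box runs from 26 to 30, so a score of 19 lies more than 1.5 box lengths below the lower hinge and a score of 7 lies more than 3 box lengths below it.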
6.3.3 Instrument Reliability
The Short Warwick Edinburgh Mental Well-being Scale (SWEMWBS) was used to gather data on well-being. It consists of seven items, shown in Figure 6.7 together with the format in which the questions are asked.
‘Below are some statements about feelings and thoughts. Please tick the box that best describes your experience of each over the last 2 weeks’
None
I’ve been able to make up my own mind about things 1 2 3 4 5
Short Warwick Edinburgh Mental Well-being Scale (SWEMWBS)© NHS Health Scotland, University of Warwick and University of Edinburgh, 2007, all rights reserved.
Figure 6.7 Short Warwick Edinburgh Mental Well-being scale
The SWEMWBS data obtained using the questionnaire was subjected to Cronbach’s alpha test, which tests the internal reliability of a scale by measuring the average agreement between items. A score of 0.7 or higher is needed for a scale to be considered suitable for use in research (McLeod, 1994) and a score over 0.8 is considered good (Gliem and Gliem, 2003). The results of the Cronbach’s tests for the full dataset and the split datasets are shown in Table 6.7. The values all exceed 0.8, complying with best practice. The SWEMWBS is also widely used in other surveys, including the national Understanding Society UK survey (UK Data Service, 2017). Removing any individual question does not raise the value above the Cronbach’s alpha of the whole scale. Full details of the Cronbach’s alpha analyses can be found in Appendix M.
Beaters and pickers-up only (1530 responses) .844 .848 7
Syndicate Members only (1289 responses) .843 .848 7
Paying Guns Only (1459 responses) .846 .850 7
Table 6.7 Cronbach’s alpha scores for SWEMWBS data
The high Cronbach’s alpha scores shown in Table 6.7 across the dataset indicate that the SWEMWBS responses have a high level of internal consistency (McLeod, 1994), which supports the reliability of the research results.
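For readers unfamiliar with the statistic, Cronbach's alpha for a scale of k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below applies this standard formula to a small hypothetical three-item dataset. It is not the SPSS procedure used for Table 6.7, and the function name and example values are assumptions made for illustration.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items in the scale
    # Sample variance of each item across respondents
    item_variances = [statistics.variance(item) for item in zip(*responses)]
    # Sample variance of each respondent's total score
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses to a three-item scale (not DGS data)
example = [[1, 2, 2], [3, 3, 4], [4, 5, 5], [2, 2, 3], [5, 4, 5]]
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.3f}")
```

Alpha approaches 1 as respondents answer the items more consistently with one another, which is why values above 0.8 are read as indicating good internal consistency.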