Chapter 4 Methodology and methods
4.5 Statistical methods
4.5.3 Correlation coefficients
The Pearson product-moment correlation coefficient is a measure of the linear
association between two variables, ranging from -1 to +1, where -1 indicates a
perfectly negative linear relationship, 0 indicates no linear relationship and +1 a
perfectly positive linear relationship. The stronger the correlation, the closer the
correlation coefficient comes to ±1. A positive value indicates a direct relationship and a
negative value an inverse relationship. This test is useful for summarising the
strength of the linear relationship between variables; however, it does not infer
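The calculation behind the coefficient can be illustrated with a minimal sketch in Python. The function name and the example values are hypothetical and are not taken from the study data; the sketch simply divides the sum of cross-products of deviations by the square root of the product of the sums of squared deviations.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, for illustration only.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # perfectly positive linear relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))   # perfectly negative linear relationship
# A strong but non-linear (quadratic) relationship can still give r = 0.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```

The last call is a reminder that the coefficient summarises linear association only: a value near zero does not rule out a non-linear relationship.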
4.5.4 Regression
Researchers rely on regression analysis when trying to explain a dependent variable
as an outcome of various independent variables. The regression method used
depends to a large extent on the type of data used in the research project. Common
to all regression methods is the need to describe as simply as possible the
relationships between the variables under study.
Logistic regression
Logistic regression is a statistical method used by researchers to analyse data with
one or more independent variables that are associated with an outcome. The
outcome is measured with a dichotomous or binary (categorical) dependent
(outcome) variable (Field, 2005) where, for example, 0 would be the absence and 1
the presence of disease, with one seeking to estimate the probability of an individual
being either 0 or 1.
Logistic regression analyses generate the coefficients, standard errors and
significance levels of a formula to predict a logit transformation of the probability of
the presence of the outcome of interest (e.g. poor health) (Szumilas, 2010). The
exponential function of the regression coefficient is the odds ratio associated with a
one unit increase in exposure. This is particularly useful in health research where
most variables are dichotomous, for example whether or not an individual has a long-
term illness. Therefore, logistic regression is used to describe data and explain
relationships between a dependent variable and one or more independent
(predictor) variables. This would enable questions on the odds of workers in each
long-term illness, mental health conditions, and poor health behaviours) compared
to workers in other occupations to be addressed. Reference categories were used in
all logistic regression models and were determined by the research questions and
sample size, with the largest group often being used as the reference category.
Reference categories were 40−49 year olds, females, of white ethnic origin, nursing
and midwifery professionals/nurses, and in full-time work.
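The link between a regression coefficient and an odds ratio can be illustrated with a minimal sketch in Python. It assumes a single binary predictor, in which case the logistic regression coefficient equals the log of the odds ratio from the corresponding 2x2 table. The function name and the cell counts are hypothetical, and the confidence interval uses the standard large-sample approximation rather than any method reported in this thesis.

```python
import math

def odds_ratio(group_cases, group_noncases, reference_cases, reference_noncases):
    """Odds ratio for a group versus the reference category, with a 95% CI.

    For a single binary predictor, log(OR) is the logistic regression
    coefficient, so exp(coefficient) recovers the odds ratio.
    """
    a, b, c, d = group_cases, group_noncases, reference_cases, reference_noncases
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    coefficient = math.log((a / b) / (c / d))
    # Large-sample standard error of the log odds ratio.
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = 1.96  # approximate two-tailed 5% critical value of the standard normal
    lower = math.exp(coefficient - z * standard_error)
    upper = math.exp(coefficient + z * standard_error)
    return math.exp(coefficient), lower, upper

# Hypothetical counts: 30 of 100 workers in one occupation report poor health,
# compared with 10 of 100 workers in the reference category.
estimate, lower, upper = odds_ratio(30, 70, 10, 90)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

An odds ratio above 1 with a confidence interval that excludes 1 would indicate higher odds of the outcome in the group than in the reference category.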
Interpreting and reporting logistic regression models assumes a degree of
knowledge. As mentioned above, estimated logistic regression coefficients are
expressed in exponential form as odds ratios. The overall fit of the model is
assessed and reported using the -2 log likelihood, the Cox and Snell R² and
Nagelkerke R² statistics, and the percentage correctly predicted.
The Cox and Snell R² and Nagelkerke R² statistics provide an approximate indication
of the proportion of variance explained by the predictors. The percentage correctly
predicted gives the percentage of cases correctly classified by the
model, both overall and within each outcome category.
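How these fit statistics relate to one another can be shown with a minimal sketch in Python. It assumes the usual textbook definitions (Cox and Snell R² based on the ratio of null to fitted likelihoods, and Nagelkerke R² as that value rescaled by its maximum), a classification cutoff of 0.5, and hypothetical outcomes and predicted probabilities; none of these values come from the study.

```python
import math

def fit_statistics(outcomes, probabilities, cutoff=0.5):
    """-2 log likelihood, pseudo-R² values and percentage correctly predicted.

    outcomes: observed 0/1 values (both values must occur).
    probabilities: fitted probabilities of outcome 1, strictly between 0 and 1.
    """
    n = len(outcomes)
    events = sum(outcomes)
    # Log likelihood of the fitted model.
    ll_model = sum(
        math.log(p) if y == 1 else math.log(1 - p)
        for y, p in zip(outcomes, probabilities)
    )
    # The null (intercept-only) model predicts the overall proportion for everyone.
    p_null = events / n
    ll_null = events * math.log(p_null) + (n - events) * math.log(1 - p_null)
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox and Snell so that the maximum attainable value is 1.
    nagelkerke = cox_snell / (1 - math.exp(2 * ll_null / n))
    correct = sum((p >= cutoff) == (y == 1) for y, p in zip(outcomes, probabilities))
    return {
        "minus_2_log_likelihood": -2 * ll_model,
        "cox_snell_r2": cox_snell,
        "nagelkerke_r2": nagelkerke,
        "percent_correct": 100.0 * correct / n,
    }

# Hypothetical outcomes and fitted probabilities, for illustration only.
print(fit_statistics([0, 0, 1, 1], [0.2, 0.3, 0.7, 0.8]))
```

Because the Cox and Snell statistic cannot reach 1, the Nagelkerke value is always at least as large for the same model.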
In the study, logistic regression was used to address research questions three, five,
and seven. The questions are shown in Chapter 1.
Cox proportional hazards regression
Survival analysis is an important statistical procedure used by researchers to examine
the relationship of the survival distribution (or the time it takes for an event to occur)
to covariates. In this study, Cox proportional hazards regression was used to
study (see section 4.5.4). Workforce exit is measured with a dichotomous variable
where 0 denotes remaining in the workforce and 1 denotes leaving it.
Cox proportional hazards regression generates the coefficients, standard errors and
significance levels of a formula to predict the log-hazard of
workforce exit. The exponential function of the regression coefficient is the
hazard ratio associated with a one-unit increase in exposure. One main advantage of
Cox proportional hazards regression is that there is no requirement to select a
specific probability model to represent survival time or, in this case, time to
workforce exit, and it is thus more robust than parametric methods. This is
particularly useful when examining the relative hazard of workforce exit among
different occupations. The hypotheses were tested using the Wald test and the
likelihood ratio test.
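The step from a Cox regression coefficient to a hazard ratio and a Wald test can be sketched in Python as follows. The coefficient and standard error are hypothetical values rather than results from the study, and the function name is illustrative. The Wald statistic is taken here as the coefficient divided by its standard error and referred to the standard normal distribution; some software reports its square as a chi-square statistic instead.

```python
import math

def hazard_ratio_with_wald_test(coefficient, standard_error):
    """Hazard ratio, 95% confidence interval and two-tailed Wald test."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    hazard_ratio = math.exp(coefficient)
    lower = math.exp(coefficient - 1.96 * standard_error)
    upper = math.exp(coefficient + 1.96 * standard_error)
    z = coefficient / standard_error
    # Two-tailed p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return hazard_ratio, (lower, upper), z, p_value

# Hypothetical coefficient for one occupation versus the reference category.
hr, ci, z, p = hazard_ratio_with_wald_test(coefficient=0.40, standard_error=0.15)
print(round(hr, 2), tuple(round(v, 2) for v in ci), round(z, 2), round(p, 4))
```

A hazard ratio above 1 would indicate a higher rate of workforce exit than in the reference category at any given time, under the proportional hazards assumption.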
In the study, Cox proportional hazards regression was used to address research
question nine (see Chapter 1).
Two-tailed tests
Two-tailed tests are used to determine whether a sample statistic is significantly
greater than or less than a hypothesised value, so that effects in either direction
can be detected. There were two main reasons for using two-tailed testing. First, a
critical value of larger magnitude is used, providing a more conservative, rigorous
test (Cho and Abe, 2013). Second, by drawing on a two-tailed
test, the analysis was safeguarded against the parameter being significant in the
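The point about the larger critical value can be illustrated with a minimal sketch in Python, assuming a test statistic that follows the standard normal distribution and the 5 percent significance level. The function names are illustrative only.

```python
from statistics import NormalDist

def critical_values(alpha=0.05):
    """One-tailed and two-tailed critical values of the standard normal."""
    standard_normal = NormalDist()
    one_tailed = standard_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
    two_tailed = standard_normal.inv_cdf(1 - alpha / 2)  # alpha split across both tails
    return one_tailed, two_tailed

def two_tailed_p_value(z):
    """Probability of a statistic at least as extreme as z in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

one_tailed, two_tailed = critical_values()
print(round(one_tailed, 3), round(two_tailed, 3))
print(round(two_tailed_p_value(1.8), 4))
```

A statistic of 1.8, for example, would exceed the one-tailed critical value but not the two-tailed one, which is the sense in which the two-tailed test is more conservative.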
4.6 Studies
This section explores the methods used in each of the four studies separately. First,
the methods used to address each research question will be outlined in accordance
with the Strengthening the Reporting of Observational Studies in Epidemiology
(STROBE) statement.
4.6.1 Study 1
The prevalence of tobacco smoking, physical activity, alcohol consumption, and
dietary habits, specifically sugar, fat and fruit and vegetable intake, among nurses
and student nurses internationally was the focus of question one. To address this
research question, a quantitative integrative review of literature published between
January 2000 and December 2016, and indexed in MEDLINE, CINAHL and PsycINFO
on nurses’ or student nurses’ health behaviours was conducted. This study was
presented in Chapter 3.
4.6.2 Study 2
Research questions, objective and hypotheses
The research questions and aims are shown in Chapter 1, sections 1.7 and 1.8.
Methods
Study design, setting and participants
A cross-sectional study design was used to quantify the health status of workers in
eight occupational groups in the UK using routinely collected data from the APS (ONS,
2016). The APS consists partly of the LFS, a survey of people resident at private
addresses in the UK. The main purpose of the survey is to provide key social and
workers in the UK labour market. The survey is managed by a subdivision in the
Office for National Statistics.
The LFS covers an estimated 60,000 households each quarter and uses a panel design
whereby samples remain in the survey for five consecutive rounds. The survey uses
an unclustered sample of addresses in the UK to improve precision of estimates. In
Scotland, there is a very small bias in that there is only partial coverage of the
population north of the Caledonian Canal – approximately five percent of the total
population in this area. The APS provides enhanced annual data for England,
particularly urban areas, targeting a minimum of 510 economically active people in
each unitary authority/local authority district and a minimum of 450 in each Greater
London Borough (UK Data Archive, 2016). This provides an estimated sample size of
320,000 people, representing 0.16 percent of the British population.
Four different sampling frames are used in the LFS: south of the Caledonian Canal
(i.e. England, Wales and most of Scotland), north of the Caledonian Canal, Northern
Ireland, and NHS accommodation
establishments. In Wave 1 the sample was selected by ordering the sampling frames
geographically and then drawing the selection systematically with fixed intervals.
Samples were based on postcodes taken from the Royal Mail Postcode Address File
or the telephone directories depending on geographical location. This sample was
then retained for four more consecutive rounds before these respondents exited the
survey. Data is collected in all regions by means of face-to-face interviews with the
exception of those north of the Caledonian Canal where telephone interviews are
potential bias from non-coverage of people not listed in the directories for several
reasons (e.g. no telephone, mobile only, ex-directory, living in new-build housing).
While this approach may bias the sample towards those with a telephone, alternative
strategies (e.g. face-to-face interviews) would be costly and time consuming. The
APS yields a response rate of around 66 percent.
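Systematic selection with a fixed interval from a geographically ordered frame can be sketched in Python as follows. The random starting point within the first interval is an assumption (a common feature of systematic sampling that the description above does not state), and the function name, frame contents and sample size are hypothetical rather than details of the LFS procedure.

```python
import random

def systematic_sample(ordered_frame, sample_size, seed=None):
    """Select every k-th unit from an ordered sampling frame.

    ordered_frame: units already sorted geographically (e.g. by postcode).
    sample_size: number of units to draw.
    seed: optional seed so that the assumed random start is reproducible.
    """
    if not 0 < sample_size <= len(ordered_frame):
        raise ValueError("sample_size must be between 1 and the frame size")
    interval = len(ordered_frame) // sample_size      # fixed sampling interval k
    start = random.Random(seed).randrange(interval)   # assumed random start in [0, k)
    return [ordered_frame[start + i * interval] for i in range(sample_size)]

# Hypothetical frame of 100 address identifiers, sorted geographically.
frame = [f"address_{i:03d}" for i in range(100)]
print(systematic_sample(frame, sample_size=10, seed=1))
```

Because the frame is ordered geographically before selection, a fixed interval spreads the sample across areas without the need for clustering.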
Respondents were included if they participated in the Annual Population Survey
between January and March 2016, were economically active and were aged between 17
and 69. The present study excluded respondents aged below 17 and over 69 on the
assumption that people below 17 were generally in full-time education and those over
69 would typically be retired. While this assumption certainly has limitations, given
the complexity of defining working age at an individual level it was considered to be
the best available criterion to enable comparisons to be drawn and meaningful
findings to emerge.
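The inclusion criteria can be expressed as a simple filter, sketched here in Python. The field names `age` and `economically_active` are hypothetical and do not correspond to actual APS variable names; the records are invented for illustration.

```python
MIN_AGE = 17
MAX_AGE = 69

def is_eligible(respondent):
    """Apply the inclusion criteria: economically active and aged 17 to 69."""
    return bool(respondent["economically_active"]) and MIN_AGE <= respondent["age"] <= MAX_AGE

# Hypothetical respondent records, for illustration only.
respondents = [
    {"id": 1, "age": 16, "economically_active": True},   # excluded: below 17
    {"id": 2, "age": 45, "economically_active": True},   # included
    {"id": 3, "age": 52, "economically_active": False},  # excluded: not economically active
    {"id": 4, "age": 70, "economically_active": True},   # excluded: over 69
]
analysis_sample = [r for r in respondents if is_eligible(r)]
print([r["id"] for r in analysis_sample])
```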
Variables
Outcome variables
The choice of outcome measure was a crucial component of this study. A mixture of
self-assessed health outcomes and self-reported health problems were chosen to
provide a broad picture of workers’ self-reported health.
Current disability
Current disability was measured in accordance with the Equality Act 2010 (Part 2,
Chapter 1, Section 6), in which respondents self-reported to be either (Equality Act)
disabled or not (Equality Act) disabled. The Equality Act 2010’s legal definition of
impairment, and the impairment has a substantial and long-term adverse effect on
the person’s ability to carry out normal day-to-day activities” (Equality Act, 2010, p.
4). Interviewers also asked respondents during their first interview if they had ever
had any other health problem or disability that had lasted more than one year, yes
versus no.
Health problem affecting amount or kind of paid work
Respondents who self-reported a health problem that they expected to last for more
than one year (and were aged below 64 and currently looking for or wanting work)
were asked whether their health problem affected the amount of paid work they
were able to do (yes/no). These respondents were also asked whether their health
problem affected the kind of paid work they were able to do (yes/no).
Satisfaction with life
The Satisfaction with Life Scale was based on a simple question, “How do you rate
your satisfaction with life as a whole nowadays?”, answered on an 11-point scale
ranging from extremely dissatisfied (0) to extremely satisfied (10), and is a
frequently used measure of wellbeing. The main advantage of using satisfaction with
life is that this democratic measure allows people to self-evaluate their own life
situation rather than have others, such as governments, decide what is important to them.
Moreover, the scale leads people to evaluate their life, not merely in relation to
health alone, but rather integrate life domains such as health and finances as they
see fit, providing their own unique weights to each domain (Pavot and Diener, 2009).
This is a subjective process whereby respondents rate their satisfaction with life
perceived life circumstances with a self-imposed standard. It is unclear precisely how
each person makes this judgement. Of course, there are immediate influences from
our current situation but also historic influences. An accumulation of influences over
the life course from childhood, schooling and family backgrounds impact on life
satisfaction. Figure 4.1 provides a visual representation of some determinants
influencing how people rate their life satisfaction.
Taken from Clark, Fleche, Layard, Powdthavee and Ward (2016).
Figure 4.1. Determinants of Adult Life Satisfaction.
Clark et al. (2016) used survey data from four major countries – Australia, Britain,
Germany and the United States – to investigate variations in life satisfaction. The
study indicated that social relationships and mental and physical health mattered
most to people, with emotional health as a child the best predictor of an adult’s life
feature in the least satisfied people (bottom 10% of the population in terms of life
satisfaction) was not unemployment or poverty, but mental ill health (e.g. depression
or anxiety). Nonetheless, the rating of life satisfaction remains fairly consistent over
much of adulthood, with a steep decline in life satisfaction often seen among those
aged over 70 (Baird, Lucas, and Donnellan, 2010).
Control variables
The Standard Occupational Classification (SOC) codes established in the UK in 1990
are an internationally recognised common classification of occupations based on skill
specification and skill level and were used to categorise respondents into eight
groups (see Table 4.3 for the occupational groups used in the analysis along with their
SOC2010 codes). There is likely to be a small degree of bias associated with
categorising occupations this way with skill requirements inevitably varying from job
to job and workplace to workplace; complete agreement in every establishment or
authority area is unlikely. Nevertheless, despite these minor points, SOC provides a
straightforward and structured approach to classifying occupation, compatible with
international standards (ONS, 2010).
In relation to ‘occupation’, respondents were identified as belonging to a health
occupation or one of two comparison groups. The first group comprised health
occupations: health professionals, therapy professionals, nursing and
midwifery professionals, caring personal services, health and social services
Table 4.3 Occupational Classification.
Occupational categories | Included occupations | SOC2010 Code (2012)
Health professionals | Medical practitioners; psychologists; pharmacists; ophthalmic opticians; dental practitioners; veterinarians; medical radiographers; podiatrists; and health professionals. | 221
Therapy professionals | Physiotherapists; occupational therapists; speech and language therapists; and therapy professionals. | 222
Nursing and midwifery professionals | District nurses; health visitors; mental health practitioners; nurses; practice nurses; psychiatric nurses; staff nurses; student nurses; midwifery sisters; midwives; and student midwives. | 223
Caring personal services | Nursing auxiliaries and assistants; ambulance staff (excluding paramedics); dental nurses; house parents and residential wardens; care workers and home carers; senior care workers; care escorts; and undertakers, mortuary and crematorium assistants. | 614
Health and social services managers and directors | |
Managers and proprietors in health and care services | Health care practice managers and residential, day and domiciliary care managers and proprietors. | 124
Teaching and educational professionals | Higher education teaching professionals; further education teaching professionals; secondary education teaching professionals; primary and nursery education teaching professionals; special needs education teaching professionals; senior professionals of educational establishments; education advisers and school inspectors; and teaching and other educational professionals. | 231
Other occupations | All other |
The two comparison occupational groupings were teaching and educational
professionals (group two) and a final group containing all other occupations not
included in groups one and two.
Teachers were selected as a comparison group to help show whether differences in
health outcomes identified in the study are due to the work itself, given the
similarity in other determinants. There are six main similarities between teachers
and nurses. First, the qualification levels required to practise as a qualified
teacher and as a nurse are similar (General Teaching Council for Scotland, 2012;
Nursing and Midwifery Council, 2015). Second, both teachers and nurses tend to
remain in the profession for life. Third, both teaching and nursing are classed as
vocational occupations. Fourth, both occupations generally draw people from a
similar social background. Fifth, teaching and nursing professionals have a similar
pay level (£22,500 to £59,000 [National Careers Service, 2016] and £21,909 to
£41,373 [RCN, 2015] respectively). Finally, teachers are also a highly stressed group.
Despite these similarities there is one main difference between nurses and teachers.
Teachers typically work normal business hours Monday to Friday whereas nurses are
required to work round the clock, seven days a week to meet demands.
Other occupations provided a comparison group to contextualise the findings.
Demographics
Age was measured in whole years and coded into ten-year intervals: 17−29, 30−39,
and coded into white and other. Other was formed of mixed/multiple ethnic groups,
Indian, Pakistani, Bangladeshi, Chinese, any other Asian background,
Black/African/Caribbean/Black British, and other ethnic groups.
Statistical methods
Descriptive statistics of health measures were generated to examine the distribution
of poor health of health workers relative to teachers and the general population.
Next, descriptive statistics of the effect of health on work and satisfaction with life
were presented by each occupational group. Finally, logistic regression analysis was
used to calculate the potential association between several determinants of health
and the occurrence of poor health by occupation. Logistic regression was used
because the dependent variables were dichotomous, which violates the assumptions
of ordinary linear regression. The assumptions of the absence of
multicollinearity and no outliers in the data were met. Odds ratios were used to
identify which occupations were associated with poorer health in relation to
specific groups, such as 40−49 year old women.
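How a logistic regression model is fitted to a dichotomous outcome, and how its coefficients yield odds ratios, can be sketched in Python. This is an illustrative implementation using Newton-Raphson iterations on the log likelihood, not the software or procedure used in the study; the function names and the data are hypothetical.

```python
import math

def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system, e.g. perfectly collinear predictors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x

def fit_logistic(rows, outcomes, iterations=25):
    """Return [intercept, coefficients...] for a 0/1 outcome.

    rows: predictor values per respondent (dummy-coded, without the intercept).
    outcomes: observed 0/1 values in the same order.
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(iterations):
        gradient = [0.0] * k
        information = [[0.0] * k for _ in range(k)]
        for x, y in zip(design, outcomes):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for i in range(k):
                gradient[i] += (y - p) * x[i]
                for j in range(k):
                    information[i][j] += p * (1.0 - p) * x[i] * x[j]
        # Newton-Raphson update: beta + information^-1 * gradient.
        step = solve_linear_system(information, gradient)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical data: predictor 1 = occupation of interest, 0 = reference category.
rows = [[1]] * 100 + [[0]] * 100
outcomes = [1] * 30 + [0] * 70 + [1] * 10 + [0] * 90
intercept, coefficient = fit_logistic(rows, outcomes)
print(round(math.exp(coefficient), 3))  # odds ratio versus the reference category
```

Additional dummy-coded columns (for example gender, age band and working hours) could be appended to each row to obtain adjusted odds ratios in the same way.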
In the first stage, the risk of having a current disability was investigated by occupation
with adjustment for gender, age, and working hours. In the second stage, the risk of
reporting a health problem lasting more than a year was investigated by occupation
with adjustment for gender, age and working hours. In the third stage, the risk of
having a health problem that affected the amount of work defined by respondents
was investigated by occupation with adjustment for gender, age and working hours.
In the fourth stage, the risk of reporting a health problem that affected the kind of
work was investigated by occupation with adjustment for
gender, age and working hours. Finally, in the fifth stage, dichotomous satisfaction
with life score was investigated by occupation with adjustment for gender, age and
working hours. The accepted level of significance was taken as the 5 percent level.
Data checking
Data in the LFS required an extensive amount of data checking in order to conduct
the analysis outlined above. All variables included in the analysis were checked to
ensure there were no problems evident with the coding or reporting of variables. A
child indicator was not used in this study because of incomplete coding within the
dataset. Information on the number of people in the sample who reported not to
have any dependents was missing. Manually coding people who did not report to
have a dependent child as not having any children could produce misleading results
as the figure would include those who did not answer this question. Therefore, this
variable was omitted from the analysis. This was an important and time-consuming
phase in study two.
4.6.3 Study 3
Research questions, objectives and hypotheses
The research questions and objectives are shown in Chapter 1, sections 1.7 and 1.8.
Methods
Study design, setting and participants
The study design, setting and participants have been described in detail on page 128
The LFS provides an estimated sample size of 40,000 people and a response rate of
Variables
Outcome variables
The outcome variables for health problems possibly incurred by work were informed
by previous literature, which has linked different health outcomes to type of