Parts of this chapter are based on Burt et al.190 under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
Abstract
Background
Although minority ethnic groups have consistently reported poorer care in patient surveys, it is not known whether this is because they receive worse care or because they respond differently to such surveys.
Methods
We conducted an experimental vignette study to investigate whether or not South Asian people rate simulated GP consultations differently from white British people. In total, 564 white British and 564 Pakistani adults were recruited using an in-home face-to-face approach. Trained fieldworkers completed computer-assisted personal interviews during which participants rated the communication within three video recordings of simulated GP–patient consultations. Consultations were shown in a random order, selected from a pool of 16. Mean differences in communication scores (on a scale of 0–100) between white British and Pakistani patients were estimated from linear regression.
Results
Pakistani participants, on average, scored consultations 9.8 points higher than white British participants (95% CI 8.0 to 11.7 points; p < 0.001) when viewing the same consultations. When adjusted for age, gender, deprivation, self-rated health and video, the difference increased to 11.0 points (95% CI 8.5 to 13.6 points; p < 0.001). The largest differences were seen in older participants (≥ 55 years) and when communication was scripted to be poor.
Conclusions
Substantial differences in ratings were found, with Pakistani respondents giving higher scores to videos showing the same care. If we take these findings at face value, they would suggest that the lower scores reported by Pakistani patients in national surveys such as the GP Patient Survey represent genuinely worse care.
Introduction and rationale for the study
As outlined in Chapter 5, some minority ethnic groups have reported consistently lower patient experience scores than the majority population in both the UK and the USA.75,150–153 Of particular concern within the UK, and confirmed by the analyses undertaken for this programme grant, South Asian groups report significantly more negative experiences of GP–patient communication than their white British
counterparts.131,156 Potential explanations for these lower ratings focus on whether South Asian patients receive lower-quality care or whether they rate similar standards of care more negatively.
A number of potential drivers of more negative ratings of similar standards of care exist. For example, it
has been suggested that differences in the use of questionnaire response scales (e.g. Elliott et al.80) may lead to South Asian groups being less likely to endorse the most positive options when asked to evaluate a doctor’s communication skills. Our analysis of GP Patient Survey data, drawing on item response theory to explore whether or not items receive systematically different responses from South Asian and white British groups, suggested that this was unlikely to be the case.159 Yet there are also other, alternative drivers of poorer ratings of similar care, most notably that the evaluation of consultations by South Asian respondents is influenced by systematic variations in their expectations of, or preferences for, care.
Fundamentally, these concerns centre on a well-recognised and long-standing problem with surveys: that individuals may interpret and respond to the ‘same’ question in many different ways.191 Potential solutions to this problem arose first within the field of political science, where the use of standardised scenarios, or vignettes, was proposed to evaluate the disparity in responses to survey items.82 Such approaches are particularly relevant to understanding minority ethnic experiences: as already described, alongside potential variations in scale use by individuals from various ethnic backgrounds, we also need to consider systematic cultural variations in expectations of or preferences for care, as well as the potential for systematic variations in actual experience. A recent US study81 adopted King et al.’s vignette methodology to examine the extent of cross-cultural incomparability in survey responses, using predominantly written vignettes. This online survey concluded that score variations observed on national surveys among African American, Latino and white respondents were likely to reflect true differences in real-life experiences, at least for items in the survey that used an ‘always to never’ response scale.81
The aim of this strand of work was to build on previous vignette approaches to examine whether or not people from a Pakistani background rate the communication within simulated GP consultations differently from white British people. If these groups rate simulated consultations similarly when viewing identical video vignettes, then we would be able to conclude that it is more likely that the lower scores previously reported by South Asian respondents in national patient experience surveys reflect real differences in quality of communication within consultations.
Changes to study methods from the original protocol
This strand of work, as stated in the original protocol, formed part of our wider aim of exploring in more detail the experiences of minority ethnic groups, together with the GP Patient Survey analyses reported in Chapter 5: to understand the reasons why minority ethnic groups, especially South Asians, give lower scores on patient surveys than the white British population (aim 5).
In our original protocol, to undertake this study we envisaged developing a DVD containing short clips
(3–4 minutes) of four simulated patient consultations and asking respondents to rate these using the
GP–patient communication items of the GP Patient Survey. These DVDs would be sent out, with
questionnaires and instructions, to patients registered with practices with a high proportion of South Asian
patients. We suggested using SANGRA (South Asian names and group recognition algorithm)192 to identify South Asian patients.
In practice, we first devised a more robust and efficient approach to recruiting participants, using targeted face-to-face recruitment in partnership with the market research agency, Ipsos MORI. This enabled us to effectively reach a rigorously sampled set of participants of known Pakistani ethnicity. Second, participants rated simulated consultations during face-to-face computer-assisted interviews conducted by trained fieldworkers. This enabled us to collect high-quality and consistent ratings of consultations. Our recruitment and rating approach is detailed in full in Methods.
As we acknowledged in our original protocol, the requirement of the vignettes approach to show identical consultations to all participants meant that all videos had to be in English. However, we had stated that, although we would therefore have to exclude patients who could not understand English, we would make study questionnaires and documentation available in four Asian languages. As we employed face-to-face
EXPERIMENTAL VIGNETTE STUDY
NIHR Journals Library www.journalslibrary.nihr.ac.uk
computer-assisted interviews in the study, this requirement was no longer necessary once we had screened for those who were confident in their ability in spoken English. This therefore represents a further
improvement on our original study design.
Methods
In this experimental vignette study we showed videos of simulated GP–patient consultations to white
British and Pakistani respondents, who were asked to rate the quality of the communication within each consultation that they viewed. The study advisory group was particularly involved in consideration of the nature of the vignettes to be shown and the study materials.
Simulated consultations
To ensure generalisability and to avoid the chance inclusion of a characteristic or event that, unknown to us, might systematically be rated differently by the two participating ethnic groups, we produced a series of 16 vignettes. We set out to manipulate the vignettes on three key domains:
1. the presenting complaint depicted within each consultation
2. the quality of the communication within each consultation (poor or good)
3. the ethnic background of the actors playing the doctor and patient (South Asian or white British).
Published recommendations for the production of vignettes emphasise the importance of developing a valid script and considering how best to manipulate this on the domains of interest.193 We therefore based our vignettes on real-life consultations that were video recorded as part of another workstream (the association between patients’, raters’ and GPs’ assessments of communication in a consultation, for which we recorded > 500 real-life consultations). We undertook an extensive process of script development, role playing and rating prior to filming the vignettes with professional actors (Figure 14).
The vignettes that we produced covered four different clinical scenarios: persistent cough, perforated ear drum, painful elbow and generalised numbness. We developed two different scripts for each clinical scenario: one designed to illustrate poor communication by the doctor and one designed to illustrate good
communication. We formulated ‘poor’ and ‘good’ standards of communication according to the GCRS.126 This observer-rated measure of communication competence (derived from the widely used Calgary–Cambridge guide to the medical interview127,128) was developed as part of our workstream on patients’ and raters’ assessments of communication competence within a consultation. The GCRS instrument covers 12 domains including initiating the session, gathering information, building the relationship and achieving a shared understanding (see Appendix 1 for the full instrument). We then used both the ‘poor’ and the ‘good’ version of the four clinical scenarios to film two sets of vignettes. The first set of vignettes had white British actors playing the GP and the patient, whereas the second repeated the same scripts but with South Asian actors playing the GP and the patient. The GP role was acted throughout by either one white British or one South Asian actor; eight different actors (four white British and four South Asian) role-played patients, each participating in one clinical scenario. The final 16 videos were each scored by three trained clinical raters using the GCRS to assess communication quality in relation to professionally defined norms.126 Mean GCRS scores for the ‘poor’ communication vignettes ranged from 0.6 to 2.4 (out of 10), whereas mean GCRS scores for the ‘good’ communication vignettes ranged from 5.1 to 8.4.
Data collection
Ipsos MORI fieldworkers conducted data collection in collaboration with our team. As per the original protocol, we aimed to recruit 1120 respondents, each of whom was asked to rate three simulated
GP–patient consultations. Our original sample size calculation was based on data from the General Practice
Assessment Questionnaire (which includes some identical items to those in the GP Patient Survey); we repeated this using more recent GP Patient Survey data. This confirmed that the inclusion of 560 Pakistani
(on a 0–100 scale) seen between these two groups after controlling for age, gender, deprivation, self-rated health and practice. As our analyses of GP Patient Survey data had identified that ethnic disparities were largest among older age groups, we set out to recruit equal numbers above and below the age of 55 years
within each ethnic group.156
Vignette development
Rationale
To vary vignettes on three domains:
1. the presenting complaint
2. the quality of GP–patient communication (poor or good)
3. the ethnic backgrounds of the doctor and patient (South Asian or white British)
Clinical content
• Derived from existing bank of > 500 video-recorded GP–patient consultations
• Identified consultations with unisex presenting complaints lasting < 7.5 minutes (n = 29)
• Four consultations selected by the research team: tennis elbow, persistent cough, numbness and perforated ear drum
Script development
• Each of the four clinical scenarios summarised for simulated patients on a pro forma covering patient sociodemographics, clinical details, patients’ perspectives, past medical history and social history
• Each scenario role played with a GP (JBe) and simulated patient in two versions – ‘good’ and ‘poor’ for communication. All role plays video recorded
• Communication quality of each role-played consultation scored using the GCRS by one rater (JS) to confirm the difference between ‘good’ and ‘poor’ communication role plays at this stage
• Role-played scenarios transcribed in full to act as scripts for the vignettes. Minor changes to the content and stage directions were made by the research team in consultation with the simulated patient
Vignette filming
• Vignette actors were recruited via an acting agency specialising in simulated patient role play
• Briefing packs were prepared for all actors, to include scripts, background summaries of vignettes and verbal and non-verbal behaviour guides
• Vignettes were filmed with a professional film crew over 2 days, involving 10 actors and one acting supervisor. At least one GP was present at all times to ensure clinical consistency
• Following filming, vignettes were professionally edited to create the final 16 films
Final vignettes
• 16 films ranging from 2 to 8 minutes
• Eight films involving a South Asian GP–patient actor pairing: four different clinical scenarios, each filmed in two versions, poor communication and good communication
• Eight films involving a white British GP–patient actor pairing: four different clinical scenarios, each filmed in two versions, poor communication and good communication
Rating vignettes
• Each vignette rated by three trained GCRS raters (all GPs) to determine its score for the quality of communication in relation to professionally defined norms
FIGURE 14 Development of the vignettes. Reproduced from Burt et al.190 under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
Following consultation with Ipsos MORI, we used different recruitment strategies for the different ethnic groups. To recruit Pakistani respondents, we selected output areas (geographically confined areas of approximately 130 households) in which at least 35% of the population was identified as Pakistani in 2011
census data.173 These were then ranked according to the proportion of the population aged > 50 years (the cut-off point of 50 years of age used for sampling reflects available census categories; for our recruitment we specifically used a cut-off point of 55 years of age). Trained fieldworkers then recruited participants within these areas using an in-home face-to-face approach, starting in the output areas with the highest proportion of residents aged > 50 years. Fieldworkers were also provided with one or two output areas neighbouring the area sampled and were able to recruit from these if necessary. Snowball recruitment (e.g. known neighbours suggested to fieldworkers) and additional household interviews were allowed.
To recruit white British participants, we first excluded output areas with low proportions of white British residents (< 90%) and residents aged > 50 years. The remaining output areas were ranked by social grade (the percentage of people who were social grade A/B according to 2011 census data194) and geography.
Ipsos MORI then selected output areas to approach using proportional systematic sampling.
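The area selection for the Pakistani sample can be sketched in a few lines. This is only an illustration of the filter-then-rank step described above, not the procedure Ipsos MORI actually ran: the `OutputArea` class, its field names and the example values are hypothetical, and the proportional systematic sampling used for the white British sample is not shown because its details are not given here.

```python
from dataclasses import dataclass


@dataclass
class OutputArea:
    # Hypothetical record for one census output area
    code: str
    prop_pakistani: float  # proportion of residents identified as Pakistani
    prop_over_50: float    # proportion of residents aged > 50 years


def rank_pakistani_recruitment_areas(areas, min_prop_pakistani=0.35):
    """Keep areas with at least 35% Pakistani residents, then rank them so
    that areas with the highest proportion aged > 50 years come first."""
    eligible = [a for a in areas if a.prop_pakistani >= min_prop_pakistani]
    return sorted(eligible, key=lambda a: a.prop_over_50, reverse=True)


# Made-up example areas, for illustration only
example_areas = [
    OutputArea("A", 0.40, 0.20),
    OutputArea("B", 0.10, 0.50),
    OutputArea("C", 0.55, 0.30),
]
ranked = rank_pakistani_recruitment_areas(example_areas)
```

In this made-up example, area B is dropped because it falls below the 35% threshold, and area C is approached before area A because it has the larger proportion of older residents.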
Fieldworkers screened potential participants for ethnicity (using the ONS 18-group categorisation143) and for English-language competency (using a screening question regarding self-reported confidence in understanding short videos in English). Eligible respondents who consented then completed a computer-administered personal interview during which the fieldworker used a standardised script. Each participant viewed three of the sixteen simulated consultation videos that we had produced. Following each video, the participant was asked to rate the consultation using five GP–patient communication items taken from the most recent national GP Patient Survey (Table 23). We assigned videos so that each participant saw three different presenting conditions (and, therefore, videos), with two of the videos featuring South Asian–South Asian and white British–white British ethnic GP–patient pairs and at least one of the videos for each condition featuring either the ‘good’ or ‘poor’ communication script. The selection of videos shown to each participant was such that approximately equal numbers of all possible combinations were used, given the restrictions that we have described. Participants also completed basic sociodemographic questions (age, self-rated health, whether or not born in the UK, language spoken most often at home). An area-based measure of socioeconomic deprivation (IMD) was recorded based on the participants’ postcode.
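One way to picture the video assignment is to enumerate every set of three videos that satisfies the restrictions and then cycle through those sets. The sketch below rests on one reading of the restrictions, which is an assumption rather than the documented rule: three different presenting conditions, both actor pairings represented, and both communication scripts represented. The function names and the cycling scheme are likewise illustrative only.

```python
import random
from itertools import combinations, product

CONDITIONS = ["persistent cough", "perforated ear drum", "painful elbow", "numbness"]
QUALITIES = ["good", "poor"]
PAIRINGS = ["South Asian", "white British"]

# The 16 videos: clinical scenario x communication script x actor pairing
VIDEOS = list(product(CONDITIONS, QUALITIES, PAIRINGS))


def is_allowed(triple):
    """Assumed restrictions: three distinct conditions, and both scripts and
    both actor pairings appear at least once among the three videos."""
    conditions = {video[0] for video in triple}
    qualities = {video[1] for video in triple}
    pairings = {video[2] for video in triple}
    return len(conditions) == 3 and len(qualities) == 2 and len(pairings) == 2


ALLOWED_TRIPLES = [t for t in combinations(VIDEOS, 3) if is_allowed(t)]


def assign_videos(participant_index, rng):
    """Cycle through the allowed triples so that each is used about equally
    often, then shuffle the viewing order for this participant."""
    triple = list(ALLOWED_TRIPLES[participant_index % len(ALLOWED_TRIPLES)])
    rng.shuffle(triple)
    return triple


first_assignment = assign_videos(0, random.Random(2024))
```

Under these assumed restrictions each video appears in the same number of allowed triples, so cycling through them spreads ratings evenly. With 1128 participants each rating three of 16 videos, an even spread gives about 211.5 ratings per video, in line with the 202 to 222 reported in the Results.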
Analysis
As in our previous analyses of GP Patient Survey data, we scored each participant’s rating of each
consultation by linearly scaling the response options between 0 (very poor) and 100 (very good) and averaging all informative answers when at least three of the five items were completed. We used linear
regression to model the mean difference between white British and Pakistani participants’ ratings of
GP–patient communication. We estimated the unadjusted difference in ratings as well as the difference
adjusting for patient age, gender, self-rated health, deprivation and a set of 15 indicator variables for the video. We did not originally plan to conduct any analysis of interaction terms. However, the effect size found was much larger than that anticipated in our original power calculations and so we investigated interactions between participant ethnicity and the following variables:
(a) relating to the video: ethnicity of GP/patient and quality of GP–patient communication
(b) relating to the participant: age, gender and deprivation.
TABLE 23 General practitioner communication items used to rate vignettes
Thinking about the doctor you have just seen in the video, how good was the doctor at each of the following? Please put a ✗ in one box for each row
Item | Very good | Good | Neither good nor poor | Poor | Very poor | Doesn’t applyᵃ
Giving enough time | □ | □ | □ | □ | □ | □
Listening to the patient | □ | □ | □ | □ | □ | □
Explaining tests and treatments | □ | □ | □ | □ | □ | □
Involving the patient in decisions about his or her care | □ | □ | □ | □ | □ | □
Treating the patient with care and concern | □ | □ | □ | □ | □ | □
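The scoring rule described at the start of this section can be illustrated with a short sketch. It assumes that linear scaling places the five response options at equal 25-point intervals, and that ‘Doesn’t apply’ and missing answers count as uninformative; the function and variable names are illustrative rather than taken from the study’s analysis code.

```python
# Assumed equal-interval scaling of the five response options
RESPONSE_SCORES = {
    "Very good": 100.0,
    "Good": 75.0,
    "Neither good nor poor": 50.0,
    "Poor": 25.0,
    "Very poor": 0.0,
}


def communication_score(responses, min_items=3):
    """Average the informative answers to the five communication items.

    Any answer that is not one of the five scaled options (for example
    "Doesn't apply" or None) is treated as uninformative. Returns None when
    fewer than `min_items` informative answers are available.
    """
    informative = [RESPONSE_SCORES[r] for r in responses if r in RESPONSE_SCORES]
    if len(informative) < min_items:
        return None
    return sum(informative) / len(informative)


example = communication_score(["Very good", "Good", "Good", "Poor", "Doesn't apply"])
too_few = communication_score(["Good", "Poor", None, None, "Doesn't apply"])
```

In the first example, four informative answers (100, 75, 75 and 25) average to 68.75. The second example has only two informative answers, so no score is produced.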
When modelling interactions, we used only variables for the video attributes, rather than using indicator variables for all videos. For interactions involving age, the oldest two age groups were combined and a
continuous version of the age groups was used in the interaction term only. CIs and p-values were
estimated using bootstrapping with 500 replications (given non-normal data), clustered by participant (with each participant supplying three communication scores). We conducted a sensitivity analysis that clustered the bootstrap resampling by output area rather than by participant to account for multiple sampling in households and small geographical areas; however, this made only trivial changes to standard errors and we consequently do not report this here.
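The idea of clustering the bootstrap by participant can be sketched as follows. This is a simplified illustration, not the study’s analysis: it bootstraps an unadjusted difference in means rather than a regression coefficient, and it uses a percentile interval, which is an assumption because the type of bootstrap interval is not stated. The group labels and function names are illustrative.

```python
import random
from statistics import StatisticsError, mean


def mean_difference(participants):
    """participants: list of (group, [scores]) pairs, one pair per participant.
    Returns the mean Pakistani score minus the mean white British score."""
    pakistani = [s for group, scores in participants if group == "Pakistani" for s in scores]
    white_british = [s for group, scores in participants if group == "white British" for s in scores]
    return mean(pakistani) - mean(white_british)


def cluster_bootstrap_ci(participants, n_reps=500, alpha=0.05, seed=0):
    """Percentile CI from resampling whole participants with replacement, so
    that each participant's (up to three) scores always stay together."""
    rng = random.Random(seed)
    n = len(participants)
    estimates = []
    for _ in range(n_reps):
        resample = [participants[rng.randrange(n)] for _ in range(n)]
        try:
            estimates.append(mean_difference(resample))
        except StatisticsError:
            continue  # skip a resample that happens to contain only one group
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Synthetic data for illustration only: 20 participants per group, 3 scores each
synthetic = [("Pakistani", [60 + i, 62 + i, 64 + i]) for i in range(20)]
synthetic += [("white British", [40 + i, 42 + i, 44 + i]) for i in range(20)]
ci_lower, ci_upper = cluster_bootstrap_ci(synthetic)
```

Resampling participants rather than individual ratings respects the fact that the three scores from one person are not independent. Clustering by output area instead, as in the sensitivity analysis, would amount to resampling groups of participants who share an area.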
Results
Participants
We recruited a total of 1128 participants: 564 (50%) self-identified as white British and 564 (50%)
self-identified as Pakistani. The sociodemographic profile of the participants is shown in Table 24. Although the sampling restriction that half of participants in each group be aged ≥ 55 years increased the similarity of the groups’ age distribution, Pakistani participants were younger than the white British participants within the sampled age strata. Pakistani participants were also more likely to be male (58% vs. 45%), to be in fair or poor health (38% vs. 26%) and to live in the most deprived areas (82% vs. 14%). Figure 15 shows the geographical locations from where participants were recruited. White British participants were recruited from a wide range of geographical locations, whereas, as a result of our sampling approach, Pakistani participants were recruited from a small number of geographically confined locations. Between 202 and 222 participants scored each of the video vignettes for GP–patient communication (Table 25).
Main results
The distribution of communication scores for white British and Pakistani participants was skewed in both groups; however, the communication scores from Pakistani participants were typically higher than those
from white British participants (Figure 16). The mean communication score from Pakistani participants was
67.3 out of 100, 9.9 points higher (95% CI 8.0 to 11.7 points; p < 0.001) than the mean score from white British participants (57.4 out of 100). In a regression model (full output shown in Table 26) adjusting for
participant age, gender, self-rated health, deprivation and video there was a slightly larger difference