
4.3 Selection, Sampling and Size

4.3.2 Participant selection and sampling

Two broad sampling techniques can be used to select participants: non-probability (non-random) or probability (random) sampling. Saunders et al. (2009) note that non-probability sampling techniques allow for the choice of candidates to be based on the researcher’s judgement regarding the characteristics of the population that are important in relation to the data required to address the research aim. In contrast, probability sampling techniques randomly select each participant, thereby eliminating the researcher’s judgement from the choice of actual participants (Saunders 2012).

Marshall (1996) and Saunders (2012) explain that there are three broad ways to generate participant samples for qualitative research:

• Convenience sampling (non-probability): Involves selection of the most accessible subjects and is the least rigorous.

• Judgement sampling (non-probability): Can involve developing a framework based on the researcher’s practical knowledge of the research area, the available literature and evidence from the study itself.

• Theoretical sampling (non-probability): Relies on the iterative nature of the research design, whereby the sample is theory driven to a greater or lesser extent.

In this study, the judgement sampling technique was primarily used to ensure that the participants selected were well suited to answer the research questions. However, elements of probability sampling were integrated to strengthen the selection process by providing a level of impartiality and objectivity. To achieve this, key attributes of the participant pool relevant to the research question and the focus of the research were identified. This process is a common approach to judgement-based sampling and is well acknowledged in the literature (Marshall 1996; Sandelowski 1995; Mason 2010; Denzin and Lincoln 2003).

The importance of the variables was tested in the familiarisation study to determine whether “key informants” (refer to Chapter 3) agreed with the potential impact of these attributes on the quality of the information collected. This was further confirmed through member checks and by industry experts to ensure that the key variables were being captured. Finally, a matrix was created, as outlined in Table 4.1, and a process was undertaken to ensure that the final participant selection pool included these attributes.

Table 4.1 Participant selection matrix (key attributes by cohort type)

| Cohort type | Gender balance | Rural-urban representation | Stage of training & length of career spread | Age spread |
|---|---|---|---|---|
| Medical students | X | X | X | X |
| Pre-vocational doctors | X | X | X | X |
| GP registrars | X | X | X | X |
| Practising GPs | X | X | X | X |

Whilst ensuring that the selected participants met the above attributes, a probability sampling process was utilised to strengthen the participant selection process. This was achieved by randomly sourcing participants by advertising through an industry database. As the data collection process progressed, a more targeted approach that relied on both judgement and theoretical sampling techniques, as outlined by Marshall (1996), was utilised. This process was undertaken whilst simultaneously accommodating data horizontalisation and data saturation, as discussed below. The complete participant classification sheet is available in Appendix 4.
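The check of a participant pool against the attributes in Table 4.1 can be illustrated with a short Python sketch. This is a minimal illustration only, not the procedure used in the study: the field names (`cohort`, `gender`, `location`, `stage`, `age_band`), the example records, and the rule that each cohort needs at least two distinct values per attribute are all assumptions introduced for the example.

```python
from collections import defaultdict

# Attributes taken from Table 4.1; the field names themselves are hypothetical.
ATTRIBUTES = ("gender", "location", "stage", "age_band")


def coverage_gaps(participants, min_distinct=2):
    """Return {cohort: [attributes with fewer than min_distinct distinct values]}.

    The min_distinct threshold is an assumed stand-in for "spread" or "balance".
    """
    seen = defaultdict(lambda: defaultdict(set))
    for p in participants:
        for attr in ATTRIBUTES:
            seen[p["cohort"]][attr].add(p[attr])

    gaps = {}
    for cohort, attrs in seen.items():
        missing = [a for a in ATTRIBUTES if len(attrs[a]) < min_distinct]
        if missing:
            gaps[cohort] = missing
    return gaps


# Invented example records (not study data).
pool = [
    {"cohort": "Medical Students", "gender": "F", "location": "rural",
     "stage": "year 1", "age_band": "20-24"},
    {"cohort": "Medical Students", "gender": "M", "location": "urban",
     "stage": "year 4", "age_band": "25-29"},
    {"cohort": "GP Registrars", "gender": "F", "location": "urban",
     "stage": "year 1", "age_band": "30-34"},
    {"cohort": "GP Registrars", "gender": "F", "location": "rural",
     "stage": "year 2", "age_band": "30-34"},
]

gaps = coverage_gaps(pool)
print(gaps)
```

In this invented pool, the GP registrar cohort would be flagged as lacking spread in gender and age, which is the kind of gap that a more targeted, judgement-based recruitment round could then address.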

4.3.2.1 Data coding

All interviews were recorded using a digital recorder. The recordings were transcribed verbatim by professional transcriptionists via an online service. Each transcript was manually checked by the researcher for accuracy and consistency. In addition to the interviews, the researcher’s audio memos for each interview were also transcribed. The transcripts of all the interviews and memos were uploaded into the software program QSR NVIVO® (Appendix 2). Initially, coding began on printouts of the transcripts, using a combination of different coloured highlighters. Three interviews were coded using this approach to get an outline of the themes that were emerging. These themes were then set up as individual nodes within QSR NVIVO® and the interviews were recoded using the software. After this point, all interviews and memos were directly coded within the QSR NVIVO® software. The QSR NVIVO® software was of particular value in this study for data management, given the large quantity of data. The capacity of the system for storage, retrieval, recoding and multiple coding of the text and references meant that the coding process was more manageable than if undertaken manually.

Researchers have argued that computer software is no substitute for the insight and intuition that emanates from the work of the researcher (Coffey and Atkinson 1996). This view is emphasised by Denzin and Lincoln (2000, p.805), who state that “[i]t is particularly important to emphasise that using software cannot be a substitute for learning data analysis methods. The researcher must know what needs to be done, and do it. The software provides tools to do it with”. As such, for this study, the auto-coding feature within the software was not used out of concern that it might not pick up key themes. To maintain coding consistency and accuracy, a random selection of interviews was also coded by a research assistant and checked against the researcher’s coding to identify differences. This comparison was done early in the coding process; the coding differences were discussed with QSR NVIVO® training staff, and the coding style and technique were adjusted to ensure that coding remained consistent and accurate. This is a common triangulation technique used in research and ensures that coding is replicable (Creswell 2003). These codes were attached to nodes that formed the basis from which themes and categories could be determined. The QSR NVIVO® program has several cross-reference and retrieval features that allowed for the compilation of data sets for comparison and analysis in the formation of themes and categories relating to the interview questions (Richards 2005; QSR 2002).
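A comparison between two coders of the kind described above can be sketched in Python. The study reports only that differences were identified and discussed; the use of percentage agreement and Cohen's kappa here is an assumption added for illustration, as are the function names and the example labels (which borrow the three parent node names used in this study but are otherwise invented).

```python
def disagreements(coder_a, coder_b):
    """Return the positions at which two coders assigned different labels."""
    return [i for i, (a, b) in enumerate(zip(coder_a, coder_b)) if a != b]


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(coder_a)
    # Observed agreement: share of segments given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement from each coder's label frequencies.
    labels = set(coder_a) | set(coder_b)
    expected = sum((coder_a.count(l) / n) * (coder_b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for eight text segments (not study data).
researcher = ["personal", "professional", "social", "personal",
              "professional", "social", "personal", "professional"]
assistant = ["personal", "professional", "social", "personal",
             "social", "social", "personal", "personal"]

diff_positions = disagreements(researcher, assistant)
kappa = cohen_kappa(researcher, assistant)
print(diff_positions, round(kappa, 3))
```

Listing the positions of disagreement supports the discussion step described in the text, while a chance-corrected statistic such as kappa gives a single summary figure that can be tracked as the coding style is adjusted.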

Each respondent was allocated a pseudonym, which acted as an identifier. This allowed for both respondent confidentiality and identification by the researcher. The pseudonyms are explained as follows:


• MS1 XX, PGY2 XX, GPR3 XX: The leading letters indicate whether the participant is a medical student, junior doctor or GP registrar. The numeral indicates the year of study (e.g. 1 = first year), and the last two characters are the name code assigned to that particular participant.

• GP XX, GP XX: In this case, the first two letters indicate that the participant is a practising GP, and the next two characters are the name code assigned to that particular participant.
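The identifier scheme above can be made concrete with a small Python parser. This is a sketch under stated assumptions: the function name, the returned dictionary keys, and the restriction of the name code to two alphanumeric characters are choices made for the example and are not specified in the study.

```python
import re

# Cohort prefixes as described in the list above.
COHORTS = {
    "MS": "medical student",
    "PGY": "junior doctor",
    "GPR": "GP registrar",
    "GP": "practising GP",
}

# Prefix, optional year numeral, a space, then a two-character name code.
PATTERN = re.compile(r"^(MS|PGY|GPR|GP)(\d*) ([A-Za-z0-9]{2})$")


def parse_pseudonym(tag):
    """Split an identifier such as 'PGY2 AB' into cohort, year and name code."""
    match = PATTERN.match(tag.strip())
    if not match:
        raise ValueError(f"unrecognised pseudonym: {tag!r}")
    prefix, year, code = match.groups()
    # Practising GPs carry no year numeral; every other cohort must have one.
    if (prefix == "GP") != (year == ""):
        raise ValueError(f"unexpected year numeral in: {tag!r}")
    return {
        "cohort": COHORTS[prefix],
        "year": int(year) if year else None,
        "name_code": code,
    }


print(parse_pseudonym("MS1 AB"))
print(parse_pseudonym("GP CD"))
```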

4.3.2.2 Data horizontalisation

Moustakas (1994) describes data horizontalisation as a key part of research methodologies. It refers to the process in which the data collected is laid out for examination, and each piece of data is treated with equal weight. He notes that during the horizontalisation process, all elements of the phenomenon both from the participant’s experience as well as from the description of the conscious experience are captured and given equal importance and consideration. During this process, the key attributes of the phenomenon are recognised and described, and listed as individual constituents, before being linked thematically to derive a full description (Hays and Singh 2012; Moustakas 1994; Merriam 2009).

A key part of the horizontalisation process is to ensure that all pieces of data have the same value at the beginning of the data analysis stage, and they are then organised into clusters or themes (Conway 2014; Merriam 2009). This ensures that, by treating all aspects of the lived experience as equally important, the researcher is less likely to be distracted away from a truthful interpretation of the experience (Sandberg 2005; Moustakas 1994). Conway (2014) talks further about different perspectives during data analysis, referring to “noema”, or the phenomenon as perceived through the eyes of the participants (including the researcher); and “noesis”, i.e. the actual, real experience. He argues that in qualitative research, noema and noesis apply to each individual phenomenon, and it is the researcher’s responsibility to portray both the perceived experience and the actual experience. The process of horizontalisation allows the objectivity needed to be able to explore these experiences adequately (Conway 2014).

During this study, a combination of Excel spreadsheets (Appendix 5) and QSR NVIVO® computer software was used to manage the horizontalisation process. The data was captured and examined carefully. Broad clusters, or “parent nodes”, were established under the banner of personal, professional and social factors, in line with the research objective, and the data was recorded under each parent node. In the initial stages, the data was given equal weighting and importance, and was treated as individual elements. As the data collection progressed, the data was thematically linked under each parent node, and sub-clusters or “nodes” were established.

This was an iterative process, and the node tree, as outlined in Appendices 6 and 7, was the outcome of numerous incremental changes. New nodes were established and older nodes collapsed, until a picture began to emerge that illustrated the key issues impacting on the participants’ decisions in selecting their careers within medicine. During this process, the scope of the research was established and the dominant themes that would be explored as part of this study were noted. A final representation of the node structures can be found in Appendix 6.

4.3.2.3 Data saturation

The decision to stop collecting further data is an important one in any study. The process of determining whether enough data has been collected can be a difficult (Creswell 1998), and sometimes subjective (Merriam 2009), process. In qualitative research, this is achieved by utilising a technique known as data saturation. Data saturation occurs when the researcher is no longer hearing or seeing new information during the data collection process (Creswell 2003; Mason 2010). Researchers have noted that the sample size should be such that it prevents the collection of repetitive and, eventually, superfluous data (Mason 2010; Merriam 2009). There are various reasons for this, but the most relevant is that continuously collecting more data does not necessarily translate into more or richer information (Merriam 2009; Denzin and Lincoln 2003).

Merriam (2009) argues that whilst samples for qualitative studies are generally much smaller than those used in quantitative studies, they must be large enough to ensure that the important perceptions are collected. She notes that, regardless of the research area, different participants are very likely to hold diverse opinions. She further argues that frequencies are rarely important in qualitative research, as one occurrence of the data is potentially as useful as many in understanding the process behind a topic (Merriam 2009). Qualitative research is an iterative process, and the researchers analyse their data throughout their study (Denzin and Lincoln 2003; Creswell 2003). Researchers have noted that when there is a judgement of diminishing returns, there is little need for more sampling (Simon 2010; Merriam 2009; Given 2008). In practice, this corresponds to the point where new data only confirms the already identified code and category patterns.

During this study, a two-pronged approach was used, as discussed earlier. The first consideration was to ensure that the identified attributes in the participant selection pool (Table 4.1) were met, and the second was to ensure that data saturation was achieved across the emerging themes. As the data collection progressed, it became apparent from the data that saturation had been achieved. This was further evidenced by the fact that no new themes (nodes) were emerging, and the existing nodes had multiple sets of data coded against them, as represented in Appendices 5 and 6. At this point, it was decided that saturation had been reached and the data collection process was complete.
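The saturation judgement described above, in which no new nodes emerge as further interviews are coded, can be expressed as a simple rule in Python. This is an illustrative sketch only: the study relied on the researcher's judgement, and the stopping window of three consecutive interviews, the function name and the node labels below are assumptions made for the example.

```python
def saturation_point(coded_interviews, window=3):
    """Return the 1-based index of the last interview that added a new node,
    once `window` consecutive later interviews have added none.

    Returns None if that condition is never met.
    """
    seen = set()
    run = 0  # consecutive interviews contributing no new node
    for i, nodes in enumerate(coded_interviews, start=1):
        new = set(nodes) - seen
        seen |= new
        run = 0 if new else run + 1
        if run == window:
            return i - window
    return None


# Invented node labels per interview, in order of collection (not study data).
interviews = [
    {"workload", "lifestyle"},
    {"lifestyle", "mentors"},
    {"income"},
    {"workload"},
    {"mentors", "income"},
    {"lifestyle"},
]

point = saturation_point(interviews)
print(point)
```

A rule of this kind only captures the first strand of evidence noted above (no new nodes emerging); the second strand, that existing nodes have multiple sets of data coded against them, would need a separate count per node.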

In document Decision factors that determine choice of medical specialty amongst medical students, pre-vocational doctors, general practice registrars and general practitioners (Page 134-140)