Step 6. The aim of this step is to identify a large and representative sample for the research. The sample size is one of the most important and also debated issues that will
3.9 Sampling Strategies
Identifying a sampling strategy is one of the most important stages in any research, or as Tashakkori and Teddlie (2003, p. 275) argued, “sampling is destiny”. Sampling is more complicated in mixed methods research, where the aim is to collect different types of data. There are different issues that one should consider before choosing the sampling strategy, such as making sure that the chosen strategy is ethical and matches the research questions and the conceptual framework, and also making sure that the sample will allow “for credible explanation” and “the possibility of drawing clear inferences from the data” (Tashakkori and Teddlie, 2003, p. 276). Moreover, the plan for sampling should be feasible and let the research team generalize the research findings to other populations (Tashakkori and Teddlie, 2003, p. 276). Budget and time are amongst the other factors that should be considered when selecting a strategy (Saunders, Lewis and Thornhill, 2012, p. 260).
Different sampling strategies, e.g., probability sampling, purposive sampling, convenience sampling, and mixed methods sampling, have been proposed in the literature (Teddlie and Tashakkori, 2009, p. 170). Probability and non-probability sampling are the two broad categories of sampling strategies (Saunders, Lewis and Thornhill, 2012, p. 261). Of the above-mentioned strategies, probability and purposive sampling are the most popular and are primarily used in quantitative-oriented and qualitative-oriented studies, respectively. Despite their differences, these sampling strategies share two main characteristics: (1) both aim to find an answer to the research questions and (2) both are concerned with generalizability (Teddlie and Tashakkori, 2009, p. 178). The rest of this section describes different sampling strategies.
3.9.1 Purposive Sampling
Purposive sampling, also known as non-probability or judgement sampling (Patton, 2015), is one of the main strategies associated with qualitative data collection. Purposive sampling aims to strategically identify cases or participants that are relevant to the research questions (Bryman, 2016). Patton (1990) argued that “the logic and power of purposeful sampling lies in selecting information-rich cases for study in depth” and suggested 15 different strategies that can be used for purposive sampling, including extreme or deviant case sampling, snowball or chain sampling, and criterion sampling. Techniques used in purposive sampling usually have one of two main aims: to generate representative cases or to produce contrasting cases (Teddlie and Tashakkori, 2009, p. 176).
3.9.2 Probability Sampling
This strategy is usually used in quantitative research and aims to identify and randomly choose individuals that are representative of a population (Creswell and Plano Clark, 2007, p. 112). The end goal of research based on this strategy is to be able to generalize the findings from the population of the study to a larger population (Tashakkori and Teddlie, 2003, p. 277). The process of probability sampling consists of four stages, namely, identifying a sampling frame, deciding on a suitable sample size, selecting sampling techniques, and checking whether the sample is representative of the population (Saunders, Lewis and Thornhill, 2012, p. 277). Sampling techniques suggested for this strategy include simple random, systematic random, stratified random, cluster, and multiphase or multi-stage sampling (Acharya et al., 2013).
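Three of the techniques named above can be illustrated with a minimal Python sketch. The sampling frame, the stratum labels, the 10% sampling fraction, and all function names are hypothetical and chosen only for illustration; they are not taken from the cited sources or from this study, which did not use probability sampling.

```python
import random


def simple_random_sample(frame, n, seed=0):
    """Draw n units without replacement, each unit being equally likely."""
    return random.Random(seed).sample(frame, n)


def systematic_sample(frame, n, seed=0):
    """Pick a random start, then take every k-th unit, with k = len(frame) // n."""
    k = len(frame) // n
    start = random.Random(seed).randrange(k)
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(frame, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample


# Hypothetical sampling frame: 100 units, 60 in stratum "A" and 40 in stratum "B".
frame = [{"id": i, "stratum": "A" if i < 60 else "B"} for i in range(100)]

srs = simple_random_sample(frame, 10)
systematic = systematic_sample(frame, 10)
stratified = stratified_sample(frame, lambda u: u["stratum"], 0.1)
```

In this sketch the stratified sample keeps the 60/40 proportion of the frame (six units from "A" and four from "B"), which is the property that the final representativeness check in the four-stage process is meant to verify.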
3.9.3 Mixed Methods Sampling
Sampling in mixed methods research may involve the use of more than one sampling strategy. Similar to both of the above-mentioned strategies, mixed methods sampling aims to generate a sample that will answer the questions under study and is also concerned with the issue of generalizability (Teddlie and Tashakkori, 2009, p. 181). Mixed methods sampling may simultaneously use purposive sampling techniques to increase inference quality and probability sampling to increase transferability and generalizability (Tashakkori and Teddlie, 2003, p. 284). Teddlie and Tashakkori (2009, p. 170) have suggested five main techniques for mixed methods sampling: basic mixed methods sampling, parallel mixed methods sampling, sequential mixed methods sampling, multilevel mixed methods sampling, and a combination of the above-mentioned techniques.
Before selecting the sample for this study, the researcher first had to define a set of characteristics, or inclusion criteria, that needed to be represented in the sample and then to identify a sample that would satisfy those criteria. The questions of this research could best be answered by ontologists and knowledge engineers who had not only been involved in the process of ontology selection but had also considered ontology reuse and had evaluated different ontologies before selecting them for reuse. Moreover, to identify the characteristics of reusable ontologies, the researcher wanted to identify the developer(s) of ontologies that had already been reused and find the set of steps or principles they had followed to develop a reusable ontology.
3.9.4 Sampling in this Study
This research followed a sequential mixed methods design. Purposive sampling was the only strategy used in this research. Sampling in this study started by applying homogeneous purposive sampling and aimed to identify a group of ontologists and knowledge engineers who were or had been involved in the process of evaluating, selecting and reusing ontologies. To do that, different ontology repositories, like BioPortal, were explored and a set of ontologies that had previously been reused was identified; people who had developed and/or reused those ontologies were then contacted.
Mixed methods research is usually associated with using both probability and non-probability sampling strategies. However, according to Teddlie and Yu (2007), one of the techniques alone, either probability or non-probability, is appropriate for some research. While non-probability sampling is often linked with qualitative research, Bernard (2017, p. 145) argued that non-probability samples are also appropriate for large surveys when the aim is to collect data from expert informants. In other words, non-probability samples can be used when the aim is to conduct research by collecting data from informed informants and not just responsive respondents (ibid.). Thus, based on the research aim, research questions, and inclusion criteria, purposive sampling was used for the second phase of this research, with the aim of finding a larger population of experts in the ontology domain.
The survey was sent to the community of ontologists and knowledge engineers in different domains; similar inclusion criteria were used in the second phase. Besides going through ontology repositories and libraries, different research groups in universities and organisations were explored to identify the experts that were involved in the process of ontology development and reuse. The survey was also forwarded to different active mailing lists in the field of ontology engineering. Some of the mailing lists used are as follows:
The UK Ontology Network ([email protected])
GO-Discuss ([email protected])
DBpedia-discussion ([email protected])
The Protégé User ([email protected])
FGED-discuss ([email protected])
Linked Data for Language Technology Community Group ([email protected])
Best Practices for Multilingual Linked Open Data Community Group (public-
Ontology-Lexica Community Group ([email protected])
Linking Open Data project ([email protected])
Ontology Lookup Service announce ([email protected])
Technical discussion of the OWL Working Group ([email protected])
Semantic Web Health Care and Life Sciences Community Group ([email protected])
The aim of the final phase of this study was to validate the findings of the previous phases. Expert sampling was used in this phase (Etikan, Musa and Alkassim, 2016) and led to the identification of some of the key informants in the ontology domain. This type of sampling was very helpful because most of the findings of the previous phases, especially the social-related features, were novel and had not been discussed previously. Therefore, it was important to know what the experts in the domain thought about them.
3.9.5 Sampling Size
Deciding on a suitable sample size is one of the most important issues to address when selecting the sampling strategy of a research project. There has been much discussion of sample size. As seen in the literature, the sample size is influenced by the type of research and the sampling strategy used in it. Teddlie and Tashakkori (2009, p. 179) argued that the sample size in probability sampling needs to be large enough, at least 50 units, to establish representativeness. The sample size in purposive sampling, however, is typically small, usually fewer than 30 units (ibid.). Creswell and Clark (2017, p. 123) also argued that the sample in a qualitative study is much smaller than in a quantitative one and stated that sequential research designs usually have unequal sample sizes.
Sample size is more complicated in purposive sampling, where there are no rules about the right number of participants (Saunders, Lewis and Thornhill, 2012, p. 283). In purposive sampling, it is suggested to continue collecting data until “data saturation is reached” (Guest, Bunce and Johnson, 2006; Saunders, Lewis and Thornhill, 2012, p. 283). Guest, Bunce and Johnson (2006) argued that in research whose main aim is to understand commonalities within a homogeneous group, saturation occurs within the first twelve interviews. Symon and Cassell (2012, p. 45) also claimed that the minimum non-probability sample size can range from 4 to 36, depending on the nature of the study.
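The idea of collecting data until saturation is reached can be expressed as a simple stopping rule. The following Python sketch is illustrative only: the rule of stopping after a fixed number of consecutive interviews that add no new theme (the `window` parameter), the function name, and the coded themes are all assumptions made for this example and are not prescribed by the cited authors.

```python
def saturation_point(interviews, window=3):
    """Return the 1-based index of the interview at which `window` consecutive
    interviews have contributed no new theme, or None if this never happens."""
    seen = set()
    quiet = 0  # number of consecutive interviews without a new theme
    for index, themes in enumerate(interviews, start=1):
        new_themes = set(themes) - seen
        seen |= new_themes
        quiet = 0 if new_themes else quiet + 1
        if quiet == window:
            return index
    return None


# Hypothetical themes coded from seven interviews, in the order they were held.
coded_interviews = [
    {"reuse", "cost"},
    {"cost", "documentation"},
    {"community"},
    {"documentation"},
    {"reuse"},
    {"cost", "community"},
    {"licence"},
]

stop_at = saturation_point(coded_interviews, window=3)
```

With these hypothetical data, interviews four to six add no new theme, so the rule would stop data collection after the sixth interview. The seventh interview shows the limitation of any such rule: a later participant may still raise a theme that was not seen before.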
The first phase of this research consisted of two parts. Initially, five pilot interviews were conducted to test the wording and flow of the interview questions and to detect any potential ambiguities. Convenience sampling was used in the pilot phase, and the five participants were chosen from the ontologists working in the School of Business and Economics, Loughborough University. They all had previous experience of developing and reusing ontologies. The pilot phase gave the researcher a good chance to improve her interview skills and time management. Afterwards, an invitation email was sent to 34 ontologists and knowledge engineers who had previous experience of developing and selecting ontologies for reuse; 15 of them accepted the request and participated in the interview study.
The sample size of the first phase was sufficient for several reasons. Firstly, interviews were conducted until no new information or theme was found (Guest, Bunce and Johnson, 2006) and conceptual saturation was reached. Secondly, according to Symon and Cassell (2012), anything between 4 and 36 is considered an acceptable sample size in a non-probability sampling strategy. Thirdly, some of the well-known studies in this domain, like the survey conducted by Lozano-Tello (2002), had fewer responses (only 10). Finally, this was not the only phase of data collection in this study, and the findings of this research are based on data collected from the largest group of experts in the ontology domain.
Different sampling strategies were used in the second phase to identify a larger population of ontologists and knowledge engineers. A link to the survey was sent to more than 500 people, including the participants in the first round, as well as 12 mailing lists. A total of 314 people clicked on the link to the survey, and 157 of them completed it. As in the first phase, before sending the survey out, the researcher discussed the wording of the survey questions, the question types, and the survey design with a group of experts in the ontology domain and made some adjustments accordingly.
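The participation figures reported for the first two phases can be turned into rates with simple arithmetic. The short Python sketch below is illustrative; the helper name `rate` is hypothetical. It uses only the counts stated in this section (15 of 34 invitations accepted; 157 of 314 survey visits completed). An overall survey response rate is deliberately not computed, because the survey was also distributed through mailing lists and the total number of recipients is therefore unknown.

```python
def rate(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Phase 1: interview invitations that were accepted.
interview_acceptance = rate(15, 34)

# Phase 2: surveys completed among those who opened the survey link.
survey_completion = rate(157, 314)

print(f"Interview acceptance rate: {interview_acceptance}%")  # 44.1%
print(f"Survey completion rate: {survey_completion}%")        # 50.0%
```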
Eight ontology experts were interviewed in the third phase. The sample size is small but is normal in expert sampling (Trotter, 2012). This phase was very helpful in clarifying and validating the newly identified quality metrics and the framework.
3.9.6 Sampling Issues with an Online Survey
Bryman (2016, p. 191) has identified different problems that one might face while conducting an online survey, such as the fact that many people have more than one email address. When contacting people in academia, another major issue was that people had changed their workplaces and organisations, so many of the email addresses found online were invalid. Finding participants might be more difficult in some domains, such as ontology engineering, than in others. However, this research was very successful in identifying participants; one of the unique characteristics of this study is that it has the largest sample compared to the previous studies in this domain (Lozano-Tello, 2002; Matentzoglu et al., 2018).