PHASE 2. QUAN ‘SUPPORT’ COMPONENT
5.2.7 Internet-based data collection
During the first decade of the 21st century, Internet-based, computer-mediated communication became a worldwide phenomenon. The Internet provides empirical researchers with considerable opportunities and significant advantages over more traditional survey techniques (Kraut, 2004; Solomon, 2001). Notably, it allows fast and direct access to large or very specific populations through mailing lists, community websites, discussion boards or chatrooms, collectively referred to as virtual communities. Drawing on these advantages, and considering the characteristics of the research population, the data for this thesis were collected entirely online. It is therefore important to address Internet-based data collection as a fairly recent alternative to traditional paper-and-pencil techniques, and to discuss why it was found to be the most appropriate method for the present research. Like every other method, it also has certain drawbacks. Special attention is therefore devoted in this discussion to the strategies used to leverage the advantages of the method and to mitigate the disadvantages that may arise during data collection.
By definition, a virtual community is an ‘aggregation of individuals or business partners who interact around a shared interest, where the interaction is at least partially supported and/or mediated by technology and guided by some protocols or norms’ (Porter, 2004; cited in Illum, et al., 2009). Samples obtained online are often far larger than those obtained with traditional techniques (Gosling, 2004), because online data collection is less intrusive and the administration of online surveys is convenient, easy and fast (Cook, 2000). Furthermore, not only is it easier to study large populations, but the characteristics or behaviour of very specific or small groups can also be observed directly (Kraut, et al., 2004), given that virtual communities are typically structured around shared interests, activities or characteristics. Finally, web-based research is relatively inexpensive and time-efficient (Gosling, 2004; Illum, et al., 2009; Kraut, et al., 2004), and manual data entry is unnecessary.
Despite these major advantages, online data collection has been viewed with suspicion on two sets of issues, namely the quality of the data and research ethics. Regarding the first issue, criticism has been directed in particular at the generalisability of Internet samples (To whom does Internet-based research generalise?) and at biases arising from the lack of control over the environment in which the research is conducted (Who exactly is administering the questionnaire, and how?) (Kraut, et al., 2004; Gosling, 2004). Issues of research ethics relate to the privacy and informed consent of research subjects, given the sometimes blurred borders between public and private spaces (Eysenbach & Till, 2001).
Gosling (2004), in his paper entitled ‘Should we trust web-based studies?’, compared traditional paper-and-pencil methods with Internet data collection on six preconceptions related to Internet questionnaires, using very large samples (Internet-based: N=361,703; traditional: a set of 510 published samples). His findings indicate that only one of the six preconceptions about Web-based studies proved to be factual, namely that Internet data are compromised by the anonymity of participants, which can lead to repeat or fake responses. As the author points out, the great accessibility of Web questionnaires makes them easy targets for non-serious responses. However, he noted that this concern also applies to traditional postal questionnaires, and that various steps can be taken to detect or eliminate such submissions, as will be shown later. The other preconceptions were not supported: that Internet samples are not sufficiently diverse; that Internet samples are unusually maladjusted; that Internet findings do not generalise across presentation formats; that Internet samples produce high(er) rates of non-responsiveness (unmotivated or non-interpretable responses); and that Internet findings are not consistent with findings from traditional methods.
The rationale for Internet-based data collection in this thesis lies primarily in the characteristics and accessibility of the sampling population, but also in the advantages it provided in terms of flexibility of, and control over, data management and data quality, as well as the time and cost of data collection. The research population included four national networks of local development organisations, which operated largely on the basis of Internet communication and information technologies. Each of the networks had a central website, and the majority of the local units operated their own individual websites (for more details on the characteristics of the sampling population, see Section 5.5.4.2 in this chapter).
Previous research comparing Internet-based and mail surveys indicates that Internet-based surveys may be more effective than mail surveys in settings in which the target population has both Internet access and e-mail (Truell, et al., 2002). The contact details of the local units, including e-mail addresses, telephone numbers and postal addresses, were available online. The respondents’ work was largely computer-based, and e-mail was the major form of internal and external communication. In particular, it was the main form of correspondence between the local units, the source of respondents, and the central authority. These internal communication channels provided the most plausible solution for accessing the geographically scattered local units of the target population quickly and directly.
According to Aoki & Elasmar (2000), cited in Cook, et al. (2000), ‘though there are still limitations to be overcome if the Web is used for general population survey, the Web will present advantages over traditional modes of data collection if it is used for specific populations that are known to be Internet savvy.’ In addition, the quality of the data can be expected to improve further if the specific population under scrutiny is characterised by some level of public responsibility and accountability, and if it can be presumed to be sensitive to the theme of the questionnaire.
Selecting a specific sampling population is also indispensable in terms of generalisability, because no sampling frame currently exists that provides a random sample of Internet users. Thus, generalising from an Internet sample to the larger population is especially problematic (Kraut, et al., 2004), unless the research population from which the sample is taken is clearly identifiable.
Nevertheless, the lack of control over the environment remains a problem, just as it does in every study that uses indirect data collection methods to reach and sample the population. As mentioned earlier, various steps can be taken before and after data collection to handle potential threats to the integrity of the data, such as repeated and fake responses. Following the recommendations of Gosling (2004), proxy methods were used to identify respondents (through demographic data), and respondents were asked to provide a personal e-mail address if they wished to receive the results of the research. In addition, scale reliabilities and discriminant validities were examined (John & Benet-Martinez, 2000), and the data were screened for markers of non-responsiveness, such as long strings of identical responses (Johnson, 2001). Finally, because it provided direct and fast access to the research population, online data collection was time-efficient and inexpensive for the present research.
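The screening steps described above can be illustrated with a minimal sketch in Python. It is not the procedure actually used in this research: the function names, the run-length threshold and the demographic field names are hypothetical, and Cronbach's alpha is assumed here as the reliability coefficient, which the text does not specify. Discriminant validity is not covered.

```python
from collections import Counter
from itertools import groupby
from statistics import variance


def longest_identical_run(responses):
    """Length of the longest run of consecutive identical answers."""
    return max((len(list(group)) for _, group in groupby(responses)), default=0)


def flag_long_strings(rows, max_run):
    """Indices of respondents whose longest identical run exceeds max_run.

    max_run is an assumed cut-off; a suitable value depends on the
    length and design of the questionnaire.
    """
    return [i for i, row in enumerate(rows) if longest_identical_run(row) > max_run]


def flag_repeat_submissions(records, proxy_fields):
    """Indices of records whose demographic proxy key occurs more than once.

    proxy_fields are hypothetical demographic items used as a stand-in
    identifier; flagged records still need manual inspection.
    """
    keys = [tuple(record[field] for field in proxy_fields) for record in records]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] > 1]


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale (one row of item scores per respondent)."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Illustrative data only, not taken from the study.
answers = [[1, 2, 3, 4, 5], [3, 3, 3, 3, 3], [2, 2, 2, 2, 4]]
suspect_rows = flag_long_strings(answers, max_run=3)

demographics = [
    {"age": 34, "gender": "f", "unit": "A"},
    {"age": 41, "gender": "m", "unit": "B"},
    {"age": 34, "gender": "f", "unit": "A"},
]
possible_repeats = flag_repeat_submissions(demographics, ["age", "gender", "unit"])
```

Flagged cases are candidates for closer inspection rather than automatic exclusion, since identical demographic profiles or uniform answers can also occur legitimately.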
While data for both components were collected online, different methodologies were applied for the qualitative and the quantitative components. Following the chronological order of the research process, the methodology of the qualitative component is discussed first, in the next section.