CHAPTER 3: METHODS
3.3 Content Analysis
Predictive modeling research often focuses on the computational side of improving algorithm performance, and as a result the importance of developing and selecting features for machine learning can be overlooked. This dissertation study therefore conducts a content analysis to identify the key factors that influence credibility judgments and, based on its results, to generate ideas for operationalizing these factors as features for machine-learned models. The content analysis also aims to produce expert “ground-truth” credibility labels for quality control during the crowdsourced annotation phase.
Content analysis is “an empirically grounded method, exploratory in process and predictive or inferential in intent” (Krippendorff, 2004, p. xvii). Content analysis of answers is also widely used to assess answer quality on Q&A sites (Fichman, 2011). According to prominence-interpretation theory, a person’s credibility judgment has two stages: prominence, in which a user notices elements of a website, and interpretation, in which he/she makes a judgment about those elements. In other words, the elements that are noticed and interpreted shape the credibility judgment. Accordingly, the content analysis in this study served to identify the prominent factors that people use to judge the credibility of health information.
Identifying factors through content analysis served three objectives. First, the importance of each factor could be assessed by examining its frequency and its relationship (e.g., correlation) with the credibility labels. Second, interactions between factors could be examined. For instance, a review that is very short and excessively negative is unlikely to be considered credible, whereas a review long enough to provide grounds for its strong criticism is more likely to be considered credible. In principle, every combination of features (i.e., independent variables) could be tested for interactions, but testing all of them is neither efficient nor feasible; the content analysis makes it easier to examine such interactions at the conceptual level. Third, writing a definition of each factor for the codebook could yield important ideas for developing machine learning features. Unlike data mining on structured data, where variables are usually already numeric or categorical, text mining depends heavily on feature generation that converts raw text into numbers or categories. Because machine-learned models make predictions from input feature values, predictive features lead to accurate predictions, whereas noisy features (i.e., features that are uncorrelated with the target class) can lead to overfitting and poor generalization. By informing the feature creation process, content analysis can help create predictive features and reduce noisy ones.
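To make these objectives concrete, the following is a minimal Python sketch, assuming each coded item is stored as 0/1 presence values per factor plus a 0/1 credibility label. It converts raw review text into numeric features, including a hypothetical length-by-negativity interaction term that mirrors the review example above, and it screens factors by frequency and by correlation with the label (for two binary variables, Pearson's r equals the phi coefficient). The lexicon `NEGATIVE_WORDS`, the cutoff `LONG_REVIEW_WORDS`, the function names, and the toy data are illustrative assumptions, not the study's codebook or results.

```python
from math import sqrt

# Hypothetical lexicon and cutoff, for illustration only (not from the codebook).
NEGATIVE_WORDS = {"terrible", "useless", "awful", "worst", "scam"}
LONG_REVIEW_WORDS = 50  # assumed word count for a review "long enough" to justify criticism


def text_features(text: str) -> dict:
    """Convert raw review text into numeric features, including one interaction term."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    n_negative = sum(t in NEGATIVE_WORDS for t in tokens)
    is_long = int(len(tokens) >= LONG_REVIEW_WORDS)
    is_negative = int(n_negative > 0)
    return {
        "length": len(tokens),
        "negative_count": n_negative,
        "is_long": is_long,
        "is_negative": is_negative,
        # Interaction: negativity accompanied by enough text to ground it.
        "long_and_negative": is_long * is_negative,
    }


def pearson(xs, ys) -> float:
    """Pearson correlation; for two 0/1 variables this is the phi coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant feature carries no information, so report zero correlation.
    return cov / (sx * sy) if sx and sy else 0.0


# Toy coded data (1 = factor present / item judged credible); not study data.
codes = {
    "currency": [1, 1, 0, 1, 0, 0],
    "specificity": [1, 0, 1, 1, 0, 1],
}
credible = [1, 1, 0, 1, 0, 0]

for factor, values in codes.items():
    # Frequency of the factor and its correlation with the credibility label.
    print(factor, sum(values), round(pearson(values, credible), 3))

print(text_features("This review is useless and the worst advice!"))
```

In this toy data "specificity" has zero correlation with the label, which is the kind of noisy feature the content analysis is meant to help weed out before model training.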
Two researchers (including myself) conducted an inductive content analysis (Krippendorff, 2004) to find factors that influence credibility assessment. In qualitative coding, selecting an appropriate coder is critical because the coder strongly influences the coding scheme. The second coder was chosen according to the following criteria: 1) The coder should have a degree in psychology, so that he/she brings professional insight into the cognitive processes involved in credibility judgments. 2) The coder should currently study or have previously studied information science, so that he/she has a fundamental understanding of the information science context. 3) The coder should have taken part in at least two prior qualitative coding projects.
I developed the initial codebook by examining 200 randomly selected answers and reviews (100 of each) and drawing on credibility criteria identified in existing studies (see Table 1). Using the initial codebook, the content analysis was conducted in three rounds. In the first round, the two researchers analyzed 200 random answers and reviews (100 of each), taking detailed notes for discussion, and the coding scheme was revised based on discussion of the noted issues. In the second round, we coded 200 more random answers and reviews (100 of each) using the updated codebook, again taking notes, and resolved any disagreements by consulting the codebook and through iterative discussion. In the third round, we coded 600 more random answers and reviews (300 of each), completing the content analysis.
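Each round draws a fresh random sample that does not overlap with items coded earlier, while keeping topics (and, for Yahoo! Answers, categories) evenly represented. A minimal Python sketch of such a draw follows, assuming each item in the pool is a dict with an `"id"` field plus stratum fields; the function name `stratified_round`, the field names, and the seeded `random.Random` are illustrative choices, not the procedure actually used in this study.

```python
import random
from collections import defaultdict


def stratified_round(pool, already_coded, per_stratum, key, seed=None):
    """Draw per_stratum unused items from every stratum of the pool.

    pool: list of dicts, each with an "id" field.
    already_coded: set of ids coded in earlier rounds; updated in place.
    key: function mapping an item to its stratum, e.g. its topic.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in pool:
        bucket = strata[key(item)]  # register every stratum, even an exhausted one
        if item["id"] not in already_coded:
            bucket.append(item)
    sample = []
    for stratum in sorted(strata):
        candidates = strata[stratum]
        if len(candidates) < per_stratum:
            raise ValueError(f"stratum {stratum!r} has only {len(candidates)} unused items")
        # Sampling without replacement keeps each round's items distinct.
        sample.extend(rng.sample(candidates, per_stratum))
    already_coded.update(item["id"] for item in sample)
    return sample


# Hypothetical pool: 600 reviews split evenly between "general" and "specific" topics.
reviews = [{"id": i, "topic": "specific" if i % 2 else "general"} for i in range(600)]
coded_ids = set()
for round_no, per_topic in enumerate((50, 50, 150), start=1):  # 100, 100, 300 reviews
    batch = stratified_round(reviews, coded_ids, per_topic, key=lambda r: r["topic"], seed=round_no)
    print(round_no, len(batch))
```

For the Yahoo! Answers dataset, the same function could stratify on topic and category jointly, e.g. `key=lambda a: (a["topic"], a["category"])`.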
We coded each factor independently, on its own merits, as present or absent. For instance, “currency” was coded as either “current” or “not current,” and this code was not influenced by whether the item had “specificity” or was judged “credible.” When sampling from the previously selected data pool, we kept the data evenly distributed by topic (specific vs. general, for both datasets) and by category (allergies, cancer, diabetes, heart diseases, and respiratory diseases, for the Yahoo! Answers dataset). Cohen’s kappa was calculated to measure inter-coder agreement as a check on coding quality.
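Cohen’s kappa corrects raw percent agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the proportion of items on which the two coders assign the same code and p_e is the agreement expected from each coder’s marginal code frequencies. A minimal standard-library Python sketch for a single factor follows; the function name and the ten toy codes are illustrative, not data from this study. Library routines such as scikit-learn’s `cohen_kappa_score` compute the same statistic.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who coded the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty list of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same code by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1:
        return float("nan")  # both coders used one identical code; kappa is undefined
    return (p_o - p_e) / (1 - p_e)


# Toy "currency" codes for ten items (illustrative only).
coder_1 = ["current", "current", "current", "not current", "not current",
           "current", "not current", "current", "current", "not current"]
coder_2 = ["current", "current", "not current", "not current", "not current",
           "current", "not current", "current", "current", "current"]
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.583
```

Here the coders agree on 8 of 10 items (p_o = 0.80), but with both coders using “current” 60% of the time, chance alone would produce p_e = 0.52, so kappa is noticeably lower than raw agreement.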
The second purpose of the content analysis was to create credibility labels that could be used for quality control of the crowdsourced credibility labels. Labeling by trained experts enabled us to produce high-quality labels that can be considered ground truth. Some of these labels were used in pre-tests and random tests to ensure that only crowd workers who maintained work of reasonable quality were included in the labeling process, and the rest were used to measure agreement between experts and crowd workers as a proxy for the quality of the crowdsourced labels. Good research starts with good measurement (Hinkin et al., 1997). Two experts assigned credibility labels after reading WebMD articles corresponding to the five selected topics. For topics beyond the knowledge and scope covered by WebMD, other authoritative health information sources, such as the Mayo Clinic, MedlinePlus, and research papers, were consulted. The process of creating credibility labels is described in more detail in the next section.
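The screening and agreement check described above can be sketched as follows. This is a minimal Python sketch, assuming each worker’s answers are stored as a dict from item id to label, that expert labels are split into a gold set (used in pre-tests and random tests) and a held-out set, and that crowd labels are aggregated by simple majority vote. The 0.7 accuracy threshold, the majority-vote rule, percent agreement as the agreement measure, and all function names are hypothetical illustrations rather than the study’s actual settings.

```python
from collections import Counter

MIN_GOLD_ACCURACY = 0.7  # hypothetical pass threshold, not the study's value


def gold_accuracy(worker_labels, gold):
    """Share of expert-labeled gold items that a worker labeled the same way."""
    scored = [item for item in worker_labels if item in gold]
    if not scored:
        return 0.0  # a worker who saw no gold items cannot be verified
    return sum(worker_labels[item] == gold[item] for item in scored) / len(scored)


def qualified_workers(all_labels, gold, threshold=MIN_GOLD_ACCURACY):
    """Workers whose accuracy on gold items meets the threshold."""
    return {w for w, labels in all_labels.items() if gold_accuracy(labels, gold) >= threshold}


def majority_label(all_labels, workers, item):
    """Most common label among qualified workers; ties go to the lowest worker id."""
    votes = Counter(all_labels[w][item] for w in sorted(workers) if item in all_labels[w])
    return votes.most_common(1)[0][0] if votes else None


def expert_crowd_agreement(all_labels, workers, held_out):
    """Percent agreement between held-out expert labels and crowd majority labels."""
    pairs = [(label, majority_label(all_labels, workers, item)) for item, label in held_out.items()]
    pairs = [(expert, crowd) for expert, crowd in pairs if crowd is not None]
    return sum(expert == crowd for expert, crowd in pairs) / len(pairs) if pairs else 0.0


# Toy data: g* items are gold checks, h* items are held out for agreement.
gold = {"g1": "credible", "g2": "not credible", "g3": "credible"}
held_out = {"h1": "credible", "h2": "not credible"}
all_labels = {
    "w1": {"g1": "credible", "g2": "not credible", "g3": "credible", "h1": "not credible", "h2": "not credible"},
    "w2": {"g1": "credible", "g2": "not credible", "g3": "not credible", "h1": "credible", "h2": "credible"},
    "w3": {"g1": "credible", "g2": "not credible", "g3": "credible", "h1": "credible", "h2": "credible"},
    "w4": {"g1": "credible", "g2": "not credible", "g3": "credible", "h1": "not credible", "h2": "not credible"},
}
workers = qualified_workers(all_labels, gold)
print(sorted(workers), expert_crowd_agreement(all_labels, workers, held_out))
```

Percent agreement keeps the sketch short; a chance-corrected statistic such as Cohen’s kappa could be substituted when comparing expert and crowd labels.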