6.1 Data collection and analysis
6.1.5 Content: manual coding and automatic classification
To further investigate what kinds of content are shared in online forums for urban communities, we sampled 516 posts that initiated a new thread of conversation in the E-Democracy neighborhood forums and manually annotated them to describe their content. To balance the sample across the levels of responsiveness that threads achieve, we randomly sampled roughly equal numbers of posts from three subsets of threads: threads that received no response, threads that received one or two responses, and threads that received more than two responses.
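The balanced sampling described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the `(thread_id, n_responses)` input format, the stratum names, the `per_stratum` parameter, and the fixed seed are all assumptions introduced here.

```python
import random
from collections import defaultdict


def stratum(n_responses):
    """Map a thread's response count to one of the three strata in the text."""
    if n_responses == 0:
        return "none"
    if n_responses <= 2:
        return "one_or_two"
    return "more_than_two"


def balanced_sample(threads, per_stratum, seed=0):
    """Draw up to `per_stratum` thread ids at random from each stratum.

    `threads` is an iterable of (thread_id, n_responses) pairs
    (a hypothetical input format chosen for this sketch).
    """
    groups = defaultdict(list)
    for thread_id, n_responses in threads:
        groups[stratum(n_responses)].append(thread_id)

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {
        name: rng.sample(ids, min(per_stratum, len(ids)))
        for name, ids in groups.items()
    }
```

Sampling within each stratum separately keeps the three groups close to equal in size even when unanswered threads are far more or far less common than answered ones in the full data.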
Given that the goal of participatory information systems for urban communities is to encourage community involvement and increase social capital, we adopted a coding scheme from prior research [49,89] that characterizes online posts according to their intention to “mobilize” the social capital available through a participatory information system. Specifically, each of the sampled posts was coded to identify:
• An attempt to mobilize their local community. A post was considered a mobilization when it included an explicit request for action or response. The categories of mobilizations were requests for: (1) Recommendation, (2) Factual knowledge, (3) Opinion/poll, (4) Favor/request/collective action, and (5) Social coordination/invitation/offer.
• The complexity of the attempted mobilization. Complexity was coded in terms of where the requested action was supposed to happen. The options were: (1) In the forum, (2) Somewhere else online (e.g., e-mail, another website), and (3) Offline (e.g., call a phone number, attend a meeting).
Three independent annotators coded the sampled posts after being trained with the coding scheme. A majority vote was used to decide whether a post was a mobilization attempt. For the other categories, Cohen’s kappa coefficient was used to assess inter-coder agreement. The coefficients were 0.67 for kind of mobilization and 0.72 for complexity of mobilization. These inter-coder agreement scores were considered sufficient, and the annotators and researchers discussed the remaining discrepancies until reaching agreement. Table 5 shows examples of local posts and their assigned annotation according to the coding scheme.
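The two agreement steps above (a majority vote on the mobilization label, and Cohen's kappa on the remaining categories) can be sketched as follows. The text does not state how kappa was aggregated across the three annotators, so this sketch assumes a plain pairwise computation for two annotators; the function names and label values are illustrative.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most annotators.

    With three annotators and a binary label there are no ties;
    other cases would need an explicit tie-breaking rule.
    """
    return Counter(labels).most_common(1)[0][0]


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' labels over the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```

Kappa corrects the raw agreement rate for the agreement expected by chance, which is why it is preferred over simple percent agreement when some categories are much more frequent than others.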
To increase the size of the coded content, we employed automatic classification algorithms to code all posts that initiated a new thread of conversation in the neighborhood forums. The results of the manual coding were used as ground truth for the classifiers. We processed the text by running a modified version of Python code written and made available by Dr. Yu-Ru Lin. The code allowed us to retrieve N-grams (unigrams, bigrams, and trigrams) and the counts of various linguistic features in the posts, after stemming the text of the posts. Linguistic features, such as pronouns and verbs in past tense, were retrieved by reusing the functions of the Linguistic Inquiry and Word Count (LIWC)³ package for Python.
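As an illustration of the N-gram step, the following sketch counts unigrams, bigrams, and trigrams in a post. It is not the code referred to above: the tokenizer is a simple assumption, the `stem` argument is a placeholder that defaults to no stemming, and the LIWC feature counts are not reproduced.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(text, max_n=3, stem=lambda token: token):
    """Count all n-grams from unigrams up to `max_n`-grams.

    `stem` is a placeholder hook; by default tokens are left unstemmed.
    """
    tokens = [stem(t) for t in tokenize(text)]
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts
```

Each n-gram is stored as a tuple of tokens, so unigrams, bigrams, and trigrams can share one counter and still be separated later by tuple length.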
Table 5: Examples of posts by kind of mobilization
Recommendation: “I am looking for a ’no-Jobs-too-small’ handyman, replacing a screen under a porch where I can no longer crawl, etc.. Anyone have a referral or recommendation? Thanks”
Factual knowledge: “According to the Clean City Minneapolis site on graffiti, we should be able to get graffiti removal wipes from our local community alliance. Where and when can I access these? I know that in Whittier the Whittier Alliance has a supply, but I’m not sure the West Bank CDC has the same, especially when their website is a single page with very limited information. There’s quite a bit of the unsightly stuff, and I hate to be one who complains that ’someone should do something’ without actually doing something.”
Opinion/poll: “Do any of you feel invaded when people come to your door, insist on you answering because they continue to ring the bell and then proceed to sell their product or promote their candidate and idea? Does anyone have any suggestions for how they deal with this in a positive way? I mentioned this to another person and they thought going door to door would be an effective way to scope out homes and identify vulnerable situations or residents. Does anyone ever ask to see the solicitor’s ID?”
Favor/request/collective action: “<Name> at <address> has lost his dog ’Georgia’ she was last seen yesterday in their yard She has stocky build white with brown spots and has a sweet and shy temperment please call <phone number> with any info!”
Social coordination/invitation/offer: “NNO event on Hoyt street between Rice and Marion. 4:30-7:30. Block party and everyone is welcome. Free food and fun. Come join us and the theme this year is wear blue. Hope to see you as we come together for community”
Non-Mobilization: “Please make note of the ramp closures information for our neighborhood. This will have a major impact on commutes and travel in our area.”

N-grams and linguistic data were used as features for the content classification. We used different R⁴ packages to run alternative classification algorithms. We used 80% of the coded posts to train the classifiers and compared their performance at classifying the remaining 20% of the content. We assessed the performance of different classification methods such as k-nearest neighbors, decision trees, and support vector machines (SVM). Overall, SVM performed consistently better than the alternative methods. To reduce dimensionality and computation time, we conducted principal component analysis on the N-gram and linguistic feature data and explored different strategies to achieve high levels of performance in the automatic classification. Details about the classification process are provided in Appendix A. In summary, we considered these features in isolation and in conjunction. We also tested different thresholds to filter out very common and very uncommon N-grams and to keep the most important components from the principal component analyses. The best results were obtained with 19 components that explain 95% of the variance of the linguistic features. Adding the main components of the unigrams and trigrams harmed performance. The main components from the bigrams performed almost as well as the linguistic features alone. Bigrams generally helped to improve the classification of non-mobilizations, but made the classification of passive mobilizations slightly worse. Therefore, we decided to choose the classifiers that use linguistic components only. The results of this content classification are reported in the next section.

³ http://liwc.wpengine.com/
⁴ A free software environment for statistical computing
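Three bookkeeping steps of the classification setup described in this section (the 80/20 split, the filtering of very common and very uncommon N-grams, and the choice of how many principal components to keep) can be sketched in plain Python. This is an illustration under stated assumptions, not the R workflow that was used: the function names, the document-frequency thresholds, and the use of eigenvalues as input are all hypothetical, and the PCA and SVM fitting themselves are not shown.

```python
import random


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle `items` and split them into training and held-out lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def filter_by_document_frequency(doc_ngrams, min_share, max_share):
    """Keep n-grams found in between `min_share` and `max_share` of the posts.

    `doc_ngrams` is a list with one set of n-grams per post.
    """
    n_docs = len(doc_ngrams)
    doc_freq = {}
    for ngrams in doc_ngrams:
        for gram in ngrams:
            doc_freq[gram] = doc_freq.get(gram, 0) + 1
    return {g for g, df in doc_freq.items() if min_share <= df / n_docs <= max_share}


def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components whose variance share reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)
```

Applied to the 516 coded posts, an 80/20 split of this kind would leave roughly 100 posts for evaluation, which is worth keeping in mind when comparing classifiers whose held-out performance differs only slightly.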
Beyond kinds of content, we were also interested in understanding how the forums reacted to the content that was being posted. Therefore, we measured the rate of online responsiveness in a forum. This variable was computed as the ratio of the number of new threads started in a quarter that obtained at least one response to the total number of new threads started in that quarter.
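The responsiveness measure defined above can be computed as in the following sketch. The `(quarter, n_responses)` input format and the quarter labels are assumptions made for illustration.

```python
from collections import defaultdict


def responsiveness_by_quarter(threads):
    """Share of new threads per quarter that received at least one response.

    `threads` is an iterable of (quarter, n_responses) pairs, one per new thread.
    """
    started = defaultdict(int)
    answered = defaultdict(int)
    for quarter, n_responses in threads:
        started[quarter] += 1
        if n_responses >= 1:
            answered[quarter] += 1
    return {q: answered[q] / started[q] for q in started}
```

For a single forum, the threads passed in would be restricted to that forum, so that the measure yields one rate per forum and quarter.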