Developing Human-Centered Methods
for Studying Journalism’s Big Data
Abstract
Human-centered data science offers a way to address the dilemma of analyzing big data while retaining qualitative approaches to research insights. Through participation in this workshop on human-centered data science, journalism studies scholars will discuss the emerging sites of big data in journalism and propose new questions and research procedures animated by a human-centered data science approach to journalism.
Author Keywords
Big Data; journalism studies; research methods; Human-Centered Data Science
ACM Classification Keywords
H.5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous
Introduction
Virtually no information-oriented industry has emerged untouched by the rise of “big data,” however broadly conceptualized—and journalism is no exception. This article looks at the research questions connected with journalism and big data, moving beyond what has been easier to observe through qualitative methods—e.g., the shifting character of journalism’s boundaries—to examine the issue of how to understand journalism as a Permission to make digital or hard copies of part or all of this work for
personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the
owner/author(s). CSCW '16, February 27 - March 02, 2016, San Francisco, CA, USA.
Seth C. Lewis University of Minnesota Minneapolis, MN 55455 [email protected] Nikki Usher
George Washington University Washington, DC 20052 [email protected]
Note: The authors contributed equally to this paper.
Fifth Author YetAnotherCo, Inc. Authorton, BC V6M 22P Canada [email protected] Sixth Author Université de Auteur-Sud 40222 Auteur, France [email protected] Seventh Author Department of Skrywer, University of Umbhali, Cape Town, South Africa [email protected]
creator of big data itself through its activities of social actors, audiences, and technological actants [16]. At the intersection of communication and computer science, big data may be explored through methods such as automated data aggregation and mining, visualization, machine learning, sentiment analysis, and computer-aided content analysis, among others [17]. But what is left to ensure a human-centered approach to data science [3]? How might qualitative researchers make sense of journalism’s big data? This paper thus evaluates how researchers might lay claim to a qualitative sensitivity as they study journalism in a moment of big data.
Conceptualizing Journalism and Big Data
There is a need to consider the more sociotechnical aspects of journalism, and while this approach is found across other fields of research, it has been largely absent in journalism studies [2]. One remedy is to conceptualize news production and distribution via the 4A approach [16]: recognizing the interplay of human social actors in news organizations (e.g., journalists, technologists, and businesspeople), nonhuman technological actants (e.g., algorithms, applications, and audience information systems), and diverse types of audiences connected across multiple platforms—all interconnected through activities that define the production and circulation of news.
Social Actors and their Activities
From the perspective of institutional news work, social actors may be internal to the news organization (e.g., journalists) or external to it (e.g., sources, advertisers, software/hardware producers, and policymakers).
A growing number of journalists are engaged in the creation and gathering of big data for editorial storytelling. It is a purposeful effort by new experts in the newsroom to use computational tools both to create and collect existing datasets and to allow the public to explore such data via interactive news applications [20]. These “programmer journalists” or “data journalists” represent a broader turn toward quantification as a key paradigm in contemporary journalism [9, 15]. For example, ProPublica journalists amassed a database of every U.S. doctor who has received money from a pharmaceutical company [13]; in another case, with computational assistance, The New York Times generated more than 604 septillion calculations with a football playoff interactive [6]. Within the media ecology, sources, advertisers, and others are producing their own big data. Consider the changing role of sources in politics alone: every day, tens of thousands of tweets, posts, and related social media messages are conveyed directly between politicians and their followers—not to mention the entirety of messaging from local, state, and federal government officials, agencies, and beyond. Technological Actants and their Activities
Perhaps most prominently among technological actants used in news organizations today, analytics software gathers data from audiences to help journalists generate insights about audience preferences. There is significant debate as to whether this information—now often collected and analyzed instantaneously [18]—is influencing editorial content [1, 19]. Such software can gather data from every instance of audience
engagement with digital content, computationally aggregating such data into usable numbers.
These actants are gathering—and generating—massive quantities of data. Consider that The Washington Post had 66.9 million unique users to its digital properties in October 2015 [21]. Chartbeat, the industry leader in news audience analytics software, might parse each user into a variety of measurements—its dashboard on quick display is measuring at least 28 different
variables at any single moment online [8]. Such services allow journalists to analyze entry and exit points for each individual story, as well as develop customizable reports about traffic patterns over time. It is clear that computationally driven, quantitative practices have much to offer scholars who wish to understand and analyze such large-scale data in journalism [17]. But how to employ a human-centered data science approach to such research opportunities is far more challenging and uncertain.
Audiences and their Activities
The activities of a participatory audience can generate big data. Audience activity provides the data for analytics software, and audiences produce more and more user-generated content of their own, particularly on social media platforms [14]. What this means to journalists—and journalism studies researchers—is that the ecology of what counts as journalism has not only expanded, but also that there is more forms of labor and activity to consider when assessing how news is produced, circulated, and translated into public understanding [7].
Audiences may contribute first-hand eyewitness content as well as commentary and reaction via social media [4, 14, 16]. New tools have emerged that are both human-driven and algorithmically programmed to
help journalists and researchers make sense of this audience data, but how to do so in a human-centered way remains an open question. Of key interest are intermediary third-party companies—not news
producers nor audience members—that filter and collect audience data. They exist in the form of social media verifiers and social media analytics companies, such as Storyful, Demotix, and First Draft.
While researchers have attempted to study how journalists conceptualize and act upon social media [4], less is known about how such systems are emphasizing certain types of user content relative to others, leaving both journalists (and researchers) in the dark about the politics of algorithmic selection criteria that increasingly shape how networked individuals and communities come to understand public discourse [11].
Pathways for Inductive and Qualitative
Research with Journalism’s Big Data
As journalism and other institutions and industries become sites for big data production and analysis, there are ample and important opportunities for scholars to use computational tools to explore social phenomena in such contexts. Amid this quantitative turn in journalism as well as in society broadly, however, our concern is for the future of qualitative methods and their unique ability to assess social reality as process, as a contested site of knowledge
production, often through inductive approaches such as ethnography and various deployments of grounded theory [10]. Thus, in the interest of stimulating further thinking, we propose two possibilities for applying qualitative methods to the big data of journalism.
1) Generative Question-Asking
Big data research has raised debates about the relative need for traditional, hypothesis-driven approaches to social science [5, 17]. Without settling such debates [12], it’s noteworthy that asking generative research questions, rather than posing hypotheses, is a strength that qualitative methodologies have to offer. These questions may take various forms and functions: • Questions as descriptive.
• Questions as process-oriented. • Questions as retrospective. • Questions as generative.
• Questions to select meaningful variables. • Questions paired with others to explore
human-centered computational data points.
2) Observations of How People Use and Process Big Data in Journalism
Understanding how people incorporate data into their understanding of the world is a critical and often overlooked element of the big data phenomenon. How journalists, businesspeople, and other social actors in a news organization might use big data to construct a picture of the audience or use algorithmically generated content to inform editorial decisions, for example, lends insight into how big data is incorporated and reified as its own kind of knowledge within news institutions. Researchers have begun to explore how social actors work in relation to audience analytics [18], but thus far no academic projects have discovered how news organizations are determining which specific variables or data points are significant to improve or disregard as they create strategic plans for news content or revenue strategies. Researchers therefore should:
• Investigate norms, values, and assumptions of big data.
• Understand how knowledge-based conclusions are made based on socially validated variables. • Explore the epistemology of data-oriented
practices, processes, and philosophies. Qualitatively investigating the taken-for-granted expectations that people ascribe to such audience data before them is a human-centric perspective worth retaining. Ultimately, when blended within a research project, qualitative and quantitative approaches can combine the best of both worlds—human context and algorithmic accuracy—to provide more valid and reliable depictions, both in social research generally and in contemporary journalism particularly.
References
1. Anderson, C. W. 2011. Between creative and quantified audiences: Web metrics and changing patterns of newswork in local US newsrooms. Journalism 12, 5: 550-566. 2. Anderson, C. W., and Kreiss, D. 2013. Black
boxes as capacities for and constraints on action: Electoral politics, journalism, and devices of representation. Qualitative Sociology 36, 4: 365-382.
3. Aragon, C., Bayer, J., Echenique, A., Huang, Y., Hutto, C., Kim, J., Neff, G., and Xing, W. 2015. Developing a research agenda for human-centered data science. Retrieved December 12, 2015 from
https://faculty.washington.edu/aragon/pubs/C
4. Belair-Gagnon, V. 2015. Social Media at BBC News: Remaking Crisis Reporting. Routledge, London, UK.
5. Bollier, D., and Firestone, C. M. (2010). The promise and peril of big data. Aspen Institute, Communications and Society Program, Washington, DC.
6. Bostock, M., Carter, S., Katz, J., and Quealy, K. 2015. How each team can make the N.F.L. playoffs. The New York Times. Retrieved December 16, 2015 from
http://www.nytimes.com/interactive/2015/ups hot/2015-nfl-playoff-simulator.html?_r=0 7. Carlson, M., and Lewis, S. C. 2015. Boundaries
of journalism: Professionalism, practices and participation. Routledge, New York, NY. 8. Chartbeat. 2015. All about your audience
analytics. Retrieved December 16, 2015 from https://chartbeat.com/publishing/multi-platform-tracking
9. Coddington, M. 2015. Clarifying Journalism’s Quantitative Turn: A typology for evaluating data journalism, computational journalism, and computer-assisted reporting. Digital Journalism 3, 3: 331-348.
10. Corbin, J., and Strauss, A. 2008. Basics of qualitative research: Techniques and procedures for developing grounded theory. Sage, Thousand Oaks, CA.
11. Gillespie, T. 2014. The relevance of algorithms. In T. Gillespie, P. J. Boczkowski, & K. A. Foot (Eds.), Media technologies: Essays on communication, materiality, and society. MIT Press, Cambridge, MA, 167-193.
12. Gitelman, L. (Ed.). 2013. Raw data is an oxymoron. MIT Press, Cambridge, MA.
13. Groeger, L, Ornstein, C., Tigas, M., Grochowski Jones, R. 2015, July 1. Dollars for docs: How industry dollars reach your doctors. ProPublica. Retrieved December 16, 2015 from
https://projects.propublica.org/docdollars/ 14. Hermida, A. 2014. Tell everyone why people
share and why it matters. Doubleday Canada, Toronto.
15. Howard, A. B. 2014. The art and science of data-driven journalism. Tow Center for Digital Journalism, Columbia University.
16. Lewis, S. C., and Westlund, O. 2015. Actors, actants, audiences, and activities in cross-media news work: A matrix and a research agenda. Digital Journalism 3, 1: 19-37. 17. Parks, M. R. 2014. Big data in communication
research: Its contents and discontents. Journal of Communication 64, 2: 355-360.
18. Petre, C. 2015. The Traffic Factories: Metrics at Chartbeat, Gawker Media, and The New York Times. Tow Center for Digital Journalism, Columbia University.
19. Usher, N. 2014. Making News at The New York Times. University of Michigan Press, Ann Arbor, MI.
20. Usher, N. 2016. Interactives in the News: Hackers, Data & Code. University of Illinois Press, Champaign, IL.
21. Valinsky, J. 2015. Washington Post tops New York Times online for first time ever. Digiday. Retrieved December 16, 2015 from
http://digiday.com/publishers/comscore- washington-post-tops-new-york-times-online-first-time-ever/