The writer’s passion for the topic drives the writing, making the text lively, expressive and engaging.

Phrasing is original–even memorable–yet the language is never overdone.

Striking variety in structure and length gives writing texture and interest.

Voice, Word choice and Sentence fluency traits (Section 2.1)

Among the traits considered essential for good writing, those related to reader interest are the least explored in computational work. In the Six Traits rubric [150], the categories ‘voice’, ‘word choice’ and ‘sentence fluency’ relate to this aspect of quality. They distinguish texts that are interesting and engaging from those that are dull. Note that a text can lack these properties and still be error-free and have good organization and content. These reader interest properties determine whether the text is elegant, beautifully written and interesting to read.

In this chapter, we introduce measures for predicting reader interest in articles from the science journalism genre. These experiments use the text quality categories from the science journalism corpus which we introduced in Chapter 3. Several of our features, apart from being designed to indicate the engaging nature of the writing, are also specific to the science news genre. The use of genre-specific measures has been little studied in past work on text quality, but there is ample evidence that such features will be helpful.

There are unique patterns of writing which are noticeable in news in general and also in science journalism. Features related to the specific writing patterns of a genre could therefore provide a boost over those which are general across genres. Journalism studies refer to patterns in news writing as news frames. A news frame is the selection of a particular way of reporting an issue; it varies in terms of the type of main content that is presented and how the article is organized and written. The differences are perhaps best understood with examples. In general news, a few common frames are the following (definitions are taken from Sametko and Valkenburgh (2000) [145]):

Conflict frame: This frame emphasizes conflict between individuals, groups, or institutions as a means of capturing audience interest.

Human interest frame: This frame brings a human face or an emotional angle to the presentation of an event, issue or problem.

Economic consequences frame: This frame reports an event, problem or issue in terms of the consequences it will have economically on an individual, group, institution, region or country.

Morality frame: This frame puts the event, problem, or issue in the context of religious tenets or moral prescriptions.

Responsibility frame: This frame presents an issue or problem in such a way as to attribute responsibility for its cause or solution to either the government or to an individual or group.

Sametko and Valkenburgh performed a large-scale analysis of a few thousand newspaper articles and found that the responsibility and conflict frames are widely popular for coverage of political news. In a similar vein, other studies have reported on specialized frames that are used for science journalism. One such study by Nisbet, Brossard and Kroepsch (2003) [116] examined what types of frames were employed for news reporting during the different stages of policy development related to a scientific issue. They focus on the topic of stem cell research. Media attention surrounding such issues varies over time, and journalists follow different reporting styles at different stages. Nisbet, Brossard and Kroepsch analyze a large collection of New York Times and Washington Post articles on stem cell research to identify trends in news framing. Abridged descriptions of some of the frames they used are given below.

New research: Focus on new stem cell-related research released, discovery announced, new medical or scientific application announced.

Scientific background: Focus on general scientific or medical background of stem cell-related research or applications. Includes description of previous research, recap of “known” results and findings.

Scientific/technical controversy or uncertainty: Focus on scientific uncertainty over efficacy or outcomes of stem cell-related research and applications.

Public opinion: Focus on the latest poll results, reporting of public opinion statistics, general references.

Anecdotal personalization: Focus on a patient, or the families/friends of a patient, who is receiving stem cell-related treatment.

Given that prior literature documents so well that journalists deliberately choose a style for writing an article and place much emphasis on it, we expect that the science journalism genre is an apt one for studying which properties of the writing are successful in creating engaging articles. It has also been found that presenting content in different news frames influences readers’ thoughts and their recall of information about the topic [166].

In this chapter, we develop a system to predict the quality of science news articles. Its diverse feature set involves measures which indicate interesting content and writing together with those that have been previously developed to indicate well-written text.

We design and implement measures for six facets of writing that are related to reader interest: 1) use of visual language, 2) involving people in the story, 3) creative and surprising use of language, 4) sub-genre of the article, 5) use of sentiment and emotions, and 6) the amount of explicit research descriptions. We study how these aspects are distributed across the quality categories in our corpus and also how strongly they predict the category. Rather than add a large number of features which may be indicative of these dimensions only indirectly, we aimed to develop measures which specifically indicate a particular aspect. Otherwise, when a feature turns out to be a significant indicator of the quality categories, we still may not be able to associate the feature with any particular writing aspect. We also validate a few of our features using human annotations, to understand whether a text that is ranked high according to a particular feature is also considered by people to have the property which the feature represents. These annotations give additional strength to our claims about which aspects are related to text quality in science journalism. Sections 6.1 and 6.2 focus on the development of features and the annotation study performed to understand the representative nature of the features.

We then examine how these features help to predict the quality categories in our corpus. While we used the intentional structure model and the text specificity features to predict the quality of these articles in the previous chapters (see Sections 4.5 and 5.7), the accuracies that we obtained were quite low. We show that the interest-related features which we develop in this chapter are much more predictive of quality differences, leading to accuracies of 77% when articles are compared without regard to topic and 70% when comparing articles with the same topic. A detailed analysis of classification accuracy and the strengths of different feature classes is presented in Section 6.4.
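To make the two comparison settings concrete, the following is a minimal sketch of how article pairs might be formed, assuming a pairwise setup in which a higher-quality article is compared against a lower-quality one. The records, the label names `great` and `typical`, the topic strings and the function `make_pairs` are all hypothetical illustrations and are not taken from our corpus or implementation.

```python
from itertools import product

# Hypothetical toy records: (article_id, topic, quality_label).
# IDs, topics and labels are made up for illustration only.
ARTICLES = [
    ("a1", "medicine", "great"),
    ("a2", "medicine", "typical"),
    ("a3", "space", "great"),
    ("a4", "space", "typical"),
]


def make_pairs(articles, same_topic_only=False):
    """Pair every 'great' article with every 'typical' article.

    With same_topic_only=True, keep only pairs whose topics match,
    so that topic cannot explain the quality difference in a pair.
    """
    great = [a for a in articles if a[2] == "great"]
    typical = [a for a in articles if a[2] == "typical"]
    pairs = []
    for g, t in product(great, typical):
        if same_topic_only and g[1] != t[1]:
            continue
        pairs.append((g[0], t[0]))
    return pairs


# Setting 1: articles compared without regard to topic.
any_topic_pairs = make_pairs(ARTICLES)
# Setting 2: only articles on the same topic are compared.
same_topic_pairs = make_pairs(ARTICLES, same_topic_only=True)
```

In this sketch the same-topic setting is a strict subset of the any-topic setting. It is also the harder one in principle, since a classifier can no longer exploit topic differences between the two articles in a pair.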

We also show that the interest and genre-specific features complement those which aim to identify readable and well-written texts. We combine a comprehensive set of features from prior work on readability and well-written nature of articles with those we developed for reader interest and find that all these measures together are necessary for text quality prediction on our corpus. These analyses are presented in Section 6.5.

Finally, we examine the influence of the topic and content of an article on its quality. The metadata available in the New York Times corpus allows us to study which topics are most frequently chosen in the great article set. We present experiments on automatic prediction of quality based on features derived from the metadata, and we also approximate topic information using the words in the articles. In Section 6.5.2, we report the results of this experiment, which provide evidence that topic features are also useful indicators of text quality in this genre.