
Forensic Scientists’ Conclusions: How Readable Are They for Non-Scientist Report-Users?

Part 1 Readability and Readability Formulas

Readability refers to the ease with which something can be read and understood due to the style of writing (Klare, 1963) and is a prerequisite to comprehensibility (Badarudeen & Sabharwal, 2010). One aspect of writing that hinders readability, and for which scientists have been criticised, is a preference for the use of the passive rather than the active voice (Roland, 2009). A sentence such as, “Scientists wrote reports,” in the active voice becomes “Reports were written (by scientists)” in the passive voice. The passive voice permits the omission of the agent (in this case, “scientists”). Consequently, the passive voice leads to ambiguity and confusion whereas the active voice shows honesty as the writer takes more personal responsibility (Roland, 2009).

Strong predictors of reading difficulty include sentence length and word frequency (Crossley, Duffy, McCarthy, & McNamara, 2007). Because of this, readability scores based on quantifiable characteristics of texts perform relatively well (Benjamin, 2012). Readability formulas include the Flesch-Kincaid, the Gunning Fog, and the SMOG indexes. They use features of text, such as the number of words per sentence and the number of syllables per word, to calculate a score, often given as a grade level.² The grade level calculated is intended to reflect the minimum level of educational attainment necessary, based on the US education system, to read and comprehend the text. Although the scores are based on American schooling, it is assumed that they reflect Australian levels well enough to provide a useful heuristic (Laidlaw, Spennermann, & Allan, 2007).

² For example, the Flesch-Kincaid Grade Level is given by the formula

FK = 0.39 (total words/total sentences) + 11.8 (total syllables/total words) − 15.59 (Kincaid, Fishburne, Rogers, & Chissom, 1975).
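The Flesch-Kincaid Grade Level formula quoted in the footnote can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the word, sentence, and syllable counts are hypothetical, and the counts are supplied by hand rather than derived from a text.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (Kincaid et al., 1975)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical counts, chosen only to make the arithmetic easy to follow:
# 100 words in 5 sentences (20 words per sentence), 150 syllables (1.5 per word).
grade = flesch_kincaid_grade(total_words=100, total_sentences=5, total_syllables=150)
print(round(grade, 2))  # 0.39 * 20 + 11.8 * 1.5 - 15.59 = 9.91
```

With these counts the text would be placed at roughly a Grade 10 level.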

As an alternative to a grade level, the Flesch Reading Ease (FRE)³ score ranges from 0 to 100; the lower the score, the more difficult the text. Flesch described a score of 0 as “practically unreadable” and a score of 100 as “easy for any literate person” (Flesch, 1948, p. 229). More generally, a Flesch Reading Ease score of 0 to 30 is considered very difficult and a score of 60 to 70 is considered standard (Flesch, 1948).
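The mapping from a Flesch Reading Ease score to a description of style can be sketched as a simple lookup, using the bands from Flesch (1948) as listed in Table 2. The function name is hypothetical. Because the printed bands share their end points (e.g., 30 appears in both "0 to 30" and "30 to 50"), the sketch assumes that a boundary score belongs to the easier band; that tie-breaking rule is an assumption, not something stated in the source.

```python
from bisect import bisect_right

# Upper bounds of each band except the last (Flesch, 1948, as listed in Table 2).
_UPPER_BOUNDS = [30, 50, 60, 70, 80, 90]
_LABELS = [
    "very difficult",
    "difficult",
    "fairly difficult",
    "standard",
    "fairly easy",
    "easy",
    "very easy",
]


def flesch_style(score: float) -> str:
    """Return the description of style for a Flesch Reading Ease score.

    Assumption: a score exactly on a boundary (e.g., 30) is assigned to the
    easier of the two adjacent bands.
    """
    return _LABELS[bisect_right(_UPPER_BOUNDS, score)]


print(flesch_style(42.4))  # difficult
print(flesch_style(65))    # standard
```

Applied to the mean Flesch Reading Ease score of 42.4 reported in Table 1, the lookup returns "difficult".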

Researchers have used readability formulas to test the readability levels of various documents written by specialists for non-specialists including patient education materials (D’Alessandro, Kingsley, & Johnson-West, 2001; Estrada, Hyrniewicz, Higgs, Collins, & Byrd, 2000), consent forms for participation in medical research (Paasche-Orlow, Taylor, & Brancati, 2003), the academic integrity policies of Australian universities (Green & van Kessel, 2011), and bushfire risk management plans (Laidlaw et al., 2007).

Whilst online readability calculators are freely available (see e.g., Beaglehole & Yates, 2010), the Flesch-Kincaid Grade Level and Flesch Reading Ease scores in particular are widely used, probably because of their accessibility within standard computer software (Benjamin, 2012). The readability statistics provided in Microsoft Word 2010 (for the selected part of the document) include the numbers of words and sentences, and the average numbers of words per sentence and characters per word. The scores provided, which may be particularly helpful for assessing scientific conclusions, are the percentage of sentences using the passive voice, the Flesch-Kincaid Grade Level, and the Flesch Reading Ease score. We chose to use the readability statistics produced by Microsoft Word due to their applicability beyond the research setting. In practice, scientists may use these statistics quickly and easily to obtain a simple indication of the readability of all or part of their expert reports.

³ The Flesch Reading Ease score is given by the formula

FRE = 206.835 − 1.015 (total words/total sentences) − 84.6 (total syllables/total words). Note that the number of decimal places provided in the formulas follows the originals.
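For readers without access to Microsoft Word, the Flesch Reading Ease formula can be approximated directly from a passage of text. The sketch below is illustrative only and is not the method used in this study. The sentence splitter and the vowel-group syllable counter are rough assumptions, so its scores will generally differ from those reported by Word, and both example sentences are invented.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Approximate Flesch Reading Ease (Flesch, 1948) for a passage of English text."""
    # Assumption: a sentence ends at any run of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


# Two invented passages: short, plain sentences versus one long, technical sentence.
plain = "The glass broke. We found it on the floor."
technical = (
    "Refractive index measurements indicated that the questioned fragments "
    "were indistinguishable from the known glassware."
)
print(flesch_reading_ease(plain) > flesch_reading_ease(technical))  # True
```

The raw formula is not bounded, so very short or very dense passages can produce values above 100 or below 0.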

Audience and Reading Ease

The audience for expert reports includes police detectives, lawyers, and judges. (Jurors may hear expert reports read aloud if they contain uncontested evidence and the scientist is not called as an expert witness; Rothwell, 2010). In Australia, science is not a compulsory school subject beyond Grade 10 (for students aged 15-16 years), and is not a prerequisite or co-requisite subject for law or justice studies at university. This means that police investigators, lawyers, and judges may not have studied science subjects in senior high school (for students aged 16-18 years) and are unlikely to have studied science at a university level. Non-scientist readers of expert reports cannot be expected to be specialists in science.

It is reasonable to suggest that the conclusions of expert reports written at or below a Grade 10 level may be read with ease by police, lawyers, and judges. However, a target of a Grade 8 level (or a range from Grade Levels 7-9) may be preferable for two reasons. First, the reports are written by forensic scientists in their areas of expertise, not in areas of reader specialty. Second, reading ability as given by a grade level is often lower than the level of schooling completed (Fuller, Horlen, Cisneros, & Merz, 2007; Ley & Florio, 1996). Given the issues identified by past research in communicating scientific expert opinion to fact-finders, it was hypothesised that the conclusions of expert reports would be written at levels that would be difficult for non-scientists to read.

Method

Sample. The sample consisted of a set of 111 conclusions written as part of an international proficiency test for forensic scientists who conduct glass analysis.

[...] the results of the tests for the purpose of assisting participants in “maintaining or enhancing the quality of their results” (Collaborative Testing Services [CTS], 2011, p. 3). Our purpose in using the Summary Reports accords with this aim.⁴ CTS (2011) reported that the test had been sent to 148 potential participants, of whom 111 completed and returned the test (75% response rate).

To maximise the value of proficiency tests, accreditation bodies request that test participants subject the test items to their routine analytical methods and reporting procedures. In Australia, for laboratories to maintain accreditation, the National Association of Testing Authorities (NATA) requires that their scientists participate in proficiency tests in the scientific disciplines in which they undertake casework (NATA, 2012). However, not all participants in CTS tests are from accredited organisations; therefore, some responses examined in our research may not be representative of real case reports. Although all test responses were written in English, it should be noted that some test participants may not typically write their casework conclusions in English, as the test is offered internationally. Test participation is anonymous; therefore, specific data were not available on participating jurisdictions.

Test participants had received 3 glass fragments (1 known and 2 unknown) and a case context.⁵ The test asked participants 3 questions. First, participants were asked to select yes, no, or inconclusive in response to the question of whether the “glass particles in Items 2 and/or Item 3 could have originated from the broken glassware as represented by Item 1?” (CTS, 2011, p. 31). Second, they were asked to “indicate the procedures used to examine the submitted items” (CTS, 2011, p. 32). Third, participants were asked “What would be the wording of the Conclusions in your report?” (CTS, 2011, p. 33) and a space of 8 lines was provided. Finally, a section containing 4 lines was left for additional comments.

⁴ The information obtained from the Glass Analysis Test No. 11-548 Summary Report was used with the permission of CTS.

⁵ The case context provided was as follows:

“Police are investigating a home invasion where a woman was brutally attacked and found unconscious in the kitchen. Investigators have recovered fragments of glassware found on the kitchen floor. Witnesses claim to have seen her ex-boyfriend driving around the neighbourhood. The suspect was apprehended at a local bar near the woman’s home. Police have recovered glass particles from the suspect’s wool sweater and from the driver’s seat of his car. Investigators have submitted the recovered glass particles along with a sample of broken glassware recovered from the kitchen for analysis.” (CTS, 2011, p. 31)

Measures. Microsoft Word 2010 was used to obtain scores for readability (Flesch-Kincaid Grade Level and Flesch Reading Ease score), length (number of sentences in the conclusion and average number of words per sentence), and sentence structure (percentage of sentences in the passive voice).

Procedure. Conclusions were obtained from the Glass Analysis Test No. 11-548 Summary Report (CTS, 2011) available in the Forensic Testing Program section of the CTS website (http://www.ctsforensics.com/reports/main.aspx). The conclusions were copied from the PDF Summary Report into a Microsoft Word 2010 document. Each conclusion was checked to ensure that it had been transferred as a single paragraph. Spelling and grammatical errors (where present) were retained; however, where “[sic]” had been inserted after spelling errors, it was removed prior to obtaining readability statistics. (To obtain the readability statistics in Microsoft Word 2010, the user clicks on the Review tab, selects Options within the Spelling and Grammar pane, and checks show readability statistics.)

Results

Table 1 shows the means, standard deviations, and ranges of the readability statistics obtained. Overall, 20% of sentences in conclusions were written using the passive voice. In 49% (n = 54) of conclusions the passive voice was not used, while in 51% (n = 57) of conclusions the passive voice was used in at least one sentence. In conclusions where the passive voice was used, usage ranged from 12-100% of sentences (M = 39%). In two conclusions, the passive voice was used exclusively.

Table 1

Ranges, Means, and Standard Deviations for Quantitative Features of Conclusions

Readability statisticᵃ            Range       Mean    Standard deviation
Words in total                    14-292      91.2    50.7
Sentences                         1-13        4.1     2.3
Average words per sentence        2-53.5      23.8    8.6
Average characters per word       3.7-6.1     4.9     0.5
Sentences in passive voice (%)    0-100       20.0    24.0
Flesch Reading Ease               0-78.8      42.4    16.3
Flesch-Kincaid Grade Level        5.4-25.2    13.1    3.9

Note. N = 111 conclusions from Glass Analysis Test No. 11-548 Summary Report (CTS, 2011).

ᵃ Readability statistics obtained from Microsoft Word 2010.

As can be seen in Table 2, with a mean Flesch-Kincaid Grade Level of 13 and a mean Flesch Reading Ease score of 42, most conclusions were classified as difficult or very difficult.

Table 2

Conclusions by Flesch Reading Ease and Flesch-Kincaid Grade Levels

Flesch Reading    Description         Estimated Flesch-Kincaid    Range of grade levels       Conclusions
Ease scoreᵃ       of styleᵃ           Grade Levelᵇ                obtained in conclusions     % (n)
0 to 30           Very difficult      College graduate            13.4-25.2                   23 (25)
30 to 50          Difficult           13-16                       8.7-18.4                    49 (54)
50 to 60          Fairly difficult    10-12                       8.1-13.9                    14 (15)
60 to 70          Standard            8-9                         6.3-12.1                    10 (11)
70 to 80          Fairly easy         7                           5.4-9.0                     5 (6)
80 to 90          Easy                6                           NA                          0 (0)
90 to 100         Very easy           5                           NA                          0 (0)

ᵃ (Flesch, 1948, p. 230); ᵇ (Flesch, 1949, p. 149).

Discussion

The hypothesis that scientific conclusions would be difficult to read for non-scientists was supported by the readability statistics. Given that the conclusions examined in this study were each restricted to a single paragraph, the length of the text was not viewed as an impediment to readability. However, a number of conclusions contained long sentences, with the average sentence length over 23 words. This sentence length was associated with fairly difficult texts (Flesch, 1948). A blend of long and short sentences is natural and may enhance readability (American Psychological Association [APA], 2010).

Although scientists have been criticised for their overuse of the passive voice (Roland, 2009), only about half of the conclusions used the passive voice at all, and only a small proportion of these used it extensively. Because the proficiency tests are anonymous and international, it is not known whether the use of the passive voice is a preference of some scientists or associated with the reporting guidelines of some laboratories.

Approximately half of the conclusions were written at a level which would be considered difficult and almost one-quarter were very difficult, according to Flesch’s Reading Ease scale (Flesch, 1948). About one-eighth of conclusions were written at a fairly difficult level, another one-eighth were standard or fairly easy, but none of the conclusions were written at a level corresponding with easy or very easy. Whilst the perception of difficulty depends upon the individual reader, the conclusions were written at an average grade level of 13, suggesting that some university education would be necessary to read them with ease. It is reasonable to suggest that the conclusions would be difficult to read even for people with a university education, if that education were not in science.

Limitations. Despite their heuristic value, readability formulas have been criticised because it is possible for nonsensical passages of short words and sentences to obtain high scores on reading ease and low grade-level scores (Benjamin, 2012). One of the simplest ways to decrease the grade level or to increase the reading ease score is to break the paragraph into more sentences (Beaglehole & Yates, 2010). This is a solution to the readability issue that is supported by the results. However, using readability statistics in isolation, as a tool to inform writing, is not recommended because this may result in artificial texts which do not facilitate comprehension (Klare, 1981). To write readable texts, it is more important to follow principles of good writing, using readability statistics only as a heuristic guide. To identify the aspects of the conclusions that may be simplified to enhance readability, qualitative aspects of their readability were next explored.
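The effect of breaking a paragraph into more sentences can be illustrated with the Flesch-Kincaid formula (Kincaid et al., 1975). In the sketch below the word and syllable counts are held constant and only the number of sentences changes. The function names and all counts are hypothetical, and the example shows how the score responds, not whether the resulting text is easier to comprehend.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (Kincaid et al., 1975)."""
    return (
        0.39 * (total_words / total_sentences)
        + 11.8 * (total_syllables / total_words)
        - 15.59
    )


def grade_by_sentence_count(total_words, total_syllables, sentence_counts):
    """Grade level for the same words split into different numbers of sentences."""
    return {
        n: flesch_kincaid_grade(total_words, n, total_syllables)
        for n in sentence_counts
    }


# Hypothetical conclusion: 96 words and 160 syllables, written as 4, 6, or 8 sentences.
grades = grade_by_sentence_count(96, 160, [4, 6, 8])
for n, g in grades.items():
    print(f"{n} sentences: grade {g:.1f}")
# 4 sentences: grade 13.4
# 6 sentences: grade 10.3
# 8 sentences: grade 8.8
```

Halving the average sentence length from 24 to 12 words lowers the grade level by 0.39 × 12 = 4.68 without changing a single word. This is why the statistics are better treated as a heuristic than as a target.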

In document Communicating expert opinion : what do forensic scientists say and what do police, lawyers, and judges hear? (Page 108-115)