General Education Assessment 2019-2020
Thoughtful Expression: Written
Overview Information
• Courses and number of sections in this sample: 1 course, 4 sections (UNI 101)
• Types of work products scored: final papers
• University Studies component(s) from which the work was sampled: First Year Seminar
• Semester in which the sampling was done: Fall 2019
• Number of work products scored: N = 94
• Number of multiple-scored work products: n = 20
• Number of scorers: 4
Findings
The following section describes the score distributions for the Written Communication work products.
Figure 1. Score distribution for Thoughtful Expression: Written Communication scores. [Chart not reproduced: percentage of work products (0% to 100%) at each score level for WC1 Context and Purpose, WC2 Content Development, WC3 Genre and Disciplinary Conventions, WC4 Sources and Evidence, and WC5 Control of Syntax and Mechanics.]
Table 1
Thoughtful Expression: Written Communication Score Frequencies
Dimensions: WC1 Context and purpose; WC2 Content development; WC3 Genre and disciplinary conventions; WC4 Sources and evidence; WC5 Control of syntax and mechanics
Score Level WC1 WC2 WC3 WC4 WC5
0 0.0% 0.0% 1.1% 7.4% 0.0%
1 16.0% 27.7% 27.7% 25.5% 20.7%
2 61.7% 67.0% 55.3% 56.4% 67.4%
3 22.3% 5.3% 16.0% 10.6% 12.0%
4 0.0% 0.0% 0.0% 0.0% 0.0%
25th percentile 2 1 1 1 2
50th percentile 2 2 2 2 2
75th percentile 2 2 2 2 2
Mode 2 2 2 2 2
N 94 94 94 94 92
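The percentile and mode rows of Table 1 summarize the frequency rows above them. As a minimal sketch, the Python below recomputes them from the reported percentages, assuming that a percentile is the lowest score level whose cumulative percentage reaches that percentile. The report does not state which percentile definition was used (statistical packages often interpolate between observations), and the names `TABLE_1`, `level_at_percentile`, and `mode_level` are hypothetical.

```python
# Score-level percentages (levels 0 through 4) for each dimension, copied from Table 1.
TABLE_1 = {
    "WC1": [0.0, 16.0, 61.7, 22.3, 0.0],
    "WC2": [0.0, 27.7, 67.0, 5.3, 0.0],
    "WC3": [1.1, 27.7, 55.3, 16.0, 0.0],
    "WC4": [7.4, 25.5, 56.4, 10.6, 0.0],
    "WC5": [0.0, 20.7, 67.4, 12.0, 0.0],
}


def level_at_percentile(percentages, p):
    """Lowest score level whose cumulative percentage reaches p (assumed definition)."""
    cumulative = 0.0
    for level, pct in enumerate(percentages):
        cumulative += pct
        if cumulative >= p:
            return level
    # Rounded rows can sum to slightly under 100, so fall back to the top level.
    return len(percentages) - 1


def mode_level(percentages):
    """Score level with the largest share of work products."""
    return max(range(len(percentages)), key=lambda level: percentages[level])


for dim, pcts in TABLE_1.items():
    quartiles = [level_at_percentile(pcts, p) for p in (25, 50, 75)]
    print(dim, quartiles, mode_level(pcts))
```

Under this definition the sketch reproduces every percentile and mode entry in Table 1. Working from raw scores rather than rounded percentages would be more robust when a cumulative percentage falls close to a percentile boundary.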
Because no work products scored at Level 4 on any dimension, the highest-scoring dimension was identified by the percentage of scores at Level 3. On that basis, scores were highest on WC1 Context and Purpose (22.3% at Level 3). Work scored at Level 3 for WC1 (almost a quarter of the sample) demonstrated adequate consideration of context, audience, and purpose and a clear focus on the assigned task(s) (e.g., the task aligns with audience, purpose, and context).
Among the lower scores, Level 0 scores occurred on only two dimensions: WC3 Genre and Disciplinary Conventions (1.1%) and WC4 Sources and Evidence (7.4%). A Level 0 score indicates that the work did not follow the appropriate formal and informal rules inherent in the expectations for the writing assignment (WC3) or made no attempt to use sources and evidence to support its ideas (WC4).
On every dimension, more than half of all work products scored at Level 2. The full Written Communication rubric is provided in Appendix A.
Analysis across Criteria
Demographic and Preparedness Findings and Comparisons between Criteria
Scores on the Written Communication rubric were compared across a number of demographic and preparedness measures.
There was no statistically significant difference in scores across race and ethnicity groups, gender, or high school type (e.g., public, private, homeschool, adult diploma). Comparisons involving Honors course sections, Honors students, Isaac Bear Early College students, transfer students, and class level (e.g., freshman, sophomore) could not be tested statistically, because the sample contained no such sections or students and did not vary in class level.
Scores differed significantly in the following comparisons:
• Pell recipients vs. non-Pell recipients, with work from Pell recipients scoring higher on WC2 Content Development.
• Rural vs. non-rural home county, with work from students with non-rural home counties scoring higher on WC5 Control of Syntax and Mechanics.
• Courses taught by tenure-line vs. non-tenure-line faculty, with work from courses taught by non-tenure-line faculty scoring higher on WC1 Context and Purpose, WC2 Content Development, WC3 Genre and Disciplinary Conventions, and WC5 Control of Syntax and Mechanics.
• Online vs. face-to-face course sections, with work from face-to-face sections scoring higher on WC1 Context and Purpose, WC2 Content Development, WC3 Genre and Disciplinary Conventions, and WC5 Control of Syntax and Mechanics.
Spearman's rho correlation coefficients show that scores on each rubric dimension were significantly and positively correlated with scores on every other dimension at the 0.01 level. GPA, total hours completed, SAT scores, and ACT scores were not significantly correlated with scores on any rubric dimension.
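Spearman's rho is the Pearson correlation computed on ranks, with tied scores sharing the average of the ranks they occupy. Ties are extensive on a five-level rubric, so tie handling matters. The sketch below is illustrative only: the scores are hypothetical, the function names are not from the report, and it computes the coefficient but not the p-value behind the 0.01 significance threshold (SciPy's `scipy.stats.spearmanr` returns both).

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end are ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two variables' average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical WC1 and WC2 scores for six work products (not data from this study).
wc1 = [2, 3, 2, 1, 2, 3]
wc2 = [2, 2, 2, 1, 1, 2]
print(round(spearman_rho(wc1, wc2), 3))  # 0.671
```

The coefficient is undefined (the function divides by zero) when every score in either list is identical.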
Interrater Reliability for Written Communication
So that interrater reliability could be assessed, each pair of faculty scorers scored a number of papers in common; in total, 20 of the 94 papers (21.3%) were scored by more than one scorer. The following table summarizes the reliability measures.
Table 2
Thoughtful Expression: Written Communication Scorer Interrater Reliability—Percent Agreement, Percent Adjacent, and Krippendorff’s alpha
Dimension Percent Agreement Percent Agreement Plus Adjacent Krippendorff's alpha
WC1 Context and purpose 40.0% 95.0% .298
WC2 Content development 70.0% 100% .747
WC3 Genre and disciplinary conventions 35.0% 100% .259
WC4 Sources and evidence 55.0% 90.0% .283
WC5 Control of syntax and mechanics 60.0% 100% .434
Interrater reliability is a measure of the degree of agreement between scorers, and it indicates how trustworthy the data are. It helps answer the question: would a different set of scorers at a different time arrive at the same conclusion? In practice, interrater reliability improves over time through scorer discussion and through refinements to the scoring rubric. Percent Agreement, Percent Agreement Plus Adjacent, and Krippendorff's alpha all measure scorer agreement. For Krippendorff's alpha, a value of 0 indicates agreement no better than chance, a value of 1 indicates perfect agreement, and negative values indicate systematic disagreement. The UNCW benchmark for Krippendorff's alpha is .67. See A Note on Interrater Reliability Measures for a more complete discussion of these statistics and how the benchmark levels were determined.
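For two scorers and no missing scores, Krippendorff's alpha compares the observed disagreement within each double-scored paper with the disagreement expected if scores were paired by chance: alpha = 1 - D_observed / D_expected. The sketch below assumes the interval (squared-difference) metric; the report does not say which metric UNCW used, and an ordinal or nominal metric would produce different values. The function name and sample scores are hypothetical; for real analyses, an established implementation such as the `krippendorff` Python package also handles missing scores and other metrics.

```python
from itertools import permutations


def krippendorff_alpha_interval(pairs):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    Assumes every work product was scored by exactly two scorers and that no
    scores are missing; `pairs` holds one (score_a, score_b) tuple per product.
    """
    values = [score for pair in pairs for score in pair]
    n = len(values)  # number of pairable values (2 per work product)
    # Observed disagreement: each product contributes its two ordered pairs,
    # weighted by 1 / (m_u - 1) = 1 because there are m_u = 2 scores per product.
    d_observed = sum(2 * (a - b) ** 2 for a, b in pairs) / n
    # Expected disagreement: squared differences over all ordered pairs of
    # distinct values pooled across products, i.e. what chance pairing would give.
    d_expected = sum((x - y) ** 2 for x, y in permutations(values, 2)) / (n * (n - 1))
    if d_expected == 0:
        raise ValueError("alpha is undefined when every score is identical")
    return 1 - d_observed / d_expected


# Hypothetical double-scored work products (not data from this study).
sample = [(1, 1), (2, 2), (2, 3), (3, 3)]
print(round(krippendorff_alpha_interval(sample), 3))  # 0.821
```

Perfect agreement gives alpha = 1, while pairs such as (1, 2) and (2, 1), where scorers consistently disagree, push alpha below 0.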
Percent agreement plus adjacent counts scores that were identical or within one level of each other. On every dimension of the Written Communication rubric, at least 90% of paired scores met this criterion.
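Both percent agreement and percent agreement plus adjacent are simple proportions over the double-scored papers. A minimal sketch, with hypothetical function names and scores, assuming each paper contributes exactly one pair of scores:

```python
def percent_agreement(pairs):
    """Percentage of double-scored products on which both scorers gave the same level."""
    return 100 * sum(a == b for a, b in pairs) / len(pairs)


def percent_agreement_plus_adjacent(pairs):
    """Percentage of products on which the two scores were equal or one level apart."""
    return 100 * sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)


# Hypothetical scores for five double-scored products (not data from this study).
sample = [(2, 2), (1, 2), (3, 3), (2, 0), (2, 2)]
print(percent_agreement(sample), percent_agreement_plus_adjacent(sample))  # 60.0 80.0
```

Under the same one-pair-per-paper assumption, each of the 20 double-scored papers would be worth 5 percentage points, which is consistent with every percentage in Table 2 being a multiple of 5.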
Compared with the UNCW benchmark of .67 for Krippendorff's alpha, the benchmark was met on only one dimension, WC2 Content Development (.747); alpha on the remaining dimensions ranged from .259 to .434.
Rubric Alignment Feedback
Scorers completed rubric feedback forms for each assignment scored for the Thoughtful Expression: Written Communication learning goal. On these forms, scorers responded to questions about how well the assignment aligned with the Written Communication rubric and suggested potential improvements to the quality criteria. Themes and scorer responses are summarized below. Because some scorers did not return their feedback forms, this feedback does not cover all assignments.
In general, the assignments gave instruction about the genre and disciplinary conventions to use (WC3) and the types of sources students should use in completing the assignment (WC4). Explicit instruction about syntax and the expected mechanics of writing was not provided (WC5). The context and purpose of the work (WC1) and content development within the written work (WC2) were implicit expectations that the assignments did not explicitly address.
After scoring, scorers provided feedback about any issues they encountered in applying the rubric dimensions to the assignments. One scorer suggested providing previously scored work products that scorers could use as reference points during the scoring process. One scorer mentioned some difficulty in differentiating between Level 1 and Level 2 scores for WC2 Content Development. Scorers recommended that assignments more explicitly direct students to use appropriate syntax and mechanics (WC5) and to follow citation guidelines for sources (to better assess WC4).
Discussion
The table below shows, for each dimension, the percentage of work products scored at Level 2 or higher and the percentage scored at Level 3 or higher.
Table 5
Percentage of Thoughtful Expression: Written Communication Scores at Levels 2+ and Levels 3+
Dimension % of Work Products Scored Level 2 or Higher % of Work Products Scored Level 3 or Higher
WC1 Context and purpose 84.0% 22.3%
WC2 Content development 72.3% 5.3%
WC3 Genre and disciplinary conventions 71.3% 16.0%
WC4 Sources and evidence 67.0% 10.6%
WC5 Control of syntax and mechanics 79.4% 12.0%
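The percentages in Table 5 are cumulative sums of the score-level frequencies in Table 1. The sketch below, with a hypothetical helper name, adds up the Table 1 percentages at or above each threshold. Because it starts from rounded percentages rather than raw counts, its results could in principle differ from the report's by 0.1 point, although here they match Table 5.

```python
# Score-level percentages (levels 0 through 4) for each dimension, copied from Table 1.
TABLE_1 = {
    "WC1": [0.0, 16.0, 61.7, 22.3, 0.0],
    "WC2": [0.0, 27.7, 67.0, 5.3, 0.0],
    "WC3": [1.1, 27.7, 55.3, 16.0, 0.0],
    "WC4": [7.4, 25.5, 56.4, 10.6, 0.0],
    "WC5": [0.0, 20.7, 67.4, 12.0, 0.0],
}


def percent_at_or_above(percentages, level):
    """Sum the percentages for every score level at or above `level`, to one decimal."""
    return round(sum(percentages[level:]), 1)


for dim, pcts in TABLE_1.items():
    # Prints e.g. "WC1 84.0 22.3", matching the WC1 row of Table 5.
    print(dim, percent_at_or_above(pcts, 2), percent_at_or_above(pcts, 3))
```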
The majority of work products scored at Level 2 or higher on every dimension, which is appropriate for the first-year seminar (UNI 101) from which the work was sampled, because a Level 1 score on this rubric represents the performance expected of an entering freshman. Almost one-quarter of the work products scored at Level 3 or higher on one dimension, WC1 Context and Purpose. Generally speaking, students were most successful at performing at the expected score level for this sample (Level 1 to Level 2) in demonstrating attention to context, audience, and the purpose of the assigned task (WC1; 84.0% at Level 2 or higher), and least successful in using credible and relevant sources to support their ideas (WC4; 67.0% at Level 2 or higher).
Finally, interrater reliability measured as the percentage of scores in perfect agreement or within one score level was very good for this sample, with at least 90% of scores meeting this criterion on every dimension. The percentage of scores in perfect agreement ranged from 35% to 70%. Interrater reliability was highest on WC2 Content Development and lowest on WC3 Genre and Disciplinary Conventions, although 100% of the paired WC3 scores in the interrater reliability sample were in perfect agreement or within one score level.
Appendix A
Written Communication Rubric