
The Effects on Mathematics Performance of Personalizing Word Problems to Students’ Interests

Audra Eileen Kosh

A dissertation submitted to the faculty at the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Education in the Learning Sciences and Psychological Studies program in the School of Education.

Chapel Hill 2016

Approved by: Gregory Cizek, Sharon Derry, Jeffrey Greene, Catherine Scott


ABSTRACT

Audra Eileen Kosh: The Effects on Mathematics Performance of Personalizing Word Problems to Students’ Interests

(Under the direction of Dr. Gregory J. Cizek)

This study explored student performance on topic-personalized word problems (TPWPs) in middle school mathematics whereby the context of a word problem was customized to students’ self-selected interests (i.e., sports; movies, music, and television; animals; travel; and science and technology). Using a within-subjects research design, 343 rising eighth-graders answered approximately 6,000 word problems – half of which were TPWPs and half of which were generic word problems – in the context of a free, online summer mathematics skills retention program for students. Research questions focused on whether TPWPs triggered students’ situational interest and how accuracy and speed of word problem responses differed between TPWPs and matched generic word problems. After controlling for the mathematics content of the items (i.e., rates and ratios, integer operations, and equations and inequalities), reading demand of the item stem, and students’ perceived mathematics ability level, results of multilevel modeling indicated that students were more likely to rate TPWPs as interesting as compared to generic word problems and that students were more likely to answer items correctly when rating items as interesting. However, no evidence was found that students were more likely to answer TPWPs correctly after


ACKNOWLEDGEMENTS

This dissertation would not have been possible without my many supporters. My advisor, Dr. Gregory Cizek, and committee members, Dr. Sharon Derry, Dr. Jeffrey Greene, Dr. Catherine Scott, and Dr. Jack Stenner, all provided valuable insight that guided the direction of my work in addition to the training they provided me throughout my graduate school journey. Also, a village of team members at MetaMetrics, Inc. made this study


TABLE OF CONTENTS

Page

LIST OF TABLES ... x

LIST OF FIGURES ... xiii

Chapter 1: Introduction ... 1

Introduction ... 1

The Mechanism of Action: How Interest Affects Learning ... 4

Purpose and Research Questions... 7

Summary ... 8

Chapter 2: Literature Review ... 10

The Domain of Reading: How Interest Affects Learning ... 10

Personalization of Mathematics Word Problems ... 12

Research on incidental personalization of word problems. ... 12

Research on topic personalization of word problems. ... 15

How Interest and Choice Mediate Motivation, Learning, and Achievement Outcomes .... 18

Cognitive and behavioral outcomes associated with interest. ... 19

Supporting students’ progressions to higher phases of interest. ... 23

Features that Affect the Level of Challenge of Mathematics Word Problems ... 24

Summary ... 30

Chapter 3: Method ... 32


Development of Student Interest Categories ... 38

Instrument Development ... 42

Data Preparation ... 57

Participants ... 57

Item and form characteristics ... 59

Item response times ... 60

Data Analysis ... 65

Summary ... 68

Chapter 4: Results ... 69

Descriptive Statistics ... 69

Correlation Matrix ... 73

RQ1: Rating TPWPs as Interesting ... 74

Summary of RQ1. ... 80

RQ2: Accuracy of TPWPs ... 80

RQ3: Response Time to TPWPs ... 85

Assessing assumptions of multilevel linear modeling. ... 92

Power Analysis ... 95

Summary ... 101

Chapter 5: Discussion ... 102

Significance and Implications of Results ... 102


Discussion of RQ2. ... 104

Discussion of RQ3. ... 105

Concluding remarks on results. ... 106

Challenges with Using TPWPs as an Instructional Strategy... 107

Fit of mathematics content. ... 107

Potential of TPWPs to positively affect learning ... 109

The need for expert knowledge in interest categories. ... 109

When to assume interest-specific prior knowledge. ... 110

Possible novelty effect. ... 110

Challenges with Using Technology to Create TPWPs... 111

Inaccurate interest survey responses. ... 111

Modifying context of problem. ... 112

Use of gender pronouns. ... 113

Challenges with Conducting Research on TPWPs ... 113

Limitations of the Present Research Design ... 116

Data collected after instruction occurred. ... 116

Limitations of the interest categories. ... 116

Limitations of the participant sample. ... 117

Possible Hawthorne effect. ... 117

Future Research ... 118

Modifications to research design in this study. ... 118

Exploring the possible mechanism of action. ... 119


Potential applications of TPWPs in student assessment. ... 121

Summary ... 121

APPENDIX A: RECRUITMENT EMAILS ... 124

Session One Recruitment Email ... 124

Session Two Recruitment Email ... 125

Session Three Recruitment Email ... 126

APPENDIX B: ITEMS USED FOR DATA COLLECTION... 127

Session One Items ... 127

Session Two Items ... 133

Session Three Items ... 139

APPENDIX C: TEXT COMPLEXITY OF ITEMS ... 145

APPENDIX D: ITEM STATISTICS ... 148


LIST OF TABLES

Table Page

1. Quantile Skills and Concepts in Each Data Collection Session ... 36

2. Frequency of Search Terms in Each Interest Category ... 40

3. Example Generic and Personalized Word Problems ... 46

4. Mean Lexile Measures of Item Stems ... 47

5. Mean Number of Words in Item Stems ... 47

6. Differences in Stem Word Counts Across Matched Word Problems ... 49

7. Count of Students by Interest Category, Week, and Type of First Problem ... 52

8. Summary Item Statistics for Generic Problems and TPWPs by Session Number ... 60

9. Variables Used in Multilevel Models ... 67

10. Descriptive Statistics of Variables Used in Multilevel Models ... 70

11. Descriptive Statistics for Proportion of Problems Rated as Interesting... 71

12. Descriptive Statistics for Proportion of Correct Responses... 71

13. Descriptive Statistics for Mean Response Time in Seconds... 72

14. Correlation Matrix of Variables Used in Multilevel Models ... 73

15. Unconditional Model for RQ1 ... 75

16. Full Model for RQ1 ... 78

17. Final Model for RQ1... 80

18. Unconditional Model for RQ2 ... 81


24. Selected Values of Odds Ratio and Power for TPWP Effect in RQ1... 98

25. Selected Values of Odds Ratio and Power for TPWP Effect in RQ2... 99

26. Selected Values of Effect Size and Power for TPWP Effect in RQ3 ... 101

27. Example TPWPs with Factual Data ... 115

B1. Generic Problem One and Matched TPWPs in Session One ... 127

B2. Generic Problem Two and Matched TPWPs in Session One ... 128

B3. Generic Problem Three and Matched TPWPs in Session One ... 129

B4. Generic Problem Four and Matched TPWPs in Session One ... 130

B5. Generic Problem Five and Matched TPWPs in Session One ... 131

B6. Generic Problem Six and Matched TPWPs in Session One ... 132

B7. Generic Problem One and Matched TPWPs in Session Two ... 133

B8. Generic Problem Two and Matched TPWPs in Session Two ... 134

B9. Generic Problem Three and Matched TPWPs in Session Two ... 135

B10. Generic Problem Four and Matched TPWPs in Session Two ... 136

B11. Generic Problem Five and Matched TPWPs in Session Two ... 137

B12. Generic Problem Six and Matched TPWPs in Session Two ... 138

B13. Generic Problem One and Matched TPWPs in Session Three ... 139

B14. Generic Problem Two and Matched TPWPs in Session Three ... 140

B15. Generic Problem Three and Matched TPWPs in Session Three ... 141

B16. Generic Problem Four and Matched TPWPs in Session Three ... 142


B18. Generic Problem Six and Matched TPWPs in Session Three ... 144

C1. Lexile Level of Session One Item Stems ... 145

C2. Lexile Level of Session Two Item Stems ... 145

C3. Lexile Level of Session Three Item Stems... 146

C4. Number of Words in Session One Item Stems ... 146

C5. Number of Words in Session Two Item Stems ... 147

C6. Number of Words in Session Three Item Stems ... 147


LIST OF FIGURES

Figure Page

1. The four-phase model of interest development and associated outcomes. ... 6

2. Number of participants in each state. ... 34

3. Sample dashboard for the SMC. ... 37

4. Sample introductory text to daily SMC activity. ... 38

5. Screenshot of sample student interest questionnaire. ... 43

6. Example of item display for a generic word problem. ... 51

7. Form design. ... 54

8. Example of final webpage of instrument that provided feedback to students. ... 55

9. Distribution of response time in seconds to items in session one. ... 62

10. Distribution of response time in seconds to items in session two. ... 63

11. Distribution of response time in seconds to items in session three... 64

12. Distribution of response time prior to log transformation. ... 86

13. Distribution of response time after log transformation. ... 87

14. Normal Q-Q plot of level 1 residuals for final model in RQ3. ... 93

15. Plot of level 1 standardized residuals for final model in RQ3. ... 94

16. Graph relating the odds ratio for the effect of TPWP to power in RQ1. ... 97

17. Graph relating the odds ratio for the effect of TPWP to power in RQ2. ... 99

18. Graph relating the effect size of TPWP to power in RQ3. ... 100

19. Example geometry problem 1 ... 108


Chapter 1: Introduction

Introduction

The Principles and Standards for School Mathematics (National Council of Teachers of Mathematics, 2000) presents a set of mathematical process standards, one of which calls for students to “recognize and apply mathematics in contexts outside of mathematics” (p. 64). One way that curricular materials require students to apply mathematical concepts to real-life contexts is through word problems. Word problems are defined as text that describes a situation whereby the student must infer mathematical relationships in order to answer a question (Verschaffel, Greer, & De Corte, 2000). In addition to providing students with real-life scenarios, word problems are a beneficial instructional tool because they can 1) increase students’ motivation in mathematics by exemplifying how mathematics is relevant in real life, 2) provide a means to assess students based on their ability to solve various problems applicable to real-life career tracks, 3) develop students’ general problem-solving abilities both within and beyond mathematics, and 4) develop students’ mathematical knowledge at a conceptual level (Verschaffel, Greer, & De Corte, 2000).


shown how students fail to make sense of word problems and consequently provide answers to absurd, illogical problems such as “There are 26 sheep and 10 goats on a ship. How old is the captain?” (Verschaffel, Greer, & De Corte, 2000, p. 4). In this example word problem, over half of the first- and second-graders in the study’s sample added the numbers in the problem, answering that the captain was 36 years old. This example shows that students learn and routinely apply the rules of school mathematics, often without considering the context of the problem and how the context informs the solution strategy; one such rule is that problems have a single correct answer, usually obtained by adding, subtracting, multiplying, or dividing the numbers in the problem (Verschaffel, Greer, & De Corte, 2000).

One reason why many students struggle with word problems may be that word problems often are not personally relevant to students, potentially resulting in low desire to solve the problem and difficulty with making sense of the solution strategy due to the problem’s irrelevance. As in the example cited above with goats and sheep on a ship, it is doubtful that many elementary school students find themselves in a situation whereby they need to either know the captain’s age or count how many animals are on a ship.


Because general curriculum and assessment materials typically need to target a wide range of diverse students, word problems found in instructional materials often include generic contexts, rather than personally relevant ones, in order to increase the likelihood that all students understand the context of the problem. Similarly, mathematics word problems included on large-scale achievement tests, such as those mandated at end-of-course or end-of-year for accountability purposes, typically go through a sensitivity review process to ensure that the words and context of the problem do not favor or disadvantage any subgroup of students (e.g., English language learners, students of high or low socioeconomic status). Thus, as a result of the need to ensure that all students can interpret a word problem’s context equivalently, word problems in learning materials often use generic contexts designed to apply to all students. For example, Pythagorean theorem problems frequently include a ladder leaning against a wall; area and perimeter problems often use garden plots or kitchen floors; and quadratic function problems often involve throwing balls or other projectiles.

The generic context of word problems is problematic for two reasons. First, when students repeatedly see the same generic problems, students lose a valuable opportunity to learn from contexts that are meaningful in their lives, with the result that students may view mathematics as irrelevant and disconnected from everyday life. Second, because of the repetition in generic word problem contexts, students may eventually learn which


because the student no longer needs to determine which mathematical concept most appropriately solves a real-life problem. In this way, students learn to game the system and become good at doing school rather than doing mathematics.

To avoid assigning all students generic word problems that fail to connect to students’ individual interests, curriculum designers could potentially create separate sets of problems that use different contexts based on different student interests. Although students could still learn the common types of contexts associated with particular interest-specific word problems, providing word problems across a range of interest categories and personal preferences can result in a greater variety of mathematical applications and has the added benefit of potentially helping students see how mathematics applies to their own lives through topics that interest them.

The Mechanism of Action: How Interest Affects Learning

In addition to potentially providing a greater variety of mathematical applications and making mathematics relevant for students, providing students with word problems


problems. Moreover, interest has been found to correlate negatively with cognitive load, suggesting that students working on a highly interesting activity may experience reduced cognitive load, which in turn may improve learning outcomes (Park, 2015; Yen, Chen, Lai, & Chuang, 2015). A full review of empirical research supporting these claims regarding the learning effects associated with interest is provided in Chapter 2.


Figure 1. The four-phase model of interest development and associated outcomes.

Fittingly for the topic of the present study, Hidi (2006) provided an example of the difference between individual and situational interest in the context of mathematics word problems:

When we talk about a student who has an individual interest in mathematics and therefore is looking for ways in which he could solve word problems, we conceptualize his/her interest as a predisposition. However, another student who does not have an interest in mathematics may also find the world problem interesting, and thus experience the psychological state of interest triggered by the situation. (p. 73)


mechanism by which TPWPs may result in higher student performance is the increased affect and attention along with reduced cognitive load characteristic of triggered and sustained situational interest. It should be noted, though, that the triggered and sustained situational interest sparked by TPWPs could progress to further stages of interest development for students within mathematics; for example, if a student experienced positive feelings and academic success with a TPWP activity, those feelings could carry into positive feelings about mathematics holistically as a content area, beyond merely the feelings experienced during a TPWP activity. These further-developed phases of interest could then also capitalize on the benefits of enhanced persistence and use of self-regulatory strategies.

Purpose and Research Questions

The purpose of this study was to compare middle school students’ performance on TPWPs to performance on matched generic word problems and to explore the possible mechanism by which such word problems may result in increased student performance. The research questions were:

RQ1: Are rising eighth-graders more likely to rate TPWPs as interesting as compared to matched generic word problems?

RQ2: Are rising eighth-graders more likely to answer TPWPs correctly as compared to matched generic word problems, and how do students’ interest ratings of problems relate to the likelihood of answering the problem correctly?


For RQ1, I hypothesized that students would rate TPWPs as more interesting than matched generic word problems due to the aforementioned cognitive and behavioral benefits of sparking students’ situational interest. For the same reasons, I hypothesized for RQ2 that students would be more likely to answer a TPWP correctly as compared to a matched generic problem and that favorable ratings of problems – either personalized or generic – would correlate with the likelihood of answering the problem correctly. I posed RQ3 as an exploratory question with no directional hypothesis because prior research points in two directions. On the one hand, research on assessment shows that the time spent responding to items generally correlates positively with item difficulty (i.e., students solve items faster when the items are easier; Daniel & Embretson, 2010), so if TPWPs are easier, response times to TPWPs would be expected to be shorter. On the other hand, Hidi and Ainley’s (2008) findings of increased persistence on interest-targeted tasks support the idea that students would spend more time on TPWPs due to showing greater persistence, and thus have longer response times for TPWPs.

To answer the three research questions, I administered both TPWPs and generic word problems to students and compared performance on both sets of problems in terms of accuracy and speed, and I also collected item-level data regarding students’ interest in each word problem. In subsequent chapters of this dissertation, I provide a more comprehensive literature review; I describe the specific data sources and data collection methods used for this study; I present the analytic approaches used and results of the analyses; and I provide conclusions and implications of the findings.

Summary


with sparking students’ interests, it is possible that students will perform better when receiving word problems aligned to their interests. In addition to potentially raising student achievement in mathematics, providing students with personalized word problems also has the potential to overturn perceptions held by some students that mathematics is boring or irrelevant (Brown, Brown, & Bibby, 2008).

If results show that TPWPs are indeed easier for students, then those results would support the use of topic personalization as a valuable instructional strategy. And, with the growing availability of computers in schools and students’ homes (Lauman, 2000), computer-based learning environments could programmatically design unique sets of word problems for students based on their interests. The potential for this technology expands as complementary work on automatic item generation seeks to use natural language processing to produce large banks of word problems (Deane & Sheehan, 2003).
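As a rough illustration of what "programmatically" serving interest-based problems could look like, the Python sketch below stores one generic word problem alongside topic-personalized variants keyed by the interest categories named in the abstract, and returns the variant matching a student's self-selected interest, falling back to the generic stem when no variant exists. The class name, function name, and all problem texts are hypothetical and are not items from this study; the sketch assumes Python 3.7 or later.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Interest categories listed in the abstract of this study.
INTEREST_CATEGORIES = (
    "sports",
    "movies, music, and television",
    "animals",
    "travel",
    "science and technology",
)


@dataclass
class MatchedProblem:
    """A generic word problem plus TPWP variants (hypothetical structure).

    Every variant keeps the same numbers and operations as the generic stem,
    so only the context changes, not the mathematics.
    """
    answer: float
    generic: str
    personalized: Dict[str, str] = field(default_factory=dict)


def select_problem(problem: MatchedProblem, interest: Optional[str]) -> str:
    """Return the TPWP for the student's interest, or the generic stem if none exists."""
    if interest is not None and interest in problem.personalized:
        return problem.personalized[interest]
    return problem.generic


# Hypothetical rate problem (not an item from this study): 45 / 3 * 8 = 120.
rate_problem = MatchedProblem(
    answer=120.0,
    generic=("A machine fills 45 bottles in 3 minutes. At this rate, "
             "how many bottles does it fill in 8 minutes?"),
    personalized={
        "sports": ("A pitcher throws 45 pitches in 3 innings. At this rate, "
                   "how many pitches does the pitcher throw in 8 innings?"),
        "animals": ("A hummingbird visits 45 flowers in 3 minutes. At this rate, "
                    "how many flowers does it visit in 8 minutes?"),
    },
)

if __name__ == "__main__":
    print(select_problem(rate_problem, "animals"))  # personalized variant
    print(select_problem(rate_problem, "travel"))   # no variant stored, so generic
```

Because every variant shares the generic problem's numbers and operations, a structure like this holds mathematics content constant while varying only context; a real system would also need to check that variants stay comparable in reading demand.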


Chapter 2: Literature Review

Four main bodies of literature provide background for this study. The first summarizes empirical research regarding how interest affects student performance and motivation within the domain of reading. The second focuses on prior research on the effectiveness of personalized word problems as an instructional strategy or student motivator in mathematics. The third covers the mechanism by which interest leads to desirable student outcomes (i.e., learning, achievement, engagement, and motivation). Finally, the fourth addresses features of mathematics tasks that affect the cognitive complexity of a task, which is important to this study in order to understand how varying features of a mathematics problem can change the way students interact with the problem.

The Domain of Reading: How Interest Affects Learning

Although topic personalization in the field of mathematics word problems is relatively new, researchers have long studied the effects of allowing students to choose instructional materials that best match their interests in other content areas, particularly reading. The idea is that, when given a choice about which text to read, students will select texts that are more interesting and relevant to them, which in turn leads to the aforementioned benefits of triggering situational interest and leading to sustained individual


Humenick (2004) computed 46 effect sizes of experimental and quasi-experimental studies that granted students a choice of texts and found average effect sizes of .95 and 1.2 for student choice on motivation and reading comprehension, respectively.

In many of the studies included in Guthrie and Humenick’s (2004) review, motivation was operationalized as the number of minutes students chose to read beyond the required reading period when given a choice of other activities. For example, McLoyd (1979) asked second- and third-graders to rank books in order from first choice to last choice and then assigned half of the participants to read 250 words from their first-choice book and the other half to read 250 words from their last-choice book. After students read 250 words, they were given 10 minutes of free time to either continue reading, play Scrabble, do crossword puzzles, or do a math game. McLoyd’s results showed that students in the high-interest condition (i.e., students who read their first-choice book) spent statistically significantly more time reading than students in the low-interest condition, suggesting that students had greater motivation to read when engaging with texts they found interesting. Similar studies have since replicated McLoyd’s findings: for example, Flowerday, Schraw, and Stevens (2004) found that undergraduate students’ situational interest in a text positively affected their attitude toward completing a reading and writing task about the text.


of interest in the topic. Students in the choice condition also answered more questions correctly than students in the no-choice condition, though this difference was not statistically significant. Thus, allowing students to choose instructional materials based on their interests – as the current study will do in the field of mathematics word problems – appears to be a promising instructional strategy.

Personalization of Mathematics Word Problems

Continuing beyond reading to mathematics, another body of research examines how student achievement, engagement, and motivation are affected by personalization of mathematics word problems. As mentioned in the previous chapter, TPWPs modify the context of the word problem based on a student’s self-selected interest. Another type of word problem, which I call an incidentally-personalized word problem (IPWP), merely changes surface-level features of the problem (i.e., names of people, places, or favorite things) without changing the context of the problem. For example, in an IPWP, the phrasing “A teacher gave her class 12 cans of soda to share…” would be replaced with “Ms. Jones gave her class 12 cans of Dr. Pepper…” where Ms. Jones is the name of the student’s teacher and Dr. Pepper is the student’s favorite soda. This type of personalization is different from the TPWPs proposed for this study because topic personalization requires giving students different contexts based on their interests. Nevertheless, the literature on IPWPs provides insight to inform this study.
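To make the distinction concrete, the Python sketch below treats incidental personalization as slot filling: surface features are substituted into one fixed template, so the context (a class sharing cans of a drink) never changes. It uses the standard library's `string.Template`; the template and function names are hypothetical, and only the portion of the example quoted above (before the "…") is modeled. A TPWP, by contrast, would swap in a different template altogether rather than fill slots in the same one.

```python
from string import Template

# Hypothetical template covering only the quoted opening of the example problem;
# the rest of the problem (elided with "..." in the text) would stay unchanged.
IPWP_TEMPLATE = Template("$teacher gave her class 12 cans of $drink")

# Generic wording used when no student-specific detail is available.
GENERIC_FILLERS = {"teacher": "A teacher", "drink": "soda"}


def incidentally_personalize(student_info: dict) -> str:
    """Swap surface features (names, favorite things) into the fixed template.

    Keys missing from student_info fall back to the generic wording, so the
    problem's context and mathematics never change.
    """
    fillers = {**GENERIC_FILLERS, **student_info}
    return IPWP_TEMPLATE.substitute(fillers)


if __name__ == "__main__":
    print(incidentally_personalize({}))
    print(incidentally_personalize({"teacher": "Ms. Jones", "drink": "Dr. Pepper"}))
```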

Research on incidental personalization of word problems. Research on IPWPs has largely occurred in two historical waves based on technology available at the time. Prior to widespread computer use in classrooms and web-based learning environments, researchers


personalized word problems. The problems were then distributed several days later through paper-and-pencil testing. A major limitation of this wave of research is that personalizing word problems without the aid of computers is extremely time consuming. Later, as computer-based learning technologies proliferated, technology aided real-time creation of personalized word problems based on information the student entered into the computer. Because research from this second wave is relatively recent, there are fewer such studies, but they tend to have larger samples of both students and problems due to the increased efficiency of creating IPWPs.

Results from both waves of research indicated positive effects of incidental personalization on student motivation and mixed effects on student achievement. In one of the earliest studies on IPWPs, Anand and Ross (1987) randomly assigned fifth- and sixth-graders to receive instructional materials consisting of either problems with generic contexts (e.g., “There are 3 objects. Each one is cut in one-half. In all, how many pieces would there be?”) or matched personalized word problems whereby the students’ favorite things and friends’ names were substituted into the problem. Results showed that students receiving IPWPs did statistically significantly better on a posttest and also had a more positive attitude toward math after completing the unit as compared to the control group.

Several other studies have since replicated these findings using one of two common research designs. In the within-subjects design, researchers have compared student performance on assessments consisting of both IPWPs and generic word problems. In the between-subjects design, as was the case in Anand and Ross’s (1987) study, researchers randomly assign students to receive either personalized or generic instructional


instrument to measure engagement in mathematics or attitude toward mathematics. Across both types of research design, results have shown positive effects of IPWPs on student achievement and engagement across diverse samples, such as Norwegian students aged 12 to 15 studying probability (Høgheim & Reber, 2015), fourth-grade Taiwanese students solving two-step word problems (Ku & Sullivan, 2002), American sixth-, seventh-, and eighth-graders solving two-step word problems (Ku, Harter, Liu, Thompson, & Cheng, 2007), and American fifth-graders solving fraction addition and subtraction problems (Davis-Dorsey, Ross, & Morrison, 1991). In a slightly different study of personalized elements (e.g., using the student’s name as the game piece avatar, substituting names of the student’s favorite places into the game context) in a computer game about order of operations for fourth- and fifth-graders, Cordova and Lepper (1996) found that students were more interested in playing the game after school and also attempted more challenging problems when they received the personalized version of the computer game.

In contrast to studies that found positive effects for IPWPs, other studies have found no statistically significant differences when giving students IPWPs. In a sample of American third-graders solving a variety of mixed word problems representing different mathematical content, Bates and Wiest (2004) found that students performed equally well on IPWPs and generic word problems on a test consisting of both types of problems. Additionally, although Davis-Dorsey, Ross, and Morrison (1991) found positive effects for incidental personalization in fifth-graders, the same study included a sample of second-graders for which personalization had no statistically significant effects on achievement. In


show any greater achievement outcomes after receiving incidentally-personalized instructional materials as compared to a control group.

The mixed results regarding the effectiveness of using incidental personalization to increase student achievement and engagement raise questions about why some studies have found encouraging, positive results whereas others have found no effects. A potential reason for the discrepancy is the variability in how researchers define a personalized word problem and the extent to which students may have found personalized word problems interesting. In one study that did not find positive effects for personalization, a teacher had students fill out an interest form including the question “Name one thing you buy at your favorite store” and then substituted that response into a word problem template from a textbook (Bates & Wiest, 2004, p. 25). The resulting personalized problem was “Suppose 30 bottles of glue are shared equally among 6 classes. How many bottles of glue would each class get?” which was personalized for a student responding with “glue” (p. 25). I argue that this problem represents little, if any, personalization – unless this child was particularly fascinated by bottles of glue – which could explain why the authors found no differences on IPWPs with respect to student interest, understanding, or achievement.


Early research on TPWPs is largely dominated by the work of Walkington (2013), who conducted a quasi-experimental study that randomly assigned Algebra 1 students to receive topic-personalized or non-personalized word problems over the course of a unit about linear functions and independent variables in a cognitive tutoring system. In the study, students receiving TPWPs performed statistically significantly better on achievement indicators (e.g., accuracy of responses and rate of progression through the computer-based curriculum) both during the experimental unit and during a follow-up unit, four units later in the school year, wherein both the control and treatment groups received the same problems. In other words, students who received the personalization treatment early on continued to outperform the control group even after personalization was removed.

A substantial critique of Walkington’s (2013) study relates to the design of the personalized and non-personalized problems. She provided the following example of a word problem used in the control group: “An experimental liquid (LOT#XLHS-240) is being tested to determine its behavior under extremely low temperatures. Its current temperature is 35 degrees Celsius and is slowly being lowered by two and one-half degrees per hour…” (p. 939). As related to the research design, the control group problems are troubling because the context of this problem, which regards an experimental liquid, represents a context with which students in her study (i.e., mostly ninth- and tenth-graders) probably do not normally interact. This is because, first, high-school students normally are not in a setting of


Now, consider two personalized word problems from Walkington’s (2013) study. The first example was personalized to the interest category of food: “A new soda at McDonald’s is being tested to determine its behavior under extremely low temperatures. Its current temperature is 35 degrees Fahrenheit and is slowly being lowered by two and one-half degrees per hour…” (p. 939). The second example was personalized to the interest category of stores: “The Dippin’ Dots store at the mall uses extremely low temperatures to freeze its ice cream into tiny balls. Right now, the temperature of a batch of chocolate Dippin’ Dots ice cream is 35 degrees Fahrenheit and is slowly being lowered by two and one-half degrees per hour...” (p. 939). There are several concerns related to these problems. First, both problems are actually about food (i.e., one about soda and the other about ice cream), even though the second problem was supposedly targeted to students with an interest in stores. Second, the contexts of both problems represent ideas familiar to high-school students (i.e., McDonald’s, the mall, temperature units in Fahrenheit), whereas the control problem represented an unfamiliar context. Thus, Walkington’s study potentially confounded the effect of personalization with the effect of merely situating problems in relevant contexts, and therefore fails to provide clear evidence either to refute or to support the effect of topic personalization on student learning.

context of the problem. Nevertheless, personalization was an effective instructional strategy for lower-ability students. Their study, however, included only 24 students, who solved word problems in only three different contexts.

Finally, in an effort to reduce the time demands of constructing TPWPs, Walkington and Bernacki (2015) conducted another study whereby students wrote their own algebra problems utilizing contexts relevant to the students’ lives. The authors found that students rated mathematics as more relevant to their lives after writing their own problems, but they also found that problem writing was challenging for some students. For example, students would write problems that did not represent the intended content, had no question, or were not mathematically accurate.

How Interest and Choice Mediate Motivation, Learning, and Achievement Outcomes

One of the earliest pieces of scholarly work on interest and learning was John Dewey’s (1913) book, Interest and Effort in Education. Dewey asserted that interest and effort are inherently intertwined, meaning that exertions of effort are always motivated by an underlying interest. According to Dewey,

It is psychologically impossible to call forth any activity without some interest. The theory of effort simply substitutes one interest for another. It substitutes the impure interest of fear of the teacher or hope of future reward for pure interest in the material presented. (p. 2)

which interest may lead to learning.

Dewey (1913) distinguished between direct and indirect interest, which are largely equivalent to intrinsic and extrinsic motivation, respectively. In more recent literature, researchers commonly distinguish between situational interest and individual interest, as was described in Chapter 1 with the four-phase model of interest development (Hidi & Renninger, 2006). On the one hand, individual interest, also known as personal interest and similar to what Dewey called direct interest, is “characterized by intrinsic desire to understand a particular topic that persists over time” (Schraw & Lehman, 2001, p. 24). On the other hand, situational interest, which is similar to what Dewey called indirect interest, is “transitory, environmentally activated, and context-specific” (Schraw & Lehman, 2001, p. 24).

Cognitive and behavioral outcomes associated with interest. Because situational interest is attached to features of the environment whereas individual interest is attached to characteristics of the student, it is arguably easier for educators to manipulate situational interest than individual interest. Correspondingly, the context of this study (i.e., providing students with word problems aligned to their interests) is one means of manipulating situational interest, and researchers have documented several cognitive and behavioral outcomes associated with triggering situational interest: reduced cognitive load, heightened attention and concentration, and raised affect and hence persistence.

imposes on the cognitive system”, can limit learning when the cognitive load of an activity interferes with the students’ ability to process all of the necessary information (Sweller, van Merrienboer, & Paas, 1998, p. 266). Cognitive load can be classified as intrinsic, extraneous, or germane cognitive load. Intrinsic cognitive load is load due to the difficulty of the learning material, such as solving multistep mathematical problems versus single-step computations, and can be quantified by the number of concepts or procedures a learner must simultaneously process (Debue & van de Leemput, 2014; Sweller, van Merrienboer, & Paas, 1998).

Extraneous cognitive load is load caused by poor instructional design, such as presenting word problems that contain unfamiliar, multi-syllable names that are difficult to pronounce. Finally, germane cognitive load is load experienced by learners when processing intended learning goals into long-term memory and schemas, such as making sense of a mathematical model that promotes conceptual understanding rather than performing an algorithm without understanding the rationale for why the algorithm works. Accordingly, effective instructional designs should seek to reduce extraneous cognitive load and increase germane cognitive load (Sweller, van Merrienboer, & Paas, 1998). Related to germane cognitive load is the idea of generative cognitive processing, which occurs when a student actively engages in activities of high germane cognitive load (DeLeeuw & Mayer, 2008). If a student lacks interest, however, the student may experience generative underutilization, which occurs when a student is capable of learning but does not exert the necessary effort to accomplish the learning goal (Park, 2015).

expressing higher interest (Park, 2015). In a study of 127 undergraduates in a computer literacy course, Park measured participants’ situational interest with Likert-scale items such as “I was completely caught up in what I was studying” and measured participants’ perceived cognitive load with an instrument asking participants to rate the amount of mental effort expended on the learning task (p. 222). Park found a negative correlation (r = -.417, p < .001) between perceived cognitive load and reported situational interest, suggesting that triggering situational interest may have increased generative cognitive processing by reducing cognitive load.
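A correlation's p-value depends on sample size through the t statistic t = r√(n − 2) / √(1 − r²), which has n − 2 degrees of freedom. The sketch below, written with Python's standard library only, computes Pearson's r from raw scores and converts a reported r into that t statistic. Plugging in Park's reported r = -.417 with 127 participants gives t(125) ≈ -5.13, well beyond the two-tailed critical value for α = .001 (roughly 3.4 at 125 degrees of freedom), which is consistent with the reported p < .001. The function names are illustrative and are not taken from Park's study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Values reported for Park (2015): r = -.417 with 127 participants.
t = correlation_t(-0.417, 127)
print(f"t({127 - 2}) = {t:.2f}")
```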

Taking a different methodological approach in the context of reading interesting versus non-interesting literary passages, McDaniel, Waddill, Finstad, and Bourg (2000) asked students to react to an audible tone that occurred throughout a student’s reading of a passage. Each student was told to press the spacebar on a computer as soon as he or she heard the tone, and the authors used reaction time to the tone as an indicator of the cognitive resources spent on reading the passage, reasoning that a faster reaction time indicates that fewer cognitive resources are being spent on reading. The authors found that participants reacted faster to the tones when reading interesting texts, which they claimed supported the idea that interesting texts require fewer cognitive resources to read.

participants read sentences of varying degrees of interest, Anderson (1982) found that fourth-graders read interesting texts more slowly than non-interesting texts.

Alternatively, a student experiencing greater concentration due to piqued interest may be able to process instructional materials faster, thus spending less time on an interesting task than on a similar task that the student does not find interesting. In the context of personalized mathematics word problems, Walkington (2015) found that students in a treatment group answering TPWPs spent less time both reading and solving the personalized problems than students in a control group solving comparable non-personalized word problems. Walkington concluded that interest-targeted word problems increased students’ attention and engagement, as demonstrated by faster response times.

persistence was related to learning” (p. 558). These findings suggest that students who experienced more positive emotions with the text also read more of the text, and reading more of the text was related to greater learning outcomes as measured by a reading comprehension score.

Supporting students’ progressions to higher phases of interest. The likelihood that interest-targeted activities will trigger the aforementioned benefits of reduced cognitive load, increased attention and concentration, positive affect leading to persistence, and use of self-regulatory behaviors corresponds to a student’s phase of interest development. Reduced cognitive load, positive affect, and heightened attention are mostly seen in Phases I and II of the four-phase model of interest development (i.e., triggered and sustained situational interest; Hidi & Renninger, 2006; Hidi, Renninger, & Krapp, 2004), whereas persistence and use of self-regulatory strategies are mostly seen in Phases III and IV (i.e., emerging and well-maintained individual interest; Hidi & Ainley, 2008). A psychological mechanism in Phase I or II can evolve into a different mechanism in Phase III or IV, as is the case with positive affect in Phases I and II leading to persistence in Phases III and IV.

Despite the benefits associated with each phase of interest development, many students do not exhibit Phase III or Phase IV levels of individual interest. However, it is possible to help students progress in their interest development in order to reach the higher phases of interest and thus receive the positive benefits of those phases such as the use of self-regulatory strategies. As recommended by Renninger and Hidi (2002) based on the results of a case study showing how environmental factors triggered the situational interest of a seventh-grader working on a science project, “support for students’ attention to and

instances of triggered situational interest and the inclusion of individual interest (e.g., opportunities to work with friends)” (p. 189). In other words, for students who do not have a well-developed interest in a particular task or content domain, educators can help support the development of such interest by providing multiple opportunities for triggered interest events, as could be the case when providing students with TPWPs.

When attempting to move students to higher phases of interest development, one instructional method is to provide students with choices related to learning activities, on the assumption that students will choose materials that they find interesting. However, certain conditions must be met in order for choice to intrinsically motivate students. Katz and Assor (2007) proposed a conceptual framework consisting of three components that describe when choice benefits motivation and learning. First, the choices must relate to students’ interests. For example, a student may not care to choose which numbers appear in a mathematics worksheet but may care about which country he or she will study for a geography assignment. Second, the number of choices must be constrained, as too many choices can cause frustration. Last, choice should be used only when culturally appropriate. For example, in some cultures, choosing differently from others in a group might be a sign of rebellious, unacceptable behavior, whereas in other cultures, especially Western cultures, choice may present an opportunity to express individuality.

Features that Affect the Level of Challenge of Mathematics Word Problems

used matched pairs of personalized and generic word problems that were matched based on features predicted to affect the problem’s difficulty.

Researchers have investigated student performance on word problems or matched symbolic problems in school environments, where mathematics tasks are typically fabricated to align to a learning objective. In these studies, researchers typically express the level of challenge of a mathematics problem through either item difficulty or cognitive complexity. Item difficulty is a psychometric characteristic of a problem administered as a question on a test, represented either by the percentage of examinees answering the problem correctly or by a parameter derived from an item response theory model. In either case, the difficulty of an item is a quantitative index based on examinee item response data. Relatedly, the cognitive demand or cognitive complexity of a task refers to the “cognitive processes in which students actually engage as they go about working on the task” (Stein, Grover, & Henningsen, 1996, p. 461) and is often expressed according to a taxonomy of increasingly complex levels, such as Boston and Smith’s (2009) rubric for classifying the cognitive demand of a mathematics task. Regarding research on word problems in school contexts, Nathan and Koedinger (2000) pointed out that teachers and researchers have a “symbol precedence model of development algebraic reasoning” (p. 168), meaning they believe that students first learn how to solve symbolic equations and then learn to solve story problems (i.e., word problems) by translating the story context into an equation and then solving it. The symbol-precedence view is corroborated in textbook design as well: in nine out of ten

The symbol-precedence view of mathematical development has been challenged, however. In a study of high-school students, Koedinger and Nathan (2004) tested student performance on three different types of problems matched for mathematical structure and varied by presentation format: 1) story problems (e.g., a question about a waiter making tips and an hourly rate), 2) word equations (e.g., Starting with $81.90, I subtract $66 and then divide by 6. What number do I get?), and 3) symbolic equations (e.g., Solve for x: (81.90-66)/6 = x). Students performed statistically significantly better on story problems and word equations as compared to symbolic equations, but there were no statistically significant differences on performance between story problems and word equations. The authors concluded that presenting problems verbally as opposed to symbolically is the key determinant of difficulty rather than the situational context.

Enright, Morley, and Sheehan (2002) conducted a study similar to those of Koedinger and Nathan (2004, 2008) that examined the impact of particular story and equation problem features on difficulty. The authors systematically varied characteristics of two sets of algebraic word problems related to rates and probability in a sample of Graduate Record Examination (GRE) examinees. For the rate problems, they varied whether the item included variables or numbers, the context of the problem (i.e., cost or distance), and the level of complexity of the constraints in the problem. The factor that impacted difficulty the most was the presence of variables as opposed to numbers. Interestingly, the authors found that the effect of context depended on whether or not variables were required: for rate problems without variables, a cost context (e.g., prices with dollar signs to calculate a unit rate) made the item statistically significantly easier than a distance context (e.g., miles per hour). But, for items with variables, there was no statistically significant difference between cost and distance rate problems. Similar to Nathan and Koedinger’s results, these results indicate that context matters less for more mathematically complex problems such as problems using variables as opposed to numbers.

For the probability problems in Enright and colleagues’ (2002) study, the authors also varied whether the item was phrased as a problem about probability (i.e., What is the probability of…) or percentage (i.e., Which percentage of…) and whether the context was a real-life scenario or an abstract number context (e.g., An integer is chosen at random

questions were more difficult than items phrased as percentage questions. Real-life versus abstract context had no statistically significant differences in item difficulty.

Additional studies of word problems further demonstrate how minor semantic changes affect difficulty, particularly through the use of keywords that signal students to use certain operations or strategies. Martin and Bassok (2005) defined translation cues as “standardized phrases and keywords that are highly correlated with correct solutions”, which allow students to go directly from words to solution strategies with little need to interpret the context of the word problem (p. 471). For example, students take “altogether” to mean addition, “difference” to mean subtraction, and “times” to mean multiplication.

Translation cue strategies can backfire when there is a mismatch between the translation cue and the solution strategy. For example, given the statement “There are six times as many students (S) as professors (P)”, 37 percent of undergraduate engineering students incorrectly translated this sentence into the corresponding expression, with the response 6S=P accounting for 68 percent of the incorrect answers (Clement, 1982, p. 17). This type of error, known as a reversal error, commonly occurs when the student tries to translate the keywords in the statement directly into an expression without making sense of the relationship between quantities. Another example of how semantics can complicate mathematics is the commonly cited bat-and-ball problem (i.e., A bat and a ball cost $1.10. The bat costs one dollar more than the ball. How much does the ball cost?), for which more than 50% of students at elite universities gave the incorrect answer of ten cents (Kahneman, 2011).
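Writing both examples out algebraically makes the intended relationships explicit. In the first line, S and P stand for the numbers of students and professors; in the remaining lines, b is an illustrative symbol (not from the cited studies) for the cost of the ball in dollars.

```latex
\begin{aligned}
&\text{Students and professors: } S = 6P \qquad (\text{reversal error: } 6S = P) \\
&\text{Bat and ball: } b + (b + 1.00) = 1.10 \\
&\qquad 2b = 0.10 \quad\Rightarrow\quad b = 0.05
\end{aligned}
```

The ball therefore costs five cents and the bat $1.05; the intuitive answer of ten cents would make the bat $1.10 and the total $1.20.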

correct response. They hypothesized that certain objects (e.g., blue marbles and red marbles) represent symmetrical relationships usually modeled by addition or subtraction, whereas other objects (e.g., apples and baskets or chairs and tables) represent asymmetrical relationships associated with multiplication and division. Problems have semantic alignment when the symmetrical or asymmetrical relationship between the words in the problem matches the correct solution strategy (Bassok, Chase, & Martin, 1998). Martin and Bassok (2005) presented seventh-graders, ninth-graders, eleventh-graders, and college students with different types of problems that varied in their semantic alignment and in whether students were asked to provide a numerical answer or write an expression or equation. As hypothesized, semantic alignment affected whether students answered story problems correctly: more students correctly answered semantically aligned problems. However, semantic alignment had no effect on expression- or equation-writing tasks. Also, although students performed better on problems with semantic alignment, this effect was strongest for younger students and diminished as age increased. These results imply that word cues matter less for higher-ability students answering more complex questions, probably because the mathematical complexity of the problem outweighs context cues.

still developing understanding of mathematical terminology, as was the case in the aforementioned lower-elementary and middle school examples, whereas high-school and college students already have this foundational knowledge.

A synthesis of the above research on mathematics word problems reveals that the role of context as a predictor of item difficulty cannot be summarized with a simple answer about when context makes a problem more or less difficult. Instead, the context of a problem interacts with other features of the problem and the student, with evidence supporting the idea of a tradeoff among word problem context, the mathematical complexity of the problem, and the age and ability of the student. Research findings indicate that context affects lower-level students more than students completing higher-level mathematics.

Summary

In this chapter, I reviewed four bodies of literature that informed the work of this dissertation. First, I presented evidence from reading research spanning preschool through college to support the idea that motivation, learning, and achievement outcomes increase when students read texts that they self-select as interesting to them. Second, I reviewed research with mixed results for using incidentally-personalized word problems, and I presented the only two known studies claiming to use TPWPs, each of which had substantial limitations. Third, I summarized literature about the cognitive processes that mediate the relationship between interest and learning. Fourth, I reviewed features of mathematics word problems that affect the difficulty of those problems, including subtle differences in problem phrasing, vocabulary, and context.

differences as compared to generic word problems in other cases. I assert that one reason for the discrepancy in results is the different ways in which authors have operationalized both personalized and control problems: incidentally-personalized versus topic-personalized word problems, and control problems situated in either familiar or abstract contexts. Students’ perceptions of these personalized problems as interesting or not could explain differences in the studies’ results. In my study, I will focus on TPWPs. Currently, the only available research in this area is Walkington’s (2013) study that used minimal context changes to personalized word problems and Walkington,

Chapter 3: Method

Using a within-subjects design whereby participants completed both TPWPs and generic word problems, I compared students’ interest ratings, accuracy of responses, and speed of responses for TPWPs to those for generic word problems. A description of the data collection procedures, participants, instrument design, data preparation methods, and analysis methods follows.

Procedures and Participants

Data collection for this study occurred within the context of a free online summer program designed to prevent summer learning loss in mathematics. The summer program, known as the Summer Math Challenge (SMC), was offered by MetaMetrics® to rising second- through eighth-graders during six weeks of June and July, 2016. The SMC focused on different learning standards each week as aligned to The Quantile® Framework for Mathematics, a scale measuring task and concept difficulty that consists of approximately 550 Quantile Skills and Concepts (QSCs) reflecting the mathematical content students learn in grades K-12 (MetaMetrics, Inc., 2011). The online program included instructional resources, games, quizzes, and interactive activities for students to complete at home each week. Participation was voluntary, and parents learned about the SMC through announcements made to educational leaders at the state-, district-, or school-level.

participation in both the summer program and the research study. Although students could access SMC resources at any time during the summer or academic school year, data collection for this study was limited to June 16, 2016 through July 29, 2016, the official six-week duration of the SMC. Any students completing the data collection instrument after July 29, 2016 were excluded from data analyses in order to facilitate timely completion of data analyses.

Data for this study focused on rising eighth-graders enrolled in the SMC. Grade level was reported by the individual who enrolled the student in the SMC, which could have been an educator, parent, other caregiver, or the student himself/herself. In total, 334 students across 34 states were included in data analysis. Figure 2 shows the geographic

demographic variables.

Figure 2. Number of participants in each state. Hawaii (one student) and Alaska (zero students) are not pictured.

Table 1

Quantile Skills and Concepts in Each Data Collection Session

| Data collection session | Topic | Quantile skills and concepts |
| --- | --- | --- |
| 1 | Proportions and constant of proportionality | • Calculate unit rates in number and word problems, including comparison of unit rates. • Calculate unit rates of ratios that include fractions to make comparisons in number and word problems. |
| 2 | Operations with integers | • Model or compute with integers using addition or subtraction in number and word problems. • Model or compute with integers using multiplication or division in number and word problems. |
| 3 | Equations and inequalities | • Solve two-step linear equations and inequalities and graph solutions of the inequalities on a number line. • Solve linear equations using the associative, commutative, distributive, and equality properties and justify the steps used. • Write a linear equation or inequality to represent a given number or word problem; solve. |

appears in Appendix A. If an SMC account had more than one child assigned to grade seven in the SMC (e.g., a family with twins in the same grade or an educator enrolling a full class in the SMC), then clicking the Qualtrics link triggered a screen posing the question “Which child is doing the activity?”. The user could then click on the name of the child in order to ensure proper identification of participants.

Figure 4. Sample introductory text to daily SMC activity.

Development of Student Interest Categories

TPWPs in this study were based on five interest categories: 1) sports, 2) music, television, and movies, 3) travel, 4) animals, and 5) science and technology. The student interest categories were developed by analyzing trends in search history records from EdSphere®, an online reading and writing learning platform. In EdSphere, students type a response to the question “What do you want to read about today?” and EdSphere returns texts related to that topic.

eighth-graders for a total of 336,202 searches. From these 336,202 searches, a simple random sample of 1,000 terms was split into two data sets consisting of 500 terms for initial development of student interest categories (i.e., a training data set) and another 500 terms for cross-validating the categories that emerged from the first 500 terms (i.e., a validation data set).
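The sampling step can be sketched in Python, assuming the search records are available as a list of strings. The function name, the seed, and the placeholder records are hypothetical; the study does not report what software was used. Because `random.Random.sample` draws without replacement and returns its picks in random order, taking the first and second halves of the sample yields two disjoint, randomly assigned sets of 500 terms.

```python
import random


def split_sample(terms, sample_size=1000, seed=None):
    """Draw a simple random sample without replacement and split it in half.

    Returns (training, validation). The seed parameter is a hypothetical
    addition for reproducibility; no seed is reported for the study.
    """
    rng = random.Random(seed)
    sample = rng.sample(terms, sample_size)  # random order, no repeats
    half = sample_size // 2
    return sample[:half], sample[half:]


# Hypothetical stand-in for the 336,202 EdSphere search records.
searches = [f"search_{i}" for i in range(336_202)]
training, validation = split_sample(searches, seed=2016)
print(len(training), len(validation))
```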

Search terms were first cleaned to remove non-codable records. First, I identified nonsensical searches of random text (e.g., “jajajajjajajja” and “hnnnnnnnnnn”) and uninformative phrases (e.g., “surprise me”, “all”, or “other”). Second, searches of vague words or abstract concepts were removed, such as “kids”, “sorry”, and “courage”, as these search terms were difficult to use as evidence for a student’s interest. Third, searches related to violence or drugs were removed because of their inappropriateness for developing learning materials for minors. Finally, searches for book genres, book titles, or author names were excluded due to the assumption that students were likely trying to identify a particular book in response to the question “What do you want to read about today?” rather than entering a topic. In the training data set, a total of 60 search terms were removed, meaning 440 search terms remained for analysis.

described above, resulting in 399 codable search terms. Using the same categories developed from the training data set (i.e., animals, history, pop culture, sports, science and technology, and travel), 327 search terms fit within these categories. Table 2 presents the categories by frequency count and by percentage of the total codable and uncategorized search terms, along with examples of search terms from each category.

Table 2

Frequency of Search Terms in Each Interest Category

| Interest category | Frequency | Cumulative frequency | Percentage of all codable terms | Example search terms |
| --- | --- | --- | --- | --- |
| History | 81 | 81 | 20.3% | Revolutionary war, Holocaust, Articles of Confederation, Middle Ages |
| Sports | 72 | 153 | 18.0% | soccer, Real Madrid, Babe Ruth, summer olympics |
| Pop Culture | 62 | 215 | 15.5% | Star Wars, Sandra Bullock, The Beatles, One Direction |
| Travel | 41 | 256 | 10.3% | New York City, Amazon Forest, London, Washington D.C. |
| Animals | 38 | 294 | 9.5% | animal sanctuaries, panda, dogs, cheetahs |
| Science and Technology | 33 | 327 | 8.3% | nuclear fission, freshwater ecosystems, Japan robotics, erosion |
| Uncategorized | 72 | 399 | 18.1% | teen driving, Valentine’s Day, Guinness world records |
| Total | 399 | 399 | 100.0% | |
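The cumulative frequencies and percentages in Table 2 can be recomputed from the category counts alone; the short Python sketch below makes that arithmetic explicit. Percentages are taken over all 399 codable terms and rounded independently to one decimal place, so the rounded values need not sum to exactly 100.0%. The six named categories together account for 327 terms, matching the number reported in the text. The function name is illustrative.

```python
# Frequencies of codable search terms by category, as reported in Table 2.
frequencies = {
    "History": 81,
    "Sports": 72,
    "Pop Culture": 62,
    "Travel": 41,
    "Animals": 38,
    "Science and Technology": 33,
    "Uncategorized": 72,
}


def frequency_table(freqs):
    """Return (category, frequency, cumulative frequency, percentage) rows."""
    total = sum(freqs.values())
    rows, running = [], 0
    for category, count in freqs.items():  # dicts preserve insertion order
        running += count
        rows.append((category, count, running, round(100 * count / total, 1)))
    return rows


rows = frequency_table(frequencies)
for row in rows:
    print(row)
```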

uncategorized search terms was video games, but video games only represented seven searches out of the 399 codable searches, which was not high enough to warrant an entirely new interest category. Furthermore, the number of categories was intentionally minimized in order to facilitate writing a feasible number of problems and to not diminish the motivational effects of choice by providing too many options as described by Katz and Assor (2007).

Despite the six categories that emerged from the inductive coding, I removed history as a category due to challenges related to combining history and mathematics in a meaningful yet research-appropriate way. This decision was made after careful consideration and consultation with two history teachers: a middle school teacher with more than ten years of experience teaching eighth-grade American history and another ten years of experience teaching Algebra 1, and a tenth-grade history teacher with more than 25 years of experience. Both teachers gave excellent suggestions for word problems within the realm of the content for this study (e.g., given how far Lewis and Clark traveled over a certain period of time, calculate how many miles they walked per day); however, writing problems such as these posed challenges related to the numbers used in the problems and other confounding factors. Specifically, the history problems generally needed to be based on facts in order to represent meaningful scenarios rather than fabricated numbers and contexts, and these

clear, I do not claim that history and mathematics lack interdisciplinary overlap; rather, for the purpose of this study, it was not possible to write history word problems that preserved the other features of the research design while still achieving meaningful word problems.

After removing history, the five remaining interest categories utilized in this study were: 1) sports, 2) music, television, and movies, 3) travel, 4) animals, and 5) science and technology. This study is the first – to my knowledge – to develop interest categories for word problems based on empirical student data rather than researchers’ perceptions of students’ interests.

Instrument Development

Students had the opportunity to complete an instrument administered through Qualtrics consisting of 12 word problems – six TPWPs and six generic word problems – once per week for three weeks for a total of 36 problems per student.

Student interest questionnaire. The instrument began with a student interest questionnaire that asked students to select a name for themselves and the name of a friend and then answer “Which topic most interests you?” by selecting either Sports; Music, Movies, and Television; Science and Technology; Travel; or Animals. The interest categories were displayed in random order. Figure 5 shows an example screenshot of the interest

Figure 5. Screenshot of sample student interest questionnaire.

psychometrically well-performing selected-response items from a large bank of items previously pretested through various K-12 testing programs. Items were selected as well-performing items, known as exemplar items, based on the criteria that the item’s point-biserial correlation was greater than or equal to .2 and the p-value of the item (i.e., the proportion of examinees who answered the item correctly) was between .3 and .7 based on a pretest sample of at least 1,500 seventh-graders. Additionally, the item must have been a word problem that represented the same content standards as the respective week’s content in the SMC. In a few cases, new items were written when no available exemplar item fit the study constraints (e.g., when it was not possible to modify the interest categories). Names included in generic items were modified in some cases in order to minimize the likelihood that a generic word problem included a student’s actual name by chance. This was done by changing common names (e.g., John, Ryan) in generic word problems to names more frequently used in generations older than the study population (e.g., Phyllis, Marshall), by including a formal salutation for an adult (e.g., Mr. Johnson), or by using culturally diverse names as is done in typical K-12 item development (e.g., Kianna).
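Both screening statistics can be computed from pretest response data. Here "p-value" is the classical item difficulty (proportion correct), not a significance level. The sketch below assumes dichotomously scored items and computes the point-biserial correlation against the uncorrected total score (some testing programs instead correlate with the total excluding the item). The function names and the inclusive treatment of the .3 and .7 bounds are assumptions for illustration, not details reported for the item bank.

```python
import math


def item_statistics(item_scores, total_scores):
    """Return (p_value, point_biserial) for one dichotomously scored item.

    item_scores  -- 0/1 scores on the item, one per examinee
    total_scores -- each examinee's total test score (item included)
    """
    n = len(item_scores)
    n_correct = sum(item_scores)
    if n_correct in (0, n):
        raise ValueError("point-biserial is undefined when all examinees score the same")
    p = n_correct / n  # classical difficulty: proportion answering correctly
    mean_total = sum(total_scores) / n
    sd_total = math.sqrt(sum((t - mean_total) ** 2 for t in total_scores) / n)
    mean_correct = sum(t for s, t in zip(item_scores, total_scores) if s == 1) / n_correct
    mean_wrong = sum(t for s, t in zip(item_scores, total_scores) if s == 0) / (n - n_correct)
    # r_pb = (M1 - M0) / s * sqrt(p * q), using the population standard deviation
    r_pb = (mean_correct - mean_wrong) / sd_total * math.sqrt(p * (1 - p))
    return p, r_pb


def is_exemplar(p, r_pb, n_examinees, min_rpb=0.2, p_bounds=(0.3, 0.7), min_n=1500):
    """Screening rule from the text; inclusive bounds are an assumption."""
    low, high = p_bounds
    return n_examinees >= min_n and r_pb >= min_rpb and low <= p <= high


# Tiny hypothetical data set: four examinees, one item.
p, r_pb = item_statistics([1, 1, 0, 0], [4, 3, 2, 1])
print(round(p, 2), round(r_pb, 3))
```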


mathematical task, cognitive complexity level using Webb’s (1997) Depth of Knowledge hierarchy, formatting and style (e.g., italicizing variables in equations), sentence structure, distractor rationales, number type (e.g., decimals to the tenths place or whole numbers that are multiples of five), use of visual aids, etc. All TPWPs were selected-response items with four options, replicating the format of the exemplar items.


Table 3

Example Generic and Personalized Word Problems

Generic: Mr. Johnson wants to buy plants for his backyard. Which price for plants is the lowest unit price?
A) $136.00 for 20 plants
B) $100.80 for 14 plants
C) $72.60 for 11 plants
D) $67.50 for 9 plants

Sports: Student Name wants to buy soccer trophies for a group of friends. Which price for soccer trophies is the lowest unit price?
A) $138.00 for 20 soccer trophies
B) $92.40 for 14 soccer trophies
C) $86.40 for 12 soccer trophies
D) $61.60 for 8 soccer trophies

Animals: Student Name wants to buy dog collars for an animal shelter. Which price for dog collars is the lowest unit price?
A) $138.00 for 20 dog collars
B) $92.40 for 14 dog collars
C) $86.40 for 12 dog collars
D) $61.60 for 8 dog collars

Science and Technology: Student Name wants to buy beakers for a science lab. Which price for beakers is the lowest unit price?
A) $138.00 for 20 beakers
B) $92.40 for 14 beakers
C) $86.40 for 12 beakers
D) $61.60 for 8 beakers

Music, Movies, and Television: Student Name wants to download music albums. Which price for album downloads is the lowest unit price?
A) $138.00 for 20 albums
B) $92.40 for 14 albums
C) $86.40 for 12 albums
D) $61.60 for 8 albums

Travel: Student Name wants to buy tickets for a group of friends to ride cable cars in San Francisco. Which price for tickets is the lowest unit price?
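The sample items in Table 3 are unit-rate problems: each option’s unit price is its total price divided by the number of units, and the key is the option with the smallest quotient. The short Python sketch below checks the generic item and the personalized options shared by the Sports, Animals, Science and Technology, and Music, Movies, and Television versions, using exact fractions so that dollar amounts are not distorted by floating-point rounding; the function names are illustrative.

```python
from fractions import Fraction


def unit_prices(options):
    """Map each option letter to its exact price per unit."""
    return {letter: Fraction(price) / quantity for letter, (price, quantity) in options.items()}


def lowest_unit_price(options):
    """Return the option letter with the smallest price per unit."""
    prices = unit_prices(options)
    return min(prices, key=prices.get)


# Options from Table 3: (total price in dollars as a string, number of units).
GENERIC = {"A": ("136.00", 20), "B": ("100.80", 14), "C": ("72.60", 11), "D": ("67.50", 9)}
PERSONALIZED = {"A": ("138.00", 20), "B": ("92.40", 14), "C": ("86.40", 12), "D": ("61.60", 8)}

if __name__ == "__main__":
    for label, options in (("Generic", GENERIC), ("Personalized", PERSONALIZED)):
        best = lowest_unit_price(options)
        print(label, best, float(unit_prices(options)[best]))
```

Both items share the same lowest unit price ($6.60), although the keyed option differs: C for the generic item (72.60 / 11) and B for the personalized versions (92.40 / 14).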


When writing the stems, effort was made to achieve a similar Lexile® measure (a measure of text complexity) and total word count for all item stems, because prior research indicates that the text complexity of word problems affects item difficulty (Walkington, Clinton, Ritter, & Nathan, 2015). Lexile measures and word counts of item stems were calculated after the items were initially drafted and twice more after revisions prompted by subject matter expert reviews. When possible, words or phrases in stems were revised to bring the Lexile measures and word counts of generic word problems and TPWPs closer together. Table 4 shows mean Lexile measures for generic word problems and TPWPs, and Table 5 presents similar information for word counts. Tables with Lexile measures and word counts for individual items appear in Appendix C.

Table 4

Mean Lexile Measures of Item Stems

Item type                         Session 1   Session 2   Session 3
Generic Word Problems             703L        625L        933L
All TPWPs                         794L        811L        1013L
Sports                            717L        830L        1055L
Animals                           760L        748L        972L
Science and Technology            837L        818L        1002L
Music, Movies, and Television     827L        805L        1003L
Travel                            830L        852L        1035L
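The “All TPWPs” row in Table 4 can be reproduced, to the nearest Lexile, as the unweighted mean of the five topic rows, which is what one would expect if each TPWP was written in all five topic versions (an assumption based on Table 3). The Python sketch below performs that check and computes how far the TPWP means sit above the generic means in each session; the constant and function names are illustrative.

```python
from statistics import mean

SESSIONS = (1, 2, 3)

# Mean Lexile measures of item stems from Table 4.
GENERIC = {1: 703, 2: 625, 3: 933}
BY_TOPIC = {
    "Sports": {1: 717, 2: 830, 3: 1055},
    "Animals": {1: 760, 2: 748, 3: 972},
    "Science and Technology": {1: 837, 2: 818, 3: 1002},
    "Music, Movies, and Television": {1: 827, 2: 805, 3: 1003},
    "Travel": {1: 830, 2: 852, 3: 1035},
}


def all_tpwp_means():
    """Unweighted mean across the five topic versions for each session."""
    return {s: mean(topic[s] for topic in BY_TOPIC.values()) for s in SESSIONS}


def tpwp_minus_generic():
    """Difference in mean Lexile measure (TPWPs minus generic word problems) per session."""
    means = all_tpwp_means()
    return {s: means[s] - GENERIC[s] for s in SESSIONS}


if __name__ == "__main__":
    means = all_tpwp_means()
    for s, gap in tpwp_minus_generic().items():
        print(f"Session {s}: TPWP mean {means[s]:.1f}L, gap {gap:+.1f}L")
```

The topic means average to 794.2L, 810.6L, and 1013.4L, matching the reported 794L, 811L, and 1013L after rounding. The resulting gaps (about +91L, +186L, and +80L) indicate that the TPWP stems were, on average, somewhat more complex than the generic stems in every session, with the largest difference in Session 2.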

Table 5

Mean Number of Words in Item Stems