Exploring animacy as a mnemonic dimension

including many different kinds of categories of items including some (but not all) plants, body parts, and “collective” nouns, among others. Additionally, this group was less than half the size of the other two groups. For these reasons, ambiguous words were not analyzed at the subject level: Any meaningful analysis of these items should be done on the item level.

A clear advantage of animate items is visible in these results. A 2 x 3 repeated measures ANOVA with word type and recall trial as variables verifies the pattern, with significant effects of both word type, F(1, 799) = 62.83, MSE = 0.026, η²p = 0.073, p < 0.001, and recall trial, F(2, 1598) = 1283.08, MSE = 0.026, η²p = 0.616, p < 0.001. An interaction also exists between word type and recall trial, F(2, 1598) = 5.44, MSE = 0.010, η²p = 0.007, p < 0.01, illustrating that the size of the animacy effect varies by recall trial (it is largest in the second recall trial). Range is likely to play a role in this interaction, with some participants at floor and ceiling levels of performance in recall trials one and three, respectively. Further, planned comparisons of word type for each recall trial revealed that the animacy advantage was reliable throughout the study, from the beginning to end (all t > 4.4, p < 0.0001). While the size of the animacy advantage is smaller in these data compared to other studies (about 3-5% here compared to a typical 9-12%), it is important to remember that unlike other investigations of animacy, the words in this analysis were uncontrolled on all other variables.
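The effect sizes above can be cross-checked from the reported F ratios and degrees of freedom alone. The sketch below uses the standard identity η²p = (F × df_effect) / (F × df_effect + df_error); the function name and the dictionary of values are illustrative and are not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    # F * df_effect equals SS_effect / MS_error, and df_error equals SS_error / MS_error,
    # so the ratio below reduces to SS_effect / (SS_effect + SS_error).
    scaled_effect = f_value * df_effect
    return scaled_effect / (scaled_effect + df_error)


# F statistics reported above for the 2 x 3 repeated measures ANOVA.
reported = {
    "word type": (62.83, 1, 799),
    "recall trial": (1283.08, 2, 1598),
    "word type x recall trial": (5.44, 2, 1598),
}

for effect, (f_value, df_effect, df_error) in reported.items():
    print(f"{effect}: {partial_eta_squared(f_value, df_effect, df_error):.3f}")
```

Rounded to three decimals, the three values come out to 0.073, 0.616, and 0.007, matching the effect sizes reported in the text.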

As MTurk is still fairly new, it is also useful to understand how encoding environment might impact the animacy effect. That is, does the effect differ between in-lab participants and MTurk participants? Figure 5 plots recall as a function of setting (Lab or MTurk), word type, and recall trial. While participants who completed the study in-lab performed better overall, a significant animacy advantage remains in the MTurk sample. These observations are confirmed by a 2 x 3 x 2 mixed ANOVA, adding setting as a between-subjects variable. Effects of setting, F(1, 798) = 62.79, MSE = 0.043, η²p = 0.073, p < 0.001, word type, F(1, 798) = 55.04, MSE = 0.026, η²p = 0.065, p < 0.001, and recall trial, F(2, 1596) = 1213.87, MSE = 0.025, η²p = 0.603, p < 0.001, all exist. The interaction between recall trial and word type remains, F(2, 1596) = 5.46, MSE = 0.010, η²p = 0.007, p < 0.01, indicating once again that the animacy advantage varies by recall trial (notably, it is largest in the second trial for both groups).

Figure 5. Results from Study 3 presented on the subject level: Mean proportion of items correctly recalled as a function of recall trial, word type, and setting. Data shown are separate for each recall trial. Error bars represent standard errors of the mean.

[Plot: x-axis, recall trial (Recall 1, Recall 2, Recall 3); y-axis, proportion correct recall (0 to 0.9); series, Animate (Lab), Inanimate (Lab), Animate (MTurk), Inanimate (MTurk).]

Additionally, an interaction exists between recall trial and setting, indicating that the overall slope (learning from trial to trial) is greater for in-lab participants, F(2, 1596) = 53.94, MSE = 0.025, η²p = 0.063, p < 0.001. This result likely has two causes. First, age is confounded with setting: in-lab participants ranged in age from 18 to 40 with a median age of 19, while MTurk participants ranged in age from 18 to 69 with a median age of 32. This difference in ages may explain the differences in slope, as younger participants typically learn lists of words at a faster rate than do older participants (Kausler, 1994). Second, the MTurk environment itself is nearly guaranteed to be more chaotic than that of the lab. While the study was timed, MTurk participants were under far less pressure to concentrate on the task continuously compared to in-lab participants. This simple fact likely explains much of the decrement in overall recall when comparing across samples. Despite these factors influencing overall recall patterns, the animacy effect remained reliable overall, and did not reliably interact with setting, F(1, 798) = 2.19, p > 0.10.

Exploratory analyses were also conducted to see if participant age and list composition (that is, the proportion of the list that consisted of animate items) interacted with the animacy advantage in free recall. Participants were binned into quartiles based on both measures, and overall recall (averaged across trials) was individually plotted as a function of both age and list composition; see Figures 6 and 7. Note that for age, only the MTurk sample was considered, because the in-lab sample is heavily weighted toward younger ages, as previously mentioned. Quartiles for age were 18-27, 28-32, 33-40, and 41-69 years; quartiles for list composition were 0.17-0.33, 0.34-0.40, 0.41-0.47, and 0.48-0.67 proportion animate words. For reference, the proportion of animate items in each list ranged from 0.17 to 0.67, with a mean of 0.405 (identical to the proportion of animate items in the sample overall) and a standard deviation of 0.086. For both participant age and list composition, the animacy advantage remained constant, as Figures 6 and 7 show. These results were confirmed using two separate 2 x 4 mixed ANOVAs, with word type as a within-subjects factor and quartile as a between-subjects factor.
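As an illustration of the binning step described above, the following minimal sketch assigns participants to the reported age quartiles and computes the mean animacy advantage (animate minus inanimate recall) within each bin. The participant records are hypothetical values invented for the example, and the function names are assumptions; only the quartile boundaries come from the text.

```python
from bisect import bisect_left
from statistics import mean

# Upper bounds of the first three age quartiles reported above
# (18-27, 28-32, 33-40, 41-69); anything above 40 falls in the last bin.
AGE_UPPER_BOUNDS = [27, 32, 40]
AGE_LABELS = ["18-27", "28-32", "33-40", "41-69"]


def age_quartile(age: int) -> str:
    """Map an age onto the reported quartile label."""
    return AGE_LABELS[bisect_left(AGE_UPPER_BOUNDS, age)]


# Hypothetical records: (age, proportion animate recalled, proportion inanimate recalled).
participants = [
    (19, 0.50, 0.46),
    (25, 0.48, 0.45),
    (30, 0.55, 0.50),
    (36, 0.58, 0.55),
    (52, 0.62, 0.57),
]


def advantage_by_quartile(rows):
    """Mean animate-minus-inanimate difference within each occupied age quartile."""
    groups = {label: [] for label in AGE_LABELS}
    for age, animate, inanimate in rows:
        groups[age_quartile(age)].append(animate - inanimate)
    return {label: mean(diffs) for label, diffs in groups.items() if diffs}


for label, advantage in advantage_by_quartile(participants).items():
    print(label, round(advantage, 3))
```

Fixed cut points are used here because the text reports them directly; with raw data, the boundaries would instead be derived from the observed age distribution.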

The analysis of age and word type revealed significant effects of word type, F(1, 616) = 39.28, MSE = 0.009, η²p = 0.060, p < 0.001, and age quartile, F(3, 616) = 9.60, MSE = 0.045, η²p = 0.045, p < 0.001, illustrating how proportion correct recall actually increased with age. These data are unusual, as recall typically declines as age increases (once again, Kausler, 1994). While not verifiable with the present data, the most likely explanation is that MTurk is a poor environment to study the effects of aging. MTurk Workers are computer literate (or at least enough so to complete MTurk HITs), typically college-educated (as per Appendix B), and incentivized to move through HITs quickly. The most parsimonious explanation then is that younger MTurk Workers do not focus as much on the tasks they are doing, while older Workers are more likely to be thoughtful and careful. This analysis also brings into question an age-based explanation for the differences in learning rates between the lab and MTurk settings: Rather than age as a primary factor in reducing learning rates for MTurk subjects, the environment itself appears to be the primary explanatory factor, or perhaps even an age by environment interaction, in which younger participants who completed the study via MTurk do not perform as well as their in-lab peers nor as well as older MTurk Workers. As no data exist for older participants in the lab, it is impossible to conduct a true comparative analysis. Regardless of these explanations, however, the animacy advantage in free recall did not interact with age, F(3, 616) = 1.04, MSE = 0.009, η²p = 0.005, p > 0.10, and is present across all age quartiles.

Figure 6. Results from Study 3 presented on the subject level: Mean proportion of items correctly recalled as a function of word type and participant age (divided into quartiles). Data shown are overall recall averages across all three trials. Error bars represent standard errors of the mean.

[Plot: x-axis, participant age quartile (18-27, 28-32, 33-40, 41-69); y-axis, proportion correct recall (0 to 0.7); series, Animate and Inanimate.]

Figure 7. Results from Study 3 presented on the subject level: Mean proportion of items correctly recalled as a function of word type and list composition (divided into quartiles). Data shown are overall recall averages across all three trials. Error bars represent standard errors of the mean.

[Plot: x-axis, proportion animate words in list by quartile (.17-.33, .34-.40, .41-.47, .48-.67); y-axis, proportion correct recall (0 to 0.7); series, Animate and Inanimate.]

For list composition, only a significant effect of word type existed, F(1, 796) = 60.36, MSE = 0.009, η²p = 0.070, p < 0.001, with all other Fs < 1. This analysis confirms the results shown in Figure 7: list composition (at least for the range observed) does not interact with the animacy advantage in free recall.

Finally, a similar set of exploratory analyses was conducted on participant-reported Person and Thing Orientation (Graziano et al., 2011). Table 9 reports descriptive statistics for observed Person and Thing Orientation scores both overall and by reported gender identification. It is important to note that scores were only available for 90% of the sample (720 participants) due to an error in the survey (10% of participants did not receive the PTO Scale). While Person Orientation did not differ by gender identity (F < 1), men reported higher levels of Thing Orientation than did women, F(1, 716) = 111.39, η²p = 0.135, p < 0.001. Participants who did not identify as either male or female were ignored in these analyses. These results are mostly consistent with extant data on Person and Thing Orientation: Typically, men report much higher levels of Thing Orientation than women, while women report somewhat higher levels of Person Orientation than men (Graziano et al., 2011). The numerical difference in Person Orientation by gender is usually much smaller than the difference for Thing Orientation, however. Therefore, it is somewhat unsurprising that a gender difference did not emerge for Person Orientation.

Table 9

Person and Thing Orientation Descriptive Statistics by Gender

__________________________________________________________

Scale                     N       Mean    SD
__________________________________________________________

Person Orientation        717     2.91    0.73
   Female                 328     2.93    0.68
   Male                   389     2.89    0.76
Thing Orientation         717     2.81    1.09
   Female                 328     2.37    1.01
   Male                   389     3.17    1.01
__________________________________________________________

Participants were binned into quartiles based on both Person and Thing Orientation, and Figures 8 and 9 plot proportion of words correctly recalled by both word type and quartile. For Person Orientation, a 2 x 4 mixed ANOVA with word type as a within-subjects factor and quartile as a between-subjects factor confirmed an effect of word type, F(1, 716) = 57.53, MSE = 0.009, η²p = 0.074, p < 0.001, but also an effect of quartile, F(3, 716) = 4.46, MSE = 0.092, η²p = 0.018, p < 0.01. These results indicate that participants with higher levels of Person Orientation (PO) recalled more words overall, regardless of type (an interaction for these factors was not present, F < 1). While an explanation for this pattern is not immediately clear, it is important to note that PO did not interact with the animacy advantage in free recall. A similar analysis for Thing Orientation confirmed a main effect of word type, F(1, 716) = 56.55, MSE = 0.009, η²p = 0.073, p < 0.001, no effect of Thing Orientation, F(3, 716) = 1.26, MSE = 0.094, η²p = 0.005, p > 0.10, and no interaction between these factors, F(3, 716) = 1.25, MSE = 0.009, η²p = 0.005, p > 0.10. These results indicate that the animacy advantage in free recall also does not interact with Thing Orientation. Apparently, individual differences in interest for people and things do not moderate the animacy effect in any grand sense, though further examination may still be warranted.

Figure 8. Results from Study 3 presented on the subject level: Mean proportion of items correctly recalled as a function of word type and participant Person Orientation (divided into quartiles). Data shown are overall recall averages across all three trials. Error bars represent standard errors of the mean.

[Plot: x-axis, Person Orientation score by quartile (1.0-2.5, 2.6-2.9, 2.9-3.4, 3.4-5.0); y-axis, proportion correct recall (0 to 0.7); series, Animate and Inanimate.]

Figure 9. Results from Study 3 presented on the subject level: Mean proportion of items correctly recalled as a function of word type and participant Thing Orientation (divided into quartiles). Data shown are overall recall averages across all three trials. Error bars represent standard errors of the mean.

[Plot: x-axis, Thing Orientation score by quartile (1.0-2.0, 2.1-2.8, 2.9-3.6, 3.7-5.0); y-axis, proportion correct recall (0 to 0.7); series, Animate and Inanimate.]

Item-Level Analyses of Recall Data

There are 292 words in common between the Rubin & Friendly (1986) dataset and the current dataset. Average recall data (that is, averaged across the three trials) correlate between these two sets at r(290) = 0.391, which is significant at p < 0.001. Furthermore, mean recall values for these words differ only slightly, but significantly (M_RF = 0.549, SD_RF = 0.108; M_V = 0.531, SD_V = 0.109; t(291) = 2.24, p < 0.05, d = 0.151). The Rubin & Friendly dataset, however, has relatively few observations per word in many cases. Analyzing only cases where the number of subjects exposed to a given word was 20 or more (to match the current dataset, which reduces the number of shared words to 128) modestly increases the relationship between the two recall metrics, r(126) = 0.468, p < 0.001, and eliminates the statistical difference in mean recall scores (M_RF = 0.541, SD_RF = 0.099; M_V = 0.525, SD_V = 0.117; t(127) = 1.65, p > 0.10, d = 0.147). While the relationship could be stronger, it is encouraging that some relationship exists between the original normative data for recall and the present study.

The difference in overall recall levels may be partially explained by the MTurk sample as well, which performed worse on the task than in-lab participants. All of the data in the Rubin & Friendly norms were collected in-lab (the norms predate online data collection).
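The correlations reported above use n - 2 degrees of freedom (292 shared words give r(290); 128 give r(126)). The sketch below shows one common way to compute a Pearson correlation and convert it to a t statistic, t = r × sqrt(df / (1 - r²)). The function names are illustrative assumptions, and no recall data from either dataset are reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of item recall means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r: float, n: int):
    """Return (t, df) for testing a correlation of r based on n paired items."""
    df = n - 2
    return r * sqrt(df / (1 - r ** 2)), df


# Applying the conversion to the reported correlations.
for r, n in [(0.391, 292), (0.468, 128)]:
    t, df = t_for_r(r, n)
    print(f"r({df}) = {r}: t = {t:.2f}")
```

Under these assumptions the t values are roughly 7.2 and 5.9, both far beyond conventional critical values, which is consistent with the reported p < 0.001.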

The proportion of animate and inanimate items correctly recalled is plotted by recall trial in Figure 10. These data mirror those of the subject-level analysis (Figure 4), but word type is now a between-subjects variable, as the subject of analysis is now the words themselves. As in the subject-level analysis, there are effects of both word type, F(1, 1035) = 38.22, MSE = 0.032, η²p = 0.036, p < 0.001, and recall trial, F(2, 2070) = 3177.64, MSE = 0.007, η²p = 0.754, p < 0.001. Further mirroring the subject-level analysis, an interaction exists between the two variables, F(2, 2070) = 4.65, MSE = 0.007, η²p = 0.004, p < 0.05. Once again, the interaction appears to reflect a larger animacy advantage in the second recall trial. Further, planned comparisons of word type for each recall trial revealed that the animacy advantage was reliable throughout the study, from the beginning to end (all t > 4.1, p < 0.001). Altogether, the fact that the item-level analysis mirrors the subject-level analysis suggests that the animacy advantage in free recall is independent of the list any given participant saw.
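The difference between the two levels of analysis comes down to which unit the recall outcomes are averaged over. The following minimal sketch aggregates the same trial records both ways; the participants, words, and outcomes are hypothetical and do not come from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (participant, word, word type, recalled as 0 or 1).
records = [
    ("p1", "doctor", "animate", 1),
    ("p1", "hammer", "inanimate", 0),
    ("p2", "doctor", "animate", 1),
    ("p2", "spider", "animate", 0),
    ("p2", "hammer", "inanimate", 1),
    ("p3", "spider", "animate", 1),
    ("p3", "hammer", "inanimate", 0),
]


def subject_level(rows):
    """One score per participant per word type (word type varies within subjects)."""
    cells = defaultdict(list)
    for participant, _word, word_type, recalled in rows:
        cells[(participant, word_type)].append(recalled)
    return {key: mean(values) for key, values in cells.items()}


def item_level(rows):
    """One score per word; each word has a single type, so type varies between items."""
    cells = defaultdict(list)
    for _participant, word, word_type, recalled in rows:
        cells[(word, word_type)].append(recalled)
    return {key: mean(values) for key, values in cells.items()}


print(subject_level(records))
print(item_level(records))
```

In the subject-level table each participant contributes both an animate and an inanimate score, whereas in the item-level table each word contributes exactly one score, which is why word type moves from a within-unit to a between-unit factor.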

Figure 10. Results from Study 3 presented on the item level: Mean proportion of items correctly recalled as a function of recall trial and word type. Data shown are averaged across the three recall trials and separately for each trial. Error bars represent standard errors of the mean.

[Plot: y-axis, proportion correct recall (0 to 0.9); data labels as extracted: 0.54, 0.50, 0.38, 0.35, 0.57, 0.52, 0.67, 0.63.]
