Chapter 4 Stage One: designing and developing an online Implicit Association Test to measure stereotypes of empathy in scientists
4.2. Step I: Selecting Appropriate Categories and Items 1 Category label selection
4.2.2. Stimulus item selection
4.2.2.2. Item selection
To select attribute stimulus items that best represent rationality and empathy without ambiguous category membership, a small group of university students rated the generated items on how representative they were of rationality and empathy.
4.2.2.2.1. Participants and procedure
Participants were recruited by oral invitation. Students approached at random in two colleges at the University of Cambridge were told that the study was a short survey about vocabulary categorisation. Participants were offered chocolate bars for taking part.
The survey was carried out in the common rooms of the participants' colleges using traditional paper-and-pencil administration. Each participant was given a questionnaire comprising written instructions, item categorisation questions and personal information questions. The researcher was present at each survey session to ensure that participants understood the instructions and completed the questionnaire independently. Each participant took less than 5 minutes to complete the questionnaire.
A total of 32 participants took part in the study, of whom 2 were removed from further analysis because they did not finish the questionnaire. The final sample of 30 participants consisted of 16 (53.3%) women and 14 (46.7%) men. The mean age was 25.9 years (SD = 5.2), ranging from 19 to 39 years. Regarding their major subjects, 20 (66.7%) participants were majoring in liberal arts and 7 (23.3%) were majoring in science; the remaining 3 participants did not report a major subject. Although information about nationality and English level was not collected, around 80% of the participants were Caucasians and 20% were Asians. All overseas students enrolled at the University of Cambridge are required to demonstrate competence in the English language at a very high level by taking a standard English language test (e.g., the language requirements are 7.5 out of 9 for the International English Language Testing System (IELTS) and 110 out of 120 for the Test of English as a Foreign Language (TOEFL)). All participants were therefore deemed proficient in English and eligible to evaluate the items.
4.2.2.2.2. Materials
A vocabulary categorisation scale was developed by the researcher to assess each item's representativeness of rationality or empathy. Following the item rating method introduced by Fleischhauer, Strobel, Enge and Strobel (2013), participants were asked to rate how strongly they associated each item with rationality or empathy on a 7-point Likert scale with the attribute anchors 'strongly rationality' versus 'strongly empathy' (see Appendix I for the full attribute item selection scale). Items associated with rationality were scored 1 to 3: the lower the score, the more representative the item was of rationality. Conversely, items associated with empathy were scored 5 to 7: the higher the score, the more representative the item was of empathy. Items scoring close to the midpoint of 4 were ambiguous, in that participants found they could be associated with both empathy and rationality.
4.2.2.2.3. Results
Table 4.2 presents the mean categorisation ratings of the attribute items. Items with mean scores below 4 are listed on the left, in ascending order, as more associated with rationality, and items with mean scores above 4 are listed on the right, in descending order, as more associated with empathy. Items in the top rows had clear category membership, with ratings closest to the two extreme scores (1 and 7), whereas items in the bottom rows had ambiguous category membership, with ratings close to the midpoint (4).
Table 4.2 Mean ratings for attribute items associated with rationality or empathy
Rationality     Mean    Empathy               Mean
Rational        1.42    Empathetic            6.30
Logical         1.69    Emotion               6.18
Reasoning       1.79    Feeling               6.18
Organised       1.81    Affection             6.06
Deduction       2.00    Sensitive             5.97
Systematic      2.09    Caring                5.87
Coherence       2.51    Considerate           5.70
Consistent      2.52    Perspective-taking    5.27
Induction       2.75    Concern               5.25
Ordered         2.75    Appreciation          5.24
Analytical      2.85    Soul                  5.06
Calculative     2.85    Understanding         4.72
Standardised    3.71    Intuitive             4.42
Insight         3.72    Ethical               4.18
Sanity          3.90    Comprehensive         3.85
4.2.2.2.4. Discussion
Three rules guided the selection of appropriate stimulus items for empathy and rationality. First and foremost, the selected items should be representative of only the corresponding category (Nosek, Greenwald, et al., 2007). Based on the results of the present study, items with mean ratings from 2.75 to 5.25 were considered to have ambiguous double membership of rationality and empathy, and were therefore dropped from the final selection.
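The first rule can be illustrated with a minimal sketch that applies the 2.75 to 5.25 ambiguity band to the mean ratings in Table 4.2. The names `MEAN_RATINGS`, `classify` and `unambiguous_items` are hypothetical and introduced only for illustration, and treating both band limits as inclusive is an assumption based on Induction, Ordered (both 2.75) and Concern (5.25) not appearing in the final selection.

```python
# Mean ratings transcribed from Table 4.2
# (1 = strongly rationality, 7 = strongly empathy).
MEAN_RATINGS = {
    "Rational": 1.42, "Logical": 1.69, "Reasoning": 1.79, "Organised": 1.81,
    "Deduction": 2.00, "Systematic": 2.09, "Coherence": 2.51,
    "Consistent": 2.52, "Induction": 2.75, "Ordered": 2.75,
    "Analytical": 2.85, "Calculative": 2.85, "Standardised": 3.71,
    "Insight": 3.72, "Sanity": 3.90, "Comprehensive": 3.85,
    "Ethical": 4.18, "Intuitive": 4.42, "Understanding": 4.72,
    "Soul": 5.06, "Appreciation": 5.24, "Concern": 5.25,
    "Perspective-taking": 5.27, "Considerate": 5.70, "Caring": 5.87,
    "Sensitive": 5.97, "Affection": 6.06, "Emotion": 6.18,
    "Feeling": 6.18, "Empathetic": 6.30,
}

# Ambiguity band stated in the text; inclusive limits are an assumption.
AMBIGUOUS_LOW, AMBIGUOUS_HIGH = 2.75, 5.25


def classify(mean_rating):
    """Label a mean rating as 'rationality', 'empathy' or 'ambiguous'."""
    if AMBIGUOUS_LOW <= mean_rating <= AMBIGUOUS_HIGH:
        return "ambiguous"
    return "rationality" if mean_rating < AMBIGUOUS_LOW else "empathy"


def unambiguous_items(ratings):
    """Group the items that fall outside the ambiguity band by category."""
    selected = {"rationality": [], "empathy": []}
    for item, mean_rating in sorted(ratings.items(), key=lambda kv: kv[1]):
        label = classify(mean_rating)
        if label != "ambiguous":
            selected[label].append(item)
    return selected


selected = unambiguous_items(MEAN_RATINGS)
for category, items in selected.items():
    print(category, len(items), items)
```

With the ratings as transcribed here, eight items per category fall outside the band, and they coincide with the attribute items listed in Table 4.3.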
The second rule is to avoid items that can be categorised on the basis of irrelevant stimulus features. Specifically, the selected items should have similar word lengths and different initial letters. For example, if all rationality items were shorter than 5 letters and all empathy items were longer than 10 letters, participants could sort the items by word length rather than by word meaning. Similarly, if all rationality items began with the same initial "c", participants could sort these items by their shared initial rather than by the category membership of the words. Such confounds were taken into consideration and carefully avoided. For a good overview of the influence of valence variety and similar letter length on IAT effects, see Teige-Mocigemba, Klauer, & Sherman (2010).
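The two surface-feature checks described in the second rule can be sketched as follows, using the attribute items from Table 4.3. The helper names `separable_by_length` and `shared_initial` are hypothetical, and the checks are a simplified illustration of the rule rather than the procedure actually used in the study.

```python
from collections import Counter

# Attribute items transcribed from Table 4.3.
RATIONALITY = ["Consistent", "Coherence", "Deduction", "Logical",
               "Organised", "Rational", "Reasoning", "Systematic"]
EMPATHY = ["Affection", "Considerate", "Caring", "Emotion",
           "Empathetic", "Feeling", "Perspective-taking", "Sensitive"]


def separable_by_length(items_a, items_b):
    """True if every word in one list is shorter than every word in the other,
    so that word length alone would be enough to sort the items."""
    lengths_a = [len(word) for word in items_a]
    lengths_b = [len(word) for word in items_b]
    return max(lengths_a) < min(lengths_b) or max(lengths_b) < min(lengths_a)


def shared_initial(items):
    """True if all words in the list start with the same letter."""
    return len({word[0].lower() for word in items}) == 1


def initial_counts(items):
    """Count how often each initial letter occurs in the list."""
    return Counter(word[0].lower() for word in items)


print("Separable by length:", separable_by_length(RATIONALITY, EMPATHY))
print("Rationality initials:", dict(initial_counts(RATIONALITY)))
print("Empathy initials:", dict(initial_counts(EMPATHY)))
```

A stricter check could compare the full length distributions of the two categories; the all-or-nothing test here only flags the extreme case given as an example in the text.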
The third rule is to ensure that the category membership of the selected attribute items is clear and will not be confounded with the target categories (Nosek, Greenwald, et al., 2007). For example, using "subject-related" rationality and empathy items such as "calculative" and "appreciation" could create confusion about whether to categorise the items by their membership of an academic discipline or by the attribute evaluation. Therefore, items with confounding membership of science or liberal arts were also dropped.
Furthermore, regarding the number of items in each category, evidence has shown that varying the number of items in both the target and attribute categories has no significant influence on IAT effect magnitude (Greenwald et al., 1998; Nosek et al., 2005). Greenwald et al. (1998) reported no difference in IAT effect magnitude between stimulus sets of 25 items and of 5 items. As long as each category contained more than 2 stimulus items, the overall magnitude of implicit biases was consistent, and a smaller number of items neither impaired the reliability of the task nor increased the influence of potential confounding factors. These findings suggest that the magnitude and reliability of IAT effects are relatively unaffected by the number of stimulus items per category, except that effects were somewhat weaker when only a single stimulus item per category was used (Nosek et al., 2005). Existing IATs commonly use between 5 and 8 stimulus items per category. We therefore chose 8 items per category that met the three rules described above. Table 4.3 presents the final selected items for the preliminary SE-IAT (1st version).
Table 4.3 Selected items for the preliminary SE-IAT (1st version)
Science               Liberal arts      Rationality    Empathy
Scientist             Artist            Consistent     Affection
Chemist               Linguist          Coherence      Considerate
Physicist             Philosopher       Deduction      Caring
Mathematician         Historian         Logical        Emotion
Engineer              Educator          Organised      Empathetic
Computer scientist    Anthropologist    Rational       Feeling
Astronomer            Sociologist       Reasoning      Perspective-taking
Biologist             Musician          Systematic     Sensitive
Note: Science and Liberal arts are target categories, and Rationality and Empathy are attribute categories.