
• Please make sure that you, as an annotator, strictly follow this protocol when coding the data.

• Your cooperation is important for evaluating the protocol as part of our effort to improve annotation quality and coder agreement.

• Step 1: Read the attached introduction carefully

– At this step, the annotator does not need to consult the coding manual.

– Instead, focus on understanding the author’s hypothesis statement and the author’s intent in making arguments that support and/or oppose that hypothesis.

• Step 2: Start coding sentences of the introduction using the coding file

– Identify the hypothesis. If no hypothesis is identified, there is no need to identify support/opposition sentences.

– Identify the findings. Locating the core sentence (i.e., the sentence with a citation expression) and its content-related satellite sentences is more important than locating the transition sentences.

– Identify the support/opposition sentences. Most of the time, support/opposition sentences are satellite sentences. However, there are cases where orphan sentences (non-satellite, non-core) play a supporting or opposing role.

– If the study supports or opposes the hypothesis, choose the sentence(s) that best state the idea: (1) differentiate idea-statement sentences from explanation/elaboration sentences; (2) sentences that explain or elaborate the key idea should not be coded as support or opposition sentences.
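To make the Step 2 rules concrete, they can be encoded as consistency checks over one coded introduction. This is a minimal sketch under stated assumptions: the label names (`hypothesis`, `finding_core`, `elaboration`, and so on) and the `CodedSentence` and `check_coding` helpers are hypothetical and do not reflect the actual layout of the coding file.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical label names; the real coding file may use a different scheme.
LABELS = {"hypothesis", "finding_core", "finding_satellite", "transition",
          "support", "opposition", "elaboration", "other"}
STANCE = {"support", "opposition"}


@dataclass
class CodedSentence:
    text: str
    labels: Set[str] = field(default_factory=set)


def check_coding(sentences: List[CodedSentence]) -> List[str]:
    """Return protocol violations for one coded introduction (empty if none)."""
    problems = []
    has_hypothesis = any("hypothesis" in s.labels for s in sentences)
    for i, s in enumerate(sentences):
        unknown = s.labels - LABELS
        if unknown:
            problems.append(f"sentence {i}: unknown labels {sorted(unknown)}")
        stance = s.labels & STANCE
        # Rule: no hypothesis identified -> no support/opposition sentences.
        if stance and not has_hypothesis:
            problems.append(f"sentence {i}: stance coded without a hypothesis")
        # Rule: sentences that only explain/elaborate the key idea
        # are not coded as support or opposition.
        if stance and "elaboration" in s.labels:
            problems.append(f"sentence {i}: elaboration coded as {sorted(stance)}")
    return problems


# Usage: one consistent coding and two codings with one violation each.
ok = [CodedSentence("We predicted that ...", {"hypothesis"}),
      CodedSentence("Harada (1980) found ...", {"finding_core", "opposition"})]
no_hyp = [CodedSentence("Harada (1980) found ...", {"finding_core", "opposition"})]
elab = ok + [CodedSentence("That is, group size ...",
                           {"finding_satellite", "elaboration", "support"})]
print(len(check_coding(ok)))      # 0
print(len(check_coding(no_hyp)))  # 1
print(len(check_coding(elab)))    # 1
```

A stance label is allowed to co-occur with a satellite label, since the protocol notes that most support/opposition sentences are satellites.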

• Highlight guidelines

– A hypothesis sentence is not a question.

– Mark only the hypothesis content.

Example: “Our hypothesis as a class was that time of day and gender will not make a difference in the responses of strangers and our alternative hypothesis Is that time of day and gender will alter the responses of participants”

– Mark all possible study mentions (i.e., citations), regardless of the citation style they follow.

Example: “Another supporting study was conducted Rutkowski in 1983 that also demonstrated that with larger groups comes less help for victims in non-emergency situations due to less social pressure Rutkowski, 1983.”

Example: “One strong study that opposes the bystander effect was done in 1980 by Junji Harada that showed that increase in group size, even in a face to face proximity, did not decrease the likelihood of being helped (Harada, 1980).”
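Because study mentions must be marked whatever citation style they follow, a rough pattern can pre-highlight candidate author-year mentions for an annotator to confirm. This is a hypothetical heuristic, not part of the protocol: it misses mentions with no adjacent year (such as "by Junji Harada" in the example above), may flag false positives such as a capitalized word followed by "in" and a year, and leaves the final decision to the annotator.

```python
import re

# Rough heuristic for author-year study mentions such as "(Harada, 1980)",
# "Rutkowski, 1983", "Rutkowski in 1983", "Smith (2012)" or "Cho et al., 2006".
AUTHOR_YEAR = re.compile(
    r"\b([A-Z][A-Za-z'-]+)"      # capitalized surname, e.g. "Harada"
    r"(?:\s+et al\.)?"           # optional "et al."
    r"(?:,\s*|\s+in\s+|\s*\()"   # separator: ", ", " in " or " ("
    r"((?:19|20)\d{2})\b"        # four-digit year, 1900-2099
)


def find_study_mentions(sentence: str) -> list[tuple[str, str]]:
    """Return (surname, year) pairs for candidate study mentions."""
    return [(m.group(1), m.group(2)) for m in AUTHOR_YEAR.finditer(sentence)]


example = ("One strong study that opposes the bystander effect was done in 1980 "
           "by Junji Harada that showed that increase in group size, even in a face "
           "to face proximity, did not decrease the likelihood of being helped "
           "(Harada, 1980).")
print(find_study_mentions(example))  # [('Harada', '1980')]
```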

APPENDIX C

SAMPLE OUTPUT OF SEGMENTATION ALGORITHM

Gender discrimination is prevalent in varying degrees of severity worldwide. Some countries have a reported lack of gender discrimination but it is difficult for every individual society to remove all gender bias. Some cultures are inherently gender-biased through the use of a gendered language. A study published in October of 2011, researchers found that countries with gendered language exhibit less gender equality than those with gender neutral language. (Prewitt-Freilino, Caswell, & Laasko (2012).) Gendered language can take the form of masculine and feminine verbs in romance languages but in English, a naturally gendered language (Prewitt-Freilino, Caswell, & Laasko (2012).), certain words are given a gender through their continued use in a gender discriminatory way. Gendered language affects how people perceive themselves and how they present themselves to others through the use of language, biased or neutral.

Gendered language affects self-perception beginning at a very young age and carries through to adult life in many people. Gender-biases are highly prevalent in adult society when it comes to self-perception, whether it is division of labor in the home or success and compensation in the workplace. A study published in the European Journal of Social Psychology comparing the femininity and masculinity of someone’s actual and ideal selves, found that, regarding professional life, people, both male and female, described their ideal-self being more masculine than their true-self. In the same study, researchers found that in personal relationships people tended to value neutral, or feminine qualities over masculine ones. (DeMarree (2014).)

The traits people assign themselves, whether ideal or true, define who they are and how they describe themselves. Beginning at a young age, each person gathers information about his or her-self based on his or her perceived worth as a person and as a member of a specific group of people or society. Self-esteem is not static and can change on a daily basis. Even something as simple as a person’s mood can change how the perceive themself. Although people with both high and low self-esteem rate themselves positively when in a good mood, it only takes a bad mood for someone with low self-esteem to look at themselves negatively. (Brown, & Mankowski (1993).) People with a higher self-esteem are more influenced by extreme or high intensity words. (Bowers (1963).) The dynamic shifts in self-esteem make understanding it and learning to manipulate it so important to allow society to grow in a more positive, self-confident direction.

In the study we conducted in our research methods in psychology class, we wanted to see if people chose words to describe themselves based on the gender identities assigned to them by their biological sex. We predicted that Participants would more strongly endorse gender-biased words to fit the gender-stereotypes society expects them to fit. The second thing we tested was if a participant had high self-esteem, would they more strongly endorse formal words to describe themselves rather than informal counterparts.

APPENDIX D

PREDICTING PEER RATING IN ACADEMIC ESSAYS

D.1 PEER RATING DATA

In Chapters 9 and 10, we showed that argument mining output helps improve persuasive essay score prediction. In this study, we explore an application of argument mining to academic essay scoring. We use the academic essay corpus from our argument mining research (Chapter 3).

The corpus consists of 115 introductions of observational studies written by college students. The essays were submitted to the SWoRD peer review system (Cho and Schunn, 2007) and reviewed by students in the same classes.¹ Student reviewers were asked to provide textual comments and numerical ratings for the papers they reviewed. The rating rubric is listed in Figure 17. Of the 115 essays, 113 were reviewed and graded by student reviewers. Each of these essays was graded by at least 3 and at most 5 students on a 1–7 scale. The final score of each essay is a weighted average of its peer ratings, in which the weights indicate rating reliability as computed by SWoRD. Although we do not have instructor grades for the essays, research on peer assessment has shown that peer grades can be as reliable as an instructor’s grade when each paper is rated by multiple peers (Cho et al., 2006). Thus, the current study uses the weighted average rating of student reviewers as an estimate of essay quality. As shown in Figure 18, most essays received a high score (> 4), and no essay was graded below 2.
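The final score described above is a reliability-weighted mean, score = Σᵢ wᵢrᵢ / Σᵢ wᵢ, where rᵢ is the i-th peer rating of an essay and wᵢ is its weight. A minimal sketch follows, assuming the weights are already available: how SWoRD derives its reliability values is not described here, so the `weighted_peer_score` function and the example weights are hypothetical. The range checks mirror the corpus description (3 to 5 ratings per essay, 1–7 scale).

```python
def weighted_peer_score(ratings: list[float], weights: list[float]) -> float:
    """Reliability-weighted mean of one essay's peer ratings.

    ratings: 3 to 5 peer ratings on the 1-7 scale, as in this corpus.
    weights: one non-negative reliability weight per rating; these are
             hypothetical stand-ins for the reliability values SWoRD computes.
    """
    if len(ratings) != len(weights):
        raise ValueError("need exactly one weight per rating")
    if not 3 <= len(ratings) <= 5:
        raise ValueError("expected 3 to 5 peer ratings per essay")
    if any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("ratings must lie on the 1-7 scale")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    return sum(r * w for r, w in zip(ratings, weights)) / sum(weights)


# Equal weights reduce to the plain mean; unequal weights favor the more
# reliable reviewers (the weights here are made up for illustration).
print(weighted_peer_score([5, 6, 7], [1, 1, 1]))  # 6.0
print(weighted_peer_score([6, 5, 7], [1.0, 0.5, 1.5]))
```

Normalizing by Σᵢ wᵢ keeps the combined score on the same 1–7 scale as the individual ratings, whatever the absolute size of the weights.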

Consider the following points when giving your rating:

• Central topic introduced and background information provided?

• Brief high-level overview of study design and clear statement of hypotheses?

• Appropriate integration of conflicting research findings into a convincing argument for at least one hypothesis?

Figure 17: Peer rating rubric.

Figure 18: Peer rating histogram.