City University of New York (CUNY)

CUNY Academic Works

Dissertations, Theses, and Capstone Projects

CUNY Graduate Center

9-2017

A Sentiment Analysis of Language & Gender Using Word Embedding Models

Ellyn Rolleston Keith

The Graduate Center, City University of New York


More information about this work at: https://academicworks.cuny.edu/gc_etds/2394

Discover additional works at: https://academicworks.cuny.edu

This work is made publicly available by the City University of New York (CUNY).

Contact: AcademicWorks@cuny.edu


A SENTIMENT ANALYSIS OF LANGUAGE & GENDER USING WORD EMBEDDING MODELS

by


ELLYN ROLLESTON KEITH

A master’s thesis submitted to the Graduate Faculty in Linguistics in partial fulfillment of the requirements for the degree of Master of Arts, The City University of New York

2017


© 2017

ELLYN ROLLESTON KEITH

All Rights Reserved


This manuscript has been read and accepted by the Graduate Faculty in Linguistics in satisfaction of the thesis requirement for the degree of Master of Arts.

Professor Martin Chodorow

____________________ ______________________________

Date Thesis Advisor

Professor Gita Martohardjono

____________________ ______________________________

Date Executive Officer


ABSTRACT


A Sentiment Analysis of Language & Gender Using Word Embedding Models

by


Ellyn Rolleston Keith

Advisor: Martin Chodorow

Since Robin Lakoff started the conversation around language and gender with her 1975 essay “Language and Woman’s Place,” extensive work has been done analyzing the sociolinguistics of gender. While much has been written on differences in how men and women use language, there is less research on language about women as opposed to language about men. In this work, I build a word embedding model from a corpus of Wikipedia film summaries and use this model to create lists of words associated with men and words associated with women. I then use sentiment analysis tools to assess the emotional valence of these words and of sentences containing them. I find that, when comparing words and sentences associated with men and women, language about women tends to be consistently more positively valenced, while words associated with men cover a wider breadth of valences, skewing slightly negative.


Acknowledgements

None of this would have happened without the continuous support and encouragement of the following people: my advisor, Dr. Martin Chodorow, for his endless patience and insight; Dr. Rivka Levitan, for always having thoughtful answers and new ideas; Rachel Rakov, Maxwell Schwartz, and Pablo Gonzalez, for always believing I knew more than I thought I did; my parents, Rob & Margie Rolleston, for ceaselessly, unwaveringly believing in me; and lastly, Adam Keith, for flat-out refusing to give up on me, no matter how much I whined. Thank you.


Contents

1 Introduction
2 Related Work
 2.1 Qualitative Research
 2.2 Quantitative Research
3 Data
4 Methodology
 4.1 Word Lists with WEM
 4.2 Word Weighting
 4.3 Valence Scores
5 Evaluation
 5.1 Weighted Valence Averages
 5.2 Valence Score Distributions
6 Conclusion & Future Work

Appendix A Complete List of W-Tagged Words
Appendix B Complete List of M-Tagged Words
Appendix C Complete List of Shared Words

References

List of Tables

1 Seed words
2 Top words associated with men, women, and shared
3 Maximum and minimum similarities and frequencies
4 Example of VADER compound scoring
5 Example of sentence/valence mismatch
6 Strongest positive and negative valence scores
7 Averages of word valence
8 Averages of sentence valence

1 Introduction


With her 1975 essay “Language and Woman’s Place,” Robin Lakoff introduced the study of sociolinguistics and gender. Since then, many researchers have continued to examine how men and women use language differently, and this line of work has carried over into computational linguistics.

Less common is work on how language about men and about women differs. Qualitatively, we can see that this varies significantly; one example is how reporters write about their interview subjects, seemingly depending on whether the subject is male or female (Freeman 2017). However, little work has taken a quantitative approach to this question. In this study, I use a word embedding model to explore gender associations of words, focusing on emotional valence.

It is important to acknowledge that gender exists on a spectrum, and reducing it to simple binary labels is insufficient. With that being said, it cannot be denied that perceptions of these binary labels exist ubiquitously. It is my hope that developing tools to recognize differences in how men and women are talked about may help to lessen those differences, thereby creating space for movement along the gender spectrum and improving balance and neutrality in writing.

2 Related Work

2.1 Qualitative Research

Lakoff’s essay is divided into two main sections. In the first section, Lakoff discusses features she describes as indicative of “women’s language.” These features, including hedges, tag questions, and closer adherence to standard language forms, were recast as “powerless language” by O’Barr & Atkins (1980), who found substantial instances of men also using these features in a courtroom setting. However, a meta-analysis by Leaper and Robnett (2011), who referred to the register as “tentative language,” found that women do tend to use these speech features more frequently than men. Regardless of the actual numbers, stereotypes linking women to this register persist, as in the case of a Google Chrome app designed to flag tentative language in emails. While Google did not market the app specifically to women, news outlets immediately framed it that way (Cauterucci 2015, Minter 2016).

The second half of Lakoff’s essay discusses language about women. Lakoff points to word pairs that should be analogous, but are not. As a template, we consider the classic analogy “man is to woman as king is to queen”:

1. Man : woman :: king : queen

Semantic minutiae aside, we can consider these analogies broadly equivalent: the sentences “She is Queen of France” and “He is King of France” carry the same general semantic meaning.

Lakoff gives examples of where these analogies fall short:

(a) He is a master of his craft.

(b) ?? She is a mistress of her craft.


(c) He’s a lifelong bachelor.

(d) ?? She’s a lifelong bachelorette.


(e) Trudy is Pete’s widow.

(f) ?? Pete is Trudy’s widower.

These examples suggest that man:master :: woman:mistress is a false equivalency. Further work has suggested that not only are these analogies non-equivalent, but that these non-equivalencies can in fact reinforce standing beliefs or assumptions. A study by Jacques-Tiura et al. (2015) found evidence that men’s discussions with each other about women may encourage or discourage inclinations toward sexual violence. Other qualitative analyses have also discussed how language about women can strongly contribute to actions (Welling 2014, Weatherall 2015).

2.2 Quantitative Research

In the intersection of gender and computational linguistics, work on gender and language use has included gender classification (Bamman et al. 2012, Rao et al. 2010) as well as analyses of gender in film dialogues (Ramakrishna et al. 2015, Agarwal et al. 2015). Both styles of research still focus on language produced by men and women, whether in online content, as in the case of gender classification, or in scripted content (where it is worth remembering that the lines spoken by female characters may be penned by male writers). To take a computational approach to language about men and women, however, other methods are necessary.

Initially, I looked for a word list hand-tagged for association with gender. Aside from words specifically indicating man and woman, such as boy, girl, his, and hers, I looked for words that human annotators had associated (for better or for worse) with their concept of “man” or “woman,” theorizing that words associated with home life would be more aligned with women, and words associated with work and livelihood would be more aligned with men.

When I was unable to find any data hand-tagged for associations with gender, I turned to word embeddings. Word embedding models have proven useful in computational work on word and phrase meaning, particularly as exemplified by their ability to solve analogies. The commonly available word2vec embedding (Mikolov et al. 2013), trained on a Google News corpus and covering a vocabulary of 3 million English words and phrases, readily returns x=queen when queried man:king :: woman:x.
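For illustration, the same analogy query can be reproduced in Python with the gensim library. This is not the thesis’s own code, and the local path to the pretrained vectors is an assumption:

```python
# A minimal sketch of the analogy query against the pretrained
# Google News vectors (filename is an assumption; the file is
# distributed as GoogleNews-vectors-negative300.bin.gz).
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True)

# man : king :: woman : x is solved by the vector arithmetic
# vec(king) - vec(man) + vec(woman), then a nearest-neighbor search.
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# expected: [('queen', <similarity>)]
```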

There is evidence, however, of a distinction between language about men and language about women in these models. Bolukbasi et al. (2016) show that when the same Google News word2vec model is queried man:computer programmer :: woman:x, it returns x=homemaker. A model is only as good as the data on which it is trained, and while we might expect, or hope for, little gender bias in this corpus, that does not seem to be the case. To see whether this mismatched analogy was specific to the Google News word2vec model, Bolukbasi et al. also examined an embedding model trained on a web-crawl corpus with GloVe, an embedding algorithm by Pennington et al. (2014). They found the results of word2vec and GloVe to be highly consistent, suggesting that these gender-biased findings are not an artifact of algorithm design or data selection, but a true reflection of gender biases that exist in language use.

3 Data


For this analysis, I used a dataset collected by Bamman et al. (2013). This dataset contains 42,306 film summaries extracted from Wikipedia. The summaries are annotated with part-of-speech tags, parses, named entity recognition, and coreference resolution. The corpus also contains metadata for the films, including release date, genre, and actor names and genders.

It is worth noting that while the films range from the early twentieth century to recent years, we can assume that the summaries themselves are relatively new. Also, Wikipedia’s contributor demographics are not balanced: between 87% and 90% of “Wikipedians” are male (Glott et al. 2011, Pande 2011). In a broad sense, therefore, we are looking at how men describe male and female characters.


I hypothesize that we will see a difference in language about men and women in terms of emotional valence. Based on the stereotype that “women are emotional” (Fischer 2000), I expect documents about women to be more strongly positive or negative, while documents about men will be closer to neutral.

4 Methodology


To establish a rubric by which to evaluate the sentences in these summaries, I needed words tagged for gender association. I created a word embedding model to find words semantically close to a list of seed words associated with men and another list of seed words associated with women. I evaluate these words, and sentences containing these words, based on weighted valence averages for positive and negative sentiment across the categories. I also examine the parts of speech of the words and look at the distribution of the valence scores.

4.1 Word Lists with WEM


Using the wordVectors library in R (Schmidt 2016), I trained a word embedding model on the film summary corpus, yielding a word2vec file of 92,139 rows and 500 columns. From this model, the top 500 words associated with men and with women were extracted. These word lists were created by finding the cosine similarity of word vectors to the centroid of the vectors of a given list of nouns and pronouns. These seed words are shown in Table 1. After removing stopwords and overlapping vocabulary, 103 words associated with men and 86 words associated with women remained, along with 34 words shared between the lists. These words are referred to as M-words, W-words, and shared words. The words were tagged for part of speech by finding each word and its part of speech in the Brown corpus in Python’s NLTK (Bird et al. 2009). Some words received multiple POS tags, but as that did not affect the cosine similarity or frequency count, this was not an issue.
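The thesis performs this step in R with wordVectors; the following is a rough Python sketch of the same centroid-and-cosine idea, where the `vectors` mapping and the abbreviated seed lists are assumptions (the actual seeds appear in Table 1):

```python
# Minimal sketch of the seed-centroid approach, assuming `vectors` maps
# each vocabulary word to its 500-dimensional numpy array from the
# trained model.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def words_near_seeds(vectors, seeds, n=500):
    # Centroid of the seed word vectors.
    centroid = np.mean([vectors[w] for w in seeds if w in vectors], axis=0)
    # Rank every word in the vocabulary by cosine similarity to the centroid.
    ranked = sorted(vectors, key=lambda w: -cosine(vectors[w], centroid))
    return ranked[:n]

w_seeds = ["she", "her", "woman", "girl"]  # illustrative only
m_seeds = ["he", "his", "man", "boy"]      # illustrative only
```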

Table 2 shows the top ten words in each list, sorted by word count and by cosine similarity. A brief qualitative analysis of these word lists shows that W-words seem to be more centered around home life, family, and appearance. All of the nouns refer to the subject in relation to other people (“daughter”, “sister”, “mother”), while the words in the M-words category lean more toward professions (“farmer”, “policeman”, “businessman”), with only two nouns referring to family.

Among the shared words, it stands to reason that general words such as “man,” “boy,” “woman,” and “girl” were found on both lists. These lists do suggest that language about women more frequently refers to a woman in reference to a man: “husband” appears on the list of W-words, but “wife” is not among the M-words. Similarly, “he” is on the list of shared words, but “her” appears only in the list of W-words. This qualitative look at the data suggests that there may be greater variation in language associated with men, and that language about women may disproportionately describe them in terms of others.

Verbs were left non-lemmatized, as it was hypothesized that verb forms may carry useful information: in the list of top M-words, we see only one verb, while three different forms of the verb “to marry” are among the ten most frequent W-words. Figure 1 shows the relative frequency distribution of parts of speech in the lists of M-words and W-words. There is little difference between the adjective and noun frequency distributions, but there are around twice as many verbs in the list of W-words.
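A minimal sketch of the Brown-corpus POS lookup used for the tags above, assuming NLTK and its Brown corpus data are installed:

```python
# Build a map from each word to every POS tag it carries in the Brown
# corpus (requires nltk.download('brown') once).
from collections import defaultdict
from nltk.corpus import brown

pos_tags = defaultdict(set)
for word, tag in brown.tagged_words():
    pos_tags[word.lower()].add(tag)   # a word may carry several tags

print(sorted(pos_tags["mother"]))  # e.g. ['NN', ...]
```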

4.2 Word Weighting

Once the word lists were built, I needed to account for both a word’s frequency in the corpus and the cosine similarity between the word and its seed vector. Table 3 shows the top word and bottom word in each category, sorted by these two measures. If one were to assess these word lists based on cosine similarity alone, “catechism” would come out slightly more similar to words such as “woman” and “girl” than the word “love”: when the words are sorted by cosine similarity, “catechism” is 18th on the list, while “love” is 30th. If the lists were assessed based only on word frequency, with no regard to cosine similarity, “catechism” would rank much lower than “love.” To balance these factors, each word was assigned a weight equal to the product of its cosine similarity and the common log of its word count.
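As a worked check of this weighting scheme, using the rows for “catechism” and “love” from Appendix A:

```python
# weight = cosine similarity * log10(word frequency); the input numbers
# are taken from Appendix A.
import math

def weight(cos_sim, freq):
    return cos_sim * math.log10(freq)

print(weight(0.2630, 6))      # ≈ 0.205 ("catechism"; Appendix A: 0.2046)
print(weight(0.2302, 16757))  # ≈ 0.972 ("love"; Appendix A: 0.9724)
```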

4.3 Valence Scores


In order to see if words associated with men and women have distinct emotional valence or intensity, I used VADER, a sentiment analysis toolkit for Python from Hutto and Gilbert (2014). VADER was designed specifically for sentiments expressed in social media, and it has a lexicon of over 7,500 words and emoticons. Each entry in the lexicon is annotated for sentiment valence. VADER includes a tool that assigns a compound sentiment intensity score between -1 and 1 to a given string of any length: a word, a sentence, or more. Negative scores connote a negative association, while positive scores connote a positive association. The further the score is from zero, the greater the emotional intensity. As an example, VADER assigns the word “hate” a compound score of -0.6, while the word “fine” receives a compound score of 0.2.
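A minimal sketch of obtaining compound scores, assuming the vaderSentiment package is installed (pip install vaderSentiment); the example strings are illustrative:

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
for text in ["hate", "fine", "He is an aimless man."]:
    scores = analyzer.polarity_scores(text)
    # 'compound' is the normalized score in [-1, 1] used throughout this
    # study; 'neg', 'neu', and 'pos' are proportion scores.
    print(text, "->", scores["compound"])
```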

The majority of word valence scores are neutral, which is a common challenge for sentiment analysis. Particularly at the word level, this is unsurprising, given the scarcity of context. Additionally, the VADER documentation confirms that a string is assigned an initial valence of 0, which remains unchanged if the string contains no words from the built-in dictionary. Table 4 shows an example with the word “aimless,” which appears in the list of M-words. Intuitively, we would expect this word to receive a slightly negative score, but both by itself and in context, it receives a score of zero. When the word “man” is changed to “fool,” VADER returns a negative compound score. The same score is returned for the word “fool” by itself, suggesting that it is the only word in the sentence contributing to the score.

I chose to calculate negative, positive, and neutral valences at the word level before moving on to the sentence level, in order to see if there is a difference between looking at words in isolation and within the context of the sentence.


5 Evaluation

5.1 Weighted Valence Averages

Weighting the valence scores by cosine similarity and word frequency did not yield any difference in averages at the word level, as shown in Figure 2. None of these average valences strays far from 0.5 or -0.5, suggesting that the averages are not imparting any new information. Additionally, 80% of the 224 tagged words were assigned valence scores of zero. Theorizing that this may be due to a lack of context, the next step was to compute these same averages at the sentence level.

Of the corpus’s nearly 700,000 sentences, 200,000 contained words from the tagged list. Of these sentences, only 28% received valence scores of zero. This seemed promising, suggesting that the additional context did help the sentiment analysis tool.
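The thesis does not spell out the filtering step; a sketch of selecting sentences that contain tagged words might look like the following, where the word list and sentences are illustrative:

```python
# Select sentences containing at least one of the 224 tagged words.
tagged_words = {"mother", "farmer", "beauty", "liar"}  # small sample only

def contains_tagged(sentence, vocab):
    tokens = {tok.strip(".,!?;:\"'").lower() for tok in sentence.split()}
    return bool(tokens & vocab)

sentences = ["The farmer meets his mother.", "The plot thickens."]
print([s for s in sentences if contains_tagged(s, tagged_words)])
```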

At the sentence level, however, two issues appeared. NLTK’s sentence tokenizer, which was used to isolate sentences from each film summary, does not readily accommodate the imperfections of Wikipedia text, and the data is not perfectly clean: some summaries contain URLs or HTML tags, and many have run-on sentences and sentence fragments.
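A small sketch of the sentence isolation step with NLTK, assuming the Punkt models are installed; the invented summary shows how markup survives tokenization:

```python
# Requires nltk.download('punkt') once.
from nltk.tokenize import sent_tokenize

summary = "The farmer meets a dancer. They marry. <br>See http://example.com."
for sent in sent_tokenize(summary):
    print(sent)   # HTML remnants and URLs pass through untouched
```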

The other problem had to do with the polarity scores VADER assigned to some sentences. Table 5 shows two examples of sentences and their assigned scores. Intuitively, these scores seem inaccurate. The second example may be debatable, but rating the first sentence as positive seems inarguably wrong.

With those caveats aside, it was hoped that having more non-zero valence scores would reveal different average scores among the categories. The weighted and unweighted means for positive and negative valence scores were again calculated for each category, this time at the sentence level. Contrary to my hypothesis, the averages only moved closer to 0.5 or -0.5, as shown in Figure 3. There are a variety of possible reasons for this. The combination of messy data and the choice of sentiment analysis tool may have introduced noise, giving us averages that are not informative. It may also be that this is not the best method by which to answer the question of language about men versus language about women. According to this metric, there is no difference in emotional intensity between W-tagged and M-tagged sentences.

5.2 Valence Score Distributions

Having found that taking means of scores is not informative, I next looked at the distribution of scores. When scores of zero are removed, 21 W-words and 16 M-words remain, representing 24.4% of the W-tagged words and 15.4% of the M-tagged words. Figure 4 shows the relative frequency histograms of M- and W-tagged words. There are no W-tagged words in the bin representing the most negatively valenced words, while the greatest number of W-words fall in the rightmost bin, representing the most positively valenced words. There are consistently more M-tagged words to the left of zero, and more W-tagged words in two of the three bins to the right.
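A sketch of how such relative-frequency histograms can be computed, with invented nonzero scores standing in for the real VADER output:

```python
import numpy as np

m_scores = np.array([-0.68, -0.53, -0.51, -0.42, 0.42, 0.53])  # illustrative
w_scores = np.array([-0.48, 0.42, 0.49, 0.57, 0.59, 0.64])     # illustrative

bins = np.linspace(-1.0, 1.0, 11)   # ten width-0.2 bins spanning [-1, 1]
m_rel = np.histogram(m_scores, bins=bins)[0] / len(m_scores)
w_rel = np.histogram(w_scores, bins=bins)[0] / len(w_scores)
print(m_rel)
print(w_rel)
```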

A cursory look at these words makes the distinction clear. Table 6 shows the three strongest negative and positive valences for M- and W-words. The most strongly negatively valenced W-word, crying, is still less strongly valenced than the third most negatively valenced M-word, liar. Similarly, the M-word with the strongest positive valence, richer, has a weaker valence than the third strongest W-word, beauty. No single word returned a valence score between the bins of -0.2 and 0.2, suggesting that if a single word is valenced at all, it carries a strong valence, whether positive or negative.

When averages are taken, however, the mean and median M-scores smooth out to zero and near-zero, while W-scores are all positive, as shown in Table 7. Given that all scores of zero have been removed, this suggests that M-tagged scores are distributed more evenly around zero than W-tagged scores, which skew consistently positive.

This imbalance may be a consequence of benevolent sexism, a term coined by Glick and Fiske (1996), who define it as “a set of interrelated attitudes toward women that are sexist in terms of viewing women stereotypically and in restricted roles but that are subjectively positive” (491). This could indicate that female characters are described and talked about in more limited terms than their male counterparts, and that there is more range to the variety of male characters in these summaries. Anecdotally, female characters in films are often defined by their relationships to other characters: wives, mothers, girlfriends. While the language about women may seem more positive, it may in fact reflect stereotyped and limited descriptions or depictions of female characters.

At the sentence level, we see a similar pattern. Figure 5 shows the frequency histograms of M- and W-tagged sentences. In every bin from -0.9 to 0.3, we consistently see more M-tagged sentences than W-tagged sentences; only when valence scores reach 0.5 do we see greater relative frequencies of W-tagged sentences than M-tagged ones. Not only is this consistent with the benevolent sexism theory, it may also suggest that language about men in this data set is more varied in general, spanning a greater gamut of emotion. It stands to reason that, in a distribution of hundreds of thousands of scores, we see more scores closer to zero. This distribution lowers the mean of W-tagged sentences relative to W-tagged words, as shown in Table 8. Interestingly, however, the mean, median, and mode of M-tagged sentences are all more negative than the averages of individual M-tagged words. One reason for this may be very data-specific: I hypothesize that there are significantly more male villains than female villains in film. While these villains may have female partners, it is rare to see a female villain as the sole antagonist. Therefore, we will rarely find strongly negatively valenced W-sentences.

6 Conclusion & Future Work

This study has aimed to develop a method with which to analyze language about men and women. Some results seem promising, showing that there does seem to be a consistent difference in how men and women are discussed. One missing piece at this time concerns coreference resolution and dependency parses: it is not only possible but likely that many of the M-tagged words appear in sentences in reference to a female character, and vice versa. It is also likely that some sentences refer to both men and women. Future work will take these dependencies into account and re-evaluate sentences with this added information.

The specificity of this corpus is worth bearing in mind. These user-contributed Wikipedia summaries are, as previously noted, imbalanced in terms of contributor gender, and not cleanly copyedited. Using these same evaluation techniques on other corpora will help determine the validity of this method. Because these are film summaries, it is possible that what has been found here is not sexism in the descriptions, but sexism in film: if the majority of female characters are written within a more limited scope than male characters, it follows that descriptions of these characters would stay within that scope.


This question could be pursued by looking at subsets of the data set. The corpus includes metadata that was not used in this study, such as syntax parses and film release dates. By using dependency parses to break a summary down into sentences referring to men and sentences referring to women, the data could be binned by film release date or actor gender proportion. If a diachronic analysis showed a difference in language about men and women relative to when a film was released, this could reveal how depictions of male and female characters have changed, or remained unchanged, over time. Similarly, comparing films with many female characters to films with few may also help uncover patterns in character representation and description.

This study set out to determine analytically whether there are differences in how men and women are talked about. The men and women in film summaries are characters being talked about, not people. Applying this methodology to other corpora, such as interviews, magazine profiles, and news reports, would help to assess the validity of this technique.

In this study, I have proposed a method that can serve as a starting point for further sociolinguistic research on language and gender. It is my hope that these methods will be refined and will evolve as the work continues.


Appendix A: W-Tagged Words

tag word pos cosine_similarity valence frequency log_freq weight

W Her PP$ 0.3909 0.0000 3106 3.4922 1.3652

W beautiful JJ 0.3613 0.5994 2378 3.3762 1.2198

W pretty JJ 0.3482 0.4939 459 2.6618 0.9269

W young NN 0.3360 0.0000 12150 4.0846 1.3723

W husband NN 0.3265 0.0000 6751 3.8294 1.2503

W someone PN 0.3058 0.0000 2111 3.3245 1.0165

W mother NN 0.2996 0.0000 14484 4.1609 1.2468

W daughter NN 0.2955 0.0000 9871 3.9944 1.1802

W couple NN 0.2896 0.0000 4143 3.6173 1.0474

W pregnant JJ 0.2875 0.0000 2151 3.3326 0.9582

W blonde JJ 0.2859 0.0000 293 2.4669 0.7053

W beauty NN 0.2785 0.5859 674 2.8287 0.7878

W beggar NN 0.2767 0.0000 160 2.2041 0.6099

W neighbor NN 0.2744 0.0000 904 2.9562 0.8111

W dress NN 0.2698 0.0000 816 2.9117 0.7855

W kind JJ 0.2643 0.5267 934 2.9703 0.7851

W catechism NN 0.2630 0.0000 6 0.7782 0.2046

W Marie NP 0.2623 0.0000 909 2.9586 0.7759

W actress NN 0.2606 0.0000 919 2.9633 0.7722

W suitor NN 0.2593 0.0000 197 2.2945 0.5950

W classmate NN 0.2566 0.0000 344 2.5366 0.6509

W lover NN 0.2556 0.5859 1829 3.2622 0.8339

W dancer NN 0.2539 0.0000 717 2.8555 0.7250

W commoner JJR 0.2504 0.0000 40 1.6021 0.4011

W sister NN 0.2483 0.0000 5540 3.7435 0.9296

W married VBN 0.2479 0.0000 5452 3.7366 0.9262

W age VB 0.2471 0.0000 1321 3.1209 0.7712

W crying VBG 0.2450 -0.4767 611 2.7860 0.6825

W servant NN 0.2408 0.0000 764 2.8831 0.6944


W sixties NNS 0.2396 0.0000 22 1.3424 0.3217

W sweet JJ 0.2394 0.4588 252 2.4014 0.5748

W Anna NP 0.2390 0.0000 1865 3.2707 0.7817

W kindly RB 0.2376 0.4939 219 2.3404 0.5560

W schoolgirl NN 0.2374 0.0000 54 1.7324 0.4113

W mistress NN 0.2367 0.0000 660 2.8195 0.6674

W thinks VBZ 0.2345 0.0000 2445 3.3883 0.7945

W male NN 0.2334 0.0000 945 2.9754 0.6944

W bubbly JJ 0.2334 0.0000 27 1.4314 0.3340

W love VB 0.2302 0.6369 16757 4.2242 0.9724

W patient JJ 0.2288 0.0000 718 2.8561 0.6535

W sexy JJ 0.2285 0.5267 221 2.3444 0.5357

W Anne NP-HL 0.2283 0.0000 1415 3.1508 0.7193

W prettier JJR 0.2274 0.4767 7 0.8451 0.1921

W mysterious JJ-HL 0.2258 0.0000 2312 3.3640 0.7595

W marry VB 0.2251 0.0000 4723 3.6742 0.8272

W girls NNS 0.2248 0.0000 3467 3.5400 0.7959

W palsy NN 0.2248 0.0000 14 1.1461 0.2577

W innocent JJ 0.2230 0.3400 1007 3.0030 0.6697

W virgin JJ 0.2227 0.0000 228 2.3579 0.5252

W aunt NN 0.2226 0.0000 869 2.9390 0.6542

W Zoe NP 0.2226 0.0000 262 2.4183 0.5382

W teenage JJ 0.2222 0.0000 873 2.9410 0.6535

W fourteen-year-old JJ 0.2221 0.0000 16 1.2041 0.2674

W Selma NP 0.2221 0.0000 67 1.8261 0.4055

W Mary NP-NC 0.2220 0.0000 3187 3.5034 0.7778

W nude JJ 0.2217 0.0000 220 2.3424 0.5192

W jealous JJ 0.2212 -0.4588 1095 3.0394 0.6724

W john NN 0.2205 0.0000 17 1.2304 0.2713

W fell VB 0.2203 0.0000 650 2.8129 0.6196

W heroine NN 0.2199 0.5719 189 2.2765 0.5006

W horoscope NN 0.2192 0.0000 13 1.1139 0.2442


W Aida NP 0.2170 0.0000 14 1.1461 0.2487

W two-story JJ 0.2170 0.0000 10 1.0000 0.2170

W likes VBZ 0.2168 0.4215 739 2.8686 0.6220

W mirror VB 0.2158 0.0000 713 2.8531 0.6157

W female NN 0.2154 0.0000 1743 3.2413 0.6982

W ghost NN 0.2136 -0.3182 1224 3.0878 0.6594

W maid NN 0.2131 0.0000 673 2.8280 0.6025

W gynecologist NN 0.2129 0.0000 20 1.3010 0.2770

W Naomi NP 0.2124 0.0000 182 2.2601 0.4799

W birth NN 0.2123 0.0000 1425 3.1538 0.6697

W loved VBD 0.2118 0.5994 909 2.9586 0.6267

W marrying VBG 0.2117 0.0000 639 2.8055 0.5939

W kissed VBD 0.2107 0.3818 67 1.8261 0.3848

W jilted VBN 0.2105 0.0000 44 1.6435 0.3460

W friend NN 0.2100 0.4939 10658 4.0277 0.8457

W loves VBZ 0.2092 0.5719 2411 3.3822 0.7077

W marriage NN 0.2085 0.0000 4774 3.6789 0.7669

W boys NNS 0.2078 0.0000 2681 3.4283 0.7124

W abortion NN-NC 0.2071 0.0000 329 2.5172 0.5212

W Rawson NP 0.2068 0.0000 7 0.8451 0.1748

W housewife NN 0.2067 0.0000 207 2.3160 0.4788

W Martha NP 0.2063 0.0000 718 2.8561 0.5893

W older JJR-HL 0.2063 0.0000 1826 3.2615 0.6728

W met VBN 0.2062 0.0000 1948 3.2896 0.6783

W madly RB 0.2053 -0.4019 122 2.0864 0.4284


Appendix B: M-Tagged Words

tag word pos cosine_similarity valence frequency log_freq weight

M father VB 0.2395 0.0000 20037 4.3018 1.0304

M tells VBZ 0.1765 0.0000 19108 4.2812 0.7557

M doctor NN 0.1951 0.0000 3101 3.4915 0.6811

M wealthy JJ 0.2110 0.3612 2226 3.3475 0.7065

M fellow JJ 0.2489 0.0000 1835 3.2636 0.8123

M cat NN 0.1803 0.0000 1799 3.2550 0.5869

M white NN 0.1759 0.0000 1671 3.2230 0.5671

M yet RB 0.1986 0.0000 1593 3.2022 0.6361

M English JJ 0.2062 0.0000 1321 3.1209 0.6435

M grandfather NN 0.2264 0.0000 1214 3.0842 0.6982

M priest NN 0.2020 0.0000 1195 3.0774 0.6216

M soldier NN 0.1819 0.0000 1194 3.0770 0.5598

M artist NN-HL 0.1767 0.0000 1024 3.0103 0.5320

M businessman NN 0.2295 0.0000 972 2.9877 0.6858

M elderly JJ 0.2718 0.0000 940 2.9731 0.8081

M nurse NN 0.1801 0.0000 932 2.9694 0.5348

M worker NN 0.1856 0.0000 838 2.9232 0.5425

M thief NN 0.2682 -0.5267 835 2.9217 0.7836

M murderer NN 0.1808 -0.6808 815 2.9112 0.5263

M youth NN 0.1887 0.0000 761 2.8814 0.5436

M forever RB 0.2263 0.0000 723 2.8591 0.6470

M policeman NN 0.2531 0.0000 717 2.8555 0.7227

M quiet VB 0.1817 0.0000 662 2.8209 0.5126

M assumes VBZ 0.1750 0.0000 660 2.8195 0.4935

M lonely JJ 0.2461 -0.3612 608 2.7839 0.6851

M simple JJ 0.2016 0.0000 604 2.7810 0.5607

M orphanage NN 0.1902 0.0000 509 2.7067 0.5147

M farmer NN 0.2680 0.0000 500 2.6990 0.7234

M inspector NN 0.1793 0.0000 500 2.6990 0.4840

M rabbit NN 0.1748 0.0000 488 2.6884 0.4699


M Pierre NP 0.2111 0.0000 486 2.6866 0.5671

M honest JJ 0.1769 0.5106 481 2.6821 0.4744

M eccentric JJ 0.1790 0.0000 468 2.6702 0.4780

M clerk NN 0.1862 0.0000 449 2.6522 0.4938

M delivery NN 0.1783 0.0000 406 2.6085 0.4650

M musician NN 0.1813 0.0000 398 2.5999 0.4714

M middle-aged JJ 0.2268 0.0000 371 2.5694 0.5826

M bully NN 0.2244 -0.4939 309 2.4900 0.5587

M poet NN 0.1919 0.0000 302 2.4800 0.4759

M butler NN 0.1763 0.0000 291 2.4639 0.4344

M acquaintance NN 0.1958 0.0000 269 2.4298 0.4757

M aged VBN 0.2114 0.0000 266 2.4249 0.5126

M Agnes NP 0.1747 0.0000 261 2.4166 0.4221

M waiter NN 0.2187 0.0000 248 2.3945 0.5236

M lucky JJ 0.2212 0.4215 242 2.3838 0.5273

M widower NN 0.1760 0.0000 237 2.3747 0.4178

M fisherman NN 0.2119 0.0000 206 2.3139 0.4904

M playboy NN 0.1956 0.0000 203 2.3075 0.4513

M caller NN 0.1797 0.0000 196 2.2923 0.4119

M parent NN 0.1866 0.0000 193 2.2856 0.4266

M butcher NN 0.1845 0.0000 190 2.2788 0.4205

M gentleman NN 0.6595 0.0000 189 2.2765 1.5013

M smart RB 0.2033 0.4019 184 2.2648 0.4603

M wise JJ 0.1749 0.4767 173 2.2380 0.3915

M middle-class NN 0.1755 0.0000 170 2.2304 0.3914

M tutor NN 0.1762 0.0000 164 2.2148 0.3902

M noble JJ 0.2058 0.4588 145 2.1614 0.4447

M idealistic JJ 0.2076 0.4215 130 2.1139 0.4389

M Marcel NP 0.1931 0.0000 126 2.1004 0.4056

M villager NN 0.2082 0.0000 124 2.0934 0.4358

M nobleman NN 0.2156 0.0000 123 2.0899 0.4506

M contractor NN 0.1770 0.0000 119 2.0755 0.3674


M drunkard NN 0.2546 0.0000 118 2.0719 0.5276

M vendor NN 0.2182 0.0000 116 2.0645 0.4505

M fairly RB 0.1883 0.0000 99 1.9956 0.3758

M liar NN 0.1847 -0.5106 97 1.9868 0.3669

M Frenchman NP 0.1852 0.0000 96 1.9823 0.3671

M youngster NN 0.2098 0.0000 82 1.9138 0.4015

M courtesan NN 0.2051 0.0000 80 1.9031 0.3904

M shepherd VB 0.2019 0.0000 77 1.8865 0.3808

M tastes NNS 0.1818 0.0000 71 1.8513 0.3365

M Rosy JJ-TL 0.1945 0.0000 67 1.8261 0.3552

M invalid JJ 0.1747 0.0000 65 1.8129 0.3167

M retarded VBN-HL 0.1755 -0.5719 64 1.8062 0.3171

M cadet NN 0.2020 0.0000 60 1.7782 0.3592

M nagging NN 0.1783 -0.4019 60 1.7782 0.3170

M lad NN 0.1948 0.0000 57 1.7559 0.3420

M traveler NN 0.1923 0.0000 55 1.7404 0.3347

M Wednesday NR 0.1751 0.0000 54 1.7324 0.3034

M aimless JJ 0.2147 0.0000 50 1.6990 0.3648

M undertaker NN 0.1756 0.0000 45 1.6532 0.2903

M simpleton NN 0.2164 0.0000 44 1.6435 0.3557

M richer JJR 0.1764 0.5267 42 1.6232 0.2863

M uneducated JJ 0.1994 0.0000 40 1.6021 0.3195

M talkative JJ 0.1837 0.0000 40 1.6021 0.2943

M sensible JJ 0.1920 0.0000 38 1.5798 0.3033

M lame JJ 0.1908 -0.4215 33 1.5185 0.2898

M impaired VBN 0.2188 0.0000 30 1.4771 0.3232

M passerby NN 0.2024 0.0000 29 1.4624 0.2960

M fawn NN 0.1919 0.0000 22 1.3424 0.2577

M omen NN 0.2208 0.0000 21 1.3222 0.2920

M chap NN 0.2183 0.0000 19 1.2788 0.2791

M bookseller NN 0.2059 0.0000 17 1.2304 0.2534

M do-gooder NN 0.2023 0.0000 15 1.1761 0.2380


M Cantonese NP 0.1760 0.0000 13 1.1139 0.1961

M redheaded JJ 0.1928 0.0000 12 1.0792 0.2081

M lower-middle-class NN 0.1764 0.0000 12 1.0792 0.1903

M accommodates VBZ 0.2179 0.0000 10 1.0000 0.2179

M shiftless JJ 0.1942 0.0000 9 0.9542 0.1853

M apprenticed VBN 0.1813 0.0000 9 0.9542 0.1730

M redhead NN 0.1938 0.0000 8 0.9031 0.1750

M grubby JJ 0.1855 0.0000 6 0.7782 0.1443

M shacked VBN 0.1994 0.0000 5 0.6990 0.1394

M threadbare JJ 0.1977 0.0000 5 0.6990 0.1382


Appendix C: Shared Words

tag word pos cosine_similarity valence frequency log_freq weight

S man NN 0.5721 0.0000 17859 4.2519 2.4327

S boy NN 0.5407 0.0000 5557 3.7448 2.0249

S woman NN 0.4462 0.0000 10062 4.0027 1.7862

S girl NN 0.4163 0.0000 8771 3.9430 1.6416

S He PPS 0.4057 0.0000 40950 4.6123 1.8710

S guy VB 0.3204 0.0000 931 2.9689 0.9514

S poor JJ-NC 0.3051 -0.4767 1579 3.1984 0.9758

S orphan NN 0.3019 0.0000 556 2.7451 0.8287

S stranger JJR-NC 0.2927 0.0000 665 2.8228 0.8261

S rich NN 0.2854 0.5574 2077 3.3174 0.9468

S prostitute VB 0.2681 0.0000 949 2.9773 0.7981

S blind JJ 0.2565 -0.4019 895 2.9518 0.7570

S child NN 0.2470 0.0000 4933 3.6931 0.9123

S charming JJ 0.2442 0.5859 275 2.4393 0.5956

S kid VB 0.2365 0.0000 582 2.7649 0.6539

S widow NN 0.2266 0.0000 807 2.9069 0.6586

S claims NNS 0.2226 0.0000 2164 3.3353 0.7425

S teenager NN 0.2196 0.0000 544 2.7356 0.6007

S little AP 0.2143 0.0000 3513 3.5457 0.7599

S wastrel NN 0.2098 0.0000 14 1.1461 0.2404

S teacher NN 0.2093 0.0000 2148 3.3320 0.6975

S old JJ 0.2059 0.0000 8419 3.9253 0.8082

S shy JJ 0.2052 -0.2500 389 2.5899 0.5314

S women NNS 0.2005 0.0000 4253 3.6287 0.7274

S lady NN 0.2001 0.0000 1040 3.0170 0.6037

S always RB 0.1925 0.0000 2120 3.3263 0.6402

S token JJ 0.1909 0.0000 63 1.7993 0.3435

S nice RB 0.1904 0.4215 352 2.5465 0.4849

S waitress NN 0.1897 0.0000 410 2.6128 0.4956

S housekeeper NN 0.1856 0.0000 288 2.4594 0.4565

S customer NN 0.1830 0.0000 297 2.4728 0.4526


S 25-year-old JJ 0.1822 0.0000 17 1.2304 0.2242

S peasant NN 0.1808 0.0000 179 2.2529 0.4074

S attractive JJ 0.1757 0.4404 661 2.8202 0.4955


References

Agarwal, Apoorv, Jiehan Zheng et al. 2015. “Key Female Characters in Film Have More to Talk About Besides Men: Automating the Bechdel Test.” Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL.

Bamman, David, Jacob Eisenstein & Tyler Schnoebelen. 2012. “Gender in Twitter: Styles, Stances, and Social Networks.” Presentation at NWAV 41, Indiana University, Bloomington.

Bamman, David, Brendan O’Connor et al. 2013. “Learning Latent Personas of Film Characters.” ACL 2013, Sofia, Bulgaria.

Bird, Steven, Edward Loper, and Ewan Klein. 2009. Natural Language Processing with Python. O’Reilly Media Inc.

Bolukbasi, Tolga, Kai-Wei Chang et al. 2016. “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.” arXiv:1607.06520.

Cauterucci, Christina. 29 December 2015. “New Chrome App Stops Women Saying ‘Just’ and ‘Sorry’ in Their Emails.” Slate.

Fischer, Agneta. 2000. Gender and Emotion: Social Psychological Perspectives. Cambridge University Press.


Freeman, Hadley. 20 March 2017. “Why Do So Many Journalists Think Female Stars are Flirting With Them?” The Guardian.

Glick, Peter and Susan T. Fiske. 1996. “The Ambivalent Sexism Inventory: Differentiating Hostile and Benevolent Sexism.” Journal of Personality and Social Psychology, 70:3, pp. 491-512.

Glott, Ruediger, Phillipp Schmidt et al. 2011. Wikipedia Study. UNU-MERIT. Archived from the original on 28 August 2011.

Hutto, C.J. and E. E. Gilbert. 2014. “VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text.” Eighth International Conference on Weblogs and Social Media. Ann Arbor, MI.

Jacques-Tiura, Angela J., Antonia Abbey et al. 2015. “Friends matter: protective and harmful aspects of male friendships associated with past-year sexual aggression in a community sample of young men.” American Journal of Public Health 105, no. 5: pp. 1001-1007.

Lakoff, Robin. 1975, 2004. Language and Woman's Place: Text and Commentaries. New York, Oxford University Press.


Leaper, Campbell & Rachael D. Robnett. 2011. “Women are more likely than men to use tentative language, aren't they? A meta-analysis testing for gender differences and moderators.” Psychology of Women Quarterly, 35(1): 129-142.

Mikolov, Tomas, Ilya Sutskever et al. 2013. “Distributed Representations of Words and Phrases and their Compositionality.” Advances in Neural Information Processing Systems. arXiv:1310.4546.

Minter, Harriet. 14 January 2016. “The Just Not Sorry app is keeping women trapped in a man’s world.” The Guardian.

O’Barr, William, & Bowman Atkins. 1980. “‘Women’s language’ or ‘powerless language’?” In McConnell-Ginet et al. (eds.), Women and Language in Literature and Society, pp. 93-110. New York: Praeger.

Pande, Mani. 2011. “Wikipedia editors do it for fun: First results of our 2011 editor survey.” Wikimedia Foundation.

Pennington, Jeffrey, Richard Socher et al. 2014. “GloVe: Global Vectors for Word Representation.” Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).


Ramakrishna, Anil, Nikolaos Malandrakis et al. 2015. “A quantitative analysis of gender differences in movies using psycholinguistic normatives.” Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.

Rao, Delip, David Yarowsky et al. 2010. “Classifying latent user attributes in Twitter.” Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents, pp. 37-44.

Schmidt, Ben. 2016. wordVectors. GitHub repository, https://github.com/bmschmidt/wordVectors.

Weatherall, Ann. 2015. “Sexism in language and talk-in-interaction.” Journal of Language and Social Psychology 34:4, pp. 410-426.

Welling, Paula C. 2014. “Limited by language: words, images, and their effect on women.” (Electronic thesis or dissertation). Retrieved from https://etd.ohiolink.edu/
