Detecting Dementia through Retrospective Analysis of Routine Blog Posts by Bloggers with Dementia


Vaden Masrani
Dept. of Computer Science, University of British Columbia

Gabriel Murray
Dept. of Computer Information Systems, University of the Fraser Valley

Thalia Field
Dept. of Neurology, University of British Columbia

Giuseppe Carenini
Dept. of Computer Science, University of British Columbia


We investigate if writers with dementia can be automatically distinguished from those without by analyzing linguistic markers in written text, in the form of blog posts. We have built a corpus of several thousand blog posts, some by people with dementia and others by people with loved ones with dementia. We use this dataset to train and test several machine learning methods, and achieve prediction performance at a level far above the baseline.

1 Introduction

Dementia is estimated to become a trillion-dollar disease worldwide by 2018, and prevalence is expected to double to 74.7 million by 2030 (Prince, 2015). Dementia is a clinical syndrome caused by neurodegenerative illnesses (e.g. Alzheimer’s Disease, vascular dementia, Lewy Body dementia). Symptoms can include memory loss, decreased reasoning ability, behavioral changes, and – relevant to our work – speech and language impairment, including fluency, word choice and sentence structure (Klimova and Kuca, 2016).

Recently, there have been attempts to combine clinical information with language analysis using machine learning and NLP techniques to aid in diagnosis of dementia, and to distinguish between types of pathologies (Jarrold et al., 2014; Rentoumi et al., 2014; Orimaye et al., 2014; Fraser et al., 2015; Masrani et al., 2017). This would provide an inexpensive, non-invasive and efficient screening tool to assist in early detection, treatment and institution of supports. Yet, much of the work to date has focused on analyzing spoken language collected during formal assessment, usually with standardized exam tools.

There has been comparatively little work done on analyzing written language spontaneously generated by people with dementia. In coming years, there will be an increased number of tech-savvy seniors using the internet, and popular online commentators will continue to age. There will therefore be a growing dataset available in the form of tweets, blog posts, and comments on social media, on which to train a classifier. Provided our writers have a verified clinical diagnosis of dementia, such a dataset would be large, inexpensive to acquire, easy to process, and require no manual transcriptions.

There are downsides to using written language samples as well. Unlike spoken language, written text can be edited or revised by oneself or others. People with dementia may have “good days” and “bad days,” and may write only on days when they are feeling lucid, and therefore written samples may be biased towards more intact language. Furthermore, we do not have an accompanying audio file and patients are not constrained to a single topic; people with dementia may have greater facility discussing familiar topics. A non-standardized dataset will also prevent the collection of common test-specific linguistic or acoustic features. However, working with a very large dataset may be able to mitigate the effects of these limitations.

In this work we gather a corpus of blog posts publicly available online, some by people with dementia and others by the loved ones of people with dementia. We extract a variety of linguistic features from the texts, and compare multiple machine learning methods for detecting posts written by people with dementia. All models perform well above the baseline, demonstrating the feasibility of this detection task.


2 Related Work

Early signs of dementia can be detected through analysis of writing samples (Le et al., 2011; Riley et al., 2005; Kemper et al., 2001). In the “Nun Study”, researchers analyzed autobiographies written in the US by members of the School Sisters of Notre Dame between 1931 and 1996. Those nuns who met criteria for dementia had lower grammatical complexity scores and lower “idea density” in their autobiographies.

Le et al. (2011) performed a longitudinal analysis of the writing styles of three novelists: Iris Murdoch, who died with Alzheimer’s disease (AD), Agatha Christie (suspected AD), and P.D. James (normal brain aging). Measurements of syntactic and lexical complexity were made from 51 novels spanning the authors’ careers. Murdoch and Christie exhibited evidence of linguistic decline in later works, such as vocabulary loss, increased repetition, and a deficit of noun tokens (Le et al., 2011).

Despite evidence that linguistic markers predictive of dementia can be found in writing samples, there have been no attempts to train models to classify dementia based on writing alone. Previous work has been successful in training models using transcribed utterances from patients undergoing formal examinations, but this data is difficult to acquire and many models use audio and/or test-specific features which would not be available from online text (Rentoumi et al., 2014; Orimaye et al., 2014; Fraser et al., 2014; Roark et al., 2011). State-of-the-art classification accuracy of 81.92% was achieved by Fraser et al. (2015) with logistic regression using acoustic, textual, and test-specific features on 473 samples from the DementiaBank dataset, an American cohort of 204 persons with dementia and 102 controls describing the “Cookie Theft Picture”, a component of the Boston Diagnostic Aphasia Examination (Becker et al., 1994; Giles et al., 1996). More recently, these results have been extended via domain adaptation by Masrani et al. (2017).

Our methods are similar to Fraser et al. (2015), with the main difference being the dataset used and their inclusion of audio and test-specific features, which are not available in our case. To the best of our knowledge, ours is the first comparison of models trained exclusively on unstructured written samples from persons with dementia.

3 Experimental Design

In this section, we describe the novel blog corpus and experimental setup.

3.1 Corpus

We scraped the text of 2805 posts from 6 public blogs as described in Table 1. Three blogs were written by persons with dementia (first blogger: male, AD, age unknown; second blogger: female, AD, age 61; third blogger: male, Dementia with Lewy Bodies, age 66) and three were written by family members of persons with dementia, to be used as controls (all female, ages unknown). Other demographic information, such as education level, was unavailable. From each of the three dementia blogs, we manually filtered out all texts not written by the owner of the blog (e.g. fan letters) and posts containing more images than text. We were left with 1654 samples written by persons with dementia and 1151 from healthy controls. The script to download the corpus is available at https: //

3.2 Classification Features

Following Fraser et al. (2015), we extracted 101 features across six categories from each blog post. These features are described below.

Parts Of Speech (14) We use the Stanford Tagger (Toutanova et al., 2003) to capture the frequency of various part-of-speech tags (nouns, verbs, adjectives, adverbs, pronouns, determiners, etc). Frequency counts are normalized by the number of words in the sentence, and we report the sentence average for a given post. We also count not-in-dictionary words and word-type ratios (noun to verb, pronoun to noun, etc).
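The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the sentences have already been tagged (for example by the Stanford Tagger), it uses a small set of coarse tag names chosen only for the example, and the helper names `pos_frequencies` and `ratio` are hypothetical. Computing a word-type ratio from the averaged frequencies is also an assumption; the paper does not say how the ratios were computed.

```python
from collections import Counter

def pos_frequencies(tagged_sentences, tags=("NN", "VB", "JJ", "RB", "PRP", "DT")):
    """Average, over the sentences of one post, of each tag's share of the words."""
    totals = {tag: 0.0 for tag in tags}
    n_sentences = 0
    for sentence in tagged_sentences:
        if not sentence:
            continue  # skip empty sentences to avoid dividing by zero
        n_sentences += 1
        counts = Counter(tag for _, tag in sentence)
        for tag in tags:
            # Normalize the raw count by the number of words in this sentence
            totals[tag] += counts[tag] / len(sentence)
    if n_sentences == 0:
        return totals
    return {tag: total / n_sentences for tag, total in totals.items()}

def ratio(freqs, numerator, denominator):
    """Word-type ratio such as noun-to-verb; returns 0.0 when the denominator is 0."""
    return freqs[numerator] / freqs[denominator] if freqs[denominator] else 0.0

# Hypothetical pre-tagged post with two sentences (coarse tags for illustration)
post = [
    [("I", "PRP"), ("forgot", "VB"), ("the", "DT"), ("keys", "NN")],
    [("She", "PRP"), ("smiled", "VB")],
]
freqs = pos_frequencies(post)
noun_to_verb = ratio(freqs, "NN", "VB")
```

Averaging per sentence rather than over the whole post gives short and long sentences equal weight, which is one reading of "we report the sentence average for a given post".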

Context Free Grammar (45) Features which count how often a phrase structure rule occurs in a sentence, including NP → VP PP, NP → DT NP, etc. Parse trees come from the Stanford parser (Klein and Manning, 2003).
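Counting phrase structure rules can be illustrated with a short sketch. It assumes parse trees are already available and represents them as nested tuples of the form `(label, child, ...)` with words as plain strings; this representation and the function names `productions` and `rule_counts` are assumptions made for the example, not the output format of the Stanford parser.

```python
from collections import Counter

def productions(tree):
    """Yield 'LHS -> RHS' strings for every phrasal node of a nested-tuple tree."""
    label, *children = tree
    # A child is either a subtree (tuple) or a word (str); words are not part of a rule
    child_labels = [child[0] for child in children if isinstance(child, tuple)]
    if child_labels:
        yield f"{label} -> {' '.join(child_labels)}"
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)

def rule_counts(trees):
    """Count how often each phrase structure rule occurs across the given trees."""
    counts = Counter()
    for tree in trees:
        counts.update(productions(tree))
    return counts

# Hypothetical parse of "the dog chased a cat"
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "chased"),
               ("NP", ("DT", "a"), ("NN", "cat"))))
counts = rule_counts([tree])
```

Lexical rules such as DT → "the" are deliberately skipped here so that only phrase-level rules are counted.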

Syntactic Complexity (28) Features which measure the complexity of an utterance through metrics such as the depth of the parse tree, the mean length of words, sentences, T-units, and clauses, and the number of clauses per sentence. We used the L2 Syntactic Complexity Analyzer (Lu, 2010).
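Two of the simpler measures named above, parse tree depth and mean word length, can be sketched directly. This is only an illustration under assumptions: trees are nested tuples `(label, child, ...)` with words as strings, depth is taken to be the number of edges from the root to the deepest word, and the function names are hypothetical. It does not reproduce the L2 Syntactic Complexity Analyzer, which computes T-unit and clause measures with its own rules.

```python
def tree_depth(node):
    """Number of edges from this node down to its deepest word."""
    if isinstance(node, str):
        return 0  # a word is a leaf
    _, *children = node
    return 1 + max(tree_depth(child) for child in children)

def words(node):
    """Return the words of a tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    _, *children = node
    return [word for child in children for word in words(child)]

def mean_word_length(trees):
    """Average number of characters per word across all trees."""
    all_words = [word for tree in trees for word in words(tree)]
    return sum(len(word) for word in all_words) / len(all_words)

# Hypothetical parse of "the dog chased a cat"
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "chased"),
               ("NP", ("DT", "a"), ("NN", "cat"))))
depth = tree_depth(tree)
avg_len = mean_word_length([tree])
```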


URL   Posts   Mean words          Start Date   Diagnosis
      618     242.22 (s=169.42)   Dec 2003     AD
      344     263.03 (s=140.28)   Sept 2006    AD
      692     393.21 (s=181.54)   May 2009     Lewy Body
      201     803.91 (s=548.34)   Mar 2012     Control
      452     615.11 (s=206.72)   Jan 2008     Control
      498     227.12 (s=209.17)   Sept 2009    Control

Table 1: Blog Information.

processing and learnability (Salsbury et al., 2011). We used five psycholinguistic features: familiarity, concreteness, imageability, age of acquisition, and the SUBTL frequency norm, which is a measure of the frequency with which a word is used in daily life (Kuperman et al., 2012; Brysbaert and New, 2009a; Salsbury et al., 2011). Psycholinguistic word scores are derived from human ratings, while the SUBTL frequency norm is based on 50 million words from television and film subtitles (Brysbaert and New, 2009b).

Vocabulary Richness (4) We calculated four metrics which capture the range of vocabulary in a text: the type-token ratio; Brunet’s index, a length-insensitive version of the type-token ratio; Honoré’s statistic; and the moving-average type-token ratio (MATTR) (Asp and De Villiers, 2010; Covington and McFall, 2010). These metrics have been shown to be effective in previous AD research (Bucks et al., 2000; Fraser et al., 2015).
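A minimal sketch of the four metrics follows, using their commonly cited definitions: with N tokens, V distinct types, and V1 types occurring exactly once, the type-token ratio is V/N, Brunet's index is N^(V^-0.165), and Honoré's statistic is 100 log N / (1 - V1/V). The paper does not state the exact formulas or the MATTR window size it used, so the constants, the default `window=50`, and the function names are assumptions. The functions expect a non-empty token list.

```python
import math
from collections import Counter

def type_token_ratio(tokens):
    """Distinct word types divided by total tokens."""
    return len(set(tokens)) / len(tokens)

def brunet_index(tokens):
    """Brunet's index W = N ** (V ** -0.165); lower values suggest a richer vocabulary."""
    n, v = len(tokens), len(set(tokens))
    return n ** (v ** -0.165)

def honore_statistic(tokens):
    """Honore's statistic R = 100 * log(N) / (1 - V1 / V)."""
    counts = Counter(tokens)
    n, v = len(tokens), len(counts)
    v1 = sum(1 for c in counts.values() if c == 1)  # types used exactly once
    if v1 == v:
        return float("inf")  # undefined when every type occurs exactly once
    return 100 * math.log(n) / (1 - v1 / v)

def mattr(tokens, window=50):
    """Moving-average type-token ratio over all windows of a fixed size."""
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

tokens = "the cat sat on the mat and the dog sat too".split()
richness = {
    "ttr": type_token_ratio(tokens),
    "brunet": brunet_index(tokens),
    "honore": honore_statistic(tokens),
    "mattr": mattr(tokens, window=5),
}
```

The plain type-token ratio falls as a text gets longer, which is why the length-insensitive variants are reported alongside it.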

Repetitiveness (5) We represent sentences as TF-IDF vectors and compute the cosine similarity between sentences. We then report the proportion of sentence pairs below three similarity thresholds (0, 0.3, 0.5) as well as the min and average cosine distance across all pairs of sentences.
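The five repetitiveness features can be sketched in plain Python. Several details are assumptions rather than the authors' choices: tokenization is by lowercase whitespace splitting, the IDF weight uses a smoothed form log((1 + n) / (1 + df)) + 1, and the threshold features are read as the proportion of sentence pairs whose cosine distance is at or below each threshold. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

EPS = 1e-12  # tolerance for floating-point comparisons against a threshold

def tfidf_vectors(sentences):
    """Represent each sentence as a sparse {term: tf * idf} dictionary."""
    docs = [Counter(sentence.lower().split()) for sentence in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]

def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors, clamped at 0."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return max(0.0, 1.0 - dot / norm) if norm else 1.0

def repetitiveness(sentences, thresholds=(0.0, 0.3, 0.5)):
    """Threshold proportions plus minimum and mean pairwise cosine distance."""
    vectors = tfidf_vectors(sentences)
    distances = [cosine_distance(a, b) for a, b in combinations(vectors, 2)]
    if not distances:
        return None  # fewer than two sentences: no pairs to compare
    features = {f"prop_le_{t}": sum(d <= t + EPS for d in distances) / len(distances)
                for t in thresholds}
    features["min"] = min(distances)
    features["mean"] = sum(distances) / len(distances)
    return features

sentences = [
    "I went to the store",
    "I went to the store",
    "The weather was lovely today",
]
features = repetitiveness(sentences)
```

In this toy post one of the three sentence pairs is an exact repeat, so one third of the pairs fall at distance 0.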

3.3 Training and Testing

We perform a 9-fold cross validation by training each model on all the posts of four blogs and testing on the remaining two, where we ensure that each test set contains the posts of one control blog and one dementia blog. Within each fold we perform a feature selection step before training where we select for inclusion into the model the first k features which have the highest absolute correlation with the labels in the training fold.
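The fold construction and the feature selection step can be sketched as below. Pairing each of the three dementia blogs with each of the three control blogs yields the nine test sets. The blog identifiers, the toy feature matrix, the value of k, and the function names are hypothetical, and Pearson correlation is assumed as the correlation measure since the paper does not name one.

```python
import math
from itertools import product

def blog_folds(dementia_blogs, control_blogs):
    """Yield (train_blogs, test_blogs); each test set has one blog of each class."""
    all_blogs = list(dementia_blogs) + list(control_blogs)
    for dementia, control in product(dementia_blogs, control_blogs):
        test = [dementia, control]
        train = [blog for blog in all_blogs if blog not in test]
        yield train, test

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def top_k_features(rows, labels, k):
    """Indices of the k features with the highest absolute correlation with the labels."""
    n_features = len(rows[0])
    scores = [abs(pearson([row[j] for row in rows], labels))
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

# Hypothetical blog identifiers: three dementia blogs, three control blogs
folds = list(blog_folds(["D1", "D2", "D3"], ["C1", "C2", "C3"]))

# Toy training fold: 4 posts x 3 features, labels 1 = dementia, 0 = control
rows = [[1.0, 5.0, 0.2], [2.0, 5.0, 0.9], [3.0, 5.0, 0.1], [4.0, 5.0, 0.8]]
labels = [0, 0, 1, 1]
selected = top_k_features(rows, labels, k=2)
```

Selecting features inside each fold, using only the training blogs, keeps information about the held-out blogs out of the model.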




4 Results

For each machine learning model, we calculate the ROC curve and the area under the curve (AUC), comparing with a random performance baseline AUC of 0.5. The AUC results are shown in Figure 1, with all models well above the baseline of 0.5. The best performing models are logistic regression and neural networks, with average AUC scores of 0.815 and 0.848, respectively.
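For readers unfamiliar with the metric, the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; the labels and scores are made-up values for illustration, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Made-up test fold: 1 = post by a blogger with dementia, 0 = control
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)

# A classifier that scores every post identically sits at the 0.5 baseline
baseline = roc_auc(labels, [0.5] * len(labels))
```

This pairwise form is quadratic in the number of posts, which is acceptable for an illustration; library implementations typically sort the scores instead.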

The SUBTL measure of vocabulary richness was the feature most correlated with the outcome variable in eight out of nine folds. Figure 2 shows the SUBTL scores for each blog post in the corpus, arranged by blog and with the bloggers with dementia shown in the top row. A lower score indicates a richer vocabulary. We can see that the bloggers with dementia have a less rich vocabulary. Interestingly, however, the longitudinal trend does not show their vocabularies worsening during the time-period captured in this corpus. The analysis of other features highly informative for the target prediction is ongoing, and additional findings will be discussed at the workshop.

5 Conclusion


Figure 1: Comparison of models. We show the mean AUC and 90% confidence intervals across a 9-fold CV. All the posts of a blog appear in either the training or test set, but not both.

References



Elissa D Asp and Jessica De Villiers. 2010. When language breaks down: Analysing discourse in clinical contexts. Cambridge University Press.

James T Becker, François Boller, Oscar L Lopez, Judith Saxton, and Karen L McGonigle. 1994. The natural history of Alzheimer’s disease: description of study cohort and accuracy of diagnosis. Archives of Neurology 51(6):585–594.

Marc Brysbaert and Boris New. 2009a. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods.

Marc Brysbaert and Boris New. 2009b. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods.

Romola S Bucks, Sameer Singh, Joanne M Cuerden, and Gordon K Wilcock. 2000. Analysis of spontaneous, conversational speech in dementia of Alzheimer type: Evaluation of an objective technique for analysing lexical performance.

Michael A Covington and Joe D McFall. 2010. Cutting the Gordian knot: The moving-average type–token ratio (MATTR). Journal of Quantitative Linguistics 17(2):94–100.

Kathleen C Fraser, Graeme Hirst, Naida L Graham, Jed A Meltzer, Sandra E Black, and Elizabeth Rochon. 2014. Comparison of different feature sets for identification of variants in progressive aphasia. ACL 2014 page 17.

Kathleen C Fraser, Jed A Meltzer, and Frank Rudzicz. 2015. Linguistic features identify Alzheimer’s disease in narrative speech. Journal of Alzheimer’s Disease 49(2):407–422.

Elaine Giles, Karalyn Patterson, and John R Hodges. 1996. Performance on the Boston cookie theft picture description task in patients with early dementia of the Alzheimer’s type: missing information. Aphasiology 10(4):395–408.

William Jarrold, Bart Peintner, David Wilkins, Dimitra Vergyri, Colleen Richey, Maria Luisa Gorno-Tempini, and Jennifer Ogar. 2014. Aided diagnosis of dementia type through computer-based analysis of spontaneous speech. In Proceedings of the ACL Workshop on Computational Linguistics and Clinical Psychology. pages 27–36.

Susan Kemper, Lydia H Greiner, Janet G Marquis, Katherine Prenovost, and Tracy L Mitzner. 2001. Language decline across the life span: findings from the Nun Study. Psychology and Aging 16(2):227.

Dan Klein and Christopher D Manning. 2003. Accurate unlexicalized parsing. In Proc. of ACL, Sapporo, Japan. Association for Computational Linguistics, pages 423–430.

Blanka Klimova and Kamil Kuca. 2016. Speech and language impairments in dementia. Journal of Applied Biomedicine 14(2):97–103.

Victor Kuperman, Hans Stadthagen-Gonzalez, and Marc Brysbaert. 2012. Age-of-acquisition ratings for 30,000 English words. Behavior Research

Xuan Le, Ian Lancashire, Graeme Hirst, and Regina Jokel. 2011. Longitudinal detection of dementia through lexical and syntactic changes in writing: a case study of three British novelists. Literary and Linguistic Computing 26(4):435–461.

Xiaofei Lu. 2010. Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics 15(4):474–496.

Vaden Masrani, Gabriel Murray, Thalia Field, and Giuseppe Carenini. 2017. Domain adaptation for detecting mild cognitive impairment. In Proc. of Canadian AI, Edmonton, Canada.

Sylvester Olubolu Orimaye, Jojo Sze-Meng Wong, and Karen Jennifer Golden. 2014. Learning predictive linguistic features for Alzheimer’s disease and related dementias using verbal utterances. In Proc. 1st Workshop. Computational Linguistics and Clinical Psychology (CLPsych).

Martin James Prince. 2015. World Alzheimer Report 2015: the global impact of dementia: an analysis of prevalence, incidence, cost and trends. London.

Vassiliki Rentoumi, Ladan Raoufian, Samrah Ahmed, Celeste A de Jager, and Peter Garrard. 2014. Features and machine learning classification of connected speech samples from patients with autopsy proven Alzheimer’s disease with and without additional vascular pathology. Journal of Alzheimer’s

Kathryn P Riley, David A Snowdon, Mark F Desrosiers, and William R Markesbery. 2005. Early life linguistic ability, late life cognitive function, and neuropathology: findings from the Nun Study. Neurobiology of Aging 26(3):341–347.

Brian Roark, Margaret Mitchell, John-Paul Hosom, Kristy Hollingshead, and Jeffrey Kaye. 2011. Spoken language derived measures for detecting mild cognitive impairment. IEEE Transactions on Audio, Speech, and Language Processing 19(7):2081–2090.

Tom Salsbury, Scott A Crossley, and Danielle S McNamara. 2011. Psycholinguistic word information in second language oral discourse. Second Language



Figure 2: SUBTL word scores for each post in a given blog. Bloggers with dementia (AD or Dementia w/ Lewy Bodies) appear in the top row.
