
High-level Activity Recognition and Adaptation with Dynamically Available

5.4 Knowledge-driven method

5.4.1 Knowledge base

In this section, we describe how to mine the knowledge (i.e. context-activity conditional probabilities) from two websites, www.wikihow.com and www.ehow.com [116, 156]. Both websites describe how to perform daily activities and the contexts involved. The basic idea of this knowledge-driven method is that the probability of observing a context in an activity is related to the probability of the textual representation of the context appearing in the textual description of the activity. We first crawl the websites to collect the descriptive documents for each target activity class, and then identify the contexts involved in each activity using natural language processing methods. Finally, we calculate the conditional probability of each context with respect to the different activities.

The mining process can be described by the following steps:

• Search the two aforementioned websites for the target activities. As illustrated in Figure 5.3, the website lists multiple hyperlinks that lead to webpages describing how to perform the activities step by step. We automatically crawl all the pages for each target activity. Because we search for the target activities within the same website, the webpages that describe the activities share the same HTML schema, which makes it feasible to crawl the textual descriptions automatically.

• Once we have the textual descriptions for the target activities, natural language processing methods are used to extract the relevant contexts from the text. The texts from the webpages go through the following pipeline: tokenisation, part-of-speech (POS) tagging, lowercasing, stemming and WordNet filtering. We first tokenise the sentences in the texts into lists of single words (shown in Figure 5.4) so that they can be processed by the later phases.

Figure 5.3: Search for activity description.

"Here is what you'll need to make a mocha coffee drink using brewed coffee" -> ['Here', 'is', 'what', 'you', "'ll", 'need', 'to', 'make', 'a', 'mocha', 'coffee', 'drink', 'using', 'brewed', 'coffee']

Figure 5.4: Example of tokenization

In the second step, we tag the tokenised words with part-of-speech tags, as shown in Figure 5.5. Since the contexts involved in the activities are nouns, we select only the words tagged with "NOUN" for further analysis.

We then convert capital letters to lowercase and reduce the morphological variants of a word to their stemmed or root forms (e.g. standing-stand, bottles-bottle). The rationale behind these two steps is that words differing only in case or morphological variant should share a single representation in our case. Finally, since the contexts involved in the activities are objects or substances in the physical space, we use the knowledge base WordNet for filtering. In WordNet, each word has its hypernyms, and the relations between the word and its hypernyms follow the "is-a" relationship (e.g. coffee is-a [beverage, tree, seed, brown]). For each word, we walk through its hypernym paths, and the word is categorised as an object or a substance if the word "object" or "substance" resides in any of its hypernym paths. Figure 5.6 shows that "coffee" is classified as a substance, as there are multiple hypernym paths passing through "substance".

[('Here', u'ADV'), ('is', u'VERB'), ('what', u'PRON'), ('you', u'PRON'), ("'ll", u'VERB'), ('need', u'VERB'), ('to', u'PRT'), ('make', u'VERB'), ('a', u'DET'), ('mocha', u'NOUN'), ('coffee', u'NOUN'), ('drink', u'NOUN'), ('using', u'VERB'), ('brewed', u'VERB'), ('coffee', u'NOUN')]

Figure 5.5: Example of POS tagging

Figure 5.6: Example of hypernyms path
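The noun selection, lowercasing and hypernym-based filtering described in this step can be sketched as follows. This is only an illustration under stated assumptions: the tagged tokens are those of Figure 5.5 (with one extra abstract noun, "Idea", added to show a rejection), `TOY_HYPERNYMS` is a small hand-written graph standing in for WordNet, the function names are hypothetical, and stemming is omitted for brevity.

```python
# Tagged tokens copied from Figure 5.5, plus one abstract noun ("Idea")
# added here only to show a word being rejected by the filter.
TAGGED = [
    ("Here", "ADV"), ("is", "VERB"), ("what", "PRON"), ("you", "PRON"),
    ("'ll", "VERB"), ("need", "VERB"), ("to", "PRT"), ("make", "VERB"),
    ("a", "DET"), ("mocha", "NOUN"), ("coffee", "NOUN"), ("drink", "NOUN"),
    ("using", "VERB"), ("brewed", "VERB"), ("coffee", "NOUN"),
    ("Idea", "NOUN"),
]

# Hypothetical, hand-written "is-a" edges standing in for WordNet.
# The real WordNet hierarchy is far larger; this graph is acyclic by design.
TOY_HYPERNYMS = {
    "mocha": ["coffee"],
    "coffee": ["beverage", "tree"],
    "drink": ["beverage"],
    "beverage": ["food"],
    "food": ["substance"],
    "tree": ["plant"],
    "plant": ["organism"],
    "organism": ["object"],
    "idea": ["concept"],
    "concept": ["abstraction"],
}


def hypernym_paths(word, graph):
    """Return every path from `word` up to a node without hypernyms."""
    parents = graph.get(word, [])
    if not parents:
        return [[word]]
    return [[word] + path
            for parent in parents
            for path in hypernym_paths(parent, graph)]


def is_physical(word, graph, targets=("object", "substance")):
    """True if any hypernym path of `word` contains one of `targets`."""
    return any(target in path
               for path in hypernym_paths(word, graph)
               for target in targets)


def extract_contexts(tagged, graph):
    """Keep nouns, lowercase them, and keep only objects or substances."""
    contexts = []
    for token, tag in tagged:
        if tag != "NOUN":
            continue
        word = token.lower()
        if word not in contexts and is_physical(word, graph):
            contexts.append(word)
    return contexts


contexts = extract_contexts(TAGGED, TOY_HYPERNYMS)
print(contexts)
```

With the toy graph, "coffee" has one path through "substance" and one through "object", so it passes the filter, while "idea" only reaches "abstraction" and is dropped.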

• After the processing phases, we obtain thousands of contextual terms, some of which are not discriminative and therefore not useful for the activity recognition task. In this step, we propose to find the top-k most important contexts for each activity class. Specifically, we calculate the term frequency-inverse document frequency (tf-idf) of each context term with respect to the activity classes as a measure of its discriminative power, and choose the contexts for each activity class based on this measure.

$$\text{tf-idf}_{c,y} = \frac{n_{c,y}}{\sum_{c} n_{c,y}} \cdot \log \frac{|\{d\}|}{|\{d : c \in d\}|} \tag{5.4.1}$$

where $n_{c,y}$ is the number of occurrences of context $c$ in activity class $y$, $|\{d\}|$ is the total number of collected texts describing the different activity classes, and $|\{d : c \in d\}|$ is the number of texts in which context $c$ appears. The first term, $\frac{n_{c,y}}{\sum_{c} n_{c,y}}$, denotes the frequency of the context in a specific activity class. If the context appears frequently in activity class $y$, this term is larger, meaning that the probability of observing the context is higher in this activity class. The second term, $\log \frac{|\{d\}|}{|\{d : c \in d\}|}$, is the inverse document frequency of context $c$. It penalises contexts that are universal and appear in almost all documents, as they provide little discriminative power.

Table 5.1: Examples of context-activity conditional probability

| 1. make coffee | 2. make tea | 3. make pasta | 4. make oatmeal |
|---|---|---|---|
| coffee 0.93 | tea 0.89 | pasta 0.85 | bowl 0.69 |
| water 0.69 | water 0.87 | water 0.66 | mix 0.62 |
| cup 0.68 | cup 0.69 | salt 0.61 | oatmeal 0.55 |
| sugar 0.45 | sugar 0.50 | oil 0.58 | sugar 0.49 |
| pot 0.36 | leaf 0.43 | sauce 0.58 | oat 0.48 |
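As a rough illustration of Equation 5.4.1, the tf-idf scoring and top-k selection could be computed as below. The texts in `TOY_TEXTS` are invented for this sketch and are not taken from the crawled data; the function names and the choice k = 2 are likewise assumptions.

```python
import math
from collections import Counter

# Hypothetical, already-processed texts: one list of context terms per text,
# grouped by activity class. Invented for illustration only.
TOY_TEXTS = {
    "make coffee": [["coffee", "water", "cup"],
                    ["coffee", "water", "coffee", "sugar"]],
    "make tea": [["tea", "water", "cup"],
                 ["tea", "tea", "water", "leaf"]],
}


def tf_idf_scores(texts_by_activity):
    """Score each context c per activity class y following Equation 5.4.1."""
    all_texts = [text for texts in texts_by_activity.values() for text in texts]
    n_texts = len(all_texts)                              # |{d}|
    # |{d : c in d}|: number of texts in which each context appears.
    doc_freq = Counter(c for text in all_texts for c in set(text))

    scores = {}
    for activity, texts in texts_by_activity.items():
        counts = Counter(c for text in texts for c in text)   # n_{c,y}
        total = sum(counts.values())                          # sum_c n_{c,y}
        scores[activity] = {
            c: (n / total) * math.log(n_texts / doc_freq[c])
            for c, n in counts.items()
        }
    return scores


def top_k(scores, k):
    """Return the k highest-scoring contexts for each activity class."""
    return {activity: sorted(s, key=s.get, reverse=True)[:k]
            for activity, s in scores.items()}


scores = tf_idf_scores(TOY_TEXTS)
selected = top_k(scores, k=2)
print(selected)
```

In this toy data "water" appears in every text, so its inverse document frequency is log(1) = 0 and it is never selected, which is the penalising effect the second term is meant to have.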

• Finally, we calculate the context-activity probability of the selected contexts with respect to the different activity classes, based on the processed descriptive texts. Specifically, we estimate the context-activity probability with the Naive Bayes method.

Let P(c|y) be the context-activity probability (i.e. the probability of context c occurring in documents that describe activity y), let n_k(c) be the number of texts describing activity class y = k in which context c is observed, and let N_k be the total number of texts of that activity class. Then we can estimate the parameters of the context likelihood as

$$P(c \mid y = k) = \frac{n_k(c)}{N_k} \tag{5.4.2}$$

the relative frequency of documents of activity class y = k that contain context c. In practice, we use a small hyperparameter α for smoothing¹.

$$P(c \mid y = k) = \frac{n_k(c) + \alpha}{N_k + |\{c\}|\,\alpha} \tag{5.4.3}$$

where |{c}| is the total number of contexts. Table 5.1 shows examples of some activity classes and the related contexts with high probabilities.
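A minimal sketch of Equations 5.4.2 and 5.4.3 is given below. The texts, the vocabulary and the value α = 0.01 are assumptions made for illustration (the source only states that α is small), and `context_likelihood` is a hypothetical helper name.

```python
def context_likelihood(texts, vocabulary, alpha=0.0):
    """Estimate P(c | y = k) for every context c in `vocabulary`.

    `texts` holds the N_k texts of one activity class, each given as a set
    of context terms. With alpha = 0 this is Equation 5.4.2; with
    alpha > 0 it is the smoothed estimate of Equation 5.4.3.
    """
    n_k = len(texts)                        # N_k
    denominator = n_k + len(vocabulary) * alpha
    return {
        # n_k(c): number of texts of this class in which c is observed.
        c: (sum(1 for text in texts if c in text) + alpha) / denominator
        for c in vocabulary
    }


# Hypothetical texts for the class "make coffee"; invented for illustration.
COFFEE_TEXTS = [
    {"coffee", "water"},
    {"coffee", "cup"},
    {"coffee", "water", "sugar"},
    {"coffee"},
]
VOCABULARY = ["coffee", "water", "cup", "sugar", "tea"]

unsmoothed = context_likelihood(COFFEE_TEXTS, VOCABULARY)
smoothed = context_likelihood(COFFEE_TEXTS, VOCABULARY, alpha=0.01)
print(unsmoothed["tea"], smoothed["tea"])
```

Without smoothing, a context that never occurs in the texts of a class ("tea" here) receives probability zero; with a small α it receives a small non-zero probability instead.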

In document A framework for mobile activity recognition (Page 112-115)