2.4 User Modelling
2.4.3 Learning and Inferences
Adaptive systems often employ automated procedures that have their origins in the fields of Artificial Intelligence and Machine Learning to build a user model. These
procedures and techniques enable a system to learn about individual users and make inferences and decisions about them. This section reviews several computational paradigms that have been employed in different systems, including classification learning, collaborative filtering, stereotypes (Jameson, 2009), and others.
2.4.3.1 Classification Learning
Classification learning, a broad category of supervised machine learning techniques, has been employed in adaptive systems. A variety of methods fall into this branch of learning, including decision trees, probabilistic classifiers, neural networks and others (Han and Kamber, 2001; Langley, 1996; Mitchell, 1997; Webb et al., 2001).
In a nutshell, a classifier attempts to identify the category (or sub-population) to which a new observation belongs, based on a training set of observations whose category membership is known (i.e. labelled data). The learning process starts with a set of training examples, each described by a set of features. To determine, for example, whether a user’s favourite genre is jazz, features could include the number of visits to jazz bars, online search activity for different genres and so forth. Each training example is labelled, i.e. assigned a label that corresponds to a specific category. For example, labels could be jazz or not jazz in a binary classification problem, or jazz, rock or heavy metal in a three-class classification problem. Once the training set has been generated, the process learns a classifier, which is a model capable of assigning a new item (new observation) to one of the same set of categories. The classifier, however, cannot always be certain about the assignment; some methods therefore yield a probability for each category, indicating the degree of certainty. The quality of the resulting classifier is typically assessed with measures such as accuracy, precision and recall.
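The jazz example above can be made concrete with a minimal sketch. It assumes a tiny, invented training set with two hypothetical features (visits to jazz bars and jazz-related searches) and uses a k-nearest-neighbour vote, chosen here only because it is short; it is not a method taken from the cited works. The share of neighbours voting for a label serves as a rough per-category probability.

```python
import math
from collections import Counter

# Hypothetical labelled training set: (jazz_bar_visits, jazz_searches) -> label
TRAINING = [
    ((8, 15), "jazz"),
    ((6, 9), "jazz"),
    ((7, 12), "jazz"),
    ((0, 1), "not jazz"),
    ((1, 0), "not jazz"),
    ((2, 2), "not jazz"),
]

def predict_proba(features, training=TRAINING, k=3):
    """Return {label: share of the k nearest training examples carrying that label}."""
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}

def predict(features, training=TRAINING, k=3):
    """Assign the new observation to the category with the highest vote share."""
    proba = predict_proba(features, training, k)
    return max(proba, key=proba.get)
```

For a clear-cut observation such as (7, 10) all three neighbours are labelled jazz, whereas for a borderline observation such as (4, 5) the vote is split, which illustrates why a probability per category is more informative than a bare label.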
An example of a classifier, from the family of probabilistic classifiers, is the Bayesian network (BN). It represents probabilistic relationships among variables of interest (Heckerman, 1998). Put simply, it relates the posterior probability to the prior probability (i.e. it expresses quantitatively how to update prior information given new evidence). Bayesian networks have been increasingly used to infer users’ goals and model users’ preferences and needs (Horvitz et al., 1998). An example of using Bayes’ theorem for user profiling is the Lumiere project (Horvitz et al., 1998), which is described earlier in the related work. Another example is the work presented in (García et al., 2005), where a BN was used to detect and model students’ learning styles in a Web-based education system. The Naïve Bayes classifier (NBC) is a special case of Bayesian networks and has also been used for user profiling. It is based on Bayes’ theorem with naïve independence assumptions between the features. News Dude (Billsus and Pazzani, 1999), a news recommendation system, learns the long-term interest profile by using an NBC. Other examples are the systems Syskill&Webert (Pazzani et al., 1996) and Personal WebWatcher (Mladenic, 1996), which use an NBC to detect users’ interests whilst browsing the web.
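To illustrate how a Naïve Bayes classifier turns prior information and word evidence into a posterior interest estimate, the following is a minimal sketch under stated assumptions. The headlines, the labels and the function names `train_nb` and `posterior` are all invented for illustration, and Laplace (add-one) smoothing is an assumption of this sketch. It does not reproduce the implementation of News Dude or any other cited system.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: words of a news item -> interest label
EXAMPLES = [
    (["election", "vote", "party"], "interesting"),
    (["vote", "parliament"], "interesting"),
    (["football", "goal"], "not interesting"),
    (["goal", "league", "transfer"], "not interesting"),
]

def train_nb(examples):
    """Count labels and per-label word occurrences (the sufficient statistics)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in examples:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def posterior(model, words):
    """P(label | words) via Bayes' theorem, treating words as independent given the label."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        score = math.log(n / total)  # log prior
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    unnorm = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(unnorm.values())
    return {label: v / z for label, v in unnorm.items()}
```

With these toy data, an item containing "vote" and "election" receives a posterior of 6/7 for the label "interesting": the two priors are equal, so the outcome is driven entirely by the smoothed word likelihoods.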
Another example of methods used in classification problems is decision trees. Decision trees, however, are known to suffer from the bias-variance tradeoff (i.e. a large bias with simple trees and a large variance with complex trees). Bias is the error due to simplistic assumptions made by the algorithm during the learning process, which in turn can lead to the model underfitting the data. Variance is the error due to too much complexity in the learning algorithm, which makes the model oversensitive to variations in the training data set and can thus lead to overfitting. To handle the bias-variance tradeoff, ensemble methods have been developed, which combine several decision trees to produce better predictive performance than a single tree. They utilise a divide-and-conquer approach, creating a group of weak learners that combine to form a strong learner. Two general techniques for building ensembles of trees are bagging (or bootstrap aggregation) and boosting. A widely used bagging classifier is the Random Forest (RF) (Breiman, 2001), which uses bootstrap samples of the training data to train (and test) the individual decision trees and combines the trees’ outputs for prediction. In other words, the algorithm ‘overfits’ subsamples of the training data set and then reduces the overfit by averaging the predictors. Another popular classifier is XGBoost, an implementation of gradient boosting (Friedman, 2002), in which classifiers are added one at a time so that each new classifier is trained to improve the existing ensemble.
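The bagging idea can be sketched in a few lines. The example below is a simplification under stated assumptions: it uses one-dimensional toy data and one-threshold trees (decision stumps) as weak learners, and it omits the random feature selection that a full Random Forest adds. The names `fit_stump`, `fit_bagged` and `predict` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical 1-D training data: (feature value, class label 0 or 1)
DATA = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]

def fit_stump(sample):
    """Fit a one-threshold tree by picking the split with the fewest training errors."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def fit_bagged(data, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def predict(ensemble, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(tree(x) for tree in ensemble)
    return votes.most_common(1)[0][0]
```

Each stump may place its threshold differently depending on which examples its bootstrap sample happens to contain; the majority vote smooths out these differences, which is the variance reduction that bagging aims for.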
Classification learning, however, cannot be employed under all circumstances. It requires, first, that a labelled training set can be obtained and, second, that an adequate number of training examples is available before the classifier begins to make inferences about new observations (Webb et al., 2001). These two prerequisites may be hard to fulfil in some cases. Alternative techniques, such as expert labelling or methods that handle small datasets, have been proposed to address these limitations.
A large body of research has also focused on modelling users directly from their interactions with specific user interface elements, clickstream behaviour (Wang et al., 2016), usage patterns (Dev and Liu, 2017) and other signals, often without any prior knowledge about users. Sophisticated user modelling methods involving supervised learning techniques have also been demonstrated. In the domain of news, for example, Billsus and Pazzani (2000) inferred users’ news preferences from interaction data, specifically in desktop environments. In the mobile environment, such methods have been applied to search engines (Bertini et al., 2005), news article preferences (Carreira et al., 2004), and the use of function usage histories to refine menu displays (Fukazawa et al., 2009).
2.4.3.2 Collaborative Filtering
The paradigm of Collaborative Filtering (CF) has been widely applied in recommender systems in a variety of domains, including movies, music and news, and has proven to be an appropriate technique for recommending content. The principal idea of this paradigm lies in the automatic prediction (filtering) of a user’s interests by collecting the preferences that other users have previously expressed (collaborating) (Schafer et al., 2007; Das et al., 2007; Linden et al., 2003; Sarwar et al., 2001). A more detailed survey of CF techniques was conducted by Su and Khoshgoftaar (2009), who discussed the main challenges, such as data sparsity and scalability, and presented three main categories of CF techniques: memory-based, model-based and hybrid. A movie recommender system (Melville et al., 2002), for example, will generate recommendations based on how other users have rated movies rather than on the user’s interests and preferences.
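A memory-based variant of this idea can be sketched as follows. The users, movies and ratings are invented, and cosine similarity over co-rated items is only one of several possible similarity measures, chosen here as an assumption for brevity. The sketch does not reproduce the method of any cited system.

```python
import math

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}
RATINGS = {
    "ann": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 4, "Heat": 5, "Up": 2, "Jaws": 5},
    "cat": {"Alien": 1, "Heat": 2, "Up": 5, "Jaws": 1},
}

def similarity(a, b):
    """Cosine similarity computed over the items that both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict_rating(user, item, ratings=RATINGS):
    """Similarity-weighted average of other users' ratings; None if nobody rated the item."""
    pairs = [
        (similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    total = sum(sim for sim, _ in pairs)
    if total == 0:
        return None
    return sum(sim * value for sim, value in pairs) / total
```

Here the prediction for ann on "Jaws" lands closer to bob's rating than to cat's, because ann's past ratings resemble bob's more closely. When no other user has rated the item, the function returns None, which is the situation that the cold-start problem describes.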
Although this idea is intriguing, collaborative filtering techniques suffer from the same problems as classification learning. For example, the method is not applicable if too few users have previously rated or annotated items with their preferences. Further, even if the other users’ interests and preferences are fresh and relevant to the user now, they might no longer be relevant by the time the algorithm has obtained enough responses from other users to generate recommendations. A related major challenge in CF is known as the cold-start problem, which describes the difficulty of making recommendations when users or items are new. For example, it is difficult to produce an accurate recommendation when no or very few interests/ratings are available to infer the interest/rating of a new user (Schein et al., 2002). To overcome these limitations, many techniques have been proposed that employ additional information, such as demographic information about a new user (Loh et al., 2009; Kim et al., 2010) or content information about the new item (Leung et al., 2008).
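One simple way to picture the use of demographic information for a new user is a fallback to the average rating of demographic peers. The sketch below is purely illustrative: the age groups, users and ratings are invented, the function name `cold_start_rating` is hypothetical, and the approach is a deliberately naive stand-in rather than the technique of Loh et al. (2009) or Kim et al. (2010).

```python
from statistics import mean

# Hypothetical data: existing ratings and a demographic attribute per user
RATINGS = {
    "bob": {"Jaws": 5, "Up": 2},
    "cat": {"Jaws": 1, "Up": 5},
    "dan": {"Jaws": 4},
}
AGE_GROUP = {"bob": "18-30", "cat": "60+", "dan": "18-30", "eve": "18-30"}

def cold_start_rating(new_user, item):
    """Estimate a rating for a user with no rating history.

    Falls back from the mean of demographic peers, to the overall mean,
    to None when the item itself is also new.
    """
    group = AGE_GROUP.get(new_user)
    peers = [r[item] for u, r in RATINGS.items()
             if item in r and AGE_GROUP.get(u) == group]
    everyone = [r[item] for r in RATINGS.values() if item in r]
    if peers:
        return mean(peers)
    if everyone:
        return mean(everyone)
    return None
```

In this toy setting the new user eve has rated nothing, yet she still receives an estimate for "Jaws" from the two users in her age group. An item that nobody has rated still yields no estimate, which is where content information about the item would be needed.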
2.4.3.3 Stereotypes
The use of stereotypes is one of the oldest paradigms proposed in user modelling, but it is less widely used. It has its origin in the work of Elaine Rich (1979, 1989). In this approach, an individual user is associated with a class of users, and facts about the class are then attributed to the individual.
The general approach in the previous paradigms has been the acquisition (learning) of a user model at the feature level. Significantly, this approach is entirely data-driven (i.e., bottom-up) and makes use of virtually no “general knowledge about users, their goals, or the items that they are dealing with” (Jameson, 2009). An alternative, top-down approach is to use prior knowledge or theoretical models to determine a category (stereotype) for a user. This approach enables inferences about other characteristics of a user, such as task expertise and personality traits; it supports reasoning about the adaptation of sets of interface features and sets of interface variants. It enables user interface adaptation based on matching complete user interface variants to particular user categories. Specifically, it provides a mapping from specific features or attributes to one of the stereotypes. The emphasis is less on sophisticated computation than on realistic specification of the content of the stereotypes and the rules for activating them (Jameson, 2009). This stereotypical approach has been applied to user modelling in natural dialogues (Carberry et al., 2012), accessible systems for users with disabilities (Stephanidis et al., 1998), and digital guides for museum visitors (Kuflik et al., 2012). For example, in order to provide personalised information presentation in the context of mobile museum guides, Kuflik et al. (2012) monitored and modelled users’ visiting patterns. The modelling was achieved using a known classification of museum visiting styles (Veron and Levasseur, 1983) to classify the visiting style of visitors as they start their visit. Another example is the Europeana project for cultural heritage, which offers a typology of cultural heritage objects (Haslhofer and Isaac, 2011; Oomen and Aroyo, 2011).
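The two ingredients emphasised above, the content of a stereotype and the rule that activates it, can be sketched as a small data structure. Everything in this example is hypothetical: the stereotype names, the `shortcut_ratio` observation, the 0.5 threshold and the interface settings are invented for illustration and are not drawn from the cited systems.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class Stereotype:
    name: str
    is_active: Callable[[Dict[str, Any]], bool]            # activation rule
    facts: Dict[str, Any] = field(default_factory=dict)    # content attributed to members

# Ordered from most to least specific; the last entry acts as a default.
STEREOTYPES = [
    Stereotype("expert",
               lambda obs: obs.get("shortcut_ratio", 0.0) >= 0.5,
               {"menu": "compact", "show_hints": False}),
    Stereotype("novice",
               lambda obs: True,
               {"menu": "full", "show_hints": True}),
]

def build_user_model(observations):
    """Assign the first stereotype whose rule fires and attribute its facts to the user."""
    for stereotype in STEREOTYPES:
        if stereotype.is_active(observations):
            return {"stereotype": stereotype.name, **stereotype.facts}
    return {"stereotype": None}
```

A single observed attribute is enough to select a complete interface variant, for example `build_user_model({"shortcut_ratio": 0.8})` yields the compact menu without hints. This reflects the top-down character of the approach: the knowledge lies in the hand-specified facts and rules rather than in a learned model.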
This thesis explores the combination of top-down (stereotypes) and bottom-up (data-driven) approaches, and makes use of knowledge from stereotypes to inform the user modelling procedure.