Efficient Recommendation for Smart TV Contents

3 Efficient Recommendation for Smart TV Environment

In this paper, we propose an efficient recommendation method for the smart TV environment; its overall structure is shown in Figure 1 [14]. In the method, users are first clustered into groups of similar preference patterns based on their past view records, then user-based collaborative filtering is performed, and finally the result is integrated with other recommendation methods to produce more personalized recommendation results. The recommendation process is composed of 1) the user clustering step, 2) the user-based collaborative filtering step, and 3) the recommendation integration step.

3.1 User Clustering

In this step, users of similar preference patterns are clustered based on their view records and information of selected contents. Particularly in the smart TV environment, users' preference patterns are so diverse that user clustering is important for more tailored services to users. Without user clustering, we notice that we can hardly find any significant preference patterns that apply to all users. In the following we describe our user clustering method.

Fig. 1. The structure of the proposed recommendation method for smart TV environment

Smart TV contents are classified by genres, content providers, etc., and this classification constitutes the menu structure. The menu structure is built by the service provider reflecting its service policy. We analyzed a few menu structures to derive 269 common categories for smart TV contents. For user clustering, each user is modeled by a frequency vector of those content categories. For smart TV, however, a simple scoring based on the frequency of choices of contents may not be appropriate because such frequencies are biased toward best-selling contents and series contents containing many episodes, and it can result in a biased user clustering. In this paper we propose a new scoring technique based on CF-IUF (category frequency-inverse user frequency) as given by equations (3) and (4), a modification of TF-IDF, which is a well-known concept for information retrieval [14].

$$
s_{u,q} =
\begin{cases}
\left(1 + \log f_{u,q}\right) \cdot \log \dfrac{N}{n_q}, & \text{if } f_{u,q} > 0 \\
0, & \text{if } f_{u,q} = 0
\end{cases}
\tag{3}
$$

$$
\mathbf{s}_u = \left(s_{u,1}, s_{u,2}, \ldots, s_{u,m}\right) \tag{4}
$$

Here, $s_{u,q}$ represents the CF-IUF of user u for category q as given in equation (3). In the equation, $f_{u,q}$ denotes the number of views of category q by user u, $n_q$ represents the number of users who viewed category q, and $N$ represents the total number of users. Using CF-IUF for each category, an m-dimensional (m: the number of categories) vector $\mathbf{s}_u$ is created for user u as in equation (4). This vector represents the individual user's preferences for categories, and it provides an effective user modeling at an appropriate level of abstraction with a significant dimension reduction.
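The CF-IUF scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the nested-dictionary layout of `views`, the function name `cf_iuf_vectors`, the sample data, and the use of the natural logarithm are all assumptions made for the example.

```python
import math
from collections import defaultdict


def cf_iuf_vectors(views, categories):
    """Build one CF-IUF vector per user.

    views: dict user -> dict category -> number of views (hypothetical layout).
    categories: ordered list of category names; fixes the vector dimensions.
    """
    total_users = len(views)

    # Number of users who viewed each category at least once.
    users_per_category = defaultdict(int)
    for counts in views.values():
        for category, count in counts.items():
            if count > 0:
                users_per_category[category] += 1

    vectors = {}
    for user, counts in views.items():
        vector = []
        for category in categories:
            count = counts.get(category, 0)
            if count > 0:
                cf = 1 + math.log(count)  # damped category frequency
                iuf = math.log(total_users / users_per_category[category])
                vector.append(cf * iuf)
            else:
                vector.append(0.0)
        vectors[user] = vector
    return vectors


# Hypothetical sample data: three users, three categories.
views = {
    "u1": {"drama": 10, "news": 1},
    "u2": {"drama": 3},
    "u3": {"drama": 5, "sports": 4},
}
vectors = cf_iuf_vectors(views, ["drama", "news", "sports"])
```

In this toy data every user watched "drama", so its inverse user frequency is log(3/3) = 0 and the category carries no weight in any vector, which is the intended damping of popular categories.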

In user clustering, users of similar preference patterns are grouped together by the ISODATA algorithm, which minimizes the sum of squared errors between data points and their closest cluster centers and automatically determines the number of clusters by splitting and merging clusters during iteration [15]. The CF-IUF user modeling not only significantly reduces the time for user clustering but also yields a clustering that is not biased toward high-frequency contents such as best-selling contents.

3.2 User-Based Collaborative Filtering

Using the user clustering information, user-based collaborative filtering is performed for recommendation. Our method differs from conventional approaches in that users are first clustered into fairly loose clusters of similar users, and the preference of a target user is estimated from the preferences of similar users within the cluster to which the target user belongs. The method is efficient when the numbers of both users and items are large, as in smart TV. Using real IPTV data, [12] shows that series contents containing many different episodes are recommended more than 50% of the time, which is trivial and of little help to users. To avoid this problem, we use contents at the series level instead of the episode level.

In user-based collaborative filtering we use two correlation coefficients, Pearson's correlation coefficient (PCC) and Spearman's rank correlation coefficient (SCC), to extract similar users from a user cluster. PCC is a widely used similarity measure in conventional collaborative recommendation and is computed based on the numbers of users' views of contents as in equation (1). SCC uses the ranks of users' views, and the coefficient between users u and v is computed as in equation (5). In the equation, $r_{u,c}$ represents the rank of the preference of user u for content c, and $C_{u,v}$ represents the set of contents that both users u and v selected. Here UCFP and UCFS denote the user-based collaborative filtering using PCC and the user-based collaborative filtering using SCC, respectively. The K most similar users are selected first, and the preference of the target user for the target content is estimated using their preferences for that content.

$$
SCC(u,v) = 1 - \frac{6 \sum_{c \in C_{u,v}} \left(r_{u,c} - r_{v,c}\right)^2}{|C_{u,v}| \left(|C_{u,v}|^2 - 1\right)} \tag{5}
$$

As equation (6) shows, the final preference ($\hat{p}_{u,c}$) is computed as the weighted sum of the preferences of users similar to the target user. In the equation, $S_u$ represents the set of similar users to user u, $w_{u,v}$ represents the weight between users u and v, and $p_{v,c}$ is the preference of user v for content c.

$$
\hat{p}_{u,c} = \frac{\sum_{v \in S_u} w_{u,v} \, p_{v,c}}{\sum_{v \in S_u} w_{u,v}} \tag{6}
$$

We use $w_{u,v} = PCC(u,v)$ in UCFP while we use $w_{u,v} = SCC(u,v)$ in UCFS. The estimated preference is a real number between 0 and 1, and it represents higher preference as it approaches 1.
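The UCFS variant described above (rank-based similarity, selection of the K most similar users, then a similarity-weighted estimate) might be sketched as follows. Several points are assumptions made for illustration rather than details confirmed by the text: the `prefs` dictionary and its sample values are hypothetical, ranks are computed only over the contents both users selected, the textbook Spearman formula is used without tie correction, `prefs` is taken to hold the users of a single cluster, and neighbors with non-positive similarity are discarded so that the estimate stays between 0 and 1.

```python
def ranks(values):
    """Rank items by descending value (1 = most preferred); ties share the average rank."""
    ordered = sorted(values, key=lambda c: -values[c])
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for item in ordered[i:j + 1]:
            result[item] = average_rank
        i = j + 1
    return result


def spearman(prefs_u, prefs_v):
    """Spearman rank correlation over the contents both users selected."""
    common = set(prefs_u) & set(prefs_v)
    n = len(common)
    if n < 2:
        return 0.0  # assumption: too little overlap is treated as no similarity
    rank_u = ranks({c: prefs_u[c] for c in common})
    rank_v = ranks({c: prefs_v[c] for c in common})
    d2 = sum((rank_u[c] - rank_v[c]) ** 2 for c in common)
    return 1 - 6 * d2 / (n * (n * n - 1))


def predict(target, content, prefs, k=2):
    """Estimate the target user's preference for `content` from the k most similar users."""
    candidates = [
        (spearman(prefs[target], prefs[v]), v)
        for v in prefs
        if v != target and content in prefs[v]
    ]
    # Assumption: keep only positively correlated neighbors.
    neighbours = sorted([(w, v) for w, v in candidates if w > 0], reverse=True)[:k]
    total = sum(w for w, _ in neighbours)
    if total == 0:
        return None  # no usable neighbor in this cluster
    return sum(w * prefs[v][content] for w, v in neighbours) / total


# Hypothetical preferences in [0, 1] for the users of one cluster.
prefs = {
    "u": {"A": 0.9, "B": 0.5, "C": 0.1},
    "v": {"A": 0.8, "B": 0.6, "C": 0.2, "D": 0.7},
    "w": {"A": 0.1, "B": 0.5, "C": 0.9, "D": 0.2},
    "x": {"A": 0.7, "B": 0.2, "C": 0.4, "D": 0.3},
}
estimate = predict("u", "D", prefs, k=2)
```

Here user v ranks the shared contents exactly like u (similarity 1), x partly agrees (0.5), and w ranks them in reverse (-1) and is dropped, so the estimate for content D is (1.0 · 0.7 + 0.5 · 0.3) / 1.5.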

3.3 Recommendation Integration

By the nature of the current TV environment, a user corresponds to a household, which may contain several individuals with different preferences. Currently it is difficult to identify individual users from user log data. Accordingly, recommendation has to be made at the household level, not the individual user level. In this step we integrate different recommendation methods for more personalized recommendation. It is still difficult to evaluate the performance of personalized recommendation because log data at the individual user level are not available. However, we propose the category match ratio as a measure for personalized recommendation. The category match ratio is the fraction of recommendations in which the category of the recommended content agrees with the category of the content that the user currently selected. In this paper we investigate different parallel integrations using Borda count and certainty factor. In parallel integration we simply integrate the results of two or more recommendation methods to produce the final recommendation result [5,6,7].
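One possible reading of the category match ratio is sketched below. The text does not spell out how recommendations and selections are paired, so the `events` list of (recommended content, currently selected content) pairs, the `category_of` lookup, and the sample values are all hypothetical.

```python
def category_match_ratio(events, category_of):
    """Fraction of recommendation events whose category matches the current selection.

    events: list of (recommended_content, selected_content) pairs (assumed layout).
    category_of: dict content -> category.
    """
    if not events:
        return 0.0
    matches = sum(
        1 for recommended, selected in events
        if category_of[recommended] == category_of[selected]
    )
    return matches / len(events)


# Hypothetical sample: two of the three recommendations share the selected content's category.
category_of = {"A": "drama", "B": "drama", "C": "news", "D": "drama"}
events = [("A", "B"), ("C", "B"), ("A", "D")]
ratio = category_match_ratio(events, category_of)
```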

The Borda count method adds up the scores that the different methods assign to each content, where a score is usually the inverse of the priority. For example, suppose that content A is recommended as the first priority by one method and as the second priority by another. Then the final score associated with content A is 1 + 1/2 = 1.5.
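The inverse-priority scoring in this example can be sketched as follows. The function name `borda_scores` and the sample rankings are assumptions for illustration; each input list is taken to be ordered from the highest priority downward.

```python
from collections import defaultdict


def borda_scores(rankings):
    """Merge ranked recommendation lists by summing inverse priorities.

    rankings: list of lists, each ordered from first priority to last.
    Returns (content, score) pairs sorted by descending score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for priority, content in enumerate(ranking, start=1):
            scores[content] += 1.0 / priority
    return sorted(scores.items(), key=lambda item: -item[1])


# Content A is first in one list and second in the other, as in the text's example.
merged = borda_scores([["A", "B", "C"], ["D", "A", "B"]])
```

With these two lists, content A receives 1 + 1/2 = 1.5 and is ranked first in the merged result, ahead of D, which appears in only one list.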

Certainty factor is used to represent the uncertainty of knowledge in artificial intelligence. A certainty factor represents the uncertainty of knowledge by a real number between -1 and 1, where 1 and -1 represent truth and falsity, respectively. Two certainty factors associated with the same knowledge are combined as in equation (7). In the equation, $C_a$ and $C_b$ represent the certainty factors of the same knowledge denoted by a and b, respectively, and $C$ denotes the combined certainty factor.

$$
C =
\begin{cases}
C_a + C_b - C_a \cdot C_b, & \text{when } C_a \ge 0,\ C_b \ge 0 \\
C_a + C_b + C_a \cdot C_b, & \text{when } C_a < 0,\ C_b < 0 \\
\dfrac{C_a + C_b}{1 - \min\left(|C_a|, |C_b|\right)}, & \text{otherwise}
\end{cases}
\tag{7}
$$

We use the certainty factor method to integrate different recommendation results by transforming the preferences for each content produced by the individual recommendation methods into certainty factors in [-1, 1]. The certainty factors are combined according to equation (7), and the result is transformed back into the combined preference. For example, suppose that content A is given preference scores 0.9 and 0.7 by two different recommendation methods. The two scores are transformed into certainty factors 0.8 and 0.4, respectively. The certainty factors are combined to produce 0.8 + 0.4 - 0.8 · 0.4 = 0.88, which is transformed back into the preference score 0.94, the combined preference.
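The worked example above can be reproduced with a short sketch. The combination rule below is the usual three-case certainty factor rule. The linear mapping between preference and certainty factor (cf = 2p - 1 and back) is an assumption inferred from the example's numbers (0.9 to 0.8, 0.7 to 0.4, 0.88 to 0.94), not a formula stated in the text, and the function names are hypothetical.

```python
def to_certainty(preference):
    """Map a preference in [0, 1] to a certainty factor in [-1, 1] (assumed linear map)."""
    return 2 * preference - 1


def to_preference(certainty):
    """Inverse of to_certainty."""
    return (certainty + 1) / 2


def combine(ca, cb):
    """Combine two certainty factors attached to the same knowledge."""
    if ca >= 0 and cb >= 0:
        return ca + cb - ca * cb
    if ca < 0 and cb < 0:
        return ca + cb + ca * cb
    # Mixed signs; undefined when the two factors are exactly +1 and -1.
    return (ca + cb) / (1 - min(abs(ca), abs(cb)))


def integrate(p1, p2):
    """Integrate two preference scores for the same content."""
    return to_preference(combine(to_certainty(p1), to_certainty(p2)))


combined = integrate(0.9, 0.7)  # the example from the text, approximately 0.94
```

Two agreeing positive scores reinforce each other (0.94 is higher than either input), whereas scores on opposite sides of 0.5 partly cancel through the mixed-sign case.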
