Vol 10, No 1 (2014)


IJCSBI.ORG

ISSN: 1694-2108 | Vol. 10, No. 1. FEBRUARY 2014 41

Improving Recommendation Quality with Enhanced Correlation Similarity in Modified Weighted Sum

Khin Nila Win

Faculty of Information and Communication Technology,

University of Technology

Yatanarpon Cyber City

Thiri Haymar Kyaw

Faculty of Information and Communication Technology,

University of Technology

Yatanarpon Cyber City

ABSTRACT

Recommender systems aim to help users find items of interest in large data collections with little effort. These systems use various recommendation approaches to provide increasingly accurate recommendations. Among them, collaborative filtering (CF) is the most widely used approach. Of the two types of CF system, item-based CF has overtaken traditional user-based CF because it overcomes the scalability problem of the user-based approach. An item-based CF system predicts a user's taste for new items from the item similarities derived from users' explicit ratings; that is, it predicts ratings on new items from the users' historical ratings. The proposed system improves the item-based collaborative filtering approach by enhancing the rating similarity of items with their demographic similarity. It modifies one of the prediction methods, weighted sum, so that it is weighted by the enhanced similarity of the items. The system intends to offer better prediction quality than other approaches and to produce better recommendations by considering item-demographic similarity together with the similarity derived from users' explicit ratings.

Keywords

Recommender systems, collaborative filtering approach, item-based CF system, user-based CF systems, demographic similarity, weighted sum.

1. INTRODUCTION


motivation of web mining is to discover users' access models automatically and quickly from the vast amount of Web log data, such as frequent access paths, frequent access page groups and user clustering. More recently, Web usage mining has been proposed as an underlying approach for Web personalization. The goal of personalization based on Web usage mining is to recommend a set of objects to the current (active) user, possibly consisting of links, ads, text, products, or services, tailored to the user's perceived preferences as determined by the matching usage patterns [1].

2. MEMORY-BASED TECHNIQUES IN RECOMMENDER SYSTEMS

Memory-based techniques continuously analyze all user or item data to calculate recommendations, and can be classified into the following main groups: collaborative filtering, content-based techniques, and hybrid techniques [2]. While content-based techniques base their recommendations on individual information and ignore contributions from other users, collaborative filtering systems emphasize the preferences of similar users or items. Since the proposed system uses collaborative filtering techniques, this paper omits explanations of the other techniques and concentrates on the analysis of collaborative filtering.

2.1 Collaborative Filtering Techniques (CF)

This approach recommends items that were used by similar users in the past; it bases its recommendations on social, community-driven information (e.g., user behavior such as ratings or implicit histories).

Table 1. Special types and special characteristics of Memory-based CF Techniques

| Special type of Memory-based CF techniques | Pros | Cons |
| --- | --- | --- |
| Neighborhood-based CF; item-based/user-based top-N recommendations | Easy to implement; new data can be added easily; no need to consider the content of the items in recommendation | Reliant on human ratings; performance may degrade when data are sparse; problems in recommendation for new users and items; limited scalability for large datasets |

Memory-based collaborative filtering techniques have special characteristics and representative techniques. Table 1 describes the pros and cons of memory-based CF techniques [2].

A user-based CF algorithm first finds a set of k users similar to the target user, based on correlations or similarities between their records and the target user's. It then produces a prediction for the target user on unrated items from the similar users' ratings. This approach suffers from a scalability problem in large-scale recommender systems.

In contrast, item-based CF algorithms attempt to find k similar items that are co-rated similarly by different users. The similarity computations are performed among items. Thus, item-based CF algorithms avoid the bottleneck of user-based algorithms by first considering the relationships among items. For a target item, predictions can be generated by taking a weighted average of the target user's ratings on these similar items [3, 6].

2.1.1 Similarity Computation

Most of the recommender systems usually use three similarity computing techniques: Cosine-based Similarity, Correlation-based Similarity, and Adjusted Cosine Similarity. The proposed system uses adjusted cosine similarity for similarity computation.

2.1.1.1 Adjusted Cosine Similarity Vs. Modified Adjusted Cosine Similarity

1) Adjusted Cosine Similarity

Computing similarity with the basic cosine measure in an item-based recommendation system has one important weakness: differences in rating scale between users are not taken into account. The adjusted cosine similarity subtracts the corresponding user average from each co-rated pair to offset this drawback. However, it still has one drawback: the different rating styles of different users are not taken into account.

Adjusted cosine similarity subtracts the user's average rating from the ratings of user u on items i and j respectively. It then computes the similarity value as shown in Eq. (1).

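$$sim_{i,j} = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}} \qquad (1)$$

In Eq. (1), $U$ is the set of users who rated both items i and j, and $\bar{R}_u$ is the average value of the u-th user's ratings [4].

2) Modified Adjusted Cosine Similarity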


Adjusted cosine similarity still ignores the casual rating styles of users. For this reason, the proposed system improves the computation by normalizing the rating values.
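The adjusted cosine computation described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the `ratings` dictionary, its values, and the function name `adjusted_cosine` are hypothetical, and the user mean is taken over all items the user has rated.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Values are made up for illustration.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 3, "B": 3, "C": 2},
    "u3": {"A": 4, "B": 5},
}


def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity of items i and j over users who rated both."""
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            # Subtract the user's own average to offset differences in rating scale.
            mean_u = sum(user_ratings.values()) / len(user_ratings)
            di = user_ratings[i] - mean_u
            dj = user_ratings[j] - mean_u
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        # No co-rating users, or no deviation from the mean: treat as no similarity.
        return 0.0
    return num / (sqrt(den_i) * sqrt(den_j))


print(adjusted_cosine(ratings, "A", "B"))
```

Returning 0.0 when a denominator vanishes is a design choice of this sketch; a production system might instead skip such item pairs.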

Table 2. Enhanced Correlation Similarity Values Vs. Simple Modified Adjusted Cosine Similarity Values

| Modified Adjusted Cosine Similarity ($sim_{i,j}$) | Demographic Similarity or Content Similarity of Items ($dem\_cor_{i,j}$) | Enhanced Correlation Similarity ($enh\_cor_{i,j} = sim_{i,j} + (sim_{i,j} \times dem\_cor_{i,j})$) |
| --- | --- | --- |
| 0.5 | 0.2 | 0.6 |
| 0.3 | 0.4 | 0.42 |
| 0.6 | 0.2 | 0.72 |
| 0.4 | 0.8 | 0.72 |
| 0.5 | 0.5 | 0.75 |
| 0.8 | 0.1 | 0.88 |
| 0.7 | 0.3 | 0.91 |


In Eq. (2), $\bar{R}_u$ is the average value of the u-th user's ratings.

$$NR_{u,i} = \frac{HS}{HR_u} \times R_{u,i} \qquad (3)$$

In Eq. (3),

$HS$ means the highest rating scale of the system,

$HR_u$ means the highest rating scale of the current user.

Considering the topic similarity of items,

$$enh\_cor_{i,j} = sim_{i,j} + (sim_{i,j} \times dem\_cor_{i,j})$$

where $sim_{i,j}$ is the similarity of item i and item j from the adjusted cosine similarity after normalizing the user's rating behaviour, and $dem\_cor_{i,j}$ is the similarity of item i and item j according to topic similarity.

Table 2 shows how the enhanced correlation similarity is computed and demonstrates how the demographic similarity improves the modified adjusted cosine similarity value.
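As a quick check of the arithmetic, the sketch below recomputes the third column of Table 2 from the first two and illustrates rating normalization. It assumes that Eq. (3) rescales a raw rating by HS / HR_u, which is how the definitions of HS and HR_u read; the function names and the example rating are hypothetical.

```python
def normalize_rating(r, hs, hr_u):
    """Assumed reading of Eq. (3): scale rating r so the user's maximum maps to hs."""
    return hs / hr_u * r


def enhanced_correlation(sim_ij, dem_cor_ij):
    """enh_cor = sim + (sim * dem_cor), as given in the header of Table 2."""
    return sim_ij + sim_ij * dem_cor_ij


# Rows of Table 2: (sim, dem_cor, enh_cor reported in the table).
table2 = [
    (0.5, 0.2, 0.6),
    (0.3, 0.4, 0.42),
    (0.6, 0.2, 0.72),
    (0.4, 0.8, 0.72),
    (0.5, 0.5, 0.75),
    (0.8, 0.1, 0.88),
    (0.7, 0.3, 0.91),
]

for sim, dem, reported in table2:
    computed = enhanced_correlation(sim, dem)
    print(f"sim={sim} dem_cor={dem} enh_cor={computed:.3f} (table: {reported})")

# Hypothetical example: a user whose highest rating is 4 on a 5-point system scale.
print(normalize_rating(3, hs=5, hr_u=4))
```

Because the demographic similarity is non-negative in these rows, the enhanced value is never smaller than the rating-based similarity it starts from.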

2.1.2 Prediction Computation

To make a recommendation, a recommender system first computes prediction values and then recommends items according to those values. Weighted sum is one of the most widely used prediction techniques. However, it uses only the rating-based similarity of the items. The proposed system enhances the weighted sum technique by using the enhanced correlation similarity instead of the adjusted cosine similarity value. Enhanced correlation similarity is the similarity value in which the modified adjusted cosine similarity is enhanced with the demographic similarity of the two items.

2.1.2.1 Weighted Sum Vs. Modified Weighted Sum

1) Weighted Sum

The weighted sum technique computes the prediction as the sum of the user's ratings on the items similar to i. Each rating is weighted by the corresponding similarity $sim_{i,j}$ between items i and j. Eq. (4) gives the formula for prediction computation with weighted sum.

$$P_{u,i} = \frac{\sum_{\text{all similar items},\, N} (sim_{i,N} \times R_{u,N})}{\sum_{\text{all similar items},\, N} (|sim_{i,N}|)} \qquad (4)$$
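The weighted sum prediction can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `weighted_sum`, the dictionary layout, and the toy numbers are hypothetical, and similar items that the user has not rated are simply skipped.

```python
def weighted_sum(user_ratings, similarities):
    """Predict a rating for a target item.

    user_ratings: {item: rating} given by the target user.
    similarities: {item: similarity between that item and the target item}.
    """
    num = den = 0.0
    for item, sim in similarities.items():
        if item in user_ratings:  # only items the user has actually rated contribute
            num += sim * user_ratings[item]
            den += abs(sim)
    # Without any rated similar item there is nothing to base a prediction on.
    return num / den if den else None


# Hypothetical toy values.
user_ratings = {"A": 4, "B": 2}
similarities = {"A": 0.5, "B": 0.5, "C": 0.9}  # "C" is unrated, so it is ignored
print(weighted_sum(user_ratings, similarities))
```

Dividing by the sum of absolute similarities keeps the prediction within the range of the ratings that contribute to it.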


2) Modified Weighted Sum

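In the Modified Weighted Sum in Eq. (5), each normalized rating $NR_{u,N}$, given by Eq. (6), is weighted by the enhanced correlation similarity $enh\_cor_{i,N}$. The prediction $P_{u,i}$ is denoted as

$$P_{u,i} = \frac{\sum_{\text{all similar items},\, N} (enh\_cor_{i,N} \times NR_{u,N})}{\sum_{\text{all similar items},\, N} (|enh\_cor_{i,N}|)} \qquad (5)$$

In Eq. (5),

$$NR_{u,N} = \frac{HS}{HR_u} \times R_{u,N} \qquad (6)$$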

Modifying the weighted sum with the enhanced correlation similarity yields more accurate predictions than the existing systems. Each of the systems that considers item demographic data produces prediction quality more than 9% higher than the systems that do not.
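A minimal sketch of the modified weighted sum follows. It rests on several assumptions that the text does not spell out: HR_u is taken to be the largest rating the user has given, a missing demographic similarity is treated as 0, and the normalized rating is HS / HR_u times the raw rating. The function name and the toy values are hypothetical.

```python
def modified_weighted_sum(user_ratings, sims, dem_cors, hs):
    """Weighted sum over normalized ratings, weighted by enhanced correlation.

    user_ratings: {item: raw rating} given by the target user.
    sims:         {item: rating-based similarity to the target item}.
    dem_cors:     {item: demographic (topic) similarity to the target item}.
    hs:           highest rating on the system's scale.
    """
    hr_u = max(user_ratings.values())  # assumption: the user's highest given rating
    num = den = 0.0
    for item, sim in sims.items():
        if item not in user_ratings:
            continue
        enh_cor = sim + sim * dem_cors.get(item, 0.0)  # assumption: missing dem_cor = 0
        nr = hs / hr_u * user_ratings[item]            # normalized rating
        num += enh_cor * nr
        den += abs(enh_cor)
    return num / den if den else None


# Hypothetical toy values on a 5-point scale.
user_ratings = {"A": 4, "B": 2}
sims = {"A": 0.5, "B": 0.5}
dem_cors = {"A": 0.2, "B": 0.2}
print(modified_weighted_sum(user_ratings, sims, dem_cors, hs=5))
```

With equal weights on both items, the prediction here is the mean of the normalized ratings 5 and 2.5, which shows how normalization lifts the ratings of a user who never uses the top of the scale.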

3. RELATED WORKS

Recommendation techniques have been applied in many areas since the mid-1990s. Some researchers have developed recommender systems for songs. Popular music recommendation systems of the early 2000s include [7], [8], [9], [10]. In e-learning systems, web mining techniques are used to learn all available information about learners and to build models for personalization. A detailed description of using and applying educational data mining was given in (Romero et al., 2006) and (Romero et al., 2007) [11]. Many resources and supporting techniques, such as [12], [13], have been developed for recommendation and personalization.

Many collaborative systems have been developed in academia and industry. The Grundy system [14] was the first recommender system; it proposed using stereotypes as a mechanism for building models of users based on a limited amount of information about each individual user. Later on, the Tapestry system relied on each user to identify like-minded users manually [15]. GroupLens [16, 17], Video Recommender [18], and Ringo [19] were the first systems to use collaborative filtering algorithms to automate prediction.

4. REAL RECOMMENDER SYSTEM

Most earlier recommender systems for learning resources have difficulty determining the recommended pages accurately because they ignore the rating style of the current user. The proposed system, Recommender System for Resources and Educational Assistants for Learners, overcomes this challenge by normalizing the current user's rating style. In the similarity computation, the system considers the rating similarity together with the topic similarity of resource pages. To avoid the cold-start problem for users that earlier systems encountered, the proposed system uses stereotypes, or demographic CF. As a result, the system takes advantage of both item-based CF and stereotype-based (demographic) CF. Moreover, the system avoids the scalability and quality bottleneck of user-based CF since it uses item-based collaborative filtering techniques.

Modifying adjusted cosine similarity with normalized user ratings and modifying weighted sum with enhanced correlation similarity not only determines more accurately what the user likes most but also produces higher prediction quality than systems that ignore item demographic data and rely only on user ratings. The system can reduce the mean absolute error (MAE) between the predicted and actual ratings of users thanks to the modified adjusted cosine similarity and the modified weighted sum.

5. CASE STUDY OF RESOURCES AND EDUCATIONAL ASSISTANTS RECOMMENDATION

The following tables show the case study of resources and educational assistants recommendation. The first column of Table 3 shows all links that the current user u has rated; the second column shows the links to be predicted for the current user, since they are links the current user has not rated.

Table 3. The links which the current user has rated and other links which the current user has not rated but other users have rated

The links current user has rated

The links current user has not rated but other users

has rated IEEE seminar topics on

networking 2011-2012 Social Networking Electronics &

Communication Project Topics

LAN Monitoring and Controlling

Network Books of Free Computer Books

LAN & WAN IPv6

JavaWorld:Solutions for Java Developers


Table 4 lists, for each link to be predicted, the links co-rated with it. Fig. 1 shows that, among the links co-rated with the predicted link LAN & WAN, the current user has rated four and has not rated the other three. In Fig. 2, there are three co-rated links that the current user has already rated and four that the user has not. Unfortunately, Fig. 3, 4, and 5 contain no co-rated links that the current user has rated. According to this result, the three corresponding links are unlikely to interest the current user. Finally, the system recommends two links, LAN & WAN and IPv6, according to the prediction values.

Table 4. Predicted links with their similar links

The links to predict

for current user The links similar to the link to be predicted LAN & WAN Social Networking

LAN Monitoring and Controlling

Network Books of Free Computer Books Unified Communications of Infoworld Networking of Infoworld

Social Hubs, IPv6

IPv6 Network Books of Free Computer Books Social Networking

IEEE seminar topics on networking 2011-2012 LAN & WAN

Mobile Java Java & XML Java Security JavaWorld:Solutions

for Java Developers

Core Java Java & XML

Web Services & SOAs Swing/GUI Programming Java Security

Mobile Java Core Java Java Security

JavaWorld:Solutions for Java Developers LAN & WAN

Network Books of Free Computer Books Core Java Mobile Java

Swing/GUI Programming Docjar


[Figure panels: Fig. 1, Fig. 2, Fig. 3, Fig. 4, Fig. 5, Fig. 6]

Fig. 1 - Fig. 5. Co-rated links for the respective predicted links


6. EVALUATION OF THE SYSTEM

A recommender system can be evaluated by comparing its recommendations with a test set of known user ratings. Such systems are measured using predictive accuracy metrics [5, 6], where the predicted ratings are directly compared with actual user ratings. The most commonly used metric is Mean Absolute Error (MAE), which is the average absolute difference between predicted ratings and actual ratings. Eq. (7) gives the computation of the MAE value.

$$MAE = \frac{\sum_{\{u,i\}} |P_{u,i} - r_{u,i}|}{N} \qquad (7)$$

In Eq. (7),

$P_{u,i}$ is the predicted rating of user u on item i,

$r_{u,i}$ is the actual rating of user u on item i,

$N$ is the number of ratings in the test set.

The proposed system can reduce MAE by applying both demographic correlation and rating similarity of items.
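The MAE computation can be sketched in a few lines of Python. The function name and the toy rating lists are hypothetical; the two lists are assumed to be aligned so that position k in each refers to the same user and item pair.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)


# Hypothetical test set of three (user, item) pairs.
predicted = [4, 3, 5]
actual = [5, 3, 3]
print(mean_absolute_error(predicted, actual))  # (1 + 0 + 2) / 3 = 1.0
```

A lower MAE means the predicted ratings lie closer to the ratings users actually gave.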

6.1.1 Comparison of MAE Values

The following table compares the MAE of the proposed system with that of the existing system, which uses adjusted cosine for similarity computation and weighted sum for prediction computation.

Table 5. Comparison of MAE Values

| MAE Values for Existing System with Adjusted Cosine and Weighted Sum | MAE Values for Proposed System |
| --- | --- |
| 1.48 | 0.68 |
| 1.6 | 1.045 |
| 2.987 | 2.635 |
| 1.96 | 1.93 |
| 1.92 | 1.87 |
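One way to summarize Table 5 is to average each column and compute the relative reduction in MAE. The sketch below does only that arithmetic on the values printed in the table; the source does not state what each row represents, so the averaging over rows is an assumption of this illustration and not a result reported by the authors.

```python
# MAE values copied from Table 5.
existing = [1.48, 1.6, 2.987, 1.96, 1.92]   # adjusted cosine + weighted sum
proposed = [0.68, 1.045, 2.635, 1.93, 1.87]  # proposed system


def mean(values):
    return sum(values) / len(values)


mean_existing = mean(existing)
mean_proposed = mean(proposed)
# Relative reduction of the averaged MAE (assumption: rows are equally weighted).
reduction = (mean_existing - mean_proposed) / mean_existing

print(f"mean MAE, existing system: {mean_existing:.4f}")
print(f"mean MAE, proposed system: {mean_proposed:.4f}")
print(f"relative reduction: {reduction:.1%}")
```

In every row the proposed system's MAE is lower than the existing system's, although the size of the gap varies considerably from row to row.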

7. CONCLUSIONS

Recommendation systems are a very popular research area. They are applied in many domains, such as book recommendation [1], movie recommendation, and music recommendation. However, there are few recommendation systems for learning resources. Recommendation techniques are sometimes used in e-learning systems, mostly to make it more convenient for learners to access the learning resources those systems provide. Such systems are based on user log data and rarely on ratings. Moreover, because most learning resources in such systems are e-books and audio/video lectures, the proposed system intends to fill this gap in resources. The learning resources in the proposed system are not only e-books and audio/video lectures but also educational hyperlinks from the web. The topics of these links relate to various fields such as information and communication technology, computer science, digital signal processing, personalized information management, security challenges in mobile networks, and management of cloud services. Moreover, appropriate international scholar universities are recommended according to the user profiles. Finally, the proposed system intends not only to fill the lack of resources for learners by providing rich topics from the web but also to offer more accurate predictions through the proposed methods.

REFERENCES

[1] Bamshad Mobasher, DePaul University, “Web Usage Mining and Personalization”

[2] Hendrik Drachsler, Hans G.K. Hummel and Rob Koper, “Personal recommender systems for learners in lifelong learning networks: the requirements, techniques and model”

[3] Good, N., Schafer, J.B., Konstan, J.A., Borchers, A., Sarwar, B., Herlocker, J., Riedl, J.: “Combining collaborative filtering with personal agents for better recommendations”. Proceedings of AAAI 99 (1999) 439-446

[4] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl, “Item-based Collaborative Filtering Algorithms”, ACM 1-58113-348-0/01/0005, May 1-5, 2001, Hong Kong.

[5] Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, and David M. Pennock, "Methods and metrics for cold-start recommendations", SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pages 253–260, New York, NY, USA, 2002. ACM.

[6] L. Candillier, F. Meyer, F. Fessant, and K. Jack, “State-of-the-art recommender systems,” 2009.

[7] Shao, B., Wang, D., Li, T., and Ogihara, M. (2009). Music recommendation based on acoustic features and user access patterns. IEEE Transactions on Audio, Speech And Language Processing, 17(8):1602–1611.


[9] Music recommendation from song sets. In 5th International Conference on Music Information Retrieval, pages 425–428.

[10] Hu, Yajie, "A Music Recommendation System Based on User Behaviors and Genre Classification" (2012). Open Access Theses. Paper 336.

[11] Khribi, M. K., Jemni, M., & Nasraoui, O. (2009). Automatic Recommendations for E-Learning Personalization Based on Web Usage Mining Techniques and Information Retrieval. Educational Technology & Society, 12 (4), 30–42.

[12] Brusilovsky, P., & Henze, N. (2007). Open Corpus Adaptive Educational Hypermedia. In P. Brusilovsky, A. Kobsa & W. Nejdl (Eds.), The Adaptive Web: Methods and Strategies of Web Personalization (pp. 671-696). Heidelberg, Germany: Springer-Verlag.

[13] Brusilovsky, P., Sosnovsky, S., & Shcherbinina, O. (2005). User Modeling in a Distributed E-Learning Architecture. In L. Ardissono, P. Brna & M. A. (eds.), Proceedings of 10th International Conference on User Modeling (UM'2001), Edinburgh, UK (pp. 387-391).

[14] Rich, E. User Modeling via Stereotypes. Cognitive Science, 3(4):329-354, 1979.

[15] Goldberg, D., D. Nichols, B. M. Oki, and D. Terry. “Using collaborative filtering to weave an information tapestry”, Communications of the ACM, 35(12):61-70, 1992.

[16] Konstan, J. A., B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl. GroupLens: “Applying collaborative filtering to Usenet news”. Communications of the ACM, 40(3):77-87, 1997.

[17] Resnick, P., N. Iakovou, M. Sushak, P. Bergstrom, and J. Riedl. GroupLens: “An open architecture for collaborative filtering of netnews”. In Proceedings of the 1994 Computer Supported Cooperative Work Conference, 1994.

[18] Hill, W., L. Stead, M. Rosenstein, and G. Furnas. “Recommending and evaluating choices in a virtual community of use”. In Proceedings of CHI’95.

[19] Shardanand, U. and P. Maes. Social information filtering: “Algorithms for automating 'word of mouth'”. In Proc. of the Conf. on Human Factors in Computing Systems, 1995.

This paper may be cited as: