A CASE STUDY ON THE PERFORMANCE OF PERSONALIZED COLLABORATIVE FILTERING RECOMMENDATION ALGORITHMS

(1)

251 | P a g e

A CASE STUDY ON THE PERFORMANCE OF PERSONALIZED COLLABORATIVE FILTERING

RECOMMENDATION ALGORITHMS

M. Sreedhar Redddy

Scholar,Samskruti College of Engineering and Technology (Affiliated toJNTUH)Hyderabad(India)

ABSTRACT

Collaborative filtering recommender systems (CFRS) systems known preferences of a group of users to make recommendations or predictions of the unknown preferences for other user shave been confirmed to be extremely efficient for personalized and precise recommendations. CFRS systems are based on the recommendations of preceding rankings by a variety of users for a range of products. The research on Collaborative filtering recommender systems, diverse systems has been developed for diversified domains and its functional areas. In addition Collaborative filtering recommender systems algorithms have possession of additional features that construct them different from additional systems. During careful lessons of a large number of Collaborative Filtering recommender algorithms, hierarchies of most important characteristics supports on which one can estimate dissimilar Collaborative Filtering recommended algorithms are proposed.

The method is functional to evaluate Mean Absolute Error (MAE) with factual data. An assessment scheme is also derived based on the forecast worth for the specified features. The outcomes are evaluated throughout investigational tests and results were tabulated, observes the outcome of dissimilar parameters based on the presentation of the algorithm. A proportional investigation has been evaluated with a range of collaborative filtering recommended techniques are found that hybrid collaborative algorithm approach is performs better than other collaborative filtering algorithms.

Keywords: Collaborative Filtering Recommended Algorithm, Mean Absolute Error, Hybrid Collaborative Algorithm

I. INTRODUCTION

The volatile expansion and ease of use of data on the World Wide Web has caused information surplus.

Searching for a query is not easy in the sources of information available for the interest of an individual user.

Recommender system form a specific type of information filtering technique that attempts to present information items that are likely of interest to the user. A recommender system compares the user‟s profile to some reference characteristics, and seeks to predict the „rating‟ that a user would give to an item that they had not yet considered. These characteristics may be from the information items or user‟s social environment Recommender systems are designed to help a user to survive with this situation by selecting a small number of

(2)

252 | P a g e options to present the user. Recommender systems helps to users to overcome information overload by providing personalized suggestions based on a history of a user's likes and dislikes. On-line web portals like Amazon.com [6];eBay.com [7]; Netflix.com [8] etc. are providing these recommending services to know about the personalized interests.Collaborative filtering recommender systems [1] recommend items by identifying other users with similar taste and use their opinions for recommendation; whereas content based recommender systems recommend items based on the content information of the items.These systems suffer from scalability, data sparsity, over specialization, and cold-start problems resulting in poor quality recommendations and reduced coverage[4]. Recommender System is to generate significant recommendations to a collection of users for items or products that might interest them. Content-based filtering and collaborative filtering (CF) are two technologies used in recommender systems. Content-based filtering systems analyze the contents of a set of items together with the ratings provided by individual users to conclude which non-rated items might be of interest for a specific user.

II. COLLABORATIVE FILTERING (CF)

CF is one of the most frequently used technique in personalized recommendation systems which is also making predictions about the interests of a user based on previous preferences by all users based on the idea that users who liked similar things in the past will tend to favor similar items. CF is the process of filtering for information or patterns using techniques involving collaboration amongst numerous agents, viewpoints, statistics sources, etc. They are primarily categorized into three filtering, namely, model based collaborative filtering ,memory based collaborative filtering, and hybrid collaborative filtering techniques [5].

2.1Memory based Collaborative Filtering Algorithms.

Memory-based algorithms make use of the complete user-item database to produce a forecast. These systems utilize statistical techniques to discover a set of users, known as neighbors, that have a past of approving with the intention user (i.e., they whichever tempo different items likewise or they be inclined to purchase related sets of objects). Previously a neighborhood of users is formed[7]. These systems use dissimilar algorithms to merge the preferences of neighbors to create a forecast or top-N recommendation for the dynamic user. The techniques, also known as nearest neighbor or user based collaborative filtering algorithms are more admired and extensively used carry out.

2.2 Model based collaborative filtering

Model based collaborative filtering algorithms present item recommendation by developing a model of user ratings initially. Algorithms in this category take a probabilistic approach and envision the collaborative filtering process as computing the expected value of a user prediction, given user ratings resting on additional objects.

The model building process is performed by different machine learning algorithms such as Bayesian network, clustering, Singular Value Decomposition (SVD) [1] and rule-based approaches.

(3)

253 | P a g e

2.3 Hybrid Collaborative Filtering (HCF)

Considering the inaccuracy of the traditional similarity method in the k-nearest neighbor algorithm, the hybrid algorithm crossing the principal components analysis and KNN thru a hybrid way. Hybrid Collaborative Filtering [8] systems combine CF with other commendation techniques to make predictions or recommendations These algorithms in the contribution rates of features mined by PCA to build the similarity model for users. The experiment results argue that the fresh hybrid algorithm makes personalized recommendations very effective.

Anticipating keeping away from boundaries of whichever recommender system and improving recommendation performance, HCF recommenders are combined by adding content based characteristics to Collaborative Filtering models, adding Collaborative Filtering characteristics to content based models, combining CF with or combine different CF algorithms[9].

III. THE CHARACTERISTICS AND THEIR MOTIVATION

Frequently, a recommender system providing fast and accurate recommendations will attract the interest of customers and bring benefits to business[9]. For Collaborated Filtering systems, producing high quality predictions or recommendations depend on how well they address the challenges, which are characteristics of collaborative filtering tasks as well as are explained.

3.1 Data Sparsity

The data sparsity challenge appears in several situations in collaborative filtering, specifically, the cold start problem [11] occurs when a new user or item has just entered the system; it is difficult to find similar ones because there is not enough. New items cannot be recommended until some users‟ rate it, and new users are unlikely given good recommendations because of the lack of their rating or purchase history. This could reduce the effectiveness of a recommendation system and therefore generating predictions.Hybrid collaborative filtering algorithms [12], such as the content-boosted collaborative filtering algorithm, are found helpful to address the sparsity problem, in which external content information can be used to produce predictions for new users or new items. In Ziegler et al., a hybrid collaborative filtering approach was proposed to utilize bulk information designed for exact product classification to address the data sparsity problem of collaborative filtering recommendations, based on the generation of profiles.

3.2 Scalability

The numbers of accessible users and items produce enormously, traditional collaborative filtering algorithms will endure severe scalability problems, with computational resources going beyond practical or acceptable levels[15]. As well, many systems need to react immediately to online requirements and make recommendations for all users regardless of their purchases and ratings history, which demands a high scalability of a collaborative filtering system [6].

3.3 Synonymy

It refers to the tendency of a number of the same or very similar items to have different names or entries. Most recommender systems are unable to discover this latent association and thus extravagance these products

(4)

254 | P a g e differently. The frequency of synonyms decreases the recommendation performance of collaborative filtering systems. The drawback for fully automatic methods is that some added terms may have different meanings from intended, thus leading to rapid degradation of recommendation performance.

3.4 Gray Sheep

Gray sheep refers to the users whose opinions do not consistently agree or disagree with any group of people and thus do not benefit from recommended collaborative filtering. Black sheep are the contradictory group whose individual tastes makes recommendations nearly impossible. Even though this is a malfunction of the recommender systems also have enormous problems in these cases, so black sheep is an acceptable failure[16].

Claypool et al. provided a hybrid approach combining content-based and collaborative filtering recommendations by basing a prediction on a weighted average of the content-based prediction and the collaborative filtering prediction.

3.5 Shilling Attacks

In cases where anyone can provide recommendations, people may give tons of positive recommendations for their own materials and negative recommendations for their competitors. It is desirable for collaborative filtering systems to introduce precautions that discourage this kind of phenomenon [2]. Item-based collaborative filtering algorithm was much less affected by the attacks than the user based collaborative filtering algorithm, and they suggest that new ways must be used to evaluate and detect shilling attacks on recommender systems[18]. The item-based collaborative filtering systems have been examined by Mobasher et al., and alternative collaborative filtering systems such as hybrid collaborative filtering systems and model-based collaborative filtering systems were believed to have the ability to provide partial solutions to the problem.

IV. DESCRIPTION OF ALGORITHMS EVALUATED

There are no. of algorithms to estimate to which are frequently mentioned in the literature and works on explicit ratings data. These algorithms are proposed for modification by introducing new approach. In some cases, the suggestion completed for trivial modifications to existing algorithms in order to improve performance. All the tested algorithms with basic information are described.

V. METHODOLOGY

In our research work (i) user-based collaborative filtering, (ii) k-NN algorithm, (iii) item-based collaborative filtering (iv) Apriori algorithm, (v) singular value decomposition (iv) Simple Bayesian Collaborated Filtering Algorithm and (iiv) content boosted algorithm are implemented and are examined for quite a few combinations of appropriateness for creation an efficient grouping to form a hybrid collaborative filtering algorithm by dropping the quality of predictions [20]. With reference to the presentation, user based collaborative filtering, singular value decomposition,content boosted algorithm are customized for additional dropping of the quality of prediction.The investigational methodology used for computing the dissimilar prediction algorithms namely collaborative filtering predictor, Content based algorithm are presented and evaluated. Then prediction times and the quality of their predictions are measured for each evaluated algorithm. All the experiments are

(5)

255 | P a g e conducted at the research lab on an Intel Pentium quad core Processor with 4 GB RAM system and using a Java platform and results were carried out.

VI. DATASET

To carry out the research and analysis for content boosted collaborative filtering system, the Crazy Research Project agency at the Research Institute developed Internet association of Consumer Index Database which contains the customer product ratings and the product details, [5]. this dataset may be used to derive the results.

The data set consists of 645 users with 100000 ratings on 2018 items. Each user has rating at least 25 products.It provides based on demographic data such as area, age, gender, education, and Income. The content of the information of every product is considered as a set of slots. Each slot is represented by number of words.

Further, the data has been isolated and redundant for having less than 15 ratings or in complete demographic information. Further the dataset presents the genuine rating data provided by each user for various ecommerce websites are used in the accomplishment of the process to generate forecast values for various algorithmic systems.

VII. EXPERIMENTAL EVALUATION

A subset of the ratings data from the Consumer Index data set is used for the purposes of comparison.

comparison.18% of the users were randomly selected to be the test users. In the internet shopping consumers are ordered from ecommerce website [9], it mentioned that the data sets c1.base and c1.test through c5.base and c5.test are 82%/18% splits of the u data into training and test data. Each of c1,c2,c3,c4,and c5 has disjointed test sets for cross validation. These data sets can be generated from c.data by ebay.com. Source file c.data contained the c dataset by 645 users with 100000 ratings on 2018 items. Each user has rating at least 25 products. This is a tab separated list of user id, item id, rating and timestamp.MAE calculates the irrelevance between the recommendation value predicted by the system and the actual evaluation value rated by the user.

The measurement method of evaluating the recommendation quality of recommendation system mainly includes statistical precision measurement method it includes to measure the recommendation quality [6]. The produced forecast values are stored in an array list and are tabulated.

VIII. PERFORMANCE ASSESSMENT AND COMPARATIVE ANALYSIS

The persuade of a variety of nearest neighbors set on predictive validity is experienced by steadily mounting the number of neighbors. The dataset predicts the items rating of the users are evaluated as per the opinions of the users chosen ratings. The results are shown in graphical representing MAE values versus individual with their neighbor set sizes. MAE values are computed using different test data sets and tabulated in table 2. The Comparative analysis of these computed values are presented.Fig.1. Comparing the performance of three modified collaborative filtering algorithms i.e., UBCF, SVD and CBCF. The results accessible and shows the MAEs for the different NNSs assessment users using test dataset is 32.3% and 19.5%. In most of the cases, it is observed modified algorithms are performed well and showed increase in the quality prediction performance. It

(6)

256 | P a g e is notice that is the differences between the overall performances of the modified UBCF,SVD and CBCF.

CBCF performs better than the other compared algorithms.

IX. CONCLUSION

In this research paper, personalized collaborative filtering algorithms that are identified for their effectiveness, significant collaborative filtering algorithms are assessed. These results formed by the experiments conducted with the dissimilar algorithms and recital assessment during the research work. The experimental results are obtained in all the modified collaborative filtering methods, and the recital of content boosted collaborative filtering algorithm could be demonstrated and improved, of which one is connected with conventional collaborative filtering algorithms.

X. REFERENCES

[1] Xiuyan Gu, Linfeng Jiang, and Ziyi Zhang, “Study on User‟s Browse Behavior to Measure the User‟s Browse Interest”, Network and Communication, vol.15, pp.43-45, 2005.

[2] Incremental SVD method, Simon Funk [3] Amazon, an online portal,

[4] eBay, an online portal,

[5] Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R., 1996.Advances in Knowledge Discovery and Data Mining, MIT Press,Cambridge, MA.

[6] Etzioni, O., 1996. TheWorldWideWeb: Quagmire or gold mine,Communicationsof the ACM, 39, 65-68.

[7] Borges, J. and Levene, M., 1999. Data mining of user navigation patterns,International WEBKDD Workshop - Web Usage Analysis and UserProﬁling, San Diego, CA, USA, August 15, pp. 31-36.

[8] Madria, S. K., Bhowmick, S. S., Ng,W.K. and Lim E. P., 1999. Research issuesin Web data mining, 1st International Conference on Data Warehousingand Knowledge Discovery (DaWaK‟99), Florence, Italy, August 30 -September 1, pp. 303-312.

[9] NETCRAFT, http://www.netcraft.com/.

[10] Cooley, R., Mobasher, B. and Srivastava, J., 1997. Web mining: Informationand pattern discovery on the World Wide Web, 9th IEEE InternationalConference on Tools with Artiﬁcial Intelligence (ICTAI‟97), NewportBeach, CA, USA, November 03-08, pp. 558-567.

[11] Kosala, R. and Blockeel, H., 2000. Web mining research: A survey, ACMSIGKDD Explorations, 2, 1-15.

[12] Aggarwal, C. C,Wolf, J. L. and Yu, P. S., 1999. Caching on theWorldWideWeb,IEEE Transactions on Knowledge and Data Engineering, 11, 95-107.

[13] Shim, J., Scheuermann, P. and Vingralek, R., 1999. Proxy cache algorithms:Design, implementation and performance, IEEE Transactions onKnowledge and Data Engineering, 11, 549-562.

[14] Pitkow, J. and Pirolli, P., 1999. Mining longest repeating subsequencesto predict World Wide Web surﬁng, USENIX Symposium on InternetTechnologies and Systems (USITS‟99), Boulder, CO, USA, October11-14, pp. 139-150.

(7)

257 | P a g e [15] Sarukkai, R. R., 2000. Link prediction and path analysis using markov chains,Computer Networks, 33,

377-386.

[16] Pirolli, P. Pitkow, J. and Rao, R., 1996. Silk from a sow‟s ear: Extractingusable structures from the Web, ACM Conference on Human Factors inComputing Systems (CHI‟96), Vancouver, BC, Canada. April 17, pp.118-125.

[17] Brin, S. and Pagepp, L., 1998. The anatomy of large-scale hypertextualWeb search engine, Seventh International World-Wide Web Conference,Brisbane, Australia, April 14-18, pp.107-

[18] Goldberg, D., Nichols, D., Oki, B. and Terry, D., 1992. Using collaborativeﬁltering to weave an information tapestry, Communications of the ACM,35, 61-70.

[19] Yan, T. W. and Molina, H. G., 1999. The IFT information dissemninationsystem, ACM Transactions on Database Systems, 24, 529-565.