(E-AggNN) Implementation of Effective and Efficient Algorithm for Flexible Aggregate Similarity Search on Recommender System

Abstract— Most recommendations are based on the past results of the system. However, the results for the nearest answers to a query also play a vital role in forming the best recommendation outcomes. The proposed system therefore combines a traditional recommendation system with aggregate nearest neighbour results to obtain a better recommendation model. Its main purpose is to recommend movies for a user query by using the aggregate nearest neighbour model together with a collaborative filtering technique.

Index Terms— Filtering, Aggregation, Rank Generation, Similarity Search.

I. INTRODUCTION

Accuracy alone is not sufficient for measuring recommendation quality. Therefore, using item ratings and user profiles, recommender systems have been proposed that provide diverse recommendations, i.e. highly personalized items with only a minimal loss of accuracy, that suggest a sequence of items instead of a single recommendation to improve recommendation quality, and that use consumer-oriented or manufacturer-oriented ranking mechanisms so that both consumers and manufacturers benefit from the recommendations. Due to the explosive growth of information available on the web, users may make poor decisions while looking for relevant information. A user who does not have sufficient experience does not know which alternatives are available on the web. Recommender systems use purchase data, item ratings, and user profiles to predict which items are best suited to a particular user.

If there are more factors to be considered, for example the price of an area, then the k most suitable locations can be retrieved from the database as candidates. Aggregate similarity search methods can also benefit applications that make use of relevance feedback, a form of query refinement in which the user is given the opportunity to select several objects from a previous query result to serve as the basis for a subsequent query. Razente et al. proposed aggregate similarity queries as a relevance feedback mechanism for content-based image retrieval. In addition, aggregate similarity queries can be used in clustering and outlier detection.

Swati Shripad Joshi, Computer Engineering Department, Bhivarabai Sawant Institute of Technology & Research, Wagholi, Pune-421207

Sonali A. Patil, Assistant Professor, Computer Engineering Department, Bhivarabai Sawant Institute of Technology & Research, Wagholi, Pune-421207


II. PROBLEM DEFINITION

An increasing variety of content has been made available in the Web environment, with the online market witnessing particularly rapid growth. However, customers still experience a great deal of frustration when searching for items. Collaborative filtering (CF)-based recommender systems represent a promising solution for the rapidly growing web market. However, in the Web environment, a traditional CF system that uses explicit ratings to collect user preferences has a limitation: customers find it difficult to rate their tastes directly because of poor interfaces and high telecommunication costs. Implicit ratings are more desirable for the Web, but the commonly used cardinal (interval, ratio) scales for representing preferences are also unsatisfactory because they may increase estimation errors. In this paper, we propose an E-AggNN-based recommendation methodology that relies on both implicit ratings and less ambitious ordinal scales.

III. EXISTING SYSTEM

While much effort has been spent on studying how to facilitate efficient similarity search in high-dimensional data, the question of how to support similarity search when the similarity of objects is based on only a subset of attributes has scarcely been raised. Aside from the fundamental study of how data structures behave in such settings, this is a question of high practical relevance.

If efficient support for subspace range queries or subspace nearest neighbor queries were available, virtually all subspace clustering approaches could be accelerated considerably. Note that this problem is essentially different from the feature selection problem.

IV. PROPOSED SYSTEM

The proposed methodology performs recommendation using the aggregate nearest neighbor approach on a movie dataset taken from IMDB. The basic idea comes from the fact that most recommendations are based on the past results of the system. However, the results for the nearest answers to a query also play a vital role in forming the best recommendation outcomes. The system therefore combines a traditional recommendation system with aggregate nearest neighbor results to obtain a better recommendation model.

V. SYSTEM ARCHITECTURE

Step 1: Preprocessing and User Query Handling - This is the initial step of the proposed methodology. The user enters query values for attributes such as the movie's director name, actors, budget, genres, year, keywords, and IMDB score. These query keywords are stored in a static list for further processing.

Once the query attributes are received, the system reads the movie dataset obtained from the URL https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset. The dataset is read into a two-dimensional list and then preprocessed on the selected attributes for the subsequent steps.
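The reading and attribute selection described in this step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the dataset is a CSV file with a header row, and the column names in `SELECTED`, the helper `load_selected`, and the two sample rows are all illustrative assumptions.

```python
import csv
import io

# Assumed column names for the selected attributes; adjust to the actual file.
SELECTED = ["director_name", "actor_1_name", "budget", "genres",
            "title_year", "plot_keywords", "imdb_score"]

def load_selected(handle, columns=SELECTED):
    """Read CSV rows into a 2-D list, keeping only the selected columns."""
    rows = []
    for record in csv.DictReader(handle):
        values = [(record.get(col) or "").strip() for col in columns]
        # Preprocessing choice made here: skip records missing any selected attribute.
        if all(values):
            rows.append(values)
    return rows

# Tiny made-up sample standing in for the real dataset file.
SAMPLE = io.StringIO(
    "movie_title,director_name,actor_1_name,budget,genres,title_year,plot_keywords,imdb_score\n"
    "Film A,Jane Doe,John Roe,1000000,Action|Drama,2010,heist|city,7.1\n"
    "Film B,,Ann Poe,500000,Comedy,2012,road trip,6.3\n"
)
table = load_selected(SAMPLE)
```

With a real file, the same function could be called on `open(path, newline="", encoding="utf-8")`. Dropping incomplete records is only one possible preprocessing policy; the paper does not state which one was used.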

Step 2: Aggregate Nearest Neighbor - In this step all the preprocessed attributes are used to evaluate the nearest neighbors of the input query attributes. Each preprocessed entity is compared with the query using the protocol set for each selected attribute (director name, actors, budget, genres, year, keywords, and IMDB score). Applying these protocols yields a list of nearest neighbors. The process is given in Algorithm 1.

Once the array of aggregate nearest neighbors has been created by Algorithm 1, the cosine similarity between the query keywords and the dataset keywords is computed using equation 1.

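$$\cos(d, q) = \frac{d \cdot q}{\|d\| \, \|q\|} \qquad (1)$$

Where d is the vector of dataset keywords, q is the vector of query keywords, d · q is their dot product, and ||d|| and ||q|| are the lengths (Euclidean norms) of the two vectors.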
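A minimal sketch of the cosine similarity between query keywords and dataset keywords is given here. It assumes that each keyword list is turned into a term-count vector; the function name `cosine_similarity` and the sample keywords are illustrative and do not come from the paper.

```python
import math
from collections import Counter

def cosine_similarity(doc_keywords, query_keywords):
    """Cosine of the angle between two keyword count vectors."""
    d = Counter(doc_keywords)
    q = Counter(query_keywords)
    # Dot product over the terms that occur in both vectors.
    dot = sum(d[term] * q[term] for term in d.keys() & q.keys())
    norm_d = math.sqrt(sum(c * c for c in d.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # an empty keyword list is treated as dissimilar to everything
    return dot / (norm_d * norm_q)

score = cosine_similarity(["heist", "city", "police"], ["heist", "police"])
```

Here two of the three dataset keywords match the two query keywords, so the score is 2 / (sqrt(3) * sqrt(2)), roughly 0.82.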

Step 3: Rank Generation - Equation 1 produces a cosine similarity value between the query keywords and the keywords of each dataset record, and all the similarity values are stored in an array. This array is sorted in descending order using bubble sort so that the most similar records in the dataset come first. The sorted data is then inserted into the database to create the profiled history data used by the recommendation process.
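The ranking in this step can be illustrated with a short bubble sort in descending order. The sketch assumes that each entry is a (record identifier, similarity) pair; the function name and the sample values are hypothetical.

```python
def bubble_sort_desc(pairs):
    """Sort (record_id, similarity) pairs by similarity, highest first."""
    items = list(pairs)  # work on a copy so the input is left untouched
    n = len(items)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            # Swap neighbours that are out of descending order.
            if items[i][1] < items[i + 1][1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            break  # already sorted, stop early
    return items

ranked = bubble_sort_desc([("m1", 0.42), ("m2", 0.91), ("m3", 0.67)])
```

In practice Python's built-in `sorted(pairs, key=lambda p: p[1], reverse=True)` gives the same ordering with better performance on large arrays; bubble sort is shown only because the text names it.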

Step 4: Collaborative Filtering and Jaccard Distance - In this step the query keywords are evaluated against the database attributes stored in the previous steps. The system analyzes the data in the database using the collaborative filtering technique given in Algorithm 2.

Once the collaborative filtering indices have been estimated, the nearest coefficient is found using the Jaccard distance given in equation 2.

$$J_d = \frac{Q + R}{P + Q + R} \qquad (2)$$

Where

J_d = Jaccard distance

P = Number of variables that are positive for both objects

Q = Number of variables that are positive for the ith object and negative for the jth object

R = Number of variables that are negative for the ith object and positive for the jth object

On applying equation 2, a list of Jaccard distances is obtained, and this list is sorted in descending order to get the best hybrid recommendation results for recommending movies.
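The Jaccard distance between two objects can be sketched as below. The sketch assumes the usual definition for binary variables, Jd = (Q + R) / (P + Q + R), with P, Q and R counted as defined above; the function name, the sample vectors, and the handling of two all-negative vectors are assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length binary vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    p = sum(1 for x, y in zip(a, b) if x and y)      # positive in both objects
    q = sum(1 for x, y in zip(a, b) if x and not y)  # positive only in the first
    r = sum(1 for x, y in zip(a, b) if y and not x)  # positive only in the second
    if p + q + r == 0:
        return 0.0  # convention chosen here: two all-negative vectors are identical
    return (q + r) / (p + q + r)

dist = jaccard_distance([1, 1, 0, 1], [1, 0, 1, 1])
```

For the sample vectors P = 2, Q = 1 and R = 1, so the distance is 2 / 4 = 0.5. With this definition, 0 means the two objects agree on every positive variable and 1 means they share none.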

VI. MATHEMATICAL MODEL

1. Let HR = { } be a system for hybrid recommendation.

2. Identify the input as I = {Md, Uq}

Where Md = Movie Dataset, Uq = User Query

3. Identify R as the output, i.e. the recommendation

HR = {I, R}

4. Identify the process P

HR = {I, P, R}

P = {Pr, CS, Nn, Jn, Cf}

Where

Pr = Preprocessing

CS = Cosine Similarity

Nn = Aggregate Nearest Neighbor

Jn = Jaccard Distance

Cf = Collaborative Filtering

5. HR = {I, Pr, CS, Nn, Jn, Cf, R}

VII. PROPOSED ALGORITHM

ALGORITHM 1: ANN

// Input: User Query Set Q, Preprocessed Data Set P
// Output: Aggregate Nearest Neighbor index

Step 0: Start
Step 1: FOR i=0 to size of P
Step 2:   AL of Pi (AL is Attribute List)
Step 3:   FOR j=0 to size of AL
Step 4:     FOR k=0 to size of ALi
Step 5:       IF ALik ∈ Q
Step 6:       THEN | ALik - Q | < T (T is Threshold Protocol)
Step 7:         Set Protocol Query Index PQ
Step 8:         Create PQ array (Aggregate Nearest Neighbor index)
Step 9:     END FOR
Step 10:   END FOR
Step 11: END FOR
Step 12: Return PQ
Step 13: Stop
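One possible reading of Algorithm 1 is sketched below. The paper does not spell out the threshold protocols, so the sketch assumes that numeric attributes pass when |value - query| < T and that text attributes pass on a case-insensitive exact match. The function name, the attribute names, the thresholds, and the rule of keeping any record with at least one passing attribute are all illustrative assumptions.

```python
def aggregate_nearest_neighbors(query, records, thresholds):
    """Return (record index, matched attribute count) for records near the query.

    query      -- dict of attribute name -> queried value
    records    -- list of dicts using the same attribute names
    thresholds -- dict of numeric attribute name -> threshold T
    """
    index = []
    for i, record in enumerate(records):
        matched = 0
        for attr, wanted in query.items():
            if attr not in record:
                continue
            value = record[attr]
            if attr in thresholds:
                # Assumed numeric protocol: |value - query| < T
                if abs(float(value) - float(wanted)) < thresholds[attr]:
                    matched += 1
            elif str(value).lower() == str(wanted).lower():
                # Assumed text protocol: case-insensitive exact match
                matched += 1
        if matched:
            index.append((i, matched))
    return index

records = [
    {"director": "Jane Doe", "year": 2010, "score": 7.1},
    {"director": "Ann Poe", "year": 2001, "score": 5.0},
    {"director": "jane doe", "year": 2013, "score": 6.8},
]
query = {"director": "Jane Doe", "year": 2011, "score": 7.0}
pq = aggregate_nearest_neighbors(query, records, {"year": 3, "score": 0.5})
```

In this made-up example the first and third records pass all three protocols and the second passes none, so only indices 0 and 2 enter the index. Keeping the match count alongside each index lets a later step prefer records that satisfy more attributes.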

ALGORITHM 2: COLLABORATIVE FILTERING

// Input: User Query Set Q, Database Data Set D
// Output: Collaborative Filtering Index list CFI

Step 0: Start
Step 1: FOR i=0 to size of D
Step 2:   DAL of Di (DAL is Database Attribute List)
Step 3:   IF DAL ∈ Q
Step 4:   THEN | DAL - Q | < CFL (CFL is Collaborative Filtering Limit variable)
Step 5:     Set Collaborative Filtering index CFI
Step 6:     Create CFI list
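The filtering loop of Algorithm 2 can be read in several ways; the sketch below shows one of them and is not a full collaborative filtering implementation. It assumes that | DAL - Q | means the number of query terms missing from a stored record, and that a record is kept when it shares at least one term with the query and misses fewer than CFL terms. The function name and the sample history are hypothetical.

```python
def collaborative_filtering_index(query_terms, database, cfl):
    """Return indices of stored records whose attributes are close to the query."""
    query = {t.lower() for t in query_terms}
    cfi = []
    for i, attributes in enumerate(database):
        stored = {a.lower() for a in attributes}
        shared = query & stored
        missing = len(query - stored)  # assumed meaning of |DAL - Q|
        if shared and missing < cfl:
            cfi.append(i)
    return cfi

# Made-up profiled history, as would be stored by the rank generation step.
history = [
    ["Jane Doe", "Action", "heist"],
    ["Ann Poe", "Comedy", "road trip"],
    ["Jane Doe", "Drama", "city"],
]
cfi = collaborative_filtering_index(["jane doe", "action", "heist"], history, cfl=2)
```

With a limit of 2 only the first record is kept, because the third record misses two of the three query terms; raising the limit to 3 would admit it as well.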


VIII. SNAPSHOT

IX. RESULTS


When the accuracy of the proposed system is compared with the accuracy of the different methodologies mentioned in [11], the results obtained are as tabulated below.

For real-time deployment of the hybrid recommendation system we used one machine of standard configuration with a Core i3 processor and 4 GB RAM. The system was deployed on a Java-based Windows machine, with NetBeans as the development IDE and MySQL as the database server. The system was subjected to several tests for performance evaluation, as stated below.

Figure 1: Performance comparison with other methodologies

X. CONCLUSION

The proposed system estimates nearest neighbors accurately using cosine similarity and proper rank estimation. It then blends collaborative filtering with Jaccard distance to provide a hybrid recommendation model.

XI. ACKNOWLEDGMENT

I take this special opportunity to express my sincere gratitude towards my professor and all the people who supported me during my entire project work. I would like to express my gratitude to my guide and project coordinator, Prof. Sonali A. Patil, for providing special guidance. I would also [...] enough time to solve students' problems at any time. Finally, thanks to all the teachers who are always supportive of us.

References

[1] D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis, “Group nearest neighbor queries,” in Proc. Int. Conf. Data Eng., 2004, pp. 301–312.

[2] Michael E. Houle, Xiguo Ma, and Vincent Oria, “Effective and Efficient Algorithms for Flexible Aggregate Similarity Search in High Dimensional Spaces,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 12, December 2015.

[3] Gediminas Adomavicius and YoungOk Kwon, “Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques,” IEEE Transactions on Knowledge and Data Engineering.

[4] Seok Kee Lee, Yoon Ho Cho, and Soung Hie Kim, “Collaborative filtering with ordinal scale-based implicit ratings for mobile music recommendations,” Information Sciences, vol. 180, 2010, pp. 2142–2155.

[5] B. Sarwar, G. Karypis, J.A. Konstan, J. Riedl, Application of dimensionality reduction in recommender system – a case study, in: Proceedings of the ACM WebKDD Workshop, 2000.

[6] K. Bradley and B. Smyth, “Improving Recommendation Diversity,” Proc. of the 12th Irish Conf. on Artificial Intelligence and Cognitive Science, 2001.

[7] S.T. Park and D.M. Pennock, “Applying collaborative filtering techniques to movie search for better ranking and browsing,” Proc. of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 550-559, 2007

[8] Saúl Vargas and Pablo Castells, “Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems”, RecSys’11, October 23–27, 2011.

[9] G. Adomavicius and A. Tuzhilin, “Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions,” IEEE Trans. on Knowledge and Data Engineering, 17(6), pp. 734-749, 2005.

[10] Shan Liu, Yao Dong, and Jianping Chai, “Research of Personalized News Recommendation System Based on Hybrid Collaborative Filtering Algorithm,” 2nd IEEE International Conference on Computer and Communications, 2016.

First Author

Ms. Swati Shripad Joshi

Pursuing M.E. in Computer Science & Engineering Department

JSPM's BSIOTR, Pune University

Second Author

Prof. Sonali Appasaheb Patil

M.Tech CSE, pursuing Ph.D. from BSAU, Chennai

Asst. Prof., Department of Computer Engineering

JSPM's BSIOTR, Wagholi, Pune

Third Author

Ms. Priyanka N. Kamble

Pursuing M.E. in Computer Science & Engineering Department