Abstract— The basic idea of this system comes from the fact that most recommendations are based on the past results of the system. However, the results for the nearest answers often also play a vital role in forming the best recommendation outcomes. This system therefore combines the traditional recommendation system with aggregated nearest neighbor results to obtain the best recommendation model. The main purpose of the proposed system is to provide the best movie recommendations for a user query by using the aggregate nearest neighbor model together with the collaborative filtering technique.
Index Terms— Filtering, Aggregation, Rank Generation, Similarity Search.
I. INTRODUCTION
Accuracy alone is not sufficient for measuring recommendation
quality. Recommender systems that use item ratings and user
profiles have therefore been proposed to provide diverse,
highly personalized recommendations with only a minimal loss
of accuracy, to suggest a sequence of items instead of a
single recommendation, and to use consumer-oriented or
manufacturer-oriented ranking mechanisms so that both
consumers and manufacturers benefit from the
recommendations. Because of the explosive growth of
information available on the web, users may make poor
decisions while looking for relevant information. A user
without sufficient experience may not know which
alternatives are available on the web. Recommender systems
use purchase data, item ratings, and user profiles to predict
which items are best suited to a particular user. If there
are more factors to be considered, for example the value of
the location, then the k most suitable locations may be
retrieved from the database as candidates. Aggregate
similarity search methods could also benefit applications
that make use of relevance feedback, a form of query
refinement in which the user is given the chance to select
several objects from a previous query result to serve as the
basis for a subsequent query. Razente et al. proposed
aggregate similarity queries as a relevance feedback
mechanism for content-based image retrieval. Aggregate
similarity queries can also be used in clustering and
outlier detection.
(E-AggNN) Implementation of Effective and Efficient Algorithm for Flexible Aggregate Similarity Search on Recommender System
Swati Shripad Joshi, Computer Engineering, Bhivarabai Sawant Institute of Technology & Research, Wagholi, Pune-421207
Sonali A. Patil, Assistant Professor, Computer Engineering Department, Bhivarabai Sawant Institute of Technology & Research, Wagholi, Pune-421207
II. PROBLEM DEFINITION
An increasing variety of content has been made available in
the Web environment, with the online market witnessing a
particularly rapid growth. However, customers still
experience a great deal of frustration when searching for the
items. Collaborative filtering (CF)-based recommender
systems represent a promising solution for the rapidly
growing web market. However, in the Web environment, a
traditional CF system that uses explicit ratings to collect user
preferences has a limitation: customers find it difficult to rate
their tastes directly because of poor interfaces and high
telecommunication costs. Implicit ratings are more desirable
for the Web, but commonly used cardinal (interval, ratio)
scales for representing preferences are also unsatisfactory
because they may increase estimation errors. In this paper,
we propose an E-AggNN-based recommendation methodology
that uses both implicit ratings and less ambitious ordinal
scales.
III. EXISTING SYSTEM
While much effort has been spent on studying ways to
facilitate efficient similarity search in high-dimensional data,
the question of how to support similarity search when the
similarity of objects is based on only a subset of attributes
has scarcely ever been raised. Aside from the fundamental
study of how data structures behave in such settings, this is
a question of high practical relevance.
If efficient support of subspace range queries or subspace
nearest neighbor queries were available, virtually all
subspace cluster approaches could be accelerated
considerably. Note this problem is essentially different from
the feature selection problem.
IV. PROPOSED SYSTEM
The proposed methodology performs recommendation using the
aggregate nearest neighbor approach on a movie dataset
taken from IMDB. The basic idea of this system comes from
the fact that most recommendations are based on the
past results of the system. However, the results for the
nearest answers often also play a vital role in forming the
best recommendation outcomes. The system therefore combines
the traditional recommendation system with aggregated nearest
neighbor results to obtain the best recommendation model.
V. SYSTEM ARCHITECTURE
Step 1: Preprocessing and User Query Handling - This is the initial step of the proposed methodology. The user enters a query for attributes such as the movie's director name, actors, budget, genres, year, keywords, and finally the IMDB score. These query keywords are stored in a static list for further processing.
Once the system has received these query attributes, it reads the movie dataset obtained from https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset. After the dataset has been read into a two-dimensional list, it is preprocessed on the selected attributes for the subsequent steps.
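The preprocessing step can be sketched as follows. This is only a minimal illustration under stated assumptions: the column names, the in-memory sample rows, and the rule of skipping records with a missing attribute are hypothetical and are not taken from the implementation described in this paper.

```python
import csv
import io

# Assumed column names for the selected attributes (hypothetical; the paper
# lists the attributes but not the exact CSV headers).
SELECTED = ["director_name", "actor_1_name", "budget", "genres",
            "title_year", "plot_keywords", "imdb_score"]

# Tiny in-memory stand-in for the dataset file; the rows are placeholders.
SAMPLE_CSV = """movie_title,director_name,actor_1_name,budget,genres,title_year,plot_keywords,imdb_score
Movie A,Director X,Actor P,1000000,Action|Thriller,2010,heist|chase,7.1
Movie B,Director Y,,,Drama,2012,family|loss,6.4
"""

def preprocess(csv_text, selected=SELECTED):
    """Read CSV text into a two-dimensional list of the selected attributes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        values = [(record.get(col) or "").strip() for col in selected]
        # Assumption: records with any missing selected attribute are skipped.
        if all(values):
            rows.append(values)
    return rows

table = preprocess(SAMPLE_CSV)
```

Here `table` holds one inner list per retained record, in the order of `SELECTED`, which matches the two-dimensional list described above.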
Step 2: Aggregate Nearest Neighbor - In this step all the preprocessed attributes are considered in order to find the nearest neighbors of the input query attributes. Each preprocessed entity is evaluated against the given query attributes using the protocols set for the selected attributes, such as director name, actors, budget, genres, year, keywords, and IMDB score. Applying these protocols yields a list of nearest neighbors. This process is described in Algorithm 1.
Once the array of aggregate nearest neighbors has been created by Algorithm 1, the cosine similarity between the query keywords and the dataset keywords is computed using equation 1.
$$\cos(d, q) = \frac{d \cdot q}{\|d\| \, \|q\|} \qquad (1)$$
where d is the keyword vector of a dataset record, q is the keyword vector of the query, d \cdot q is their dot product, and \|d\| and \|q\| are their Euclidean norms.
Step 3: Rank Generation - Equation 1 gives the cosine similarity between the query keywords and the dataset keywords, and all the similarity values are stored in an array. This array is then sorted in descending order using bubble sort so that the most similar records from the dataset come first. These records are inserted into the database to build the profiled history data used by the recommendation process.
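The similarity and ranking steps can be illustrated with a short sketch. It assumes that keywords are represented as term-count vectors, which the paper does not state; the function names and the sample records are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(d_terms, q_terms):
    """Cosine similarity of two keyword lists treated as term-count vectors."""
    d, q = Counter(d_terms), Counter(q_terms)
    dot = sum(d[t] * q[t] for t in d)
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # assumed convention for empty keyword lists
    return dot / (norm_d * norm_q)

def bubble_sort_desc(pairs):
    """Sort (score, item) pairs by score, highest first, using bubble sort."""
    items = list(pairs)
    n = len(items)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if items[j][0] < items[j + 1][0]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:
            break  # already ordered, stop early
    return items

# Placeholder records: title -> keyword list.
query = ["heist", "chase"]
records = {
    "Movie A": ["heist", "chase"],
    "Movie B": ["family", "loss"],
    "Movie C": ["heist", "love"],
}
scored = [(cosine_similarity(kw, query), title) for title, kw in records.items()]
ranked = bubble_sort_desc(scored)
```

Bubble sort is kept only because it is the technique named in this step; for large arrays a built-in sort would give the same ordering of scores at lower cost.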
Step 4: Collaborative Filtering and Jaccard Distance - In this step the query keywords are evaluated against the database attributes stored in the previous steps. The system analyzes the data in the database using the collaborative filtering technique described in Algorithm 2.
Once the collaborative filtering indices have been estimated, the nearest coefficient is found using the Jaccard distance given in equation 2.
$$J_d = \frac{Q + R}{P + Q + R} \qquad (2)$$
Where J_d = Jaccard Distance
P = Number of variables that are positive for both objects
Q = Number of variables that are positive for the ith object and negative for the jth object
R = Number of variables that are negative for the ith object and positive for the jth object
Applying equation 2 yields a list of Jaccard distances, and
this list is sorted in descending order to obtain the best
hybrid recommendation results for recommending movies.
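As an illustration of the distance computation, the following sketch uses the standard binary form of the Jaccard distance, (Q + R) / (P + Q + R), with P, Q and R as defined above. The binary attribute vectors and the function name are hypothetical, and the sketch only computes the distances; ordering the resulting list is left to the caller.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length binary attribute vectors."""
    p = sum(1 for x, y in zip(a, b) if x and y)      # positive for both objects
    q = sum(1 for x, y in zip(a, b) if x and not y)  # positive for i, negative for j
    r = sum(1 for x, y in zip(a, b) if not x and y)  # negative for i, positive for j
    total = p + q + r
    if total == 0:
        return 0.0  # assumed convention when neither object has a positive variable
    return (q + r) / total

# Placeholder vectors: 1 means the attribute is present, 0 means absent.
query_vec = [1, 1, 0, 1]
candidates = {
    "Movie A": [1, 1, 0, 1],
    "Movie B": [1, 0, 1, 1],
    "Movie C": [0, 0, 1, 0],
}
distances = {title: jaccard_distance(query_vec, vec)
             for title, vec in candidates.items()}
```

A distance of 0 means the two objects share all of their positive variables, and a distance of 1 means they share none.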
VI. MATHEMATICAL MODEL
1. Let HR = { } be a system for Hybrid Recommendation.
2. Identify the input as I = {Md, Uq}
Where Md = Movie Dataset, Uq = User Query
3. Identify R as the output, i.e. the Recommendation
HR = {I, R}
4. Identify the process P
HR = {I, P, R}
P = {Pr, CS, Nn, Jn, Cf}
Where
Pr = Preprocessing
CS = Cosine Similarity
Nn = Aggregate Nearest Neighbor
Jn = Jaccard Distance
Cf = Collaborative Filtering
5. HR = {I, Pr, CS, Nn, Jn, Cf, R}
VII. PROPOSED ALGORITHM
ALGORITHM 1: ANN
// Input: User Query Set Q, Preprocessed Data Set P
// Output: Aggregate Nearest Neighbor index
Step 0: Start
Step 1: FOR i=0 to Size of P
Step 2: AL of Pi (AL is Attribute List)
Step 3: FOR j=0 to size of AL
Step 4: FOR k=0 to size of ALi
Step 5: IF ALik ∈ Q
Step 6: THEN | ALik – Q | < T (T is Threshold Protocol)
Step 7: Set Protocol Query Index PQ
Step 8: Create PQ array (Aggregate Nearest Neighbor index)
Step 9: END FOR
Step 10: END FOR
Step 11: END FOR
Step 12: Return PQ
Step 13: Stop
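One possible reading of Algorithm 1 is sketched below. The pseudocode leaves the threshold protocol open, so the sketch makes several assumptions: numeric attributes match when they lie within a per-attribute threshold T of the query value, textual attributes match on case-insensitive equality, and a record enters the PQ index as soon as one attribute matches. The function name, field names and sample records are hypothetical.

```python
def aggregate_nearest_neighbors(query, records, thresholds):
    """Return the indices (PQ) of records with at least one attribute
    that satisfies its threshold protocol against the query."""
    pq = []
    for i, record in enumerate(records):
        for name, wanted in query.items():
            value = record.get(name)
            if value is None:
                continue
            if isinstance(wanted, (int, float)):
                # Assumed protocol for numeric attributes: |value - query| < T.
                limit = thresholds.get(name)
                matched = limit is not None and abs(value - wanted) < limit
            else:
                # Assumed protocol for textual attributes: exact match, ignoring case.
                matched = str(value).lower() == str(wanted).lower()
            if matched:
                pq.append(i)
                break  # one matching attribute is enough in this sketch
    return pq

# Placeholder records and query.
records = [
    {"director": "Director X", "year": 2010, "score": 7.1},
    {"director": "Director Y", "year": 1995, "score": 6.4},
    {"director": "Director Z", "year": 2012, "score": 5.0},
]
query = {"director": "director x", "year": 2011, "score": 8.0}
thresholds = {"year": 3, "score": 1.0}
pq = aggregate_nearest_neighbors(query, records, thresholds)
```

With these inputs the first record matches on the director and the third on the year, while the second record satisfies no protocol and is left out.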
ALGORITHM 2: COLLABORATIVE FILTERING
// Input: User Query Set Q, Database Data Set D
// Output: Collaborative Filtering Index List CFI
Step 0: Start
Step 1: FOR i=0 to Size of D
Step 2: DAL of Di (DAL is Database Attribute List)
Step 3: IF DAL ∈ Q
Step 4: THEN | DAL – Q | < CFL (CFL is Collaborative Filtering Limit variable)
Step 5: Set Collaborative Filtering Index CFI
Step 6: Create CFI List
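The listing of Algorithm 2 can be read in more than one way, so the following is only a tentative sketch. It assumes that Q and each DAL are sets of attribute values, that "DAL ∈ Q" means the two sets overlap, and that | DAL – Q | is the number of attributes in DAL that do not occur in Q. The function name and the sample data are hypothetical.

```python
def collaborative_filtering_index(query, database, cfl):
    """Return the indices (CFI) of stored records that overlap the query
    and have fewer than `cfl` attributes outside it."""
    q = set(query)
    cfi = []
    for i, attributes in enumerate(database):
        dal = set(attributes)
        # Assumed test: some overlap with Q, and |DAL - Q| below the limit CFL.
        if dal & q and len(dal - q) < cfl:
            cfi.append(i)
    return cfi

# Placeholder query and stored attribute lists.
query = {"action", "heist", "2010"}
database = [
    ["action", "heist", "2010"],
    ["action", "drama", "1995", "family"],
    ["comedy", "romance"],
]
cfi = collaborative_filtering_index(query, database, cfl=2)
```

Raising the limit `cfl` admits records that differ from the query in more attributes, so it acts as the tolerance of the filter under this reading.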
VIII. SNAPSHOT
IX. RESULT AND CONCLUSION
When the accuracy of the proposed system is compared with that of the different methodologies mentioned in [11], the results obtained are as tabulated below.
For the real-time deployment of the hybrid recommendation system we used one machine of standard configuration with a Core i3 processor and 4 GB of RAM. The system was developed in Java on a Windows machine, with NetBeans as the development IDE and MySQL as the database server. The system was subjected to several tests for performance evaluation, as described below.
Figure 1: Performance comparison with other methodologies
X. CONCLUSION
The proposed system performs accurate nearest neighbor estimation using cosine similarity and proper rank estimation.
It blends collaborative filtering with Jaccard distance to provide a hybrid recommendation model.
XI. ACKNOWLEDGMENT
I take this special opportunity to express my sincere gratitude
to my professors and all the people who supported me
during my entire project work. I would like to express my
gratitude to my guide and project coordinator, Prof.
Sonali A. Patil, for providing special guidance. I am also
grateful for the time that was always made available to
solve students' problems. Finally, thanks to all the
teachers, who are always supportive of us.
References
[1] D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis, “Group nearest neighbor queries,” in Proc. Int. Conf. Data Eng., 2004, pp. 301–312.
[2] M. E. Houle, X. Ma, and V. Oria, “Effective and Efficient Algorithms for Flexible Aggregate Similarity Search in High Dimensional Spaces,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 12, December 2015.
[3] G. Adomavicius and Y. Kwon, “Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques,” IEEE Transactions on Knowledge and Data Engineering.
[4] S. K. Lee, Y. H. Cho, and S. H. Kim, “Collaborative filtering with ordinal scale-based implicit ratings for mobile music recommendations,” Information Sciences, vol. 180, 2010, pp. 2142–2155.
[5] B. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl, “Application of dimensionality reduction in recommender system – a case study,” in Proc. of the ACM WebKDD Workshop, 2000.
[6] K. Bradley and B. Smyth, “Improving Recommendation Diversity,” Proc. of the 12th Irish Conf. on Artificial Intelligence and Cognitive Science, 2001.
[7] S. T. Park and D. M. Pennock, “Applying collaborative filtering techniques to movie search for better ranking and browsing,” Proc. of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 550–559, 2007.
[8] S. Vargas and P. Castells, “Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems,” RecSys’11, October 23–27, 2011.
[9] G. Adomavicius and A. Tuzhilin, “Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions,” IEEE Trans. on Knowledge and Data Engineering, 17(6), pp. 734–749, 2005.
[10] S. Liu, Y. Dong, and J. Chai, “Research of Personalized News Recommendation System Based on Hybrid Collaborative Filtering Algorithm,” 2nd IEEE International Conference on Computer and Communications, 2016.
First Author
Ms. Swati Shripad Joshi
Pursuing M.E. in the Computer Science & Engineering Department,
JSPM's BSIOTR, Pune University
Second Author
Prof. Sonali Appasaheb Patil
M.Tech. (CSE), pursuing Ph.D. at BSAU, Chennai.
Assistant Professor, Department of Computer Engineering,
JSPM's BSIOTR, Wagholi, Pune
Third Author
Ms. Priyanka N. Kamble
Pursuing M.E. in the Computer Science & Engineering Department