A recommender system based on collaborative filtering using ontology and dimensionality reduction techniques

(1)

Contents lists available at ScienceDirect

Expert

Systems

With

Applications

journal homepage: www.elsevier.com/locate/eswa

A

recommender

system

based

on

collaborative

ﬁltering

using

ontology

and

dimensionality

reduction

techniques

Mehrbakhsh Nilashi

a ,b ,∗

_{, Othman Ibrahim}

a

_{, Karamollah Bagherifard}

c

a_{Faculty of}_Computing,_Universiti_{Teknologi Malaysia, 81310 Skudai, Johor,}_Malaysia b_{Department of Computer Engineering,}_Lahijan_Branch,_{Islamic Azad University, Lahijan, Iran} c_{Young Researchers}_{and Elite Club, Yasooj}_Branch,_{Islamic Azad University, Yasooj, Iran}

a

r

t

i

c

l

e

i

n

f

o

Article history:

Received22July2017

Revised23September2017

Accepted24September2017

Availableonline29September2017

Keywords:

Recommendersystems

Ontology Clustering

Dimensionalityreduction Scalability

Sparsity

a

b

s

t

r

a

c

t

Improvingtheefficiencyofmethodshasbeenabigchallengeinrecommendersystems.Ithasbeenalso importanttoconsiderthetrade-off betweentheaccuracyand thecomputationtimeinrecommending theitemsbytherecommendersystemsas theyneedtoproduce therecommendationsaccuratelyand meanwhile inreal-time.Inthisregard, thisresearchdevelopsanew hybrid recommendationmethod basedonCollaborativeFiltering(CF)approaches.Accordingly,inthisresearchwesolvetwomain draw-backsof recommendersystems, sparsity and scalability,using dimensionality reduction and ontology techniques.Then, weuseontologytoimprovetheaccuracy ofrecommendations inCFpart.IntheCF part,wealsouseadimensionalityreductiontechnique,SingularValueDecomposition(SVD),tofindthe mostsimilaritemsandusersineachclusterofitemsanduserswhichcansignificantlyimprovethe scal-abilityoftherecommendationmethod.Weevaluatethemethodontworeal-worlddatasetstoshowits effectivenessandcomparetheresultswiththeresultsofmethodsintheliterature.Theresultsshowed thatourmethodiseffectiveinimprovingthesparsityandscalabilityproblemsinCF.

1. Introduction

Finding information in large-scale websites is a difficult and time-consumingprocess.Artificial Intelligence(AI)approachesare appearing at the forefront of research in information retrieval and information filtering systems. Recommender systems are a good example of one such AI approach. They have emerged in the e-commerce domain and are one way to address this is-sue. Such systems have been developed to actively recommend relevant information to users (Jugovac, Jannach, & Lerche, 2017; Nilashi, Jannach, bin Ibrahim, Esfahani, & Ahmadi, 2016 a), typ-ically without the need for an explicit search query. The his-tory of recommender systems dates back to 1979 in relation to cognitive science (Rich, 1979 ). These systems have been impor-tant toolsamongother applicationareassuch as, information re-trieval(Salton, 1989 ),tourism(Kabassi, 2010 ),managementscience (Murthi & Sarkar, 2003 ),approximationtheory(Powell, 1981 ), con-sumerchoicemodelinginbusinessandmarketing(Lilien, Kotler, & Moorthy, 1992 ),andforecastingtheories(Armstrong, 2001 ).

∗ _{Corresponding}_author.

E-mail addresses: [email protected](M.Nilashi),[email protected] (O. Ibrahim),[email protected](K. Bagherifard).

Collaborative Filtering (CF) systems are information retrieval systems that operate under the assumption a user will like the same data items that other users have liked in the past. These systems are particularlypopular and havebeen applied in many online shopping websites (Nilashi, Jannach et al. 2016; Nilashi, Salahshour et al., 2016 b).CFalgorithmsmainlyaggregatefeedback for items from different users and use the similarities between itemsand items(item-based) or between users andusers (user-based)to providerecommendationstoatarget user(Nilashi, Jan- nach, bin Ibrahim, & Ithnin, 2015 ).

Basically, CF recommendation algorithms are based on two maincategorieswhicharemodel-basedandmemory-based meth-ods(Adomavicius & Tuzhilin, 2005 ). Memory-based(or heuristic-based) methods, such as correlation analysis and vector similar-ity,search the userdatabase foruser proﬁles that are similar to the proﬁle of the active user that the recommendation is made for.Inthistype ofrecommendersystems,itisimportantthat the useranditemdatabasesremaininsystemmemoryduringthe al-gorithm’s runtime. Because heuristic-based approachescan make predictionsbasedonthelocalneighbourhoodoftheactiveuser,or canbasetheirpredictionsonthesimilaritiesbetweenitems,these systems can also be classed into the user-based anditem-based approaches(Sarwar et al., 2001 ).

(2)

Memory- andmodel-based approaches havesome advantages and disadvantages for item recommendation. Sparsity has been one of the main diﬃculties associated with these approaches, whereasrecommendationwithhighaccuracyhasbeenoneofthe important advantages of the memory-based approach. However, thisapproachisnotscalableforcurrentrecommendationsystems astheir databases include huge numbers of items and users. In addition,memory-basedmethodsareheuristicsandtheprediction andrecommendationsarebasedonthewholeratingsprovidedby theuserstotheitems.Hence,allratingsarerequiredtobe main-tainedinmemory.Thismethodisatypicalapproachforhigh rec-ommendationaccuracybasedonCF, butisnotscalable for large-scale websites that use the huge number of users and items in recommendationsystems(Sarwar, Karypis, Konstan, & Riedl, 2002 ). Accordingto Goldberg, Roeder, Gupta, and Perkins (2001) , model-based methods for learning a model utilize the group selection ofratings whichis then applied to providerating predictions. In addition,model-basedCFalgorithms havebeenanalternative ap-proachtok-NNtosolve thescalabilityproblemofmemory-based method(Nilashi, Esfahani et al., 2016 c).A probabilisticmethod is utilizedforthesesystemsandtheunratedvalueofauser predic-tionismeasured basedontheratingstheuserhasgiventoother items.Model-based algorithms donot sufferfrommemory-based drawbacksandcancreatepredictionoverashorterperiodoftime comparedtomemory-basedalgorithmsbecausemodel-based algo-rithmsperformoff-linecomputationfortraining.Thesetechniques regularly make a concise rating pattern off-line. Model-based CF (e.g.,Singular ValueDecomposition (SVD)-basedCF)improvesthe scalability and the eﬃciency problem (Koren, Bell, & Volinsky, 2009 ,Liuetal., Nilashi et al., 2015; Nilashi, bin Ibrahim, & Ithnin, 2014 ), but may lead to some problems such as decreasing the accuracyperformance(Linden, Smith, & York, 2003 ).

Hence, in this studya newmethod is proposed based on CF methodtoovercomethesparsityandscalabilityproblemsinCF al-gorithmsaccordinglytoimprovetheperformanceofrecommender systems. In fact, the performance improvement is achieved us-ingontology(Shambour & Lu, 2012 )anddimensionalityreduction (Koren, 2008; Koren et al., 2009 )techniques.Atthemoment,there isnoimplementationofrecommendersystemsbytheuseof com-biningontology anddimensionalityreductiontechniques tosolve thescalabilityandsparsityissuesofCFrecommendersystems. Ac-cordingly, this research tries to develop a new recommendation systembasedon CF usingontology anddimensionalityreduction techniques.Inordertoenhancethepredictionaccuracyand over-come the scalability issue of recommender systems, we propose touseontology andSVD. Speciﬁcally,we develop themethodfor user-anditembasedCF.Weusetheitems’ontology forthe item-based semanticsimilarity calculation andSVD for the item- and user-basedCFrecommendationpart.In comparisonwiththe pre-viousstudies,inthisresearchwe:

• developanewrecommendationsystemusingontology and di-mensionalityreductiontechniques.

• improvethe accuracyof recommendationsystemsby alleviat-ingthesparsityissueinitem-basedCFusingontology.

• improve the scalability of recommendation systems using di-mensionalityreductiontechniques.

The remainder of this paper is organized as follows: In Section 2 , we brieﬂy introduce the related subjects for the de-velopment of the proposed recommender system. In Section 3 , problemstatementandour research contributions are presented. Section 4 presents research methodology. Section 5 provides methodevaluationresults.In Section 6 ,weprovidethediscussion. Finally,conclusionisprovidedin Section 7 .

2. Backgroundtheories

In the followingsub-sections, we present the relatedsubjects forthedevelopmentoftheproposedrecommendersystem.Since, the ontology,clustering, dimensionalityreduction andCF are im-portantcomponentsoftheproposedmethod,ashortintroduction ofthemispresented.

2.1. CFrecommendationmethods

The recommendationsystems generallyare divided into three categories: CF, Content-Based Filtering (CBF) and hybrid method. CF techniques in recommender systems are particularly popular and have been applied in many online shopping websites (Liu et al., 2011; Nilashi et al., 2014 ). The key tosuccessful collabora-tive recommendationliesinthe abilitytomakemeaningful asso-ciations between people and their product preferences, in order to assistthe end-userin futuretransactions.Similarities between pastexperiencesandpreferencesareexploitedtoform neighbour-hoods of like-minded people from which to draw recommenda-tionsorpredictionsfora givenindividualuser.Basedonthe gen-uineprocessofCFstrategy(Schafer, Frankowski, Herlocker, & Sen, 2007 ), a target user in the website willreceive recommendation listofitemsthatotherusers,withsimilartastes,likedinthepast. All CF methodsrequirethe past ratingsof usersin orderto pre-dictandaccordinglyrecommenditemstothetargetuser(Cheng & Wang, 2014 ).Todoso,similaritiesbetweentheusersanditemsare calculatedusingthedistancemeasures.As CFcanbe classiﬁedas user-based and item-based, accordinglythe similarity calculation fortheseapproacheswillbedifferent(Nilashi et al., 2014 ).

2.2. ClusteringmethodsandCF

CFisoneofthemethodswidelyusedinrecommendersystems using two different techniques, memory-based and model-based (Breese, Heckerman, & Kadie, 1998; Su & Khoshgoftaar, 2009 ).The memory-based dependson the entire ratingwhich exists in the user-itemmatrixforformingneighborsofthe activeuserto gen-eraterecommendationtailoredtohis/herpreferences.Incontrast, the model-based methods learn the models of recommendations from the entire ratings to generate the recommendation for the target user.The well-knownmachine learningtechniquesforthis approachisclustering(Sarwar et al., 2002 ),probabilisticLatent Se-manticAnalysis(pLSA) (Hofmann & Puzicha, 2004 ),matrix factor-ization(e.g.SVD)(Koren et al., 2009 )andmachinelearningonthe graph(Zhou et al., 2008 ).

(3)

2011; Truong, Ishikawa, & Honiden, 2007; Zhang, Zhou, & Zhang, 2011 ). For example, in the work conducted by Ungar and Fos-ter,theauthorsdevelopedastatisticalmodelforCF.They accord-ingly used clustering methods for estimating model parameters (Ungar & Foster, 1998 ). They also noted that the clustering can solve overgeneralization (Kushwaha and Vyas, 2014 ) problem in recommendersystems. Breese et al. (1998) furtherinvestigatedthe role of clusteringfor theaccuracy problemof recommender sys-tems.Theyfoundthatthemodelbasedmethodachievesbetter ac-curacyinrelationtothememory-basedapproaches(Breese et al., 1998 ).Furthermore, Sarwar et al. (2002) foundthat employing bi-secting k-means leadsto less-personalCF recommendations than other methods. Similarly, Linden et al. (2003) tested a cluster-ingmethodforitem-to-item CFsystememployedatAmazon.com. Theyfoundthattheuseofclusteringimprovesthescalabilityissue inCF. Xue et al. (2005) experimentsalsoshowedthatthe combi-nation ofmemory based andmodel based methods improvethe recommendation eﬃciency by improving the accuracy and solv-ing scalability problem. They used k-means clusteringto provide smoothing operations to solve the missing-value problems and evaluatedthemethodonMovieLensandEachMoviedatasets.

2.3. Ontologymethodinrecommendersystem

Modeling the information at the semantic level is one of the main goals of using ontologies (Guarino, Oberle, & Staab, 2009 ). The original definition of ontology in the computer sci-ence was provided by Gruber (1992) , and was later refined by Staab and Studer (2009) . The notion of an ontology is origi-nally defined by Gruber (1992) as an “explicit specification of a conceptualization” . Borst (1997) defines an ontology as a “for-mal specification of a shared conceptualization” . In addition, Taniar and Rahayu (2006) definedontology “asa knowledge do-mainconceptualizationintoacomputerprocessableformatwhich models entities, attributes, and axioms” . Ontology is typically madeupofavocabulary andrelationshipsbetweentheconcepts. According to Antoniou and Van Harmelen, (2004) , ontologies are conceptproperties,disjointnessstatements,valuerestrictions,and specificationoflogicalrelationshipsbetweentheobjects.Ontology hasbeenatooltoformallymodelthestructureofasystembased ontherelationshipswhichareemergedfromitsobservation.

Inrecommendersystems,thesemanticinformationofan item includestheattributes,therelationshipsamongtheitems,andthe relationshipbetweenmeta-informationanditems.Inrecentyears, ontologies have been successfully adopted in recommender sys-tems forovercoming theshortcomings ofthese systems(Martín- Vicente, 2014; Lopez-Nores et al., 2010 ). Porcel, Martinez-Cruz, Bernabé-Moreno, Tejeda-Lorente, and Herrera-Viedma (2015) , fo-cused on the accuracy improvement of recommender systems by incorporating fuzzy ontology in their method. Many re-searchers involve domain ontologies in the recommender sys-tems to in measuring the preferences of users to the items of the content (Middleton, De Roure, & Shadbolt, 2009 ). Some re-searchers develop the semantic recommendation approach us-ing withcombiningitem-basedCF anditem-basedsemantic sim-ilarity techniques. In Daramola, Adigun, and Ayo (2009) , au-thors develop a prototypee-tourism recommendationsystem us-ing ontology for tourism services. In Wang and Kong (2007) ), authors propose a semantic enhanced collaborative recommen-dation system using the usage data and semantic informa-tion. Moreover, using knowledge about items and users help to produce a recommendation based on knowledge and reason-ing about which item meet the needs of users (Trewin, 20 0 0 ). The present study aims to use a hybrid method based on knowledge.

3. Relatedwork

Beforegivingdetailsofthetechniquesincorporatedinthe pro-posedmethodandourexperimentalevaluation,inthissectionwe summarizeotherexisting approachesofrecommendersystemsin theliterature.

Lee and Olafsson (2009) proposed a cooperative prediction schemeforCF recommendersystems.Theyevaluated themethod on EachMovie and MovieLens datasets. De Campos, Fernández- Luna, Huete, and Rueda-Morales (2010) presentedanewBayesian networkmodelto dealwiththeproblemofhybrid recommenda-tionby combiningcontent-based andcollaborativefeatures. They usedMovieLensandTheInternetMovieDatabase(IMDB)datasets toshow the effectivenessof themethod. Fan et al. (2014) devel-oped a recommendation methodof user-based CF based on pre-dictivevalue padding.Theirmethodpredicts theempty valuesin user-itemmatrixbytheintegrationofcontent-based recommenda-tionalgorithmanduseractivitylevelbeforecalculatinguser simi-larity.TheyusedMovieLensdatasettoshowtheaccuracy improve-mentofthemethod.

Shambour and Lu (2012) developedarecommendationmethod tosolve the sparsityandcold-start issues. Theyincorporated ad-ditionalinformationfromthe users’social trust network andthe items’semantic domainknowledgeforimprovingthe recommen-dation accuracy and coverage. They used Yahoo! Webscope R4 andMovieLensdatasetsfortheirexperiments.Theresultsoftheir study showed that the method is effective in solving the spar-sity and cold-start issues. Tsai and Hung (2009) used clustering ensembles for CF and content based recommendation methods. They used k-means and Self-Organizing Maps (SOM) for cluster-ing task.Theyused MovieLens datasetfortheir experiments and showedthat theensemblesofclusteringmethodsoutperform the recommendationmethods which do not rely on ensemble learn-ing. Wu, Chang, and Liu (2014) developeda hybridapproachthat combinescontent-basedapproach withCF underauniﬁed model calledCo-ClusteringwithAugmentedMatrices(CCAM).They eval-uated the method on two MovieLens datasets (100 k and 1M datasets). They showed that content-based information can help reducethesparsityproblemthroughminimizingthemutual infor-mationlossofthethreedatamatricesbasedonCCAM.

Lee, Chun, Shim, and Lee (2006) developed anontology-based productrecommendersystemforBusiness-to-Business(B2B) mar-ketplaces. Their method was keyword-based and independent of the underlying physical structure of product ontology. Speciﬁ-cally,their methodwasbased oncontent-based recommendation techniquewhich represented product data in ontological graphs. Zhuhadar et al. (2009) developed a multi-model ontology-based frameworkforsemanticsearchofeducationalcontentine-learning context.Theycombinedthecontent-basedwiththerule-based ap-proaches to provide the user the hybrid recommendations. They evaluatedthemethodusingTop-NprecisionandTop-Nrecall met-rics.

(4)

system, SigTur/E-Destination, to provide personalized recommen-dationsoftouristicactivitiesintheregionofTarragona.

Hawalah and Fasli (2014) utilized contextual ontological user proﬁles for personalized recommendations. Also, they developed a method to compute semantic relatedness between concepts in rich and complex ontological structures. They conducted a user-centered study to assess the effectiveness of the recommenda-tions by the method and used precision metric for the method evaluation. Martinez-Cruz, Porcel, Bernabé-Moreno, and Herrera- Viedma (2015) developedarecommendersystembyincorporating ontologies to improvethe representation of userproﬁles. Specif-ically, they usedfuzzy linguisticmodeling to facilitate the repre-sentationofdifferentconcepts.Theproposedmethodcouldableto revealtherelationshipsbetweenusersandtheirpreferencesabout theitemsbyincorporatingdomainontologytothesystem.The au-thors used Mean Absolute Error (MAE) and coverage metrics to evaluate the method. Al-Hassan, Lu, and Lu (2015) used seman-tic knowledge of items to enhance the recommendation quality. Accordingly, they developed a hybrid semantic enhanced recom-mendation method by combining the Inferential Ontology-based SemanticSimilarity(IOBSS)measureandthestandarditem-based CFapproach.TheyevaluatedthemethodonAustraliantourism ser-vicesusingMAEmetricwithten-foldcrossvalidationtechnique.

Lv, Hu, and Chen (2016) developed a recommendationsystem basedon relational data usingthe relational data in thedomain ontology. They used genetic algorithm for recommendation pro-cess.Theyused MovieLensdatasetforthe methodevaluation us-ing recall, diversity and precision metrics. Pham, Jung, Nguyen, and Kim (2016) proposed an ontology-based multilingual recom-mendationsystem for themovie domain. The aim of their work wastodiscovertherelationshipsamongmultilingualconcepts for searching on a movie domain and ontological user preferences. Celdrán, Pérez, Clemente, and Pérez (2016) developedahybrid rec-ommendersystemthat combinedcontent-based,CF,and context-awareapproaches.Inaddition,theyusedsemanticwebtechniques tomodeltheinformationoftherecommendersystem.Specifically, the recommender ontology wasdefined with the Ontology Web Language 2 (OWL 2). They evaluated the method on MovieLens dataset using F1, recall and precision metrics. Moreno, Segrera, López, Muñoz, and Sánchez (2016) proposedacompleteframework todealjointlywiththescalability,sparsity,firstraterandcoldstart problemswithcombiningweb mining methodsanddomain spe-cificontologies.Theyevaluatedthemethodon MovieLensdataset usingprecisionmetric.

Bassiliades, Symeonidis, Meditskos, Kontopoulos, Gouvas, and Vlahavas (2017) developed recommendation algorithm using on-tology for providing the application developer with recommen-dations about the best matching Cloud Platform as a Service (PaaS) offering. The results of their work demonstrated that the methodwas effective insolving scalability issue. Tarus, Niu, and Yousif (2017) proposed a hybrid knowledge-based recommender system based on ontology and Sequential Pattern Mining (SPM) for recommendation of e-learning resources to learners. The re-searcher used SPM to discover the learners’ sequential learning patternsandontologytomodelandrepresentthedomain knowl-edge aboutthe learner andlearning resources. Kermany and Al- izadeh (2017) proposed a recommender system using Adaptive Neuro-Fuzzy Interference System (ANFIS) for multi-criteria rec-ommendersystems by incorporating demographic information of usersandontologicalitem-basedsemanticinformation.They eval-uatedthemethodusingthedatafromtheYahoo!Moviesplatform by F1, MAE and precision metrics. The results of their work re-vealedthattheuseofsemanticinformationenhancesthe predic-tiveaccuracyofmulti-criteriarecommendersystems.

4. Proposedhybridrecommendersystem

In Fig. 1 ,theproposed recommendersystemispresented.The proposed method aims to produce accurate and scalable recom-mendations. Twomain phasesare considered for the method.In the ﬁrst phase, the recommendation models are constructed. In this phase several tasks are performed which are clustering the rating,dimensionalityreductionusingSVDandproducingthe sim-ilaritiesmatricesoftheitemsandusers.Intheﬁrststep,wecluster theusers’ratingsonmoviesusingExpectationMaximization(EM) algorithm.Thenforeachclusterweprovidethesemanticsimilarity calculationmatricesfromthemovieontologyrepository.

Meanwhile,on eachclusterwe performSVDtoobtainthe de-compositionmatrices. Fromthefigure,itcanbeseenthatwe de-velop theSVD models forusers anditems. Hence, aftermatrices decompositiontask,thesimilaritiescalculationscanbeeffectively performedoneach matrix.In thesecond phase,after performing initialtrainsofthemodelsintheofflinephase,thepredictionand accordinglyrecommendationstasksareperformedforagivenuser (targetuser).Infact,arankedlistofitemsisprovidedtobe recom-mendedby therecommendersystemtothetargetuser.Todoso, thetargetuserisassignedtooneoftheclustersdeterminedinthe first phase.Then SVDcalculation isperformedbased onthe past ratingstofindthetargetusersimilaritiestotheotherusers (find-ing the neighborsofthe target user). Foritem-based recommen-dation,we alsoperformsameprocedurefortheitems.We finally combineuser-anditem-basedpredictionsinaweightedapproach aspresentedby Liu, Hu et al. (2014) .

Ontology. The movie ontology is constructed using the Movie Ontology (MO) which can be accessed through (http://www. movieontology.org/ )(see Fig. 2 ).MOsemanticallydescribesmovie related concepts. In addition, we used MO to establish a corre-spondence betweenclasses in the ontology anddatabase genres. Movie’genreisamulti-valuedattribute whereasorigincountryis amono-valuedattribute(Ticha et al., 2012 ).TheURLofthemovie in IDMb (TheInternet Movie Database) is a unique key that can show every movie item in the system. By implementing a Web Crawler(see Fig. 3 ) andusing theseunique keys,the systemcan extract content information fromthe IMDb for each movie item. Thiscontentinformationissavedinthedatabasesothattheycan be usedforgeneratingontology-based metadata.Infact,the web crawler analyzes the IMDb web pages based on the predeﬁned featuresofeachmovieandextractsfeature-values.Eachextracted feature-valuebelongstoafeature.

Calculatingthesimilaritybetweenthetwoitemsbasedontheir semantic descriptions is an important task. In this research, we usebinaryJaccardsimilaritycoeﬃcientfortheitem-based seman-tic similarity(Kermany & Alizadeh, 2017; Shambour & Lu, 2012 ). To do so, foran item taxonomy

ℵ

with m categories that items may fall into, we consider each item as a binary vector (_E₌

(

ex, 1,ex, 2,...,ex,m)whereabinaryvariableex,p(p=1,…,m)is

de-ﬁnedas:

ex,p=

1IfItem × belongtocategoryp 0IfItem × doesnotbelongtocategoryp

(1)

Then,thesemanticsimilarityofmoviesxandyispresentedas follow(Kermany & Alizadeh, 2017; Shambour & Lu, 2012 ):

SemSim

(

x,y

)

= K11 K01+K10+K11

(2)

In Eq. (2) ,K01,K10andK11 respectivelyindicatethetotal num-ber of genres for(ex,j=0; ey,j=1),(ex,j=1; ey,j=0) and (ex,j=1;

ey,j=1).

(5)

Fig.1. SystemFramework.

clusters. In thisstudy, we useEM clusteringalgorithm. EM algo-rithm, proposed by Dempster, Laird, and Rubin (1977) , has been widely used in the prior studies as an unsupervised learning method.In Algorithm 1 ,theEMalgorithm ispresented.Fromthe EM algorithm,itcan be seenthat EMis mainlydivided intotwo main steps which are E-step and M-step. In the E-step of EM

(6)

Fig.2. Overviewofthemovieontologydomain.

addition,E-stepandM-stepneedtoberepeateduntilconvergence ismetwhichinEMthiscanbe performedbasedon convergence oftheparametersorontheloglikelihoodfunction.

A. UserClustering

Inuserclustering, usersare clusteredbasedonsimilar prefer-encesaccordingtotheir rating.After creatingtheclusters,the aggregationofopinionsineach clusteris usedtoperformthe predictiontaskforthetargetuser.Thus,itresultsinimproving performancesincetheclusterthatshouldbeanalyzedincludes much fewerusers comparedtothe numberofall users(since the sizeofthe group thatmust be analyzedis muchsmaller) (Gong, 2010; Sarwar et al., 2002 ).

In Fig. 4 a,mindicates thenumberofall users,aij isthe

aver-ageratingofuserclustercenterigiventoitemj,Rijdeﬁnesthe

ratingthathasbeenprovidedbyuseri foritemj,andnandc

respectivelydenotethenumberofallitemsandthenumberof usercenters.

B. ItemClustering

In item clustering, items are clustered based on similar ratings provided by users. After creating the clusters, the aggregation of opinions of the other items in any clusters is usedforpredictiontaskforthetarget item.Thus, itresultsin improving performance since the cluster that should be ana-lyzed includes muchfewer items comparedto the numberof allitems(Gong, 2010 ).

In Fig. 4 b,mdenotes thenumberofallusers,aijistheaverage

(7)

Fig.3. Constructingontologies.

Algorithm1EMAlgorithm.

Variables

z isanunknownhiddenvariable. μjmeansofthemodel. πjdistributionfunction.

jvariancesofthemodel. {x i}Ni=1datasetwith N datapoints. z arandomvariable.

p(x |z = j)∼N(μj,σjI j)aGaussiandistribution. l (θ, D )likelihoodfunction.

StepsofEM

Initialize:Initializemeansandvariancesofthemodel{μ(0) j ,

(0) j ,π(

0) j }.

Step1.Expectation:Usingtheestimatesofθ(t)₌_{_μ(t) j ,

(t) j ,π(

t)

j },parameterscomputetheestimateof w ij w (t)

i j = p(z = j|x i,θ(t))= π(t)

j p(x i|z i= j,θ(t))

k m=1π(

t)

mp(x i|z i=k,θ(t))

Step2.Maximization:Usingestimatesof w (i)

i j,updatetheestimatesofthemodelparameters

μ(t+1)

j =

N i=1w

(t)

i jx i

N i=1w

(t)

i j

σ(t+1)

j =

N i=1w

(t) i j||x i−μi||2

N i=1w (

t)

i j

π(t+1)

i =

1

N N

i=1 w (t)

i j

Step3.Checkforconvergence:Thiscanbeperformedbasedonconvergenceoftheparametersorontheloglikelihoodfunction.Iftheconvergencecriterionisnot satisﬁedreturntoStep1.

allitems,Rij indicatestheratingofuseritoitemj,andkisthe

numberofitemcenters.

Singular Value Decomposition. SVD (Golub & Reinsch, 1970 ) has been one of the robust data dimensionality reduction tech-niquesinrealmatricesA_∈_Rn×n _and_a_powerful_computation_tool

forsolvingdataanalysisproblemsinnumericallinearalgebra. Us-ing SVDamatrixA ∈RN×M _with_the_rank_of_r_≤ _min₍_N,_M_),_can

bedecomposedas:A=U

VT_where,

U=

⎡

⎣

u

1,u2,...,u

r

Ur

,ur+1,...,uN

Ur

⎤

⎦

=[Ur

|

Ur] (3)

V=

⎡

⎣

v

1,

v

2,...,

v

r

Vr

,

v

r+1,...,

v

M

Vr

⎤

⎦

=[Vr

|

Vr] (4)

A=[Ur

|

Ur]

r 0

0 0

VT r

=Ur

rVrT (5)

A=

r

i=1

σi

uiuTi =u1

v

T1

σ1

+u2

v

T2

σ2

+u3

v

T3

σ3

+u4

v

T4

σ4

+...+ur

v

rT

σ

r, rank

(

A

)

=r

B=

k

i=1

σi

uiuTi =u1

v

T1

σ1

+u2

v

T2

σ2

+u3

v

T3

σ3

+u4

v

T4

σ4

+...+uk

v

kT

σ

k rank

(

B

)

=k

A−B=U

AVT−U

BVT=UT[

A−

B]VT→

||

A−B

||

=

σ

k+1

(6)

Deﬁnition 1. Let A=U

VT _be _a _SVD _of _A₌ _Rm×n _and

(8)

Fig.4. Forming(a)UserClustersinCFBasedand(b)ItemClustersinCFBased.

Fig.5. IllustratingthebasicSVDtheorem.

A−A_k_F=min_rank₍_B₎_≤_kA−B_k_F=r

j= k+1σ2j, whereXF=

i,j|xi j|2isthe Frobenius-normofamatrixX.

According to thenature ofSVDandits linearity,it is possible that we apply it on a matrix which has two dimensions. Fig. 5 showsthe generalprocedureofSVDfordimensionalityreduction inuser-itemmatrixinCFthatAimpliestheratingmatrixofuser toitems,Ureferstouserconceptsmatrix,Sindicatessingular val-uesandVT _that_is_a_reprehensive_of_item_concepts._Therefore, us-ingSVDalgorithm,itis possibletoconvertagivenmatrixA into A=USVT_.

Therefore,thematrixAcanbedecomposedwithrankrandby consideringthematrixAwithrankk,itcanbe obtainedthe ma-trixBwhichgivestheapproximation ofAbasedon deﬁnedk.In addition,inCFtheuser-itemmatrixforexamplein Table 1 canbe reducedintwo dimensionsasshownin Fig. 6 to getlatent rela-tionshipsbetween its objects. In the following example, the rat-ingsare presentedin Table 1 . Thus,U,VandS canbe calculated

Table1

User-itemmatrix.

Item1 Item2 Item3 Item4

User1 5 5 0 5

User2 5 0 3 4

User3 3 4 0 3

User4 0 0 5 3

User5 5 4 4 5

User6 5 4 5 5

asfollows:

U =

⎛

⎜

⎝

0.45 0.54 0.01 0.50 −0.50 0.11 0.36 −0.25 −0.86 0.15 0.21 0.06 0.29 0.40 0.23 0.10 0.83 0.08 0.21 −0.67 0.40 0.59 0.07 0.02 0.51 −0.06 0.11 −0.29 −0.07 −0.80 0.53 −0.19 0.19 −0.53 −0.14 0.58

⎞

⎟

⎠

....

V =

⎛

⎜

⎝

0.57 0.22 −0.67 −0.41 0.43 0.52 0.69 −0.26 0.38 −0.82 0.25 −0.33 0.59 −0.05 −0.01 0.81

(9)

Fig.6. Two-dimensionalspaceofapplyingSVDforusersanditems.

Fig.7. Two-dimensionalspaceofapplyingSVDforusers,itemsandnewuser.

S=

⎛

⎜

⎝

17.71 0.00 0.00 0.00 0.00 6.39 0.00 0.00 0.00 0.00 3.10 0.00 0.00 0.00 0.00 1.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

⎞

⎟

⎠

Forabetterunderstanding ofthisdecomposition,theresultof applyingSVDonuserratingmatrix isprojectedin2-dimensional spaceandplottedin Fig. 6 .The ﬁrstcolumnandthesecond col-umnofUistreatedasxandy,respectively.Inaddition,formatrix V the ﬁrstcolumnand thesecond column istreatedas xandy, respectively.Todothis,onlytwosingularvalues17.71and6.39of matrixSareselectedasshownin Fig. 6 .

In Fig. 7 ,itisclearthattheusers5and6anditems1and4are locatedveryclosetoeachother.Therefore,SVDfordimensionality reduction canbe applied effectivelytoreveal theusers thathave similartasteandformneighborsforitemsandusers.Furthermore, this decomposition ofrating matrix signiﬁcant reduces the com-putational complexityfor users anditemssimilarities calculation astheﬁrstcolumnsofmatricesareconsidered.

Forgivingsimilarusersandrecommendationtothenewuser, considernewusershareshis/herratingas([5,3,3,2]foritems1–4). Forﬁndingthepositionofnewuserinthe2dimensionsspace,the followingcalculationisperformedas:

Newuser2D=NewuserT×V2×S2−1

Newuser2D=

⎡

⎢

⎣

5 5 3 2

⎤

⎥

⎦

×

⎡

⎢

⎣

0.57 0.22 0.43 0.52 0.38 −0.82 0.59 −0.05

⎤

⎥

⎦

×

17.71 0.00 0.00 6.39

−

1

→Newuser2D=[0.4132 0.1784]

Algorithm2SVDinthepredictiontask.

Step1:Theuser-itemmatrixR m,n_with_raw_data_is_converted_to_the_dense matrixBm,n_.

Step2:MatrixBm,n_is_normalized_using_Z-score_to_the_matrix_Zm,n_by

Zi j= B i j−B ¯j

σj ,

whereB ¯j_and_σ

jindicateaveragevalueandStandardDeviation(SD)forthe ratingsinthe B j_,_{respectively,}_that

¯

B j₌ 1 m

m

i=1B i j,σ 2 j=

1

m −1 m

i=1 (B i j−B ¯j)2

Step3.TheSVDmethodisappliedonZ.

Step4.AnapproximationofZiscalculatedasZd.

Step5.P ijiscalculatedbasedonB ¯j+σj(Zd)i j.

Ascanbeseenin Fig. 7 ,bycalculationcosine-basedsimilarity, itcanfoundtheusersclosetothenewuserforformingk-nearest neighbors.Thismethodcanbeappliedforitemsandwecanform thesimilaritems(item-basedsimilarity).

The SVD approach approximates the missing rating values basedonthematrixfactorization

(

rˆ=

(

UkSk1/ 2

)

u.

(

Sk1/ 2Vk

)

i

))

.The

followingsteps aredone by an exampleto estimatean unknown rating to the active user. Let Y=

{

ai j

}

∈Rm,n be the user-item

matrix contains the ratings users U={u1,u2,…, um} to the items

I₌{i1,i2,…,in}.Thegoalistopredictunknownratingsinthis

ma-trix. The algorithm for this taskis presented in Algorithm 2 . As an example, let we have the rating matrix R, thus p23 can be calculatedusingthe Algorithm 2 :

I1 I2 I3 I4 I5 I6

R = U1 U2 U3 U4

⎛

⎜

⎝

2 3 5 ? 1 4

? 3 ? 4 ? 3

3 5 ? 2 4 ?

3 3 4 ? 3 ?

⎞

⎟

⎠

, p23=?

(

CalculatingB

i

)

Step2 −−−→

⎛

⎜

⎝

B1 B2 B3 B4 B5 B6

⎞

⎟

⎠

=

⎛

⎜

⎝

2 3.5 2.25 1.5 2 1.75

⎞

⎟

⎠

,

(

Calculating

σ

i

)

Step2 −−−→

⎛

⎜

⎝

σ

1

σ2

σ

3

σ

4

σ5

σ6

⎞

⎟

⎠

=

⎛

⎜

⎝

1.41 1.00 2.66 1.91 1.83 2.06

⎞

⎟

⎠

Step3(Z−scores)

−−−−−−−−−→Z4×6

=

⎛

⎜

⎝

0.00 −0.50 1.04 −0.79 −0.55 1.09

−1.42 −0.50 −0.85 1.31 −1.10 0.61

0.71 1.50 −0.85 0.26 1.10 −0.85 0.71 −2.00 0.66 −0.79 0.55 −0.85

⎞

⎟

⎠

Step4(Zd,d=2)

−−−−−−−−→

⎛

⎜

⎝

0.05 0.45

−0.05 −0.71

−0.89 −0.18

−0.45 0.50

⎞

⎟

⎠

×

3.17 0.00 0.00 2.86

×

⎛

⎜

⎝

−0.32 0.31

−0.67 −0.05

0.32 0.72 0.09 −0.58

−0.38 0.20

(10)

Fig.8. Throughputofallmethods.

=

⎛

⎜

⎝

−0.17 −1.12 0.63 −0.37 −0.45 0.39

−1.35 0.01 −0.47 0.99 −1.24 1.02

0.68 1.60 −0.75 0.21 1.04 −0.87 0.81 −1.37 1.12 −1.20 0.39 −0.30

⎞

⎟

⎠

Step5

−−−→p23=B¯3+

σ

3

(

Zd

)

23≈1

5. Experimentalresults

In this section, the proposed recommender system is evalu-atedontworeal-worddatasets.Accordingly,wepresenttheresults and compare our method with state-of-the-art recommendation methods.

5.1.Datasetdescription

Toevaluatetheproposedmethod,twodatasets,MovieLensand Yahoo! Webscope R4, were considered. The data were cleaned prior to use in the evaluation process. The descriptions of the datasetsareasfollows:

MovieLens dataset:Thisdataset(http://www.movieLens.org )is oneofthewell-knownmoviedatasetsthathasbeenusedforthe evaluation of recommender systems. The numbers of users and moviesinthe Movielensdatasetare6040 and3952, respectively. Inthisdataset,the usershaveprovided ratingson a 5-starscale. Weselecttheusersinthedatasetwho haveprovidedatleast20 ratings. Hence, based on the number of users and movies, this datasetincludes1000,209anonymousratings.

Yahoo! Webscope R4 dataset: This dataset (http://webscope. sandbox.yahoo.com )wasprovidedbytheYahoo!ResearchAlliance Webscopeprogram.Inthisdataset,theusershaveprovidedratings ona5-starscale(1 to5).Thisdatasetis dividedintotwosets of data,atrainingset andatest set. Thetrainingset includes7642 users,11,915movies and211,231 ratings.The testingsetincludes 2309users,2380moviesand10,136ratings.

In this study, the data collection of items (movies) content is made from the IMDb (http://imdb.com ). To do so, we use a Web crawler WebSPHINX available in (http://cs.cmu.edu/ ∼rcm/ websphinx/ ) whichhasbeendevelopedby RobMiller atCarnegie MellonUniversity.Thecollecteddataisusedforconstructingand completingitemontology.Furthermore,fortestingthemodelthe datasetwassplitintotwogroupsastrainingandtestingsets.We randomlyselected80%ofdataforthetrainingsetand20%restof datafortestingset.

5.2. Evaluatingtherecommendersystem

Theproposedrecommendersystemweredevelopedusing MAT-LAB7.10(R2010a)undera 4GHzprocessorPC,8GBRAM and 32-bit Microsoft Windows 7. The proposed algorithm is compared andevaluated with CF recommendationengine that employs the Pearsonnearestneighboralgorithm,item-basedpredictionmethod withclustering,SVDandontology,anduser-anditem-based pre-dictionmethodswithclusteringandSVDbutwithnocontribution ofontologyfromtwoperspectivesincludingtimethroughput (rec-ommendationpersecond)andaccuracy.

Throughput.Toshowtheeffectivenessoftheproposedmethod in improving the scalability issue, we evaluate our method on MovieLens and Yahoo! Webscope R4 datasets for throughput which is deﬁned asthe number of recommendationper second. In Figs. 8 a-b we present the performance results of our experi-ments forall methods.The throughputofthemethods isplotted as a function ofthe cluster size. We use EM for those methods basedontheclustering.We alsoconsiderdifferentclusteringsize forthe methods. From theplots we can seethat the throughput of those methods that use clustering and dimensionality reduc-tiontechniquesissubstantiallyhigherothermethods.Inaddition, the method which uses the ontology,EM andSVD has a higher throughputthanthemethodswhich solelyrelyonnearest neigh-bor algorithm for all datasets. This is due to the fact that with the use of clusteringa fractionof neighbors is used by the rec-ommendationalgorithms.Inaddition,fromtheseﬁguresitcanbe found that withtheincrease ofclustering sizethe throughputof themethodsareincreased,howeverasthenearestneighbor algo-rithmhastoscanthroughalltheneighbors,thenumberofclusters hasnoimpactonitsthroughput.

Predictive accuracy. With statistical metrics, for example, the MAE between the predicted and the actual ratings is measured. In contrast, decision-support metrics compare the recommended itemswiththerelevantones,e.g. bycountingtheoverlap.MAEis presentedin Eq. (7) .

MAE

(

pred,act

)

=

N

i=1

predu,i−actu,i

N

(7)

whereNisthenumberofitemsonwhichauseruhasexpressed anopinion.

(11)

on-Fig.9. MAEforallmethodsondifferentsizeofneighbors.

Table2

F1andprecisionforallmethodondifferentnumbersofTop-N(MovieLensdataset).

Top-N MethodA MethodB MethodC MethodD

Precision F1-metric Precision F1-metric Precision F1-metric Precision F1-metric

Top-5 0.787 0.797 0.771 0.773 0.719 0.721 0.564 0.583

Top-10 0.796 0.807 0.782 0.784 0.736 0.739 0.582 0.601

Top-15 0.816 0.827 0.802 0.804 0.747 0.749 0.592 0.615

Top-20 0.821 0.833 0.809 0.811 0.757 0.760 0.601 0.622

Top-25 0.833 0.844 0.823 0.825 0.769 0.770 0.628 0.650

Top-30 0.831 0.840 0.819 0.821 0.757 0.762 0.603 0.605

Top-35 0.823 0.832 0.806 0.808 0.750 0.751 0.581 0.590

Top-40 0.819 0.830 0.801 0.803 0.739 0.741 0.573 0.579

Top-45 0.818 0.827 0.793 0.795 0.733 0.732 0.556 0.558

Top-50 0.813 0.822 0.783 0.785 0.723 0.722 0.541 0.546

Method A=User- and Item-based+SVD+EM+Ontology, Method B=Item-based+SVD+EM+Ontology, Method C=User- and Item-

based +SVD +EM, MethodD=Nearest Neighbor .

tology, and user- and item-based prediction methods with clus-tering and SVD but withno contribution of ontology. Similar to previous studies (Liu, Hu et al., 2014; Koren, 2008 ), we consider differentnumbers ofneighbors(k) forthisevaluation(k=10, 20, 30, 40, 50,60, 70, 80,90 and 100). Figs. 9 a-b show the predic-tion accuracy for different neighborhood size k on datasets. For MAE, it can be found that in the considered neighbor sizes, our methodwhichSVDandontology helptoimproveremarkablythe prediction accuracy comparedwith thePearson nearest neighbor algorithm. When compared to Item-based+SVD+EM+Ontology

method, user- and Item-based+SVD+EM+Ontology method also worksbetterbutthereisasmalldifferencebetweenthem.The su-periority ofuser-andItem-based+SVD+EM+Ontologycanbe ex-plainedbythefactthatthismethodforpredicationusesontology for the item-based CF part. Although the prediction accuracy of

Item-based+SVD+EM+Ontologyis slightly better than User- and

Item-based+SVD+EM, howeverfromitsthrough result, itcan be

seen that this method throughput is lower than User- and

Item-based+SVD+EMasitdoesnotuseSVD.

Decision-support accuracy metrics. Concerning the accuracy measures, in particular thedecision-support metrics will play an important role for the multi-criteria recommender evaluations. Many metrics for this purpose are well known from the infor-mation retrieval area andwill be discussedin the following.The precision Eq. (8) measures the portion ofitemsthat are relevant within thereceivedresult.Incontrast,therecall Eq. (9) measures theportion ofrelevant itemsthat havebeenretrieved.Both met-ricsshouldbeusedincommon, aswithincreasingtheamountof retrieveditems,therecallincreases,whereastheprecisionusually

dropswithlargerresultsizes.

Precision= TR

TR+FR (8)

Recall= TR

TR+FN (9)

whereFNisthenumberoffalsenon-relevantpredictions,TRisthe numberoftruerelevantpredictionsandFRisthenumberoffalse relevantpredictions.

A metric that considers both values is the F-measure (Tsai & Hung, 2012 ) (see Eq. (10) ), which calculates the mean ofthe re-call andthe precision.

β

can be used toweight the inﬂuence of oneofboth,where

β

>1increasestheimportanceoftheprecision and

β

<1,ontheopposite,raisestheinﬂuenceoftherecall.Fora balancedF-measure,

β

=1isused.

F1=

(

1

_β

+₂

β

2

)

.precision.recall

.precision+recall (10)

(12)

Table3

F1andprecisionforallmethodondifferentnumbersofTop-N(Yahoo!WebscopeR4dataset).

Top-N MethodA MethodB MethodC MethodD

Precision F1-metric Precision F1-metric Precision F1-metric Precision F1-metric

Top-5 0.757 0.767 0.714 0.727 0.669 0.694 0.494 0.516

Top-10 0.766 0.777 0.737 0.757 0.683 0.707 0.517 0.534

Top-15 0.783 0.792 0.74 0.764 0.688 0.709 0.528 0.549

Top-20 0.786 0.797 0.747 0.771 0.692 0.711 0.536 0.557

Top-25 0.788 0.797 0.749 0.776 0.697 0.714 0.542 0.565

Top-30 0.789 0.801 0.757 0.779 0.704 0.724 0.551 0.571

Top-35 0.791 0.803 0.761 0.785 0.715 0.727 0.564 0.582

Top-40 0.799 0.805 0.764 0.791 0.721 0.734 0.571 0.594

Top-45 0.809 0.814 0.772 0.793 0.727 0.744 0.585 0.608

Top-50 0.815 0.821 0.784 0.798 0.739 0.762 0.617 0.631

Method A=User- and Item-based +SVD +EM +Ontology, Method B=Item-based +SVD +EM +Ontology , Method C=User- and Item- based +SVD +EM, MethodD=Nearest Neighbor .

Fig.10. MAEanddifferentdatasparsitylevels.

whichuse ontology work better than the nearest neighbor algo-rithm.Thesuperiorityofourmethodcanbeexplainedbythefact that in our method we use ontology in the item-based CF part. These results are suﬃcient to support our claim that method is reasonablyscalableandaccurateinrelationstothenearest neigh-boralgorithm.

We also evaluate the method on the different lev-els of sparsity and calculated the average MAE. The spar-sity level of the MovieLens dataset is 93.7% (sparsity level=1−(100,000/(943×1682))=0.937). In addition, the spar-sity level of the Yahoo! Webscope R4 dataset is 99.8% (sparsity level=1−(211,231/(7642×11,915))=0.9976).Accordingly, we cre-ate six datasets with different sparsity levels for the MovieLens and Yahoo! Webscope R4 datasets (i.e., 99.5%, 99%, 98.5%, 98%, 97.5%and97%).We applythe methodonthedatasetswiththese sparsity levels and compare the results with the other recom-mendationalgorithms. As canbe seen from Figs. 10 a-b,the MAE valuesoftwomethodsUser-andItem-based+SVD+EM+Ontology

and Item-based+SVD+EM+Ontology for all sparsity levels of

datasetarelower thanUser-andItem-based₊SVD₊EMand

Near-est Neighbor. In addition, from these ﬁgures it can be seen that

theincreasing ratioof theMAE forNearestNeighbor isvery high compared to the other methods. The results also reveal that the methods which use ontology have better prediction accuracy compared to the other methods for the dataset that is sparser. Thisisbecauseofthefactthat themethodswhichusedontology aremoreeffectiveinsolvingthesparsityissueandaccordinglyare moreaccurate.

Table4

TheMAEresultsofmethods.

Method MAE

User-andItem-based+SVD+EM+Ontology 0.6342±0.0061

User-basedCF 0.8080±0.0095

Item-basedCF 0.8370±0.0083

CCAM 0.7520±0.0053

Inorderto compareourworkwiththemethods developedin theliterature,weevaluatedourmethodontheactualsparsitylevel oftheMovieLensdataset(sparsitylevel=93.7%)usingMAEmetric. The resultsare presentedin Table 4 .We reportthe averageMAE ofUser- andItem-based+SVD+EM+Ontology, CCAM(Wu et al., 2014 ),User-basedCF(Sarwar et al., 2001 )andItem-basedCF (Sar-waretal., 2001) methods.From theresultspresented in Table 3 , we can see that the proposed method which uses ontology and dimensionalityreduction techniques help to improvethe MAEof recommendationover theCCAM (Wu et al., 2014 ), User-based CF (Sarwar et al., 2001 )andItem-basedCF(Sarwar et al., 2001 ) meth-ods.

6. Discussion

(13)

datasets,Yahoo!Webscope R4datasetandMovieLens,wereused. TheresultsprovidedbyMAE,Recall,PrecisionandF1showedthat the use of ontology with the aid of clustering and dimension-ality reduction techniqueswas effective inimproving the perfor-manceoftheCF recommendersystems.Theresults ofour analy-sisdemonstratedthat thehybridrecommendationmethodcanbe used to solve the scalabilityand sparsityissues of recommender systems.

With regard to the scalability issue, the proposed method enhanced thescalabilityoftheCF recommendersystemsthrough throughput which is deﬁned as the number of recommendation per second. The results showed that the throughput of those methods that use clustering and dimensionality reduction tech-niques is substantially higher other methods. In addition, we foundthat themethodwhichusestheontology,EMandSVDhas a higherthroughputthan themethodswhichsolelyrelyon near-est neighbor algorithm. With regard to sparsityissue,the hybrid methodoutperformedthenearestneighbormethodinalldatasets. Inaddition, theresultsalsorevealedthat themethodswhichuse ontology have better prediction accuracy compared to the other methods for thedataset that is sparser. The improvement inthe accuracy of the recommendations by the proposed method is because the recommendation method uses semantic similarity relationsforitemsinitem-basedCF.Theuseofsemanticsimilarity accordinglyimprovestherecommendationaccuracyofitem-based CFinthehybridmethod.

Overall, it is worth mentioning that the proposed hybrid methodisasigniﬁcantimprovementwithrespecttothe through-put, predictionand recommendationaccuracy. This demonstrates itseffectivenessinalleviatingthesparsityandimprovingthe scal-ability of CF recommender systems. Because in recommendation systems the trade-off between the computation time (improving the scalability) and the accuracy (alleviating the sparsity) is im-portant, our method can be a promising andeffective intelligent systemformovierecommendation.

7. Conclusions

The presentstudyproposed a recommendation methodbased on CF usingontology anddimensionalityreduction techniquesto improvethe sparsityandscalabilityproblems inCF. Weanalyzed the predictive accuracy and time complexity (scalability) of pro-posedmethodonreal-worlddatasetsinthedomainofmovie rec-ommendationprovidedbyMovielensandYahoo!ResearchAlliance Webscope program. The proposed method was evaluated using precision, MAE and F1 metrics to be comparable with the algo-rithms in previous studies. Our experiments conﬁrmed that the proposed method hasimprovedboth thepredictive accuracy and throughputofmovierecommendations.

Inthisstudy,we haveusedsolelyEMclusteringforclustering andnon-incrementalSVDfordimensionalityreductiontasksinthe proposed method.The useofincremental SVDmayhelpthe rec-ommender systemto provide recommendationswithgood scala-bility inrelationto the non-incrementalSVD. In addition,in this studytheproposed methodhasbeenevaluated onmovie recom-mendationdomain.Hence,inourfutureworkweplantoconsider other clusteringmethodsespeciallyclusteringensemblesmethods to be incorporated into the proposed recommendation method. Furthermore,inourfuturestudies wewill extendthemethodby incorporatingtheincrementalSVDintothepredictionsmodelsfor improvingthescalabilityissueofCF.Moreover,inourfuturework we planto furtherimprovethe proposedmethod andevaluateit using additional metrics such as diversity and novelty on other typesofdatasets.

References

Adomavicius,G.,&Tuzhilin,A.(2005).Towardthenextgenerationofrecommender systems:A surveyofthestate-of-the-art andpossibleextensions. Knowledge andDataEngineering,IEEETransactionson,17(6),734–749.

Al-Hassan,M., Lu,H., &Lu,J.(2015).Asemanticenhancedhybrid recommenda-tionapproach:Acasestudyofe-Governmenttourismservicerecommendation system.DecisionSupportSystems,72,97–109.

Antoniou,G.,&VanHarmelen,F.(2004).SemanticWebPrimer.Cambridge,MA,USA: TheMITPress.

Armstrong,J.S.(2001).Principlesofforecasting:ahandbookforresearchersand prac-titioners:Vol.30.Boston:KluwerAcademicPublishers.

Bassiliades,N.,Symeonidis,M.,Meditskos,G., Kontopoulos,E.,Gouvas,P.,& Vla-havas,I.(2017).AsemanticrecommendationalgorithmforthePaaSport plat-form-as-a-servicemarketplace.ExpertSystemswithApplications,67,203–227. Breese,J.S.,Heckerman,D.,&Kadie,C.(1998).Empiricalanalysisofpredictive

al-gorithmsforcollaborativeﬁltering.InPaperpresentedattheProceedingsofthe FourteenthconferenceonUncertaintyinartiﬁcialintelligence.

Celdrán,A.H.,Pérez,M.G.,Clemente,F.J.G.,&Pérez,G.M.(2016).Designofa recommendersystembasedonusers’behaviorandcollaborativelocationand tracking.JournalofComputationalScience,12,83–94.

Cheng,L.-C.,&Wang,H.-A.(2014).Afuzzyrecommendersystembasedonthe in-tegrationofsubjectivepreferencesandobjectiveinformation.AppliedSoft Com-puting,18,290–301.

Daramola,O.,Adigun,M.,&Ayo,C.(2009).Buildinganontology-basedframework fortourismrecommendationservices.Informationandcommunication technolo-giesintourism,2009,135–147.

DeCampos,L.M.,Fernández-Luna,J.M.,Huete,J.F.,&Rueda-Morales,M.A.(2010). Combining content-based and collaborative recommendations: A hybrid ap-proachbasedonBayesiannetworks.InternationalJournalofApproximate Rea-soning,51(7),785–799.

Dempster,A.P.,Laird,N.M.,&Rubin,D.B.(1977).Maximumlikelihoodfrom incom-pletedataviatheEMalgorithm.JournaloftheRoyalStatisticalSociety.SeriesB (methodological),39,1–38.

Fan,J.,Pan,W.,&Jiang,L.(2014January).Animprovedcollaborativeﬁltering al-gorithmcombiningcontent-basedalgorithmanduseractivity.InBigDataand SmartComputing(BIGCOMP),2014InternationalConferenceon(pp.88–91). Goldberg,K.,Roeder,T.,Gupta,D.,&Perkins,C.(2001).Eigentaste:Aconstanttime

collaborativeﬁlteringalgorithm.InformationRetrieval,4(2),133–151.

Golub,G.H.,&Reinsch,C.(1970).Singularvaluedecompositionandleastsquares solutions.Numerischemathematik,14(5),403–420.

Gong,S.(2010).Acollaborativeﬁlteringrecommendationalgorithmbasedonuser clusteringanditemclustering.JSW,5(7),745–752.

Gruber,T.R.(1992).Ontolingua:amechanismtosupportportableontologies:Vol.27. Stanford:StanfordUniversity,KnowledgeSystemsLaboratory.

Guarino,N.,Oberle,D.,&Staab,S.(2009).WhatisanOntology?InHandbookon ontologies(pp.1–17).BerlinHeidelberg:Springer.

Hawalah, A., & Fasli, M. (2014). Utilizing contextual ontological user proﬁles for personalized recommendations. Expert Systems with Applications, 41(10), 4777–4797.

He,Y.,Yang,S.,&Jiao,C.(2011).Ahybridcollaborativeﬁlteringrecommendation al-gorithmforsolvingthedatasparsity.PaperpresentedattheComputerScience andSociety(ISCCS),2011InternationalSymposiumon.

Hofmann,T.,&Puzicha,J.C.(2004).Systemandmethodforpersonalizedsearch, informationﬁltering,and forgeneratingrecommendationsutilizingstatistical latentclassmodels:GooglePatents.

Jugovac,M.,Jannach,D.,&Lerche,L.(2017).Eﬃcientoptimizationofmultiple rec-ommendation qualityfactorsaccordingto individual usertendencies. Expert SystemswithApplications,81,321–331.

Kabassi,K.(2010).Personalizingrecommendationsfortourists.Telematicsand Infor-matics,27(1),51–66.

Kermany,N.R.,&Alizadeh,S.H.(2017).Ahybridmulti-criteriarecommender sys-temusingontologyandneuro-fuzzytechniques.ElectronicCommerceResearch andApplications,50–64.

Koren,Y.(2008August).Factorizationmeetstheneighborhood:Amultifaceted col-laborativeﬁlteringmodel.InProceedingsofthe14thACMSIGKDDinternational conferenceonKnowledgediscoveryanddatamining(pp.426–434).

Koren,Y.,Bell,R.,&Volinsky,C.(2009).Matrixfactorizationtechniquesfor recom-mendersystems.Computer,42(8),30–37.

Kushwaha,N.,&Vyas,O.P.(2014October).SemMovieRec:Extractionofsemantic featuresofDBpediaforrecommendersystem.InProceedingsofthe7thACM In-diaComputingConference(p.p.13).

Lee,J.S.,&Olafsson,S.(2009).Two-waycooperative predictionforcollaborative ﬁlteringrecommendations.ExpertSystemswithApplications,36(3),5353–5361. Lee,T.,Chun,J.,Shim,J.,&Lee,S.G.(2006).Anontology-basedproduct

recom-mendersystemforB2B marketplaces.InternationalJournalofElectronic Com-merce,11(2),125–155.

Liao,I.,Hsu,W.,Chen,M.-S.,&Chen,L.(2010).Alibraryrecommendersystembased onapersonalontologymodelandcollaborativeﬁlteringtechniqueforEnglish collections.EmeraldGroupPublishing,28(3),386–400.

Liao,I.-E.,Liao,S.-C.,Kao,K.-F.,&Harn,I.-F.(2006).Apersonalontologymodelfor libraryrecommendationsystem,ICADL’06.InProceedingsoftheNinth Interna-tionalConferenceonAsianDigitalLibraries(pp.173–182).

(14)

Lilien,G.L.,Kotler,P.,&Moorthy,K.S.(1992).Marketingmodels.Prentice-Hall En-glewoodCliffs.

Linden,G.,Smith,B.,&York,J.(2003).InAmazon.comrecommendations:Item– to-itemcollaborativeﬁltering.InternetComputing:7(pp.76–80).IEEE. Liu,H.,Hu,Z.,Mian,A.,Tian,H.,&Zhu,X.(2014a).Anewusersimilaritymodel

toimprovetheaccuracyofcollaborativeﬁltering.Knowledge-BasedSystems,56, 156–166.

Liu,L.,Mehandjiev,N.,&Xu,D.L.(2011October).Multi-criteriaservice recommen-dationbasedonusercriteriapreferences.InProceedingsoftheﬁfthACM confer-enceonRecommendersystems(pp.77–84).

López-Nores,M.,Pazos-Arias,J.J.,García-Duque,J.,Blanco-Fernández,Y., Martín-Vi-cente,M.I.,Fernández-Vilas,A.,etal.(2010).MiSPOT:Dynamic product place-mentfordigitalTVthroughMPEG-4processingandsemanticreasoning. Knowl-edgeandInformationSystems,22(1),101–128.

Lv,G.,Hu,C.,&Chen,S.(2016).Researchonrecommendersystembasedon ontol-ogyandgeneticalgorithm.Neurocomputing,187,92–97.

Martinez-Cruz,C., Porcel,C., Bernabé-Moreno,J.,&Herrera-Viedma, E.(2015).A modeltorepresentuserstrustinrecommendersystemsusingontologiesand fuzzylinguisticmodeling.InformationSciences,311,102–118.

Martín-Vicente,M.I.,Gil-Solla,A.,Ramos-Cabrer,M.,Pazos-Arias,J.J., Blanco-Fer-nández,Y.,&López-Nores,M.(2014).Asemanticapproachtoimprove neigh-borhoodformationincollaborativerecommendersystems.ExpertSystemswith Applications,41(17),7776–7788.

Middleton,S.E.,DeRoure, D.,&Shadbolt,N. R.(2009).Ontology-based recom-mendersystems. InHandbookonontologies(pp.779–796).BerlinHeidelberg: Springer.

Moreno,A.,Valls,A.,Isern,D.,Marin,L.,&Borràs,J.(2013).Sigtur/e-destination: Ontology-basedpersonalizedrecommendationoftourismandleisureactivities. EngineeringApplicationsofArtiﬁcialIntelligence,26(1),633–651.

Moreno,M.N.,Segrera,S.,López,V.F.,Muñoz,M.D.,&Sánchez,Á.L.(2016).Web miningbasedframeworkforsolvingusualproblemsinrecommendersystems. Acasestudyformovies׳recommendation.Neurocomputing,176,72–80. Murthi,B.,&Sarkar,S.(2003).Theroleofthemanagementsciencesinresearchon

personalization.ManagementScience,49(10),1344–1362.

Nilashi,M.(2016).AnOverviewofDataMiningTechniquesinRecommender Sys-tems.JournalofSoftComputingandDecisionSupportSystems,3(6),16–44. Nilashi,M.,binIbrahim,O.,&Ithnin,N.(2014).Hybridrecommendationapproaches

formulti-criteriacollaborativeﬁltering.ExpertSystemswithApplications,41(8), 3879–3900.

Nilashi,M.,Jannach,D.,binIbrahim,O.,&Ithnin,N.(2015).Clustering-and regres-sion-basedmulti-criteriacollaborativeﬁlteringwithincrementalupdates. Infor-mationSciences,293,235–250.

Nilashi,M., Jannach,D., binIbrahim, O.,Esfahani, M. D., &Ahmadi,H.(2016a). Recommendationquality,transparency,and websitequalityfor trust-building inrecommendationagents.Electronic CommerceResearchandApplications,19, 70–84.

Pham,M.C.,Cao,Y.,Klamma,R.,&Jarke,M.(2011).Aclusteringapproachfor col-laborative ﬁlteringrecommendationusingsocial networkanalysis. Journalof UniversalComputerScience,17(4),583–604.

Pham,X.H.,Jung,J.J.,Nguyen,N.T.,&Kim,P.(2016).Ontology-basedmultilingual searchinrecommendationsystems.ActaPolytechnicaHungarica,13(2),195–207. Porcel, C., Martinez-Cruz, C., Bernabé-Moreno, J., Tejeda-Lorente, Á., & Her-rera-Viedma, E. (2015). Integrating ontologies and fuzzy logic to represent user-trustworthiness inrecommendersystems.ProcediaComputer Science,55, 603–612.

Powell,M. J.D. (1981).Approximationtheory andmethods.Cambridgeuniversity press.

Rich,E.(1979).Usermodelingviastereotypes.Cognitivescience,3(4),329–354. Sadaei,H.J.,Enayatifar,R.,Lee,M.H.,&Mahmud,M.(2016).Ahybridmodelbased

ondifferentialfuzzylogicrelationshipsandimperialistcompetitivealgorithm forstockmarketforecasting.AppliedSoftComputing,40,132–149.

Salton,G.(1989).Automatictextprocessing:Thetransformation,analysis,andretrieval of:.Addison-Wesley.

Sarwar,B.M.,Karypis,G.,Konstan,J.,&Riedl,J.(2002).Recommender systemsfor large-scalee-commerce:Scalableneighborhoodformationusingclustering.In PaperpresentedattheProceedingsoftheﬁfthinternationalconferenceon com-puterandinformationtechnolog.

Sarwar,B.,Karypis,G.,Konstan,J.,&Riedl,J.(2001,).Item-basedcollaborative ﬁlter-ingrecommendationalgorithms.InProceedingsofthe10thinternational confer-enceonWorldWideWeb(pp.285–295).

Schafer,J.B., Frankowski,D.,Herlocker,J.,&Sen,S.(2007).Collaborativeﬁltering recommendersystemstheadaptiveweb(pp.291–324).Springer.

Shambour,Q.,&Lu,J.(2012).Atrust-semanticfusion-basedrecommendation ap-proachfore-businessapplications.DecisionSupportSystems,54(1),768–780. Shinde,S.K.,&Kulkarni,U.(2012).Hybridpersonalizedrecommendersystemusing

centering-bunchingbasedclusteringalgorithm.ExpertSystemswithApplications, 39(1),1381–1387.

Staab,S.,&Studer,R.(2009).Handbookonontologies.Springer.

Su,X.,&Khoshgoftaar,T.M.(2009).Asurveyofcollaborativeﬁlteringtechniques. Advancesinartiﬁcialintelligence,2009(4),1–19.

Taniar,D.,&Rahayu,J.W.(2006).Websemanticsandontology.IdeaGroupPub. Tarus,J.K.,Niu,Z.,&Yousif,A.(2017).Ahybridknowledge-basedrecommender

systemfore-learningbasedonontologyandsequentialpatternmining.Future GenerationComputerSystems,72,37–48.

Ticha,S.B.,Roussanaly,A.,Boyer,A.,&Bsaies,K.(2012September).User seman-ticpreferencesfor collaborativerecommendations. InInternational Conference onElectronicCommerceandWebTechnologies(pp.203–211).BerlinHeidelberg: Springer.

Trewin,S.(2000).Knowledge-basedrecommendersystems.EncyclopediaofLibrary andInformationScience:Supplement,69(32),180–200.

Truong,K.,Ishikawa,F.,&Honiden,S.(2007).Improvingaccuracyofrecommender systembyitemclustering.IEICETransactionsonInformationandSystems,90(9), 1363–1373.

Tsai,C.-F.,&Hung,C.(2012).Clusterensemblesincollaborativeﬁltering recommen-dation.AppliedSoftComputing,12(4),1417–1425.

Ungar,L.H.,&Foster,D.P.(1998).Clusteringmethodsforcollaborativeﬁltering. PaperpresentedattheAAAIWorkshoponRecommendationSystems.

Wang,P.(2012).APersonalizedCollaborativeRecommendationApproachBasedon ClusteringofCustomers.PhysicsProcedia,24,812–816.

Wang,R.Q.,&Kong,F.S.(2007August).Semantic-enhancedpersonalized recom-mendersystem.InMachineLearningandCybernetics,2007International Confer-enceon:7(pp.4069–4074).

Wu,M.L.,Chang,C.H.,&Liu,R.Z.(2014).Integratingcontent-basedﬁlteringwith collaborativeﬁlteringusingco-clusteringwithaugmentedmatrices.Expert Sys-temswithApplications,41(6),2754–2761.

Xue,G.-R.,Lin,C.,Yang,Q.,Xi,W.,Zeng,H.-J.,Yu,Y.,etal.(2005).Scalable collab-orativeﬁlteringusingcluster-basedsmoothing. InPaperpresentedatthe Pro-ceedingsofthe28thannualinternationalACMSIGIRconferenceonResearchand developmentininformationretrieval.

Zhang, Z.-K., Zhou,T., &Zhang, Y.-C. (2011). Tag-aware recommender systems: A state-of-the-art survey. Journal ofComputer Scienceand Technology, 26(5), 767–777.

Zhou,D.,Zhu,S.,Yu,K.,Song,X.,Tseng,B.L.,Zha,H.,etal.(2008).Learningmultiple graphsfordocumentrecommendations.InPaperpresentedattheProceedingsof the17thinternationalconferenceonWorldWideWeb.