Contents lists available at ScienceDirect
Information
Sciences
journal homepage: www.elsevier.com/locate/ins
Sparse
online
collaborative
filtering
with
dynamic
regularization
Kangkang
Li
a,
Xiuze
Zhou
b,
Fan
Lin
a,∗,
Wenhua
Zeng
a,
Beizhan
Wang
a,
Gil
Alterovitz
caSoftware School, Xiamen University, Xiamen City, Fujian, China bDepartment of Automation, Xiamen University, Xiamen City, Fujian, China cBoston Children’s Hospital, Harvard Medical School, Boston, USA
a
r
t
i
c
l
e
i
n
f
o
Article history:
Received17November2017 Revised23July2019 Accepted29July2019 Availableonline29July2019
Keywords:
Collaborativefiltering Dynamicregularization Onlinecollaborativefiltering Neighborhoodfactor
a
b
s
t
r
a
c
t
Collaborativefiltering(CF)approachesarewidelyappliedinrecommendersystems. Tradi-tionalCFapproacheshavehighcoststotrainthemodelsandcannotcapturechangesin userinterestsanditempopularity.MostCFapproachesassumethatuserinterestsremain unchangedthroughout the wholeprocess. However, user preferencesarealways evolv-ingand thepopularityofitemsisalwayschanging.Additionally,inasparsematrix,the amountofknownratingdataisverysmall.Inthispaper,weproposeamethodofonline collaborativefilteringwithdynamicregularization(OCF-DR),thatconsidersdynamic infor-mationandusestheneighborhoodfactortotrackthedynamicchangeinonline collabo-rativefiltering(OCF).TheresultsfromexperimentsontheMovieLens100K,MovieLens1M, andHetRec2011 datasetsshow thattheproposed methodsaresignificantimprovements overseveralbaselineapproaches.
© 2019 Elsevier Inc. All rights reserved.
1. Introduction
Recommendersystemshelpconsumersfinditemsofinterestfroma widerangeofchoices[3].Theyprovideadvice to findproducts,alleviateinformationoverload[2],andenhanceoursatisfactionandloyaltytoincreasesalesfore-commerce firms[12].
Collaborativefiltering(CF)isoneofthemostsuccessfultechniquesappliedinrecommendersystems[3].CFapproaches predictuserpreferencesonlyontheirhistoricalratingdataanddonotrequiredomainknowledgeoradditionalinformation. ThekeyideaofCFisthatiftwousershavepreviouslylikedsimilaritems,theywillcontinuetodosointhefuture[34].
Manymodel-basedCF approacheshave beenproposed to improvethe quality ofrecommendersystems. Forexample, Zhouetal.addedratinginformationtothe latentDirichletallocation(LDA)model toobtainthe distributionofusers’ in-terests[39].Palumboetal.proposedentity2rectolearnuser-itemrelatednessbasedonknowledgegraphsfortop-n recom-mendersystems[26].Liangetal.usedmatrixfactorizationanditemembeddingtoobtainthecooccurrencepatternsofrare items[17].
Manydeeplearningmethodshavebeenapplied inrecommendersystems.Forexample,Wangetal.proposeda hierar-chicalBayesianmodelbyintegratingastackeddenoisingautoencoderintoprobabilisticmatrixfactorization[29].Heetal.
∗ Correspondingauthor.
E-mail address: [email protected](F.Lin).
https://doi.org/10.1016/j.ins.2019.07.093
proposedageneralframework,neuralnetwork-basedCF,whichleveragesamulti-layerperceptronmodeltolearnthe user-item interactionfunction [8].Wu etal.proposedrecurrentrecommendernetworkscombiningalong short-termmemory modelwithtraditionallow-rankfactorization[31].Sedhainetal.usedanautoencodertoembeditemsintolatentspace[27]. One ofthemostpopularCF approachesismatrix factorization(MF),includingprobabilistic matrixfactorization(PMF) [1]andnonnegativematrixfactorization(NMF)[9].Theidea istoobtainuser/item latentfeatures fromhigh-dimensional dataandusethemtopredictunknownratings.
TraditionalCFmethodsthatadoptbatchlearningalgorithms[10]havefourmaindrawbacks.First,retrainingamodelis expensive[4,16,18,19],andthemodelsrequireallthedatatobeavailablebeforeretraining.Second,theydonotwellhandle dynamicratingsdata, whichare sequential,andnewusersandproducts are constantlybeingadded [18]. Third,they fail tocapturethedriftofuserpreferences[4,13,37]becauseuserinterests arealways evolvingandthepopularityofitemsis continuouslychanging.Finally,theyareunsuitableandnon-scalableforreal-worldlarge-scaleonlineapplicationsinwhich ratingsusuallyarrivesequentiallyandperiodically[23].
Avarietyofonlinecollaborativefiltering(OCF)methodshavebeenproposedrecentlytoalleviatetheseissues.Abernethy etal.proposedtolearnalow-rankMFbyoptimizingtheobjectivefunctioninstochasticgradientdescent[11].Wangetal. exploitedaprinciplesimilartothatofonlinemultitasklearningforOCF[30].Luetal.appliedconfidence-weightedlearning [6]forOCFtasks[23].Chenghaoetal.proposed anonlineBayesianinferencealgorithmincorporatingcontentinformation intoOCF[21].
OCF methods, which apply online learning algorithms, have several attractive advantages: (i) they avoid retraining a modelfromscratchfornewtrainingdatabecauseOCFmethodsupdatemodelssequentially[10];(ii)theirscaleislinearly related tothe number ofobserved ratings andthe size ofthe latent features [19,38]; and (iii)they have low sensitivity to changes whenmodels add newratings [38] because they adoptonlinelearningalgorithms to update modelsfor new trainingdata.
The parameters ofmostcurrentCF methods,both offlineand online,arepredeterminedandinvariant throughoutthe entire operation.These methods assume that user preferences anditem popularityare static [23], which means that all itemsare equallyratedby everyuserandall usersare equallylikelyto rateeveryitem. However,thisassumption isnot basedinrealitybecauseuserpreferencescontinuouslyevolve[36].
CFmethodsmustbe abletotrackuserinterestsandquicklyproviderecommendations[22].Therefore,Koren proposed a CF method withtemporal dynamicsand builta factorizationmodel of thechanging characteristics ofusers and items [13].Xiongetal.proposedafactor-basedCFmethodthataccountsfortime[32].Plovicsetal.exploitedtemporalinfluences betweenuserstoimproverecommendationquality[25].
Inthispaper,we adddynamicfactors,includingthedynamicaveragerating,userdynamicratinghabits,item dynamic attributes, anddynamic ratingdistribution, toOCF to improveits accuracy. The neighborhood factoris alsoincorporated inourmethodstotrackchangesinuserpreferences.Userratingbehavior,itempopularity,andthedistributionofauser’s ratings areconstantly influx,especially foronlinelearning. Ifauser’stastes change,thenhis/herneighborhood willalso change.
Ourcontributionsinthispaperincludethefollowing:
(i) Wecomputethedynamicratingbehavior, i.e.,theratingaverageofeachuser,thebiasofeach user,andthebiasof eachitemineachinteraction.
(ii) Weupdatetheweightoftheuserfeaturevectoranditemfeaturevectorineachround. (iii) Weaddtheneighborhoodfactortoourmethodtotrackthedriftofuserpreferences.
The rest of this paper is organized as follows. Section 2 briefly reviews the background and related work. Section 3 presents thedetails ofour proposed model. Section 4describes the experimental settingand resultson three public datasets to demonstratethe performance of our methods.Section 5provides our conclusionsand suggestions for futurework.
2. Backgroundandrelatedwork
Inthissection,weintroducetheproblemsettingsofCF. Thenwereview somerelatedworkonrecommendersystems, includingmatrix-factorization-basedCFmethods,whichperformwelltoreducetheproblemofhigh-dimensionaldata,and OCFmethods,whichwestudyinthispaper.
2.1. Problemsetting
CFconsistsofthreebasicelements:user,item,andrating.ThedefinitionsofCFincludethefollowing: Definition1. UserfeaturematrixP∈Rk×m;anduserfeaturevectorsP
u. Definition2. ItemfeaturematrixQ∈Rk×n;anditemfeaturevectorsQ
i. Definition3. User-itemratingmatrixR∈Rm×n;r
misthe numberofusers,n isthenumberofitems, andk isthenumberoflatentfeatures. kismuch smallerthann
andm.PandQareusedtopredicttheunknownrating: ˆ
ru,i=PuTQi.
TheapproximationerrorisminimizedtoobtaintheoptimalmatricesPandQ: argmin
P∈Rk×m,Q∈Rk×n
R
−PTQ2 F,where
·FistheFrobeniusnormofthematrix.Theaboveequationcanbeformulatedas argminP∈Rk×m,Q∈Rk×n
(u,i)∈C
l
(
Pu,Qi,ru,i)
,whereC⊆{(u,i)|ru,iisknown},andthelfunction isusedtomeasure thedifferencebetweenthepredictedandrealvalues. Themostcommonmeasurementsarerootmeansquarederror(RMSE)andmeanabsoluteerror(MAE),definedas
RMSE=
1|
C|
(u,i)∈C(
ru,i−rˆu,i)
2 MAE= 1|
C|
(u,i)∈Cr
u,i−rˆu,i,whereC⊆{(u, i)|ru,iisknown},|C| is the number ofknown ratings in C,rˆu,i=PuTQi is the predictedrating, andru,i isthe knownrating.TooptimizetheRMSE,wedefinethelossfunctionas
l1
(
Pu,Qi,ru,i)
=(
ru,i−PuTQi)
2.Similarly,tooptimizetheMAE,wedefinethelossfunctionas l2
(
Pu,Qi,ru,i)
=r
u,i−PuTQi.2.2.Matrixfactorization-basedCFmethods
CFmethodsaregenerallydividedintotwobasiccategoriesofmodel-andmemory-basedCF[3].Model-basedCFmethods performbetter onproblems ofdatasparsity, andare thereforemorepromising,andonce trained,they make predictions moreefficiently[19].
MFisone ofthemostsuccessfulmodel-basedalgorithms[13].It performswell atreducing high-dimensionaldata.Its keyideaisthatauser’sratingbehavioronanitemisdeterminedbysomelatentfeatures[24].First,MFlearnstheuserand itemfeature vectorsbasedonuser-item ratingsto reducethe dimensionality,andthenuses thefeature vectorto predict unknownratings.Matrix-factorization-basedCFisdefinedas
(u,i)∈C l
(
Pu,Qi,ru,i)
+λ
2 m u=1 Pu2+ n i=1 Qi2 ,whereC⊆{(u,i)|ru,iisknown},l(Pu,Qi,ru,i)isalossfunction,and
λ
isaregularizationparameter.2.3.OCFmethods
Onlinelearning algorithms can update the model and avoid the cost ofretraining it when a new instance is added [10,28,35].Manystudies onOCF haveemerged recently. Abernethyet al.proposed an OCFmethod forlearninglow-rank MF byoptimizing theobjective function directly[11]. However, thismethod optimizedonly theloss function andled to overfitting.Aneffectivesolutiontothisproblemistoaddregularizationtermstoconstraintheobjectivefunction.Dasetal. proposedacombinationofMinHashclustering,probabilisticlatentsemanticindexing,andcovisitationcountstorecommend personalizednewsforonlineusers[5].Zhouetal.addedbiasandconfidenceweights toOCFtoimproveits stabilityand accuracy[38].Lingetal.designeda dual-averagemethodforprobabilisticMFbyaddingpreviousratinginformationinan approximateaveragegradientoftheloss[19].
OCFisappliedinmanyotherareas.Forexample,Chenghaoetal.combinedonlineadaptivepassive-aggressivemethods withnonnegativematrixfactorizationtoimprovetheaccuracy[20],andYangetal.appliedOCFtosocialnetworking[33]. 3. Onlinecollaborativefilteringwithdynamicregularization
Wenowintroducethealgorithmforonlinecollaborativefilteringwithdynamicregularization(OCF-DR).Itskeyfunction istoadddynamicregularizationandaneighborhoodfactortoOCFtoimproveitsstabilityandaccuracy.
3.1. Onlinecollaborativefilteringwithdynamicregularization
The parametersofmostCF methodsare predeterminedandinvariable, anduserpreferencesanditempopularities are assumedtobe static[36].Therefore,themethods assumethat allitemsare equallyratedbyevery userandall usersare equallylikelytorateeveryitem.Thisassumptionisnotrealistic[38].
First,thepopularityofproductsconstantlychanges,especiallywhennewchoicesemerge[18].Seasonalfactorsareamong themostimportant.Forexample,sweatersareeasiertosellinwinterthaninsummer.Second,auser’sinclination contin-uouslychanges overtime,whichleadstotherepositioningoftheirtastes.Emotionsareanotherimportantfactor[39].For example,userswhoratedamovieat3starsmaychangetheirratingto4starsdependingontheirmood.So,dynamic fac-torsareessentialforOCFmethods[18].However,currentworkonOCFseldomconsidersthedynamicchangesofusersand items,andtoconsidertheratingofonlytheinteractionoftheuserfeaturevectoranditemfeaturevectorisunreasonable.
Therefore,weproposeOCF-DR,whichaddsdynamicregularizationandaneighborhoodfactortoOCF.Ithasthreemain parts:(i) computingthedynamicchange,i.e.,thedynamicratingaverageofeachuser,thedynamicbiasofeachuser,and the dynamicbias ofeach item ineach interaction; (ii) updating the weights ofthe userfeature vector anditem feature vectorineachround;and(iii)takingtheneighborhoodfactorintoaccounttotrackthechangeofuserpreferences.
Ourmethodpredictsratingsasfollows: ˆ
ru,i=
μ
+bu+bi+PuTQi,where
μ
istheoverallratingaverage;bu istheuserbias,theobserveddeviationofuseru,i.e., theratingbehavior ofthe user;andbiistheitembias,theobserveddeviationofitemi,i.e.,theuniquecharacteristicsoftheitem.Userbiasrepresentstheintrinsictrendoftheuser.Forexample,someuserstendtogivehigherratingsthanothers.Item biasrepresentstheintrinsicpropertiesoftheitem.Forexample,someitemsreceivehigherratingsthanothers.Bothbiases areindependentofuserinteractions.
TooptimizetheRMSE,wedefinethelossfunctionas
l1
(
Pu,Qi,bu,bi,ru,i)
=(
ru,i−μ
−bu−bi−PuTQi)
2. (1) Similarly,tooptimizetheMAE,wedefinethelossfunctionasl2
(
Pu,Qi,bu,bi,ru,i)
=r
u,i−μ
−bu−bi−PuTQi. (2) Inourmethods,wechoosel1 astheevaluationmetric.3.1.1. OCF-DR_I
TheobjectivefunctionofOCF-DR_Iis C
(
Pu,Qi,bu,bi)
= (u,i)∈C l(
Pu,Qi,bu,bi,ru,i)
+λ
2 m u=1 p(
u)
Pu2+ n i=1 q(
i)
Qi2+b2u+b2i , (3)wherep(u)istheprobability-of-observingrowofuseru,i.e.,thenumberofitemsratedbytheuser;q(i)isthe probability-of-observingcolumnofitemi,i.e.,thenumberofuserswhohaverateditemi;and
λ
isaregularizationparameter.Theprobabilitydistributionofuserratingbehaviorisnotuniformbecausesomeitemsaremorepopularandmorelikely toreceive ratings,andsomeusers aremorelikely toprovideratings.Therefore,thepenalization ofthefeature vectorsof usersoritemswithmoreratingsisincreased.
Stochasticgradientdescent(SGD)isaneffectivetoolforoptimizationandonlinelearning[10].WeadoptSGDtooptimize formula(3)andobtaintheupdaterulesPu,Qi,bu,andbi:
Pu=Pu−
η ∂
C∂
Pu =(
1−λη
·p(
u))
Pu+2η
eu,iQi (4) Qi=Qi−η ∂
C∂
Qi =(
1−λη
·q(
i))
Qi+2η
eu,iPu (5) bu=bu−η ∂
C∂
bu =(
1−λη
)
bu+2η
eu,i (6) bi=bi−η ∂
C∂
bi =(
1−λη
)
bi+2η
eu,i, (7)whereeu,i=ru,i−rˆu,i.rˆu,iisthepredictedrating,andru,iistheknownrating.
λ
isaregularizationparameter.η
denotesthe learningrateparameter,whichcontrolsthechangeineachstep(Algorithm1).Algorithm1 Onlinecollaborativefilteringwithdynamicregularization(OCF-DR_I). Parameters:n,m,k,
λ
,η
Input:asequenceofratingpairs
(
u,i,ru,i)
Initialization: initialize a random matrixfor userfeature matrix P∈Rk×m anditem feature matrix Q∈Rk×n; initialize a randommatrixforbu∈R1×mandbi∈R1×n;initializezeromatricesforTu∈R1×m,thenumberofitemsratedbyuseru,and
Ti∈R1×n,thenumberofuserswhohaverateditemi Fort=1, 2, . . . , T do
1: Receiveratingpredictionrequestofuseruonitemi 2: Compute
μ
andmakepredictionrˆu,i=μ
+bu+bi+PuTQi3: Thetrueratingru,iisrevealed
4: Thealgorithmsuffersalossl
(
Pu, Qi, bu, bi, ru,i)
5: UpdateTu andTi:Tu=Tu+1;Ti=Ti+1
6: Update p
(
u)
andq(
i)
: p(
u)
=Tu/t,q(
i)
=Ti/t7: UpdatePu,Qi,bu,andbiaccordingto(4),(5),(6),and(7),respectively Endfor
Fig.1. Diagramofdynamicneighborhoods.
3.1.2. OCF-DR_II
NeighboringtemporalinformationisanimportantfactorinCFthatimprovestherecommendationaccuracy[14].Lathia et al. incorporated the adaptive neighborhood into temporal CF [15], but they considered only the varying size of the neighborhood over time. Liu et al.added temporal information anddeveloped incremental learningsimilarity to extend theneighborhood-basedalgorithm[22].
Foreveryinteraction,auserfeaturevectorwillchangeafterauserratesanitem.Topreventoverfittinginthisdynamic process,weincorporatearegularizationtermrelatedtothedifferencebetweenausersfeaturevectorsandhis/her neighbor-hoodsfeaturevectorsforaperiodoftime.Byonlineupdating,thefarthestneighborsoftheusermaychangeforaperiod oftime.AdiagramofdynamicneighborhoodsisshowninFig.1.
So,we attempttotake theneighboringtemporal informationintoOCF-DR_I. Thenthe objectivefunctionof OCF-DR_II becomes C
(
Pu,Qi,bu,bi)
= (u,i)∈C l(
Pu,Qi,bu,bi,ru,i)
+λ
2 m u=1 p(
u)
Pu2+ n i=1 q(
i)
Qi2+b2u+b2i +α
2 m u=if∈Fu sim(
u,f)
P
u−Pf 2 , (8)whereFuisthesetofneighborsofuseru;|Fu|isthenumberofneighborsofuseru;
α
andλ
areregularizationparameters;p(u)denotestheprobability-of-observingrowofuseru;q(i) istheprobability-of-observingcolumnofitemi;andsim(u,f) isthesimilaritybetweenuseruandf,definedas
sim
(
u,f)
=I
I
u∩Ifu∪If
whereIu andIfaretherespectivesetsthatusersuandfhaverated. WecansimilarlyobtaintheOCF-DR_IIupdaterules:
Pu=Pu−
η ∂
C∂
Pu =(
1−αη
f∈Fu sim(
u,f)
−λη
·p(
u))
Pu+αη
f∈Fu sim(
u,f)
Pf+2η
eu,iQi (10) Qi=Qi−η ∂
C∂
Qi =(
1−λη
·q(
i))
Qi+2η
eu,iPu (11) bu=bu−η ∂
C∂
bu =(
1−λη
)
bu+2η
eu,i (12) bi=bi−η ∂
C∂
bi =(
1−λη
)
bi+2η
eu,i, (13)where rˆu,i is the predicted rating, ru,i is the known rating, eu,i=ru,i−rˆu,i,
λ
is a regularization parameter, andη
is the learningrateparameter,whichcontrolsthechangeineachstep.Toimprovethealgorithm’s speed,we regularly updatethe neighborhoodofa user.We definethe updatecycleofthe userneighborhoodCandthenumberofneighborhoodsH(Algorithm2).
Algorithm2 OCF-DR_II. Parameters:n, m, k,
λ
,α
,η
, C, HInput:asequenceofratingpairs
(
u,i,ru,i)
Initialization: initialize a random matrix foruser feature matrixP∈Rk×m and item feature matrixQ∈Rk×n; initialize a randommatrixforbu∈R1×mandbi∈R1×n;initializezeromatricesforTu∈R1×m,thenumberofitemsratedbyuseru,and
Ti∈R1×n,thenumberofuserswhohaverateditemi;initializearandommatrixfortheneighborhoodmatrixF∈Rm×H Fort=1,2,...,T do
1:Receiveratingpredictionrequestofuseruonitemi 2:Compute
μ
andmakepredictionrˆu,i=μ
+bu+bi+PuTQi3:Thetrueratingru,iisrevealed
4:Thealgorithmsuffersalossl
(
Pu,Qi,bu,bi,ru,i)
5:UpdateTuandTi:Tu=Tu+1;Ti=Ti+1
6:Update p
(
u)
andq(
i)
: p(
u)
=Tu/t, q(
i)
=Ti/t7: UpdatePu,Qi,bu,andbiaccordingto(10),(11),(12),and(13),respectively
8:ift=Cthen
9: UpdateF accordingto(9)
10:endif Endfor
4. Experiments
Weperformedseveralexperimentstocomparethequalityofourproposedmethodstothoseofthebaselineapproaches, andanalyzedtheresultsindetail.
4.1. Datasets
Weperformedexperimentsonthreepublic datasets:MovieLens100K[7],MovieLens1M[7],andHetRec2011[7],which are availablefromthe MovieLenswebsite.1 TheMovieLens100K datasetcontains100,000ratingsfrom943users on1682
movies.The MovieLens1Mdatasetconsistsof1million ratingsfrom6000 userson4000movies.The HetRec2011dataset comprises855,598ratingsfrom2113userson10,109movies.
We adopted themostpopular metric,RMSE, to evaluate the predictionaccuracy ofour proposed methods. Asmaller RMSEindicatesbetterpredictionaccuracy.
4.2. Baselineapproaches
Todemonstratetheadvantagesanddisadvantagesofthealgorithm, weconducteda comparativeexperimentwithour threebaselineapproaches.
Table1 PerformanceonMovieLens100K. Method\k 5 10 15 20 25 30 35 40 OLR 1.1242 0.9911 1.0142 1.1133 1.236 1.3603 1.4833 1.5949 CWOCF_I 1.0405 1.0101 1.0184 1.0536 1.0931 1.1541 1.2340 1.3473 SOCF_II 1.0403 0.9863 1.0434 1.1485 1.2432 1.3286 1.3985 1.4861 OCF-DR_I 1.0384 1.0390 1.0401 1.0398 1.0391 1.0393 1.0398 1.0405 OCF-DR_II 1.0381 1.0388 1.0384 1.0385 1.0389 1.0391 1.0387 1.0393 Table2 PerformanceonMovieLens1M. Method\k 5 10 15 20 25 30 35 40 OLR 1.1166 0.9760 0.9695 1.0236 1.0976 1.1768 1.2500 1.3204 CWOCF_I 0.9732 0.9642 0.9693 0.9851 1.0139 1.0521 1.0806 1.1200 SOCF_II 1.0054 0.9469 0.9699 1.0211 1.0874 1.1544 1.2173 1.2765 OCF-DR_I 0.9707 0.9701 0.9691 0.9697 0.9700 0.9705 0.9719 0.9726 OCF-DR_II 0.9688 0.9689 0.9685 0.9691 0.9683 0.9694 0.9698 0.9710 Table3 PerformanceonHetRec2011. Method\k 5 10 15 20 25 30 35 40 OLR 0.9075 0.8869 0.896 0.9099 0.9654 1.0244 1.0862 1.1473 CWOCF_I 0.8754 0.8541 0.8906 0.8959 0.9076 0.9156 0.9222 0.9392 SOCF_II 0.9131 0.8630 0.8982 0.9147 0.9209 0.9367 0.9419 0.9577 OCF-DR_I 0.8921 0.8906 0.8912 0.8917 0.8920 0.8939 0.8941 0.8957 OCF-DR_II 0.8908 0.8900 0.8898 0.8893 0.8888 0.8884 0.8879 0.8870
• OLR:onlinelow-rankapproximation,whichlearnsarank-kMFbyusingonlinegradientdescenttodirectlyoptimizethe lossfunction[11];
• CWOCF_I:confidence-weightedOCF,whichappliesconfidence-weightedonlinelearningtoaddresstheOCFtask[23];
• SOCF_II:second-ordersparseOCF,whichaddsanabsolutetermtotheobjectivefunction[18].
4.3.Resultsandanalysis 4.3.1. Overallperformance
Weevaluatedtheperformanceofallthemethodsonthethreedatasets.Thelearningrate
η
ofallthemethodswassetto 0.005forafaircomparison.Thenumberoflatentfeaturesvariedfrom5to40.Theparametersofthesebaselineapproaches weresetaccordingtoprevious workorbysearchingfromasingleexperimentoneachdataset.Eachexperimentwasrun randomlytentimes,andweobtainedtheaveragevaluesofeachmethod.Tables 1–3 present the performance ofthe methods on each dataset.The bold elements in the tables represent the bestperformanceamongallmethods.Thetablesshow thatourproposedmethodsperformedbestinmostcases.Wetake Table1asanexample.Asthenumberoflatentfeatureskincreases,theRMSEincreasesslowlyinOCF-DR_I.Underthesame valueofk,theRMSEvaluesinOCF-DR_IIareslightlybetterthanthoseofOCF-DR_I.Becausetheratingmatrixissparse,the neighborweightmethodimprovesthepredictionaccuracyofOCF.Comparedwiththoseofthebaselines,theRMSEvalues inourmethodsarestablewithincreasingkbecauseonceourmodelhasbeenfitted,userswithveryfewratingswillhave featurevectorsclosetothemean,sothepredictedratingsforthoseuserswillbeclosetotheaveragemovieratings.
TheRMSEvaluesofall approachesonthethreedatasets withthenumberoflatentfeatures rangingfrom5to 40are shownin Figs. 2–6. Ourmethods outperformed the other OCF methods, andconverged faster than all the baseline ap-proaches,especiallywhenthealgorithmstartedrunning.
First,comparedwithotherbaselineapproaches,i.e.,OLR,CWOCF_I,andSOCF_II,wefoundthat ourmethodsperformed betterintermsofRMSE.Inmostcases,ourmethodresultedinsmallerRMSEvalues,indicatingthatmethodsthatconsider dynamicregularizationareeffectiveforonlinelearningCF.
Second,weobservedthatourmethodsweremorestablethanthebaselineapproachesinmostcases.OLRwasrelatively sensitivebecauseitlosessometemporalinformationinonlinelearning.TemporaldynamicfactorsareimportantforOCFto capturetheusersdynamichabitsandintereststhroughouttheonlinelearningprocess.
Third,ourmethodswere abletoobtainrelatively stableRMSEvaluesunderdifferentconditionswithvarying numbers oflatentfeatures.Therefore,ourmethodsarerobusttothenumberoflatentfeatures.
Finally,when we compared OCF-DR_I andOCF-DR_II, we found that OCF-DR_II achievedlower RMSEvalues in most cases,indicatingthattheneighborhoodfactorisessentialforOCFtaskstoimprovethepredictionaccuracy.
Fig.2. PerformanceofallmethodsonMovieLens100Kwithk =5.
Fig.4. PerformanceofallmethodsonMovieLens1Mwith k =5.
Fig.6. PerformanceofallmethodsonHetRec2011with k =30.
Table4
Results ofOCF-DR_II with different α on Movie-Lens1Mwhenk =25. α\λ 0.01 0.1 1 10 0.01 0.9870 0.9818 0.9695 1.0142 0.1 0.9866 0.9822 0.9683 1.0155 1 0.9854 0.9807 0.9730 1.0354 10 0.9849 0.9806 0.9745 1.0522
4.3.2. Impactofthenumberoflatentfeatures
Figs.7and8presenttheimpactofthenumberoflatentfeaturesonall methods.Fig.9showsanenlargeddrawing of ourmethodswithdifferentnumbersoflatentfeatureskonMovieLens1M.Weobservethatourmethodswerestableunder differentnumbers oflatentfeatures, whileOLR,CWOCF_I,andSOCF_IIwere sensitive tochanges inthenumberoflatent features.Ononehand,toosmallavalueofk(lessthan15)causestheseapproachestounderfit.However,theyallperform wellasthevalueofkincreases.Ontheotherhand,becausealargevalueofkcausestheseapproachestooverfit,whenthe valueofkislarge(largerthan15) andcontinuestoincrease, theperformance ofallthebaseline approachesworsens.Our methodsarestable,andkhaslittleeffectonouralgorithms.
When the value ofk is small(k<15), the latent features arenot sufficient to contain theuser’s interest information, andouralgorithmsperformnobetterthanothermethods.ThisisbecauseCWOCF_IandSOCF_IIaresecond-ordermethods which usethecovarianceinformation ofthelatent vector tocompensate forthelack ofuserinterest information.When thevalueofkislarge(k> 15),thelatentfeaturescontainsufficientuserinterestinformation.Covarianceinformationofthe latentvectorisredundant,whichresultsinnoobviousincrease inprecision.However, ourmethodsare stablebecausewe trackdynamicchanges:theaverageofeachuser,thebiasofeachuser,andthebiasofeachitemineachround.Moreover, Fig.9showsthatRMSEvaluesofourmethodsfluctuateinaverysmallrangeaskincreases.Thissuggeststhatalargelatent factorhaslittleeffectonourmethods.
4.3.3. Theimpactofparameters
Fig.10andTable4presenttheimpactoftwoparametersonourmethods.First,wesetk=25forOCF-DR_Itoshowits performanceunderdifferentvaluesof
λ
,whichcontrolstheimpactoftheregularizationpenaltyforOCF-DR_I.Weobserve that the curve ofOCF-DR_I declines moresharplyandfinally reaches alarger RMSEvalue withan increasing numberof samples.Second,wesetk=25forOCF-DR_IItoshowitsperformanceunderdifferentvaluesofα
andλ
.FromTable4,weFig.7. PerformanceofallmethodsonMovieLens100Kwithdifferentnumbersoflatentfeaturesk .
Fig.9. EnlargeddrawingofOCF-DR_IandOCF-DR_IIonMovieLens1Mwithdifferentnumbersoflatentfeaturesk .
Fig.10. ConvergenceeffectsofOCF-DR_Iwithdifferent λonMovieLens1Mwhen k =25.
observethat fora givenparameter,theRMSEvaluedecreases firstandthenincreasesasanotherparameterincreases be-causethealgorithmisunderfitforextremelysmall
λ
orα
andoverfitforextremelylargeλ
orα
.Therefore,bothparameters areessentialtoconstrainthestabilityandconvergencerateofthealgorithms.5. Conclusions
Inthispaper,we consideredthe dynamicratingbehavior andupdatedthe weightoftheuserfeature vectoranditem featurevectorineachroundtoimprovepredictionaccuracy.Wethenaddedtheneighborhoodfactortoourmethodtotrack thedriftofuserpreferences.Experimentalresultsshowthatourproposedmethodsoutperformedthebaselineapproaches. Comparedtothebaselineapproaches, ourmethods convergerapidlyandobtainhighpredictionaccuracy,i.e.,lower RMSE
values.Therefore, to consider rational dynamicfactors is crucial to improve the predictionaccuracy of OCF methods. In thefuture,wewill focusontwo directions.First, wewill trytoextract useranditem featuresfromuser/item side infor-mationby deeplearningapproachesoffline asinputforOCF.Second, wewillattemptto applyourmethods toeducation recommendersystems.
DeclarationofCompetingInterest
Wewishtoconfirmthattherearenoknownconflictsofinterestassociatedwiththispublicationandtherehasbeenno significantfinancialsupportforthisworkthatcouldhaveinfluenceditsoutcome.
References
[1] Y.Cao,W.Li,D.Zheng,Animprovedneighborhood-awareunifiedprobabilisticmatrixfactorizationrecommendation,Wirel.Pers.Commun.102(4) (2018)3121–3140.
[2] Z.D.Champiri, S.R.Shahamiri,S.S.B.Salim,Asystematicreviewofscholarcontext-aware recommendersystems,ExpertSyst.Appl.42(3)(2015) 1743–1758.
[3] R.Chen,Q.Hua,Y.Chang,B.Wang,L.Zhang,X.Kong,Asurveyofcollaborativefiltering-basedrecommendersystems:fromtraditionalmethodsto hybridmethodsbasedonsocialnetworks,IEEEAccess6(2018)64301–64320.
[4] K.Christakopoulou,F.Radlinski,K.Hofmann,Towardsconversationalrecommendersystems,in:Proceedingsofthe22NdACMSIGKDDInternational ConferenceonKnowledgeDiscoveryandDataMining,in:KDD’16,ACM,NewYork,NY,USA,2016,pp.815–824.
[5] A.S.Das,M.Datar,A.Garg,S.Rajaram,Googlenewspersonalization:scalableonlinecollaborativefiltering,in:Proceedingsofthe16thInternational ConferenceonWorldWideWeb,in:WWW’07,ACM,NewYork,NY,USA,2007,pp.271–280.
[6] M.Dredze,K.Crammer,F.Pereira,Confidence-weightedlinearclassification,in:Proceedingsofthe25thInternationalConferenceonMachineLearning, in:ICML’08,ACM,NewYork,NY,USA,2008,pp.264–271.
[7] F.M.Harper,J.A.Konstan,Themovielensdatasets:historyandcontext,ACMTrans.Interact.Intell.Syst.5(4)(2015)19:1–19:19.
[8] X.He,L.Liao,H.Zhang,L.Nie,X.Hu,T.-S.Chua,Neuralcollaborativefiltering,in:Proceedingsofthe26thInternationalConferenceonWorldWide Web,in:WWW’17,InternationalWorldWideWebConferencesSteeringCommittee,RepublicandCantonofGeneva,Switzerland,2017,pp.173–182.
[9] A.Hernando,J.Bobadilla,F.Ortega,AnonnegativematrixfactorizationforcollaborativefilteringrecommendersystemsbasedonaBayesian proba-bilisticmodel,Knowl.-BasedSyst.97(C)(2016)188–202.
[10] S.C.H.Hoi,D.Sahoo,J.Lu,P.Zhao,Onlinelearning:acomprehensivesurvey,CoRR,2018abs/1802.02871.
[11]J.Langford,J.Abernethy,K.Canini,A.Simma,OnlineCollaborativeFiltering,Tech.Rep.,UniversityofCaliforniaatBerkeley,2007.
[12] M.Jugovac,D.Jannach,Interactingwithrecommendersoverviewandresearchdirections,ACMTrans.Interact.Intell.Syst.7(3)(2017)10:1–10:46.
[13] Y.Koren,Collaborativefilteringwithtemporaldynamics,Commun.ACM53(4)(2010)89–97.
[14] Y.Koren,Factorintheneighbors:scalableandaccuratecollaborativefiltering,ACMTrans.Knowl.Discov.Data4(1)(2010)1:1–1:24.
[15] N.Lathia,S.Hailes,L.Capra,Temporalcollaborativefilteringwithadaptiveneighbourhoods,in:Proceedingsofthe32ndInternationalACMSIGIR ConferenceonResearchandDevelopmentinInformationRetrieval,in:SIGIR’09,ACM,NewYork,NY,USA,2009,pp.796–797.
[16] Y.Li,Z.Li,F.Wang,L.Kuang,Acceleratedonlinelearningforcollaborativefilteringandrecommendersystems,in:2014IEEEInternationalConference onDataMiningWorkshop,2014,pp.879–885.
[17]D.Liang,J.Altosaar,L.Charlin,D.M.Blei,Factorizationmeetsthe itemembedding:regularizingmatrix factorizationwithitemco-occurrence,in: Proceedingsofthe10thACMConferenceonRecommenderSystems,in:RecSys’16,ACM,NewYork,NY,USA,2016,pp.59–66.
[18] F.Lin,X.Zhou,W.H.Zeng,F.Lin,X.Zhou,W.Zeng,Sparseonlinelearningforcollaborativefiltering,Int.J.Comput.Commun.Control11(2)(2016) 248–258.ISSN
[19] G.Ling,H.Yang,I.King,M.R.Lyu,Onlinelearningforcollaborativefiltering,in:The2012InternationalJointConferenceonNeuralNetworks(IJCNN), 2012,pp.1–8.
[20]C.Liu,S.C.Hoi,P.Zhao,J.Sun,E.-P.Lim,Onlineadaptivepassive-aggressivemethodsfornon-negativematrixfactorizationanditsapplications,in: Proceedingsofthe25thACMInternationalonConferenceonInformationandKnowledgeManagement,in:CIKM’16,ACM,NewYork,NY,USA,2016, pp.1161–1170.
[21]C.Liu,T.Jin,S.C.Hoi,P.Zhao,J.Sun,Collaborativetopicregressionforonlinerecommendersystems:anonlineandBayesianapproach,Mach.Learn. 106(5)(2017)651–670.
[22]N.N.Liu,M.Zhao,E.Xiang,Q.Yang, Onlineevolutionarycollaborativefiltering,in:Proceedingsofthe FourthACMConferenceonRecommender Systems,in:RecSys’10,ACM,NewYork,NY,USA,2010,pp.95–102.
[23]J.Lu,S.C.H.Hoi,J.Wang,P.Zhao,Secondorderonlinecollaborativefiltering,in:JMLR:WorkshopandConferenceProceedings:ACML2013,14,2013, pp.325–340.
[24]X.Luo,M.Zhou,Y.Xia,Q.Zhu,Anefficientnon-negativematrix-factorization-basedapproachtocollaborativefilteringforrecommendersystems,IEEE Trans.Ind.Inform.10(2)(2014)1273–1284.
[25]R.Pálovics,A.A.Benczúr,L.Kocsis,T.Kiss,E.Frigó,Exploitingtemporalinfluenceinonlinerecommendation,in:Proceedingsofthe8thACM Confer-enceonRecommenderSystems,in:RecSys’14,ACM,NewYork,NY,USA,2014,pp.273–280.
[26]E.Palumbo,G.Rizzo,R.Troncy,Entity2rec:learninguser-itemrelatednessfromknowledgegraphsfortop-nitemrecommendation,in:Proceedingsof theEleventhACMConferenceonRecommenderSystems,in:RecSys’17,ACM,NewYork,NY,USA,2017,pp.32–36.
[27]S.Sedhain,A.K.Menon,S.Sanner,L.Xie,Autorec:autoencodersmeetcollaborativefiltering,in:Proceedingsofthe24thInternationalConferenceon WorldWideWeb,in:WWW’15Companion,ACM,NewYork,NY,USA,2015,pp.111–112.
[28]S.Shalev-Shwartz,Onlinelearningandonlineconvexoptimization,Found.TrendsMach.Learn.4(2)(2012)107–194.
[29]H.Wang,N.Wang,D.-Y.Yeung,Collaborativedeeplearningforrecommendersystems,in:Proceedingsofthe21thACMSIGKDDInternational Confer-enceonKnowledgeDiscoveryandDataMining,in:KDD’15,ACM,NewYork,NY,USA,2015,pp.1235–1244.
[30]J.Wang,S.C.Hoi,P.Zhao,Z.-Y.Liu,Onlinemulti-taskcollaborativefilteringforon-the-flyrecommendersystems, in:Proceedingsofthe7thACM ConferenceonRecommenderSystems,in:RecSys’13,ACM,NewYork,NY,USA,2013,pp.237–244.
[31] C.-Y.Wu,A.Ahmed,A.Beutel,A.J.Smola,H.Jing,Recurrentrecommendernetworks,in:ProceedingsoftheTenthACMInternationalConferenceon WebSearchandDataMining,in:WSDM’17,ACM,NewYork,NY,USA,2017,pp.495–503.
[32]L.Xiong,X.Chen,T.-K.Huang,J.G.Schneider,J.G.Carbonell,TemporalcollaborativefilteringwithBayesianprobabilistictensorfactorization.,in:SDM, SIAM,2010,pp.211–222.
[33]X.Yang,H.Steck,Y.Liu,Circle-basedrecommendationinonlinesocialnetworks,in:Proceedingsofthe18thACMSIGKDDInternationalConference onKnowledgeDiscoveryandDataMining,in:KDD’12,ACM,NewYork,NY,USA,2012,pp.1267–1275.
[34]Z.Yang,B.Wu,K.Zheng,X.Wang,L.Lei,Asurveyofcollaborativefiltering-basedrecommendersystemsformobileinternetapplications,IEEEAccess 4(2016)3273–3287.
[35]P.Zhao,S.C.H.Hoi,R.Jin,DUOL:adoubleupdatingapproachforonlinelearning,in:Proceedingsofthe22ndInternationalConferenceonNeural InformationProcessingSystems,in:NIPS’09,CurranAssociatesInc.,USA,2009,pp.2259–2267.
[36]X.Zhao,W.Zhang,J.Wang,Interactivecollaborativefiltering,in:Proceedingsofthe22ndACMInternationalConferenceonInformation&Knowledge Management,in:CIKM’13,ACM,NewYork,NY,USA,2013,pp.1411–1420.
[37]Z.Zhao,H.Lu,D.Cai,X.He,Y. Zhuang, Userpreferencelearningfor onlinesocialrecommendation,IEEETrans.Knowl. DataEng.28(9)(2016) 2522–2534.
[38]X.Zhou,W.Shu,F.Lin,B.Wang,Confidence-weightedbiasmodelforonlinecollaborativefiltering,Appl.SoftComput.70(2018)1042–1053.