
Sparse online collaborative filtering with dynamic regularization


Contents lists available at ScienceDirect

Information Sciences

journal homepage: www.elsevier.com/locate/ins


Kangkang Li a, Xiuze Zhou b, Fan Lin a,∗, Wenhua Zeng a, Beizhan Wang a, Gil Alterovitz c

a Software School, Xiamen University, Xiamen City, Fujian, China
b Department of Automation, Xiamen University, Xiamen City, Fujian, China
c Boston Children's Hospital, Harvard Medical School, Boston, USA

Article info

Article history: Received 17 November 2017; Revised 23 July 2019; Accepted 29 July 2019; Available online 29 July 2019

Keywords: Collaborative filtering; Dynamic regularization; Online collaborative filtering; Neighborhood factor

Abstract

Collaborative filtering (CF) approaches are widely applied in recommender systems. Traditional CF approaches have high costs to train the models and cannot capture changes in user interests and item popularity. Most CF approaches assume that user interests remain unchanged throughout the whole process. However, user preferences are always evolving and the popularity of items is always changing. Additionally, in a sparse matrix, the amount of known rating data is very small. In this paper, we propose a method of online collaborative filtering with dynamic regularization (OCF-DR), that considers dynamic information and uses the neighborhood factor to track the dynamic change in online collaborative filtering (OCF). The results from experiments on the MovieLens100K, MovieLens1M, and HetRec2011 datasets show that the proposed methods are significant improvements over several baseline approaches.

© 2019 Elsevier Inc. All rights reserved.

1. Introduction

Recommender systems help consumers find items of interest from a wide range of choices [3]. They provide advice to find products, alleviate information overload [2], and enhance our satisfaction and loyalty to increase sales for e-commerce firms [12].

Collaborative filtering (CF) is one of the most successful techniques applied in recommender systems [3]. CF approaches predict user preferences based only on their historical rating data and do not require domain knowledge or additional information. The key idea of CF is that if two users have previously liked similar items, they will continue to do so in the future [34].

Many model-based CF approaches have been proposed to improve the quality of recommender systems. For example, Zhou et al. added rating information to the latent Dirichlet allocation (LDA) model to obtain the distribution of users' interests [39]. Palumbo et al. proposed entity2rec to learn user-item relatedness based on knowledge graphs for top-n recommender systems [26]. Liang et al. used matrix factorization and item embedding to obtain the co-occurrence patterns of rare items [17].

Many deep learning methods have been applied in recommender systems. For example, Wang et al. proposed a hierarchical Bayesian model by integrating a stacked denoising autoencoder into probabilistic matrix factorization [29]. He et al. proposed a general framework, neural network-based CF, which leverages a multi-layer perceptron model to learn the user-item interaction function [8]. Wu et al. proposed recurrent recommender networks combining a long short-term memory model with traditional low-rank factorization [31]. Sedhain et al. used an autoencoder to embed items into latent space [27].

One of the most popular CF approaches is matrix factorization (MF), including probabilistic matrix factorization (PMF) [1] and nonnegative matrix factorization (NMF) [9]. The idea is to obtain user/item latent features from high-dimensional data and use them to predict unknown ratings.

∗ Corresponding author.
E-mail address: [email protected] (F. Lin).
https://doi.org/10.1016/j.ins.2019.07.093

Traditional CF methods that adopt batch learning algorithms [10] have four main drawbacks. First, retraining a model is expensive [4,16,18,19], and the models require all the data to be available before retraining. Second, they do not handle dynamic ratings data well: such data are sequential, and new users and products are constantly being added [18]. Third, they fail to capture the drift of user preferences [4,13,37] because user interests are always evolving and the popularity of items is continuously changing. Finally, they are unsuitable and non-scalable for real-world large-scale online applications in which ratings usually arrive sequentially and periodically [23].

A variety of online collaborative filtering (OCF) methods have been proposed recently to alleviate these issues. Abernethy et al. proposed to learn a low-rank MF by optimizing the objective function with stochastic gradient descent [11]. Wang et al. exploited a principle similar to that of online multitask learning for OCF [30]. Lu et al. applied confidence-weighted learning [6] for OCF tasks [23]. Chenghao et al. proposed an online Bayesian inference algorithm incorporating content information into OCF [21].

OCF methods, which apply online learning algorithms, have several attractive advantages: (i) they avoid retraining a model from scratch for new training data because OCF methods update models sequentially [10]; (ii) their cost scales linearly with the number of observed ratings and the size of the latent features [19,38]; and (iii) they have low sensitivity to changes when models add new ratings [38] because they adopt online learning algorithms to update models for new training data.

The parameters of most current CF methods, both offline and online, are predetermined and invariant throughout the entire operation. These methods assume that user preferences and item popularity are static [23], which means that all items are equally rated by every user and all users are equally likely to rate every item. However, this assumption is not based in reality because user preferences continuously evolve [36].

CF methods must be able to track user interests and quickly provide recommendations [22]. Therefore, Koren proposed a CF method with temporal dynamics and built a factorization model of the changing characteristics of users and items [13]. Xiong et al. proposed a factor-based CF method that accounts for time [32]. Pálovics et al. exploited temporal influences between users to improve recommendation quality [25].

In this paper, we add dynamic factors, including the dynamic average rating, user dynamic rating habits, item dynamic attributes, and dynamic rating distribution, to OCF to improve its accuracy. The neighborhood factor is also incorporated in our methods to track changes in user preferences. User rating behavior, item popularity, and the distribution of a user's ratings are constantly in flux, especially for online learning. If a user's tastes change, then his/her neighborhood will also change.

Our contributions in this paper include the following:

(i) We compute the dynamic rating behavior, i.e., the rating average of each user, the bias of each user, and the bias of each item in each interaction.

(ii) We update the weight of the user feature vector and item feature vector in each round.

(iii) We add the neighborhood factor to our method to track the drift of user preferences.

The rest of this paper is organized as follows. Section 2 briefly reviews the background and related work. Section 3 presents the details of our proposed model. Section 4 describes the experimental setting and results on three public datasets to demonstrate the performance of our methods. Section 5 provides our conclusions and suggestions for future work.

2. Background and related work

In this section, we introduce the problem settings of CF. Then we review some related work on recommender systems, including matrix-factorization-based CF methods, which perform well to reduce the problem of high-dimensional data, and OCF methods, which we study in this paper.

2.1. Problem setting

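CF consists of three basic elements: user, item, and rating. The definitions of CF include the following:

Definition 1. User feature matrix $P \in \mathbb{R}^{k \times m}$; and user feature vectors $P_u$.

Definition 2. Item feature matrix $Q \in \mathbb{R}^{k \times n}$; and item feature vectors $Q_i$.

Definition 3. User-item rating matrix $R \in \mathbb{R}^{m \times n}$; and ratings $r_{u,i}$.

Here $m$ is the number of users, $n$ is the number of items, and $k$ is the number of latent features. $k$ is much smaller than $n$ and $m$. $P$ and $Q$ are used to predict the unknown rating:

$$\hat{r}_{u,i} = P_u^T Q_i.$$

The approximation error is minimized to obtain the optimal matrices $P$ and $Q$:

$$\arg\min_{P \in \mathbb{R}^{k \times m},\, Q \in \mathbb{R}^{k \times n}} \left\| R - P^T Q \right\|_F^2,$$

where $\|\cdot\|_F$ is the Frobenius norm of the matrix. The above equation can be formulated as

$$\arg\min_{P \in \mathbb{R}^{k \times m},\, Q \in \mathbb{R}^{k \times n}} \sum_{(u,i) \in C} l(P_u, Q_i, r_{u,i}),$$

where $C \subseteq \{(u,i) \mid r_{u,i} \text{ is known}\}$, and the function $l$ is used to measure the difference between the predicted and real values. The most common measurements are root mean squared error (RMSE) and mean absolute error (MAE), defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{|C|} \sum_{(u,i) \in C} \left( r_{u,i} - \hat{r}_{u,i} \right)^2}, \qquad \mathrm{MAE} = \frac{1}{|C|} \sum_{(u,i) \in C} \left| r_{u,i} - \hat{r}_{u,i} \right|,$$

where $C \subseteq \{(u,i) \mid r_{u,i} \text{ is known}\}$, $|C|$ is the number of known ratings in $C$, $\hat{r}_{u,i} = P_u^T Q_i$ is the predicted rating, and $r_{u,i}$ is the known rating. To optimize the RMSE, we define the loss function as

$$l_1(P_u, Q_i, r_{u,i}) = \left( r_{u,i} - P_u^T Q_i \right)^2.$$

Similarly, to optimize the MAE, we define the loss function as

$$l_2(P_u, Q_i, r_{u,i}) = \left| r_{u,i} - P_u^T Q_i \right|.$$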
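The prediction rule and the two error measures above can be illustrated with a minimal NumPy sketch. The function names and the toy matrices are hypothetical and chosen only for illustration; the column layout (P of shape k × m, Q of shape k × n) follows Definitions 1 and 2.

```python
import numpy as np

def predict(P, Q, u, i):
    """Predicted rating r_hat[u, i] = P_u^T Q_i (columns of P and Q are feature vectors)."""
    return float(P[:, u] @ Q[:, i])

def rmse_mae(P, Q, ratings):
    """RMSE and MAE over an iterable of known (u, i, r) triples, i.e. the set C."""
    errors = np.array([r - predict(P, Q, u, i) for u, i, r in ratings])
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    mae = float(np.mean(np.abs(errors)))
    return rmse, mae

# Toy data (hypothetical): k = 2 latent features, m = 2 users, n = 2 items.
P = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Q = np.array([[3.0, 1.0],
              [2.0, 4.0]])
ratings = [(0, 0, 4.0), (1, 1, 4.0), (0, 1, 3.0)]  # prediction errors are 1, 0, 2

rmse, mae = rmse_mae(P, Q, ratings)
```

With these toy values the squared errors sum to 5 over three ratings, so the RMSE is the square root of 5/3 and the MAE is 1.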

2.2. Matrix factorization-based CF methods

CF methods are generally divided into two basic categories of model- and memory-based CF [3]. Model-based CF methods perform better on problems of data sparsity, and are therefore more promising, and once trained, they make predictions more efficiently [19].

MF is one of the most successful model-based algorithms [13]. It performs well at reducing high-dimensional data. Its key idea is that a user's rating behavior on an item is determined by some latent features [24]. First, MF learns the user and item feature vectors based on user-item ratings to reduce the dimensionality, and then uses the feature vectors to predict unknown ratings. Matrix-factorization-based CF is defined as

$$\sum_{(u,i) \in C} l(P_u, Q_i, r_{u,i}) + \frac{\lambda}{2} \left( \sum_{u=1}^{m} \|P_u\|^2 + \sum_{i=1}^{n} \|Q_i\|^2 \right),$$

where $C \subseteq \{(u,i) \mid r_{u,i} \text{ is known}\}$, $l(P_u, Q_i, r_{u,i})$ is a loss function, and $\lambda$ is a regularization parameter.
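As a rough illustration of the regularized objective, the sketch below evaluates it with the squared loss on a toy problem. The function name `mf_objective` and the numbers are assumptions made for this example, not part of the original method description.

```python
import numpy as np

def mf_objective(P, Q, ratings, lam):
    """Squared-error loss over known ratings plus (lam / 2) * (sum ||P_u||^2 + sum ||Q_i||^2)."""
    loss = sum((r - P[:, u] @ Q[:, i]) ** 2 for u, i, r in ratings)
    penalty = 0.5 * lam * (np.sum(P ** 2) + np.sum(Q ** 2))
    return float(loss + penalty)

# Toy data (hypothetical): identity user features, two items, three known ratings.
P = np.eye(2)
Q = np.array([[3.0, 1.0],
              [2.0, 4.0]])
ratings = [(0, 0, 4.0), (1, 1, 4.0), (0, 1, 3.0)]

objective = mf_objective(P, Q, ratings, lam=0.1)  # loss 5.0 + penalty 0.05 * 32
```

Here the data-fit term is 5.0 and the penalty is 0.05 × (2 + 30) = 1.6, so increasing λ shifts the balance toward smaller feature vectors.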

2.3. OCF methods

Online learning algorithms can update the model and avoid the cost of retraining it when a new instance is added [10,28,35]. Many studies on OCF have emerged recently. Abernethy et al. proposed an OCF method for learning low-rank MF by optimizing the objective function directly [11]. However, this method optimized only the loss function and led to overfitting. An effective solution to this problem is to add regularization terms to constrain the objective function. Das et al. proposed a combination of MinHash clustering, probabilistic latent semantic indexing, and covisitation counts to recommend personalized news for online users [5]. Zhou et al. added bias and confidence weights to OCF to improve its stability and accuracy [38]. Ling et al. designed a dual-average method for probabilistic MF by adding previous rating information in an approximate average gradient of the loss [19].

OCF is applied in many other areas. For example, Chenghao et al. combined online adaptive passive-aggressive methods with nonnegative matrix factorization to improve the accuracy [20], and Yang et al. applied OCF to social networking [33].

3. Online collaborative filtering with dynamic regularization

We now introduce the algorithm for online collaborative filtering with dynamic regularization (OCF-DR). Its key function is to add dynamic regularization and a neighborhood factor to OCF to improve its stability and accuracy.

3.1. Online collaborative filtering with dynamic regularization

The parameters of most CF methods are predetermined and invariable, and user preferences and item popularities are assumed to be static [36]. Therefore, the methods assume that all items are equally rated by every user and all users are equally likely to rate every item. This assumption is not realistic [38].

First, the popularity of products constantly changes, especially when new choices emerge [18]. Seasonal factors are among the most important. For example, sweaters are easier to sell in winter than in summer. Second, a user's inclination continuously changes over time, which leads to the repositioning of their tastes. Emotions are another important factor [39]. For example, users who rated a movie at 3 stars may change their rating to 4 stars depending on their mood. So, dynamic factors are essential for OCF methods [18]. However, current work on OCF seldom considers the dynamic changes of users and items, and modeling a rating only as the interaction of the user feature vector and item feature vector is unreasonable.

Therefore, we propose OCF-DR, which adds dynamic regularization and a neighborhood factor to OCF. It has three main parts: (i) computing the dynamic change, i.e., the dynamic rating average of each user, the dynamic bias of each user, and the dynamic bias of each item in each interaction; (ii) updating the weights of the user feature vector and item feature vector in each round; and (iii) taking the neighborhood factor into account to track the change of user preferences.

Our method predicts ratings as follows:

$$\hat{r}_{u,i} = \mu + b_u + b_i + P_u^T Q_i,$$

where $\mu$ is the overall rating average; $b_u$ is the user bias, the observed deviation of user $u$, i.e., the rating behavior of the user; and $b_i$ is the item bias, the observed deviation of item $i$, i.e., the unique characteristics of the item.

User bias represents the intrinsic trend of the user. For example, some users tend to give higher ratings than others. Item bias represents the intrinsic properties of the item. For example, some items receive higher ratings than others. Both biases are independent of user interactions.

To optimize the RMSE, we define the loss function as

$$l_1(P_u, Q_i, b_u, b_i, r_{u,i}) = \left( r_{u,i} - \mu - b_u - b_i - P_u^T Q_i \right)^2. \tag{1}$$

Similarly, to optimize the MAE, we define the loss function as

$$l_2(P_u, Q_i, b_u, b_i, r_{u,i}) = \left| r_{u,i} - \mu - b_u - b_i - P_u^T Q_i \right|. \tag{2}$$

In our methods, we choose $l_1$ as the loss function, in line with the RMSE evaluation metric.
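The biased prediction and the two losses can be sketched as follows. The helper names and the numeric values are hypothetical and serve only as a worked example of Eqs. (1) and (2).

```python
import numpy as np

def predict_biased(mu, b_u, b_i, p_u, q_i):
    """r_hat = mu + b_u + b_i + P_u^T Q_i."""
    return mu + b_u + b_i + float(p_u @ q_i)

def squared_loss(r, r_hat):
    """l_1 from Eq. (1), the loss associated with RMSE."""
    return (r - r_hat) ** 2

def absolute_loss(r, r_hat):
    """l_2 from Eq. (2), the loss associated with MAE."""
    return abs(r - r_hat)

# Hypothetical values: global mean 3.5, a slightly generous user, a slightly unpopular item.
r_hat = predict_biased(mu=3.5, b_u=0.25, b_i=-0.5,
                       p_u=np.array([1.0, 0.5]), q_i=np.array([0.5, 1.0]))
l1 = squared_loss(5.0, r_hat)   # (5 - 4.25)^2
l2 = absolute_loss(5.0, r_hat)  # |5 - 4.25|
```

In this example the interaction term contributes 1.0, so the prediction is 4.25; against a true rating of 5 the squared loss is 0.5625 and the absolute loss is 0.75.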

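3.1.1. OCF-DR_I

The objective function of OCF-DR_I is

$$C(P_u, Q_i, b_u, b_i) = \sum_{(u,i) \in C} l(P_u, Q_i, b_u, b_i, r_{u,i}) + \frac{\lambda}{2} \left( \sum_{u=1}^{m} p(u) \|P_u\|^2 + \sum_{i=1}^{n} q(i) \|Q_i\|^2 + b_u^2 + b_i^2 \right), \tag{3}$$

where $p(u)$ is the probability of observing the row of user $u$, i.e., the number of items rated by the user; $q(i)$ is the probability of observing the column of item $i$, i.e., the number of users who have rated item $i$; and $\lambda$ is a regularization parameter.

The probability distribution of user rating behavior is not uniform because some items are more popular and more likely to receive ratings, and some users are more likely to provide ratings. Therefore, the penalization of the feature vectors of users or items with more ratings is increased.

Stochastic gradient descent (SGD) is an effective tool for optimization and online learning [10]. We adopt SGD to optimize formula (3) and obtain the update rules for $P_u$, $Q_i$, $b_u$, and $b_i$:

$$P_u = P_u - \eta \frac{\partial C}{\partial P_u} = \left( 1 - \lambda\eta \cdot p(u) \right) P_u + 2\eta\, e_{u,i} Q_i \tag{4}$$

$$Q_i = Q_i - \eta \frac{\partial C}{\partial Q_i} = \left( 1 - \lambda\eta \cdot q(i) \right) Q_i + 2\eta\, e_{u,i} P_u \tag{5}$$

$$b_u = b_u - \eta \frac{\partial C}{\partial b_u} = \left( 1 - \lambda\eta \right) b_u + 2\eta\, e_{u,i} \tag{6}$$

$$b_i = b_i - \eta \frac{\partial C}{\partial b_i} = \left( 1 - \lambda\eta \right) b_i + 2\eta\, e_{u,i}, \tag{7}$$

where $e_{u,i} = r_{u,i} - \hat{r}_{u,i}$. $\hat{r}_{u,i}$ is the predicted rating, and $r_{u,i}$ is the known rating. $\lambda$ is a regularization parameter. $\eta$ denotes the learning rate parameter, which controls the change in each step (Algorithm 1).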

Algorithm 1 Online collaborative filtering with dynamic regularization (OCF-DR_I).

Parameters: $n$, $m$, $k$, $\lambda$, $\eta$
Input: a sequence of rating pairs $(u, i, r_{u,i})$
Initialization: initialize a random matrix for user feature matrix $P \in \mathbb{R}^{k \times m}$ and item feature matrix $Q \in \mathbb{R}^{k \times n}$; initialize a random matrix for $b_u \in \mathbb{R}^{m}$ and $b_i \in \mathbb{R}^{n}$; initialize zero matrices for $T_u \in \mathbb{R}^{m}$, the number of items rated by user $u$, and $T_i \in \mathbb{R}^{n}$, the number of users who have rated item $i$
For $t = 1, 2, \ldots, T$ do
  1: Receive rating prediction request of user $u$ on item $i$
  2: Compute $\mu$ and make prediction $\hat{r}_{u,i} = \mu + b_u + b_i + P_u^T Q_i$
  3: The true rating $r_{u,i}$ is revealed
  4: The algorithm suffers a loss $l(P_u, Q_i, b_u, b_i, r_{u,i})$
  5: Update $T_u$ and $T_i$: $T_u = T_u + 1$; $T_i = T_i + 1$
  6: Update $p(u)$ and $q(i)$: $p(u) = T_u / t$, $q(i) = T_i / t$
  7: Update $P_u$, $Q_i$, $b_u$, and $b_i$ according to (4), (5), (6), and (7), respectively
End for
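The loop of Algorithm 1 could be sketched in NumPy roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the text does not say how μ is computed, so the code assumes a running mean of the ratings revealed so far (0 before the first rating); the initialization scale 0.1, the function name `ocf_dr1`, and the default λ are likewise hypothetical.

```python
import numpy as np

def ocf_dr1(stream, m, n, k=5, lam=0.1, eta=0.005, seed=0):
    """One online pass over (u, i, r) triples using update rules (4)-(7)."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((k, m))   # user features, one column per user
    Q = 0.1 * rng.standard_normal((k, n))   # item features, one column per item
    b_user = 0.1 * rng.standard_normal(m)
    b_item = 0.1 * rng.standard_normal(n)
    T_user = np.zeros(m)                    # number of items rated by each user
    T_item = np.zeros(n)                    # number of users who rated each item
    rating_sum, sq_err, t = 0.0, 0.0, 0
    for t, (u, i, r) in enumerate(stream, start=1):
        # Assumption: mu is the mean of ratings revealed before this step.
        mu = rating_sum / (t - 1) if t > 1 else 0.0
        r_hat = mu + b_user[u] + b_item[i] + P[:, u] @ Q[:, i]
        e = r - r_hat                       # error once the true rating is revealed
        sq_err += e ** 2
        rating_sum += r
        T_user[u] += 1
        T_item[i] += 1
        p_u, q_i = T_user[u] / t, T_item[i] / t
        P_u_old = P[:, u].copy()            # keep the old P_u so (5) uses pre-update values
        P[:, u] = (1 - lam * eta * p_u) * P[:, u] + 2 * eta * e * Q[:, i]   # Eq. (4)
        Q[:, i] = (1 - lam * eta * q_i) * Q[:, i] + 2 * eta * e * P_u_old   # Eq. (5)
        b_user[u] = (1 - lam * eta) * b_user[u] + 2 * eta * e               # Eq. (6)
        b_item[i] = (1 - lam * eta) * b_item[i] + 2 * eta * e               # Eq. (7)
    online_rmse = float((sq_err / t) ** 0.5) if t else float("nan")
    return P, Q, b_user, b_item, online_rmse

# Toy stream (hypothetical): two users, two items, the same four ratings repeated.
stream = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0)] * 50
P, Q, b_user, b_item, online_rmse = ocf_dr1(stream, m=2, n=2, k=2)
```

The returned `online_rmse` is the error accumulated before each update, which mirrors the predict-then-reveal protocol of steps 1 to 4.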

Fig. 1. Diagram of dynamic neighborhoods.

3.1.2. OCF-DR_II

Neighboring temporal information is an important factor in CF that improves the recommendation accuracy [14]. Lathia et al. incorporated the adaptive neighborhood into temporal CF [15], but they considered only the varying size of the neighborhood over time. Liu et al. added temporal information and developed incremental learning similarity to extend the neighborhood-based algorithm [22].

For every interaction, a user feature vector will change after a user rates an item. To prevent overfitting in this dynamic process, we incorporate a regularization term related to the difference between a user's feature vector and his/her neighbors' feature vectors for a period of time. By online updating, the farthest neighbors of the user may change for a period of time. A diagram of dynamic neighborhoods is shown in Fig. 1.

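So, we attempt to take the neighboring temporal information into OCF-DR_I. Then the objective function of OCF-DR_II becomes

$$C(P_u, Q_i, b_u, b_i) = \sum_{(u,i) \in C} l(P_u, Q_i, b_u, b_i, r_{u,i}) + \frac{\lambda}{2} \left( \sum_{u=1}^{m} p(u) \|P_u\|^2 + \sum_{i=1}^{n} q(i) \|Q_i\|^2 + b_u^2 + b_i^2 \right) + \frac{\alpha}{2} \sum_{u=1}^{m} \sum_{f \in F_u} \mathrm{sim}(u,f) \left\| P_u - P_f \right\|^2, \tag{8}$$

where $F_u$ is the set of neighbors of user $u$; $|F_u|$ is the number of neighbors of user $u$; $\alpha$ and $\lambda$ are regularization parameters; $p(u)$ denotes the probability of observing the row of user $u$; $q(i)$ is the probability of observing the column of item $i$; and $\mathrm{sim}(u,f)$ is the similarity between users $u$ and $f$, defined as

$$\mathrm{sim}(u,f) = \frac{|I_u \cap I_f|}{|I_u \cup I_f|}, \tag{9}$$

where $I_u$ and $I_f$ are the respective sets of items that users $u$ and $f$ have rated. We can similarly obtain the OCF-DR_II update rules: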

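$$P_u = P_u - \eta \frac{\partial C}{\partial P_u} = \left( 1 - \alpha\eta \sum_{f \in F_u} \mathrm{sim}(u,f) - \lambda\eta \cdot p(u) \right) P_u + \alpha\eta \sum_{f \in F_u} \mathrm{sim}(u,f)\, P_f + 2\eta\, e_{u,i} Q_i \tag{10}$$

$$Q_i = Q_i - \eta \frac{\partial C}{\partial Q_i} = \left( 1 - \lambda\eta \cdot q(i) \right) Q_i + 2\eta\, e_{u,i} P_u \tag{11}$$

$$b_u = b_u - \eta \frac{\partial C}{\partial b_u} = \left( 1 - \lambda\eta \right) b_u + 2\eta\, e_{u,i} \tag{12}$$

$$b_i = b_i - \eta \frac{\partial C}{\partial b_i} = \left( 1 - \lambda\eta \right) b_i + 2\eta\, e_{u,i}, \tag{13}$$

where $\hat{r}_{u,i}$ is the predicted rating, $r_{u,i}$ is the known rating, $e_{u,i} = r_{u,i} - \hat{r}_{u,i}$, $\alpha$ and $\lambda$ are regularization parameters, and $\eta$ is the learning rate parameter, which controls the change in each step.

To improve the algorithm's speed, we update the neighborhood of a user at regular intervals. We define the update cycle of the user neighborhood $C$ and the number of neighbors $H$ (Algorithm 2).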
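The user-vector update of Eq. (10) can be sketched as a small function. The name `update_user_vector`, its argument layout, and the toy numbers are hypothetical; the neighbor list and similarities are assumed to be supplied by whatever neighborhood bookkeeping is in use.

```python
import numpy as np

def update_user_vector(P, Q, u, i, e, neighbors, sims, p_u, lam, alpha, eta):
    """Return the new P_u from Eq. (10) for one observed rating of item i by user u."""
    neighbors = np.asarray(neighbors, dtype=int)
    sims = np.asarray(sims, dtype=float)
    # Similarity-weighted sum of the neighbors' feature vectors.
    pull = P[:, neighbors] @ sims if neighbors.size else np.zeros(P.shape[0])
    # Shrinkage from both the neighborhood term and the p(u)-weighted penalty.
    shrink = 1.0 - alpha * eta * sims.sum() - lam * eta * p_u
    return shrink * P[:, u] + alpha * eta * pull + 2.0 * eta * e * Q[:, i]

# Toy check (hypothetical numbers): user 0 with neighbors 1 and 2.
P = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0]])
Q = np.array([[1.0],
              [1.0]])
new_p0 = update_user_vector(P, Q, u=0, i=0, e=0.5, neighbors=[1, 2],
                            sims=[0.5, 0.5], p_u=0.5, lam=1.0, alpha=1.0, eta=0.1)
```

In this toy case the shrink factor is 1 − 0.1 − 0.05 = 0.85, the neighbor term adds (0.1, 0.15), and the error term adds (0.1, 0.1), giving (1.05, 0.25). With an empty neighbor list the function reduces to Eq. (4).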

Algorithm 2 OCF-DR_II.

Parameters: $n$, $m$, $k$, $\lambda$, $\alpha$, $\eta$, $C$, $H$
Input: a sequence of rating pairs $(u, i, r_{u,i})$
Initialization: initialize a random matrix for user feature matrix $P \in \mathbb{R}^{k \times m}$ and item feature matrix $Q \in \mathbb{R}^{k \times n}$; initialize a random matrix for $b_u \in \mathbb{R}^{m}$ and $b_i \in \mathbb{R}^{n}$; initialize zero matrices for $T_u \in \mathbb{R}^{m}$, the number of items rated by user $u$, and $T_i \in \mathbb{R}^{n}$, the number of users who have rated item $i$; initialize a random matrix for the neighborhood matrix $F \in \mathbb{R}^{m \times H}$
For $t = 1, 2, \ldots, T$ do
  1: Receive rating prediction request of user $u$ on item $i$
  2: Compute $\mu$ and make prediction $\hat{r}_{u,i} = \mu + b_u + b_i + P_u^T Q_i$
  3: The true rating $r_{u,i}$ is revealed
  4: The algorithm suffers a loss $l(P_u, Q_i, b_u, b_i, r_{u,i})$
  5: Update $T_u$ and $T_i$: $T_u = T_u + 1$; $T_i = T_i + 1$
  6: Update $p(u)$ and $q(i)$: $p(u) = T_u / t$, $q(i) = T_i / t$
  7: Update $P_u$, $Q_i$, $b_u$, and $b_i$ according to (10), (11), (12), and (13), respectively
  8: if $t = C$ then
  9:   Update $F$ according to (9)
  10: end if
End for
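Step 9 refreshes the neighborhood from the similarity in (9). A minimal sketch of one way to do this is given below, assuming that sim(u, f) is the Jaccard overlap of the rated-item sets and that the neighborhood consists of the H most similar users; the selection rule, the tie-breaking by user id, and the names `jaccard` and `top_neighbors` are assumptions, since the text does not spell them out.

```python
def jaccard(items_u, items_f):
    """Overlap of two rated-item sets: |I_u & I_f| / |I_u | I_f| (0 if both are empty)."""
    union = items_u | items_f
    return len(items_u & items_f) / len(union) if union else 0.0

def top_neighbors(u, rated, H):
    """Return up to H (user, similarity) pairs, most similar first, ties broken by user id."""
    scored = [(f, jaccard(rated[u], rated[f])) for f in rated if f != u]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:H]

# Toy data (hypothetical): item ids rated so far by four users.
rated = {0: {1, 2, 3}, 1: {2, 3, 4}, 2: {5}, 3: {1, 2, 3, 4}}
neighbors = top_neighbors(0, rated, H=2)
```

For user 0 the similarities are 0.5 to user 1, 0 to user 2, and 0.75 to user 3, so the two selected neighbors are users 3 and 1. Recomputing this list only every C steps, rather than after every rating, is what keeps the neighborhood term cheap.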

4. Experiments

We performed several experiments to compare the quality of our proposed methods to those of the baseline approaches, and analyzed the results in detail.

4.1. Datasets

We performed experiments on three public datasets: MovieLens100K [7], MovieLens1M [7], and HetRec2011 [7], which are available from the MovieLens website (https://grouplens.org/datasets/movielens/). The MovieLens100K dataset contains 100,000 ratings from 943 users on 1682 movies. The MovieLens1M dataset consists of about 1 million ratings from about 6000 users on about 4000 movies. The HetRec2011 dataset comprises 855,598 ratings from 2113 users on 10,109 movies.

We adopted the most popular metric, RMSE, to evaluate the prediction accuracy of our proposed methods. A smaller RMSE indicates better prediction accuracy.

4.2. Baseline approaches

To demonstrate the advantages and disadvantages of the algorithm, we conducted a comparative experiment with three baseline approaches.

Table 1
Performance on MovieLens100K.

Method \ k  5       10      15      20      25      30      35      40
OLR         1.1242  0.9911  1.0142  1.1133  1.236   1.3603  1.4833  1.5949
CWOCF_I     1.0405  1.0101  1.0184  1.0536  1.0931  1.1541  1.2340  1.3473
SOCF_II     1.0403  0.9863  1.0434  1.1485  1.2432  1.3286  1.3985  1.4861
OCF-DR_I    1.0384  1.0390  1.0401  1.0398  1.0391  1.0393  1.0398  1.0405
OCF-DR_II   1.0381  1.0388  1.0384  1.0385  1.0389  1.0391  1.0387  1.0393

Table 2
Performance on MovieLens1M.

Method \ k  5       10      15      20      25      30      35      40
OLR         1.1166  0.9760  0.9695  1.0236  1.0976  1.1768  1.2500  1.3204
CWOCF_I     0.9732  0.9642  0.9693  0.9851  1.0139  1.0521  1.0806  1.1200
SOCF_II     1.0054  0.9469  0.9699  1.0211  1.0874  1.1544  1.2173  1.2765
OCF-DR_I    0.9707  0.9701  0.9691  0.9697  0.9700  0.9705  0.9719  0.9726
OCF-DR_II   0.9688  0.9689  0.9685  0.9691  0.9683  0.9694  0.9698  0.9710

Table 3
Performance on HetRec2011.

Method \ k  5       10      15      20      25      30      35      40
OLR         0.9075  0.8869  0.896   0.9099  0.9654  1.0244  1.0862  1.1473
CWOCF_I     0.8754  0.8541  0.8906  0.8959  0.9076  0.9156  0.9222  0.9392
SOCF_II     0.9131  0.8630  0.8982  0.9147  0.9209  0.9367  0.9419  0.9577
OCF-DR_I    0.8921  0.8906  0.8912  0.8917  0.8920  0.8939  0.8941  0.8957
OCF-DR_II   0.8908  0.8900  0.8898  0.8893  0.8888  0.8884  0.8879  0.8870

OLR: online low-rank approximation, which learns a rank-k MF by using online gradient descent to directly optimize the loss function [11];

CWOCF_I: confidence-weighted OCF, which applies confidence-weighted online learning to address the OCF task [23];

SOCF_II: second-order sparse OCF, which adds an absolute term to the objective function [18].

4.3. Results and analysis

4.3.1. Overall performance

We evaluated the performance of all the methods on the three datasets. The learning rate $\eta$ of all the methods was set to 0.005 for a fair comparison. The number of latent features varied from 5 to 40. The parameters of these baseline approaches were set according to previous work or by searching from a single experiment on each dataset. Each experiment was run randomly ten times, and we obtained the average values of each method.

Tables 1–3 present the performance of the methods on each dataset. The bold elements in the tables represent the best performance among all methods. The tables show that our proposed methods performed best in most cases. We take Table 1 as an example. As the number of latent features k increases, the RMSE increases slowly in OCF-DR_I. Under the same value of k, the RMSE values in OCF-DR_II are slightly better than those of OCF-DR_I. Because the rating matrix is sparse, the neighbor weight method improves the prediction accuracy of OCF. Compared with those of the baselines, the RMSE values in our methods are stable with increasing k because once our model has been fitted, users with very few ratings will have feature vectors close to the mean, so the predicted ratings for those users will be close to the average movie ratings.

The RMSE values of all approaches on the three datasets with the number of latent features ranging from 5 to 40 are shown in Figs. 2–6. Our methods outperformed the other OCF methods, and converged faster than all the baseline approaches, especially when the algorithm started running.

First, compared with other baseline approaches, i.e., OLR, CWOCF_I, and SOCF_II, we found that our methods performed better in terms of RMSE. In most cases, our method resulted in smaller RMSE values, indicating that methods that consider dynamic regularization are effective for online learning CF.

Second, we observed that our methods were more stable than the baseline approaches in most cases. OLR was relatively sensitive because it loses some temporal information in online learning. Temporal dynamic factors are important for OCF to capture the users' dynamic habits and interests throughout the online learning process.

Third, our methods were able to obtain relatively stable RMSE values under different conditions with varying numbers of latent features. Therefore, our methods are robust to the number of latent features.

Finally, when we compared OCF-DR_I and OCF-DR_II, we found that OCF-DR_II achieved lower RMSE values in most cases, indicating that the neighborhood factor is essential for OCF tasks to improve the prediction accuracy.
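The stability claim can be checked with simple arithmetic on the reported numbers. The sketch below copies the MovieLens100K RMSE values from Table 1 and computes, for each method, the gap between its worst and best RMSE across k = 5, ..., 40; the variable names are chosen only for this illustration.

```python
# RMSE values for k = 5, 10, ..., 40, copied from Table 1 (MovieLens100K).
table1 = {
    "OLR":       [1.1242, 0.9911, 1.0142, 1.1133, 1.236, 1.3603, 1.4833, 1.5949],
    "CWOCF_I":   [1.0405, 1.0101, 1.0184, 1.0536, 1.0931, 1.1541, 1.2340, 1.3473],
    "SOCF_II":   [1.0403, 0.9863, 1.0434, 1.1485, 1.2432, 1.3286, 1.3985, 1.4861],
    "OCF-DR_I":  [1.0384, 1.0390, 1.0401, 1.0398, 1.0391, 1.0393, 1.0398, 1.0405],
    "OCF-DR_II": [1.0381, 1.0388, 1.0384, 1.0385, 1.0389, 1.0391, 1.0387, 1.0393],
}

# Spread = worst RMSE minus best RMSE over all tested k, rounded to four decimals.
spread = {name: round(max(values) - min(values), 4) for name, values in table1.items()}
most_stable = min(spread, key=spread.get)
```

On these values the spread is about 0.60 for OLR, 0.34 for CWOCF_I, and 0.50 for SOCF_II, against roughly 0.002 for OCF-DR_I and 0.001 for OCF-DR_II, which is the sense in which the proposed methods are robust to k.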

Fig. 2. Performance of all methods on MovieLens100K with k = 5.

Fig. 4. Performance of all methods on MovieLens1M with k = 5.

Fig. 6. Performance of all methods on HetRec2011 with k = 30.

Table 4
Results of OCF-DR_II with different α on MovieLens1M when k = 25.

α \ λ   0.01    0.1     1       10
0.01    0.9870  0.9818  0.9695  1.0142
0.1     0.9866  0.9822  0.9683  1.0155
1       0.9854  0.9807  0.9730  1.0354
10      0.9849  0.9806  0.9745  1.0522

4.3.2. Impact of the number of latent features

Figs. 7 and 8 present the impact of the number of latent features on all methods. Fig. 9 shows an enlarged drawing of our methods with different numbers of latent features k on MovieLens1M. We observe that our methods were stable under different numbers of latent features, while OLR, CWOCF_I, and SOCF_II were sensitive to changes in the number of latent features. On one hand, too small a value of k (less than 15) causes these approaches to underfit. However, they all perform well as the value of k increases. On the other hand, because a large value of k causes these approaches to overfit, when the value of k is large (larger than 15) and continues to increase, the performance of all the baseline approaches worsens. Our methods are stable, and k has little effect on our algorithms.

When the value of k is small (k < 15), the latent features are not sufficient to contain the user's interest information, and our algorithms perform no better than other methods. This is because CWOCF_I and SOCF_II are second-order methods which use the covariance information of the latent vector to compensate for the lack of user interest information. When the value of k is large (k > 15), the latent features contain sufficient user interest information. Covariance information of the latent vector is redundant, which results in no obvious increase in precision. However, our methods are stable because we track dynamic changes: the average of each user, the bias of each user, and the bias of each item in each round. Moreover, Fig. 9 shows that RMSE values of our methods fluctuate in a very small range as k increases. This suggests that a large number of latent features has little effect on our methods.

4.3.3. The impact of parameters

Fig. 10 and Table 4 present the impact of two parameters on our methods. First, we set k = 25 for OCF-DR_I to show its performance under different values of $\lambda$, which controls the impact of the regularization penalty for OCF-DR_I. We observe that the curve of OCF-DR_I declines more sharply and finally reaches a larger RMSE value with an increasing number of samples. Second, we set k = 25 for OCF-DR_II to show its performance under different values of $\alpha$ and $\lambda$. From Table 4, we observe that for a given parameter, the RMSE value decreases first and then increases as the other parameter increases because the algorithm overfits for extremely small $\lambda$ or $\alpha$ (too little regularization) and underfits for extremely large $\lambda$ or $\alpha$. Therefore, both parameters are essential to constrain the stability and convergence rate of the algorithms.

Fig. 7. Performance of all methods on MovieLens100K with different numbers of latent features k.

Fig. 9. Enlarged drawing of OCF-DR_I and OCF-DR_II on MovieLens1M with different numbers of latent features k.

Fig. 10. Convergence effects of OCF-DR_I with different λ on MovieLens1M when k = 25.
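Reading Table 4 as a grid search over the two regularization parameters, the best setting can be located programmatically. The sketch below copies the reported RMSE values (rows are α, columns are λ) and picks the minimum; the variable names are chosen only for this illustration.

```python
# Grid from Table 4 (OCF-DR_II on MovieLens1M, k = 25): rows are alpha, columns are lambda.
alphas = [0.01, 0.1, 1, 10]
lambdas = [0.01, 0.1, 1, 10]
rmse_grid = [
    [0.9870, 0.9818, 0.9695, 1.0142],
    [0.9866, 0.9822, 0.9683, 1.0155],
    [0.9854, 0.9807, 0.9730, 1.0354],
    [0.9849, 0.9806, 0.9745, 1.0522],
]

# Smallest RMSE together with the (alpha, lambda) pair that produced it.
best_rmse, best_alpha, best_lambda = min(
    (rmse, alpha, lam)
    for alpha, row in zip(alphas, rmse_grid)
    for lam, rmse in zip(lambdas, row)
)
```

On the reported values the minimum RMSE of 0.9683 occurs at α = 0.1 and λ = 1, and every row attains its best value at λ = 1 before rising sharply at λ = 10.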

5. Conclusions

In this paper, we considered the dynamic rating behavior and updated the weight of the user feature vector and item feature vector in each round to improve prediction accuracy. We then added the neighborhood factor to our method to track the drift of user preferences. Experimental results show that our proposed methods outperformed the baseline approaches. Compared to the baseline approaches, our methods converge rapidly and obtain high prediction accuracy, i.e., lower RMSE values. Therefore, considering rational dynamic factors is crucial to improving the prediction accuracy of OCF methods. In the future, we will focus on two directions. First, we will try to extract user and item features from user/item side information by deep learning approaches offline as input for OCF. Second, we will attempt to apply our methods to education recommender systems.

Declaration of Competing Interest

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

References

[1] Y. Cao, W. Li, D. Zheng, An improved neighborhood-aware unified probabilistic matrix factorization recommendation, Wirel. Pers. Commun. 102 (4) (2018) 3121–3140.

[2] Z.D. Champiri, S.R. Shahamiri, S.S.B. Salim, A systematic review of scholar context-aware recommender systems, Expert Syst. Appl. 42 (3) (2015) 1743–1758.

[3] R. Chen, Q. Hua, Y. Chang, B. Wang, L. Zhang, X. Kong, A survey of collaborative filtering-based recommender systems: from traditional methods to hybrid methods based on social networks, IEEE Access 6 (2018) 64301–64320.

[4] K. Christakopoulou, F. Radlinski, K. Hofmann, Towards conversational recommender systems, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, in: KDD '16, ACM, New York, NY, USA, 2016, pp. 815–824.

[5] A.S. Das, M. Datar, A. Garg, S. Rajaram, Google news personalization: scalable online collaborative filtering, in: Proceedings of the 16th International Conference on World Wide Web, in: WWW '07, ACM, New York, NY, USA, 2007, pp. 271–280.

[6] M. Dredze, K. Crammer, F. Pereira, Confidence-weighted linear classification, in: Proceedings of the 25th International Conference on Machine Learning, in: ICML '08, ACM, New York, NY, USA, 2008, pp. 264–271.

[7] F.M. Harper, J.A. Konstan, The movielens datasets: history and context, ACM Trans. Interact. Intell. Syst. 5 (4) (2015) 19:1–19:19.

[8] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, T.-S. Chua, Neural collaborative filtering, in: Proceedings of the 26th International Conference on World Wide Web, in: WWW '17, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 2017, pp. 173–182.

[9] A. Hernando, J. Bobadilla, F. Ortega, A nonnegative matrix factorization for collaborative filtering recommender systems based on a Bayesian probabilistic model, Knowl.-Based Syst. 97 (C) (2016) 188–202.

[10] S.C.H. Hoi, D. Sahoo, J. Lu, P. Zhao, Online learning: a comprehensive survey, CoRR, 2018 abs/1802.02871.

[11] J. Langford, J. Abernethy, K. Canini, A. Simma, Online Collaborative Filtering, Tech. Rep., University of California at Berkeley, 2007.

[12] M. Jugovac, D. Jannach, Interacting with recommenders - overview and research directions, ACM Trans. Interact. Intell. Syst. 7 (3) (2017) 10:1–10:46.

[13] Y. Koren, Collaborative filtering with temporal dynamics, Commun. ACM 53 (4) (2010) 89–97.

[14] Y. Koren, Factor in the neighbors: scalable and accurate collaborative filtering, ACM Trans. Knowl. Discov. Data 4 (1) (2010) 1:1–1:24.

[15] N. Lathia, S. Hailes, L. Capra, Temporal collaborative filtering with adaptive neighbourhoods, in: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, in: SIGIR '09, ACM, New York, NY, USA, 2009, pp. 796–797.

[16] Y. Li, Z. Li, F. Wang, L. Kuang, Accelerated online learning for collaborative filtering and recommender systems, in: 2014 IEEE International Conference on Data Mining Workshop, 2014, pp. 879–885.

[17] D. Liang, J. Altosaar, L. Charlin, D.M. Blei, Factorization meets the item embedding: regularizing matrix factorization with item co-occurrence, in: Proceedings of the 10th ACM Conference on Recommender Systems, in: RecSys '16, ACM, New York, NY, USA, 2016, pp. 59–66.

[18] F. Lin, X. Zhou, W.H. Zeng, F. Lin, X. Zhou, W. Zeng, Sparse online learning for collaborative filtering, Int. J. Comput. Commun. Control 11 (2) (2016) 248–258. ISSN

[19] G. Ling, H. Yang, I. King, M.R. Lyu, Online learning for collaborative filtering, in: The 2012 International Joint Conference on Neural Networks (IJCNN), 2012, pp. 1–8.

[20] C. Liu, S.C. Hoi, P. Zhao, J. Sun, E.-P. Lim, Online adaptive passive-aggressive methods for non-negative matrix factorization and its applications, in: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, in: CIKM '16, ACM, New York, NY, USA, 2016, pp. 1161–1170.

[21] C. Liu, T. Jin, S.C. Hoi, P. Zhao, J. Sun, Collaborative topic regression for online recommender systems: an online and Bayesian approach, Mach. Learn. 106 (5) (2017) 651–670.

[22] N.N. Liu, M. Zhao, E. Xiang, Q. Yang, Online evolutionary collaborative filtering, in: Proceedings of the Fourth ACM Conference on Recommender Systems, in: RecSys '10, ACM, New York, NY, USA, 2010, pp. 95–102.

[23] J. Lu, S.C.H. Hoi, J. Wang, P. Zhao, Second order online collaborative filtering, in: JMLR: Workshop and Conference Proceedings: ACML 2013, 14, 2013, pp. 325–340.

[24] X. Luo, M. Zhou, Y. Xia, Q. Zhu, An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems, IEEE Trans. Ind. Inform. 10 (2) (2014) 1273–1284.

[25] R. Pálovics, A.A. Benczúr, L. Kocsis, T. Kiss, E. Frigó, Exploiting temporal influence in online recommendation, in: Proceedings of the 8th ACM Conference on Recommender Systems, in: RecSys '14, ACM, New York, NY, USA, 2014, pp. 273–280.

[26] E. Palumbo, G. Rizzo, R. Troncy, Entity2rec: learning user-item relatedness from knowledge graphs for top-n item recommendation, in: Proceedings of the Eleventh ACM Conference on Recommender Systems, in: RecSys '17, ACM, New York, NY, USA, 2017, pp. 32–36.

[27] S. Sedhain, A.K. Menon, S. Sanner, L. Xie, Autorec: autoencoders meet collaborative filtering, in: Proceedings of the 24th International Conference on World Wide Web, in: WWW '15 Companion, ACM, New York, NY, USA, 2015, pp. 111–112.

[28] S. Shalev-Shwartz, Online learning and online convex optimization, Found. Trends Mach. Learn. 4 (2) (2012) 107–194.

[29] H. Wang, N. Wang, D.-Y. Yeung, Collaborative deep learning for recommender systems, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, in: KDD '15, ACM, New York, NY, USA, 2015, pp. 1235–1244.

[30] J. Wang, S.C. Hoi, P. Zhao, Z.-Y. Liu, Online multi-task collaborative filtering for on-the-fly recommender systems, in: Proceedings of the 7th ACM Conference on Recommender Systems, in: RecSys '13, ACM, New York, NY, USA, 2013, pp. 237–244.

[31] C.-Y. Wu, A. Ahmed, A. Beutel, A.J. Smola, H. Jing, Recurrent recommender networks, in: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, in: WSDM '17, ACM, New York, NY, USA, 2017, pp. 495–503.

[32] L. Xiong, X. Chen, T.-K. Huang, J.G. Schneider, J.G. Carbonell, Temporal collaborative filtering with Bayesian probabilistic tensor factorization, in: SDM, SIAM, 2010, pp. 211–222.

[33] X. Yang, H. Steck, Y. Liu, Circle-based recommendation in online social networks, in: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, in: KDD '12, ACM, New York, NY, USA, 2012, pp. 1267–1275.

[34] Z. Yang, B. Wu, K. Zheng, X. Wang, L. Lei, A survey of collaborative filtering-based recommender systems for mobile internet applications, IEEE Access 4 (2016) 3273–3287.

[35] P. Zhao, S.C.H. Hoi, R. Jin, DUOL: a double updating approach for online learning, in: Proceedings of the 22nd International Conference on Neural Information Processing Systems, in: NIPS '09, Curran Associates Inc., USA, 2009, pp. 2259–2267.

[36] X. Zhao, W. Zhang, J. Wang, Interactive collaborative filtering, in: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, in: CIKM '13, ACM, New York, NY, USA, 2013, pp. 1411–1420.

[37] Z. Zhao, H. Lu, D. Cai, X. He, Y. Zhuang, User preference learning for online social recommendation, IEEE Trans. Knowl. Data Eng. 28 (9) (2016) 2522–2534.

[38] X. Zhou, W. Shu, F. Lin, B. Wang, Confidence-weighted bias model for online collaborative filtering, Appl. Soft Comput. 70 (2018) 1042–1053.
