Sparse online collaborative filtering with dynamic regularization

(1)

Contents lists available at ScienceDirect

Information

Sciences

journal homepage: www.elsevier.com/locate/ins

Sparse

online

collaborative

ﬁltering

with

dynamic

regularization

Kangkang

Li

a

,

Xiuze

Zhou

b

,

Fan

Lin

a,∗

,

Wenhua

Zeng

a

,

Beizhan

Wang

a

,

Gil

Alterovitz

c

a_{Software School, Xiamen}_University,_{Xiamen City, Fujian,}_China b_{Department of}_{Automation, Xiamen}_University,_{Xiamen City, Fujian, China} c_{Boston Children’s Hospital,}_{Harvard Medical}_School,_Boston,_USA

a

r

t

i

c

l

e

i

n

f

o

Article history:

Received17November2017 Revised23July2019 Accepted29July2019 Availableonline29July2019

Keywords:

Collaborativeﬁltering Dynamicregularization Onlinecollaborativeﬁltering Neighborhoodfactor

a

b

s

t

r

a

c

t

Collaborativefiltering(CF)approachesarewidelyappliedinrecommendersystems. Tradi-tionalCFapproacheshavehighcoststotrainthemodelsandcannotcapturechangesin userinterestsanditempopularity.MostCFapproachesassumethatuserinterestsremain unchangedthroughout the wholeprocess. However, user preferencesarealways evolv-ingand thepopularityofitemsisalwayschanging.Additionally,inasparsematrix,the amountofknownratingdataisverysmall.Inthispaper,weproposeamethodofonline collaborativefilteringwithdynamicregularization(OCF-DR),thatconsidersdynamic infor-mationandusestheneighborhoodfactortotrackthedynamicchangeinonline collabo-rativefiltering(OCF).TheresultsfromexperimentsontheMovieLens100K,MovieLens1M, andHetRec2011 datasetsshow thattheproposed methodsaresignificantimprovements overseveralbaselineapproaches.

1. Introduction

Recommendersystemshelpconsumersfinditemsofinterestfroma widerangeofchoices[3].Theyprovideadvice to findproducts,alleviateinformationoverload[2],andenhanceoursatisfactionandloyaltytoincreasesalesfore-commerce firms[12].

Collaborativeﬁltering(CF)isoneofthemostsuccessfultechniquesappliedinrecommendersystems[3].CFapproaches predictuserpreferencesonlyontheirhistoricalratingdataanddonotrequiredomainknowledgeoradditionalinformation. ThekeyideaofCFisthatiftwousershavepreviouslylikedsimilaritems,theywillcontinuetodosointhefuture[34].

Manymodel-basedCF approacheshave beenproposed to improvethe quality ofrecommendersystems. Forexample, Zhouetal.addedratinginformationtothe latentDirichletallocation(LDA)model toobtainthe distributionofusers’ in-terests[39].Palumboetal.proposedentity2rectolearnuser-itemrelatednessbasedonknowledgegraphsfortop-n recom-mendersystems[26].Liangetal.usedmatrixfactorizationanditemembeddingtoobtainthecooccurrencepatternsofrare items[17].

Manydeeplearningmethodshavebeenapplied inrecommendersystems.Forexample,Wangetal.proposeda hierar-chicalBayesianmodelbyintegratingastackeddenoisingautoencoderintoprobabilisticmatrixfactorization[29].Heetal.

∗ _{Corresponding}_author.

E-mail address: [email protected](F.Lin).

https://doi.org/10.1016/j.ins.2019.07.093

(2)

proposedageneralframework,neuralnetwork-basedCF,whichleveragesamulti-layerperceptronmodeltolearnthe user-item interactionfunction [8].Wu etal.proposedrecurrentrecommendernetworkscombiningalong short-termmemory modelwithtraditionallow-rankfactorization[31].Sedhainetal.usedanautoencodertoembeditemsintolatentspace[27]. One ofthemostpopularCF approachesismatrix factorization(MF),includingprobabilistic matrixfactorization(PMF) [1]andnonnegativematrixfactorization(NMF)[9].Theidea istoobtainuser/item latentfeatures fromhigh-dimensional dataandusethemtopredictunknownratings.

TraditionalCFmethodsthatadoptbatchlearningalgorithms[10]havefourmaindrawbacks.First,retrainingamodelis expensive[4,16,18,19],andthemodelsrequireallthedatatobeavailablebeforeretraining.Second,theydonotwellhandle dynamicratingsdata, whichare sequential,andnewusersandproducts are constantlybeingadded [18]. Third,they fail tocapturethedriftofuserpreferences[4,13,37]becauseuserinterests arealways evolvingandthepopularityofitemsis continuouslychanging.Finally,theyareunsuitableandnon-scalableforreal-worldlarge-scaleonlineapplicationsinwhich ratingsusuallyarrivesequentiallyandperiodically[23].

Avarietyofonlinecollaborativeﬁltering(OCF)methodshavebeenproposedrecentlytoalleviatetheseissues.Abernethy etal.proposedtolearnalow-rankMFbyoptimizingtheobjectivefunctioninstochasticgradientdescent[11].Wangetal. exploitedaprinciplesimilartothatofonlinemultitasklearningforOCF[30].Luetal.appliedconﬁdence-weightedlearning [6]forOCFtasks[23].Chenghaoetal.proposed anonlineBayesianinferencealgorithmincorporatingcontentinformation intoOCF[21].

OCF methods, which apply online learning algorithms, have several attractive advantages: (i) they avoid retraining a modelfromscratchfornewtrainingdatabecauseOCFmethodsupdatemodelssequentially[10];(ii)theirscaleislinearly related tothe number ofobserved ratings andthe size ofthe latent features [19,38]; and (iii)they have low sensitivity to changes whenmodels add newratings [38] because they adoptonlinelearningalgorithms to update modelsfor new trainingdata.

The parameters ofmostcurrentCF methods,both oﬄineand online,arepredeterminedandinvariant throughoutthe entire operation.These methods assume that user preferences anditem popularityare static [23], which means that all itemsare equallyratedby everyuserandall usersare equallylikelyto rateeveryitem. However,thisassumption isnot basedinrealitybecauseuserpreferencescontinuouslyevolve[36].

CFmethodsmustbe abletotrackuserinterestsandquicklyproviderecommendations[22].Therefore,Koren proposed a CF method withtemporal dynamicsand builta factorizationmodel of thechanging characteristics ofusers and items [13].Xiongetal.proposedafactor-basedCFmethodthataccountsfortime[32].Plovicsetal.exploitedtemporalinﬂuences betweenuserstoimproverecommendationquality[25].

Inthispaper,we adddynamicfactors,includingthedynamicaveragerating,userdynamicratinghabits,item dynamic attributes, anddynamic ratingdistribution, toOCF to improveits accuracy. The neighborhood factoris alsoincorporated inourmethodstotrackchangesinuserpreferences.Userratingbehavior,itempopularity,andthedistributionofauser’s ratings areconstantly inﬂux,especially foronlinelearning. Ifauser’stastes change,thenhis/herneighborhood willalso change.

Ourcontributionsinthispaperincludethefollowing:

(i) Wecomputethedynamicratingbehavior, i.e.,theratingaverageofeachuser,thebiasofeach user,andthebiasof eachitemineachinteraction.

(ii) Weupdatetheweightoftheuserfeaturevectoranditemfeaturevectorineachround. (iii) Weaddtheneighborhoodfactortoourmethodtotrackthedriftofuserpreferences.

The rest of this paper is organized as follows. Section 2 brieﬂy reviews the background and related work. Section 3 presents thedetails ofour proposed model. Section 4describes the experimental settingand resultson three public datasets to demonstratethe performance of our methods.Section 5provides our conclusionsand suggestions for futurework.

2. Backgroundandrelatedwork

Inthissection,weintroducetheproblemsettingsofCF. Thenwereview somerelatedworkonrecommendersystems, includingmatrix-factorization-basedCFmethods,whichperformwelltoreducetheproblemofhigh-dimensionaldata,and OCFmethods,whichwestudyinthispaper.

2.1. Problemsetting

CFconsistsofthreebasicelements:user,item,andrating.ThedeﬁnitionsofCFincludethefollowing: Deﬁnition1. UserfeaturematrixP∈Rk×m_;_and_user_feature_vectors_P

u. Deﬁnition2. ItemfeaturematrixQ∈Rk×n_;_and_item_feature_vectors_Q

i. Deﬁnition3. User-itemratingmatrixR∈Rm×n_;_r

(3)

misthe numberofusers,n isthenumberofitems, andk isthenumberoflatentfeatures. kismuch smallerthann

andm.PandQareusedtopredicttheunknownrating: ˆ

ru,i=PuTQi.

TheapproximationerrorisminimizedtoobtaintheoptimalmatricesPandQ: argmin

P∈Rk×m_,_Q_∈_Rk×n

R

−PT_Q

2 F,

where

·

FistheFrobeniusnormofthematrix.Theaboveequationcanbeformulatedas argmin

P∈Rk×m_,_Q_∈_Rk×n

(u,i)∈C

l

(

Pu,Qi,ru,i

)

,

whereC_⊆{(u,i)|r_u,_iisknown},andthelfunction isusedtomeasure thedifferencebetweenthepredictedandrealvalues. Themostcommonmeasurementsarerootmeansquarederror(RMSE)andmeanabsoluteerror(MAE),deﬁnedas

RMSE=

1

|

C

|

(u,i)∈C

(

ru,i−rˆu,i

)

2 MAE= 1

|

C

|

(u,i)∈C

r

u,i−rˆu,i

,

whereC⊆{(u, i)|ru,iisknown},|C| is the number ofknown ratings in C,rˆu,i=PuTQi is the predictedrating, andru,i isthe knownrating.TooptimizetheRMSE,wedeﬁnethelossfunctionas

l1

(

Pu,Qi,ru,i

)

=

(

ru,i−PuTQi

)

2.

Similarly,tooptimizetheMAE,wedeﬁnethelossfunctionas l2

(

Pu,Qi,ru,i

)

=

r

u,i−PuTQi

.

2.2.Matrixfactorization-basedCFmethods

CFmethodsaregenerallydividedintotwobasiccategoriesofmodel-andmemory-basedCF[3].Model-basedCFmethods performbetter onproblems ofdatasparsity, andare thereforemorepromising,andonce trained,they make predictions moreeﬃciently[19].

MFisone ofthemostsuccessfulmodel-basedalgorithms[13].It performswell atreducing high-dimensionaldata.Its keyideaisthatauser’sratingbehavioronanitemisdeterminedbysomelatentfeatures[24].First,MFlearnstheuserand itemfeature vectorsbasedonuser-item ratingsto reducethe dimensionality,andthenuses thefeature vectorto predict unknownratings.Matrix-factorization-basedCFisdeﬁnedas

(u,i)∈C l

(

Pu,Qi,ru,i

)

+

λ

2

m u=1

Pu

2+ n i=1

Qi

2

,

whereC⊆{(u,i)|ru,iisknown},l(Pu,Qi,ru,i)isalossfunction,and

λ

isaregularizationparameter.

2.3.OCFmethods

Onlinelearning algorithms can update the model and avoid the cost ofretraining it when a new instance is added [10,28,35].Manystudies onOCF haveemerged recently. Abernethyet al.proposed an OCFmethod forlearninglow-rank MF byoptimizing theobjective function directly[11]. However, thismethod optimizedonly theloss function andled to overﬁtting.Aneffectivesolutiontothisproblemistoaddregularizationtermstoconstraintheobjectivefunction.Dasetal. proposedacombinationofMinHashclustering,probabilisticlatentsemanticindexing,andcovisitationcountstorecommend personalizednewsforonlineusers[5].Zhouetal.addedbiasandconﬁdenceweights toOCFtoimproveits stabilityand accuracy[38].Lingetal.designeda dual-averagemethodforprobabilisticMFbyaddingpreviousratinginformationinan approximateaveragegradientoftheloss[19].

OCFisappliedinmanyotherareas.Forexample,Chenghaoetal.combinedonlineadaptivepassive-aggressivemethods withnonnegativematrixfactorizationtoimprovetheaccuracy[20],andYangetal.appliedOCFtosocialnetworking[33]. 3. Onlinecollaborativeﬁlteringwithdynamicregularization

Wenowintroducethealgorithmforonlinecollaborativeﬁlteringwithdynamicregularization(OCF-DR).Itskeyfunction istoadddynamicregularizationandaneighborhoodfactortoOCFtoimproveitsstabilityandaccuracy.

(4)

3.1. Onlinecollaborativeﬁlteringwithdynamicregularization

The parametersofmostCF methodsare predeterminedandinvariable, anduserpreferencesanditempopularities are assumedtobe static[36].Therefore,themethods assumethat allitemsare equallyratedbyevery userandall usersare equallylikelytorateeveryitem.Thisassumptionisnotrealistic[38].

First,thepopularityofproductsconstantlychanges,especiallywhennewchoicesemerge[18].Seasonalfactorsareamong themostimportant.Forexample,sweatersareeasiertosellinwinterthaninsummer.Second,auser’sinclination contin-uouslychanges overtime,whichleadstotherepositioningoftheirtastes.Emotionsareanotherimportantfactor[39].For example,userswhoratedamovieat3starsmaychangetheirratingto4starsdependingontheirmood.So,dynamic fac-torsareessentialforOCFmethods[18].However,currentworkonOCFseldomconsidersthedynamicchangesofusersand items,andtoconsidertheratingofonlytheinteractionoftheuserfeaturevectoranditemfeaturevectorisunreasonable.

Therefore,weproposeOCF-DR,whichaddsdynamicregularizationandaneighborhoodfactortoOCF.Ithasthreemain parts:(i) computingthedynamicchange,i.e.,thedynamicratingaverageofeachuser,thedynamicbiasofeachuser,and the dynamicbias ofeach item ineach interaction; (ii) updating the weights ofthe userfeature vector anditem feature vectorineachround;and(iii)takingtheneighborhoodfactorintoaccounttotrackthechangeofuserpreferences.

Ourmethodpredictsratingsasfollows: ˆ

r_u,i=

μ

+bu+bi+PuTQi,

where

μ

istheoverallratingaverage;bu istheuserbias,theobserveddeviationofuseru,i.e., theratingbehavior ofthe user;andbiistheitembias,theobserveddeviationofitemi,i.e.,theuniquecharacteristicsoftheitem.

Userbiasrepresentstheintrinsictrendoftheuser.Forexample,someuserstendtogivehigherratingsthanothers.Item biasrepresentstheintrinsicpropertiesoftheitem.Forexample,someitemsreceivehigherratingsthanothers.Bothbiases areindependentofuserinteractions.

TooptimizetheRMSE,wedeﬁnethelossfunctionas

l1

(

Pu,Qi,bu,bi,ru,i

)

=

(

ru,i−

μ

−bu−bi−PuTQi

)

2. (1) Similarly,tooptimizetheMAE,wedeﬁnethelossfunctionas

l2

(

Pu,Qi,bu,bi,ru,i

)

=

r

u,i−

μ

−bu−bi−PuTQi

. (2) Inourmethods,wechoosel1 astheevaluationmetric.

3.1.1. OCF-DR_I

TheobjectivefunctionofOCF-DR_Iis C

(

Pu,Qi,bu,bi

)

= (u,i)∈C l

(

Pu,Qi,bu,bi,ru,i

)

+

λ

₂

m u=1 p

(

u

)

Pu

2+ n i=1 q

(

i

)

Qi

2+b2u+b2i

, (3)

wherep(u)istheprobability-of-observingrowofuseru,i.e.,thenumberofitemsratedbytheuser;q(i)isthe probability-of-observingcolumnofitemi,i.e.,thenumberofuserswhohaverateditemi;and

λ

Theprobabilitydistributionofuserratingbehaviorisnotuniformbecausesomeitemsaremorepopularandmorelikely toreceive ratings,andsomeusers aremorelikely toprovideratings.Therefore,thepenalization ofthefeature vectorsof usersoritemswithmoreratingsisincreased.

Stochasticgradientdescent(SGD)isaneffectivetoolforoptimizationandonlinelearning[10].WeadoptSGDtooptimize formula(3)andobtaintheupdaterulesPu,Qi,bu,andbi:

Pu=Pu−

η ∂

C

∂

Pu =

(

1−

λη

·p

(

u

))

Pu+2

η

eu,iQi (4) Qi=Qi−

η ∂

C

∂

Qi =

(

1−

λη

·q

(

i

))

Qi+2

η

eu,iPu (5) bu=bu−

η ∂

C

∂

bu =

(

1−

λη

)

bu+2

η

eu,i (6) bi=bi−

η ∂

C

∂

bi =

(

1−

λη

)

bi+2

η

eu,i, (7)

whereeu,i=ru,i−rˆu,i.rˆu,iisthepredictedrating,andru,iistheknownrating.

λ

η

denotesthe learningrateparameter,whichcontrolsthechangeineachstep(Algorithm1).

(5)

Algorithm1 Onlinecollaborativeﬁlteringwithdynamicregularization(OCF-DR_I). Parameters:n,m,k,

λ

,

η

Input:asequenceofratingpairs

(

u_,i_,ru,i

)

Initialization: initialize a random matrixfor userfeature matrix P∈Rk×m _and_item _feature _matrix _Q_∈_Rk×n_; _initialize _a randommatrixforbu∈R1×mandbi∈R1×n;initializezeromatricesforTu∈R1×m,thenumberofitemsratedbyuseru,and

T_i_∈R1×n_,_the_number_of_users_who_have_rated_item_i Fort=1, 2, . . . , T do

1: Receiveratingpredictionrequestofuseruonitemi 2: Compute

μ

andmakepredictionrˆu,i=

μ

+bu+bi+PuTQi

3: Thetrueratingr_u,iisrevealed

4: Thealgorithmsuffersalossl

(

Pu, Qi, bu, bi, ru,i

)

5: UpdateTu andTi:Tu=Tu+1;Ti=Ti+1

6: Update p

(

u

)

andq

(

i

)

: p

(

u

)

=Tu/t,q

(

i

)

=Ti/t

7: UpdatePu,Qi,bu,andbiaccordingto(4),(5),(6),and(7),respectively Endfor

Fig.1. Diagramofdynamicneighborhoods.

3.1.2. OCF-DR_II

NeighboringtemporalinformationisanimportantfactorinCFthatimprovestherecommendationaccuracy[14].Lathia et al. incorporated the adaptive neighborhood into temporal CF [15], but they considered only the varying size of the neighborhood over time. Liu et al.added temporal information anddeveloped incremental learningsimilarity to extend theneighborhood-basedalgorithm[22].

Foreveryinteraction,auserfeaturevectorwillchangeafterauserratesanitem.Topreventoverﬁttinginthisdynamic process,weincorporatearegularizationtermrelatedtothedifferencebetweenausersfeaturevectorsandhis/her neighbor-hoodsfeaturevectorsforaperiodoftime.Byonlineupdating,thefarthestneighborsoftheusermaychangeforaperiod oftime.AdiagramofdynamicneighborhoodsisshowninFig.1.

So,we attempttotake theneighboringtemporal informationintoOCF-DR_I. Thenthe objectivefunctionof OCF-DR_II becomes C

(

Pu,Qi,bu,bi

)

= (u,i)∈C l

(

Pu,Qi,bu,bi,ru,i

)

+

λ

₂

_m u=1 p

(

u

)

Pu

2+ n i=1 q

(

i

)

Qi

2+b2u+b2i

+

α

2

m u=if∈Fu sim

(

u,f

)

P

u−Pf

2

, (8)

whereFuisthesetofneighborsofuseru;|Fu|isthenumberofneighborsofuseru;

α

and

λ

areregularizationparameters;

p(u)denotestheprobability-of-observingrowofuseru;q(i) istheprobability-of-observingcolumnofitemi;andsim(u,f) isthesimilaritybetweenuseruandf,deﬁnedas

sim

(

u,f

)

=

I

u∩If

u∪If

(6)

whereIu andIfaretherespectivesetsthatusersuandfhaverated. WecansimilarlyobtaintheOCF-DR_IIupdaterules:

Pu=Pu−

η ∂

C

∂

Pu =

(

1−

αη

f∈Fu sim

(

u,f

)

−

λη

·p

(

u

))

Pu+

αη

f∈Fu sim

(

u,f

)

Pf+2

η

eu,iQi (10) Qi=Qi−

η ∂

C

∂

Qi =

(

1−

λη

·q

(

i

))

Qi+2

η

eu,iPu (11) bu=bu−

η ∂

C

∂

bu =

(

1−

λη

)

bu+2

η

eu,i (12) bi=bi−

η ∂

C

∂

bi =

(

1−

λη

)

bi+2

η

eu,i, (13)

where rˆu,i is the predicted rating, ru,i is the known rating, eu,i=ru,i−rˆu,i,

λ

is a regularization parameter, and

η

is the learningrateparameter,whichcontrolsthechangeineachstep.

Toimprovethealgorithm’s speed,we regularly updatethe neighborhoodofa user.We deﬁnethe updatecycleofthe userneighborhoodCandthenumberofneighborhoodsH(Algorithm2).

Algorithm2 OCF-DR_II. Parameters:n, m, k,

λ

,

α

,

η

, C, H

Input:asequenceofratingpairs

(

u,i,ru,i

)

Initialization: initialize a random matrix foruser feature matrixP_∈Rk×m _and _item _feature _matrix_Q_∈_Rk×n_; _initialize _a randommatrixforbu∈R1×mandbi∈R1×n;initializezeromatricesforTu∈R1×m,thenumberofitemsratedbyuseru,and

Ti∈R1×n,thenumberofuserswhohaverateditemi;initializearandommatrixfortheneighborhoodmatrixF∈Rm×H Fort=1,2,...,T do

1:Receiveratingpredictionrequestofuseruonitemi 2:Compute

μ

andmakepredictionrˆ_u,i₌

μ

₊bu+bi+PuTQi

3:Thetrueratingru,iisrevealed

4:Thealgorithmsuffersalossl

(

Pu,Qi,bu,bi,ru,i

)

5:UpdateTuandTi:Tu=Tu+1;Ti=Ti+1

6:Update p

(

u

)

andq

(

i

)

: p

(

u

)

=Tu/t, q

(

i

)

=Ti/t

7: UpdatePu,Qi,bu,andbiaccordingto(10),(11),(12),and(13),respectively

8:ift₌Cthen

9: UpdateF accordingto(9)

10:endif Endfor

4. Experiments

Weperformedseveralexperimentstocomparethequalityofourproposedmethodstothoseofthebaselineapproaches, andanalyzedtheresultsindetail.

4.1. Datasets

Weperformedexperimentsonthreepublic datasets:MovieLens100K[7],MovieLens1M[7],andHetRec2011[7],which are availablefromthe MovieLenswebsite.1 _The_{MovieLens100K} _dataset_contains₁₀_0,0₀₀_ratings_from₉₄₃_users _on₁₆₈₂

movies.The MovieLens1Mdatasetconsistsof1million ratingsfrom6000 userson4000movies.The HetRec2011dataset comprises855,598ratingsfrom2113userson10,109movies.

We adopted themostpopular metric,RMSE, to evaluate the predictionaccuracy ofour proposed methods. Asmaller RMSEindicatesbetterpredictionaccuracy.

4.2. Baselineapproaches

Todemonstratetheadvantagesanddisadvantagesofthealgorithm, weconducteda comparativeexperimentwithour threebaselineapproaches.

(7)

Table1 PerformanceonMovieLens100K. Method\k 5 10 15 20 25 30 35 40 OLR 1.1242 0.9911 1.0142 1.1133 1.236 1.3603 1.4833 1.5949 CWOCF_I 1.0405 1.0101 1.0184 1.0536 1.0931 1.1541 1.2340 1.3473 SOCF_II 1.0403 0.9863 1.0434 1.1485 1.2432 1.3286 1.3985 1.4861 OCF-DR_I 1.0384 1.0390 1.0401 1.0398 1.0391 1.0393 1.0398 1.0405 OCF-DR_II 1.0381 1.0388 1.0384 1.0385 1.0389 1.0391 1.0387 1.0393 Table2 PerformanceonMovieLens1M. Method\k 5 10 15 20 25 30 35 40 OLR 1.1166 0.9760 0.9695 1.0236 1.0976 1.1768 1.2500 1.3204 CWOCF_I 0.9732 0.9642 0.9693 0.9851 1.0139 1.0521 1.0806 1.1200 SOCF_II 1.0054 0.9469 0.9699 1.0211 1.0874 1.1544 1.2173 1.2765 OCF-DR_I 0.9707 0.9701 0.9691 0.9697 0.9700 0.9705 0.9719 0.9726 OCF-DR_II 0.9688 0.9689 0.9685 0.9691 0.9683 0.9694 0.9698 0.9710 Table3 PerformanceonHetRec2011. Method\k 5 10 15 20 25 30 35 40 OLR 0.9075 0.8869 0.896 0.9099 0.9654 1.0244 1.0862 1.1473 CWOCF_I 0.8754 0.8541 0.8906 0.8959 0.9076 0.9156 0.9222 0.9392 SOCF_II 0.9131 0.8630 0.8982 0.9147 0.9209 0.9367 0.9419 0.9577 OCF-DR_I 0.8921 0.8906 0.8912 0.8917 0.8920 0.8939 0.8941 0.8957 OCF-DR_II 0.8908 0.8900 0.8898 0.8893 0.8888 0.8884 0.8879 0.8870

• OLR:onlinelow-rankapproximation,whichlearnsarank-kMFbyusingonlinegradientdescenttodirectlyoptimizethe lossfunction[11];

• CWOCF_I:conﬁdence-weightedOCF,whichappliesconﬁdence-weightedonlinelearningtoaddresstheOCFtask[23];

• SOCF_II:second-ordersparseOCF,whichaddsanabsolutetermtotheobjectivefunction[18].

4.3.Resultsandanalysis 4.3.1. Overallperformance

Weevaluatedtheperformanceofallthemethodsonthethreedatasets.Thelearningrate

η

ofallthemethodswassetto 0.005forafaircomparison.Thenumberoflatentfeaturesvariedfrom5to40.Theparametersofthesebaselineapproaches weresetaccordingtoprevious workorbysearchingfromasingleexperimentoneachdataset.Eachexperimentwasrun randomlytentimes,andweobtainedtheaveragevaluesofeachmethod.

Tables 1–3 present the performance ofthe methods on each dataset.The bold elements in the tables represent the bestperformanceamongallmethods.Thetablesshow thatourproposedmethodsperformedbestinmostcases.Wetake Table1asanexample.Asthenumberoflatentfeatureskincreases,theRMSEincreasesslowlyinOCF-DR_I.Underthesame valueofk,theRMSEvaluesinOCF-DR_IIareslightlybetterthanthoseofOCF-DR_I.Becausetheratingmatrixissparse,the neighborweightmethodimprovesthepredictionaccuracyofOCF.Comparedwiththoseofthebaselines,theRMSEvalues inourmethodsarestablewithincreasingkbecauseonceourmodelhasbeenﬁtted,userswithveryfewratingswillhave featurevectorsclosetothemean,sothepredictedratingsforthoseuserswillbeclosetotheaveragemovieratings.

TheRMSEvaluesofall approachesonthethreedatasets withthenumberoflatentfeatures rangingfrom5to 40are shownin Figs. 2–6. Ourmethods outperformed the other OCF methods, andconverged faster than all the baseline ap-proaches,especiallywhenthealgorithmstartedrunning.

First,comparedwithotherbaselineapproaches,i.e.,OLR,CWOCF_I,andSOCF_II,wefoundthat ourmethodsperformed betterintermsofRMSE.Inmostcases,ourmethodresultedinsmallerRMSEvalues,indicatingthatmethodsthatconsider dynamicregularizationareeffectiveforonlinelearningCF.

Second,weobservedthatourmethodsweremorestablethanthebaselineapproachesinmostcases.OLRwasrelatively sensitivebecauseitlosessometemporalinformationinonlinelearning.TemporaldynamicfactorsareimportantforOCFto capturetheusersdynamichabitsandintereststhroughouttheonlinelearningprocess.

Third,ourmethodswere abletoobtainrelatively stableRMSEvaluesunderdifferentconditionswithvarying numbers oflatentfeatures.Therefore,ourmethodsarerobusttothenumberoflatentfeatures.

Finally,when we compared OCF-DR_I andOCF-DR_II, we found that OCF-DR_II achievedlower RMSEvalues in most cases,indicatingthattheneighborhoodfactorisessentialforOCFtaskstoimprovethepredictionaccuracy.

(8)

Fig.2. PerformanceofallmethodsonMovieLens100Kwithk =5.

(9)

Fig.4. PerformanceofallmethodsonMovieLens1Mwith k =5.

(10)

Fig.6. PerformanceofallmethodsonHetRec2011with k =30.

Table4

Results ofOCF-DR_II with different α on Movie-Lens1Mwhenk =25. α\λ 0.01 0.1 1 10 0.01 0.9870 0.9818 0.9695 1.0142 0.1 0.9866 0.9822 0.9683 1.0155 1 0.9854 0.9807 0.9730 1.0354 10 0.9849 0.9806 0.9745 1.0522

4.3.2. Impactofthenumberoflatentfeatures

Figs.7and8presenttheimpactofthenumberoflatentfeaturesonall methods.Fig.9showsanenlargeddrawing of ourmethodswithdifferentnumbersoflatentfeatureskonMovieLens1M.Weobservethatourmethodswerestableunder differentnumbers oflatentfeatures, whileOLR,CWOCF_I,andSOCF_IIwere sensitive tochanges inthenumberoflatent features.Ononehand,toosmallavalueofk(lessthan15)causestheseapproachestounderﬁt.However,theyallperform wellasthevalueofkincreases.Ontheotherhand,becausealargevalueofkcausestheseapproachestooverﬁt,whenthe valueofkislarge(largerthan15) andcontinuestoincrease, theperformance ofallthebaseline approachesworsens.Our methodsarestable,andkhaslittleeffectonouralgorithms.

When the value ofk is small(k<15), the latent features arenot sufficient to contain theuser’s interest information, andouralgorithmsperformnobetterthanothermethods.ThisisbecauseCWOCF_IandSOCF_IIaresecond-ordermethods which usethecovarianceinformation ofthelatent vector tocompensate forthelack ofuserinterest information.When thevalueofkislarge(k> 15),thelatentfeaturescontainsufficientuserinterestinformation.Covarianceinformationofthe latentvectorisredundant,whichresultsinnoobviousincrease inprecision.However, ourmethodsare stablebecausewe trackdynamicchanges:theaverageofeachuser,thebiasofeachuser,andthebiasofeachitemineachround.Moreover, Fig.9showsthatRMSEvaluesofourmethodsfluctuateinaverysmallrangeaskincreases.Thissuggeststhatalargelatent factorhaslittleeffectonourmethods.

4.3.3. Theimpactofparameters

Fig.10andTable4presenttheimpactoftwoparametersonourmethods.First,wesetk₌25forOCF-DR_Itoshowits performanceunderdifferentvaluesof

λ

,whichcontrolstheimpactoftheregularizationpenaltyforOCF-DR_I.Weobserve that the curve ofOCF-DR_I declines moresharplyandﬁnally reaches alarger RMSEvalue withan increasing numberof samples.Second,wesetk=25forOCF-DR_IItoshowitsperformanceunderdifferentvaluesof

α

and

λ

.FromTable4,we

(11)

Fig.7. PerformanceofallmethodsonMovieLens100Kwithdifferentnumbersoflatentfeaturesk .

(12)

Fig.9. EnlargeddrawingofOCF-DR_IandOCF-DR_IIonMovieLens1Mwithdifferentnumbersoflatentfeaturesk .

Fig.10. ConvergenceeffectsofOCF-DR_Iwithdifferent λonMovieLens1Mwhen k =25.

observethat fora givenparameter,theRMSEvaluedecreases ﬁrstandthenincreasesasanotherparameterincreases be-causethealgorithmisunderﬁtforextremelysmall

λ

or

α

andoverﬁtforextremelylarge

λ

or

α

.Therefore,bothparameters areessentialtoconstrainthestabilityandconvergencerateofthealgorithms.

5. Conclusions

Inthispaper,we consideredthe dynamicratingbehavior andupdatedthe weightoftheuserfeature vectoranditem featurevectorineachroundtoimprovepredictionaccuracy.Wethenaddedtheneighborhoodfactortoourmethodtotrack thedriftofuserpreferences.Experimentalresultsshowthatourproposedmethodsoutperformedthebaselineapproaches. Comparedtothebaselineapproaches, ourmethods convergerapidlyandobtainhighpredictionaccuracy,i.e.,lower RMSE

(13)

values.Therefore, to consider rational dynamicfactors is crucial to improve the predictionaccuracy of OCF methods. In thefuture,wewill focusontwo directions.First, wewill trytoextract useranditem featuresfromuser/item side infor-mationby deeplearningapproachesoﬄine asinputforOCF.Second, wewillattemptto applyourmethods toeducation recommendersystems.

DeclarationofCompetingInterest

Wewishtoconfirmthattherearenoknownconflictsofinterestassociatedwiththispublicationandtherehasbeenno significantfinancialsupportforthisworkthatcouldhaveinfluenceditsoutcome.

References

[1] Y.Cao,W.Li,D.Zheng,Animprovedneighborhood-awareuniﬁedprobabilisticmatrixfactorizationrecommendation,Wirel.Pers.Commun.102(4) (2018)3121–3140.

[2] Z.D.Champiri, S.R.Shahamiri,S.S.B.Salim,Asystematicreviewofscholarcontext-aware recommendersystems,ExpertSyst.Appl.42(3)(2015) 1743–1758.

[3] R.Chen,Q.Hua,Y.Chang,B.Wang,L.Zhang,X.Kong,Asurveyofcollaborativeﬁltering-basedrecommendersystems:fromtraditionalmethodsto hybridmethodsbasedonsocialnetworks,IEEEAccess6(2018)64301–64320.

[4] K.Christakopoulou,F.Radlinski,K.Hofmann,Towardsconversationalrecommendersystems,in:Proceedingsofthe22NdACMSIGKDDInternational ConferenceonKnowledgeDiscoveryandDataMining,in:KDD’16,ACM,NewYork,NY,USA,2016,pp.815–824.

[5] A.S.Das,M.Datar,A.Garg,S.Rajaram,Googlenewspersonalization:scalableonlinecollaborativeﬁltering,in:Proceedingsofthe16thInternational ConferenceonWorldWideWeb,in:WWW’07,ACM,NewYork,NY,USA,2007,pp.271–280.

[6] M.Dredze,K.Crammer,F.Pereira,Conﬁdence-weightedlinearclassiﬁcation,in:Proceedingsofthe25thInternationalConferenceonMachineLearning, in:ICML’08,ACM,NewYork,NY,USA,2008,pp.264–271.

[7] F.M.Harper,J.A.Konstan,Themovielensdatasets:historyandcontext,ACMTrans.Interact.Intell.Syst.5(4)(2015)19:1–19:19.

[8] X.He,L.Liao,H.Zhang,L.Nie,X.Hu,T.-S.Chua,Neuralcollaborativeﬁltering,in:Proceedingsofthe26thInternationalConferenceonWorldWide Web,in:WWW’17,InternationalWorldWideWebConferencesSteeringCommittee,RepublicandCantonofGeneva,Switzerland,2017,pp.173–182.

[9] A.Hernando,J.Bobadilla,F.Ortega,AnonnegativematrixfactorizationforcollaborativeﬁlteringrecommendersystemsbasedonaBayesian proba-bilisticmodel,Knowl.-BasedSyst.97(C)(2016)188–202.

[10] S.C.H.Hoi,D.Sahoo,J.Lu,P.Zhao,Onlinelearning:acomprehensivesurvey,CoRR,2018abs/1802.02871.

[11]J.Langford,J.Abernethy,K.Canini,A.Simma,OnlineCollaborativeFiltering,Tech.Rep.,UniversityofCaliforniaatBerkeley,2007.

[12] M.Jugovac,D.Jannach,Interactingwithrecommendersoverviewandresearchdirections,ACMTrans.Interact.Intell.Syst.7(3)(2017)10:1–10:46.

[13] Y.Koren,Collaborativeﬁlteringwithtemporaldynamics,Commun.ACM53(4)(2010)89–97.

[14] Y.Koren,Factorintheneighbors:scalableandaccuratecollaborativeﬁltering,ACMTrans.Knowl.Discov.Data4(1)(2010)1:1–1:24.

[15] N.Lathia,S.Hailes,L.Capra,Temporalcollaborativeﬁlteringwithadaptiveneighbourhoods,in:Proceedingsofthe32ndInternationalACMSIGIR ConferenceonResearchandDevelopmentinInformationRetrieval,in:SIGIR’09,ACM,NewYork,NY,USA,2009,pp.796–797.

[16] Y.Li,Z.Li,F.Wang,L.Kuang,Acceleratedonlinelearningforcollaborativeﬁlteringandrecommendersystems,in:2014IEEEInternationalConference onDataMiningWorkshop,2014,pp.879–885.

[17]D.Liang,J.Altosaar,L.Charlin,D.M.Blei,Factorizationmeetsthe itemembedding:regularizingmatrix factorizationwithitemco-occurrence,in: Proceedingsofthe10thACMConferenceonRecommenderSystems,in:RecSys’16,ACM,NewYork,NY,USA,2016,pp.59–66.

[18] F.Lin,X.Zhou,W.H.Zeng,F.Lin,X.Zhou,W.Zeng,Sparseonlinelearningforcollaborativeﬁltering,Int.J.Comput.Commun.Control11(2)(2016) 248–258.ISSN

[19] G.Ling,H.Yang,I.King,M.R.Lyu,Onlinelearningforcollaborativeﬁltering,in:The2012InternationalJointConferenceonNeuralNetworks(IJCNN), 2012,pp.1–8.

[20]C.Liu,S.C.Hoi,P.Zhao,J.Sun,E.-P.Lim,Onlineadaptivepassive-aggressivemethodsfornon-negativematrixfactorizationanditsapplications,in: Proceedingsofthe25thACMInternationalonConferenceonInformationandKnowledgeManagement,in:CIKM’16,ACM,NewYork,NY,USA,2016, pp.1161–1170.

[21]C.Liu,T.Jin,S.C.Hoi,P.Zhao,J.Sun,Collaborativetopicregressionforonlinerecommendersystems:anonlineandBayesianapproach,Mach.Learn. 106(5)(2017)651–670.

[22]N.N.Liu,M.Zhao,E.Xiang,Q.Yang, Onlineevolutionarycollaborativeﬁltering,in:Proceedingsofthe FourthACMConferenceonRecommender Systems,in:RecSys’10,ACM,NewYork,NY,USA,2010,pp.95–102.

[23]J.Lu,S.C.H.Hoi,J.Wang,P.Zhao,Secondorderonlinecollaborativeﬁltering,in:JMLR:WorkshopandConferenceProceedings:ACML2013,14,2013, pp.325–340.

[24]X.Luo,M.Zhou,Y.Xia,Q.Zhu,Aneﬃcientnon-negativematrix-factorization-basedapproachtocollaborativeﬁlteringforrecommendersystems,IEEE Trans.Ind.Inform.10(2)(2014)1273–1284.

[25]R.Pálovics,A.A.Benczúr,L.Kocsis,T.Kiss,E.Frigó,Exploitingtemporalinﬂuenceinonlinerecommendation,in:Proceedingsofthe8thACM Confer-enceonRecommenderSystems,in:RecSys’14,ACM,NewYork,NY,USA,2014,pp.273–280.

[26]E.Palumbo,G.Rizzo,R.Troncy,Entity2rec:learninguser-itemrelatednessfromknowledgegraphsfortop-nitemrecommendation,in:Proceedingsof theEleventhACMConferenceonRecommenderSystems,in:RecSys’17,ACM,NewYork,NY,USA,2017,pp.32–36.

[27]S.Sedhain,A.K.Menon,S.Sanner,L.Xie,Autorec:autoencodersmeetcollaborativeﬁltering,in:Proceedingsofthe24thInternationalConferenceon WorldWideWeb,in:WWW’15Companion,ACM,NewYork,NY,USA,2015,pp.111–112.

[28]S.Shalev-Shwartz,Onlinelearningandonlineconvexoptimization,Found.TrendsMach.Learn.4(2)(2012)107–194.

[29]H.Wang,N.Wang,D.-Y.Yeung,Collaborativedeeplearningforrecommendersystems,in:Proceedingsofthe21thACMSIGKDDInternational Confer-enceonKnowledgeDiscoveryandDataMining,in:KDD’15,ACM,NewYork,NY,USA,2015,pp.1235–1244.

[30]J.Wang,S.C.Hoi,P.Zhao,Z.-Y.Liu,Onlinemulti-taskcollaborativeﬁlteringforon-the-ﬂyrecommendersystems, in:Proceedingsofthe7thACM ConferenceonRecommenderSystems,in:RecSys’13,ACM,NewYork,NY,USA,2013,pp.237–244.

[31] C.-Y.Wu,A.Ahmed,A.Beutel,A.J.Smola,H.Jing,Recurrentrecommendernetworks,in:ProceedingsoftheTenthACMInternationalConferenceon WebSearchandDataMining,in:WSDM’17,ACM,NewYork,NY,USA,2017,pp.495–503.

[32]L.Xiong,X.Chen,T.-K.Huang,J.G.Schneider,J.G.Carbonell,TemporalcollaborativeﬁlteringwithBayesianprobabilistictensorfactorization.,in:SDM, SIAM,2010,pp.211–222.

[33]X.Yang,H.Steck,Y.Liu,Circle-basedrecommendationinonlinesocialnetworks,in:Proceedingsofthe18thACMSIGKDDInternationalConference onKnowledgeDiscoveryandDataMining,in:KDD’12,ACM,NewYork,NY,USA,2012,pp.1267–1275.

[34]Z.Yang,B.Wu,K.Zheng,X.Wang,L.Lei,Asurveyofcollaborativeﬁltering-basedrecommendersystemsformobileinternetapplications,IEEEAccess 4(2016)3273–3287.

[35]P.Zhao,S.C.H.Hoi,R.Jin,DUOL:adoubleupdatingapproachforonlinelearning,in:Proceedingsofthe22ndInternationalConferenceonNeural InformationProcessingSystems,in:NIPS’09,CurranAssociatesInc.,USA,2009,pp.2259–2267.

(14)

[36]X.Zhao,W.Zhang,J.Wang,Interactivecollaborativeﬁltering,in:Proceedingsofthe22ndACMInternationalConferenceonInformation&Knowledge Management,in:CIKM’13,ACM,NewYork,NY,USA,2013,pp.1411–1420.

[37]Z.Zhao,H.Lu,D.Cai,X.He,Y. Zhuang, Userpreferencelearningfor onlinesocialrecommendation,IEEETrans.Knowl. DataEng.28(9)(2016) 2522–2534.

[38]X.Zhou,W.Shu,F.Lin,B.Wang,Conﬁdence-weightedbiasmodelforonlinecollaborativeﬁltering,Appl.SoftComput.70(2018)1042–1053.

535–548

ScienceDirect

www.elsevier.com/locate/ins

https://grouplens.org/datasets/movielens/

3121–3140

1743–1758

64301–64320

815–824

271–280

264–271

19:1–19:19

173–182

188–202

abs/1802.02871

2007

10:1–10:46

89–97

4

796–797

879–885

59–66

248–258

2012,

1161–1170

651–670

95–102

325–340

1273–1284

273–280

32–36

111–112

107–194

1235–1244

237–244

495–503

211–222

1267–1275

3273–3287

2259–2267

1411–1420

2522–2534

1042–1053

135–143