Based on Navigation Path Features
Wolfgang Gaul, Lars Schmidt-Thieme
{Wolfgang.Gaul,Lars.Schmidt-Thieme}@wiwi.uni-karlsruhe.de
Institut für Entscheidungstheorie und Unternehmensforschung
University of Karlsruhe, Germany
Abstract:
Several kinds of features of user navigation paths (e.g., subsets of the resources as nodes of the paths covered, subsequences of the sequences used for path description, path fragments constructed via combination of subsequences and wildcards) can be employed to build recommender systems designed for tasks as different as site personalization, cross-/up-selling, and navigation assistance. A vocabulary to describe different kinds of recommender systems and generic quality measures for system evaluation are formulated. The construction of specific recommender systems, especially systems based on frequent path features, is explained. In addition to the attempt to provide a formal framework for navigation path based recommender systems, results on the performance of different types of such systems are reported.
1 Introduction
A recommender system is software that collects and aggregates information about site visitors (e.g., buying histories, products of interest, hints concerning desired/desirable search dimensions or other FAQ) and their actual navigational and buying behavior and returns recommendations (e.g., based on customer demographics and/or past behavior of the actual visitor and/or user patterns of top sellers with fields of interest similar to those of the actual contact). These recommendations have to be created in such a way that they are valuable for browsers/customers/visitors as well as for site owners. Nowadays, recommender systems are installed in more and more commercial sites to assist consumers in accessing useful information better and faster, but also to assist site owners in converting browsers to buyers, in stimulating cross- and up-sales, and in establishing customer loyalty as part of the activities to improve electronic customer care. Recommender systems have been studied extensively since Resnick et al. (1994), who used the label collaborative filtering. An overview of applications of recommender systems in e-commerce can be found in Schafer et al. (1999, 2000) and an analysis of some recommendation algorithms in Breese et al. (1998) and Sarwar et al. (2000).
Besides such known approaches to designing recommender systems (based on buying behavior), a new type of recommender system has emerged that aims at helping surfers in navigating the web. At each step in the navigation process, recommendations concerning pages to visit next are given, based on the navigation history known so far. Recommender systems based on navigation paths thereby add to the static hyperdocument linking by expanding it to dynamically linked hyperdocuments.
Recommender systems based on navigation paths are useful in e-commerce contexts as they try to make buying more pleasant for potential customers. As major parts of an e-commerce site can consist of product catalogs and the presentation of individual products, linking between such information can also be performed by traditional recommender systems. The real strength of recommender systems based on navigation paths becomes clearer in more or less unstructured collections of information, as found in, e.g., news groups, message boards, web directories, search [...] information within those collections can be gathered and joined.
First roots of recommender systems for paths can be found in an adaptive hypertext system of Stotts and Furuta (1991) that requires a special document reader which can be advised to modify (attributes of) links already coded in the documents with respect to usage behavior. The idea of developing recommender systems based on standard HTTP servers and the information in their log files dates back to Yan et al. (1996), where a very simple clustering algorithm is used and the construction of recommender systems is described as dynamic hypertext linking. Perkowitz and Etzioni (1998) build recommender systems based on co-occurrence frequencies between resources and connected components of the usage graph and call them adaptive web sites. Mobasher (2001) has investigated three different approaches to compute recommendations by using sessions described as sets of pages visited together (including visit times). His first approach is based on association rules for sets, a second alternative needs session clusters (computed via the k-means algorithm), and the third approach uses resource clusters (computed by means of ARHP (association rule hypergraph partitioning)). From the point of view of web server design, the problem of computing recommendations resembles the prefetching problem (see, e.g., Bestavros (1996)), i.e., the prediction of the pages requested next by the active users to speed up server operation and thereby lower waiting times; the prefetching problem is easier in the respect that pages close to the recommendation point are perfect choices for predicted requests (as they are expected to rest a shorter time in the cache). Bodner and Chignell (1999) tackle the problem from the point of view of text retrieval: they exploit the reference texts of visited links and keep track of a list of relevant keywords that is fed into a search engine; the results of the search are linked from the keywords found in the active document. Joachims et al. (1995) and Lieberman (1995) present WebWatcher and Letizia, two agents for web browsing that are capable of giving recommendations depending on the users' searching behavior so far; WebWatcher uses similarities in [...] Letizia gathers additional data about user behavior (such as bookmarking and usage of documents on different servers, not available on the server side). Fu et al. (2000) propose another agent that collects usage information of different users in a central repository and computes recommendations from sets of pages visited frequently together by means of association rules.
2 Prerequisites for recommender system evaluation
Let $R$ be an arbitrary set (the set of resources of a web site that correspond to the nodes of the link graph of this site). The set $R^* := \bigcup_{n \in \mathbb{N}} R^n$ of all tuples of $R$ is called the set of sequences of $R$ and serves as basis to model user paths.
A sequence $p = (p_1, \ldots, p_{|p|}) \in R^*$ describes a path as a sequence of resources of $R$; its length is denoted as $|p|$. Sometimes, we replace the tuple notation by just putting the corresponding resources one after the other, i.e., $p_1 p_2 \cdots p_{|p|}$.
From a mathematical point of view, a recommender system based on navigation paths is a map
$$ r: R^* \to \mathcal{P}(R) \tag{1} $$
(where $\mathcal{P}(R)$ denotes the power set of $R$) and the set $r(p)$ is called the recommendation set for $p \in R^*$.
Starting point for an evaluation of recommender systems is a (multi)set of paths $S$. Each path $p \in S$ can be split at position $i \in \{1, \ldots, |p| - 1\}$ into a history $h_i(p) := (p_1, \ldots, p_i)$ and a future $f_i(p) := (p_{i+1}, \ldots, p_{|p|})$. $p_i$ is called the recommendation point.
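
The split into history and future can be sketched in a few lines of Python. This is only an illustration: the function name `splits` and the representation of a path as a sequence of single-character resource names are assumptions of the sketch, not part of the formal framework.

```python
from typing import Iterator, Sequence, Tuple

Path = Tuple[str, ...]


def splits(path: Sequence[str]) -> Iterator[Tuple[Path, Path]]:
    """Yield (history, future) for every split position i = 1, ..., |p| - 1.

    The history h_i(p) is the prefix (p_1, ..., p_i); its last element p_i
    is the recommendation point. The future f_i(p) is the remaining suffix.
    """
    for i in range(1, len(path)):
        yield tuple(path[:i]), tuple(path[i:])


if __name__ == "__main__":
    # Hypothetical path over resources named by single letters.
    for history, future in splits("ACDFG"):
        print("".join(history), "|", "".join(future))
```

A path of length one has no valid split position, so the generator yields nothing for it.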
Now, a general definition for a recommendation quality measure can be given by
$$ q: R_1 \times R_2 \times R_3 \to \mathbb{R}_0^+, \quad (h, f, r) \mapsto q(h, f, r) \tag{2} $$
where $R_1$ describes the history space, $R_2$ the future space, and $R_3$ the space of (sets of) recommendations $r(h)$ derived from $h \in R_1$. Various choices of $R_1$, $R_2$, and $R_3$ are possible. We will restrict ourselves to $R_1 = R_2 = R^*$ and $R_3 = \mathcal{P}(R)$ in the following.
$q(h, f, r)$ measures the quality of recommendations (e.g., by choosing $h = h_i(p)$ and comparing $r = r(h_i(p))$ with $f = f_i(p)$ for a path $p$).
Simple examples of recommendation quality measures are
$$ q(h, f, r) := |\{ y \in r \mid y \text{ occurs in } f \}| \tag{3} $$
which is just the number of recommended resources that also occur in $f$, or
$$ q(h, f, r) := \sum_{y \in r(h)} q(h_{|h|}, f, y) \tag{4} $$
where $q: R \times R^* \times R \to \mathbb{R}_0^+$ describes a measure that depends only on the recommendation point $h_{|h|}$ and evaluates the degree of conformity between $f$ and a single recommendation $y \in r(h)$, i.e., the quality measure does not take into consideration any compound effects as, e.g., preference of resources concentrated in a particular region of the site over those scattered all over the whole site.
Recommendation quality can take into consideration the distance between history resources and recommended resources (measured with the help of the underlying site graph structure or, alternatively, defined as the minimal number of resources between recommendation point and recommended resource in the actual future of a path), e.g., for $x, y \in R$ and $f \in R^*$ (e.g., with $x = p_i$, $y \in r(h_i(p))$, and $f = f_i(p)$ for a path $p \in R^*$) one can define
$$ q(x, f, y) := \begin{cases} u(\mathrm{dist}(x, y)), & \text{if } y \neq x \text{ occurs in } f \\ 0, & \text{otherwise} \end{cases} \tag{5} $$
where dist denotes an appropriate distance function and $u$ measures the utility assigned to the distance between pairs of resources. The meaning is that resources in the direct neighborhood of a recommendation point are easier to find (and, thus, to recommend) than adequate resources far away. Examples for utility functions are
$$ u: \mathbb{R}_0^+ \to \mathbb{R}_0^+, \quad d \mapsto \begin{cases} 1 & \text{hit count} \\ d & \text{linear scale} \\ \log d + 1 & \text{log. scale} \\ (d - d_0 + 1)\,\delta_{[d_0, d_1]}(d) & \text{window effect} \end{cases} \tag{6} $$
with
$$ \delta_{[d_0, d_1]}(d) := \begin{cases} 1, & d \in [d_0, d_1] \\ 0, & \text{otherwise.} \end{cases} $$
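
As a minimal sketch, the utility functions of (6) and the single-recommendation measure (5) might be written in Python as follows. The function names are illustrative. The sketch assumes that dist(x, y) is the 1-based position of the first occurrence of y in the future (the empirical distance) and that the logarithm is the natural one.

```python
import math
from typing import Callable, Sequence

Utility = Callable[[float], float]


def hit_count(d: float) -> float:
    """Every hit counts 1, regardless of distance."""
    return 1.0


def linear_scale(d: float) -> float:
    return float(d)


def log_scale(d: float) -> float:
    # Assumption: natural logarithm.
    return math.log(d) + 1.0


def window(d0: float, d1: float) -> Utility:
    """Window effect: (d - d0 + 1) inside [d0, d1], zero outside."""
    def u(d: float) -> float:
        inside = d >= d0 and d1 >= d
        return (d - d0 + 1.0) if inside else 0.0
    return u


def quality(x: str, future: Sequence[str], y: str, u: Utility) -> float:
    """Measure (5); dist(x, y) is taken as the 1-based position of the
    first occurrence of y in the future (an assumption of this sketch)."""
    if y == x or y not in future:
        return 0.0
    return u(list(future).index(y) + 1)
```

With a site graph at hand, `quality` could instead receive a graph-based distance; only the argument passed to `u` would change.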
Up to now, recommendation quality measures as depicted in (3), (4) are restricted to a single navigation path but, of course, for a given recommender system $r$, a recommendation quality measure $q$, and an underlying (multi)set $S$ of navigation paths, one can define, e.g.,
$$ Q^{raw}_r(S) := \sum_{p \in S} \sum_{i=1}^{|p|-1} q(h_i(p), f_i(p), r(h_i(p))) $$
as raw recommendation score for $r$ relative to $S$. Let
$$ Q^{raw}_{max}(S) := \max_r Q^{raw}_r(S) $$
be the (theoretically) maximal recommendation score (relative to a given quality measure $q$); see section 3 for a simple method to compute $Q^{raw}_{max}(S)$ for a given test set $S$. Then, one can define
$$ Q_r(S) := Q^{raw}_r(S) / Q^{raw}_{max}(S) $$
as normalized recommendation score, which is a useful characteristic number for the comparison of the performance of a recommender system on different test sets or of different recommender systems on the same (multi)set $S$.
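
The double sum of the raw recommendation score translates directly into two nested loops. The sketch below is illustrative only: `raw_score`, `hits`, and `normalized_score` are hypothetical names, the quality measure used is the simple hit count (3), and the maximal score has to be supplied by the caller.

```python
from typing import Callable, FrozenSet, Iterable, Sequence, Tuple

Path = Tuple[str, ...]
Recommender = Callable[[Path], FrozenSet[str]]
Quality = Callable[[Path, Path, FrozenSet[str]], float]


def hits(history: Path, future: Path, recs: FrozenSet[str]) -> float:
    """Quality measure (3): number of recommended resources occurring in f."""
    return float(len(recs.intersection(future)))


def raw_score(r: Recommender, q: Quality,
              paths: Iterable[Sequence[str]]) -> float:
    """Sum q(h_i(p), f_i(p), r(h_i(p))) over all paths and split positions."""
    total = 0.0
    for path in paths:
        p = tuple(path)
        for i in range(1, len(p)):
            history, future = p[:i], p[i:]
            total += q(history, future, r(history))
    return total


def normalized_score(raw: float, raw_max: float) -> float:
    """Q_r(S) = Q_r^raw(S) / Q_max^raw(S); raw_max must be supplied."""
    return raw / raw_max


if __name__ == "__main__":
    # Hypothetical static recommender that always recommends resource F.
    always_f: Recommender = lambda history: frozenset({"F"})
    print(raw_score(always_f, hits, ["ACDFG", "ABCFEB"]))
```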
Now, the problem of finding an optimal recommender system can be formalized as follows: given a quality measure $q$, construct a recommender system $r$ on the basis of information from a training set $S^{train}$ of paths so that the raw recommendation score of $r$ on a test set $S^{test}$ of paths (not used for building the recommender system) is maximal.
For the simple recommendation quality measure (3) that just counts the number of conformities between resources of $r$ and $f$, apparently, the optimal recommender system is the system that simply recommends all resources for any given history: for sure, this recommendation set will hit all resources in the future and be of no interest whatsoever. Two kinds of modifications are possible to make the problem more interesting:
1. [...] measure. For instance, one may think of counting the number of hits relative to the number of given recommendations. While optimal recommendations for the simple quality measure (3) consist of large recommendation sets, optimal recommendations for the relative number of hitting recommendations have very small recommendation sets: in almost all cases, for each history only the one resource with the highest follow-up probability is selected and all other resources with lesser but perhaps also high probabilities are discarded.
2. Restrict the space of possible recommender systems by imposing additional constraints. A restriction that is virtually always sensible in practice is to allow only recommendation sets of a given maximal size (i.e., $|r(h)| \le n$ for all $h \in R^*$ and a given $n \in \mathbb{N}$). This constraint forces a restriction to the best $n$ recommendations; in practice, $n$ will be a small number, say 3 up to 5, of recommendations that users may be willing to look at.
Thus, one may specialize the problem of finding an optimal recommender system to the construction of an optimal one among a predefined class of recommender systems (e.g., those with at most a given number of recommendations per history).
For paths in a (sparsely linked) graph, the computation of recommendations with respect to a quality measure based on hits (disregarding distances of recommended resources) will, in most cases, still result in a set of resources directly linked to the recommendation point. Here, we use the idea of Mobasher (2001) to weight resources farther apart higher by choosing an appropriate quality measure depending on the distances of the recommended resources (another, simpler distance-sensitive quality function can be found in Cooley et al. (1999)). Of course, utility functions, when used for modeling different problems, may depend on other parameters (as, e.g., explicit or implicit ratings) besides distance as well.
3 [...] recommender systems
As the preceding discussion has shown, a variety of optimality criteria for recommender systems can be designed on the basis of appropriate choices of $q$ and optional restrictions for $r$. Here, we start with some obvious possibilities to typecast recommender systems.
As normally the number of collected navigation paths is very large compared to the number of resources of the underlying site, we may break down the global problem of finding optimal recommender systems for a whole site into a set of smaller subproblems of constructing optimal systems for each single resource. We split $R^*$ for $x \in R$ into spaces $R^*_x := \{ p \in R^* \mid p_{|p|} = x \}$ that consist only of sequences with $x$ at the last position and call
$$ r_x: R^*_x \to \mathcal{P}(R) \tag{7} $$
a local recommender system at resource $x$, in contrast to the global version described by (1). Accordingly, the training set $S^{train}$ for the global system is transformed into training sets $S^{train}_x \subseteq R^*_x \times R^*$ for the local systems that consist of all navigation paths $p$ split at $x$ (as recommendation point, if $p$ contains $x$); in the case that a resource $x$ appears $k$ times in a path $p \in S^{train}$, $S^{train}_x$ contains $k$ replications of $p$, split at each occurrence of recommendation point $x$.
Once optimal local systems for all $x \in R$ have been found, they can be pieced together to a global system $r: R^* \to \mathcal{P}(R)$ by delegating the recommendation task to the appropriate local model, i.e., $r(h) := r_{h_{|h|}}(h)$, as there is no dependency of the recommendations given at one recommendation point upon those given at another recommendation point.
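
Localization amounts to grouping all (history, future) splits by their recommendation point. The following Python sketch illustrates this under one explicit assumption: an occurrence of a resource at the very end of a path has an empty future and is therefore not used as a split (this follows the split positions $i \le |p| - 1$ of section 2). The names `local_training_sets` and `delegate` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence, Tuple

Path = Tuple[str, ...]
Split = Tuple[Path, Path]  # (history, future)
LocalSystem = Callable[[Path], FrozenSet[str]]


def local_training_sets(paths: Iterable[Sequence[str]]) -> Dict[str, List[Split]]:
    """Build the local training set S_x for every resource x.

    Each path is split at every occurrence of x that still has a non-empty
    future, so a resource visited k times contributes up to k splits.
    """
    local: Dict[str, List[Split]] = defaultdict(list)
    for path in paths:
        p = tuple(path)
        for i in range(1, len(p)):
            local[p[i - 1]].append((p[:i], p[i:]))
    return dict(local)


def delegate(local_systems: Dict[str, LocalSystem], history: Path) -> FrozenSet[str]:
    """Global system r(h): hand h over to the local system at its last resource."""
    return local_systems[history[-1]](history)
```

For the hypothetical path ABAC, resource A occurs twice with a non-empty future, so the local training set at A receives two splits of the same path.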
We further distinguish between static and dynamic recommender systems: static recommender systems do not take into account the former navigation histories of users and provide a static set of recommendations for all visitors, while dynamic recommender systems may depend on the histories and provide different recommendation sets for users with different histories. Dynamic systems may be built by first partitioning the histories of the training set $S^{train}$ and then computing a static system for each class.
[...] systems can be described as (multi)sets of futures $F_x \subseteq R^*$, extracted from $S^{train}_x$ via $F_x := \{ f \in R^* \mid \exists h \in R^*_x : (h, f) \in S^{train}_x \}$. A simple recommender system just counts frequencies of resources $y \in R$ in the future paths via
$$ \mathrm{freq}(y) := |\{ f \in F_x \mid y \text{ occurs in } f \}| $$
and recommends the $n$ most frequent ones. Up to now, no utility functions have been taken into consideration. To do so, one has to sum up the utility values for all resources in the future paths, e.g., for the distance-sensitive utility functions within (6) one computes the weighted frequencies
$$ \mathrm{wfreq}(y) := \sum_{f \in F_x} q(x, f, y) $$
with $q$ as given in (5) and, again, selects the $n$ highest valued follow-up resources. Note that the computation of the weighted frequencies depends on the recommendation point $x$, but the recommendation set itself does not; thus, a static recommender system is generated. By construction, this is the optimal system among all static recommender systems at $x$.
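
A minimal Python sketch of the two static recommenders follows. It assumes that the distance used in $q$ is the 1-based position of the first occurrence of a resource in the future and that ties between equally scored resources are broken alphabetically; both choices, like the function names, are assumptions of this illustration.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence


def freq(futures: Iterable[Sequence[str]]) -> Counter:
    """freq(y): number of futures in F_x in which y occurs."""
    counts: Counter = Counter()
    for f in futures:
        counts.update(set(f))
    return counts


def wfreq(x: str, futures: Iterable[Sequence[str]],
          u: Callable[[float], float]) -> Counter:
    """wfreq(y): sum over all futures of u(first position of y), for y != x."""
    sums: Counter = Counter()
    for f in futures:
        seen = {x}  # the recommendation point itself is never recommended
        for d, y in enumerate(f, start=1):
            if y not in seen:
                seen.add(y)
                sums[y] += u(d)
    return sums


def top_n(scores: Counter, n: int) -> List[str]:
    """The n best resources; ties are broken alphabetically (an assumption)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [y for y, _ in ranked[:n]]


if __name__ == "__main__":
    futures = ["AB", "AB", "AC"]  # hypothetical futures at a point X
    print(top_n(freq(futures), 1))               # most frequent resource
    print(top_n(wfreq("X", futures, float), 1))  # highest linear utility sum
```

On these hypothetical futures the two systems disagree: A occurs most often, but B collects the larger linear utility sum because it lies farther from the recommendation point.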
Dynamic (local) recommender systems make use of a history partition $\mathcal{C} = \{C_1, \ldots, C_m\}$ of (a superset of) all histories $h \in R^*$ in the test set (where $m \in \mathbb{N}$ is the number of classes). The test set $S^{test}$ can be partitioned into test sets $S^{test}|_C := \{ (h, f) \in S^{test} \mid h \in C \}$ for each class $C \in \mathcal{C}$ and a static recommender system can be built for each such class.
While the use of ordinary partitions is straightforward, fuzzy partitions need additional information about predicted utility values for recommendations given by the static recommender systems for each class. Now, let $\mathcal{C} = \{w_1, \ldots, w_m\}$ be a fuzzy partition of the histories, i.e., all $w \in \mathcal{C}$ are functions $w: R^* \to [0, 1]$ with $\sum_{w \in \mathcal{C}} w(h) = 1$ for all $h \in R^*$. $w(h)$ is called the weight of $h$ in class $w$. The static recommender systems for each class $w$ have to provide a predicted utility value for each recommendation, i.e., they are maps
$$ r_w: R^* \to [0, 1]^R $$
with recommendation set
$$ \mathrm{rec}(h) := \{ (y, v) \in R \times [0, 1] \mid v = r_w(h)(y) > 0 \}. $$
For a history $h$, a dynamic recommender system using fuzzy partitions, first, computes the weights of $h$ for all classes $w$, second, computes for all classes $w$ with $w(h) > 0$ the (extended) recommendation set $r_w(h)$, third, adjusts the predicted utility values by the weight $w(h)$ for the class the recommendations stem from, and, then, chooses the $n$ recommendations with highest (adjusted) predicted utility.
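
The four steps of a dynamic system on a fuzzy partition can be sketched as follows. The text does not say how to treat a resource that is recommended by several classes; the sketch keeps the highest adjusted value per resource, which is an assumption, as are all names used.

```python
from typing import Callable, Dict, List, Sequence, Tuple

History = Tuple[str, ...]
Weight = Callable[[History], float]                   # w(h) in [0, 1]
ClassSystem = Callable[[History], Dict[str, float]]   # r_w(h): resource -> utility


def recommend(history: History,
              classes: Sequence[Tuple[Weight, ClassSystem]],
              n: int) -> List[Tuple[str, float]]:
    """Weight the per-class predictions by w(h) and select the best n."""
    adjusted: Dict[str, float] = {}
    for weight, system in classes:
        w = weight(history)
        if w > 0.0:
            for y, v in system(history).items():
                if v > 0.0:
                    # Assumption: keep the best adjusted value per resource.
                    adjusted[y] = max(adjusted.get(y, 0.0), w * v)
    ranked = sorted(adjusted.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    # Two hypothetical classes with constant weights and fixed predictions.
    classes = [
        (lambda h: 0.25, lambda h: {"G": 1.0, "F": 0.5}),
        (lambda h: 0.75, lambda h: {"E": 0.5}),
    ]
    print(recommend(("A", "B", "C"), classes, 2))
```

Summing the adjusted values per resource instead of taking the maximum would be an equally plausible reading; only the marked line would change.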
A trivial example for a dynamic recommender system is the one built upon the singleton partition $\mathcal{C} = \{ \{h\} \mid \exists f \in R^* : (h, f) \in S^{train} \}$, which we call the recommender system based on the finest history partition. As each difference between histories results in different classes, this recommender system suffers extremely from overfitting and therefore performs very poorly on test sets. But besides being a trivial example for a dynamic system, the recommender system based on the finest history partition can be useful for the computation of an upper bound for the raw recommendation score (that can be achieved by any recommender system on the underlying test set), if it is trained by the test set (!) itself, i.e., the raw recommendation score of the recommender system based on the finest history partition for a test set $S^{test}$ is the theoretically maximal raw recommendation score $Q^{raw}_{max}(S^{test})$.
The computation of $Q^{raw}_{max}$ requires the building of a huge number of static recommender systems (one for each history), which either may consume a considerably large amount of memory or forces several iterations over the test set database, i.e., cannot be done efficiently. If runtime is an issue, a more pessimistic upper bound can be computed by choosing the best recommendation for each single history occurrence in the test set in a single loop over the test set database. Note that this may result in different recommendation sets for the very same history and, thus, may never be achieved by a real recommender system. But for test sets with many different histories and a decent quality function this bound is close enough to $Q^{raw}_{max}$. As runtime has not been considered in our experiments in section 6, we use exact values for $Q^{raw}_{max}$. Please note that $Q^{raw}_{max}$ is used to compute normalized recommendation scores. As the main purpose of normalized scores is the comparison of recommender systems on different data sets or of different loci of a localized system, their main applications are in research-oriented contexts. For tracking recommender system performance in operational contexts raw scores may be used.
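
The difference between the exact value and the more pessimistic bound can be illustrated with a short Python sketch. It assumes the additive quality measure (4) with the first-occurrence distance and at most $n$ recommendations per history; the function names and the tiny test set are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Split = Tuple[Sequence[str], Sequence[str]]  # (history, future)
Utility = Callable[[float], float]


def best_sum(futures: Iterable[Sequence[str]], u: Utility,
             n: int, exclude: str) -> float:
    """Best total utility that one shared set of n recommendations can
    collect on the given futures (the recommendation point is excluded)."""
    sums: Dict[str, float] = defaultdict(float)
    for f in futures:
        seen = {exclude}
        for d, y in enumerate(f, start=1):
            if y not in seen:
                seen.add(y)
                sums[y] += u(d)
    return sum(sorted(sums.values(), reverse=True)[:n])


def q_max_exact(splits: Iterable[Split], u: Utility, n: int = 1) -> float:
    """Finest history partition trained on the test set itself."""
    by_history: Dict[Tuple[str, ...], List[Sequence[str]]] = defaultdict(list)
    for h, f in splits:
        by_history[tuple(h)].append(f)
    return sum(best_sum(fs, u, n, h[-1]) for h, fs in by_history.items())


def q_max_pessimistic(splits: Iterable[Split], u: Utility, n: int = 1) -> float:
    """Single pass: the best recommendations are chosen per split, so equal
    histories may receive different recommendation sets."""
    return sum(best_sum([f], u, n, h[-1]) for h, f in splits)


if __name__ == "__main__":
    test_set = [("AX", "BC"), ("AX", "CB"), ("BX", "CA")]  # hypothetical
    print(q_max_exact(test_set, float), q_max_pessimistic(test_set, float))
```

In the hypothetical test set the history AX occurs twice with different futures, so no single recommendation can be optimal for both occurrences, and the single-pass bound exceeds the exact value.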
Obviously, both localization (i.e., the determination of local systems) and the usage of history partitions can be viewed as application of a clustering technique to the histories of paths in $S$ that, based on an adequate similarity criterion for navigation paths, reduces the global problem to the handling of subproblems described by more homogeneous sub(multi)sets $S|_C$, where $C$ denotes the class under consideration of the resulting (possibly fuzzy or overlapping) classification. Of course, one can combine these possibilities and apply history partitions to $S_x$, resulting in $S_x|_C$, or split $S|_C$ into subsets of navigation paths with the same recommendation point $x$, resulting in $(S|_C)_x$. Note that while localization uses very intuitive classes that do not have to be computed, the hard part of history partitions is the computation of the partition itself. Therefore, one first applies localization and afterwards computes history partitions for each local system. An overview of the architecture of such a complex system is given in figure 1.
4 Path features
Paths that users have taken on a site belong to the most valuable information that can be gained. But paths as sequences of resources of different lengths are complex objects which are not that easy to compare and to use in data mining algorithms. Thus, one is interested in determining sets of simpler features for path description (feature extraction).
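A substructure space of $R^*$ is defined as a pair $(A, \le)$ of a set $A$ and a relation $\le$ on $A \times R^*$, where $a \in A$ is called a substructure of $p \in R^*$ if $a \le p$.
$$ \delta_a: R^* \to \{0, 1\}, \quad p \mapsto \begin{cases} 1, & \text{if } a \le p \\ 0, & \text{otherwise} \end{cases} $$
is called the indicator function of substructure $a$. Examples of substructures are: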
1. sets $(\mathcal{P}(R), \subseteq)$ of resources, where a set of resources $a \in \mathcal{P}(R)$ is defined to be a substructure of a path $p \in R^*$ if all resources $x \in a$ occur in path $p$,
2. sequences $(R^*, \le_{ct})$, where a sequence $a \in R^*$ is defined to be a substructure of a path $p$ if it is a contiguous subsequence, i.e., there exists $i_0 \in \{0, \ldots, |p| - |a|\}$ with $a_i = p_{i_0 + i}$ for all $i = 1, \ldots, |a|$,
3. generalized sequences $((R \cup \{\star\})^*, \le_{gen})$, i.e., sequences consisting of elements of $R$ and an additional symbol $\star$ used as wildcard, where a generalized sequence $a \in (R \cup \{\star\})^*$ is defined to be a substructure of a path $p$ if it is a generalization of a (contiguous) subsequence of $p$ and generalization means that arbitrary parts of the sequence may be replaced by wildcards (see Gaul and Schmidt-Thieme (2000) for an exact definition),
4. simple generalized sequences $(R^*, \le_{nct})$, where a (simple generalized) sequence $a \in R^*$ is defined to be a substructure of a path $p$ if it is a noncontiguous subsequence with the following meaning: there exists a strictly increasing $j: \{1, \ldots, |a|\} \to \{1, \ldots, |p|\}$ with $a_i = p_{j(i)}$ for all $i = 1, \ldots, |a|$, i.e., in the context of generalized sequences, if $a_1 \star a_2 \star \cdots \star a_{|a|} \le_{gen} p$.
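
Three of these substructure relations (sets, contiguous subsequences, and noncontiguous subsequences) are easy to state as Python predicates. The sketch below is an illustration with hypothetical function names; full generalized sequences with explicit wildcards are left out because their exact definition is given elsewhere.

```python
from typing import Iterable, Sequence


def is_subset(a: Iterable[str], p: Sequence[str]) -> bool:
    """Sets: every resource of a occurs somewhere in path p."""
    return set(a).issubset(p)


def is_contiguous_subsequence(a: Sequence[str], p: Sequence[str]) -> bool:
    """Sequences: a appears in p as an unbroken block."""
    a_t, p_t = tuple(a), tuple(p)
    return any(p_t[i:i + len(a_t)] == a_t
               for i in range(len(p_t) - len(a_t) + 1))


def is_subsequence(a: Sequence[str], p: Sequence[str]) -> bool:
    """Simple generalized sequences: the elements of a occur in p in the
    same order, possibly with other resources in between."""
    remaining = iter(p)
    # 'x in remaining' consumes the iterator up to the first match, which
    # enforces strictly increasing positions.
    return all(x in remaining for x in a)


if __name__ == "__main__":
    path = "ADFEBC"  # hypothetical path
    print(is_subset("AB", path),
          is_contiguous_subsequence("AB", path),
          is_subsequence("AB", path))
```

The empty substructure is a substructure of every path under all three predicates.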
The space of simple generalized sequences can be viewed as a subspace of the space of generalized sequences where a wildcard is interspersed between each two resources. Notice that in practical applications only generalized sequences without a wildcard at the first and/or last position (i.e., $a \in (R \cup \{\star\})^*$ with $a_1, a_{|a|} \in R$) are of interest. These sequences are called path fragments.
For any substructure space $A$ the symbol $\emptyset$ describes the empty substructure (i.e., the empty set or the empty sequence, respectively) and $|a|$ the substructure complexity of $a \in A$, defined as cardinality (for sets) or length (for sequences).
Now, we can define a path feature to be a pair $(\Phi, \varphi)$, where $\Phi$ is an arbitrary set called feature space and $\varphi: R^* \to \Phi$ the feature map mapping paths to features. For a path $p \in R^*$ we call $\varphi(p)$ the $\varphi$-feature of $p$.
Trivial examples for path features are its length ($\varphi: R^* \to \mathbb{N}$, $p \mapsto |p|$) and its entry point ($\varphi: R^* \to R$, $p \mapsto p_1$). More interesting features are obtainable via substructures.
From an arbitrary substructure space $(A, \le)$ we derive its associated path feature
$$ \varphi: R^* \to \{0, 1\}^A, \quad p \mapsto \left( \delta: A \to \{0, 1\},\ a \mapsto \delta_a(p) \right) $$
i.e., a feature space that for every path $p$ contains a binary vector indicating whether an element $a \in A$ is a substructure of $p$ or not.

[Figure 1: block diagram. A history $h$ is delegated on its recommendation point $h_{|h|}$ to the dynamic local recommender system at that resource (for $h_{|h|} = A$, ..., for $h_{|h|} = X$). Each local system delegates on class weights to the recommendations for class 1, class 2, ..., weights them and selects the best $n$; the result is the recommendation set $r(h)$.]
Figure 1: Architecture of a dynamic global recommender system

Feature spaces based on substructures turn out to have the disadvantage of high dimensionality: the feature space built from subsets has dimension $2^{|R|}$, the one built from finite sequences (if subsequences are restricted to length $n$) has dimension $\sum_{i=1}^{n} |R|^i$. Therefore, given an underlying (multi)set of navigation paths $S$ that has to be analyzed, one is looking for interesting subsets of substructures that result in a smaller number of dimensions but still carry as much information as possible for a description of the objects of (multi)set $S$ (feature selection). We call a dimension sparse with respect to $S$ if the corresponding entry in the binary vector is zero for almost all paths of (multi)set $S$. In applications, one can often drop a vast number of sparse dimensions and restrict oneself to those dimensions for which the percentage of non-zero entries in the binary vectors exceeds a lower bound.
Dependent on this bound, called minsup, frequent substructures of the paths of $S$ can be determined beforehand. For a substructure $a \in A$ one defines its relative frequency
$$ \mathrm{sup}_S(a) := \frac{|\{ p \in S \mid a \le p \}|}{|S|} $$
as support of $a$ in $S$. The task to compute all frequent substructures, i.e., the set $\Gamma_{(S, \mathrm{minsup})} := \{ a \in A \mid \mathrm{sup}_S(a) \ge \mathrm{minsup} \}$ of all substructures with at least a given minimum support $\mathrm{minsup} \in \mathbb{R}_0^+$, is well known and accomplished [...] Srikant (1994)), sequences (Agrawal and Srikant (1995)) and generalized sequences (Gaul and Schmidt-Thieme (2000)), respectively. Building the feature space from the frequent substructures $\Gamma_{(S, \mathrm{minsup})}$ only instead of using all substructures of $A$ can reduce the dimensionality dramatically (depending on the minimum support and the structure of $S$, of course). We call this feature space path features based on frequent substructures in general, and, in particular, path features based on frequent subsets, subsequences, (simple) generalized sequences or path fragments, etc.
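
For small examples, support and frequent substructures can be computed by plain enumeration. The Python sketch below does this for subsets only. It is a brute-force illustration with hypothetical names and a hypothetical size limit `max_size`, not one of the mining algorithms cited above, which avoid enumerating all candidates.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence


def support(a: FrozenSet[str], paths: Sequence[Sequence[str]]) -> float:
    """Relative frequency of paths that contain all resources of a."""
    return sum(1 for p in paths if a.issubset(p)) / len(paths)


def frequent_subsets(paths: Sequence[Sequence[str]], minsup: float,
                     max_size: int = 3) -> Dict[FrozenSet[str], float]:
    """All subsets of at most max_size resources with support >= minsup,
    including the empty substructure (which has support 1)."""
    resources = sorted({x for p in paths for x in p})
    result: Dict[FrozenSet[str], float] = {}
    for k in range(max_size + 1):
        for combo in combinations(resources, k):
            a = frozenset(combo)
            s = support(a, paths)
            if s >= minsup:
                result[a] = s
    return result


if __name__ == "__main__":
    paths = ["AB", "ABD", "BD", "A"]  # hypothetical (multi)set of paths
    for a, s in frequent_subsets(paths, 0.5).items():
        print(sorted(a), s)
```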
5 Recommender systems based on frequent substructures of navigation histories
Frequent substructures of the histories at a resource $x$ can be used to build fuzzy partitions for dynamic local recommender systems. For a local training set $S^{train}_x \subseteq R^*_x \times R^*$ at a recommendation point $x \in R$ let $H_x := \{ h \in R^*_x \mid \exists f \in R^* : (h, f) \in S^{train}_x \}$ be the (multi)set of corresponding histories. Let $\Gamma_x := \Gamma_{(H_x, \mathrm{minsup})}$ denote the set of frequent substructures of $H_x$. For each frequent substructure $a \in \Gamma_x$ we build a class represented by the weight function $w_a: R^*_x \to [0, 1]$. Note that the empty substructure $\emptyset$ occurs in every history by definition and, thus, is frequent in any case; it serves as class for histories without any frequent (non-empty) substructures. For $h \in R^*_x$ let $\Gamma(h)$ denote the set of frequent substructures occurring in $h$ (including $\emptyset$). The class weight functions $w_a$ are then defined as
$$ w_a(h) := \begin{cases} \dfrac{\gamma(a)}{\sum_{b \in \Gamma(h)} \gamma(b)}, & \text{for } a \in \Gamma(h) \\ 0, & \text{otherwise} \end{cases} $$
where $\gamma: \Gamma_x \to \mathbb{R}_0^+$ is a function that measures how specific or interesting a frequent substructure is. $\gamma$ might be set constantly to 1 (not making any differences between different substructures and, thus, weighting them all equally), it might consider the frequency of a substructure and be set to the inverse of the support $1 / \mathrm{sup}_{H_x}(a)$, [...] substructure and be set to the complexity $|a|$ of a frequent substructure. We will use the second variant in our experiment in section 6.
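
The class weights can be computed directly from a table of frequent substructures and their supports. The sketch below uses subsets as substructures and the inverse support as interest function; the function names and the support table in the usage example are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Sequence

Sub = FrozenSet[str]


def class_weights(history: Sequence[str], supports: Dict[Sub, float],
                  interest: Callable[[Sub], float]) -> Dict[Sub, float]:
    """Weights w_a(h) for all frequent substructures a occurring in h.

    `supports` maps each frequent substructure (including the empty one)
    to its support; substructures not occurring in h get no entry, which
    corresponds to weight 0.
    """
    occurring = [a for a in supports if a.issubset(history)]
    total = sum(interest(a) for a in occurring)
    return {a: interest(a) / total for a in occurring}


if __name__ == "__main__":
    # Hypothetical frequent subsets with their supports.
    supports = {frozenset(): 1.0, frozenset("A"): 1.0, frozenset("AB"): 0.5}
    inverse_support = lambda a: 1.0 / supports[a]
    for a, w in class_weights("ABC", supports, inverse_support).items():
        print(sorted(a), w)
```

With the inverse support, rarer substructures receive the larger share of the weight. If the complexity $|a|$ were used instead, a history containing only the empty substructure would lead to a zero denominator, a case this sketch does not handle.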
Alternatively, one can assign histories only to classes representing maximal substructures. Let $\Gamma^{max}(h)$ be the set of maximal frequent substructures of $h$ (especially $\{\emptyset\}$, if there are no frequent substructures at all) and compute $w_a$ with $\Gamma^{max}(h)$ instead of $\Gamma(h)$.
Partitions based on frequent structures tend to become rather large for small minimum support values. A pruning step can decrease the number of interesting frequent substructures before the partition is formed and static recommender systems for each class are built. For any frequent substructure $a \in \Gamma_x$, all more specific substructures $b \in \Gamma_x$ (e.g., supersets, supersequences, etc.) that have the same support can be removed: as identical support values will lead to the same class weights and as both frequent substructures $a$ and $b$ have the same training sets, i.e., lead to the same static recommender system, they would create the very same recommendations, and, consequently, the system for $b$ is superfluous. The reader acquainted with the notion of closed subsets in the theory of set-based association rules will see the analogy between this pruning step and the search just for closed subsets (see, e.g., Zaki and Hsiao (1999)); but note that while closed subsets are the most specific ones among all subsets with the same support, here, by contrast, we keep the most general ones among all substructures with the same support.
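
For subsets, the pruning step can be sketched as a single filter over the support table. The code is an illustration with a hypothetical function name and hypothetical support values; "more specific" is read as "proper superset".

```python
from typing import Dict, FrozenSet

Sub = FrozenSet[str]


def prune(supports: Dict[Sub, float]) -> Dict[Sub, float]:
    """Remove every substructure b for which a more general substructure a
    (a proper subset of b) with the same support exists."""
    return {
        b: s
        for b, s in supports.items()
        if not any(a != b and a.issubset(b) and supports[a] == s
                   for a in supports)
    }


if __name__ == "__main__":
    # Hypothetical supports: {A, B} is as frequent as {A} and is dropped.
    supports = {
        frozenset("A"): 0.6,
        frozenset("B"): 0.8,
        frozenset("AB"): 0.6,
        frozenset("ABC"): 0.3,
    }
    print(sorted(sorted(a) for a in prune(supports)))
```

The comparison is always made against the unpruned table, so a chain of substructures with equal support collapses to its most general member.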
6 Examples and experiments
After the theoretical outline we present a series of small examples to illustrate the capabilities and shortcomings of the different kinds of recommender systems based on navigation paths before an experimental evaluation is described with the intention to provide a feeling for the advantages obtainable by application of path features.
Figure 2 shows the link graph of a small site of seven resources $R = \{A, B, C, D, E, F, G\}$ and three examples of navigation paths that are all analyzed at recommendation point C. The paths differ with respect to their histories while their futures remain unchanged in the different examples. For simplicity, at most a single recommendation ($n = 1$) is provided for comparisons.

a) site graph: [link graph over the resources A, B, C, D, E, F, G]

b) paths of example 1
utility   u_1:      1    2    3    4
          u_2:      1    1.7  2.1  2.4
path      history   future
p_I:      AC        D    F    G
p_II:     ABC       F    E    B
p_III:    AC        F    D    G    D
p_IV:     ABC       D    G    F    E

c) paths of example 2
utility   u_1:      1    2    3    4
          u_2:      1    1.7  2.1  2.4
path      history   future
p_V:      BAC       D    F    G
p_VI:     ABC       F    E    B
p_VII:    BAC       F    D    G    D
p_VIII:   ABC       D    G    F    E

d) paths of example 3
utility   u_1:      1    2    3    4
          u_2:      1    1.7  2.1  2.4
path      history   future
p_IX:     BAC       D    F    G
p_X:      ADFEBC    F    E    B
p_XI:     BAC       F    D    G    D
p_XII:    ABC       D    G    F    E

Figure 2: Site graph and sample paths of examples 1, 2, and 3

We start with example 1 and assume that the underlying system is static and that $S = \{p_I, p_{II}, p_{III}, p_{IV}\}$ is the (multi)set of navigation paths under consideration. Without using utility functions one just counts occurrences of resources following C: F is the only resource occurring in all four futures and, thus, a recommender system based on mere frequencies would recommend F. Now, we add a utility distance $u_1(d) := d$ or $u_2(d) := \ln d + 1$, respectively; the utility values for the resources following C are given in the upper two lines of the tables. Using utility sums, the recommender system based on weighted frequencies now computes the $u_1$-utility sum 7 for F and the $u_1$-utility sum 8 for G (or the $u_2$-utility sum 5.8 for F and the $u_2$-utility sum 5.9 for G) and, in both cases, would recommend G instead of F. If a resource occurs more than once in the future path (as D in path $p_{III}$), only the first occurrence (shortest empirical distance) is counted.
The recommender system based on the finest history partition recovers two history classes {A} and {AB}. For class {A} the recommendation G with $u_1$-utility sum 6 (or $u_2$-utility sum 4.2) is computed, for class {AB} recommendation E, also with $u_1$-utility sum 6 (or $u_2$-utility sum 4.1), is found, resulting in a theoretically maximal possible recommendation score of 12 for $u_1$ (or 8.3 for $u_2$). Thus, the normalized recommendation score of the recommender system based on mere frequencies is $7/12 = 0.58$ for $u_1$ (or $5.8/8.3 = 0.70$ for $u_2$) while for the system based on weighted frequencies it is $8/12 = 0.67$ for $u_1$-utility summation (or $5.9/8.3 = 0.71$ for $u_2$-utility summation). This part of our small example was designed to show that the incorporation of utility distances (other utility functions are thinkable) leads to the recommendation of resource G farther apart from recommendation point C instead of resource F directly linked to C.
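
The $u_1$ figures of example 1 can be retraced with a few lines of Python. The sketch hard-codes the four (history, future) pairs of example 1 at recommendation point C and uses the first occurrence of a resource as its distance; helper names are illustrative.

```python
from collections import defaultdict

# (history, future) pairs of example 1, all split at recommendation point C.
EXAMPLE_1 = [("AC", "DFG"), ("ABC", "FEB"), ("AC", "FDGD"), ("ABC", "DGFE")]


def utility_sums(futures, u=lambda d: float(d)):
    """Sum u(position of first occurrence) per resource over the futures."""
    sums = defaultdict(float)
    for f in futures:
        for y in set(f):
            sums[y] += u(f.index(y) + 1)
    return dict(sums)


def best(sums):
    """Resource with the highest sum (ties broken alphabetically)."""
    return min(sums, key=lambda y: (-sums[y], y))


all_futures = [f for _, f in EXAMPLE_1]
counts = utility_sums(all_futures, u=lambda d: 1.0)  # mere frequencies
u1_sums = utility_sums(all_futures)                  # weighted frequencies, u_1

# Finest history partition: one static system per distinct history.
by_history = defaultdict(list)
for h, f in EXAMPLE_1:
    by_history[h].append(f)
q_max = sum(max(utility_sums(fs).values()) for fs in by_history.values())

print(best(counts), best(u1_sums))        # F G
print(u1_sums["F"], u1_sums["G"], q_max)  # 7.0 8.0 12.0
print(round(u1_sums["F"] / q_max, 2), round(u1_sums["G"] / q_max, 2))  # 0.58 0.67
```

Replacing the default utility by `lambda d: math.log(d) + 1` (with `import math`) gives the corresponding $u_2$ sums before rounding.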
Now we apply the recommender system based on frequent subsets with minsup = 0.5 to example 1. The two frequent subsets {A} with support 1 and {A,B} with support 0.5 are found and lead to a fuzzy partition with 3 classes. For the static recommender system based on weighted frequencies that takes into account all histories containing {A}, all four paths are used and consequently G is computed as best recommendation (the same as for histories containing $\emptyset$); for the same system that, now, takes into account all histories containing {A,B}, only the paths $p_{II}$ and $p_{IV}$ are used, so that E is computed as best recommendation. Thus, the recommender system based on frequent subsets achieves a recommendation score of 1.0.
Next, we take example 2, which is a slight modification of example 1: paths $p_V$ and $p_{VII}$ now start with BA (instead of A as $p_I$ and $p_{III}$ did). Consequently, the recommender system based on frequent subsets recovers only one class of histories {A,B} and, thus, will achieve the recommendation score of 0.70 only. The recommender system based on frequent subsequences still can distinguish between subsequences AB and BA of the histories and achieves a recommendation score of 1.0 again.
If we now look at example 3, in which only the history of path $p_X$
was changed compared to path $p_{VI}$ of example 2 by inserting a small
deviation DFE between A and BC in the history, the extraction of
frequent contiguous subsequences would result in BA only, and a
distinction between the two classes found in example 2 would not be
possible. But the recommender system based on simple generalized
subsequences (or path fragments) is able to separate the two frequent
simple generalized sequences $A \star B$ and BA in the histories of
example 3 and can, thus, give better recommendations.
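How a wildcard pattern survives the inserted deviation can be sketched with a small matcher. Several points are assumptions: the histories BA, ADFEB, BA, AB are hypothetical stand-ins for example 3, the wildcard is written `*` and is taken to match any run of resources including the empty one, and the translation to a regular expression is merely a convenient way to test matches, not the mining procedure of the paper.

```python
import re


def matches(pattern: str, history: str) -> bool:
    """True if `history` contains `pattern`, where '*' in the pattern
    stands for any (possibly empty) run of resources."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.search(".*".join(parts), history) is not None


def support(pattern: str, histories) -> float:
    """Fraction of histories that contain the pattern."""
    return sum(matches(pattern, h) for h in histories) / len(histories)


# Assumed histories of the four paths of example 3 (hypothetical data);
# the second one carries the inserted deviation DFE between A and B.
example3 = ["BA", "ADFEB", "BA", "AB"]

for pattern in ("AB", "BA", "A*B"):
    print(pattern, support(pattern, example3))
```

With these assumptions the contiguous pattern AB drops to support 0.25 and is no longer frequent at minsup = 0.5, while BA and the generalized pattern with wildcard both reach 0.5.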
Finally, our findings have been checked on larger (multi)sets. Table 1
shows the result of an experimental evaluation of some of the
different recommender systems based on navigation paths as described
earlier. We modeled four different classes of users navigating a small
site of 20 resources (A-T) with several crossings. On the basis of an
abstract description of user classes (percentage of total users,
templates of navigation behavior, distributions of variations etc.) we
created a database of 10,000 navigation paths. 90% of the paths were
used as training set $S_{train}$ to build the models, the remaining
10% of the paths as test set $S_{test}$ to evaluate the quality of the
models and compute the recommendation scores. We used the utility
function $u_1$ and allowed only a single recommendation at each
resource ($n = 1$). As expected, the use of simple frequencies (model
freq) resulted in a low global recommendation score of 0.41; for the
local versions the quality dropped below 0.15 at some of the
resources. Using this as baseline, the model based on weighted
frequencies (wfreq), which takes into consideration the special form
of the utility function, achieves an overall improvement in global
quality of over 50% (global recommendation score 0.64). As we put
strong sequential effects in the navigation patterns of the different
user classes, dynamic recommendations based on frequent sets (sets)
hardly improve the global score (0.65), but using sequences (seq) or,
even better, simple generalized sequences (sgseq) or path fragments
(frag), further improvements of the global recommendation quality were
possible (global recommendation scores of 0.69, 0.76, and 0.77,
respectively, i.e., up to another 32% with respect to the baseline
score). In all cases, the frequent substructures for the dynamic
models have been extracted with a minimum support of 0.2 (the smallest
expected user segment size for the data set).

resources  global     A     B     C     D     E     F     G     H     I     J     K     L     M     N     O     P     Q     R     S     T
weights            0.10  0.15  0.18  0.07  0.06  0.05  0.05  0.07  0.08  0.05  0.05  0.05  0.05  0.05  0.04  0.05  0.05  0.05  0.06  0.05
static recommender systems:
freq         0.41  0.21  0.31  0.57  0.42  0.36  0.58  0.16  0.67  0.58  0.51  0.14  0.18  0.52  0.57  0.32  0.50  0.15  0.53  0.17  0.50
wfreq        0.64  0.88  0.69  0.57  0.66  0.73  0.58  0.58  0.67  0.64  0.65  0.67  0.61  0.52  0.60  0.52  0.56  0.55  0.61  0.67  0.54
dynamic recommender systems based on frequent substructures:
sets         0.65  0.87  0.71  0.55  0.66  0.73  0.58  0.61  0.67  0.64  0.64  0.69  0.55  0.58  0.69  0.62  0.56  0.64  0.64  0.69  0.56
seq          0.69  0.88  0.71  0.58  0.68  0.78  0.73  0.71  0.67  0.74  0.66  0.71  0.63  0.65  0.74  0.59  0.68  0.67  0.62  0.71  0.63
sgseq        0.76  0.96  0.77  0.63  0.68  0.77  0.80  0.82  0.73  0.78  0.76  0.76  0.73  0.68  0.82  0.70  0.73  0.79  0.79  0.83  0.73
frag         0.77  0.96  0.77  0.66  0.67  0.78  0.80  0.83  0.73  0.79  0.80  0.77  0.75  0.69  0.81  0.73  0.74  0.80  0.79  0.86  0.73

Table 1: Experimental evaluation of different recommender systems on a
small site with 20 resources (A-T). The row weights gives the share of
occurrences of each resource in the path data. The column global gives
the global recommendation quality for each recommender system.
Furthermore, for the local parts of each recommender system with
resource A up to resource T as recommendation point the quality is
given. The following recommender systems (r.s.) are evaluated: r.s.
based on frequencies of resources following the (actual)
recommendation point (freq), r.s. based on weighted frequencies
(wfreq), r.s. based on frequent subsets (sets), r.s. based on frequent
subsequences (seq), r.s. based on frequent simple generalized
subsequences (sgseq), and r.s. based on frequent path fragments
(frag); all recommender systems based on frequent substructures used a
minimum support of 0.2.
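The relative improvements quoted for the global scores can be rechecked with a few lines of arithmetic. The sketch below uses only the global recommendation scores of Table 1; the helper name `gain` is an illustrative choice, and both gains are expressed relative to the freq baseline, as in the text.

```python
# Global recommendation scores of Table 1.
GLOBAL = {
    "freq": 0.41, "wfreq": 0.64, "sets": 0.65,
    "seq": 0.69, "sgseq": 0.76, "frag": 0.77,
}


def gain(new: float, old: float, baseline: float) -> float:
    """Improvement from `old` to `new`, relative to `baseline`."""
    return (new - old) / baseline


baseline = GLOBAL["freq"]
wfreq_gain = gain(GLOBAL["wfreq"], baseline, baseline)
frag_gain = gain(GLOBAL["frag"], GLOBAL["wfreq"], baseline)

print(f"wfreq over freq: {wfreq_gain:.0%}")
print(f"frag over wfreq (relative to freq): {frag_gain:.0%}")
```

This gives roughly 56% for the step from freq to wfreq ("over 50%") and roughly 32% for the further step from wfreq to frag.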
The local quality scores show that the ranking observed for the global
recommendation quality values does not hold in all cases (i.e., at all
recommendation points). This is due to the fact that at some
recommendation points sequential effects were not strong enough in the
data, so that local recommender systems based on path substructures
could not take advantage of some of the path features, and/or due to
the choice of the same minimum support for all local recommender
systems, which is responsible for small overfitting effects in some of
the local systems.
Of course, findings depend on the structure of the user segments. If
only few sequential effects are in the data, the more complex models
cannot show their strengths and give results similar to those of the
other models. In another experiment with 10,000 users in 10 segments
(of size 10% each) on a small site with 100 resources, where crossings
of navigation paths occurred by chance, the results in Table 2 were
obtained.

  freq  wfreq   sets    seq  sgseq   frag
  0.27   0.37   0.40   0.40   0.40   0.41

Table 2: Experimental evaluation of different recommender systems on a
small site with 100 resources and only few sequential effects in the
data.
7 Outlook
A framework for recommender systems based on navigation paths has been
presented, and the influence of different path features on
recommendation quality considerations was theoretically discussed and
empirically demonstrated. We developed a generic method to measure the
quality of recommender systems in terms of the (normalized)
recommendation score, so that different systems can easily be
compared, and gave examples for recommender systems based on
navigation paths that made use of frequent substructures in the path
histories.
Future work should address questions such as pruning based on
substructure partitions, automatically finding optimal support values,
as well as comparing our results to those of history partitions
obtained by approaches different from frequent substructures. Besides
theoretical work on mathematical modeling of recommender systems, an
empirical evaluation of how they are used (and liked) by site visitors
is one of the urgent questions in the field.
References
Agrawal, R. & Srikant, R. (1994). Fast Algorithms for Mining
    Association Rules. In Bocca, J.B., Jarke, M., & Zaniolo, C.
    (Eds.), Proceedings of the 20th International Conference on Very
    Large Data Bases (VLDB'94), September 12-15, 1994 (pp. 487-499),
    Santiago de Chile, Chile, Morgan Kaufmann.
Agrawal, R. & Srikant, R. (1995). Mining Sequential Patterns. In Yu,
    P.S. & Chen, A.L.P. (Eds.), Proceedings of the Eleventh
    International Conference on Data Engineering, March 6-10, 1995,
    Taipei, Taiwan, IEEE Computer Society, pp. 3-14.
ination and Service to Reduce Server Load, Network Traffic and Service
    Time. In Proceedings IEEE Conference on Data Engineering
    (ICDE'96), pp. 180-189.
Bodner, R.C. & Chignell, M.H. (1999). ClickIR: Text Retrieval Using a
    Dynamic Hypertext Interface. In Proceedings of the Seventh Text
    Retrieval Conference (TREC-7), Gaithersburg, Maryland.
Breese, J.S., Heckerman, D. & Kadie, C. (1998). Empirical Analysis of
    Predictive Algorithms for Collaborative Filtering. In Proceedings
    of the Fourteenth Conference on Uncertainty in Artificial
    Intelligence, Madison, WI, July, 1998.
Cooley, R., Mobasher, B., & Srivastava, J. (1999). Data Preparation
    for Mining World Wide Web Browsing Patterns. In Knowledge and
    Information Systems 1/1 (1999), pp. 5-32.
Fu, X., Budzik, J., & Hammond, K.J. (2000). Mining Navigation History
    for Recommendation. In Proceedings of the 2000 International
    Conference on Intelligent User Interfaces, New Orleans, LA,
    January 2000, pp. 106-112.
Gaul, W. & Schmidt-Thieme, L. (2000). Mining web navigation path
    fragments. Proceedings of the WEBKDD'2000 workshop, Boston, 2000.
Joachims, T., Mitchell, T., Freitag, D., & Armstrong, R. (1995).
    WebWatcher: Machine Learning and Hypertext. In Morik, K., &
    Herrmann, J. (Eds.), GI Fachgruppentreffen Maschinelles Lernen,
    University of Dortmund, August 1995.
Lieberman, H. (1995). Letizia: An Agent That Assists Web Browsing. In
    1995 International Joint Conference on Artificial Intelligence,
    Montreal, CA, 1995.
Mobasher, B. (2001). Mining Web Usage Data for Automatic Site
    Personalization. To appear in Gaul, W., & Ritter, G. (Eds.),
    Classification, Automation, and New Media, Springer.
Perkowitz, M. & Etzioni, O. (1998). Adaptive Web Sites, Automatically
    Synthesizing Web
    tional Conference on Artificial Intelligence, Madison, WI.
Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., & Riedl, J.
    (1994). GroupLens: An Open Architecture for Collaborative
    Filtering of Netnews. In Proceedings of the Conference on Computer
    Supported Cooperative Work, Chapel Hill NC, 1994, pp. 175-186.
Sarwar, B., Karypis, G., Konstan, J.A., & Riedl, J. (2000). Analysis
    of Recommendation Algorithms for E-Commerce. In ACM Conference on
    Electronic Commerce (EC-00).
Schafer, J.B., Konstan, J.A., & Riedl, J. (1999). Recommender Systems
    in E-Commerce. In ACM Conference on Electronic Commerce (EC-99),
    pp. 158-166.
Schafer, J.B., Konstan, J.A., & Riedl, J. (2000). Electronic Commerce
    Recommender Applications. Journal of Data Mining and Knowledge
    Discovery 5/1, pp. 115-152.
Stotts, P.D. & Furuta, R. (1991). Dynamic Adaptation of Hypertext
    Structure. In Third ACM Conference on Hypertext Proceedings,
    Association of Computing Machinery.
Yan, T.W., Jacobsen, M., Garcia-Molina, H., & Dayal, U. (1996). From
    User Access Patterns to Dynamic Hypertext Linking. In Fifth
    International World Wide Web Conference, May 6-10, 1996, Paris,
    France.
Zaki, M. & Hsiao, C.-J. (1999). CHARM: An Efficient Algorithm for
    Closed Association Rule Mining. RPI Tech. Report 99-10.