for arbitrary alphabet sizes
B.
Skoric, S.Katzenbeisser, M.U.Celik
Abstract
Fingerprinting provides a means of tracing unauthorized redistribution of digital data by individually
markingeachauthorized copy witha personalized serial number. Inorderto prevent agroup ofusers
fromcollectivelyescapingidentication,collusion-securengerprintingcodeshavebeenproposed. Inthis
paper,weintroduceanewconstructionofacollusion-securengerprintingcodewhichissimilartoarecent
constructionbyTardosbutachievesshortercodelengthsandallows forcodesoverarbitraryalphabets.
Forbinaryalphabets,nusersandafalseaccusationprobabilityof,acodelengthofm 2
c 2
0 ln(n=)
isprovablysuÆcienttowithstand collusionattacksofat mostc
0
colluders. ThisimprovesTardos'
con-structionbyafactorof10. Furthermore,invokingtheCentral LimitTheoremweshowthatevenacode
lengthofm 1
2
2
c 2
0
ln(n=)issuÆcientinmostcases. Forq-aryalphabets,assumingtherestricteddigit
model,thecodesizecanbefurtherreduced. Numericalresultsshowthatareductionof35%isachievable
forq=3and80%forq=10.
1 Introduction
1.1 Digital ngerprinting
Fingerprinting,orforensicwatermarking, providesameans oftracingthe unauthorized
redistri-bution of digitaldata, suchas entertainment content(i.e. music ormovie clips), digitalrecords
orsoftware. Beforeauthorized distribution, the distributorimperceptibly embeds a ngerprint,
whichplaystheroleofapersonalizedserialnumber,directlyintothecontent. Thisisdoneusinga
digitalwatermarkingalgorithm. Ifthengerprintisdierentforeachrecipient(alsocalled`user'),
the distributor can extract the embedded ngerprintfrom an unauthorized copy of thecontent
andtracetherecipientwholeakedit.
Mathematicallyspeaking, angerprint isa nitestring oversomeq-ary alphabet ; theset
ofall ngerprintsis called angerprintingcode. Throughout thispaperwewill denotebyn the
number of users and by m the length of the ngerprint. In order to mark a piece of content
before distribution, thedistributor picks angerprint from the codeand imperceptibly embeds
eachsymbolof the ngerprintinto dierent segmentsof the content, such asin dierent scenes
ofamovie. Inaddition,hestoresin adatabasetheassociationof angerprintwiththeidentity
of the user whoreceivedthe personalizedcopy. In casean unauthorized copyof thecontentis
found, the distributorcan perform watermark detectionon thesegments of thecontentto read
out its ngerprint. Once the ngerprint is retrieved, he can compare it with his database of
ngerprints to identify the guilty user. Current watermarking schemes provide a considerable
level of robustness that allows correct reconstruction of the ngerprint even if the content has
sueredheavydistortions.
1.2 Collusion resistance
Fingerprintingschemesneedto berobust againstcollusionattacks, whereseveraluserspool
dif-ferent individualized versions of thesame content. By looking at the dierencesbetween these
untraceableversionofthecontent,fromwhichthedistributorcannotidentifyanyofthecolluders.
Asegmentofthecontentiscalledadetectablepositionifthecolludershaveatleasttwodierently
markedversionsofthatsegmentavailable.
Acode iscalled collusion-resistantagainstacoalitionof sizec
0
, ifanyset ofcc
0
colluders
isunable toproduceanuntraceablecopy. Theconstructionof collusion-resistantcodeshasbeen
anactiveresearch topic sincethelate 1990s(see e.g. [5, 8,3,6, 9]). Theconstructionsand the
achievedresultsdependstronglyonvariousassumptionswhichrestrictthetypeofmanipulations
theattackersareallowedtoperform. Oneoftenmadeassumptionisthemarkingcondition,stating
thatthecolludersareabletochangengerprintsymbolsonlyindetectablepositions. Throughout
thispaperwewillassumethat themarkingconditionholds. Furthermore,severalattackmodels
havebeenintroduced intheliterature:
Therestricteddigitmodelornarrow-casemodelallowsthecolludersonlyto`mixandmatch'
their copiesof thecontent,i.e. to replaceasegment in adetectable position by anyother
segmenttheyhaveavailable in that position. On thengerprinting code level, this means
that in theunauthorizedcopythesymbolateachposition canonlybeone ofthesymbols
that theyhaveavailablein thatposition.
Theunreadable digit model allowsfor slightlystrongerattacks. Besides mixingthecontent
segments,theattackerscanalsoerasetheembeddedngerprintatdetectablepositions. At
thecodelevel,wedenotethisbyaspecialerasuresymbol?62.
Thearbitrarydigitmodelallowsforevenstrongerattacks: theattackerscanputanarbitrary
q-arysymbolfrom(butnottheerasuresymbol`?')in detectablepositions.
Thegeneral digit model allowstheattackerstoput anysymbol,including`?',in detectable
positions.
Note that in the case of a binary alphabet all four attack models are equivalent in terms of
traceability. (Forq=2itisdetrimental forthecolludersto use`?', sinceitgivesthedistributor
more information than a `0' or `1', namely that the position is a detectable position for the
coalition).
Themainparametersofangerprintingcodearethecodewordlength,theFalse Positive (FP)
errorprobabilityandtheFalseNegative(FN)errorprobability. Thecodewordlengthinuencesto
agreatextentthepracticalusabilityofangerprintingscheme,asthenumberofsegmentsmthat
can be used to embed angerprint symbolis severely constrained; typical video watermarking
algorithms forinstance canonly embed 7bits of informationin arobustmanner in oneminute
of a videoclip. Furthermore,the amount of information that can be embedded per segmentis
limited; hencethe alphabet size q must besmall (typicallyq 16). Obviously, distributors are
interestedintheshortestpossiblecodesthataresecureagainstalargenumberofcolluders,while
accommodatingahugenumbernofusers(oftheorderofn10 6
orevenn10 9
).
Lowerrorprobabilitiesareanothercentralrequirement. Themostimportanttypeoferroristhe
FP,whereaninnocentusergetsaccused. Theprobabilityofsuchaneventmustbeextremelysmall;
otherwise the distributor's accusations would be questionable, making the whole ngerprinting
schemeunworkable. Wewilldenoteby"
1
theprobabilitythatonespecicusergetsfalselyaccused,
while denotestheprobabilitythatthereareinnocentusersamongtheaccused. Thesecondtype
oferroristheFN,wheretheschemefailstoaccuseanyofthecolluders. TheFNprobabilitywill
be denoted as "
2
. In practicalsituations, fairly large values of "
2
can be tolerated. Often the
objectiveofngerprintingisto deterunauthorizeddistribution ratherthantoprosecuteallthose
responsible for it. Even amere 50% probability of getting caught is a signicant deterrent for
colluders.
1.3 Related work
For the restricted digit model, `deterministic' ngerprinting codes have been proposed. Here
(IPP)codes,introducedin[5],allowthedistributortoidentifyatleastonememberofthecoalition
with certainty, withoutthedanger of accusinginnocentpeople. However,theschemes proposed
in [5] are not resistant against more than two colluders. In [8] the existence was proved of a
deterministicngerprintingcoderesistantagainstc
0
colluders,havingcodelengthm=c 2
0 log
q (n).
However,thealphabetsizeisimpracticallylarge,requiringqn 1.
MoreeÆcientngerprinting schemes are possible ifnonzero error probabilities and "
2 are
tolerated. In [3] Boneh and Shaw presented a binary scheme (q = 2) with code length m =
O(c 4
0 log
n
log
1
). Their scheme uses concatenation of apartly randomized inner code with an
outer code. Theyalso proved, forbinary alphabets, alowerbound on thecodelength required
forresistanceagainstc
0
colluders: m>O(c
0 log
1
c0
). In[6]Peikertet al. provedatighterlower
boundofm>O(c 2
0 log
1
c
0
).
In[9] Tardosfurther tightenedthelowerbound to m>O(c 2
0 log
n
). This bound is valid for
arbitrary alphabets in the arbitrary digit model and the unreadable digit model. In the same
paper, he described a fully randomized binary ngerprinting code achieving this lower bound.
Thecodehaslengthm=100c 2
0 ln
n
;aconstructionwasgivenonlyforthebinaryalphabet. In[7]
Tardos'constructionwasfurther analyzed. Itwasshownthat, withoutchangingthescheme,the
constant`100'canbereducedto4 2
. Inthesamepaperitwasshownthatanimportantquantity
inthescheme(the`accusationsum',seeSection2.1),resultingfromthesummationofmanyi.i.d.
stochasticvariables, hasaGaussian distribution,uptocorrectiontermsthat vanishforlargec
0 .
WithoutchangingtheTardosschemeinanyway,butassumingaGaussiandistribution,thecode
lengthmwas furtherreducedtom=2 2
c 2
0 ln
n
.
1.4 Contributions and outline
In thispaper, we propose anew constructionof angerprinting code, which is similar in spirit
toTardos'originalcode,butallowsforcodesoverarbitrary-sizealphabets. Forbinaryalphabets
the new scheme allows for codes that are afactor 4shorter than the construction given by [7]
(andthusafactor10shorterthanthe schemegivenin [9]). Intherestricteddigitattackmodel,
movingfrom abinary to aq-ary alphabet allowsforevenshorter ngerprintingcodes. The key
contributionsofthepaperaresummarizedasfollows:
In Section 2 we review Tardos' binary ngerprinting scheme [9] and propose a dierent
construction, which is symmetric and which can be used for arbitrary alphabets . The
constructionisdierentfromTardos'codeevenforbinaryalphabets.
InSection4westudythecollusionresistanceofthesymmetriccode. Weapplythemethods
of [9]to rigorouslyprovealowerbound onthecodelength m, such that the desirederror
rates are achieved. The bound is given by m >4~ 2
c 2
0 ln
n
, where the quantity ~ is the
expectationvalueof thecoalition'scollective`suspiciousness'.
InSection 5wecomputetheexpectationvalue~in therestricteddigitmodel. Inthecase
of abinary alphabet wehave~=2=. Thiscorrespondstoaboundon thecode lengthof
m> 2
c 2
0 ln
n
, which isafactor4shorter thanthebound obtainedfortheTardosscheme
in[7]andafactor10shorterthantheboundgivenin[9]. Forq-aryalphabetswecompute
~
numerically. Thecodelengthmisfurtherreduced(withrespecttothebinarysymmetric
scheme)by40%forq=3andby80%forq=10.
InSection6wemakeuseoftheCentralLimitTheoremtoshowthatanimportantquantityin
thescheme,theaccusationsumofaninnocentuser,hasaprobabilitydensitythatisalmost
Gaussian. Convergenceto thenormalformimproveswithincreasingc
0
. Approximationof
the distributionby aGaussian isaccurate starting from avalueof c
0
between 10and 20.
Assuming aperfect normal distribution,we showthat the desirederrorrates areachieved
for m >2~ 2
c 2
0 ln
n
2 Symmetric Tardos ngerprinting for arbitrary alphabet sizes
InthissectionwerstintroduceTardos'initialbinary ngerprintingcode[9]and thenprovidea
generalizationforarbitraryalphabets.
2.1 The Tardos ngerprinting scheme
Letnbethenumberofuserstobeaccomodatedinthesystem. TheTardosngerprintingscheme
distributes a binary codeword of length m to each user; the length m is a system parameter
chosenbythedistributor. ItaectstheFPandFNerrorrates. Thedistributedcodewordscanbe
arrangedasannmmatrixX ,wherethej-throwcorrespondstothengerprintgiventothej-th
user. LetCbeasetofcolludingusers.WedenotebycthenumberofcolludersandbyX
C
thecm
matrixofcodewordsdistributedtothecolluders. Thecolludersusea(possiblynondeterministic)
strategy to create an unauthorized copy of the content from their personalized copies. The
unauthorizedcopycarriesangerprinty 2f0;1g m
which depends onboththestrategyand the
receivedcodewords,i.e. y =(X
C ).
Fingerprintcode generation . The distributor generates the matrix X in two randomized steps.
Inthe rststep,he chooses mrandom variablesfp
i g
m
i=1
overtheintervalp
i
2[t;1 t],where t
is axed small parametersatisfyingc
0
t 1. Thevariables p
i
are independent and identically
distributed according to the probability density function f. The function f(p) is symmetric 1
aroundp=1=2and heavilybiasedtowardsvaluesofpclosetot and1 t,
f(p)=
1
2arcsin(1 2t) 1
p
p(1 p)
: (1)
In thesecond step, thedistributor lls the columns of the matrixX by independentlydrawing
randombitsX
ji
2f0;1gaccordingtoP[X
ji
=1]=p
i .
Fingerprintembedding . Beforethecontentisrealeasedtocustomerj, itis watermarkedwiththe
j-throwofthematrixX.
Accusation. Havingspottedanunauthorizedcopywithembeddedwatermarky,thecontentowner
wantsto identifyat leastonecolluder. Toachievethis,he computesforeach user1j n an
accusationsumS
j as
S
j =
m
X
i=1 y
i U(X
ji ;p
i
); with U(X
ji ;p
i )=
g
1 (p
i ) ifX
ji =1
g
0 (p
i ) ifX
ji =0;
(2)
whereg
1 andg
0
arethe`accusationfunctions'
g
1 (p)=
r
1 p
p
and g
0 (p)=
r
p
1 p
: (3)
Thedistributordecidesthatuserj isguiltyifS
j
>Z. TheparameterZ iscalled the`accusation
threshold'. Thethresholdisasystemparameterchosenbythedistributor.
Inwords,theaccusationsumS
j
iscomputedbysummingoverallsymbolpositionsiin y. All
positionswithy
i
=0areignored. Foreachpositionwherey
i
=1,theaccusationsumS
j
iseither
increasedordecreased,dependingonhowmuchsuspicionarisesfrom thatposition: ifuserj has
a `1' in that position, then the accusation is increased by apositive amount g
1 (p
i
). Note that
thesuspiciondecreaseswithhigherprobabilityp
i
,sinceg
1
isapositivemonotonicallydecreasing
function. Ifuserjhasa`0',theaccusationiscorrectedbythenegativeamountg
0 (p
i
),whichgets
morepronouncedforlargevaluesofp
i , asg
0
isnegativeandmonotonicallydecreasing.
Tardos chose thespecic form(3) forthe functions g
1 and g
0
becauseit hasniceproperties:
For xed p
i
, the accusation U(X
ji ;p
i
) in (2) has zero mean and unit variance. Especially the
fact thatthevariancedoesnotdepend onp
i
greatlysimpliestheanalysisof thescheme. Itwas
1
In[9]theparametrizationp =sin 2
shown in [7] thatfor Tardos'schemethechoice (1)for f is optimal, andthat the choice (3) for
theaccusationfunctionsisoptimalwithintheclassoffunctionsoftheformp z1
(1 p) z2
,wherez
1
andz
2
areconstants.
TardoschosethesystemparametersmandZ asfollows:
m=Ac 2
0 dln"
1
1
e ; Z =Bc
0 dln"
1
1
e; (4)
withA=100andB =20. RecallfromSection1.2that theparameter"
1
isare-scaledversionof
thefalsepositiveerror parameter. It representstheprobabilitythat aspecicinnocentuserj
getsaccused. Therelationbetween and"
1
is =1 (1 "
1 )
n c
. For"
1
1and cnthis
becomes (n c)"
1 n"
1 .
AFalseNegative(FN)isdenedastheeventwherenoneofthecolludersareaccused. Tardos
provedin[9]thathisschemeachievesFPandFNerrorratessmallerthan"
1 and"
2
,respectively,
againstcoalitionsofsizecc
0 ,for"
2 ="
c
0 =4
1
. In[7]theTardosschemewasfurtheranalyzedand
thefollowingresultswereobtainedfor"
2 "
1
(afar morereasonablechoice ofparameters,see
Section1.2): (i)thecodelengthparameterAin(4)canbereducedto4 2
. (ii)Theaccusationsum
S
j
hasanalmostGaussian probabilitydensityfunction,withcorrectionsthatvanishin thelimit
c
0
!1. (iii)Assumingaperfect Gaussiandistribution forS
j
,theparameterA canbereduced
to2 2
. Hence,forsuÆcientlylargec
0
,thecodelengthmcanbesettom=2 2
c 2
0 dln"
1
1
ewithout
anymodicationofthecodeconstruction,embeddingoraccusationmethod.
2.2 Proposed symmetric ngerprinting scheme
TheschemepresentedinSection 2.1hastwodrawbacks. First,thecomputationofTardos'
accu-sationsum(2)isasymmmetricinthesensethatonlythosecodewordpositionsicontributewhere
y
i
=1,whilealltheothersarediscarded. ThisisanineÆcientwayofexploiting theinformation
presentin theunauthorizedcopy, becausethey
i
=0positionscarryasmuchinformation about
thecolludersasthey
i
=1positions. Second,dueto thisasymmetry,theconstructioncannotbe
directlyapplied tononbinaryalphabets.
WeapplytwomodicationstoTardos'construction. Therstmodicationisastraightforward
generalization of the ngerprint generation step to produce a random q-ary code. Instead of
bits we have X
ji
2 , with = f0;1;;q 1g. Instead of scalar random variables p
i we
have,independentlyforeachcolumn, aq-componentrandomvectorp (i)
=(p (i)
0
;;p (i)
q 1 ), with
P
q 1
=0 p
(i)
= 1. The vectors p (i)
have the probability density function F(p), which replaces
f(p). Whilef is invariantunder themappingp!1 p, ourfunction F is invariantunder any
permutation ofthesymbols2. Thusourconstructionissymmetricinallsymbols2. In
thei-thcolumnofX ,randomsymbolsaredrawnwithprobabilitiesdictatedbyp (i)
. Thecolluders
createanunauthorizedcopyy=(X
C )2
m
accordingtoa(possiblynon-deterministic)strategy
. We will always assume that this strategy does not depend on the column index, i.e. the
same strategy is applied to each column of X
C
. We can make this assumption without loss
of generality; due to the column symmetry of p and X, the best colluder strategy is
column-symmetric.
Thesecondmodicationliesinthecomputationoftheaccusationsum. IncontrasttoTardos'
scheme, we letevery ngerprintsymbolin the unauthorized copy giverise to accusations. The
accusation for acertain user at acertain symbollocation is positiveif he hasthe samesymbol
astheunauthorizedcopy;otherwiseitis negative. The magnitudeoftheaccusation depends on
the likelihood of thesymbolthat appears in the unauthorizedcopy. In full detailthe proposed
constructionisasfollows:
Fingerprintcode generation . Asin the originalTardos construction,thedistributorproducesan
nm matrixX ofq-arysymbols;the rowsof thematrixcorrespondto thengerprintsfor the
individual users. We parametrize m as in (4); the value of the parameter A is the subject of
Sections4,5and6. Again,thedistributorusesatwo-stepprocedure:
1. He generates m independentrandom vectorsp (i)
=(p (i)
;;p (i)
the components satisfy p (i)
2 [t=(q 1);1 t] and P
q 1
=0 p
(i)
= 1. We call t the `cuto
parameter'orthe`cuto'. Itsatises0<t1;weparametrizeitast=Tc a
0
,withT >0
and a2(0;2). Therandomvariableshaveaprobabilitydensityfunction that issymmetric
in allthecomponentsp
. Inourconstruction,weuseaclassoffunctionsthatare aspecial
caseoftheDirichletdistribution(seee.g. [4]),
F
qt
(p)=N 1
qt q 1
Y
=0 p
1+
with>0: (5)
Here N
qt
isa normalisingconstantensuring that R
J(t;q) d
q
pF
qt
(p) =1. The expression
R
J(t;q) d
q
p stands for R
1 t
t
q 1 dp
0
R
1 t
t
q 1 dp
q 1 Æ(1
P
q 1
=0 p
), where Æ() is the Dirac delta
function. The delta function ensures that the integration is done only over p such that
P
p
= 1. The parameter determines the steepness of F
qt
. For q = 2, = 1
2 the
function F
qt
reducestoTardos'densityfunction (1).
2. The distributorgenerates thecolumns of Xindependently. In thei-th column, the vector
p (i)
determinestheprobabilitiesofgeneratingeachspecic symbolin thealphabet:
P[X
ji
=]=p (i)
: (6)
Fingerprintembedding . Beforethecontentisrealeasedtocustomerj, itis watermarkedwiththe
j-throwofthematrixX.
Accusation. The distributor extracts the ngerprint y from the unauthorized copy. For each
userj, thedistributor computesthe`accusationsum'A
j
from X,p andy. He decidesthat the
userj isguilty ifA
j
>Z, where Z isreferredto asthe`accusationthreshold'. Weparametrize
Z asin (4),withtheconstantB asyetleftundetermined. Thelistofaccusedusersisdenotedas
(p;X ;y). TheaccusationsumA
j
isgivenby
A
j
(p;X ;y)= m
X
i=1 A
(i)
j
; A
(i)
j :=Æ
yi;Xji g
1 (p
(i)
yi
)+[1 Æ
yi;Xji ]g
0 (p
(i)
yi
); (7)
whereÆ
x;y
denotes theKroneckerdelta. Wehavechosenthesamefunctions g
0 (p)=
p
(1 p)=p,
g
1 (p) =
p
p=(1 p) asTardos. There is no guarantee that this choice is optimal for q > 2.
Thechoice ismotivated bythe zero-mean,unit-variancepropertymentionedin Section2.1; this
propertyleadsto asubstantialsimplication oftheanalysis inthecomingsections.
Inwords,theaccusation(7)iscomputedasfollows. Ifuserj hasthesamesymbolinposition
iastheunauthorizedcopy,thenheisaccusedbyapositiveamountg
1 (p
(i)
yi
),wheretheaccusation
decreaseswith growinglikelihoodof thesymbol. Ifuserj hasadierentsymbolthanthe
unau-thorizedcopy,thenheisaccusedbyanegativeamountg
0 (p
(i)
yi
),whichhasthelargesteectwhen
thesymboly
i
islikelytooccur.
Notethat (7)is fullysymmetric inthe symbolsand that itdiersfrom Tardos' construction
even forq =2. Note furtherthat theKroneckerdeltas in (7)reduce thesymbolspaceinto two
classes: X
ji =y
i and X
ji 6=y
i
. Inthelatter casetheaccusation doesnotdepend onthe actual
valueofX
ji .
Attackmodel . AsmentionedinSection1.2,weusethemarkingassumptionandweassumethatthe
restricteddigitmodelholds. Inaddition,wemaketwoassumptionsontheattackstrategyofthe
colluders. First,weassumethatallmembersofthecoalitionareequivalent. Hence,theybasetheir
decisionsonlyonthenumberofsymbolstheyreceive,andnotontheidentityofthememberswho
receivethem. (Anydeviationfrom thisstrategywillmakeiteasierforthedistributortoidentify
3 Preliminaries
InordertofacilitateourworkinSections4and5,weintroducesomenotationandstateanumber
oflemmas.
3.1 Normalisation constant
ThevalueofthenormalisationconstantN
qt
in(5)iseasilycomputedfort=0,usingthefollowing
lemma(seee.g. [1]):
Lemma 1: Letv bea vector oflengthq withv
>0for0q 1. Then
Z
J(0;q) d
q
p q 1
Y
=0 p
1+v
=B(v ):= Q
q 1
=0 (v
)
( P
q 1
=0 v
)
:
ThefunctionB isthegeneralizedBetafunction,alsoreferredtoasthemultinomialBetafunction
orDirichletintegral.
Proofsketch: Fortwocomponents(q =2) thelemmais true,astheintegralyieldstheordinary
Betafunction. Forhigherq thelemmacanbeprovedbyinduction.
Fort6=0,q=2theintegralyields theso-calledincompleteBetafunction.
ApplyingLemma 1tothedenition ofF
qt
in (5),wecomputethenormalisationfactorN
qt
fort=0tobe
N
q0 =
[ ()] q
(q)
: (8)
Remark: ThedierencebetweenN
qt andN
q0
issmall. Thisisseenasfollows. Theintegrandin
N
qt
is oftheform Q
p
1+
with>0. Theprimitivefunction nearapoleatp
=0scalesas
p
. Hencethecontributionsfromthepoles, presentin N
q0
andabsentinN
qt
,are ofordert
.
Ifisnotextremelycloseto0,thent
1.
3.2 Collective accusation sum
Let C be the set of colluding users and X
C
the restriction of X to the rows received by the
colluders. From(7)wedeneausefulquantity: the`collectiveaccusationsum'A
C
,beingthesum
ofallindividualaccusationsumsofthecoalitionmembers,
A
C =
X
j2C A
j =
m
X
i=1 A
(i)
C
; A
(i)
C :=b
(i)
y
i g
1 (p
(i)
y
i
)+[c b (i)
y
i ]g
0 (p
(i)
y
i
): (9)
Hereb (i)
standsforthenumberofoccurrencesofthesymbolincolumniofX
C
. Thesenumbers
satisfytheconstraint P
q 1
=0 b
(i)
=c. ThesumA
C
playsanimportantroleintheFNerrorrate.
3.3 Denition of averages
Thereare threestochasticprocessesinvolvedin thecreationofthengerprintingcodewordsand
the unauthorized copy: The distributor's choice of vectors p (i)
, his process of generating the
columns of X, and the coalition's choice of symbols y
i
. For each process we dene a separate
expectationvalue. Averagingoverpis denotedasE
p
. Withinthei-thcolumnthisisdened as
E
p [(p
(i)
)]:= Z
J(t;q) d
q
p(p)F
qt
(p); (10)
We remind the reader that all the vectors p (i)
are independent. We denote the codeword
receivedbyuserj asX
j
. Forgivenp,averagingoverX
j
isdenotedasE
X
j
. Wedene
E
X
j [(X
ji )]:=
q 1
X
=0 ()p
(i)
: (11)
Inparticularwehave,foraninnocentuserj,
E
X
j [Æ
y
i ;X
ji ]=p
(i)
yi
: (12)
Forxedp,averagingoverX
C
isequivalenttoaveragingovertheintegersb(seeEq.9). Theb (i)
aredistributed accordingtoamultinomialdistribution. Wehave
E
b [(b
(i)
)]:= X
b (b)
c
b
q 1
Y
=0 [p
(i)
]
b
: (13)
Thenotation c
b
standsforthemultinomialc!=(b
0 !b
q 1
!). Thesum P
b
standsforsummation
overallqcomponentsofb,withthecondition P
b
=cimplicitlyassumed,
X
b
(b)= c
X
b0=0
c
X
bq 1=0 Æ
c;b0+b1++bq
1
(b): (14)
Finallywehaveto dealwiththestochastic strategyof thecoalition. We introduce thenotation
P
b
()fortheprobabilitythatthecolludersoutputthesymboly =in acertainposition,given
thattheyreceivedsymbolsaccordingtob. Averagingovery isdenoted asE
y ,
E
y [(y
i )]:=
q 1
X
=0 ()P
b (i)
(): (15)
Theexpectationvaluetakenoverallstochasticdegreesoffreedomisdenoted asE
yXp
. It canbe
computedbyrsttakingtheexpectationvalueE
y
(15)forxedb,thenforxedptakingE
b (13)
andE
X
j
(11)forallinnocentusersj,andnallyE
p
(10). Notethatseveralorderingsarepossible.
Forinstance, theexpectationE
Xj
(forinnocentj)canbetakenbeforeE
y and E
b
,sincey andb
donotdependonthecodewordsgiventoinnocentusers.
3.4 Statistical properties of the accusation sums
To facilitate the analysis in the coming sections we introduce `scaled' averages and variances,
denedsuchthat theydonotdependonm. Foraninnocentuserj wedene
~
j =
E
yXp [A
j ]
m
=E
yXp [A
(i)
j
] ; ~
2
j =
E
yXp [A
2
j ] E
2
yXp [A
j ]
m
: (16)
Forthecollectiveaccusationwedene
~ =
E
yXp [A
C ]
m
=E
yXp [A
(i)
C
] ; ~
2
= E
yXp [A
2
C ] E
2
yXp [A
C ]
m
: (17)
Thecolumnindexiin A (i)
C
in(17)andA (i)
j
in (16)canbechosenarbitrarily;theresultdoesnot
depend on i. The quantites ~
j , ~
j
and ~ are discussed below, whereas Section 5is devoted to
computing .~
Proof: WeevaluatetheexpectationE
yXp
byrstcomputingtheexpectationE
X
j
. Weapply(12)
to the denition of A (i)
j
(7). This gives E
Xj [A
(i)
j ] = p
(i) y i g 1 (p (i) y i
)+(1 p (i) y i )g 0 (p (i) y i
) = 0. The
last equality follows from the denition (3) of g
1 and g
0
. From E
Xj [A
(i)
j
] = 0 it follows that
E
yXp [A
(i)
j
]=0.
Lemma 3: For aninnocent user j wehave~
j =1.
Proof: UsingtheidempotencyoftheKroneckerdeltas inthedenitionofA (i)
j
in (7)wewrite
A 2 j = m X i=1 ( Æ y i ;X ji 1 p (i) yi p (i) yi
+(1 Æ
y i ;X ji ) p (i) yi 1 p (i) yi ) + X 1i;k m i6=k A (i) j A (k ) j : (18)
WeevaluateE
yXp
byrstcomputingtheexpectationE
X
j
. Usingproperty(12)andindependence
ofthecolumnsofX ,weget
E X j [A 2 j ]= m X i=1 E X j [1]+ X i;k ;i6=k E X j [A (i) j ]E X j [A (k ) j
]=m+0: (19)
HerewehavemadeuseofthepropertyE
Xj [A
(i)
j
]=0(seeproofofLemma2). FromE
Xj [A
2
j
]=mit
followsthatE
yXp [A
2
j
]=m. Thedenitionof~
j
in(16)canberewrittenas~
j =(1=m)E yXp [A 2 j ] m~ 2 j
. SubstitutionofE
yXp [A
2
j
]=mintothisexpressionandapplicationofLemma2gives~
j =1.
Lemma 4: Themean~andvariance~satisfy
~ 2
+~ 2
<qc: (20)
Proof: Fromthedenitionsof,~ ~(17)andA
C ,A
(i)
C
(9)itfollowsthat
~ 2 = m 1 E yXp [A 2 C ] m~
2 = m 1 0 @ m X i=1 E yXp [fA (i) j g 2 ]+ X i6=j E yXp [A (i) j ]E yXp [A (k ) j ] 1 A m~ 2 = E yXp [fA (i) C g 2 ] ~ 2 : (21)
UsingtheidempotencyoftheKroneckerdelta,wewrite
fA (i) C g 2 = q 1 X =0 Æ y [b g 1 (p
)+(c b
)g 0 (p )] 2 : (22)
WeapplythetotalaverageE
yXp
asdescribedinSection3.3,byrstperformingE
y
,thenE
X and
nallyE
p
. Weget
E yXp [fA (i) C g 2 ]= q 1 X =0 c X b=0 c b E p h p b (1 p ) c b E bnb [P b ()]fb g 1 (p
)+[c b
]g 0 (p )g 2 i : (23)
HerethenotationE
bnb
indicatesaveragingoveralldegreesoffreedominbexceptb
. Notethat
the expression in E
p
[] is always nonnegative. So, using E
bnb [P
b
r.h.s. of(23)by
E
yXp [fA
(i)
C g
2
] < q 1
X
=0 E
p "
c
X
b=0
c
b
p b
(1 p
)
c b
fb
g
1 (p
)+[c b
]g
0 (p
)g
2 #
= q 1
X
=0 E
p
[c]=qc: (24)
The rstequality is obtainedby observing, asin [9], that theb
-sum representsthe resultof a
randomwalkconsisting ofc steps,each ofwhich haszeromeanandunit variance. (Thisfollows
fromLemmas 2and3).
Remark: In Section5it willbeshownthat ~ doesnotincreaseasafunction ofc. Lemma 4
then showsthat ~=O( p
c) forc !1. This asymptoticbehaviourof~ will play animportant
rolein Section6.
4 Lower bound on the code length in the proposed symmetric scheme
Here weanalyze thesymmetric scheme described in Section 2.2. Weprovidea lowerbound on
thecodelengthm,asafunctionofthemaximumcoalitionsizec
0
andthemaximumtolerableFP
andFNerrorprobabilities"
1 ,"
2
. Wedenethefollowingtwoproperties:
Property 1: Wesay that angerprintingscheme that generatesalist of accusedusers has
Property 1for acertainxedvalue "
1
if,for allinnocentusers j,allcoalitionsC withj2=C,and
allcoalition strategies,the following holds:
P[FalsePositive]=P[j2]<"
1
: (25)
Property 2: Wesay that angerprintingscheme that generatesalist of accusedusers has
Property 2 for certain xedvalues c
0 ,"
2
,if, for all coalitions C of sizec c
0
,and all coalition
strategies, itholdsthat
P[FalseNegative]=P[C\ =;]<"
2
: (26)
Ourbound onthecodelengthis anasymptoticresultforc
0
1. Weformulateitasfollows:
Theorem 1: Letthecode lengthmandtheaccusation thresholdZ ofoursymmetric
nger-printingscheme bechosen as
m=Ac 2
0 dln"
1
1
e ; Z=Bc
0 dln"
1
1
e (27)
with"
1
2(0;1]axedparameterand
A=4~ 2
(1+Æ) 2
; B=4~ 1
(1+Æ); (28)
where ~ is dened in (17). Let "
2
2 (0;1] be a xed parameter. For all Æ > 0 there exists a
suÆciently large c
0
suchthat the symmetric ngerprintingscheme has Property 1for parameter
"
1
andProperty 2for parametersc
0 ;"
2 .
Thus,accordingto Theorem1,forlargec
0
acodelength
m>4~ 2
c 2
0 dln"
1
1
e (29)
guaranteesresistanceagainstcoalitionsofsizecc
0 .
In Sections 4.1, 4.2 and 4.3 we present a proof of Theorem 1following the approach of [7],
with minormodications. First,conditionsonA, B and c
0
are derived forachievingProperties
1and 2. Then the lowest valueof A is identied within thespace of allowed parameters. The
valueof~isdeterminedinSection5. Atthispointwealreadymentionthat0<lim
c0!1 ~ <1.
Hence(29)hastheasymptoticbehaviourm=O(c 2
4.1 Conditions for satisfying Property 1
Weconsideraxedinnocentuserj. Weintroduceanauxiliaryvariable
1
>0thatallowsusto
usetheMarkovinequality,
P[j2]=P[A
j
>Z]=P[e 1 A j >e 1 Z ] E X j [exp( 1 A j )] exp( 1 Z) : (30)
Dueto theindependence ofthecolumnsofXwecanwrite E
Xj [exp( 1 A j )]= n E Xj [exp( 1 A (i) j )] o m
. Inwhat follows, we will alwaysrestrict
1 such that 1 A (i) j 1:7. This
allowsus tousethefollowing(easilyveried)inequality
e u
<1+u+u 2
foru1:7; (31)
sothat wecanwrite
E Xj [e 1 A (i) j
]<1+
1 E Xj [A (i) j ]+ 2 1 E Xj [fA (i) j g 2 ]: (32)
We enforce the restriction
1 A
(i)
j
1:7 for all realisations of the stochastic p, X and y. For
negativeA (i)
j all
1
>0are allowed. ForpositiveA (i)
j
wemusthave
1
<1:7=g
1 (p
y ). As g
1 isa
monotonouslydecreasingfunction,thestrongestrestrictionon
1
occursforp
y =p
min
=t=(q 1).
Hencewerestrict
1
to theinterval(0;1:7=g
1 (
t
q 1 )].
FromLemmas2and3weknowthatE
X
j [A
(i)
j
]=0andE
X j [fA (i) j g 2
]=1forinnocentj; thus
(32)yields E
Xj [e
1A (i)
j
]<1+ 2
1
. Next weapplytheinequality
1+u<e u
foru6=0 (33)
towrite E
Xj [exp(
1 A
j
)]<exp(m 2
1
). Substitutioninto(30)gives
P[j2]< min
12(0;1:7=g1( t q 1 )] e 1(m1 Z) : (34)
Fillingin theexplicitformformandZ (27)into (34)weget
P[j2]< min
1 2(0;1:7=g 1 ( t q 1 )] " c 0 1 (B c 0 A 1 ) 1 : (35)
Theminimumliesat
1
=B=(2c
0
A),providedthat theupperboundon
1
islargeenough. The
condition1:7=g 1 ( t q 1 ) 1
canberewrittenas
c 0 " B
3:4A 2 q 1 T # 1 2 a 1 Tc a 0 q 1 1 2 a " B
3:4A 2 q 1 T # 1 2 a ; (36)
wherewehaveusedtheparametrisationt=Tc a
0
. Substitutonof
1
into(35)gives
P[j2]<" B
2
=4A
1
: (37)
HenceasuÆcientconditionforProperty1tobesatisedisthat(36)holds andthat
B 2
4.2 Conditions for satisfying Property 2
Westartwithalemmathat helpsustoupperboundtheFNerrorrate.
Lemma 5: LetC beacoalition ofsizecc
0
. We have
P[C\ =;]P[A
C
<cZ]P[A
C <c
0
Z] (39)
Proof: TheeventC\=;impliesA
C
<cZ.
Remark: A
C
<cZ doesnot implyC\ =;. It canhappenthat A
C
<cZ whilesomebody in
thecoalitiondoesgetaccused.
Nextweintroduceanauxiliaryvariable
2
>0that allowsustousetheMarkovinequality,
P[A
C <c
0
Z]=P[e 2AC >e 2c0Z ]< E yXp [exp( 2 A C )] exp( 2 c 0 Z) : (40)
The columns of X are independently generated, and thecolluder strategy is the same for each
column. This allowsus to write E
yXp
[exp(
2 A
C )]= fE
yXp [exp( 2 A (i) C )]g m
. We restrict
2 such that 2 A (i) C
1:7, allowingus to apply inequality (31) to bound the exponential. This
gives E yXp [e 2A (i) C
]<1+
2 ~ + 2 2 (~ 2 +~ 2 ); (41)
where wehaveusedthedenitions (17). Therestriction
2 A
(i)
C
1:7 holdsfor anyrealisation
ofpand X. Thesmallest(most negative)achievablevalueofA (i) C isc 0 g 0 (p max y )=c
0 g
0
(1 t)=
c
0 p
(1 t)=t. Hence theconditionon
2
issatisedfor
2 max 2 =1:7c 1 0 p
t=(1 t): (42)
FromLemma 4weknowthat ~ 2
+~ 2
<qc. Thuswehavefrom(41)
E yXp [e 2 A C
]<(1
2 ~ + 2 2 qc 0 ) m <e m 2 ~ (1 2 c 0 q=~) : (43)
In thelast inequality wehave made use of (33). Substitution of (43)into (40)and minimizing
over 2 gives P[A C <c 0
Z]< min
2 2(0; max 2 ] e
2[m~(1 2c0q=~ ) c0Z]
: (44)
We choose m and Z such that m~(1 max
2 c
0
q=~) > c
0
Z. Hence the minimum in (44)occurs
at
2 =
max
2
. Substitutionof(27)into(44)andevaluation at max 2 gives P[A C <c 0 Z]<"
1:7c 0 p t 1 t [A~(1 1 ) B] 1 ; (45)
where wehaveintroducedthenotation
1 =1:7
q
t
1 t
q=~. TosatisfyProperty2,(45)mustnot
belargerthan"
2
. Hence Property2issatisedif
A~(1
1
) B
2
; (46)
wherewehavedened
2 = p 1 t 1:7c 0 p t ln" 2 ln" 1 : (47)
Notethat theparameters
1 and
2
goto zeroforc
4.3 Final step in the proof of Theorem 1
WeusetheresultsofSections4.1and4.2 toproveTheorem1. Theconditions(38)and(46)can
berewrittenasanintervalforAsuchthat Properties1and2are bothsatised,
B+
2
~ (1
1 )
A
B 2
4
: (48)
A solutionexists onlyifthe r.h.s. isnotsmallerthan thel.h.s. in (48). Wewishto identify the
smallestvalue of A forwith a solutionexists. This occurs whenthe l.h.s. is equalto the r.h.s.
Solvingthequadraticequationin B gives
B =
4
~
(1+) with := 1+
p
1+
2 ~ (1
1 )
2(1
1 )
1 (49)
A =
B 2
4 =
4
~ 2
(1+) 2
: (50)
Finally, Theorem1 followsby setting theparameterÆ in (28)equalto the expression in (49),
whichgoestozeroin thelimitc
0
!1.
5 The expectation of the collective accusation sum
Aswasshownin Section4,theaveragecollectiveaccusation~playsacentralroleindetermining
thecodelengthmrequiredforcollusionresistance. Inthissectionwecomputethevalueof ~in
therestricteddigitmodel. (OtherattackmodelsarediscussedinSection7.2). Unfortunatelythe
computationsaretedious. Werstderiveageneralresultin Section5.1, forallalphabet sizesq,
allvaluesofthesteepnessparameterandallcolluderstrategies. Thisresulttakestheformofa
(q 1)-dimensionalsumoverallpossiblesymbolfrequenciesbreceivedbythecolluders. Then,in
Section5.2weinvestigatethespecialcase(q=2;= 1
2
),preciselycorrespondingtothechoiceof
parametersofTardos[9](but notthesameaccusationmethod). Itturnsoutthatoursymmetric
accusationmethodyieldsanimprovementofafactor4inthecodelength. InSection5.3westudy
thecaseq=2forarbitrary. It turns outthatfor q=2, thechoice = 1
2
isoptimal, aresult
thatwasobtainedfortheoriginalTardosconstructionin[7]. Finally,inSection5.4,wecomeback
tothenonbinarycaseq>2.
5.1 Sum representation of ~
According to the denition (17), ~ is dened as the expectation value E
yXp [A
(i)
C
]. We follow
the procedure outlined in Section 3.3: We rst compute the expectation value with respect to
the colluderstrategy, then w.r.t. the matrixX
C
and nally w.r.t. the vectorsp (i)
. Since itis
understoodthattheresultsareidentical foreachcolumnofX
C
,wewillomitthecolumnindexi
onthequantities y,pandbfornotationalsimplicity.
Weregardy asa(possiblystochastic)strategy-dependentfunctionofb=
(b
0 ;;b
q 1
)only. Thecolluders'strategycannotdepend onp,sincetheydonotknowp. We
assumethat is notinuencedby thecolluders'identities, i.e. theirdecisionsare purely based
onhowmanyinstancesofeachsymbolwerereceived,notbywhomtheywerereceived. Usingthe
notationintroducedin (15),wehave
E
y [A
(i)
C ]=
q 1
X
=0 P
b ()fb
g
1 (p
)+[c b
]g
0 (p
)g=
q 1
X
=0 P
b ()
b
cp
p
p
(1 p
)
: (51)
Nextweaverageoverbandp. Applying(13)and(10)to(51),weobtain
~ =
X
c
b
q 1
X
=0 P
b ()
Z
J(t;q) d
q
pF(p) q 1
Y
p b
b
cp
p
p
(1 p
)
Wefurther evaluatetheintegral R
d q
pfort =0. As discussedin Section 3.1, theerrorresulting
fromintegrationoverJ(0;q)insteadofJ(t;q)issmall. Furthermore,wewillseeinSection7.1that
setting t=0 isallowedfor q3in the Gaussian approximation. Firstwesplit theintegration
intotwoparts: p
andtheremainingq 1components
Z J(0;q) d q p= Z 1 0 dp Z 1 p 0 d q 1
pÆ(1 p
X 6= p ): (53)
Note thatthe upperbound onthe secondintegration intervalis reducedfrom1to 1 p
. This
prevents us from directly applying Lemma 1. For all 6= we write p
= (1 p
)s , with s
2(0;1)and P
6= s
=1. Thissubstitutionhasthefollowingeect,
Z J(0;q) d q p = Z 1 0 dp (1 p ) q 2 Z 1 0 d q 1 sÆ(1 X 6= s )
F(p) = N 1 q0 p 1+ (1 p )
( 1+)(q 1) Y 6= s 1+ q 1 Y =0 p b = p b (1 p ) c b Y 6= s b : (54)
HerewehaveusedthepropertyÆ(ax)=jaj 1
Æ(x)forconstanta6=0. Substituting(54)into(52)
andapplyingLemma1totheq 1degreesoffreedoms
weobtain ~ = N 1 q0 X b c b q 1 X =0 P b () Q 6=
(+b
)
(c b
+[q 1])
Z 1 0 dp p b 3 2 + (1 p ) c b 3 2 +[q 1] (b cp ): (55)
Finally,thep
-integralisevaluatedaswell,yieldingordinaryBetafunctions,
~ =
(q)
[ ()] q
cc!
(c+q) X b [ q 1 Y =0
(+b
)
(1+b
)
] (56)
q 1 X =0 P b () (b 1 2 +) (b +) (c b 1 2
+[q 1])
(c b
+[q 1]) 1 2 b c (1 q) :
Herewehaveused(8)forthenormalisationconstantN
q0
. Expression(56)israthercomplicated.
Onepropertyof (56)canbeseeneasily,however: Forc1,the leadingorder termsof~areof
order 1, and do notdepend on c. This is readilyseen by writingb
=cw
, with w
2[0;1],
then applying the Stirling approximation (x+1) p
2x(x=e) x
to all Gamma functions and
collecting powers of c. For the quotients of Gamma functions appearing in (56) we have the
proportionality (b +v 1 )= (b +v 2 ) /c
v1 v2
and (c b
+v
1
)= (c b
+v
2 ) /c
v1 v2 for constantsv 1 ;v 2
c. The sum P
b
givesriseto a factorc q 1
, sinceit canbeapproximatedby
an integral R c 1 d q bÆ(c P b )c
q 1 R 1 0 d q w Æ(1 P w
). The correctionsarising from the
summation terms where thecondition b
1doesnot hold are negligible,since thesupport is
negligiblecomparedto thefullsummation P
b .
Thefact that~hasanitevalueinthelimitc!1showsthattheasymptoticbehaviourof
(27)is givenbym/c 2
0
,withoutfurther dependence onc
0
arisingfrom.~
5.2 The case q =2, = 1
2
This case corresponds to the probability density function in the original Tardos construction,
F(p
0 ;p
1 )/(p
0 p 1 ) 1=2 Æ(1 p 0 p 1
). Notethatforq=2,= 1
in (56)vanishes. However,~doesnotcompletely vanish,sincefor(q =2;b
=c)the expression
(c b
1
2
+[q 1])is divergentinthelimit! 1
2
. Wehave
lim
!1=2 (
1
2 +) (
1
2
+)= lim
!1=2 (
1
2
+)=1: (57)
Hence, the only terms contributing in the b-sum in (56) are those where b
= c. Because of
themarkingcondition, P
b
()=1forthese terms,asthe coalitiononlysees thesymbol. The
complicatedexpression(56)reducestoaconstant:
~ =
(1)
[ ( 1
2 )]
2 1
X
=0 1=
2
: (58)
Substitution into(27,28)givesthefollowingasymptoticbound onthecodelength,
m> 2
c 2
0 dln"
1
1
e: (59)
This bound is4times lowerthanthe bound obtainedin [7] and 10times lowerthanthebound
in[9].
5.3 The case q =2, 6= 1
2
Nextwestudyhowthesymmetricbinaryschemeperformsfor6= 1
2
. Substitution of q=2into
(56)gives
~
=
(2)
[ ()] 2
( 1
2 )c
c 1+2 c
X
b1=0
c
b
1
B(b
1 1
2
+;c b
1 1
2 +)
1+ 2
c [b
1 P
b
(0)+(c b
1 )P
b (1)]
; (60)
where B denotes theBetafunction. From(60)wecanidentify which colluderstrategy forces
thecontentownerto usethelongestpossiblecode. Wedenotethis`extremal'strategyas
2 . We
remindthereaderthatm/~ 2
. Hence,inordertomaximizem,thestrategy
2
hastominimize
thesummandin(60)foreachb. Notethattheb
1
=0andb
1
=ccontributionstothesummation
arenotaectedbythestrategy. For1b
1
c 1theBetafunctionin (60)ispositive. Hence,
thefactor( 1
2
)infrontofthesummationdeterminestheoverallsignofthestrategy-dependent
contributions. For < 1
2
, thisfactorispositive,so thecolluderswishto minimizetheexpression
[b
1 P
b
(0)+(c b
1 )P
b
(1)]. Theyachievethisbychoosingthesymbolthatappearsmostfrequently,
i.e. byapplying`majorityvoting' tothe0sand1sthat theyreceiveinacolumn. For> 1
2 , the
factor 1
2
hastheoppositesignandtheextremalstrategy
2
isminorityvoting.
Notethat
2
isnotnecessarily thestrategythat thecoalitionactually applies. However,the
distributor has to takeinto accountthat thecolluders could be using
2
, and he has to choose
hiscodelengthmaccordingly. Weareinterestedinthis`extremal'strategybecauseouraimisto
deriveasharplowerbound onm.
Fig.1shows~asafunction offorthestrategy
2
. Thedashedlinecorrespondstothevalue
2=obtainedin theprevioussection. Itisclearthat = 1
2
istheoptimum. Attheoptimumwe
have~=2=,independentofc. Thepartofthecurvewith< 1
2
hardlydependsonc. Thepart
with> 1
2
becomessteeperwithincreasingc.
5.4 Non-binary alphabet
We now return to the general expression for ~ given in (56). We work in the restricted digit
0.35
0.4
0.45
0.5
0.55
0.6
0.3
0.35
0.4
0.45
0.5
0.55
0.6
κ
µ
~
2/
π
Fig.1: ~as afunction offorq=2,c=80, giventhe`extremal'strategy
2 .
Note that the sum P
P
b
()() in (56)represents an average over . Weobtain alower
boundforthesumfrom thefactthat anaverageisat leastasbigasthesmallestelementin the
summation. Thuswehave
~
(q)
[ ()] q
cc!
(c+q) X
b [
q 1
Y
=0
(+b
)
(1+b
)
] (61)
min
jb
6=0
(b
1
2 +)
(b
+)
(c b
1
2
+[q 1])
(c b
+[q 1])
1
2
b
c
(1 q)
:
As wehave assumed the restricteddigit model, the minimum is takenonly overthose symbols
thatthecolludershavereceived.
Eq.(61)allowsusto identify the `extremal'colluder strategy
q
, which forcesthe distributor
to use the largest code length m. For each b separately, the colluders choose such that the
expressionfollowing`min
'isminimized.
Forq 10 and axed coalition size c = 20wehave numericallycomputed ~ as afunction
of for the
q
strategy, i.e. taking the equality in (61). For large q and c the numerics are
computationallyexpensive,sincethenumberoftermsintheb-summationisoforderc q 1
. Fig.2
shows ~ as a function of the steepness parameter. For q 7the maximum of the curve lies
slightly to the right of = 1=q. For q 8 an extra hump is visible. The hump is a `nite c
eect';itdoesnotexistwhentheratioq=cissmall. Fig.3showshow~varieswhencisincreased:
Thepartof thecurveat <1=qis unaected, whilefor >1=q thecurvegoesdownwardand
convergestoanitevalue.
Weusethenumericalresultsfor~to estimatetherequiredcodelength(m/~ 2
). Wegive
estimatesfortheadvantagethataq-arycodegivesoverthesymmetricbinarycodewith=1=2.
Thecomparisonwiththebinarycasecanbedoneinseveralways,dependingonthedetailsofthe
watermarkembedding. Wegivethetwoextremecomparisonmethods:
1. Countingthenumberofsymbols. Aq-arysymboloccupiesasmuchspaceinthecontentasa
binarysymbol,regardlessofq. Fig.4showsthe
q arycase
binarycase
ratioforthenumberofsymbols.
Thisratioisgivenby4=( 2
~ 2
).
2. Countingthenumberofbits. Aq-arysymboloccupieslog
2
qtimesmorespaceinthecontent
thanabinarysymbol. Inthiscaseitisnotfairtocomparecodelengthexpressedinsymbols.
Onehastocountbits. Fig.5showsthe
q arycase
binarycase
ratioforthenumberofbits. Thisratiois
givenbylog q4=( 2
~ 2
0.1
0.2
0.3
0.4
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
c=20; q=3...10
κ
q=3
q=10
µ
~
Fig.2: ~asafunctionofforseveralalphabetsizesq. Thecoalitionsizeisc=20. Thecolluders
employthe`extremal'strategy.
0.15
0.2
0.25
0.3
0.35
0.4
0.8
0.9
1
1.1
q=6; c=20,30,40,50,60,70,80
κ
c=20
c=80
µ
~
1/q
Fig.3: ~asafunctionofforq=6atseveralcoalitionsizes. Thecolludersemploythe`extremal'
strategy. Thedashedhorizontallineliesat(2=) p
log
2
6. When~liesabovethisline,the
0.1
0.2
0.3
0.4
0.2
0.4
0.6
0.8
1.0
c=20; q=3...10
κ
relative #symbols
q=3
q=10
Fig.4: Number of symbols in the codewords, relative to the binary case, for several alphabet
sizesq. Thecoalitionsize isc=20. Thecolludersemploythe`extremal'strategy.
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.6
0.8
1.0
1.2
1.4
c=20; q=3...10
κ
relative #bits
q=3
q=10
Fig.5: Numberofbits in thecodewords,relativeto thebinary case,forseveral alphabet sizesq.
Type1isthemostoptimisticcomparisonpossible,in thesensethat itallowsforthelargest
im-provementsw.r.t. thebinaryscheme. Type2comparisonisthemostpessimisticpossible. Without
givingafullargument,westatethatinthecaseofvideowatermarkingtype1ismoreappropriate,
evenforlargealphabets. When,forinstance,symbolsareembeddedusingaspread-spectrum
wa-termark, whereeach spreadingsequence correspondsto adierentsymbolin thealphabet, then
thesegmentlengthcanbekeptalmostindependentofqwithoutdecreasingdetectionperformance.
Forcompletenesswegivetheresultsforbothcomparisons. ThehorizontaldottedlineinFig.2
indicates the threshold for comparison of type 1. When ~ risesabove this threshold, the q-ary
schemeneedsfewersymbolsthanthebinaryscheme. Thethickpieceofeachcurveindicates the
regionwhere the q-aryschemeis better thanthe binary, using comparisontype2. Fig. 4shows
thecodelengthm(thenumberofsymbols)asafunctionof,foranumberofqvalues,andFig.5
similarlyshowsmlog
2
q,thenumberofbits. Bothgraphshavetheirverticalaxisnormalisedsuch
thatlengthsaredividedbycorrespondinglengthsinthebinaryscheme. Inbothgraphsthenite-c
humpsarevisible. Not takingthehumpsintoaccount,weseethat for3q10thenumberof
symbolsisreducedby40%{80%w.r.t. thebinarycase,whilethereductioninthenumberofbits
is 11%{30%. Finite-c eects further improvethese results. Weconclude that in our symmetric
schemeitisadvantageoustousethelargestpossiblealphabetallowedbythewatermarkingmethod
employed.
6 The Gaussian approximation
6.1 Motivation
In this section we analyse the performance of the symmetric scheme using what we call the
`Gaussianapproximation'. Bythiswemeantheassumptionthat theaccusationA
j
forinnocent
j hasaGaussianprobabilitydensityfunction. Theassumptionismotivated bytheCentralLimit
Theorem(CLT):whenalargenumberofi.i.d. variablesaresummed,thedistributionofthesum
converges to the normal distribution. The CLT applies when the moments of the summands'
distribution meetcertainconditions. Themomentsalso determinetherateofconvergenceto the
normalform.
Theaccusation A
j
is computed by taking thesum overm independent accusations, each of
which is based on a single symbol y
i
in the unauthorized copy. All the separate accusations
havethesameprobabilitydistribution. Thenumberofsymbols,m,islargeenoughtoguarantee
`suÆcientlyfast'convergenceto thenormalform. This informalstatementismade moreprecise
in Appendix C, where we derivea lowerbound on c
0
asa function ofq. Whenc
0
isabovethis
bound,the deviations from thenormal form become `small enough'in the central region of the
A
j
-distribution function. It turns out that the bound approximatelylies betweenc
0
=10 and
c
0
=20. Henceconvergenceisfastenoughin manypracticalsituations.
InSection6.2weanalysethesymmetricschemeundertheassumptionthatA
j
hasaGaussian
distribution. Weobtainalowerboundonmthatisafactor2smallerthanTheorem1.
Inthediscussionof theCLTin Appendix Citturns outthat forq3thecutoparameter
t can be sent to zero without causing any divergences. The cuto parameter is discussed in
Section7.1.
6.2 Lower bound on the code length
Theorem 2: Let A
j
have a Gaussian probability density. For all Æ > 0 there exists a
suÆciently large c
0
suchthat Property 1 issatised for parameter "
1
andProperty 2is satised
for parametersc
0 ;"
2
whenthe code lengthis
m>2~ 2
(1+Æ)c 2
0 ln"
1
1
: (62)
Notethatthiscodelengthisafactor2lowerthantheoneinTheorem1. Werstgiveaninformal
µ
m
1/2
/c
~
1
Z/m
1/2
σ
~
/c
ε
1
ε
2
0
Fig.6: Sketchof theprobabilitydensityofA
j =
p
m(left) and 1
c A
C =
p
m(right). Theaccusation
thresholdZ andtheerrorrates"
1 and"
2
arealsoshown.
Informalargument: IftheprobabilitydensityofA
j
isknown,then that knowledgeallowsus
tocomputetheFPandFNerrorratesasafunctionof~
j ,~
j
,,~ ,~ mandZ. Thisissketchedin
Fig.6. TheleftcurveistheprobabilitydensityofthequantityA
j =
p
m. Ithasmean~
j
=0and
variance~
j
=1(seeLemmas 2and3). Theerrorrate"
1
is givenbytheareatotherightofthe
(rescaled)thresholdZ = p
m. Therightcurveistheprobabilitydensityofthequantity 1
c A
C =
p
m.
It has average 1
c ~ p
m and variance =c.~ The error rate "
2
is given by the area to the left of
Z = p
m. Thehorizontalaxisisscaled such thattheA
j
-curvedoesnotdepend oncand m. Ifwe
set ourselvesthe goalof having xed error ratesfor arbitrary c, twoobservations canbe made
fromFig.6:
Inordertohaveaxed"
1
forallc,thethresholdlineZ = p
mmustnotshift. HenceZ must
bechosenasZ/ p
masfarasthedependence oncisconcerned.
Whenc increases,therightmostcurvebecomesnarrowerandshiftstotheleft. Inorderto
prevent"
2
fromvanishing,mmustbechosenproportionalto c 2
.
Fromthisinformalargumentweobtaintheproportionalitym/c 2
0
in(62),butnottheconstants
andthelogarithmicdependence on"
1 .
ProofofTheorem2: Let
1 and
2
bethedensityfunctionsofA
j andA
C
,respectively,rescaled
suchthattheybothhavezeromeanandunitvariance. Wedenecumulativedistributions inthe
tails,
G
1 (x)=
Z
1
x dx
0
1 (x
0
) ; G
2 (x)=
Z
x
1 dx
0
2 (x
0
): (63)
Lemma 6: Inordertoachieve aFalsePositive errorrate"
1
andaFalse Negativeerrorrate
"
2
againstany coalition of sizecc
0
,itissuÆcienttoset thecode lengthm accordingto
mc 2
0
~
j
~
G inv
1 ("
1 )
2
1 ~
c
0 ~
j G
inv
2 ("
2 )
G inv
1 ("
1 )
2
Proof: Theproofis completelyanalogoustothederivationinSection 3.4of[7].
NotethatG inv
2 ("
2
)<0. Notefurther thatthedependence of(64)on"
2
vanishesin thelimit
c
0
!1. (Rememberthat~
j
=1accordingtoLemma 3and that~=O( p
c
0
)asaconsequence
ofLemma4). IfA
j andA
C
haveGaussiandistributions,thenG
1 andG
2
areerrorfunctions 2
and
we have G inv 1 (" 1 ) = p 2Erfc inv (2" 1 ), G inv 2 (" 2 ) = p 2Erfc inv (2" 2
). Substitution into (64), using
~
j
=1,gives
m 2 ~ 2 c 2 0 h Erfc inv (2" 1 ) i 2 ( 1+ ~ c 0 Erfc inv (2" 2 ) Erfc inv (2" 1 ) ) 2 : (65)
Intheregime"
1 "
2
,whichistherelevantregimefore.g. moviedistribution,thedependenceof
(65)on"
2
isratherweakevenfornitec
0
,sinceErfc inv
("
2
)<Erfc inv
("
1
). Weusetheasymptotic
form of theinverse errorfunction forsmall arguments,Erfc inv (")= p ln" 1 [1 O( lnln"
1 ln" 1 )], to write m 2 ~ 2 c 2 0 ln 1 " 1 1 O( lnln"
1 1 ln" 1 1 ) (
1+O 1 p c 0 s ln" 1 2 ln" 1 1 !) : (66)
Forlargec
0
theresult(62)follows 3
.
Henceforlargeenoughc
0
acodelengthm=2~ 2 c 2 0 ln" 1 1
suÆces. Thisisshorterbyafactor
2thantheresultobtainedin Section4.
7 Discussion
7.1 The cuto parameter t
In this sectionwediscuss the the eects of thecuto t =Tc a
0
introduced in Section 2.2. The
probabilitiesp
liein therestrictedinterval[t=(q 1);1 t]. It isclearfrom Section 4that the
presented proofof Theorem 1doesnotworkfor t=0. InthelimitT #0, theallowed intervals
fortheauxiliaryvariables
1 and
2
(34,42)vanish,whilebothintervalsneedtobeniteforthe
proofthatProperties1and2aresatised.
ThespeedoftheconvergencetotheasymptoticresultA=4=~ 2
,B=4=~dependsontheway
in which the parametersa 2(0;2) and T are chosen. The small parameters
1 and
2
(45,47)
asymptoticallybehaveas
1 1:7 q ~ p T c a=2 0 ; 2 ln" 2 ln" 1 1 1:7 p Tc 1 a=2 0 : (67)
Furthermore,condition(36),necessaryforProperty1tohold,canbewrittenas
c 0 ' q 1 T ( ~ 3:4 ) 2 1=(2 a) : (68)
Forpracticalreasons, wewish both
1 and
2
to becomesmall at areasonablylow valueofc
0 ,
whilethebound(68)alsoshouldnotbetoohigh. However,inthelimitT #0,boththec
0 -bound
(68) and the expression for
2
in (67) diverge. Hence, when t tends to zero, the approach of
Sections4.1{4.3,basedontheMarkovinequality,canproveProperties1and2onlyforextremely
largec
0 .
TheroleofthecutotiscompletelydierentintheanalysisusingtheGaussianapproximation.
2
To avoid ambiguitiesdueto conicting denitions inthe literature, wementionthat we usethe denition
Erfc(x)=1 (2= p ) R x 0 e u 2 du. 3
ForprovingTheorem2wedonothavetoassumethatA
C
hasaGaussianform.The"2-termin(64)vanishes
forallfunctionsG inv
2
becauseofthefactor~=c0=O(1= p
c0). However,thecomputationofA
C
involvesevenmore
summedcontributionsthanA
j
,soitissafetoassumethatwhenA
j
isGaussian,thenA
C
Thecaseq=2. Itwasshownin[7]fortheoriginalTardosschemethattheCLTcanonlybe
applied ift>0. Theprobabilitydistribution oftheaccusation U
i
(for innocent users)due
to thesymboly
i
isproportionalto1=(1+U 2
i )
2
. The3rdmomentiszero. Fordistributions
with vanishing 3rd moment, the CLTonly holds when the 4thmoment does not diverge.
However,for t=0the4th momentdoes diverge. Hence we needt>0. Exactly thesame
reasoningappliestothesymmetricschemewithq=2.
Thecaseq3. For q 3, the 3rd moment of the probability distribution of A (i)
j (for
innocentj)isalwaysnonzero,nomatterwhatthevalueoftis. ThisisshowninAppendixC,
Eq.(74). HencetheCLTappliesevenifwesett=0. IntheGaussianapproximation,there
isnoreasontohaveacutotforq3.
7.2 Dierent attack models
Uptothispointwehaveonlyconsideredtherestricteddigitmodel. However,itiseasytoobtain
resultsfortheother attackmodelslistedin Section1.2. As canbeseenfrom (64),thebound on
thecodelengthisproportionalto~ 2
j =~
2
. 4
Thedierencesbetweenthevariousattackmodelsgive
rise to dierent values ~
j
, ,~ but the form (64) is independent of the attack model. Hence, in
ordertosee thedierencesbetweentheattackmodels, itissuÆcientto comparetheratio~
j =~.
The unreadable digit model is discussed in Appendix A. It is assumed that the colluders
output anerasuresymbol`?' whenever theycan,and that thedistributor giveszeroaccusation
to locationswith anerasure. It turns outthat forlarge alphabets (q '7) the colluderstrategy
ofoutputtingerasuresisgood,and thedistributorhastouselongercodesthanin therestricted
digit model. However,for small alphabets it isbetter forthecolluders notto usean erasureat
eachdetectableposition,asa`?' informsthedistributorthat thepositionisdetectable.
Results for the arbitrary digit model are derived in Appendix B. Unsurprisingly, with this
attack model a nonbinary scheme always performs worse than the symmetric binary scheme;
the colluders have ample opportunity to incriminate innocent users while avoiding accusation
themselves.
8 Summary
Inthis paperwehaveproposed anewconstructionforarandomizeddigitalngerprintingcode,
whichissimilartoarecentconstructionbyTardosbutcanbeusedwitharbitrarysizealphabets.
Wehaveanalyzedtheperformanceofourscheme,in therestricteddigitmodel,in twoways.
First,wehaveprovedalowerboundonthecodelengthmsuchthatthedesiredFalsePositive
andFalseNegativeerrorprobabilitiesareachievedagainstanycoalitionofsize cc
0
. Due toa
dierentwayofcomputingaccusations,theproposedcodeallowsfor10timesshortercodes(with
respectto [9])inthecaseofabinaryalphabet. Movingto acodeoveraq-aryalphabetallowsa
furtherreductionofthecodelengthof35%atq=3and80%at q=10.
Second,wehaveanalyzedourschemeundertheassumptionthattheaccusationsumA
j follows
aGaussian distribution. This `Gaussianapproximation'is valid at coalitionsizes c
0
of
approxi-mately 10{20and larger. Wehaveshown that,in this approximation,thecollusionresistanceof
theschemeisretainedforacodelengthmthat istwiceasshortasthebound obtainedusingno
assumptions.
Acknowledgements
WekindlythankJoopTalstraandHenkHollmannforvaluablecomments,andGuidoJanssenfor
providingus withliteraturereferences.
4
Theorem1isformulatedafterthesubstitution~!1hasbeendone. IfapplicationofLemma3ispostponed
intheanalysisinSection4,wegettheproportionalityto~ 2
References
[1] G.E. Andrews, R. Askey, and R. Roy. Special Functions. EncyclopediaofMathematics and
itsApplications.CambridgeUniversityPress,1999.
[2] M.Z. Bazant. Random walks and diusion. Technical report, MIT Lecture notes, http:
//www-math.mit.edu/18.366/lec/lec04.pdf,2005.
[3] D.BonehandJ.Shaw.Collusion-securengerprintingfordigitaldata. IEEETransactionson
Information Theory,44(5):1897{1905,1998.
[4] L. Devroye. Non-Uniform Random Variate Generation. Springer-Verlag, 1986. Available
onlineat http://cg.scs.carleton.ca/luc/rnbookindex.html.
[5] H.D.L. Hollmann,J.H.vanLint,J-P.Linnartz,and L.M.G.M.Tolhuizen. Oncodeswith the
identiableparentproperty. JournalofCombinatorial Theory,82:472{479,1998.
[6] C. Peikert, A. Shelat, and A. Smith. Lower bounds for collusion-secure ngerprinting. In
Proceedingsofthe14thAnnualACM-SIAMSymposiumonDiscreteAlgorithms(SODA),pages
472{478,2003.
[7] B.
Skoric,T.U.Vladimirova,M.Celik,andJ.C.Talstra.Tardosngerprintingisbetterthanwe
thought. Technicalreport,Submitted toIEEETransactionsonInformationTheory.Preprint
at arXivrepository,http://www.arxiv.org/abs/cs.CR/0607131,2006.
[8] J.N. Staddon, D.R.Stinson, andR. Wei. Combinatorial properties of frameproofand
trace-abilitycodes. IEEETransactions on InformationTheory, 47(3):1042{1049,2001.
[9] G. Tardos. Optimalprobabilistic ngerprintcodes. InProceedings of the 35th Annual ACM
SymposiumonTheoryof Computing(STOC),pages116{125,2003.
A Unreadable digit model
In this appendix weconsider the case of the unreadabledigit model. Inthis attack model, the
colludersareallowedtooutput theerasuresymbol`?' in detectable positions. Forsimplicitywe
maketwo assumptions: (i) The colluders generate an erasure whenever they can, and (ii) The
distributorgiveszeroaccusationin caseofanerasuresymbol.
Thequantities~
j
and ~ arebothaected by these assumptions. Theyare easilycomputed,
sinceall detectable positions (leading to `?') are discardedby the distributor. This leavesonly
theundetectablepositions,characterizedbyvectorsbthat consistofq 1zerocomponentsand
onecomponentequaltoc. Wehave
~ 2
j =
q 1
X
=0 Z
J(0;q) d
q
pF(p)p c
=q
(q) (c+)
() (c+q)
(69)
and
~ =
q 1
X
=0 Z
J(0;q) d
q
pF(p)p c
cg
1 (p
)=cq
(q) (c 1
2 +) (
1
2
+[q 1])
() ([q 1]) (c+q)
: (70)
Recallfrom (64)thatthe requiredcodelengthis proportionalto~ 2
j =~
2
. Using(69)and (70)we
obtain
~ 2
j
~ 2
= 1
qc 2
(c+) (c+q)
[ (c 1
2 +)]
2 ()
(q)
([q 1])
( 1
2
+[q 1])
2
c
1+[q 1]
q
()
(q)
([q 1])
( 1
+[q 1])
2