• No results found

Symmetric Tardos fingerprinting codes for arbitrary alphabet sizes

N/A
N/A
Protected

Academic year: 2020

Share "Symmetric Tardos fingerprinting codes for arbitrary alphabet sizes"

Copied!
27
0
0

Loading.... (view fulltext now)

Full text

(1)

for arbitrary alphabet sizes

B.

Skoric, S.Katzenbeisser, M.U.Celik

Abstract

Fingerprinting provides a means of tracing unauthorized redistribution of digital data by individually

markingeachauthorized copy witha personalized serial number. Inorderto prevent agroup ofusers

fromcollectivelyescapingidentication,collusion-securengerprintingcodeshavebeenproposed. Inthis

paper,weintroduceanewconstructionofacollusion-securengerprintingcodewhichissimilartoarecent

constructionbyTardosbutachievesshortercodelengthsandallows forcodesoverarbitraryalphabets.

Forbinaryalphabets,nusersandafalseaccusationprobabilityof,acodelengthofm 2

c 2

0 ln(n=)

isprovablysuÆcienttowithstand collusionattacksofat mostc

0

colluders. ThisimprovesTardos'

con-structionbyafactorof10. Furthermore,invokingtheCentral LimitTheoremweshowthatevenacode

lengthofm 1

2

2

c 2

0

ln(n=)issuÆcientinmostcases. Forq-aryalphabets,assumingtherestricteddigit

model,thecodesizecanbefurtherreduced. Numericalresultsshowthatareductionof35%isachievable

forq=3and80%forq=10.

1 Introduction

1.1 Digital ngerprinting

Fingerprinting,orforensicwatermarking, providesameans oftracingthe unauthorized

redistri-bution of digitaldata, suchas entertainment content(i.e. music ormovie clips), digitalrecords

orsoftware. Beforeauthorized distribution, the distributorimperceptibly embeds a ngerprint,

whichplaystheroleofapersonalizedserialnumber,directlyintothecontent. Thisisdoneusinga

digitalwatermarkingalgorithm. Ifthengerprintisdierentforeachrecipient(alsocalled`user'),

the distributor can extract the embedded ngerprintfrom an unauthorized copy of thecontent

andtracetherecipientwholeakedit.

Mathematicallyspeaking, angerprint isa nitestring oversomeq-ary alphabet ; theset

ofall ngerprintsis called angerprintingcode. Throughout thispaperwewill denotebyn the

number of users and by m the length of the ngerprint. In order to mark a piece of content

before distribution, thedistributor picks angerprint from the codeand imperceptibly embeds

eachsymbolof the ngerprintinto dierent segmentsof the content, such asin dierent scenes

ofamovie. Inaddition,hestoresin adatabasetheassociationof angerprintwiththeidentity

of the user whoreceivedthe personalizedcopy. In casean unauthorized copyof thecontentis

found, the distributorcan perform watermark detectionon thesegments of thecontentto read

out its ngerprint. Once the ngerprint is retrieved, he can compare it with his database of

ngerprints to identify the guilty user. Current watermarking schemes provide a considerable

level of robustness that allows correct reconstruction of the ngerprint even if the content has

sueredheavydistortions.

1.2 Collusion resistance

Fingerprintingschemesneedto berobust againstcollusionattacks, whereseveraluserspool

dif-ferent individualized versions of thesame content. By looking at the dierencesbetween these

(2)

untraceableversionofthecontent,fromwhichthedistributorcannotidentifyanyofthecolluders.

Asegmentofthecontentiscalledadetectablepositionifthecolludershaveatleasttwodierently

markedversionsofthatsegmentavailable.

Acode iscalled collusion-resistantagainstacoalitionof sizec

0

, ifanyset ofcc

0

colluders

isunable toproduceanuntraceablecopy. Theconstructionof collusion-resistantcodeshasbeen

anactiveresearch topic sincethelate 1990s(see e.g. [5, 8,3,6, 9]). Theconstructionsand the

achievedresultsdependstronglyonvariousassumptionswhichrestrictthetypeofmanipulations

theattackersareallowedtoperform. Oneoftenmadeassumptionisthemarkingcondition,stating

thatthecolludersareabletochangengerprintsymbolsonlyindetectablepositions. Throughout

thispaperwewillassumethat themarkingconditionholds. Furthermore,severalattackmodels

havebeenintroduced intheliterature:

Therestricteddigitmodelornarrow-casemodelallowsthecolludersonlyto`mixandmatch'

their copiesof thecontent,i.e. to replaceasegment in adetectable position by anyother

segmenttheyhaveavailable in that position. On thengerprinting code level, this means

that in theunauthorizedcopythesymbolateachposition canonlybeone ofthesymbols

that theyhaveavailablein thatposition.

Theunreadable digit model allowsfor slightlystrongerattacks. Besides mixingthecontent

segments,theattackerscanalsoerasetheembeddedngerprintatdetectablepositions. At

thecodelevel,wedenotethisbyaspecialerasuresymbol?62.

Thearbitrarydigitmodelallowsforevenstrongerattacks: theattackerscanputanarbitrary

q-arysymbolfrom(butnottheerasuresymbol`?')in detectablepositions.

Thegeneral digit model allowstheattackerstoput anysymbol,including`?',in detectable

positions.

Note that in the case of a binary alphabet all four attack models are equivalent in terms of

traceability. (Forq=2itisdetrimental forthecolludersto use`?', sinceitgivesthedistributor

more information than a `0' or `1', namely that the position is a detectable position for the

coalition).

Themainparametersofangerprintingcodearethecodewordlength,theFalse Positive (FP)

errorprobabilityandtheFalseNegative(FN)errorprobability. Thecodewordlengthinuencesto

agreatextentthepracticalusabilityofangerprintingscheme,asthenumberofsegmentsmthat

can be used to embed angerprint symbolis severely constrained; typical video watermarking

algorithms forinstance canonly embed 7bits of informationin arobustmanner in oneminute

of a videoclip. Furthermore,the amount of information that can be embedded per segmentis

limited; hencethe alphabet size q must besmall (typicallyq 16). Obviously, distributors are

interestedintheshortestpossiblecodesthataresecureagainstalargenumberofcolluders,while

accommodatingahugenumbernofusers(oftheorderofn10 6

orevenn10 9

).

Lowerrorprobabilitiesareanothercentralrequirement. Themostimportanttypeoferroristhe

FP,whereaninnocentusergetsaccused. Theprobabilityofsuchaneventmustbeextremelysmall;

otherwise the distributor's accusations would be questionable, making the whole ngerprinting

schemeunworkable. Wewilldenoteby"

1

theprobabilitythatonespecicusergetsfalselyaccused,

while denotestheprobabilitythatthereareinnocentusersamongtheaccused. Thesecondtype

oferroristheFN,wheretheschemefailstoaccuseanyofthecolluders. TheFNprobabilitywill

be denoted as "

2

. In practicalsituations, fairly large values of "

2

can be tolerated. Often the

objectiveofngerprintingisto deterunauthorizeddistribution ratherthantoprosecuteallthose

responsible for it. Even amere 50% probability of getting caught is a signicant deterrent for

colluders.

1.3 Related work

For the restricted digit model, `deterministic' ngerprinting codes have been proposed. Here

(3)

(IPP)codes,introducedin[5],allowthedistributortoidentifyatleastonememberofthecoalition

with certainty, withoutthedanger of accusinginnocentpeople. However,theschemes proposed

in [5] are not resistant against more than two colluders. In [8] the existence was proved of a

deterministicngerprintingcoderesistantagainstc

0

colluders,havingcodelengthm=c 2

0 log

q (n).

However,thealphabetsizeisimpracticallylarge,requiringqn 1.

MoreeÆcientngerprinting schemes are possible ifnonzero error probabilities and "

2 are

tolerated. In [3] Boneh and Shaw presented a binary scheme (q = 2) with code length m =

O(c 4

0 log

n

log

1

). Their scheme uses concatenation of apartly randomized inner code with an

outer code. Theyalso proved, forbinary alphabets, alowerbound on thecodelength required

forresistanceagainstc

0

colluders: m>O(c

0 log

1

c0

). In[6]Peikertet al. provedatighterlower

boundofm>O(c 2

0 log

1

c

0

).

In[9] Tardosfurther tightenedthelowerbound to m>O(c 2

0 log

n

). This bound is valid for

arbitrary alphabets in the arbitrary digit model and the unreadable digit model. In the same

paper, he described a fully randomized binary ngerprinting code achieving this lower bound.

Thecodehaslengthm=100c 2

0 ln

n

;aconstructionwasgivenonlyforthebinaryalphabet. In[7]

Tardos'constructionwasfurther analyzed. Itwasshownthat, withoutchangingthescheme,the

constant`100'canbereducedto4 2

. Inthesamepaperitwasshownthatanimportantquantity

inthescheme(the`accusationsum',seeSection2.1),resultingfromthesummationofmanyi.i.d.

stochasticvariables, hasaGaussian distribution,uptocorrectiontermsthat vanishforlargec

0 .

WithoutchangingtheTardosschemeinanyway,butassumingaGaussiandistribution,thecode

lengthmwas furtherreducedtom=2 2

c 2

0 ln

n

.

1.4 Contributions and outline

In thispaper, we propose anew constructionof angerprinting code, which is similar in spirit

toTardos'originalcode,butallowsforcodesoverarbitrary-sizealphabets. Forbinaryalphabets

the new scheme allows for codes that are afactor 4shorter than the construction given by [7]

(andthusafactor10shorterthanthe schemegivenin [9]). Intherestricteddigitattackmodel,

movingfrom abinary to aq-ary alphabet allowsforevenshorter ngerprintingcodes. The key

contributionsofthepaperaresummarizedasfollows:

In Section 2 we review Tardos' binary ngerprinting scheme [9] and propose a dierent

construction, which is symmetric and which can be used for arbitrary alphabets . The

constructionisdierentfromTardos'codeevenforbinaryalphabets.

InSection4westudythecollusionresistanceofthesymmetriccode. Weapplythemethods

of [9]to rigorouslyprovealowerbound onthecodelength m, such that the desirederror

rates are achieved. The bound is given by m >4~ 2

c 2

0 ln

n

, where the quantity ~ is the

expectationvalueof thecoalition'scollective`suspiciousness'.

InSection 5wecomputetheexpectationvalue~in therestricteddigitmodel. Inthecase

of abinary alphabet wehave~=2=. Thiscorrespondstoaboundon thecode lengthof

m> 2

c 2

0 ln

n

, which isafactor4shorter thanthebound obtainedfortheTardosscheme

in[7]andafactor10shorterthantheboundgivenin[9]. Forq-aryalphabetswecompute

~

numerically. Thecodelengthmisfurtherreduced(withrespecttothebinarysymmetric

scheme)by40%forq=3andby80%forq=10.

InSection6wemakeuseoftheCentralLimitTheoremtoshowthatanimportantquantityin

thescheme,theaccusationsumofaninnocentuser,hasaprobabilitydensitythatisalmost

Gaussian. Convergenceto thenormalformimproveswithincreasingc

0

. Approximationof

the distributionby aGaussian isaccurate starting from avalueof c

0

between 10and 20.

Assuming aperfect normal distribution,we showthat the desirederrorrates areachieved

for m >2~ 2

c 2

0 ln

n

(4)

2 Symmetric Tardos ngerprinting for arbitrary alphabet sizes

InthissectionwerstintroduceTardos'initialbinary ngerprintingcode[9]and thenprovidea

generalizationforarbitraryalphabets.

2.1 The Tardos ngerprinting scheme

Letnbethenumberofuserstobeaccomodatedinthesystem. TheTardosngerprintingscheme

distributes a binary codeword of length m to each user; the length m is a system parameter

chosenbythedistributor. ItaectstheFPandFNerrorrates. Thedistributedcodewordscanbe

arrangedasannmmatrixX ,wherethej-throwcorrespondstothengerprintgiventothej-th

user. LetCbeasetofcolludingusers.WedenotebycthenumberofcolludersandbyX

C

thecm

matrixofcodewordsdistributedtothecolluders. Thecolludersusea(possiblynondeterministic)

strategy to create an unauthorized copy of the content from their personalized copies. The

unauthorizedcopycarriesangerprinty 2f0;1g m

which depends onboththestrategyand the

receivedcodewords,i.e. y =(X

C ).

Fingerprintcode generation . The distributor generates the matrix X in two randomized steps.

Inthe rststep,he chooses mrandom variablesfp

i g

m

i=1

overtheintervalp

i

2[t;1 t],where t

is axed small parametersatisfyingc

0

t 1. Thevariables p

i

are independent and identically

distributed according to the probability density function f. The function f(p) is symmetric 1

aroundp=1=2and heavilybiasedtowardsvaluesofpclosetot and1 t,

f(p)=

1

2arcsin(1 2t) 1

p

p(1 p)

: (1)

In thesecond step, thedistributor lls the columns of the matrixX by independentlydrawing

randombitsX

ji

2f0;1gaccordingtoP[X

ji

=1]=p

i .

Fingerprintembedding . Beforethecontentisrealeasedtocustomerj, itis watermarkedwiththe

j-throwofthematrixX.

Accusation. Havingspottedanunauthorizedcopywithembeddedwatermarky,thecontentowner

wantsto identifyat leastonecolluder. Toachievethis,he computesforeach user1j n an

accusationsumS

j as

S

j =

m

X

i=1 y

i U(X

ji ;p

i

); with U(X

ji ;p

i )=

g

1 (p

i ) ifX

ji =1

g

0 (p

i ) ifX

ji =0;

(2)

whereg

1 andg

0

arethe`accusationfunctions'

g

1 (p)=

r

1 p

p

and g

0 (p)=

r

p

1 p

: (3)

Thedistributordecidesthatuserj isguiltyifS

j

>Z. TheparameterZ iscalled the`accusation

threshold'. Thethresholdisasystemparameterchosenbythedistributor.

Inwords,theaccusationsumS

j

iscomputedbysummingoverallsymbolpositionsiin y. All

positionswithy

i

=0areignored. Foreachpositionwherey

i

=1,theaccusationsumS

j

iseither

increasedordecreased,dependingonhowmuchsuspicionarisesfrom thatposition: ifuserj has

a `1' in that position, then the accusation is increased by apositive amount g

1 (p

i

). Note that

thesuspiciondecreaseswithhigherprobabilityp

i

,sinceg

1

isapositivemonotonicallydecreasing

function. Ifuserjhasa`0',theaccusationiscorrectedbythenegativeamountg

0 (p

i

),whichgets

morepronouncedforlargevaluesofp

i , asg

0

isnegativeandmonotonicallydecreasing.

Tardos chose thespecic form(3) forthe functions g

1 and g

0

becauseit hasniceproperties:

For xed p

i

, the accusation U(X

ji ;p

i

) in (2) has zero mean and unit variance. Especially the

fact thatthevariancedoesnotdepend onp

i

greatlysimpliestheanalysisof thescheme. Itwas

1

In[9]theparametrizationp =sin 2

(5)

shown in [7] thatfor Tardos'schemethechoice (1)for f is optimal, andthat the choice (3) for

theaccusationfunctionsisoptimalwithintheclassoffunctionsoftheformp z1

(1 p) z2

,wherez

1

andz

2

areconstants.

TardoschosethesystemparametersmandZ asfollows:

m=Ac 2

0 dln"

1

1

e ; Z =Bc

0 dln"

1

1

e; (4)

withA=100andB =20. RecallfromSection1.2that theparameter"

1

isare-scaledversionof

thefalsepositiveerror parameter. It representstheprobabilitythat aspecicinnocentuserj

getsaccused. Therelationbetween and"

1

is =1 (1 "

1 )

n c

. For"

1

1and cnthis

becomes (n c)"

1 n"

1 .

AFalseNegative(FN)isdenedastheeventwherenoneofthecolludersareaccused. Tardos

provedin[9]thathisschemeachievesFPandFNerrorratessmallerthan"

1 and"

2

,respectively,

againstcoalitionsofsizecc

0 ,for"

2 ="

c

0 =4

1

. In[7]theTardosschemewasfurtheranalyzedand

thefollowingresultswereobtainedfor"

2 "

1

(afar morereasonablechoice ofparameters,see

Section1.2): (i)thecodelengthparameterAin(4)canbereducedto4 2

. (ii)Theaccusationsum

S

j

hasanalmostGaussian probabilitydensityfunction,withcorrectionsthatvanishin thelimit

c

0

!1. (iii)Assumingaperfect Gaussiandistribution forS

j

,theparameterA canbereduced

to2 2

. Hence,forsuÆcientlylargec

0

,thecodelengthmcanbesettom=2 2

c 2

0 dln"

1

1

ewithout

anymodicationofthecodeconstruction,embeddingoraccusationmethod.

2.2 Proposed symmetric ngerprinting scheme

TheschemepresentedinSection 2.1hastwodrawbacks. First,thecomputationofTardos'

accu-sationsum(2)isasymmmetricinthesensethatonlythosecodewordpositionsicontributewhere

y

i

=1,whilealltheothersarediscarded. ThisisanineÆcientwayofexploiting theinformation

presentin theunauthorizedcopy, becausethey

i

=0positionscarryasmuchinformation about

thecolludersasthey

i

=1positions. Second,dueto thisasymmetry,theconstructioncannotbe

directlyapplied tononbinaryalphabets.

WeapplytwomodicationstoTardos'construction. Therstmodicationisastraightforward

generalization of the ngerprint generation step to produce a random q-ary code. Instead of

bits we have X

ji

2 , with = f0;1;;q 1g. Instead of scalar random variables p

i we

have,independentlyforeachcolumn, aq-componentrandomvectorp (i)

=(p (i)

0

;;p (i)

q 1 ), with

P

q 1

=0 p

(i)

= 1. The vectors p (i)

have the probability density function F(p), which replaces

f(p). Whilef is invariantunder themappingp!1 p, ourfunction F is invariantunder any

permutation ofthesymbols2. Thusourconstructionissymmetricinallsymbols2. In

thei-thcolumnofX ,randomsymbolsaredrawnwithprobabilitiesdictatedbyp (i)

. Thecolluders

createanunauthorizedcopyy=(X

C )2

m

accordingtoa(possiblynon-deterministic)strategy

. We will always assume that this strategy does not depend on the column index, i.e. the

same strategy is applied to each column of X

C

. We can make this assumption without loss

of generality; due to the column symmetry of p and X, the best colluder strategy is

column-symmetric.

Thesecondmodicationliesinthecomputationoftheaccusationsum. IncontrasttoTardos'

scheme, we letevery ngerprintsymbolin the unauthorized copy giverise to accusations. The

accusation for acertain user at acertain symbollocation is positiveif he hasthe samesymbol

astheunauthorizedcopy;otherwiseitis negative. The magnitudeoftheaccusation depends on

the likelihood of thesymbolthat appears in the unauthorizedcopy. In full detailthe proposed

constructionisasfollows:

Fingerprintcode generation . Asin the originalTardos construction,thedistributorproducesan

nm matrixX ofq-arysymbols;the rowsof thematrixcorrespondto thengerprintsfor the

individual users. We parametrize m as in (4); the value of the parameter A is the subject of

Sections4,5and6. Again,thedistributorusesatwo-stepprocedure:

1. He generates m independentrandom vectorsp (i)

=(p (i)

;;p (i)

(6)

the components satisfy p (i)

2 [t=(q 1);1 t] and P

q 1

=0 p

(i)

= 1. We call t the `cuto

parameter'orthe`cuto'. Itsatises0<t1;weparametrizeitast=Tc a

0

,withT >0

and a2(0;2). Therandomvariableshaveaprobabilitydensityfunction that issymmetric

in allthecomponentsp

. Inourconstruction,weuseaclassoffunctionsthatare aspecial

caseoftheDirichletdistribution(seee.g. [4]),

F

qt

(p)=N 1

qt q 1

Y

=0 p

1+

with>0: (5)

Here N

qt

isa normalisingconstantensuring that R

J(t;q) d

q

pF

qt

(p) =1. The expression

R

J(t;q) d

q

p stands for R

1 t

t

q 1 dp

0

R

1 t

t

q 1 dp

q 1 Æ(1

P

q 1

=0 p

), where Æ() is the Dirac delta

function. The delta function ensures that the integration is done only over p such that

P

p

= 1. The parameter determines the steepness of F

qt

. For q = 2, = 1

2 the

function F

qt

reducestoTardos'densityfunction (1).

2. The distributorgenerates thecolumns of Xindependently. In thei-th column, the vector

p (i)

determinestheprobabilitiesofgeneratingeachspecic symbolin thealphabet:

P[X

ji

=]=p (i)

: (6)

Fingerprintembedding . Beforethecontentisrealeasedtocustomerj, itis watermarkedwiththe

j-throwofthematrixX.

Accusation. The distributor extracts the ngerprint y from the unauthorized copy. For each

userj, thedistributor computesthe`accusationsum'A

j

from X,p andy. He decidesthat the

userj isguilty ifA

j

>Z, where Z isreferredto asthe`accusationthreshold'. Weparametrize

Z asin (4),withtheconstantB asyetleftundetermined. Thelistofaccusedusersisdenotedas

(p;X ;y). TheaccusationsumA

j

isgivenby

A

j

(p;X ;y)= m

X

i=1 A

(i)

j

; A

(i)

j :=Æ

yi;Xji g

1 (p

(i)

yi

)+[1 Æ

yi;Xji ]g

0 (p

(i)

yi

); (7)

whereÆ

x;y

denotes theKroneckerdelta. Wehavechosenthesamefunctions g

0 (p)=

p

(1 p)=p,

g

1 (p) =

p

p=(1 p) asTardos. There is no guarantee that this choice is optimal for q > 2.

Thechoice ismotivated bythe zero-mean,unit-variancepropertymentionedin Section2.1; this

propertyleadsto asubstantialsimplication oftheanalysis inthecomingsections.

Inwords,theaccusation(7)iscomputedasfollows. Ifuserj hasthesamesymbolinposition

iastheunauthorizedcopy,thenheisaccusedbyapositiveamountg

1 (p

(i)

yi

),wheretheaccusation

decreaseswith growinglikelihoodof thesymbol. Ifuserj hasadierentsymbolthanthe

unau-thorizedcopy,thenheisaccusedbyanegativeamountg

0 (p

(i)

yi

),whichhasthelargesteectwhen

thesymboly

i

islikelytooccur.

Notethat (7)is fullysymmetric inthe symbolsand that itdiersfrom Tardos' construction

even forq =2. Note furtherthat theKroneckerdeltas in (7)reduce thesymbolspaceinto two

classes: X

ji =y

i and X

ji 6=y

i

. Inthelatter casetheaccusation doesnotdepend onthe actual

valueofX

ji .

Attackmodel . AsmentionedinSection1.2,weusethemarkingassumptionandweassumethatthe

restricteddigitmodelholds. Inaddition,wemaketwoassumptionsontheattackstrategyofthe

colluders. First,weassumethatallmembersofthecoalitionareequivalent. Hence,theybasetheir

decisionsonlyonthenumberofsymbolstheyreceive,andnotontheidentityofthememberswho

receivethem. (Anydeviationfrom thisstrategywillmakeiteasierforthedistributortoidentify

(7)

3 Preliminaries

InordertofacilitateourworkinSections4and5,weintroducesomenotationandstateanumber

oflemmas.

3.1 Normalisation constant

ThevalueofthenormalisationconstantN

qt

in(5)iseasilycomputedfort=0,usingthefollowing

lemma(seee.g. [1]):

Lemma 1: Letv bea vector oflengthq withv

>0for0q 1. Then

Z

J(0;q) d

q

p q 1

Y

=0 p

1+v

=B(v ):= Q

q 1

=0 (v

)

( P

q 1

=0 v

)

:

ThefunctionB isthegeneralizedBetafunction,alsoreferredtoasthemultinomialBetafunction

orDirichletintegral.

Proofsketch: Fortwocomponents(q =2) thelemmais true,astheintegralyieldstheordinary

Betafunction. Forhigherq thelemmacanbeprovedbyinduction.

Fort6=0,q=2theintegralyields theso-calledincompleteBetafunction.

ApplyingLemma 1tothedenition ofF

qt

in (5),wecomputethenormalisationfactorN

qt

fort=0tobe

N

q0 =

[ ()] q

(q)

: (8)

Remark: ThedierencebetweenN

qt andN

q0

issmall. Thisisseenasfollows. Theintegrandin

N

qt

is oftheform Q

p

1+

with>0. Theprimitivefunction nearapoleatp

=0scalesas

p

. Hencethecontributionsfromthepoles, presentin N

q0

andabsentinN

qt

,are ofordert

.

Ifisnotextremelycloseto0,thent

1.

3.2 Collective accusation sum

Let C be the set of colluding users and X

C

the restriction of X to the rows received by the

colluders. From(7)wedeneausefulquantity: the`collectiveaccusationsum'A

C

,beingthesum

ofallindividualaccusationsumsofthecoalitionmembers,

A

C =

X

j2C A

j =

m

X

i=1 A

(i)

C

; A

(i)

C :=b

(i)

y

i g

1 (p

(i)

y

i

)+[c b (i)

y

i ]g

0 (p

(i)

y

i

): (9)

Hereb (i)

standsforthenumberofoccurrencesofthesymbolincolumniofX

C

. Thesenumbers

satisfytheconstraint P

q 1

=0 b

(i)

=c. ThesumA

C

playsanimportantroleintheFNerrorrate.

3.3 Denition of averages

Thereare threestochasticprocessesinvolvedin thecreationofthengerprintingcodewordsand

the unauthorized copy: The distributor's choice of vectors p (i)

, his process of generating the

columns of X, and the coalition's choice of symbols y

i

. For each process we dene a separate

expectationvalue. Averagingoverpis denotedasE

p

. Withinthei-thcolumnthisisdened as

E

p [(p

(i)

)]:= Z

J(t;q) d

q

p(p)F

qt

(p); (10)

(8)

We remind the reader that all the vectors p (i)

are independent. We denote the codeword

receivedbyuserj asX

j

. Forgivenp,averagingoverX

j

isdenotedasE

X

j

. Wedene

E

X

j [(X

ji )]:=

q 1

X

=0 ()p

(i)

: (11)

Inparticularwehave,foraninnocentuserj,

E

X

j [Æ

y

i ;X

ji ]=p

(i)

yi

: (12)

Forxedp,averagingoverX

C

isequivalenttoaveragingovertheintegersb(seeEq.9). Theb (i)

aredistributed accordingtoamultinomialdistribution. Wehave

E

b [(b

(i)

)]:= X

b (b)

c

b

q 1

Y

=0 [p

(i)

]

b

: (13)

Thenotation c

b

standsforthemultinomialc!=(b

0 !b

q 1

!). Thesum P

b

standsforsummation

overallqcomponentsofb,withthecondition P

b

=cimplicitlyassumed,

X

b

(b)= c

X

b0=0

c

X

bq 1=0 Æ

c;b0+b1++bq

1

(b): (14)

Finallywehaveto dealwiththestochastic strategyof thecoalition. We introduce thenotation

P

b

()fortheprobabilitythatthecolludersoutputthesymboly =in acertainposition,given

thattheyreceivedsymbolsaccordingtob. Averagingovery isdenoted asE

y ,

E

y [(y

i )]:=

q 1

X

=0 ()P

b (i)

(): (15)

Theexpectationvaluetakenoverallstochasticdegreesoffreedomisdenoted asE

yXp

. It canbe

computedbyrsttakingtheexpectationvalueE

y

(15)forxedb,thenforxedptakingE

b (13)

andE

X

j

(11)forallinnocentusersj,andnallyE

p

(10). Notethatseveralorderingsarepossible.

Forinstance, theexpectationE

Xj

(forinnocentj)canbetakenbeforeE

y and E

b

,sincey andb

donotdependonthecodewordsgiventoinnocentusers.

3.4 Statistical properties of the accusation sums

To facilitate the analysis in the coming sections we introduce `scaled' averages and variances,

denedsuchthat theydonotdependonm. Foraninnocentuserj wedene

~

j =

E

yXp [A

j ]

m

=E

yXp [A

(i)

j

] ; ~

2

j =

E

yXp [A

2

j ] E

2

yXp [A

j ]

m

: (16)

Forthecollectiveaccusationwedene

~ =

E

yXp [A

C ]

m

=E

yXp [A

(i)

C

] ; ~

2

= E

yXp [A

2

C ] E

2

yXp [A

C ]

m

: (17)

Thecolumnindexiin A (i)

C

in(17)andA (i)

j

in (16)canbechosenarbitrarily;theresultdoesnot

depend on i. The quantites ~

j , ~

j

and ~ are discussed below, whereas Section 5is devoted to

computing .~

(9)

Proof: WeevaluatetheexpectationE

yXp

byrstcomputingtheexpectationE

X

j

. Weapply(12)

to the denition of A (i)

j

(7). This gives E

Xj [A

(i)

j ] = p

(i) y i g 1 (p (i) y i

)+(1 p (i) y i )g 0 (p (i) y i

) = 0. The

last equality follows from the denition (3) of g

1 and g

0

. From E

Xj [A

(i)

j

] = 0 it follows that

E

yXp [A

(i)

j

]=0.

Lemma 3: For aninnocent user j wehave~

j =1.

Proof: UsingtheidempotencyoftheKroneckerdeltas inthedenitionofA (i)

j

in (7)wewrite

A 2 j = m X i=1 ( Æ y i ;X ji 1 p (i) yi p (i) yi

+(1 Æ

y i ;X ji ) p (i) yi 1 p (i) yi ) + X 1i;k m i6=k A (i) j A (k ) j : (18)

WeevaluateE

yXp

byrstcomputingtheexpectationE

X

j

. Usingproperty(12)andindependence

ofthecolumnsofX ,weget

E X j [A 2 j ]= m X i=1 E X j [1]+ X i;k ;i6=k E X j [A (i) j ]E X j [A (k ) j

]=m+0: (19)

HerewehavemadeuseofthepropertyE

Xj [A

(i)

j

]=0(seeproofofLemma2). FromE

Xj [A

2

j

]=mit

followsthatE

yXp [A

2

j

]=m. Thedenitionof~

j

in(16)canberewrittenas~

j =(1=m)E yXp [A 2 j ] m~ 2 j

. SubstitutionofE

yXp [A

2

j

]=mintothisexpressionandapplicationofLemma2gives~

j =1.

Lemma 4: Themean~andvariance~satisfy

~ 2

+~ 2

<qc: (20)

Proof: Fromthedenitionsof,~ ~(17)andA

C ,A

(i)

C

(9)itfollowsthat

~ 2 = m 1 E yXp [A 2 C ] m~

2 = m 1 0 @ m X i=1 E yXp [fA (i) j g 2 ]+ X i6=j E yXp [A (i) j ]E yXp [A (k ) j ] 1 A m~ 2 = E yXp [fA (i) C g 2 ] ~ 2 : (21)

UsingtheidempotencyoftheKroneckerdelta,wewrite

fA (i) C g 2 = q 1 X =0 Æ y [b g 1 (p

)+(c b

)g 0 (p )] 2 : (22)

WeapplythetotalaverageE

yXp

asdescribedinSection3.3,byrstperformingE

y

,thenE

X and

nallyE

p

. Weget

E yXp [fA (i) C g 2 ]= q 1 X =0 c X b=0 c b E p h p b (1 p ) c b E bnb [P b ()]fb g 1 (p

)+[c b

]g 0 (p )g 2 i : (23)

HerethenotationE

bnb

indicatesaveragingoveralldegreesoffreedominbexceptb

. Notethat

the expression in E

p

[] is always nonnegative. So, using E

bnb [P

b

(10)

r.h.s. of(23)by

E

yXp [fA

(i)

C g

2

] < q 1

X

=0 E

p "

c

X

b=0

c

b

p b

(1 p

)

c b

fb

g

1 (p

)+[c b

]g

0 (p

)g

2 #

= q 1

X

=0 E

p

[c]=qc: (24)

The rstequality is obtainedby observing, asin [9], that theb

-sum representsthe resultof a

randomwalkconsisting ofc steps,each ofwhich haszeromeanandunit variance. (Thisfollows

fromLemmas 2and3).

Remark: In Section5it willbeshownthat ~ doesnotincreaseasafunction ofc. Lemma 4

then showsthat ~=O( p

c) forc !1. This asymptoticbehaviourof~ will play animportant

rolein Section6.

4 Lower bound on the code length in the proposed symmetric scheme

Here weanalyze thesymmetric scheme described in Section 2.2. Weprovidea lowerbound on

thecodelengthm,asafunctionofthemaximumcoalitionsizec

0

andthemaximumtolerableFP

andFNerrorprobabilities"

1 ,"

2

. Wedenethefollowingtwoproperties:

Property 1: Wesay that angerprintingscheme that generatesalist of accusedusers has

Property 1for acertainxedvalue "

1

if,for allinnocentusers j,allcoalitionsC withj2=C,and

allcoalition strategies,the following holds:

P[FalsePositive]=P[j2]<"

1

: (25)

Property 2: Wesay that angerprintingscheme that generatesalist of accusedusers has

Property 2 for certain xedvalues c

0 ,"

2

,if, for all coalitions C of sizec c

0

,and all coalition

strategies, itholdsthat

P[FalseNegative]=P[C\ =;]<"

2

: (26)

Ourbound onthecodelengthis anasymptoticresultforc

0

1. Weformulateitasfollows:

Theorem 1: Letthecode lengthmandtheaccusation thresholdZ ofoursymmetric

nger-printingscheme bechosen as

m=Ac 2

0 dln"

1

1

e ; Z=Bc

0 dln"

1

1

e (27)

with"

1

2(0;1]axedparameterand

A=4~ 2

(1+Æ) 2

; B=4~ 1

(1+Æ); (28)

where ~ is dened in (17). Let "

2

2 (0;1] be a xed parameter. For all Æ > 0 there exists a

suÆciently large c

0

suchthat the symmetric ngerprintingscheme has Property 1for parameter

"

1

andProperty 2for parametersc

0 ;"

2 .

Thus,accordingto Theorem1,forlargec

0

acodelength

m>4~ 2

c 2

0 dln"

1

1

e (29)

guaranteesresistanceagainstcoalitionsofsizecc

0 .

In Sections 4.1, 4.2 and 4.3 we present a proof of Theorem 1following the approach of [7],

with minormodications. First,conditionsonA, B and c

0

are derived forachievingProperties

1and 2. Then the lowest valueof A is identied within thespace of allowed parameters. The

valueof~isdeterminedinSection5. Atthispointwealreadymentionthat0<lim

c0!1 ~ <1.

Hence(29)hastheasymptoticbehaviourm=O(c 2

(11)

4.1 Conditions for satisfying Property 1

Weconsideraxedinnocentuserj. Weintroduceanauxiliaryvariable

1

>0thatallowsusto

usetheMarkovinequality,

P[j2]=P[A

j

>Z]=P[e 1 A j >e 1 Z ] E X j [exp( 1 A j )] exp( 1 Z) : (30)

Dueto theindependence ofthecolumnsofXwecanwrite E

Xj [exp( 1 A j )]= n E Xj [exp( 1 A (i) j )] o m

. Inwhat follows, we will alwaysrestrict

1 such that 1 A (i) j 1:7. This

allowsus tousethefollowing(easilyveried)inequality

e u

<1+u+u 2

foru1:7; (31)

sothat wecanwrite

E Xj [e 1 A (i) j

]<1+

1 E Xj [A (i) j ]+ 2 1 E Xj [fA (i) j g 2 ]: (32)

We enforce the restriction

1 A

(i)

j

1:7 for all realisations of the stochastic p, X and y. For

negativeA (i)

j all

1

>0are allowed. ForpositiveA (i)

j

wemusthave

1

<1:7=g

1 (p

y ). As g

1 isa

monotonouslydecreasingfunction,thestrongestrestrictionon

1

occursforp

y =p

min

=t=(q 1).

Hencewerestrict

1

to theinterval(0;1:7=g

1 (

t

q 1 )].

FromLemmas2and3weknowthatE

X

j [A

(i)

j

]=0andE

X j [fA (i) j g 2

]=1forinnocentj; thus

(32)yields E

Xj [e

1A (i)

j

]<1+ 2

1

. Next weapplytheinequality

1+u<e u

foru6=0 (33)

towrite E

Xj [exp(

1 A

j

)]<exp(m 2

1

). Substitutioninto(30)gives

P[j2]< min

12(0;1:7=g1( t q 1 )] e 1(m1 Z) : (34)

Fillingin theexplicitformformandZ (27)into (34)weget

P[j2]< min

1 2(0;1:7=g 1 ( t q 1 )] " c 0 1 (B c 0 A 1 ) 1 : (35)

Theminimumliesat

1

=B=(2c

0

A),providedthat theupperboundon

1

islargeenough. The

condition1:7=g 1 ( t q 1 ) 1

canberewrittenas

c 0 " B

3:4A 2 q 1 T # 1 2 a 1 Tc a 0 q 1 1 2 a " B

3:4A 2 q 1 T # 1 2 a ; (36)

wherewehaveusedtheparametrisationt=Tc a

0

. Substitutonof

1

into(35)gives

P[j2]<" B

2

=4A

1

: (37)

HenceasuÆcientconditionforProperty1tobesatisedisthat(36)holds andthat

B 2

(12)

4.2 Conditions for satisfying Property 2

Westartwithalemmathat helpsustoupperboundtheFNerrorrate.

Lemma 5: LetC beacoalition ofsizecc

0

. We have

P[C\ =;]P[A

C

<cZ]P[A

C <c

0

Z] (39)

Proof: TheeventC\=;impliesA

C

<cZ.

Remark: A

C

<cZ doesnot implyC\ =;. It canhappenthat A

C

<cZ whilesomebody in

thecoalitiondoesgetaccused.

Nextweintroduceanauxiliaryvariable

2

>0that allowsustousetheMarkovinequality,

P[A

C <c

0

Z]=P[e 2AC >e 2c0Z ]< E yXp [exp( 2 A C )] exp( 2 c 0 Z) : (40)

The columns of X are independently generated, and thecolluder strategy is the same for each

column. This allowsus to write E

yXp

[exp(

2 A

C )]= fE

yXp [exp( 2 A (i) C )]g m

. We restrict

2 such that 2 A (i) C

1:7, allowingus to apply inequality (31) to bound the exponential. This

gives E yXp [e 2A (i) C

]<1+

2 ~ + 2 2 (~ 2 +~ 2 ); (41)

where wehaveusedthedenitions (17). Therestriction

2 A

(i)

C

1:7 holdsfor anyrealisation

ofpand X. Thesmallest(most negative)achievablevalueofA (i) C isc 0 g 0 (p max y )=c

0 g

0

(1 t)=

c

0 p

(1 t)=t. Hence theconditionon

2

issatisedfor

2 max 2 =1:7c 1 0 p

t=(1 t): (42)

FromLemma 4weknowthat ~ 2

+~ 2

<qc. Thuswehavefrom(41)

E yXp [e 2 A C

]<(1

2 ~ + 2 2 qc 0 ) m <e m 2 ~ (1 2 c 0 q=~) : (43)

In thelast inequality wehave made use of (33). Substitution of (43)into (40)and minimizing

over 2 gives P[A C <c 0

Z]< min

2 2(0; max 2 ] e

2[m~(1 2c0q=~ ) c0Z]

: (44)

We choose m and Z such that m~(1 max

2 c

0

q=~) > c

0

Z. Hence the minimum in (44)occurs

at

2 =

max

2

. Substitutionof(27)into(44)andevaluation at max 2 gives P[A C <c 0 Z]<"

1:7c 0 p t 1 t [A~(1 1 ) B] 1 ; (45)

where wehaveintroducedthenotation

1 =1:7

q

t

1 t

q=~. TosatisfyProperty2,(45)mustnot

belargerthan"

2

. Hence Property2issatisedif

A~(1

1

) B

2

; (46)

wherewehavedened

2 = p 1 t 1:7c 0 p t ln" 2 ln" 1 : (47)

Notethat theparameters

1 and

2

goto zeroforc

(13)

4.3 Final step in the proof of Theorem 1

WeusetheresultsofSections4.1and4.2 toproveTheorem1. Theconditions(38)and(46)can

berewrittenasanintervalforAsuchthat Properties1and2are bothsatised,

B+

2

~ (1

1 )

A

B 2

4

: (48)

A solutionexists onlyifthe r.h.s. isnotsmallerthan thel.h.s. in (48). Wewishto identify the

smallestvalue of A forwith a solutionexists. This occurs whenthe l.h.s. is equalto the r.h.s.

Solvingthequadraticequationin B gives

B =

4

~

(1+) with := 1+

p

1+

2 ~ (1

1 )

2(1

1 )

1 (49)

A =

B 2

4 =

4

~ 2

(1+) 2

: (50)

Finally, Theorem1 followsby setting theparameterÆ in (28)equalto the expression in (49),

whichgoestozeroin thelimitc

0

!1.

5 The expectation of the collective accusation sum

Aswasshownin Section4,theaveragecollectiveaccusation~playsacentralroleindetermining

thecodelengthmrequiredforcollusionresistance. Inthissectionwecomputethevalueof ~in

therestricteddigitmodel. (OtherattackmodelsarediscussedinSection7.2). Unfortunatelythe

computationsaretedious. Werstderiveageneralresultin Section5.1, forallalphabet sizesq,

allvaluesofthesteepnessparameterandallcolluderstrategies. Thisresulttakestheformofa

(q 1)-dimensionalsumoverallpossiblesymbolfrequenciesbreceivedbythecolluders. Then,in

Section5.2weinvestigatethespecialcase(q=2;= 1

2

),preciselycorrespondingtothechoiceof

parametersofTardos[9](but notthesameaccusationmethod). Itturnsoutthatoursymmetric

accusationmethodyieldsanimprovementofafactor4inthecodelength. InSection5.3westudy

thecaseq=2forarbitrary. It turns outthatfor q=2, thechoice = 1

2

isoptimal, aresult

thatwasobtainedfortheoriginalTardosconstructionin[7]. Finally,inSection5.4,wecomeback

tothenonbinarycaseq>2.

5.1 Sum representation of ~

According to the denition (17), ~ is dened as the expectation value E

yXp [A

(i)

C

]. We follow

the procedure outlined in Section 3.3: We rst compute the expectation value with respect to

the colluderstrategy, then w.r.t. the matrixX

C

and nally w.r.t. the vectorsp (i)

. Since itis

understoodthattheresultsareidentical foreachcolumnofX

C

,wewillomitthecolumnindexi

onthequantities y,pandbfornotationalsimplicity.

Weregardy asa(possiblystochastic)strategy-dependentfunctionofb=

(b

0 ;;b

q 1

)only. Thecolluders'strategycannotdepend onp,sincetheydonotknowp. We

assumethat is notinuencedby thecolluders'identities, i.e. theirdecisionsare purely based

onhowmanyinstancesofeachsymbolwerereceived,notbywhomtheywerereceived. Usingthe

notationintroducedin (15),wehave

E

y [A

(i)

C ]=

q 1

X

=0 P

b ()fb

g

1 (p

)+[c b

]g

0 (p

)g=

q 1

X

=0 P

b ()

b

cp

p

p

(1 p

)

: (51)

Nextweaverageoverbandp. Applying(13)and(10)to(51),weobtain

~ =

X

c

b

q 1

X

=0 P

b ()

Z

J(t;q) d

q

pF(p) q 1

Y

p b

b

cp

p

p

(1 p

)

(14)

Wefurther evaluatetheintegral R

d q

pfort =0. As discussedin Section 3.1, theerrorresulting

fromintegrationoverJ(0;q)insteadofJ(t;q)issmall. Furthermore,wewillseeinSection7.1that

setting t=0 isallowedfor q3in the Gaussian approximation. Firstwesplit theintegration

intotwoparts: p

andtheremainingq 1components

Z J(0;q) d q p= Z 1 0 dp Z 1 p 0 d q 1

pÆ(1 p

X 6= p ): (53)

Note thatthe upperbound onthe secondintegration intervalis reducedfrom1to 1 p

. This

prevents us from directly applying Lemma 1. For all 6= we write p

= (1 p

)s , with s

2(0;1)and P

6= s

=1. Thissubstitutionhasthefollowingeect,

Z J(0;q) d q p = Z 1 0 dp (1 p ) q 2 Z 1 0 d q 1 sÆ(1 X 6= s )

F(p) = N 1 q0 p 1+ (1 p )

( 1+)(q 1) Y 6= s 1+ q 1 Y =0 p b = p b (1 p ) c b Y 6= s b : (54)

HerewehaveusedthepropertyÆ(ax)=jaj 1

Æ(x)forconstanta6=0. Substituting(54)into(52)

andapplyingLemma1totheq 1degreesoffreedoms

weobtain ~ = N 1 q0 X b c b q 1 X =0 P b () Q 6=

(+b

)

(c b

+[q 1])

Z 1 0 dp p b 3 2 + (1 p ) c b 3 2 +[q 1] (b cp ): (55)

Finally,thep

-integralisevaluatedaswell,yieldingordinaryBetafunctions,

~ =

(q)

[ ()] q

cc!

(c+q) X b [ q 1 Y =0

(+b

)

(1+b

)

] (56)

q 1 X =0 P b () (b 1 2 +) (b +) (c b 1 2

+[q 1])

(c b

+[q 1]) 1 2 b c (1 q) :

Herewehaveused(8)forthenormalisationconstantN

q0

. Expression(56)israthercomplicated.

Onepropertyof (56)canbeseeneasily,however: Forc1,the leadingorder termsof~areof

order 1, and do notdepend on c. This is readilyseen by writingb

=cw

, with w

2[0;1],

then applying the Stirling approximation (x+1) p

2x(x=e) x

to all Gamma functions and

collecting powers of c. For the quotients of Gamma functions appearing in (56) we have the

proportionality (b +v 1 )= (b +v 2 ) /c

v1 v2

and (c b

+v

1

)= (c b

+v

2 ) /c

v1 v2 for constantsv 1 ;v 2

c. The sum P

b

givesriseto a factorc q 1

, sinceit canbeapproximatedby

an integral R c 1 d q bÆ(c P b )c

q 1 R 1 0 d q w Æ(1 P w

). The correctionsarising from the

summation terms where thecondition b

1doesnot hold are negligible,since thesupport is

negligiblecomparedto thefullsummation P

b .

Thefact that~hasanitevalueinthelimitc!1showsthattheasymptoticbehaviourof

(27)is givenbym/c 2

0

,withoutfurther dependence onc

0

arisingfrom.~

5.2 The case q =2, = 1

2

This case corresponds to the probability density function in the original Tardos construction,

F(p

0 ;p

1 )/(p

0 p 1 ) 1=2 Æ(1 p 0 p 1

). Notethatforq=2,= 1

(15)

in (56)vanishes. However,~doesnotcompletely vanish,sincefor(q =2;b

=c)the expression

(c b

1

2

+[q 1])is divergentinthelimit! 1

2

. Wehave

lim

!1=2 (

1

2 +) (

1

2

+)= lim

!1=2 (

1

2

+)=1: (57)

Hence, the only terms contributing in the b-sum in (56) are those where b

= c. Because of

themarkingcondition, P

b

()=1forthese terms,asthe coalitiononlysees thesymbol. The

complicatedexpression(56)reducestoaconstant:

~ =

(1)

[ ( 1

2 )]

2 1

X

=0 1=

2

: (58)

Substitution into(27,28)givesthefollowingasymptoticbound onthecodelength,

m> 2

c 2

0 dln"

1

1

e: (59)

This bound is4times lowerthanthe bound obtainedin [7] and 10times lowerthanthebound

in[9].

5.3 The case q =2, 6= 1

2

Nextwestudyhowthesymmetricbinaryschemeperformsfor6= 1

2

. Substitution of q=2into

(56)gives

~

=

(2)

[ ()] 2

( 1

2 )c

c 1+2 c

X

b1=0

c

b

1

B(b

1 1

2

+;c b

1 1

2 +)

1+ 2

c [b

1 P

b

(0)+(c b

1 )P

b (1)]

; (60)

where B denotes theBetafunction. From(60)wecanidentify which colluderstrategy forces

thecontentownerto usethelongestpossiblecode. Wedenotethis`extremal'strategyas

2 . We

remindthereaderthatm/~ 2

. Hence,inordertomaximizem,thestrategy

2

hastominimize

thesummandin(60)foreachb. Notethattheb

1

=0andb

1

=ccontributionstothesummation

arenotaectedbythestrategy. For1b

1

c 1theBetafunctionin (60)ispositive. Hence,

thefactor( 1

2

)infrontofthesummationdeterminestheoverallsignofthestrategy-dependent

contributions. For < 1

2

, thisfactorispositive,so thecolluderswishto minimizetheexpression

[b

1 P

b

(0)+(c b

1 )P

b

(1)]. Theyachievethisbychoosingthesymbolthatappearsmostfrequently,

i.e. byapplying`majorityvoting' tothe0sand1sthat theyreceiveinacolumn. For> 1

2 , the

factor 1

2

hastheoppositesignandtheextremalstrategy

2

isminorityvoting.

Notethat

2

isnotnecessarily thestrategythat thecoalitionactually applies. However,the

distributor has to takeinto accountthat thecolluders could be using

2

, and he has to choose

hiscodelengthmaccordingly. Weareinterestedinthis`extremal'strategybecauseouraimisto

deriveasharplowerbound onm.

Fig.1shows~asafunction offorthestrategy

2

. Thedashedlinecorrespondstothevalue

2=obtainedin theprevioussection. Itisclearthat = 1

2

istheoptimum. Attheoptimumwe

have~=2=,independentofc. Thepartofthecurvewith< 1

2

hardlydependsonc. Thepart

with> 1

2

becomessteeperwithincreasingc.

5.4 Non-binary alphabet

We now return to the general expression for ~ given in (56). We work in the restricted digit

(16)

0.35

0.4

0.45

0.5

0.55

0.6

0.3

0.35

0.4

0.45

0.5

0.55

0.6

κ

µ

~

2/

π

Fig.1: ~as afunction offorq=2,c=80, giventhe`extremal'strategy

2 .

Note that the sum P

P

b

()() in (56)represents an average over . Weobtain alower

boundforthesumfrom thefactthat anaverageisat leastasbigasthesmallestelementin the

summation. Thuswehave

~

(q)

[ ()] q

cc!

(c+q) X

b [

q 1

Y

=0

(+b

)

(1+b

)

] (61)

min

jb

6=0

(b

1

2 +)

(b

+)

(c b

1

2

+[q 1])

(c b

+[q 1])

1

2

b

c

(1 q)

:

As wehave assumed the restricteddigit model, the minimum is takenonly overthose symbols

thatthecolludershavereceived.

Eq.(61)allowsusto identify the `extremal'colluder strategy

q

, which forcesthe distributor

to use the largest code length m. For each b separately, the colluders choose such that the

expressionfollowing`min

'isminimized.

Forq 10 and axed coalition size c = 20wehave numericallycomputed ~ as afunction

of for the

q

strategy, i.e. taking the equality in (61). For large q and c the numerics are

computationallyexpensive,sincethenumberoftermsintheb-summationisoforderc q 1

. Fig.2

shows ~ as a function of the steepness parameter. For q 7the maximum of the curve lies

slightly to the right of = 1=q. For q 8 an extra hump is visible. The hump is a `nite c

eect';itdoesnotexistwhentheratioq=cissmall. Fig.3showshow~varieswhencisincreased:

Thepartof thecurveat <1=qis unaected, whilefor >1=q thecurvegoesdownwardand

convergestoanitevalue.

Weusethenumericalresultsfor~to estimatetherequiredcodelength(m/~ 2

). Wegive

estimatesfortheadvantagethataq-arycodegivesoverthesymmetricbinarycodewith=1=2.

Thecomparisonwiththebinarycasecanbedoneinseveralways,dependingonthedetailsofthe

watermarkembedding. Wegivethetwoextremecomparisonmethods:

1. Countingthenumberofsymbols. Aq-arysymboloccupiesasmuchspaceinthecontentasa

binarysymbol,regardlessofq. Fig.4showsthe

q arycase

binarycase

ratioforthenumberofsymbols.

Thisratioisgivenby4=( 2

~ 2

).

2. Countingthenumberofbits. Aq-arysymboloccupieslog

2

qtimesmorespaceinthecontent

thanabinarysymbol. Inthiscaseitisnotfairtocomparecodelengthexpressedinsymbols.

Onehastocountbits. Fig.5showsthe

q arycase

binarycase

ratioforthenumberofbits. Thisratiois

givenbylog q4=( 2

~ 2

(17)

0.1

0.2

0.3

0.4

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

c=20; q=3...10

κ

q=3

q=10

µ

~

Fig.2: ~asafunctionofforseveralalphabetsizesq. Thecoalitionsizeisc=20. Thecolluders

employthe`extremal'strategy.

0.15

0.2

0.25

0.3

0.35

0.4

0.8

0.9

1

1.1

q=6; c=20,30,40,50,60,70,80

κ

c=20

c=80

µ

~

1/q

Fig.3: ~asafunctionofforq=6atseveralcoalitionsizes. Thecolludersemploythe`extremal'

strategy. Thedashedhorizontallineliesat(2=) p

log

2

6. When~liesabovethisline,the

(18)

0.1

0.2

0.3

0.4

0.2

0.4

0.6

0.8

1.0

c=20; q=3...10

κ

relative #symbols

q=3

q=10

Fig.4: Number of symbols in the codewords, relative to the binary case, for several alphabet

sizesq. Thecoalitionsize isc=20. Thecolludersemploythe`extremal'strategy.

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.6

0.8

1.0

1.2

1.4

c=20; q=3...10

κ

relative #bits

q=3

q=10

Fig.5: Numberofbits in thecodewords,relativeto thebinary case,forseveral alphabet sizesq.

(19)

Type1isthemostoptimisticcomparisonpossible,in thesensethat itallowsforthelargest

im-provementsw.r.t. thebinaryscheme. Type2comparisonisthemostpessimisticpossible. Without

givingafullargument,westatethatinthecaseofvideowatermarkingtype1ismoreappropriate,

evenforlargealphabets. When,forinstance,symbolsareembeddedusingaspread-spectrum

wa-termark, whereeach spreadingsequence correspondsto adierentsymbolin thealphabet, then

thesegmentlengthcanbekeptalmostindependentofqwithoutdecreasingdetectionperformance.

Forcompletenesswegivetheresultsforbothcomparisons. ThehorizontaldottedlineinFig.2

indicates the threshold for comparison of type 1. When ~ risesabove this threshold, the q-ary

schemeneedsfewersymbolsthanthebinaryscheme. Thethickpieceofeachcurveindicates the

regionwhere the q-aryschemeis better thanthe binary, using comparisontype2. Fig. 4shows

thecodelengthm(thenumberofsymbols)asafunctionof,foranumberofqvalues,andFig.5

similarlyshowsmlog

2

q,thenumberofbits. Bothgraphshavetheirverticalaxisnormalisedsuch

thatlengthsaredividedbycorrespondinglengthsinthebinaryscheme. Inbothgraphsthenite-c

humpsarevisible. Not takingthehumpsintoaccount,weseethat for3q10thenumberof

symbolsisreducedby40%{80%w.r.t. thebinarycase,whilethereductioninthenumberofbits

is 11%{30%. Finite-c eects further improvethese results. Weconclude that in our symmetric

schemeitisadvantageoustousethelargestpossiblealphabetallowedbythewatermarkingmethod

employed.

6 The Gaussian approximation

6.1 Motivation

In this section we analyse the performance of the symmetric scheme using what we call the

`Gaussianapproximation'. Bythiswemeantheassumptionthat theaccusationA

j

forinnocent

j hasaGaussianprobabilitydensityfunction. Theassumptionismotivated bytheCentralLimit

Theorem(CLT):whenalargenumberofi.i.d. variablesaresummed,thedistributionofthesum

converges to the normal distribution. The CLT applies when the moments of the summands'

distribution meetcertainconditions. Themomentsalso determinetherateofconvergenceto the

normalform.

Theaccusation A

j

is computed by taking thesum overm independent accusations, each of

which is based on a single symbol y

i

in the unauthorized copy. All the separate accusations

havethesameprobabilitydistribution. Thenumberofsymbols,m,islargeenoughtoguarantee

`suÆcientlyfast'convergenceto thenormalform. This informalstatementismade moreprecise

in Appendix C, where we derivea lowerbound on c

0

asa function ofq. Whenc

0

isabovethis

bound,the deviations from thenormal form become `small enough'in the central region of the

A

j

-distribution function. It turns out that the bound approximatelylies betweenc

0

=10 and

c

0

=20. Henceconvergenceisfastenoughin manypracticalsituations.

InSection6.2weanalysethesymmetricschemeundertheassumptionthatA

j

hasaGaussian

distribution. Weobtainalowerboundonmthatisafactor2smallerthanTheorem1.

Inthediscussionof theCLTin Appendix Citturns outthat forq3thecutoparameter

t can be sent to zero without causing any divergences. The cuto parameter is discussed in

Section7.1.

6.2 Lower bound on the code length

Theorem 2: Let A

j

have a Gaussian probability density. For all Æ > 0 there exists a

suÆciently large c

0

suchthat Property 1 issatised for parameter "

1

andProperty 2is satised

for parametersc

0 ;"

2

whenthe code lengthis

m>2~ 2

(1+Æ)c 2

0 ln"

1

1

: (62)

Notethatthiscodelengthisafactor2lowerthantheoneinTheorem1. Werstgiveaninformal

(20)

µ

m

1/2

/c

~

1

Z/m

1/2

σ

~

/c

ε

1

ε

2

0

Fig.6: Sketchof theprobabilitydensityofA

j =

p

m(left) and 1

c A

C =

p

m(right). Theaccusation

thresholdZ andtheerrorrates"

1 and"

2

arealsoshown.

Informalargument: IftheprobabilitydensityofA

j

isknown,then that knowledgeallowsus

tocomputetheFPandFNerrorratesasafunctionof~

j ,~

j

,,~ ,~ mandZ. Thisissketchedin

Fig.6. TheleftcurveistheprobabilitydensityofthequantityA

j =

p

m. Ithasmean~

j

=0and

variance~

j

=1(seeLemmas 2and3). Theerrorrate"

1

is givenbytheareatotherightofthe

(rescaled)thresholdZ = p

m. Therightcurveistheprobabilitydensityofthequantity 1

c A

C =

p

m.

It has average 1

c ~ p

m and variance =c.~ The error rate "

2

is given by the area to the left of

Z = p

m. Thehorizontalaxisisscaled such thattheA

j

-curvedoesnotdepend oncand m. Ifwe

set ourselvesthe goalof having xed error ratesfor arbitrary c, twoobservations canbe made

fromFig.6:

Inordertohaveaxed"

1

forallc,thethresholdlineZ = p

mmustnotshift. HenceZ must

bechosenasZ/ p

masfarasthedependence oncisconcerned.

Whenc increases,therightmostcurvebecomesnarrowerandshiftstotheleft. Inorderto

prevent"

2

fromvanishing,mmustbechosenproportionalto c 2

.

Fromthisinformalargumentweobtaintheproportionalitym/c 2

0

in(62),butnottheconstants

andthelogarithmicdependence on"

1 .

ProofofTheorem2: Let

1 and

2

bethedensityfunctionsofA

j andA

C

,respectively,rescaled

suchthattheybothhavezeromeanandunitvariance. Wedenecumulativedistributions inthe

tails,

G

1 (x)=

Z

1

x dx

0

1 (x

0

) ; G

2 (x)=

Z

x

1 dx

0

2 (x

0

): (63)

Lemma 6: Inordertoachieve aFalsePositive errorrate"

1

andaFalse Negativeerrorrate

"

2

againstany coalition of sizecc

0

,itissuÆcienttoset thecode lengthm accordingto

mc 2

0

~

j

~

G inv

1 ("

1 )

2

1 ~

c

0 ~

j G

inv

2 ("

2 )

G inv

1 ("

1 )

2

(21)

Proof: Theproofis completelyanalogoustothederivationinSection 3.4of[7].

NotethatG inv

2 ("

2

)<0. Notefurther thatthedependence of(64)on"

2

vanishesin thelimit

c

0

!1. (Rememberthat~

j

=1accordingtoLemma 3and that~=O( p

c

0

)asaconsequence

ofLemma4). IfA

j andA

C

haveGaussiandistributions,thenG

1 andG

2

areerrorfunctions 2

and

we have G inv 1 (" 1 ) = p 2Erfc inv (2" 1 ), G inv 2 (" 2 ) = p 2Erfc inv (2" 2

). Substitution into (64), using

~

j

=1,gives

m 2 ~ 2 c 2 0 h Erfc inv (2" 1 ) i 2 ( 1+ ~ c 0 Erfc inv (2" 2 ) Erfc inv (2" 1 ) ) 2 : (65)

Intheregime"

1 "

2

,whichistherelevantregimefore.g. moviedistribution,thedependenceof

(65)on"

2

isratherweakevenfornitec

0

,sinceErfc inv

("

2

)<Erfc inv

("

1

). Weusetheasymptotic

form of theinverse errorfunction forsmall arguments,Erfc inv (")= p ln" 1 [1 O( lnln"

1 ln" 1 )], to write m 2 ~ 2 c 2 0 ln 1 " 1 1 O( lnln"

1 1 ln" 1 1 ) (

1+O 1 p c 0 s ln" 1 2 ln" 1 1 !) : (66)

Forlargec

0

theresult(62)follows 3

.

Henceforlargeenoughc

0

acodelengthm=2~ 2 c 2 0 ln" 1 1

suÆces. Thisisshorterbyafactor

2thantheresultobtainedin Section4.

7 Discussion

7.1 The cuto parameter t

In this sectionwediscuss the the eects of thecuto t =Tc a

0

introduced in Section 2.2. The

probabilitiesp

liein therestrictedinterval[t=(q 1);1 t]. It isclearfrom Section 4that the

presented proofof Theorem 1doesnotworkfor t=0. InthelimitT #0, theallowed intervals

fortheauxiliaryvariables

1 and

2

(34,42)vanish,whilebothintervalsneedtobeniteforthe

proofthatProperties1and2aresatised.

ThespeedoftheconvergencetotheasymptoticresultA=4=~ 2

,B=4=~dependsontheway

in which the parametersa 2(0;2) and T are chosen. The small parameters

1 and

2

(45,47)

asymptoticallybehaveas

1 1:7 q ~ p T c a=2 0 ; 2 ln" 2 ln" 1 1 1:7 p Tc 1 a=2 0 : (67)

Furthermore,condition(36),necessaryforProperty1tohold,canbewrittenas

c 0 ' q 1 T ( ~ 3:4 ) 2 1=(2 a) : (68)

Forpracticalreasons, wewish both

1 and

2

to becomesmall at areasonablylow valueofc

0 ,

whilethebound(68)alsoshouldnotbetoohigh. However,inthelimitT #0,boththec

0 -bound

(68) and the expression for

2

in (67) diverge. Hence, when t tends to zero, the approach of

Sections4.1{4.3,basedontheMarkovinequality,canproveProperties1and2onlyforextremely

largec

0 .

TheroleofthecutotiscompletelydierentintheanalysisusingtheGaussianapproximation.

2

To avoid ambiguitiesdueto conicting denitions inthe literature, wementionthat we usethe denition

Erfc(x)=1 (2= p ) R x 0 e u 2 du. 3

ForprovingTheorem2wedonothavetoassumethatA

C

hasaGaussianform.The"2-termin(64)vanishes

forallfunctionsG inv

2

becauseofthefactor~=c0=O(1= p

c0). However,thecomputationofA

C

involvesevenmore

summedcontributionsthanA

j

,soitissafetoassumethatwhenA

j

isGaussian,thenA

C

(22)

Thecaseq=2. Itwasshownin[7]fortheoriginalTardosschemethattheCLTcanonlybe

applied ift>0. Theprobabilitydistribution oftheaccusation U

i

(for innocent users)due

to thesymboly

i

isproportionalto1=(1+U 2

i )

2

. The3rdmomentiszero. Fordistributions

with vanishing 3rd moment, the CLTonly holds when the 4thmoment does not diverge.

However,for t=0the4th momentdoes diverge. Hence we needt>0. Exactly thesame

reasoningappliestothesymmetricschemewithq=2.

Thecaseq3. For q 3, the 3rd moment of the probability distribution of A (i)

j (for

innocentj)isalwaysnonzero,nomatterwhatthevalueoftis. ThisisshowninAppendixC,

Eq.(74). HencetheCLTappliesevenifwesett=0. IntheGaussianapproximation,there

isnoreasontohaveacutotforq3.

7.2 Dierent attack models

Uptothispointwehaveonlyconsideredtherestricteddigitmodel. However,itiseasytoobtain

resultsfortheother attackmodelslistedin Section1.2. As canbeseenfrom (64),thebound on

thecodelengthisproportionalto~ 2

j =~

2

. 4

Thedierencesbetweenthevariousattackmodelsgive

rise to dierent values ~

j

, ,~ but the form (64) is independent of the attack model. Hence, in

ordertosee thedierencesbetweentheattackmodels, itissuÆcientto comparetheratio~

j =~.

The unreadable digit model is discussed in Appendix A. It is assumed that the colluders

output anerasuresymbol`?' whenever theycan,and that thedistributor giveszeroaccusation

to locationswith anerasure. It turns outthat forlarge alphabets (q '7) the colluderstrategy

ofoutputtingerasuresisgood,and thedistributorhastouselongercodesthanin therestricted

digit model. However,for small alphabets it isbetter forthecolluders notto usean erasureat

eachdetectableposition,asa`?' informsthedistributorthat thepositionisdetectable.

Results for the arbitrary digit model are derived in Appendix B. Unsurprisingly, with this

attack model a nonbinary scheme always performs worse than the symmetric binary scheme;

the colluders have ample opportunity to incriminate innocent users while avoiding accusation

themselves.

8 Summary

Inthis paperwehaveproposed anewconstructionforarandomizeddigitalngerprintingcode,

whichissimilartoarecentconstructionbyTardosbutcanbeusedwitharbitrarysizealphabets.

Wehaveanalyzedtheperformanceofourscheme,in therestricteddigitmodel,in twoways.

First,wehaveprovedalowerboundonthecodelengthmsuchthatthedesiredFalsePositive

andFalseNegativeerrorprobabilitiesareachievedagainstanycoalitionofsize cc

0

. Due toa

dierentwayofcomputingaccusations,theproposedcodeallowsfor10timesshortercodes(with

respectto [9])inthecaseofabinaryalphabet. Movingto acodeoveraq-aryalphabetallowsa

furtherreductionofthecodelengthof35%atq=3and80%at q=10.

Second,wehaveanalyzedourschemeundertheassumptionthattheaccusationsumA

j follows

aGaussian distribution. This `Gaussianapproximation'is valid at coalitionsizes c

0

of

approxi-mately 10{20and larger. Wehaveshown that,in this approximation,thecollusionresistanceof

theschemeisretainedforacodelengthmthat istwiceasshortasthebound obtainedusingno

assumptions.

Acknowledgements

WekindlythankJoopTalstraandHenkHollmannforvaluablecomments,andGuidoJanssenfor

providingus withliteraturereferences.

4

Theorem1isformulatedafterthesubstitution~!1hasbeendone. IfapplicationofLemma3ispostponed

intheanalysisinSection4,wegettheproportionalityto~ 2

(23)

References

[1] G.E. Andrews, R. Askey, and R. Roy. Special Functions. EncyclopediaofMathematics and

itsApplications.CambridgeUniversityPress,1999.

[2] M.Z. Bazant. Random walks and diusion. Technical report, MIT Lecture notes, http:

//www-math.mit.edu/18.366/lec/lec04.pdf,2005.

[3] D.BonehandJ.Shaw.Collusion-securengerprintingfordigitaldata. IEEETransactionson

Information Theory,44(5):1897{1905,1998.

[4] L. Devroye. Non-Uniform Random Variate Generation. Springer-Verlag, 1986. Available

onlineat http://cg.scs.carleton.ca/luc/rnbookindex.html.

[5] H.D.L. Hollmann,J.H.vanLint,J-P.Linnartz,and L.M.G.M.Tolhuizen. Oncodeswith the

identiableparentproperty. JournalofCombinatorial Theory,82:472{479,1998.

[6] C. Peikert, A. Shelat, and A. Smith. Lower bounds for collusion-secure ngerprinting. In

Proceedingsofthe14thAnnualACM-SIAMSymposiumonDiscreteAlgorithms(SODA),pages

472{478,2003.

[7] B.

Skoric,T.U.Vladimirova,M.Celik,andJ.C.Talstra.Tardosngerprintingisbetterthanwe

thought. Technicalreport,Submitted toIEEETransactionsonInformationTheory.Preprint

at arXivrepository,http://www.arxiv.org/abs/cs.CR/0607131,2006.

[8] J.N. Staddon, D.R.Stinson, andR. Wei. Combinatorial properties of frameproofand

trace-abilitycodes. IEEETransactions on InformationTheory, 47(3):1042{1049,2001.

[9] G. Tardos. Optimalprobabilistic ngerprintcodes. InProceedings of the 35th Annual ACM

SymposiumonTheoryof Computing(STOC),pages116{125,2003.

A Unreadable digit model

In this appendix weconsider the case of the unreadabledigit model. Inthis attack model, the

colludersareallowedtooutput theerasuresymbol`?' in detectable positions. Forsimplicitywe

maketwo assumptions: (i) The colluders generate an erasure whenever they can, and (ii) The

distributorgiveszeroaccusationin caseofanerasuresymbol.

Thequantities~

j

and ~ arebothaected by these assumptions. Theyare easilycomputed,

sinceall detectable positions (leading to `?') are discardedby the distributor. This leavesonly

theundetectablepositions,characterizedbyvectorsbthat consistofq 1zerocomponentsand

onecomponentequaltoc. Wehave

~ 2

j =

q 1

X

=0 Z

J(0;q) d

q

pF(p)p c

=q

(q) (c+)

() (c+q)

(69)

and

~ =

q 1

X

=0 Z

J(0;q) d

q

pF(p)p c

cg

1 (p

)=cq

(q) (c 1

2 +) (

1

2

+[q 1])

() ([q 1]) (c+q)

: (70)

Recallfrom (64)thatthe requiredcodelengthis proportionalto~ 2

j =~

2

. Using(69)and (70)we

obtain

~ 2

j

~ 2

= 1

qc 2

(c+) (c+q)

[ (c 1

2 +)]

2 ()

(q)

([q 1])

( 1

2

+[q 1])

2

c

1+[q 1]

q

()

(q)

([q 1])

( 1

+[q 1])

2

References

Related documents

In conclusion the use of the 1-norm and the Total Variation are valid tools in the formulation of the minimization problem for the image reconstruc- tion resulting from

10G Ethernet Links Host (UDP) PHY PHY SRAM Controller Datagram Session IP Processing Host Application Algo-Logic C++ API Networking Stack OS Kernel Broker Exchange FPGA

In this work, we have studied the possibility to use, in realistic VANET urban scenarios, the available bandwidth estimator (ABE) in the AODV routing protocol (called AODV- ABE),

The Annual Property Review (APR) goes beyond overseeing normal maintenance. A Silver State Realty &amp; Investments representative performs the APR, and the purpose is to provide

81 The petition asked to cancel the decision from the date 31/1/2012, where the “citizens” of South Sudan were demanded to leave Israel by the April 1, 2012; to extend the issuance

The matrix Box-Cox model of realized covariances (MBC-RCov) is based on transformations of the covariance matrix eigenvalues, while for the Box-Cox dynamic correlation

The only difference among the Internet patent subtypes was that Internet business methods are much more likely to cite popular press as non-patent prior art than Internet

II – SWD, females and males, and Diptera collected in traps during the monitoring period, May-June 2013, on sweet cherry orchards in Lari (Pisa) areas.... tion/fruit between 0.2 and