Explanation of the algorithm - Fingerprinting codes for databases/hashtables


7.1.2 Explanation of the algorithm

To motivate the steps of the tracing algorithm above, this section provides some explanations and an example.

Preparation: Figure 9 illustrates the creation of the ranking vector V with the example of an 8-bit fingerprint. The probability vector p gives a probability of p_4 = 0.01 for a '1' in position i = 4. In the same position, the attacked fingerprint y = (y_1, ..., y_8) contains y_4 = 1. By equation (9), k(4) is therefore 0.01 as well. Because this value of k is the closest to zero, the 4th column of the fingerprint matrix X is ranked highest and becomes the first entry of the ranking vector V.
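The preparation step can be sketched in Python with the values of figure 9. Equation (9) is not reproduced in this section, so the sketch assumes k(i) = p_i if y_i = 1 and k(i) = 1 - p_i otherwise, which agrees with the k values shown in figure 9. The function name `ranking_vector` is hypothetical.

```python
def ranking_vector(p, y):
    """Return (k, V): k(i) is the probability of the symbol observed in y at
    position i, V lists the 1-based positions sorted by ascending k."""
    # Assumed form of equation (9): probability of the symbol actually seen in y.
    k = [p_i if y_i == 1 else 1.0 - p_i for p_i, y_i in zip(p, y)]
    # Positions with the smallest k are ranked highest and come first.
    V = sorted(range(1, len(k) + 1), key=lambda i: k[i - 1])
    return k, V


p = [0.9, 0.8, 0.2, 0.01, 0.3, 0.6, 0.99, 0.9]
y = [1, 0, 0, 1, 0, 1, 1, 0]
k, V = ranking_vector(p, y)
print(V)  # [4, 8, 2, 6, 5, 3, 1, 7]
```

The printed order matches the vector V of figure 9, with column 4 considered first.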

Following the example of figure 9, we focus on the 4th column of the 7 × 8 fingerprint matrix X in figure 10. The symbols in the third and fifth rows, i.e. in fingerprints 3 and 5, equal the corresponding symbol in the attacked fingerprint y of figure 9. Hence the row indexes 3 and 5 are the entries of the first row of the ranking matrix R. Which index is sorted first depends on the accusation scores: the fingerprint with the larger accusation score is more suspicious and is therefore placed first.

Note that R is not an ordinary matrix: it is a list of vectors (rows), each containing indexes of the corresponding rows of X. Each row may contain a different number of entries.
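A minimal sketch of how R could be built as a list of lists is given below. Only column 4 of X is taken from figure 10; all other entries of X and the accusation scores are hypothetical placeholders, and the function name `ranking_matrix` is an assumption of this sketch.

```python
def ranking_matrix(X, y, V, scores, rows):
    """Build the first `rows` rows of R as lists of 1-based fingerprint
    indexes. Rows may have different lengths."""
    R = []
    for i in V[:rows]:
        # Fingerprints whose symbol in column i equals the attacked symbol y_i.
        matches = [j for j in range(1, len(X) + 1)
                   if X[j - 1][i - 1] == y[i - 1]]
        # The fingerprint with the larger accusation score is sorted first.
        matches.sort(key=lambda j: scores[j - 1], reverse=True)
        R.append(matches)
    return R


col4 = [0, 0, 1, 0, 1, 0, 0]                   # column 4 of X from figure 10
X = [[0, 0, 0, c, 0, 0, 0, 0] for c in col4]   # other entries are placeholders
y = [1, 0, 0, 1, 0, 1, 1, 0]                   # attacked fingerprint of figure 9
V = [4, 8, 2, 6, 5, 3, 1, 7]                   # ranking vector of figure 9
scores = [0.0, 0.0, 1.5, 0.0, 2.5, 0.0, 0.0]   # hypothetical accusation scores
R = ranking_matrix(X, y, V, scores, rows=1)
print(R)  # [[5, 3]]
```

A list of lists is used rather than a two-dimensional array because the rows of R generally differ in length.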

The number of rows of R, i.e. the number of positions of V that are considered, depends on the demands of the actual distributor and hence on the code length, the number of fingerprints and the error probabilities. It also depends on the attack strategy the colluders have chosen.

Core tracing algorithm: For a small number of fingerprints in the fingerprint matrix X, the first rows of R frequently consist of only one entry each. This is due to the biased generation of the fingerprints, see chapter 4.5. The closer the probability value p_i is to zero or one, the higher the probability that all entries but one of the corresponding column i of X carry the same symbol. Obviously, the fewer entries a column has, the higher the probability of this event. In such an event, i.e. if there is a column in X consisting of, e.g., one '0' while all other entries are '1's, and the corresponding position in y carries a '0' as well, the corresponding fingerprint must be a colluder-fingerprint. These fingerprints can be output directly by the tracing algorithm with a zero false positive error rate.

This is because the probability density function is required to have a strong bias towards 0 and 1, as in equation (5). The probability that one symbol differs from all others in one specific position is very low; on the other hand, the probability that this happens at least once over the whole length of the fingerprint is comparatively high.
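The contrast between "very low per position" and "comparatively high over the whole length" can be illustrated numerically. The sketch below assumes that every entry of column i is drawn independently and equals '1' with probability p_i, and it uses a hypothetical code with 100 identical positions; neither the function names nor the numbers are taken from the source.

```python
import math


def p_single_outlier(p_i, n):
    """Probability that exactly one of n independent column entries differs
    from all the others (one '1' among '0's or one '0' among '1's)."""
    if n < 3:
        raise ValueError("the two cases are only disjoint for n >= 3")
    return n * p_i * (1 - p_i) ** (n - 1) + n * (1 - p_i) * p_i ** (n - 1)


def p_at_least_one(p, n):
    """Probability that at least one position has a single deviating entry."""
    return 1 - math.prod(1 - p_single_outlier(p_i, n) for p_i in p)


n = 7                                   # number of fingerprints, as in figure 10
per_position = p_single_outlier(0.01, n)
overall = p_at_least_one([0.01] * 100, n)
print(per_position, overall)            # a few percent versus close to one
```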

The larger the collusion, the more fingerprints can lead to such an event, i.e. the higher this probability becomes. This means that if the colluders chose the minority vote attack strategy to create the manipulated copy with attacked fingerprint y, a symbol may occur that leads directly and unambiguously to one of the colluder-fingerprints. This also happens frequently in the case of a random attack strategy.
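The direct identification described above amounts to collecting the rows of R that hold exactly one index. The following sketch uses a hypothetical ranking matrix and the hypothetical function name `single_entry_accusations`.

```python
def single_entry_accusations(R):
    """Return the fingerprint indexes that are the only entry of some row of
    R, in order of first appearance and without duplicates."""
    accused = []
    for row in R:
        if len(row) == 1 and row[0] not in accused:
            accused.append(row[0])
    return accused


R = [[5], [5, 3], [2], [1, 2, 5], [5]]   # hypothetical ranking matrix
print(single_entry_accusations(R))       # [5, 2]
```

A row with a single entry means that only one fingerprint carries the symbol observed in y at that position, which is why such an index can be output without a false positive.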

Remember that in a real watermarking application the colluders are not able to simply choose a symbol; they can only compare the values of frequency bands, pixel values, etc., and choose among the corresponding values to create the forgery. The resulting manipulated fingerprint y is therefore not categorically altered in all detectable positions. This fact increases the probability of a direct identification of colluder-fingerprints after the first step, 1: Single entries of R; consequently, this example actually mirrors a realistic scenario.

p = 0.9   0.8   0.2   0.01  0.3   0.6   0.99  0.9
y = 1     0     0     1     0     1     1     0
k = 0.9   0.2   0.8   0.01  0.7   0.6   0.99  0.1
V = 4     8     2     6     5     3     1     7

V(1) = 4: column 4 has the highest ranking and is considered first.

Figure 9: Example for creation of the ranking vector V

Column i = 4 of X(j,i), rows j = 1, ..., 7: 0 0 1 0 1 0 0 (the entries in rows j = 3 and j = 5 equal y_4 = 1; all other columns are omitted).
R(1) = (5, 3): if fingerprint 5 has a greater accusation score than fingerprint 3, it is more suspicious and is listed first. The remaining rows of R are omitted.

Figure 10: Example for creation of the ranking matrix R

On the other hand, if the colluders followed an attack strategy similar to the majority vote attack, a direct zero false positive output after the first step is not possible, because the majority vote decision by construction prevents the events described above. However, for the same reason, i.e. because the majority of colluders contribute to y in every position, the first rows R(1), R(2), ... of R are likely to include all colluder-fingerprints. According to step 2: Indexes in each row of R, if the fingerprint that appears most often in R also has the highest accusation score, this fingerprint can be output as a participant in the collusion with very high probability. Note again that, because of the small distance to the attacked fingerprint y and the large accusation scores that a strategy similar to the majority vote attack brings about, we assume that most attackers avoid this strategy.

As explained in the preceding paragraphs, the first and second steps are able to output at least one colluder-fingerprint with high probability. The next steps concern the cases in which tracing one colluder (or a few) is not sufficient, or in which no clear identification of a colluder-fingerprint has been possible up to this point. A closer look at R discloses the following. It is guaranteed that at least one index of a colluder-fingerprint appears in each row of R. Hence, for a majority vote attack strategy, the first rows of R are likely to include the indexes of all colluder-fingerprints. Consequently, following step 3: Reconstruct the collusion, rebuilding y from the corresponding fingerprints is possible even for larger collusion sizes.
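Steps 2 and 3 can be sketched as follows. The exact procedures are defined with the tracing algorithm itself, so this is only one possible reading: step 2 is taken as "output the index that occurs most often in the first rows of R if it also has the highest accusation score", and step 3 as a check that every symbol of y occurs in at least one candidate fingerprint at the same position. All function names and all data are hypothetical, and the toy R is independent of the toy X.

```python
from collections import Counter


def most_frequent_suspect(R, scores, rows):
    """Step 2 sketch: return the 1-based index occurring most often in the
    first `rows` rows of R if it also has the highest accusation score,
    otherwise None."""
    counts = Counter(j for row in R[:rows] for j in row)
    ranked = counts.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # no unique most frequent index
    top = ranked[0][0]
    best = max(range(1, len(scores) + 1), key=lambda j: scores[j - 1])
    return top if top == best else None


def can_rebuild(X, y, candidates):
    """Step 3 sketch: True if every symbol of y occurs in at least one
    candidate fingerprint at the same position."""
    return all(any(X[j - 1][i] == y_i for j in candidates)
               for i, y_i in enumerate(y))


# Hypothetical toy data: 4 fingerprints of length 4.
X = [[1, 0, 1, 0],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [0, 1, 0, 1]]
y = [1, 0, 1, 1]
R = [[1, 2], [1], [1, 4]]          # hypothetical first rows of R
scores = [3.0, 2.0, -1.0, 0.5]     # hypothetical accusation scores

print(most_frequent_suspect(R, scores, rows=3))  # 1
print(can_rebuild(X, y, [1, 2]))                 # True
print(can_rebuild(X, y, [3, 4]))                 # False
```

Returning None when the counts tie or when the most frequent index does not have the top score keeps the sketch conservative: it only names a fingerprint when both criteria agree.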

Table 8: False positive (FP) and false negative (FN) results for c₀ = 3 after step Single entries of R for minority vote attack and random vote attack

code length   attack strategy   FP   FN (T = {∅})   T = {c₁}   T = {c₁, c₂}   T = {c₁, c₂, c₃}
135           minority vote     0    20.5 %         43.4 %     29.5 %         6.4 %
              random vote       0    45.6 %         41.2 %     12 %           1.1 %
100           minority vote     0    26 %           44.8 %     24.7 %         4.3 %
              random vote       0    51.5 %         38.4 %     9.2 %          0.7 %

Optionally, a deeper successive search through R can be applied in step 4: Successive search through R. A closer look at R reveals options to further reduce the number of fingerprints to be considered by subsequent tracing algorithms. If one or more colluder-fingerprints have been found during the first step(s), we compare y to the exposed colluder-fingerprints and create the new vector y' by discarding all positions in which the exposed fingerprints and y are equal. This new vector determines which rows of R may be discarded to generate R' (namely those generated from a position that no longer appears in y'). In addition, y' is the new vector against which we compute new accusation scores. In doing so, we correct the prior accusation scores by removing accidentally high contributions that were assigned to fingerprints that did not take part in the collusion. All in all, we now only consider the positions of the attacked fingerprint y that the already found colluder-fingerprints are not responsible for. The size of R is further reduced, the number of fingerprints to be considered is reduced accordingly, and the fingerprints within R' obtain accusation scores that are likely to lead to the remaining colluder-fingerprints.
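The reduction of step 4 can be sketched as below. The sketch reads "positions where the exposed fingerprints and y are equal" as positions in which at least one exposed fingerprint matches y, and it assumes that row r of R was generated from position V(r). The recomputation of the accusation scores is left out because the score formula is not part of this section. The function name and the toy data are hypothetical.

```python
def reduce_search(X, y, V, R, found):
    """Return (kept, R_reduced): the 1-based positions of y that no exposed
    fingerprint explains, and the rows of R generated from those positions."""
    # Discard every position where an exposed fingerprint equals y.
    kept = [i for i in range(1, len(y) + 1)
            if all(X[j - 1][i - 1] != y[i - 1] for j in found)]
    kept_set = set(kept)
    # Row r of R belongs to position V[r]; drop rows of discarded positions.
    R_reduced = [row for pos, row in zip(V, R) if pos in kept_set]
    return kept, R_reduced


# Hypothetical toy data: 4 fingerprints of length 4.
X = [[1, 0, 1, 0],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [0, 1, 0, 1]]
y = [1, 0, 1, 1]
V = [4, 2, 1, 3]                          # hypothetical ranking vector
R = [[2, 4], [1, 2], [1, 3], [1, 2]]      # rows generated from positions V
kept, R_reduced = reduce_search(X, y, V, R, found=[1])
print(kept, R_reduced)                    # [4] [[2, 4]]
```

In this toy case fingerprint 1 explains positions 1 to 3, so only position 4 and the row generated from it remain, and only fingerprints 2 and 4 need to be scored again.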

These steps either reduce the number of fingerprints to be considered or, in many cases, already disclose one or more of the colluder-fingerprints. If rebuilding y is still too costly, another, usually cost-intensive, tracing algorithm can be added at this point. The effort for this algorithm should by now be reduced enormously, because it has to consider considerably fewer fingerprints thanks to the already filtered suspects.

In document Collusion Secure Fingerprint Watermarking (Page 83-85)