Watermarking using Imperceptible Trigger Patterns

[Figure 7.2 panel labels: message mark, Regular DNN, Watermarked DNN; example class labels: 4: deer, 6: frog, 7: horse, 9: truck.]

Figure 7.2: Overview of the proposed example watermarking technique for DNN-based image classifiers.

In this section, we present an exemplary approach to watermarking DNN image classifiers. It is by no means the best approach possible under the proposed framework; in designing it, we made straightforwardness our top priority. At a high level, the approach works as follows. Alice creates a trigger pattern of the same size as the input images using her signature and embeds it into images. The trigger pattern is so imperceptible that regular DNNs classify the image to its true class regardless of whether the trigger pattern has been added. A DNN watermarked by Alice, however, is able to recognize images embedded with the trigger pattern and classifies them to a class different from the original true class. Figure 7.2 depicts the idea.

We now explain the method in more detail. We start by introducing the procedures for embedding and detecting a watermark. We then break the embedding procedure down into its steps and present each in turn.

7.3.1 Workflow

To embed a watermark, the model owner Alice would need to do the following:

1. Create an n-bit signature.
2. Create a trigger pattern m of suitable magnitude α based on the n-bit signature.
3. Calculate the mapping for class labels based on the n-bit signature.
4. Fine-tune an existing model f. While fine-tuning, we use, with equal probability, both the original dataset and the dataset containing trigger images and remapped labels.

After the training converges, the resulting model classifies an input image to which the trigger pattern has been added to a different label, as given by the mapping.

To detect a watermark, the model owner Alice would need to do the following:

1. Take a set of images.
2. Create the trigger pattern m of magnitude α based on the n-bit signature. Add the trigger pattern to all the images.
3. Calculate the mapping for class labels based on the n-bit signature.
4. Take the watermarked model f_WMK and run classification on the images both with and without the trigger pattern m.

If f_WMK classifies the original images to the correct labels and the trigger images to the correctly mapped labels, within a certain margin of error, then we show that f_WMK carries Alice's watermark.
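The detection check above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model is represented by a hypothetical callable `predict`, the trigger embedding by a hypothetical callable `embed_trigger`, and the "margin of error" by a hypothetical accuracy threshold `min_accuracy`, whose value the text does not specify.

```python
def detect_watermark(predict, images, labels, embed_trigger, label_map,
                     min_accuracy=0.9):
    """Return True if the model behaves like a watermarked one.

    predict:       callable, image -> predicted label (the model under test)
    embed_trigger: callable, image -> image with the trigger pattern added
    label_map:     sequence where label_map[y] is the remapped label of y
    min_accuracy:  hypothetical threshold standing in for the margin of error
    """
    n = len(images)
    # Original images should still be classified to their true labels.
    clean_hits = sum(predict(x) == y for x, y in zip(images, labels))
    # Trigger images should be classified to the remapped labels.
    trigger_hits = sum(
        predict(embed_trigger(x)) == label_map[y]
        for x, y in zip(images, labels)
    )
    return clean_hits / n >= min_accuracy and trigger_hits / n >= min_accuracy
```

A regular model is expected to fail the second condition, because it keeps predicting the true label for trigger images.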

7.3.2 Trigger Pattern Creation

In this approach, Alice has an n-bit signature that will be embedded into n pixels of the images. Alice generates the signature by hashing a message that proves her authorship. The next step is to use the signature as the key to a pseudorandom permutation (PRP) that remaps each label k to one of the remaining K − 1 labels. This is referred to as the mapping of the classes, φ_y(y).
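As a rough illustration of the signature and label mapping, the sketch below assumes SHA-256 as the hash function and uses Python's `random.Random`, seeded with the signature, as a simple stand-in for a keyed PRP. The text names neither choice, the function names are hypothetical, and this stand-in is not cryptographically secure.

```python
import hashlib
import random


def make_signature(message: str, n_bits: int = 64) -> list[int]:
    """Hash a message and keep the first n_bits bits (assumes n_bits <= 256)."""
    digest = hashlib.sha256(message.encode("utf-8")).digest()
    bits = [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]
    return bits[:n_bits]


def make_label_mapping(signature: list[int], num_classes: int) -> list[int]:
    """Return a permutation phi_y with phi_y[k] != k for every label k.

    Assumption: a signature-seeded cyclic permutation stands in for the
    keyed PRP. Requires num_classes >= 2.
    """
    seed = int("".join(map(str, signature)), 2)
    rng = random.Random(seed)
    order = list(range(num_classes))
    rng.shuffle(order)
    mapping = [0] * num_classes
    # Send each label to its successor in the shuffled cycle; a cycle of
    # length K >= 2 never maps a label to itself.
    for i, label in enumerate(order):
        mapping[label] = order[(i + 1) % num_classes]
    return mapping
```

Using a single cycle is one simple way to guarantee that every label moves to one of the other K − 1 labels; other fixed-point-free constructions would serve equally well.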

After that, the signature is used as the key to select the locations of the n pixels. The signature is added directly to these pixels: a “1” bit in the signature contributes +1, and a “0” bit contributes −1. The resulting pattern is the trigger pattern m. The embedding procedure can be described as φ_X(X) = X + αm, where α is referred to as the magnitude of the trigger pattern.
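A minimal NumPy sketch of the pattern construction and embedding follows. It assumes that every array element (including each color channel) counts as a "pixel" and that the location key is supplied as an integer `seed`, for example one derived from the signature. Both are assumptions made for illustration, as are the function names.

```python
import numpy as np


def make_trigger_pattern(signature, image_shape, seed):
    """Build m: zeros everywhere except n key-selected pixels set to +1 / -1."""
    bits = np.asarray(signature)
    rng = np.random.default_rng(seed)  # assumption: seed derived from the signature
    num_pixels = int(np.prod(image_shape))
    # Choose n distinct pixel positions, one per signature bit.
    positions = rng.choice(num_pixels, size=bits.size, replace=False)
    m = np.zeros(num_pixels, dtype=np.float32)
    m[positions] = np.where(bits == 1, 1.0, -1.0)  # "1" -> +1, "0" -> -1
    return m.reshape(image_shape)


def embed(x, m, alpha):
    """phi_X(X) = X + alpha * m."""
    return x + alpha * m
```

The sketch does not clip the result to a valid pixel range; whether clipping is applied is not stated in the text.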

7.3.3 Optimal Trigger Pattern Magnitude

Recall that Alice wants to add m to test images so that regular models still classify them correctly but the watermarked model does not. Given a signature, if the magnitude is too large, regular models will classify the images incorrectly. If the magnitude is too small, training will not converge on both the embedded dataset and the regular dataset. In this subsection, we discuss the effect of message length (number of bits) and trigger pattern magnitude on the performance of regular models. Further, we present an algorithm to search for the optimal magnitude given the length of m.

Figure 7.3 shows a study based on a VGG16 model [162] trained on the CIFAR10 dataset [116]. We add trigger patterns of different lengths and magnitudes to the training set and then test the classification accuracy of the model, which originally achieves 99.97% accuracy on the training set. The x-axis shows the length of the message in bits, and the y-axis shows the magnitude normalized by the standard deviation of all the pixels in the dataset. Each accuracy drop value is the mean of 5 independent experiments. As expected, shorter messages can tolerate larger magnitudes. For a given message length, it is natural that larger magnitudes are easier for a watermarked model to recognize. Empirically, we find that if the trigger pattern magnitude is too small, the watermarked model fails to converge in training. Therefore, given a certain message length, it is desirable to find the largest possible magnitude α that is not detectable by a regular model.

Figure 7.3: Classification accuracy drop of a regular VGG16 model when trigger patterns of different lengths and magnitudes are added to the CIFAR10 training set.

We now lay out our binary-search-based algorithm for finding the optimal magnitude α_opt. We define α_opt as the largest magnitude that can be added to the dataset such that the classification accuracy of the regular model f drops by no more than ∆A. Algorithm 8 describes the procedure in more detail.

Algorithm 8 Get Optimal Trigger Pattern Magnitude Using Binary Search

Input: regular DNN f, trigger pattern m, training set D_train, resolution ∆α, accuracy drop ∆A
Output: optimal trigger pattern magnitude α_opt

 1: Initialize α_min and α_max
 2: A_0 ← classification accuracy of f on D_train
 3: α_0 ← α_min, α ← α_max
 4: while |α_0 − α| > ∆α do
 5:     A ← classification accuracy of f on D_train + αm
 6:     if A_0 − A < ∆A then
 7:         α_0 ← α, α_min ← α
 8:         α_opt ← α
 9:         α ← (α + α_max) / 2
10:     else
11:         α_0 ← α, α_max ← α
12:         α ← (α + α_min) / 2
13:     end if
14: end while
15: return α_opt
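The search in Algorithm 8 might be written in Python roughly as follows. The sketch assumes a hypothetical callable `accuracy_at(alpha)` that returns the accuracy of the regular model f on the training set with αm added (so `accuracy_at(0.0)` gives A_0), and it compares the previous and current magnitudes by absolute difference. It returns `None` when no tested magnitude is acceptable, a case the algorithm listing does not address.

```python
def find_optimal_magnitude(accuracy_at, alpha_min, alpha_max,
                           resolution, max_drop):
    """Binary search for the largest alpha whose accuracy drop stays below max_drop.

    accuracy_at: callable, alpha -> accuracy of the regular model on the
                 training set with alpha * m added (hypothetical helper).
    """
    base_acc = accuracy_at(0.0)          # A_0: accuracy without the trigger
    alpha_prev, alpha = alpha_min, alpha_max
    alpha_opt = None
    while abs(alpha_prev - alpha) > resolution:
        acc = accuracy_at(alpha)
        alpha_prev = alpha
        if base_acc - acc < max_drop:
            # Acceptable drop: record alpha and search larger magnitudes.
            alpha_opt = alpha
            alpha_min = alpha
            alpha = (alpha + alpha_max) / 2
        else:
            # Drop too large: search smaller magnitudes.
            alpha_max = alpha
            alpha = (alpha + alpha_min) / 2
    return alpha_opt
```

Each call to `accuracy_at` is a full evaluation pass over the training set, so the number of passes grows only logarithmically with (α_max − α_min) / ∆α.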

Figure 7.4: Examples of trigger inputs. (a) a random out-of-distribution image [2], (b) a regular image with a logo [3], (c) a regular image with a color-coded key not perceptible by the eye [4].

7.3.4 Training Model

Once we obtain the desired magnitude α and trigger pattern m, we are ready to train the final watermarked model f_WMK. To obtain f_WMK, we fine-tune the existing model f. During fine-tuning, each training sample is embedded with the trigger pattern with 50% probability, and the label of an embedded sample is remapped accordingly.
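The data preparation for this fine-tuning step can be sketched with NumPy as below. The sketch covers only how a batch might be assembled; the training loop itself depends on the framework and is omitted. The function name, the batch layout, and the use of an integer array for the label mapping are assumptions made for illustration.

```python
import numpy as np


def make_finetune_batch(images, labels, m, alpha, label_map, rng, p=0.5):
    """Embed the trigger into each sample with probability p and remap its label.

    images:    float array of shape (batch, ...)
    labels:    int array of shape (batch,)
    label_map: int array where label_map[y] is the remapped label of y
    rng:       a numpy.random.Generator
    """
    images = np.asarray(images, dtype=np.float32)
    labels = np.asarray(labels)
    chosen = rng.random(len(images)) < p  # True -> this sample gets the trigger
    # Broadcast the per-sample mask over the image dimensions.
    mask = chosen.reshape(-1, *([1] * (images.ndim - 1)))
    out_images = images + mask * (alpha * m)
    out_labels = np.where(chosen, np.asarray(label_map)[labels], labels)
    return out_images, out_labels, chosen
```

Samples that are not chosen pass through unchanged with their true labels, so the model keeps seeing both the original data and the trigger data in roughly equal proportion.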

In document Learning-Based Techniques for Energy-Efficient and Secure Computation on the Edge (Page 157-162)