Why sparse feature attack for the adversary?

3.3 Experiments

3.3.2 Why sparse feature attack for the adversary?

In this section we present three insights, each supported by experiments, into why a rational adversary would mount a sparse feature attack.

3.3.2.1 A more realistic behavior

We use the USPS digit data set to demonstrate why a rational adversary is likely to mount a sparse attack. We select digits 7 and 9 to form a binary classification data set, and assume that an adversary can manipulate images of 7 so that they are misclassified. We first train a regular classifier with an $\ell_2$ regularizer, and then perform the two types of attack ($\ell_1$ and $\ell_2$) on the data set. Figure 3.1 shows a misclassified 7 from each of the two types of attack.

Examples of the original digits 7 and 9 are shown in Figures 3.1(a,d). Under a sparse feature attack only one pixel is modified (Figure 3.1c), and that is sufficient to misclassify the image. It turns out that the pixel manipulated by the adversary is the most important feature for distinguishing between 7 and 9. A dense feature attack (Figure 3.1b), on the other hand, modifies several pixels but with lower intensity. Thus, with a sparse feature attack an adversary can make minimally observable changes to the “spam” (here, the digit 7) and still circumvent the spam filter.
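The contrast between the two attacks can be illustrated on a linear classifier. The sketch below is not the attack model used in the experiments. It assumes a linear score f(x) = w·x + b with a positively scored sample, and computes the smallest change under each norm that moves the sample just across the boundary. The function names, the `margin` parameter, and the toy weights are hypothetical.

```python
def score(w, b, x):
    """Linear decision score; a positive value means 'classified as positive'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def min_l2_attack(w, b, x, margin=1e-6):
    """Smallest l2-norm change that pushes a positive x just across the boundary.

    The change is spread over all features in proportion to their weights.
    """
    target = score(w, b, x) + margin  # amount by which the score must drop
    norm_sq = sum(wi * wi for wi in w)
    return [-target * wi / norm_sq for wi in w]


def min_l1_attack(w, b, x, margin=1e-6):
    """Smallest l1-norm change: alter only the feature with the largest |weight|."""
    target = score(w, b, x) + margin
    k = max(range(len(w)), key=lambda i: abs(w[i]))
    delta = [0.0] * len(w)
    delta[k] = -target / w[k]
    return delta


# Toy data (assumed for illustration only).
w, b = [0.5, -2.0, 1.0, 0.25], -0.5
x = [1.0, 0.0, 1.5, 1.0]

dense = min_l2_attack(w, b, x)
sparse = min_l1_attack(w, b, x)

x_dense = [xi + di for xi, di in zip(x, dense)]
x_sparse = [xi + di for xi, di in zip(x, sparse)]

print("features changed (dense): ", sum(1 for d in dense if d != 0.0))
print("features changed (sparse):", sum(1 for d in sparse if d != 0.0))
```

In this toy setting the sparse attack changes a single feature, the one with the largest absolute weight, while the dense attack changes every feature by a smaller amount. This mirrors the single-pixel behavior described above.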

3.3.2.2 Leads to better classifier

Here we study how the performance of a game-theoretic classifier varies with attack strength, i.e., the Cost as defined in the model. When we train a classifier on a tampered data set, in our case one with manipulated positive samples, it is necessary to ask how much the new classifier will differ from a regular classifier trained on the original data set. To answer this question, we evaluate the classifiers using the false negative rate against the false positive rate. In the Non-Zero sum game model, we have assumed that an adversary manipulates the positive data so that it crosses the classification boundary. The classifier learned in this case therefore moves the classification boundary back towards the negative samples. Thus, we would expect a classifier with a lower false negative rate and, at the same time, a higher false positive rate. The attack strength decides how far the boundary moves towards the negative samples, so we can vary the attack strength and examine the resulting false negative and false positive rates. At a given false positive rate, the model with the lower false negative rate is preferred. In this experiment we test the Non-Zero sum game with the classifier using the $\ell_2$ norm, while the adversary uses the $\ell_2$ or $\ell_1$ norm. We select the first 400 samples from the mailinglist data set as training data, which represent the older emails. For test data, we select the last 4000 samples from the data set. As shown in Figure 3.4, at the same false positive rate, the classifier trained against an adversary with the $\ell_1$ regularizer has a lower false negative rate. The experiment illustrates that modeling the adversary with an $\ell_1$ regularizer gives better overall performance.

[Plot: false negative rate against false positive rate for Game($\ell_2^d \ell_1^a$) and Game($\ell_2^d \ell_2^a$).]

Figure 3.4: When the adversary is assumed to apply a sparse feature attack, the learned classifier has better performance.
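The trade-off between the two error rates can be made concrete with a small sketch. It assumes labels in {+1, -1}, with +1 the positive class, and a classifier that predicts +1 when a score exceeds a threshold. The scores, labels, and thresholds are toy values chosen for illustration, not results from the experiments.

```python
def predict(scores, threshold):
    """Predict +1 when the score exceeds the threshold, otherwise -1."""
    return [1 if s > threshold else -1 for s in scores]


def error_rates(y_true, y_pred):
    """Return (false_negative_rate, false_positive_rate).

    Assumes y_true contains at least one +1 and at least one -1.
    """
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == -1)
    return fn / n_pos, fp / n_neg


# Toy labels and scores (assumed for illustration only).
y_true = [1, 1, 1, 1, -1, -1, -1, -1]
scores = [2.0, 1.0, 0.3, -0.2, 0.4, -0.5, -1.0, -2.0]

# Original boundary, and a boundary moved towards the negative samples.
original = error_rates(y_true, predict(scores, threshold=0.0))
shifted = error_rates(y_true, predict(scores, threshold=-0.6))

print("original (FNR, FPR):", original)
print("shifted  (FNR, FPR):", shifted)
```

Lowering the threshold stands in for moving the boundary towards the negative samples: the false negative rate falls while the false positive rate rises, which is the behavior expected of the game-theoretic classifier.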

3.3.2.3 The game converges faster with less cost and fewer feature modifications

[Plot: cost of the adversary against number of repetitions for Game($\ell_2^d \ell_2^a$) and Game($\ell_2^d \ell_1^a$).]

Figure 3.5: The game with a sparse feature attack reaches a stable state much faster than the game with a dense attack, and at lower cost.

[Plot: number of features modified against iteration for Game($\ell_2^d \ell_1^a$).]

Figure 3.6: The game with a sparse feature attack identifies and modifies a limited number of features, in this case 13 out of 50.

As one can anticipate, if the game is repeated indefinitely it will reach a state where the positive and negative data almost overlap. We would expect the classifier learned from such a game to perform worst; in other words, this is an extreme case of overestimating the adversary. We nevertheless conduct an experiment to estimate the adversarial cost of reaching such a state. Since we have already concluded that a sparse attack is a better model for the adversary, we expect an adversary using a sparse feature attack to reach the overlap state at lower cost than one using a dense feature attack.

We compare the number of iterations required for Game($\ell_2^d \ell_2^a$) and Game($\ell_2^d \ell_1^a$) to converge on the mailinglist data set. The Cost of the adversary is evaluated by accumulating the $\ell_1$ norm of α in each step. We also evaluate the number of features modified under the sparse feature attack model Game($\ell_2^d \ell_1^a$) as a function of the number of iterations. For Game($\ell_2^d \ell_2^a$), we know it is likely to modify all the features in each step.
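The two quantities tracked in this experiment can be sketched as follows. The sketch assumes that each step of the game yields a manipulation vector α, given as a list of per-feature changes. It also assumes that "features modified" counts the distinct features touched up to the current step; counting per step is an equally plausible reading. The function names and the toy vectors are hypothetical.

```python
def cumulative_l1_cost(alphas):
    """Running total of the l1 norm of each step's manipulation vector."""
    total, history = 0.0, []
    for alpha in alphas:
        total += sum(abs(a) for a in alpha)
        history.append(total)
    return history


def features_modified(alphas, tol=1e-12):
    """Number of distinct features changed so far, recorded after each step."""
    touched, history = set(), []
    for alpha in alphas:
        touched.update(i for i, a in enumerate(alpha) if abs(a) > tol)
        history.append(len(touched))
    return history


# Toy manipulation vectors for three steps over four features (assumed).
alphas = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, -0.25, 0.0],
    [0.25, 0.0, 0.0, 0.0],
]

costs = cumulative_l1_cost(alphas)
counts = features_modified(alphas)

print("cumulative cost:  ", costs)
print("features modified:", counts)
```

Under a dense attack every entry of α would typically be non-zero, so the feature count would reach the total number of features after the first step.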

Figure 3.5 shows the cumulative cost of the adversary as a function of the number of iterations in the game. It is clear that Game($\ell_2^d \ell_1^a$), i.e., the game in which the adversary carries out a sparse feature attack, reaches a stable state faster than the dense game Game($\ell_2^d \ell_2^a$). Figure 3.6 shows that the number of features modified also converges to a number far smaller than the total number of features. This again suggests that modeling an adversary using an $\ell_1$ regularizer is a better reflection

32 CHAPTER 3. ADVERSARIAL LEARNING

In document Robust and Adversarial Data Mining (Page 46-49)