
CHAPTER 4. EXPERIMENTS

4.2 Generative Adversarial networks

We experimented with generative adversarial networks (GANs) on the CIFAR-10 and MNIST datasets. The GANs were trained for 200k and 140k iterations on CIFAR-10 and MNIST, respectively. At each iteration, the generator and discriminator are updated alternately, each loss being minimized while the other network is held fixed. Training uses the Adam optimizer, which maintains exponential moving averages of the gradients and of their squares; this allows larger effective steps and hence typically faster convergence. Reducing the learning rate as training progresses is commonly recommended to improve performance and reduce training time, so we used an exponentially decaying learning rate, calculated as

$$\text{decayed learning rate} = \text{learning rate} \times \text{decay rate}^{\,\text{global step} / \text{decay steps}}$$

where learning rate, decay rate, global step and decay steps are the initial learning rate, the rate of decay, the iteration number and the number of decay steps, respectively. In our case, the initial learning rate is set to 0.01, the number of decay steps to 20000 and the decay rate to 0.1. A batch size of 128 is used for both the generator and the discriminator. As mentioned in the description of the discriminator architecture, dropout is used to avoid overfitting; in our experiments we used a dropout probability of 0.05. The generator takes a 100-dimensional latent variable as input.

Figure 4.3: CIFAR-10 samples produced by the generator at different iterations for five distinct latent variables.
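To make the schedule concrete, the following minimal NumPy sketch implements the decay formula (continuous, without a staircase) and feeds it into a single-parameter Adam update. Only the initial rate 0.01, the decay rate 0.1 and the 20000 decay steps come from the text; the function names, the toy objective f(θ) = θ², and the Adam hyperparameters β1 = 0.9, β2 = 0.999, ε = 1e-8 (the usual defaults) are illustrative assumptions, not the settings of our training code.

```python
import numpy as np

# Values from the text; everything else is an illustrative assumption.
INITIAL_LR = 0.01
DECAY_RATE = 0.1
DECAY_STEPS = 20000


def decayed_learning_rate(global_step, lr=INITIAL_LR,
                          decay_rate=DECAY_RATE, decay_steps=DECAY_STEPS):
    """Continuous (non-staircase) exponential decay of the learning rate."""
    return lr * decay_rate ** (global_step / decay_steps)


def adam_step(theta, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad          # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Toy usage: minimise f(theta) = theta**2 (gradient 2*theta) for a few steps,
# using the decayed learning rate at each global step.
theta, m, v = 1.0, 0.0, 0.0
for step in range(1, 4):
    lr = decayed_learning_rate(step - 1)
    theta, m, v = adam_step(theta, 2 * theta, m, v, step, lr)
```

With these values the learning rate drops by a factor of 10 every 20000 iterations: 0.01 at the start, 0.001 at 20k and 0.0001 at 40k.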

We analyze how GANs train, and the statistics involved, using the CIFAR-10 dataset; we expect similar statistics when training GANs on MNIST. CIFAR-10 is the more varied of the two datasets and therefore gives a broader view of GAN training.

The generator loss is the sum of two terms: the cross-entropy loss of a generated sample being classified as real (first term) and the cross-entropy loss of a generated sample over the k different classes (second term). Initially, the probability that a generated sample belongs to each of the k classes is 1/k, i.e., every class is equally probable. At the start of training, the generator produces nonsensical output, as shown in Figure 4.3. As one would expect, the generator loss is initially very high because of the first term. As training progresses, the generator learns to produce reasonable samples, meaning that for each sample one class becomes prominent compared to the others; as a result, the second term starts to dominate the loss. These variations in the generator loss are illustrated in Figure 4.5.
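The two-term generator loss can be sketched as below. The text does not state the exact target of the k-class term, so this sketch assumes a uniform target of 1/k per class; under that assumption the term equals log k while the class predictions are uniform and grows as one class becomes prominent, matching the behaviour described above. The discriminator interface (a real/fake probability `p_real` plus separate `class_logits`) and all function names are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def generator_loss(p_real, class_logits, target=None, eps=1e-12):
    """Sketch of the two-term generator loss under the stated assumptions.

    p_real:       (n,) discriminator probability that each generated sample is real
    class_logits: (n, k) discriminator class logits for the generated samples
    target:       (k,) assumed target class distribution; uniform 1/k by default
    """
    n, k = class_logits.shape
    if target is None:
        target = np.full(k, 1.0 / k)
    # First term: cross-entropy of generated samples against the "real" label.
    adversarial = -np.mean(np.log(p_real + eps))
    # Second term: cross-entropy between the target distribution and the class posterior.
    class_term = -np.mean(np.sum(target * log_softmax(class_logits), axis=-1))
    return adversarial + class_term, adversarial, class_term


# Example: uniform class predictions give a class term of log(k).
total, adversarial, class_term = generator_loss(np.full(4, 0.5), np.zeros((4, 10)))
```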

The discriminator loss is the sum of three terms: the cross-entropy loss of generated samples belonging to the fake class (first term), the cross-entropy loss of real samples belonging to the real class (second term), and the cross-entropy loss of real samples over the k different classes (third term). The discriminator can thus be seen as a model that distinguishes between real and fake samples while simultaneously learning a classifier for the dataset. Not surprisingly, the discriminator loss decreases over iterations, as it would for any classifier that improves with training. The evolution of the discriminator loss over iterations is shown in Figure 4.6.
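A minimal NumPy sketch of the three-term discriminator loss follows. It assumes, hypothetically, that the discriminator exposes a real/fake probability and, separately, k class logits for real samples; an implementation could instead use a single (k+1)-way softmax with an explicit fake class. Function and argument names are illustrative only.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def discriminator_loss(p_real_fake, p_real_real, class_logits_real, labels, eps=1e-12):
    """Sketch of the three-term discriminator loss.

    p_real_fake:       (m,) probability that each generated sample is real
    p_real_real:       (n,) probability that each real sample is real
    class_logits_real: (n, k) class logits for the real samples
    labels:            (n,) integer ground-truth classes of the real samples
    """
    # Term 1: cross-entropy of generated samples against the "fake" label.
    fake_term = -np.mean(np.log(1.0 - p_real_fake + eps))
    # Term 2: cross-entropy of real samples against the "real" label.
    real_term = -np.mean(np.log(p_real_real + eps))
    # Term 3: standard k-class cross-entropy of real samples against their labels.
    log_p = log_softmax(class_logits_real)
    class_term = -np.mean(log_p[np.arange(len(labels)), labels])
    return fake_term + real_term + class_term
```

As the classifier part improves, the third term shrinks, which is one way the overall discriminator loss can decrease during training.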

Figure 4.7 illustrates how the inception score changes with the number of iterations. The GAN model achieves an inception score of 8.5 with a standard deviation of ±0.4.
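For reference, the inception score is usually defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is the class distribution a pretrained Inception network assigns to a generated image x and p(y) is its average over the samples; the mean and standard deviation are commonly reported over 10 splits of the samples. The sketch below assumes the class probabilities have already been computed by such a network (obtaining them is out of scope), and the function name `inception_score` is hypothetical.

```python
import numpy as np


def inception_score(probs, n_splits=10, eps=1e-12):
    """Mean and standard deviation of the inception score over n_splits groups.

    probs: (n, k) array whose rows are class distributions p(y|x), assumed to
           come from a pretrained Inception classifier.
    """
    scores = []
    for part in np.array_split(probs, n_splits):
        p_y = part.mean(axis=0, keepdims=True)  # marginal p(y) within the split
        kl = np.sum(part * (np.log(part + eps) - np.log(p_y + eps)), axis=1)
        scores.append(np.exp(kl.mean()))        # exp of the mean KL divergence
    return float(np.mean(scores)), float(np.std(scores))
```

Two sanity checks follow from the definition: confident predictions spread evenly over k classes give a score of k, while identical predictions for every sample give a score of 1.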

Figure 4.8 shows some of the samples produced by the generator; these samples are then passed to the discriminator for classification. A few things are worth noting:

1. The 3rd image in the third row shows a dog with three heads. GANs fail to capture how many objects should be present at a particular location; in this case, the generator produces more heads than there should be.

2. GANs do not capture holistic structure. For example, the 4th image in the second row and the 6th image in the first row show horses with distorted shapes.

3. The generator sometimes produces outputs whose class is hard to tell, for instance the 4th image in the first row and the 2nd image in the third row.

4. The discriminator misclassifies some samples. The 1st image in the second row is labelled Dog when it is clearly a Truck, and the 6th image in the third row is labelled Horse when it is clearly an Airplane.

Figure 4.5: Generator loss as a function of iteration number on the CIFAR-10 dataset (x-axis: iteration, 0k to 200k; y-axis: generator loss).

Figure 4.6: Discriminator loss as a function of iteration number on the CIFAR-10 dataset (x-axis: iteration, 0k to 200k; y-axis: discriminator loss).

Figure 4.7: Inception score of the model as a function of iteration number on the CIFAR-10 dataset (x-axis: iteration, 0k to 200k; y-axis: inception score).

While the first three problems stem from the generator, the fourth is partly due to the discriminator. As training progresses, the discriminator learns to classify the real samples, so if the generator produced realistic samples, the discriminator could be used to label them. However, as pointed out in [26], real CIFAR-10 images have an inception score of 11.24 ± 0.12, whereas our generated samples score around 8.5. This suggests that the generated samples are still far from realistic, and the discriminator fails to classify such non-realistic images.

Figure 4.8: Samples produced by the generator after training. The discriminator classifies the samples in the left 3x3 grid as Dog and those in the right 3x3 grid as Horse.

For this reason, we added a simple gradient-based model that finds the generated sample that looks closest to a given real image. Since the label of the real image is known, the generated sample is labelled correctly. This also addresses the other problems mentioned above. Furthermore, to produce multiple generated samples for a given real image, the latent variable of the closest generated sample is perturbed with a small amount of noise and passed through the generator to produce additional samples.
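The latent search and perturbation step can be sketched as gradient descent over the latent vector z, followed by small Gaussian perturbations of the recovered z. This sketch assumes "closest looking" means smallest squared Euclidean distance ||G(z) − x||² in pixel space. To keep it self-contained and runnable, the generator is a hypothetical linear map G(z) = Wz with a hand-written gradient; with a real GAN generator the gradient would come from automatic differentiation. The step size, iteration count, noise scale and function names are illustrative assumptions, not values from our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained generator: a fixed linear map from a
# 5-dimensional latent space to a 20-dimensional "image" space.
W = rng.normal(size=(20, 5))


def generator(z):
    return W @ z


def find_closest_latent(x, steps=2000):
    """Gradient descent on ||G(z) - x||^2 with respect to z."""
    lr = 0.5 / np.linalg.norm(W, 2) ** 2      # step size below 1/sigma_max^2 for stability
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        grad = 2.0 * W.T @ (generator(z) - x)  # analytic gradient for the linear generator
        z -= lr * grad
    return z


def augment(x, label, n_samples=5, noise_scale=0.05):
    """Generate n_samples images near x by perturbing its closest latent vector.

    Every generated image inherits the known label of the real image x.
    """
    z = find_closest_latent(x)
    samples = [generator(z + noise_scale * rng.normal(size=z.shape))
               for _ in range(n_samples)]
    return np.stack(samples), np.full(n_samples, label)
```

Because every sample is decoded from a latent vector near the one that best matches x, all of them can take x's label without relying on the discriminator's classification.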

In document Data augmentation for supervised learning with generative adversarial networks (Page 45-50)