Adaptive Integration of Multiple Fine-tuning Models in Transfer Learning for Image Classification

(1)

Yu Wang u5762607

COMP8755

Supervisor: Professor Tom Gedeon


(2)

Outline

• Background

• Problem Statement

• Datasets

• Methodology

• Results

• Conclusion and Future Work

• Q&A

(3)

Background

• Transfer Learning (TL)

Transfer learning (TL) applies knowledge obtained from one problem to a different but related problem.

1. Freezing/Feature Extractor:

Freeze all weights learnt from the source task except those of the last classification layer.

2. Fine-tuning:

Retrain and update most of the weights learnt from the source task to fit the target task.
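The two strategies above can be sketched as a choice of which layers stay trainable. This is a minimal illustration in plain Python, not the project's code: the `Layer` class, the four layer names, and the `frozen` argument are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    trainable: bool = True


def build_model():
    # Hypothetical four-layer network; the layer names are illustrative only.
    return [Layer("conv1"), Layer("conv2"), Layer("conv3"), Layer("classifier")]


def freeze_feature_extractor(layers):
    """Strategy 1: fix every transferred weight except the last classification layer."""
    for layer in layers[:-1]:
        layer.trainable = False
    layers[-1].trainable = True
    return layers


def fine_tune(layers, frozen=()):
    """Strategy 2: retrain the transferred weights, optionally keeping some layers fixed."""
    for layer in layers:
        layer.trainable = layer.name not in frozen
    return layers


def trainable_names(layers):
    # Names of the layers whose weights would be updated on the target task.
    return [layer.name for layer in layers if layer.trainable]
```

With freezing, only `classifier` is updated on the target task; with fine-tuning, every layer not listed in `frozen` is updated.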

(4)

Problem Statement

Research recent state-of-the-art techniques related to TL and propose novel techniques to improve the performance of TL in image classification. The proposed technique is compared against a state-of-the-art TL technique as the baseline to evaluate its performance.

(5)

Baseline

• SpotTune

A state-of-the-art method for adaptive fine-tuning. With SpotTune, a neural network can find the optimal fine-tuning strategy per instance for the target data (Guo et al., 2018).

(6)

Datasets

• FGVC-Aircraft

A specific smaller-sized image dataset (shorter side of 72 pixels) taken from the Visual Decathlon challenge (Rebuffi, Bilen, and Vedaldi, 2017). It contains 10,000 images of aircraft, 100 images for each of 100 different aircraft models (e.g. Boeing 737-400, Airbus A310). The images are divided equally into training, validation and testing sets of around 3,333 images each.

• CIFAR100

A generic smaller-sized image dataset (72x72 pixels) taken from the Visual Decathlon challenge (Rebuffi, Bilen, and Vedaldi, 2017). It contains 60,000 colour images in 100 object categories: 40,000 for training, 10,000 for validation and 10,000 for testing.

[Figure: example images from three classes (Class 1, Class 2, Class 3)]

(7)

Methodology – CNN

• ResNet-26

Deep Residual Network (ResNet) with 26 layers, which contains 3 macro blocks of convolutional layers. The first block has 64 output feature channels, the second block has 128 output feature channels, and the last block has 256 output feature channels. Each macro block contains 4 residual blocks and every residual block consists of 2 convolutional layers with 3 x 3 filters. (Rebuffi, Bilen, and Vedaldi, 2018)
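As a quick check of the layer count, the block structure described above can be tallied in a few lines. The convolutional layers inside the macro blocks follow directly from the slide; that the remaining two layers are an initial convolution and the final classification layer is an assumption, not something stated here.

```python
# Structure taken from the slide: 3 macro blocks, 4 residual blocks each,
# 2 convolutional layers (3 x 3 filters) per residual block.
MACRO_BLOCK_CHANNELS = [64, 128, 256]
RESIDUAL_BLOCKS_PER_MACRO_BLOCK = 4
CONVS_PER_RESIDUAL_BLOCK = 2

# Assumption (not stated on the slide): the two remaining layers are an
# initial convolution and the final classification layer.
ASSUMED_EXTRA_LAYERS = 2


def conv_layers_in_macro_blocks():
    return (len(MACRO_BLOCK_CHANNELS)
            * RESIDUAL_BLOCKS_PER_MACRO_BLOCK
            * CONVS_PER_RESIDUAL_BLOCK)


def total_layers():
    return conv_layers_in_macro_blocks() + ASSUMED_EXTRA_LAYERS
```

Under that assumption the macro blocks contribute 24 convolutional layers and the total comes to 26.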

(8)

Methodology – Regularization

• L2-SP

A novel type of regularization that reduces the loss of the initially transferred knowledge. The pre-trained model serves not only as the starting point of the fine-tuning process but also as the reference in the penalty, which encodes an explicit inductive bias (Li, Grandvalet, and Davoine, 2018). This project uses cross-entropy loss, so the L2-SP objective can be written as:

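$$L(y, t) = -\sum_{i=1}^{C} t_i \log y_i + \frac{\alpha}{2} \sum_{i=1}^{W} \lVert w_i - w_i^0 \rVert_2^2 + \frac{\beta}{2} \lVert w_{\bar{S}} \rVert_2^2$$

• $w_i$ represents the weights except for the last classification layer, and $w_i^0$ their pre-trained values

• $w_{\bar{S}}$ represents the weights of the last classification layer

• $t$ is the target and $y$ is the network prediction over the $C$ classes

• $\alpha$ and $\beta$ are the regularization coefficients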
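The L2-SP objective described on this slide can be sketched numerically as cross-entropy plus two penalties: one pulling the transferred weights towards their pre-trained values, and one plain L2 penalty on the last classification layer. The function names and the flat-list representation of the weights are assumptions made for illustration, not the project's implementation.

```python
import math


def cross_entropy(y, t):
    # y: predicted class probabilities, t: one-hot (or soft) target.
    # Terms with t_i == 0 contribute nothing, so they are skipped.
    return -sum(t_i * math.log(y_i) for y_i, t_i in zip(y, t) if t_i > 0)


def l2_sp_penalty(w, w0, w_last, alpha, beta):
    # Distance of the transferred weights from their pre-trained values (the "SP" term).
    starting_point_term = sum((w_i - w0_i) ** 2 for w_i, w0_i in zip(w, w0))
    # Plain L2 penalty on the last classification layer, which has no pre-trained reference.
    last_layer_term = sum(w_i ** 2 for w_i in w_last)
    return 0.5 * alpha * starting_point_term + 0.5 * beta * last_layer_term


def l2_sp_loss(y, t, w, w0, w_last, alpha, beta):
    return cross_entropy(y, t) + l2_sp_penalty(w, w0, w_last, alpha, beta)
```

When the fine-tuned weights equal the pre-trained ones, the first penalty vanishes, which is how the pre-trained model acts as the reference point.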

(9)

Methodology – MultiTune

• MultiTune

Proposed in this project. It enables the adaptive integration of multiple fine-tuning models with different fine-tuning settings. The current version contains two ResNet-26 models with different fine-tuning settings.

$$Z = W \ast \mathrm{concat}(\alpha X_1;\ (1 - \alpha) X_2)$$

• $Z$ represents the output of the MultiTune Layer

• $W$ represents the weights of the MultiTune Layer

• $X_1$ is the output of Fine-tuning Model A

• $X_2$ is the output of Fine-tuning Model B

[Figure: MultiTune architecture. Components: ResNet-26 (Fine-tuning Setting A), ResNet-26 (Fine-tuning Setting B), MultiTune Layer, Final Classification]
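The MultiTune layer formula can be sketched as a weighted concatenation followed by a linear map. This is a minimal sketch under stated assumptions: `W` is treated as a plain matrix applied by matrix-vector multiplication, `alpha` is a fixed scalar, and the model outputs are flat lists; the slides do not specify these details.

```python
def multitune_layer(x1, x2, W, alpha):
    """Combine the outputs of two fine-tuning models.

    x1, x2: outputs of Fine-tuning Model A and Model B (flat lists).
    W:      weight matrix with len(x1) + len(x2) columns (assumed shape).
    alpha:  scalar weighting the two models (assumed fixed here).
    """
    # concat(alpha * X1 ; (1 - alpha) * X2)
    combined = [alpha * v for v in x1] + [(1 - alpha) * v for v in x2]
    # Z = W * combined, computed row by row.
    return [sum(w * v for w, v in zip(row, combined)) for row in W]
```

For example, with `alpha = 0.5` both models contribute equally to the concatenated vector before `W` is applied.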

(10)

Results – Accuracy

[Figure: validation accuracy versus number of epochs, with one panel for Aircraft and one for CIFAR100]

Dataset     SpotTune   MultiTune

Aircraft    55.15%     59.59%

CIFAR100    78.45%     79.31%

(11)

Results – Accuracy

• Smaller-sized Aircraft subsets (suffix = number of images per class)

Dataset       SpotTune   MultiTune

Aircraft-20   45.60%     47.85%

Aircraft-15   39.20%     40.73%

Aircraft-10   30.70%     29.90%

Aircraft-5    17.40%     18.80%
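To make the comparison easier to read, the accuracy figures reported on the two results slides can be turned into percentage-point differences. The numbers are copied from the slides; the dictionary and function names are illustrative only.

```python
# Accuracy in percent, as (SpotTune, MultiTune), copied from the results slides.
ACCURACY = {
    "Aircraft": (55.15, 59.59),
    "CIFAR100": (78.45, 79.31),
    "Aircraft-20": (45.60, 47.85),
    "Aircraft-15": (39.20, 40.73),
    "Aircraft-10": (30.70, 29.90),
    "Aircraft-5": (17.40, 18.80),
}


def accuracy_gain(spottune, multitune):
    # Gain of MultiTune over SpotTune in percentage points.
    return multitune - spottune


def settings_where_multitune_is_lower(results):
    return [name for name, (s, m) in results.items() if accuracy_gain(s, m) < 0]
```

Calling `settings_where_multitune_is_lower(ACCURACY)` lists the settings in which the baseline remains ahead.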

(12)

Results – Running Time

Dataset            SpotTune (mins)   MultiTune (mins)

Aircraft (whole)   47.49             38.19

Aircraft-20        29.15             22.50

Aircraft-15        21.84             16.88

Aircraft-10        14.67             11.51

Aircraft-5         7.57              5.91

CIFAR100           454.80            321.37
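The running times above can also be expressed as a relative saving of MultiTune over SpotTune. The times are copied from the table; the helper name `relative_saving` is hypothetical.

```python
# Running time in minutes, as (SpotTune, MultiTune), copied from the table above.
RUNNING_TIME_MINS = {
    "Aircraft (whole)": (47.49, 38.19),
    "Aircraft-20": (29.15, 22.50),
    "Aircraft-15": (21.84, 16.88),
    "Aircraft-10": (14.67, 11.51),
    "Aircraft-5": (7.57, 5.91),
    "CIFAR100": (454.80, 321.37),
}


def relative_saving(spottune, multitune):
    # Fraction of SpotTune's running time saved by MultiTune.
    return (spottune - multitune) / spottune


SAVINGS = {name: relative_saving(s, m) for name, (s, m) in RUNNING_TIME_MINS.items()}
```

Each value in `SAVINGS` is a fraction between 0 and 1; multiplying by 100 gives the percentage reduction for that dataset.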

(13)

Conclusion and Future Work

• Conclusion

The results of this thesis indicate that the proposed MultiTune technique can improve the performance of TL on image classification: it reached higher accuracy than SpotTune in every setting except Aircraft-10 and needed less running time in all settings. It is therefore a promising approach for further adoption in TL and in tasks related to image classification.

• Future Work

➢ Combine more than two fine-tuning models with different settings

➢ Further tune the hyper-parameters

(14)

Q&A

(15)

References

• Li, Xuhong, Yves Grandvalet, and Franck Davoine (2018). “Explicit Inductive Bias for Transfer Learning with Convolutional Networks”. In: CoRR abs/1802.01483. arXiv: 1802.01483. URL: http://arxiv.org/abs/1802.01483.

• Guo, Yunhui et al. (2018). “SpotTune: Transfer Learning through Adaptive Fine-tuning”. In: CoRR abs/1811.08737. arXiv: 1811.08737. URL: http://arxiv.org/abs/1811.08737.

• Rebuffi, Sylvestre-Alvise, Hakan Bilen, and Andrea Vedaldi (2017). “Learning multiple visual domains with residual adapters”. In: CoRR abs/1705.08045. arXiv: 1705.08045. URL: http://arxiv.org/abs/1705.08045.

• Rebuffi, Sylvestre-Alvise, Hakan Bilen, and Andrea Vedaldi (2018). “Efficient parametrization of multi-domain deep neural networks”. In: CoRR abs/1803.10082. arXiv: 1803.10082. URL: http://arxiv.org/abs/1803.10082.