Yu Wang u5762607
COMP8755
Supervisor: Professor Tom Gedeon
Adaptive Integration of Multiple
Fine-tuning Models in Transfer Learning for
Image Classification
Outline
• Background
• Problem Statement
• Dataset
• Methodology
• Results
• Conclusion and Future Work
• Q&A
Background
• Transfer Learning (TL)
Transfer learning (TL) focuses on applying obtained
knowledge from one problem to a different but related
problem.
1. Freezing/Feature Extractor:
Freeze the weights learnt from the source
task, except for the last classification layer.
2. Fine-tuning:
Most of the weights learnt from the source
task are retrained and updated to fit the
target task.
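The difference between the two strategies comes down to which weights an optimiser is allowed to update. The toy sketch below illustrates that with plain floats; the layer names, values and the `sgd_step` helper are made up for illustration and are not taken from the project code.

```python
# Toy illustration: each "layer" is a single float weight.
# Layer names, values and gradients are hypothetical.

def sgd_step(params, grads, trainable, lr=0.1):
    """Return updated parameters; only layers named in `trainable` change."""
    return {
        name: (value - lr * grads[name]) if name in trainable else value
        for name, value in params.items()
    }

pretrained = {"conv1": 0.5, "conv2": -0.3, "classifier": 0.0}
grads = {"conv1": 1.0, "conv2": 1.0, "classifier": 1.0}

# 1. Freezing / feature extractor: only the last classification layer is trained.
frozen = sgd_step(pretrained, grads, trainable={"classifier"})

# 2. Fine-tuning: in this sketch every layer is retrained
#    (the slide says "most", so this is a simplification).
fine_tuned = sgd_step(pretrained, grads, trainable=set(pretrained))

print("frozen:    ", frozen)
print("fine-tuned:", fine_tuned)
```

In the frozen case the convolutional weights keep their pre-trained values, while in the fine-tuned case they move away from them.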
Problem Statement
Survey recent state-of-the-art techniques related to
TL, and propose a novel technique to improve the
performance of TL in the field of image classification. The
proposed technique is evaluated against a state-of-the-art TL
technique, which serves as the baseline.
Baseline
• SpotTune
A state-of-the-art adaptive fine-tuning method. With
SpotTune, a neural network can find the optimal fine-tuning strategy
per instance for the target data. (Guo et al., 2018)
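A minimal sketch of what "per instance" means here, assuming the per-block routing described by Guo et al. (2018): for every residual block, a decision chooses between a frozen pre-trained copy and a fine-tuned copy of that block. The scalar "blocks" and the hard-coded `policy` lists are hypothetical stand-ins; in SpotTune the decisions come from a learned policy network, which is not modelled here.

```python
def route_instance(x, frozen_blocks, finetuned_blocks, policy):
    """Pass one input through the blocks, picking a branch per block.

    policy[i] is True  -> use the fine-tuned copy of block i
    policy[i] is False -> use the frozen pre-trained copy of block i
    """
    for frozen, tuned, use_tuned in zip(frozen_blocks, finetuned_blocks, policy):
        residual = tuned(x) if use_tuned else frozen(x)
        x = x + residual  # residual (skip) connection
    return x

# Hypothetical scalar "blocks" standing in for convolutional residual blocks.
frozen_blocks = [lambda v: 0.1 * v, lambda v: 0.2 * v]
finetuned_blocks = [lambda v: 0.5 * v, lambda v: -0.5 * v]

# Two inputs can receive different routing decisions.
out_a = route_instance(1.0, frozen_blocks, finetuned_blocks, policy=[True, False])
out_b = route_instance(1.0, frozen_blocks, finetuned_blocks, policy=[False, True])
print(out_a, out_b)
```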
Datasets
• FGVC-Aircraft
A specific smaller-sized (shorter side of 72 pixels) image dataset taken from the Visual Decathlon challenge (Rebuffi, Bilen, and Vedaldi, 2017). It has 10,000 images of aircraft, 100 images for each of 100 different aircraft models (e.g. Boeing 737-400, Airbus A310). The training, validation and testing sets are of roughly equal size, with around 3,333 images each.
• CIFAR100
A generic smaller-sized (72x72 pixels) image dataset taken from the Visual
Decathlon challenge (Rebuffi, Bilen, and Vedaldi, 2017). It has 60,000 colour images in 100 object categories: 40,000 for training, 10,000 for validation and 10,000 for testing.
[Figure: example images labelled Class 1, Class 2, Class 3]
Methodology – CNN
• ResNet-26
Deep Residual Network (ResNet) with 26 layers, which contains 3 macro blocks of convolutional layers. The first block has 64 output feature channels, the second block has 128 output feature channels, and the last block has 256 output feature channels. Each macro block contains 4 residual blocks and every residual block consists of 2 convolutional layers with 3 x 3 filters. (Rebuffi, Bilen, and Vedaldi, 2018)
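The layer count can be checked with a little arithmetic. The sketch below assumes that the two layers not covered by the residual blocks are an initial convolution and the final classification layer; the slide does not spell this out. Constant names are illustrative.

```python
# Structure as described on the slide.
MACRO_BLOCK_CHANNELS = [64, 128, 256]   # output feature channels per macro block
RESIDUAL_BLOCKS_PER_MACRO = 4
CONVS_PER_RESIDUAL_BLOCK = 2
KERNEL_SIZE = 3                         # 3 x 3 filters

residual_convs = (
    len(MACRO_BLOCK_CHANNELS) * RESIDUAL_BLOCKS_PER_MACRO * CONVS_PER_RESIDUAL_BLOCK
)

# Assumption: one initial convolution plus one final classification layer.
total_layers = residual_convs + 2

# Weights of a single 3 x 3 convolution whose input and output channel
# counts are equal (biases and normalisation parameters ignored).
weights_per_conv = {
    channels: KERNEL_SIZE * KERNEL_SIZE * channels * channels
    for channels in MACRO_BLOCK_CHANNELS
}

print("convolutions inside residual blocks:", residual_convs)
print("total layers:", total_layers)
print("weights per 3x3 convolution:", weights_per_conv)
```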
Methodology – Regularization
• L2-SP
A regularization scheme that reduces the loss of the initially transferred
knowledge. The pre-trained model is used not only as the starting point of the fine-tuning process but also as the reference in the penalty, which encodes an explicit inductive bias. (Li, Grandvalet, and Davoine, 2018) Cross-entropy loss is used in this project, so the L2-SP objective can be written as:
$$L(y, t) = -\sum_{i}^{C} t_i \log(y_i) + \frac{\alpha}{2} \sum_{i}^{W} \lVert w_i - w_i^0 \rVert_2^2 + \frac{\beta}{2} \lVert w_{\bar{S}} \rVert_2^2$$
• w_i represents the weights except for the last classification layer
• w_{\bar{S}} represents the weights of the last classification layer
• t is the target
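The loss above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the project's implementation: weights are flattened into lists, `pretrained` holds the reference values w_i^0 from the pre-trained model, `probs` are the predicted class probabilities y, and the values of `alpha` and `beta` are arbitrary.

```python
import math

def l2_sp_loss(probs, target, weights, pretrained, classifier_weights, alpha, beta):
    """Cross-entropy plus the L2-SP penalty, following the formula above."""
    # Cross-entropy term; classes with t_i == 0 contribute nothing.
    cross_entropy = -sum(t * math.log(y) for t, y in zip(target, probs) if t > 0)
    # Penalise drift of the transferred weights from their pre-trained values.
    sp_penalty = 0.5 * alpha * sum((w - w0) ** 2 for w, w0 in zip(weights, pretrained))
    # Plain L2 penalty on the new last classification layer.
    l2_penalty = 0.5 * beta * sum(w ** 2 for w in classifier_weights)
    return cross_entropy + sp_penalty + l2_penalty

# Hypothetical numbers, chosen only for illustration.
loss = l2_sp_loss(
    probs=[0.7, 0.2, 0.1],
    target=[1, 0, 0],
    weights=[0.5, -0.2],
    pretrained=[0.4, 0.0],
    classifier_weights=[0.3, -0.1],
    alpha=0.1,
    beta=0.01,
)
print(loss)
```

When the transferred weights equal their pre-trained values, the middle term vanishes, which is how the penalty keeps fine-tuning close to the starting point.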
Methodology – MultiTune
• MultiTune
Proposed in this project. It enables the adaptive integration of multiple fine-tuning models with different fine-tuning settings. The current version contains two ResNet-26 models with different fine-tuning settings.
$$Z = W \ast \mathrm{concat}(\alpha X_1;\ (1 - \alpha) X_2)$$
• Z represents the output of the MultiTune Layer
• W represents the weights of the MultiTune Layer
• X_1 is the output of Fine-tuning Model A
• X_2 is the output of Fine-tuning Model B
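A minimal forward-pass sketch of the MultiTune layer, under stated assumptions: the `∗` in the formula is read as a matrix-vector product, α is a scalar in [0, 1], and X1 and X2 are feature vectors for a single image. The slides do not say whether α is fixed or learned, so it is simply passed in here. All numbers are made up.

```python
def multitune_layer(x1, x2, weight, alpha):
    """Compute Z = W * concat(alpha * X1; (1 - alpha) * X2) for one instance.

    x1, x2 : feature vectors from fine-tuning models A and B
    weight : list of rows, each with len(x1) + len(x2) entries
    alpha  : scalar mixing coefficient (assumed to lie in [0, 1])
    """
    combined = [alpha * v for v in x1] + [(1.0 - alpha) * v for v in x2]
    return [sum(w * c for w, c in zip(row, combined)) for row in weight]

# Hypothetical 2-dimensional features and a 3-class weight matrix.
x1 = [1.0, 2.0]
x2 = [3.0, 4.0]
weight = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
z = multitune_layer(x1, x2, weight, alpha=0.25)
print(z)
```

With α close to 1 the layer relies mostly on Model A, and with α close to 0 mostly on Model B.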
[Figure: MultiTune architecture. Labels: ResNet-26 Fine-tuning Setting A, ResNet-26 Fine-tuning Setting B, MultiTune Layer, Final Classification]
Results – Accuracy
[Figure: Validation Accuracy versus Number of Epochs; panels: Aircraft, CIFAR100]

Dataset     SpotTune   MultiTune
Aircraft    55.15%     59.59%
CIFAR100    78.45%     79.31%
Results – Accuracy
• Smaller-sized (number of images per class) Aircraft
Dataset       SpotTune   MultiTune
Aircraft-20   45.60%     47.85%
Aircraft-15   39.20%     40.73%
Aircraft-10   30.70%     29.90%
Aircraft-5    17.40%     18.80%
Results – Running Time
Dataset            SpotTune (mins)   MultiTune (mins)
Aircraft (whole)   47.49             38.19
Aircraft-20        29.15             22.50
Aircraft-15        21.84             16.88
Aircraft-10        14.67             11.51
Aircraft-5         7.57              5.91
CIFAR100           454.80            321.37
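To put the running times in relative terms, the short script below computes the fraction of SpotTune's running time that MultiTune saves on each dataset. The figures are copied from the table above; the helper name `relative_reduction` is illustrative.

```python
# (SpotTune minutes, MultiTune minutes), copied from the running-time table.
RUNTIME_MINUTES = {
    "Aircraft (whole)": (47.49, 38.19),
    "Aircraft-20": (29.15, 22.50),
    "Aircraft-15": (21.84, 16.88),
    "Aircraft-10": (14.67, 11.51),
    "Aircraft-5": (7.57, 5.91),
    "CIFAR100": (454.80, 321.37),
}

def relative_reduction(baseline, proposed):
    """Fraction of the baseline running time saved by the proposed method."""
    return (baseline - proposed) / baseline

reductions = {
    name: relative_reduction(spottune, multitune)
    for name, (spottune, multitune) in RUNTIME_MINUTES.items()
}

for name, reduction in reductions.items():
    print(f"{name}: {100 * reduction:.1f}% less running time")
```

On these numbers the saving ranges from roughly 20% on the whole Aircraft dataset to roughly 29% on CIFAR100.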
Conclusion and Future Work
• Conclusion
The results in this thesis indicate that the proposed MultiTune technique can improve the performance of TL on the image classification problem: it reached higher accuracy than SpotTune in every setting except Aircraft-10, and it needed less running time in all of them. It is a promising approach to be further
adopted and applied in the fields of TL and tasks related to image classification.
• Future Work
➢ Combine more than two fine-tuning models with different settings
➢ Further tuning of hyper-parameters
Q&A
References
• Li, Xuhong, Yves Grandvalet, and Franck Davoine (2018). “Explicit Inductive Bias for Transfer Learning with Convolutional Networks”. In: CoRR abs/1802.01483. arXiv: 1802.01483. URL: http://arxiv.org/abs/1802.01483.
• Guo, Yunhui et al. (2018). “SpotTune: Transfer Learning through Adaptive Fine-tuning”. In: CoRR abs/1811.08737.
arXiv: 1811.08737. URL: http://arxiv.org/abs/1811.08737.
• Rebuffi, Sylvestre-Alvise, Hakan Bilen, and Andrea Vedaldi (2017). “Learning multiple visual domains with residual adapters”. In: CoRR abs/1705.08045. arXiv:
1705.08045. URL: http://arxiv.org/abs/1705.08045.
• Rebuffi, Sylvestre-Alvise, Hakan Bilen, and Andrea Vedaldi (2018). “Efficient
parametrization of multi-domain deep neural networks”. In: CoRR abs/1803.10082. arXiv: 1803.10082. URL: http://arxiv.org/abs/1803.10082.