CHAPTER 5: IMP: INSTANCE MASK PROJECTION FOR HIGH ACCURACY
5.4 Acknowledgements
Thanks to Sarene Fu for runway fashion photos, and to Jonathan Shih and Adam Aji for many thoughtful discussions, lunches at Imm Thai, and fun times at Shopagon!
REFERENCES
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In CVPR, 2016.
[2] Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In NIPS, 2016.
[3] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal Loss for Dense Object Detection. In ICCV, 2017.
[4] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In ICCV, 2017.
[5] Alexander Kirillov, Ross Girshick, Kaiming He, and Piotr Dollár. Panoptic Feature Pyramid Networks. arXiv preprint arXiv:1901.02446, 2019.
[6] Ross Girshick. Fast R-CNN. In ICCV, 2015.
[7] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. You Only Look Once: Unified, Real-Time Object Detection. In CVPR, 2016.
[8] Derek Hoiem, Yodsawalai Chodpathumwan, and Qieyun Dai. Diagnosing error in object detectors. In ECCV, 2012.
[9] Detectron model zoo and baselines. https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md, 2018. [Online; accessed 2018-11-16].
[10] Francisco Massa and Ross Girshick. maskrcnn-benchmark: Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch. https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/MODEL_ZOO.md, 2018. [Online; accessed 2018-11-16].
[11] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. In ECCV, 2014.
[12] Shuai Zheng, Fan Yang, M. Hadi Kiapour, and Robinson Piramuthu. ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations. In ACM Multimedia, 2018.
[13] L. G. Roberts. Machine Perception of Three Dimensional Solids. PhD thesis, MIT Department of Electrical Engineering, 1963.
[14] Paul Viola and Michael Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, 2001.
[15] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[16] Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester, and Deva Ramanan. Object Detection with Discriminatively Trained Part Based Models. PAMI, 2010.
[17] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS, 2012.
[18] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.
[19] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[20] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NIPS, 2015.
[21] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 2010.
[22] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv:1704.04861, 2017.
[23] Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, and Kevin Murphy. Speed/accuracy trade-offs for modern convolutional object detectors. In CVPR, 2017.
[24] Andrew Zhai, Dmitry Kislyuk, Yushi Jing, Michael Feng, Eric Tzeng, Jeff Donahue, Yue Li Du, and Trevor Darrell. Visual Discovery at Pinterest. In WWW, 2017.
[25] Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic Segmentation. arXiv preprint arXiv:1801.00868, 2017.
[26] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[27] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. Pattern Analysis and Machine Intelligence, 2010.
[28] J.R.R. Uijlings, K.E.A. van de Sande, T. Gevers, and A.W.M. Smeulders. Selective Search for Object Recognition. International Journal of Computer Vision, 2013.
[29] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
[30] Pierre Sermanet, David Eigen, Xiang Zhang, Michaël Mathieu, Robert Fergus, and Yann LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. In ICLR, 2014.
[31] Dumitru Erhan, Christian Szegedy, Alexander Toshev, and Dragomir Anguelov. Scalable Object Detection Using Deep Neural Networks. In CVPR, 2014.
[32] Joseph Redmon and Ali Farhadi. YOLO9000: Better, Faster, Stronger. In CVPR, 2017.
[33] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single Shot MultiBox Detector. In ECCV, 2016.
[34] Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. Training Region-based Object Detectors with Online Hard Example Mining. In CVPR, 2016.
[35] Zhaowei Cai, Quanfu Fan, Rogerio Feris, and Nuno Vasconcelos. A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection. In ECCV, 2016.
[36] Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, and Alexander C. Berg. DSSD: Deconvolutional Single Shot Detector. arXiv:1701.06659, 2017.
[37] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature Pyramid Networks for Object Detection. In CVPR, 2017.
[38] Abhinav Shrivastava, Rahul Sukthankar, Jitendra Malik, and Abhinav Gupta. Beyond Skip Connections: Top-Down Modulation for Object Detection. arXiv:1612.06851, 2016.
[39] Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, and Cordelia Schmid. BlitzNet: A Real-Time Deep Network for Scene Understanding. In ICCV, 2017.
[40] Jifeng Dai, Kaiming He, and Jian Sun. Instance-aware Semantic Segmentation via Multi-task Network Cascades. In CVPR, 2016.
[41] Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, and Yichen Wei. Fully Convolutional Instance-aware Semantic Segmentation. In CVPR, 2017.
[42] Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. Path Aggregation Network for Instance Segmentation. In CVPR, 2018.
[43] Evan Shelhamer, Jonathan Long, and Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation. PAMI, 2016.
[44] Fisher Yu and Vladlen Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. In ICLR, 2016.
[45] Liang-Chieh Chen*, George Papandreou*, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. PAMI, 2018.
[46] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In
[47] Hengshuang Zhao, Yi Zhang, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, and Jiaya Jia. PSANet: Point-wise Spatial Attention Network for Scene Parsing. In ECCV, 2018.
[48] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid Scene Parsing Network. In CVPR, 2017.
[49] Samuel Rota Bulò, Lorenzo Porzi, and Peter Kontschieder. In-Place Activated BatchNorm for Memory-Optimized Training of DNNs. In CVPR, 2018.
[50] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. Deformable Convolutional Networks. In ICCV, 2017.
[51] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. PAMI, 2017.
[52] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In MICCAI, 2015.
[53] Sina Honari, Jason Yosinski, Pascal Vincent, and Christopher Pal. Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation. In CVPR, 2016.
[54] Alejandro Newell, Kaiyu Yang, and Jia Deng. Stacked Hourglass Networks for Human Pose Estimation. In ECCV, 2016.
[55] Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, and Piotr Dollár. Learning to Refine Object Segments. In ECCV, 2016.
[56] Liang-Chieh Chen*, George Papandreou*, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille (*equal contribution). Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. In ICLR, 2015.
[57] Liang-Chieh Chen, Jonathan T. Barron, George Papandreou, Kevin Murphy, and Alan L. Yuille. Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform. In CVPR, 2016.
[58] Yağız Aksoy, Tae-Hyun Oh, Sylvain Paris, Marc Pollefeys, and Wojciech Matusik. Semantic Soft Segmentation. ACM Trans. Graph. (Proc. SIGGRAPH), 2018.
[59] Holger Caesar, Jasper Uijlings, and Vittorio Ferrari. COCO-Stuff: Thing and Stuff Classes in Context. In CVPR, 2018.
[60] Yuwen Xiong*, Renjie Liao*, Hengshuang Zhao*, Rui Hu, Min Bai, Ersin Yumer, and Raquel Urtasun. UPSNet: A Unified Panoptic Segmentation Network. In CVPR, 2019.
[61] Christian Szegedy, Scott Reed, Dumitru Erhan, and Dragomir Anguelov. Scalable, high-quality object detection. arXiv preprint arXiv:1412.1441v3, 2015.
[62] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation. In CVPR, 2015.
[63] Bharath Hariharan, Pablo Arbeláez, Ross Girshick, and Jitendra Malik. Hypercolumns for Object Segmentation and Fine-grained Localization. In CVPR, 2015.
[64] Wei Liu, Andrew Rabinovich, and Alexander C Berg. ParseNet: Looking Wider to See Better. In ICLR, 2016.
[65] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object Detectors Emerge in Deep Scene CNNs. In ICLR, 2015.
[66] Andrew G Howard. Some Improvements on Deep Convolutional Neural Network Based Image Classification. arXiv preprint arXiv:1312.5402, 2013.
[67] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR, 2015.
[68] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. In ICLR, 2015.
[69] Matthias Holschneider, Richard Kronland-Martinet, Jean Morlet, and Ph Tchamitchian. A real-time algorithm for signal analysis with the help of the wavelet transform. In Wavelets, pages 286–297. Springer, 1990.
[70] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In MM, 2014.
[71] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, 2010.
[72] Liliang Zhang, Liang Lin, Xiaodan Liang, and Kaiming He. Is Faster R-CNN Doing Well for Pedestrian Detection. In ECCV, 2016.
[73] Sean Bell, C. Lawrence Zitnick, Kavita Bala, and Ross Girshick. Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks. In CVPR, 2016.
[74] COCO. Common Objects in Context. http://mscoco.org/dataset/#detections-leaderboard, 2016. [Online; accessed 25-July-2016].
[75] Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, and Piotr Dollár. Learning to Refine Object Segments. In ECCV, 2016.
[76] Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. In EMNLP, 2016.
[77] Yang Gao, Oscar Beijbom, Ning Zhang, and Trevor Darrell. Compact Bilinear Pooling. In
[78] Tsung-Yu Lin, Aruni RoyChowdhury, and Subhransu Maji. Bilinear CNNs for Fine-grained Visual Recognition. In ICCV, 2015.
[79] Joseph Redmon and Ali Farhadi. YOLOv3: An Incremental Improvement. arXiv:1804.02767, 2018.
[80] Sergey Zagoruyko, Adam Lerer, Tsung-Yi Lin, Pedro O Pinheiro, Sam Gross, Soumith Chintala, and Piotr Dollár. A MultiPath Network for Object Detection. In BMVC, 2016.
[81] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated Residual Transformations for Deep Neural Networks. In CVPR, 2017.
[82] Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv:1706.02677, 2017.
[83] Navaneeth Bodla, Bharat Singh, Rama Chellappa, and Larry S. Davis. Soft-NMS – Improving Object Detection With One Line of Code. In ICCV, 2017.
[84] Yuxin Wu and Kaiming He. Group Normalization. In ECCV, 2018.
[85] Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In ICML, 2015.
[86] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning Transferable Architectures for Scalable Image Recognition. In CVPR, 2018.
[87] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv:1704.04861, 2017.
[88] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In CVPR, 2018.
[89] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In CVPR, 2018.
[90] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS-W, 2017.
[91] Francisco Massa and Ross Girshick. maskrcnn-benchmark: Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch. https://github.com/facebookresearch/maskrcnn-benchmark, 2018. Accessed: [03/22/2019].
[92] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. In CVPR, 2016.
[93] Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, and Philip H. S. Torr. Conditional Random Fields as Recurrent Neural Networks. In ICCV, 2015.