Future Work - A saliency based framework for multi-modal registration.

state-of-the-art features, due to their higher repeatability. Furthermore, the BnB approaches are significantly more robust to high rates of outliers than existing approaches, while also scaling relatively well with the number of features and not requiring a large number of parameters to be set, unlike existing approaches to 2D-3D registration.

6.2 Future Work

In this section we firstly comment on future work specific to the two themes explored in this thesis (feature detection and globally optimal registration), and finish with future work for the 2D-3D registration pipeline as a whole.

The feature detectors presented in this thesis are highly generalisable, and as such it would be interesting to investigate their use for different multi-modal registration problems. Examples include cross-spectral or medical imagery registration, where many existing approaches rely on Mutual Information alignment, which is often slow and requires a good initialisation. Alternatively, the detection of other salient features could be investigated, such as edges (in 2D and 3D) and 3D planes. While there has been much research in the literature on detecting such features, existing detectors often act locally and disregard the features' salient aspects. However, determining the registration parameters using such features is a non-trivial problem in comparison to registration using points or lines.

There is scope for future work in global optimisation for the 2D-3D registration problem. There has been much recent research in global optimisation for a range of registration problems, but the methods often take drastically different approaches: for example, the BnB using geometrically meaningful bounds presented here, the construction of convex and concave envelopes to compute bounds, e.g. [117], or approaches that search over the correspondences, e.g. [33]. At this stage, it appears almost trivial to adopt a globally optimal approach to a new geometry estimation problem; however, it is unclear which type of global optimisation approach leads to the most efficient implementation. While we made our BnB approach faster via deterministic and probabilistic nested BnB procedures, there are many other paths to be investigated.
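To make the idea of BnB with geometrically meaningful bounds concrete, the following is a minimal toy sketch in Python. It is not the registration method of this thesis: it searches only over a 1D translation, and the function names (inliers, bnb_translation) and parameters are hypothetical, chosen for illustration. The upper bound for an interval of half-width r is obtained by counting inliers at the interval centre with the tolerance relaxed from eps to eps + r, which can never underestimate the inlier count anywhere inside the interval.

```python
import heapq


def inliers(xs, ys, t, tol):
    """Count points of xs that land within tol of some point of ys after shifting by t."""
    return sum(1 for x in xs if any(abs(x + t - y) <= tol for y in ys))


def bnb_translation(xs, ys, t_min, t_max, eps, min_width=1e-6):
    """Best-first BnB over a 1D translation t, maximising the inlier count."""
    best_t = 0.5 * (t_min + t_max)
    best_val = inliers(xs, ys, best_t, eps)
    radius = 0.5 * (t_max - t_min)
    # heapq is a min-heap, so upper bounds are stored negated.
    heap = [(-inliers(xs, ys, best_t, eps + radius), t_min, t_max)]
    while heap:
        neg_ub, lo, hi = heapq.heappop(heap)
        if -neg_ub <= best_val:
            break  # no remaining interval can beat the incumbent
        if hi - lo < min_width:
            continue  # interval too small to be worth splitting
        mid = 0.5 * (lo + hi)
        for a, b in ((lo, mid), (mid, hi)):
            centre, r = 0.5 * (a + b), 0.5 * (b - a)
            val = inliers(xs, ys, centre, eps)  # lower bound: value at the centre
            if val > best_val:
                best_val, best_t = val, centre
            ub = inliers(xs, ys, centre, eps + r)  # relaxed-tolerance upper bound
            if ub > best_val:
                heapq.heappush(heap, (-ub, a, b))
    return best_t, best_val
```

For example, with xs = [0.0, 1.0, 2.5, 7.0], ys = [3.0, 4.0, 5.5, 20.0], a search range of [-10, 10] and eps = 0.05, the search should return a translation close to 3 with three inliers. The alternative strategies mentioned above (convex and concave envelopes, or searching over correspondences) would replace the bound computation or the search space, not the overall branch-and-prune loop.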

Now we comment on future work for the proposed 2D-3D registration pipeline, which could be enhanced in a number of ways. A natural extension would be the use of feature descriptors to guide the registration process. For textureless 3D data there are some potential feature attributes to guide the matching process, for example the work of Kuang and Åström [86], who consider the direction of a point feature. This is insufficient to act as a feature descriptor on its own; rather, it is a slightly different, richer type of feature.

With increasingly many LiDAR scans recording the texture of the scene, it seems natural to use a 3D feature descriptor based on the texture, and match this with a descriptor on the image. This seemingly straightforward task may turn out to be non-trivial due to large changes in both lighting conditions and perspective distortion, to which many image descriptors are not robust. However, the use of descriptors would fit in well with the proposed pipeline, being applicable to the detected salient features and usable in the global optimisation approach via a slight change in the objective function.

Another potential avenue of research is to determine both the pose of the camera and the focal length. This is an increasingly important task as it can often be assumed, especially in digital cameras, that the principal point is in the centre of the image and there is zero skew, leaving only the focal length as the unknown intrinsic parameter. This could be incorporated into the BnB framework by deriving suitable bounds; however, the running time may become infeasible with an extra parameter in the search space. Alternatively, vanishing points could be used to determine camera intrinsics, as done by e.g. Guillemaut et al. [57]. Such an approach assumes that 3D lines tend to lie parallel to one another or along directions that form an orthogonal basis. This is not always the case but is often a reasonable assumption in man-made scenes [15].
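As an illustration of why vanishing points constrain the focal length, the sketch below uses the standard constraint for two vanishing points of orthogonal 3D directions under the assumptions stated above (principal point at the image centre, zero skew), plus the further assumption of square pixels. It is not the method of [57], and the function names are hypothetical. With principal point p and vanishing points v1 and v2, orthogonality of the underlying directions gives (v1 - p) . (v2 - p) + f^2 = 0.

```python
import math


def intrinsics_from_focal(f, width, height):
    """Intrinsic matrix with principal point at the image centre, zero skew, square pixels."""
    cx, cy = width / 2.0, height / 2.0
    return [[f, 0.0, cx],
            [0.0, f, cy],
            [0.0, 0.0, 1.0]]


def focal_from_orthogonal_vps(v1, v2, width, height):
    """Focal length (in pixels) from two vanishing points of orthogonal 3D directions."""
    cx, cy = width / 2.0, height / 2.0
    dot = (v1[0] - cx) * (v2[0] - cx) + (v1[1] - cy) * (v2[1] - cy)
    if dot >= 0.0:
        # The constraint f^2 = -dot has no positive solution, so the inputs
        # are inconsistent with the assumed camera model.
        raise ValueError("vanishing points inconsistent with orthogonal directions")
    return math.sqrt(-dot)
```

As a check, for a 640 x 480 image with f = 800, the orthogonal directions (1, 0, 1) and (-1, 0, 1) project to the vanishing points (1120, 240) and (-480, 240), from which the function recovers f = 800. In practice the vanishing points are estimated from noisy line segments, so the recovered focal length inherits that uncertainty.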

We finally comment on ML methods, particularly CNNs, since they have seen a rapid increase in use across a range of fields that have traditionally used model-based approaches. Recently, Su et al. [143] learnt 2D-3D registration parameters by rendering a number of 3D models from millions of viewpoints, and Kendall et al. [78] used a CNN for 2D-3D registration of an outdoor scene using training data captured from registering videos to SfM 3D data. However, the availability of training data is currently an issue, and it remains unclear how such approaches would fare with large-scale rendered 3D data.

Bibliography

[1] H. Aanæs, A. L. Dahl, and K. S. Pedersen. Interesting interest points - a comparative study of interest point performance on a unique data set. International Journal of Computer Vision, 97(1):18–35, 2012.

[2] C. Aguilera, F. Barrera, F. Lumbreras, A. D. Sappa, and R. Toledo. Multispectral image feature points. Sensors, 12(9):12661–12672, 2012.

[3] C. Akinlar and C. Topal. Edlines: Real-time line segment detection by edge drawing (ed). In Proceedings of the 2011 International Conference on Image Processing, pages 2837–2840, 2011.

[4] P. F. Alcantarilla, A. Bartoli, and A. J. Davison. Kaze features. In Proceedings of the 12th European Conference on Computer Vision - Volume Part VI, pages 214–227, 2012.

[5] A. Ansar and K. Daniilidis. Linear pose estimation from points or lines. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):578–589, 2003.

[6] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5):898–916, 2011.

[7] J. Arvo. Fast random rotation matrices. Graphics Gems III, pages 117–120, 1992.

[8] M. Aubry, B. C. Russell, and J. Sivic. Painting-to-3D model alignment via discriminative visual elements. ACM Transactions on Graphics, 33(2):14:1–14:14, 2014.

[9] D. H. Ballard. Generalizing the Hough Transform to detect arbitrary shapes. Pattern Recognition, 13(2):111–122, 1987.

[10] F. Barrera, F. Lumbreras, and A. D. Sappa. Multispectral piecewise planar stereo using manhattan-world assumption. Pattern Recognition Letters, 34(1):52–61, 2013.

[11] A. Bartoli and P. Sturm. Structure-From-Motion Using Lines: Representation, Triangulation and Bundle Adjustment. Computer Vision and Image Understanding, 100:416–441, 2005.

[12] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346–359, 2008.

[13] H. Bay, V. Ferrari, and L. Van Gool. Wide-baseline stereo matching with line segments. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 329–336, 2005.

[14] J. C. Bazin, H. Li, I. Kweon, C. Demonceaux, P. Vasseur, and K. Ikeuchi. A branch-and-bound approach to correspondence and grouping problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(7):1565–1576, 2013.

[15] J. C. Bazin, Y. Seo, C. Demonceaux, P. Vasseur, K. Ikeuchi, I. Kweon, and M. Pollefeys. Globally optimal line clustering and vanishing point estimation in manhattan world. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 638–645, 2012.

[16] P. R. Beaudet. Rotationally invariant image operators. In Proceedings of the 4th International Joint Conference on Pattern Recognition, pages 579–583, 1978.

[17] J. Beveridge and E. M. Riseman. Optimal geometric model matching under full 3D perspective. Computer Vision and Image Understanding, 61(3):351–364, 1995.

[18] S. Bhat and J. Heikkilä. Line matching and pose estimation for unconstrained model-to-image alignment. In Proceedings of International Conference on 3D Vision, pages 155–162, 2014.

[19] J. Blat, A. Evans, H. Kim, E. Imre, L. Polok, V. Ila, N. Nikolaidis, P. Zemcik, A. Tefas, P. Smrz, A. Hilton, and I. Pitas. Big data analysis for media production. Submitted to Proceedings of the IEEE, 2015.

[20] T. M. Breuel. Implementation techniques for geometric branch-and-bound matching methods. Computer Vision and Image Understanding, 90:294, 2003.

[21] M. Brown, J.-Y. Guillemaut, and D. Windridge. A saliency-based approach to 2D-3D registration. In Proc. International Conference on Computer Vision Theory and Applications (VISAPP), 2014.

[22] A. G. Buch, J. B. Jessen, D. Kraft, T. R. Savarimuthu, and N. Krüger. Extended 3D line segments from rgb-d data for pose estimation. In Image Analysis, volume 7944 of Lecture Notes in Computer Science, pages 54–65, 2013.

[23] M. Bujnak, Z. Kukelova, and T. Pajdla. A general solution to the P4P problem for camera with unknown focal length. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008.

[24] M. Bujnak, Z. Kukelova, and T. Pajdla. Making minimal solvers fast. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1506–1513, 2012.

[25] J. B. Burns, A. R. Hanson, and E. M. Riseman. Extracting straight lines. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(4):425–455, 1986.

[26] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. BRIEF: Binary robust independent elementary features. In Proceedings of the 11th European Conference on Computer Vision: Part IV, pages 778–792, 2010.

[27] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.

[28] U. Castellani, M. Cristani, S. Fantoni, and V. Murino. Sparse points matching by combining 3D mesh saliency with statistical descriptors. Computer Graphics Forum, 27(2):643–652, 2008.

[29] D. Ceylan, N. J. Mitra, Y. Zheng, and M. Pauly. Coupled structure-from-motion and 3D symmetry detection for urban facades. ACM Transactions on Graphics, 33(1):2:1–2:15, 2014.

[30] C.-H. Chang and Y.-Y. Chuang. A line-structure-preserving approach to image resizing. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1075–1082, 2012.

[31] H. Chen and B. Bhanu. 3D free-form object recognition in range images using local surface patches. Pattern Recognition Letters, 28(10):1252–1262, 2007.

[32] T. Chen and Q. Wang. 3D line segment detection for unorganized point clouds from multi-view stereo. In Proceedings of the 10th Asian Conference on Computer Vision, volume 2, pages 400–411, 2011.

[33] T.-J. Chin, P. Purkait, A. Eriksson, and D. Suter. Efficient globally optimal consensus maximisation with tree search. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2413–2421, 2015.

[34] W. J. Christmas, J. Kittler, and M. Petrou. Structural matching in computer vision using probabilistic relaxation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):749–764, 1995.

[35] O. Chum and J. Matas. Matching with PROSAC - progressive sample consensus. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 220–226, 2005.

[36] O. Chum, J. Matas, and J. Kittler. Locally optimized RANSAC. In Pattern Recognition, 25th DAGM Symposium, Magdeburg, Germany, September 10-12, 2003, Proceedings, pages 236–243, 2003.

[37] M. Corsini, M. Dellepiane, F. Ponchio, and R. Scopigno. Image-to-geometry registration: a mutual information method exploiting illumination-related geometric properties. Computer Graphics Forum, 28(7):1755–1764, 2009.

[38] P. David and D. DeMenthon. Object recognition in high clutter images using line features. In International Conference on Computer Vision, pages 1581–1588. IEEE Computer Society, 2005.

[39] P. David, D. DeMenthon, R. Duraiswami, and H. Samet. Softposit: Simultaneous pose and correspondence determination. In Proceedings of the 7th European Conference on Computer Vision-Part III, pages 698–714, 2002.

[40] P. David, D. DeMenthon, R. Duraiswami, and H. Samet. Simultaneous pose and correspondence determination using line features. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 424–431, 2003.

[41] J. W. Davis and V. Sharma. Background-subtraction using contour-based fusion of thermal and visible imagery. Computer Vision and Image Understanding, 106(2-3):162–182, 2007.

[42] D. F. Dementhon and L. S. Davis. Model-based object pose in 25 lines of code. International Journal of Computer Vision, 15(1-2):123–141, 1995.

[43] A. Desolneux, L. Moisan, and J.-M. Morel. Meaningful alignments. International Journal of Computer Vision, 40(1):7–23, 2000.

[44] M. Dhome, M. Richetin, J. Lapresté, and G. Rives. Determination of the attitude of 3D objects from a single perspective view. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(12):1265–1278, 1989.

[45] P. Dollár and C. L. Zitnick. Structured forests for fast edge detection. In The IEEE International Conference on Computer Vision, pages 1841–1848, 2013.

[46] F. Dornaika and C. Garcia. Pose estimation using point and line correspondences. Real-Time Imaging, 5(3):215–230, 1999.

[47] G. Egnal. Mutual Information as a Stereo Correspondence Measure. Technical report, University of Pennsylvania, 2000.

[48] D. M. Endres and J. E. Schindelin. A new metric for probability distributions. IEEE Transactions on Information Theory, 49(7):1858–1860, 2003.

[49] O. Enqvist, K. Josephson, and F. Kahl. Optimal correspondences from pairwise constraints. In IEEE International Conference on Computer Vision, pages 1295–1302, 2009.

[50] O. Enqvist and F. Kahl. Two view geometry estimation with outliers. In British Machine Vision Conference, 2009.

[51] T. Fiolka, J. Stückler, D. A. Klein, D. Schulz, and S. Behnke. SURE: Surface entropy for distinctive 3D features. In Spatial Cognition VIII, volume 7463 of Lecture Notes in Computer Science, pages 74–93, 2012.

[52] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381–395, 1981.

[53] G. Flitton, T. Breckon, and N. Megherbi Bouallagu. Object recognition using 3D SIFT in complex CT volumes. In Proceedings of the British Machine Vision Conference, pages 11.1–11.12, 2010.

[54] J. Fredriksson, V. Larsson, and C. Olsson. Practical robust two-view translation estimation. In IEEE Conference on Computer Vision and Pattern Recognition, 2015.

[55] B. J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 315:972–976, 2007.

[56] I. Grosse, P. B. Galván, P. Carpena, R. Román-Roldán, J. Oliver, and H. E. Stanley. Analysis of symbolic sequences using the Jensen-Shannon Divergence. Physical Review E, 65(4), 2002.

[57] J.-Y. Guillemaut, A. S. Aguado, and J. Illingworth. Using points at infinity for parameter decoupling in camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(2):265–270, 2005.

[58] J.-Y. Guillemaut and A. Hilton. Joint multi-layer segmentation and reconstruction for free-viewpoint video applications. International Journal of Computer Vision, 93(1):73–100, 2011.

[59] Y. Guo, M. Bennamoun, F. Sohel, M. Lu, J. Wan, and N. M. Kwok. A comprehensive performance evaluation of 3D local feature descriptors. International Journal of Computer Vision, pages 1–26, 2015.

[60] Y. Guo, M. Bennamoun, F. A. Sohel, M. Lu, and J. Wan. 3D object recognition in cluttered scenes with local surface features: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11):2270–2287, 2014.

[61] C. Harris and M. Stephens. A combined corner and edge detector. In Proceedings of the 4th Alvey Vision Conference, pages 147–151, 1988.

[62] R. Hartley and F. Kahl. Optimal algorithms in multiview geometry. In Asian Conference on Computer Vision, volume 4843 of Lecture Notes in Computer Science, pages 13–34, 2007.

[63] R. Hartley and F. Kahl. Global Optimization through Rotation Space Search. International Journal of Computer Vision, 82(1):64–79, 2009.

[64] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. 2nd edition, 2003.

[65] D. C. Hauagge and N. Snavely. Image matching using local symmetry features. In IEEE Conference on Computer Vision and Pattern Recognition, pages 206–213, 2012.

[66] M. Hofer, A. Wendel, and H. Bischof. Incremental line-based 3D reconstruction using geometric constraints. In British Machine Vision Conference. BMVA Press, 2013.

[67] M. Holtzman-Gazit, L. Zelnik-Manor, and I. Yavneh. Salient edges: A multi scale approach. In European Conference on Computer Vision, Workshop on Vision for Cognitive Tasks, 2010.

[68] M. Hutter. Distribution of mutual information. In Advances in Neural Information Processing Systems 14, pages 399–406, 2002.

[69] J. Illingworth and J. Kittler. A survey of the Hough Transform. Computer Vision, Graphics, and Image Processing, 44(1):87–116, 1988.

[70] E. İmre and A. Hilton. Order statistics of RANSAC and their practical application. International Journal of Computer Vision, 111(3):276–297, 2015.

[71] L. Itti and C. Koch. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40(10-12):1489–1506, 2000.

[72] M. Jeltsch, C. Dalitz, and R. Pohle-Fröhlich. Hough parameter space regularisation for line detection in 3D. In Proc. International Conference on Computer Vision Theory and Applications (VISAPP), 2016.

[73] F. Jurie. Solution of the simultaneous pose and correspondence problem using gaussian error model. Computer Vision and Image Understanding, 73(3):357–373, 1999.

[74] T. Kadir and M. Brady. Saliency, scale and image description. International Journal of Computer Vision, 45(2):83–105, 2001.

[75] T. Kadir, A. Zisserman, and M. Brady. An affine invariant salient region detector. In Proceedings of the 8th European Conference on Computer Vision, volume 3021 of Lecture Notes in Computer Science, pages 228–241. Springer, 2004.

[76] B. Kamgar-Parsi and B. Kamgar-Parsi. Algorithms for matching 3D line sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(5):582–593, 2004.

[77] B. Kamgar-Parsi and B. Kamgar-Parsi. Matching 2D image lines to 3D models: Two improvements and a new algorithm. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2425–2432, 2011.

[78] A. Kendall, M. Grimes, and R. Cipolla. Posenet: A convolutional network for real-time 6-DOF camera relocalization. pages 2938–2946.

[79] N. Y. Khan, B. McCane, and S. Mills. Better than SIFT? Machine Vision and Applications, 26(6):819–836, 2015.

[80] H. Kim. Impart multi-modal dataset. http://dx.doi.org/10.15126/surreydata.00807707, 2014.

[81] M. Klaudiny, M. Tejera, C. Malleson, J.-Y. Guillemaut, and A. Hilton. Scene digital cinema datasets. http://dx.doi.org/10.15126/surreydata.00807665, 2014.

[82] L. Kneip, H. Li, and Y. Seo. Upnp: An optimal o(n) solution to the absolute pose problem with universal applicability. In European Conference on Computer Vision, volume 8689 of Lecture Notes in Computer Science, pages 127–142, 2014.

[83] L. Kneip, D. Scaramuzza, and R. Siegwart. A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2969–2976, 2011.

[84] M. Kolomenkin, I. Shimshoni, and A. Tal. On edge detection on surfaces. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2767–2774, 2009.

[85] D. J. Kriegman and J. Ponce. On recognizing and positioning curved 3-D objects from image contours. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(12):1127–1137, 1990.

[86] Y. Kuang and K. Åström. Pose estimation with unknown focal length using points, directions and lines. In The IEEE International Conference on Computer Vision, pages 529–536, 2013.

[87] R. Kumar and A. Hanson. Robust methods for estimating pose and a sensitivity analysis. CVGIP: Image Understanding, 60(3):313–342, 1994.

[88] C. H. Lee, A. Varshney, and D. W. Jacobs. Mesh saliency. ACM Transactions on Graphics, 24(3):659–666, 2005.

[89] W.-T. Lee and H.-T. Chen. Histogram-based interest point detectors. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1590–1596, 2009.

[90] V. Lepetit, F. Moreno-Noguer, and P. Fua. Epnp: An accurate o(n) solution to the pnp problem. International Journal of Computer Vision, 81(2), 2009.

[91] J. Lezama, J. Morel, G. Randall, and R. G. von Gioi. A contrario 2D point alignment detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3):499–512, 2015.

[92] Y. Li, N. Snavely, and D. P. Huttenlocher. Location recognition using prioritized feature matching. In Proceedings of the 11th European Conference on Computer Vision: Part II, pages 791–804, 2010.

[93] Y. Li, S. Wang, Q. Tian, and X. Ding. A survey of recent advances in visual feature detection. Neurocomputing, 149(Part B):736–751, 2015.

[94] Y. Lin, C. Wang, J. Cheng, B. Chen, F. Jia, Z. Chen, and J. Li. Line segment extraction for large scale unorganized point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 102(1):172–183, 2015.

[95] J.-L. Liu and D.-Z. Feng. Two-dimensional multi-pixel anisotropic gaussian filter for edge-line segment (ELS) detection. Image and Vision Computing, 32(1):37–53, 2014.

[96] J. G. Lopera, N. Ilhami, P. L. Escamilla, J. M. Aroza, and R. R. Roldán. Improved entropic edge-detection. In International Conference on Image Analysis and Processing, pages 180–184, 1999.

[97] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[98] Z. Lu, S. Baek, and S. Lee. Robust 3D line extraction from stereo point clouds. In Robotics, Automation and Mechatronics, IEEE Conference on, pages 1–5. IEEE, 2008.

[99] E. Mair, G. D. Hager, D. Burschka, M. Suppa, and G. Hirzinger. Adaptive and generic corner detection based on the accelerated segment test. In European Conference on Computer Vision, volume 6312, pages 183–196, 2010.

[100] M. Marques, M. Stosic, and J. Costeira. Subspace matching: Unique solution to point matching with geometric constraints. In International Conference on Computer Vision, pages 1288–1294. IEEE Computer
