A generalised framework for saliency-based point feature detection


Contents lists available at ScienceDirect

Computer Vision and Image Understanding

journal homepage: www.elsevier.com/locate/cviu


Mark Brown a,∗, David Windridge a,b, Jean-Yves Guillemaut a

a Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, Surrey GU2 7XH, UK
b School of Science and Technology, Middlesex University, London, NW4 4BT, UK

Article info

Article history:
Received 30 November 2015
Revised 26 August 2016
Accepted 13 September 2016
Available online xxx

Keywords: Point detection, Feature detection, Feature matching, 2D-3D registration, Saliency

Abstract

Here we present a novel, histogram-based salient point feature detector that may naturally be applied to both images and 3D data. Existing point feature detectors are often modality specific, with 2D and 3D feature detectors typically constructed in separate ways. As such, their applicability in a 2D-3D context is very limited, particularly where the 3D data is obtained by a LiDAR scanner. By contrast, our histogram-based approach is highly generalisable and, as such, may be meaningfully applied between 2D and 3D data. Using the generalised approach, we propose salient point detectors for images, and both untextured and textured 3D data. The approach naturally allows for the detection of salient 3D points based jointly on both the geometry and texture of the scene, allowing for broader applicability. The repeatability of the feature detectors is evaluated using a range of datasets including image and LiDAR input from indoor and outdoor scenes. Experimental results demonstrate a significant improvement in terms of 2D-2D and 2D-3D repeatability compared to existing multi-modal feature detectors.

1. Introduction

Light Detection And Ranging (LiDAR) scanners have been used to obtain 3D data for decades, but it is only in recent years that they have seen more widespread applicability due to the high computational capacity required to cope with such large datasets. However, the integration of LiDAR scans with data from other modalities (e.g. images) remains a difficult problem, with many approaches relying on line features for their registration (Liu and Stamos, 2012; Wang and Neumann, 2009), which may not always be available. This causes significant bottlenecks in practical applications such as digital film production, where LiDAR scans and images are captured on-set to obtain data about the scene, but subsequently need to be manually registered during post-production. The problem is further exacerbated by the high resolution and large scale of the data, requiring scalable methods for registration that are robust to the diverse, multi-modal aspect of the data.

To address this, here we propose a point feature detector that may be naturally and meaningfully applied between both 2D and 3D data. Feature detection is a typical first stage in many registration pipelines (Li et al., 2010; Liu and Stamos, 2012; Wu et al., 2008b), whereby considering only a small subset of discriminative features in each dataset the registration parameters may be obtained in a relatively straightforward manner. However, obtaining suitably repeatable features between both 2D and 3D data is a particularly challenging problem due to the large heterogeneity between the two modalities.

∗ Corresponding author.
E-mail addresses: [email protected] (M. Brown), [email protected] (D. Windridge), [email protected] (J.-Y. Guillemaut).

Instead, existing point feature detection methods are typically centred around images. Recent advances in 3D data acquisition (e.g. Microsoft Kinect) have resulted in a significant interest in 3D feature detection (Guo et al., 2014; Tombari et al., 2013b). However, it is clear that the majority of 2D and 3D feature detectors are constructed in very separate ways. The more popular 2D feature detectors are based on the derivative of the image, and provide a principled approach to scale selection using scale-space theory (Lowe, 2004; Mikolajczyk and Schmid, 2004). Yet very few may be extended to operate on 3D data: many 3D feature detectors are based on surface curvature (Tombari et al., 2013b), and the traditional scale-space approach typically cannot be applied to 3D data without altering the geometry. The differences between 2D and 3D feature detectors are further exacerbated by the range of existing 3D data types (point cloud, volumetric, mesh, textured/untextured), leading to different 3D feature detectors for each case (Guo et al., 2014; Tombari et al., 2013b; Yu et al., 2013).

As such, it is very difficult to use existing point feature detectors jointly across 2D and 3D due to the incomparable nature of their constructions, and the limited scope to which 3D detectors may be applied. Applications such as registration, that would

http://dx.doi.org/10.1016/j.cviu.2016.09.008


entropy of its histogram) at a particular scale. This histogram-based approach allows it to be formulated across different modalities in a more meaningful manner than other feature detectors due to the vast array of ways in which histograms may be constructed.

Based upon the KB saliency detector, and inspired by the success of the 2D Harris corner detector (Aanæs et al., 2012; Harris and Stephens, 1988), we propose a novel extension to the 2D KB saliency detector. Whereas the original KB saliency detector constructs a histogram of pixel intensities in a circular region, we propose a derivative-based approach whereby the histogram is constructed based on the distribution of eigenvalues of the second moment matrix. This allows our approach to detect salient points with respect to the derivative of the image, where it may operate in a more general manner than a typical corner detector and avoid repetitive parts of the scene.

By using the generalisable histogram-based approach of the KB saliency detector, the above approach may be naturally extended to 3D data by constructing a histogram based on the 3D second moment matrix (Sipiran and Bustos, 2010). Furthermore, the histogram-based approach allows for the detection of salient points based on both the geometry and texture of the scene by constructing a 2D histogram based on the texture of the 3D surface, and combining the 2D and 3D histograms. This allows it to operate in a meaningful manner regardless of whether or not the 3D data is textured, and is able to combine the best of both sets of features for textured data.

The contributions of this paper are three-fold. Firstly, a generalisation of the KB saliency detector is formulated, demonstrating its broad applicability to operate wherever histograms may be meaningfully constructed within a metric space. Secondly, in light of this generalisation, we propose a 2D derivative-based KB saliency detector based on the second moment matrix. Thirdly, the derivative-based KB saliency detector is naturally extended to 3D, where it may operate on both textured and untextured 3D data. It is, to the best of our knowledge, the first 3D feature detector to operate based on both the geometry and texture of the scene simultaneously. The proposed detectors are evaluated in a 2D-2D and 2D-3D manner, where they are shown to be more repeatable than existing detectors (Harris 2D and 3D (Harris and Stephens, 1988; Sipiran and Bustos, 2010), and SIFT 2D and 3D (Lowe, 2004; Zaharescu et al., 2012)).

This paper is structured as follows. In Section 2 we describe related work in point feature detection between 2D and 3D. In Section 3 a description of the KB saliency detector is given, along with proposed extensions and modifications (Kadir et al., 2004; Shao and Brady, 2006; Shao et al., 2007). In Section 4 we propose a generalisation of KB saliency. The generalisation is subsequently implemented for a 2D derivative-based KB saliency detector (Section 5), and a 3D KB saliency detector (Section 6) that may operate on textured or untextured 3D data. In Section 7 results will be given, involving qualitative and quantitative results in both 2D and 3D; finally, conclusions and future work are presented in Section 8.

categorised as derivative-based. The early Harris corner detector (Harris and Stephens, 1988) is a prime example, based on the second moment matrix M (made up of the partial derivatives of the image in a neighbourhood of the point). When both eigenvalues of M are large, it implies a corner is present; a ‘corner measure’ is constructed accordingly. Alternatively, the Hessian matrix may be used (Beaudet, 1978) as the basis for a feature detector. It detects ‘blob’ structures, where a point is of relatively high or low intensity compared to its immediate surroundings. The eigenvectors and eigenvalues describe the size and shape of the blob, with the determinant of the Hessian typically used as a response value.

In the case of both the Harris and Hessian detectors, they may be made affine-invariant by constructing the matrices from image derivatives over elliptical regions (Mikolajczyk and Schmid, 2004). Furthermore, they may be made scale-invariant by constructing the matrices over ellipses of varying size while convolving with a Gaussian kernel (Mikolajczyk and Schmid, 2004). It is observed that detecting keypoints based on the magnitude of the scale-normalised Laplacian of Gaussians (LoG) produces the highest percentage of correct scales. This has led to the popular SIFT detector (Lowe, 2004) that detects keypoints by the magnitude of the Difference of Gaussians (DoG). DoG is approximately equal to the scale-normalised LoG by the heat equation, hence this approach allows for LoG estimation without the need for derivatives to be computed. However, the DoG response is large for edge-like structures, so SIFT subsequently culls edge responses using the ratio of eigenvalues of the Hessian. The traditional Gaussian scale-space approach has its limitations since it blurs both noise and fine detail (e.g. edges); this has been addressed by Alcantarilla et al. (2012), who use a non-linear scale-space that respects the natural boundaries of the image.
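The relation between DoG and the scale-normalised LoG mentioned above can be sketched in a few lines. This is the standard argument given by Lowe (2004); the symbols $G(x, y, \sigma)$ for the Gaussian kernel and $k$ for the constant ratio between adjacent scales are notation assumed here, not taken from this text.

```latex
\begin{align}
  % Heat (diffusion) equation satisfied by the Gaussian kernel
  \frac{\partial G}{\partial \sigma} &= \sigma \nabla^2 G \\
  % Finite-difference approximation between the nearby scales \sigma and k\sigma
  \sigma \nabla^2 G = \frac{\partial G}{\partial \sigma}
    &\approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma} \\
  % Rearranging: DoG equals the scale-normalised LoG up to the factor (k - 1)
  G(x, y, k\sigma) - G(x, y, \sigma) &\approx (k - 1)\, \sigma^2 \nabla^2 G
\end{align}
```

Since $(k - 1)$ is the same at every scale, it does not change where the extrema of the response lie, which is why the DoG can stand in for the scale-normalised LoG.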

A secondary category of point feature detectors are those that are intensity-based. These detectors typically operate over a neighbouring set of pixels, but disregard the derivative of the image. As such, they are often more robust to noise (particularly salt-and-pepper noise) than derivative-based feature detectors. An early intensity-based approach is the SUSAN detector (Smith and Brady, 1997); it defines a Univalue Segment Assimilating Nucleus (USAN) as a set of neighbouring pixels that have a similar intensity value to a centre pixel. Corners are subsequently defined where the number of pixels in the USAN is small. Region detectors typically fall into the intensity-based approaches category; for example, the MSER detector (Matas et al., 2002) detects regions where pixel intensities inside the region are either higher or lower than those on its boundary.

A subset of intensity-based approaches are the histogram-based feature detectors that detect feature points via histogram construction. The Kadir-Brady saliency detector (Kadir and Brady, 2001) is an example of this; it constructs a histogram of pixel intensities in a neighbourhood of a point, and salient points are detected where the distribution of pixel intensities has a high entropy at a particular scale. It will be discussed in greater detail in the next section, where it forms the basis of the proposed 2D-3D point feature detector.


Using the histogram-based approach, a keypoint may be detected based on the idea of self-similarity (or lack of it) to its neighbours. Maver (2010) looks for similar histograms of pixel intensities in radial and tangential regions so as to detect keypoints that exhibit different types of symmetry. Conversely, Lee and Chen (2009) look for a point whose histogram is significantly dissimilar from its immediate neighbours. Tombari and di Stefano (2014) use a similar idea, but where histogram comparison is only performed on the k-nearest neighbours and a computationally efficient implementation is proposed. The notion of self-similarity is very useful for multi-modal registration, since scenes may often exhibit a similar structure between modalities but lack similar finer features. Tombari and di Stefano (2014) show their approach to be of potential use for cross-spectral image registration, and Shechtman and Irani (2007) construct a self-similarity descriptor for cross-spectral imagery and sketch-based retrieval.

The majority of 2D point feature detectors are focused purely within the 2D domain. There is evidence to suggest that histogram-based approaches are a promising avenue for multi-modal feature detection due to their general formulation. However, this has never been applied in a 2D-3D context, where the histogram construction process may more generally result in feature detection based on both the geometry and texture of the 3D data.

2.2. 3D point feature detection

Approaches to point feature detection in 3D vary depending upon the type of data being used. For volumetric 3D data many 2D feature detectors may be naturally extended, e.g. 3D SIFT (Flitton et al., 2010). Indeed, a performance evaluation of volumetric 3D feature detectors (Yu et al., 2013) shows extensions of familiar 2D feature detectors (Harris, Hessian, MSER, etc.). However, other representations of 3D data (point cloud or mesh) create difficulties since points are non-uniformly sampled, points may or may not be textured, and a scale-space may not be so naturally constructed. Point cloud representations are, however, the subject of this paper and as such feature detection for this representation will be reviewed here.

Similarly to 2D feature detection, the Harris corner detector has been naturally extended to operate on 3D data (Sipiran and Bustos, 2010). For each point, a best fit tangent plane is first determined. Each neighbouring point is projected onto the plane and assigned an ‘intensity’ value equal to its distance to the plane. The 2D Harris corner detector may be applied to this set of intensity values, resulting in the 3D Harris corner detector.

Second derivative-based approaches in 3D typically manifest themselves through curvature-based approaches, while avoiding any mention of a Hessian matrix. For example, Chen and Bhanu (2007) propose an approach that locally estimates a quadratic surface around each vertex and uses this to obtain the principal curvatures. They then assign a Shape Index (SI) to each vertex based on the maximum and minimum principal curvatures. Points are detected based upon whether their SI is significantly bigger or smaller than the mean of a neighbourhood of SIs.

Alternative approaches may not be derivative-based at all, taking advantage of the unordered point cloud representation of the data. For example, Zhong (2009) proposes Intrinsic Shape Signatures (ISS), based on the eigenvalue decomposition of the 3 × 3 covariance matrix around a point. Points whose successive eigenvalues are similar (as measured by their ratio) are subsequently culled, and the remaining feature points are ranked in proportion to the smallest eigenvalue. Learning-based approaches have also been proposed, for example by Teran and Mordohai (2014), who learn across a set of geometric attributes using a random forest. The approach allows for specific point detection to match the criteria observed during the training phase, resulting in a more flexible approach.

Scale-space approaches to 3D feature detection have been proposed in a number of ways. Castellani et al. (2008) propose to detect point features by using the Difference of Gaussians (DoG) on the set of 3D points, determining a point’s saliency by how far it moves along its normal under the DoG operator. However, this type of approach has been criticised since it obtains a scale-space representation by altering the geometry of the scene. Alternatively, a scale-space may be constructed by convolving other attributes of the 3D data. Such an approach is taken by Zaharescu et al. (2012): they detect keypoints in a generic way that is applicable to scalar functions of 2D manifolds, e.g. mean curvature, or the intensity (if the data is textured). However, it cannot detect keypoints based jointly on geometry and texture. Their approach is similar to SIFT, computing a scalar function at each point, using a DoG operator on the scalar function and rejecting keypoints for which the ratio of the eigenvalues of the Hessian is large.

An approach that is very similar to SIFT is the Viewpoint Invariant Patches (VIP) approach of Wu et al. (2008a), which is only applicable to textured 3D models. They propose to compute a local tangent plane to each 3D point, onto which a neighbouring texture patch may be orthographically projected. The 2D SIFT detector and descriptor may be subsequently applied on the texture patch to allow a framework for 3D-3D registration. Wu et al. furthermore apply their approach in a 2D-3D scenario (Wu et al., 2008b), where SIFT features are detected in both 2D and 3D data. They determine putative feature matches that are refined by warping the 2D SIFT features such that they approximately match the same form of the orthographic VIP SIFT features.

A histogram-based approach to 3D point feature detection was proposed by Fiolka et al. (2012), who extend the KB saliency detector (Kadir and Brady, 2001) and construct a histogram based on the distribution of normals. However, their approach only detects salient features based on the geometry of the scene and does not detect those based on any available texture; as a result it does not provide a unified approach to salient point detection in 3D. An earlier version of this work was published in Brown et al. (2014) based on the mean curvature; however, this was a purely geometry-based KB saliency detector. In this paper we a) propose a derivative-based 2D KB saliency detector, and b) in contrast to both Fiolka et al. (2012) and Brown et al. (2014), consider both the geometry and texture of the scene, allowing for salient point detection based on both attributes of the data simultaneously. Our framework for generalisable salient point detection is evaluated between 2D and 3D on a range of synthetic and real data.

3. The Kadir-Brady saliency detector

Here an outline of the Kadir-Brady (KB) saliency detector (Kadir and Brady, 2001) and its extensions and various implementations (Kadir et al., 2004; Shao and Brady, 2006; Shao et al., 2007) is given.

The KB saliency detector (Kadir and Brady, 2001) is originally based on the principle that the parts of an image that are highly complex are salient. Scale-invariance is achieved by measuring the complexity across a range of scales and only selecting points whose complexity is peaked with respect to their scale. To further localise its scale, it is required that the point is statistically dissimilar across its neighbouring scales, known as inter-scale saliency. The saliency of a point is therefore the product of two terms: its complexity and its inter-scale saliency. Finally, salient points are clustered into salient regions so as to be more robust to noise. These three stages of the KB saliency detector (complexity estimation, inter-scale saliency, and clustering) are now described in more detail:


Fig. 1. An example of four distributions of pixel intensities from the image. The distributions on the left have a relatively large entropy and change significantly over scale. The distributions on the right lie in an approximately uniform part of the image, having low entropy and not changing over scale, hence will not be deemed salient by the approach. Image taken from (Mikolajczyk et al., 2005).

Stage I: Complexity estimation. The complexity of a given point ($p$) at a particular scale ($\sigma_s$) is determined by its entropy. Entropy is, however, defined for a probability mass function (pmf) $P$ taking one of $K$ values (i.e. $P = \{p_1, \dots, p_K\}$, $p_i \geq 0 \ \forall i$, $\sum_{i=1}^{K} p_i = 1$), and is defined as:

$$H(P) = -\sum_{i=1}^{K} p_i \ln p_i \tag{1}$$

Informally, the entropy of a pmf gives a measure of how ‘spread out’ it is: it is maximised for the uniform distribution and minimised when the pmf is 1 for one bin and zero for all other bins (Shannon, 1948). We take $0 \ln 0 = 0$ (since $\lim_{x \to 0} x \ln x = 0$).

To meaningfully apply the concept of entropy to a point $p$ at scale $\sigma_s$, a histogram of pixel intensities is first constructed from all pixels within a distance $\sigma_s$ from $p$, denoted $\{v_{1,\sigma_s}, \dots, v_{K,\sigma_s}\}$. The histogram is normalised to obtain a (frequentist) pmf, denoted $\{\hat{v}_{1,\sigma_s}, \dots, \hat{v}_{K,\sigma_s}\}$, i.e. $\sum_{i=1}^{K} \hat{v}_{i,\sigma_s} = 1$. Then the entropy of point $p$ at scale $\sigma_s$ is defined as the entropy of the frequentist pmf:

$$H(p, \sigma_s) = -\sum_{i=1}^{K} \hat{v}_{i,\sigma_s} \log \hat{v}_{i,\sigma_s} \tag{2}$$
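As an illustration of Stage I, the following is a minimal sketch of the entropy of Eq. (2) for a single pixel and scale. It assumes a greyscale image stored as a NumPy array with intensities in [0, 255]; the function name `local_entropy` and the choice of 16 histogram bins are hypothetical and not taken from the paper.

```python
import numpy as np

def local_entropy(image, p, sigma_s, n_bins=16):
    """Entropy (Eq. 2) of the intensity histogram in a disc of radius sigma_s.

    image   : 2D array of intensities in [0, 255] (assumption of this sketch)
    p       : (row, col) centre of the disc
    sigma_s : radius of the disc, in pixels
    """
    rows, cols = np.indices(image.shape)
    # All pixels within distance sigma_s of p.
    inside = (rows - p[0]) ** 2 + (cols - p[1]) ** 2 <= sigma_s ** 2
    hist, _ = np.histogram(image[inside], bins=n_bins, range=(0, 256))
    pmf = hist / hist.sum()          # normalise to a frequentist pmf
    nonzero = pmf[pmf > 0]           # convention: 0 ln 0 = 0
    return float(-(nonzero * np.log(nonzero)).sum())
```

A perfectly flat patch puts all of its mass in one bin and so has zero entropy, while the entropy can never exceed `np.log(n_bins)`, the value reached by a uniform histogram.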

Stage II: Inter-scale saliency. Similarly to other feature detectors, only features whose response value is peaked in scale-space are sought; i.e. only features whose entropy is peaked in scale-space are kept. Furthermore, it is necessary for the feature to be statistically dissimilar across scale. Based on this, the pmf is compared to the pmfs of the neighbouring scales, and the saliency is weighted by how dissimilar the pmfs are. Thus the weighting function is constructed as:

$$W(p, \sigma_s) = \frac{\sigma_s^2}{\sigma_s^2 - \sigma_{s-1}^2} \sum_{i=1}^{K} \left| \hat{v}_{i,\sigma_s} - \hat{v}_{i,\sigma_{s-1}} \right| \tag{3}$$

The coefficient $\frac{\sigma_s^2}{\sigma_s^2 - \sigma_{s-1}^2}$ is used so as to be scale-invariant. From these two stages, a set of keypoints (those whose entropy is peaked in scale-space) is obtained. They have a saliency value of $H(p, \sigma_s) \times W(p, \sigma_s)$. An example of histograms obtained for the first two stages is given in Fig. 1, where the advantages of determining salient points as those with a high entropy and dissimilarity across scale are demonstrated.
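Stages I and II can be combined for one pixel as in the sketch below. It assumes the pmfs and entropies have already been computed at an increasing list of scales; the function names are hypothetical, and treating a "peak" as a strict local maximum over the two neighbouring scales is an assumption of this sketch.

```python
import numpy as np

def interscale_weight(pmf_s, pmf_prev, sigma_s, sigma_prev):
    """Weighting of Eq. (3): scaled sum of absolute differences between pmfs."""
    coeff = sigma_s ** 2 / (sigma_s ** 2 - sigma_prev ** 2)
    diff = np.abs(np.asarray(pmf_s, float) - np.asarray(pmf_prev, float))
    return coeff * diff.sum()

def scale_saliency(entropies, pmfs, sigmas):
    """Return (scale index, saliency) pairs where the entropy peaks over scale.

    Saliency is the product H * W; only interior scales can be peaks here.
    """
    results = []
    for s in range(1, len(sigmas) - 1):
        if entropies[s] > entropies[s - 1] and entropies[s] > entropies[s + 1]:
            w = interscale_weight(pmfs[s], pmfs[s - 1], sigmas[s], sigmas[s - 1])
            results.append((s, entropies[s] * w))
    return results
```

For example, with scales 1, 2, 3 and a two-bin pmf that is concentrated at the outer scales but uniform at the middle one, the only peak is at the middle scale, with coefficient 4 / (4 - 1) and a pmf difference of 1.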

Stage III: Salient regions. From the previous stage a great many salient points are returned by the detector (typically hundreds of thousands); far too many to be of use in any practical application. Hence, a clustering algorithm is required. In the original paper (Kadir and Brady, 2001) a rather complicated clustering algorithm, dependent upon two user-defined parameters, is proposed. However, code provided on the author’s webpage uses a greedy clustering algorithm: it iteratively takes the point with the highest saliency value and removes all other points within its scale, continuing in this fashion until no points are left. We have found the greedy clustering algorithm to be better in practice, as well as more general since it is parameter free.
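The greedy clustering described above can be sketched as follows. This is an illustrative reading of the description, not the code from the author's webpage; the function name and the array-based interface are assumptions.

```python
import numpy as np

def greedy_cluster(points, scales, saliencies):
    """Greedy, parameter-free clustering of salient points.

    Repeatedly keeps the most salient remaining point and discards every
    other remaining point lying within that point's scale.
    Returns the indices of the kept points, most salient first.
    """
    points = np.asarray(points, dtype=float)
    order = np.argsort(saliencies)[::-1]        # most salient first
    alive = np.ones(len(points), dtype=bool)
    kept = []
    for i in order:
        if not alive[i]:
            continue
        kept.append(int(i))
        dist = np.linalg.norm(points - points[i], axis=1)
        alive &= dist > scales[i]               # also retires point i itself
    return kept
```

With points at (0, 0), (1, 0) and (10, 0), all of scale 2 and saliencies 0.9, 0.5 and 0.7, the first point suppresses the second and the third survives on its own.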

A deficiency in the above approach is that it is not affine-invariant: histograms are computed in a circular region around a point, rather than the full range of potential elliptical regions. This was addressed in Kadir et al. (2004), where a full, time-consuming search over all ellipses in the image is implemented. Alternatively, in Shao and Brady (2006), the authors propose to first detect affine-covariant salient regions using the original KB saliency detector, then adapt these to make them affine-invariant.

Shao et al. (2007) provide a number of improvements to the algorithm that significantly increase its robustness. They do not change any fundamental aspect of the approach, instead computing desired quantities in a more accurate and principled manner. Specifically,

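i) The weighting $W(p, \sigma_s)$ is more accurately computed, reflecting the ratios of the number of pixels at each scale. Let there be $N_s$ pixels within $\sigma_s$ from $p$. Then the weighting is determined as:

$$W(p, \sigma_s) = \frac{N_s}{N_s - N_{s-1}} \sum_{i=1}^{K} \left| \hat{v}_{i,\sigma_s} - \hat{v}_{i,\sigma_{s-1}} \right| + \frac{N_{s+1}}{N_{s+1} - N_s} \sum_{i=1}^{K} \left| \hat{v}_{i,\sigma_{s+1}} - \hat{v}_{i,\sigma_s} \right| \tag{4}$$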

ii) The histogram is sampled differently so as to weight pixels towards the centre of the circle more than those towards the edge. A Gaussian weighting is initially suggested; instead, a computationally inexpensive alternative is proposed where a pixel is weighted twice as much if it is within $\sigma_{s-1}$ and three times as much if within $\sigma_{s-2}$.

iii) Partial volume estimation: some pixels are only partly within the circle. In this case, they contribute to the histogram in proportion to how much of the pixel is inside the circle.

iv) Parzen windowing: the histogram is convolved with a Gaussian to obtain a smoother pdf. Bilinear interpolation is suggested as a computationally inexpensive alternative.

The proposed modifications of Shao et al. (2007) result in some improvement to the performance of the KB detector, as evaluated on the dataset of Mikolajczyk et al. (2005). Hence, Shao et al. demonstrate the potential of the approach as a repeatable feature detector, but do not demonstrate its broad applicability. In the next section, we generalise the KB detector and show how it may be broadly applied across different modalities.
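Items (i) and (ii) of the list above can be sketched as below. The function names are hypothetical, and reading Eq. (4) as sums of absolute differences between pmfs at adjacent scales is an assumption of this sketch.

```python
import numpy as np

def ring_weight(dist, sigmas, s):
    """Item (ii): inexpensive centre-weighted sampling.

    A pixel within sigma_s counts once, twice if it is also within
    sigma_{s-1}, and three times if it is also within sigma_{s-2}.
    """
    if dist > sigmas[s]:
        return 0
    weight = 1
    if s >= 1 and dist <= sigmas[s - 1]:
        weight = 2
    if s >= 2 and dist <= sigmas[s - 2]:
        weight = 3
    return weight

def two_sided_weight(pmf_prev, pmf_s, pmf_next, n_prev, n_s, n_next):
    """Item (i), Eq. (4): inter-scale weight using pixel counts N_s."""
    pmf_prev, pmf_s, pmf_next = (np.asarray(v, float)
                                 for v in (pmf_prev, pmf_s, pmf_next))
    lower = n_s / (n_s - n_prev) * np.abs(pmf_s - pmf_prev).sum()
    upper = n_next / (n_next - n_s) * np.abs(pmf_next - pmf_s).sum()
    return lower + upper
```

Unlike Eq. (3), this weighting compares the pmf at scale $\sigma_s$ with both the scale below and the scale above, so it is only defined for interior scales.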

4. The generalised Kadir-Brady saliency detector

The original KB saliency detector was limited in its construction and as such was only applicable to images. In this section we propose a much more general formulation that allows it to be applicable in a multi-modal manner. Subsequently, we propose a derivative-based reformulation in the 2D domain, and a 3D formulation that naturally accounts for both the geometry and texture of the scene.

To generalise the KB saliency detector, we observe that much of its construction is based on a very general concept: points whose entropy is peaked across scale are regarded as salient. To illustrate how widely this concept may be applied, we shall formulate the KB saliency detector in a more general manner for points lying in a metric space.

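To this end, let $\mathcal{M}$ be a set and $d : \mathcal{M} \times \mathcal{M} \to \mathbb{R}^+$ be a metric, i.e. let $(\mathcal{M}, d)$ be a metric space. Define a ball of radius $\sigma$ centred at $p \in \mathcal{M}$ as

$$B_\sigma(p) := \{x \in \mathcal{M} : d(p, x) \leq \sigma\} \tag{5}$$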

representing the set of elements of $\mathcal{M}$ within $\sigma$ of $p$. Finally, assume a mapping $F$ may be constructed from each element of $\mathcal{M}$ to a $K$-dimensional positive vector, i.e. $F : \mathcal{M} \to \mathbb{R}_+^K$. Constructing $F$ as a specifically vector-valued function will allow for broader applicability where multiple attributes of the data are taken into account (e.g. geometry and texture).

From the above constructions the key components of the KB detector may be defined, allowing for generalised KB saliency detection in $(\mathcal{M}, d)$. The probability mass function for an element $p \in \mathcal{M}$ at scale $\sigma_s$ is determined by computing a weighted sum over mappings ($F$) from all points in the ball $B_{\sigma_s}(p)$ and normalising: explicitly, the pmf is $\{\hat{v}_{1,\sigma_s}, \dots, \hat{v}_{K,\sigma_s}\}$, where

$$\hat{v}_{i,\sigma_s} = \frac{\sum_{q \in B_{\sigma_s}(p)} w(q, p)\, F_i(q)}{\sum_{j=1}^{K} \sum_{q \in B_{\sigma_s}(p)} w(q, p)\, F_j(q)} \tag{6}$$

where the weighting $w(q, p)$ is constructed to favour points closer to $p$. A Gaussian weighting is originally suggested by Shao et al. (2007) but discarded due to considerations of computational efficiency. However, this consideration does not necessarily hold since the weightings may be precomputed, and relative gains in efficiency are always application dependent. In this paper, we use a Gaussian weighting since it leads to a more principled and robust approach:

$$w(q, p) = e^{-\frac{\|q - p\|}{\sigma_s^2}} \tag{7}$$

With the construction of the pmf (Eq. (6)), the entropy of a point $p \in \mathcal{M}$ at scale $\sigma_s$ is well defined, and is the same as Eq. (2):

$$H(p, \sigma_s) = -\sum_{i=1}^{K} \hat{v}_{i,\sigma_s} \log \hat{v}_{i,\sigma_s} \tag{8}$$

Subsequently the inter-scale saliency, $W(p, \sigma_s)$, is defined as in Eq. (4). Finally, the saliency of a point $p \in \mathcal{M}$ at scale $\sigma_s$ is defined as the product of $H(p, \sigma_s)$ and $W(p, \sigma_s)$. Salient points are subsequently clustered by iteratively taking the point with the highest saliency value ($p_H$) and removing all other points within $B_{\sigma_s}(p_H)$.
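The generalised pmf of Eq. (6) and the entropy of Eq. (8) can be sketched for a finite set of points as follows. The function names, the array layout (one row of `features` per point, holding $F(q)$), and the default Euclidean metric are assumptions of this sketch. The default weight follows Eq. (7) as it appears in this text; a squared-distance Gaussian can be passed in through `weight` instead.

```python
import numpy as np

def generalised_pmf(points, features, p_index, sigma_s, dist=None, weight=None):
    """Weighted, normalised sum of F(q) over the ball B_sigma_s(p), Eq. (6).

    points   : (N, d) array of elements of the metric space
    features : (N, K) array; row q holds the non-negative vector F(q)
    dist     : metric d(points, p) -> (N,) distances (default: Euclidean)
    weight   : w(distance, sigma_s) (default: Eq. (7) as printed)
    """
    points = np.asarray(points, dtype=float)
    features = np.asarray(features, dtype=float)
    if dist is None:
        dist = lambda a, b: np.linalg.norm(a - b, axis=-1)
    if weight is None:
        weight = lambda d, s: np.exp(-d / s ** 2)
    d = dist(points, points[p_index])
    ball = d <= sigma_s                          # Eq. (5)
    w = weight(d[ball], sigma_s)
    v = (w[:, None] * features[ball]).sum(axis=0)
    return v / v.sum()

def entropy(pmf):
    """Eq. (8), with the convention 0 log 0 = 0."""
    nonzero = pmf[pmf > 0]
    return float(-(nonzero * np.log(nonzero)).sum())
```

Because only the metric and the mapping $F$ are problem-specific, the same two functions apply whether the points are pixels or 3D samples.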

As an example, for the 2D KB saliency detector, the metric space is $(\mathbb{R}^2, L_2)$, representing the image plane under the Euclidean norm. A ball $B_{\sigma_s}(p)$ is simply a circle of radius $\sigma_s$ centred at $p$. The mapping $F$ takes the intensity of a pixel and maps it to the index of the histogram bin (i.e. if the intensity of pixel $p$ is $I(p)$ then $F(p) = (0, \dots, 0, 1, 0, \dots, 0)$, with a 1 in the $I(p)$th element of the vector). However, the more general construction where $F$ is a vector-valued function allows for pixels to contribute to multiple bins. This not only extends the KB saliency detector to other modalities but provides additional advantages, e.g. for bilinear interpolation, or where points have multiple attributes (such as where 3D points contain information regarding geometry and texture).
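The two kinds of mapping $F$ discussed above can be contrasted in a short sketch: a one-hot mapping that sends an intensity to a single bin, and a soft mapping that splits it between the two nearest bins. Both function names, the 16-bin default and the bin-centre convention are assumptions; in one dimension the "bilinear" interpolation reduces to linear interpolation.

```python
import numpy as np

def hard_bin(intensity, n_bins=16, max_val=256.0):
    """One-hot F: a single 1 in the bin containing the intensity."""
    f = np.zeros(n_bins)
    f[min(int(intensity / max_val * n_bins), n_bins - 1)] = 1.0
    return f

def soft_bin(intensity, n_bins=16, max_val=256.0):
    """Vector-valued F: mass shared linearly between the two nearest bin centres."""
    x = intensity / max_val * n_bins - 0.5       # position in bin-centre units
    x = min(max(x, 0.0), n_bins - 1.0)           # clamp to the outermost centres
    lo = int(np.floor(x))
    hi = min(lo + 1, n_bins - 1)
    frac = x - lo
    f = np.zeros(n_bins)
    f[lo] += 1.0 - frac
    f[hi] += frac
    return f
```

Either function returns a non-negative vector summing to one, so both can serve directly as rows of $F$ in the pmf of Eq. (6).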

Based on the above formulation, the generalised KB saliency detector may be applied to a range of multi-modal data. In the next two sections, we construct a derivative-based 2D KB saliency detector, as well as a 3D KB saliency detector that naturally operates on both the geometry and texture of the scene. In both cases, the approaches are elegantly incorporated within the generalised KB saliency framework by simply defining the metric space and constructing the mapping $F$.

5. Derivative-based 2D Kadir-Brady saliency detector

The original 2D KB saliency detector was constructed based on the distribution of pixel intensities in a neighbourhood of a point. Whilst this gives some indication of some of the more complex, salient parts of the image, it fails to detect the geometrically salient aspects. In particular, it rarely detects corners, for which the neighbouring complexity of pixel intensities varies little with scale. As a result, the original 2D KB saliency detector fails to detect repeatable features between 2D and 3D (see the results in Section 7.5), focusing more on the texture of the scene rather than the geometry.

In light of this limitation of the original KB saliency detector and based on the preceding generalisation, in this section we propose a derivative-based KB saliency detector. Specifically, the histogram mapping $F$ is modified to be a function of the derivative of the image at any given pixel. This allows for high-derivative points within a low-derivative neighbourhood (e.g. corners) to be deemed salient; an important outcome in low-textured scenes. However, it is more general than a typical corner detector, determining salient points wherever a change in image derivative with respect to scale occurs, and avoiding noisy or repetitive parts of the scene.

The derivative-based KB saliency detector is formulated as follows: the metric space is $(\mathbb{R}^2, L_2)$, the same as the original KB saliency detector. The mapping $F$ is a function of the derivative of the image (specifically, the second moment matrix). Denote the intensity of a pixel $p$ as $I(p)$ and its derivatives in the $x$ and $y$ directions as $I(p)_x$ and $I(p)_y$ respectively. For a fixed scale $\sigma$, construct the second moment matrix at $p$ as:

$$M(p) = \left( \sum_{q \in B_\sigma(p)} w(q, p) \right)^{-1} \times \sum_{q \in B_\sigma(p)} w(q, p) \begin{bmatrix} I(q)_x^2 & I(q)_x I(q)_y \\ I(q)_x I(q)_y & I(q)_y^2 \end{bmatrix} \tag{9}$$

Fig. 2. An example of four distributions of second moment matrix eigenvalues from the image. The distributions on the left have a relatively large entropy and change significantly with scale, and are likely to have a high saliency value. Conversely, the distributions on the right, while having a relatively large entropy, do not change significantly over scale, and are likely to have a lower saliency value.

Fig. 3. Example output of the proposed derivative-based KB saliency detector. Left: Input image. Middle: A heatmap indicating the magnitude of the eigenvalues of $M(p)$. The intensity of magenta represents the relative magnitude of the first eigenvalue, with blue representing the second eigenvalue. Right: Salient points detected based on a histogram of the eigenvalues. The size of the circle represents its scale.

where $w(q, p)$ is a Gaussian weighting function designed to favour points closer to $p$, e.g. $w(q, p) = e^{-\frac{\|q - p\|^2}{2\sigma^2}}$. In constructing the matrix, we cap the derivatives at 50 pixels to give a more perceptually meaningful approach that favours all large changes in image derivative to the same extent.

For constructing the derivative-based KB saliency detector, we are interested in the eigenvalues $\lambda_1$ and $\lambda_2$ of $M(p)$ that describe the derivative of the image. In qualitative terms, when $\lambda_1$ and $\lambda_2$ are both large, $p$ is a corner; when $\lambda_1 \gg \lambda_2$, $p$ is an edge; and otherwise $p$ has little change in derivative in any direction. To construct the histogram mapping $F$, the eigenvalues of $M(p)$ of all pixels on the image are normalised and discretised to lie in an $r_D \times r_D$ histogram. Subsequently, $F$ maps the eigenvalues of $M(p)$ to the bins of the $r_D \times r_D$ histogram (hence, the codomain of $F$ is $\mathbb{R}_+^{r_D^2}$). Bilinear interpolation is performed, meaning at most four elements of $F$ will be non-zero.
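A minimal sketch of this construction follows: the eigenvalues of the second moment matrix of Eq. (9) at one pixel, and a bilinear mapping of an eigenvalue pair onto an $r_D \times r_D$ histogram. Several details are assumptions of the sketch rather than the paper's method: derivatives come from `np.gradient`, the cap of 50 is read as a clip on derivative magnitude, and eigenvalues are normalised by dividing by a supplied maximum `lam_max`.

```python
import numpy as np

def second_moment_eigs(image, p, sigma, cap=50.0):
    """Eigenvalues (largest first) of M(p), Eq. (9), at pixel p = (row, col)."""
    img = np.asarray(image, dtype=float)
    Iy, Ix = np.gradient(img)                    # derivatives along rows, columns
    Ix, Iy = np.clip(Ix, -cap, cap), np.clip(Iy, -cap, cap)
    rows, cols = np.indices(img.shape)
    d2 = (rows - p[0]) ** 2 + (cols - p[1]) ** 2
    ball = d2 <= sigma ** 2
    w = np.exp(-d2[ball] / (2.0 * sigma ** 2))   # Gaussian weighting
    ix, iy = Ix[ball], Iy[ball]
    M = np.array([[np.sum(w * ix * ix), np.sum(w * ix * iy)],
                  [np.sum(w * ix * iy), np.sum(w * iy * iy)]]) / w.sum()
    return np.linalg.eigvalsh(M)[::-1]

def bilinear_bins(lam1, lam2, r_d, lam_max):
    """F: spread an eigenvalue pair over at most four bins of an r_d x r_d grid."""
    f = np.zeros((r_d, r_d))
    x = np.clip(lam1 / lam_max, 0.0, 1.0) * (r_d - 1)
    y = np.clip(lam2 / lam_max, 0.0, 1.0) * (r_d - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, r_d - 1), min(y0 + 1, r_d - 1)
    fx, fy = x - x0, y - y0
    f[x0, y0] += (1 - fx) * (1 - fy)
    f[x1, y0] += fx * (1 - fy)
    f[x0, y1] += (1 - fx) * fy
    f[x1, y1] += fx * fy
    return f.ravel()                             # vector of length r_d ** 2
```

On a straight step edge the second eigenvalue vanishes, whereas at the corner of a bright quadrant both eigenvalues are of comparable size, matching the qualitative description above.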

An example of histograms constructed using the proposed derivative-based 2D KB saliency detector is given in Fig. 2, and a heatmap of the relative magnitudes of the eigenvalues of $M(p)$ alongside the output of the proposed detector is given in Fig. 3. It can be seen that the approach detects salient points where the histogram of eigenvalues changes with respect to scale. This allows it to detect a range of derivative-based structures within the scene while naturally avoiding the repetitive areas.

6. The 3D Kadir-Brady saliency detector

For 3D KB saliency detection, we shall define the metric space and histogram construction from Section 4. Such a general formulation allows for a large range of potential implementations; of note is its applicability to both textureless and textured 3D data within the same framework. More concretely, we may use a histogram mapping $F$ that describes both the geometry and the texture of 3D data, rendering it equally applicable regardless of whether the 3D data is textured. In this section, we describe the histogram construction based purely on geometry (Section 6.1), on texture (Section 6.2), and on both (Section 6.3). An example of histograms constructed using each approach is shown in Fig. 5.

Regardless of histogram construction, the metric space used here is simply $(\mathbb{R}^3, L_2)$, i.e. consider all points to lie in 3D space under the Euclidean norm. If the 3D data were a mesh the geodesic distance may be used instead; however, this is slower to compute and not as widely applicable.


6.1. Geometry-based 3D KB saliency detector

Initially, we describe the approach taken based purely on the geometry of the 3D data. To do so, we project the local surface of the 3D data to an image and apply the same techniques as performed previously (construction of the second moment matrix); a similar approach has been taken for the construction of the 3D Harris corner detector (Sipiran and Bustos, 2010). The image is taken to be a tangent plane to the 3D data, and the ‘intensity’ value of the image represents the distance of the 3D data to the plane. We take a purely derivative-based approach in this subsection; an intensity-based geometric KB detector may not be constructed since the ‘intensity’ value of every point onto its own tangent plane is always zero.

Our derivative-based geometric KB detector is more formally constructed as follows: for a point $p \in \mathbb{R}^3$, first determine a least-squares tangent plane at $p$. Construct an orthonormal frame for the tangent plane as $\{t_1, t_2, n\}$, where $n$ is the normal to the plane. Then, for a fixed scale $\sigma$, consider the neighbouring set of points $\{q \in B_\sigma(p)\}$. Project each point onto the plane, yielding local $(u, v)$ coordinates $((q - p) \cdot t_1, (q - p) \cdot t_2)$, and define its 'intensity' value $I(q)$ as the directional distance from $q$ to the plane, computed as $(q - p) \cdot n$. The second moment matrix may thus be constructed in the same way as Section 5 as:

$$N(p) = \left( \sum_{q \in B_\sigma(p)} w(q, p) \right)^{-1} \times \sum_{q \in B_\sigma(p)} w(q, p) \begin{bmatrix} I(q)_u^2 & I(q)_u I(q)_v \\ I(q)_u I(q)_v & I(q)_v^2 \end{bmatrix} \tag{10}$$

where, similarly to Section 5, $w(q, p) = e^{-\|q - p\|^2 / (2\sigma^2)}$. The eigenvalues of $N(p)$ are subsequently used in the histogram mapping $F$, in the same manner as performed previously. Note that the orthonormal frame $\{t_1, t_2, n\}$ is not unique: there is ambiguity in the directions of $t_1$ and $t_2$. However, the eigenvalues of $N(p)$ are rotationally invariant and therefore this ambiguity will not affect the desired outcome. Hence, we have avoided the need to construct a unique and unambiguous orthonormal reference frame that often plagues 3D feature detectors (Guo et al., 2013; Petrelli and di Stefano, 2012).
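The projection step and the frame-invariance argument can be illustrated with a minimal sketch. It assumes the tangent-plane frame and the per-point gradients are already available; the function names (`project_to_tangent_plane`, `second_moment_matrix`, `eigenvalues_sym2`, `rotate_gradients`) are illustrative choices and are not taken from the authors' implementation.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def project_to_tangent_plane(p, q, t1, t2, n):
    """Return (u, v, intensity) of q in the frame {t1, t2, n} centred at p."""
    d = tuple(qi - pi for qi, pi in zip(q, p))
    return dot(d, t1), dot(d, t2), dot(d, n)

def second_moment_matrix(offsets, gradients, sigma):
    """Gaussian-weighted second-moment matrix N(p), following Eq. (10).

    offsets   -- 3D vectors q - p for each neighbour q of p
    gradients -- per-neighbour (I_u, I_v) estimates in the tangent plane
    Returns the entries (a, b, c) of the symmetric matrix [[a, b], [b, c]].
    """
    a = b = c = total = 0.0
    for d, (gu, gv) in zip(offsets, gradients):
        w = math.exp(-dot(d, d) / (2.0 * sigma ** 2))
        a += w * gu * gu
        b += w * gu * gv
        c += w * gv * gv
        total += w
    return a / total, b / total, c / total

def eigenvalues_sym2(a, b, c):
    """Eigenvalues of [[a, b], [b, c]], largest first."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean + radius, mean - radius

def rotate_gradients(gradients, theta):
    """Express gradients in a frame whose (t1, t2) is rotated by theta."""
    ct, st = math.cos(theta), math.sin(theta)
    return [(ct * gu + st * gv, -st * gu + ct * gv) for gu, gv in gradients]
```

Rotating the in-plane axes changes the individual entries of the matrix but not its eigenvalues, so `eigenvalues_sym2(*second_moment_matrix(offsets, rotate_gradients(grads, theta), sigma))` should agree with the unrotated result up to rounding for any `theta`.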

However, the derivatives $I(q)_u$ and $I(q)_v$ required in Eq. (10) may not be estimated as easily as for the 2D detector, where the intensity values of a pixel's immediate neighbours may be used to determine the derivative. Instead, we compute a Gaussian weighted average from a set of neighbouring points, similarly to Zaharescu et al. (2012). To compute the derivatives $I(q)_u$ and $I(q)_v$ from a non-uniformly sampled set of 2D points $\{r \in B_\sigma(q)\}$, each with intensity $I(r)$, first denote the derivative for the 2D point $q$ as $g := (I(q)_u, I(q)_v)$. Then note that, for a point $r$ lying sufficiently close to $q$, the following relationship holds by definition of the derivative:

$$g^T (q - r) \approx I(q) - I(r) \tag{11}$$

We may use Eq. (11) to determine $g$ by solving the weighted least-squares equation:

$$\operatorname*{argmin}_{g} \sum_{r \in B_\sigma(q)} w(r, q) \left( I(q) - I(r) - g^T (q - r) \right)^2 \tag{12}$$

where $w(r, q)$ is a Gaussian of small variance, e.g. $w(r, q) = e^{-\|r - q\|^2 / (2 (\sigma/2)^2)}$, so that the local derivative estimates of $I(q)$ are computed over a tighter region than that from which $N(p)$ is constructed.

Eq. (12) is solved by 'stacking' each weighted equality in (11) to form an over-determined system of the form $Ag = b$, from which the least-squares solution to (12) is given by $g = (A^T A)^{-1} A^T b$.
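As a hedged illustration of this step, the sketch below accumulates the 2×2 normal equations $(A^T A) g = A^T b$ directly rather than forming $A$ explicitly, which is equivalent for two unknowns. The function name `estimate_gradient`, the input layout, and the degeneracy threshold are assumptions made for the example.

```python
import math

def estimate_gradient(q, intensity_q, neighbours, sigma):
    """Weighted least-squares estimate of g = (I_u, I_v) at the 2D point q.

    neighbours -- iterable of ((u, v), intensity) pairs for points r near q
    sigma      -- scale; the Gaussian weight uses standard deviation sigma / 2
    """
    s = sigma / 2.0
    m00 = m01 = m11 = b0 = b1 = 0.0
    for (ru, rv), intensity_r in neighbours:
        du, dv = q[0] - ru, q[1] - rv      # q - r, as in Eq. (11)
        di = intensity_q - intensity_r      # I(q) - I(r)
        w = math.exp(-(du * du + dv * dv) / (2.0 * s * s))
        # Accumulate A^T A and A^T b, where each row of A is sqrt(w) * (q - r).
        m00 += w * du * du
        m01 += w * du * dv
        m11 += w * dv * dv
        b0 += w * du * di
        b1 += w * dv * di
    det = m00 * m11 - m01 * m01
    if abs(det) < 1e-12:  # illustrative threshold, depends on data scale
        raise ValueError("neighbours are degenerate (collinear or too few)")
    # Closed-form inverse of the symmetric 2x2 matrix A^T A.
    return ((m11 * b0 - m01 * b1) / det, (m00 * b1 - m01 * b0) / det)
```

For an intensity that is exactly linear in $(u, v)$, Eq. (11) holds with equality for every neighbour, so the estimate recovers the true gradient regardless of the weights; this makes a convenient sanity check.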

Subsequently, computing the gradient $(I(q)_u, I(q)_v)$ for every neighbouring point projected onto the tangent plane allows for the matrix $N(p)$ to be constructed and its eigenvalues to be computed. To construct the mapping $F$, the eigenvalues of $N(p)$ of all points in the data are normalised and discretised to lie in an $r_G \times r_G$ histogram, where bilinear interpolation is performed. An example of the proposed geometry-based KB saliency detector is shown in Fig. 4 alongside a heat map of the eigenvalues of $N(p)$. The approach detects a range of geometrically significant structures in a scale-invariant manner, while avoiding the more repetitive areas of the model.
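One way to realise the eigenvalue-to-histogram mapping is sketched below. The text does not specify the normalisation or the bin layout, so two assumptions are made here: eigenvalues are divided by a user-supplied `max_value`, and bin centres are placed uniformly on $[0, 1]$. The names `bilinear_vote` and `eigenvalue_histogram` are illustrative.

```python
def bilinear_vote(hist, x, y, weight=1.0):
    """Add `weight` to a square 2D histogram (side >= 2) at position (x, y).

    x and y are clamped to [0, 1]; the vote is shared between the four
    nearest bin centres in proportion to proximity.
    """
    r = len(hist)
    # Continuous bin coordinates, with bin centres at 0, 1, ..., r - 1.
    fx = min(max(x, 0.0), 1.0) * (r - 1)
    fy = min(max(y, 0.0), 1.0) * (r - 1)
    ix, iy = min(int(fx), r - 2), min(int(fy), r - 2)
    ax, ay = fx - ix, fy - iy
    hist[ix][iy] += weight * (1 - ax) * (1 - ay)
    hist[ix + 1][iy] += weight * ax * (1 - ay)
    hist[ix][iy + 1] += weight * (1 - ax) * ay
    hist[ix + 1][iy + 1] += weight * ax * ay

def eigenvalue_histogram(eigen_pairs, max_value, r=4):
    """Normalised r x r histogram of (lambda_1, lambda_2) pairs."""
    hist = [[0.0] * r for _ in range(r)]
    for l1, l2 in eigen_pairs:
        bilinear_vote(hist, l1 / max_value, l2 / max_value)
    total = sum(map(sum, hist))
    return [[h / total for h in row] for row in hist]
```

Soft voting of this kind avoids a point's contribution jumping between bins when an eigenvalue lies close to a bin boundary, which is presumably why interpolation is used here.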

6.2. Texture-based 3D KB saliency detector

We propose two texture-based 3D KB detectors: an intensity-based approach and a derivative-based approach, both of which will be evaluated in Section 7.5. For the intensity-based approach, the mapping $F$ is exactly the same as in the original 2D KB implementation: taking the intensity of a point to its histogram bin while applying bilinear interpolation. Where the 3D data is coloured, the greyscale value is computed via the equation $I = 0.299R + 0.587G + 0.114B$. The histogram is assumed to be of the same size ($K$) as the original intensity-based 2D KB implementation.
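A minimal sketch of the intensity mapping follows. The greyscale weights are those given in the text; the 8-bit intensity range (`max_intensity=255.0`), the uniform bin centres, and the function names are assumptions for illustration. For a one-dimensional histogram the interpolation reduces to sharing each vote between two adjacent bins.

```python
def greyscale(r, g, b):
    """Greyscale value from RGB using the weights quoted in the text."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def intensity_histogram(intensities, k=16, max_intensity=255.0):
    """Normalised k-bin histogram with linear interpolation between bins."""
    hist = [0.0] * k
    for value in intensities:
        # Continuous bin coordinate, with bin centres at 0, 1, ..., k - 1.
        f = min(max(value / max_intensity, 0.0), 1.0) * (k - 1)
        i = min(int(f), k - 2)
        a = f - i
        hist[i] += 1.0 - a
        hist[i + 1] += a
    total = sum(hist)
    return [h / total for h in hist]
```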

To obtain the mapping $F$ for the derivative texture-based 3D KB saliency detector, we adopt essentially the same approach as the geometry-based 3D KB saliency detector in the previous section. The local surface of the 3D data is projected onto a tangent plane, and the second-moment matrix (Eq. (10)) may be constructed again. However, rather than taking the intensity value $I(q)$ of a projected point to be the directed distance between $q$ and the tangent plane, the greyscale value of the point $q$ is used instead. The intensity differences $(I(r) - I(q))$ in Eq. (12) are capped to the range $[-50, 50]$, similarly to the 2D approach in Section 5, so as to give a more perceptually meaningful distance. The eigenvalues of $N(p)$ (where $I(q)$ represents the intensity of point $q$) are subsequently normalised to lie in an $r_D^2$ histogram.

6.3. Geometry and texture based 3D KB saliency detector

Our framework naturally allows for the extension to detect salient points based on both the geometry and texture. Given that the two histograms may be constructed based on the geometry or the texture, their joint histogram may be constructed. The intensity texture-based KB detector may be combined with the geometry-based KB detector to produce a $K r_G^2$ histogram. Alternatively, the derivative texture-based KB detector may be combined with the geometry-based KB detector, to produce an $r_D^2 r_G^2$ histogram. Bilinear interpolation is again performed in these histograms.

An example of histograms constructed based on the geometry, derivative-based texture, and both, is shown in Fig. 5. The histograms based on both are the joint histogram of the geometry and the derivative-based texture histograms. They are relatively large and, in general, sparse, exhibiting a very high entropy only when caused by both the geometry and texture. However, this approach is able to detect salient points based on either the geometry or texture, since in either case a relatively high entropy is observed at a particular scale.
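The entropy behaviour of the joint histogram can be checked with a small sketch. For brevity it uses hard bin assignments rather than the interpolated voting described in the text, and the bin labels are arbitrary integers; `joint_histogram` and `entropy` are illustrative names.

```python
import math
from collections import Counter

def joint_histogram(geometry_bins, texture_bins):
    """Normalised joint histogram over paired per-point bin labels."""
    counts = Counter(zip(geometry_bins, texture_bins))
    n = sum(counts.values())
    return {key: c / n for key, c in counts.items()}

def entropy(hist):
    """Shannon entropy in bits of a normalised histogram."""
    return -sum(p * math.log2(p) for p in hist.values() if p > 0)
```

If the geometry labels are spread uniformly over four bins while the texture label is constant, the joint entropy is 2 bits, the same as the geometry-only entropy. If both attributes vary independently over four bins each, it rises to 4 bits. This matches the observation that variation in either attribute alone still yields a relatively high entropy, while the highest values require variation in both.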

7. Experimental evaluation

In this section we evaluate the performance of our proposed generalised salient point detector against other approaches, with both 2D and 3D data. Qualitative and quantitative results are given, where the final aim is to detect highly repeatable, sparse features


Fig. 4. Example output of the proposed derivative-based KB saliency detector. Left: Input 3D data. Middle: A heat map indicating the magnitude of the eigenvalues of $N(p)$. The intensity of magenta represents the relative magnitude of the first eigenvalue, with blue representing the second eigenvalue. Right: Salient points detected based on a histogram of the eigenvalues. The size of the sphere represents its scale.

Fig. 5. An example of the derivative-based histogram distributions from 3D data when considering geometry, texture, and both. The point on the right has a large distribution of eigenvalues based on texture but not based on geometry, whereas the point on the left has a relatively larger distribution of eigenvalues based on geometry (as well as texture). In both cases, the resulting joint histogram (based on geometry and texture) is relatively sparse.

between 2D and 3D, that may be of use in the subsequent registration stage. For comparison against our approaches, there exist a large number of feature detectors in both 2D and 3D (Guo et al., 2014; Tuytelaars and Mikolajczyk, 2008); however, we focus specifically on comparing against feature detectors that may be meaningfully constructed in both 2D and 3D. We shall first introduce the detectors in each modality before describing how they are evaluated: firstly between 2D and 2D, and secondly between 2D and 3D.

In 2D, we consider five detectors. Firstly, the traditional Harris corner detector (Mikolajczyk and Schmid, 2004). However, it is observed that, for small numbers of features, Harris does not detect a suitable spread of features, with many corners detected in the same area (see Fig. 9). Therefore, we secondly evaluate the Good Features to Track algorithm (GFT) (Shi and Tomasi, 1994) to obtain a better, more representative set of corners. Thirdly, we evaluate against the state-of-the-art SIFT detector (Lowe, 2004). The final two detectors evaluated are the proposed derivative-based KB detector (Section 5), referred to as KBD, and the original intensity-based KB detector (Shao et al., 2007), referred to as KBI, so as to experimentally justify the construction of the proposed KBD detector formulated in Section 5.

In 3D, the detectors available for comparison depend on whether the texture of the data is used. For untextured 3D data, we consider four detectors: Harris (Sipiran and Bustos, 2010), SIFT, SURE¹ (Fiolka et al., 2012) and the proposed derivative-based geometric KB detector (Section 6.1), referred to as KB-G. In 3D, Harris is not scale-invariant and performs non-maxima suppression, and therefore typically detects a better spread of corners in 3D than its 2D counterpart; hence there is no need for a 3D Good Features to Track detector. For untextured 3D data, SIFT detects keypoints based upon the mean curvature, and will be referred to as SIFT-G. Both Harris and SIFT-G are implemented in Point Cloud Library.² Harris is extended to 3D (Filipe and Alexandre, 2014) by replacing image gradients by surface normals from which a 3D covariance matrix is constructed. The response value is then a function of the determinant and trace of the covariance matrix (similar to 2D). SIFT is extended to 3D (Hänsch et al., 2014) using either the curvature of a point or the intensity (if the 3D

¹ Code available from https://github.com/torstenfiolka/sure3d
² http://pointclouds.org/


point cloud is textured). A Difference-of-Gaussians (DoG) may be applied solely on this attribute of the point cloud (curvature or intensity), which does not change the position of the points. Local maxima and minima may then be found by comparing to a point's k-nearest neighbours; subsequently, points with low curvature are rejected as they are deemed unstable.

For textured 3D data, there are additional detectors that may be evaluated against. SIFT may detect features on textured data based on the intensity (referred to as SIFT-T). Alternatively, the KB approaches may be used to detect features based purely on the texture, with the intensity-based KB detector referred to as KBI-T and the derivative-based KB detector for textured 3D data referred to as KBD-T. Only the KB approaches allow for both the texture and geometry to be combined (Section 6.3), referred to as KBI-B and KBD-B.

From the above 2D feature detectors (Harris, GFT, SIFT, KBI, and KBD) we firstly evaluate their repeatability in a 2D-2D scenario (Section 7.4). Subsequently, alongside the 3D feature detectors (Harris, SIFT-G, KB-G, SURE, SIFT-T, KBI-T, KBD-T, KBI-B, and KBD-B) we evaluate their repeatability between 2D and 3D. For untextured 3D data, we use six 2D-3D point combinations: Harris-Harris, GFT-Harris, SIFT-SIFT-G, KBI-KB-G, KBD-SURE and KBD-KB-G. For textured data there are a further five 2D-3D combinations: SIFT-SIFT-T, KBI-KBI-T, KBD-KBD-T; and where both geometry and texture are considered by KB: KBI-KBI-B and KBD-KBD-B. Thus, where the 3D data is textured, a total of 11 2D-3D feature detector combinations will be evaluated, to compare the effects of considering the geometry, texture, or both, of the textured 3D data.

7.1. Implementation details

For the proposed KB detectors two parameters are user-defined: the number of bins for the mapping $F$ ($K$, $r_D$ and $r_G$), and the number and range of scales ($\sigma_s$). For the number of bins of KBI we take $K = 16$ in both 2D and 3D. For the proposed derivative-based approaches (KBD) we use $r_D = r_G = 4$; hence, both KBI-B and KBD-B have the same total number of bins of 256. The number of scales is 12 in all cases. For the range of scales in 2D we take $\sigma_1 = 3$ with $\sigma_s = 3 + \sigma_{s-1}$. This is similar to the parameters of Shao et al. (2007), whose experiments show that a gap of 3 pixels between scales performed the best. In 3D, the scale is defined in proportion to the size of the model. First, denote the length of the diagonal of the bounding box of the model as $L$. Then, for the synthetic data, $\sigma_1 = 0.004L$ whereas $\sigma_1 = 0.003L$ for real data (since features are relatively smaller for the more complex real data). Subsequent scales are defined by $\sigma_s = s\sigma_1$, the same as the mesh saliency approach by Lee et al. (2005). In determining the parameter $\sigma_1$ in both the 2D and 3D case, we run experiments to justify our choice of parameters (shown in the appendix). For the construction of matrices $M(p)$ and $N(p)$ in Eqs. (9) and (10), the size of the ball $B_\sigma(p)$ is taken to be $\sigma = 5$.
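The parameter arithmetic above can be written out as a short sketch. The function names are illustrative; the default values are the ones stated in the text.

```python
def scales_2d(sigma_1=3.0, step=3.0, count=12):
    """2D scales with sigma_s = step + sigma_{s-1}, starting at sigma_1."""
    return [sigma_1 + step * s for s in range(count)]

def scales_3d(bbox_diagonal, fraction, count=12):
    """3D scales sigma_s = s * sigma_1, with sigma_1 = fraction * L."""
    sigma_1 = fraction * bbox_diagonal
    return [s * sigma_1 for s in range(1, count + 1)]

def joint_bin_counts(k=16, r_d=4, r_g=4):
    """Bin totals of the KBI-B (K * r_G^2) and KBD-B (r_D^2 * r_G^2) histograms."""
    return k * r_g ** 2, r_d ** 2 * r_g ** 2
```

With the defaults, `scales_2d()` runs from 3 to 36 pixels in steps of 3, and `joint_bin_counts()` gives 256 bins for both joint histograms. For a model with a bounding-box diagonal of 67.2 m and the real-data fraction 0.003, `scales_3d(67.2, 0.003)` starts at roughly 0.2 m.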

For a fair comparison, the other approaches (SIFT, GFT, Harris, and SURE) are altered, where possible, to align with these user-defined parameters. For SIFT in 2D the parameters provided by Vedaldi and Fulkerson (2008) are used, and those by Mikolajczyk et al. (2005) for Harris; the parameter for GFT is defined such that no two corners are within 16 pixels of each other. In 3D, the fixed scale of Harris is set to $\sigma_1$, and for SIFT-G, SIFT-T, and SURE, 12 scales are used, with the smallest set to $\sigma_1$.

7.2. Datasets

Three datasets are used: a 2D-2D dataset from Mikolajczyk et al. (2005) (shown in Fig. 6); a synthetic 2D-3D dataset (shown in Fig. 7); and a real 2D-3D dataset (shown in Fig. 8).

The 2D dataset is taken from Mikolajczyk et al. (2005). It is a set of six groups of six images, with the known homography between each image in a group provided. Each group of images has undergone a certain transformation (blurring, scale, JPEG compression, lighting, and viewpoint (twice)), from small to large transformations. The first and last images in each group are shown in Fig. 6.

For synthetic data, we use six untextured 3D models. The first four models in Fig. 7 are from the Stanford 3D Scanning Repository.³ For each of these four models, 50 images were rendered with POV-Ray using a random rotation matrix (Arvo, 1992) and a translation such that the model is centred in the image, with a point light source at the same location as the camera. The latter two models are the 3D reconstructions provided by Guillemaut and Hilton (2011) of the dinosaur and temple from Middlebury's multi-view reconstruction dataset (Seitz et al., 2006). In this case, 50 images with their known projection matrix from the model are provided as part of the dataset, so there is no need for rendering using POV-Ray.

For real data (Fig. 8), we use five textured 3D models, obtained by a colour LiDAR scanner. All have been obtained from Kim (2014) with the exception of room, which is from Klaudiny et al. (2014). The number of points and the dimensions of the 3D models are tabulated below (Table 1):

For each model, a set of between 7 and 11 images has been taken of the scene and manually aligned. This has been achieved by picking pairs of image and scene points, and using the approach by Penate-Sanchez et al. (2013) to determine the pose and focal length of the camera. An example image of each model is shown at the bottom of Fig. 8. Note that for certain models this does not encapsulate much of the scene (e.g. courtyard), making 2D-3D point detection more difficult.

7.3. Evaluation measure

The performance of a point detector (either in 2D-2D, or in 2D-3D) is measured by its relative repeatability. To define this, we shall first define the repeatability between two sets of points (2D-2D or 2D-3D) as follows: first apply the known transformation (homography, or projection matrix) to one set of points, discarding any that do not lie within the image boundary of the other set of points. For 2D-3D evaluation, occlusions may be handled in the case of the synthetic 2D-3D dataset, since the 3D mesh is known and hence occluded points may be discarded; however, real data is often in the form of a point cloud, for which this is not possible. From one set of 2D points $\{p_i \in \mathbb{R}^2\}_{i=1}^{N}$ and the other set of transformed points $\{q_i \in \mathbb{R}^2\}_{i=1}^{M}$ (transformed under a homography, or a projection matrix), and given an inlier threshold $t$, define an inlier as a point pair $(p, q)$ for which i) the nearest neighbour to $p$ from the set $\{q_i\}_{i=1}^{M}$ is $q$ and vice-versa; and ii) $\|p - q\| < t$. The repeatability is subsequently defined as the number of inliers divided by $\min(N, M)$.

It has been observed in the literature (e.g. Hauagge and Snavely, 2012; Tombari et al., 2013b) that the repeatability measure is biased towards detectors that produce a lot of features, and a measure that is invariant to the number of points detected has been proposed. Therefore, we compute the relative repeatability: for each set of points, order them by decreasing response value. Then, the repeatability may be determined from the top-k points, and a graph may be plotted of repeatability against the k most responsive features in each set. Furthermore, this is a more useful measure for the purposes of sparse 2D-3D registration, where large numbers of features will not be of use due to the computational complexity of such a registration problem.
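The two measures defined in this subsection can be sketched as follows. The sketch assumes both point sets are already expressed in the same image coordinates (i.e. the homography or projection and the boundary and occlusion filtering have been applied beforehand), and it uses a brute-force nearest-neighbour search for clarity. The function names and the `ratio` argument, which allows more points to be kept from the second set, are illustrative.

```python
import math

def nearest(point, candidates):
    """Index of the candidate closest to `point` (Euclidean distance)."""
    return min(range(len(candidates)),
               key=lambda j: math.dist(point, candidates[j]))

def repeatability(points_a, points_b, threshold):
    """Mutual-nearest-neighbour inliers within `threshold`, over min(N, M)."""
    if not points_a or not points_b:
        return 0.0
    inliers = 0
    for i, p in enumerate(points_a):
        j = nearest(p, points_b)
        is_mutual = nearest(points_b[j], points_a) == i
        if is_mutual and math.dist(p, points_b[j]) < threshold:
            inliers += 1
    return inliers / min(len(points_a), len(points_b))

def top_k(points, responses, k):
    """The k points with the largest detector response."""
    order = sorted(range(len(points)), key=lambda i: responses[i], reverse=True)
    return [points[i] for i in order[:k]]

def relative_repeatability(points_a, resp_a, points_b, resp_b, ks,
                           threshold, ratio=1):
    """Repeatability of the top-k of set A against the top-(ratio * k) of set B."""
    return [repeatability(top_k(points_a, resp_a, k),
                          top_k(points_b, resp_b, ratio * k),
                          threshold)
            for k in ks]
```

Plotting the returned list against `ks` gives a relative repeatability curve. The brute-force search is quadratic in the number of points, which is acceptable for the few hundred features considered here; a spatial index would be the natural replacement for larger sets.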


Fig. 6. Examples in the 2D-2D dataset from six groups of image transformations. For each group, there are six images in the dataset ranging from small to large transformations, with the first and last images in each group shown here.

Fig. 7. Top: The 3D models used in the synthetic 2D-3D dataset. Bottom: An example image from each synthetic model used in the dataset. From left to right: armadillo, buddha, bunny, dragon, dino, temple.

Fig. 8. Top: The 3D models used in the real 2D-3D dataset. Bottom: An example image from each model used in the real dataset. From left to right: cathedral, courtyard, reception, room, studio.

Table 1
3D models information.

                             cathedral   courtyard   reception   room      studio
Number of vertices           522,018     672,342     772,536     524,873   348,592
Bounding box diameter (m)    67.2        27.9        17.6        5.34      7.80


Fig. 9. Qualitative 2D results. The top-150 features are shown in each case. The top two images are from the synthetic 2D-3D dataset, third to fifth from the 2D-2D dataset (Mikolajczyk et al., 2005), with the bottom four from the real 2D-3D dataset. Many images were cropped from their original dataset for ease of presentation in this figure.


Fig. 10. Quantitative 2D-2D results across a range of image transformations. The relative repeatability is measured for the top-100 point features in each case. An inlier threshold of 3 pixels is used. Example images from this dataset are shown in Fig. 9. [Plot panels: increasing blur; scale changes; JPEG compression (%); decreasing light (leuven); viewpoint angle (graffiti); viewpoint angle (wall). Vertical axis: repeatability (%). Detectors: Harris, SIFT, GFT, KBI, KBD.]

7.4. 2D point detection

Qualitative results for the set of five 2D point detectors are shown in Fig. 9, for a selection of images across the three datasets used. It is immediately noticeable, by the size and shape of the features, that Harris is affine- and scale-invariant; SIFT, KBI and KBD are scale-invariant; and GFT is neither, being a very parameter-dependent approach. SIFT, and in particular Harris, evidently have a tendency to detect the same feature at multiple scales and very similar locations: this motivated the use of GFT to obtain a better spread of features (Section 7). KBI and KBD naturally detect a better spread of points than Harris and SIFT, while retaining a parameter-free approach to scale selection.

As a qualitative comparison between the KB approaches, KBD detects more corners than KBI (e.g. on the cathedral) while still detecting blob-like structures (e.g. windows in the third from top image) due to the necessary change in derivative present in such features. In contrast, KBI does not detect as wide a range of point feature types as KBD and often detects many edges (e.g. the cathedral). While edges may be regarded as salient, a point on an edge is poorly localised along the edge and is not useful for registration purposes.

Quantitative results for 2D point feature detectors are given in Fig. 10 for the 2D-2D dataset (Fig. 6). The top-100 features are detected in each image, and an inlier threshold of 3 pixels is used. It is observed that no feature detector performs the best across all transformations. Harris performs particularly well for scale and JPEG compression changes, but very poorly across a change in viewpoint. GFT generally performs very well across the range of transformations. Importantly, KBD outperforms KBI across a number of transformations, justifying our proposed reformulation of the 2D KB detector.

7.5. 3D point detection

7.5.1. Qualitative results

Qualitative results for the 3D feature detectors are shown in Fig. 11 for synthetic data and Fig. 12 for real data.

For the untextured synthetic data, Harris, SIFT-G, KB-G, and SURE may be used. In Fig. 11, the scale-covariant Harris detector successfully detects a number of small-scale corners but often in repetitive places (e.g. the leg of the armadillo). KB-G is more robust than SIFT-G, detecting a wider range of points, e.g. on the armadillo and dino. By contrast, SIFT-G has a tendency to detect smaller, less meaningful features, e.g. on the bunny. SURE typically detects corner-like structures where there is a wide distribution of normals; however, it often detects large features and misses smaller corners, e.g. on the dragon. As a comparison between features detected in 3D and the qualitative 2D results (Fig. 9), 3D Harris correlates quite well with 2D GFT; however, it is clear the scale-covariance of GFT is an issue on the dragon. SIFT and SIFT-G often do not detect geometrically meaningful entities, with some 2D SIFT features detected off the model. KBI and KBD have some qualitative correlation with KB-G, but KBI often detects edges and avoids corner-like structures (particularly so on the dino).


Fig. 11. Qualitative 3D results for all models from the synthetic dataset. The top-200 points are shown in each case. The size of the sphere indicates the scale of the point.

Qualitative results for real data are given in Fig. 12, where points are detected based on geometry (Harris, SIFT-G, KB-G), texture (SIFT-T, KBI-T and KBD-T), or both (KBI-B and KBD-B). Similar conclusions may be drawn from the geometry-based approaches as for the synthetic results (Fig. 11): Harris is limited by its scale-covariance, KB-G is generally more robust than SIFT-G, and SURE typically detects larger features and misses the finer detail. For texture-based detectors, few qualitative distinctions can be made between SIFT-T and KBD-T; however, KBD-T detects more textural corner-like structures than SIFT-T (the same as in 2D in Fig. 9). Similarly to the 2D results, KBI-T detects more edge-like structures, particularly on the pavement on the cathedral. Interestingly, texture-based feature detectors often detect geometrically significant features (e.g. corners on the cathedral, and the table leg in the room) due to a natural change in colour on the model surface, or the lighting conditions. Finally, it is clear that both KBI-B


Fig. 12. Qualitative 3D results for cathedral and room from the real dataset. The top-400 points are shown in each case. The size of the sphere indicates the scale of the point.


Fig. 13. Results on the untextured synthetic dataset. Each graph shows the relative repeatability of the detectors for each dataset, for k = 20, 40, 60, 80, 100. The graphs are ordered such that a graph of inlier threshold 3 pixels is shown above that of inlier threshold 6 pixels.


Fig. 14. Qualitative 3D results for varying quantities of features on the armadillo model. The left shows results for GFT and Harris, with KBD and KB-G on the right.

and KBD-B detect points based on both the geometry (corners of the cathedral) and texture (carpet and picture in room).

7.5.2. Quantitative results

Quantitative results for the synthetic dataset are presented first. For each model-image pair, the relative repeatability is computed using the top-k 2D points and the top-2k 3D points (since it is expected half the 3D points will be occluded by the model), for k varying between 20 and 100. It is computed for inlier thresholds (t) of 3 and 6 pixels and averaged across all images of the model. Results are given in Fig. 13, where, given the 3D data is untextured, a comparison is made between Harris-Harris, SIFT-SIFT-G, GFT-Harris, KBD-SURE, KBI-KB-G, and KBD-KB-G.

It is observed that, in general, GFT-Harris and KBD-KB-G perform the best, between them having the highest repeatability across all six models. Both have repeatabilities of at least 30% for (relatively) large numbers of points, which is sufficiently high for subsequent 2D-3D registration. KBI-KB-G performs quite well, but never as well as KBD-KB-G. This is perhaps surprising in comparison to the results of KBI on the 2D-2D evaluation (Fig. 10): the derivative-based KB formulation is evidently more indicative of geometry rather than texture based on these results. Harris-Harris, SIFT-SIFT-G, KBD-SURE, and KBI-KB-G perform similarly poorly, rarely obtaining a repeatability of above 20%. Comparing between 3 pixels and 6 pixels as the inlier threshold, GFT-Harris performs slightly better than KBD-KB-G for the smaller threshold, while the reverse is true of the larger threshold. However, the increase in inlier threshold from 3 to 6 typically results in a repeatability increase by a factor of around 2, regardless of detector or dataset.

Fig. 13 shows that, in general, the repeatability increases with respect to the number of points detected. However, this is not the case with GFT-Harris which, in some circumstances, shows a decrease in repeatability for increasing numbers of points, particularly so on the armadillo, and to a lesser extent on the dino and dragon. Fig. 14 shows qualitative results on the armadillo for GFT-Harris and KBD-KB-G for smaller quantities of points. For very small quantities of points (20 in 2D and 40 in 3D) GFT-Harris has a high correlation due to the relatively small number of well-defined corners on the model (toes, fingers, and ears) and hence the relative ease with which they are detected by a corner detector. For a higher quantity of features (60 in 2D and 120 in 3D) there are insufficient corners in the scene and so it becomes unclear why certain features should be detected by the corner detectors. By contrast, our saliency-based approach is more broadly defined than a corner detector, allowing KBD and KB-G to admit a wider range of features. As a result, it is relatively unlikely our approach will have a higher repeatability for small numbers of features (since salient points are not as narrowly defined as corner points) but conversely the definition of saliency extends to larger numbers of features.

Next, quantitative results for the real dataset are presented. For each model-image pair, the relative repeatability is computed using the top-k 2D points and the top-2k 3D points, with the exception of the larger courtyard and reception datasets where the top-4k 3D points are used, since here it is expected the majority of the 3D points will not be projected onto the image. k is varied from 20 to 200. Similarly to the synthetic dataset, the relative repeatability is computed for inlier thresholds of 3 and 6 pixels.

Results are presented in Fig. 15, where a comparison is made between all 11 approaches (as described at the beginning of Section 7). Between the different models, the best results are obtained on reception and room, with repeatability rates of over 30% in some cases. However, the other three models only obtain repeatability rates of between 15% and 25%. Between the different point detectors, KBD-KBD-T and KBD-KBD-B generally perform the best across all models. GFT-Harris performs nearly as well except on the more textured models room and studio. KBI-KBI-T more often outperforms KBI-KBI-B, further demonstrating that KBI does not detect geometrically significant features in 2D. Similarly to the synthetic dataset, SIFT-SIFT-G, Harris-Harris, and KBD-SURE do not perform well in general.

As a comparison between the methods proposed here (KBD-KB-G, KBD-KBD-T, and KBD-KBD-B), KBD-KB-G generally does not perform as well except on the cathedral model. It is perhaps surprising that KBD-KBD-T consistently performs well, particularly on courtyard and reception where there is little discriminating texture; however, as observed in the qualitative results, geometric features are often accompanied by a change in texture. Furthermore, the scale selection process within the KB detector allows it to naturally avoid repetitive parts of a scene. KBD-KBD-B consistently performs well regardless of the scene, outperforming the other approaches on the cathedral and studio.

8. Conclusions and future work

In this paper we have presented a general approach to 2D-3D salient point feature detection, based on the information-

Fig. 15. Results on the real dataset. The left shows the relative repeatability of the detectors for an inlier threshold of 3 pixels; on the right an inlier threshold of 6 pixels is used. k varies between 20 and 200. The graphs are ordered such that a graph of inlier threshold 3 pixels is shown above that of inlier threshold 6 pixels.


tect features based on both attributes simultaneously, increasing its robustness and widening its applicability.

There is scope for improvement in our method; in particular, the qualitative results show that our approach occasionally detects edges as salient. While there may be some salient properties regarding the edges, a point on an edge is not well localised along the edge and may not be as useful for geometry estimation. This could be addressed in a similar manner to Tombari and di Stefano (2014), where histograms are compared between neighbouring points, rather than between neighbouring scales. Alternatively, one may construct the histogram from attributes other than the first derivatives of the image. However, while the second derivatives of the image have had considerable success in feature detection via SIFT (Lowe, 2004), the blob-like features they detect are generally more indicative of texture rather than geometry.

Future work will include the registration of points between images and 3D LiDAR data. In many cases, correspondences between features cannot be automatically determined, and need to be established alongside registration parameters. It is a computationally expensive problem (Moreno-Noguer et al., 2008), so any method that has a high repeatability for a smaller number of points will be more suited to this kind of problem. We furthermore plan to

Klaudiny et al. (2014); and for the Cathedral, Courtyard, Reception, and Studio datasets at (Kim, 2014).

Acknowledgements

This work was supported by the Engineering and Physical Sciences Research Council (grant number EP/K503186/1) and the European Commission FP7 IMPART project (grant number 316564).

Appendix A. Effect of scale parameter setting on repeatability

Here we present repeatability results when varying the choice of $\sigma_1$ in both 2D and 3D. The results in 2D are shown in Fig. 16, comparing results for KBO and KBD on the 2D-3D synthetic dataset. The results for KBO show some variability depending on the choice of $\sigma_1$, with better results observed on the buddha and the dragon with $\sigma_1 = 4$, but this choice of parameter gave worse results on the dino. The choice of $\sigma_1$ makes very little difference on KBD, however.

The results for varying $\sigma_1$ in 3D are given in Fig. 17. The choice of $\sigma_1$ affects the different approaches in a very similar way, with $\sigma_1 = 0.3\%$ of the diameter of the bounding box giving the poorest results and $\sigma_1 = 0.5\%$ giving slightly stronger results than $\sigma_1 = 0.4\%$.

Fig. 16. 2D-3D repeatability results where $\sigma_1$ is varied between 2, 3, and 4 pixels in 2D. Only KBO and KBD are shown here because the other 2D feature detectors use a