ContentslistsavailableatScienceDirect
Computer
Vision
and
Image
Understanding
journalhomepage:www.elsevier.com/locate/cviu
A
generalised
framework
for
saliency-based
point
feature
detection
Mark
Brown
a,∗,
David
Windridge
a,b,
Jean-Yves
Guillemaut
aaCentreforVision,SpeechandSignalProcessing,UniversityofSurrey,Guildford,SurreyGU27XH,UK bSchoolofScienceandTechnology,MiddlesexUniversity,London,NW44BT,UK
a
r
t
i
c
l
e
i
n
f
o
Articlehistory: Received30November2015 Revised26August2016 Accepted13September2016 Availableonlinexxx Keywords: Pointdetection Featuredetection Featurematching 2D-3Dregistration Saliencya
b
s
t
r
a
c
t
Herewepresentanovel,histogram-basedsalientpointfeaturedetectorthatmaynaturallybeappliedto bothimagesand 3Ddata.Existingpointfeaturedetectorsareoftenmodalityspecific,with2Dand3D featuredetectorstypicallyconstructedinseparateways.Assuch,theirapplicabilityina2D-3Dcontextis verylimited,particularlywherethe3DdataisobtainedbyaLiDARscanner.Bycontrast,our histogram-basedapproachishighlygeneralisableand assuch, maybemeaningfullyappliedbetween2Dand3D data.Usingthegeneralisedapproach,weproposesalientpointdetectorsforimages,andbothuntextured andtextured3Ddata.Theapproachnaturallyallowsforthedetectionofsalient3Dpointsbasedjointly onboththegeometry andtextureofthescene,allowingforbroaderapplicability.Therepeatabilityof thefeaturedetectorsisevaluatedusingarangeofdatasetsincludingimageandLiDARinputfromindoor andoutdoorscenes.Experimentalresultsdemonstrateasignificantimprovementintermsof2D-2Dand 2D-3Drepeatabilitycomparedtoexistingmulti-modalfeaturedetectors.
© 2016TheAuthors.PublishedbyElsevierInc. ThisisanopenaccessarticleundertheCCBYlicense(http://creativecommons.org/licenses/by/4.0/).
1. Introduction
LightDetection AndRanging(LiDAR)scanners havebeenused to obtain 3D datafordecades, butit isonly inrecentyears that they have seen more widespread applicability due to the high computational capacityrequiredto copewithsuch largedatasets. However, the integration of LiDAR scans with data from other modalities(e.g.images)remainsadifficultproblem,withmany ap-proachesrelyingonlinefeaturesfortheirregistration(Liuand Sta-mos,2012;WangandNeumann,2009),whichmaynot alwaysbe available. This causes significant bottlenecks in practical applica-tions such asdigitalfilm production,whereLiDARscans and im-agesare capturedon-settoobtaindataaboutthe scene,but sub-sequentlyneedto bemanually registeredduringpost-production. The problem is further exacerbated by the high resolution and largescale ofthedata,requiringscalablemethodsforregistration thatarerobusttothediverse,multi-modalaspectofthedata.
Toaddress this,herewe proposea point featuredetectorthat maybe naturallyandmeaningfullyapplied betweenboth 2D and 3D data. Feature detectionis a typical first stage in many regis-tration pipelines(Lietal., 2010;LiuandStamos,2012;Wu etal., 2008b), whereby considering only a small subset of
discrimina-∗ Correspondingauthor.
E-mailaddresses: [email protected] (M.Brown),[email protected] (D.Windridge),[email protected](J.-Y.Guillemaut).
tive features in each dataset the registration parameters maybe obtainedinarelatively straightforwardmanner. However, obtain-ing suitablyrepeatable features betweenboth 2D and3D data is aparticularly challengingproblemduetothe largeheterogeneity betweenthetwomodalities.
Instead,existing point feature detectionmethods are typically centred around images. Recent advances in 3D data acquisition (e.g.MicrosoftKinect) has resultedin asignificant interest in3D feature detection (Guo et al., 2014; Tombari et al., 2013b). How-ever,itisclearthatthemajorityof2Dand3Dfeaturedetectorsare constructed invery separate ways.The morepopular 2D feature detectors are based on the derivative of the image, and provide a principled approach to scale selection using scale-spacetheory (Lowe, 2004; Mikolajczyk and Schmid, 2004). Yet, very few may beextended tooperateon3D data,withmany3D feature detec-torsbasedon surfacecurvature(Tombari etal., 2013b), andsince thetraditionalscale-spaceapproachtypicallycannotbeappliedto 3D data without altering the geometry. The differences between 2Dand3Dfeature detectorsare furtherexacerbatedbytherange ofexisting3Ddatatypes(pointcloud,volumetric,mesh,textured/ untextured),leadingtodifferent3Dfeaturedetectorsforeachcase (Guoetal.,2014;Tombarietal.,2013b;Yuetal.,2013).
As such, it is very difficult to use existing point feature de-tectorsjointlyacross 2D and 3D dueto the incomparable nature oftheir constructions, andthe limitedscope to which 3D detec-torsmaybeapplied.Applicationssuchasregistration,thatwould
http://dx.doi.org/10.1016/j.cviu.2016.09.008
1077-3142/© 2016 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
entropy of its histogram) at a particular scale. This histogram-basedapproachallowsittobeformulatedacrossdifferent modal-ities in a more meaningful manner than other feature detec-tors due to the vast array of ways in which histograms may be constructed.
Based upon theKBsaliencydetector,andinspiredby the suc-cessofthe 2D Harris corner detector(Aanæs etal., 2012; Harris andStephens,1988) we propose a novel extension to the2D KB saliencydetector.Whereas theoriginal KB saliencydetector con-structsahistogramofpixelintensitiesinacircularregion,we pro-posea derivative-based approach whereby the histogram is con-structed based on the distribution of eigenvalues of the second momentmatrix.Thisallows ourapproachtodetectsalientpoints withrespecttothederivativeoftheimage,whereitmayoperate inamoregeneralmannerthanatypicalcornerdetectorandavoid repetitivepartsofthescene.
Byusingthegeneralisablehistogram-basedapproachoftheKB saliencydetector,the above approach maybe naturallyextended to 3D data by constructing a histogram based on the 3D sec-ondmoment matrix(SipiranandBustos, 2010). Furthermore,the histogram-basedapproachallowsforthedetectionofsalientpoints basedonboththegeometryandtextureofthesceneby construct-ing a 2D histogram based on the texture ofthe 3D surface, and combiningthe2D and3D histograms.Thisallows ittooperatein ameaningfulmannerregardlessofwhetherornotthe3D datais textured,andisable tocombinethebestofboth setsoffeatures fortextureddata.
Thecontributionsofthispaperarethree-fold.Firstly,a general-isationtotheKBsaliencydetectorisformulated,demonstratingits broadapplicabilitytooperatewhereverhistogramsmaybe mean-ingfullyconstructedwithinametricspace.Secondly,inlightofthis generalisation,we propose a2D derivative-based KB saliency de-tectorbasedonthesecondmomentmatrix.Thirdly,the derivative-basedKB saliencydetector isnaturally extended to 3D, whereit may operate on both textured and untextured 3D data. It is, to thebestofourknowledge,thefirst3Dfeaturedetectortooperate basedon boththe geometryandtextureof thescene simultane-ously.Theproposeddetectorsareevaluatedina2D-2Dand2D-3D mannerwhereitisshowntobemorerepeatablethanexisting de-tectors(Harris2Dand3D (HarrisandStephens,1988;Sipiranand Bustos,2010), andSIFT2D and3D (Lowe,2004; Zaharescuetal., 2012)).
This paper is structured asfollows. In Section 2 we describe related work in point feature detection between 2D and 3D. In
Section3adescriptionoftheKBsaliencydetectorisgiven,along with proposed extensions and modifications (Kadir et al., 2004; Shaoand Brady, 2006; Shao et al., 2007). In Section 4 we pro-posea generalisationof KB saliency.The generalisation is subse-quentlyimplementedfora2D derivative-basedKBsaliency detec-tor(Section5),anda3DKBsaliencydetector(Section6)thatmay operate on textured or untextured 3D data. In Section 7 results willbegiven,involvingqualitativeandquantitativeresultsinboth 2Dand3D; finally,conclusionsandfuturework arepresented in
Section8.
categorised as derivative-based. The early Harris corner detector (HarrisandStephens,1988)isaprimeexample,basedonthe sec-ondmoment matrixM(made upofthepartial derivativesofthe imageinaneighbourhoodofthepoint).Whenbotheigenvaluesof
M are large, it impliesa corner is present;a ‘cornermeasure’ is constructed accordingly.Alternatively,the Hessianmatrixmay be used(Beaudet,1978) asthebasis fora featuredetector.Itdetects ‘blob’structures, whereapoint isofrelativelyhighorlow inten-sitycomparedtoitsimmediatesurroundings.Theeigenvectorsand eigenvaluesdescribe thesizeandshape ofthe blob,withthe de-terminantoftheHessiantypicallyusedasaresponsevalue.
In the case of both the Harris and Hessian detectors, they may be made affine-invariant by constructing the matricesfrom image derivatives over an elliptical regions (Mikolajczyk and Schmid,2004).Furthermore,theymaybemadescale-invariant by constructingthe matricesoverellipses ofvaryingsize while con-volving witha Gaussian kernel (Mikolajczyk andSchmid, 2004). It is observed that detecting keypoints based on the magnitude ofthescale-normalisedLaplacianofGaussians(LoG)producesthe highest percentageof correct scales. This has led to the popular SIFT detector(Lowe,2004) that detects keypointsby the magni-tude of the Differenceof Gaussians(DoG). DoGis approximately equaltothescale-normalisedLoGbytheheatequation,hencethis approach allows forLoG estimation withoutthe need for deriva-tives to be computed. However, the DoG response is large for edge-likestructures,soSIFTsubsequentlycullsedgeresponses us-ing the ratioofeigenvalues of theHessian. Thetraditional Gaus-sian scale-space approach has its limitations since it blurs both noise and fine detail (e.g. edges); this has been addressed by
Alcantarillaetal.(2012) whouseanon-linearscale-spacethat re-spectsthenaturalboundariesoftheimage.
Asecondary categoryofpoint featuredetectors arethose that areintensity-based.Thesedetectorstypicallyoperateovera neigh-bouring set of pixels, but disregard the derivative of the im-age. As such, they are often more robust to noise (particularly salt-and-peppernoise)than derivative-basedfeaturedetectors.An early intensity-based approach isthe SUSANdetector (Smithand Brady, 1997); itdefines aUnivalue Segment Assimilating Nucleus (USAN)asasetofneighbouringpixelsthathaveasimilarintensity valuetoacentrepixel.Cornersaresubsequentlydefinedwherethe numberofpixels intheUSAN issmall.Regiondetectorstypically fallintotheintensity-basedapproachescategory;forexample,the MSERdetector(Matasetal.,2002)detectsregionswherepixel in-tensitiesinsidetheregionareeitherhigherorlowerthanthoseon itsboundary.
Asubsetofintensity-basedapproachesarethehistogram-based
featuredetectorsthatdetectfeaturepointsviahistogram construc-tion.TheKadir-Bradysaliencydetector(KadirandBrady,2001)is an example of this; it constructsa histogramof pixel intensities ina neighbourhood of apoint, salientpointsare detected where thedistributionofpixelintensitieshasahighentropyata partic-ular scale. It will be discussed in greater detail in the next sec-tion,whereitformsthebasisoftheproposed2D-3Dpointfeature detector.
Using the histogram-based approach, a keypoint may be de-tected based on the idea of self-similarity, (or lack of it) to its neighbours. Maver (Maver, 2010) looks for similar histograms of pixel intensities in radial and tangential regions so as to detect keypoints that exhibit different types of symmetry. Conversely,
Lee and Chen (2009) look for a point whose histogram is sig-nificantly dissimilar fromits immediate neighbours.Tombari and diStefano(2014)useasimilaridea,butwherehistogram compar-isonisonlyperformedonthek-nearest neighboursanda compu-tationallyefficientimplementationisproposed.Thenotionof self-similarityisvery usefulformulti-modalregistration,sincescenes mayoftenexhibitasimilar structurebetweenmodalitiesbutlack similarfinerfeatures.TombarianddiStefano(2014)showtheir ap-proachtobeofpotentialuseforcross-spectralimage registration, andShechtmanandIrani(2007)constructaself-similarity descrip-torforcross-spectralimageryandsketch-basedretrieval.
The majority of 2D point feature detectors are focused purely within the 2D domain. There is evidence to suggest that histogram-based approaches are a promising avenue for multi-modalfeaturedetectionduetotheirgeneralformulation.However, this has never been applied in a 2D-3D context, where the his-togramconstruction processmaymoregenerallyresultinfeature detectionbasedonboththegeometryandtextureofthe3Ddata.
2.2. 3Dpointfeaturedetection
Approaches to point feature detection in 3D vary depending upon the type ofdata being used.For volumetric3D data many 2D feature detectors may be naturally extended, e.g. 3D SIFT (Flittonetal.,2010).Indeed,aperformanceevaluationof volumet-ric 3D feature detectors (Yu et al., 2013) show extensions of fa-miliar2D feature detectors(Harris,Hessian,MSER, etc).However, otherrepresentationsof3Ddata(pointcloudormesh)create diffi-cultiessincepointsarenon-uniformlysampled,pointsmayormay not be textured,and a scale-spacemay not be so naturally con-structed. Point cloud representations are however the subject of thispaperandassuchfeaturedetectionforthisrepresentationwill bereviewedhere.
Similarlyto2Dfeaturedetection,theHarriscornerdetectorhas been naturallyextendedto operateon3D data (Sipiran and Bus-tos, 2010). For each point, a best fit tangent plane is first deter-mined. Eachneighbouring point is projected onto the plane and assigned an ‘intensity’ value foreach point asits distance tothe plane.The2DHarriscornerdetectormaybeappliedtothissetof intensityvalues,resultinginthe3DHarriscornerdetector.
Second derivative-based approaches in 3D typically mani-festthemselvesthroughcurvature-basedapproaches,while avoid-ing any mention of a Hessian matrix. For example, Chen and Bhanu (2007) propose an approach that locally estimates a quadratic surfacearound each vertexanduses this toobtain the principal curvatures.They thenassign a Shape Index(SI)to each vertexbasedonthemaximumandminimumprincipalcurvatures. Pointsaredetected baseduponwhetheritsSIissignificantly big-gerorsmallerthanthemeanofaneighbourhoodofSIs.
Alternativeapproachesmaynot bederivative-based atall, tak-ing advantageofthe unorderedpointcloud representationofthe data. For example, Zhong (2009) proposes Intrinsic Shape Sig-natures (ISS), based on the eigenvalue decomposition of the 3
× 3 covariance matrix around a point. They subsequently cull points whose ratio between successive eigenvalues are similar, then rankfeaturepointsinproportiontothe smallesteigenvalue. Learning-basedapproaches havealsobeenproposed, forexample byTeranandMordohai(2014),wholearnacrossasetofgeometric attributesusingarandom forest.Theapproach allowsforspecific pointdetectiontomatchthecriteriaobservedduringthetraining phase,resultinginamoreflexibleapproach.
Scale-spaceapproachesto3Dfeaturedetectionhavebeen pro-posedinanumberofways.Castellanietal.(2008)proposeto de-tectpointfeatures byusing theDifferenceofGaussians(DoG)on the set of 3D points, determining a point’s saliency by how far itmovesalong its normalundertheDoG operator.However, this typeofapproachhasbeencriticisedsince itobtainsascale-space representationbyalteringthegeometryofthescene.Alternatively, ascale-spacemaybeconstructedbyconvolvingotherattributesof the3Ddata.SuchanapproachistakenbyZaharescuetal.(2012): theydetectkeypointsina genericwaythat isapplicabletoscalar functionsof2Dmanifolds,e.g.meancurvature,ortheintensity(if the data is textured). However, it cannot detect keypoints based jointlyongeometryandtexture.TheirapproachissimilartoSIFT, computinga scalar function ateach point,using a DoG operator onthescalarfunctionandrejectingkeypointsforwhichtheratio oftheeigenvaluesoftheHessianarelarge.
AnapproachthatisverysimilartoSIFTistheViewpoint Invari-antPatchesapproach ofWu etal.(2008a),thatisonlyapplicable totextured 3D models. Theypropose tocompute a local tangent planeto each 3Dpoint, ontowhicha neighbouringtexturepatch maybe orthographically projected.The 2D SIFT detectorand de-scriptormaybe subsequentlyapplied on thetexturepatchto al-lowaframeworkfor3D-3Dregistration.Wuetal.furthermore ap-plytheir approachina2D-3Dscenario(Wuetal., 2008b),where SIFTfeaturesaredetectedinboth2Dand3Ddata.Theydetermine putativefeature matchesthat arerefined bywarpingthe2D SIFT featuressuchthattheyapproximatelymatchthesameformofthe orthographicVIPSIFTfeatures.
Ahistogram-basedapproachto3Dpointfeaturedetectionwas prosed by Fiolka et al. (2012), who extend the KB saliency de-tector (Kadir and Brady, 2001) and construct a histogram based onthe distributionof normals.However, their approach only de-tects salient features based on the geometry of the scene and does not detect those based on any available texture; as a re-sult it does not provide a unified approach to salient point de-tection in 3D. An earlier version of this work was published in
Brown et al. (2014) based on the mean curvature, however this wasa purely geometry-based KB saliency detector.In this paper wea)propose a derivative-based2D KBsaliencydetector,andb) incontrasttobothFiolkaetal.(2012)andBrownetal.(2014),we considerboththegeometryandtextureofthescene,allowingfor salientpointdetectionbasedonbothattributesofthedata simul-taneously.Ourframeworkforgeneralisablesalientpointdetection isevaluatedbetween2D and3D ona rangeofsyntheticandreal data.
3. TheKadir-Bradysaliencydetector
Here an outline of the Kadir-Brady (KB) saliency detector (KadirandBrady,2001)anditsextensionsandvarious implemen-tations(Kadiretal.,2004;ShaoandBrady,2006;Shaoetal.,2007) aregiven.
The KBsaliency detector(Kadir andBrady, 2001) isoriginally basedontheprinciplethattheparts onanimage thatarehighly complex are salient. Scale-invariance is achieved by measuring thecomplexityacross a rangeofscales andonly selectingpoints whosecomplexityispeakedwithrespecttotheirscale.Tofurther localiseitsscale,itisrequiredthatthepointisstatistically dissim-ilar across its neighbouring scales, known asinter-scale saliency. Thesaliencyofa pointistherefore theproductoftwo terms:its complexityand its inter-scale saliency. Finally, salient pointsare clustered into salient regions so as to be more robust to noise. These three stages of the KB saliency detector (complexity esti-mation, inter-scalesaliency,andclustering) are now describedin moredetail:
Fig. 1. Anexampleoffourdistributionsofpixelintensitiesfromtheimage.Thedistributionsonthelefthavearelativelylargeentropyandchangesignificantlyoverscale. Thedistributionsontherightlieinanapproximatelyuniformpartoftheimage,havinglowentropyandnotchangingoverscale,hencewillnotbedeemedsalientbythe approach.Imagetakenfrom(Mikolajczyketal.,2005).
Stage I: Complexity estimation. The complexity of a given
point(p)ataparticularscale(
σ
s)isdeterminedbyitsentropy.En-tropyis,however,definedforaprobability massfunction(pmf)P
takingoneofKvalues(i.e.P=
{
p1,...,pK}
,pi≥0∀
i,Ki=1pi=1),andisdefinedas:
H
(
P)
=− Ki=1
pilnpi (1)
Informally,theentropy ofa pmf givesa measure ofhow ‘spread out’ it is: it is maximised forthe uniform distribution and min-imisedwhenthepmf is1forone binandzeroforall other bins (Shannon,1948).Wetake0ln0=0(sincelimx→0xlnx=0).
To meaningfully apply the concept ofentropy to a point p at scale
σ
s,a histogramofpixelintensities isfirstconstructed fromall pixels within a distance
σ
s from p; denoted{
v
1,σs,...,v
K,σs}
.Thehistogramisnormalisedtoobtaina(frequentist)pmf,denoted
{
v
ˆ1,σs,...,v
ˆK,σs}
,i.e. Ki=1
v
ˆi,σs=1.Then theentropyofpoint patscale
σ
sisdefinedastheentropyofthefrequentistpmf: H(
p,σ
s)
=− K i=1 ˆv
i,σslog ˆv
i,σs (2)StageII: Inter-scalesaliency. Similarlyto otherfeature
detec-tors,onlyfeatures whoseresponse valueis peakedinscale-space are sought-after; i.e. only features whose entropy is peaked in scale-spaceare kept. Furthermore, it is necessary forthe feature tobe statisticallydissimilaracrossscale.Basedonthis,thepmfis comparedtothepmfsoftheneighbouringscales,andthesaliency isweighted by how dissimilar the pmfs are. Thus the weighting functionisconstructedas:
W
(
p,σ
s)
=σ
2 sσ
2 s −σ
s2−1 K i=1ˆ
v
i,σs−ˆv
i,σs−1 (3) Thecoefficient σs2 σ2 s−σs2−1 isusedsoastobescale-invariant. From these two stages, a set of keypoints - those whose en-tropyispeakedinscale-space-areobtained.Theyhaveasaliency valueofH(p,σ
s) ×W(p,σ
s).AnexampleofhistogramsobtainedforthefirsttwostagesisgiveninFig.1,wheretheadvantagesof
determining salientpointsasthose withahighentropy and dis-similarityacrossscalearedemonstrated.
StageIII:Salientregions.Fromthepreviousstageagreatdeal
ofsalientpointsare returnedby the detector(typicallyhundreds of thousands);far too many to be of usein any practical appli-cation. Hence, a simple clustering algorithm is proposed. In the original paper(KadirandBrady,2001) arathercomplicated clus-teringalgorithm,dependentupontwouser-definedparameters,is proposed.However,codeprovidedontheauthor’swebpageusesa greedyclusteringalgorithm:ititerativelytakesthepoint withthe highestsaliencyvalueandremovesallotherpointswithinitsscale, continuinginthisfashion untilnopointsareleft. Wehavefound thegreedyclusteringalgorithmtobebetterinpractice,aswellas moregeneralsinceitisparameterfree.
A deficiency in the above approach is that it is not affine-invariant: histograms arecomputed ina circularregion around a point,ratherthanthefullrangeofpotentialellipticalregions.This wasaddressedinKadiretal.(2004)whereafull,time-consuming searchoverallellipsesintheimageisimplemented.Alternatively, in Shao and Brady (2006), the authors propose to first detect affine-covariant salientregions usingthe original KBsaliency de-tector,thenadaptthesetomakethemaffine-invariant.
In(Shaoetal.,2007) Shaoetal.provideanumberof improve-ments to the algorithm that significantly increase its robustness. Theydo not changeanyfundamentalaspect ofthe approach, in-steadcomputingdesiredquantitiesinamoreaccurate and princi-pledmanner.Specifically,
i) The weighting W(p,
σ
s) is more accurately computed,re-flectingtheratiosofthenumberofpixelsateachscale.Let therebe Nspixels within
σ
s fromp.Thenthe weightingisdeterminedas: W
(
p,σ
s)
= Ns Ns−Ns−1 K i=1ˆ
v
i,σs−v
ˆi,σs−1 + Ns+1 Ns+1−Ns K i=1ˆ
v
i,σs+1−v
ˆi,σs (4)ii) Thehistogramissampleddifferentlysoastoweight pixels towardsthecentreofthecirclemorethanthosetowardsthe
edge.A Gaussianweighting is initiallysuggested; insteada computationally inexpensive alternative is proposed where a pixel isweighted twice asmuch ifit is within
σ
s−1 andthreetimesasmuchifwithin
σ
s−2.iii) Partial volume estimation: some pixels are only partly within the circle. In this case, they contribute to the his-tograminproportiontohowmuchofthepixelisinsidethe circle.
iv) Parzenwindowing:thehistogramisconvolvedwitha Gaus-sianto obtainasmootherpdf.Bilinearinterpolationis sug-gestedasacomputationallyinexpensivealternative. TheproposedmodificationsofShaoetal.(2007)resultinsome improvementtotheperformance oftheKBdetector,asevaluated on the dataset of Mikolajczyk et al. (2005). Hence, Shao et al. demonstratethepotentialoftheapproachasarepeatablefeature detector,butdonotdemonstrateitsbroadapplicability.Inthenext section, we generalisethe KB detector andshow how it may be broadlyappliedacrossdifferentmodalities.
4. ThegeneralisedKadir-Bradysaliencydetector
The original KB saliencydetector waslimited in its construc-tion and as such was only applicable to images. In this section wepropose amuchmoregeneralformulationthatallowsittobe applicable in a multi-modalmanner. Subsequently, we propose a derivative-basedreformulationinthe2Ddomain,anda3D formu-lationthatnaturallyaccountsforboththegeometryandtextureof thescene.
TogeneralisetheKBsaliencydetector,weobservethatmuchof its constructionisbasedonavery generalconcept: pointswhose entropyispeakedacrossscaleareregardedassalient.Toillustrate how widely thisconcept maybe applied, we shallformulate the KBsaliencydetectorinamoregeneralmannerforpointslyingin ametricspace.
Tothisend,let Mbe asetandd:M×M→R+ bea metric, i.e.let
(
M,d)
beametricspace.Defineaballofradiusσ
centred atp∈MasBσ
(
p)
:={
x∈M:d(
p,x)
≤σ
}
(5)representingthe setofelementsofM within
σ
ofp. Finally, as-sume amapping Fmaybe constructed fromeach element ofMto an K-dimensional positive vector, i.e.F:M→R+K.
Construct-ingFasaspecificallyvector-valuedfunctionwillallowforbroader applicability where multipleattributesof the dataare takeninto account(e.g.geometryandtexture).
From the above constructions the key components of the KB detectormaybedefined, allowingforgeneralisedKBsaliency de-tection in
(
M,d)
. The probability mass function for an elementp∈M at scale
σ
s is determined by computing a weighted sumovermappings (F)fromall pointsinball Bσs
(
p)
andnormalising: explicitly,thepmfis{
v
ˆ1,σs,...,v
ˆK,σs}
,whereˆ
v
i,σs= q∈Bσs(p)w(
q,p)
Fi(
q)
K j=1 q∈Bσs(p)w(
q,p)
Fj(
q)
(6)where the weighting w(q, p) is constructed to favour points closer to p. A Gaussian weighting is originally suggested by
Shaoetal.(2007)butdiscardedduetoconsiderationsof computa-tional efficiency.However, thisconsiderationdoesnot necessarily holdsincetheweightingsmaybeprecomputed,andrelativegains in efficiency are always application dependent. In thispaper, we usea Gaussian weighting sinceit leadstoa moreprincipled and robustapproach:
w
(
q,p)
=e−||q−p|| σ2
s (7)
With the construction of the pmf (Eq. (6)), the entropy of a pointp∈Matscale
σ
siswelldefined,andisthesameasEq.(2):H
(
p,σ
s)
=− K i=1 ˆv
i,σslog ˆv
i,σs (8)Subsequently the inter-scale saliency, W(p,
σ
s), is defined as in Eq.(4). Finally, the saliency of a point p∈M at scaleσ
s isde-finedas theproduct ofH(p,
σ
s) andW(p,σ
s). Salientpointsaresubsequently clustered by iteratively taking the point with the highestsaliencyvalue (pH) andremoving all other pointswithin Bσs
(
pH)
.As an example, for the 2D KB saliency detector, the metric space is
(
R2,L2)
, representing the image plane under theEu-clidean norm. A ball Bσs
(
p)
is simply a circle of radiusσ
scen-tredatp.The mappingFtakestheintensityofapixelandmaps it tothe index ofthe histogrambin (i.e. ifthe intensityof pixel
p is I(p) then F
(
p)
=(
0,...0,1,0,...,0)
, with a 1 in the I(p)thelement of the vector). However, the more general construction whereFisa multi-valuedfunction allowsforpixelstocontribute tomultiplebins.ThisnotonlyextendstheKBsaliencydetectorto othermodalitiesbutprovidesadditionaladvantages,e.g.for bilin-ear interpolation, or wherepoints havemultiple attributes (such aswhere 3D points contain information regardinggeometry and texture).
Basedontheaboveformulation,thegeneralisedKBsaliency de-tectormaybe appliedtoarangeofmulti-modaldata.Inthenext two subsections, we construct a derivative-based 2D KB saliency detector,aswellasa3D KBsaliencydetectorthatnaturally oper-atesonboththegeometryandtextureofthescene.Inbothcases, theapproaches are elegantly incorporated within the generalised KB saliency framework by simply defining the metric space and constructingthemappingF.
5. Derivative-based2DKadir-Bradysaliencydetector
Theoriginal2DKBsaliencydetectorwasconstructedbasedon thedistributionofpixelintensitiesinaneighbourhoodofapoint. Whilstthis givessome indicationof some ofthe more complex, salientpartsoftheimage,itfailstodetectthegeometricallysalient
aspects.Inparticular,itrarelydetectscorners,forwhichthe neigh-bouringcomplexityofpixelintensitiesvarieslittlewithscale.Asa result,the original 2D KBsaliencydetectorfailstodetect repeat-ablefeatures between2D and3D (seethe resultsinSection 7.5); focusingmoreonthetextureofthesceneratherthanthe geome-try.
Inlight ofthis limitationforthe originalKB saliencydetector andbasedontheprecedinggeneralisation,inthissectionwe pro-poseaderivative-based KBsaliencydetector. Specifically,the his-togrammapping Fis modifiedto be a function of thederivative of the image at any given pixel. This allows for high-derivative pointswithina low-derivativeneighbourhood (e.g.corners)tobe deemed salient; an important outcome in low-textured scenes. However,it ismore generalthana typicalcorner detector, deter-miningsalientpointswhereverachangeinimage derivativewith respecttoscaleoccurs,andavoidingnoisyorrepetitivepartsofthe scene.
Thederivative-basedKB saliencydetectorisformulated as fol-lows: the metric space is
(
R2,L2)
, the same as the original KBsaliencydetector.ThemappingFisafunctionofthederivativeof theimage(specifically,thesecondmomentmatrix).Denotethe in-tensityofapixelpasI(p)anditsderivativesinthexandy direc-tionsasI(p)x andI(p)y respectively.Forafixedscale
σ
,constructFig. 2. Anexampleoffourdistributionsofsecondmomentmatrixeigenvaluesfromtheimage.Thedistributionsonthelefthavearelativelylargeentropyandchange significantlywith scale,andarelikelytohaveahighsaliencyvalue.Conversely,the distributionsonthe right,whilehavingarelativelylargeentropy,donotchange significantlyoverscale,andarelikelytohavealowersaliencyvalue.
Fig. 3. Exampleoutputoftheproposedderivative-basedKBsaliencydetector. Left :Inputimage. Middle :Aheatmapindicatingthemagnitudeoftheeigenvaluesof M (p ). Theintensityofmagentarepresentstherelativemagnitudeofthefirsteigenvalue,withbluerepresentingthesecondeigenvalue. Right :Salientpointsdetectedbasedona histogramoftheeigenvalues.Thesizeofthecirclerepresentsitsscale.
pas: M
(
p)
= q∈Bσ(p) w(
q,p)
−1 × q∈Bσ(p) w(
q,p)
I(
q)
2 x I(
q)
xI(
q)
y I(
q)
xI(
q)
y I(
q)
y2 (9)wherew(q,p)isaGaussianweightingfunctiondesignedtofavour pointsclosertop,e.g.w
(
q,p)
=e−q−p2
2σ2 .Inconstructingthe
ma-trix,we capthederivativesat50pixelsto givea more perceptu-allymeaningful approachthat favours all largechanges in image derivativetothesameextent.
For constructingthe derivative-based KB saliencydetector,we areinterestedintheeigenvalues
λ
1 andλ
2 ofM(
p)
thatdescribethederivativeoftheimage. Inqualitativeterms,when
λ
1 andλ
2are both large, p is a corner; when
λ
1λ
2,p isan edge; andotherwisephaslittlechangeinderivativeinanydirection.To con-structthehistogrammappingF,theeigenvaluesofM
(
p)
ofall pix-elsontheimagearenormalisedanddiscretisedtolieinarD×rDhistogram. Subsequently, F maps the eigenvalues of M
(
p)
to the binsoftherD ×rDhistogram(hence,thecodomainofFisR+r2 D).
Bilinearinterpolationisperformed,meaningatmostfourelements ofFwillbenon-zero.
An example of histograms constructed using the proposed derivative-based2D KB saliencydetectorisgivenin Fig.2,anda heatmap of the relative magnitudes of the eigenvalues of M
(
p)
alongside theoutput ofthe proposed detectoris given inFig. 3. Itcan beseenthat theapproachdetects salientpointswherethe histogramofeigenvalueschangeswithrespecttoscale.Thisallows ittodetectarangeofderivative-basedstructureswithinthescene whilenaturallyavoidingtherepetitiveareas.
6. The3DKadir-Bradysaliencydetector
For3DKBsaliencydetection,we shalldefinethemetricspace and histogram construction from Section 4. Such a general for-mulation allows for a large range of potential implementations; of note is its applicability to both textureless and textured 3D data within the same framework. More concretely, we may use a histogram mapping F that describes both the geometry and the texture of 3D data, rendering it equally applicable regard-less of whether the 3D data is textured. In this section, we describe the histogram construction based purely on geometry (Section6.1),ontexture(Section6.2),andonboth(Section6.3).An exampleofhistogramsconstructedusingeach approachisshown inFig.5.
Regardless of histogram construction, the metric space used hereis simply
(
R3,L2)
,i.e. considerall pointsto lie in3D spaceundertheEuclideannorm.Ifthe3Ddatawereameshthegeodesic distancemaybeused instead,howeverthisisslowerto compute andnotaswidelyapplicable.
6.1. Geometry-based3DKBsaliencydetector
Initially, we describe the approach takenbased purely on the geometryofthe3Ddata.Todoso,weprojectthelocalsurfaceof the 3D data to an image andapply the same techniquesas per-formedpreviously (constructionofthesecond momentmatrix); a similar approach has been taken for the construction of the 3D Harris corner detector (Sipiran and Bustos, 2010). The image is taken to be a tangent plane to the 3D data, and the ‘intensity’ value of theimage representsthe distanceof the3D data tothe plane.We takeapurely derivative-basedapproachinthis subsec-tion; an intensity-based geometric KB detector maynot be con-structedsincethe‘intensity’valueofeverypointontoitsown tan-gentplaneisalwayszero.
Our derivative-based geometric KB detector is more formally constructedasfollows:forapointp∈R3,firstdeterminea
least-squaretangentplaneatp.Constructanorthonormalframeforthe tangent plane as{t1, t2, n},where nis thenormal to theplane.
Then, forafixed scale
σ
,considertheneighbouringset ofpoints {q∈Bσ(p)}.Projecteachpointontotheplane,yieldinglocal(u,v) coordinates((
q−p)
·t1,(
q−p)
·t2)
anddefineits‘intensity’value I(q)asthe directionaldistance fromqto theplane, computedas(
q−p)
·n.Thesecondmomentmatrixmaythusbeconstructedin thesamewayasSection5as:N
(
p)
= q∈Bσ(p) w(
q,p)
−1 × q∈Bσ(p) w(
q,p)
I(
q)
2 u I(
q)
uI(
q)
v I(
q)
uI(
q)
v I(
q)
v 2 (10)where,similarly to Section 5,w
(
q,p)
=e−||q−p|| 22σ2 .The eigenvalues
ofN
(
p)
aresubsequentlyusedinthehistogrammappingF,inthe samemannerasperformedpreviously.Notethattheorthonormal frame{t1,t2,n}isnotunique-thereisambiguityinthedirectionsoft1andt2.However,theeigenvaluesofN
(
p)
arerotationallyin-variantandthereforethisambiguitywillnotaffectthedesired out-come.Hence,wehaveavoidedtheneedtoconstructauniqueand unambiguous orthonormal referenceframe thatoften plagues 3D featuredetectors(Guoetal.,2013;PetrellianddiStefano,2012).
However,thederivativesI(q)uandI(q)vrequiredinEq.(10)may
not be estimatedas easily asfor the 2D detector, where the in-tensity values of a pixel’s immediate neighbours may be used to determine the derivative. Instead, we compute a Gaussian weighted average froma set of neighbouring points, similarly to
Zaharescuetal.(2012).TocomputethederivativesI(q)u andI(q)v
froma non-uniformlysampled setof 2D points{r ∈Bσ(q)} each with intensityI(r);firstly, denote thederivative forthe 2D point
q as g := (I(q)u, I(q)v). Then note that, fora point r lying
suffi-cientlyclosetoq,thefollowingrelationshipholdsbydefinitionof thederivative:
gT
(
q−r)
≈I(
q)
−I(
r)
(11)WemayuseEq.(11)todeterminegbysolvingtheweighted least-squaresequation: argmin g r∈Bσ(q) w
(
r,q)
I(
q)
−I(
r)
−gT(
r−q)
2 (12)where w(r, q) is a Gaussian of small variance, e.g. w
(
r,q)
= e−||r−q||2
2
(
σ2)
2 sothatthelocalderivativeestimatesofI(q)arecomputedover a tighter region than that fromwhich N
(
p)
is constructed.Eq. (12) is solved by ‘stacking’ each weighted equalityin (11) to forman over-determinedsystemofthe formAg=b, fromwhich theleast-squaressolutionto(12)isgivenbyg=
(
AT A)
−1 AT b.Subsequently, computing the gradient (I(q)u, I(q)v) for every
neighbouring point projected onto the tangent plane allows for thematrixN
(
p)
tobeconstructedanditseigenvaluestobe com-puted.ToconstructthemappingF,the eigenvaluesofN(
p)
ofall pointsin thedataarenormalised anddiscretisedto lieina rG× rGhistogram,wherebilinearinterpolationisperformed.Anexam-pleoftheproposedgeometry-basedKBsaliencydetectorisshown inFig.4alongsideaheatmapoftheeigenvaluesofN
(
p)
.The ap-proachdetectsa rangeofgeometricallysignificant structuresina scale-invariantmanner,whileavoidingthemorerepetitiveareasof themodel.6.2.Texture-based3DKBsaliencydetector
We propose two texture-based3D KB detectors: an intensity-based approach and a derivative-based approach, both of which willbeevaluatedinSection 7.5.Fortheintensity-basedapproach, the mappingF is exactlythe same asin the original 2D KB im-plementation: taking the intensity of a point to its histogram bin while applying bilinear interpolation. Where the 3D data is coloured, the greyscale value is computed via the equation I=
0.299R+0.587G+0.114B. Thehistogram isassumed to be ofthe samesize (K) asthe original intensity-based 2D KB implementa-tion.
ToobtainthemappingFforthederivativetexture-based3DKB saliencydetector,we adopt essentiallythe same approachas the geometry-based 3D KB saliencydetector in the previous section. Thelocalsurfaceofthe3Ddataisprojectedontoatangentplane, and the second-moment matrix (Eq. (10)) may be constructed again. However, rather than using the intensity value of a pro-jectedpoint I(q) asthedirected distancebetweenq andthe tan-gentplane,thegreyscalevalue ofthepointqisusedinstead.The intensitydifferences (I
(
r)
−I(
q)
) in Eq.(12) are cappedbetween−50and50pixels,similarlytothe2DapproachinSection5,soas togive a moreperceptuallymeaningful distance.Theeigenvalues ofN
(
p)
(whereI(q)representstheintensityofpointq)are subse-quentlynormalisedtolieinarD2 histogram.6.3.Geometryandtexturebased3DKBsaliencydetector
Our framework naturally allows for the extension to detect salientpointsbasedonboththegeometryandtexture.Giventhat thetwohistogramsmaybeconstructedbasedonthegeometryor thetexture,theirjointhistogrammaybeconstructed.Theintensity texture-basedKB detectormay be combined withthe geometry-basedKBdetectorto producea KrG2 histogram.Alternatively, the
derivative texture-based KB detector may be combined with the geometry-basedKBdetector,toproducearD2rG2 histogram.
Bilin-earinterpolationisagainperformedinthesehistograms.
Anexampleofhistogramsconstructedbasedon thegeometry, derivative-based texture, and both, is shown in Fig. 5. The his-togramsbasedonbotharethejointhistogramofthegeometryand the derivative-based texture histograms.They are relatively large and,ingeneral, sparse;exhibitingavery highentropy onlywhen causedby boththegeometryandtexture.However,thisapproach isableto detectsalientpointsbasedon eitherthegeometryand texture, sincein eithercasea relatively highentropyis observed ataparticularscale.
7. Experimentalevaluation
In this section we evaluate the performance of our proposed generalised salient point detector against other approaches, with both2Dand3Ddata.Qualitativeandquantitativeresultsaregiven, wherethefinal aimistodetecthighlyrepeatable,sparsefeatures
Fig. 4. Exampleoutputoftheproposedderivative-basedKBsaliencydetector. Left :Input3Ddata.Middle :Aheatmapindicatingthemagnitudeoftheeigenvaluesof N (p ). Theintensityofmagentarepresentstherelativemagnitudeofthefirsteigenvalue,withbluerepresentingthesecondeigenvalue. Right :Salientpointsdetectedbasedona histogramoftheeigenvalues.Thesizeofthesphererepresentsitsscale.
Fig. 5. Anexampleofthederivative-basedhistogramdistributionsfrom3Ddatawhenconsideringgeometry,texture,andboth.Thepointontherighthasalargedistribution ofeigenvaluesbasedontexturebutnotbasedongeometry,whereasthepointonthelefthasarelativelylargerdistributionofeigenvaluesbasedongeometry(aswellas texture).Inbothcases,theresultingjointhistogram(basedongeometryandtexture)isrelativelysparse.
between2D and3D,that maybe ofuseinthesubsequent regis-trationstage.Forcomparisonagainstourapproaches,thereexista largenumberoffeature detectorsinboth 2D and3D (Guo etal., 2014;TuytelaarsandMikolajczyk,2008),howeverwefocus specifi-callyoncomparingagainstfeaturedetectorsthatmaybe meaning-fullyconstructedinboth 2D and3D. Weshall firstintroduce the detectorsineach modalitybefore describinghow they are evalu-ated:firstlybetween2Dand2D,andsecondlybetween2Dand3D. In 2D,we considerfivedetectors.Firstly, thetraditional Harris
cornerdetector(MikolajczykandSchmid,2004).However,itis ob-servedthat, forsmallnumbers offeatures,Harrisdoesnotdetect a suitable spread offeatures, with manycorners detected inthe samearea (see Fig.9). Therefore,we secondly evaluate theGood FeaturestoTrackalgorithm(GFT) ShiandTomasi(1994)toobtain a better, morerepresentative set of corners. Thirdly,we evaluate against the state-of-the-artSIFT detector (Lowe, 2004). The final twodetectorsevaluatedaretheproposed derivative-basedKB de-tector (Section 5), referred to asKBD, andthe original intensity-basedKBdetector(Shaoetal., 2007)(referredto asKBI) soasto experimentallyjustifytheconstructionoftheproposedKBD detec-torformulatedinSection5.
In3D,thereareoptionaldetectorsavailabletocompareagainst depending upon if the texture of the data is used. For untex-tured 3D data, we consider four detectors: Harris (Sipiran and Bustos, 2010), SIFT, SURE1(Fiolka et al., 2012) and the proposed
derivative-basedgeometricKBdetector(Section6.1),referredtoas
KB-G.In3D,Harrisisnotscale-invariantandperformsnon-maxima suppression,thereforetypically detectsa better spreadofcorners in 3D than its 2D counterpart; hence there is no need fora 3D GoodFeatures toTrackdetector.Foruntextured3D data,SIFT de-tects keypoints based upon the mean curvature, andwill be re-ferred to as SIFT-G. Both Harris and SIFT-G are implemented in PointCloudLibrary.2 Harrisisextendedto 3D(Filipeand Alexan-dre, 2014) by replacing image gradients by surface normalsfrom which a3D covariance matrixisconstructed. The responsevalue isthen afunction ofthedeterminantandtrace ofthecovariance matrix(similarto2D).SIFTisextendedto3D(Hänschetal.,2014) using either the curvature of a point orthe intensity(if the 3D
1Codeavailablefromhttps://github.com/torstenfiolka/sure3d 2http://pointclouds.org/
point cloud is textured).A Difference-of-Gaussians(DoG) maybe applied solely on this attribute of the point cloud (curvature or intensity) that does not change the position of the points. Local maximaandminimamaythenbefoundbycomparingtoapoint’s
k-nearest neighbours,subsequentlypointswithlow curvatureare rejectedastheyaredeemedunstable.
Fortextured3Ddata,thereareadditionaldetectorsthatmaybe evaluatedagainst.SIFTmaydetectfeaturesontextureddatabased on the intensity (referred to as SIFT-T). Alternatively, the KB ap-proachesmaybeusedtodetectfeaturesbasedpurely onthe tex-ture, with the intensity-based KB detector referred as KBI-T and the derivative-based KBdetectorfortextured 3D datareferred to asKBD-T.Onlythe KBapproachesallow forboththe textureand geometry to be combined (Section 6.3), referred to asKBI-B and
KBD-B.
From the above 2D feature detectors (Harris, GFT, SIFT, KBI, and KBD) we firstly evaluate their repeatability in a 2D-2D sce-nario (Section 7.4).Subsequently, alongsidethe 3D feature detec-tors(Harris,SIFT-G,KB-G,SURE,SIFT-T,KBI-T,KBD-T,KBI-B,and KBD-B) weevaluatetheir repeatabilitybetween2Dand3D.For untex-tured3Ddata,weusesix2D-3Dpointcombinations:Harris-Harris, GFT-Harris,SIFT-SIFT-G,KBI-KB-G,KBD-SUREandKBD-KB-G.For tex-tureddatathereareafurtherfive2D-3Dcombinations:SIFT-SIFT-T, KBI-KBI-T, KBD-KBD-T; andwhere both geometryand texture are considered by KB:KBI-KBI-B andKBD-KBD-B.Thus, wherethe 3D dataistextured,atotalof112D-3Dfeaturedetectorcombinations willbeevaluated,tocomparetheeffectsofconsideringthe geom-etry,texture,orboth,ofthetextured3Ddata.
7.1. Implementationdetails
FortheproposedKBdetectorstwoparametersareuser-defined: the number of bins for the mapping F (K, rD and rG), and the
number andrange of scales (
σ
s). Forthe number of bins ofKBIwe take K=16in both 2D and 3D. Forthe proposed derivative-basedapproaches(KBD)weuserD=rG=4;hence,bothKBI-Band KBD-B have the same total number of bins of 256. The number ofscales is12 inall cases.Fortherangeofscales in2D we take
σ
1=3 withσ
s=3+σ
s−1. This is similar to the parameters ofShaoetal.(2007)whoseexperimentsshowthatagapof3pixels betweenscales performedthebest.In 3D,the scaleisdefinedin proportiontothesizeofthemodel.First,denotethelengthofthe diagonaloftheboundingboxofthemodelasL.Then,forthe syn-thetic data,
σ
1=0.004L whereasσ
1=0.003Lfor realdata (sincefeatures are relatively smaller for the more complex real data). Subsequentscales aredefinedby
σ
s=sσ
1,thesameasthemeshsaliencyapproachbyLeeetal.(2005).Indeterminingthe param-eter
σ
1inboththe2Dand3Dcase,werunexperimentstojustifyour choice ofparameters (shown in the appendix). Forthe con-structionofmatricesM
(
p)
andN(
p)
inEqs.(9)and(10),thesize oftheballBσ(p)istakentobeσ
=5.For a fair comparison, the other approaches (SIFT, GFT, Harris, and SURE) are altered, where possible, to align with these user-defined parameters. For SIFT in 2D the parameters provided by Vedaldi and Fulkerson (2008) are used and by
Mikolajczyk et al. (2005) for Harris; and the parameter for GFT
is definedsuch that no twocorners are within 16 pixels ofeach other.In3D, thefixed scaleofHarrisissetto
σ
1,andforSIFT-G, SIFT-T,andSURE,12scalesareused,withthesmallestsettoσ
1. 7.2. DatasetsThree datasets are used: a 2D-2D dataset from Mikolajczyk etal.(2005)(showninFig.6);asynthetic2D-3Ddataset(shown inFig.7);andareal2D-3Ddataset(showninFig.8).
The2D dataset istakenfromMikolajczyketal.(2005). Itis a setofsixgroupsof siximages,withthe knownhomography be-tweeneach imageinagroup provided.Eachgroupofimageshas undergoneacertaintransformation(blurring,scale,JPEG compres-sion, lighting, and viewpoint (twice)), from small to large trans-formations.The firstandlast imagesin eachgroup are shownin
Fig.6.
Forsyntheticdata,we usesixuntextured 3Dmodels.The first four models in Fig.7 are fromthe Stanford 3D Scanning Repos-itory.3 For each of these four models, 50 images were rendered
using POV-Rayusing a random rotation matrix (Arvo, 1992) and translationsuch that the modelis centred in the image, using a point light sourceat thesame location asthecamera. The latter twomodelsarethe3DreconstructionprovidedbyGuillemautand Hilton(2011)ofthedinosaurandtemplefromMiddlebury’s multi-viewreconstructiondataset(Seitzetal.,2006).Inthiscase,50 im-ageswiththeir knownprojectionmatrixfromthemodelare pro-videdaspartofthedataset,sothereisnoneedforrenderingusing POV-Ray.
For real data (Fig. 8), we use five textured 3D models, obtained by a colour LiDAR scanner. All have been obtained from Kim (2014) with the exception of room, which is from
Klaudinyetal.(2014).The numberofpointsand thedimensions ofthe3Dmodelsistabulatedbelow(Table1):
Foreach model,a setofbetween 7and11images havebeen takenofthescene andmanuallyaligned. This hasbeenachieved bypickingpairsofimageandscenepoints,andusingtheapproach by Penate-Sanchez etal. (2013) to determine the pose and focal lengthofthecamera.Anexampleimageofeachmodelisshownat thebottomofFig.8.Notethatforcertainmodelsthisdoesnot en-capsulatemuchofthescene(e.g.courtyard),making2D-3Dpoint detectionmoredifficult.
7.3.Evaluationmeasure
Theperformanceofapointdetector(eitherin2D-2D,orin 2D-3D)ismeasuredbyitsrelativerepeatability.Todefinethis,weshall firstdefinetherepeatabilitybetweentwosetsofpoints(2D-2Dor 2D-3D)asfollows:firstapplytheknowntransformation (homogra-phy,orprojectionmatrix)toonesetofpoints,discardinganythat donot lie within theimage boundary ofthe other set ofpoints. For2D-3D evaluation, occlusions may be handled in the case of the synthetic 2D-3D dataset, the 3D mesh is known and hence occludedpointsmay be discarded;howeveroften real datais in theform ofa point cloud andthisis not possible. From one set of2D points
{
pi∈R2}
Ni=1 andthe other setoftransformedpoints{
qi∈R2}
Mi=1(transformedunderahomography,oraprojectionma-trix),andgivenaninlierthresholdt,defineaninlierasapointpair (p,q)forwhichi)thenearestneighbourtopfromtheset
{
qi}
Mi=1isqandvice-versa;andii)
||
p−q||
<t.Therepeatabilityis subse-quentlydefinedasthenumberofinliersdividedbymin(N,M).Ithasbeenobservedintheliterature(e.g.HauaggeandSnavely, 2012;Tombari etal., 2013b)that the repeatabilitymeasure is bi-asedtowardsdetectorsthatproducea lotoffeatures,anda mea-sure that is invariant to the number of points detected is pro-posed.Therefore,wecomputetherelativerepeatability:foreachset ofpoints,orderthem indecreasingvalueoftheir responsevalue. Then,therepeatabilitymaybe determinedfromthetop-kpoints, andagraphmaybeplottedofrepeatabilityagainstthekmost re-sponsivefeatures ineach set.Furthermore, thisis a more useful measureforthepurposesofsparse2D-3Dregistration,wherelarge numbersof featureswill not be ofusedueto thecomputational complexityofsucharegistrationproblem.
Fig. 6. Examplesinthe2D-2Ddatasetfromsixgroupsofimagetransformations.Foreachgroup,therearesiximagesinthedatasetrangingfromsmalltolarge transfor-mations,withthefirstandlastimagesineachgroupshownhere.
Fig. 7. Top :The3Dmodelsusedinthesynthetic2D-3Ddataset. Bottom :Anexampleimagefromeachsyntheticmodelusedinthedataset.Fromlefttoright:armadillo, buddha,bunny,dragon,dino,temple.
Fig. 8. Top :The3Dmodelsusedinthereal2D-3Ddataset. Bottom :Anexampleimagefromeachmodelusedintherealdataset.Fromlefttoright:cathedral,courtyard, reception,room,studio.
Table 1
3Dmodelsinformation.
cathedral courtyard reception room studio
Numberofvertices 522,018 672,342 772,536 524,873 348,592 Boundingboxdiameter(m) 67.2 27.9 17.6 5.34 7.80
Fig. 9. Qualitative2Dresults.Thetop-150featuresareshownineachcase.Thetoptwoimagesarefromthesynthetic2D-3Ddataset,thirdtofifthfromthe2D-2Ddataset (Mikolajczyketal.,2005),withthebottomfourfromthereal2D-3Ddataset.Manyimageswerecroppedfromtheiroriginaldatasetforeaseofpresentationinthisfigure.
increasing blur 2 3 4 5 6 0 scale changes 1 1.5 2 2.5 3 0 JPEG compression % 60 70 80 90 100 0 decreasing light 2 3 4 5 6 repeatability % 0 5 10 15 20 25 30 light (leuven) viewpoint angle 20 30 40 50 60 repeatability % 0 5 10 15 20 25 30 viewpoint (graffiti) viewpoint angle 20 30 40 50 60 repeatability % 0 5 10 15 20 25 viewpoint (wall) Harris SIFT GFT KBI KBD
Fig. 10. Quantitative2D-2Dresultsacrossarangeofimagetransformations.Therelativerepeatabilityismeasuredforthetop-100pointfeaturesineachcase.Aninlier thresholdof3pixelsisused.ExampleimagesfromthisdatasetareshowninFig.9.
7.4.2Dpointdetection
Qualitative results for the set of five 2D point detectors are showninFig.9,foraselectionofimagesacrossthethreedatasets used. It is immediately noticeable, by the size andshape of the features, that Harris is affine- and scale-invariant; SIFT, KBI and
KBDarescale-invariant,andGFTisneither,beingavery parameter-dependentapproach.SIFT,andinparticularHarris,evidently have atendencytodetectthesamefeatureatmultiplescalesandvery similarlocations:thismotivatedtheuseofGFTtoobtainabetter spreadoffeatures(Section7).KBIandKBDnaturallydetectabetter spreadofpointsthanHarrisandSIFT,whileretaininga parameter-freeapproachtoscaleselection.
As a qualitative comparisonbetween theKB approaches; KBD
detects more corners than KBI (e.g. on the cathedral) while still detectingblob-likestructures (e.g.windows inthethirdfromtop image)duetothenecessary changeinderivative presentinsuch features.Incontrast,KBIdoesnotdetectaswide arangeofpoint featuretypesasKBDandoftendetectsmanyedges(e.g.the cathe-dral).Whileedgesmayberegardedassalient,apointonanedge ispoorlylocalisedalongtheedgeandisnotusefulforregistration purposes.
Quantitative resultsfor2D pointfeaturedetectors aregivenin
Fig.10forthe2D-2Ddataset(Fig.6).Thetop-100featuresare de-tectedin each image,andan inlier threshold of3pixels is used. It is observed that no feature detector performs the best across alltransformations.Harrisperformsparticularlywellforscaleand JPEG compression changes, but very poorly across a change in
viewpoint. GFT generally performs very well across the range of transformations.Importantly, KBDoutperformsKBIacross a num-ber of transformations, justifying our proposed reformulation of the2DKBdetector.
7.5. 3Dpointdetection 7.5.1. Qualitativeresults
Qualitative results for the 3D feature detectors are shown in
Fig.11forsyntheticdataandFig.12forrealdata.
Fortheuntexturedsyntheticdata,Harris,SIFT-G,KB-G,andSURE
maybeused.InFig.11,thescale-covariantHarrisdetector success-fullydetectsanumberofsmall-scalecornersbutofteninrepetitive places(e.g.thelegofthearmadillo).KB-Gismorerobustthan SIFT-G,detectingawiderrangeofpoints,e.g.onthearmadilloanddino. By contrast, SIFT-G has a tendency to detect smaller, less mean-ingfulfeatures,e.gonthebunny.SUREtypicallydetectscorner-like structureswherethereisawidedistributionofnormals,however it oftendetects large features and missessmaller corners e.g. on thedragon.Asacomparisonbetweenfeaturesdetectedin3Dand the qualitative2D results (Fig. 9);3D Harriscorrelates quite well with2DGFT,howeverit isclearthescale-covarianceofGFTisan issueon thedragon.SIFT and SIFT-Goftendo notdetect geomet-rically meaningful entities, with some 2D SIFT features detected off the model.KBIandKBDhavesome qualitative correlation with
KB-G,butKBIoftendetectsedgesandavoidscorner-likestructures (particularlysoonthedino).
Fig. 11. Qualitative3Dresultsforallmodelsfromthesyntheticdataset.Thetop-200pointsareshownineachcase.Thesizeofthesphereindicatesthescaleofthepoint.
Qualitative results for real data are given in Fig. 12, where pointsaredetected basedongeometry(Harris,SIFT-G,KB-G), tex-ture(SIFT-T, KBI-Tand KBD-T), orboth (KBI-BandKBD-B). Similar conclusions may be drawn from the geometry-based approaches asforthesyntheticresults(Fig.11):Harrisislimitedbyits scale-covariance, KB-G is generally more robust than SIFT-G, and SURE
typically detects larger features and misses the finer detail. For texture-based detectors,few qualitative distinctions can be made
between SIFT-T and KBD-T, however KBD-T detects more textural corner-like structures than SIFT-T (the same as in 2D in Fig. 9). Similarly to the 2D results, KBI-T detects more edge-like struc-tures - particularly on the pavement on the cathedral. Interest-ingly, texture-basedfeature detectors often detect geometrically-significantfeatures(e.g.cornersonthecathedral,andthetable-leg intheroom) dueto anaturalchangeincolouronthemodel sur-face,orthe lightingconditions.Finally,itis clearthatboth KBI-B
Fig. 12. Qualitative3Dresultsforcathedralandroomfromthesyntheticdataset.Thetop-400pointsareshownineachcase.Thesizeofthesphereindicatesthescaleof thepoint.
Fig. 13. Resultsontheuntexturedsyntheticdataset.Eachgraphshowstherelativerepeatabilityofthedetectorsforeachdataset,fork=20,40,60,80,100.Thegraphsare orderedsuchthatagraphofinlierthreshold3pixelsisshownabovethatofinlierthreshold6pixels.
Fig. 14. Qualitative3Dresultsforvaryingquantitiesoffeaturesonthearmadillomodel.TheleftshowsresultsforGFTandHarris,withKBDandKB-Gontheright.
andKBD-Bdetect pointsbased onboth thegeometry (cornersof thecathedral)andtexture(carpetandpictureinroom).
7.5.2. Quantitativeresults
Quantitativeresultsforthesyntheticdatasetarepresentedfirst. Foreachmodel-image pair,the relativerepeatabilityis computed using the top-k 2D points and the top-2k 3D points(since it is expectedhalf the 3D points will be occluded by the model), for
k varying between 20 and100. It is computedfor inlier thresh-olds(t) of3 and 6 pixels and averaged across all images of the model.Resultsare giveninFig. (13), where,giventhe3D data is untextured,acomparisonismadebetweenHarris-Harris, SIFT-SIFT-G,GFT-Harris,KBD-SURE,KBI-KB-G,andKBD-KB-G.
Itisobservedthat,ingeneral,GFT-HarrisandKBD-KB-Gperform thebest;betweenthemhavingthehighestrepeatabilityacrossall sixmodels.Bothhaverepeatabilitiesofatleast30%for(relatively) large numbers of points; sufficiently high for subsequent 2D-3D registration. KBI-KB-G performs quite well, but never as well as
KBD-KB-G.This isperhaps surprsing incomparisonto the results ofKBIonthe2D-2Devaluation(Fig.10)-thederivative-based KB formulationis evidently moreindicative ofgeometry ratherthan texturebasedontheseresults.Harris-Harris,SIFT-SIFT-G,KBD-SURE, andKBI-KB-Gperformsimilarlypoorly,rarelyobtaininga repeata-bilityofabove 20%. Comparingbetween 3pixels and6 pixels as theinlierthreshold;GFT-Harrisperformsslightlybetterthan KBD-KB-G for the smaller threshold, the reverse is true of the larger threshold.However, the increase in inlier threshold from 3 to 6 typicallyresultsinarepeatabilityincreasebyafactorofaround2, regardlessofdetectorordataset.
Fig. 13showsthat, ingeneral, therepeatability increaseswith respecttothenumberofpointsdetected.However,thisisnotthe casewithGFT-Harris which, in some circumstances, showsa de-creaseinrepeatabilityforincreasing numbersofpoints- particu-larly so onthe armadillo,and toa lesser extent onthe dino and
dragon. Fig.14showsqualitativeresults onthearmadillofor GFT-HarrisandKBD-KBGforsmallerquantitiesofpoints.Forverysmall quantitiesofpoints(20in2Dand40in3D)GFT-Harrishasahigh correlationduetotherelativelysmallnumberofwell-defined cor-nersonthe model(toes,fingers, andears)andhencetherelative easeatwhichtheyaredetected byacornerdetector.Forahigher quantityoffeatures(60in2Dand120in3D)thereareinsufficient cornersin the sceneandso it becomesunclear whycertain fea-turesshould bedetectedby thecornerdetectors.Bycontrast,our
saliency-basedapproachismorebroadlydefinedthanacorner de-tector allowing KBDandKBG to admit awider rangeof features. Asaresult,itisrelativelyunlikelyourapproachwillhaveahigher repeatabilityforsmallnumbersoffeatures(sincesalientpointsare notasnarrowlydefinedascornerpoints)butconverselythe defi-nitionofsaliencyextendstolargernumbersoffeatures.
Next,quantitativeresultsfortherealdatasetarepresented.For eachmodel-imagepair,therelativerepeatabilityiscomputedusing thetop-k2D pointsandthetop-2k3D points,withtheexception ofthelargercourtyardandreceptiondatasetswherethetop-4k3D pointsare used, sincehere itis expectedthe majorityof the 3D pointswillnotbeprojectedontotheimage.kisvariedupfrom20 to200.Similarlytothesyntheticdataset,therelativerepeatability iscomputedforinlierthresholdsof3and6pixels.
Results are presentedin Fig.15, wherea comparison ismade between all 11 approaches (as described at the beginning of
Section 7).Betweenthedifferentmodels,the bestresultsare ob-tainedonreceptionandroom,withrepeatabilityratesofover30% in some cases. However, the other three models only obtain re-peatability rates of between15% and25%. Between the different point detectors,KBD-KBD-TandKBD-KBD-Bgenerallyperform the best acrossall models. GFT-Harris performs nearly aswell except onthemoretexturedmodelsroomandstudio. KBI-KBI-Tmore of-tenoutperformsKBI-KBI-B,furtherdemonstratingthatKBIdoesnot detectgeometricallysignificantfeaturesin2D.Similarlytothe syn-theticdataset,SIFT-SIFT-GHarris-Harris,andKBD-SUREdonot per-formwellingeneral.
Asacomparisonbetweenthemethodsproposedhere( KBD-KB-G,KBD-KB-T,andKBD-KB-B),KBD-KB-Ggenerallydoesnotperform aswellexceptonthecathedralmodel.Itisperhapssurprisingthat
KBD-KB-Tconsistentlyperformswell,particularlyoncourtyardand
reception wherethere is littlediscriminating texture; howeveras observedinthequalitativeresults,geometricfeaturesareoften ac-companiedbyachangeintexture.Furthermore,thescaleselection processwithin theKBdetectorallowsittonaturallyavoid repeti-tive partsofa scene.KBD-B consistently performswell regardless ofthescene,outperformingtheotherapproachesonthecathedral
andstudio.
8. Conclusionsandfuturework
In this paper we have presented a general approach to 2D-3D salient point feature detection, based on the
information-Fig. 15. Resultsontherealdataset.Ontheleftshowstherelativerepeatabilityofthedetectorsforaninlierthresholdof3pixels;ontherightaninlierthresholdof6pixels isused.kvariesbetween20and200.Thegraphsareorderedsuchthatagraphofinlierthreshold3pixelsisshownabovethatofinlierthreshold6pixels.
tectfeaturesbasedonbothattributessimultaneously,increasingits robustnessandwideningitsapplicability.
There is scope forimprovement in our method; inparticular, the qualitative results show our approach to occasionally detect edges assalient. While theremay be some salientproperties re-gardingtheedges, apoint onan edge isnot well localisedalong theedgeandmaynot be asusefulforgeometryestimation.This could be addressed in a similar manner to Tombari and di Ste-fano(2014)wherehistogramsarecomparedbetweenneighbouring points,ratherthanbetweenneighbouringscales.Alternatively,one mayconsiderotherattributestoconstructahistogramfrom,other thanthefirstderivativesoftheimage.However,whilethesecond derivativesoftheimage havehadconsiderablesuccessinfeature detectionviaSIFT(Lowe,2004),theblob-likefeaturestheydetect aregenerallymoreindicativeoftextureratherthangeometry.
Futureworkwillincludetheregistrationofpointsbetween im-agesand3DLiDARdata.Inmanycases,correspondencesbetween features cannot be automaticallydetermined, andneed to be es-tablishedalongsideregistrationparameters.Itisacomputationally expensiveproblem(Moreno-Noguer et al., 2008), so any method that hasa highrepeatability fora smallernumber ofpoints will be moresuited to thiskindof problem. We furthermore planto
Klaudinyetal.(2014); andfortheCathedral, Courtyard, Reception, andStudiodatasetsat(Kim,2014).
Acknowledgements
Thisworkwassupported bythe EngineeringandPhysical Sci-encesResearchCouncil(grantnumberEP/K503186/1)andthe Eu-ropeanCommissionFP7IM-PARTpro-ject(grantnumber316564).
AppendixA. Effectofscaleparametersettingonrepeatability
Herewe presentrepeatabilityresultswhen varyingthe choice of
σ
1 inboth 2D and3D.The resultsin2D are showninFig.16comparingresultsforKBOandKBDonthe2D-3Dsyntheticdataset. TheresultsforKBOshowsomevariabilitydependingonthechoice of
σ
1,withbetterresultsobservedonthebuddhaandthedragonwith
σ
1=4butthischoiceofparametergaveworseresultsonthe dino.Thechoiceofσ
1makesverylittledifferenceonKBDhowever.Theresultsforvarying
σ
1in3DaregiveninFig.17.Thechoiceof
σ
1 affectsthedifferentapproachesina verysimilar way,withσ
1=0.3%thediameteroftheboundingboxgivingthepoorestre-sultsand
σ
1=0.5%givingslightlystrongerresultsthanσ
1=0.4%.Fig. 16. 2D-3Drepeatabilityresultswhereσ1isvariedbetween2,3,and4pixelsin2D.OnlyKBOandKBDareshownherebecausetheother2Dfeaturedetectorsusea