arXiv v1 [cs.GR] 10 Dec 2018


Unsupervised Deep Learning for Structured Shape Matching

Jean-Michel Roufosse

LIX, École Polytechnique

Maks Ovsjanikov

LIX, École Polytechnique

Abstract

We present a novel method for computing correspondences across shapes using unsupervised learning. Our method allows us to compute a non-linear transformation of given descriptor functions, while optimizing for global structural properties of the resulting maps, such as their bijectivity or approximate isometry. To this end, we use the functional maps framework and build upon the recently proposed FMNet architecture for descriptor learning. Unlike the method proposed in that work, however, we show that learning can be done in a purely unsupervised setting, without having access to any ground truth correspondences. This results in a very general shape matching method, which can be used to establish correspondences within shape collections or even just a single shape pair, without any prior information. We demonstrate on a wide range of challenging benchmarks that our method leads to significant improvement compared to existing axiomatic methods and achieves comparable, and in some cases superior, results even to supervised learning techniques.

1. Introduction

Shape matching is a fundamental problem in computer vision and, more broadly, in geometric data analysis, with applications in deformation transfer [35] and statistical shape modeling [5], to name a few.

During the past decades, a large number of techniques have been proposed for both rigid and non-rigid shape matching [37]. The latter case is both more general and more challenging, since the shapes can potentially undergo arbitrary deformations, which are not easy to characterize by purely axiomatic approaches. As a result, several recent methods have considered learning-based techniques for addressing the shape correspondence problem, e.g., [21,9,22,44] among many others. Most of these approaches are based on the idea that the underlying correspondence model can be learned from data, typically given in the form of ground truth correspondences between some shape pairs. In the simplest case, this can be formulated as a labeling problem, where different points, e.g., in a template shape, correspond to labels to be predicted [44,23].

More recently, several methods have been proposed for structured map prediction, which aim to infer an entire map rather than labeling each point independently [9,19]. These techniques are based on learning pointwise descriptors but, crucially, impose a penalty on the entire map obtained after inference using these descriptors, which results in higher-quality, globally consistent correspondences.

At the same time, while learning-based methods have achieved impressive performance, their utility is severely limited by the requirement of high-quality ground truth maps between a sufficient number of training examples. This makes it difficult to apply such approaches to new shape classes for which ground truth is not available.

In our paper, we show that this limitation can be lifted and propose a purely unsupervised strategy, which combines the benefits and accuracy of learning-based methods with the generality of axiomatic techniques for structured shape correspondence. Key to our approach is a bi-level optimization scheme, which optimizes given descriptors but imposes a penalty on the entire map inferred from them. For this, we use the recently proposed FMNet architecture [19], which exploits the functional map representation [26]. However, rather than penalizing the deviation of the map from the ground truth, we enforce structural properties on the map, such as its bijectivity or approximate isometry. This results in a very general shape matching method which, perhaps surprisingly, achieves comparable or even superior performance to existing methods, but without any supervision.

2. Related Work

Computing correspondences between 3D shapes is a very well-studied area of computer vision and computer graphics, and a full overview is beyond the scope of our paper. Below we only review the most closely related methods and refer the interested reader to recent surveys, including [39,37,4], for a more in-depth discussion of other shape matching approaches.

Functional Maps. Our method is built on the functional map representation, which was originally introduced in [26] for solving non-rigid shape matching problems, and then extended significantly in follow-up works, including [17,2,16,30,13,8] among many others (see also [27] for a recent overview).

One of the key benefits of this framework is that it allows maps between shapes to be represented as small matrices, which encode relations between basis functions defined on the shapes. Moreover, as observed by several works in this domain [26,17,34,30,8], many natural properties of the underlying pointwise correspondences can be expressed as objectives on functional maps. These include: orthonormality of functional maps, which corresponds to the local area-preserving nature of pointwise correspondences [26,17,34]; commutativity with the Laplacian operators, which corresponds to intrinsic isometries [26]; preservation of inner products of gradients of functions, which corresponds to conformal maps [34,8,43]; preservation of pointwise products of functions, which corresponds to functional maps arising from point-to-point correspondences [25,24]; and slanted diagonal structure of the functional map in the context of partial shapes [30,20], among others.

Similarly, several other regularizers have been proposed, including exploiting the relation between functional maps in different directions [12], the map adjoint [15], and powerful cycle-consistency constraints [14] in shape collections, among many others. More recently, constraints on functional maps have been introduced to promote continuity of the recovered pointwise correspondence [28], and kernel-based techniques have been proposed for extracting more information from given descriptor constraints [42], among others.

All these methods, however, are based on combining first-order penalties that arise from enforcing descriptor preservation constraints with these additional desirable structural properties of functional maps. As a result, any error or inconsistency in the pre-computed descriptors will inevitably lead to severe map estimation errors. Several methods have suggested using robust norms on descriptor constraints [17,16], which can help reduce the influence of certain descriptors but still do not control the global map consistency properties.

Learning, incl. Deep Learning-based Methods. To overcome the inherent difficulty of axiomatically modeling non-rigid shape correspondence, several methods have been proposed to learn the correct deformation model from data with learning-based techniques. Some early approaches in this direction were used to learn either optimal parameters of spectral descriptors [21], or exploited random forests [32] or metric learning [10] for learning optimal constraints given some ground truth matches.

More recently, with the advent of deep learning methods, several approaches have been proposed for learning transformations in the context of non-rigid shape matching. Most of the proposed methods either use Convolutional Neural Networks (CNNs) on depth maps, e.g., for dense human body correspondence [44], or propose extensions of CNNs directly to curved surfaces, either using the link between convolution and multiplication in the spectral domain [6,11], or directly defining local parametrizations, for example via the exponential map, which allows convolution in the tangent plane of a point, e.g., [22,7,23] among others.

These methods have been applied to non-rigid shape matching, in most cases modeling it as a label prediction problem, with points corresponding to different labels. Although successful in the presence of a sufficient amount of training data, such approaches typically do not impose global consistency, which can lead to significant artefacts, such as outliers, and they require post-processing to achieve high-quality maps.

Learning for Structured Prediction. Perhaps most closely related to our approach are recent works that apply learning for structured map prediction [9,19]. These methods learn a transformation of given input descriptors, while optimizing for the deviation of the map computed from them using the functional map framework from some known ground truth correspondences. As shown in these works [9,19], imposing a penalty on entire maps, and thus evaluating the ultimate use of the descriptors, can lead to significant accuracy improvements in practice.

Contribution. Unlike these existing methods, we propose an unsupervised learning-based approach that transforms given input descriptors while optimizing for structural map properties, without any ground truth knowledge. Our method, which can be seen as a bi-level optimization strategy, allows us to explicitly control the interaction between the pointwise descriptors and the global map consistency, computed via the functional map framework. As a result, our technique is scalable with respect to shape complexity and, as we show below, leads to significant improvement compared to standard axiomatic methods, and achieves comparable, and in some cases superior, performance even to supervised approaches.

3. Background & Motivation

3.1. Shape Matching and Functional Maps

Our work is based on the functional map framework and representation. For completeness, we briefly review the basic notions and pipeline for estimating functional maps, and refer the interested reader to a recent course [27] for a more in-depth discussion.

Basic Pipeline. Given a pair of shapes $S_1, S_2$, represented as triangle meshes and containing, respectively, $n_1$ and $n_2$ vertices, the basic pipeline for computing a map between them using the functional map framework consists of the following main steps (see Chapter 2 in [27]):


1. Compute a small set of $k_1$, $k_2$ basis functions on each shape, e.g., by taking the first few eigenfunctions of the corresponding Laplace-Beltrami operator.
2. Compute a set of descriptor functions on each shape that are expected to be approximately preserved by the unknown map. For example, a descriptor function can correspond to a particular dimension (e.g., choice of time parameter of the Heat Kernel Signature [36]) computed at every point. Store their coefficients in the corresponding bases as columns of matrices $A_1$, $A_2$.
3. Compute the optimal functional map $C$ by solving the following optimization problem:

   $$C_{\text{opt}} = \arg\min_{C_{12}} \; E_{\text{desc}}(C_{12}) + \alpha E_{\text{reg}}(C_{12}), \tag{1}$$

   where the first term aims at descriptor preservation, $E_{\text{desc}}(C_{12}) = \|C_{12}A_1 - A_2\|^2$, whereas the second term regularizes the map by promoting the correctness of its overall structural properties. The simplest approach penalizes the failure of the unknown functional map to commute with the Laplace-Beltrami operators, which can be written as:

   $$E_{\text{reg}}(C_{12}) = \|C_{12}\Lambda_1 - \Lambda_2 C_{12}\|^2, \tag{2}$$

   where $\Lambda_1$ and $\Lambda_2$ are diagonal matrices of the Laplace-Beltrami eigenvalues on the two shapes.
4. Convert the functional map $C$ to a point-to-point map, for example using nearest neighbor search in the spectral embedding, or using other more advanced techniques [31,13].
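Step 4 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes $C_{12}$ maps coefficients on $S_1$ (basis $\Phi$, size $n_1 \times k_1$) to coefficients on $S_2$ (basis $\Psi$, size $n_2 \times k_2$), embeds each vertex of $S_2$ as a row of $\Psi C_{12}$, and matches it to the nearest row of $\Phi$ by brute force instead of a KD-tree. The function name `fmap_to_pointwise` is hypothetical.

```python
import numpy as np


def fmap_to_pointwise(C12, Phi, Psi):
    """Convert a functional map into a vertex-to-vertex map (hypothetical helper).

    C12: (k2, k1) functional map taking coefficients on S1 to coefficients on S2.
    Phi: (n1, k1) basis on S1.  Psi: (n2, k2) basis on S2.
    Returns T21: (n2,) array; T21[y] is the vertex of S1 matched to vertex y of S2.
    """
    emb2 = Psi @ C12  # (n2, k1): vertices of S2 expressed in S1's spectral coordinates
    # Squared Euclidean distances between every row of emb2 and every row of Phi.
    d2 = (
        np.sum(emb2 ** 2, axis=1)[:, None]
        - 2.0 * emb2 @ Phi.T
        + np.sum(Phi ** 2, axis=1)[None, :]
    )
    return np.argmin(d2, axis=1)  # brute-force nearest neighbor
```

The brute-force version builds an $n_2 \times n_1$ distance matrix; for meshes with thousands of vertices a KD-tree (for instance `scipy.spatial.cKDTree`) is the scalable choice.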

One of the strengths of this pipeline is that Eq. (1) typically leads to a simple (e.g., least squares) problem with $k_1 k_2$ unknowns, independent of the number of points on the shapes. This formulation has been extended using, e.g., manifold optimization [18], descriptor preservation constraints via commutativity [25] and, more recently, kernelization [42], among many others (see also Chapter 3 in [27]).
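Because $\Lambda_1$ and $\Lambda_2$ are diagonal, the term in Eq. (2) equals $\sum_{i,j} (C_{12})_{ij}^2 (\lambda^1_j - \lambda^2_i)^2$, so Eq. (1) decouples into one $k_1 \times k_1$ linear system per row of $C_{12}$, whose size does not depend on the number of vertices. The sketch below assumes the Frobenius norm and coefficient matrices $A_1$ ($k_1 \times p$) and $A_2$ ($k_2 \times p$) for $p$ descriptors; the name `solve_fmap` and the default `alpha` are illustrative choices, not values taken from this paper.

```python
import numpy as np


def solve_fmap(A1, A2, evals1, evals2, alpha=1e-3):
    """Minimize ||C A1 - A2||_F^2 + alpha * ||C L1 - L2 C||_F^2 over C (k2 x k1).

    A1: (k1, p), A2: (k2, p) descriptor coefficients.
    evals1: (k1,), evals2: (k2,) Laplace-Beltrami eigenvalues (L = diag(evals)).
    Row i of C solves (A1 A1^T + alpha * diag((evals1 - evals2[i])^2)) c_i = A1 A2[i].
    """
    k1, k2 = A1.shape[0], A2.shape[0]
    AAt = A1 @ A1.T                               # (k1, k1), shared by all rows
    C = np.zeros((k2, k1))
    for i in range(k2):
        D_i = np.diag((evals1 - evals2[i]) ** 2)  # row-specific commutativity weights
        C[i] = np.linalg.solve(AAt + alpha * D_i, A1 @ A2[i])
    return C
```

Solving row by row keeps every system at size $k_1$, which matches the observation above that the cost is independent of $n_1$ and $n_2$.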

3.2. Deep Functional Maps

Despite its simplicity and efficiency, the functional map estimation pipeline described above is fundamentally dependent on the initial choice of descriptor functions. To alleviate this dependence, several approaches have been proposed to learn the optimal descriptors from data [9,19]. In our work, we build upon a recent deep learning-based framework, called FMNet, introduced by Litany et al. [19], which aims to transform a given set of descriptors so that the optimal map computed using them is as close as possible to some ground truth map given during training.

In particular, the approach proposed in [19] assumes, as input, a set of shape pairs for which ground truth pointwise maps are given, and aims to solve the following problem:

$$\min_{T} \sum_{(S_1,S_2)\in \text{Train}} l_F\big(\text{Soft}(C_{\text{opt}}), \text{GT}(S_1,S_2)\big), \quad \text{where} \tag{3}$$

$$C_{\text{opt}} = \arg\min_{C} \|C A_{T(D_1)} - A_{T(D_2)}\|. \tag{4}$$

Here $T$ is a non-linear transformation, in the form of a neural network, to be applied to some input descriptor functions $D$; $\text{Train}$ is the set of training pairs for which the ground truth correspondence $\text{GT}(S_1,S_2)$ is known; $l_F$ is the soft error loss, which penalizes the deviation of the computed functional map $C_{\text{opt}}$, after converting it to a soft map $\text{Soft}(C_{\text{opt}})$, from the ground truth correspondence; and $A_{T(D_1)}$ denotes the transformed descriptors $D_1$ written in the basis of shape 1. In other words, the FMNet framework [19] aims to learn a transformation $T$ of descriptors so that the transformed descriptors $T(D_1)$, $T(D_2)$, when used within the functional map pipeline, result in a soft map that is as close as possible to some known ground truth correspondence. Unlike methods based on formulating shape matching as a labeling problem, this approach evaluates the quality of the entire map obtained using the transformed descriptors, which, as shown in [19], leads to significant improvement compared to several strong baselines.

Motivation. Similarly to other supervised learning methods, although FMNet [19] can result in highly accurate correspondences in the presence of sufficient training data, its applicability is limited to shape classes for which high-quality ground truth maps are available. Moreover, perhaps less crucially, the soft map loss in FMNet is based on the knowledge of geodesic distances between all pairs of points, which makes it computationally expensive. Our goal, therefore, is to show that a similar approach can be used more widely, without any training data, while also leading to a more efficient and scalable framework.

4. Our method

4.1. Overview

In this paper, we propose to use a neural network to optimize for non-linear transformations of descriptors, in order to obtain high-quality functional, and thus pointwise, maps. For this, we follow the same general strategy proposed in the FMNet approach [19]. However, crucially, rather than penalizing the deviation of the computed map from some known ground truth correspondence, we evaluate the structural properties of the inferred functional maps, such as their bijectivity or orthogonality. Importantly, we express all these desired properties, and thus the penalties during optimization, purely in the spectral domain, which allows us to avoid the conversion of functional maps to soft maps during optimization, as was done in [19]. Thus, in addition to being purely unsupervised, our approach is also more efficient, since it does not require pre-computation of geodesic distance matrices or expensive manipulation of large soft map matrices during training.

To achieve these goals, we modify the FMNet problem, described in Eqs. (3) and (4), in several ways: first, we propose to consider functional maps in both directions, i.e., by treating the two shapes as both source and target; second, we remove the conversion step from functional to soft maps; and, most importantly, third, we replace the soft map loss with respect to ground truth with a set of penalties on the computed functional maps, which are described in detail below. This means that the optimization problem we aim to solve can be written as:

$$\min_{T} \sum_{(S_1,S_2)} \sum_{i \in \text{penalties}} w_i E_i(C_{12}, C_{21}), \quad \text{where} \tag{5}$$

$$C_{12} = \arg\min_{C} \|C A_{T(D_1)} - A_{T(D_2)}\|, \tag{6}$$

$$C_{21} = \arg\min_{C} \|C A_{T(D_2)} - A_{T(D_1)}\|. \tag{7}$$

Here, similarly to Eq. (3) above, $T$ denotes a non-linear transformation in the form of a neural network, $(S_1, S_2)$ ranges over pairs of shapes in a given collection, $w_i$ are scalar weights, and $E_i$ are the penalties described below. In other words, we aim to optimize for a non-linear transformation of some descriptor functions, such that the functional maps computed from the transformed descriptors possess certain desirable structural properties, expressed via penalty minimization.
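Assuming the Frobenius norm in Eqs. (6) and (7), each problem is an ordinary linear least-squares problem with the closed-form solution $C_{12} = A_{T(D_2)} A_{T(D_1)}^{+}$, and symmetrically for $C_{21}$, a closed form that is differentiable in the descriptor coefficients. The sketch takes already-transformed per-vertex descriptors `TD1`, `TD2` as input (the network $T$ is omitted) and projects them onto the bases with a plain pseudoinverse; both helper names are hypothetical.

```python
import numpy as np


def project(basis, desc):
    """Coefficients (k, p) of per-vertex functions desc (n, p) in basis (n, k).

    Plain pseudoinverse projection; with mass-orthonormal Laplacian
    eigenfunctions a mass-matrix-weighted projection is often used instead.
    """
    return np.linalg.pinv(basis) @ desc


def bidirectional_fmaps(Phi, Psi, TD1, TD2):
    """Least-squares solutions of Eqs. (6) and (7), Frobenius norm assumed.

    TD1: (n1, p) and TD2: (n2, p) stand for the transformed descriptors T(D1), T(D2).
    """
    A1 = project(Phi, TD1)           # (k1, p), plays the role of A_{T(D1)}
    A2 = project(Psi, TD2)           # (k2, p), plays the role of A_{T(D2)}
    C12 = A2 @ np.linalg.pinv(A1)    # argmin_C ||C A1 - A2||_F, shape (k2, k1)
    C21 = A1 @ np.linalg.pinv(A2)    # argmin_C ||C A2 - A1||_F, shape (k1, k2)
    return C12, C21
```

If $S_2$ is simply a vertex permutation of $S_1$ and the descriptors are permuted the same way, both maps come out as the identity, which is a convenient sanity check.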

When deriving the penalties used in our approach, we exploit the links between properties of functional maps and associated pointwise maps that have been established in several previous works [26,34,12,25]. Unlike all these methods, however, we decouple the descriptor preservation constraints from structural map properties. This allows us to optimize for descriptor functions, and thus gain strong resilience in the presence of noisy or uninformative descriptors, while still exploiting the compactness and efficiency of the functional map representation.

4.2. Penalties

In our work we propose to use four penalties, all inspired by desirable map properties.

Bijectivity. Given a pair of shapes and the functional maps in both directions, perhaps the simplest requirement is for them to be inverses of each other, which can be enforced by penalizing the difference between their composition and the identity. This penalty, used for functional map estimation in [12], can be written simply as:

$$E_1 = \|C_{12}C_{21} - I\|^2 + \|C_{21}C_{12} - I\|^2 \tag{8}$$
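A direct NumPy transcription of Eq. (8), assuming $\|\cdot\|$ is the Frobenius norm (the default matrix norm of `np.linalg.norm`). Since $C_{12}$ is $k_2 \times k_1$, the two identities have sizes $k_2$ and $k_1$, so both terms can only vanish together when $k_1 = k_2$. The function name is illustrative.

```python
import numpy as np


def bijectivity_penalty(C12, C21):
    """E1 = ||C12 C21 - I||_F^2 + ||C21 C12 - I||_F^2  (Eq. 8)."""
    I2 = np.eye(C12.shape[0])  # C12 @ C21 is (k2, k2)
    I1 = np.eye(C21.shape[0])  # C21 @ C12 is (k1, k1)
    return (np.linalg.norm(C12 @ C21 - I2) ** 2
            + np.linalg.norm(C21 @ C12 - I1) ** 2)
```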

Orthogonality. As observed in several works [26,34], a point-to-point map is locally area preserving if and only if the corresponding functional map is orthonormal. Thus, for shape pairs approximately satisfying this assumption, a natural penalty to incorporate in our unsupervised pipeline is:

$$E_2 = \|C_{12}^\top C_{12} - I\|^2 + \|C_{21}^\top C_{21} - I\|^2 \tag{9}$$
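Eq. (9) can be evaluated the same way; as before, the Frobenius norm is assumed and the function name is illustrative. Here $C_{12}^\top C_{12}$ is $k_1 \times k_1$ and $C_{21}^\top C_{21}$ is $k_2 \times k_2$.

```python
import numpy as np


def orthogonality_penalty(C12, C21):
    """E2 = ||C12^T C12 - I||_F^2 + ||C21^T C21 - I||_F^2  (Eq. 9)."""
    I1 = np.eye(C12.shape[1])  # C12.T @ C12 is (k1, k1)
    I2 = np.eye(C21.shape[1])  # C21.T @ C21 is (k2, k2)
    return (np.linalg.norm(C12.T @ C12 - I1) ** 2
            + np.linalg.norm(C21.T @ C21 - I2) ** 2)
```

Any orthogonal matrix, such as the `Q` factor returned by `np.linalg.qr` for a square input, yields a zero penalty, while uniformly scaling a map away from unit norm is penalized.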

Laplacian commutativity. Similarly, it is well known that a pointwise map is an intrinsic isometry if and only if the associated functional map commutes with the Laplace-Beltrami operator [33,26]. This has motivated using the lack of commutativity as a regularizer for functional map computations, as mentioned in Eq. (2). In our work, we use it to introduce the following penalty:

$$E_3 = \|C_{12}\Lambda_1 - \Lambda_2 C_{12}\|^2 + \|C_{21}\Lambda_2 - \Lambda_1 C_{21}\|^2, \tag{10}$$

where $\Lambda_1$ and $\Lambda_2$ are diagonal matrices of the Laplace-Beltrami eigenvalues on the two shapes.
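Since the eigenvalue matrices are diagonal, Eq. (10) can be computed by broadcasting rather than by forming $\Lambda_1$ and $\Lambda_2$: entry $(i, j)$ of $C_{12}\Lambda_1 - \Lambda_2 C_{12}$ is $(C_{12})_{ij}(\lambda^1_j - \lambda^2_i)$. A sketch, assuming eigenvalue vectors `evals1` (length $k_1$) and `evals2` (length $k_2$) and the Frobenius norm; the function name is illustrative.

```python
import numpy as np


def laplacian_commutativity_penalty(C12, C21, evals1, evals2):
    """E3 (Eq. 10) with Lambda1 = diag(evals1), Lambda2 = diag(evals2)."""
    # (C12 L1 - L2 C12)[i, j] = C12[i, j] * (evals1[j] - evals2[i]); C12 is (k2, k1)
    r12 = C12 * (evals1[None, :] - evals2[:, None])
    # (C21 L2 - L1 C21)[i, j] = C21[i, j] * (evals2[j] - evals1[i]); C21 is (k1, k2)
    r21 = C21 * (evals2[None, :] - evals1[:, None])
    return np.sum(r12 ** 2) + np.sum(r21 ** 2)
```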

Descriptor preservation via commutativity. The previous three penalties express desirable properties of pointwise correspondences when expressed as functional maps. However, since the space of functional maps is larger than that of pointwise ones, in practice we would like to penalize functional maps that do not arise from any point-to-point correspondence. One approach for this has been proposed in [25], where the authors argued that preserving descriptors as linear operators acting on functions through multiplication both allows more information to be extracted from given descriptor functions and results in functional maps that are more likely to arise from point-to-point ones.

Following [25], we incorporate this penalty into our approach via commutativity of the functional map with the multiplicative operators, which can be expressed as follows:

$$E_4 = \sum_{(f_i, g_i) \in \text{Descriptors}} \Big( \|C_{12} M_{f_i} - M_{g_i} C_{12}\|^2 + \|C_{21} M_{g_i} - M_{f_i} C_{21}\|^2 \Big), \quad M_{g_i} = \Psi^{+}\,\mathrm{Diag}(g_i)\,\Psi, \quad M_{f_i} = \Phi^{+}\,\mathrm{Diag}(f_i)\,\Phi. \tag{11}$$

Here $f_i$ and $g_i$ are the optimized descriptors on the source and target shape, obtained by the neural network and expressed in the full (hat) basis, whereas $\Phi$, $\Psi$ are the fixed basis functions on the two shapes, and $^{+}$ denotes the Moore-Penrose pseudoinverse.
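A sketch of Eq. (11), assuming $\Phi$ is $n_1 \times k_1$, $\Psi$ is $n_2 \times k_2$, the $i$-th columns of `F` and `G` hold $f_i$ and $g_i$, and the Frobenius norm. Scaling the rows of the basis by $f$ computes $\mathrm{Diag}(f)\,\Phi$ without building an $n_1 \times n_1$ diagonal matrix. The helper names are hypothetical.

```python
import numpy as np


def multiplicative_operator(basis, basis_pinv, f):
    """M_f = basis^+ Diag(f) basis, computed via a row-wise product."""
    return basis_pinv @ (f[:, None] * basis)


def descriptor_commutativity_penalty(C12, C21, Phi, Psi, F, G):
    """E4 (Eq. 11). F: (n1, p) descriptors on S1, G: (n2, p) descriptors on S2."""
    Phi_pinv, Psi_pinv = np.linalg.pinv(Phi), np.linalg.pinv(Psi)
    total = 0.0
    for f, g in zip(F.T, G.T):  # iterate over descriptor pairs (f_i, g_i)
        Mf = multiplicative_operator(Phi, Phi_pinv, f)  # (k1, k1)
        Mg = multiplicative_operator(Psi, Psi_pinv, g)  # (k2, k2)
        total += np.linalg.norm(C12 @ Mf - Mg @ C12) ** 2
        total += np.linalg.norm(C21 @ Mg - Mf @ C21) ** 2
    return total
```

When $S_2$ is a vertex permutation of $S_1$ and each $g_i$ is the correspondingly permuted $f_i$, one gets $M_{g_i} = M_{f_i}$, so the identity functional map has zero penalty; this gives a quick sanity check.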

4.3. Optimization

As mentioned in Section 4.1, we incorporate these four penalties into the energy in Eq. (5). Importantly, the only unknowns in this optimization are the parameters of the neural network applied to the descriptor functions. The functional maps $C_{12}$ and $C_{21}$ are fully determined by the optimized descriptors via the solution of the corresponding linear systems in Eq. (6) and Eq. (7), and are thus differentiable with respect to the neural network parameters. Moreover, all of the penalties $E_1, E_2, E_3, E_4$ are differentiable with respect to the functional maps $C_{12}$, $C_{21}$. This means that the gradient of the total energy can be back-propagated to the neural network $T$ in Eq. (5), allowing us to optimize for the descriptors while penalizing the structural properties of the functional maps.

5. Implementation & Parameters

Implementation details. We implemented our method in TensorFlow [1] by adapting the open-source implementation of FMNet [19]. Thus, the neural network $T$ used for transforming descriptors in our approach, in Eq. (5), is exactly identical to that used in FMNet, as mentioned in Eq. (3). Namely, this network is based on a residual architecture, consisting of 7 fully connected residual layers with exponential linear units, without dimensionality reduction. Please see Section 5 in [19] for more details.

Following the approach of FMNet [19], we also sub-sample a random set of 1500 points at each training step, for efficiency. However, unlike their method, sub-sampling is done independently on each shape, without enforcing consistency. We also randomly sub-sample 20% of the descriptors to enforce our penalty $E_4$ at each training step, to avoid manipulating a large set of operators. Note that this sub-sampling is random at each step, so different optimized descriptors are used in $E_4$ throughout optimization.
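The per-step sampling described above could look as follows. This is a hypothetical helper, not code from the FMNet implementation: vertex indices are drawn independently on each shape (so index $j$ on $S_1$ says nothing about index $j$ on $S_2$), and a fresh 20% of the descriptor channels is drawn for $E_4$ at every call. The 1500 points and 20% are the values given above; everything else is an assumption.

```python
import numpy as np


def subsample_step(rng, n1, n2, p, n_points=1500, desc_frac=0.2):
    """Draw one training step's random subsets (hypothetical helper).

    rng: np.random.Generator; n1, n2: vertex counts; p: number of descriptors.
    Returns vertex indices on S1, vertex indices on S2, descriptor indices for E4.
    """
    # Independent draws on the two shapes: no cross-shape consistency is enforced.
    idx1 = rng.choice(n1, size=min(n_points, n1), replace=False)
    idx2 = rng.choice(n2, size=min(n_points, n2), replace=False)
    # A new random subset of descriptor channels at every step.
    n_desc = max(1, int(round(desc_frac * p)))
    desc_idx = rng.choice(p, size=n_desc, replace=False)
    return idx1, idx2, desc_idx
```

With 352 SHOT channels this selects 70 descriptors per step. Presumably the sampled rows of the bases and descriptors then replace the full matrices in the projection and penalty computations.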

We observed that this sub-sampling not only improves speed but also increases robustness during optimization. Note also that we do not form large diagonal matrices explicitly, but rather define the multiplicative operators $M$ in objective $E_4$ directly via pointwise products and summation, using contraction between tensors. Finally, we also tested two approaches for converting functional maps to pointwise maps: either the soft-map approach of FMNet [19] or the standard KD-tree method in the spectral domain [26], and we report the results with both methods in the ablation study below.
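The tensor contraction mentioned above can be expressed with `np.einsum`, building every operator $M_j = \Phi^{+}\,\mathrm{Diag}(d_j)\,\Phi$ for a batch of descriptors at once without any $n \times n$ matrix. This NumPy version is only an analogue of the TensorFlow implementation, whose exact form is not given here; the function name is hypothetical.

```python
import numpy as np


def multiplicative_operators(basis, desc):
    """All operators M_j = basis^+ Diag(desc[:, j]) basis at once.

    basis: (n, k), desc: (n, p). Returns an array of shape (p, k, k).
    M[j, a, b] = sum_v pinv[a, v] * desc[v, j] * basis[v, b].
    """
    pinv = np.linalg.pinv(basis)  # (k, n)
    return np.einsum("av,vj,vb->jab", pinv, desc, basis)
```

The sum runs over vertices `v`, so memory stays at $O(pk^2)$ for the output plus the inputs themselves.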

Parameters. Our method has two key parameters: the input descriptors, and the scalar weights $w_i$ in Eq. (5). In all experiments below we used the same SHOT [38] descriptors as in FMNet [19] with the same parameters, which leads to a 352-dimensional vector per point or, equivalently, 352 descriptor functions on each shape.

For the scalar weights $w_i$, we used the same four fixed values for all experiments below (namely, $w_1 = 10^3$, $w_2 = 10^3$, $w_3 = 1$ and $w_4 = 10$), which were obtained by examining the relative penalty values obtained throughout the optimization on a small set of shapes, and setting the weights inversely proportional to those values.
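The weighted objective for one shape pair, and one possible reading of the weighting heuristic, can be sketched as follows. Only the final weights are reported above, so the normalization (giving $E_3$ weight 1) and both helper names are assumptions.

```python
import numpy as np

# Weights w_i of Eq. (5) as reported above.
WEIGHTS = {"E1": 1e3, "E2": 1e3, "E3": 1.0, "E4": 10.0}


def total_energy(penalties, weights=WEIGHTS):
    """Weighted sum of penalty values for one shape pair (one summand of Eq. 5)."""
    return sum(weights[name] * value for name, value in penalties.items())


def inverse_magnitude_weights(history, base="E3"):
    """Hypothetical heuristic: w_i proportional to 1 / mean(observed E_i),
    rescaled so that the `base` penalty gets weight 1.

    history: dict mapping penalty name -> sequence of values observed
    during a short optimization run on a few shapes.
    """
    scale = {name: float(np.mean(vals)) for name, vals in history.items()}
    return {name: scale[base] / s for name, s in scale.items()}
```

The reported weights are powers of ten, so such estimates were presumably rounded.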

| Methods | Mean Geodesic Error (KDTree) | Mean Geodesic Error (Softmap) |
|---|---|---|
| FMNet [19] | 0.018 | 0.025 |
| E1+E2+E3+E4 | 0.027 | 0.044 |
| E3 | 0.073 | 0.073 |
| E1+E2+E3 | 0.079 | 0.081 |
| E1+E3+E4 | 0.082 | 0.077 |
| E1 | 0.083 | 0.111 |
| E2+E3+E4 | 0.087 | 0.079 |
| E1+E2+E4 | 0.138 | 0.126 |
| E2 | 0.152 | 0.135 |
| E4 | 0.252 | 0.330 |
| Ours optimized all | 0.009 | 0.017 |

Table 1: Ablation study of the different penalty terms in our method and comparison with the supervised FMNet approach on the FAUST shape matching benchmark.

6. Results

Ablation study. We first evaluated our approach by comparing it to the baseline FMNet [19] method on the FAUST shape dataset [5], while also evaluating the relative importance of the different penalties in our method.

The FAUST dataset consists of 100 human shapes in different poses with known correspondences between them. In our first evaluation we split this dataset into training and test sets containing 80 and 20 shapes respectively, as done in [19]. We used the training set to train the FMNet architecture using the ground truth correspondences. We used the same set in our unsupervised method to optimize for the non-linear descriptor transformation. We stress that, unlike FMNet, our method is purely unsupervised, and the “training set” was only used for descriptor optimization with the functional map penalties introduced above. We then applied the optimized network to the shapes in the test set and evaluated the average correspondence error obtained by different methods with respect to the ground truth maps. Note that for fairness of comparison, we did not refine the computed maps with any post-processing techniques.

Table 1 summarizes the quality of the computed correspondences between shapes in the test set, using FMNet [19] and our approach with different combinations of penalties, when converting to pointwise maps both with the soft-map approach used in [19] and with nearest neighbor search in the spectral domain using a KD-tree [26]. We can make several observations: first, KD-tree conversion gives, in most cases, better accuracy than the soft-map one; second, the combination of all four penalties significantly outperforms any other subset, and comes close to achieving the accuracy obtained with the supervised FMNet; third, among individual penalties used independently, Laplacian commutativity gives the best result. In the last row of Table 1 we also show the result of our method using all four penalties, while optimizing the neural network on pairs taken from all 100 shapes in the FAUST dataset, and testing on the same subset containing only the last 20. Note that in our case, unlike FMNet, this is reasonable, since we never use ground truth correspondences during optimization. As can be seen, our method, when optimized on all shapes, gives superior performance even compared to FMNet, despite being purely unsupervised. Figure 1 shows a qualitative comparison of correspondences obtained by different methods.

Panels: (a) FMNet, (b) $E_3$ as the only penalty, (c) Ours optimized on all, (d) Ours optimized on subset.

Figure 1: We compare matches on FAUST meshes with 6890 vertices, showing how using a single penalty compares to using all of them, as well as how training on more shapes improves matching.

Datasets. We evaluated our method on the following datasets: the original FAUST dataset [5], containing 100 human shapes in 1-1 correspondence, and two datasets obtained by independently remeshing each shape in the FAUST and SCAPE [5,3] shape collections to approximately 5000 vertices, using the LRVD remeshing method [45]. This algorithm results in a triangle mesh adapted to the structure of each shape, which means that different meshes are no longer in 1-1 correspondence, and indeed can have different numbers of vertices. The resulting remeshed datasets therefore offer significantly more variability in terms of shape structure, including, e.g., point sampling density, compared to the original ones, making them more challenging for existing algorithms. Let us also note that the SCAPE dataset is slightly more challenging, since its shapes are less regular (e.g., there are often reconstruction artefacts on hands and feet) and have fewer features than those in FAUST.

Figure 2: Example pair of shapes from the remeshed FAUST dataset. Note the significant changes in point sampling density in various shape regions.

We stress that although we also evaluated on the original FAUST dataset, we view the remeshed datasets as both more realistic and more faithful indicators of the accuracy and generalization power of different techniques. Figure 2 shows an example of a shape pair from the remeshed FAUST dataset. For reference, we also include illustrations of shapes from these datasets in the supplementary material.

Baselines. We compared our method to several techniques, both supervised and fully automatic. In the former category, we tested the original FMNet approach [19] and the Geodesic Convolutional Neural Networks (GCNN) method of [22], based on local shape parameterization. Both of these techniques assume, as input, ground truth maps between a subset of the training shapes. For supervised methods, we always split the datasets into 80 (resp. 60) shapes for training and 20 (resp. 10) for testing in the FAUST and SCAPE datasets, respectively. Among unsupervised methods, we used the Product Manifold Filter method with the Gaussian kernel [41] (PMF Gauss) and its variant with the Heat kernel [40] (PMF Heat). Note that FMNet has further been compared to, and shown to outperform, a large number of other baseline methods in [19].

Finally, we also evaluated the basic functional map approach, based on directly optimizing the functional maps as outlined in Section 3.1, but using all four of our energies for regularization. This method, which we call “Fmap basic”, can be viewed as a combination of the approaches of [12] and [24], as it incorporates functional map coupling (via energy $E_1$) and descriptor commutativity (via $E_4$). Unlike our technique, however, it does not optimize the descriptor functions, and uses descriptor preservation constraints with the original, noisy descriptors.

| Methods | Mean | 95th Percentile |
|---|---|---|
| FMNet with KDTree [19] | 0.018 | 0.045 |
| FMNet with Softmap [19] | 0.025 | 0.064 |
| Ours optimized on subset | 0.027 | 0.048 |
| Ours optimized all | 0.009 | 0.028 |
| PMF (Gaussian Kernel) [41] | 0.029 | 0.079 |
| PMF (Heat Kernel) [40] | 0.017 | 0.024 |

Table 2: Geodesic errors of different methods obtained on the original FAUST dataset with 6890 vertices.

| Methods | Mean | 95th Percentile |
|---|---|---|
| FMNet [19] | 0.171 | 0.771 |
| Ours optimized on subset | 0.112 | 0.686 |
| Ours optimized all | 0.020 | 0.052 |
| GCNN [22] | 0.051 | 0.194 |
| Fmap basic [12,24] | 0.388 | 0.757 |
| PMF (Gaussian Kernel) [41] | 0.039 | 0.126 |
| PMF (Heat Kernel) [40] | 0.038 | 0.112 |

Table 3: Geodesic errors of different methods obtained on the remeshed FAUST dataset.

For fairness of comparison, we used SHOT descriptors [38] as input to all methods that we tested, both supervised and unsupervised. Moreover, we did not apply any post-processing to the results obtained by any method, except PMF Gauss and PMF Heat, which are, by nature, iterative refinement algorithms. Therefore, the results that we obtained can likely be improved further using existing map refinement techniques.

Evaluation and Results

Tables 2, 3 and 4 summarize the accuracy obtained by different methods on the three datasets. Note that in all cases, our method, when optimized on all shapes, gives the best mean accuracy, and the gap compared to other methods is especially prominent on the remeshed FAUST and remeshed SCAPE datasets. Remarkably, our method outperforms even the supervised learning techniques GCNN [22] and FMNet [19], despite being purely unsupervised.

| Methods | Mean | 95th Percentile |
|---|---|---|
| FMNet [19] | 0.218 | 0.825 |
| Ours optimized on subset | 0.139 | 0.737 |
| Ours optimized all | 0.023 | 0.050 |
| GCNN [22] | 0.074 | 0.395 |
| Fmap basic [12,24] | 0.739 | 1.221 |
| PMF (Gaussian Kernel) [41] | 0.073 | 0.198 |
| PMF (Heat Kernel) [40] | 0.069 | 0.186 |

Table 4: Geodesic errors of different methods obtained on the remeshed SCAPE dataset.

We also plot the error rates of different methods in Figures 4, 6, and 5. Note that on the original FAUST dataset, the curve of PMF Heat shown in Figure 4 starts at 70% perfect correspondences and contains mostly low-error matches. This is greatly facilitated by the consistent sampling in the dataset, an assumption exploited by PMF, which aims to find pointwise bijective maps. However, this method also produces some correspondences with very high error, which leads to slow saturation at 100% and is the reason why the average error for this method, as reported in Table 2, is higher than that of our method optimized on all shapes.
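The summary statistics in Tables 2 to 4 and the cumulative curves in Figures 4 to 6 can be computed from per-point geodesic errors as sketched below. The sketch assumes a precomputed geodesic distance matrix on the target shape and errors that are already normalized (the normalization convention is not specified in this section); the helper names are hypothetical, and the 95th percentile uses NumPy's default linear interpolation.

```python
import numpy as np


def geodesic_errors(pred, gt, geo_dist):
    """Per-point error: geodesic distance on the target shape between the
    predicted and ground-truth images of each source point.

    pred, gt: integer arrays of target-vertex indices; geo_dist: (n, n) matrix.
    """
    return geo_dist[pred, gt]


def error_summary(errors, thresholds):
    """Mean, 95th percentile, and cumulative curve of (normalized) errors.

    curve[t] is the fraction of correspondences with error <= thresholds[t],
    i.e. the y-value of a plot like those in Figures 4 to 6.
    """
    errors = np.asarray(errors, dtype=float)
    curve = np.array([(errors <= t).mean() for t in thresholds])
    return errors.mean(), np.percentile(errors, 95), curve
```

A curve that starts high but saturates slowly, as described for PMF Heat, corresponds to many exact matches plus a tail of large errors, which raises the mean without affecting the first points of the curve.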

Observe that the remeshed datasets are significantly harder for both supervised and unsupervised methods, since the shapes are no longer identically meshed and in 1-1 correspondence. We also observed this difficulty while training the supervised FMNet and GCNN techniques, with very slow error convergence during training. On both of these datasets, our approach achieves the lowest average error, as shown in Tables 3 and 4. Note that on the remeshed FAUST dataset, as shown in Fig. 6, only GCNN [22] produces a similarly large fraction of correspondences with small error. However, this method is supervised, and moreover still results in significantly higher average error than our approach on this dataset, primarily due to strong outliers. On the remeshed SCAPE dataset, summarized in Table 4 and Figure 5, our method leads to the best results across all measures. We find this especially remarkable since our method is unsupervised and no post-processing was applied to the computed correspondences.

Figure 8 shows an example of a pair of shapes and the maps obtained between them using different methods, visualized via texture transfer. Note the continuity and quality of the map obtained using our method.
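Texture transfer visualizes a pointwise map by letting each target vertex copy the texture coordinates of its matched source vertex. Discontinuities in the map then appear as visible seams. The sketch below assumes per-vertex UV coordinates and an edge list for the target mesh. The helpers `transfer_uv` and `max_uv_jump` are hypothetical, and the jump measure is only an illustrative proxy for visual continuity, not a metric used in this paper.

```python
import numpy as np

def transfer_uv(p2p_21, uv1):
    """Copy texture coordinates along a pointwise map.

    p2p_21 : (n2,) int array, matched shape-1 vertex for each vertex of shape 2
    uv1    : (n1, 2) per-vertex texture coordinates on shape 1
    """
    return uv1[p2p_21]

def max_uv_jump(uv2, edges):
    """Largest UV-space distance across any edge of shape 2 (a rough seam indicator)."""
    diffs = uv2[edges[:, 0]] - uv2[edges[:, 1]]
    return np.linalg.norm(diffs, axis=1).max()

uv1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
p2p_21 = np.array([2, 0, 1])
uv2 = transfer_uv(p2p_21, uv1)            # [[0, 1], [0, 0], [1, 0]]
edges = np.array([[0, 1], [1, 2]])
jump = max_uv_jump(uv2, edges)
```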

Runtime. One further advantage of our method is its efficiency, since we do not rely on the computation of geodesic distance matrices and operate entirely in the spectral domain. FMNet [19] uses pairwise geodesic distance matrices to enforce the soft map loss, which requires time and memory both for preprocessing and during training. For comparison, running one epoch with the same batch size takes 1.1 seconds using our method, compared to 18.8 seconds with FMNet, on an NVIDIA Tesla P100 GPU.
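The memory gap between the two approaches can be estimated with simple arithmetic. The sketch assumes float32 storage and a hypothetical basis size of k = 120, which is not given in this section. The value n = 6890 is the vertex count of the original FAUST shapes reported in Figure 4.

```python
# Back-of-the-envelope memory per shape, assuming float32 storage (4 bytes).
BYTES_PER_FLOAT32 = 4

def dense_geodesic_bytes(n_vertices):
    """Dense pairwise geodesic distance matrix: n x n entries."""
    return n_vertices * n_vertices * BYTES_PER_FLOAT32

def spectral_basis_bytes(n_vertices, k):
    """Truncated Laplace-Beltrami basis: k eigenfunction values per vertex."""
    return n_vertices * k * BYTES_PER_FLOAT32

n = 6890  # vertices per shape in the original FAUST dataset (Figure 4)
k = 120   # hypothetical basis size; not stated in this section

print(f"geodesic matrix: {dense_geodesic_bytes(n) / 1e6:.1f} MB")    # 189.9 MB
print(f"spectral basis:  {spectral_basis_bytes(n, k) / 1e6:.1f} MB")  # 3.3 MB
print(f"reported per-epoch speedup: {18.8 / 1.1:.1f}x")              # 17.1x
```

Under these assumptions, a dense geodesic matrix for one shape is about n/k, roughly 57 times, larger than its truncated basis. A method relying on such matrices must precompute and load them for the shapes it trains on.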


[Figure 3 panels: Source, Ground-Truth, FMnet, PMF (heat), PMF (gauss), Ours on subset, Ours all shapes]

Figure 3: Comparison of our method with other methods via texture transfer on shapes from the remeshed SCAPE dataset.

[Plot: FAUST Original, 6890 vertices. Fraction of correspondences (y-axis, 0 to 1) versus geodesic error (x-axis, 0 to 0.1). Curves: FMnet-KDTree, FMnet-Softmap, Ours-Train on subset, Ours-Train on whole set, PMF Gauss, PMF Heat.]

Figure 4: Point-to-point correspondence accuracy plot comparing methods trained and tested on the original FAUST dataset.

[Plot: SCAPE Remeshed, 5000 vertices. Fraction of correspondences (y-axis, 0 to 1) versus geodesic error (x-axis, 0 to 0.1). Curves: FMnet, Ours - Train on subset, Ours - Train on whole set, GCNN, PMF Gauss, PMF Heat.]

Figure 5: Point-to-point correspondence accuracy plot comparing methods trained and tested on the remeshed SCAPE dataset.

[Plot: FAUST Remeshed, 5000 vertices. Fraction of correspondences (y-axis, 0 to 1) versus geodesic error (x-axis, 0 to 0.1). Curves: FMnet, Ours-Train on subset, Ours-Train on whole set, GCNN, PMF Gauss, PMF Heat.]

Figure 6: Point-to-point correspondence accuracy plot comparing methods trained and tested on the remeshed FAUST dataset.

7. Conclusion & Future Work

We have presented an unsupervised learning-based method for computing correspondences between shapes. Key to our approach is a bilevel optimization formulation that optimizes descriptor functions while penalizing violations of global structural properties, such as bijectivity or approximate isometry, of the entire map, which is obtained from the optimized descriptors via the functional maps framework. This allows us to achieve high-quality, globally consistent correspondences without relying on any externally provided ground-truth maps. Remarkably, our approach achieves similar, and in some cases superior, performance to even supervised correspondence techniques.
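The structure of such a bilevel formulation can be illustrated with a small NumPy sketch. The inner step solves for a functional map from descriptor coefficients by least squares. The outer step then evaluates penalties on the resulting maps that require no ground truth. The penalties below follow standard functional-map forms of the properties named above: bijectivity, orthogonality (associated with area preservation), and commutativity with the Laplace-Beltrami operator (associated with isometry). The exact terms, weights, and regularization used in this paper are not stated in this section and are assumptions here, as are the function names. In training, these penalties would be minimized with respect to the descriptor network's parameters by differentiating through the inner solve; this NumPy version only evaluates them.

```python
import numpy as np

def solve_functional_map(A, B):
    """Inner problem: least-squares functional map C with C @ A ~= B.

    A, B: (k, q) spectral coefficients of the same q descriptor functions
    on shape 1 and shape 2 (hypothetical inputs produced by a descriptor network).
    """
    # C A = B is equivalent to A^T C^T = B^T, which lstsq solves column by column.
    Ct, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)
    return Ct.T

def structural_penalties(C12, C21, evals1, evals2):
    """Outer-level penalties on the maps; none of them uses ground-truth correspondences."""
    I = np.eye(C12.shape[0])
    # Bijectivity: mapping 1 -> 2 -> 1 (and 2 -> 1 -> 2) should give the identity.
    bijectivity = np.sum((C12 @ C21 - I) ** 2) + np.sum((C21 @ C12 - I) ** 2)
    # Orthogonality: associated with area-preserving maps.
    orthogonality = np.sum((C12.T @ C12 - I) ** 2) + np.sum((C21.T @ C21 - I) ** 2)
    # Laplacian commutativity C12 L1 = L2 C12, with each Laplacian diagonal
    # (its eigenvalues) in its own eigenbasis. Associated with isometries.
    laplacian = (np.sum((C12 * evals1[None, :] - evals2[:, None] * C12) ** 2)
                 + np.sum((C21 * evals2[None, :] - evals1[:, None] * C21) ** 2))
    return bijectivity, orthogonality, laplacian

# Usage: descriptors on shape 2 are an orthogonal transform of those on shape 1.
rng = np.random.default_rng(0)
k, q = 6, 10
A = rng.standard_normal((k, q))
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
B = Q @ A
C12 = solve_functional_map(A, B)   # recovers Q up to numerical error
C21 = solve_functional_map(B, A)   # recovers Q.T
evals = np.arange(k, dtype=float)
bij, orth, lap = structural_penalties(C12, C21, evals, evals)
```

In this example the bijectivity and orthogonality penalties vanish. The Laplacian term stays positive, however, because a random orthogonal Q does not commute with the diagonal eigenvalue matrix.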

In the future, we plan to incorporate other penalties on functional maps, e.g., those arising from recently proposed kernelization approaches [42], or penalties promoting orientation-preserving maps [29]. Moreover, it might be beneficial to incorporate cycle-consistency constraints [14], going beyond the pairwise map consistency used in our method. Finally, it would be interesting to extend our method to partial shapes, to study its performance for non-isometric shape correspondence, and to apply it to matching other modalities, such as images or point clouds. Our approach opens the door to linking the properties of local descriptors to global map consistency, expressed through a very general functional framework.
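Cycle consistency generalizes the pairwise bijectivity requirement: composing maps around a cycle of shapes should return the identity. The following is a minimal sketch of such a penalty for three shapes, assuming square k x k functional maps where Cij maps spectral coefficients on shape i to those on shape j. It is a simple illustration of the idea, not the formulation of [14], and the function name is hypothetical.

```python
import numpy as np

def cycle_consistency_penalty(C12, C23, C31):
    """Squared Frobenius deviation of the cycle 1 -> 2 -> 3 -> 1 from the identity.

    Cij maps spectral coefficients of functions on shape i to those on shape j,
    so going around the cycle from shape 1 applies C31 @ C23 @ C12.
    """
    k = C12.shape[1]
    return np.sum((C31 @ C23 @ C12 - np.eye(k)) ** 2)

rng = np.random.default_rng(0)
R1, _ = np.linalg.qr(rng.standard_normal((5, 5)))
R2, _ = np.linalg.qr(rng.standard_normal((5, 5)))
consistent = cycle_consistency_penalty(R1, R2, (R2 @ R1).T)   # closes the cycle exactly
perturbed = cycle_consistency_penalty(R1, R2, (R2 @ R1).T + 0.1 * np.eye(5))
```

In the perturbed case the composition equals I + 0.1 R2 R1. Because R2 R1 is orthogonal, the penalty is 0.01 times 5, i.e., 0.05.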


References

[1] M. Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org, 2015.

[2] Y. Aflalo, A. Dubrovina, and R. Kimmel. Spectral generalized multi-dimensional scaling. International Journal of Computer Vision, 118(3):380–392, 2016.

[3] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis. SCAPE: Shape Completion and Animation of People. In ACM Transactions on Graphics (TOG), volume 24, pages 408–416. ACM, 2005.

[4] S. Biasotti, A. Cerri, A. Bronstein, and M. Bronstein. Recent trends, applications, and perspectives in 3d shape similarity assessment. In Computer Graphics Forum, volume 35, pages 87–119, 2016.

[5] F. Bogo, J. Romero, M. Loper, and M. J. Black. FAUST: Dataset and evaluation for 3D mesh registration. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Piscataway, NJ, USA, June 2014. IEEE.

[6] D. Boscaini, J. Masci, S. Melzi, M. M. Bronstein, U. Castellani, and P. Vandergheynst. Learning class-specific descriptors for deformable shapes using localized spectral convolutional networks. In Computer Graphics Forum, volume 34, pages 13–23. Wiley Online Library, 2015.

[7] D. Boscaini, J. Masci, E. Rodola, and M. M. Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. In Proc. NIPS, pages 3189–3197, 2016.

[8] O. Burghard, A. Dieckmann, and R. Klein. Embedding shapes with Green's functions for global shape matching. Computers & Graphics, 68:1–10, 2017.

[9] E. Corman, M. Ovsjanikov, and A. Chambolle. Supervised descriptor learning for non-rigid shape matching. In Proc. ECCV Workshops (NORDIA), 2014.

[10] L. Cosmo, E. Rodola, J. Masci, A. Torsello, and M. M. Bronstein. Matching deformable objects in clutter. In 3D Vision (3DV), 2016 Fourth International Conference on, pages 1–10. IEEE, 2016.

[11] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.

[12] D. Eynard, E. Rodola, K. Glashoff, and M. M. Bronstein. Coupled functional maps. In 3D Vision (3DV), pages 399–407. IEEE, 2016.

[13] D. Ezuz and M. Ben-Chen. Deblurring and denoising of maps between shapes. In Computer Graphics Forum, volume 36, pages 165–174. Wiley Online Library, 2017.

[14] Q. Huang, F. Wang, and L. Guibas. Functional map networks for analyzing and exploring large shape collections. ACM Transactions on Graphics (TOG), 33(4):36, 2014.

[15] R. Huang and M. Ovsjanikov. Adjoint map representation for shape analysis and matching. In Computer Graphics Forum, volume 36, pages 151–163. Wiley Online Library, 2017.

[16] A. Kovnatsky, M. M. Bronstein, X. Bresson, and P. Vandergheynst. Functional correspondence by matrix completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 905–914, 2015.

[17] A. Kovnatsky, M. M. Bronstein, A. M. Bronstein, K. Glashoff, and R. Kimmel. Coupled quasi-harmonic bases. In Computer Graphics Forum, volume 32, pages 439–448, 2013.

[18] A. Kovnatsky, K. Glashoff, and M. M. Bronstein. MADMM: a generic algorithm for non-smooth optimization on manifolds. In Proc. ECCV, pages 680–696. Springer, 2016.

[19] O. Litany, T. Remez, E. Rodolà, A. M. Bronstein, and M. M. Bronstein. Deep functional maps: Structured prediction for dense shape correspondence. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 5660–5668, 2017.

[20] O. Litany, E. Rodolà, A. M. Bronstein, and M. M. Bronstein. Fully spectral partial shape matching. In Computer Graphics Forum, volume 36, pages 247–258. Wiley Online Library, 2017.

[21] R. Litman and A. M. Bronstein. Learning spectral descriptors for deformable shape correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1):171–180, 2014.

[22] J. Masci, D. Boscaini, M. Bronstein, and P. Vandergheynst. Geodesic convolutional neural networks on Riemannian manifolds. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 37–45, 2015.

[23] F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, and M. M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model CNNs. In CVPR, pages 5425–5434. IEEE Computer Society, 2017.

[24] D. Nogneng, S. Melzi, E. Rodolà, U. Castellani, M. Bronstein, and M. Ovsjanikov. Improved functional mappings via product preservation. In Computer Graphics Forum, volume 37, pages 179–190. Wiley Online Library, 2018.

[25] D. Nogneng and M. Ovsjanikov. Informative descriptor preservation via commutativity for shape matching. Computer Graphics Forum, 36(2):259–267, 2017.

[26] M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and L. Guibas. Functional Maps: A Flexible Representation of Maps Between Shapes. ACM Transactions on Graphics (TOG), 31(4):30, 2012.

[27] M. Ovsjanikov, E. Corman, M. Bronstein, E. Rodolà, M. Ben-Chen, L. Guibas, F. Chazal, and A. Bronstein. Computing and processing correspondences with functional maps. In ACM SIGGRAPH 2017 Courses, SIGGRAPH '17, pages 5:1–5:62, 2017.

[28] A. Poulenard, P. Skraba, and M. Ovsjanikov. Topological function optimization for continuous shape matching. In Computer Graphics Forum, volume 37, pages 13–25. Wiley Online Library, 2018.

[29] J. Ren, A. Poulenard, P. Wonka, and M. Ovsjanikov. Continuous and orientation-preserving correspondences via functional maps. ACM Transactions on Graphics (TOG), 37(6), 2018.

[30] E. Rodolà, L. Cosmo, M. M. Bronstein, A. Torsello, and D. Cremers. Partial functional correspondence. In Computer Graphics Forum, volume 36, pages 222–236. Wiley Online Library, 2017.

[31] E. Rodolà, M. Moeller, and D. Cremers. Point-wise map recovery and refinement from functional correspondence. In Proc. Vision, Modeling and Visualization (VMV), 2015.

[32] E. Rodolà, S. Rota Bulo, T. Windheuser, M. Vestner, and D. Cremers. Dense non-rigid shape correspondence using random forests. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4177–4184, 2014.

[33] S. Rosenberg. The Laplacian on a Riemannian manifold: an introduction to analysis on manifolds, volume 31. Cambridge University Press, 1997.

[34] R. Rustamov, M. Ovsjanikov, O. Azencot, M. Ben-Chen, F. Chazal, and L. Guibas. Map-based exploration of intrinsic shape differences and variability. ACM Trans. Graphics, 32(4):72:1–72:12, July 2013.

[35] R. W. Sumner and J. Popović. Deformation transfer for triangle meshes. In ACM Transactions on Graphics (TOG), volume 23, pages 399–405. ACM, 2004.

[36] J. Sun, M. Ovsjanikov, and L. Guibas. A Concise and Provably Informative Multi-Scale Signature Based on Heat Diffusion. In Computer Graphics Forum, volume 28, pages 1383–1392, 2009.

[37] G. K. Tam, Z.-Q. Cheng, Y.-K. Lai, F. C. Langbein, Y. Liu, D. Marshall, R. R. Martin, X.-F. Sun, and P. L. Rosin. Registration of 3d point clouds and meshes: a survey from rigid to nonrigid. IEEE Transactions on Visualization and Computer Graphics, 19(7):1199–1217, 2013.

[38] F. Tombari, S. Salti, and L. Di Stefano. Unique signatures of histograms for local surface description. In International Conference on Computer Vision (ICCV), pages 356–369, 2010.

[39] O. Van Kaick, H. Zhang, G. Hamarneh, and D. Cohen-Or. A survey on shape correspondence. In Computer Graphics Forum, volume 30, pages 1681–1707, 2011.

[40] M. Vestner, Z. Lähner, A. Boyarski, O. Litany, R. Slossberg, T. Remez, E. Rodola, A. Bronstein, M. Bronstein, R. Kimmel, and D. Cremers. Efficient deformable shape correspondence via kernel matching. In Proc. 3DV, 2017.

[41] M. Vestner, R. Litman, E. Rodolà, A. Bronstein, and D. Cremers. Product manifold filter: Non-rigid shape correspondence via kernel density estimation in the product space. In Proc. CVPR, pages 6681–6690, 2017.

[42] L. Wang, A. Gehre, M. M. Bronstein, and J. Solomon. Kernel functional maps. In Computer Graphics Forum, volume 37, pages 27–36. Wiley Online Library, 2018.

[43] Y. Wang, B. Liu, K. Zhou, and Y. Tong. Vector field map representation for near conformal surface correspondence. In Computer Graphics Forum, volume 37, pages 72–83. Wiley Online Library, 2018.

[44] L. Wei, Q. Huang, D. Ceylan, E. Vouga, and H. Li. Dense human body correspondences using convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1544–1553, 2016.

[45] D.-M. Yan, G. Bao, X. Zhang, and P. Wonka. Low-resolution remeshing using the localized restricted voronoi diagram. IEEE Transactions on Visualization and Computer Graphics, 20(10):1418–1427, 2014.


[Figure 7 panels: Source, Ground-Truth, FMnet, PMF (heat), PMF (gauss), Ours on subset, Ours all shapes]

Figure 7: Comparison of our method with other methods via texture transfer on shapes from the remeshed FAUST dataset.

[Figure panels: Source, Ground-Truth, FMnet, PMF (heat), PMF (gauss), Ours on subset, Ours all shapes]