# Hitting and Commute Times in Large Random Neighborhood Graphs

48

## Full text

(1)

### Neighborhood Graphs

Ulrike von Luxburg luxburg@informatik.uni-hamburg.de

Department of Computer Science University of Hamburg

Vogt-Koelln-Str. 30, 22527 Hamburg, Germany

Institute of Mathematics University of Leipzig

Augustusplatz 10, 04109 Leipzig, Germany

Matthias Hein hein@cs.uni-saarland.de

Department of Mathematics and Computer Science Saarland University

Campus E1 1, 66123 Saarbr¨ucken, Germany

Editor:Gabor Lugosi

Abstract

In machine learning, a popular tool to analyze the structure of graphs is the hitting time and the commute distance (resistance distance). For two verticesuandv, the hitting time

Huv is the expected time it takes a random walk to travel from u to v. The commute distance is its symmetrized versionCuv=Huv+Hvu. In our paper we study the behavior of hitting times and commute distances when the numbernof vertices in the graph tends to infinity. We focus on random geometric graphs (ε-graphs, kNN graphs and Gaussian similarity graphs), but our results also extend to graphs with a given expected degree distribution or Erd˝os-R´enyi graphs with planted partitions. We prove that in these graph families, the suitably rescaled hitting timeHuvconverges to 1/dvand the rescaled commute time to 1/du+ 1/dvwhereduanddv denote the degrees of verticesuandv. In these cases, hitting and commute times do not provide information about the structure of the graph, and their use is discouraged in many machine learning applications.

Keywords: commute distance, resistance, random graph, k-nearest neighbor graph, spec-tral gap

1. Introduction

(2)

The commute distance is very popular in many different fields of computer science and be-yond. As examples consider the fields of graph embedding (Guattery, 1998; Saerens et al., 2004; Qiu and Hancock, 2006; Wittmann et al., 2009), graph sparsification (Spielman and Srivastava, 2008), social network analysis (Liben-Nowell and Kleinberg, 2003), proximity search (Sarkar et al., 2008), collaborative filtering (Fouss et al., 2006), clustering (Yen et al., 2005), semi-supervised learning (Zhou and Sch¨olkopf, 2004), dimensionality reduction (Ham et al., 2004), image processing (Qiu and Hancock, 2005), graph labeling (Herbster and Pon-til, 2006; Cesa-Bianchi et al., 2009), theoretical computer science (Aleliunas et al., 1979; Chandra et al., 1989; Avin and Ercal, 2007; Cooper and Frieze, 2003, 2005, 2007, 2011), and various applications in chemometrics and bioinformatics (Klein and Randic, 1993; Ivanciuc, 2000; Fowler, 2002; Roy, 2004; Guillot et al., 2009).

The commute distance has many nice properties, both from a theoretical and a practical point of view. It is a Euclidean distance function and can be computed in closed form. As opposed to the shortest path distance, it takes into account all paths between u and v, not just the shortest one. As a rule of thumb, the more paths connect u with v, the smaller their commute distance becomes. Hence it supposedly satisfies the following, highly desirable property:

Property (F): Vertices in the same “cluster” of the graph have a small commute distance, whereas vertices in different clusters of the graph have a large commute distance to each other.

Consequently, the commute distance is considered a convenient tool to encode the cluster structure of the graph.

In this paper we study how the commute distance behaves when the size of the graph increases. We focus on the case of random geometric graphs (k-nearest neighbor graphs, ε-graphs, and Gaussian similarity graphs). Denote by Huv the expected hitting time and

by Cuv the commute distance between two vertices u and v, by du the degree of vertex u,

by vol(G) the volume of the graph. We show that as the number n of vertices tends to infinity, there exists a scaling term sc such that the hitting times and commute distances in random geometric graphs satisfy

sc·

1

vol(G)Huv− 1 dv

−→0 and sc·

1

vol(G)Cuv−

1

du

+ 1

dv

−→0,

and at the same timesc·du and sc·dv converge to positive constants (precise definitions,

assumptions and statements below). Loosely speaking, the convergence result says that the rescaled commute distance can be approximated by the sum of the inverse rescaled degrees.

(3)

spectral gap in random geometric graphs, and Avin and Ercal (2007), who already studied the growth rates of the commute distance in random geometric graphs. Our main technical contributions are to strengthen the bound provided by Lov´asz (1993), to extend the results by Boyd et al. (2005) and Avin and Ercal (2007) to more general types of geometric graphs such ask-nearest neighbor graphs with general domain and general density, and to develop the flow-based techniques for geometric graphs.

Loosely speaking, the convergence results say that whenever the graph is reasonably large, the degrees are not too small, and the bottleneck is not too extreme, then the commute distance between two vertices can be approximated by the sum of their inverse degrees. These results have the following important consequences for applications.

Negative implication: Hitting and commute times can be misleading. Our approximation result shows that the commute distance does not take into account any global properties of the data in large geometric graphs. It has been observed before that the commute distance sometimes behaves in an undesired way when high-degree vertices are involved (Liben-Nowell and Kleinberg, 2003; Brand, 2005), but our work now gives a complete theoretical description of this phenomenon: the commute distance just considers the local density (the degree of the vertex) at the two vertices, nothing else. The resulting large sample commute distance dist(u, v) = 1/du+ 1/dv is completely meaningless as a distance on a graph. For

example, all data points have the same nearest neighbor (namely, the vertex with the largest degree), the same second-nearest neighbor (the vertex with the second-largest degree), and so on. In particular, one of the main motivations to use the commute distance, Property (F), no longer holds when the graph becomes large enough. Even more disappointingly, computer simulations show that n does not even need to be very large before (F) breaks down. Often, nin the order of 1000 is already enough to make the commute distance close to its approximation expression. This effect is even stronger if the dimensionality of the underlying data space is large. Consequently, even on moderate-sized graphs, the use of the raw commute distance should be discouraged.

Positive implication: Efficient computation of approximate commute distances. In some ap-plications the commute distance is not used as a distance function, but as a tool to encode the connectivity properties of a graph, for example in graph sparsification (Spielman and Srivastava, 2008) or when computing bounds on mixing or cover times (Aleliunas et al., 1979; Chandra et al., 1989; Avin and Ercal, 2007; Cooper and Frieze, 2011) or graph label-ing (Herbster and Pontil, 2006; Cesa-Bianchi et al., 2009). To obtain the commute distance between all points in a graph one has to compute the pseudo-inverse of the graph Lapla-cian matrix, an operation of time complexity O(n3). This is prohibitive in large graphs. To circumvent the matrix inversion, several approximations of the commute distance have been suggested in the literature (Spielman and Srivastava, 2008; Sarkar and Moore, 2007; Brand, 2005). Our results lead to a much simpler and well-justified way of approximating the commute distance on large random geometric graphs.

(4)

Section 5 we show in extensive simulations that our approximation results are relevant for many graphs used in machine learning. Relations to previous work is discussed in Section 2. All proofs are deferred to Sections 6 and 7. Parts of this work is built on our conference paper von Luxburg et al. (2010).

2. Related Work

The resistance distance became popular through the work of Doyle and Snell (1984) and Klein and Randic (1993), and the connection between commute and resistance distance was established by Chandra et al. (1989) and Tetali (1991). By now, resistance and com-mute distances are treated in many text books, for example Chapter IX of Bollobas (1998), Chapter 2 of Lyons and Peres (2010), Chapter 3 of Aldous and Fill (2001), or Section 9.4 of Levin et al. (2008). It is well known that the commute distance is related to the spectrum of the unnormalized and normalized graph Laplacian (Lov´asz, 1993; Xiao and Gutman, 2003). Bounds on resistance distances in terms of the eigengap and the term 1/du+ 1/dv have

already been presented in Lov´asz (1993). We present an improved version of this bound that leads to our convergence results in the spectral approach.

Properties of random geometric graphs have been investigated thoroughly in the literature, see for example the monograph of Penrose (2003). Our work concerns the case where the graph connectivity parameter (εork, respectively) is so large that the graph is connected with high probability (see Penrose, 1997; Brito et al., 1997; Penrose, 1999, 2003; Xue and Kumar, 2004; Balister et al., 2005). We focus on this case because it is most relevant for machine learning. In applications such as clustering, people construct a neighborhood graph based on given similarity scores between objects, and they choose the graph connectivity parameter so large that the graph is well-connected.

Asymptotic growth rates of commute distances and the spectral gap have already been studied for a particular special case: ε-graphs on a sample from the uniform distribution on the unit cube or unit torus in Rd (Avin and Ercal, 2007; Boyd et al., 2005; Cooper and Frieze, 2011). However, the most interesting case for machine learning is the case of kNN graphs or Gaussian graphs (as they are the ones used in practice) on spaces with a non-uniform probability distribution (real data is never uniform). A priori, it is unclear whether the commute distances on such graphs behave as the ones on ε-graphs: there are many situations in which these types of graphs behave very different. For example their graph Laplacians converge to different limit objects (Hein et al., 2007). In our paper we now consider the general situation of commute distances in ε, kNN and Gaussian graphs on a sample from a non-uniform distribution on some subset ofRd.

(5)

(2005); Avin and Ercal (2007); Cooper and Frieze (2011). These three papers consider the special case of an ε-graph on the unit cube / unit torus in Rd, endowed with the uniform distribution. For this case, the authors also discuss cover times and mixing times, and Avin and Ercal (2007) also include bounds on the asymptotic growth rate of resistance distances. We extend these results to the case ofε, kNN and Gaussian graphs with general domain and general probability density. The focus in our paper is somewhat different from the related literature, because we care about the exact limit expressions rather than asymptotic growth rates.

3. General Setup, Definitions and Notation

We consider undirected graphs G= (V, E) that are connected and not bipartite. By nwe denote the number of vertices. The adjacency matrix is denoted by W := (wij)i,j=1,...,n.

In case the graph is weighted, this matrix is also called the weight matrix. All weights are assumed to be non-negative. The minimal and maximal weights in the graph are de-noted by wmin and wmax. By di := Pnj=1wij we denote the degree of vertex vi. The

diagonal matrixD with diagonal entriesd1, . . . , dnis called the degree matrix, the minimal

and maximal degrees are denoted dmin and dmax. The volume of the graph is given as

vol(G) = P

j=1,...,ndj. The unnormalized graph Laplacian is given as L := D−W, the

normalized one as Lsym = D−1/2LD−1/2. Consider the natural random walk on G. Its

transition matrix is given as P =D−1W. It is well-known thatλ is an eigenvalue of Lsym

if and only if 1−λis an eigenvalue of P. By 1 =λ1 ≥λ2 ≥. . .≥λn >−1 we denote the

eigenvalues ofP. The quantity 1−max{λ2,|λn|}is called thespectral gapof P.

Thehitting timeHuvis defined as the expected time it takes a random walk starting in

ver-texu to travel to vertexv (whereHuu= 0 by definition). Thecommute distance(commute

time) between u and v is defined as Cuv := Huv+Hvu. Closely related to the commute

distance is the resistance distance. Here one interprets the graph as an electrical network where the edges represent resistors. The conductance of a resistor is given by the corre-sponding edge weight. The resistance distanceRuv between two vertices u andv is defined

as the effective resistance betweenuandvin the network. It is well known (Chandra et al., 1989) that the resistance distance coincides with the commute distance up to a constant: Cuv= vol(G)Ruv. For background reading on resistance and commute distances see Doyle

and Snell (1984); Klein and Randic (1993); Xiao and Gutman (2003); Fouss et al. (2006).

Recall that for a symmetric, non-invertible matrixAits Moore-Penrose inverse is defined as A†:= (A+U)−1−U whereU is the projection on the eigenspace corresponding to eigenvalue 0. It is well known that commute times can be expressed in terms of the Moore-Penrose inverse L† of the unnormalized graph Laplacian (e.g., Klein and Randic, 1993; Xiao and Gutman, 2003; Fouss et al., 2006):

Cij = vol(G) D

ei−ej, L†(ei−ej) E

,

where ei is the i-th unit vector in Rn. The following representations for commute and

hitting times involving the pseudo-inverseL†symof the normalized graph Laplacian are direct

(6)

Proposition 1 (Closed form expression for hitting and commute times) LetGbe a connected, undirected graph withnvertices. The hitting timesHij,i6=j, can be computed

by

Hij = vol(G) D 1

p

dj

ej , L†sym

1 p

dj

ej−

1

di

ei E

,

and the commute times satisfy

Cij = vol(G) D 1

di

ei−

1

p

dj

ej , L†sym

1 p

dj

ej−

1

di

ei E

.

Our main focus in this paper is the class ofgeometric graphs. For adeterministicgeometric graph we consider a fixed set of points X1, . . . , Xn ∈ Rd. These points form the vertices

v1, . . . , vn of the graph. In the ε-graph we connect two points whenever their Euclidean

distance is less than or equal to ε. In the undirected symmetric k-nearest neighbor graph we connect vi to vj if Xi is among the k nearest neighbors of Xj or vice versa. In the

undirectedmutual k-nearest neighbor graph we connectvi tovj ifXi is among the k

near-est neighbors of Xj and vice versa. Note that by default, the terms ε- and kNN-graph

refer to unweighted graphs in our paper. When we treat weighted graphs, we always make it explicit. For a general similarity graph we build a weight matrix between all points based on a similarity function k :RRd → R≥0, that is we define the weight matrix W

with entries wij = k(Xi, Xj) and consider the fully connected graph with weight matrix

W. The most popular weight function in applications is the Gaussian similarity function wij = exp(−kXi−Xjk2/σ2), where σ >0 is a bandwidth parameter.

While these definitions make sense with any fixed set of vertices, we are most interested in the case of random geometric graphs. We assume that the underlying set of vertices X1, ..., Xn has been drawn i.i.d. according to some probability density p on Rd. Once

the vertices are known, the edges in the graphs are constructed as described above. In the random setting it is convenient to make regularity assumptions in order to be able to control quantities such as the minimal and maximal degrees.

Definition 2 (Valid region) Let p be any density on Rd. We call a connected subset

X ⊂Rd a valid region if the following properties are satisfied:

1. The density on X is bounded away from 0 and infinity, that is for all x∈ X we have that0< pmin≤p(x)≤pmax<∞ for some constants pmin, pmax.

2. X has “bottleneck” larger than some value h > 0: the set {x ∈ X : dist(x, ∂X) > h/2} is connected (here∂X denotes the topological boundary of X).

3. The boundary of X is regular in the following sense. We assume that there exist positive constants α > 0 and ε0 >0 such that if ε < ε0, then for all points x ∈∂X

we have vol(Bε(x) ∩ X) ≥ αvol(Bε(x)) (where vol denotes the Lebesgue volume).

(7)

Sometimes we consider a valid region with respect to two pointss,t. Here we additionally assume that sand t are interior points of X.

In the spectral part of our paper, we always have to make a couple of assumptions that will be summarized by the term general assumptions. They are as follows: First we assume that X := supp(p) is a valid region according to Definition 2. Second, we assume that X does not contain any holes and does not become arbitrarily narrow: there exists a homeomorphism h : X → [0,1]d and constants 0 < Lmin < Lmax <∞ such that for all

x, y∈ X we have

Lminkx−yk ≤ kh(x)−h(y)k ≤ Lmaxkx−yk.

This condition restricts X to be topologically equivalent to the cube. In applications this is not a strong assumption, as the occurrence of “holes” with vanishing probability density is unrealistic due to the presence of noise in the data generating process. More generally we believe that our results can be generalized to other homeomorphism classes, but refrain from doing so as it would substantially increase the amount of technicalities.

In the following we denote the volume of the unit ball inRdbyηd. For readability reasons, we

are going to state our main results using constantsci >0. These constants are independent

of n and the graph connectivity parameter (ε or k or h, respectively) but depend on the dimension, the geometry ofX, and p. The values of all constants are determined explicitly in the proofs. They are not the same in different propositions.

4. Main Results

Our paper comprises two different approaches. In the first approach we analyze the re-sistance distance by flow based arguments. This technique is somewhat restrictive in the sense that it only works for the resistance distance itself (not the hitting times) and we only apply it to random geometric graphs. The advantage is that in this setting we obtain good convergence conditions and rates. The second approach is based on spectral arguments and is more general. It works for various kinds of graphs and can treat hitting times as well. This comes at the price of slightly stronger assumptions and worse convergence rates.

4.1 Results Based on Flow Arguments

Theorem 3 (Commute distance on ε-graphs) Let X be a valid region with bottleneck h and minimal density pmin. For ε ≤ h, consider an unweighted ε-graph built from the

sequence X1, . . . , Xn that has been drawn i.i.d. from the density p. Fix i and j. Assume

that Xi and Xj have distance at least h from the boundary of X, and that the distance

between Xi and Xj is at least 8ε. Then there exist constants c1, . . . , c7 >0 (depending on

(8)

c3exp(−c4nεd)/εd the commute distance on the ε-graph satisfies

nεd

vol(G)Cij−

nεd di

+nε

d

dj

≤   

 

c5/nεd if d >3

c6·log(1/ε)/nε3 if d= 3

c7/nε3 if d= 2

The probability converges to 1 if n→ ∞ and nεd/log(n)→ ∞. The right hand side of the deviation bound converges to 0 as n→ ∞, if

  

 

nεd→ ∞ if d >3

nε3/log(1)→ ∞ if d= 3

nε3 =nεd+1 → ∞ if d= 2.

Under these conditions, if the density p is continuous and ifε→0, then

nεd

vol(G)Cij −→ 1 ηdp(Xi)

+ 1

ηdp(Xj)

a.s.

Theorem 4 (Commute distance on kNN-graphs) Let X be a valid region with bot-tleneck h and density bounds pmin and pmax. Consider an unweighted kNN-graph (either

symmetric or mutual) such that (k/n)1/d/(2pmax) ≤h, built from the sequence X1, . . . , Xn

that has been drawn i.i.d. from the density p.

Fix i andj. Assume that Xi and Xj have distance at least h from the boundary ofX, and

that the distance between Xi andXj is at least 4(k/n)1/d/pmax. Then there exist constants

c1, . . . , c5>0such that with probability at least 1−c1nexp(−c2k) the commute distance on

both the symmetric and the mutual kNN-graph satisfies

k

vol(G)Cij−

k di

+ k

dj

≤   

 

c4/k if d >3

c5·log(n/k)/k if d= 3

c6n1/2/k3/2 if d= 2

The probability converges to 1 if n→ ∞ and k/log(n)→ ∞. In case d >3, the right hand side of the deviation bound converges to 0 if k→ ∞ (and under slightly worse conditions in cases d= 3 and d = 2). Under these conditions, if the density p is continuous and if additionallyk/n→0, then vol(kG)Cij →2 almost surely.

Let us make a couple of technical remarks about these theorems.

To achieve the convergence of the commute distance we have to rescale it appropriately (for example, in the ε-graph we scale by a factor ofnεd). Our rescaling is exactly chosen such that the limit expressions are finite, positive values. Scaling by any other factor in terms of n,εorkeither leads to divergence or to convergence to zero.

(9)

anyway, see for example Penrose (1997, 1999); Xue and Kumar (2004); Balister et al. (2005). In dimensions 3 and 2, our rates are a bit weaker. For example, in dimension 2 we need nε3 → ∞instead of nε2 → ∞. On the one hand we are not too surprised to get systematic differences between the lowest few dimensions. The same happens in many situations, just consider the example of Polya’s theorem about the recurrence/ transience of random walks on grids. On the other hand, these differences might as well be an artifact of our proof methods (and we suspect so at least for the case d= 3; but even though we tried, we did not get rid of the log factor in this case). It is a matter of future work to clarify this.

The valid region X has been introduced for technical reasons. We need to operate in such a region in order to be able to control the behavior of the graph, e.g. the minimal and maximal degrees. The assumptions on X are the standard assumptions used regularly in the random geometric graph literature. In our setting, we have the freedom of choosing

X ⊂ Rd as we want. In order to obtain the tightest bounds one should aim for a valid X

that has a wide bottleneckh and a high minimal density pmin. In general this freedom of

choosing X shows that if two points are in the same high-density region of the space, the convergence of the commute distance is fast, while it gets slower if the two points are in different regions of high density separated by a bottleneck.

We stated the theorems above for a fixed pairi, j. However, they also hold uniformly over all pairs i, j that satisfy the conditions in the theorem (with exactly the same statement). The reason is that the main probabilistic quantities that enter the proofs are bound on the minimal and maximal degrees, which of course hold uniformly.

4.2 Results Based on Spectral Arguments

The representation of the hitting and commute times in terms of the Moore-Penrose inverse of the normalized graph Laplacian (Proposition 1) can be used to derive the following key proposition that is the basis for all further results in this section.

Proposition 5 (Absolute and relative bounds in any fixed graph) Let G be a fi-nite, connected, undirected, possibly weighted graph that is not bipartite.

1. For i6=j

1

vol(G)Hij − 1 dj

≤ 2

1 1−λ2

+ 1

wmax

d2min.

2. For i6=j

1

vol(G)Cij−

1 di

+ 1

dj

≤ wmax

dmin

1 1−λ2

+ 2

1 di

+ 1

dj

≤ 2

1 1−λ2

+ 2

wmax

(10)

We would like to point out that even though the bound in Part 2 of the proposition is reminiscent to statements in the literature, it is much tighter. Consider the following formula from Lov´asz (1993)

1 2

1 di

+ 1

dj

≤ 1

vol(G)Cij ≤ 1 1−λ2

1 di

+ 1

dj

that can easily be rearranged to the following bound:

1

vol(G)Cij −

1 di

+ 1

dj

≤ 1

1−λ2

2 dmin

. (2)

The major difference between our bound (1) and Lovasz’ bound (2) is that while the latter has the term dmin in the denominator, our bound has the term d2min in the denominator.

This makes all of a difference: in the graphs under considerations our bound converges to 0 whereas Lovasz’ bound diverges. In particular, our convergence results are not a trivial consequence of Lov´asz (1993).

4.2.1 Application to Unweighted Random Geometric Graphs

In the following we are going to apply Proposition 5 to various random geometric graphs. Next to some standard results about the degrees and number of edges in random geometric graphs, the main ingredients are the following bounds on the spectral gap in random geo-metric graphs. These bounds are of independent interest because the spectral gap governs many important properties and processes on graphs.

Theorem 6 (Spectral gap of the ε-graph) Suppose that the general assumptions hold. Then there exist constantsc1, . . . , c6 >0such that with probability at least1−c1nexp(−c2nεd)−

c3exp(−c4nεd)/εd,

1−λ2 ≥ c5·ε2 and 1− |λn| ≥ c6·εd+1/n.

If nεd/logn→ ∞, then this probability converges to 1.

Theorem 7 (Spectral gap of the kNN-graph) Suppose that the general assumptions hold. Then for both the symmetric and the mutual kNN-graph there exist constantsc1, . . . , c4 >

0 such that with probability at least 1−c1nexp(−c2k),

1−λ2 ≥ c3·(k/n)2/d and 1− |λn| ≥ c4·k2/d/n(d+2)/d.

If k/logn→ ∞, then the probability converges to 1.

(11)

Corollary 8 (Hitting and commute times on ε-graphs) Assume that the general as-sumptions hold. Consider an unweighted ε-graph built from the sequence X1, . . . , Xn drawn

i.i.d. from the densityp. Then there exist constantsc1, . . . , c5 >0such that with probability

at least 1−c1nexp(−c2nεd)−c3exp(−c4nεd)/εd, we have uniformly for alli6=j that

nεd

vol(G)Hij − nεd

dj

≤ c5

nεd+2.

If the densityp is continuous and n→ ∞, ε→0 and nεd+2 → ∞, then nεd

vol(G)Hij −→ 1 ηd·p(Xj)

almost surely.

For the commute times, the analogous results hold due to Cij =Hij+Hji.

Corollary 9 (Hitting and commute times on kNN-graphs) Assume that the gen-eral assumptions hold. Consider an unweighted kNN-graph built from the sequence X1, . . . , Xn drawn i.i.d. from the density p. Then for both the symmetric and

mu-tual kNN-graph there exist constants c1, c2, c3 > 0 such that with probability at least

1−c1·n·exp(−kc2), we have uniformly for all i6=j that

k

vol(G)Hij− k dj

≤ c3·

n2/d k1+2/d.

If the densityp is continuous and n→ ∞, k/n→0 and k k/n2/d → ∞, then

k

vol(G)Hij −→1 almost surely.

For the commute times, the analogous results hold due to Cij =Hij+Hji.

Note that the density shows up in the limit for the ε-graph, but not for the kNN graph. The explanation is that in the former, the density is encoded in the degrees of the graph, while in the latter it is only encoded in thek-nearest neighbor distance, but not the degrees themselves. As a rule of thumb, it is possible to convert the last two corollaries into each other by substituting εby (k/(np(x)ηd))1/d or vice versa.

4.2.2 Application to Weighted Graphs

In several applications, ε-graphs or kNN graphs are endowed with edge weights. For example, in the field of machine learning it is common to use Gaussian weights wij =

exp(−kXi−Xjk2/σ2), whereσ >0 is a bandwidth parameter. We can use standard

spec-tral results to prove approximation theorems in such cases.

(12)

lower bounded by some constants wmin, wmax, that is 0 < wmin ≤ wij ≤ wmax for all i, j.

Then, uniformly for all i, j∈ {1, ..., n}, i6=j,

n

vol(G)Hij − n dj

≤ 4n

wmax

wmin

wmax

d2min ≤ 4 w2max w3min

1 n.

For example, this result can be applied directly to a Gaussian similarity graph (for fixed bandwidth σ).

The next theorem treats the case of Gaussian similarity graphs with adapted bandwidthσ. The technique we use to prove this theorem is rather general. Using the Rayleigh principle, we reduce the case of the fully connected Gaussian graph to a truncated graph where edges beyond a certain length are removed. Bounds for this truncated graph, in turn, can be reduced to bounds of the unweighted ε-graph. With this technique it is possible to treat very general classes of graphs.

Theorem 11 (Results on Gaussian graphs with adapted bandwidth) Let X ⊆Rd be a compact set and p a continuous, strictly positive density on X. Consider a fully con-nected, weighted similarity graph built from the pointsX1, . . . , Xndrawn i.i.d. from density

p. As weight function use the Gaussian similarity functionkσ(x, y) = 1 (2πσ2)d2

exp−kx2σy2k2

.

If the densityp is continuous and n→ ∞, σ→0 andnσd+2/log(n)→ ∞, then n

vol(G)Cij −→ 1 p(Xi)

+ 1

p(Xj)

almost surely.

Note that in this theorem, we introduced the scaling factor 1/σdalready in the definition of the Gaussian similarity function to obtain the correct density estimate p(Xj) in the limit.

For this reason, the resistance results are rescaled with factorn instead ofnσd.

4.2.3 Application to Random Graphs with Given Expected Degrees and Erd˝os-R´enyi Graphs

Consider the general random graph model where the edge between verticesiand j is cho-sen independently with a certain probability pij that is allowed to depend on i and j.

This model contains popular random graph models such as the Erd˝os-R´enyi random graph, planted partition graphs, and random graphs with given expected degrees. For this class of random graphs, bounds on the spectral gap have been proved by Chung and Radcliffe (2011). These bounds can directly be applied to derive bounds on the resistance distances. It is not surprising to see that hitting times are meaningless, because these graphs are ex-pander graphs and the random walk mixes fast. The model of Erd˝os-R´enyi graphs with planted partitions is more interesting because it gives insight to the question how strongly clustered the graph can be before our results break down.

Theorem 12 (Chung and Radcliffe, 2011) Let G be a random graph where edges be-tween vertices i and j are put independently with probabilities pij. Consider the

(13)

ij = E(Aij) = pij and D = E(D). Let dmin be the minimal

expected degree. Denote the eigenvalues ofLsym by µ, the ones ofLsym by µ. Chooseε >0. Then there exists a constant k=k(ε) such that if dmin > klog(n), then with probability at

least 1−ε,

∀j= 1, ..., n: |µj−µj| ≤2 s

3 log(4n/ε) dmin

.

Random graphs with given expected degrees. For a graph ofnvertices we havenparameters ¯

d1, ...,d¯n >0. For each pair of vertices vi and vj, we independently place an edge between

these two vertices with probability ¯did¯j/Pnk=1d¯k. It is easy to see that in this model, vertex

vi has expected degree ¯di (cf. Section 5.3. in Chung and Lu, 2006 for background reading).

Corollary 13 (Random graphs with given expected degrees) Consider any se-quence of random graphs with expected degrees such that dmin = ω(logn). Then the

commute distances satisfy for all i6=j,

1

vol(G)Cij−

1 di

+ 1

dj

1 di

+ 1

dj

=O

1 log(2n)

−→0, almost surely.

Planted partition graphs. Assume that the n vertices are split into two “clusters” of equal size. We put an edge between two vertices u and v with probability pwithin if they are in

the same cluster and with probabilitypbetween< pwithin if they are in different clusters. For

simplicity we allow self-loops.

Corollary 14 (Random graph with planted partitions) Consider an Erd˝os-R´enyi graph with planted bisection. Assume that pwithin = ω(log(n)/n) and pbetween such that

npbetween→ ∞ (arbitrarily slow). Then, for all vertices u, v in the graph

1

n·Hij −1

=O

1

n pbetween

→0 in probability.

This result is a nice example to show that even though there is a strong cluster structure in the graph, hitting times and commute distances cannot see this cluster structure any more, once the graph gets too large. Note that the corollary even holds if npbetween grows much

slower than npwithin. That is, the larger our graph, the more pronounced is the cluster

structure. Nevertheless, the commute distance converges to a trivial result. On the other hand, we also see that the speed of convergence isO(npbetween), that is, ifpbetween=g(n)/n

(14)

5. Experiments

In this section we examine the convergence behavior of the commute distance in practice. We ran a large number of simulations, both on artificial and real world data sets, in order to evaluate whether the rescaled commute distance in a graph is close to its predicted limit expression 1/du+ 1/dv or not.We conducted simulations for artificial graphs (random

geometric graphs, planted partition graphs, preferential attachment graphs) and real world data sets of various types and sizes (social networks, biological networks, traffic networks; up to 1.6 million vertices and 22 million edges). The general setup is as follows. Given a graph, we compute the pairwise resistance distance Rij between all points. The relative

deviation between resistance distance and predicted result is then given by

RelDev(i, j) := |Rij −1/di−1/dj| Rij

.

Note that we report relative rather than absolute deviations because this is more meaning-ful if Rij is strongly fluctuating on the graph. It also allows to compare the behavior of

different graphs with each other. In all figures, we then report the maximum, mean and median relative deviations. In small and moderate sized graphs, these operations are taken over all pairs of points. In some of the larger graphs, computing all pairwise distances is prohibitive. In these cases, we compute mean, median and maximum based on a random subsample of vertices (see below for details).

The bottom line of our experiments is that in nearly all graphs, the deviations are small. In particular, this also holds for many moderate sized graphs, even though our results are statements about n→ ∞. This shows that the limit results for the commute distance are indeed relevant for practice.

5.1 Random Geometric Graphs

We start with the class of random geometric graphs, which is very important for machine learning. We use a mixture of two Gaussian distributions on Rd. The first two dimen-sions contain a two-dimensional mixture of two Gaussians with varying separation (centers (−sep/2,0) and (+sep/2,0), covariance matrix 0.2·Id, mixing weights 0.5 for both Gaus-sians). The remaining d−2 dimensions contain Gaussian noise with variance 0.2 as well. From this distribution we draw nsample points. Based on this sample, we either compute the unweighted symmetric kNN graph, the unweighted ε-graphs or the Gaussian similarity graph. In order to be able to compare the results between these three types of graphs we match the parameters of the different graphs: given some value k for the kNN-graph we choose the values of ε for the ε-graph and σ for the Gaussian graph as the maximal k-nearest neighbor distance in the data set.

(15)

200 500 1000 2000 −3.5

−3 −2.5 −2 −1.5 −1 −0.5

Influence of Sample Size Mix. of Gauss., dim=10, k= n/10, sep=1

log

10

(rel deviation)

n

Gaussian epsilon knn

20 50 100 200

−3.5 −3 −2.5 −2 −1.5 −1 −0.5 0

Influence of Sparsity Mix. of Gauss., n=2000, dim=2, sep=1

log

10

(rel deviation)

connectivity parameter k

Gaussian epsilon knn

1 2 3

−3.5 −3 −2.5 −2 −1.5 −1 −0.5 0

separation

log

10

(rel deviation)

Influence of Clusteredness Mix. of Gauss., n=2000, dim=10, k=100

Gaussian epsilon knn

2 5 10 15

−3.5 −3 −2.5 −2 −1.5 −1 −0.5

Influence of Dimension Mix. of Gauss., n=2000, k=50,sep=1

log

10

(rel deviation)

dim

Gaussian epsilon knn

Figure 1: Deviations in random geometric graphs. Solid lines show the maximum relative deviation, dashed lines the mean relative deviation. See text for more details.

increasing dimension, as predicted by our bounds. The intuitive explanation is that in higher dimensions, geometric graphs mix faster as there exist more “shortcuts” between the two sides of the point cloud. For a similar reason, the deviation bounds also decrease with increasing connectivity in the graph. All in all, the deviations are very small even though our sample sizes in these experiments are modest.

5.2 Planted Partition Graphs

Next we consider graphs according to the planted partition model. We modeled a graph with n vertices and two equal sized clusters with connectivity parameters pwithin and pbetween.

(16)

200 500 1000 2000 −2

−1.9 −1.8 −1.7 −1.6 −1.5

Influence of Sample Size

, Planted partition, pwithin = 0.10, pbetween=0.02

log

10

(rel deviation)

n

0.01 0.02 0.05 0.1

−1.8 −1.6 −1.4 −1.2 −1 −0.8 −0.6 −0.4

Influence of Clusteredness Planted partition, n=1000, pbetween=0.01

log

10

(rel deviation)

pwithin

Figure 2: Deviations in planted partition graphs. Solid lines show the maximum relative deviation, dashed lines the mean relative deviation. See text for more details.

5.3 Preferential Attachment Graphs

An important class of random graphs is the preferential attachment model (Barab´asi and Albert, 1999), because it can be used to model graphs with power law behavior. Our current convergence proofs cannot be carried over to preferential attachment graphs: the minimal degree in preferential attachment graphs is constant, so our proofs break down. However, our simulation results show that approximating commute distances by the limit expression 1/du+ 1/dv gives accurate approximations as well. This finding indicates that our

conver-gence results seem to hold even more generally than our theoretical findings suggest.

(17)

0 50 100 150 200 250 0

100 200 300 400 500

1 5 10 20 50 100 200 −3

−2.5 −2 −1.5 −1 −0.5 0

log

10

(rel deviation)

Influence of sparsity Preferential attachment, n = 1000

max

max 3 removed mean

Figure 3: Relative deviations for a preferential attachment graph. First two figures: illus-tration of such a graph in terms of histogram of degrees and heat plot of the adjacency matrix. Third figure: Relative deviations. As before the solid line shows the maximum relative deviation and the dashed line the mean relative de-viation. The line “max 3 removed” refers to the maximum deviation when we exclude the three highest-degree vertices from computing the maximum. See text for details.

approximated by the expression 1/du+ 1/dv. In the dense regime, the same is true unless

u orv is among the very top degree vertices.

5.4 Real World Data

(18)

0 1 2 3 −5

−4 −3 −2 −1 0

log 10(k)

log

10

(rel deviation)

USPS data set

Gaussian epsilon knn

Figure 4: Relative deviations on different similarity graphs on the USPS data. As before the solid line shows the maximum relative deviation and the dashed line the mean relative deviation. See text for details.

Type of Number Number mean median max

Name Network Vertices Edges rel dev rel dev rel dev

mousegene2 gene network 42923 28921954 0.12 0.01 0.70

jazz1 social network 198 5484 0.09 0.06 0.51

soc-Slashdot09022 social network 82168 1008460 0.10 0.06 0.33

cit-HepPh2 citation graph 34401 841568 0.09 0.07 0.25

soc-sign-epinions2 social network 119130 1408534 0.14 0.09 0.53

pdb1HYS2 protein databank 36417 4308348 0.12 0.10 0.28

as-Skitter2 internet topology 1694616 22188418 0.16 0.12 0.54

celegans1 neural network 453 4065 0.18 0.12 0.77

email1 email traffic 1133 10903 0.15 0.13 0.77

Amazon05052 co-purchase 410236 4878874 0.28 0.24 0.80

CoPapersCiteseer2 coauthor graph 434102 32073440 0.33 0.30 0.76

PGP1 user network 10680 48632 0.54 0.53 0.97

Figure 5: Relative Error of the approximation in real world networks. Downloaded from: 1, http://deim.urv.cat/~aarenas/data/welcome.htmand 2,http://www.cise. ufl.edu/research/sparse/matrices/. See text for details.

(19)

based on this subset. Figure 5 shows the mean, median and maximum relative commute distances in all these networks. We can see that in many of these data sets, the deviations are reasonable small. Even though they are not as small as in the artificial data sets, they are small enough to acknowledge that our approximation results tend to hold in real world graphs.

6. Proofs for the Flow-Based Approach

For notational convenience, in this section we work with the resistance distance Ruv =

Cuv/vol(G) instead of the commute distance Cuv, then we do not have to carry the factor

1/vol(G) everywhere.

6.1 Lower Bound

It is easy to prove that the resistance distance between two points is lower bounded by the sum of the inverse degrees.

Proposition 15 (Lower bound) Let G be a weighted, undirected, connected graph and consider two vertices sand t, s6=t. Assume that G remains connected if we remove sand t. Then the effective resistance betweens and tis bounded by

Rst ≥

Qst

1 +wstQst

,

where Qst = 1/(ds−wst) + 1/(dt−wst). Note that if sand t are not connected by a direct

edge (that is, wst = 0), then the right hand side simplifies to1/ds+ 1/dt.

Proof. The proof is based on Rayleigh’s monotonicity principle that states that increasing edge weights (conductances) in the graph can never increase the effective resistance between two vertices (cf. Corollary 7 in Section IX.2 of Bollobas, 1998). Given our original graphG, we build a new graphG0 by setting the weight of all edges to infinity, except the edges that are adjacent to s or t (setting the weight of an edge to infinity means that this edge has infinite conductance and no resistance any more). This can also be interpreted as taking all vertices except sand tand merging them to one super-nodea. Now our graph G0 consists of three vertices s, a, twith several parallel edges fromstoa, several parallel edges froma tot, and potentially the original edge between s and t (if it existed in G). Exploiting the laws in electrical networks (resistances add along edges in series, conductances add along edges in parallel; see Section 2.3 in Lyons and Peres (2010) for detailed instructions and examples) leads to the desired result.

6.2 Upper Bound

(20)

Theorem 16 (Resistance in terms of flows, cf. Bollobas, 1998) Let G = (V, E) be a weighted graph with edge weights we (e ∈ E). The effective resistance Rst between two

fixed vertices sand t can be expressed as

Rst = inf (

X

e∈E

u2e we

u= (ue)e∈E unit flow from sto t )

.

Note that evaluating the formula in the above theorem for any fixed flow leads to an upper bound on the effective resistance. The key to obtaining a tight bound is to distribute the flow as widely and uniformly over the graph as possible.

For the case of geometric graphs we are going to use a grid on the underlying space to construct an efficient flow between two vertices. Let X1, ..., Xn be a fixed set of points in

Rd and consider a geometric graph G with vertices X

1, ..., Xn. Fix any two of them, say

s:=X1 and t:=X2. Let X ⊂Rd be a connected set that contains bothsand t. Consider

a regular grid with grid width gon X. We say that grid cells are neighbors of each other if they touch each other in at least one edge.

Definition 17 (Valid grid) We call the gridvalid if the following properties are satisfied:

1. The grid width is not too small: Each cell of the grid contains at least one of the points X1, ..., Xn.

2. The grid width g is not too large: Points in the same or neighboring cells of the grid are always connected in the graph G.

3. Relation between grid width and geometry of X: Define the bottleneck h of the region

X as the largest u such that the set{x∈ X

dist(x, ∂X)> u/2} is connected.

We require that √d g≤h (a cube of side length g should fit in the bottleneck).

Under the assumption that a valid grid exists, we can prove the following general proposition that gives an upper bound on the resistance distance between vertices in a fixed geometric graph. Note that proving the existence of the valid grid will be an important part in the proofs of Theorems 3 and 4.

Proposition 18 (Resistance on a fixed geometric graph) Consider a fixed set of points X1, ..., Xn in some connected region X ⊂ Rd and a geometric graph on X1, ..., Xn.

Assume that X has bottleneck not smaller than h (where the bottleneck is defined as in the definition of a valid grid). Denote s=X1 and t =X2. Assume that s and t can be

con-nected by a straight line that stays inside X and has distance at least h/2 to ∂X. Denote the distance between sand t by d(s, t). Let g be the width of a valid grid on X and assume that d(s, t)>4√d g. By Nmin denote the minimal number of points in each grid cell, and

define aas

(21)

Assume that points that are connected in the graph are at mostQ grid cells apart from each other (for example, two points in the two grey cells in Figure 6b are 5 cells apart from each other). Then the effective resistance between sand t can be bounded as follows:

In case d >3 : Rst≤

1 ds

+ 1

dt

+

1 ds

+ 1

dt

1 Nmin

+ 1

Nmin2

6 + d(s, t)

g(2a+ 1)3 + 2Q

In case d= 3 : Rst≤

1 ds

+ 1

dt

+

1 ds

+ 1

dt

1 Nmin

+ 1

N2 min

4 log(a) + 8 + d(s, t)

g(2a+ 1)2 + 2Q

In case d= 2 : Rst≤

1 ds

+ 1

dt

+

1 ds

+ 1

dt

1 Nmin

+ 1

N2 min

4a+ 2 + d(s, t)

g(2a+ 1)+ 2Q

Proof. The general idea of the proof is to construct a flow from s tot with the help of the underlying grid. On a high level, the construction of the proof is not so difficult, but the details are lengthy and a bit tedious. The rest of this section is devoted to it.

Construction of the flow — overview. Without loss of generality we assume that there exists a straight line connecting sand twhich is along the first dimension of the space. By C(s) we denote the grid cell in which ssits.

Step 0. We start a unit flow in vertex s.

Step 1. We make a step to all neighbors Neigh(s) of s and distribute the flow uniformly over all edges. That is, we traverse ds edges and send flow 1/ds over each edge (see

Figure 6a). This is the crucial step which, ultimately, leads to the desired limit result.

Step 2. Some of the flow now sits insideC(s), but some of it might sit outside of C(s). In this step, we bring back all flow toC(s) in order to control it later on (see Figure 6b).

Step 3. We now distribute the flow from C(s) to a larger region, namely to a hypercube H(s) of side lengthhthat is perpendicular to the linear path fromstotand centered atC(s) (see the hypercubes in Figure 6c). This can be achieved in several substeps that will be defined below.

Step 4. We now traverse from H(s) to an analogous hypercube H(t) located at t using parallel paths, see Figure 6c.

Step 5. From the hypercube H(t) we send the flow to the neighborhood Neigh(t) (this is the “reverse” of steps 2 and 3).

Step 6. From Neigh(t) we finally send the flow to the destination t(“reverse” of step 1).

(22)

(a) Step 1.

Distribut-ing the flow from s

(black dot) to all its neighbors (grey dots).

C(s) = C(6) C(p) = C(1)

C(3) C(2)

(b) Step 2. We bring

back all flow from p

to C(s). Also shown in the figure is the hy-percube to which the flow will be expanded in Step 3.

s t

(c) Steps 3 and 4 of the flow construction:

distribute the flow from C(s) to a

“hyper-cube” H(s), then transmit it to a similar

hypercubeH(t) and guide it toC(t).

Figure 6: The flow construction — overview.

Layer 1

Layer 2 C(s)

(a) Definition of lay-ers.

(b) Before Step 3A starts, all flow is uni-formly distributed in

Layer i − 1 (dark

area).

(c) Step 3A then dis-tributes the flow from

Layeri−1 to the

(d) After Step 3A: all

flow is in Layeri, but

not yet uniformly dis-tributed

(e) Step 3B

redis-tributes the flow in

Layeri.

(f) After Step 3B, the flow is uniformly

dis-tributed in Layeri.

(23)

bound on the resistance. We start with the general cased >3. We will discuss the special casesd= 2 and d= 3 below.

In the computations below, by the “contribution of a step” we mean the part of the sum in Theorem 16 that goes over the edges considered in the current step.

to flow 1/ds overds edges. According to the formula in Theorem 16 this contributes

r1 =ds·

1 d2

s

= 1

ds

to the overall resistanceRst.

Step 2: After Step 1, the flow sits on all neighbors of s, and these neighbors are not nec-essarily all contained in C(s). To proceed we want to re-concentrate all flow in C(s). For each neighborp ofs, we thus carry the flow along a Hamming path of cells from pback to C(s), see Figure 6b for an illustration.

To compute an upper bound for Step 2 we exploit that each neighborp ofshas to traverse at mostQcells to reach C(s) (recall the definition of Qfrom the proposition). Let us fixp. After Step 1, we have flow of size 1/ds inp. We now move this flow from p to all points in

the neighboring cellC(2) (cf. Figure 6b). For this we can use at leastNmin edges. Thus we

send flow of size 1/ds overNminedges, that is each edge receives flow 1/(dsNmin). Summing

the flow fromC(p) to C(2), for all points p, gives

dsNmin

1 dsNmin

2

= 1

dsNmin

,

Then we transport the flow from C(2) along to C(s). Between each two cells on the way we can use Nmin2 edges. Note, however, that we need to take into account that some of these edges might be used several times (for different pointsp). In the worst case, C(2) is the same for all pointsp, in which case we send the whole unit flow over these edges. This amounts to flow of size 1/(Nmin2 ) over (Q−1)Nmin2 edges, that is a contribution of

Q−1 Nmin2 . Altogether we obtain

r2≤

1 dsNmin

+ Q

Nmin2 .

(24)

is, as long asd >3) that is perpendicular to the line that connectss and t(see Figure 6c, where the case of d = 3 and a 2-dimensional “hypercube” are shown). To distribute the flow to this cube we divide it into layers (see Figure 7a). Layer 0 consists of the cell C(s) itself, the first layer consists of all cells adjacent to C(s), and so on. Each side of Layer i consists of

li= (2i+ 1)

cells. For the 3-dimensional cube, the numberzi of grid cells in Layer i,i≥1, is given as

zi= 6·(2i−1)2 | {z } interior cells of the faces

+ 12·(2i−1)

| {z }

cells along the edges (excluding corners)

+ 8

|{z} corner cells

= 24i2+ 2.

All in all we consider

a=

j

h/(2g√d−1)

k

≤jh/(2(g−1)√d−1)

k

layers, so that the final layer has diameter just a bit smaller than the bottleneckh. We now distribute the flow stepwise through all layers, starting with unit flow in Layer 0. To send the flow from Layeri−1 to Layeriwe use two phases, see Figure 7 for details. In the “ex-pansion phase” 3A(i) we transmit the flow from Layeri−1 to all adjacent cells in Layer i. In the “redistribution phase” 3B(i) we then redistribute the flow in Layerito achieve that it is uniformly distributed in Layeri. In all phases, the aim is to use as many edges as possible.

Expansion phase 3A(i). We can lower bound the number of edges between Layeri−1 and Layeri by zi−1Nmin2 : each of thezi−1 cells in Layeri−1 is adjacent to at least one of the

cells in Layer i, and each cell contains at least Nmin points. Consequently, we can upper

bound the contribution of the edges in the expansion phase 3A(i) to the resistance by

r3A(i)≤zi−1Nmin2 ·

1 zi−1Nmin2

2

= 1

zi−1Nmin2

.

Redistribution phase 3B(i). We make a crude upper bound for the redistribution phase. In this phase we have to move some part of the flow from each cell to its neighboring cells. For simplicity we bound this by assuming that for each cell, we had to move all its flow to neighboring cells. By a similar argument as for Step 3A(i), the contribution of the redistribution step can be bounded by

r3B(i)≤ziNmin2 ·

1 ziNmin2

2

= 1

ziNmin2

.

(25)

r3 = a X

i=1

r3A(i)+r3B(i) ≤

2 Nmin2

a X

i=1

1 zi−1

≤ 2

Nmin2 1 + 1 24

a−1 X

i=1

1 i2

!

≤ 3

Nmin2 . To see the last inequality, note that the sumPa−1

i=1 1/i2 is a partial sum of the over-harmonic

series that converges to a constant smaller than 2.

Step 4: Now we transfer all flow in “parallel cell paths” from H(s) to H(t). We have (2a+ 1)3 parallel rows of cells going from H(s) to H(t), each of them contains d(s, t)/g cells. Thus all in all we traverse (2a+ 1)3Nmin2 d(s, t)/g edges, and each edge carries flow 1/((2a+ 1)3Nmin2 ). Thus step 4 contributes

r4≤(2a+ 1)3Nmin2

d(s, t)

g ·

1 (2a+ 1)3N2

min 2

= d(s, t) g(2a+ 1)3N2

min

.

Step 5 is completely analogous to steps 2 and 3, with the analogous contribution r5 = 1

dtNmin +

Q N2

min

+r3.

Step 6is completely analogous to step 1 with overall contribution ofr6= 1/dt.

Summing up the general case d > 3. All these contributions lead to the following overall bound on the resistance in case d >3:

Rst ≤

1 ds

+ 1

dt

+

1 ds

+ 1

dt

1 Nmin

+ 1

N2 min

6 + d(s, t)

g(2a+ 1)3 + 2Q

with aand Q as defined in Proposition 18. This is the result stated in the proposition for case d >3.

Note that as spelled out above, the proof works whenever the dimension of the space satisfies d >3. In particular, note that even ifdis large, we only use a 3-dimensional “hypercube” in Step 3. It is sufficient to give the rate we need, and carrying out the construction for higher-dimensional hypercube (in particular Step 3B) is a pain that we wanted to avoid.

The special case d= 3. In this case, everything works similar to above, except that we we only use a 2-dimensional “hypercube” (this is what we always show in the figures). The only place in the proof where this really makes a difference is in Step 3. The number zi of

grid cells in Layeriis given aszi= 8i. Consequently, instead of obtaining an over-harmonic

sum inr3 we obtain a harmonic sum. Using the well-known fact thatPai=11/i≤log(a) + 1

(26)

r3≤

2 N2

min

1 +1 8

a−1 X

i=1

1 i

!

≤ 2

N2 min

(2 + log(a)).

In Step 4 we just have to replace the terms (2a+ 1)3 by (2a+ 1)2. This leads to the result in Proposition 18.

The special case d = 2. Here our “hypercube” only consists of a “pillar” of 2a+ 1 cells. The fundamental difference to higher dimensions is that in Step 3, the flow does not have so much “space” to be distributed. Essentially, we have to distribute all unit flow through a “pillar”, which results in contributions

r3 ≤

2a+ 1 Nmin2 r4 ≤

d(s, t) g

1 (2a+ 1)Nmin2 .

This concludes the proof of Proposition 18.

Let us make a couple of technical remarks about this proof. For the ease of presentation we simplified the proof in a couple of respects.

Strictly speaking, we do not need to distribute the whole unit flow to the outmost Layera. The reason is that in each layer, a fraction of the flow already “branches off” in direction oft. We simply ignore this leaving flow when bounding the flow in Step 3, our construction leads to an upper bound. It is not difficult to take the outbound flow into account, but it does not change the order of magnitude of the final result. So for the ease of presentation we drop this additional complication and stick to our rough upper bound.

When we consider Steps 2 and 3 together, it turns out that we might have introduced some loops in the flow. To construct a proper flow, we can simply remove these loops. This would then just reduce the contribution of Steps 2 and 3, so that our current estimate is an overestimation of the whole resistance.

The proof as it is spelled out above considers the case where s and t are connected by a straight line. It can be generalized to the case where they are connected by a piecewise linear path. This does not change the result by more than constants, but adds some tech-nicality at the corners of the paths.

(27)

6.3 Proof of the Theorems 3 and 4

First of all, note that by Rayleigh’s principle (cf. Corollary 7 in Section IX.2 of Bollobas, 1998) the effective resistance between vertices cannot decrease if we delete edges from the graph. Given a sample from the underlying density p, a random geometric graph based on this sample, and some valid regionX, we first delete all points that are not inX. Then we consider the remaining geometric graph. The effective resistances on this graph are upper bounds on the resistances of the original graph. Then we conclude the proofs with the following arguments:

Proof of Theorem 3. The lower bound on the deviation follows immediately from Propo-sition 15. The upper bound is a consequence of PropoPropo-sition 18 and well known properties of random geometric graphs (summarized in the appendix). In particular, note that we can choose the grid width g := ε/(2√d−1) to obtain a valid grid. The quantity Nmin can be

bounded as stated in Proposition 28 and is of order nεd, the degrees behave as described in Proposition 29 and are also of order nεd (we useδ = 1/2 in these results for simplicity).

The quantity ain Proposition 18 is of the order 1/ε, and Q can be bounded by Q= ε/g and by the choice ofg is indeed a constant. Plugging all these results together leads to the final statement of the theorem.

Proof of Theorem 4. This proof is analogous to theε-graph. As grid widthg we choose g =Rk,min/(2

d−1) where Rk,min is the minimal k-nearest neighbor distance (note that

this works for both the symmetric and the mutual kNN-graph). Exploiting Propositions 28 and 30 we can see thatRk,min and Rk,max are of order (k/n)1/d, the degrees andNmin are

of orderk,ais of the order (n/k)1/dand Qa constant. Now the statements of the theorem follow from Proposition 18.

7. Proofs for the Spectral Approach

In this section we present the proofs of the results obtained by spectral arguments.

7.1 Proof of Propositions 1 and 5

First we prove the general formulas to compute and approximate the hitting times.

Proof of Proposition 1. For the hitting time formula, let u1, . . . , un be an orthonormal

set of eigenvectors of the matrixD−1/2W D−1/2corresponding to the eigenvaluesµ1, . . . , µn.

Letuij denote thej-th entry ofui. According to Lov´asz (1993) the hitting time is given by

Hij = vol(G) n X

k=2

1 1−µk

u2kj dj

−upkiukj

didj !

.

A straightforward calculation using the spectral representation of Lsym yields

Hij = vol(G) *

1 √ dj

ej, n X

k=2

1 1−µk

uk,√1d

j

ej−√1d

iei

(28)

= vol(G)

1 √

djej, L

sym

1 √

djej

−√1

diei

.

The result for the commute time follows from the one for the hitting times.

In order to prove Proposition 5 we first state a small lemma. For convenience, we set A=D−1/2W D−1/2 and ui =ei/

di. Furthermore, we are going to denote the projection

on the eigenspace of the j-the eigenvalue λj of Aby Pj.

Lemma 19 (Pseudo-inverse L†sym) The pseudo-inverse of the symmetric Laplacian sat-isfies

L†sym=I−P1+M,

where I denotes the identity matrix and M is given as follows:

M =

X

k=1

(A−P1)k = n X

r=2

λr

1−λr

Pr. (3)

Furthermore, for all u, v∈Rn we have

| hu, M vi | ≤ 1

1−λ2

· k(A−P1)uk · k(A−P1)vk+| hu , (A−P1)vi |. (4)

Proof. The projection onto the null space ofLsym is given by P1 = √

d√dT/P

i=1di where √

d= (√d1, . . . , √

dn)T. As the graph is not bipartite,λn>−1. Thus the pseudoinverse of

Lsym can be computed as

L†sym= (I−A)†= (I −A+P1)−1−P1=

X

k=0

(A−P1)k−P1.

Thus

M :=

X

k=1

(A−P1)k =

X

k=0

(A−P1)k(A−P1)

= ( ∞ X k=0 n X r=2

λkrPr)( n X

r=2

λrPr) = ( n X

r=2

1 1−λr

Pr)( n X

r=2

λrPr)

=

n X

r=2

λr

1−λr

Pr

which proves Equation (3). By a little detour, we can also see

M =

X

k=0

(A−P1)k(A−P1)2+ (A−P1) = ( n X

r=2

1 1−λr

(29)

Exploiting that (A−P1) commutes with all Pr gives

hu, M vi=

(A−P1)u , ( n X

r=2

1 1−λr

Pr)(A−P1)v

+hu,(A−P1)vi.

Applying the Cauchy-Schwarz inequality and the factkPn

to the desired statement.

Proof of Proposition 5. This proposition now follows easily from the Lemma above. Observe that

hui, Auji=

wij

didj

≤ wmax

d2min

kAuik2 = n X k=1 w2 ik d2 idk

≤ wmax

dmind2i X

k

wik=

wmax

dmin

1 di

≤ wmax

d2 min kA(ui−uj)k2 ≤

wmax dmin 1 di + 1 dj

≤ 2wmax

d2min .

Exploiting thatP1(ui−uj) = 0 we get for the hitting time

1

vol(G)Hij− 1 dj

=| huj, M(uj−ui)i |

≤ 1

1−λ2

kAujk · kA(uj−ui)k+|huj, A(uj−ui)i|

≤ 1

1−λ2

wmax dmin 1 p dj s 1 di + 1 dj !

+ wij didj

+ wjj d2

j

≤ 2wmax d2min

1 1−λ2

+ 1

.

For the commute time, we note that

1

vol(G)Cij −

1 di + 1 dj

=hui−uj, M(ui−uj)i

≤ 1

1−λ2

kA(ui−uj)k2+

hui−uj, A(ui−uj)i

≤ wmax

dmin

1 1−λ2

+ 2 1 di + 1 dj .

We would like to point out that the key to achieving this bound is not to give in to the temptation to manipulate Eq. (3) directly, but to bound Eq. (4). The reason is that we can compute terms of the form hui, Auji and related terms explicitly, whereas we do not

(30)

7.2 The Spectral Gap in Random Geometric Graphs

As we have seen above, a key ingredient in the approximation result for hitting times and commute distances is the spectral gap. In this section we show how the spectral gap can be lower bounded for random geometric graphs. We first consider the case of a fixed geometric graph. From this general result we then derive the results for the special cases of theε-graph and the kNN-graphs. All graphs considered in this section are unweighted and undirected. We follow the strategy in Boyd et al. (2005) where the spectral gap is bounded by means of the Poincar´e inequality (see Diaconis and Stroock, 1991, for a general introduction to this technique; see Cooper and Frieze, 2011, for a related approach in simpler settings). The outline of this technique is as follows: for each pair (X, Y) of vertices in the graph we need to select a path γXY in the graph that connects these two vertices. In our case, this

selection is made in a random manner. Then we need to consider all edges in the graph and investigate how many of the pathsγXY, on average, traverse this edge. We need to control

the maximum of this “load” over all edges. The higher this load is, the more pronounced is the bottleneck in the graph, and the smaller the spectral gap is. Formally, the spectral gap is related to the maximum average load bas follows.

Proposition 20 (Spectral gap, Diaconis and Stroock, 1991) Consider a finite, con-nected, undirected, unweighted graph that is not bipartite. For each pair of vertices X6=Y let PXY be a probability distribution over all paths that connect X and Y and have uneven

length. Let (γXY)X,Y be a family of paths independently drawn from the respective PXY.

Define b := max{eedge}E|{γXY

e ∈ γXY}|. Denote by |γmax| the maximum path length

(where the length of the path is the number of edges in the path). Then the spectral gap in the graph is bounded as follows:

1−λ2≥

vol(G) d2

max|γmax|b

and 1− |λn| ≥

2 dmax|γmax|b

.

For deterministic sets Γ, this proposition has been derived as Corollary 1 and 2 in Diaconis and Stroock (1991). The adaptation for random selection of paths is straightforward, see Boyd et al. (2005).

The key to tight bounds based on Proposition 20 is a clever choice of the paths. We need to make sure that we distribute the paths as “uniformly” as possible over the whole graph. This is relatively easy to achieve in the special situation where X is a torus with uniform distribution (as studied in Boyd et al., 2005; Cooper and Frieze, 2011) because of symmetry arguments and the absence of boundary effects. However, in our setting with general X

and p we have to invest quite some work.

7.2.1 Fixed Geometric Graph on the Unit Cube in Rd

(31)

C(b)

C(a)

Figure 8: Canonical path between a and b. We first consider a “Hamming path of cells” betweenaand b. In all intermediate cells, we randomly pick a point.

Construction of the paths. Assume we want to construct a path between two vertices a and b that correspond to the points a = (a1, . . . , ad), b = (b1, . . . , bd) ∈ [0,1]d. Let C(a)

and C(b) denote the grid cells containing a and b, denote the centers of these cells by c(a) = (c(a)1, . . . , c(a)d) and c(b) = (c(b)1, . . . , c(b)d). We first construct a deterministic

“cell path” between the cells C(a) and C(b) (see Figure 8). This path simply follows a Hamming path: starting at cell C(a) we change the first coordinate until we have reached c(b)1. For example, if c(a)1 < c(b)1 we traverse the cells

c(a)1, c(a)2, . . . , c(a)d

; c(a)1+g, c(a)2, . . . , c(a)d

;. . .; c(b)1, c(a)2, . . . , c(a)d

.

Then we move along the second coordinate from c(a)2 until we have reached c(b)2, that is

we traverse the cells (c(b)1,∗, c(a)3, . . . , c(a)d). And so on. This gives a deterministic way

of traversing adjacent cells from C(a) to C(b). Now we transform this deterministic “cell path” to a random path on the graph. In the special cases wherea and b are in the same cell or in neighboring cells, we directly connect a and b by an edge. In the general case, we select one data point uniformly at random in each of the interior cells on the cell path. Then we connect the selected points to form a path. Note that we can always force the paths to have uneven lengths by adding one more point somewhere in between.

Proposition 21 (Path construction is valid) Assume that (1) Each cell of the grid contains at least one data point. (2) Data points in the same and in neighboring cells are always connected in the graph. Then the graph is connected, and the paths constructed above are paths in the graph.

Proof. Obvious, by construction of the paths.

In order to apply Proposition 20 we now need to compute the maximal average load of all paths.

Proposition 22 (Maximum average load for fixed graph on cube) Consider a ge-ometric graph on [0,1]d and the grid of width g on [0,1]d. Denote by Nmin and Nmax the

minimal and maximal number of points per grid cell. Construct a random set of paths as described above.

Updating...