Convergence of clustering coefficient and function for fixed k

FOR FIXED k 105

it follows from (4.27) that as k → ∞,

γ(k) ∼ R∞

0 y

2e^−y/2ρ(y, k)αe^−αydy R∞

0 ρ(y, k)αe^−αydy

∼ log(k) R∞

0 e^−y/2ρ(y, k)αe^−αydy R∞

0 ρ(y, k)αe^−αydy ∼ ξ log(k)k⁻¹= 6ν

π log(k)k⁻¹, when α = 3/4.

Proof when α > 3/4 Again, by Proposition4.1.7, equation (4.26) and (4.23) with β = 1/2, it follows that as k → ∞,

γ(k) ∼ α −¹₂ α −³₄

R∞

0 e^−y/2ρ(y, k)αe^−αydy R∞

0 ρ(y, k)αe^−αydy ∼ α −¹₂

α −³₄ ξk⁻¹= 8αν π(4α − 3)k⁻¹.

4.2 Convergence of clustering coefficient and function for fixed k

We will first derive Theorem1.5.2. It will turn out that Theorem1.5.1has a quick derivation assuming Theorem1.5.2.

4.2.1 Clustering function for fixed k, proving Theorem 1.5.2 We will now show that the clustering function of the KPKVB model c(k; G_n)−→^P γ(k) for a fixed k. The key ideas are: First of all, the coupling of the Poissonized KPKVB model with the box model is guaranteed to be exact (in the sense that it also preserves edges) for all vertices up to height R/4. Secondly, when computing the expected value of the clustering function c(k; G) in the subgraph of the box model induced by all vertices up to height R/4 using the Campbell-Mecke formula we obtain integrals that are very similar to the expressions we found earlier for γ(k).

We will repeatedly rely on the following observation.

Lemma 4.2.1. Let k ≥ 2 and let G, H be graphs such that G is an induced subgraph of H, or vice versa. Then

|c(k; G) − c(k; H)| ≤ 6|E(G)∆E(H)|

NG(k) − 2|E(G)∆E(H)|,

provided that NG(k) > 2|E(G)∆E(H)|, where NG(k) is the number of vertices with degree k in G.

Proof. We observe that

|c(k; G) − c(k; H)| cH(v) since one of G, H is an induced subgraph of the other. In the third line we use that the clustering coefficients cG(v), cH(v) take values in [0, 1], and if either deg_G(v) 6= deg_H(v) or v ∈ V (G)∆V (H) and v has degree K in whichever of G, H it belongs to then at least one edge of E(G)∆E(H) is incident with v, and that every edge in E(G)∆E(H) only affects the status of its two incident vertices. For the fifth line we used that |N_G(k) − N_H(k)| ≤ |{v ∈ V (G) : deg_G(v) = k}∆{v ∈

4.2. CONVERGENCE OF CLUSTERING COEFFICIENT AND FUNCTION

FOR FIXED k 107

V (H) : deg_H(v) = k}| ≤ 2|E(G)∆E(H)| for similar reasons. In the last line we used that NH(k) ≥ NG(k) − |NG(k) − NH(k)| ≥ NG(k) − 2|E(G)∆E(H)|.) Lemma 4.2.2. |E(Gn)∆E(GPo)| = o(n) a.a.s.

Proof. Let us fix some ε > 0 and write

G₋:= G((1 − ε)n, α, (1 − ε)ν), G₊:= G((1 + ε)n, α, (1 + ε)ν).

(We ignore rounding issues, i.e. the issue that (1−ε)n, (1+ε)n may not be integers, to avoid notational burden. We leave the straightforward details of adapting our arguments below to deal with it to the reader.)

Observe that the vertices of G₋, G+, Gn, GPo all live on the same hyperbolic disk, of radius R = 2 ln(n/ν). We consider the standard coupling where we have an infinite supply of i.i.d. points u1, u2, . . . chosen according to the (α, R)-quasi uniform distribution, the vertices of Gn = G(n; α, ν) are u1, . . . , un, the vertices of G− are u1, . . . , u(1−ε)n, the vertices of G+ are u1, . . . , u(1+ε)nand the vertices of GPo are u1, . . . , uN with N = Po(n) independently of u^d 1, u2, . . . .

As N = Po(n), by Chebychev’s inequality, we have that |N − n| < εn a.a.s.^d So in particular, under our coupling we have G₋ ⊆ G_n, G_Po⊆ G₊ a.a.s. We now point out that, by the results of Gugelmann et al. on the average degree ( [30], Theorem 2.3), we have that a.a.s.

|E(G₋)| = (1−ε)²· 4να²

π(2α − 1)²·n+o(n), |E(G+)| = (1+ε)²· 4να²

π(2α − 1)²·n+o(n).

It follows that

|E(Gn)∆E(GPo)| ≤ |E(G+) \ E(G₋)| = ε · 16να²

π(2α − 1)² · n + o(n) a.a.s.

This holds for every fixed ε > 0. Sending ε & 0, concludes the proof of the lemma.

Next, let us recall that by the results of Gugelmann et al. on the degree dis-tribution ( [30], Theorem 2.2) we have that

N_G_n(k) n

−−−−→P

n→∞ p(k), (4.28)

for every fixed k. In particular N_G_n(k) = Ω(n) a.a.s. Combining this with lem-mas4.2.1and4.2.2we obtain:

Corollary 4.2.3. For every fixed k ≥ 2, we have c(k; G_n) − c(k; G_Po)−−−−→^P

n→∞ 0, and NG_Po(k) n

−−−−→P n→∞ p(k).

(For the second statement we use that NG_n(k) − 2|E(Gn)∆E(GPo)| ≤ NG_Po(k) ≤ NG_n(k) + 2|E(Gn)∆E(GPo)|.)

In the remainder of this section, we will denote by Gbox,Hthe subgraph of Gbox

induced by all vertices (x, y) ∈ Vbox= P ∩ R of height at most R/4.

Lemma 4.2.4. Under the coupling provided by Lemma1.6.1, a.a.s., Gbox,H is an induced subgraph of G_Poand |E(G_Po) \ E(G_box,H)| = o(n).

Proof. Recall that under the coupling of Lemma1.6.1, we can view G_boxand G_Po as having the same vertex set V_box = P ∩ R; and two points p = (x, y), p⁰ = (x⁰, y⁰) ∈ Vboxare joined by an edge in Gboxif |x − x⁰|_πeR/2 ≤ e^(y+y⁰^)/2, while p, p⁰ are joined by an edge in GPoif either y + y⁰≥ R or y + y⁰< R and |x − x⁰|_πeR/2≤ Φ(y, y⁰) with Φ as provided by (1.8). It follows immediately from Lemma 1.6.4 that Gbox,H is an induced subgraph of GPo, a.a.s., as claimed.

Fix ε > 0, and let X denote the number points of Vbox with y-coordinate

≥ (1 − ε)R. Then X is a Poisson random variable with expectation

E[X] = µ (−π

2e^R/2,π

2e^R/2] × [(1 − ε)R, R]

= R^π₂^e^R/2

−^π₂e^R/2

RR (1−ε)R

να

π e^−αy dy dx

= O(eR/2−(1−ε)αR) = o(1),

the last equality holding provided ε was chosen sufficiently small (using that α >

1/2). We conclude that, a.a.s., there are no vertices of height ≥ (1 − ε)R.

Let Y denote the number of pairs of vertices p = (x, y), p⁰ = (x⁰, y⁰) ∈ Vbox

with y + y⁰≥ R. Then, by the Campbell-Mecke formula

E[Y ] = Z

R1{y+y⁰≥R}µ(dp⁰)µ(dp)

Z ^π₂e^R/2

−^π₂e^R/2

Z R 0

Z ^π₂e^R/2

−^π₂e^R/2

Z R R−y

να π

e^−α(y+y⁰⁾dy⁰dx⁰dydx

= O(Re^(1−α)R) = o(n),

the last equality holding because α > 1/2 and n = νe^R/2. In particular, by Markov’s inequality, Y = o(n) a.a.s.

Now let Z denote the number of pairs of vertices p = (x, y), p⁰ = (x⁰, y⁰) for which y+y⁰< R, y < (1−ε)R, R/4 ≤ y⁰< (1−ε)R and |x−x⁰|_πeR/2 < Φ(y, y⁰). By Lemma1.6.2we have that Φ(y, y⁰) = O(e^(y+y⁰^)/2) for all such y, y⁰. By Campbell-Mecke we can write

4.2. CONVERGENCE OF CLUSTERING COEFFICIENT AND FUNCTION

FOR FIXED k 109

E[Z] =

Z ^π₂e^R/2

−^π₂e^R/2

Z (1−ε)R 0

Z ^π₂e^R/2

−^π₂e^R/2

Z (1−ε)R R/4

1{|x−x⁰|

πeR/2<Φ(y,y⁰),y+y⁰<R}

×να π

e^−α(y+y⁰⁾dy⁰dxdydx⁰

= O e^R/2

Z (1−ε)R 0

Z (1−ε)R R/4

e(1/2−α)(y+y⁰)dy⁰dy

= O

eR/2+(1/2−α)R/4

= o(n).

Hence also Z = o(n) a.a.s. by Markov’s inequality.

This concludes the proof as we have now shown that under the stated coupling, a.a.s., Gbox,H and GPo differ by only o(n) edges.

Analogously to Corollary4.2.3we obtain:

Corollary 4.2.5. For every fixed k ≥ 2 we have

c(k; GPo) − c(k; Gbox,H)−−−−→^P

n→∞ 0, and NG_box,H(k) n

−−−−→P n→∞ p(k).

Lemma 4.2.6. For every fixed k ≥ 2 we have c(k; Gbox,H)−−−−→^P

n→∞ γ(k).

Proof. We write R₋:= R ∩ (R × [0, R/4]) = (−^π₂e^R/2,^π₂e^R/2] × [0, R/4] and set

X := X

v∈N_Gbox,H(k)

c(v) = X

v∈P∩R−

cG_box,H(v) · 1{deg_Gbox,H(v)=k}.

By the Campbell-Mecke formula

E[X] = Z

R−

cG^z_box,H(z) · 1{deg_Gz

box,H(z)=k}

i µ(dz),

where G^z_box,H denotes the graph we get by adding z as an additional vertex to G_box,H, and adding edges between z and the other vertices as per the connection rule (for G_box). Spelling out the intensity measure µ, plus symmetry

considera-tions, gives

E[X] =

Z ^π₂e^R/2

−^π₂e^R/2

Z R/4 0

c_G(x,y)

box,H((x, y)) · 1{deg

G(x,y) box,H

((x,y))=k}

να π

e^−αy dy dx

= n

˙ Z ^R/4

c_G(0,y)

box,H((0, y)) · 1{deg

G(0,y) box,H

((0,y))=k}

αe^−αydy

= n · Z R/4

c_G(0,y0)

box,H

((0, y0))

deg_G(0,y0) box,H

((0, y0)) = k

degG^(0,y0)_box,H((0, y₀)) = k

αe^−αy⁰dy₀.

The random variable deg_G(0,y0) box,H

((0, y0)) follows a Poisson distribution with expec-tation

deg_G(0,y0) box,H

((0, y0))

= µ(B_∞((0, y0)) ∩ R₋)

= Z R/4

Z e^(y+y0)/2

−e^(y+y0)/2

να π

e^−αy dx dy

= ξe^y⁰^/2· (1 − e^{(1/2−α)R/4}).

Hence, for every fixed y₀ and k, we have that

deg_G(0,y0) box,H

((0, y0)) = k

−−−−→

n→∞ P(Po(ξe^y⁰^/2) = k) = ρ(y0, k).

Next we remark that, analogously to the argument given in the beginning of Section4.1.2, we have

cG^(0,y0)_box,H((0, y0)) deg

G^(0,y0)_box,H((0, y0)) = k

= P(w¹∈ B_∞(w2)) =: Pn(y0),

with w₁ = (x₁, y₁), w₂ = (x₂, y₂) chosen independently from B_∞((0, y₀)) ∩ R₋ according to the probability measure we get by renormalizing µ, i.e. with pdf f_µ· 1B∞((0,y₀))∩R−/µ(B_∞((0, y₀)) ∩ R₋). By considerations completely analogous to those following Lemma4.1.1, the random variables y₁, y₂both follow a truncated exponential distribution with parameter α − 1/2 truncated at height R/4 (i.e. with density 1{y_i≤R/4}· (α − 1/2)e^(1/2−α)yⁱ/(1 − e^{(1/2−α)R/4})) and, given the values of y₀, y₁, y₂, each x_i is chosen uniformly on the interval [−e^(y⁰^+yⁱ^)/2, e^(y⁰^+yⁱ^)/2]. In

4.2. CONVERGENCE OF CLUSTERING COEFFICIENT AND FUNCTION

FOR FIXED k 111

particular Pn(y0) =

α − 1/2 1 − e^{(1/2−α)R/4}

²Z R/4 0

Z R/4 0

P (y0, y1, y2)e^(1/2−α)(y¹^+y²⁾dy1dy2, with P (y0, y1, y2) as defined in the paragraph following Lemma 4.1.1. (That is, P (y0, y1, y2) is the probability that |x1− x2| < e^(y¹^+y²^)/2, where x1, x2 are inde-pendent with xi uniform on the interval [−e^(y⁰^+yⁱ^)/2, e^(y⁰^+yⁱ^)/2]). It follows that, for any fixed y0, we have

P_n(y₀) −−−−→

n→∞ (α − 1/2)² Z ∞

Z ∞ 0

P (y₀, y₁, y₂)e^(1/2−α)(y¹^+y²⁾dy₁dy₂= P (y₀).

(Applying monotone convergence to justify the convergence of the integral as n →

∞.)

Since (expected) clustering coefficients and probabilities are between zero and one and αe^αy⁰is integrable, we can now apply the dominated convergence theorem to obtain that

EX n −−−−→

n→∞

Z ∞ 0

P (y0)ρ(y0, k)αe^−αy⁰ dy0= p(k) · γ(k). (4.29) (Applying (4.6) for the last equality.)

Next, we turn attention to X(X − 1) = P

u6=v∈N_Gbox,H(k)c(v)c(u). Another application of Campbell-Mecke shows that

E[X(X − 1)]

= Z

R−

c_Gz,z0

box,H

(z)c_Gz,z0 box,H

(z⁰) · 1{deg

Gz,z0 box,H

(z)=deg

Gz,z0 box,H

(z⁰)=k}

µ(dz)µ(dz⁰),

with G^z,z_box,H⁰ denoting the graph we get by adding z, z⁰ as additional vertices to Gbox,H. Now note that if z = (x, y) and z⁰ = (x⁰, y⁰) satisfy |x − x⁰|_πeR/2 > 2e^R/4 then the neighbourhoods of z, z⁰ are determined by the points of the Poisson process P in disjoint areas of the plane. This implies that, provided |x − x⁰|_πeR/2 >

2e^R/4: E

c_Gz,z0

box,H

(z)c_Gz,z0 box,H

(z⁰) · 1{deg

Gz,z0 box,H

(z)=deg

Gz,z0 box,H

(z⁰)=k}

= E

h c_G^z

box,H(z) · 1{deg_Gz

box,H(z)=k}

i· E

c_Gz0

box,H

(z⁰) · 1{deg

Gz0 box,H

(z⁰)=k}

(4.30)

On the other hand, the LHS of (4.30) is always between zero and one, also if

|x − x⁰|_πeR/2 ≤ 2e^R/4. We conclude that E[X(X − 1)] ≤

R−

E hc_G^z

box,H(z) · 1{deg_Gz

box,H(z)=k}

× E

c_Gz0

box,H(z⁰) · 1{deg

Gz0 box,H

(z⁰)=k}

µ(dz)µ(dz⁰) +

R−

1{|x−x⁰|≤2e^R/4}µ(dz)µ(dz⁰)

= Z

R−

E h

cG^z_box,H(z) · 1{deg_Gz

box,H(z)=k}

i µ(dz)

+ O(e^3R/4)

= (E[X])²+ O(n^3/2).

Combining this with (4.29), it follows that Var(X) = E[X(X − 1)] + E[X] − (E[X])²= o

(E[X])²

. By Chebychev’s inequality, we therefore have X = n · γ(k) · p(k) + o(n) a.a.s.

In combination with Corollary4.2.5(second limit) we can conclude that c(k; Gbox,H) = X

N_G_box,H_(k)

−−−−→P n→∞ γ(k), as desired.

Proof of Theorem 1.5.2: For completeness, we point out that Theorem 1.5.2 follows immediately from Corollaries4.2.3,4.2.5and Lemma4.2.6.

4.2.2 Overall clustering coefficient, proving Theorem 1.5.1 Proof of Theorem 1.5.1: Recall in Section 4.1, we defined p(k) := P(D = k), γ := EC, γ(k) := E(C|D = k) with D the degree and C the clustering coefficient of the “typical point” in the infinite limit model G_∞. We can write

γ = EC =X

k≥2

E (C|D = k) P(D = k) = X

k≥2

γ(k) · p(k).

For the KPKVB random graph, or any graph for that matter, we have the similar relation

c(Gn) =X

k≥2

c(k; Gn) · (NG_n(k)/n).

By Theorem1.5.2and (4.28) we have, for any fixed k ≥ 2:

c(Gn) ≥

k=2

c(k; Gn) · (NG_n(k)/n)−−−−→^P

n→∞

k=2

γ(k) · p(k), (4.31) where Slutsky’s theorem justifies the convergence in probability. On the other hand we have

In document Perfect matchings, Hamilton cycles, degree distribution and local clustering in Hyperbolic Random Graphs (Page 112-120)