(1)

Monotonicity testing, alternating paths, directed isoperimetry, and strawberries

C. Seshadhri

(Sandia National Labs, Livermore)

(2)

Chapter I

Getting to know the problem

(3)

Property Testing in a Slide

• f: {0,1}^n -> R

– Amen

• R = {0,1}, or the reals

• A “property” P is some subset of these functions

• Hamming distance between two functions: d(f,g) = (# x s.t. f(x) ≠ g(x)) / 2^n

• d(f,P) = min_{g in P} d(f,g): the minimum fraction of values you need to change so f “has P”
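
The two distance definitions above can be sketched for tiny domains as follows. This is only an illustration: functions are stored as explicit truth tables, and the property `constants` is a toy example chosen here, not one from the talk.

```python
def distance(f_vals, g_vals):
    """Fraction of inputs on which two truth tables differ: d(f, g)."""
    return sum(a != b for a, b in zip(f_vals, g_vals)) / len(f_vals)

def distance_to_property(f_vals, P):
    """d(f, P) = min over g in P of d(f, g); P is an explicit list of truth tables."""
    return min(distance(f_vals, g) for g in P)

f = [0, 1, 1, 0]                             # f on {0,1}^2, indexed by bitmask
constants = [[0, 0, 0, 0], [1, 1, 1, 1]]     # toy property (assumption): constant functions
d = distance_to_property(f, constants)       # half of the values must change
```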

(4)

Property Testing in (another) Slide

• Suppose you have query access to f

• Standard decision problem: is f in P, or is d(f,P) ≠ 0?

• Relaxed version with ε ∈ (0,1): is f in P, or is d(f,P) > ε (f is “ε-far” from P)?

• Can hopefully decide with far fewer queries than 2^n (using randomization)

• Property tester: if input f has P, accept with prob > 2/3. If input f is ε-far from P, reject with prob > 2/3.

(5)

Monotonicity

• Take product ordering on {0,1}^n: x ≤ y if ∀ i, x_i ≤ y_i

• Standard partial order given by containment in {0,1}^n

• f is monotone if ∀ x < y, f(x) ≤ f(y)

[Figure: hypercube from (0,0,0) to (1,1,1) with function values 5 4 5 8 6 7 3 9 at the vertices]

(6)

Distance to monotonicity

• ε_f = distance to monotonicity = min fraction of values to be changed to make f monotone

• Can change to any real number

• ε_f = min fraction of values to remove to make remaining f monotone

• This definition is independent of range of f

[Figure: hypercube with values 5 4 5 8 6 7 3 9. If x < y, then f(x) ≤ f(y). Can “fill in” the removed values.]

(7)

Monotonicity testing

• f is monotone if ∀ x < y, f(x) ≤ f(y)

• [Goldreich Goldwasser Lehman Ron Rubinfeld Samorodnitsky 00]: Property testing monotonicity

• f: {0,1}^n -> {0,1}

• Tester accepts when f is monotone, rejects when ε_f > ε

• Violation is pair (x,y) where x < y, but f(x) > f(y)

• How to find violation in poly(n/ε) queries, when ε_f > ε?

[Figure: boolean example with values 1 0 1 1]

(8)

The edge tester

• Edge tester: sample q uniform random edges. If any violation detected, reject. Otherwise, accept.

• [GGL+00]: f: {0,1}^n -> {0,1}. There are Ω(ε_f 2^n) violated edges

• Total number of edges = n 2^{n-1}. Fraction of violated edges = Ω(ε_f / n)

• So O(n/ε)-query edge tester finds violation when ε_f > ε

• Tight: ε_f = ½, 2^{n-1} violated edges

[Figure: hypercube with 1s and 0s illustrating the tight example]
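
A minimal sketch of the edge tester described above, under some assumptions made here for illustration: points of {0,1}^n are encoded as integer bitmasks, `f` is any Python callable, and the names `edge_tester`, `majority` and `anti_dictator` are hypothetical.

```python
import random

def edge_tester(f, n, q, rng=random):
    """Sample q uniform random hypercube edges; reject on any violated edge.

    f maps an integer bitmask in [0, 2**n) to a comparable value.
    Returns False (reject) if a violation is found, True (accept) otherwise.
    """
    for _ in range(q):
        x = rng.randrange(2 ** n)      # uniform random vertex
        i = rng.randrange(n)           # uniform random dimension
        lo = x & ~(1 << i)             # endpoint with bit i = 0
        hi = x | (1 << i)              # endpoint with bit i = 1
        if f(lo) > f(hi):              # lo < hi but f decreases: violation
            return False
    return True

n = 8
# A monotone function is never rejected (the tester has one-sided error).
majority = lambda x: int(bin(x).count("1") > n // 2)
accepted_monotone = edge_tester(majority, n, q=200)

# f(x) = 1 - x_0: every dimension-0 edge is violated, so each sample
# finds a violation with probability 1/n.
anti_dictator = lambda x: 1 - (x & 1)
accepted_far = edge_tester(anti_dictator, n, q=200, rng=random.Random(0))
```

Picking a uniform vertex and a uniform dimension selects each of the n 2^{n-1} edges with equal probability, which is why the sketch does not enumerate edges explicitly.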

(9)

General ranges

• Edge tester: sample q uniform random edges. If any violation detected, reject. Otherwise, accept.

• [Dodis Goldreich Lehman Raskhodnikova Ron Samorodnitsky 99]: f: {0,1}^n -> R. There are Ω(ε_f 2^n / log |R|) violated edges

• Queries required by tester = O(n log |R| / ε)

• log |R| can be n, so worst case tester O(n^2/ε)

• Not known to be tight

(10)

Questions

• [GGL+00] f: {0,1}^n -> {0,1}. There are ≥ ε_f 2^n violated edges. Bound is tight

• [DGL+99] f: {0,1}^n -> R. There are ≥ ε_f 2^n / log |R| violated edges.

• For boolean range, can we beat the O(n/ε)-query bound of the edge tester?

• For general range, can we get better than O(n log |R| / ε) tester? Can we improve bound on violated edges? Does it depend on |R|?

• [Blais Brody Matulef 11, Brody 13] For |R| > n^{1/2}, Ω(n/ε) queries required by any tester

• [Fischer Lehman Newman Raskhodnikova Rubinfeld 02] For R = {0,1}, Ω(n^{1/2}) queries required by any non-adaptive tester

• Ω(log n) required by any tester

(11)

Answers

• For f: {0,1}^n -> R, there are ≥ ε_f 2^{n-1} violated edges

• This gives O(n/ε) monotonicity tester. Optimal when |R| > n^{1/2}

• For f: {0,1}^n -> {0,1}, there is O(ε^{-5/3} n^{5/6}) tester for monotonicity

• So edge tester can be beaten for boolean functions

• Key tool: new analysis using alternating paths in violation graph

• New “isoperimetric bounds” for directed hypercube

(12)

Chapter II

Previous work

(13)

The fixing argument

• [GGL+00] Suppose there are < ε 2^n violated edges.

• Change f at these endpoints and “few” other points to get monotone function. Bound total change by O(ε 2^n), so ε_f = O(ε)

• Works for boolean range, get bound of violated edges ≥ ε_f 2^{n-1}

• Fixing very hard with larger ranges

[Figure: hypercube from (0,0,…,0) to (1,1,…,1) with regions of 1s and 0s; only violated edges shown]

(14)

The range reductions

• [DGL+99] Given f, consider boolean f_r: f_r(x) = 0 if f(x) < r, f_r(x) = 1 otherwise

• Violated edge in f_r is violated edge in f

• Argue about violated edges in various f_r (using previous analysis) to give bound in f

• Get ε_f 2^n / log |R| violated edges

[Figure: edge (x,y) with x < y, f(x) > r and f(y) < r, so f_r(x) = 1 and f_r(y) = 0]
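
The thresholding step can be checked on a small example. The sketch below is an illustration only: it reuses the eight values from the earlier monotonicity figure in an assumed vertex order (index = bitmask), and the helper names are hypothetical.

```python
def threshold(f_vals, r):
    """Boolean f_r: 0 where f(x) < r, 1 otherwise."""
    return [0 if v < r else 1 for v in f_vals]

def violated_edges(f_vals, n):
    """All hypercube edges (lo, hi) with lo < hi and f(lo) > f(hi)."""
    out = set()
    for lo in range(2 ** n):
        for i in range(n):
            if not (lo >> i) & 1:
                hi = lo | (1 << i)
                if f_vals[lo] > f_vals[hi]:
                    out.add((lo, hi))
    return out

n = 3
f_vals = [5, 4, 5, 8, 6, 7, 3, 9]      # assumed vertex order, for illustration
viol_f = violated_edges(f_vals, n)

# Every violated edge of every f_r is also a violated edge of f.
viol_union = set()
for r in sorted(set(f_vals)):
    viol_r = violated_edges(threshold(f_vals, r), n)
    assert viol_r <= viol_f
    viol_union |= viol_r
```

Conversely, a violated edge (x,y) of f is violated in f_r for r = f(x), so in this sketch the union over all thresholds recovers exactly the violated edges of f.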

(15)

The routing approach

• Any path from s to t contains a violated edge

• There are at least ε_f 2^{n-1} disjoint violated pairs

• (s₁, t₁), (s₂, t₂), … all violations and disjoint

• [Lehman Ron 01] Given k (s_i, t_i) pairs, how many can be routed with edge-disjoint paths?

• [LR01] conjectured this to be k. Would give ε_f 2^{n-1} violated edges for general range

[Figure: violation (s,t) with s < t]

(16)

The routing approach

• [LR01] Given k (s_i, t_i) pairs, how many can be routed with edge-disjoint paths? Conjectured this to be k. Would give ε_f 2^{n-1} violated edges for general range

• [Briet Chakraborty GarciaSoriano Matsliah 10] No! There are Ω(2^n) pairs with only 2^n / n^{1/2} routable with edge-disjoint paths

• Shows [LR01] approach can only give O(n^{3/2}/ε) tester

• No non-trivial upper bound given

[Figure: violation (s,t) with s < t, f(s) > f(t)]

(17)

Beating O(n) testers for boolean range?

• Edge tester: pick u.a.r. x, take random walk of length 1 to get y, compare f(x), f(y)

• Path tester: pick u.a.r. x, take random walk of length t to get y

• t ≈ n^{1/2} is the “right distance”

• Try to find distant violations

[Figure: directed hypercube with a walk from x to y]
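
A rough sketch of the path tester follows. The slide does not spell out the details, so several choices here are assumptions: points are bitmasks, the walk only moves upward in the directed hypercube (flipping 0-bits to 1), and it stops early if x has fewer than t zero coordinates.

```python
import random

def path_tester(f, n, t, q, rng=random):
    """Sample q pairs (x, y), where y is reached from a uniform random x by an
    upward walk of at most t steps; reject if any pair has f(x) > f(y)."""
    for _ in range(q):
        x = rng.randrange(2 ** n)
        zeros = [i for i in range(n) if not (x >> i) & 1]
        y = x
        # Choosing t distinct zero coordinates in random order is the same as
        # taking t steps along random outgoing edges of the directed hypercube.
        for i in rng.sample(zeros, min(t, len(zeros))):
            y |= 1 << i
        if f(x) > f(y):                # x <= y by construction
            return False
    return True

n = 16
t = round(n ** 0.5)                    # the "right distance" t ≈ n^{1/2}
weight = lambda x: bin(x).count("1")   # monotone, so never rejected
accepted = path_tester(weight, n, t, q=100)
```

With t = 1 this reduces to an edge tester restricted to upward steps from x, which is one way to see the two testers as members of the same family.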

(18)

Beating O(n) testers for boolean range?

• Path tester: pick u.a.r. x, take random walk of length n^{1/2} to get y

• [Ron Rubinfeld Safra Weinstein 11] For monotone boolean f, path tester with n^{1/2} queries can approximate average sensitivity

• For anti-monotone f, path tester can approximate number of violated edges

• But can this give an o(n) monotonicity tester?

[Figure: directed hypercube with a walk from x to y]

(19)

Chapter III

(20)

Directed edge isoperimetry

• f: {0,1}^n -> {0,1}

• Think of set S indexed by 1s. Let μ = |S|/2^n, μ ≤ ½

• Φ(S) = E(S, S^c)/2^{n-1}

• [Folklore] Φ(S) ≥ μ

• (Follows from E(S, S^c) ≥ |S|)

• What about directed hypercube?

• Φ^+(S) = E^+(S, S^c)/2^{n-1}

• E^+(S, S^c) = no. of violated edges

[Figure: the set S and its complement]

(21)

Edge testing = directed isoperimetry

• Φ(S) ≥ μ

• Φ^+(S) = E^+(S, S^c)/2^{n-1}

• Implicit from [GGL+00]: f: {0,1}^n -> {0,1}, Φ^+(S) ≥ ε_f

• We give new proof that generalizes to f: {0,1}^n -> R

• And prove that # violated edges ≥ ε_f 2^{n-1}
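
The bound # violated edges ≥ ε_f 2^{n-1} can be sanity-checked by brute force on tiny cubes. The sketch below is only an illustration under assumptions made here: n = 3, random functions with a small range, and ε_f 2^n computed as the fewest points whose removal leaves f monotone.

```python
import random
from itertools import combinations

def min_removals(f_vals, n):
    """Fewest points to delete so that f is monotone on the rest (brute force)."""
    N = 2 ** n
    viol = [(u, v) for u in range(N) for v in range(N)
            if u != v and (u & v) == u and f_vals[u] > f_vals[v]]
    for k in range(N + 1):
        for S in map(set, combinations(range(N), k)):
            if all(u in S or v in S for u, v in viol):
                return k

def count_violated_edges(f_vals, n):
    """Number of hypercube edges (x, x + e_i) with f(x) > f(x + e_i)."""
    return sum(1 for x in range(2 ** n) for i in range(n)
               if not (x >> i) & 1 and f_vals[x] > f_vals[x | (1 << i)])

rng = random.Random(0)
n = 3
checked = 0
for _ in range(100):
    f_vals = [rng.randrange(4) for _ in range(2 ** n)]
    # eps_f * 2^(n-1) equals min_removals / 2
    assert 2 * count_violated_edges(f_vals, n) >= min_removals(f_vals, n)
    checked += 1
```

This is a numerical check on small instances, not a proof; the argument itself is the alternating-path analysis developed in the later chapters.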

(22)

Edge and vertex isoperimetry

• Vertex expansion = Γ(S) = |Nbrs(S)|/2^{n-1}

• [Harper 66] For size s, Γ(S) is minimized by Hamming ball of size s

• S is “dimension cut”: Φ(S) = 1 (small), but Γ(S) = 1 (large)

• S is “majority”: Φ(S) ≈ n^{1/2} (large), but Γ(S) ≈ 1/n^{1/2} (small)

• Size of middle layer is

(23)

Relating edge and vertex boundaries

• [Margulis 74] Φ(S) Γ(S) = Ω(μ^2)

• (μ = |S|/2^n)

• Actually proves it for the product distribution

• If S is dimension cut, Φ(S) = 1, Γ(S) = 1

• If S is majority, Φ(S) ≈ n^{1/2}, Γ(S) ≈ 1/n^{1/2}

(24)

A directed version of Margulis’ theorem

• [Margulis 74] Φ(S) Γ(S) = Ω(μ^2)

• We prove Φ^+(S) Γ^+(S) = Ω((ε_f)^2), where Γ^+(S) = |Nbrs^+(S)|/2^{n-1}

• Either “many” violated edges or “large 1-0 boundary”

• Proof also works with Γ^+(S) := (max # disjoint 1-0 edges)/2^{n-1}

• Either “many” violated edges or “quite a few” disjoint violated edges

(25)

Applications to monotonicity

• Assume ε_f is constant, and we want to find violation in o(n) time

• We have Φ^+(S) Γ^+(S) = Ω(1)

• If Φ^+(S) > n^{1/6}, then # violated edges > n^{1/6} 2^n

• So edge tester with n^{5/6} queries suffices (n 2^n / (n^{1/6} 2^n) = n^{5/6})

• Otherwise, Γ^+(S) > n^{-1/6}
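
The query count in the first case can be spelled out as a short calculation. This is a sketch that uses the edge count n 2^{n-1} from the edge tester slide and ignores constants, as the slide does.

```latex
\[
\frac{\#\,\text{violated edges}}{\#\,\text{edges}}
  \;>\; \frac{n^{1/6}\, 2^{n}}{n\, 2^{n-1}}
  \;=\; 2\, n^{-5/6},
\qquad\text{so}\qquad
q = O\!\left(n^{5/6}\right)\ \text{random edges suffice to hit a violation with constant probability.}
\]
```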

(26)

Applications to monotonicity

• Run edge tester with n^{5/6} queries

• If edge tester fails, Φ^+(S) < n^{1/6}. So Γ^+(S) > n^{-1/6}

• So there are 2^n / n^{1/6} disjoint violated edges

• Only 2^n / n^{1/2} in

• Argue that the path tester finds violation with prob > n^{-5/6}

[Figure: set S with 2^n / n^{1/6} disjoint violated edges]

(27)

Chapter IV

(28)

But how to prove the isoperimetry?

• f: {0,1}^n -> R

• Assume that for edge (x,y), f(x) ≠ f(y)

• Will show that there are ε_f 2^{n-1} violated edges

(29)

The violation graph

• Violation graph (VG): vertices = domain, undirected edge (u,v) if (u,v) is violation

• Consider any vertex cover S of VG.

• No edges if we delete S ≡ No violations if f(x) undefined for all x in S

• ε_f 2^n = |min vertex cover|

• [GGL+00, DGL+99, FLN+02] Let M be maximal violation matching. |M| ≥ ε_f 2^{n-1}

[Figure: violation edge between u and v]
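
For very small n the violation graph, its minimum vertex cover, and a maximal matching can all be computed directly. The sketch below is an illustration under assumptions made here: bitmask encoding of points, brute-force search for the cover, a greedy maximal matching, and example values in an assumed vertex order.

```python
from itertools import combinations

def violation_graph(f_vals, n):
    """Edges (u, v) with u < v in the product order and f(u) > f(v)."""
    N = 2 ** n
    return [(u, v) for u in range(N) for v in range(N)
            if u != v and (u & v) == u and f_vals[u] > f_vals[v]]

def min_vertex_cover_size(edges, N):
    """Brute force over subsets by increasing size (only sensible for tiny N)."""
    for k in range(N + 1):
        for S in map(set, combinations(range(N), k)):
            if all(u in S or v in S for u, v in edges):
                return k

def greedy_maximal_matching(edges):
    """Add each edge whose endpoints are both still unmatched."""
    used, M = set(), []
    for u, v in edges:
        if u not in used and v not in used:
            M.append((u, v))
            used.update((u, v))
    return M

n = 3
f_vals = [5, 4, 5, 8, 6, 7, 3, 9]          # assumed vertex order, for illustration
edges = violation_graph(f_vals, n)
cover = min_vertex_cover_size(edges, 2 ** n)
M = greedy_maximal_matching(edges)
eps_f = cover / 2 ** n                      # distance to monotonicity
```

The endpoints of any maximal matching form a vertex cover, so 2|M| ≥ |min vertex cover| = ε_f 2^n, which is the bound |M| ≥ ε_f 2^{n-1} quoted above.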

(30)

From matchings to edges

• Let VG be weighted with “magnitude of violation”: for u < v with f(u) > f(v), w(u,v) = f(u) - f(v)

• Key idea: let M be matching of maximum weight. (M must be maximal, so |M| ≥ ε_f 2^{n-1})

• Theorem: # violated edges ≥ |M|

[Figure: violated pair (u,v) with values 10 and 6]

(31)

Dimension by dimension

• Let M_i be pairs in M crossing dimension i.

• Not a partition, but ∪_i M_i = M

• Lemma: # violated dimension-i edges ≥ |M_i|

• So total # violated edges ≥ Σ_i |M_i| ≥ |M|
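
The decomposition of M into the sets M_i is easy to make concrete. In the sketch below the matching `M` is a hypothetical example (two comparable pairs written as bitmasks), not data from the talk.

```python
def crossing_dimensions(x, y):
    """Dimensions i in which the pair (x, y) differs."""
    diff = x ^ y
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]

def split_by_dimension(M, n):
    """M_i = pairs of M that cross dimension i."""
    M_i = {i: [] for i in range(n)}
    for x, y in M:
        for i in crossing_dimensions(x, y):
            M_i[i].append((x, y))
    return M_i

n = 3
M = [(0b001, 0b111), (0b010, 0b110)]       # hypothetical matching pairs, x < y
M_i = split_by_dimension(M, n)
total = sum(len(pairs) for pairs in M_i.values())
```

Each pair (x,y) lands in exactly |y – x|₁ of the sets M_i, so the sum of the |M_i| is at least |M|, with equality only when every pair in M is a single edge.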

(32)

Trying to find a violated edge

• Start with (x,y) in M_i. So f(x) > f(y)

• Project down along dimension i to get edge (s₁, y).

• If violation, done. Otherwise f(s₁) < f(y) (using assumption of distinct values at edges)

• f(x) – f(s₁) > f(x) – f(y)

• Suppose s₁ was M-unmatched. Replace (x,y) by (x,s₁) in M to increase weight and contradict maximum weight property of M

[Figure: pair (x,y) crossing dimension i, with s₁ obtained by projecting y down]

(33)

Trying to find a violated edge

• Start with (x,y) in M_i. So f(x) > f(y), and s₁ is matched in M

• Suppose s₁ < s₂, so f(s₁) > f(s₂). Also x < s₂

• f(x) – f(s₂) = [f(x) – f(s₁)] + [f(s₁) – f(s₂)] > [f(x) – f(y)] + [f(s₁) – f(s₂)]

• Max weight of M contradicted

[Figure: x, y, s₁, s₂ across dimension i]

(34)

A few more steps…

[Figure: x, y, s₁, s₂, s₃ across dimension i, with f(x) > f(y), s₁ > s₂, f(s₂) > f(s₁)]

• Project up s₂ to get s₃. So s₃ < y

• If (s₂, s₃) violated, done. So f(s₂) < f(s₃)

• Suppose s₃ is M-unmatched.

• [f(x) – f(s₁)] + [f(s₃) – f(y)] = [f(x) – f(y)] + [f(s₃) – f(s₁)] > [f(x) – f(y)] + [f(s₂) – f(s₁)] (Contradiction!)

(35)

A few more steps…

[Figure: x, y, s₁, s₂, s₃, s₄ across dimension i, with f(x) > f(y), s₁ > s₂, f(s₂) > f(s₁)]

• If (s₂, s₃) violated, done. So f(s₂) < f(s₃)

• Suppose s₃ is M-unmatched.

• [f(x) – f(s₁)] + [f(s₃) – f(y)] = [f(x) – f(y)] + [f(s₃) – f(s₁)] > [f(x) – f(y)] + [f(s₂) – f(s₁)] (Contradiction!)

• Suppose s₄ < s₃, so s₄ < y

• [f(x) – f(s₁)] + [f(s₄) – f(y)] > [f(x) – f(y)] + [f(s₂) – f(s₁)]

(36)

Where we are

• No violated edge encountered so far. Hence, s₄ exists and s₃ < s₄.

• Project down to s₄, and continue…?

[Figure: x, y, s₁, s₂, s₃, s₄, s₅ across dimension i]

(37)

Alternating paths: what’s really going on

• Let H_i be the set of dimension-i edges. It’s a perfect matching.

• Symmetric diff of H_i and M is collection of alternating paths and cycles

• Partition this into segments between M_i-pairs

• Claim: Each segment has a violating edge.

[Figure: alternating path of H_i and M edges through x, y, s₁, s₂, …, s₅, s₆]

(38)

The basic tools

• For even i: if s_i exists, so does s_{i+1}. (H_i is perfect matching.)

• What happens at odd i?

– s_i not matched by M, so alternating path ends

– s_i matched by M and (s_i, z) in M_i, so segment ends

– s_i matched by M and (s_i, s_{i+1}) not in M_i, so segment continues

• We show: if no violated edge seen so far, only the last can happen

• If no violated edge ever seen, segment goes on forever. Contradiction

[Figure: alternating path through x, y, s₁, s₂, s₃, s₄, s₅, s₆]

(39)

The basic tools

• For even i: if s_i exists, so does s_{i+1}. (H_i is perfect matching.)

• What happens at odd i?

• Suppose no violated edge seen so far.

• Structure lemma: For j = 1 (mod 4), s_j > s_{j+1}. For j = 3 (mod 4), s_j < s_{j+1}

• Progress lemma: s_i is matched in M, so s_{i+1} also exists. (Structure implies (s_i, s_{i+1}) not in M_i.)

• Hence, if no violated edge ever seen, segment goes on forever.

[Figure: alternating path through x, y, s₁, s₂, s₃, s₄, s₅, s₆]

(40)

Proving structure

• s_i exists. For j < i: if j = 1 (mod 4), s_j > s_{j+1}. If j = 3 (mod 4), s_j < s_{j+1}

• Prove by induction on j. (And a contradiction)

• Suppose true for all j’ < j, but the inequality is reversed at j

• Then remove red pairs from M and add blue pairs.

• Argue (index chasing) that weight has increased.

• Induction hypothesis gives handle on red weight

• w(Blue) = [f(x) – f(s₁)] + [f(s₃) – f(y)] + [f(s₂) – f(s₅)] + [f(s₈) – f(s₄)]

• w(Red) = [f(x) – f(y)] + [f(s₂) – f(s₁)] + [f(s₃) – f(s₄)] + [f(s₆) – f(s₅)] + [f(s₈) – f(s₇)]

• w(Blue) > w(Red)

[Figure: alternating path through x, y, s₁, s₂, s₃, s₄, s₅, s₆ with red and blue pairs]

(41)

Proving structure

• s_i exists. For j < i: if j = 1 (mod 4), s_j > s_{j+1}. If j = 3 (mod 4), s_j < s_{j+1}

• Then remove red pairs from M and add blue pairs.

• Argue (index chasing) that weight has increased.

• w(Blue) = w(Red) + [f(s₇) – f(s₆)]

• [f(s₇) – f(s₆)] is positive because (s₆, s₇) is not a violated edge!

• w(Blue) > w(Red). Contradiction.

[Figure: alternating path through x, y, s₁, s₂, …, s₇; annotation “Must be <”]

(42)

Proving progress

• s_i exists. What about s_{i+1}?

• Suppose s_i unmatched in M

• Replace red by blue, and go through the motions

• I’ll spare you the details…

[Figure: alternating path through x, y, s₁, s₂, s₃, s₄, s₅, s₆]

(43)

Recap

• Segment of alternating path between two M_i pairs has violated edge

• Hence, # violated H_i edges ≥ |M_i|

• Hence, # violated edges ≥ ∑_i |M_i| ≥ ε_f 2^{n-1}

• The rearrangement idea: alternating paths generated from max weight matchings in VG are highly structured

[Figure: alternating paths through x, y, s₁, s₂, s₃, s₄, s₅, s₆]

(44)

Chapter V

The directed version of Margulis’

theorem

(45)

For the boolean range

• Assumption of distinct values removed by perturbation argument

• # violated edges ≥ ∑_i |M_i| ≥ ε_f 2^{n-1}

• Consider (x,y) in M. This belongs to |y – x|₁ different M_i’s

• # violated edges ≥ ∑_{(x,y) in M} |y – x|₁

[Figure: set S with pairs (x,y) of M crossing dimension i]

(46)

For the boolean range

• # violated edges ≥ ∑_{(x,y) in M} |y – x|₁ ≥ ε_f 2^{n-1}

• Φ^+(S) = (# violated edges)/2^n = Ω(1)

• If Φ^+(S) < r, then 2^{-n} ∑_{(x,y) in M} |y – x|₁ < r

• Average distance between pairs in M < r

• For constant ε_f, want to show Φ^+(S) Γ^+(S) = Ω(1)

⇒ Want to show: if Φ^+(S) < r, Γ^+(S) > 1/r

⇒ If Φ^+(S) < r, (# disjoint violated edges) > 2^n / r

[Figure: set S with pairs (x,y) of M]

(47)

A routing theorem

• [Lehman Ron 01] Consider k comparable (s_i, t_i) pairs in two levels. (For all i, s_i < t_i.) We can route k (s_i, t_j) pairs in vertex-disjoint paths

• If k pairs are in M, there are k disjoint edge violations in between the levels

[Figure: sources s₁, s₂, s₃, s₄ on one level and targets t₁, t₂, t₃, t₄ on another]

(48)

Getting the disjoint edges

• Take pairs of M between two levels, apply LR theorem to get disjoint violated edges

• Average distance between pairs in M < r, so mostly violated edges obtained between close levels

• Total number of disjoint violated edges > |M|/r = Ω(2^n / r)

• If number of violated edges < r 2^n, then number of disjoint violated edges > 2^n / r

(49)

Chapter VI

(50)

Summarizing

• Number of violated edges ≥ ε_f 2^{n-1}

• If number of violated edges ≤ ε_f r 2^{n-1}, then number of disjoint violated edges ≥ ε_f 2^{n-1} / r

• We can get an O(ε^{-5/3} n^{5/6}) tester for boolean monotonicity by running edge tester and path testers with O(ε^{-5/3} n^{5/6}) queries

(51)

Monotonicity for hypergrids

• f: [k]^n -> R. x ≤ y if for all i, x_i ≤ y_i

• k=2, hypercube. n=1, total order. And everything in between

• [DGL+99, EKK+99, AC04, HK04] O(ε^{-1} n log k log |R|) and O(ε^{-1} 2^n log k) testers

• Theorem: There is an O(ε^{-1} n log k)-query tester for monotonicity on hypergrids

• [Blais-Raskhodnikova-Yaroslavtsev 13, Chakrabarty S 13]

[Figure: grid with origin (0,0)]

(52)

The Lipschitz property

• f: [k]^n -> R is c-Lipschitz if for all nbrs (x,y), |f(x) – f(y)| ≤ c

• For all (x,y), |f(x) – f(y)| ≤ c |x – y|₁

• [Jha Raskhodnikova 11] “Testing c-Lipschitz”. Applications to differential privacy

• [JR11, Awasthi Jha Molinari Raskhodnikova 12] R = δZ. There is O((δε)^{-1} n^2 k log k) tester.

• Theorem: There is an O(ε^{-1} n log k)-query tester for c-Lipschitz on hypergrids.

[Figure: grid of values 4 5 4 4 5 4 3 3 4 3 2 2 3 2 2 1]
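
The neighbour-based definition of c-Lipschitz can be checked exhaustively on a small hypergrid. In the sketch below, the 4×4 row-by-row layout of the sixteen values from the slide is an assumption about how the figure was arranged, and `is_c_lipschitz` is a hypothetical helper name.

```python
from itertools import product

def is_c_lipschitz(f, k, n, c):
    """Check |f(x) - f(y)| <= c over all neighbouring pairs (x, y) of [k]^n."""
    for x in product(range(k), repeat=n):
        for i in range(n):
            if x[i] + 1 < k:
                y = x[:i] + (x[i] + 1,) + x[i + 1:]   # neighbour along dimension i
                if abs(f(x) - f(y)) > c:
                    return False
    return True

# Assumed layout: the slide's 16 values read row by row into a 4x4 grid.
grid = [[4, 5, 4, 4],
        [5, 4, 3, 3],
        [4, 3, 2, 2],
        [3, 2, 2, 1]]
f = lambda x: grid[x[0]][x[1]]
one_lipschitz = is_c_lipschitz(f, k=4, n=2, c=1)
```

Checking only neighbouring pairs is enough: summing the bound along a shortest grid path from x to y gives |f(x) – f(y)| ≤ c |x – y|₁ for every pair.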

(53)

The obvious open problems

• Monotonicity for f: {0,1}^n -> {0,1}

• We have O(ε^{-5/3} n^{5/6}) tester. Improve it!

• [FLN+02] Best lower bounds: Ω(log n) for general testers, Ω(n^{1/2}) for non-adaptive testers

• [Blais Brody Matulef 11] Communication complexity reductions

– [BRY 13, CS 13] Progress on lower bounds for general ranges

• f: {0,1}^n -> R, where |R| < n^{1/2}

• [DGL+99] Reducing general R to {0,1}

(54)

The path tester

• We think path tester with O(n^{1/2}) queries is a bona fide tester

• But our approach won’t get there. Can probably chip off exponent in O(n^{5/6})

• Path and edge tester are “pair testers”. Define some distribution on pairs (x,y). Sample repeatedly

• [BCGM10] Any pair tester requires Ω(ε^{-1} n/log n) queries

• Look beyond path testers. Correlate queries, or use

(55)

Appendix

(56)

Proving path tester works

• Two sets S and T, perfectly matched by directed edges

• Let |S|/2^n = μ

• What is the probability that directed random walk of length Θ(n^{1/2}) starts in S and lands in T?

• What is the probability that walk starts and lands in S?

• Claim: Prob = Ω(μ^2)

(57)

Proving path tester works

• What is the probability that walk starts and lands in S?

• Claim: Prob = Ω(μ^2)

• Pr(x,y) is prob that path tester starts at x and ends at y

• Claim: Pr(x,y’) ≥ Pr(x,y)/n^{1/2}

• Probability of starting in S and ending in T is Ω(μ^2 / n^{1/2})

[Figure: sets S and T with points x and y]