• No results found

Towards a Constructive Version of Banaszczyk's Vector Balancing Theorem

N/A
N/A
Protected

Academic year: 2020

Share "Towards a Constructive Version of Banaszczyk's Vector Balancing Theorem"

Copied!
58
0
0

Loading.... (view fulltext now)

Full text

(1)

www.theoryofcomputing.org

S

PECIAL ISSUE

: APPROX-RANDOM 2016

Towards a Constructive Version of

Banaszczyk’s Vector Balancing Theorem

Daniel Dadush

Shashwat Garg

Shachar Lovett

Aleksandar Nikolov

§

Received December 11, 2016; Revised October 8, 2019; Published December 13, 2019

Abstract: An important theorem of Banaszczyk (Random Structures & Algorithms 1998) states that for any sequence of vectors of `2 norm at most 1/5 and any convex body K

of Gaussian measure 1/2 inRn, there exists a signed combination of these vectors which lands insideK. A major open problem is to devise a constructive version of Banaszczyk’s vector balancing theorem, i. e., to find an efficient algorithm which constructs the signed combination.

We make progress towards this goal along several fronts. As our first contribution, we show an equivalence between Banaszczyk’s theorem and the existence ofO(1)-subgaussian distributions over signed combinations. For the case of symmetric convex bodies, our equivalence implies the existence of auniversalsigning algorithm (i. e., independent of the An extended abstract of this paper appeared in theProceedings of the 20th International Workshop on Randomization and Computation, 2016.

Supported by the NWO Veni grant 639.071.510.

Supported by the Netherlands Organisation for Scientific Research (NWO) under project no. 022.005.025.Supported by an NSF CAREER award 1614023.

§Supported by an NSERC Discovery grant, application number RGPIN-2016-06333.

ACM Classification:F.2.2

AMS Classification:68W20

(2)

body), which simply samples from the subgaussian sign distribution and checks to see if the associated combination lands inside the body. For asymmetric convex bodies, we provide a novel recentering procedure, which allows us to reduce to the case where the body is symmetric.

As our second main contribution, we show that the above framework can be efficiently implemented when the vectors have lengthO(1/√logn), recovering Banaszczyk’s results un-der this stronger assumption. More precisely, we use random walk techniques to produce the requiredO(1)-subgaussian signing distributions when the vectors have lengthO(1/√logn), and use a stochastic gradient ascent method to implement the recentering procedure for asymmetric bodies.

1

Introduction

Given a family of setsS1, . . . ,Smover a universeU= [n], the goal of combinatorial discrepancy

mini-mization is to find a bi-coloringχ:U→ {−1,1}such that the discrepancy, i. e., the maximum imbalance

maxj∈[m]|∑i∈Sjχ(i)|, is made as small as possible. Discrepancy theory, where discrepancy minimization

plays a major role, has a rich history of applications in computer science as well as mathematics, and we refer the reader to [24,12,13] for a general exposition.

A beautiful question regards the discrepancy of sparse set systems, i. e., set systems in which each element appears in at mostt sets. A classical theorem of Beck and Fiala [9] gives an upper bound of 2t−1 in this setting. They also conjectured anO(√t)bound, which if true would be tight. An improved Beck–Fiala bound of 2t−log∗twas given by Bukh [11], where log∗tis the iterated logarithm function in base 2. Recently, it was shown by Ezra and Lovett [17] that a bound ofO(√tlogt)holds with high probability whenm≥nand each element is assigned totsets uniformly at random. The best general bounds having sublinear dependence int currently depend on n or m. Srinivasan [33] used Beck’s

partial coloring method[8] to give a bound of O(√tlog min{n,m}). Using techniques from convex geometry, Banaszczyk [2] proved a general result on vector balancing (stated below) which implies an

O(ptlog min{n,m})bound.

The proofs of both Srinivasan’s and Banaszczyk’s bounds were non-constructive, that is, they provided no efficient algorithm to construct the guaranteed colorings, short of exhaustive enumeration. In the last six years, tremendous progress has been made on the question of matching classical discrepancy bounds algorithmically. Currently, essentially all discrepancy bounds proved using the partial coloring method, including Srinivasan’s, have been made constructive [4,23,19,30,16]. Constructive versions of Banaszczyk’s result have, however, proven elusive until very recently. In recent work [5], the first and second named authors jointly with Bansal gave a constructive algorithm for recovering Banaszczyk’s bound in the Beck–Fiala setting as well as the more general Komlós setting. An alternate algorithm via multiplicative weight updates was also given recently in [21]. However, finding a constructive version of Banaszczyk’s more general vector balancing theorem, which has further applications in approximating hereditary discrepancy, remained an open problem until after an extended abstract [14] of the present paper was published. This theorem is stated as follows:

Theorem 1.1 (Banaszczyk [2]). Let v1, . . . ,vn ∈Rm satisfykvik2≤1/5. Then for any convex body K⊆Rmof Gaussian measure at least1/2, there exists

(3)

In the theorem statement above, and in the rest of the paper we use “Gaussian measure” to refer to the standard Gaussian measure. The lower bound 1/2 on the Gaussian measure ofKis easily seen to be tight. In particular, if all the vectors are equal to 0, we must have that 0∈K. If we allow Gaussian measure<1/2, thenK={x∈Rn:x

1≥ε}, forε>0 small enough, is a clear counterexample. On the

other hand, it is not hard to see that ifKhas Gaussian measure 1/2 then 0∈K. Otherwise, there exists a halfspaceHcontainingKbut not 0, whereHclearly has Gaussian measure less than 1/2.

Banaszczyk’s theorem gives the best known bound for the notorious Komlós conjecture [32], a generalization of the Beck–Fiala conjecture, which states that for any sequence of vectorsv1, . . . ,vn∈Rm of`2norm at most 1, there existsχ∈ {−1,1}nsuch thatk∑ni=1χivik∞is a constant independent ofmand

n. In this context, Banaszczyk’s theorem gives a bound ofO(√logm), because anO(√logm)scaling of the unit ball of`mhas Gaussian measure 1/2. Banaszczyk’s theorem together with estimates on the Gaussian measure of slices of the`mball due to Barthe, Guedon, Mendelson, and Naor [7] give a bound of

O(√logd), whered≤min{m,n}is the dimension of the span ofv1, . . . ,vn. A well-known reduction (see,

e. g., Lecture 9 in [32]), shows that this bound for the Komlós problem implies anO(ptlog min{m,n})

bound in the Beck—Fiala setting.

While the above results only deal with the case ofKbeing a cube, Banaszczyk’s theorem has also been applied to other cases. It was used in [3] to give the best known bound on the Steinitz conjecture. In this problem, the input is a set of vectorsv1, . . . ,vninRmof norm at most one and summing to 0. The

aim is to find a permutationπ:[n]→[n]to minimise the maximum sum prefix of the vectors rearranged

according toπi. e., to minimize

max

k∈[n]

k

i=1 vπ(i)

.

The Steinitz conjecture is that this bound should always beO(√m), irrespective of the number of vectors, and using the vector balancing theorem Banaszczyk proved a bound ofO(√m+√logn)for the`2norm.

More recently, Banaszczyk’s theorem was applied to more general symmetric polytopes in Nikolov and Talwar’s approximation algorithm [28] for a hereditary notion of discrepancy. Hereditary discrepancy is defined as the maximum discrepancy of any restriction of the set system to a subset of the universe. In [28] it was shown that an efficiently computable quantity, denotedγ2, bounds hereditary discrepancy

from above and from below for any given set system, up to polylogarithmic factors. For the upper bound they used Banaszczyk’s theorem for a natural polytope associated with the set system. However, without an algorithmic version of of Banaszczyk’s theorem for a general body, it is not clear how to efficiently compute colorings that achieve the discrepancy upper bounds in terms ofγ2. The recent work

on algorithmic bounds in the Komlós setting [5] does not address this more general problem.

Banaszczyk’s proof ofTheorem 1.1follows an ingenious induction argument, which folds the effect of choosing the sign of vn into the body K. The first observation is that finding a point of the set ∑ni=1{−vi,vi}insideKis equivalent to finding a point of∑in=1−1{−vi,vi}inK−vn∪K+vn. Inducting on

this set is not immediately possible because it may no longer be convex. Instead, Banaszczyk shows that there exists a convex subsetK∗vnof(K−vn)∪(K+vn)with Gaussian measure at least that ofK, as

long asKhas measure at least 1/2, which allows induction onK∗vn. In the base case, one needs to show

(4)

proof can be seen as relatively mysterious, as it does not seem to provide any tangible insights as to what the colorings look like.

1.1 Our results

As our main contribution, we help demystify Banaszczyk’s theorem, by showing that it is equivalent, up to a constant factor in the length of the vectors, to the existence of certain subgaussian coloring distributions. Using this equivalence, as our second main contribution, we give an efficient algorithm that recovers Banaszczyk’s theorem up to aO(plog min{m,n})factor for all convex bodies. This improves upon the best previous algorithms of Rothvoss [30], Eldan and Singh [16], which only recover the theorem for symmetric convex bodies up to aO(log min{m,n})factor.

As a major consequence of our equivalence, we show that for any sequence v1, . . . ,vn∈Rm of sufficiently short vectors there exists a probability distributionχ ∈ {−1,1}n over colorings such that,

forany symmetric convex body K⊆Rmof Gaussian measure at least 1/2, the random variable

∑ni=1χivi

lands insideKwith probability at least 1/2. Importantly, if such a distribution can be efficiently sampled, we immediately get auniversal samplerfor constructing Banaszczyk colorings for all symmetric convex bodies (we remark that the recent work of [5] constructs a more restricted form of such distributions). Using random walk techniques, we show how to implement an approximate version of this sampler efficiently, which guarantees the same conclusion when the vectors are of lengthO(1/plog min{m,n}). We provide more details on these results inSection 1.1.1andSection 1.1.2.

To extend our results to asymmetric convex bodies, we develop a novelrecentering procedureand a corresponding efficient implementation which allows us to reduce the asymmetric setting to the symmetric one. After this reduction, a slight extension of the aforementioned sampler again yields the desired colorings. We note that our recentering procedure in fact depends on the target convex body, and hence our algorithms are no longer universal in this setting. We provide more details on these results inSection 1.1.3

andSection 1.1.4.

Interestingly, we additionally show that this procedure can be extended to yield a completely different coloring algorithm, i. e., not using the sampler, achieving the sameO(plog min{m,n})approximation factor. Surprisingly, the coloring outputted by this procedure is essentially deterministic and has a natural analytic description, which may be of independent interest.

Before we continue with a more detailed description on our results, we begin with some terminology and a well-known reduction. Given a set of vectorsv1, . . . ,vn∈Rm, we shall call a propertyhereditaryif it holds for all subsets of the vectors. We note that Banaszczyk’s vector balancing bounds restricted to a set of vectors are hereditary, since a bound on the maximum`2 norm of the vectors is hereditary. We

shall say that a property of colorings holds in thelinear setting, if when given any shift

t∈

n

i=1

[−vi,vi]def=

(

n

i=1

λivi:λ∈[−1,1]n

)

,

one can find a coloring (or distribution on colorings)χ∈ {−1,1}nsuch that∑ni=1χivi−tsatisfies the

(5)

from the general inequality between hereditary and linear discrepancy proved by Lovász, Spencer, and Vesztergombi [22].

All the results in this work will in fact hold in the linear setting. When treating the linear setting, it is well known that one can always reduce to the case where the vectorsv1, . . . ,vnare linearly independent,

and in our setting, whenm=n. In particular, assume we are given some shiftt∈∑ni=1[−vi,vi]and

thatv1, . . . ,vnarenotlinearly independent. Then, using a standard linear algebraic technique, we can

find a “fractional coloring”x∈[−1,1]nsuch that∑ni=1xivi=t, and the vectors(vi:i∈Ax)are linearly

independent, whereAx def

={i:xi ∈(−1,1)}is the set of fractional coordinates (see Lecture 5 in [32],

or Chapter 4 in [24]). We can think of this as a reduction to coloring the linearly independent vectors indexed byAx. Specifically, givenxas above, define the lifting functionLx:[−1,1]Ax →[−1,1]nby

Lx(z)i=

(

zi :i∈Ax, xi :i∈[n]\Ax,

∀i∈[n]. (1.1)

This map takes any coloring χ ∈ {−1,1}Ax and “lifts” it to a full coloring Lx(χ)∈ {−1,1}n. It also

satisfies the property that

Lx(χ)−t=

i∈Ax

χivi−

i∈Ax

xivi.

So, if we can find a coloringχ∈ {−1,1}Axsuch that

i∈Ax

χivi−

i∈Ax

xivi∈K,

then we would haveLx(χ)−t∈Kas well. Moreover, if we defineWas the span of(vi:i∈Ax), then

i∈Ax

χivi−

i∈Ax

xivi∈K if and only if

i∈Ax

χivi−

i∈Ax

xivi∈K∩W,

so we can replaceKwithK∩W, and work entirely insideW. For convex bodiesKwith Gaussian measure at least 1/2, the central sectionK∩W has Gaussian measure that is at least as large, so we have reduced the problem to the case of|Ax|linearly independent vectors in an|Ax|-dimensional space. (SeeSection 2

for the full details.) We shall thus, for simplicity, state all our results in the setting where the vectors

v1, . . . ,vnare inRnand are linearly independent.

1.1.1 Symmetric convex bodies and subgaussian distributions

In this section, we detail the equivalence of Banaszczyk’s theorem restricted to symmetric convex bodies with the existence of certain subgaussian distributions. We begin with the main theorem of this section, which we note holds in a more general setting than Banaszczyk’s result.

Theorem 1.2 (Main Equivalence). Let T ⊆Rn be a finite set. Then, the following parameters are equivalent up to a universal constant factor independent of T and n:

(6)

2. The minimum sg>0such that there exists an sg-subgaussian random variable Y supported on T .

We recall that a random vectorY ∈Rn iss-subgaussian, or subgaussian with parameter s, if for

any unit vectorθ ∈Sn−1 andt≥0, Pr[|hY,θi| ≥t]≤2e−(t/s)2/2. In words,Y is subgaussian if all its 1-dimensional marginals satisfy the same tail bound as the 1-dimensional Gaussian of mean 0 and standard deviations.

To apply the above to discrepancy, we setT =∑ni=1{−vi,vi}, i. e., all signed combinations of the

vectorsv1, . . . ,vn∈Rn. In this context, Banaszczyk’s theorem directly implies thatsb≤5 maxi∈[n]kvik2,

and hence by our equivalence thatsg=O(1)maxi∈[n]kvik2. Furthermore, the above extends to the linear

setting lettingT =∑ni=1{−vi,vi} −t, fort∈∑ni=1[−vi,vi], because, as mentioned above, Banaszczyk’s

theorem extends to this setting as well.

The existence of the universal sampler claimed in the previous section is in fact the proof that

sb=O(sg)in the above Theorem. In particular, it follows directly from the following lemma.

Lemma 1.3. Let Y ∈Rnbe an s-subgaussian random variable. There exists an absolute constant c>0, such that for any symmetric convex body K⊆Rnof Gaussian measure at least1/2,Pr[Y s·cK]1/2.

Here, ifY is thesg-subgaussian distribution supported on∑ni=1{−vi,vi} −t as above, we simply

letχ denote the random variable such thatY =∑ni=1χivi−t. Thatχ now yields the desired universal

distribution on colorings is exactly the statement of the lemma.

As a consequence of the above, we see that to recover Banaszczyk’s theorem for symmetric convex bodies, it suffices to be able to efficiently sample from anO(1)-subgaussian distribution over sets of the type∑ni=1{−vi,vi} −t, fort∈∑ni=1[−vi,vi], whenv1, . . . ,vn∈Rnare linearly independent and have`2

norm at most 1. Here we rely on homogeneity, that is, ifY is ans-subgaussian random variable supported on∑ni=1{−vi,vi} −tthenαY isαs-subgaussian on∑ni=1{−αvi,αvi} −αt, forα>0.

The proof ofLemma 1.3(seeSection 3for more details) follows relatively directly from well-known convex geometric estimates combined with Talagrand’s majorizing measures theorem, which gives a powerful characterization of the supremum of any Gaussian process.

Unfortunately,Lemma 1.3does not hold for asymmetric convex bodies. In particular, ifY =−e1, the

negated first standard basis vector, andK={x∈Rn:x

1≥0}, the conclusion is clearly false no matter

how much we scaleK, even thoughY isO(1)-subgaussian andKhas Gaussian measure 1/2. One may perhaps hope that the conclusion still holds if we ask for eitherY or−Y to be ins·cKin the asymmetric setting, though we do not know how to prove this. We note however that this only makes sense when the support ofY is symmetric, which does not necessarily hold in the linear discrepancy setting.

We now describe the high level idea of the proof for the reverse direction, namely, thatsg=O(sb).

For this purpose, we show that the existence of aO(sb)-subgaussian distribution onT can be expressed

as a two player zero-sum game, i. e., the first player chooses a distribution onT and the second player tries to find a non-subgaussian direction. Here the value of the game will be small if and only if the

O(sb)-subgaussian distribution exists. To bound the value of the game, we show that an appropriate

(7)

1.1.2 The random walk sampler

From the algorithmic perspective, it turns out that subgaussianity is a very natural property in the context of random walk approaches to discrepancy minimization. Our results can thus be seen as a good justification for the random walk approaches to making Banaszczyk’s theorem constructive.

At a high level, in such approaches one runs a random walk over the coordinates of a “fractional coloring”χ∈[−1,1]nuntil all the coordinates hit either 1 or−1. The steps of such a walk usually come

from Gaussian increments (though not necessarily spherical), which try to balance the competing goals of keeping discrepancy low and moving the fractional coloringχcloser to{−1,1}n. Since a sum of small

centered Gaussian increments is subgaussian with the appropriate parameter, it is natural to hope that the output of a correctly implemented random walk is subgaussian. Our main result in this setting is that this is indeed possible to a limited extent, with the main caveat being that the walk’s output will not be “subgaussian enough” to fully recover Banaszczyk’s theorem.

Theorem 1.4. Let v1, . . . ,vn∈Rnbe vectors of`2norm at most1and let t∈∑ni=1[−vi,vi]. Then, there is an expected polynomial-time algorithm which outputs a random coloringχ∈ {−1,1}nsuch that the random variable∑ni=1χivi−t is O(

logn)-subgaussian.

To achieve the above sampler, we guide our random walk using solutions to the so-called vector Kómlos program, whose feasibility was first given by Nikolov [26], and show subgaussianity using well-known martingale concentration bounds. Interestingly, the random walk’s analysis does not rely on phases, and is instead based on a simple relation between the walk’s convergence time and the subgaussian parameter. As an added bonus, we also give a new and simple constructive proof of the feasibility of the vector Kómlos program (seeSection 5for details) which avoids the use of an SDP solver.

Given the results of the previous section, the above random walk is a universal sampler for constructing the following colorings.

Corollary 1.5. Let v1, . . . ,vn∈Rnbe vectors of`2norm at most1, let t∈∑ni=1[−vi,vi], and let K⊆Rn

be a symmetric convex body of Gaussian measure1/2(given by a membership oracle). Then, there is an expected polynomial-time algorithm which outputs a coloringχ∈ {−1,1}nsuch that∑ni=1χivi−t∈ O(√logn)K.

As mentioned previously, the best previous algorithms in this setting are due to Rothvoss [30], Eldan and Singh [16], which find a signed combination insideO(logn)K. Furthermore, these algorithms are not universal, i. e., they heavily depend on the bodyK. We note that these algorithms are in fact tailored to findpartial coloringsinside a symmetric convex bodyKof Gaussian measure at least 2−cn, forc>0 small enough, a setting in which our sampler does not provide any guarantees.

(8)

K; the walk in [23] is only well-defined in a polytope, while the one in [19] remains well-defined in any convex body, although the analysis still applies only to the polyhedral setting. Most directly related to this paper is the recent work [5], which gives a random walk that can be viewed as a randomized variant of the original 2t−1 Beck–Fiala proof. This walk induces a distributionχ∈ {−1,1}non colorings for

whicheach coordinateof the output∑ni=1χivi isO(1)-subgaussian. From the discrepancy perspective,

this gives a sampler which finds colorings inside any axis parallel box of Gaussian measure at least 1/2 (and their rotations, though not in a universal manner), matching Banaszczyk’s result for this class of convex bodies.

1.1.3 Asymmetric convex bodies

In this section, we explain how our techniques extend to the asymmetric setting. The main difficulty in the asymmetric setting is that one cannot hope to increase the Gaussian mass of an asymmetric convex body by simply scaling it. In particular, if we takeK⊆Rnto be a halfspace through the origin, e. g.,

{x∈Rn:x1≥0}, thenKhas Gaussian measure exactly 1/2 butsK=K for alls>0. At a technical

level, the lack of any measure increase under scaling breaks the proof ofLemma 1.3, which is crucial for showing that subgaussian coloring distributions produce combinations that land insideK.

The main idea to circumvent this problem will be to reduce to a setting where the mass ofK is “symmetrically distributed” about the origin, in particular, when the barycenter ofKunder the induced Gaussian measure is at the origin. For such a bodyK, we show that a constant factor scaling ofK∩ −K

also has Gaussian mass at least 1/2, yielding a direct reduction to the symmetric setting.

To achieve this reduction, we will use a novelrecentering procedure, which will both carefully fix certain coordinates of the coloring as well as shift the bodyK to make its mass more “symmetrically distributed.” The guarantees of this procedure are stated below:

Theorem 1.6(Recentering Procedure). Let v1, . . . ,vn∈Rnbe linearly independent, t∈∑ni=1[−vi,vi], and K⊆Rn be a convex body of Gaussian measure at least1/2. Then, there exists a fractional coloring x∈[−1,1]n, such that for p=

∑ni=1xivi−t, Ax={i∈[n]:xi∈(−1,1)}and W =span(vi:i∈Ax), the following inequalities hold.

1. p∈K.

2. The Gaussian measure of(K−p)∩W on W is at least the Gaussian measure of K. 3. The barycenter of(K−p)∩W is at the origin, i. e.,

Z

(K−p)∩W

ye−kyk2/2dy=0. (1.2)

(9)

procedure can also be extended to yield a full coloring algorithm. We explain the high level details of its implementation together with this extension in the next subsection.

To explain how to use the fractional coloringxfromTheorem 1.6to get a useful reduction, recall the lifting functionLx:[−1,1]Ax→[−1,1]ndefined in (1.1). We reduce the initial vector balancing problem

to the problem of finding a coloringχ∈ {−1,1}Ax such that

i∈Ax

χivi−

i∈Ax

xivi∈(K−p)∩W

(note that∑i∈Axχivi−∑i∈Axxivi∈W by construction). Then we can lift this coloring toLx(χ), which

satisfies

i∈Ax

χivi−

i∈Ax

xivi∈(K−p)∩W⇔ n

i=1

Lx(χ)ivi−t∈K.

From here, the guarantee thatK0def= (K−p)∩W has Gaussian measure at least 1/2 and barycenter at the origin allows a direct reduction to the symmetric setting. Namely, we can replaceK0by the symmetric convex bodyK0∩ −K0without losing “too much” of the Gaussian measure ofK0. This is formalized by the following extension ofLemma 1.3, which directly implies a reduction to subgaussian sampling as in

Section 1.1.1.

Lemma 1.7. Let Y ∈Rn be an s-subgaussian random variable. There exists an absolute constant

c>0, such for any convex body K⊆Rnof Gaussian measure at least1/2and barycenter at the origin,

Pr[Y ∈s·c(K∩ −K)]≥1/2.

In particular, if there exists a distribution over coloringsχ∈ {−1,1}Ax such that

i∈Ax

χivi−

i∈Ax

xivi

as above is 1/c-subgaussian,Lemma 1.7implies that the random signed combination lands insideK0

with probability at least 1/2. Thus, the asymmetric setting can be effectively reduced to the symmetric one, as claimed.

Crucially, the recentering procedure inTheorem 1.6can be implemented in probabilistic polynomial time if one relaxes the barycenter condition from being exactly 0 to having “small” norm (seeSection 6.3

for details). Furthermore, the estimate inLemma 1.7 will be robust to such perturbations. Thus, to constructively recover the colorings in the asymmetric setting, it will still suffice to be able to generate good subgaussian coloring distributions.

Combining the sampler fromTheorem 1.4together with the recentering procedure, we constructively recover Banaszczyk’s theorem for general convex bodies up to aO(√logn)factor.

Theorem 1.8(Weak Constructive Banaszczyk). There exists a probabilistic polynomial-time algorithm which, on input a linearly independent set of vectors v1, . . . ,vn∈Rn of`2 norm at most 1, and t ∈

∑ni=1[−vi,vi], and a (not necessarily symmetric) convex body K⊆Rnof Gaussian measure at least1/2

(given by a membership oracle), computes a coloring χ ∈ {−1,1}n such that with high probability ∑ni=1χivi−t∈c

p

(10)

As far as we are aware, the above theorem gives the first algorithm to recover Banaszczyk’s result for asymmetric convex bodies under any non-trivial restriction. In this context, we note that the algorithm of Eldan and Singh [16] finds “relaxed” partial colorings, i. e.„ where the fractional coordinates of the coloring are allowed to fall outside[−1,1], lying inside ann-dimensional convex body of Gaussian measure at least 2−cn. However, it is unclear how one could use such partial colorings to recover the

above result, even with a larger approximation factor.

1.1.4 The recentering procedure

In this section, we describe the details of the recentering procedure. We leave a thorough description of its algorithmic implementation however toSection 6.3, and only provide its abstract instantiation here.

Before we begin, we give a more geometric view of the vector balancing problem and the recentering procedure, which help clarify the exposition. Letv1, . . . ,vn∈Rnbe linearly independent vectors and

t∈∑ni=1[−vi,vi]. Given the target bodyK⊆Rnof Gaussian measure at least 1/2, we can restate the vector balancing problem geometrically as that of finding a vertex of the parallelepipedP=∑ni=1[−vi,vi]−t

lying insideK. Here, the choice oft ensures that 0∈P. Note that this condition is necessary, since otherwise there exists a halfspace separatingPfrom 0 having Gaussian measure at least 1/2.

Recall now that in the linear setting, and using this geometric language, Banaszczyk’s theorem implies that ifPcontains the origin, and maxi∈[n]kvik2≤1/10 (which we do not need to assume here), then any

convex body of Gaussian measure at least 1/2 contains a vertex ofP. Thus, for our given target bodyK, we should make our situation better replacingPandKbyP−qandK−q, ifq∈Pis a shift such that

K−qhas higher Gaussian measure thanK. In particular, given the symmetry of Gaussian measure, one would intuitively expect that if the Gaussian mass ofKis not symmetrically distributed around 0, there should be a shift ofKwhich increases its Gaussian measure.

In the current language, fixing a colorχi∈ {−1,1}for vectorvi, corresponds to restricting ourselves to

finding a vertex in the facetF=χivi+∑j6=i[−vj,vj]−tofPlying insideK. Again intuitively, restricting

to a facet ofPshould improve our situation if the Gaussian measure of the corresponding slice ofKin the lower dimension is larger than that ofK. To make this formal, note that when inducting on a facetF

ofP(which is ann−1 dimensional parallelepiped), we must choose a centerq∈Fto serve as the new origin in the lower dimensional space. Precisely, this can be expressed as inducting on the parallelepiped

F−qand shifted slice(K−q)∩span(F−q)ofK, using then−1 dimensional Gaussian measure on span(F−q).

With the above viewpoint, one can restate the goal of the recentering procedure as that of finding a pointq∈P∩K, such that smallest facetF ofPcontainingq, satisfies that(K−q)∩span(F−q)has its barycenter at the origin and Gaussian measure no smaller than that ofK. Recall that as long as

(K−q)∩span(F−q)has Gaussian measure at least 1/2, we are guaranteed that 0∈K−q⇒q∈K. With this geometry in mind, we implement the recentering procedure as follows:

Computeq∈Pso that the Gaussian mass ofK−qis maximized. Ifqis on the boundary ofP, letting

Fdenote a facet ofPcontainingq, induct onF−qand the slice(K−q)∩span(F−q)as above. Ifqis in the interior ofP, replacePandKbyP−qandK−q, and terminate.

We now explain why the above achieves the desired result. Firstly, if the maximizerqis in a facetFof

P, then a standard convex geometric argument reveals that the Gaussian measure of(K−q)∩span(F−q)

(11)

recentering procedure fixes a color for “free.” In the second case, ifqis in the interior ofP, then a variational argument gives that the barycenter ofK−qunder the induced Gaussian measure must be at the origin, namely,

Z

K−q

xe−x2/2dx=0.

To conclude this section, we explain how to extend the recentering procedure to directly produce a deterministic coloring satisfyingTheorem 1.8. For this purpose, we shall assume thatv1, . . . ,vnhave

length at mostc/√logn, for a small enough constantc>0. To begin, we run the recentering procedure as above, which returnsPandK, withKhaving its barycenter at the origin. We now replaceP,Kby a joint scalingαP,αK, forα >0 a large enough constant, so thatαKhas Gaussian mass at least 3/4. At this point, we run the original recentering procedure again with the following modification: every time we get to the situation whereKhas its barycenter at the origin, induct on the closest facet ofPclosest to the origin. More precisely, in this situation, compute a point pon the boundary ofPclosest to the origin, and, lettingF denote the facet containingp, induct onF−pand(K−p)∩span(F−p). At the end, return the final found vertex.

Notice that, as claimed, the coloring (i. e., vertex) returned by the algorithm is indeed deterministic. The reason the above algorithm works is the following. While we cannot guarantee, as in the original recentering procedure, that the Gaussian mass of(K−p)∩span(F−p)does not decrease, we can instead show that it decreases onlyvery slowly. In particular, we use the bound ofO(1/√logn)on the length of the vectorsv1, . . . ,vnto show that every time we induct, the Gaussian mass drops by at most a 1−c/n

factor. More generally, if the vectors had length at mostd>0, ford small enough, the drop would be of the order 1−ce−1/(cd)2

, for some constantc>0. Since we “massage”Kto have Gaussian mass at least 3/4 before applying the modified recentering algorithm, this indeed allows to inductntimes while keeping the Gaussian mass above 1/2, which guarantees that the final vertex is inK. To derive the bound on the rate of decrease of Gaussian mass, we prove a new inequality on the Gaussian mass of sections of a convex body near the barycenter (seeTheorem 7.1), which may be of independent interest.

As a final remark, we note that unlike the subgaussian sampler, the recentering procedure is not scale invariant. Namely, if we jointly scalePandK by some factorα, the output of the recentering

procedure will not be anα-scaling of the output on the originalKandP, as Gaussian measure is not

homogeneous under scalings. Thus, one must take care to appropriately normalizeP and K before applying the recentering procedure to achieve the desired results.

We now give the high level overview of our recentering step implementation. The first crucial observation in this context, is that the task of findingt∈Pmaximizing the Gaussian measure ofK−t

is in fact aconvex program. More precisely, the objective function (Gaussian measure ofK−t) is a logconcave function oft and the feasible regionPis convex. Hence, one can hope to apply standard convex optimization techniques to find the desired maximizer.

It turns out however, that one can significantly simplify the required task by noting that the recentering strategy does not in fact necessarily need an exact maximizer, or even a maximizer inP. To see this, note that ifpis a shift such thatK−phas larger Gaussian measure thanK, then by logconcavity the shiftsK−αp, 0<α ≤1, also have larger Gaussian measure. Thus, if a we find a shift p∈/ Pwith larger Gaussian measure, lettingαpbe the intersection point with the boundary∂P, we can induct on the

(12)

essentially “ignore” the constraintp∈Pand we treat the optimization problem as unconstrained. This last observation will allow us to use the following simple gradient ascent strategy. Precisely, we simply take steps in the direction of the gradient until either we pass through a facet ofPor the gradient becomes “too small.” As alluded to previously, the gradient will exactly equal a fixed scaling of the barycenter ofK−p,pthe current shift, under the induced Gaussian measure. Thus, once the gradient is small, the barycenter will be very close to the origin, which will be good enough for our purposes. The last nontrivial technical detail is how to efficiently estimate the barycenter, where we note that the barycenter is the expectation of a random point insideK−p. For this purpose, we simply take an average of random samples fromK−p, where we generate the samples using rejection sampling, using the fact that the Gaussian measure ofKis large.

Conclusion and follow-up work. In conclusion, we have shown a tight connection between the existence of subgaussian coloring distributions and Banaszczyk’s vector balancing theorem. Furthermore, we make use of this connection to constructively recover a weaker version of this theorem.

Subsequently to the publication of an extended abstract of this paper [14], the first three named authors, jointly with Bansal, showed the existence of a polynomial-time sampling algorithm for an

O(1)-subgaussian distribution on signed sums of vectors with`2norm at most 1 [6]. Together with the

results in this paper (specificallyCorollary 6.6), this sampling algorithm implies a new constructive proof of Banaszczyk’s theorem in its full generality.

Despite this success, Banaszczyk’s more recent results on signed series [3], and the last author’s upper bounds on the discrepancy of axis-aligned boxes [27], both proved using the techniques used in [2], remain out of reach algorithmically. Extending the techniques of this paper and of [6] to give constructive proofs of these results is an interesting open problem.

Organization. InSection 2, we provide necessary preliminary background material. InSection 3, we give the proof of the equivalence between Banaszczyk’s vector balancing theorem and the existence of subgaussian coloring distributions. InSection 4, we give our random walk based coloring algorithm, which exploits the feasibility of the vector Komlós program, proved constructively in Section 5. In

Section 6, we describe the recentering procedure and its implementation, and give the algorithmic reduction from asymmetric bodies to symmetric bodies, giving the proof ofTheorem 1.8. InSection 7, we show how to extend the recentering procedure to a full coloring algorithm. InSection 8, we prove the main technical estimate on the Gaussian measure of slices of a convex body near the barycenter, which is needed for the algorithm inSection 7.

Acknowledgments. We would like to thank the American Institute for Mathematics for hosting a recent workshop on discrepancy theory, where some of this work was done.

2

Preliminaries

(13)

For a vectorx∈Rn, we define

kxk2=

s

n

i=1 x2i

to be its Euclidean norm. LetBn2={x∈Rn:kxk2≤1} denote the unit Euclidean ball and Sn−1 =

{x∈Rn:kxk

2=1}=∂Bn2denote the unit sphere inR

n. Forx,y

Rn, we denote their inner product hx,yi=∑ni=1xiyi.

For subsets A,B⊆Rn, we denote their Minkowski sum A+B={a+b:a∈A,b∈B}. Define span(A)to be the smallest linear subspace containingA. We denote the boundary ofAby∂A. We use the

phrase∂Arelative to span(A)to specify that we are computing the boundary with respect to the subspace topology on span(A).

A setK⊆Rnis convex if for allx,yK,

λ ∈[0,1],λx+ (1−λ)y∈K.Kis symmetric ifK=−K.

We shall say thatKis a convex body if additionally it is closed and has non-empty interior. We note that the usual terminology, a convex body is also compact (i. e., bounded), but we will state this explicitly when it is necessary. If convex body contains the origin in its interior, we say thatKis 0-centered.

We will need the concept of a gauge function for 0-centered convex bodies. For bounded symmetric convex bodies, this functional will define a standard norm.

Proposition 2.1. Let K⊆Rnbe a0-centered convex body. Defining thegauge functionof the body K by

kxkK=inf{s≥0 :x∈sK}, the following holds. 1. Finiteness: kxkK<∞, for x∈Rn.

2. Positive homogeneity:kλxkK=λkxkK, for x∈Rn,λ≥0. 3. Triangle inequality: kx+ykK≤ kxkK+kykK, for x,y∈Rn.

Furthermore, if K is additionally bounded and symmetric, thenk · kK is a norm which we call the norm induced by K. In particular,k · kK additionally satisfies thatkxkK=0iff x=0andkxkK=k −xkK,∀x∈

Rn.

Gaussian and subgaussian random variables. We definen-dimensional standard GaussianX∈Rn to be the random variable with density

1 √

2πn

e−kxk22/2

forx∈Rn.

Definition 2.2(Subgaussian Random Variable). A random variableX ∈Risσ-subgaussian, forσ>0,

if∀t≥0,

Pr[|X| ≥t]≤2e−12(t/σ)2.

We note that the canonical example of a 1-subgaussian distribution is the 1-dimensional standard Gaussian itself.

For a vector valued random variableX∈Rn, we say thatXisσ-subgaussian if all its one dimensional

(14)

We remark that fromDefinition 2.2, it follows directly that ifXisσ-subgaussian thenαX is|α|σ

-subgaussian for anyα ∈R.

The following standard lemma allows us to deduce subgaussianity from upper bounds on the Laplace transform of a random variable. We include a proof in the appendix for completeness.

Lemma 2.3. Letcosh(x) =12(ex+e−x)for x∈Rn. Let X∈Rnbe a random vector. Assume that

E[cosh(hw,Xi)]≤βekσwk

2

2/2, ∀wRn,

for someσ>0andβ ≥1. Then X isσplog2β+1-subgaussian. Furthermore, for X∈Rnstandard

Gaussian,E[cosh(hw,Xi)] =ekwk

2

2/2for wRn.

Gaussian measure. We defineγnto be then-dimensional Gaussian measure onRn. Precisely, for any measurable setA⊆Rn,

γn(A) =√1

2πn

Z

A

e−kxk22/2dx, (2.1)

noting thatγn(Rn) =1. We will also need lower dimensional Gaussian measures restricted to linear subspaces ofRn. Thus, if A⊆W, W ⊆Rn a linear subspace of dimensionk, thenγk(A) should be

understood as the Gaussian measure ofAwithinW, whereW is treated as the whole space. When convenient, we will also use the notationγW(A)to denoteγdim(W)(A∩W). When treating one dimensional

Gaussian measure, we will often denoteγ1((a,b)), where (a,b) is an interval, simply byγ1(a,b)for

notational convenience. By convention, we defineγ0(A) =1 if 0∈Aand 0 otherwise.

An important concept used throughout the paper is that of the barycenter under the induced Gaussian measure.

Definition 2.4(Barycenter). For a convex bodyK⊆Rn, we define its barycenter under the induced

Gaussian measure, by

b(K) =√1 2πn

Z

K

xe−kxk22/2 dx

γn(K).

Note thatb(K) =E[X], ifXis the random variable supported onKwith probability density 1

√ 2πn

e−kxk22/2/γn(K).

Extending the definition to slices ofK, for any linear subspaceW ⊆Rn, we refer to the barycenter of K∩W to denote the one relative to the dim(W)-dimensional Gaussian measure onW (i. e., treatingW as the whole space).

Throughout the paper, we will need many inequalities regarding the Gaussian measure. The first important inequality is the Prékopa-Leindler inequality, which states that forλ ∈[0,1]andA,B,λA+

(1−λ)B⊆Rnmeasurable subsets, that

(15)

We note that the Prékopa-Leindler inequality applies more generally to any logconcave measure onRn, i. e., a measure defined by a density whose logarithm is concave. Importantly, this inequality directly implies that ifA⊆Rnis convex, then logγn(A+t), fort∈Rn, is a concave function oft.

We will need the following powerful inequality of Ehrhard, which provides a crucial strengthening of Prékopa-Leindler for Gaussian measure.

Theorem 2.5(Ehrhard’s inequality [15,10]). For Borel sets A,B⊆Rnand0 λ≤1,

Φ−1(γn(λA+ (1−λ)B))≥λΦ−1(γn(A)) + (1−λ)Φ−1(γn(B)) whereΦ(a) =γ1((−∞,a])for all a∈R.

The power of the Ehrhard inequality is that it allows us to reduce many non-trivial inequalities about Gaussian measure to two dimensional ones.

One can use it to show the following standard inequality on the Gaussian measures of slices of a convex body. We include a proof for completeness.

Lemma 2.6. Given a convex body K⊆Rnwith

γn(K)≥1/2, and a linear subspace H⊆Rnof dimension

k. Then,γk(K∩H)≥γn(K).

Proof. Clearly it suffices to prove the lemma for k=n−1. Since Gaussian distribution is rotation invariant, without loss of generality,H={x∈Rn:x

1=0}. LetKt={x∈K:x1=t}denote a slice of Katx1=t. Then,

γn(K) =

Z ∞

−∞

e−t2/2 √

γn−1(Kt−te1)dt

whereγn−1(Kt−te1) =0 outside support ofK.

DefineW⊆R2asW={(x,y):y f(x)}where f:

R→Ris defined as

f(t) =Φ−1(γn−1(Kt−te1))

and f(t) =−∞outside the support ofK. It follows thatγ2(W) =γn(K)≥1/2. By Ehrhard’s inequality, f is concave on its support. Hence,Wis a closed convex body.

Letg=Φ−1(γn(K))≥0.γn−1(K∩H)≥γn(K)is then equivalent to showing(0,g)∈W. If(0,g)6∈W,

then there exists a halfspaceLsuch thatW⊆Land(0,g)6∈L. Letd be the distance of origin(0,0)from

∂L, the boundary ofL. Since(0,g)6∈Landγ2(L)≥1/2,d<g. But this implies γ2(L) =γ1(−∞,d)<γ1(−∞,g) =γn(K) =γ2(W)

contradictingW⊆L.

(16)

Definition 2.7(Lifting Function). For a fractional coloring x∈[−1,1]n, denote the set of fractional coordinates byAx={i∈[n]:xi∈(−1,1)}. From here, forz∈[−1,1]Ax, we define the lifting function Lx:[−1,1]Ax→[−1,1]nby

Lx(z)i=

(

zi :i∈Ax, xi :i∈[n]\Ax,

∀i∈[n].

Importantly, forχ∈ {−1,1}Axwe have thatLx(χ)∈ {−1,1}n. Thus,Lxsends full colorings in{−1,1}Ax

to full colorings in{−1,1}n.

The lifting function above is useful in that it allows us, given a fractional coloringx∈[−1,1]with some of its coordinates set to{−1,1}, to reduce any linear vector balancing problem to one on a smaller number of coordinates. We detail this in the following lemma.

Lemma 2.8. Let v1, . . . ,vm∈Rm, t ∈∑ni=1[−vi,vi], and K ⊆Rn. Then given a fractional coloring

x∈[−1,1]nand p=

∑ni=1xivi−t, the following holds. 1. For z∈[−1,1]Ax, we have that

i∈Ax

zivi−

i∈Ax

xivi+p= n

i=1

Lx(z)ivi−t.

2. For z∈[−1,1]Ax, we have that

i∈Ax

zivi−

i∈Ax

xivi∈K−p⇔ n

i=1

Lx(z)ivi−t∈K.

Proof. The first part follows from the computation

i∈[n]

Lx(z)ivi−t=

i∈Ax

zivi+

i∈[n]\Ax

xivi−t

=

i∈Ax

zivi−

i∈Ax

xivi+ (

i∈[n]

xivi−t)

=

i∈Ax

zivi−

i∈Ax

xivi+p.

The second part follows since

i∈Ax

zivi−

i∈Ax

xivi∈K−p⇔

i∈Ax

zivi−

i∈Ax

xivi+p∈K⇔ n

i=1

Lx(z)ivi−t∈K,

where the last equivalence is by part (1).

In terms of a reduction, the above lemma says in words that the linear vector balancing problem with respect to the vectors(vi:i∈[n]), shifttand setK, reduces to the linear discrepancy problem on

(vi:i∈Ax), shift∑i∈Axxiviand setK−p.

(17)

Lemma 2.9. Let v1, . . . ,vn∈Rm, t∈∑ni=1[−vi,vi]. Then, there is a polynomial-time algorithm computing a fractional coloring x∈[−1,1]nsuch that the following hold.

1. ∑ni=1xivi=t.

2. The vectors(vi:i∈Ax)are linearly independent. 3. For z∈[−1,1]Ax,

∑i∈Axzivi−∑i∈Axxivi=∑ n

i=1Lx(z)ivi−t. Proof. Letxdenote a basic feasible solution to the linear system

(

y∈Rn:

n

i=1

yivi=t,yi∈[−1,1]∀i∈[n]

)

,

which clearly can be computed in polynomial time. Note the system is feasible by construction oft. We now show thatxsatisfies the required conditions.

Letr≤ndenote the rank of the matrix(v1, . . . ,vn). Sincexis basic, it must satisfy at leastnleast

of the constraints at it equality. In particular, at leastn−rof the bound constraints must be tight. Thus, sinceAxis the set of fractional coordinates, we must have|Ax| ≤r. Furthermore, the vectors(vi:i∈Ax)

must be linearly independent, since otherwisexis not basic. Finally, forz∈[−1,1]Ax, we have that

i∈Ax

zivi−

i∈Ax

xivi=

i∈Ax

zivi+

i∈[n]\Ax

xivi−

i∈[n]

xivi=

i∈[n]

Lx(z)ivi−t,

as needed.

Let us now apply the above lemma to both the vector balancing problem and the subgaussian sampling problem. First assume that we have a vector balancing problem with respect tov1, . . . ,vn∈Rm, shift

t∈n

i=1[−vi,vi], andK⊆Rma convex body of Gaussian measure at least 1/2. Then applying the above lemma, we getx∈[−1,1]n, such that our vector balancing reduces to the one with respect to(vi:i∈Ax),

shift

i∈Ax

xivi∈

i∈Ax

[−vi,vi],

and K. This follows directly fromLemma 2.9 part 3 using the lifting function Lx. Now letW =

span(vi:i∈Ax), where dim(W) =|Ax|by linear independence. Clearly, the reduced vector balancing

problem looks for signed combinations inW, and hence we may replaceKbyK∩W. Here, note that by

Lemma 2.6,

γ|Ax|(K∩W)≥γm(K)≥1/2.

Hence, this reduction reduces to a problem of the same type, where in addition, the vectors form a basis of the ambient spaceW. For the subgaussian sampling problem, by part3ofLemma 2.9, sampling a random coloringχ∈ {−1,1}nsuch that∑ni=1χivi−tis subgaussian clearly reduces to sampling a random

coloringχ∈ {−1,1}Ax such that

i∈Ax

χivi−

i∈Ax

(18)

is subgaussian since this equals∑ni=1Lx(χ)ivi−t. Furthermore, since the support of such a support

distribution lives inW, to test subgaussianity we need only check the marginals

*

θ,

i∈Ax

χivi−

i∈Ax

xivi

+

forθ∈W∩Sm−1. Thus, we may assume thatW is the full space. This completes the needed reductions. Computational model. To formalize how our algorithms interact with convex bodies, we will use the following computational model.

To interact algorithmically with a convex bodyK⊆Rn, we will assume thatK is presented by a membership oracle. Here a membership oracleOK on inputx∈Rn, outputs 1 ifx∈Kand 0 otherwise. Interestingly, since we will always assume that our convex bodies have Gaussian measure at least 1/2, we will not need any additional centering (known point insideK) or well-boundedness (inner contained and outer containing ball) guarantees.

The running time of our algorithms will be measured by the number of oracle calls and arithmetic operations they perform. We note that we use a simple model of real computation here, where we assume that our algorithms can perform standard operations on real numbers (multiplication, division, addition, etc.) in constant time.

3

Banaszczyk’s Theorem and subgaussian distributions

In this section, we give the main equivalences between Banaszczyk’s vector balancing theorem and the existence of subgaussian coloring distributions.

The fundamental theorem which underlies these equivalences is known as Talagrand’s majorizing measure theorem, which provides a nearly tight characterization of the supremum of any Gaussian process using chaining techniques. We now state an essential consequence of this theorem, which will be sufficient for our purposes. For a reference, see [34].

Theorem 3.1(Talagrand). Let K⊆Rn be a0-centered convex body and Y

Rnbe an s-subgaussian

random vector. Then for X∈Rnthe n-dimensional standard Gaussian, we have that

E[kYkK]≤s·CT·E[kXkK], where CT>0is an absolute constant.

As a consequence of the above theorem together with geometric estimates proved inSubsection 9.3, we derive the following lemma, which will be crucial to our equivalences and reductions. The first part of the lemma is a restatement ofLemma 1.3from the Introduction.

Lemma 3.2(Reduction to Subgaussianity). Let Y ∈Rnbe s-subgaussian. Then, 1. If K⊆Rnis a symmetric convex body with

γn(K)≥1/2, then

(19)

2. If K⊆Rnis a convex body with

γn(K)≥1/2andkb(K)k2≤32√12

π, then E[kYkK∩−K]≤2(1+π

8 ln 2)·CT·s. In particular,Pr[Y ∈4(1+π

8 ln 2)·CT·s(K∩ −K)]≥1/2.

Proof ofLemma 3.2. The proof follows immediately by combiningLemmas 9.5,Lemma 9.9and Theo-rem 3.1. We note that the lower bounds on the probabilities follow directly by Markov’s inequality.

To state our equivalence, we will need the definitions of the following geometric parameters.

Definition 3.3(Geometric Parameters). LetT ⊆Rnbe a finite set.

• Definesg(T)>0 to be least numbers>0 such that there exists ans-subgaussian random vectorY

supported onT.

• Definesb(T)>0 to be the least numbers>0 such that for any symmetric convex bodyK⊆Rn,

γn(K)≥1/2,T∩sK6=/0.

We now state our main equivalence, which gives a quantitative version of Theorem 1.2 in the introduction.

Theorem 3.4. Let T ⊆Rnbe a finite set. Then the following inequalities hold. 1. sb(T)≤1.5CT·sg(T).

2. sg(T)≤

2·sb(T).

Using the above language, we can restate Banaszczyk’s vector balancing theorem restricted to

symmetric convex bodiesas follows:

Theorem 3.5([2]). Let v1, . . . ,vm∈Rn. Then sb(∑mi=1{−vi,vi})≤5 maxi∈[m]kvik2.

As an immediate corollary ofTheorem 3.4 and Theorem 3.5(extended to the linear setting) we deduce:

Corollary 3.6. Let v1, . . . ,vm∈Rn. Then

sg

m

i=1

{−vi,vi}

≤√2·5 max

i∈[m]

kvik2.

Furthermore, for t∈∑mi=1[−vi,vi],

sg

m

i=1

{−vi,vi} −t

≤√2·10 max

i∈[m]

(20)

As explained in the introduction, the above equivalence shows the existence of auniversal samplerfor recovering Banaszczyk’s vector balancing theorem for symmetric convex bodies up to a constant factor in the length of the vectors. Precisely, this follows directly fromLemma 3.2part1andCorollary 3.6(for more details see the proof ofTheorem 3.4below).

The final ingredient we need is the classical minimax principle of von Neumann.

Theorem 3.7(Minimax Theorem [25]). Let X⊆Rn, Y

Rmbe compact convex sets. Let f :X×Y →R

be a continuous function such that

1. f(·,y):X →Ris convex for fixed y∈Y , 2. f(x,·):Y →Ris concave for fixed x∈X . Then,

min

x∈Xmaxy∈Y f(x,y) =maxy∈Y minx∈X f(x,y).

We now proceed to the proof ofTheorem 3.4.

Proof ofTheorem 3.4. We treat the two inequalities separately.

Proof of 1. LetY ∈T be thesg(T)-subgaussian random variable. LetK⊆Rnbe a symmetric convex body such thatγn(K)≥1/2. ByLemma 3.2part1, we have that

E[kYkK]≤1.5CT·sg(T).

Thus, there existsx∈T such thatx∈1.5CT·sg(T)K. Since this holds for all such K, we have that sb(T)≤1.5CT·sg(T)as needed.

Proof of 2. Recall the definition of the hyperbolic cosine function

cosh(x) =1

2(e

x+e−x)

forx∈R. Note that cosh is convex, symmetric (cosh(x) =cosh(−x)), and non-negative. Forw∈Rn,

definegw:Rn→R≥0bygw(x) =cosh(hx,wi)/ekwk

2

2/2. ByLemma 2.3, note thatE[gw(X)] =1 forXan

n-dimensional standard Gaussian.

LetDdenote the set of probability distributions onT. Our goal is to show that there existsD∈D such thatY ∼Dis√2·sb(T)-subgaussian. By homogeneity, we may replaceT byT/sb(T), and thus

assume thatsb(T) =1. To show the existence of the subgaussian distribution, we will show that

inf

D∈DwsupRnY∼ED[gw(Y)]≤2. (3.1)

Before proving the bound (3.1), we show that this suffices to show the existence of the desired √

2-subgaussian distribution. LetD∗∈Ddenote the minimizing distribution for (3.1). Then by definition ofgw, we have that

E

Y∼D∗[cosh(hw,Yi)]≤2e kwk2

(21)

With the bounds on the Laplace transform in (3.2), byLemma 2.3withβ =2 andσ=1, we have thatY

isplog22+1=√2-subgaussian as needed.

We now prove the estimate in (3.1). LetCdenote the closed convex hull of the functionsgw. More

precisely,Cis the closure of the set of functions

(

f :T →R≥0 f =

k

i=1

λigwi,k∈N, k

i=1

λi=1,λi≥0∀i∈[k],wi∈Rn∀i∈[k]

)

.

By continuity, we clearly have that

inf

D∈Dwsup Rn

E

Y∼D[gw(Y)] =Dinf∈DsupfCYE∼D[f(Y)]. (3.3)

The strategy will now be to apply theMinimax Theorem 3.7to (3.3). For this to hold, we first need that bothDandCare both convex and compact. This is clear forD, sinceDcan be associated with the standard simplex inR|T|. By constructionCis also convex, hence we need only prove compactness. Since

T is finite andCis a closed subset of non-negative functions onT,Ccan be associated in the natural way with a closed subset ofR|≥T0|. To show compactness, it suffices to show that this set is bounded. In

particular, it suffices to show that for f∈C, maxx∈T f(x)≤Mfor some universal constantM<∞. Since

every f∈Cis a limit of convex combinations of the functionsgw,w∈Rn, it suffices to show that sup

w∈Rn max

x∈T gw(x)<∞.

We prove this with the following computation:

sup

w∈Rn

max

x∈T gw(x) =wsup

Rn max

x∈T

cosh(hw,xi)

ekwk2 2/2

≤ sup

w∈Rn

cosh(maxx∈Tkxkkwk2)

ekwk2 2/2

≤ sup

w∈Rn

e(maxx∈Tkxk2)kwk2−kwk22/2

=e(maxx∈Tkxk2)2/2<

∞.

ThusCis compact as needed. Lastly, note that the functionEY∼D[f(Y)]onD×Cis bilinear, and hence

is both continuous and satisfies (trivially) the convexity-concavity conditions inTheorem 3.7. By compactness ofDandCand continuity, we have that

inf

D∈DsupfCYE∼D[f(Y)] =Dmin∈Dmaxf∈CYE∼D[f(Y)].

Next, by the MinimaxTheorem 3.7, we have that

min

D∈Dmaxf∈CYE∼D[f(Y)] =maxf∈CminD∈DYE∼D[f(Y)] =maxf∈Cminx∈T f(x)

= sup

f=∑ki=1λigwi

∑ki=1λi=1,λi≥0∀i∈[k] wi∈Rn∀i∈[k],k∈N

min

(22)

Take f =∑ki=1λigwi as above (now as a function onR

n) and letK

f ={x∈Rn: f(x)≤2}. Our task now reduces to showing that∃x∈T such thatx∈Kf. Since sb(T)≤1, it suffices to show thatγn(Kf) =

Pr[X∈Kf]≥1/2, forX then-dimensional standard Gaussian, and thatKf is symmetric and convex.

Since f is a convex combination of symmetric and convex functions, it follows thatKf is symmetric and

convex. Since f is non-negative, by Markov’s inequality Pr[X∈/Kf] =Pr[f(X)≥2]≤

E[f(X)]

2 =

1 2

k

i=1

λiE[gwi(X)] =

1 2

k

i=1 λi=1

2.

Hence Pr[X ∈Kf] =1−Pr[X ∈/Kf]≥1/2, as needed.

4

An

O

(

log

n

)-subgaussian random walk

In this section we describe the efficient sampling algorithm for aO(√logn)-subgaussian distribution supported on∑mi=1{−vi,vi}, where maxikvik2≤1. This gives an efficient version ofCorollary 3.6with a

weaker subgaussianity parameter.

The random walk algorithm is given asAlgorithm 1. Step 8can be executed in polynomial time by either calling an SDP solver, or executing the algorithm fromSection 5. The feasibility of the program is guaranteed byTheorem 5.1, and also by the results of [26]. The matrixU(t)instep 10can be computed by Cholesky decomposition.

Let us first make some observations that will be useful throughout the analysis. Notice first that the random processχ(0), . . . ,χ(T)is Markovian. Letui(t)be thei-th row ofU(t). By the definition ofΣ(t)

andU(t),kui(t)k2=Σ(t)iiequals 1 ifi∈A(t), and 0 otherwise. We haveχ(t)i−χ(t−1)i=γhui(t),r(t)i,

and, becauser(t)∈ {−1,1}n, by Cauchy-Schwarz we get

|χ(t)i−χ(t−1)i| ≤

(

γ

n ifi∈A(t),

0 otherwise. (4.1)

We first analyze the convergence of the algorithm: we show that, with constant probability, the random walk fixes all coordinates to have absolute value between 1−δ and 1. First we prepare a lemma.

Lemma 4.1. Let X0,X1,X2, . . .form a martingale sequence adapted to the filtration {Ft} such that X0∈[−1,1], and for every t≥1we haveE[(Xt−Xt−1)2|Ft−1] =σ2, and|Xt−Xt−1| ≤δ <1. Denote τ=inf{t:|Xt| ≥1−δ}. ThenE[τ]<(1−X02)/σ2.

Proof. Define Y0,Y1,Y2, . . .to be a martingale with respect toX0,X1,X2, . . .defined byYt =Xmin{t,τ}. Because|Xt−Xt−1|<δ, we easily see by induction that|Yt|<1 for allt≥0. Therefore, for anyt≥1,

we compute

1>EYt2= t

s=1

E E[(Yt−Yt−1)2|Ft−1] +X02

=

t

s=1

σ2Pr[τ≥s] +X02

(23)

Algorithm 1O(√logn)-subgaussian Random Walk Algorithm

1: Input:v1, . . . ,vn∈Rmsuch that maxni=1kvik2≤1; a vectory∈∑ni=1[−vi,vi].

2: Output: random signsχ1, . . . ,χn∈ {−1,1}such that∑ni=1χivi−yisO(

logn)-subgaussian. 3: LetV = (vi)ni=1.

4: Letγ=2 log22n n5/2 ;δ=

2√2 ln 2 log22n

n ;T=d2/γ2e · dlog22ne;

5: Letχ(0)∈[−1,1]nbe such that∑iχ(0)ivi=y.

6: LetA(1) ={1, . . . ,n} 7: fort=1, . . . ,T do

8: ComputeΣ(t)∈Rn×n,Σ(t)0, such that

Σ(t)ii=1 ∀i∈A(t) Σ(t)ii=0 ∀i6∈A(t) VΣ(t)VTIm.

9: Pickr(t)∈ {−1,+1}nuniformly at random.

10: χ(t) =χ(t−1) +γU(t)r(t), whereU(t)U(t)|=Σ(t). 11: A(t+1) ={i:|χ(t)i|<1−δ}.

12: end for

13: Setχi=sign(χ(T)i)for alli∈ {1, . . . ,n}.

14: ifA(T) =/0then 15: Returnχ

16: else

17: Restart algorithm fromline 5. 18: end if

By the monotone convergence theorem, we have thatσ2E[τ]<1−X02, which was to be proved.

The next lemma gives our convergence analysis of the random walk.

Lemma 4.2. With probability1,|χ(t)i| ≤1for all1≤t≤T and all1≤i≤n. With probability at least

1/2,|χ(T)i| ≥1−δ for all1≤i≤n.

Proof. We prove the first claim by induction ont. It is clearly satisfied fort=0; assume then that the claim holds up tot−1, and we will prove it fort. Ifi6∈A(t), thenχ(t)i=χ(t−1)i by (4.1), and the

claim follows by the inductive hypothesis. Ifi∈A(t), then|χi(t)|<1−δ, and by (4.1) and the triangle

inequality, we have|χi(t)|<1−δ+γ

n<1, where the final inequality follows becauseγ

n<δ.

To prove the second claim, we will show that Pr[|χ(T)i|<1−δ]≤1/2n holds for everyi. The

claim will then follow by a union bound. Let us fix an arbitraryi∈ {1, . . . ,n}. Define∆=d2/γ2e, and

letEj, 0≤ j≤(T−∆)/∆be the event that|χ(t)i|<1−δ for allt∈[j∆+1,(j+1)∆]. Observe that if

|χ(t)i| ≥1−δ, then|χ(s)i|=|χ(t)i| ≥1−δ for allt≤s≤T, and, thereforeχ(T)i<1−δ if and only

(24)

have

Pr[|χ(T)i|<1−δ] = T/∆−1

j=0

Pr[Ej|χ(j∆)]. (4.2)

Let τj = (j+1)∆ if Ej holds and τj =min{t ≥ j∆ :|χ(t)i| ≥1−δ}, otherwise. The sequence χ(j∆)i, . . . ,χ(τj)i, conditioned on χ(j∆), is a martingale. Moreover, since for any t ∈[j∆,τj] we

havei∈A(t), for all suchtwe get

E[(χ(t)i−χ(t−1)i)2|χ(t−1)] =γ2E|hui(t),r(t)i|2=γ2kuik22=γ2. (4.3)

By (4.1) and (4.3), the sequenceχ(j∆)i, . . . ,χ(τj)i, conditioned onχ(j∆), satisfies the assumptions of Lemma 4.1. By the lemma, we have

E[τj−j∆|χ(j∆)]≤

1−χ(j∆)2i γ2 ≤

1

γ2.

Since the eventEjholds only ifτj≥(j+1)∆, by Markov’s inequality we have

Pr[Ej|χ(j∆)]≤ 1 γ2∆ ≤

1 2.

This bound and (4.2) imply that Pr[|χ(T)i|<1−δ]≤2−T/∆≤1/2n, which was to be proved.

To prove that the walk is subgaussian, we will need the following martingale concentration inequality due to Freedman.

Theorem 4.3([18]). Let Z1, . . . ,ZT be a martingale adapted to the filtration{Ft}, let|Zt−Zt−1| ≤M for all t, and let Wt =∑tj=1Ej−1[(Zj−Zj−1)2|Fj−1] =∑tj=1Var[Zj|Fj−1]. Then for allλ ≥0andσ2≥0, we have

Pr[∃t s.t.|Zt−Z0| ≥λ and Wt ≤σ2]≤2 exp

− λ

2

2(σ2+Mλ)

.

Next we state the main lemma, which, together with an estimate on the error due to rounding, implies subgaussianity.

Lemma 4.4. The random variable∑ni=1χ(T)ivi−y is(γ

2T)-subgaussian.

Proof. DefineYt=∑ni=1χ(t)ivifor allt=1, . . . ,T. Notice thatY0=y. Let us fix aθ∈Sn−1once and for

all, and letZt=hθ,Ytifort=0, . . . ,T. We need to show that for everyλ >0, Pr[|ZT| ≥λ]≤2e−λ

2/2σ2

. We first observe thatZtis bounded, so we only need to considerλ in a finite range. Indeed, byLemma 4.2, Yt ∈∑ni=1[−vi,vi]with probability 1, so by the triangle inequality, kYtk2≤∑ni=1kvik2≤n. Then, by

Cauchy-Schwarz,|Zt| ≤nas well, and, therefore, Pr[|ZT|>n] =0. For the rest of the proof we will

assume that 0<λ≤n.

Observe thatZ0, . . . ,ZT is a martingale. First we prove that the increments are bounded: this follows

from the boundedness of the increments of the coordinates ofχ(t). Indeed, by the triangle inequality and

(4.1),

kYt−Yt−1k2=

n

i=1

(χ(t)i−χ(t−1)i)vi

2

n

i=1

(25)

Then, it follows from Cauchy-Schwarz that|Zt−Zt−1| ≤γn3/2.

Next we bound the variance of the increments. By the Markov property of the random walk,Zt−Zt−1

is entirely determined byχt−1. DenotingV = (vi)ni=1as in the description ofAlgorithm 1, we have

E[(Zt−Zt−1)2|χt−1] =θ|E[(Yt−Yt−1)(Yt−Yt−1)|]θ

=θ|VE[(χ(t)−χ(t−1))(χ(t)−χ(t−1))|]V|θ

=γ2θ|VU(t)E[r(t)r(t)|]U(t)|V|θ

=γ2θ|VΣ(t)V|θ

≤γ2.

The penultimate equality follows becauseE[r(t)r(t)|] =InandU(t)U(t)|=Σ(t), and the final inequality follows becauseΣ(t)was chosen so thatVΣ(t)V|Im.

We are now ready to applyTheorem 4.3. Using the notation from the theorem, we have shown that

M≤γn3/2, and thatWt ≤γ2tfor allt, and both bounds hold with probability 1. Letσ2=γ2T. First we

claim that for anyλ ≤n,Mλ ≤σ2. Indeed,

Mλ ≤γn5/2≤2 log2(2n)≤γ2T =σ2.

Now,Theorem 4.3and the above calculation imply that Pr[|ZT−Z0| ≥λ]≤2e−λ

2/4σ2

for all 0<λ ≤n. This proves the lemma.

Finally we state our main theorem.

Theorem 4.5(Restatement ofTheorem 1.4). Algorithm 1runs in expected polynomial time, and outputs a random vectorχsuch that the random variable∑ni=1χivi−y is O(

logn)-subgaussian.

Proof. LetE be the event that for alli, |χ(T)i| ≥1−δ (equivalently, thatA(T) = /0). The algorithm

takes returns ifE holds, and otherwise it restarts. ByLemma 4.2this event occurs with probability at least 1/2, so there will be a constant number of restarts in expectation. Since the random walk takesT

steps, whereT is polynomial in the input size, and each step can also be executed in polynomial time, it follows that the expected running time of the algorithm is polynomial.

Because the algorithm returns an output exactly whenE holds, the output is distributed as the random vectorχconditioned onE. Let us fix a vectorθ∈Sn−1once and for all. LetY be the random variable ∑ni=1χivi−yand letZ=hθ,Yi. LetYt andZt be defined as in the proof ofLemma 4.4. Lets=γ

√ 2T be the parameter with which we provedZT−Z0is subgaussian inLemma 4.4. We will show thatZ−Z0,

conditioned onE, is 2s-subgaussian, i. e., we will prove that Pr[|Z−Z0| ≥λ |E]≤2e−λ

2/8s2

. Observe that this inequality is trivially satisfied forλ ≤λ0=2

2 ln 2s, since the right hand side is at least 1 in this range. For the rest of the proof we will assume thatλ >λ0.

Conditional onE, and using the triangle inequality, we can bound the distance betweenY andY(T)

by

kY−Y(T)k2=

n

i=1

(sign(χ(T)i)−χ(T)i)vi

2

n

i=1

(26)

By Cauchy-Schwarz, we get that, conditional onE, |Z−ZT| ≤δn, which implies that |Z−Z0| ≤

|ZT−Z0|+δn, by the triangle inequality. Then, we have that, conditional on E, |Z−Z0| ≥λ ⇒

|ZT−Z0| ≥λ−δn. For everyλ >λ0,δn≤2p2 ln 2 log22n≤λ0/2<λ/2. Therefore, conditional on Eand for everyλ >λ0,|Z−Z0| ≥λ ⇒ |ZT−Z0| ≥λ/2. ByLemma 4.4we have

Pr[|Z−Z0| ≥λ |E]≤Pr[|ZT−Z0| ≥λ/2|E]≤2 Pr[|ZT−Z0| ≥λ/2]≤4e−λ

2/4s2

.

Recall that for everyλ>λ0, e−λ

2/8s2

<1/2, so the right hand side above is at most 2e−λ2/8s2, as claimed. Therefore,Z−Z0, conditioned onE, is 2s-subgaussian, and, sinces=O(√logn), this suffices to prove the theorem.

5

Constructive vector Komlós

In this section we give a new proof of the main result of [26] that the natural SDP for the Komlós problem has value at most 1. While the proof in [26] used duality, our proof is direct and immediately yields an algorithm to compute an SDP solution which only uses basic linear algebraic operations, and does not need a general SDP solver. This SDP solution can then be used in the random walk inSection 4. We state the main theorem next.

Theorem 5.1. Let v1, . . . ,vn∈Rmbe vectors of Euclidean length at most1, and letα1, . . . ,αn∈[0,1]. There exists an n×n PSD matrix X such that

Xii=αi ∀1≤i≤n, V XVTIm,

where V= (v1, . . . ,vm)is the n×m matrix whose columns are the vectors vi.

To proveTheorem 5.1we make use of a basic identity about inverses of block matrices. This is a standard use of the Schur complement and we will not prove it here.

Lemma 5.2. Let

A=

A11 A12 A21 A22

be a(k+`)×(k+`)block matrix, where A11is a k×k matrix, A12is a k×`matrix, A21is a`×k matrix, and A22is a`×`matrix. Assume A and A22are invertible, and write B=A−1in block form as

B=

B11 B12 B21 B22

,

where Bi j has the same dimensions as Ai j. Then B11= (A11−A12A−221A21)−1 (i. e., the inverse of the Schur complement of A11in A).

Figure

Figure 1: B is the triangular region beneath the red line

References

Related documents

In this 2013 conference, we will especially focus on effective teaching and learning, on attainment levels, on what students, teachers, schools and the educational system as

Connotatively Consistent and Connotatively Inconsistent Items 12 Life Orientation Test and Life Orientation Test- Revised 16 Factor Structure of the Life Orientation Test 16

When the relations between the concepts in this cut point are considered, the concept of laicism and nationalism words Parliament and the judiciary; secularism, republicanism

If the wave file of a particular word in Konkani is not present in the database then concatenation technique is used to play the word. In this the word is first broken

The total cholesterol levels were higher in 46.7% of the patients, with mean total cholesterol of 198.4±21.0. Similarly, reviewed studies by Harris reported

Women, with endometrial hyperplasia, have larger populations of CD20+, CD3+, and CD138+ and decreased amount of CD56+ in the stroma and glandular cells, as for women with

The first stage of the process is to choose the type of backup (full, differential or incremental), which item you want to backup and the format of the backup file. Comodo