Second-order cone programming certificates

10.5 Lower bounds on quantum query complexity


\[ p = \sum_{S \in \binom{[n]}{\le t}} c_S \chi_S \qquad (10.25) \]

For a fixed ε ≥ 0, a polynomial φ that is a feasible solution to the maximization problem in (10.25) with objective value strictly larger than 2ε is called a dual polynomial for f, and it is a certificate for deg_ε(f) > t. Note that by LP duality such a certificate exists whenever deg_ε(f) > t. Dual polynomials have been used to give tight bounds on the approximate degree of many Boolean functions; see for example [Špa08, She13, BT13, BKT18].

Feasible solutions to the maximization problem in (10.25) provide feasible solutions to the SDP in (10.24). This gives a “direct” proof that dual polynomials give lower bounds on quantum query complexity.
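To make the notion of a dual polynomial concrete, the following sketch checks the textbook certificate for parity on n bits. It assumes that the constraints of the maximization problem in (10.25), which is not reproduced in full above, are: unit ℓ₁-norm of φ, vanishing Fourier coefficients on all sets of size at most t, and (for a total function f) objective Σ_x f(x)φ(x). The function names are illustrative only.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficient(phi, n, S):
    """hat{phi}(S) = 2^{-n} * sum_x phi(x) * chi_S(x), with chi_S(x) = prod_{i in S} x_i."""
    total = 0.0
    for x in product((1, -1), repeat=n):
        total += phi[x] * prod(x[i] for i in S)
    return total / 2 ** n

def is_dual_polynomial(phi, f, n, t, eps, tol=1e-12):
    """Check the (assumed) conditions certifying deg_eps(f) > t for a total function f."""
    l1_norm = sum(abs(value) for value in phi.values())
    low_degree_zero = all(
        abs(fourier_coefficient(phi, n, S)) <= tol
        for k in range(t + 1)
        for S in combinations(range(n), k)
    )
    objective = sum(f[x] * phi[x] for x in phi)
    return abs(l1_norm - 1) <= tol and low_degree_zero and objective > 2 * eps

n = 4
cube = list(product((1, -1), repeat=n))
parity = {x: prod(x) for x in cube}          # f(x) = x_1 * ... * x_n
phi = {x: parity[x] / 2 ** n for x in cube}  # candidate dual polynomial
```

Here `is_dual_polynomial(phi, parity, n, n - 1, 0.49)` holds, since the only non-zero Fourier coefficient of φ sits on the full set [n], while the same φ fails the check for t = n.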

Lemma 10.8. Let f : D → R and let φ : {±1}ⁿ → R be a feasible solution to (10.25) with objective value strictly larger than 2ε. Then cb-deg_ε(f) > t.

Proof. Observe that the tuple (φ, X = 0, w = 0) forms a feasible solution to (10.20) with objective value strictly larger than ε. Indeed, X = 0 is positive semidefinite, and it satisfies diag(X) = 0 = w · e and A₀(X) = 0. Moreover, the condition $\hat{\phi}(S) = 0$ for all $S \in \binom{[n]}{\le t}$ ensures that, for all i ∈ [n + 1]^t,

\[ X_{0,i} = 2^n \hat{\phi}(S_i) = 0 \quad \text{since } |S_i| \le t. \]

10.5.3 Second-order cone programming certificates

In (the proof of) Lemma 10.8 we have seen that the linear programming certificates of deg_ε(f) > t correspond to SDP certificates (φ, X, w) = (φ, 0, 0) using the all-zeroes matrix X = 0 in (10.24). Here we consider a more general class of SDP certificates (φ, X, w) where X and w still have an easy structure: those certificates for which we can take

\[ X = \begin{pmatrix} w & v^T \\ v & wI \end{pmatrix} \]

for some vector $v \in \mathbb{R}^{(n+1)^t}$ and real number w.

This is based on the following observation.


Proof. First note that A₀(X) = 0 is trivially satisfied by X. Indeed, A₀ ignores the first row and column of X and, for all ℓ ∈ [t − 1], i, i′ ∈ [n + 1]^ℓ, and j, k ∈ [n + 1]^{t−ℓ}, we have that

\[ X_{ij,ik} = X_{i'j,i'k} = \begin{cases} w & \text{if } j = k, \\ 0 & \text{else.} \end{cases} \]

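Second, by considering the Schur complement of X with respect to its upper-left corner, we have $X \in S^{1+(n+1)^t}_+$ if and only if either X = 0, or w > 0 and w − v^T v/w ≥ 0. Taken together, the two cases say exactly that w ≥ ‖v‖₂, that is, (w, v) lies in the Lorentz cone.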
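The equivalence between positive semidefiniteness of the arrow-shaped matrix and the second-order cone condition w ≥ ‖v‖₂ can be checked numerically. The sketch below uses NumPy and a small dimension; the helper names `arrow_matrix`, `is_psd` and `in_lorentz_cone` are assumptions made for illustration and do not appear in the text.

```python
import numpy as np

def arrow_matrix(w, v):
    """Build X = [[w, v^T], [v, w*I]] for a real number w and a vector v."""
    v = np.asarray(v, dtype=float)
    d = v.size
    X = w * np.eye(d + 1)
    X[0, 1:] = v
    X[1:, 0] = v
    return X

def is_psd(X, tol=1e-9):
    """PSD test via the smallest eigenvalue of the symmetric matrix X."""
    return bool(np.linalg.eigvalsh(X).min() >= -tol)

def in_lorentz_cone(w, v, tol=1e-9):
    """Membership of (w, v) in the second-order cone: w >= ||v||_2."""
    return bool(w >= np.linalg.norm(v) - tol)

# Compare both tests on random instances (fixed seed for reproducibility).
rng = np.random.default_rng(0)
agree = True
for _ in range(200):
    v = rng.normal(size=5)
    w = rng.uniform(-1.0, 4.0)
    agree = agree and (is_psd(arrow_matrix(w, v)) == in_lorentz_cone(w, v))
```

The agreement is expected because the eigenvalues of such a matrix are w (with multiplicity d − 1) and w ± ‖v‖₂, so the smallest one is w − ‖v‖₂.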

By restricting our attention to feasible solutions of the above form, the program (10.24) reduces to the following second-order cone program:

max − w +X

This second-order cone program involves the (1 + (n + 1)^t)-dimensional Lorentz cone. However, by counting the number of tuples i for which S_i equals a given set S we can reduce to dimension

and therefore (10.26) is equivalent to the following pair of primal/dual second-order cone programs:

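\[
\begin{array}{ll}
\min & 2\varepsilon \\
\text{s.t.} & c = (c_S)_{S \in \binom{[n]}{\le t}} \in \mathbb{R}^{\binom{[n]}{\le t}},\ \varepsilon \in \mathbb{R} \\
& |p(x) - f(x)| \le 2\varepsilon \quad \text{for } x \in D \\
& |p(x)| \le 1 + 2\varepsilon \quad \text{for } x \notin D \\
& \sum_{S \in \binom{[n]}{\le t}} \frac{c_S^2}{|I_S|} \le 1 \\
& p = \sum_{S \in \binom{[n]}{\le t}} c_S \chi_S
\end{array}
\qquad
\begin{array}{ll}
\max & -w + \sum_{x \in D} f(x)\phi(x) - \sum_{x \notin D} |\phi(x)| \\
\text{s.t.} & \phi = (\phi(x))_{x \in \{\pm 1\}^n} \in \mathbb{R}^{\{\pm 1\}^n},\ w \in \mathbb{R} \\
& \sum_{x \in \{\pm 1\}^n} |\phi(x)| = 1 \\
& v = \bigl( 2^n \sqrt{|I_S|}\, \hat{\phi}(S) \bigr)_{S \in \binom{[n]}{\le t}} \\
& w \ge \|v\|_2
\end{array}
\qquad (10.27)
\]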

We note that strong duality holds since both the primal and dual are strictly feasible.

Lemma 10.10. If the optimal value of (10.27) is strictly larger than 2ε, then cb-deg_ε(f) > t.

Hence, the above forms a strengthening of the polynomial method. Indeed, any φ that is feasible for the maximization program in (10.25) (with objective > 2ε) will have low-degree Fourier coefficients equal to zero and therefore (φ, w = 0, v = 0) will be feasible for the maximization program in (10.27) (with objective > 2ε). Also, notice that compared to (10.25) the primal here has the additional constraint that the coefficients of the approximating polynomial have to be normalized (w.r.t. a weighted 2-norm).

Chapter 11

Quantum algorithms for semidefinite programming

This chapter is based on the paper “Quantum SDP-solvers: Better upper and lower bounds”, by J. van Apeldoorn, A. Gilyén, S. Gribling, and R. de Wolf [vAGGdW17].

Some of the key ideas needed to provide the better upper bounds the title suggests have been generalized by Gilyén et al. [GSLW18]; we have seen some of these generalizations in Section 9.3. Here we use those generalizations to provide a cleaner presentation of the results of [vAGGdW17].

After seeing many applications of semidefinite programming in the preceding chapters, we turn our attention to solving semidefinite programs using quantum computers. The first contribution in this direction was due to Brandão and Svore in 2016 [BS17]. They provided a quantum algorithm for solving semidefinite programs, which in some regimes is faster than the best-possible classical algorithms in terms of the dimension n of the problem and the number m of constraints, but worse in terms of various other parameters. This chapter is based on [vAGGdW17], the first work to improve on the results of Brandão and Svore, where we improve their algorithm in several ways, getting better dependence on those other parameters.

Subsequent progress in the same framework has been made in [BKL⁺17, vAG18a], which we briefly discuss in Section 11.5.

To be more concrete, let us recall the formulation of a pair of primal-dual semidefinite programs, and define some useful parameters. Given a set of matrices C, A₁, . . . , A_m ∈ Sⁿ and a vector b ∈ R^m we can define a pair of semidefinite programs, a primal (P) and a dual (D):¹

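\[
\begin{array}{lll}
(P) & \max & \langle C, X \rangle \\
& \text{s.t.} & X \in S^n_+ \\
& & \mathrm{Tr}(A_j X) \le b_j \quad \text{for all } j \in [m]
\end{array}
\qquad
\begin{array}{lll}
(D) & \min & \langle b, y \rangle \\
& \text{s.t.} & y \in \mathbb{R}^m_+ \\
& & \sum_{j=1}^m y_j A_j - C \in S^n_+
\end{array}
\qquad (11.1)
\]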

For the sake of normalization, let us assume that the operator norm of each of the matrices C, A₁, . . . , A_m is at most one. A special class of semidefinite programs is formed by linear programs, those SDPs for which all matrices involved are diagonal.
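For the diagonal (LP) case, feasibility and weak duality of the pair (11.1) reduce to entrywise checks, which the following sketch illustrates. The instance (c, A, b) and the points x and y are made-up toy data, and the helper names are hypothetical; diagonal matrices are represented by their diagonals.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(x, A, b, tol=1e-12):
    """X = diag(x) is psd iff x >= 0; Tr(A_j X) = <a_j, x> must be at most b_j."""
    return all(xi >= -tol for xi in x) and all(
        dot(a_j, x) <= b_j + tol for a_j, b_j in zip(A, b)
    )

def dual_feasible(y, A, c, tol=1e-12):
    """y >= 0 and sum_j y_j A_j - C psd, i.e. a nonnegative diagonal."""
    slack = [sum(y_j * a_j[i] for y_j, a_j in zip(y, A)) - c[i] for i in range(len(c))]
    return all(y_j >= -tol for y_j in y) and all(s >= -tol for s in slack)

# Toy LP with n = m = 2; all diagonals have entries of absolute value at most 1.
c = [1.0, 0.5]                 # C = diag(c)
A = [[1.0, 0.0], [0.0, 1.0]]   # A_j = diag(A[j])
b = [1.0, 2.0]

x = [1.0, 2.0]                 # primal candidate X = diag(x)
y = [1.0, 0.5]                 # dual candidate
primal_value = dot(c, x)       # <C, X>
dual_value = dot(b, y)         # <b, y>
```

In this toy instance both points are feasible and the two objective values coincide, so each certifies the optimality of the other; any other primal-feasible point, such as x = (0.5, 1), has value at most `dual_value`.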

Under assumptions that will be satisfied everywhere in this chapter, strong duality applies: the primal and dual SDP (11.1) will have the same optimal value OPT. To talk about the complexity of SDP-solvers, let us define some parameters. Let s be the sparsity of the input matrices: the maximal number of non-zero entries per row (and hence also per column) of the input matrices. Let R be an upper bound on the trace of an optimal X. Let r be an upper bound on ‖y‖₁ for an optimal y to the dual. Let ε > 0 be the desired additive error with which we want to approximate OPT. Assume that the rows and columns of the matrices of SDP (11.1) can be accessed as adjacency lists: we can query, say, the ℓth non-zero entry of the kth row of matrix A_j in constant time (this is the same sparse access model that we have seen in Section 9.3.2). One can define ‘solving’ an SDP in different ways. At the very least an SDP-solver should produce an additive approximation of the value OPT. On top of that, one can require a solver to output a primal or dual point that is feasible, or nearly feasible, with the stated objective value. The algorithms stated below provide at least an approximation of OPT, but they differ in the additional output. Our algorithm (see Theorem 11.1) will provide, with high probability, a feasible solution y to the dual that is optimal up to an additive error ε.
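The adjacency-list access to the input matrices described above can be mimicked classically as follows. This is only an illustrative sketch: the class name `SparseRowAccess`, the 0-based indexing, and the convention of returning `None` past the last non-zero entry are assumptions, not part of the model as defined in Section 9.3.2.

```python
class SparseRowAccess:
    """Adjacency-list access to a matrix with few non-zero entries per row."""

    def __init__(self, dense_rows):
        # For every row k, store the (column, value) pairs of its non-zero entries.
        self.rows = [
            [(col, val) for col, val in enumerate(row) if val != 0]
            for row in dense_rows
        ]

    def sparsity(self):
        """Maximal number of non-zero entries in any row (the parameter s)."""
        return max((len(row) for row in self.rows), default=0)

    def query(self, k, ell):
        """Return (column, value) of the ell-th non-zero of row k, or None if absent."""
        row = self.rows[k]
        return row[ell] if ell < len(row) else None

A1 = SparseRowAccess([
    [0.0, 0.5, 0.0],
    [0.5, 0.0, -1.0],
    [0.0, -1.0, 0.0],
])
```

A single `query` is a list lookup, which matches the constant-time assumption of the access model; the cost of building the lists is not counted there.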

One way to divide SDP-solvers into two categories is by looking at the dependence of their runtime on R, r, and 1/ε.

The first class of SDP-solvers has a runtime that scales polylogarithmically in these parameters. This class of SDP-solvers encompasses for instance the ellipsoid method [GLS81], which is mainly of theoretical importance, and interior point methods [NN94], which are used in practice to solve SDPs. One can show that interior point methods can solve SDPs in time

\[ O\bigl( \sqrt{n}\, m (m^2 + mn^2 + n^3) L \bigr), \]

where L is a measure of the size of the instance [BTN01, Sec. 6.6]. The dependence on m and n becomes prohibitive even for moderate size SDPs.

¹ Note that we slightly deviate from the presentation in (1.1) by allowing inequality constraints in the primal instead of equality constraints.

The second class of SDP-solvers, often based on first-order methods, typically provides a better runtime in terms of m and n, at the expense of a polynomial dependence on R, r, and 1/ε. In this chapter we focus on the matrix version of the multiplicative weights update method due to Arora and Kale [AK16].² A typical classical runtime for SDP-solvers in this framework is of the form

which can provide a faster algorithm for SDPs with small values of Rr/ε. The framework of Arora and Kale should really be seen as a meta-algorithm, because it does not specify how to implement a certain crucial step; let us call this ‘the oracle’ for now.³ They themselves provide oracles that are optimized for special cases. For example, for the MAXCUT SDP they obtain a solver with near-linear runtime Õ(|E|/ε⁵) in the number of edges of the graph. For the sake of comparison, let us note that in [vAGGdW17] we show that one can get a general classical SDP-solver in their framework with complexity⁴

Oe nms Rr

The first quantum SDP-solver of Brandão and Svore achieved a runtime of

where the degree of the polynomial term is at least 32. Note that compared to the classical runtime this provides a quadratic improvement in terms of the dependence on m and n. We subsequently modified their algorithm. These modifications both simplify and speed up the quantum SDP-solver, resulting in complexity

\[ \widetilde{O}\Bigl( \sqrt{mn}\, s^2 \Bigl( \frac{Rr}{\varepsilon} \Bigr)^{8} \Bigr). \]

The dependence on m, n, and s is the same as in Brandão-Svore, but our dependence on R, r, and 1/ε is substantially better. Note that each of the three parameters R, r, and 1/ε now occurs with the same 8th power in the complexity. This is no coincidence: as we show in [vAGGdW17, App. E], these three parameters can all be traded for one another, in the sense that we can massage the SDP to make each one of them small at the expense of making the others proportionally bigger. These trade-offs suggest we should actually think of Rr/ε as one parameter of the primal-dual pair of SDPs, not three separate parameters. For the special case of LPs, we can improve the runtime to

\[ \widetilde{O}\Bigl( \sqrt{mn} \Bigl( \frac{Rr}{\varepsilon} \Bigr)^{5} \Bigr). \]

² See also [AHK12] for a subsequent survey; the same algorithm was independently discovered around the same time in the context of learning theory [TRW05, WK12]. In the optimization community first-order methods for semidefinite programming have been considered for instance in [Ren16, Ren19].

³ We provide a complete overview of the Arora-Kale method in Section 11.2.1. We refer to that section for definitions and details. ‘The oracle’ should not be confused with the oracle access to the input data.

⁴ Here, and in the rest of this chapter, the notation Õ(·) is used to hide polylogarithmic factors in n, m, s, r, R and the desired additive error ε.

Finally, in terms of upper bounds on the complexity of SDP solving, we mention that the current state of the art is due to van Apeldoorn and Gilyén [vAG18a], who provide an algorithm with a runtime of

\[ \widetilde{O}\Bigl( \Bigl( \sqrt{m} + \sqrt{n}\, \frac{Rr}{\varepsilon} \Bigr) s \Bigl( \frac{Rr}{\varepsilon} \Bigr)^{4} \Bigr). \]

We briefly discuss their result in Section 11.5.

Limitations of our approach. Given that the runtime of our algorithm depends polynomially on the factor Rr/ε, a natural question is how big this term can be, or needs to be. In other words, for a fixed SDP (i.e., fixed R and r), what is the error up to which we can efficiently solve the SDP? As we will argue, sometimes the ‘natural’ choice of error is inverse polynomial in n and m, which negates our ‘speed-up’. Let us briefly sketch why this is the case. As we will see, the output of our algorithm is a vector y ∈ R^m_+ such that ∑_{j=1}^m y_j A_j − C ⪰ 0 and |⟨b, y⟩ − OPT| ≤ ε. The vector y will be very sparse: it will have O(T) non-zero entries, where T = O((Rr/ε)² ln(n)) is the number of iterations of our algorithm. Such sparse vectors have some advantages; for example, they take much less space to store than arbitrary y ∈ R^m. In fact, to get a sublinear running time in terms of m, this is necessary. However, this sparsity of the algorithm’s output also points to a weakness of these methods: if every ε-optimal dual-feasible vector y has many non-zero entries, then the number of iterations needs to be large. For example, if every ε-optimal dual-feasible vector y has Ω(m) non-zero entries, then these methods require T = Ω(m) iterations before they can reach an ε-optimal dual-feasible vector. Since T = O((Rr/ε)² ln(n)), this would imply that Rr/ε = Ω(√(m/ln(n))), and hence many classical SDP-solvers would have a better complexity than our quantum SDP-solver. As we show in Section 11.3, this will naturally be the case for families of SDPs that have a lot of symmetry.
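The arithmetic behind this limitation can be made explicit with a few lines of code. The sketch below takes the hidden constant in T = O((Rr/ε)² ln(n)) to be 1, which is purely an assumption for illustration; the function names are hypothetical.

```python
from math import ceil, log, sqrt

def iterations(R, r, eps, n, const=1.0):
    """T = const * (R*r/eps)^2 * ln(n); const stands in for the hidden O(.) constant."""
    return ceil(const * (R * r / eps) ** 2 * log(n))

def min_gamma_for_dense_duals(m, n, const=1.0):
    """Smallest gamma = R*r/eps for which const * gamma^2 * ln(n) reaches m."""
    return sqrt(m / (const * log(n)))

# If every eps-optimal dual solution needs about m non-zero entries,
# then T >= m forces R*r/eps to be at least roughly sqrt(m / ln(n)).
m, n = 10_000, 100
gamma = min_gamma_for_dense_duals(m, n)
```

With these toy values γ is about 46.6, so any polynomial dependence on Rr/ε already costs a factor of that size raised to the relevant power.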

Lower bounds. What about lower bounds for quantum SDP-solvers? Brandão and Svore already proved that a quantum SDP-solver has to make Ω(√n + √m) queries to the input matrices, for some SDPs. Their lower bound is for a family of SDPs where s, R, r, 1/ε are all constant, and is by reduction from a search problem.

Somewhat surprisingly, the subsequent work in [BKL⁺17, vAG18a] shows that this lower bound is in fact tight, in the setting where s, R, r, 1/ε are all constant.

Here we step away from this regime. We prove lower bounds that are quantitatively stronger in m and n, but for SDPs with non-constant R and r. The key idea is to consider a Boolean function F on N = abc input bits that is the composition of an a-bit majority function with a b-bit OR function that is composed with a c-bit majority function. The known quantum query complexities of majority and OR, combined with composition properties of the adversary lower bound, imply that every quantum algorithm that computes this function requires Ω(a√b · c) queries. We define a family of LPs, with constant 1/ε but non-constant r and R, such that constant-error approximation of OPT computes F. Choosing a, b, and c appropriately, this implies a lower bound of

\[ \Omega\Bigl( \sqrt{\max\{n, m\}}\, (\min\{n, m\})^{3/2} \Bigr) \]

queries to the entries of the input matrices for quantum LP-solvers. Since LPs are SDPs with sparsity s = 1, we get the same lower bound for quantum SDP-solvers.

If m and n are of the same order, this lower bound is Ω(mn), the same scaling with mn as the classical general instantiation of Arora-Kale (11). In particular, this shows that we cannot have an O(√(mn)) upper bound without simultaneously having polynomial dependence on Rr/ε. The value of Rr/ε in the proof of our lower bound implies that for the case m ≈ n, this polynomial dependence has to be at least (Rr/ε)^{1/4}.
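For readers who want to experiment with the composed function used in the lower bound, here is a direct classical evaluation of a majority of ORs of majorities on N = abc bits. The block layout of the input and the strict tie-breaking rule in `majority` are assumptions made for this sketch; the text does not fix them.

```python
def majority(bits):
    """1 if strictly more than half of the bits are 1, else 0 (tie-breaking is an assumption)."""
    return int(2 * sum(bits) > len(bits))

def composed_function(x, a, b, c):
    """Evaluate F = MAJ_a(OR_b(MAJ_c, ..., MAJ_c), ...) on a list x of a*b*c bits."""
    if len(x) != a * b * c:
        raise ValueError("expected a*b*c input bits")
    outer = []
    for i in range(a):
        inner = []
        for j in range(b):
            start = (i * b + j) * c          # consecutive blocks of c bits
            inner.append(majority(x[start:start + c]))
        outer.append(int(any(inner)))        # b-bit OR
    return majority(outer)                   # a-bit majority

a, b, c = 3, 2, 3
x = [1, 1, 0,  0, 0, 0,    # group 0: first block has majority 1, so OR = 1
     0, 0, 0,  0, 0, 0,    # group 1: OR = 0
     0, 0, 0,  1, 0, 1]    # group 2: second block has majority 1, so OR = 1
value = composed_function(x, a, b, c)
```

In this example only 4 of the 18 input bits are set, yet two of the three OR gates fire and the outer majority outputs 1.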

Organization. This chapter is structured as follows. We first provide an informal overview of the Arora-Kale framework for solving SDPs in Section 11.1. This allows us to point out where the quantum improvements come from in Section 11.1.1. We then give a formal proof of our quantum SDP-solver in Section 11.2. We then proceed by highlighting the limitations of quantum SDP-solvers. First we consider SDP-solvers in the Arora-Kale framework (that are not tuned to specific classes of SDPs): we show that the inherent sparsity of the provided solutions puts a lower bound on the runtime for SDPs whose good solutions are dense (Section 11.3). We then prove some general lower bounds on the runtime of quantum LP-solvers and therefore quantum SDP-solvers (Section 11.4). Finally, in Section 11.5 we describe subsequent progress.

11.1 Basic approach

Arora and Kale [AK16] showed how to approximate OPT using a matrix version of the “multiplicative weights update” method. In Section 11.2.1 we will describe their framework in more detail, but in order to describe our result we will start with an overly simplified sketch here. The algorithm goes back and forth between candidate solutions to the primal SDP and to the corresponding dual SDP. Recall that under assumptions that will be satisfied everywhere in this chapter, strong duality applies: the primal and dual SDP (11.1) will have the same optimal value OPT. The algorithm does a binary search for OPT by trying different guesses α for it. Suppose we have fixed some α, and want to find out whether α is bigger or smaller than OPT. This is now a feasibility problem and we will try to construct a feasible solution to the dual with objective value at most α or show that it does not exist. Start with some candidate solution X⁽¹⁾ for the primal, for example a multiple of the identity matrix (X⁽¹⁾ has to be psd but need not be a feasible

solution to the primal). This X⁽¹⁾ induces the following polytope:

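\[ P_\varepsilon(X^{(1)}) := \Bigl\{ y \in \mathbb{R}^m : b^T y \le \alpha,\ \mathrm{Tr}\Bigl( \Bigl( \sum_{j=1}^m y_j A_j - C \Bigr) X^{(1)} \Bigr) \ge -\varepsilon,\ y \ge 0 \Bigr\}. \qquad (11.2) \]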

This polytope can be thought of as a relaxation of the feasible region of the dual SDP with the extra constraint that OPT ≤ α: instead of requiring that ∑_j y_j A_j − C is psd, we merely require that its inner product with the particular psd matrix X⁽¹⁾ is not too negative. The algorithm then calls an “oracle” that provides a y⁽¹⁾ ∈ P_ε(X⁽¹⁾), or outputs “fail” if P₀(X⁽¹⁾) is empty (how to efficiently implement such an oracle depends on the application). In the “fail” case we know there is no dual-feasible y with objective value ≤ α, so we can increase our guess α for OPT, and restart. In case the oracle produced a y⁽¹⁾, this is used to define a Hermitian matrix H⁽¹⁾ and a new candidate solution X⁽²⁾ for the primal, which is proportional to e^{−H⁽¹⁾}. Then the oracle for the polytope P_ε(X⁽²⁾) induced by this X⁽²⁾ is called to produce a candidate y⁽²⁾ ∈ P_ε(X⁽²⁾) for the dual (or “fail”); this is used to define H⁽²⁾ and X⁽³⁾ proportional to e^{−H⁽²⁾}, and so on.
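Two ingredients of this loop are easy to write down explicitly: the candidate X proportional to e^{−H}, and the membership test for the relaxed polytope. The sketch below uses NumPy on real symmetric matrices. How H^{(t)} is built from y^{(t)} is deferred to Section 11.2.1 and is not modeled here, and the constraint set checked by `in_relaxed_polytope` (b^T y ≤ α, y ≥ 0, trace inner product at least −ε) is an assumption read off from the prose description of the polytope. All names and the toy instance are illustrative.

```python
import numpy as np

def gibbs_state(H, trace=1.0):
    """Return trace * exp(-H) / Tr(exp(-H)) for a real symmetric matrix H."""
    eigvals, eigvecs = np.linalg.eigh(H)
    # Shift by the smallest eigenvalue so the largest exponent is 0 (avoids overflow).
    weights = np.exp(-(eigvals - eigvals.min()))
    weights *= trace / weights.sum()
    return (eigvecs * weights) @ eigvecs.T

def in_relaxed_polytope(y, X, A, C, b, alpha, eps):
    """Check b^T y <= alpha, y >= 0 and Tr((sum_j y_j A_j - C) X) >= -eps."""
    y = np.asarray(y, dtype=float)
    slack = sum(y_j * A_j for y_j, A_j in zip(y, A)) - C
    return bool(np.all(y >= 0) and b @ y <= alpha and np.trace(slack @ X) >= -eps)

# Toy diagonal instance with n = m = 2.
C = np.diag([1.0, 0.5])
A = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
b = np.array([1.0, 2.0])

# Starting candidate: H = 0 gives a multiple of the identity with the chosen trace.
X1 = gibbs_state(np.zeros((2, 2)), trace=2.0)
accepted = in_relaxed_polytope([1.0, 0.5], X1, A, C, b, alpha=2.0, eps=0.1)
rejected = in_relaxed_polytope([0.0, 0.0], X1, A, C, b, alpha=2.0, eps=0.1)
```

Computing the matrix exponential through an eigendecomposition is the simple classical route and costs time cubic in n; avoiding this explicit computation is exactly what the estimates of Tr(A_j X^{(t)}) and Tr(C X^{(t)}) mentioned later in this section are for.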

Surprisingly, the average of the dual candidates y⁽¹⁾, y⁽²⁾, . . . converges to a nearly-dual-feasible solution. Let w^∗ be the “width” of the oracle for a certain SDP: the maximum of

\[ \Bigl\| \sum_{j=1}^m y_j A_j - C \Bigr\| \]

over all psd matrices X and all vectors y that the oracle may output for the corresponding polytope P_ε(X). In general we will not know the width of an oracle exactly, but only an upper bound w ≥ w^∗, that may depend on the SDP; this is, however, enough for the Arora-Kale framework.

In Section 11.2.1 we will show that without loss of generality we can assume the oracle returns a y such that ‖y‖₁ ≤ r (recall that r is an upper bound on ‖y‖₁ for an optimal y to the dual). Because we assumed ‖A_j‖, ‖C‖ ≤ 1, we have w^∗ ≤ r + 1 as an easy width-bound. General properties of the multiplicative weights update method guarantee that after T = Õ(w²R²/ε²) iterations, if no oracle call yielded “fail”, then the vector (1/T) ∑_{t=1}^T y^{(t)} is close to dual-feasible and satisfies b^T y ≤ α.

This vector can then be turned into a dual-feasible solution by tweaking its first coordinate, certifying that OPT ≤ α + ε, and we can decrease our guess α for OPT accordingly.

The framework of Arora and Kale is really a meta-algorithm, because it does not specify how to implement the oracle. They themselves provide oracles that are optimized for special cases, which allows them to give a very low width-bound for these specific SDPs. As mentioned before, for example, for the MAXCUT SDP, they obtain a solver with near-linear runtime in the number of edges of the graph.

They also observed that the algorithm can be made more efficient by not explicitly calculating the matrix X^(t) in each iteration: the algorithm can still be made to work if instead of providing the oracle with X^(t), we feed it good estimates of Tr(A_jX^(t)) and Tr(CX^(t)).
