Cheap checking for cloud computing : statistical analysis via annotated data streams

(1)

warwick.ac.uk/lib-publications

Original citation:

Cormode, Graham and Hickey, Christopher (2018) Cheap checking for cloud computing :

statistical analysis via annotated data streams. In: The 21st International Conference on

Artificial Intelligence and Statistics, Playa Blanca, Lanzarote, Canary Islands, 9-11 Apr 2018.

Published in: Proceedings of the 21st International Conference on Artificial Intelligence and

Statistics, 84

Permanent WRAP URL:

http://wrap.warwick.ac.uk/99395

Copyright and reuse:

The Warwick Research Archive Portal (WRAP) makes this work by researchers of the

University of Warwick available open access under the following conditions. Copyright ©

and all moral rights to the version of the paper presented here belong to the individual

author(s) and/or other copyright owners. To the extent reasonable and practicable the

material made available in WRAP has been checked for eligibility before being made

available.

Copies of full items can be used for personal research or study, educational, or not-for-profit

purposes without prior permission or charge. Provided that the authors, title and full

bibliographic details are credited, a hyperlink and/or URL is given for the original metadata

page and the content is not changed in any way.

Publisher’s version:

Proceedings of the 21st International Conference on Artificial Intelligence and Statistics

(AISTATS) 2018, Lanzarote, Spain. PMLR: Volume 84. Copyright 2018 by the author(s).

A note on versions:

The version presented here may differ from the published version or, version of record, if

you wish to cite this item you are advised to consult the publisher’s version. Please see the

‘permanent WRAP URL’ above for details on accessing the published version and note that

access may require a subscription.

(2)

Statistical Analysis via Annotated Data Streams

Graham Cormode Christopher Hickey

University of Warwick

Abstract

As the popularity of outsourced computation in-creases, questions of accuracy and trust between the client and the cloud computing services be-come ever more relevant. Our work aims to pro-vide fast and practical methods to verify anal-ysis of large data sets, where the client’s com-putation and memory costs are kept to a mini-mum. Our verification protocols are based on defining “proofs” which are easy to create and check. These add only a small overhead to re-porting the result of the computation itself. We build up a series of protocols for elementary sta-tistical methods, to create more complex proto-cols for Ordinary Least Squares, Principal Com-ponent Analysis and Linear Discriminant Analy-sis, and show them to be very efficient in practice.

1 Introduction

The massive leap in popularity of machine learning tech-niques can be attributed in part not simply to novel al-gorithms, but also to dramatic increases in scale: much larger models with many parameters to set optimally, and much larger training data sets to determine these parame-ters. However, this presents a challenge to data owners who do not have a convenient data centre at their disposal. The size of data and computational cost in order to extract ac-curate models begins to look prohibitive. At the same time, a potential answer has emerged, in the form of outsourced computation. That is, instead of building the infrastructure needed to store and analyse large quantities of data, compu-tation can be ‘rented’ on demand. Initially cloud offerings provided only the barebones of a remote system, but cur-rent options provide many tools, libraries and algorithms available to take “off the shelf”.

Proceedings of the 21st International Conference on Artificial

One doubt remains. If we send data off to the cloud, and request some analysis to be performed, what guarantee do we get that the processing has been done to our satisfac-tion? The provider has an economic incentive to cut cor-ners: to perform the computation on only a sample of pro-vided data, or to terminate an iterative parameter search before convergence has occurred, for example. Such short cuts yield plausible but suboptimal models. So how could we be assured that the best model has been found, with-out repeating the computation ourself or having multiple providers repeat the work, substantially driving up costs? In this paper, we adopt the Annotated Data Streams ap-proach (Chakrabarti et al., 2009) for verifying the models found by an outsourced provider Rather than repeating all or some of the computation, we instead provide protocols in which the cloud also provides some extra information that allows us to check a strict adherence to the required computation. The overhead for the cloud provider is min-imal – often, the required information is a relatively low cost function of the input data or natural by-products of the target computation. These do not restrict the cloud to use any particular implementation or algorithm; just that they demonstrate that the output meets certain necessary properties. The key part of these protocols is that the in-formation required is very easy for the original data owner to check, based on appropriately defined fingerprints of the input. These fingerprints can be computed flexibly and in-crementally from the input as it arrives in any order, so the data owner does not even need to retain a complete copy of the input. The overhead for the data owner is therefore low: it is typically dominated by the cost of sending the data to the cloud and receiving the output of the computa-tion. If the data owner’s checks pass, then they are assured that the computation has been performed by the cloud sat-isfactorily, with a very high degree of certainty. One can then think of these protocols as providing effective “check-sums for computation”.

(3)

verifier corresponded exactly to the powerful class of com-putations that could be performed using polynomial space (PSPACE) (Babai, 1985; Goldwasser and Sipser, 1986; Shamir, 1992). Such results were initially thought to be of purely theoretical interest. A decade ago, this topic was re-visited from a perspective closer to our own: to what extent could arbitrary programs be checked without fully repeat-ing them (Goldwasser et al., 2008)? Several strong models were proposed, allowing a large class of computations to be checked in this way. However, the costs were still typically large: programs have to be compiled into non-standard for-mats (such as circuits with gates performing arithmetic op-erations), and the overheads for the cloud can be very sub-stantial, often hundreds or even millions of times slower than directly performing the computation.

We break away from this paradigm, and achieve protocols that have minimal overheads by deliberately narrowing the scope of the computations considered. By focusing on a collection of important tasks in machine learning based on linear algebra, we can provide bespoke protocols based on “fingerprinting” the input data that take advantage of specific structure in the target problem. These problems are of sufficient generality that our efforts are repaid. Af-ter surveying prior work and providing a technical back-ground (Section 2), we begin with primitives for ubiqui-tous steps in data analysis: matrix multiplication and inver-sion, Cholesky decomposition and eigenvalue finding (Sec-tion 3). In Sec(Sec-tion 4, we present applica(Sec-tions of these to tasks of interest: regression, PCA, and LDA. These core tasks are sufficient to show the power of this paradigm. We provide empirical validation of our claims in Section 5. This work represents some first steps in verifying out-sourced computation of machine learning. The next steps are to extend this work to more complex models and algo-rithms currently enjoying popularity in machine learning, such as deep learning and beyond. Since, at the risk of drastic oversimplification, almost all of machine learning can be performed via numerical data encodings (vectors, matrices and tensors) combined with optimization, we are optimistic that the foundations laid in this work will nat-urally extend to further protocols for common mining and modelling tasks.

2 Preliminaries

2.1 Annotated Data Stream Model

In our protocols, we have a data stream S, which is ob-served by two parties, a “helper” (H) and a “verifier” (V). The data stream will usually be arranged as a sequence of

ntuples of elements, where each tuple typically defines an element of a larger structure, such as a matrix or vector. Abstractly, the verifier wishes to compute some function on_S,f(_S), with assistance from the helper. Typically, the

helper will provide the value off(S), along with a proof

of its correctness. This yields theAnnotated Data Stream

model, introduced by Chakrabarti et al. (2009). We formal-ize the model via the definitions below.

Definition 1. We have a helperH, and a verifierV, with

the aim of cooperating to compute some function f(_S)of the streamS. The helper provides a messageMH(S) com-prised of ,_fˆ_{, the claimed value}_f₍_S₎_{, and a proof}_P_H_which

supports this claim according to some pre-agreed structure. The output ofV based on the streamS,V’s randomly

cho-sen bitsRV, and the helper’s message is

OutV(S,RV, MH(S)) = (

ˆ

f IfV is convinced

⊥ Otherwise

A protocol is defined by the functionsMH andOutV. We say that a protocol iscompleteif

∃H :_P[OutV(S,RV, MH(S)) =f(S)] = 1 (1)

The Verifier’s protocol issoundif

∀H0_,_S0_:_Ph_Out

V(S0,RV, MH0(S0))∈ {/ f(S0),⊥}

i ≤ 1

3

Intuitively this says that we seek protocols so that an honest helper (one who faithfully follows the protocol) can always persuade the verifier to accept the correct answer, while a dishonest helper cannot persuade the verifier to accept an incorrect result with more than a constant probability. Our protocols allow this probability to be reduced to an arbitrar-ily small value with minimal cost.

In order to show an annotated data streaming protocol, we need to show completeness and soundness. This is suffi-cient to check that our protocol will successfully do what we want: verify the computation with a high probability of detecting a malicious helper. Trivial protocols always exist wherein the helper’s message is null, and the verifier evaluates the function in full. Consequently, we seek proto-cols whose costs for the verifier are substantially lower than this. Ideally, the protocol should run in sublinear memory space for the verifier, without the need for intensive com-putation and the message should be as small as possible. Similarly, we seek protocols where the honest helper does not have to do substantially more work than simply com-puting f(_S). Our focus in this paper are on the memory

space for the verifier, and the size of the proof, which we call the communication cost.

Definition 2. A (h,v)-protocol is a valid annotated data streaming protocol using a message of sizeO(h), andO(v)

space cost for the verifier.

2.2 Fingerprint techniques

(4)

Finite Fields.In line with prior work, all our protocols rely on computations performed over finite fields. For ease of implementation, we use prime fields. Given a primeq, the finite (prime) fieldFq is the set{0. . . q−1}with addition and multiplication moduloq. Hence, storing field values re-quiresO(logq)bits. We make use of the fact that in many cases arithmetic in the field and arithmetic over the integers is in exact correspondence. However, as we consider more complicated computations, we encounter situations where we seek solutions over the reals, which do not correspond to solutions in the field. To avoid this, we will use scaling and rounding techniques to approximate using field values. Specifically, we consider the input to be fixed precision ra-tional numbers which can be represented as members of the set Fρ,M = {x ∈ R∩[−M, M] : bρx ∈ Z}, with respect to a base, b. We then choose the field sizeqas a

function ofρandM, in order to allow us to maintain the

exact correspondence between the field and the fixed pre-cision rationals. We map y _{∈ F}ρ,M toy0 ∈ Fq, where

y0 _{= mod (}_xbρ_{, q}₎_{, choosing}_q_{to be a prime bigger than}

(2M+ 1)bρ_.

Fingerprints.Most of the values stored by our verifier will be fingerprintsof large objects, such as vectors or matri-ces. Fingerprints (based on randomly chosen seedsx) have

the property that if two fingerprints agree, then with high probability the objects that gave rise to the fingerprints are identical.

Definition 3(Matrix fingerprint). ForA∈Fn×m

q , the ma-trix fingerprint of A isfmat

x (A)withx∈RFq and

fmat

x (A) = Pn−1

i=0

Pm−1

j=0 Aijxin+j

From the Schwartz-Zippel Lemma (Shamir, 1992), we have that for a randomly chosen x _∈ Fq, given A, B ∈

Fn×m

q , if A 6= B, then Pr[fx(A) = fx(B)] ≤ nm_q−1. Therefore for sufficently large q(say, greater than 3nm)

we will always have soundness and completeness for ver-ifying equality with fingerprints. We can view vectors as a special case of matrices, so similarly for vectorsu∈ Fq we use the notationfvec

x (u) = Pn−1

i=0 uixito denote a

vec-tor fingerprint. Fingerprints have several useful properties, such as linearity, withfx(A+B) =fx(A) +fx(B).

2.3 Related Work

We briefly survey the most related work in this area. Chakrabartiet al.(Chakrabarti et al., 2009) introduced the annotated streams model and provided protocols for fre-quency moments in data streams, and several graph prob-lems, including triangle counting, connectivity and bipar-tite matchings. They also introduced a square matrix mul-tiplication protocol built extending the classical result of Freivalds (1979). Subsequently, Cormode et al. (2013) pro-vided further protocols for graph problems, using linear and integer programs to validate optimal matchings and

shortest paths. More recently, Daruki et al. (2015) extended results on matrix analysis, provided more general protocols for matrix multiplication, and a protocol for eigenvalue (but not eigenpair) checking. For matricesA_∈_Fk×n

q andB ∈

Fn×k0

q , they show an (kk0hlog(q), vlog(q))−protocol, where hv ≥ n. These protocols are used to perform

shape fitting and clustering, although they shift away from annotated data streams and towards an interactive proof model which allows several exchanges of messages be-tween helper and verifier. The annotated data stream model is generalized by definitions ofstreaming interactive proofs

(SIPs) (Cormode et al., 2011, 2012). Note that annotated data stream verification protocols can be considered as sin-gle message SIPs.

3 Linear Algebraic Checks

In this section, we define protocols for checking a variety of linear algebraic primitives, based on careful use of fin-gerprints. We use the notationA→

i to denote theith row of the matrixA, andA↓_i to denote theith column ofA.

3.1 Fingerprinting the Gramian Matrix

We first show how to efficiently build a fingerprint of the Gramian matrixG=AT_A_{given a stream that specifies}_A_.

Lemma 1. ForA_∈Fn×m q ,

fmat

x (ATA) =Pni=1fxvecm(A→_i )f_xvec(A→_i )

Proof. Givenp, q_∈Fm q ,

fmat

x (p⊗q) = Pm−1

i=0

Pm−1

j=0 piqjxim+j

=Pm_i₌₀−1piximPmj=0−1qjxj

=fvec

xm(p)f_xvec(q)

,

Using the outer product definition of matrix multiplication;

fmat

x (ATA) =Pn−

1

i=0 fxmat

(AT₎↓

i

(A→

i )

=Pn_i₌₀−1fvec

xm(A→_i )f_xvec(A→_i )

Hence, we can compute the fingerprint of AT_A _from _A row-by-row and summing the product of row vector finger-prints. This immediately implies a protocol to verify that a matrixGprovided byH is the Gramian: simply use the above identity to computefx(ATA)from the stream, and check that this is equal tofx(G). Soundness and complete-ness follow immediately from properties of fingerprints in Section 2.2, and the cost isO(m2_log_q₎_{communication to}

specifyG, while the verifier can maintain the needed fin-gerprints in spaceO(logq).

3.2 Matrix Multiplication

We generalize the previous protocol to solve matrix multi-plication. Given two matrices,A_∈Fk×n

q andB ∈Fn×k

0

(5)

a similar proof to Lemma 1 shows

fxmat(AB) = n_X−1

i=0

f_xveck0(A↓_i)fxvec(Bi→) (2)

Equation (2) allows for an efficient matrix multiplication protocol, where the main work of the helper is to repeat the input matrices in a convenient order. To find the fingerprint ofABwe need to see each column of A and row of B

in-terleaved in order, i.e. MH =hA˜↓0,B˜→0 , ...,A˜↓n−1,B˜n→−1i.

The verifier uses fingerprints to check that the reordered versions ofAandBagree with the versions present in the

streamS, and that the claimed matrix productC has the

same fingerprint as the fingerprint computed via (2).

Theorem 1. There is a (max(k0k, kn, k0n) log(q),

log(q))-protocol for verifying matrix multiplication with

A∈Fk×n

q ,B ∈Fn×k

0

q .

Soundness and completeness are again immediate from the properties of fingerprints. The verifier keeps a constant number of fingerprints in space O(log(q)), and the

com-munication cost ofO(max(k0_{k, kn, k}0_n_{) log(}_q₎₎_{is due to}

sending each ofA,BandAB.

Matrix multiplication has been considered in prior work on annotated streaming. Daruki et al. (2015) showed that any protocol for this problem must have the product of the communication cost and space cost at leastΩ((k+k0₎_n₎_.

Our protocol achieves this lower bound up to logarithmic factors (noting that any protocol which reports(AB)

re-quiresΩ(kk0₎_{communication for this step). The previous}

best rectangular matrix multiplication protocol, achieved by Daruki et al. (2015), was a kk0hlog(q), vlog(q)

-protocol with hv ≥ n, that used the inner product

proto-col of Chakrabarti et al. (2009). Our protoproto-col can be un-derstood as settingv = 1, but removing the high overhead

factor ofnfrom the communication cost in this case. When

our input matrices are constituted of fixed precision ratio-nals, we chooseqas follows:

Corollary 1. GivenA_{∈ F}_ρ,Mk×n andB _{∈ F}_ρ,Mn×k0, the ma-trixABis inF2nρ,nM×n 2, and we chooseq > n(2M+ 1)222ρ

so that the product can be represented exactly in the field.

Choosing q to be this large means that when we move A and B from Fρ,M to A,˜ B˜ ∈ Fq by multiplying by

2ρ_{, and then compute}_A_˜_B_˜ _{all these values remain in} _F q, and by scaling back down by 22ρ _{we get our result in} the desired format, without wraparound or rounding er-rors. The memory required to store elements of Fq is

log(q) = O(log(M) + log(b)ρ), which is proportional to

the space required to store the original matrices inFρ,Mis

log(M) + log(b)ρ.

Lower bounds on computing matrix fingerprints. The verifier needs very little memory for the above protocol, since the matrix is provided in a convenient order. More

generally, we would like to be able to findfxmat(AB)from justfmat

x (A)andfxmat(B), without requiring the helper to repeat these. This would reduce communication and sim-plify the protocol; however, we show this is not possible.

Theorem 2. Any functiongwithfmat

x (AB) =g(fxmat(A),

fmat

x (B))forA, B∈Fnq×nrequires that fingerprintsfxmat are at leastΩ(n)bits in size.

Proof. We make use of a hard problem from communica-tion complexity to show the space lower bound. In the DIS-JOINTNESSproblem, two players Alice and Bob each have a bit string,a,b _{∈ {}0,1_}n_{, and they wish to see whether} for anyi _∈[n]they haveai =bi = 1. If we had a func-tionfmat

x : Fnq×n →Fq they could createn×nmatrices A and B withaandbon the diagonals and 0’s elsewhere.

Then Alice could send Bobfmat

x (A), Bob could compute

fmat

x (B), and then find fxmat(AB) using g.Observe that

AB = 0iff stringsaandb are disjoint, and is non-zero

otherwise. So by comparing fxmat(AB)to fxmat(0), we can determine the answer to the disjointness problem. The fingerprints must be at leastΩ(n)bits from the correspond-ing communication complexity ofDISJOINTNESS(H˚astad and Wigderson, 2007).

3.3 Eigenvalue Check

For the subsequent problems, we need to apply scaling and rounding, as mentioned earlier. There is a tension here, since matrix computations can include values which are very large compared to the input values. We also need to ensure that our approximation tolerance always allows an honest helper to find a satisfying answer, but prevents a dis-honest helper from getting a wildly wrong answer accepted. We first work with (approximate) eigenvectors and eigen-values of a symmetric matrix, i.e. a matrix such that ∀i, j. Aij = Aji. We wish to find pairs (λi, vi)over the reals, for alli _∈ [n], such thatAacts on each of the vec-tors by only scaling them by λi and that the vectors are orthonormal, i.e.

Avi=λivi vTi vj=δij = (

0 i₆=j

1 i=j

Mapping these eigenpairs to the finite field can be deli-cate, since they may not align with coordinates in the field. Our protocol relies on a scaling factor T, which is used

to multiply up values from the original domain. The field size qmust grow by a corresponding factor to accommo-date the large range of values. To tolerate this, we relax to allow approximate eigenvectors as defined below. We first show that rounding to the scaled field FT q is always possible. Consider a particular eigenvector vi, and write

b

vi = T vi+r, withrj ∈ −₂1,1₂, andλbi = T λi +ρ, whereρ_∈₋1

2, 1 2

(6)

Theorem 3. For symmetricA ∈ Fn×n

q if(λi, vi), for all

i _∈ [n]are the eigenpairs ofA andvbi = T vi+r, with

r_∈₋1 2,

1 2

n

,λbi=T λi+ρforρ∈−1₂,1₂, then (inR)

|T Av_bi−λbivbi| ≤

T n_kA_kF+

T√n

2 +

n

4

|vbiTvbj−T2δij| ≤ l

T√n+n 4

m

These bounds show how the error scales with the rounding factorT. To better interpret this, we next show that this

error can be made arbitrarily small by increasingT.

Theorem 4. If we have

|T Avbi−λbivbi| ≤

T nkAkF+T √_n

2 +

n

4

|vbiTvbj−T2δij| ≤ l

T√n+n 4

m

Then we have (inR)

|T λi−λbi|

T ≤

2n√nkAkF

T +

n T +

n√n

2T2 :=EA,n(T)

andEA,n(T)→0asT → ∞

This means that if we want to find eigenvalues within cer-tain error, we simply have to check that the bounds of The-orem 4 hold, using a T∗ satisfying As kAkF = Ω(1), this expression isO(n3/2kAkF

T ). Hence it suffices to pick

T =O(n3/2

kA_kF/).

Theorem 5. There is an annotated steaming

n2_log_qn3

2kAk_F/

,logqn32kAk_F/

−protocol for finding the eigenvalues of a symmetric matrix

A∈Fn×n

q to a precision of >0.

Proof. The protocol is relatively straightforward. The ver-ifier can computekAkF and a fingerprint ofA as the in-put is received. The helper provides the claimed matrix of (approximate) eigenvectorsV and corresponding

eigen-values D We then make use of the matrix multiplication

protocol to compute (T A ₋D)V. If the eigenvectors were exact, this would be 0; since they are approximate, we allow each entry in the result matrix to be at most

T n_kA_kF +1₂T√n+1₄n(computed over the reals and rounded to the finite field), and can verify that each entry of the product satisfies this bound. We also need to check that the eigenvectors are (almost) orthogonal, that is that each entry of(VT_V

−T2_I₎_{is at most}_T√_n₊1 4n

_{. Thus} we need two invocations of the matrix multiplication proto-col, evaluated over a sufficiently large finite field. SettingT

according to the above bounds gives the claimed costs.

Once again, we consider the consequences of our input be-ingA_{∈ F}_ρ,Mn×n, and choose the value ofqas follows

Corollary 2. Given A ∈ Fρ,Mn×n and desired precision

, our matrices V and D are in F2nρ×0,nMn 2, where ρ0 = max_{ρ,₋log_}. Therefore, to keep a direct

correspon-dence between verification done in the field, and our real values, we chooseq > n2₍₂_M_{+ 1)}2₂2ρ0_.

This value ofqis determined by the matrix multiplication

step, and the fact that the largest possible eigenvalue ofA

is bounded byn||A||max, where||A||maxisM.

3.4 Matrix Inversion

For matrix inversion we again scale the field by a factor

T. Over this expanded field, given ann×nmatrixA, we

want to find a matrixBsuch thatAB=T I. To allow for

rounding, we can relax the requirement of exact equality, and instead seek a matrixBthat acts approximately like an

inverse. We first show that such a matrix is guaranteed to exist.

Lemma 2. For an invertible A _∈ _Fn×n

q , we can find a matrixB_∈_Fn_qT×nsatisfying

||AB−T I||max≤T (3)

ifT _≥ n2||A||max

2 and >0

Proof. Consider the true inverse ofA,A−1, computed over

the reals. Then we can find B = T A−1 ₊_E _{so that} B _∈ _Zn×n _and

∀i, j._|Ei,j| ≤ 1₂. We can then map the entries ofBdirectly into the fieldFT q. We then have

kAB−T Ikmax≤ kAEk2≤ 1₂n2kAkmax

That is, the error is at most n2 kAkmax

2T . We can set the pa-rameterTto be as large as needed to make this error value

some smallat quite modest cost: the field values are

rep-resented usingO(logn+ logkAkmax+ log 1/)bits.

Theorem 6. There is a streaming interactive

n2_log(_n

||A_||maxq/),log(n_||A_||maxq/)-protocol to invert an invertible matrix A _∈ _Fn×n

q with the above criteria,(3).

Proof. The protocol is based on the above Lemma: we require the helper to provide such a matrix B under the extended field FT q, along with the claimed value of AB. We then run the above protocol for matrix multiplica-tion, and check that each entry of AB meets the

re-quired size bound. This ensures that the rere-quired con-dition on entries is met. The cost of storing the fin-gerprints is O(log(n_kA_kmaxq/)), and the

communica-tion costs come from sending B over FT q, which is

O(n2_log(_n_k_A_k_maxq/₎₎_.

In the case that the matrix is singular, the helper could demonstrate this by showing that there is an eigenvalue of

(7)

Ensuring that everything is representable within finite pre-cision is a little more involved, sinceA−1_{can contain very}

large values, depending on the condition numberκ(A) =

||A−1

||2· ||A||2. Since||A||2≥ ||A||max, we see that κ(A) =||A||2||A−1||2≥ ||A||max||A−1||max

So, given a bound onκ(A), we choose a field large enough

to represent κ(A)

||A||max ≥ ||A −1

||max.

3.5 Cholesky Decomposition

If we have a positive semi-definite matrixA _∈ Fn×n q , the Cholesky Decomposition ofAinvolves finding a lower

tri-angular matrixL_∈Rn×n_{, with}_A₌_LLT_{. As before, we} seek an approximate answer,Lb∈Fn×n

qT .

Theorem 7. Given a positive semi-definite matrix A _∈ Fn×n

q , and T ≥

2nkAkF

there is an annotated steam-ing(n2_log(_qn

kA_kF/),log(qnkAkF/))-protocol to find

ˆ

L_∈Fn_qT×nsatisfying

||LˆLˆT₋_T2_A_||

max≤T2

Proof. The protocol has the helper provide_Lˆ _∈ _Rn×n _in order, and the helper executes the matrix multiplication protocol. It is straightforward to check that _Lˆ _{is lower}

triangular as it is presented. For a Lb = T L+E with Ei,j ∈−1₂,1₂, we have

kLbLbT −T2Akmax≤ kT LET +T ELT +EETk2

≤2TkLk2kEk2+kEk22

≤T n_kA_kF+n

2

4

So we have an error at most nkAkF

T +

n2

4T2. Picking

T ≥ 2nkAkF/suffices for any given (sincekAkF ≥

1). Via the matrix multiplication protocol, we just check

||LˆLˆT ₋_T2_A_||

max ≤ T2. The resulting protocol has

communication costn2_log(qnkAkF

)and memory cost of

log(qnkAkF

), as claimed.

3.6 Symmetric Generalised Eigenvalues

The Cholesky Decomposition allows us to solve the sym-metric generalised eigenvalue problem forA, B _∈ Fn×n

q , withAsymmetric, andBsymmetric positive semi-definite:

FindV, D_∈_Rn×n_{such that}_AV ₌_{BV D} We do this by finding the Cholesky Decomposition of

B, L and then finding the eigenvalues of the symmetric

matrix C = L−1_A₍_L−1₎T _{to get matrices} _V0_{, D}0 _with

CV0 = V0D0. The solutions we want areD = D0, and V =L−1_V0_.

Theorem 8. There is a streaming annotated(n2_log(_qT₎_,

log(qT))-protocol withT =q2_n4_/_{for verifying}_{V ,}_ˆ _D_ˆ

∈

Fn×n

qT approximately solving the generalised eigenvalue

problem forA ∈ Fn×n

q symmetric, andB ∈ Fnq×n sym-metric positive semi-definite:

AV =BV D

withthe maximum absolute error betweenDˆ andD.

The proof of Theorem 8 can be found in the supplementary materials.

4 Statistical Analysis

With the primitives outlined above, we can construct pro-tocols for several common statistical analysis tasks.

4.1 Ordinary Least Squares

We first consider ordinary least squares regression (OLS), where we aim to find the optimal linear relationship be-tween a data setX and a set of observationsy.

Definition 4. Let _S = _h{y1, X1_}, ...,_{yn, Xn}i where

yi ∈Fq is an observation of the set ofdpredictorsXi ∈

Fd

q. If we defineX ∈Fqn×d+1with thei’th row being[1Xi], OLS involves finding the linear relationshipy = Xβ+

with∈Rn_and_β_∈_Rd+1_minimizingPn

i=12i.

The OLS problem is solved with the Moore-Penrose pseudo inverse, XT_X−1_XT_{. The optimal} _β _{is then}

XT_X−1_XT_y_{. This leads directly to a protocol where} the helper provides aβwhich is claimed to be optimal

Theorem 9. There is an annotated streaming

max{(d+ 1)2_,₍_d_{+ 1)}_n_}_log(_q₎_,_log(_q₎_-protocol

for OLS forX ∈Fn×d+1

q ,y∈Fnq as above.

Proof. The algorithm requires two applications of our ma-trix multiplication protocol to check that the β provided

satisfies (XT_X₎_β ₌ _XT_y_{. This avoids explicitly} in-verting XT_X_{. Via Theorem 1, the costs for} _XT_y _are

((d+ 1)n,1)and for(XT_X₎_β_are ₍_d_{+ 1)}2_,₁_{. Our}

to-tal communication and space costs for OLS will therefore beO max_{(d+ 1)2_,₍_d_{+ 1)}_n

}log(q)andO(log(q)) re-spectively. As XT_X_,_XT _and_y _{are in}_F

q, we can find a

β _∈_Fqand perform an exact equality check.

4.2 Principal Component Analysis

Given a large data matrix S _∈ _Fn×d

q where Sij repre-sents the jth observation of the ith variable, PCA finds

the principal components of the data, which can be used for dimensionality reduction or classification. PCA max-imises the variation captured by successive vectors inRn_, i.e.var vT

1S

≥var vT

2S

≥...≥var vT nS

, with each

λi = var viTS

maximised with respect to λj,∀j < i. Theseviare the principal components and we can perform dimensionality reduction onSby choosing thekcolumns

(8)

formingV1T...kS =S0 ∈ Fnq×k. This is equivalent to find-ing vectors correspondfind-ing to approximate eigenvalues of the covariance matrix ofS.

Definition 5. For a data set S _∈ _Fn×d_{, the}_Covariance MatrixofSis

Cov(S)ij= _n₋1₁Pdk=1

Sik−E[Si↓] Sjk−E[Sj↓]

Using the above protocols to check matrix multiplication and approximate eigenvectors (Sections 3.2 and 3.3), we can check PCA results to any desired precision >0with

the costs stated in the following theorem.

Theorem 10. Given S _∈ _Fn×d

q and > 0, there is a

(d2_log(_qnd

kS_kF/), log(qndkSkF/))-protocol for ver-ifying that PCA has been done to the desired precision.

Proof. We have a primitive for producing STS whilst

streamingS, and we can adapt this to generate the covari-ance matrix ofS(scaled byn₋1to be in the field). This method can be found in the supplementary material and is a (d2_log(_qn₎_, _log(_qn₎₎

−protocol. This algorithm al-lows us to have a(d2_log(_qnd

kS_kF/),log(qndkSkF/)) -protocol for PCA.

We use the approximate eigenvalue check (Section 3.3) with scaling factor T = O(n3/2kSk2

F/)to ensure that

V has the necessary properties, i.e. is almost orthogo-nal, and each claimed eigenvector acts on the covariance matrix as an approximate stretch. Each principal com-ponent vi corresponds to var(ˆviTS) = ˆλi and therefore |var(ˆvT

i S)−T λi| ≤ T , allowing us to reduce the di-mensionality of S, confident of how much variance we are removing with each principal component. Soundness and correctness then follow from the invoked protocols.

4.3 Fisher Linear Discriminant Analysis

In Linear Discriminant Analysis (LDA), we again have a large data matrixS _∈_Fn×d

q , withSij representing thejth observation of the ith variable, however, we also have a classification for each observation,ωmform∈[k], where we havekclasses. The aim is to do dimensionality reduc-tion, as in PCA, while maximizing the class discriminatory information between the classes in the reduced dimension. If we havekclasses, we wish to transformS to a new set

S0 _∈ _Fn×k−1

q , i.e. S0 = WTS whereW ∈ Fdq×k−1is a matrix projectingStoS0 _and_S0_{has the maximum Fisher}

linear discriminantJ(W) = _det(det(_WWTT_SWWSBW)₎where we have

Within-class scatter SW =Pk_i₌₁Cov(Si)

Between-class scatter SB=Pki=1ni(µi−µ)(µi−µ)T

We first treat the two-class case, then generalize tokclasses.

4.3.1 Two Classes

To findw ∈Fd

q we splitS into two matrices,S1 ∈Fnq1×d holding the observations in class 1 with averageµ1∈Fn1

q , andS2∈ Fn2×d

q for the observations in class 2 with aver-ageµ2∈Fn2

q . Then our two scatter matrices are:

SW = Cov(S1)+Cov(S2)andSB= (µ1−µ2)(µ1−µ2)T

To ensure that we can easily represent elements in the finite field, we actually find the (scaled-up) matrices(n1n2)SW and(n1n2)SB. We wish to maximise the functionJ(w) =

wT_SBw

wT_SWw. As such, these rescalings do not affect the result. Using the KKT conditions, we can shift the optimisation problem above to solvingSBw=λSWw, for someλ.

Theorem 11. There is a(dnlog(nq),log(nq))₋protocol for verifying LDA with 2 classes onS _∈_Fn×d

q .

Proof. We can simplifySBw=λSWwfor 2 classes, since for all vectorsv_∈_Rd_;

SBv= (µ1−µ2)(µ1−µ2)Tv

Asα = (µ1₋µ2)T_v _{is just a scalar,} _S

Bv points in the direction of(µ1₋µ2). Hence we get thatSBw=λSWw is equivalent to α(µ1 ₋µ2) = λSWw. To verify we have been sent the correctw, all we need do is check that

SWw= (µ1−µ2), and our result will lie in the (scaled-up) field, since all the scalars and vectors involved do, and so we can check this with a simple implementation of matrix multiplication protocol and covariance (Gramian) finger-printing (Sections 3.1 and 3.2). The helper just has to send

S, making the communication cost O(ndlog(qn)). The

memory space isO(log(qn)), where we havelog(qn)due

to the scaling of the scatter matrices. Completeness and Soundness follow from Theorem 1.

4.3.2 kClasses

Withkclasses, we now need to find a matrixW _∈Fd×k−1

q .

We still want to maximise the ratio between the between-class and within-between-class scatter, J(W). The simplification

to an eigenvalue problem is more complex here, but from Devijver and Kittler (1982) we see it reduces to finding (at most)k−1eigenvectors with non-zero real eigenvalues of

the matrixSW−1SB. ConsiderA, B_∈Fn×n

q , withAsymmetric, andB symmet-ric positive semi-definite (psd) the task is to find V, D _∈ Rn×n _{such that}_AV ₌ _{BV D}_where_A ₌ _S

B andB =

SW. SW is psd since it is the sum of psd matrices, so we can apply the Symmetric Generalised Eigenvalue Method.

Theorem 12. There is a (dnlog(qd/), log(qd/)) -protocol for verifying LDA withkclasses onS_∈_Fn×d

q .

Proof. We stream in and fingerprint S, which then

(9)

0 2 4 6 8

·106

0.5 1 1.5

·10−2

nk2

Time(s)

Manual Verifying

(a) Matrix Multiplication ofA _∈ _Fk×nq

andB_∈_Fn×k

q and 500 runs per test, for

nandkfrom10to200.

200 400 600 800 1,000 0

2 4

Dimension, n

Time(s)

Manual Verifying

(b) Eigen-decomposition of symmetric A _∈_Fn×n

q with= 0.01and 150 runs

per test.

200 400 600 800

0 0.5 1 1.5 2

Dimension, n

Time(s)

Manual Verifying

(c) Matrix Inversion ofA_∈_Fn×n q

where=1

[image:9.612.84.520.76.212.2]

2 and 100 runs per test.

Figure 1: Experimental Results, with time taken for self computation (blue dots), and for verification (red squares).

the scatter matrices, with this we then use the method in Section 3.6 to find _Vˆ _and _Dˆ _{with error} _{. This is a}

d2_log(_qd/₎_,_log(_qd/₎_{-protocol, note however that we}

are only interested in columns of_Vˆ _{corresponding to}

non-zero eigenvalues. We can use these get our r _≤ k₋1

vectors to form W and computeWT_S _{using the matrix} multiplication protocol to get the desired result. Our to-tal protocol will therefore be(dnlog(nqd/),log(nqd/))

from constructing our scatter matrices.

5 Practical Results

We performed a series of experiments with a field size of q = 231 ₋ ₁_{, using implementations in C of our}

protocols for matrix multiplication, inversion, and eigen-decomposition. We used a machine with an Intel i7 processor and 16 GB of memory. We worked on matrices up to 1000 by 1000, as this allowed us to get a distinguish-ing result between the verifier runndistinguish-ing the computation and the verifier using our checking protocols. We used the GNU Scientfic Library for C, using gsl blas dgemm for matrix multiplication, gsl linalg LU invert to find the inverse andgsl eigen symm for the eigen-decomposition. The verification was done on the result produced by the self-computation. We created dense random matrices with entries drawn uniformly. Note that our protocols do not depend on any structure in the matrices, so uniform random data suffices to test them. Where we require symmetric matrices, we setAij =Aji∀i < j.

Our verification protocols were built as discussed in the paper, with the time taken to find the fingerprints of all relevant matrices and check the necessary bounds being the verification time seen in Figure 1. We tested the veri-fication protocols against a variety of incorrect matrices, which were successfully rejected. The memory costs for our verification, even in the case ofA _∈ F1000×1000

q was

only a few hundred bytes, a significant reduction over self computation, which would be in the order of several megabytes. The communication cost is the cost of sending the matrix, simply the size of the matrix multiplied by a small constant. Consequently, the main aim of our experiments is to quantify the potential timing gains of verification over self-computation.

For our matrix multiplication protocol (Figure 1a), we con-sider the number of scalar multiplications required to find the result of multiplyingA_∈_Fk×n

q andB∈Fnq×k, that is,

k2_n_{. When we plot the time taken against}_k2_n_{, we see the}

expected linear relationship between self-computation and time taken. The verifier needs to compute the fingerprint of A,B, which will involvenkmultiplications, and AB, which involves k2 _{multiplications. This agrees with the}

observation in the graph; that the self-computation time is linear, and the verification time is sublinear.

In both matrix inversion (Figure 1b) and eigen-decomposition (Figure 1c), we see the expected cubic increases in running time for self-computation, with a significantly lower increase for the verification. The verification running time depends primarily on theO(n2₎

cost of computing the needed fingerprints, as above, and as such, we see the asymptotically shallower lower curve.

6 Concluding Remarks

We have introduced effective protocols for checking mod-els generated by common tasks, such as Ordinary Least Squares, Principal Components Analysis and Linear Dis-criminant Analysis. Natural next steps are to study more complex models, such as Support Vector Machines, and (initially shallow but ultimately deep) neural networks, which have high training costs but could be cheap to verify.

(10)

References

L´aszl´o Babai. Trading group theory for randomness. In

Proceedings of the seventeenth annual ACM symposium on Theory of computing, pages 421–429. ACM, 1985. Amit Chakrabarti, Graham Cormode, and Andrew

Mcgre-gor. Annotations in data streams. Automata, Languages and Programming, pages 222–234, 2009.

Graham Cormode, Justin Thaler, and Ke Yi. Verifying computations with streaming interactive proofs. Pro-ceedings of the VLDB Endowment, 5(1):25–36, 2011. Graham Cormode, Michael Mitzenmacher, and Justin

Thaler. Practical verified computation with streaming interactive proofs. InProceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 90– 112. ACM, 2012.

Graham Cormode, Michael Mitzenmacher, and Justin Thaler. Streaming graph computations with a helpful ad-visor.Algorithmica, 65(2):409–442, 2013.

Samira Daruki, Justin Thaler, and Suresh Venkatasubra-manian. Streaming verification in data analysis. In In-ternational Symposium on Algorithms and Computation, pages 715–726. Springer, 2015.

Pierre A Devijver and Josef Kittler. Pattern recognition: A statistical approach. Prentice hall, 1982.

R¯usin¸ˇs Freivalds. Fast probabilistic algorithms. Mathe-matical Foundations of Computer Science 1979, pages 57–69, 1979.

Shafi Goldwasser and Michael Sipser. Private coins versus public coins in interactive proof systems. InProceedings of the eighteenth annual ACM symposium on Theory of computing, pages 59–68. ACM, 1986.

Shafi Goldwasser, Yael Tauman Kalai, and Guy N Roth-blum. Delegating computation: interactive proofs for muggles. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 113–122. ACM, 2008.

Johan H˚astad and Avi Wigderson. The randomized com-munication complexity of set disjointness. Theory of Computing, 3(1):211–219, 2007.

(11)

This will cover the proofs required for checking eigenvalues of a symmetric ma-trix, checking the symmetric generalised eigenvalue problem and the protocol for fingerprinting the covariance matrix.

1.1 Details of Theorem 3

We use the fact thatλi ≤ kAk2, andkvik2 = 1. We also have krk2≤

√

n

2 and

|ρ| ≤ 1

2, and so;

kT Av_bi−λbivbik∞≤√nkT Avbi−λbivbik2

=√n_kT2_Av

i+T Ar−T2λivi−T ρvi−T λir−ρrk2

≤√n(T_kA_k2krk2+T|ρ|kvik2+T|λi|krk2+|ρ|krk2)

≤T n_kA_kF +T √_n

2 +

n

4

kvbiTvbj−T2δijk∞≤√nkvbiTvbj−T2δijk2

≤√n_k(T vi+r)T(T vj+r)−T2δijk2

≤√nkT viTr+T rTvj+rTrk2

≤2T√n_kr_k2+√nkrk22

≤T n+n

√_n

4

1.2 Details of Theorem 4

First definev_ei =vi_Tb,λei=λi_Tb, so we have;

kT Avbi−λbivbik∞

T2 =kAvei−λeiveik∞≥

kAvei−λeiveik2

√_n

As A is symmetric, we can writeA=V DVT_{, where}_V _{is the orthogonal matrix}

of eigenvectors, andDis the diagonal matrix of corresponding eigenvalues. Then

(12)

kAvei−λeiveik2=kV DVTvei−λeiveik2

=kV(D−λei)VTveik2

=_k(D₋λei)VTveik2

≥min

j (|λj−λei|)kV T

e

vik2

= min

j (|λj−λei|)kveik2

≥min

j (|λj−λei|) r

1₋

√_n

T − n

4T2

= min

j (|λj−λei|) r

1₋1 2 −

1 16 if

√

n_≤ T

2

≥minj(|λ₂j−λei|)

So if we consider >0, and wish to ensure that minj(|λj−λei|)< , i.e. there

is a (true) eigenvalue close to the approximate eigenvalue, then we can choose aT based on

min

j (|λj−λei|)≤2kAvei−λeiveik2 ≤2√nkAvei−λeiveik∞

≤ 2 √_n

kT Av_bi−λbivbik∞

T2

≤ 2n √_n

kA_kF

T +

n T +

n√n

2T2 (using Theorem 3)

AsT tends to infinity, this bound positively approaches 0, as such, for any >0 we can find aT s.t. the error in_Rof minj(|λj−λei|) will be.

1.3 Details of Theorem 8

The Cholesky Decomposition allows us to solve the symmetric generalised eigen-value problem forA, B ∈Fn×n

q , with A symmetric, andB symmetric positive

semi-definite;

FindV, D∈Rn×n _{such that}_AV ₌_{BV D}

We do this by finding the Cholesky Decomposition ofB,L and then perform-ing findperform-ing the eigenvalues of the symmetric matrix C =L−1_A₍_L−1₎T _{to get}

matricesV0_{, D}0 _with _CV0 ₌_V0_D0_. _D ₌_D0_{, and}_V ₌_L−1_V0 _{are the solutions} we desire.

With our approximations, we use our matrix inversion and Cholesky Decom-position protocols to find, using scaling factor T1, we have that ˆC will be in

(13)

ˆ

C=( ˆ\L)−1_A_{( ˆ}\_L₎−1

So we have

ˆ

LLˆT =T12B+E1 E1∈

−nk₂B_T1kF,nkBkF

2

n×n

ˆ

L( ˆ\L)−1₌_T1I₊_E2 _E2_∈

−nkBˆkF−n

2

4 , nkBˆkF+

n2

4

n×n

If we receive approximate eigenpairs, ˆU ,Dˆ with diagonal ˆλ, of ˆCfrom the helper, with scaling factorT giving errorδ_{, satisfying}

kTCˆUb₋DbUb_kmax≤

T n_kC_kF+

T√n

2 +

n

4

kUbTUb−T2Ikmax≤

l

T√n+n 4

m

LetDδ_{, U}δ

∈Rn×n _{be the true eigenvalues and eigenvectors of ˆ}_C_{. So}

ˆ

CUδ =UδDδ

We know that_kT Dδ

−Dˆ_kmaxwill be at mostT δ. Furthermore let ˆLTVδ =Uδ,

so

ˆ

CUδ=UδDδ \

( ˆL)−1_A_{( ˆ}\_L₎−1

T

ˆ

LT_Vδ_{= ˆ}_LT_Vδ_Dδ

ˆ

L( ˆ\L)−1_A_{( ˆ}\_L₎−1

T

ˆ

LTVδ= ˆLLˆTVδDδ

(T1I+E2)A(T1I+E2)TVδ= (T12B+E1)VδDδ

A+E2A+AE

T

2 +E2ET2 T2

1

T

Vδ=

B+E2

T2 1

VδDδ

By using the eigenvalue perturbation theory [?], we can say that there exists

V _∈Rn×n_,_D

∈Rn×n _{with diagonal}_λ_{, so}

λi=λδi +vδi

TE2A+AE₂T +E2ET₂

T2

1 −

λδi

E1 T2 1

viδ

(14)

|λi−λδi|=|viδ

T E2A+AE₂ +E2E₂

T2 1 − λδ i E1 T2 1 vδ i|

≤vδi T ₂

_E2A₊_AET

2 +E2ET2 T2

1 −

λδi

E1 T2 1 2 vδi

2 ≤

E2A+AE T

2 +E2E2T −λδiE1

T2 1 ₂

≤ kE2Ak2+ AET

2

₂+E2ET

2

₂+λδ

iE1 ₂

T2 1

≤ 2kE2k2kAk2+kE2k

2

2+kCˆk2kE1k2 T2

1

≤

2nn_kB_k2+n

2

4

kA_k2+n2

n_kB_k2+n

2

4

2

+_kCˆ_k2nnkBk2

2 T2 1 ≤

2n2

kB_k2+n

3

2

kA_k2+

n4

kB_k2

F+k Bk2n5

2 +

n6

16

+kCˆk2n2kBk2

2 T2 1 ≤

2n2

kBˆ_k2+n

3

2

kA_k2+

n4

kBˆ_k2

2+kBk2n

5

2 +

n6

16

+kCˆk2n2kBk2

2

T2 1

≤ 32n

2

kB_k2kAk2+ 8n3kAk2+ 16n4kBk22+ 8kBkFn5+n6+ 8kCˆk2n2kBk2 T2

1

As,A, B∈Fq, we havekAk2,kBk2≤qn,kCˆk2≤qnT1

|λi−λδi| ≤

32n2₍_nq₎2_{+ 8}_n3_nq_{+ 16}_n4₍_nq₎2_{+ 8}_nqn5₊_n6_{+ 8}_qnT1n2_nq T2

1

≤ 32n

4_q2_{+ 8}_n4_q_{+ 16}_n6_q2_{+ 8}_n6_q₊_n6_{+ 8}_qnT1n3_q T2

1

≤ 8n

4₍₄_q2₊_q_{) +}_n6₍₁₆_q2_{+ 8}_q_{+ 1) + 8}_qnT1n3_q T2

1

≤ q

3_n4 T1

If we have thatq_≥20,n_≥3 andT1_≥n2_{. We also have}

|T λδi −λˆi| ≤T δ

So

≤T

q3_n4 T1 +

δ

(15)

|T λi−λˆi|

T ≤

q3_n4 T1 +

δ

Therefore, to get the generalised eigenvalues to an error of we must choose

T, T1 such that

T= q

2_n5 2

δ ≥

qn32kCˆk

δ

T1= q

3_n4 ₋δ

Whereδ _<_.

If we takeδ ₌

2, then we have

T =2q

2_n5 2

T1=2q

3_n4

And our total protocol is therefore, assumingq > n, n2_log _q3_n4_/_,_log _q3_n4_/_.

1.4 Fingerprinting the Covariance Matrix

Algorithm 1:Streaming AnnotatedCovarianceFingerprint

Input :S∈Fd×n

q

Output:fx(A) =fx (n−1)Cov(S)or⊥

1 Choosex∈RF

2 Whilst StreamingS column by column;

3 forS_j↓ with j= 0to n−1 do

4 Construct the sum of each of thesef_xvn(S_j↓),f_xv(S_j↓),f_xvn(S_j↓)f_xv(S↓_j), Pd−1

i=0 Sijxn and

Pd−1

i=0 Sijxni individually

5 fx(A) =Pjfxvn(S_j↓)f_xv(S_j↓)− hP

j Pd−1

i=0 Sijxni hPjfxvn(S↓_j) i

− hP

j Pd−1

i=0 Sijxni

i hP

jfxv(Sj↓) i

+nhP_jPd_i₌₀−1Sijxni hP_jPd_i₌₀−1Sijxni i

6 ReceiveAbfrom the helper

7 Check

8 fx(A) ==fx(Ab)

This algorithm provides a d2_log(_qn₎_,_log(_qn₎

−protocol for verification thatA

is indeed the covariance matrix scaled byn. The costs come from scaling byn

and receivingA_∈_Fd×d qn .