Multi-file proofs of retrievability for cloud storage auditing

(1)

Multi-file proofs of retrievability for cloud storage auditing

Bin Wanga* and Xiaojing Hongb

a

No.196 West HuaYang Road, Information Engineering College of Yangzhou

University, Yangzhou City, Jiangsu Province, 225127 P.R.China

b

No.5 South YangZiJiang Road, Yangzhou City, Jiangsu Province, 225101 P.R. China

E-mail: [email protected] a*

Abstract: Cloud storage allows clients to store a large amount of data with the help of storage service providers (SSPs). Proof-of-retrievability(POR) protocols allow one server to prove to a verifier the availability of data stored by some client. Shacham et al. presented POR protocols based on homomorphic authenticators and proved security of their schemes under a stronger security model, which requires the existence of an extractor to retrieve the original file by receiving the program of a successful prover. When using their POR protocol with public verifiability to verify the availability of multiple files separately, the number of pairing operations computed by a verifier is linear with the number of files. To improve the heavy burden on the verifier, we introduce a notion called multi-proof-of-retrievability(MPOR), allowing one verifier to verify the availability of multiple files stored by a server in one pass. We also design a MPOR protocol with public verifiability by extending the work of Shacham et al. The advantage of our MPOR scheme is that computational overhead of a verifier in our scheme is constant, independent of the number of files. Nevertheless, the soundness of our MPOR protocol is proved under a relatively weak security notion. In particular, analysis of our MPOR protocol shows that each file can be extracted in expected polynomial time under certain restriction on the size of processed files.

Keywords: Cloud storage; Storage service provider; Proof-of-retrievability; Homomorphic authenticator; Public verifiability

1. Introduction

Cloud computing [13] has a deep impact on IT industry by allowing clients to access high-quality cloud based service in pay-as-you-go manner. Cloud storage systems which involve delivery of data storage as a service over Internet are becoming more attractive. Amazon S3 [1], a well-known storage service, allows clients to store large amounts of data

(2)

and access their stored data from any place at low costs. One of the major challenges on cloud storage systems is how to efficiently verify whether some remote storage service provider(SSP) is faithfully storing clients’ data since the SSP may not be trusted. Clients’ data may be at the danger of loss either by data loss incidents or by malicious deletion from a SSP who wants to save storage by deleting rarely accessed portion of the stored data.

There has been a lot of research focused on proof-of-storage mechanisms without the need to download the whole data for verification. This kind of solutions is also called storage auditing [9,15,17], allowing clients to verify through cryptographic means that their data on an untrusted SSP are kept intact and ready for retrieval if needed. Some crucial system criteria for designing proof-of-storage systems can be summarized as follows [16]:

(1) Computation and communication overhead of proof-of-storage protocols and storage overhead on a server should be as efficient as possible.

(2) Verifiers should be stateless without the need to maintain states for storage auditing. (3) The most important security criterion for proof-of-storage systems requires that a verifier should be convinced that the file is actually stored by a server when the server can pass the check of storage auditing. Early research works on proof-of-storage systems lacks provable security guarantee without providing formal security models and proofs.

To formalize the security criterion for proof-of-storage systems, formal models and constructions were first studied by Naor et al. [14] and Juels et al. [12] by using homomorphic authenticators based on message authentication code. A file is first encoded redundantly and divided into message blocks. Each block is authenticated by a MAC tag computed by the client. The client then erases the file and sends the encoded file and the MAC tags to the server. The idea behind their protocols is that a verifier only need check a random subset of the message blocks stored by the server to check whether the original file is correctly stored by the server in order to guarantee an efficient storage audit protocol. The length of the server’s response is improved with the help of homomorphic authenticators by aggregating the authentication tags of different message blocks.

Security for proof-of-storage systems is captured by requiring the existence of an extractor to retrieve the specified file by interacting with a server that can pass the verification of storage auditing. This security notion is also called “proof-of-retrievability” (POR). This

(3)

concept is similar to zero knowledge proofs of knowledge [5]. A weaker notion called “Proof-of-data-possession”(PDP) [3] can only guarantee that a certain percentage (e.g., 99%) of message blocks can be recovered with high probability (e.g., 0.99). There are also some papers [4,7,18] considering proof-of-storage systems supporting dynamic operations on the stored data. In addition, POR protocols that support public verifiability means everyone can check whether a client’s data are correctly stored by a server. Digital signatures can be used to replace Mac tags to achieve public verifiability.

The NR model in [14] restricts a checker to access specific memory locations from the prover. Shacham et al. [16] strengthened the NR model by allowing an extractor black-box access to a prover program. Another distinct feature formalized by Shacham et al. in their security model [16] is that the specified file can be extracted so long as the prover program can correctly answer any small (but non-negligible) fraction of verification queries. In addition, Shacham et al. [16] designed a homomorphic authenticator based on BLS short signature [6] to present a POR protocol with public verifiability secure in the random oracle model. Homomorphic authenticators can be used to improve the response length from the server by compressing authentication values { }

σ

_i of blocks { }m_i into one authenticator

σ

for their linear combination

{

∑

v m

i i

}

.

We notice that the POR protocol with public verifiability [16] requires the verifier to compute two pairing operations to verify the response from a server during one instance of their POR protocol. This means if one wants to verify the availability of

n

files stored by the server separately, 2n pairing operations should be computed by the verifier, which is a heavy overhead when a client has a large number of files stored by that server.

Motivated by the above discussion, we aim to design a multi-proof-of-retrievability (MPOR) protocol with public verifiability. In other words, our MPOR protocol allows one verifier to verify the availability of

n

files stored by the server in one pass, while the computational overhead is independent of the parameter

n

. The idea behind our construction is to further compress the homomorphic authenticators from different files to reduce the computation and communication overhead.

The key point of our work is to choose a proper security model for MPOR protocol and prove the security of our construction under it. Soundness of MPOR protocols is defined in a

(4)

relatively weak manner compared with that of POR protocols. In the multi-file setting, one invocation of the extractor algorithm can only guarantee that at least one file among those

n

files can be recovered. The reason is that the extracted knowledge is now distributed over these

n

files randomly. Hence we cannot guarantee that all the files can be extracted by the extractor all at once in the multi-file setting. On the other hand, our analysis shows that each file can be extracted in expected polynomial time under the assumption that each file’s size is of the same magnitude (e.g., digital images, office documents). So we define soundness for MPOR protocols by requiring the existence of an extractor to retrieve each file in expected polynomial time.

The rest of this paper is organized as follows. At first, we describe the notations used in this paper and bilinear mappings in section 2. Syntax of MPOR schemes is introduced in section 3. We also define correctness, soundness for MPOR protocols. The soundness for MPOR protocols is relatively weak in the sense that it only requires each file to be extracted in expected polynomial time under the assumption that each file’s size is of the same magnitude. In the following, we design a MPOR protocol with public verifiability by extending the work in [16] and prove our MPOR scheme meet the requirement of soundness defined in this paper. Finally, we evaluate performance of our MPOR protocol to show that the cost of a verifier is greatly reduced compared with that of the original POR protocol [16] when verifying multiple files stored by a server.

2. Preliminaries 2. 1 Notation

We use the notation x←_R S to mean “the element

x

is chosen with uniform probability from the set S”. If A is a algorithm, then O( )( ,₁ )

y←A ⋅ x means that A

has input x₁, , access to a oracle O, and the output of A is assigned to y. Let [1.. ] {1,n = , }n .

2.2 Bilinear pairing

Given a security parameter k , an efficient algorithm PG(1 )k outputs ( , ,e G G g p_T, , ) , where G is a cyclic group of a prime order p generated by

(5)

g,2k−1< p<2k. G_T is a cyclic group of the same order, and let e G G: × →G_T be an efficiently computable bilinear function with the following properties:

1. Bilinear: e g( a,gb)=e g g( , ) ,ab for all a b, ∈Z_p. 2. Non-degenerate: ( , ) 1

T

G

e g g ≠

3. Definitions

3.1 Syntax of a multi-proof-of-retrievability scheme

A multi-proof-of-retrievability(MPOR) scheme consists of the following algorithms: (1 )k

Kg : Given a security parameter k, this randomized key-generation algorithm outputs a secret/public key pair (sk pk, ).

( , )

St sk M : Given a secret key sk and a file M chosen by a client, this file-storing algorithm first encodes M by applying a rate-

ρ

erasure code [2] to obtain

M

such that any

ρ

fraction of the encoded file

M

is sufficient to recover the original file M . Finally it outputs a authentication tag

τ

and some auxiliary information aug ,

(

M

, ( ,

τ

aug

))

←

St M sk

(

,

)

, where sk is the secret key of the client.

M

and aug will be stored on the server side.

τ

and aug will be used in the following MPOR protocol.

The rest part of our MPOR scheme consist of the algorithms P (prover) and

MV (verifier). The following notation is used to denote one interactive execution of our MPOR protocol between the algorithms Pand MV :

1 1 1

( ,{ } )_{i i}n ( ,{ , _n},{ _{i i}} )n ( , )

MV pk

τ

₌ P pk M M aug ₌ b

〈 〉→ ⊥

where pk is the public key of the client who stores the encoded files {M₁,M_n} on the server side. Parameter

n

determines the number of files that can be verified simultaneously. {( , )} 1

n i augi i

τ

₌ are generated by the file-storing algorithm St( )⋅ for

1

{M ,M_n} respectively. The output of the verification algorithm MV is bit b. b=1 denotes that the verifier is convinced that the original files {M_{i i}}n₌₁ are stored correctly by the prover.

(6)

verifier of validity of the stored files. That is:

for any (sk pk, )←Kg(1 )k , {(M_i, ( ,

τ

_i aug_i))←St M sk( _i, )}_in₌₁, we have

1 1 1

( ,{ } )i in ( ,{ , n},{ i i} )n ( 1, )

MV pk

τ

₌ P pk M M aug ₌ b

〈 〉→ = ⊥

3.2 Soundness of multi-proof-of-retrievability protocols

Soundness of MPOR protocols informally requires that if a prover can convince a verifier by passing the verification, then the original files { }n₁

i i

M ₌ can ready for retrieval if needed. To formalize this point, an extraction algorithm is required to recover the original files {M_{i i}}n₌₁ by interacting with the successful prover via the MPOR protocol. The following game Exp_MPORsound n, ( ,1 )A k will give the formal definition of soundness of MPOR protocols. , ( ,1 ) sound n k MPOR Exp A

Phase 1: The challenger C generates a keypair (pk sk, )←Kg(1 )k and provides

pk to A.

Phase 2: The adversary A interacts with the challenger by making queries to a store oracle. The store query is handled by the oracle as follows.

Store(M_i) // M_i is a file chosen by A. Compute (M_i, ( ,

τ

_i aug_i))←St sk M( , _i); Return (M_i, ( ,

τ

_i aug_i));

For some files M1,Mn queried to the store oracle who responds {( , )} 1 n i augi i

τ

₌

respectively, A makes a query to a MPOR oracle, which is handled as follows. MPOR( ,

τ

₁

τ

_n)

The challenger C runs an instance of the MPOR protocol with A as follows: <C pk( ,{ ,

τ

₁

τ

_n}) A>→( , )b ⊥

The challenger plays the role of a verifier during the above execution and outputs a bit

b to denote whether it is convinced that the original files are being stored by the adversary correctly.

(7)

At the end of phase 2, A outputs the challenge set T ={ ,

τ

₁* ,

τ

_n*} consisting of the tags returned by the store oracle for some queries M₁*,,M_n* respectively. Description of a prover program P* is also provided by A.

The prover program P* is

ε

-admissible if it can convincingly answer

ε

fraction of MPOR queries, i.e., Pr[<MV((pk,{ ,

τ

₁

τ

_n})P* >→(b= ⊥ ≥1, )]

ε

. Here the probability is over the coins of the verifier and the prover.

Phase 3: We say a MPOR scheme is

ε

-sound if there exists an efficient extraction algorithm Extr such that for every adversary A who plays the above game and outputs the challenge tags { ,

τ

₁* ,

τ

n*} returned by the store oracle for the queries

* *

1, , n

M M

respectively and a

ε

-admissible prover program P* at the end of phase 2, then ,1

i i n

∀ ≤ ≤ , there exists an extraction algorithm Extr who recovers the file M_i* in expected polynomial time by interacting with P* via the MPOR protocol. That is:

*_{( )} _* _* _* 1

,1 , P ( ,{ , , _n}) _i

i i n Extr ⋅ pk

τ

M

∀ ≤ ≤ → occurs in expected polynomial time

except with negligible probability.

4. Our multi-proof-of-retrievability scheme

Let ∑=(SKg, SSig, SVer) be a digital signature scheme. (1 )k

Kg : Generate a signing keypair ( , ) SKg(1 )k

ssk spk ← by the key generation

algorithm of ∑. Choose

α

←_R Z_p and compute v=gα. The secret key is sk=( ,

α

ssk). The public key is pk=( ,v spk).

( , _l)

St sk M : Given a file M_l chosen by a client, this file-storing algorithm first chooses a random file-name fn_l from a large domain. The collision probability over file names is negligible when the domain is large enough. In the following, apply a erasure code to M_l to obtain M_l and split M_l into n_l message blocks { }₁

l

li i n

m _{≤ ≤} . Parse the secret key of the client as sk←( ,

α

ssk) and pick a random u_l ←Z_p. Let

τ

₀_l= fn_l ||n_l||u_l and

(8)

compute a signature

σ

₀_l=SSig(ssk,

τ

₀_l) by the signature-producing algorithm of ∑ and an authentication tag

τ

_l=

τ

₀_l ||

σ

₀_l. For 1≤ ≤i n_l, it computes aug_li=( ( || ) mli)

l l

H fn i u⋅ α. The encoded file M_l and { }nl₁

li i

aug ₌ will be stored on the server side.

One interactive execution of our MPOR protocol between the algorithms P(prover) and

MV (verifier) can be described as follows:

(1) First, the verifier MV makes a MPOR-query Q to the prover P :

1 1

{ , , _n _n}

Q= pk∪ P∪Q P ∪Q , P_l =(

λ

_l ←_R Z_p, fn_l) , Q_l ={( ,j v_j)} with distinct [1.. ]_l

j∈ n and each coefficient

v

j

←

R

Z

*p, where pk is the public key of the client, fnl is the file-name for a file M_l and n_l is the number of message blocks of the file M_l. The size of each set Q_l is a fixed system parameter

s

.

(2) Having received the query Q , the prover algorithm

1 1 ,1 ( ,{ } ,{ } ) l n l l li l n i n

P Q M ₌ aug _{≤ ≤} _{≤ ≤} responds as follows: First it parses

{

}

n₁

l l

M

₌ as blocks { }₁ _,1

l

li l n i n

m _{≤ ≤} _{≤ ≤} respectively. In the following, compute:

µ

_l= ( , _j) _l j lj j v Q v m ⊂

∑

,

σ

_l= ( , ) j j l v lj j v Q aug ⊂

∏

, 1≤ ≤l n 1

(

,

_l

,

_n

)

µ

=

µ

, 1 l n l l λ

σ

= =

_∏

The prover P sends the response ( , )

µ σ

to the verifier.

(3) Having received the response ( , )

µ σ

, the verification algorithm

1

(

,{ ,

_n

}, ( , ))

MV pk

τ

µ σ

proceeds as follows:

First it parses pk=( ,v spk) and the authentication tags

τ

_l=

τ

₀_l ||

σ

₀_l, 1≤ ≤l n. If ∃l:1≤ ≤l n, the signature verification algorithm SVer(spk,

τ σ

₀_l, ₀_l)=0, then it outputs b=0. Otherwise, parse

τ

₀_l as fn_l ||n_l||u_l ,1≤ ≤l n and the vector

µ

as

1

(9)

( , ) e

σ

g = 1 ( , ) 1 ( ( ( || ) )j l ( l l), ) j l n n v l l l j v Q l e H fn j λ uµ λ⋅ v = ⊂ = ⋅

∏ ∏

∏

It is easy to verify the correctness of our MPOR scheme as follows: 1 l n l l λ

σ

= =

_∏

= 1 ( , ) ( j) l j l n v lj l j v Q aug λ = ⊂

∏ ∏

= 1 ( , ) ( ( ( || ) lj) )j l j l n m v l l l j v Q H fn j u λ α⋅ = ⊂ ⋅

∏ ∏

= 1 ( , ) ( ( || ) )j l j l n v l l j v Q H fn j λ α⋅ = ⊂

∏ ∏

1 ( , ) ( j lj) l j l n v m l l j v Q u ⋅ λ α⋅ = ⊂

∏ ∏

( , ) 1

(

)

j lj j vj Ql l v m n l l

u

⊂ λ α ⋅ ⋅ =

∑

∏

∏ ∏

1 ( l l) n l l uµ λ⋅ α =

∏

The above result means e( , )

σ

g = 1 ( , ) 1 ( ( ( || ) )j l ( l l) , ) j l n n v l l l j v Q l e H fn j λ α⋅ uµ λ⋅ α g = ⊂ =

∏ ∏

∏

= 1 ( , ) 1 ( ( ( || ) )j l ( l l), ) j l n n v l l l j v Q l e H fn j λ uµ λ⋅ v = ⊂ = ⋅

∏ ∏

∏

Remark: The verifier chooses a MPOR query as:

1 1

{ , , _n _n}

Q= pk∪ P∪Q P ∪Q , P_l =(

λ

_l ←_R Z_p, fn_l) , Q_l ={( ,j v_j)} with distinct indices j∈[1.. ]n_l and each coefficient

v

_j

←

_R

Z

*_p. Let

1 i i n

N n

≤ ≤

=

∑

. The size of each set l

Q is a fixed system parameter

s

such that ns<N.

A vector notation for the

s

-element set Q_l ={( ,j v_j)} over indices I_l ={ }j ⊆[1.. ]n_l

is represented by a N-element vector q_l∈(Z_p)N, such that q_{l j}_, =v_j if j∈I_l+N_l₋₁

and q_{l j}_, =0 for all j∉I_l +N_l₋₁,1≤ ≤j N , where ₁

1 1 l i i l N₋ n ≤ ≤ − =

∑

, 1 l l I +N₋ ={j+N_l₋₁: j∈I_l}.

Given a

ns

-element set ₁ 1

(

_l _l

)

[1.. ]

l n

I

N

₋

N

≤ ≤

=

∪

+

⊆

, the query Q can be regarded as

(10)

a N- element vector 1 l l n q q ≤ ≤ =

∑

. Let

c

₁

,

c

_N be that canonical basis for

(

Z

_p

)

N ,

1 ( , ) 1 l j l j j N j v Q l n

q

v c

− + ∈ ≤ ≤

=

∑

⋅

.

Recall that the honest response ( , )

µ σ

for the query Q satisfies:

µ

_l= ( , j) l j lj j v Q v m ⊂

∑

σ

l= ( , ) j j l v lj j v Q aug ⊂

∏

, 1≤ ≤l n 1

(

,

l

,

n

)

µ

=

µ

, 1 l n l l λ

σ

= =

∏

We can view the message blocks of these

n

files as a N n× matrix H: 1

[ ,

,

_n

]

H

=

h

, 1 1 1 1 (0 , ( , , ), 0 ) i i i l l i n l n n T l l ln h ₌ ≤ ≤ −∑ m m + ≤ ≤∑

, where each column vector

h

_l corresponds to the message blocks of M_l respectively. We can also equivalently denote

µ

in the response by a vector q H⋅ , where q is the joint coefficient vector notation for the query Q.

5. Security proofs

We prove the security of our MPOR scheme by a series of games. Game 0 is exactly the same as Exp_MPORsound n, ( ,1 )A k with the following modification.

The challenger initially sets a flag d=0 and keeps a table of the store queries made by the adversary and its responses for these queries. Based on the table and one MPOR query Q, the challenger is able to determine the deterministic verification response ( , )

µ σ

returned by the honest prover algorithm.

Having received the response ( , )

µ σ

returned by the adversary behaving as a prover in one execution of our MPOR protocol <C pk( ,{ ,

τ

₁

τ

_n}) A>→( , )b ⊥ , the challenger sets d =1 if for all executions, the verification algorithm

MV pk

(

,{ ,

τ

₁

τ

_n

}, , )

µ σ

outputs b=1. Otherwise the challenger sets d =0. Let

ε

_i denote Pr[d =1] in Game , 0

i i≥ .

(11)

Game 1 is almost the same as Game 0. The challenger keeps a list of authentication tags generated by itself when handling queries to the store oracle. If any tag submitted by the adversary can be verified as valid but is not on the list of tags generated by the challenger, the challenger aborts. The major modifications are represented by the following boxed statements.

Phase 1: The challenger C picks an empty list sList and generates a keypair ( , ) (1 )k

pk sk ←Kg , where sk=( ,

α

ssk), pk=( ,v spk). C provides pk to A. The adversary A interacts with the challenger by making queries. The store and MPOR queries are handled as follows.

Store(M_l) // M_l is a file chosen by A. Compute (, ,{ } )nl₁ ( , )

l l li i l

M

τ

aug ₌ ←St sk M ;

Parse

τ

_l as

τ

₀_l ||

σ

₀_l and sList←sList∪{

τ

₀_l} Return (, ,{ } )nl₁

l l li i

M

τ

aug ₌ ;

MPOR( ,

τ

₁

τ

_n)

If ∃ ∈

τ

_l { ,

τ

₁

τ

_n} such that

τ

_l=

τ

₀_l||

σ

₀_l can be verified as a valid signature and

0l sList

τ

∉ , the challenger aborts. Otherwise C runs an instance of MPOR protocol with

A: <C pk( ,{ ,

τ

₁

τ

_n}) A>→( , )b ⊥ . Return b;

Finally, A outputs the challenge set T ={ ,

τ

₁* ,

τ

*_n} consisting of tags returned by the store oracle for some queries M1*,,Mn* respectively. The description of a prover program P* is also provided by A.

If ∃

τ

_l*∈{ ,

τ

₁* ,

τ

_n*} such that

τ

_l*=

τ

*₀_l||

σ

₀*_l can be verified as a valid signature and *

0l sList

τ

∉ , the challenger aborts. The other part of Game 1 is kept unchanged.

Claim 1: |

ε

₀−

ε

₁| is negligible under the assumption that the signature scheme ∑ is

existential unforgeable [11] under the chosen message attack.

Proof: It is obvious that Game 0 and Game 1 proceed identically unless the event E₁ “any tag submitted by the adversary can be verified as a valid signature but is not on the list of

(12)

authentication tags generated by the challenger” occurs. Hence |

ε

₀−

ε

₁| Pr[≤ E₁]. But if this event happens with non-negligible probability, we can construct an adversary B against the unforgeability of the signature scheme ∑.

B takes a signing public key SKg(1 )k

spk← as input and is given access to a signing oracle SSig_ssk( )⋅ . B picks a random

α

and sets the secret key sk=( , )

α

⊥ , the public key pk=(v=gα,spk). It is clear that B is able to simulates the view of A in Game 1 perfectly with the help the signing oracle.

If the event E₁ happens, B simply outputs the tag submitted by A as its forgery against the signature scheme ∑.

Game 2 is almost the same as Game 1. The challenger in Game 2 verifies the response from the adversary during each execution of our MPOR protocol in a way different from the standard verification algorithm MV( )⋅ .

Recall that the challenger is able to determine the deterministic verification response ( , )

µ σ

returned by the honest prover algorithm P( )⋅ by the modification in Game 1, given the corresponding MPOR query. Let ( , )

µ σ

be the response returned by the adversary in one execution of our MPOR protocol : <MV pk( ,{ ,

τ

₁

τ

_n}) A>→( , )b ⊥ . The challenger sets d=1 and aborts if there is at least one execution

MV pk

(

,{ ,

τ

₁

τ

_n

}, , )

µ σ

outputs b=1 and

σ

≠

σ

.

Claim 2: |

ε

₂−

ε

₁| is negligible under the CDH assumption.

Proof: Game 2 is distinct from Game 1 only when the response from the adversary in one execution of our MPOR protocol can pass the verification but is not equal to the correct response from the honest prover algorithm.

Let E₂ denote the event “The adversary in one execution of our MPOR protocol can pass the verification but

σ

≠

σ

”. Game 2 is distinct from Game 1 only when E₂ happens. Hence |

ε

₂−

ε

₁| Pr[≤ E₂]. If Pr[E₂] is non-negligible, we can construct a simulator that solves the CDH problem in the random oracle model.

(13)

simulates the environment of Game 2 for the adversary A as follows. S generates a signing key pair by running ( , ) SKg(1 )k

ssk spk ← and sets v←gα ,

which implicitly defines the secret key sk=( ,

α

ssk) and the public key pk=( ,v spk). S

provides pk to A.

To respond each query issued to the random oracle H( )⋅ , S first parses it as fn_l||i

and programs the response of H( )⋅ as we will describe later.

To respond each query M_l issued to the store oracle, S first chooses a random file-name fn_l, encodes M_l to obtain M_l and splits it into n_l blocks { }₁

l

li i n

m _{≤ ≤} . S

sets l l

, ,

l l l R p

u

←

g h

β γ

γ β

←

Z

. S picks r_li ←_R Z_p,1≤ ≤i n_l and programs the response of H fn( _l|| )i as ( || ) rli ( lmli lmli)

l

H fn i =g gβ⋅ hγ⋅ . At this point, S computes

,1 , li l aug ≤ ≤i n as follows: li aug = ( ( || ) mli) l l H fn i u⋅ α=((_grli (_gβl⋅mli _hγl⋅mli)) (_⋅ _{g h}βl γl) )mli α =(_grli)α₌₍_gα₎rli

S computes

τ

_l according to the specification of the file-storing algorithm St( )⋅ via the signing key and returns (, ,{ } )nl₁

l l li i

M

τ

aug ₌ to A. Sinteracts with A until the event

2 E happens. Assume that Q= pk∪{P₁∪Q₁,,P_n∪Q_n}, P_l =( ,

λ

_l fn_l) , {( , )} i l i l Q = l v is the MPOR query issued by the verifier in one execution of our MPOR protocol for the encoded files M₁,M_n. Let the response returned by the adversary as a prover to this query be

_,

µ σ

. Let ( , )

µ σ

be the deterministic response returned by the honest prover algorithm for the query Q, which satisfies the following:

1

(

,

_l

,

_n

)

µ

=

µ

, 1 l n l l λ

σ

= =

_∏

(14)

µ

_l= ( , _j) _l j lj j v Q v m ⊂

∑

σ

l= ( , ) j j l v lj j v Q aug ⊂

∏

, 1≤ ≤l n ( , ) e

σ

∏ ∏

∏

When E₂ happens,

µ σ

,

can also pass the verification and the following hold:

1

(

,

_l

,

_n

)

µ

=

µ

( , )

e

σ

g

=

1 ( , ) 1 ( ( ( || ) )j l ( l l), ) j l n n v l l l j v Q l e H fn j λ uµ λ⋅ v = ⊂ = ⋅

∏ ∏

∏

Let ∆

σ

=

σ σ

− , ∆

µ

_l =

µ

_l−

µ

_l,1≤ ≤l n . If ∀ ≤ ≤1 l n,

µ

_l =

µ

_l, it follows that

σ

=

σ

according to the verification equation. Consequently, ∆

µ

_l ≠0 holds for at least one

position l by the assumption

σ

≠

σ

. We derive the following by division:

(

, )

e

σ σ

g

=

1 (( ( l l), ) n l l e u µ λ⋅ v =

∏

= 1 (( (( l l) l l), ) n l e g hβ γ µ λ⋅ v =

∏

Rearranging terms yields

₍₍

₎

1

_{, )}

l l l l n

e

v

g

β λ µ

σ σ

≤ ≤ −

∑

⋅ ⋅

⋅

=

_{( , )}

1 l l l l n

e h v

γ λ µ ≤ ≤

∑

.

As v=gα,h=gβ, we see that the solution gαβ to the CDH problem can be written

as

1 1 1

((

)

l l l l l l l n l n

v

β λ µ γ λ µ

σ σ

≤ ≤ ≤ ≤ −

∑

⋅ ⋅

∑

⋅

unless 1 l l l l n

γ λ µ

≤ ≤

∑

is equal to zero. For any fixed sequence { }n₁

l l

µ

₌ that is not all zero, the probability that

1 0 l l l l n

γ λ µ

≤ ≤ =

∑

is

1 p

since each

γ

_l chosen by the simulator is uniformly distributed over Z_p and hidden from the adversary’s view since l l

l

u =g hβ γ reveals no information of

l

γ

. Hence the success probability of S solving the CDH problem is at least

2

Pr[E ] 1− p.

The challenger in Game 3 verifies the response from the adversary during each execution of our MPOR protocol in a way different from Game 2. Given a MPOR query Q, let ( , )

µ σ

be the deterministic response returned by the honest prover algorithm for the query

(15)

Q and ( , )

µ σ

be the response returned by the adversary in one execution of our MPOR protocol. The challenger parses

µ

as (

µ

₁,,

µ

_l,,

µ

_n) and aborts if ∃l,

µ

_l ≠

µ

_l, where ( , ) ,1 j l l j lj j v Q v m l n

µ

⊂ =

∑

≤ ≤ .

Claim 3: |

ε

₃−

ε

₂ | is negligible under the discrete logarithm assumption.

Proof: Game 3 is distinct from Game 2 only when the response from the adversary in one of the MPOR protocol executions may cause the challenger to abort as specified. Let E₃

denote the event “The adversary behaving as a prover in one execution of our MPOR protocol can pass the verification as specified but

( , ) , ,1 j l l j lj j v Q l

µ

v m l n ⊂ ∃ ≠

∑

≤ ≤ ”.

Game 3 is distinct from Game 2 only when E₃ happens. Hence |

ε

₃−

ε

₂ | Pr[≤ E₃]. If

3

Pr[E ] is non-negligible, we can construct a simulator that solves the discrete logarithm problem.

The simulator S takes an instance ( ,g h=gβ) of the discrete logarithm problem as input and simulates the environment of Game 3 for the adversary A as follows.

S generates a signing keypair (ssk spk, )←SKg(1 )k and picks

α

←_R Z_p. Let

v←gα, which defines the secret key sk=( ,

α

ssk) and the public key pk=( ,v spk).

Sprovides pk to A.

To respond each query M_l issued to the store oracle, S first chooses a random file-name fn_l from a large domain, encodes M_l to obtain M_l and splits it into n_l

blocks { }₁ l li i n m _{≤ ≤} . S sets l l

, ,

l l l R p

u

←

g h

β γ

γ β

←

Z

.

S interacts with the adversary until the event E₃ happens.

Assume Q= pk∪{P₁∪Q₁,,Pn∪Qn},Pl =( ,

λ

l fnl),Ql ={( ,l vi l_i)} is the MPOR query issued by the verifier in one execution of our MPOR protocol for the files M₁,M_n. The response returned by the adversary to this query is ( , )

µ σ

. Let ( , )

µ σ

be the deterministic response returned by the honest prover algorithm for the query Q.

(16)

According to the proof in Game 2, we know that

σ

=

σ

except with negligible probability negl( )

λ

. Under this assumption, we derive the following according to the verification equation: ( , ) e

σ

∏ ∏

∏

( , )

e

σ

g

=

1 ( , ) 1 ( ( ( || ) )j l ( l l), ) j l n n v l l l j v Q l e H fn j λ uµ λ⋅ v = ⊂ = ⋅

∏ ∏

∏

We conclude 1 1 ( l l) ( l l) n n l l l l uµ λ⋅ uµ λ⋅ = = =

∏

by taking

σ

=

σ

, which means

1 1 1 ( l l) (( l l) l l) n n l l l u µ λ⋅ g hβ γ µ λ⋅ = = =

∏

=

∏

Let ∆

µ

_l =

µ

_l−

µ

_l,1≤ ≤l n . ∆

µ

_l ≠0 holds for at least one position l by the assumption ( , ) , ,1 j l l j lj j v Q l

µ

v m l n ⊂ ∃ ≠

∑

≤ ≤ . If 1 0 mod l l l l n p

γ λ µ

≤ ≤ ≠

∑

, the discrete logarithm 1

1 ( ) mod l l l l n l l l l n p

β λ µ

β

γ λ µ

≤ ≤ ≤ ≤ = −

∑

because we have 1 1 ( ) l l l l n l l l l n h g β λ µ γ λ µ ≤ ≤ ≤ ≤ − ∑ ∑ = .

Similarly, for any fixed sequence {

µ

_{l l}}n₌₁ that is not all zero we can argue that the probability that 1 l l l l n

γ λ µ

≤ ≤

∑

is equal to zero is

1 p

. Hence the success probability of S

solving the discrete logarithm problem is at least Pr[E₃] 1− p−negl( ).

λ

Response of the adversary in Game 3 is forced to be the same as that output by the honest prover algorithm of our MPOR protocol. A well-behaved prover program P* causes the verification algorithm MV( )⋅ to accept in each execution of MPOR protocol by responding with ( , )

µ σ

computed by the honest prover algorithm. Claims 1-3 show that any adversary that wins in the game ExpMPORsound n, ( ,1 )A k is well-behaved except with negligible probability. In the following, we show that extraction will succeed by interacting

(17)

with a well-behaved prover program.

Definition 1: Given a MPOR query Q as input, a polite prover program

P

outputs either the correct response computed by the honest prover algorithm or a special symbol ⊥. If

P

outputs the correct response with probability at least

ε

, then we call

P

a

ε

-

polite prover program.

A

ε

-

well-behaved prover program P* can be transformed into

ε

-

polite prover program

* ( ) P

P ⋅ with black box access to P*.

Having received a MPOR query Q from a verifier,

P

plays the role of verifier to interact with P* by forwarding the query Q to P*. Having received the response ( , )

µ σ

from P* ,

P

outputs ( , )

µ σ

if and only if the verification algorithm

1

(

,{ ,

_n

}, ( , ))

MV pk

τ

µ σ

outputs 1; otherwise

P

outputs ⊥.

P

provides P* with fresh randomness and rewinds it for each interaction. As P* is

ε

-

well-behaved prover program,

P

is

ε

-

polite. Note that the tags { ,

τ

₁

τ

_n} that are responses to the store queries can help

P

to verify the correctness of ( , )

µ σ

returned by P*.

Let 1 i i n N n ≤ ≤

=

∑

. For a subspace Ψ of

(

Z

_p

)

N , denote the dimension of Ψ by dimΨ. Let FreeΨ be the indices of the canonical basis vectors { } (c_i ∈ Z_p)N included in

Ψ. In other words, FreeΨ=

{

i

∈

[1.. ] :

N

c

_i

∈ Ψ

}

.

Lemma 1 [ 16, Claim 4.6]: For a subspace Ψ of

(

Z

p

)

N, and let I be an

ns

-

element subset of [1.. ]N . If I ⊄FreeΨ, then a random MPOR query Q over indices in I with its coefficient vector q∈ Ψ occurs with probability at most

1 (

p

−

1)

.

Lemma 2 [16, Claim 4.7]: Let #(Free )Ψ =m. For a random

ns

-

element subset I of [1.. ]N , the probability that I ⊆FreeΨ is at most ns ( 1)ns

m N−ns+ .

(18)

1 (

)

1 (

1)

ns ns

N

p

N

ns

ρ

ω

=

+

−

+

. If

ε

>

ω

, for each file

l

M , 1≤ ≤l n, the expected time to

recover at least

ρ

fraction of it is O((1

ε

N2)

ρ

nN t n2 2)

ε ω

+ ⋅ +

− , where t is the number of

MPOR queries issued to the prover program by the extractor during one round of interaction. Proof: The t MPOR query-response pairs during one round of interaction with the prover program

P

contribute the following to the extractor’s knowledge of the encoded files

* * 1, , n M M : (1) (1), , ( )t ( )t q ⋅H =

µ

q ⋅H =

µ

H is the matrix constructed from the encoded files M₁*,,M*_n , which is described at the end of Section 4. We rewrite the above as V H⋅ =W where V is the t N× matrix whose row vectors are the t coefficient vectors {q( )i }t_i₌₁

of the MPOR queries and W is the t n× matrix whose row vectors are t corresponding MPOR responses { ( )i}t ₁

i

µ

₌

. The matrix V can be reduced to a matrix G=U V⋅ in the row-reduced echelon form, where U is a t t× matrix with nonzero determinant computed by applying Gaussian elimination to V.

The extractor’s knowledge during the above interaction can be represented by the matrixs G U V W, , , , where the matrix G in the row-reduced echelon form. The subspace generated by the matrix G is denoted by Ψ. The extractor’s knowledge space is initially empty. The extractor repeats the following behavior until |FreeΨ ≥|

ρ

N.

The extractor chooses a random MPOR query Q over the indices I ⊆[1.. ]N with coefficient vector q by its random coins and runs the

ε

-

polite

P

on Q.

P

answers the correct response

µ

=q H⋅ with probability

ε

. We consider the following three types:

1. q∉ Ψ:

For queries of this type, the extractor extends its knowledge as follows:

It adds the row vector q to the current matrix V , obtaining V/ and adds the response vector

µ

to the existing matrix W, obtaining W/. It also computes G/ =U V/⋅ / in the row-reduced echelon form. G U V W/, /, /, / represent the update of the extractor’s

(19)

knowledge.

2. q∈ Ψ but I ⊄FreeΨ: 3. I⊆FreeΨ:

For queries of type 2 or 3, the extractor does not update its knowledge and continue its current interaction with the prover program

P

.

A query of type 1 increases dimΨ by 1. The extractor’s interaction with

P

is guaranteed to terminate when the number of queries of type 1 is above

ρ

N .

By Lemma 1, queries of type 2 make up at most

1 (

p

−

1)

since Pr[ is type 2] Pr[ Free ]

Q Q = Q q∈ Ψ ∧ ⊄I Ψ

Pr[ | Free ] Pr[ Free ] Pr[ | Free ] 1 ( 1)

Q q I Q I Q q I p

= ∈ Ψ ⊄ Ψ ⋅ ⊄ Ψ ≤ ∈ Ψ ⊄ Ψ ≤ −

Assume #(Free )Ψ =m . For a random

ns

-

element subset I of [1.. ]N , the probability that I ⊆FreeΨ is at most ns ( 1)ns

m N−ns+ by Lemma 2. By the convention set for the extractor, m≤

ρ

N, this quantity is at most ( )ns ( 1)ns

N N ns

ρ

− + .

Therefore the fraction of queries of type 2 or 3 is at most

1 (

)

1 (

1)

ns ns

N

p

N

ns

ρ

ω

=

+

−

+

.

As

P

is a

ε

-

polite prover program,

P

is able to answer at least

ε

fraction of the queries. Therefore, a random MPOR query chosen by the extractor will be of type 1 with probability at least

ε ω

−

. To yield

ρ

N queries of type 1, the extractor will carry out

( N )

O

ρ

ε ω

− interactions with

P

in this round.

As the matrix G in the row-reduced echelon form, it is possible to determine whether a query Q is of type 1, to which

P

has responded. The extractor adds the coefficient vector

q of Q to the current matrix V and applies Gaussian elimination to V to yield G, which takes O N( 2) time [8]. If the newly added row is not all zeros, then q is of type 1. As Gaussian elimination need only be applied to at most

ε

fraction of the queries responded correctly by

P

, the running time of the extractor in this phase is O((1

ε

N2)

ρ

N )

ε ω

+ ⋅

− .

(20)

of the matrixes G U V W, , , such that V H⋅ =W , G=U V⋅ . G is in the row-reduced echelon form. The free dimension of the subspace Ψ spanned by G is

ρ

N by the conventions set for the extractor. For each i∈FreeΨ, there must be a row in G, say row t, that equals some canonical basis vector c_i∈(Z_p)N since G is in the row-reduced echelon form. Multiplying both sides of V H⋅ =W by U, we obtain G H⋅ =U W⋅ .

We define a set 1 1 1 { : Free , } l i i i l i l G i i n i n ≤ ≤ − ≤ ≤

= ∈ Ψ

∑

< ≤

∑

for each index l, 1≤ ≤l n. As the number of indices i∈FreeΨ is

ρ

N , there exists at least one index l such that

|G_l| is at least

ρ

n_l by the pigeonhole principle. Recall that

H

=

[ ,

h

₁

,

h

_n

]

,

1 1 1 1 (0 , ( , , ), 0 ) i i i l l i n l n n T l l ln h ≤ ≤ − m m + ≤ ≤ ∑ ∑ =

. The inner products between these canonical vectors

,

i l

c i

∈

G

and the corresponding column vector

h

_l will recover at least

ρ

fraction of the encoded file M_l*, which can extracted from the product U W⋅ . The computation will takes

2 ( )

O t n time [8].

In the following, we analyze the probability p_l of an event F_l “ |G_l|≥

ρ

n_l for the given index l,1≤ ≤l n”.

At first, we will analyze the probability p_l/ of an event “ |G_l|=

ρ

n_l for the given index l,1≤ ≤l n”.

As mentioned above, we assume that each file’s size is of the same magnitude. For ease of analysis, we further assume b=n₁==n_l. For instance, these files of the same magnitude can be redundantly padded before encoding.

The probability p_l/

(

)

(

)

l l l l

n

N

n

N

b

nb b

nb

n

N

n

N

b

nb b

nb

ρ

−





 



 



=

_

_

_{ }

=

_

_{ }

_

−

_

_{ }

_

−

_{ }

_







The probability p_l= b j nb b nb j nb j nb j nb ρ ≤ ≤ρ

ρ

−        ₋        

∑

, only depends on the values of

, ,n b

ρ

and is independent of the choice of l.

As Pr[F₁∨,,∨F_n] 1= , ₁ 1 1 Pr[ , ] n n l l l F F F n p =

(21)

know that Pr[ ]F_l 1 n

≥ . Hence the extractor can recover at least

ρ

fraction of the encoded file M_l* with probability at least

1 n

during this phase. This means at least

ρ

fraction of the file M_l* can be recovered by running the extractor algorithm

n

rounds on average, who rewinds the

ε

-

polite prover

P

with fresh coins before each interaction.

In conclusion, for each file M_l*, 1≤ ≤l n, the expected time to recover at least

ρ

fraction of it is O((1

ε

N2)

ρ

nN t n2 2)

ε ω

+ ⋅ + − . As * l

M is redundantly encoded by erasure codes [2] such that any

ρ

fraction of the M_l* is sufficient to recover the original file M_l*. The original file M_l* is guaranteed to be recovered in this case.

6. Performance analysis

We evaluate the performance of the proposed MPOR protocol and that of the POR protocol [16]in terms of the required communication and computational cost to verify

n

files stored on the server side. The result is stated in Table 1, 2. Pair denotes one

pairing operation. MExp Gn( ) denotes one general multi-exponentiation 1 1 n e e n g g

over a group G. The size of each set Q_l is assumed to be a fixed

s

in both schemes. For ease of comparison, we assume that there is only one sector per each encoded message block in the POR protocol [16].

The advantage of our MPOR protocols lies in the fact that the number of pairing

operations computed by the verifier is independent of the parameter

n

. In addition,

length of the response from a prover is further reduced by aggregating the responses

from the prover.

Nevertheless, the soundness of our MPOR protocol only satisfies a relatively weak security notion under the assumption that each file’s size is of the same magnitude. One invocation of the extractor algorithm can only guarantee that at least one file among those

n

files can be recovered. Our analysis shows that each file can be extracted in expected polynomial time under our assumption on the size of processed files.

(22)

7. Conclusion

Proof-of-retrievability protocols can help a client to be assured of the availability of files stored by a server. H. Shacham et al. [16] strengthened the security model for POR protocols by allowing an extractor black-box access to a prover program. In addition, they presented a POR protocol with public verifiability based on a homomorphic authenticator derived from BLS short signature. When using their POR protocol with public verifiability to verify the availability of multiple files separately, the number of pairing operations computed by a verifier is linear with the number of files. To handle this issue, we extend the work in [16] by introducing a new notion called multi-proof-of-retrievability. Our MPOR protocol with public verifiability allows one verifier to verify the availability of

n

files stored by a server in one pass, while the computational overhead of a verifier in our MPOR scheme is constant, independent of the parameter

n

. Analysis of our MPOR protocol shows that each file can be extracted in expected polynomial time under certain restriction on the size of processed files.

Acknowledgement

This work is supported by Natural Science Foundation of Higher Education Institutions, in Jiangsu Province office of education, P.R. China (Grant No. 10KJD520005)

References

[1] Amazon simple storage service (Amazon S3), http://aws.amazon.com/s3/

[2] A.Alon and M.Luby, A linear time erasure-resilient code with near optimal recovery, IEEE.Tran.Inf.Theory,42(6) (1996) 1732-1736.

[3] G. Ateniese, R.Burns, R.Curtmola, J.Herring, O.Khan, L.Kissner, Z.Peterson and D. Song, Remote data checking using provable data possession, ACM Trans.Inf.Syst.Security, 14(1), (2011) Article No.12,

[4] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, Scalable and efficient provable data possession, Cryptology ePrint Archive, Report 2008/114, 2008. http://eprint.iacr.org/2008/114

[5] M. Bellare and O. Goldreich, On defining proofs of knowledge, CRYPTO'92, LNCS 740 (1993) 390-420.

(23)

[6] D. Boneh, B.Lynn and H. Shacham, Short signatures from weil pairing, Journal of Cryptology, 17(4) (2004) 297-319.

[7] D. Cash, A. Kupcu, and D. Winchs, Dynamic Proofs of Retrievability via Oblivious RAM, Cryptology ePrint Archive, http://eprint.iacr.org/2012/550

[8] H. Cohen, A course in computational algebraic number theory, Berlin, Springer, 1993 [9] Y. Deswarte, J.-J.Quisquarter and A. Saidaneand H. Shacham, Remote integrity checking, IICIS 2003, IFIP Vol.140 (2004) 1-11.

[10] C. C. Erway, A. Kuppcu, C. Papamanthou, and R. Tamassia. Dynamic provable data possession. ACM CCS 09, (2009) 213-222.

[11] S. Goldwasser, S.Micali and R. Rivest, A digital signature scheme secure against adaptive chosen-message attacks, SIAM Journal of Computing, 17(2) (1988) 281-308. [12] A. Juels and B.Kaliski, PORs: proofs of retrievability for large files, CCS 2007, ACM, (2007) 584-597.

[13] P. Mell and T. Grance, NIST SD 800-145, The NIST definition of cloud computing, NIST special publication, 2011

[14] M. Naor and G.Rothblum, The complexity of online memory checking, J.ACM, 56(1), (2009) Article No.2

[15] T. Schwarz and E. Miller, Store, forget and check: Using algebraic signatures to check remotely administered storage, ICDCS 2006, IEEE, (2004) 1-11.

[16] H. Shacham and B.Waters, Compact proofs of retrievability, Journal of Cryptology, published online

[17] M.Shah, M.Baker, J.Mogul and R.Swaminathan, Auditing to keep online storage service honest, Proceedings of HotOS 2007, ACM, (2007) Article No.11.

[18] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, Enabling public verifiability and data dynamics for storage security in cloud computing, ESORICS 2009, Springer, LNCS 5789, (2009) 355-370.

(24)

Table 1. Computational overhead of a verifier Cost for verifying

n

files Scheme Pairing Exponentiation POR protocol [16] 2n ₁ 2 s ( ) n MExp⋅ − G Our MPOR protocol 2 1 ( ) s n MExp⋅ − G +MExpn−1( )G

Table 2. Length of the response from a prover

Scheme Total bit length of the

n

responses from the prover POR protocol

[16] n⋅(|Zp |+|G|)

Our MPOR