Proofs of Data Possession and Retrievability Based on MRD Codes*

Shuai Han^1, Shengli Liu^1, Kefei Chen^2, Dawu Gu^1

^1 Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

^2 School of Science, Hangzhou Normal University, Hangzhou 310036, China

{dalen17,slliu,dwgu}@sjtu.edu.cn, kfchen@hznu.edu.cn

Abstract

A Proofs of Data Possession (PoDP) scheme is essential to data outsourcing. It provides an efficient audit that convinces a client that his/her file is available at the storage server and ready for retrieval when needed. A strengthened version of PoDP is Proofs of Retrievability (PoR), which proves that the client’s file can be recovered through interactions with the storage server. We propose a PoDP/PoR scheme based on Maximum Rank Distance (MRD) codes. The client file is encoded block-wise with an MRD code, and the codewords are used to generate homomorphic tags. In an audit, the storage provider is able to aggregate the blocks and tags into one block and one tag, thanks to the homomorphic property of the tags. The algebraic structure of MRD codewords allows the aggregation to be performed over a binary field, which reduces the computation of the storage provider to XOR operations. We also prove two security notions, unforgeability for PoDP and soundness for PoR, using properties of MRD codes. Moreover, thanks again to the MRD code, the storage provider can audit itself to locate and correct errors in the stored data, which improves the reliability of the system.

Keywords: Data integrity, dependable storage, error localization, cloud computing.

1 Introduction

Data outsourcing to cloud storage reduces the storage cost and makes data maintenance easier at both the personal and the business level. When clients move their data to a service provider for storage, they enjoy the convenience of outsourced storage for a relatively low fee. However, they may also worry about the security of their data, since the data is out of their hands and can be manipulated by an untrustworthy storage provider. The storage provider may maliciously delete some rarely accessed data to save space, or lose some data due to system failure. Hence, one of the main issues bothering clients is whether their data is still available for retrieval when needed. The fact that even Amazon’s S3, one of the best data outsourcing service providers, suffered significant downtime in 2008 makes this issue more critical.

A naive way to solve this issue is that a client retrieves all his/her data to check its authenticity and integrity from time to time. This costs lots of communication bandwidth, and deviates from the original appealing features of cloud storage.

A better approach is that a client performs an audit of the storage provider. A successful audit verifies that the provider stores the data, without the retrieval and transfer of all the data from the provider to the client. The ability of a storage system to generate proofs of possession of a client’s data, without having to retrieve the whole file, is called Proofs of Data Possession (PoDP). PoDP only concerns the provider’s proof of data possession. An improved version of PoDP is known as Proof of Retrievability (PoR), which enables the provider to convince the client that the original data could be recovered through enough interactions between the client and the provider.

* Corresponding author: Shengli Liu. Funded by NSFC Nos. 61170229, 61133014, 61373153, Innovation Project (No. 12ZZ021) of

More precisely, a PoDP/PoR scheme is implemented through an audit in which a client interacts with the storage provider. In each interaction, the client issues a query, and the provider feeds back a response. The client verifies the consistency of the response. After several interactions, the client decides whether the audit is successful or not according to some audit strategy. A successful audit indicates the client’s confidence of the data possession of the provider. Proof of retrievability means that if a storage provider succeeds in an audit, then there exists an extractor, which can extract all the data from interactions with the provider.

In the above introduction, we assume that clients implement the audit, and this is called PoDP/PoR with private verification. Audit can also be performed by any trustworthy third party, and in this case it is called PoDP/PoR with public verification. Public verification often involves complicated computation like modular exponentiations or bilinear pairings, which makes the corresponding PoDP/PoR inefficient. In this paper, we only study PoDP/PoR with private verification.

To evaluate a PoDP/PoR scheme, both “system” and “crypto” criteria should be considered. As suggested in [SW13], the “system” criteria include the computational complexity of the storage provider and clients, the communication complexity between the provider and clients, and the statelessness and unbounded use of the interactions in the audit. The computational complexity and communication complexity determine the efficiency of the PoDP/PoR scheme. Unbounded use means that the number of possible interactions should not be bounded by some fixed threshold. Statelessness requires that there be no state to maintain and update between interactions in an audit. The “crypto” criteria require that a successful audit guarantees, with overwhelming probability, that the provider is actually storing the file (for PoDP), and that there exists an extractor that recovers the data from interactions between the client and the provider (for PoR).

In a PoDP/PoR system, the storage provider is in charge of responding to all the clients for auditing. It is very important for the storage provider to implement light-weight computation, otherwise the provider will be the bottleneck of the system. However, almost all the available PoDP/PoR schemes associate the storage provider with heavy computation, like modular exponentiations or modular multiplications.

1.1 Related Works

Deswarte et al. [DQS03] (2003) and Filho et al. [FB06] (2006) designed schemes to prove data availability based on RSA. The schemes require a huge number of modular exponentiations. Similarly, Schwarz et al. [SM06] (2006) proposed to employ algebraic signatures for integrity checks of storage. The high computational complexity makes the aforementioned schemes impractical.

Juels and Kaliski [JJ07] (2007) were the first to formulate the concept of Proof of Retrievability and to define the corresponding security model. Dodis, Vadhan and Wichs [DVW09] (2009) presented Proof of Retrievability Codes (PoR codes) for PoR schemes. They also classified PoR schemes as “bounded vs. unbounded” according to the number of possible queries in an audit, and as “knowledge soundness vs. information soundness” according to whether the extractor is probabilistic polynomial-time (PPT) or not. Naor and Rothblum [NR09] (2005) designed a “sublinear authenticator”, which can be considered an “information-theoretic unbounded-use PoR scheme” according to Dodis, Vadhan and Wichs [DVW09].

More constructions of PoR schemes were given by Ateniese et al. [ABC+07] (2007), Shacham and Waters [SW13] (2008), or Schwarz and Miller [SM06] (2006). When authentication is achieved with a digital signature scheme, one gets a PoR scheme with public verification.


homomorphic property, multiple file blocks and their authenticators can be combined into a single aggregated block/authenticator pair, and the verification is simplified to an integrity check of the aggregated block. Therefore, both the computational complexity and the communication complexity of an audit are greatly reduced, due to the homomorphic property of authenticators.

Wang et al. [WWR+12] (2012) proposed to integrate PoDP with dependable storage. They used Reed-Solomon (RS) erasure-correcting codes to encode the original file of the client before storage. The PoDP audit locates erasures, which are later corrected with RS decoding. Their scheme is bounded, and only the client is able to perform the erasure correction. Moreover, erasure correction needs interactions between the client and each individual server to locate erasures, recover the whole encoded file, and distribute the file among all the distributed servers. This imposes a great burden on the client’s local computation, storage, and management, and again deviates from the original goal (freedom from data management) of cloud storage.

Next, we briefly review two seminal schemes. The first one was proposed by Ateniese et al. [ABC+07] (2007) and is referred to as the ABCHKPS scheme. The second one was proposed by Shacham and Waters [SW13] (2008) and is referred to as the SW scheme.

1.2 The ABCHKPS Scheme

The ABCHKPS scheme proposed by Ateniese et al. [ABC+07] (2007) is based on the RSA assumption in the Random Oracle security model. It is roughly recalled as follows.

Homomorphic Tag Generation. Each file $M$ is divided into blocks of 1024 bits, i.e., $M = (M_1, M_2, \cdots, M_w)$. For each block $M_i$, an RSA-like signature is computed as a tag $T_i$.

Interactions in Audit. In an audit, the client interacts with the storage provider in the form of challenge/response. When a challenge binary vector $(v_1, v_2, \cdots, v_w)$ of weight $c$ is issued, the provider computes an aggregated tag, whose computation is dominated by $g^{\sum_{i=1}^{w} v_i M_i} \bmod N$. Here $N$ is a 1024-bit modulus, and $g$ is a random element from the quadratic residue subgroup of $\mathbb{Z}_N^*$. Verifying the consistency of the provider’s response is just an RSA signature verification.

The scheme comes with a security analysis of a malicious storage provider. If the provider deletes $t$ blocks of file $M$, then the scheme fails to detect the provider’s misbehavior with probability $\zeta = \binom{w-t}{c} / \binom{w}{c}$. Recall that $c$ is the weight of the challenge vector, i.e., the number of sampled blocks, and $w$ is the total number of blocks in file $M$. If $c = 460$ and $t = 0.01w$, i.e., a 1% fraction of $M$ is deleted, $\zeta$ can be as small as 1% according to [ABC+07].

The probability $\zeta$ suggests a necessary condition on the auditing strategy: the audit claims success only if the storage provider correctly answers the challenge vector with probability larger than $\zeta$.

Now we give a simple analysis of the scheme.

• To answer a challenge in the audit, the computation of the storage provider is dominated by $g^{\sum_{i=1}^{w} v_i M_i} \bmod N$, which needs about $c$ modular exponentiations with a 1024-bit modulus. The reason is that the exponent $\sum_{i=1}^{w} v_i M_i$ is about $c$ bits longer than a 1024-bit block. Without the secret factorization of $N$, the provider cannot reduce the exponent $\sum_{i=1}^{w} v_i M_i$ modulo $\varphi(N)$ before computing the modular exponentiation.

• If the storage provider deletes only a few blocks, the stored file is no longer the original one, but it is very hard to detect the provider’s misbehavior. Let $t = 1$, $c = 460$, and let $M$ be a file of 1GB. Then $w = 2^{23}$, but $\zeta = (w - c)/w \approx 100\%$.
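The two numerical claims above can be checked with a short script. This is a minimal sketch: the function name `miss_probability` is ours, and the binomial ratio is evaluated as a product of $c$ factors so that no huge integers are needed.

```python
def miss_probability(w: int, t: int, c: int) -> float:
    """Probability that c blocks sampled without replacement avoid all t deleted blocks.

    Equals C(w - t, c) / C(w, c), evaluated as a product of c factors.
    """
    if c > w - t:
        return 0.0  # not enough intact blocks to fill the sample
    prob = 1.0
    for i in range(c):
        prob *= (w - t - i) / (w - i)
    return prob


w = 2**23   # 1GB file split into 1024-bit blocks
c = 460     # number of sampled blocks per challenge

zeta_one_percent = miss_probability(w, w // 100, c)   # about 1% of the blocks deleted
zeta_single_block = miss_probability(w, 1, c)         # a single block deleted

print(f"1% deleted:      zeta = {zeta_one_percent:.4f}")
print(f"1 block deleted: zeta = {zeta_single_block:.6f}")
```

For a single deleted block the product telescopes to $(w - c)/w$, which is why sampling 460 blocks almost never catches the deletion.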


1.3 The SW Scheme

The scheme presented by Shacham and Waters [SW13] is called the SW scheme. A new security notion, namely $\epsilon$-soundness, was proposed in the SW scheme. If the service provider correctly answers a client’s challenges with probability at least $\epsilon$, it is possible for the client to recover the file from enough interactions with the provider. $\epsilon$-soundness suggests a sufficient condition on the auditing strategy: as long as the storage provider correctly answers the challenge vector with probability at least $\epsilon$, the audit can claim success.

The idea of the SW scheme is as follows.

Erasure Encoding. The original file $M'$ is encoded into $M$ with an erasure code, such that the retrieval of a $\rho$ fraction of $M$ suffices to recover the original $M'$.

Homomorphic Tag Generation. The encoded file $M$ is divided into $w$ blocks of the same length (the last block may be padded to achieve the length): $M = (M_1, M_2, \cdots, M_w)$. A tag $\sigma_i$ is generated from each block $M_i$ with the help of an authentication key $K$. The final file $M^*$ consists of the blocks and their tags.

Interactions in Audit. In an audit, the client interacts with the storage provider in the form of challenge/response. Let $(v_1, v_2, \cdots, v_w)$ be a challenge vector, where $v_i \in B \subseteq \mathbb{F}_p$. The provider can generate an aggregated pair $(\mu, \sigma)$ of message and tag, where $\mu = \sum_{i=1}^{w} v_i M_i$ and $\sigma = \sum_{i=1}^{w} v_i \sigma_i$. Given the prover’s response $(\mu', \sigma')$, the client is able to use the authentication key $K$ to verify whether $(\mu', \sigma')$ is consistent or not. A consistent pair $(\mu', \sigma')$, i.e., a pair that passes the verification, must guarantee $\mu' = \sum_{i=1}^{w} v_i M_i$ with overwhelming probability.

To prove that the original file $M'$ can be retrieved from a storage provider who feeds back consistent responses upon queries with some probability $\epsilon$, an extractor is constructed that interacts with the storage provider to recover $M'$. If the storage provider’s reply $(\mu, \sigma)$ passes the verification, we say that $(\mu, \sigma)$ is consistent with the challenge vector $(v_1, v_2, \cdots, v_w)$. Firstly, it is necessary to prove that a consistent reply $(\mu, \sigma)$ is in fact a linear combination of the blocks, i.e., $\mu = \sum_{i=1}^{w} v_i M_i$, with overwhelming probability. Secondly, if $\epsilon$ is big enough, the consistent responses bring more and more information about $M'$ in the form of linear combinations $\mu = \sum_{i=1}^{w} v_i M_i$. When enough consistent replies are collected, a $\rho$ fraction of the encoded file $M$ can be recovered by solving a system of equations. Finally, with the decoding algorithm of the erasure code, a $\rho$ fraction of the encoded file $M$ uniquely determines the original file $M'$.
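The "solving a system of equations" step can be illustrated with a small sketch. It makes several simplifying assumptions that are not part of the SW scheme: challenges are binary vectors and blocks are combined by XOR (arithmetic over $\mathbb{F}_2$ instead of $\mathbb{F}_p$), the erasure code is omitted so all $w$ blocks are recovered, and a rejected reply is modelled by the oracle returning `None`. The names `extract` and `honest_oracle` are hypothetical.

```python
import secrets


def extract(w, oracle, max_queries=10_000):
    """Recover w blocks from XOR-aggregated replies by Gaussian elimination over F_2.

    A challenge is a w-bit integer v; bit j of v selects block j.
    oracle(v) returns the aggregated block, or None for a rejected reply.
    """
    pivots = {}  # highest set bit of a reduced challenge -> (challenge, aggregated block)
    for _ in range(max_queries):
        if len(pivots) == w:
            break
        v = secrets.randbits(w)
        mu = oracle(v)
        if mu is None:
            continue  # inconsistent reply, discard it
        # Reduce the new equation against the ones collected so far.
        while v:
            p = v.bit_length() - 1
            if p not in pivots:
                pivots[p] = (v, mu)
                break
            pv, pmu = pivots[p]
            v ^= pv
            mu ^= pmu
    if len(pivots) < w:
        return None  # not enough independent equations
    # Back-substitution, lowest pivot first: lower-indexed blocks are already known.
    blocks = [0] * w
    for p in sorted(pivots):
        v, mu = pivots[p]
        for j in range(p):
            if (v >> j) & 1:
                mu ^= blocks[j]
        blocks[p] = mu
    return blocks


blocks = [secrets.randbits(32) for _ in range(16)]


def honest_oracle(v):
    mu = 0
    for j, block in enumerate(blocks):
        if (v >> j) & 1:
            mu ^= block
    return mu


recovered = extract(len(blocks), honest_oracle)
print(recovered == blocks)
```

A prover that answers only some of the challenges merely slows the extractor down, since discarded replies cost queries but never corrupt the collected equations.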

Although the SW scheme enjoys short responses due to the homomorphic verifiable tags, there are some other problems with the SW scheme.

• The Erasure Encoding imposes a heavy computational burden to the clients, and makes the SW scheme unfriendly to dynamic update of the clients’ files.

The erasure code in [SW13,DVW09] directly serves the security proof: it is much easier for the extractor to recover a $\rho$ fraction of a file than the whole file. However, this imposes a great computational burden on the client, who must encode the original file $M'$ into $M$. If $M'$ has $w'$ blocks, the erasure encoding extends the $w'$ blocks to $w$ blocks, so $\rho = w'/w$. The expansion rate is defined as $\theta = w/w'$. For a security level of 80 bits, all operations should be over a finite field $\mathbb{F}_p$ with $p$ of at least 80 bits. However, such an encoding needs at least $O(w'^2)$ multiplications over the prime field $\mathbb{F}_p$. For example, if a 1GB file $M'$ is divided into $2^{20}$ blocks of length 1KB, then the encoding needs at least $(1024 \times 8/80) \times 2^{40} > 2^{46}$ modular multiplications over $\mathbb{F}_p$. Such a computational cost just to store a file is unbearable for clients.

On the other hand, the original $w'$ blocks are encoded into $w$ blocks. When one of the $w'$ blocks is modified (deletion, insertion, update), at least another $w - w'$ blocks change correspondingly.

• The challenge vector $(v_1, v_2, \cdots, v_w) \in (\mathbb{F}_p)^w$ cannot be reduced to a binary vector to save communication and computational overhead. The challenge vector implies that the aggregated response should be $(\sum_{i=1}^{w} v_i M_i, \sum_{i=1}^{w} v_i \sigma_i)$. To reduce the communication bandwidth and the computational overhead of the storage provider, the SW scheme suggests that only $l$ ($l \le w$) random locations in the vector are chosen to take values from a small set $B \subseteq \mathbb{F}_p$. If $B = \{1\}$, the computation of $(\sum_{i=1}^{w} v_i M_i, \sum_{i=1}^{w} v_i \sigma_i)$ is reduced


scheme, as shown in [SW13]. Therefore, it is inevitable for the storage provider to use multiplications over $\mathbb{F}_p$ to prepare the aggregated response.

• Theoretically, the storage provider is able to correct errors in $M$ if an error-correcting code instead of an erasure code is involved. However, the decoding algorithm is much slower than the encoding algorithm, and the computational complexity of decoding is significantly larger than $O(w'^2)$, where $w'$ is the number of blocks of the original file.

1.4 Our Contributions

In this paper, we propose a PoDP/PoR scheme based on Maximum Rank Distance (MRD) codes. The main idea is as follows.

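Divide the original file $M$ into blocks $M = (M_1, M_2, \cdots, M_w)$, each block being a $t \times k$ matrix over a small field $\mathbb{F}_q$. Block $M_i$ is encoded into an MRD codeword $C_i$, which is a $t \times n$ matrix over $\mathbb{F}_q$, for $i = 1, 2, \cdots, w$. Then each codeword $C_i$ is used to generate a tag $\boldsymbol{\sigma}_i = C_i \cdot \mathbf{k} + \mathbf{k}_i$, where $\mathbf{k}$ and $\mathbf{k}_i$, $i = 1, 2, \cdots, w$, are authentication keys, and $\boldsymbol{\sigma}_i, \mathbf{k}, \mathbf{k}_i$ are vectors over $\mathbb{F}_q$. The final file to be stored in the server is $M^* = (C_1 \| \boldsymbol{\sigma}_1, C_2 \| \boldsymbol{\sigma}_2, \cdots, C_w \| \boldsymbol{\sigma}_w)$.

A challenge vector $(v_1, v_2, \cdots, v_w)$ is just a random sequence of length $w$ over the field $\mathbb{F}_q$. The corresponding valid response should be a linear combination of the blocks, $(\sum_{i=1}^{w} v_i \circ C_i, \sum_{i=1}^{w} v_i \circ \boldsymbol{\sigma}_i)$, where $\circ$ is scalar multiplication over $\mathbb{F}_q$.

The validity of the response can be verified with $\sum_{i=1}^{w} v_i \circ \boldsymbol{\sigma}_i = \left(\sum_{i=1}^{w} v_i \circ C_i\right) \cdot \mathbf{k} + \sum_{i=1}^{w} v_i \circ \mathbf{k}_i$, using the authentication keys $\mathbf{k}$ and $\mathbf{k}_i$, $i = 1, 2, \cdots, w$. Pseudo-random functions are used to generate $\mathbf{k}_i$ to reduce the local key storage.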
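The tag, response and verification equations can be exercised with a toy sketch over $\mathbb{F}_2$. Several choices here are assumptions made for illustration only: random binary matrices stand in for MRD codewords (no MRD encoding is performed), $\mathbf{k}$ is treated as a length-$n$ binary vector and $\mathbf{k}_i$ as a length-$t$ binary vector, and HMAC-SHA256 stands in for the pseudo-random function. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

T, N = 8, 16  # toy dimensions: each codeword is a T x N matrix over F_2


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def mat_vec(C, k):
    """Multiply a T x N binary matrix (list of N-bit row ints) by an N-bit vector over F_2."""
    out = 0
    for r, row in enumerate(C):
        out |= parity(row & k) << r
    return out


def prf(seed: bytes, i: int) -> int:
    """Derive the T-bit key k_i from a seed; HMAC-SHA256 is a stand-in for the PRF."""
    digest = hmac.new(seed, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") & ((1 << T) - 1)


def make_tag(C, k, seed, i):
    return mat_vec(C, k) ^ prf(seed, i)  # sigma_i = C_i * k + k_i over F_2


def respond(blocks, tags, v):
    """Server side: XOR together the blocks and tags selected by the binary challenge v."""
    agg_C, agg_tag = [0] * T, 0
    for C, tag, v_i in zip(blocks, tags, v):
        if v_i:
            agg_C = [a ^ b for a, b in zip(agg_C, C)]
            agg_tag ^= tag
    return agg_C, agg_tag


def verify(agg_C, agg_tag, v, k, seed):
    """Client side: recompute the expected tag of the aggregated block."""
    mask = 0
    for i, v_i in enumerate(v):
        if v_i:
            mask ^= prf(seed, i)
    return agg_tag == (mat_vec(agg_C, k) ^ mask)


# Toy run: random matrices stand in for MRD codewords.
w = 32
k, seed = secrets.randbits(N), secrets.token_bytes(32)
blocks = [[secrets.randbits(N) for _ in range(T)] for _ in range(w)]
tags = [make_tag(C, k, seed, i) for i, C in enumerate(blocks)]
v = [secrets.randbits(1) for _ in range(w)]
agg_C, agg_tag = respond(blocks, tags, v)
print(verify(agg_C, agg_tag, v, k, seed))
```

The server-side function uses only XOR, and the client stores just `k` and `seed` regardless of the number of blocks.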

The linearity of the MRD code results in homomorphic authenticators (tag), just like the scheme by Shacham and Waters (the SW scheme). However, our proposal has different features from the SW scheme.

• Efficient computation by the service provider. The MRD encoding plants algebraic structure in blocks so that the generation of tags and the aggregation of messages and tags can be computed over the smaller field $\mathbb{F}_q$. When the scheme is instantiated with $\mathbb{F}_2$, the challenge vector $(v_1, v_2, \cdots, v_w)$ can be taken as a random binary sequence of length $w$, and the computation of the storage server for the response is simply XOR. This is the most efficient operation for the storage server.

• Mild amount of pre-computation by the client. The MRD encoding is applied block-wise. If we view a file as a matrix, then each row, as a block, is encoded into an MRD codeword, in contrast to each column being encoded into an erasure-code codeword in the SW scheme. The file encoding involves $w$ MRD encodings, so the encoding complexity is $O(w)$ in terms of MRD encodings (compared with the $O(w^2)$ complexity of the SW scheme). When the parameters of the MRD code are fixed, each MRD encoding consumes a constant amount of computation.


• Efficient error location and correction for dependable storage. The linearity of the MRD codes also enables the storage provider to implement an efficient self-audit. If a self-audit fails, the provider can locate the erroneous block and fix it with MRD decoding. To locate an error, binary search applies with complexity $O(\log w)$. Compared with other error-correcting codes, an MRD code is more powerful since it is capable of correcting row erasures, column erasures and rank errors.
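The binary-search localization can be sketched as follows. The sketch assumes exactly one corrupted block and replaces the MRD code by a much simpler linear code over $\mathbb{F}_2$ (the even-weight code), chosen only because membership is trivial to test. The names `is_codeword`, `range_ok` and `locate_error` are hypothetical, and the MRD decoding step that would repair the located block is not shown.

```python
from functools import reduce
from operator import xor


def is_codeword(word: int) -> bool:
    """Stand-in membership test: the even-weight code, a linear code over F_2."""
    return bin(word).count("1") % 2 == 0


def range_ok(blocks, lo, hi):
    """Self-audit of blocks[lo:hi]: by linearity, the XOR of codewords is a codeword."""
    return is_codeword(reduce(xor, blocks[lo:hi], 0))


def locate_error(blocks):
    """Return the index of a single corrupted block, or None if the audit passes."""
    lo, hi = 0, len(blocks)
    if range_ok(blocks, lo, hi):
        return None
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if range_ok(blocks, lo, mid):
            lo = mid  # left half is clean, so the error is on the right
        else:
            hi = mid  # the error is in the left half
    return lo


blocks = [0b1100, 0b1010, 0b0110, 0b1111, 0b0000, 0b1001, 0b0101, 0b0011]
blocks[5] ^= 0b0100  # corrupt one block
print(locate_error(blocks))  # prints 5
```

The number of range audits is logarithmic in the number of blocks; each audit still aggregates every block in its range.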

1.5 Performance Comparison

The following is a comparison of the ABCHKPS [ABC+07], SW [SW13], and our schemes. For a 1GB file, the block size is 1024 bits (as suggested in the ABCHKPS scheme [ABC+07]), 1KB (the SW scheme [SW13]), and 4KB (as suggested in Section 5, our proposal). The block number is $w = 2^{23}$, $2^{20}$, and $2^{18}$, respectively.
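The three block counts follow directly from the file and block sizes; the snippet below is only an arithmetic check, taking 1GB as $2^{30}$ bytes.

```python
GB = 2**30  # bytes in the 1GB example file

# Block sizes in bytes: 1024 bits, 1KB and 4KB.
block_sizes_bytes = {"ABCHKPS": 1024 // 8, "SW": 1024, "Ours": 4096}
block_counts = {name: GB // size for name, size in block_sizes_bytes.items()}

for name, count in block_counts.items():
    print(f"{name}: w = 2^{count.bit_length() - 1}")
```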

Unforgeability defines the success probability $\zeta$ of a service provider in an interaction when $t$ blocks of the file are deleted. It is closely related to the sampling strategy. The SW scheme has the same sampling strategy as the ABCHKPS scheme, hence they share the same unforgeability. When sampling $c$ blocks among the $w$ blocks in total for an interaction, $\zeta$ can be approximated by the probability that no deleted block is sampled, i.e., $\zeta \approx \binom{w-t}{c} / \binom{w}{c} \ge \left(1 - t/(w-c+1)\right)^c$. Let the number of sampled blocks for each interaction be $c = 460$, as suggested in [ABC+07]. We have a different sampling strategy, namely, each block is sampled with probability 1/2. Therefore $\zeta \approx 1/2^t$. It is easy to see that the unforgeability $\zeta$ of our scheme is better, especially when the fraction of deleted blocks is small. To obtain the same unforgeability as our scheme, both the ABCHKPS scheme and the SW scheme have to sample about half of the blocks, i.e., $c \approx w/2$.
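The estimate $1/2^t$ for the random binary challenge can be confirmed by brute force on a small instance. The sketch below only counts the challenges that select none of the deleted blocks; it does not model whatever residual chance a prover has when a deleted block is selected, which is why the text states an approximation. The function name `miss_fraction` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def miss_fraction(w: int, deleted: set) -> Fraction:
    """Fraction of binary challenges of length w that select no deleted block."""
    misses = 0
    for v in product((0, 1), repeat=w):
        if not any(v[i] for i in deleted):
            misses += 1
    return Fraction(misses, 2**w)


print(miss_fraction(10, {1, 4, 7}))  # three deleted blocks out of ten: prints 1/8
```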

Suppose that multiplication in $\mathbb{F}_p$ has computational complexity $O((\log p)^2)$ and exponentiation in $\mathbb{F}_p$ has computational complexity $O((\log p)^3)$ in terms of bit-XOR. The SW scheme takes an 80-bit $p$ for $\mathbb{F}_p$. Our proposal takes a binary field as suggested in Section 5.

With the unforgeability of our scheme as a benchmark, we show the computational complexity of the three schemes in Table 1. The table includes the precomputation, the computation of the client in one interaction, and the computation of the server in one interaction.

Table 1: Computational Complexity of the ABCHKPS, SW and Our Schemes

Computation                  ABCHKPS    SW         Ours
Precomput. (bit-XOR)         $2^{53}$   $2^{59}$   $2^{49}$
Client Comput. (bit-XOR)     $2^{31}$   $2^{31}$   $2^{31}$
Server Comput. (bit-XOR)     $2^{52}$   $2^{38}$   $2^{33}$


2 PoDP/PoR Scheme: Definition and Security Model

2.1 Notations and Assumptions

We use the following notations throughout the paper.

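• $\mathbb{F}_q$: a finite field with $q$ elements, i.e., $GF(q)$.

• $\lambda$: security parameter, which is an integer tending to infinity. The current security standard requires that $\lambda \ge 80$.

• $\mathbf{k}$: a column vector of dimension $t$ over $\mathbb{F}_q$, or an element in $\mathbb{F}_{q^t}$.

• $\mathbf{k}^\top$: transpose of the vector $\mathbf{k}$.

• $(C_1, \cdots, C_n)^T$: it represents $\begin{pmatrix} C_1 \\ \vdots \\ C_n \end{pmatrix}$ instead of $\begin{pmatrix} C_1^\top \\ \vdots \\ C_n^\top \end{pmatrix}$, where $C_1, \cdots, C_n$ are matrices.

• $\circ$: scalar multiplication over $\mathbb{F}_q$. Let $\mathbf{k} = (k^{(1)}, k^{(2)}, \cdots, k^{(n)})^\top \in (\mathbb{F}_q)^n$ and $v \in \mathbb{F}_q$. Then $v \circ \mathbf{k} = (v \cdot k^{(1)}, v \cdot k^{(2)}, \cdots, v \cdot k^{(n)})^\top$, where $\cdot$ is the multiplication over $\mathbb{F}_q$.

• $\vec{a}$: a row vector of column vectors, i.e., $\vec{a} = (a_1, a_2, \ldots, a_n)$, where $a_i$ is a column vector over $\mathbb{F}_q$. It can also be regarded as a matrix over $\mathbb{F}_q$. Then $v \circ \vec{a} = (v \circ a_1, v \circ a_2, \cdots, v \circ a_n)$.

• $A \| B$: the concatenation of $A$ and $B$.

• $|S|$: the cardinality of set $S$.

• $s \leftarrow S$: sample element $s$ uniformly at random from set $S$.

• $|s|$: the bit length of string $s$.

• $a \leftarrow \mathsf{Alg}(x)$: run the algorithm $\mathsf{Alg}(\cdot)$ with input $x$ and obtain $a$ as an output, which is distributed according to the internal randomness of the algorithm.

• A function $f(\lambda)$ is negligible if for every $c > 0$ there exists a $\lambda_c$ such that $f(\lambda) < 1/\lambda^c$ for all $\lambda > \lambda_c$.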

• PRF: pseudo-random function.

• PPT: probabilistic polynomial time.

To reduce communication bandwidth and local storage, we will use pseudo-random functions to generate the key sequence $\mathbf{k}_i$, $i = 1, 2, \cdots, w$. Pseudo-random functions are defined as follows.

Let $l_1$ and $l_2$ be two positive integers, which are polynomially bounded functions in the security parameter $\lambda$. Let $PRF = \{F_s\}_{s \in S}$ be a family of functions indexed by $s$, with $F_s: \{0,1\}^{l_1} \to \{0,1\}^{l_2}$. Function $F_s(\cdot)$ can also be represented by $F(s, \cdot)$. Let $\Gamma_{l_1,l_2}$ be the set of all functions from $\{0,1\}^{l_1}$ to $\{0,1\}^{l_2}$.

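Definition 1 ([Sho04]) Given an adversary $A$ which has oracle access to a function in $\Gamma_{l_1,l_2}$, suppose that $A$ always outputs a bit. The advantage of $A$ over $PRF$ is defined as

$$Adv^{A}_{PRF}(1^\lambda) = \left| \Pr\left[A^{F_s}(1^\lambda) = 1 \mid s \leftarrow S\right] - \Pr\left[A^{f}(1^\lambda) = 1 \mid f \leftarrow \Gamma_{l_1,l_2}\right] \right|.$$

Define the advantage of $PRF$ as $Adv_{PRF}(\lambda) = \max_{\text{PPT } A} Adv^{A}_{PRF}(1^\lambda)$. $PRF$ is called pseudo-random if $Adv_{PRF}(\lambda)$ is negligible.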
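The distinguishing experiment in Definition 1 can be mimicked empirically. The following is a rough sketch under stated assumptions: HMAC-SHA256 truncated to 16 bytes plays the role of $F_s$, a lazily sampled table plays the role of a uniformly random $f \in \Gamma_{l_1,l_2}$, and the advantage is estimated by sampling rather than computed exactly. The names `keyed_oracle`, `random_oracle` and `estimate_advantage` are hypothetical.

```python
import hashlib
import hmac
import secrets

L1_BYTES, L2_BYTES = 16, 16  # input and output lengths in bytes (toy values)


def keyed_oracle():
    """Oracle for F_s with a fresh random key s."""
    s = secrets.token_bytes(32)
    return lambda x: hmac.new(s, x, hashlib.sha256).digest()[:L2_BYTES]


def random_oracle():
    """Oracle for a uniformly random function, sampled lazily on each new input."""
    table = {}

    def f(x):
        if x not in table:
            table[x] = secrets.token_bytes(L2_BYTES)
        return table[x]

    return f


def estimate_advantage(adversary, trials=200):
    """Empirical |Pr[A^{F_s} = 1] - Pr[A^f = 1]| over the given number of trials."""
    real = sum(adversary(keyed_oracle()) for _ in range(trials))
    ideal = sum(adversary(random_oracle()) for _ in range(trials))
    return abs(real - ideal) / trials


# A naive adversary: query one point and output the low bit of the answer.
naive = lambda oracle: oracle(b"\x00" * L1_BYTES)[0] & 1
print(estimate_advantage(naive))
```

An adversary that ignores its oracle has advantage exactly zero, and a sampled estimate for any other adversary carries statistical noise of order $1/\sqrt{\text{trials}}$.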


2.2 PoDP/PoR Scheme

A PoDP/PoR scheme consists of seven algorithms: six PPT algorithms KeyGen, Encode, Decode, Challenge, Response, Verification, and an audit algorithm Audit, where the algorithms Challenge, Response and Verification constitute an interaction protocol. A PoDP/PoR system associates a data storage service provider ($P$) with clients ($V$). The algorithms in a PoDP/PoR scheme work as follows.

Key generation: $(sk, pp) \leftarrow \mathsf{KeyGen}(1^\lambda)$. Each client $V$ calls $(sk, pp) \leftarrow \mathsf{KeyGen}(1^\lambda)$ to get his/her own private key $sk$ and a public parameter $pp$. $V$ keeps the private key $sk$ secret and stores $(sk, pp)$ locally.

Storage encoding: $M^* \leftarrow \mathsf{Encode}(sk, pp, M)$. When a client $V$ prepares a file $M$ for outsourced storage, he/she calls $M^* \leftarrow \mathsf{Encode}(sk, pp, M)$ to encode $M$ to $M^*$. The client $V$ submits $M^*$ to the storage provider.

Storage recovery: $\{M, \perp\} \leftarrow \mathsf{Decode}(sk, pp, M^*)$. When a client $V$ gets back the outsourced file $M^*$, he/she calls $\{M, \perp\} \leftarrow \mathsf{Decode}(sk, pp, M^*)$ to recover the original file $M$, or $\perp$, indicating that $M^*$ is too corrupted to recover $M$.

Storage. The storage provider $P$ stores $M^*$ submitted by the client $V$.

Audit: $\beta \leftarrow \mathsf{Audit}(u, \mathsf{Interaction}(P(M^*, pp) \rightleftharpoons V(sk, pp)))$. A client $V$ can run the interactive protocol $\mathsf{Interaction}(P(M^*, pp) \rightleftharpoons V(sk, pp))$ with the storage provider $P$ for $u$ times, where $u$ is polynomial in $\lambda$. The storage provider $P$ possesses $M^*$ and the client $V$ keeps the secret key $sk$. The protocol consists of the following three algorithms.

$\mathsf{Interaction}(P(M^*, pp) \rightleftharpoons V(sk, pp))$

$ch \leftarrow \mathsf{Challenge}(1^\lambda)$: This is a PPT algorithm run by the verifier $V$. On input the security parameter $\lambda$, this algorithm outputs a challenge $ch$.

$re \leftarrow \mathsf{Response}(pp, M^*, ch)$: This is a deterministic algorithm run by the prover $P$. On input the public parameter $pp$, the encoded file $M^*$ and the challenge $ch$, the algorithm returns a corresponding response $re$.

$b \leftarrow \mathsf{Verification}(pp, sk, ch, re)$: This is a deterministic algorithm run by the verifier $V$. On input the public parameter $pp$, the secret key $sk$, and the challenge/response pair $(ch, re)$, the algorithm returns a bit $b \in \{0, 1\}$, indicating acceptance with 1 or rejection with 0.

If $b = 0$, the interaction protocol outputs $\perp$; otherwise the protocol outputs $(ch, re)$, and we call $(ch, re)$ a consistent pair and the execution of the protocol successful.

After $u$ executions of the interaction protocol in the audit, the client outputs a bit $\beta$ according to some audit strategy. If $\beta = 1$, the audit succeeds and the client is convinced that the storage provider is still storing the file $M^*$. If $\beta = 0$, the audit fails, and the client considers that his/her file $M^*$ has been corrupted or lost.

Correctness of a PoDP/PoR scheme requires that for all $(sk, pp, M^*)$ such that $(sk, pp) \leftarrow \mathsf{KeyGen}(1^\lambda)$ and $M^* \leftarrow \mathsf{Encode}(sk, pp, M)$, the following two statements hold with probability 1.

• The original file $M$ is always recovered from the correctly encoded file $M^*$, i.e., $M = \mathsf{Decode}(sk, pp, M^*)$ if $M^* \leftarrow \mathsf{Encode}(sk, pp, M)$.

• For honest $P$ and $V$, algorithm $\mathsf{Verification}(pp, sk, ch, re)$ always outputs 1 in the interaction protocol, hence the interaction protocol outputs a challenge/response pair, i.e., $(ch, re) \leftarrow \mathsf{Interaction}(P(M^*, pp) \rightleftharpoons V(sk, pp))$.


Unforgeability for PoDP. We consider the probability $\zeta$ of a successful interaction in which a storage provider uses a partially deleted version of $M^*$ to reply to a client’s challenges. This probability $\zeta$ plays an important role in the auditing strategy: an audit is claimed to be successful only if the number of successful interactions among the total $u$ interactions is larger than $\zeta \cdot u$. We will define and evaluate this probability in an unforgeability game. This probability is related to the amount of $M^*$ that is deleted.
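The threshold rule can be written as a small driver around the three interaction algorithms. This is a sketch of one possible audit strategy under the stated rule, not a full Audit algorithm: the callables `challenge`, `respond` and `verify` are placeholders for Challenge, Response and Verification, and the toy prover and verifier at the bottom are invented purely to exercise the loop.

```python
import secrets


def audit(u, challenge, respond, verify, zeta):
    """Run u interactions; output beta = 1 iff more than zeta * u of them succeed."""
    successes = 0
    for _ in range(u):
        ch = challenge()
        resp = respond(ch)
        if verify(ch, resp):
            successes += 1
    return 1 if successes > zeta * u else 0


# Toy instantiation (hypothetical): the correct response is the challenge XOR a constant.
def toy_challenge():
    return secrets.randbits(16)


def toy_honest(ch):
    return ch ^ 0xBEEF


def toy_verify(ch, resp):
    return resp == (ch ^ 0xBEEF)


print(audit(100, toy_challenge, toy_honest, toy_verify, zeta=0.5))
```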

Soundness for PoR. Each successful execution of the Interaction protocol brings some confidence to the client that his/her file is still stored by the storage provider. Successful executions of the Interaction protocol provide consistent challenge/response pairs $(ch_i, re_i)$, and each pair may provide some information about the original $M$. The soundness of the PoR scheme suggests that the original file $M$ can be extracted from those consistent challenge/response pairs $(ch_i, re_i)$ collected from successful interactions, as long as the number (polynomial in $\lambda$) of executions of the Interaction protocol is big enough.

2.3 Unforgeability of PoDP Scheme

A malicious storage provider may delete data that is rarely accessed by clients to save storage space, or suffer from data loss due to outside attacks. However, the storage provider may still try to convince the client that it stores the file $M^*$, even though $M^*$ is partially missing. Unforgeability of a PoDP scheme requires that the probability of a successful interaction should be low if some part of $M^*$ is deleted. Unforgeability explains how much confidence a successful Interaction protocol gives a client. It suggests how to set an audit strategy such that a malicious server cannot save space by deleting a portion of $M^*$ and still pass an audit. This probability is related to the proportion of the deleted part to the whole file.

Given a PoDP scheme (KeyGen, Encode, Decode, Challenge, Response, Verification, Audit), we define the following unforgeability game $\mathsf{Unforge}^{PoDP}_{A}(\lambda)$ between an adversary $A = (A_1, A_2)$ and a challenger $V$.

$\mathsf{Unforge}^{PoDP}_{A}(\lambda)$

1. The challenger $V$ obtains a secret key and a public parameter by $(sk, pp) \leftarrow \mathsf{KeyGen}(1^\lambda)$, and sends $pp$ to the adversary $A_1$.

2. $A_1$ chooses a message $M$ from the message space $\mathcal{M}$, and sends $M$ to $V$.

3. $V$ encodes $M$ to $M^*$ with $M^* \leftarrow \mathsf{Encode}(sk, pp, M)$, and sends $M^*$ to $A_1$.

4. $A_1$ deletes a $\delta$ fraction of $M^*$ to get $\tilde{M}^*$. The adversary $A_2$ is given $(pp, \tilde{M}^*)$.

5. Test stage. $V$ launches an Interaction protocol by sending $A_2$ a challenge $ch$, with $ch \leftarrow \mathsf{Challenge}(1^\lambda)$. $A_2$ computes a response $re' \leftarrow A_2(pp, \tilde{M}^*, ch)$.

$A = (A_1, A_2)$ wins if $\mathsf{Verification}(pp, sk, ch, re') = 1$, i.e., this Interaction protocol is successful.

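Definition 2 A PoDP scheme is $(\delta, \zeta)$-unforgeable if any PPT adversary $A = (A_1, A_2)$ wins in the above game $\mathsf{Unforge}^{PoDP}_{A}(\lambda)$ with probability at most $\zeta$. In formula,

$$\Pr\left[\, \mathsf{Verification}(pp, sk, ch, re') = 1 \;\middle|\; \begin{array}{l} (sk, pp) \leftarrow \mathsf{KeyGen}(1^\lambda),\ M \leftarrow A_1(pp), \\ M^* \leftarrow \mathsf{Encode}(sk, pp, M),\ \tilde{M}^* \leftarrow A_1(pp, M, M^*), \\ ch \leftarrow \mathsf{Challenge}(1^\lambda),\ re' \leftarrow A_2(pp, \tilde{M}^*, ch) \end{array} \right] \le \zeta,$$

where $\tilde{M}^*$ is a $\delta$ fraction-deleted version of $M^*$, and $A_1, A_2$ share neither coins nor state.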

2.4 Soundness of PoR Scheme

Even an honest storage provider may malfunction in an audit, due to temporary system failure, and that does not imply that the provider does not possess the audited data. To be more robust, it is desirable that the PoR system supports an $\epsilon$-admissible storage prover, where the storage prover convincingly answers an


as long as $\epsilon$ is larger than some threshold and the number of executions of the Interaction protocol in the audit is big enough. This can be characterized by the soundness of Proof of Retrievability (PoR).

A (cheating) prover $\tilde{P}$ is called $\epsilon$-admissible if it convincingly answers a challenge with probability at least $\epsilon$, i.e., $\Pr\left[\mathsf{Verification}(pp, sk, ch, re') = 1 \mid ch \leftarrow \mathsf{Challenge}(1^\lambda),\ re' \leftarrow \tilde{P}(pp, ch)\right] \ge \epsilon$. The PoR system is $\epsilon$-sound if there exists a PPT algorithm Extractor which recovers $M$ on input the public parameter $pp$, the secret key $sk$ and the output of the interactive protocol $\mathsf{Interaction}(\tilde{P}(pp, \cdot) \rightleftharpoons V(sk, pp))$, i.e., $\mathsf{Extractor}\left(pp, sk, \mathsf{Interaction}(\tilde{P}(pp, \cdot) \rightleftharpoons V(sk, pp))\right) = M$, with overwhelming probability.

The formal definition of soundness of PoR scheme has been presented in [DVW09,SW13]. Here we give a refined description.

Given a PoR scheme (KeyGen, Encode, Decode, Challenge, Response, Verification, Audit), we define the following soundness game $\mathsf{Sound}^{PoR}_{A}(\lambda)$ between an adversary $A$ and a challenger $V$. $A$ is going to create an adversarial prover $\tilde{P}$, and a PPT Extractor aims to extract the original file from interactions with $\tilde{P}$.

$\mathsf{Sound}^{PoR}_{A}(\lambda)$

1. The challenger $V$ obtains a secret key and a public parameter by $(sk, pp) \leftarrow \mathsf{KeyGen}(1^\lambda)$, and sends $pp$ to the adversary $A$.

2. $A$ chooses a message $M$ from the message space $\mathcal{M}$, and sends $M$ to $V$.

3. $V$ encodes $M$ to $M^*$ with $M^* \leftarrow \mathsf{Encode}(sk, pp, M)$, and sends $M^*$ to $A$.

4. Test stage. The adversary $A$ gets protocol access to $V(sk, pp)$ and can interact with $V$ a polynomial number of times. Then the adversary $A$ outputs an $\epsilon$-admissible prover $\tilde{P}$. This $\epsilon$-admissible prover $\tilde{P}$ functions as an oracle, which outputs a consistent $re$ for a proper query (challenge) $ch$ with probability at least $\epsilon$.

5. A PPT Extractor is given the public parameter $pp$, the secret key $sk$ and oracle access to the $\epsilon$-admissible prover $\tilde{P}$. After querying $\tilde{P}$ a polynomial number of times, the Extractor outputs a file $M'$.

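Definition 3 ([DVW09]) A PoR scheme is $\epsilon$-sound if there exists a PPT Extractor such that, for any PPT adversary $A$ which outputs an $\epsilon$-admissible $\tilde{P}$ through interacting with $V$ in the above game $\mathsf{Sound}^{PoR}_{A}(\lambda)$, the Extractor is able to recover the original file $M$ by querying $\tilde{P}$ except with negligible probability. In formula,

$$\Pr\left[\, M' \ne M \;\middle|\; \begin{array}{l} (sk, pp) \leftarrow \mathsf{KeyGen}(1^\lambda),\ M \leftarrow A(pp),\ M^* \leftarrow \mathsf{Encode}(sk, pp, M), \\ \tilde{P} \leftarrow A(pp, M, M^*, \text{access to } V(sk, pp)),\ M' \leftarrow \mathsf{Extractor}(pp, sk, \tilde{P}) \end{array} \right]$$

is negligible.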

3 Maximum Rank Distance Codes and Gabidulin Codes

Rank distance was first considered for error-correcting codes by Delsarte [Del78], and much work has been devoted to rank distance properties, code construction, and efficient decoding [HL14,HVLN15]. In [Del78,Gab85,Rot91], a Singleton bound on the minimum rank distance of codes was established, and a class of codes achieving the bound with equality was constructed, namely Maximum Rank Distance (MRD) codes. Gabidulin codes are one family of MRD codes and can be considered the rank-metric analogues of Reed-Solomon codes. In [WXS03], MRD codes were used in authentication.

3.1 Rank Distance Codes

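Let $\mathbb{F}_q$ be a finite field, and $\mathbb{F}_{q^t}$ be an extension field of degree $t$. Let $(\mathbb{F}_{q^t})^n$ be the $n$-dimensional vector space over $\mathbb{F}_{q^t}$. Given a vector $\vec{a} = (a_1, a_2, \ldots, a_n) \in (\mathbb{F}_{q^t})^n$, each entry $a_i \in \mathbb{F}_{q^t}$ can be expressed as a $t$-dimensional column vector over $\mathbb{F}_q$, i.e., $a_i = \left(a_i^{(1)}, a_i^{(2)}, \cdots, a_i^{(t)}\right)^{\top}$.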

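Definition 4 ([GY06]) A rank distance code $\mathcal{C}_R$ over the finite field $\mathbb{F}_{q^t}$ is a subspace of $(\mathbb{F}_{q^t})^n$, satisfying the following properties.

(a) For each codeword $\vec{c} \in \mathcal{C}_R$, the rank weight (over $\mathbb{F}_q$) of $\vec{c}$, denoted as $\mathrm{rank}(\vec{c} \mid \mathbb{F}_q)$, is defined as the rank of the corresponding $t \times n$ matrix over $\mathbb{F}_q$ when $\vec{c}$ is expressed as a matrix over $\mathbb{F}_q$.

(b) For any two codewords $\vec{a}, \vec{b} \in \mathcal{C}_R$, the rank distance (over $\mathbb{F}_q$) of $\vec{a}$ and $\vec{b}$ is defined as $d_R(\vec{a}, \vec{b}) = \mathrm{rank}(\vec{a} - \vec{b} \mid \mathbb{F}_q)$.

(c) The minimum rank distance of the code $\mathcal{C}_R$, denoted as $d_R(\mathcal{C}_R)$, is the minimum rank distance over all pairs of distinct codewords in $\mathcal{C}_R$, i.e., $d_R(\mathcal{C}_R) = \min_{\vec{a} \neq \vec{b} \in \mathcal{C}_R} d_R(\vec{a}, \vec{b})$.

We call $\mathcal{C}_R$ a $[q, c, n, t, d]$ rank distance code if each codeword consists of $n$ elements in $\mathbb{F}_{q^t}$, $c = |\mathcal{C}_R|$ is the number of codewords, and $d = d_R(\mathcal{C}_R)$ is the minimum rank distance.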
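As an illustration of Definition 4 for the binary case $q = 2$, the following minimal sketch computes rank weight and rank distance. It assumes that each element of $\mathbb{F}_{2^t}$ is stored as a Python integer whose bits are the coordinates of its column vector over $\mathbb{F}_2$; the function names are illustrative and not part of the paper.

```python
def rank_gf2(columns):
    """Rank over F_2 of the t x n matrix whose j-th column is the bit
    pattern of the integer columns[j] (Gaussian elimination by leading bit)."""
    pivots = {}  # leading-bit position -> stored vector with that leading bit
    for v in columns:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v      # v is independent of what we have so far
                break
            v ^= pivots[top]         # cancel the leading bit and keep reducing
    return len(pivots)


def rank_weight(vec):
    """rank(vec | F_2) for a vector of F_{2^t} elements given as integers."""
    return rank_gf2(vec)


def rank_distance(a, b):
    """d_R(a, b) = rank(a - b | F_2); subtraction in characteristic 2 is XOR."""
    return rank_gf2([x ^ y for x, y in zip(a, b)])


a = [0b001, 0b010, 0b011]   # third column is the sum of the first two
b = [0b001, 0b010, 0b100]
print(rank_weight(a), rank_weight(b), rank_distance(a, b))
```

Note that the two vectors differ in a single position, so their Hamming distance and their rank distance are both 1 here; in general the rank distance never exceeds the Hamming distance.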

3.2 Maximum Rank Distance Codes and Gabidulin Codes

For a $[q, c, n, t, d]$ rank distance code $\mathcal{C}_R$ with $t \geq n$, it was shown in [Gab85] that $d_R \leq d_H$, where $d_H$ denotes the minimum Hamming distance of $\mathcal{C}_R$. Combined with the Singleton bound for block codes, the minimum rank distance of $\mathcal{C}_R$ satisfies $d = d_R \leq n - k + 1$, where $k = \log_{q^t} c$.

Definition 5 A $[q, c, n, t, d]$ rank distance code $\mathcal{C}_R$ with $t \geq n$ is called a maximum-rank-distance code (MRD code) if $\mathcal{C}_R$ achieves the bound $d = n - k + 1$, where $k = \log_{q^t} c$.

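Definition 6 A $[q, k, n, t]$ Gabidulin code $\mathcal{C}_R$ with $t \geq n$ consists of a generation matrix $G_{k \times n}$, a parity check matrix $H_{n \times (n-k)}$ over $\mathbb{F}_{q^t}$ and two algorithms (GabiEncode, GabiDecode).

Generation Matrix $G_{k \times n}$ / Parity Check Matrix $H_{n \times (n-k)}$. The generation matrix $G_{k \times n}$ is determined by $n$ elements $(g_1, g_2, \ldots, g_n)$, with $g_i \in \mathbb{F}_{q^t}$, $i = 1, 2, \cdots, n$, and all the $n$ elements are linearly independent over $\mathbb{F}_q$. The parity check matrix $H_{n \times (n-k)}$ is determined by the generation matrix $G_{k \times n}$ satisfying $G_{k \times n} \cdot H_{n \times (n-k)} = 0$. In [WSB10], a fast algorithm was shown to find $h_i \in \mathbb{F}_{q^t}$, $i = 1, 2, \cdots, n$, such that

$$G_{k \times n} = \begin{pmatrix} g_1 & g_2 & \cdots & g_n \\ g_1^{q} & g_2^{q} & \cdots & g_n^{q} \\ \vdots & \vdots & & \vdots \\ g_1^{q^{k-1}} & g_2^{q^{k-1}} & \cdots & g_n^{q^{k-1}} \end{pmatrix}, \qquad H_{n \times (n-k)} = \begin{pmatrix} h_1 & h_1^{q} & \cdots & h_1^{q^{n-k-1}} \\ h_2 & h_2^{q} & \cdots & h_2^{q^{n-k-1}} \\ \vdots & \vdots & & \vdots \\ h_n & h_n^{q} & \cdots & h_n^{q^{n-k-1}} \end{pmatrix}. \tag{1}$$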

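GabiEncode($\vec{m}, G_{k \times n}$). Taking as input the generation matrix $G_{k \times n}$ and the message vector $\vec{m} = (m_1, m_2, \cdots, m_k) \in (\mathbb{F}_{q^t})^k$, it computes $\vec{c} = \vec{m} \cdot G_{k \times n}$, and outputs the codeword $\vec{c} = (c_1, c_2, \cdots, c_n)$.

GabiDecode($\vec{r}, H_{n \times (n-k)}$). Taking as input the parity check matrix $H_{n \times (n-k)}$ and a vector $\vec{r} = (r_1, r_2, \cdots, r_n) \in (\mathbb{F}_{q^t})^n$, it outputs $\vec{m} = (m_1, m_2, \cdots, m_k)$. Refer to [GP08] for the specific decoding algorithm.

A $[q, k, n, t]$ Gabidulin code $\mathcal{C}_R$ is a $[q, q^{tk}, n, t, n-k+1]$ MRD code and

$$\mathcal{C}_R = \left\{ \vec{c} \;\middle|\; \vec{c} = \vec{m} \cdot G_{k \times n},\ \vec{m} = (m_1, m_2, \cdots, m_k),\ m_i \in \mathbb{F}_{q^t},\ i = 1, 2, \cdots, k \right\} \subseteq (\mathbb{F}_{q^t})^n.$$

The correctness of the Gabidulin code requires that $\vec{m} = \mathsf{GabiDecode}(\vec{c}, H_{n \times (n-k)})$ if $\vec{c} \leftarrow \mathsf{GabiEncode}(\vec{m}, G_{k \times n})$, for all $\vec{m} \in (\mathbb{F}_{q^t})^k$.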
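To make Definition 6 concrete, here is a minimal sketch of a toy $[2, 2, 4, 4]$ Gabidulin code. The field representation $\mathbb{F}_{2^4} = \mathbb{F}_2[x]/(x^4 + x + 1)$, the choice $g = (1, x, x^2, x^3)$ and all function names are assumptions made for illustration. The sketch builds $G_{k \times n}$ from Frobenius powers as in Eq. (1), encodes every message, and checks by exhaustive search that the minimum rank weight of a nonzero codeword equals $n - k + 1$.

```python
from itertools import product

T, MOD = 4, 0b10011          # F_{2^4} = F_2[x] / (x^4 + x + 1), elements as 4-bit ints


def gf_mul(a, b):
    """Multiply two elements of F_{2^4} (shift-and-add with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> T:           # degree reached 4: reduce modulo x^4 + x + 1
            a ^= MOD
    return r


def frobenius(a, j):
    """a^(2^j), obtained by squaring j times."""
    for _ in range(j):
        a = gf_mul(a, a)
    return a


def generator_matrix(g, k):
    """Row j holds (g_1^(q^j), ..., g_n^(q^j)) for j = 0..k-1, with q = 2."""
    return [[frobenius(gi, j) for gi in g] for j in range(k)]


def gabi_encode(m, G):
    """Codeword c = m . G over F_{2^4}."""
    c = [0] * len(G[0])
    for mi, row in zip(m, G):
        for j, gij in enumerate(row):
            c[j] ^= gf_mul(mi, gij)
    return c


def rank_gf2(columns):
    """Rank over F_2 of the matrix whose columns are the given bit patterns."""
    pivots = {}
    for v in columns:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


n, k = 4, 2
g = [1, 2, 4, 8]             # 1, x, x^2, x^3: linearly independent over F_2
G = generator_matrix(g, k)
min_rank = min(rank_gf2(gabi_encode(m, G))
               for m in product(range(2 ** T), repeat=k) if any(m))
print(G, min_rank)
```

The exhaustive search is only feasible because the toy code has 256 codewords; it is a sanity check of the MRD property, not a decoding algorithm.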

Lots of work has been done on the efficient decoding of Gabidulin and other MRD Codes. Decoding algorithms have been proposed to decode erasures or errors or both of them simultaneously.


• Row erasures. Let $x$ denote the number of row erasures. Row erasures means the locations of erroneous rows are known.

• Column erasures. Let $y$ denote the number of column erasures. Column erasures means the locations of erroneous columns are known.

• Unknown errors. Let $z$ be the rank of the error matrix $E_{t \times n}$, where the error locations are unknown.

Efficient (PPT) decoding algorithms $\vec{m} \leftarrow \mathsf{GabiDecode}(\vec{r}, H_{n \times (n-k)})$ exist for erasure and error-correction decoding [GP08], which correct $x$-fold row erasures, $y$-fold column erasures, and $z$-fold rank errors simultaneously, provided that $x + y + 2z \leq d - 1$, where $d = n - k + 1$.
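The decoding condition above can be checked mechanically. The helper below is a small sketch with a hypothetical name; it only evaluates the inequality $x + y + 2z \leq d - 1$ and does not perform any decoding.

```python
def within_decoding_capacity(x, y, z, n, k):
    """True if x row erasures, y column erasures and errors of rank z
    satisfy x + y + 2z <= d - 1 with d = n - k + 1."""
    d = n - k + 1
    return x + y + 2 * z <= d - 1


# With n = 256 and k = 128 (so d = 129), up to 64 rank errors at unknown
# locations are covered, or up to 128 erasures at known locations.
print(within_decoding_capacity(0, 0, 64, 256, 128))
print(within_decoding_capacity(1, 0, 64, 256, 128))
```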

4 PoDP/PoR Scheme from MRD Codes

Throughout the paper, we will use a Gabidulin code as an instantiation of an MRD code. The only reason to use a Gabidulin code instead of a general MRD code is that a Gabidulin code has an explicit generation matrix and parity check matrix, which helps us to analyze the performance of our proposal.

The primitives involved in the PoDP/PoR scheme are

• A Pseudo-Random Function $\mathsf{PRF}: \mathcal{K}_{prf} \times \{0,1\}^* \rightarrow \mathbb{F}_{q^t}$, which uses a seed $k'$, randomly chosen from the space $\mathcal{K}_{prf}$, to generate pseudo-random keys with $\mathsf{PRF}(k', i)$ for $i = 1, 2, \cdots$. The length of the seed $k'$ is determined by the security parameter $\lambda$.

• A $[q, k, n, t]$ Gabidulin code $\mathcal{C}_R$ with $t \geq n$, which is associated with $(G_{k \times n}, H_{n \times (n-k)}, \mathsf{GabiEncode}, \mathsf{GabiDecode})$ defined in Definition 6.

The PoDP/PoR scheme constructed from Gabidulin codes consists of the following algorithms.

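• $(sk, pp) \leftarrow \mathsf{KeyGen}(1^\lambda)$: The key generation algorithm takes as input the security parameter $\lambda$, and chooses random elements $k \in \mathbb{F}_{q^n}$ and $k' \in \mathcal{K}_{prf}$. It chooses a Pseudo-Random Function (PRF) $\mathsf{PRF}: \mathcal{K}_{prf} \times \{0,1\}^* \rightarrow \mathbb{F}_{q^t}$ and a $[q, k, n, t]$ Gabidulin code $\mathcal{C}_R$. It outputs the public parameter $pp = (\mathsf{PRF}, \mathcal{C}_R)$ and the secret key $sk = (k, k')$.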

• $M^* \leftarrow \mathsf{Encode}(sk, pp, M)$: The storage encoding algorithm takes as input the secret key $sk$, the public parameter $pp$ and the original file $M$. It computes an encoded file $M^*$ as follows.

(1) Block Division: The original file $M$ is divided into $w$ blocks, denoted by $(M_1, M_2, \cdots, M_w)^{\top}$, each block $M_i = (m_{i1}, m_{i2}, \cdots, m_{ik}) \in (\mathbb{F}_{q^t})^k$ consisting of $k$ elements of $\mathbb{F}_{q^t}$. Express each $m_{ij} \in \mathbb{F}_{q^t}$ as a column vector of dimension $t$ over $\mathbb{F}_q$, i.e., $m_{ij} = \left(m_{ij}^{(1)}, m_{ij}^{(2)}, \cdots, m_{ij}^{(t)}\right)^{\top}$. Then each block $M_i$ can be expressed as a $t \times k$ matrix over $\mathbb{F}_q$, i.e.

$$M_i = \begin{pmatrix} m_{i1}^{(1)} & m_{i2}^{(1)} & \cdots & m_{ik}^{(1)} \\ m_{i1}^{(2)} & m_{i2}^{(2)} & \cdots & m_{ik}^{(2)} \\ \vdots & \vdots & & \vdots \\ m_{i1}^{(t)} & m_{i2}^{(t)} & \cdots & m_{ik}^{(t)} \end{pmatrix}.$$

(2) Gabidulin Encoding: The original file $M$ is encoded into Gabidulin codewords block by block with $C_i \leftarrow \mathsf{GabiEncode}(M_i, G_{k \times n})$, $i = 1, 2, \cdots, w$. More precisely, given block $M_i = (m_{i1}, m_{i2}, \cdots, m_{ik}) \in (\mathbb{F}_{q^t})^k$, the codeword $C_i \in (\mathbb{F}_{q^t})^n$ is computed as $C_i = (c_{i1}, c_{i2}, \cdots, c_{in}) = M_i \cdot G_{k \times n}$.

Similarly, $c_{ij} = \left(c_{ij}^{(1)}, c_{ij}^{(2)}, \cdots, c_{ij}^{(t)}\right)^{\top}$, which is a vector over $\mathbb{F}_q$, and each codeword $C_i$ can be expressed as a $t \times n$ matrix over $\mathbb{F}_q$, i.e.

$$C_i = \begin{pmatrix} c_{i1}^{(1)} & c_{i2}^{(1)} & \cdots & c_{in}^{(1)} \\ c_{i1}^{(2)} & c_{i2}^{(2)} & \cdots & c_{in}^{(2)} \\ \vdots & \vdots & & \vdots \\ c_{i1}^{(t)} & c_{i2}^{(t)} & \cdots & c_{in}^{(t)} \end{pmatrix}.$$

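(3) Tag Generation: Compute a tag $\boldsymbol{\sigma}_i$ for each codeword $C_i$ with the secret key $sk = (k, k')$.

(i) Express $k \in \mathbb{F}_{q^n}$ as a vector of dimension $n$ over $\mathbb{F}_q$, i.e., $k = \left(k^{(1)}, k^{(2)}, \cdots, k^{(n)}\right)^{\top}$.

(ii) Compute $k_i = \mathsf{PRF}(k', i)$ for $i = 1, 2, \cdots, w$. Express $k_i \in \mathbb{F}_{q^t}$ as a vector of dimension $t$ over $\mathbb{F}_q$, i.e., $k_i = \left(k_i^{(1)}, k_i^{(2)}, \cdots, k_i^{(t)}\right)^{\top}$.

(iii) For each codeword $C_i$, a tag $\boldsymbol{\sigma}_i \in \mathbb{F}_{q^t}$ is computed with

$$\boldsymbol{\sigma}_i = C_i \cdot k + k_i = \begin{pmatrix} c_{i1}^{(1)} & c_{i2}^{(1)} & \cdots & c_{in}^{(1)} \\ c_{i1}^{(2)} & c_{i2}^{(2)} & \cdots & c_{in}^{(2)} \\ \vdots & \vdots & & \vdots \\ c_{i1}^{(t)} & c_{i2}^{(t)} & \cdots & c_{in}^{(t)} \end{pmatrix} \cdot \begin{pmatrix} k^{(1)} \\ k^{(2)} \\ \vdots \\ k^{(n)} \end{pmatrix} + \begin{pmatrix} k_i^{(1)} \\ k_i^{(2)} \\ \vdots \\ k_i^{(t)} \end{pmatrix}, \tag{2}$$

where all the operations are in the base field $\mathbb{F}_q$.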

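(4) Storage File $M^*$: Each block $M_i$ is encoded as $(C_i \| \boldsymbol{\sigma}_i) = (c_{i1}, c_{i2}, \cdots, c_{in}, \boldsymbol{\sigma}_i)$. The final encoded file $M^*$ consists of blocks attached with tags:

$$M^* = (C_1 \| \boldsymbol{\sigma}_1, C_2 \| \boldsymbol{\sigma}_2, \cdots, C_w \| \boldsymbol{\sigma}_w)^{\top} = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n} & \boldsymbol{\sigma}_1 \\ c_{21} & c_{22} & \cdots & c_{2n} & \boldsymbol{\sigma}_2 \\ \vdots & \vdots & & \vdots & \vdots \\ c_{w1} & c_{w2} & \cdots & c_{wn} & \boldsymbol{\sigma}_w \end{pmatrix},$$

where each entry in the matrix is an element from $\mathbb{F}_{q^t}$.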

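• $\{M, \perp\} \leftarrow \mathsf{Decode}(sk, pp, M^*)$: Parse $M^*$ as $M^* = (C'_1 \| \boldsymbol{\sigma}'_1, C'_2 \| \boldsymbol{\sigma}'_2, \cdots, C'_w \| \boldsymbol{\sigma}'_w)^{\top}$.

    for $i = 1, 2, \cdots, w$ do
        If $C'_i \cdot H_{n \times (n-k)} \neq 0$ return "$\perp$".
        If $\boldsymbol{\sigma}'_i \neq C'_i \cdot k + k_i$ return "$\perp$".
        $M'_i \leftarrow \mathsf{GabiDecode}(C'_i, H_{n \times (n-k)})$.
    end for
    Return $M = (M'_1, M'_2, \cdots, M'_w)^{\top}$.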

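• $\{(ch, re), \perp\} \leftarrow \mathsf{Interaction}(\mathcal{P}(M^*, pp) \rightleftharpoons \mathcal{V}(sk, pp))$: The interactive protocol consists of three algorithms.

$ch \leftarrow \mathsf{Challenge}(1^\lambda)$: The challenge $ch = (v_1, v_2, \cdots, v_w)$ is chosen randomly from $(\mathbb{F}_q)^w$.

$re \leftarrow \mathsf{Response}(pp, M^*, ch)$: (i) First the encoded file $M^*$ is segmented into $w$ blocks. Let $M^* = (C_1 \| \boldsymbol{\sigma}_1, C_2 \| \boldsymbol{\sigma}_2, \cdots, C_w \| \boldsymbol{\sigma}_w)^{\top}$.

(ii) Then the response $re$ is computed with aggregation:

$$re = ch \circ M^* = (v_1, v_2, \cdots, v_w) \circ \begin{pmatrix} c_{11} & \cdots & c_{1n} & \boldsymbol{\sigma}_1 \\ c_{21} & \cdots & c_{2n} & \boldsymbol{\sigma}_2 \\ \vdots & & \vdots & \vdots \\ c_{w1} & \cdots & c_{wn} & \boldsymbol{\sigma}_w \end{pmatrix} = \left( \sum_{i=1}^{w} v_i \circ c_{i1}, \cdots, \sum_{i=1}^{w} v_i \circ c_{in}, \sum_{i=1}^{w} v_i \circ \boldsymbol{\sigma}_i \right),$$

where "$\circ$" is scalar multiplication in the vector space over $\mathbb{F}_q$. More precisely, express $c_{ij} = \left(c_{ij}^{(1)}, c_{ij}^{(2)}, \cdots, c_{ij}^{(t)}\right)^{\top}$ as a $t$-dimensional vector over $\mathbb{F}_q$. Then $v_i \circ c_{ij} = \left(v_i \cdot c_{ij}^{(1)}, v_i \cdot c_{ij}^{(2)}, \cdots, v_i \cdot c_{ij}^{(t)}\right)^{\top}$.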

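$b \leftarrow \mathsf{Verification}(pp, sk, ch, re)$: The input is given by the public parameter $pp = (\mathsf{PRF}, \mathcal{C}_R)$, where $\mathcal{C}_R$ is associated with the generation matrix $G_{k \times n}$ and parity check matrix $H_{n \times (n-k)}$, the secret key $sk = (k, k') \in \mathbb{F}_{q^n} \times \mathcal{K}_{prf}$, the challenge $ch = (v_1, v_2, \cdots, v_w) \in (\mathbb{F}_q)^w$, and the response $re = (\bar{c}_1, \bar{c}_2, \cdots, \bar{c}_n, \bar{\boldsymbol{\sigma}}) \in (\mathbb{F}_{q^t})^{n+1}$.

(i) Parse $re$ as $(\bar{C}, \bar{\boldsymbol{\sigma}})$, where $\bar{C} = (\bar{c}_1, \bar{c}_2, \cdots, \bar{c}_n)$. Check whether $\bar{C}$ is an MRD codeword by testing $\bar{C} \cdot H_{n \times (n-k)} = 0$. If not, output $b = 0$.

(ii) Express $k \in \mathbb{F}_{q^n}$ as a vector of dimension $n$ over $\mathbb{F}_q$, i.e., $k = \left(k^{(1)}, k^{(2)}, \cdots, k^{(n)}\right)^{\top}$. Compute $k_i = \mathsf{PRF}(k', i) \in \mathbb{F}_{q^t}$ for $i = 1, 2, \cdots, w$. Express $k_i$ as a vector of dimension $t$ over $\mathbb{F}_q$, i.e., $k_i = \left(k_i^{(1)}, k_i^{(2)}, \cdots, k_i^{(t)}\right)^{\top}$.

(iii) Express $\bar{C}$ as a matrix over $\mathbb{F}_q$ and $\bar{\boldsymbol{\sigma}}$ as a vector over $\mathbb{F}_q$, i.e.,

$$\bar{C} = \begin{pmatrix} \bar{c}_1^{(1)} & \bar{c}_2^{(1)} & \cdots & \bar{c}_n^{(1)} \\ \bar{c}_1^{(2)} & \bar{c}_2^{(2)} & \cdots & \bar{c}_n^{(2)} \\ \vdots & \vdots & & \vdots \\ \bar{c}_1^{(t)} & \bar{c}_2^{(t)} & \cdots & \bar{c}_n^{(t)} \end{pmatrix} \quad \text{and} \quad \bar{\boldsymbol{\sigma}} = \begin{pmatrix} \bar{\sigma}^{(1)} \\ \bar{\sigma}^{(2)} \\ \vdots \\ \bar{\sigma}^{(t)} \end{pmatrix}.$$

(iv) Check

$$\bar{\boldsymbol{\sigma}} = \begin{pmatrix} \bar{c}_1^{(1)} & \bar{c}_2^{(1)} & \cdots & \bar{c}_n^{(1)} \\ \bar{c}_1^{(2)} & \bar{c}_2^{(2)} & \cdots & \bar{c}_n^{(2)} \\ \vdots & \vdots & & \vdots \\ \bar{c}_1^{(t)} & \bar{c}_2^{(t)} & \cdots & \bar{c}_n^{(t)} \end{pmatrix} \cdot \begin{pmatrix} k^{(1)} \\ k^{(2)} \\ \vdots \\ k^{(n)} \end{pmatrix} + \begin{pmatrix} \sum_{i=1}^{w} v_i \cdot k_i^{(1)} \\ \sum_{i=1}^{w} v_i \cdot k_i^{(2)} \\ \vdots \\ \sum_{i=1}^{w} v_i \cdot k_i^{(t)} \end{pmatrix}, \tag{3}$$

where all the operations are in the field $\mathbb{F}_q$. If Eq. (3) holds, return 1, otherwise return 0.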

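• $\{0, 1\} \leftarrow \mathsf{Audit}$. The client $\mathcal{V}$ executes the Interaction protocol $u$ times.

    $b = 0$;
    for $i = 1$ to $u$ do   Interaction($\mathcal{P}(M^*, pp) \rightleftharpoons \mathcal{V}(sk, pp)$)
        $ch \leftarrow \mathsf{Challenge}(1^\lambda)$; $re \leftarrow \mathsf{Response}(pp, M^*, ch)$;
        If $\mathsf{Verification}(pp, sk, ch, re) = 1$ then $b \leftarrow b + 1$;
    end for
    If ($b/u > 1/q$) return 1, otherwise return 0.

In the audit, $b/u > 1/q$ means that $b/u - 1/q$ is non-negligible, since $u$ is polynomial in $\lambda$.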
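The tag generation, response, verification and audit steps above can be sketched end to end for $q = 2$, where every operation over $\mathbb{F}_q$ becomes XOR. This is a minimal sketch under several assumptions that are not taken from the paper: HMAC-SHA256 truncated to $t$ bits stands in for the PRF, symbols of $\mathbb{F}_{2^t}$ are stored as integers, the rows are random stand-ins for Gabidulin codewords (so the parity check in step (i) of Verification is omitted), and the toy sizes and function names are illustrative only.

```python
import hashlib
import hmac
import secrets

T, N, W = 16, 8, 6            # toy sizes: t-bit symbols, n symbols per row, w blocks


def prf(seed, i):
    """Stand-in PRF: HMAC-SHA256 truncated to T bits (an assumption)."""
    mac = hmac.new(seed, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") >> (256 - T)


def dot_k(row, k_bits):
    """C . k over F_2: XOR of the columns c_j whose key bit k^(j) is 1."""
    acc = 0
    for c, bit in zip(row, k_bits):
        if bit:
            acc ^= c
    return acc


def encode_tags(rows, k_bits, seed):
    """Append sigma_i = C_i . k + k_i to every row (Eq. (2) with q = 2)."""
    return [row + [dot_k(row, k_bits) ^ prf(seed, i + 1)]
            for i, row in enumerate(rows)]


def respond(stored, ch):
    """Aggregate the challenged rows and tags by XOR."""
    re = [0] * (N + 1)
    for v, row in zip(ch, stored):
        if v:
            re = [a ^ b for a, b in zip(re, row)]
    return re


def verify(re, ch, k_bits, seed):
    """Check Eq. (3); the codeword test of step (i) is not modelled here."""
    *c_bar, sigma_bar = re
    k_sum = 0
    for i, v in enumerate(ch):
        if v:
            k_sum ^= prf(seed, i + 1)
    return sigma_bar == (dot_k(c_bar, k_bits) ^ k_sum)


def audit(prover, k_bits, seed, u, q=2):
    """Run u interactions and accept iff b/u > 1/q (integer comparison)."""
    b = 0
    for _ in range(u):
        ch = [secrets.randbelow(q) for _ in range(W)]
        if verify(prover(ch), ch, k_bits, seed):
            b += 1
    return 1 if b * q > u else 0


seed = secrets.token_bytes(16)
k_bits = [secrets.randbelow(2) for _ in range(N)]
rows = [[secrets.randbelow(2 ** T) for _ in range(N)] for _ in range(W)]
stored = encode_tags(rows, k_bits, seed)


def honest(ch):
    return respond(stored, ch)


def tampering(ch):
    re = respond(stored, ch)
    re[-1] ^= 1               # flip one bit of the aggregated tag
    return re


print(audit(honest, k_bits, seed, u=20), audit(tampering, k_bits, seed, u=20))
```

The honest prover passes every interaction because the tags are linear in the rows, while the prover that flips a tag bit fails every interaction, so the two audits return 1 and 0 respectively.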

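The Correctness of the above PoDP/PoR scheme is guaranteed by the following facts. For all $(sk, pp, M^*)$ such that $(sk, pp) \leftarrow \mathsf{KeyGen}(1^\lambda)$ and $M^* \leftarrow \mathsf{Encode}(sk, pp, M)$,

• the original file $M$ is always recovered from the correctly encoded file $M^*$, due to the correctness of the encoding/decoding algorithms of Gabidulin codes;

• a response $re$ computed by $\mathsf{Response}(pp, M^*, ch)$ always passes $\mathsf{Verification}$: the aggregated $\bar{C}$ is a linear combination of codewords and hence a codeword, and the aggregated tag satisfies Eq. (3) due to the following facts:

$$\sum_{i=1}^{w} v_i \circ \boldsymbol{\sigma}_i = (v_1, v_2, \cdots, v_w) \circ \begin{pmatrix} \boldsymbol{\sigma}_1 \\ \boldsymbol{\sigma}_2 \\ \vdots \\ \boldsymbol{\sigma}_w \end{pmatrix} = (v_1, v_2, \cdots, v_w) \circ \left( \begin{pmatrix} C_1 \\ C_2 \\ \vdots \\ C_w \end{pmatrix} \cdot k + \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_w \end{pmatrix} \right)$$

$$= \left( \sum_{i=1}^{w} v_i \circ c_{i1}, \sum_{i=1}^{w} v_i \circ c_{i2}, \cdots, \sum_{i=1}^{w} v_i \circ c_{in} \right) \cdot k + \sum_{i=1}^{w} v_i \circ k_i = (\bar{c}_1, \bar{c}_2, \cdots, \bar{c}_n) \cdot k + \sum_{i=1}^{w} v_i \circ k_i,$$

where $\bar{c}_j = \left(\bar{c}_j^{(1)}, \bar{c}_j^{(2)}, \cdots, \bar{c}_j^{(t)}\right)^{\top}$, $k = \left(k^{(1)}, k^{(2)}, \cdots, k^{(n)}\right)^{\top}$ and $k_i = \left(k_i^{(1)}, k_i^{(2)}, \cdots, k_i^{(t)}\right)^{\top}$, which are vectors over $\mathbb{F}_q$.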
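The aggregation identity used in the correctness argument holds for any $q$, not only $q = 2$. The sketch below checks it numerically for $q = 3$, representing each $C_i$ as a $t \times n$ matrix over $\mathbb{F}_3$. The matrices are random stand-ins rather than Gabidulin codewords (the identity only relies on linearity), and the sizes, the fixed random seed and the helper names are assumptions for illustration.

```python
import random

q, t, n, w = 3, 4, 3, 5        # q must be prime for plain modular arithmetic
rng = random.Random(0)


def mat_vec(C, k):
    """(t x n matrix) times (n-vector) over F_q."""
    return [sum(c * x for c, x in zip(row, k)) % q for row in C]


def add(u, v):
    return [(a + b) % q for a, b in zip(u, v)]


def scale(s, u):
    """Scalar multiplication s o u over F_q."""
    return [(s * a) % q for a in u]


k = [rng.randrange(q) for _ in range(n)]
C = [[[rng.randrange(q) for _ in range(n)] for _ in range(t)] for _ in range(w)]
keys = [[rng.randrange(q) for _ in range(t)] for _ in range(w)]
tags = [add(mat_vec(C[i], k), keys[i]) for i in range(w)]     # sigma_i = C_i.k + k_i
v = [rng.randrange(q) for _ in range(w)]                      # challenge

# Left-hand side: sum_i v_i o sigma_i
lhs = [0] * t
for i in range(w):
    lhs = add(lhs, scale(v[i], tags[i]))

# Right-hand side: (sum_i v_i o C_i) . k + sum_i v_i o k_i
C_bar = [[sum(v[i] * C[i][r][j] for i in range(w)) % q for j in range(n)]
         for r in range(t)]
k_bar = [0] * t
for i in range(w):
    k_bar = add(k_bar, scale(v[i], keys[i]))
rhs = add(mat_vec(C_bar, k), k_bar)

print(lhs == rhs)
```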

5 Performance Analysis

To evaluate the efficiency of the proposed PoDP/PoR scheme, we will analyze the local storage of the client, the expansion rate $\theta$ of the storage, which is defined as the ratio of the length of the encoded file $M^*$ to that of the original file $M$, the length of the challenge, denoted by $|ch|$, the length of the response, denoted by $|re|$, the computational complexity of the storage provider to compute the response, and the computational complexity of the client to verify the response.

Let $\mathsf{Mul}_{\mathbb{F}_q}$ denote a multiplication and $\mathsf{Add}_{\mathbb{F}_q}$ denote an addition over the finite field $\mathbb{F}_q$.

We will use a pseudo-random function $\mathsf{PRF}: \{0,1\}^\lambda \times \{0,1\}^* \rightarrow \mathbb{F}_{q^t}$ as an instantiation.

Local storage: The client will store the key $k \in \mathbb{F}_{q^n}$, the seed $k' \in \{0,1\}^\lambda$ of the PRF, and the parity check matrix $H_{n \times (n-k)}$ of the $[q, k, n, t]$ Gabidulin code. The parity check matrix is determined by $h_i \in \mathbb{F}_{q^t}$, $i = 1, 2, \cdots, n$, according to Eq. (1). Hence the local storage is $(n \log q + \lambda + nt \log q)$ bits.

Storage expansion rate: $\theta = \frac{|M^*|}{|M|} = \frac{n+1}{k}$.

Length of challenge: $ch \in (\mathbb{F}_q)^w$, hence $|ch| = w \log q$ bits.

Length of response: $re$ consists of a Gabidulin codeword and a tag, hence $|re| = (n+1) t \log q$ bits.

Computational complexity of the storage server: To compute $re$, the storage server will do $(n+1)tw(q-1)/q$ multiplications and $(n+1)t(w-1)(q-1)/q$ additions over $\mathbb{F}_q$ on average. Hence the total is $(n+1)tw(q-1)/q \cdot \mathsf{Mul}_{\mathbb{F}_q} + (n+1)t(w-1)(q-1)/q \cdot \mathsf{Add}_{\mathbb{F}_q}$ on average.

Computational complexity of the client: To test the consistency of $(ch, re)$, the client needs $(n-k)n$ multiplications and $(n-k)(n-1)$ additions over $\mathbb{F}_{q^t}$ to test the Gabidulin codeword, and another $tn + wt(q-1)/q$ multiplications and $t(n-1) + t(w-1)(q-1)/q$ additions over $\mathbb{F}_q$ on average. Hence the total is $(n-k)n \cdot \mathsf{Mul}_{\mathbb{F}_{q^t}} + (n-k)(n-1) \cdot \mathsf{Add}_{\mathbb{F}_{q^t}} + (tn + wt(q-1)/q) \cdot \mathsf{Mul}_{\mathbb{F}_q} + (t(n-1) + t(w-1)(q-1)/q) \cdot \mathsf{Add}_{\mathbb{F}_q}$.

Now we instantiate the PoDP/PoR scheme with security parameter $\lambda = 80$, and a $[q, k, n, t]$ Gabidulin code with $q = 2$, $k = 128$, $n = t = 256$ and $d = 129$. Here $q = 2$ means that addition over the binary field reduces to XOR.

We consider a file $M$ of size 1 GB. The number of blocks is $w = 2^{18}$.

Table 2: Instantiation of PoDP/PoR Scheme with [2, 128, 256, 256]-Gabidulin Code

Local Storage    Rate $\theta$    $|ch|$    $|re|$    Server Comput.    Client Comput.

During an interaction, the server performs $2^{33}$ bit-XORs to prepare a response $re$. To verify the consistency of the response, the client's computation is dominated by $2^{15} \cdot \mathsf{Mul}_{\mathbb{F}_{2^{256}}}$. One $\mathsf{Mul}_{\mathbb{F}_{2^{256}}}$ needs $O(256 \cdot 256) = O(2^{16})$ bit operations. Hence the client's computational complexity is $O(2^{31})$.

Consider a modular exponentiation in RSA with a 1024-bit modulus. It takes 1024 modular squarings and about 512 modular multiplications. One modular multiplication needs $O(1024 \cdot 1024) = O(2^{20})$ bit operations. So a modular exponentiation in RSA with a 1024-bit modulus has a computational complexity of $O(2^{30})$.

Therefore, to audit the storage of a 1 GB file, the computational complexity of the client and of the server in an interaction is comparable to 2 and 8 modular exponentiations with a 1024-bit modulus, respectively.
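The figures quoted for this instantiation can be recomputed from the formulas of this section. The following sketch simply evaluates those formulas for $q = 2$, $k = 128$, $n = t = 256$, $w = 2^{18}$ and $\lambda = 80$; the function and dictionary key names are hypothetical, and the numbers it prints are derived here rather than copied from Table 2.

```python
from math import log2


def scheme_costs(q, k, n, t, w, lam):
    """Evaluate the storage, communication and operation counts of Section 5."""
    log_q = log2(q)
    hit = (q - 1) / q          # probability that a uniform v_i in F_q is nonzero
    return {
        "local_storage_bits": n * log_q + lam + n * t * log_q,
        "expansion_rate": (n + 1) / k,
        "challenge_bits": w * log_q,
        "response_bits": (n + 1) * t * log_q,
        "server_mul_Fq": (n + 1) * t * w * hit,
        "server_add_Fq": (n + 1) * t * (w - 1) * hit,
        "client_mul_Fqt": (n - k) * n,
        "client_add_Fqt": (n - k) * (n - 1),
        "client_mul_Fq": t * n + w * t * hit,
        "client_add_Fq": t * (n - 1) + t * (w - 1) * hit,
    }


costs = scheme_costs(q=2, k=128, n=256, t=256, w=2 ** 18, lam=80)
for name, value in costs.items():
    print(f"{name}: {value}")

# Blocks needed for a 1 GB file: 2^33 bits / (k * t bits per block)
print("blocks:", 2 ** 33 // (128 * 256))
```

The server's addition count is roughly $2^{33}$ and the client's count of $\mathbb{F}_{2^{256}}$ multiplications is exactly $2^{15}$, in line with the text above.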

6 The Security of the Proposed PoDP/PoR Scheme

Recall that in the audit of the PoDP/PoR scheme, the audit strategy is determined by the threshold $1/q$. The audit claims success only if a fraction $\epsilon$, $\epsilon > 1/q$, of the $u$ executions of the Interaction protocol is successful.

The Unforgeability of the PoDP/PoR scheme justifies the necessity of the threshold $1/q$. We will prove that a malicious storage provider who deletes at least one block of $M^*$ returns a response $re$ that passes verification in an Interaction protocol with probability at most $1/q$.

On the other hand, the $\epsilon$-soundness of the PoDP/PoR scheme justifies the sufficiency of the threshold $1/q$. We will prove that as long as a storage provider returns a response $re$ that passes verification with probability $\epsilon > 1/q$, there exists a PPT extractor recovering the original file $M$, provided that the number of executions of the Interaction protocol is big enough.

Definition 7 A response $re$ is called valid to a challenge $ch$ if $re = \mathsf{Response}(pp, M^*, ch)$, and the challenge/response $(ch, re)$ is called a valid pair. A response $re$ is called consistent to a challenge $ch$ if $\mathsf{Verification}(pp, sk, ch, re) = 1$, and the challenge/response $(ch, re)$ is called a consistent pair.

Given a challenge $ch$, there may exist many consistent responses, but there is only one valid response, since the algorithm Response is deterministic. Hence, a consistent pair is not necessarily a valid pair.

Before the formal proof of Unforgeability and Soundness of the MRD-based PoDP/PoR scheme, we will slightly change the PoDP/PoR scheme. The key sequence $k_i = \mathsf{PRF}(k', i)$, $i = 1, 2, \cdots, w$, which is used to compute tags for blocks and to verify the consistency of a challenge/response pair, is replaced with a truly random sequence over $\mathbb{F}_{q^t}$. Then we have the following two lemmas.

Lemma 1 Suppose that $k_i$, $i = 1, 2, \cdots, w$, which are involved in the algorithms $\mathsf{Encode}(sk, pp, M)$ and $\mathsf{Verification}(pp, sk, ch, re)$ of the above PoDP/PoR scheme, are randomly chosen from $\mathbb{F}_{q^t}$ instead of setting $k_i = \mathsf{PRF}(k', i)$. Suppose that some blocks of $M^*$ are missing. In an execution of the Interaction protocol of the PoDP/PoR scheme, suppose that the challenge vector $ch = (v_1, v_2, \cdots, v_w) \in (\mathbb{F}_q)^w$ hits at least one deleted block of $M^*$, i.e., there exists an index $j$ such that $v_j \neq 0$ and the $j$-th block of $M^*$ is missing. Then any adversary $\mathcal{A}$ presents a consistent response $re = (\bar{c}_1, \bar{c}_2, \cdots, \bar{c}_n, \bar{\boldsymbol{\sigma}})$ with probability at most $1/q^n$.

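Proof: Suppose that there are $e$ deleted blocks, and let $M^*|_{un} = \left(C_{i_1} \| \boldsymbol{\sigma}_{i_1}, C_{i_2} \| \boldsymbol{\sigma}_{i_2}, \cdots, C_{i_{w-e}} \| \boldsymbol{\sigma}_{i_{w-e}}\right)^{\top}$ denote the remaining $(w - e)$ undeleted blocks. According to Eq. (2), we have

$$\boldsymbol{\sigma}_{i_l} = C_{i_l} \cdot k + k_{i_l}, \quad l = 1, 2, \cdots, w - e. \tag{4}$$

The challenge vector is $ch = (v_1, v_2, \cdots, v_w)$. For any adversary $\mathcal{A}$ who gives a consistent response $re = (\bar{c}_1, \bar{c}_2, \cdots, \bar{c}_n, \bar{\boldsymbol{\sigma}})$ satisfying $\mathsf{Verification}(pp, sk, ch, re) = 1$, let $\bar{C} = (\bar{c}_1, \bar{c}_2, \cdots, \bar{c}_n)$; then we have

$$\bar{\boldsymbol{\sigma}} = \bar{C} \cdot k + \sum_{l=1}^{w} v_l \circ k_l. \tag{5}$$

Let $\{j_1, j_2, \cdots, j_e\}$ denote the indices of the blocks missing from $M^*|_{un}$. Eq. (5) and Eq. (4) give

$$\bar{\boldsymbol{\sigma}} - \sum_{l=1}^{w-e} v_{i_l} \circ \boldsymbol{\sigma}_{i_l} = \left( \bar{C} - \sum_{l=1}^{w-e} v_{i_l} \circ C_{i_l} \right) \cdot k + \sum_{l=1}^{e} v_{j_l} \circ k_{j_l}. \tag{6}$$

Define $\tilde{\boldsymbol{\sigma}} = \bar{\boldsymbol{\sigma}} - \sum_{l=1}^{w-e} v_{i_l} \circ \boldsymbol{\sigma}_{i_l}$ and $\tilde{C} = \bar{C} - \sum_{l=1}^{w-e} v_{i_l} \circ C_{i_l}$.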

To the adversary $\mathcal{A}$ who knows the deleted version $M^*|_{un}$, forging a consistent response $re = (\bar{C}, \bar{\boldsymbol{\sigma}})$ satisfying Eq. (5) is equivalent to forging a pair $(\tilde{C}, \tilde{\boldsymbol{\sigma}})$ satisfying

$$\tilde{\boldsymbol{\sigma}} = \tilde{C} \cdot k + \sum_{l=1}^{e} v_{j_l} \circ k_{j_l}. \tag{7}$$

Fixing a pair $(\tilde{C}, \tilde{\boldsymbol{\sigma}})$, we now count how many $k$ and $k_{j_l}$, $l = 1, 2, \cdots, e$, satisfy Eq. (7). There exists at least one nonzero entry in $(v_{j_1}, v_{j_2}, \cdots, v_{j_e})$, since the challenge vector hits at least one deleted block. Without loss of generality, we assume that $v_{j_e} \neq 0$. When $k$ and $k_{j_l}$, $l = 1, 2, \cdots, e - 1$, are freely chosen, $k_{j_e}$ is uniquely determined by Eq. (7). This amounts to $q^{n + t(e-1)}$ solutions of Eq. (7). Since $k$ and $k_{j_l}$ are uniformly distributed, an adversary $\mathcal{A}$ can present a consistent response with probability $q^{n + t(e-1)} / q^{n + te} = q^{-t} \leq q^{-n}$, due to $t \geq n$. $\square$
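The counting step in the proof of Lemma 1 can be confirmed by brute force on toy parameters. The sketch below takes $q = 2$, $n = 2$, $t = 3$ and $e = 2$ deleted blocks, fixes a pair $(\tilde{C}, \tilde{\boldsymbol{\sigma}})$, and enumerates every key $k$ and every $(k_{j_1}, k_{j_2})$ to measure the fraction satisfying Eq. (7). Elements of $\mathbb{F}_{2^t}$ are stored as integers; the parameter values and names are assumptions chosen only to keep the enumeration small.

```python
from fractions import Fraction
from itertools import product


def dot_k(columns, k_bits):
    """C . k over F_2: XOR of the columns selected by the bits of k."""
    acc = 0
    for c, bit in zip(columns, k_bits):
        if bit:
            acc ^= c
    return acc


def consistent_fraction(c_tilde, sigma_tilde, v, n, t):
    """Fraction of (k, k_{j_1}, ..., k_{j_e}) satisfying Eq. (7) for q = 2."""
    e = len(v)
    hits = total = 0
    for k_bits in product((0, 1), repeat=n):
        for keys in product(range(2 ** t), repeat=e):
            mask = 0
            for v_l, k_l in zip(v, keys):
                if v_l:
                    mask ^= k_l          # sum of v_{j_l} o k_{j_l}
            total += 1
            if sigma_tilde == (dot_k(c_tilde, k_bits) ^ mask):
                hits += 1
    return Fraction(hits, total)


n, t = 2, 3
frac = consistent_fraction(c_tilde=[5, 3], sigma_tilde=6, v=(1, 1), n=n, t=t)
print(frac, frac == Fraction(1, 2 ** t))
```

Whenever at least one $v_{j_l}$ is nonzero, the measured fraction is $q^{-t}$, independent of the fixed pair, which matches $q^{n+t(e-1)}/q^{n+te}$.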

Lemma 2 Suppose that $k_i$, $i = 1, 2, \cdots, w$, are randomly chosen from $\mathbb{F}_{q^t}$ instead of setting $k_i = \mathsf{PRF}(k', i)$. In an execution of the Interaction protocol of the PoDP/PoR scheme, given a valid challenge/response pair $(ch, re)$, any adversary $\mathcal{A}$ outputs a consistent but invalid response $re'$ with respect to the same challenge $ch$, i.e., $re' \neq re$ but $\mathsf{Verification}(pp, sk, ch, re') = 1$, with probability at most $1/q^d$, where $d$ is the minimum rank distance of the MRD code.

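Proof: Let $M^* = (C_1 \| \boldsymbol{\sigma}_1, C_2 \| \boldsymbol{\sigma}_2, \cdots, C_w \| \boldsymbol{\sigma}_w)^{\top}$ be the storage file generated by $\mathsf{Encode}(sk, pp, M)$. Let $ch = (v_1, v_2, \cdots, v_w)$ be the challenge vector.

Let $re = \left( \sum_{i=1}^{w} v_i \circ c_{i1}, \cdots, \sum_{i=1}^{w} v_i \circ c_{in}, \sum_{i=1}^{w} v_i \circ \boldsymbol{\sigma}_i \right)$ be the valid response computed by $\mathsf{Response}(pp, M^*, ch)$. Define $\bar{c}_j = \sum_{i=1}^{w} v_i \circ c_{ij}$ and $\bar{\boldsymbol{\sigma}} = \sum_{i=1}^{w} v_i \circ \boldsymbol{\sigma}_i$; then $re = (\bar{c}_1, \bar{c}_2, \cdots, \bar{c}_n, \bar{\boldsymbol{\sigma}})$.

Let $re' = (\bar{c}'_1, \bar{c}'_2, \cdots, \bar{c}'_n, \bar{\boldsymbol{\sigma}}')$ be the consistent but invalid response forged by adversary $\mathcal{A}$ for challenge $ch$. The invalidity of $re'$ implies $re \neq re'$. The consistency of $re'$ requires that $\bar{\boldsymbol{\sigma}}'$ is determined by $(\bar{c}'_1, \bar{c}'_2, \cdots, \bar{c}'_n)$. Consequently, $re \neq re'$ means $(\bar{c}_1, \bar{c}_2, \cdots, \bar{c}_n) \neq (\bar{c}'_1, \bar{c}'_2, \cdots, \bar{c}'_n)$. Now we analyze the probability that the pair $(ch, re')$ is invalid but consistent. First we assume that $(\bar{c}'_1, \bar{c}'_2, \cdots, \bar{c}'_n)$ is also a Gabidulin codeword, otherwise it would have been rejected by the first step of Verification.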

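The facts $\mathsf{Verification}(pp, sk, ch, re) = 1$ and $\mathsf{Verification}(pp, sk, ch, re') = 1$ give us the following equations.

$$\bar{\boldsymbol{\sigma}} = (\bar{c}_1, \bar{c}_2, \cdots, \bar{c}_n) \cdot k + \sum_{i=1}^{w} v_i \circ k_i, \qquad \bar{\boldsymbol{\sigma}}' = (\bar{c}'_1, \bar{c}'_2, \cdots, \bar{c}'_n) \cdot k + \sum_{i=1}^{w} v_i \circ k_i. \tag{8}$$

Define $k' = \sum_{i=1}^{w} v_i \circ k_i$. We know that $k_i$ is randomly chosen from $\mathbb{F}_{q^t}$ for $i = 1, 2, \cdots, w$, hence $k'$ is also a random element in $\mathbb{F}_{q^t}$. Then Eq. (8) can be rewritten as

$$\bar{\boldsymbol{\sigma}} = (\bar{c}_1, \bar{c}_2, \cdots, \bar{c}_n) \cdot k + k', \qquad \bar{\boldsymbol{\sigma}}' = (\bar{c}'_1, \bar{c}'_2, \cdots, \bar{c}'_n) \cdot k + k'. \tag{9}$$

Define $\Delta\bar{c}_j = \bar{c}_j - \bar{c}'_j$ and $\Delta\bar{\boldsymbol{\sigma}} = \bar{\boldsymbol{\sigma}} - \bar{\boldsymbol{\sigma}}'$; then Eq. (9) is equivalent to the following two equations:

$$\bar{\boldsymbol{\sigma}} = (\bar{c}_1, \bar{c}_2, \cdots, \bar{c}_n) \cdot k + k', \qquad \Delta\bar{\boldsymbol{\sigma}} = (\Delta\bar{c}_1, \Delta\bar{c}_2, \cdots, \Delta\bar{c}_n) \cdot k. \tag{10}$$

Next we determine the number of solutions for the unknowns $k$ and $k'$. To satisfy the left equation of Eq. (10), each freely chosen value of $k$ uniquely determines the value of $k'$, and there are $q^n$ choices for $k$ in total.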

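As for the right equation of Eq. (10), we express all the entries as vectors over $\mathbb{F}_q$; it then becomes

$$\begin{pmatrix} \Delta\bar{\sigma}^{(1)} \\ \Delta\bar{\sigma}^{(2)} \\ \vdots \\ \Delta\bar{\sigma}^{(t)} \end{pmatrix} = \begin{pmatrix} \Delta\bar{c}_1^{(1)} & \Delta\bar{c}_2^{(1)} & \cdots & \Delta\bar{c}_n^{(1)} \\ \Delta\bar{c}_1^{(2)} & \Delta\bar{c}_2^{(2)} & \cdots & \Delta\bar{c}_n^{(2)} \\ \vdots & \vdots & & \vdots \\ \Delta\bar{c}_1^{(t)} & \Delta\bar{c}_2^{(t)} & \cdots & \Delta\bar{c}_n^{(t)} \end{pmatrix} \cdot \begin{pmatrix} k^{(1)} \\ k^{(2)} \\ \vdots \\ k^{(n)} \end{pmatrix} = \Delta\bar{C} \cdot \begin{pmatrix} k^{(1)} \\ k^{(2)} \\ \vdots \\ k^{(n)} \end{pmatrix}$$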