FlexList: Optimized Skip List for Secure Cloud Storage

15  Download (0)

Full text

(1)

FlexList: Optimized Skip List for Secure Cloud Storage

Ertem Esiner

Koc¸ University

Adilet Kachkeev

Koc¸ University

Alptekin K¨upc¸¨u

Koc¸ University

¨

Oznur ¨

Ozkasap

Koc¸ University

Abstract

With the increase in popularity of cloud data stor-age, efficiently proving the integrity of the data stored at an untrusted server has become significant. Authen-ticated Skip Lists and Rank-based AuthenAuthen-ticated Skip Lists (RBASL) have been used in cloud storage as a ran-domized alternative to balanced trees, without the need to perform complex balancing and restructuring opera-tions. In a dynamic file scenario, an RBASL falls short when updates are not proportional to a fixed block size; such an update to the file, however small, may translate toO(n)block updates to the RBASL, for a file withn

blocks.

To overcome this problem, in this paper we intro-duce FlexList: Flexible Length-Based Authenticated Skip List. FlexList works equally well for any type of up-date, translating an update toO(u)insertions, removals, or modifications, whereuis the size of the update. We further present various optimizations on the four types of skip lists (regular, authenticated, rank-based authenti-cated, and FlexList), focusing on the use of FlexList in the secure cloud storage setting. We obtain the optimal skip list structure without unnecessary links or nodes. Furthermore, we compute one single proof to answer multiple (non-)membership queries and obtain efficiency gains of 30% and 50% in terms of proof time and size, respectively.

1

Introduction

Data outsourcing has become quite popular in recent years both in industry (e.g., Amazon S3, Dropbox, Google Drive) and academia [3, 5, 11, 12, 17, 26, 27]. A client outsources her data to the third party data stor-age provider (server), which is supposed to keep the data intact and make it available to her. The problem is that the server may be malicious, and even if the server is trustworthy, hardware/software failures may cause data corruption. The client should be able to efficiently and securely check the integrity of her data without

down-loading the entire data from the server [3].

One such model proposed by Ateniese et al. is Prov-able Data Possession (PDP) [3] for provable data in-tegrity. In this model, the client can challenge the server on random blocks and verify the data integrity through a proof sent by the server. Although PDP and related schemes can be applied to static cases, blockwise update operations (insertion, removal, modification) show poor performance [3, 4, 11, 17, 26]. While the static scenario can be applicable to some systems (e.g., archival stor-age at the libraries), for many applications it is important to take into consideration the dynamic scenario, where the client keeps interacting with the outsourced data in a read/write manner, while maintaining the data posses-sion guarantees. Ateniese et al. [5] proposed Scalable PDP, which overcomes this problem with some limita-tions (only a pre-determined number of operalimita-tions are possible within a limited set of operations). Erway et al. [12] proposed a model called Dynamic Provable Data Possession(DPDP), which extends the PDP model and provides a dynamic solution. Implementation of the DPDP scheme requires an underlying authenticated data structure based on a skip list [25].

Authenticated skip lists were presented by Goodrich and Tamassia [15], where skip lists and commutative hashing are employed in a data structure for authenti-cated dictionaries. A skip list is a key-value store whose leaves are sorted by keys. Each node stores a hash value calculated with the use of its own fields and the hash val-ues of its neighboring nodes. The hash value of the root is the authentication information (meta data) that the client stores in order to verify responses from the server. To in-sert a new block into an authenticated skip list, one must decide on a key value for insertion since the skip list is sorted according to the key values. This is very useful if one, for example, inserts files into directories, since each file will have a unique name within the directory, and searching by this key is enough. However, when one considers blocks of a file to be inserted into a skip list, the

(2)

blocks do not have unique names; they have indices. Un-fortunately, in a dynamic scenario, an insertion/deletion would necessitate incrementing/decrementing the keys of all the blocks till the end of the file, resulting in de-graded performance. DPDP [12] employs Rank-based Authenticated Skip List (RBASL) to overcome this lim-itation. Instead of providing key values in the process of insertion, theindexvalue where the new block should be inserted is given. These indices are imaginary and no node stores any information about the indices. Thus, an insertion/deletion does not propagate to other blocks.

Theoretically, an RBASL provides dynamic updates with O(logn) complexity, assuming the updates are mul-tiples of the fixed block size. This makes the RBASL inefficient in practice, since a variable size update leads to the propagation of changes to other blocks. Therefore, one variable size update may affect O(n)other blocks. We discuss the problem in detail in Section 3. We pro-pose FlexList to overcome the problem in DPDP. With our FlexList, we use the same idea but instead of the in-dices of blocks, inin-dices of bytes of the data are used, which ensures the availability of searching, inserting, re-moving, modifying, or challenging a specific block in-cluding the byte at a specific index of the data. Since in practice a data alteration occurs starting from an index of the file, not an index of a block of the file, our DPDP with FlexList (FlexDPDP) performs much faster than the orig-inal DPDP with RBASL. In case of variable block size, DPDP mentions about the idea where the client makes updates on a range of bytes instead of blocks. Never-theless, we show that a naive implementation of the idea leads to a security gap in the storage system, as we dis-cuss in Section 5.1.

Our contributionsare as follows:

• We design and implement FlexList. Our implemen-tation uses theoptimalnumber of links and nodes. This optimization is applicable to previous skip list types.

• Our FlexList requires O(1)block updates when a variable sized update occurs while an RBASL re-quires worst-caseO(n)block updates.

• We provide multi-prove and multi-verify capabili-ties in cases where the client challenges the server for multiple blocks using authenticated skip lists, rank-based authenticated skip lists and FlexLists. Our algorithms provide an optimalproof, without any repeated items.

• We apply our FlexDPDP to secure cloud storage. The experimental results show efficiency gains of 30% in terms of proof time and 50% in terms of size.

2

Related Work

The advantage of skip list over the alternatives is that it keeps itself balanced probabilistically, without the need

for complex operations [25]. It offerslogarithmicsearch, modify, insert, and remove operations with high prob-ability [24]. Skip lists have been extensively studied [2, 6, 10, 12, 16, 19, 23]. They are used as authenti-cated data structures in outsourced network storage [16], with authenticated relational tables for database manage-ment systems [6], in timestamping systems [7, 8], in out-sourced data storages [12, 14], and for authenticating queries for distributed data of web services [23]. In a skip list, not every edge or node is used during a search or update operation; therefore those unnecessary edges and nodes can be omitted. Similar optimizations for authenticated skip lists were tested in [28]. The im-plementation of an authenticated skip list for a two-party protocol was presented in [22]. Furthermore, as observed in DPDP [12] for an RBASL, some corner nodes were eliminated to decrease the overall number of nodes. It is applicable for the regular and the authenticated skip lists as well. When the corner nodes that have noafter

(i.e., right) neighbor connected, the predecessor node’s

a f terlink is connected as the unnecessary corner node’s successor. These optimization not only results in fewer nodes in the data structure (improving space efficiency) but also fewer hash computations during the query and update stage (improving time efficiency), and results in smaller proof size (decreasing communication cost). For dynamic provable data possession in a cloud stor-age setting, Erway et al. [12] were the first to introduce the new data structureRank-based authenticated skip list (RBASL)which is a special type of authenticated skip list [16]. In contrast to an authenticated skip list in an RBASL, one can search with indices of the blocks. This gives the opportunity to efficiently check the data in-tegrity using block indices as proof and update query pa-rameters in DPDP. To employ indices of the blocks as search keys, the authors proposed to use authenticated ranks. Each node in the RBASL has a rank, indicat-ing the number of the leaf-level nodes that are reachable from that particular node. Leaf-level nodes having no

a f terlinks have a rank of 1, meaning they can be used to reach themselves only. Wang et al. [29] proposed the usage of Merkle tree [21] which works perfectly for the static scenario. A hash map of a whole file or block by block can be applied for the same scenario as in Table 1. In PDP there is no data structure used for the au-thentication of blocks, therefore it is static, except for Scalable PDP [5], which allows a limited number of up-dates. However, for the dynamic case we would need an authenticated balanced tree such as the data structure proposed in [30], called range-based 2-3 tree, yet there is no algorithm that has been previously presented for rebalancing neither a Merkle tree nor a range-based 2-3 tree with efficiently updating and maintaining authenti-cation information. Nevertheless, such algorithms have

(3)

Hash Map (whole file) Hash Map (block by block) PDP[3] Merkle Tree[29] Balanced Tree (2-3 Tree)[30] RBASL[12] FlexList

Storage (client) O(1) O(n) O(1) O(1) O(1) O(1) O(1)

Proof Complex-ity

O(n) O(1) O(1) O(logn) O(logn) O(logn) O(logn)

Dynamic (insert, remove, modify) - - - - + (balancing issues) + (fixed block size) +

Table 1: Complexity and capability table of data structures that are proposed. n: number of blocks been studied in detail for the authenticated skip list data

structure [22].

Moreover, an RBASL handles the problem with block numbers in PDP [3], which was proposed by Ateniese et al. to provide provable data integrity. In this model, there is a client who wants to outsource her file and a server that takes the responsibility for the storage of the file. The client preprocesses the file and maintains meta data to verify the proofs from the server. Then she sends the file to the server. When the client needs to check whether her data is intact or not, she challenges some blocks. Upon receipt of the request, the server gener-ates the proof for the challenges and sends it. The client verifies the data integrity of the file or rejects. Unlike in PDP, an RBASL employs ranks instead of block num-bers. Therefore, when there is an insertion or deletion of a particular block, PDP will have to change indices of consequent blocks.

Nevertheless, in a realistic scenario, the client may wish to change a part of a block, not the whole block. This can be problematic to handle in an RBASL. To partially modify a particular block in an RBASL, we not only modify a specified block but also may have to change all following blocks. So the number of modi-fications isO(n) in the worst case scenario for DPDP and PDP as well. Our proposed data structure FlexList, based on an authenticated skip list performs dynamic op-erations (modify, insert, remove) for cloud data storage, havingO(1)variable block size updates.

3

Definitions

Skip Listis a probabilistic data structure [25] presented

as an alternative to balanced trees. It is easy to implement without complex balancing and restructuring operations such as those in AVL or Red-Black trees [2, 13]. A skip list keeps its nodes ordered by their key values. We call a leaf-level node and all nodes directly above it at the same index atower.

Figure 1 demonstrates a search on a skip list. In a ba-sic skip list, the nodes includekey,level, anddata(only leaf level nodes) information, andbelowanda f terlinks (e.g.,v2.below=v3andv2.a f ter=v4). The search path

for the node with key 24 is highlighted. We start from the root (v1) and follow the link tov2, sincev1’sa f terlink

leads it to a node which has a greater key value than the key we are searching for (∞>24). Then, fromv2we

fol-low linkl1tov4, since the key value ofv4is smaller than

(or equal to) the searched key. In general, if the key of the node wherea f terlink leads is smaller or equal to the key of the searched node, we follow that link, otherwise we follow thebelowlink. Using the same decision mech-anism, we follow the highlighted links until the searched node is found at the leaf level (if it does not exist, then the node with key immediately before the searched node is returned).

It is clear that some of the links are never used in the skip list, such asl2, since any search operation with key

greater or equal to 11 will definitely follow linkl1, and

a search for a smaller key would never advance through

l2. Thus we say links, such asl2, that are not present

on any search path, are unnecessary. When we remove unnecessary links, we observe that some nodes, which are left withouta f terlinks (e.g.,v3), are also

unneces-sary since they do not provide any new dependencies in the skip list. Although it does not change the asymptotic complexity, it is beneficial not to include them for time and space efficiency. An optimized version of the skip list can be observed when dashed links and nodes are ex-cluded. Formally:

• Alinkisnecessaryif and only if it is on any search

path.

• Anodeisnecessaryif and only if it is at the leaf

level or has a necessarya f terlink.

An authenticated skip list is constructed with the use

of acollision-resistant hash function. A hash function

is a deterministic function that takes an arbitrary-length string as input, and produces a predefined-length string as the output.

All proofs in our scheme depend on the existence of a collision-resistant hash function. A collision for the function H : D→R is the existence of two inputs x1, x2

∈D such that H(x1) = H(x2) and x16=x2. A function

H is collision-resistant if, for any probabilistic polyno-mial time adversary, it is infeasible to find two such dis-tinct inputs [18]. Let||denote concatenation. Through-out our study we will use: hash(x1,x2, ...,xm) to mean

(4)

Figure 1: Regular skip list with search path of node with key 24 highlighted. Numbers on the left represent levels. Numbers inside nodes are key values. Dashed lines indicate unnecessary links and nodes.

Anauthenticated skip listkeeps a hash value in each

node. Nodes at level 0 keep links to file blocks (may link to different structures e.g., files, directories; anything to be kept intact) [16]. A hash value is calculated with the following inputs:levelandkeyof the node, and the hash values of the nodea f terand the nodebelow. Through the inputs to the hash function, all nodes are dependent on their after and below neighbor. Thus, the root node is dependent on every leaf node, and due to the collision resistance of the hash function, knowing the hash value of the root is sufficient for a later integrity check. Note that if there is no nodebelow, the data or a function of the

data(which we will calltagin the following sections) is used instead of the hash of thebelowneighbor.

A rank-based authenticated skip list (RBASL) is

different than an authenticated skip list by means of how it indexes the data [12]. An RBASL hasrank informa-tion (used in hashing instead of thekeyvalue), meaning how many nodes are reachable from that node, and is ca-pable of performing all operations that an authenticated skip list can in this context.

Figure 2: Skip list alterations depending on an update request.

3.1

FlexList

An RBASL is meant to be used with fixed size blocks since a search (consequently insert, remove, modify) by index of data is not possible with rank information. For

example, Figure 2-A represents the outsourced file di-vided into blocks of fixed size. In our example, the client wants to change “brown” in the string “The quick brown fox jumps over the lazy dog...” with “red” and herdiff al-gorithm returns [delete from index 11 to 15] and [insert “red” from index 11 to 13]. Apparently, a modification to the 3rd block will occur. With a rank-based skip list,

to continue functioning properly, a series of updates is required as shown in Figure 2-B which asymptotically corresponds toO(n)alterations. Otherwise, the begin-ning and the ending indices of each block will be com-plicated to compute, requiring O(n)time to translate a diff algorithm output to a block modification. This issue is discussed further in Section 5. With our FlexList, only one modification suffices as indicated in Figure 2-C.

Due to the lack of providing variable block sized oper-ations with an RBASL, we present FlexList which over-comes the problem and serves our purposes in cloud data storage. A FlexList stores, at each node, how manybytes

can be reached from that node, instead of how many

blocksare reachable. Therankof each leaf-level node is computed as the sum of thelengthof itsdataand therank

of thea f ternode (0 ifnull). Thelengthinformation of each data block is added as a parameter to the hash calcu-lation of that particular block. We discuss the insecurity of an implementation that does not include the length in-formation in the hash function calculation in Section 5.1. Note that when the length of the data at each leaf are considered as a unit, the FlexList reduces to an RBASL (thus, ranks only count the number of reachable blocks). For additional information see Appendix A.

4

FlexList-based Dynamic Secure

Cloud Storage

In this section, we describe the application of our FlexList to integrity checking in secure cloud storage systems. There are two main parties: a client and a server. The cloud server stores a file on behalf of the client. Erway et al. [12] showed that an RBASL can be created on top of the outsourced file to provide proofs of integrity (see Figure 3). The main advantage of a FlexList compared to an RBASL is that data bytes are easily reachable even though the data blocks are variable sized. Thus it is faster than an RBASL sinceO(u)

(5)

oper-Figure 3: Client Server interactions in FlexDPDP. ations are needed when a variable sized update of length

uis conducted, whereasO(n)operations are needed with an RBASL wherenis the total number of blocks in the file, as discussed in Section 3. The leaves of the FlexList represent file blocks, and the path from a leaf to the root roughly represents a proof of membership (i.e., in-tegrity). We focus on optimized proof generation and verification mechanisms, where multiple blocks can be challenged at once. The following are the definitions of crucial algorithms for secure cloud storage [12]:

• Challenge is a probabilistic function run by the client to request a proof of integrity for randomly selected blocks. In our scheme, the client generates a random seed and the number of selected blocks

numto send to the server.

• Proveis run by the server in response to a challenge. Using the seed that he receives, the server generates a challenge vector andsortsit inascendingorder, together with a random-value vector, both of length

num. Using thegenMultiProofalgorithm with these inputs, the server generates a proof using FlexList, adds tags of the challenged blocks, and calculates a block sum (Algorithm 4.2). Then he sends these to the client.

• Veri f yis a function run by the client upon receipt of the proof. The client uses theverifyMultiProof

algorithm with the challenge vector that she already has, the proof vector, the tag vector, and the block sum that she received from the server as inputs (Al-gorithm 4.3). The function consists of two stages: verification of tags, and skip list proof verification. If both of the stages are verified successfully, it

out-putsaccept, otherwisereject.

• prepareU pdateis a function run by the client when she changes some part of her data. She sends the up-date information to the server (i.e., operation, index, and data).

• per f ormU pdateis run by the server in response to

an update request. The response consist of three phases, a proof generation for the parts to be altered, an update operation to the FlexList, and a second proof generation for the parts that are altered. • veri f yU pdate is run by the client upon receipt of

the update proof. The Client first checks if the data was intact before the update by verifying the first proof. Then the client applies the update locally on top of the the first proof and computes the new meta data value. Using the new meta data value, she verifies the second proof to see if the server has performed the update correctly. If both proofs are verified it returnsaccept, otherwisereject.

The FlexDPDP scheme uses homomorphic verifiable tags(as DPDP); multiple tags can be combined to ob-tain a single tag that corresponds to combined blocks [4]. Tags are small compared to data blocks, enabling storage in memory. Authenticity of the skip list guarantees in-tegrity of tags, and tags protect the inin-tegrity of the data blocks.

4.1

Preliminaries

Before providing optimized proof generation and veri-fication algorithms, we introduce essential methods used in our algorithms, to determine intersection nodes, search multiple nodes, and update rank values. Table 2 shows the notation used in this section.

isIntersection: This function is used whensearchMulti

checks if a given node is an intersection. A node is an intersection point of proof paths of two indices when the first index can be found following thebelowlink and the second index is found by following thea f ter link (re-member that the indices are in ascending order). There are two conditions for a node to be called an intersection node.

• The current node follows thebelowlink according to the index we are building the proof path for. • The current node needs to follow thea f terlink to

reach thelastthelement of challenged indices in the vectorC.

If one of the above conditions is not satisfied, then there is no intersection, and the method returnsfalse. Other-wise, it decrementslastand continues trying until it finds a node which cannot be found by following thea f terlink and returnslast0 (to be used in the next call of

isInter-section) andtrueas the current nodecnis an intersection

point. Note that this method directly returnsfalseif there is only one challenged index.

searchMulti (Algorithm 4.1) : This algorithm is used

ingenMultiProof to generate the proof path for multiple nodes without unnecessary repetitions of proof nodes. Figure 4, where we challenge the node at the index 450,

(6)

Symbol Description

r Rank

hash Hash value of a node

p Proof node

rs Rank state, indicates byte count to the left from current node and used to recoverivalue when roll-back to a state is done

state State, created in order to store from which node the algorithm will continue, contains a node, rank state, and last index

C Challenged indices vector, in ascending order

V Verify challenge vector, used to verify if the proof belongs to challenged blocks, in terms of indices

M Block sum

P Proof vector, stores proof nodes for all challenged blocks

T Tag vector of challenged blocks

t Current index in tag vector

start Start index from whichupdateRankSumshould start

end,last Index values inupdateRankSumandisIntersection

ts Intersection stack, stores states at intersections insearchMultialgorithm

th Intersection hash stack, stores hash values to be used at intersections

ti Index stack, stores pairs of two integer values, employed inupdateRankSum

Table 2: Symbols used in our algorithms. clarifies how the algorithm works. Our aim is to provide

the proof path for the challenged node. We assume that in the search, the current nodecnstarts at the root (w1

in our example). Therefore, initially the search indexiis 450, the rank statersandf irstare zero, the proof vector

Pand intersection stacktsare empty.

Each proof nodepis created using the current node’s

levelandrankvalues. There are two common scenarios for setting hash andrgtOrDwnvalues:

(1) When the current node goesbelow, we set the hash of the proof node to the hash of the current node’s

a f terand itsrgtOrDwnvalue todwn.

(2) When the current node goesa f ter, we set the hash of the proof node to the hash of the current node’s

belowand itsrgtOrDwnvalue torgt.

Forw1, a proof node is generated using scenario (1),

where p.hash = v1.hash and p.rgtOrDwn = dwn. For

w2, the proof node is created as described in scenario

(2) above, where p.hash = v2.hash and p.rgtOrDwn =

rgt. The proof node forw3is created using scenario (2).

Forw4andw5, proof nodes are generated as in scenario

(1). The last nodec1is the challenged leaf node, and the

proof node for this node is also created as in scenario (1). Note that in the second, third, and fifth iterations of the while loop, the current node is moved to a sub skip list (at Line 13 in Algorithm 4.1). Lines 4-5 and 6-7 in Algo-rithm 4.1 are crucial for generation of proof for multiple blocks. We discuss them later in this section.

updateRankSum: This algorithm, used in

verifyMulti-Proof, is given as input the rank difference, the verify challenge vectorV, and indices start and end (onV). The output is a modified version of the verify challenge vectorV0. The procedure is called when there is a

tran-sition from one sub skip list to another (larger one). The method updates entries starting fromstartthtoendth in-dices by rank difference, where rank difference is the size of the larger sub skip list minus the size of the smaller sub skip list.

4.2

Multi-Prove and Multi-Verify

Client server interaction (Figure 3) starts with the client pre-processing her data (creating a FlexList for the file and calculating tags for each block of the file). The client sends the random seed she used for generating the FlexList to the server along with a public key, data, and tags. Using the seed, the server constructs a FlexList over the blocks of the data and assigns tags to leaf-level nodes. For the client to verify that the server constructed the correct FlexList, the server sends the hash value of the root to the client. When the client checks and verifies that the hash of the root value is the same as the one she had calculated, she can safely remove her data and the FlexList, and keeps the root value as meta data for later use in the proof verification mechanism. Before proceed-ing with the proof generation, we show how to compute a tagTand a block sumM.

We use an RSA group Z∗N, where N =pqis the product of two large prime numbers, andgis a high-order ele-ment∈Z∗N. It is important that the server does not know

pandq. The tagTof a blockmis computed asT =gm

modN. The block sum is computed as M=

|C| ∑

i=0

aimCi whereCis the challenge vector containing block indices andaiis the random value for theithchallenge.

genMultiProof (Algorithm 4.2): Upon receipt of the

(7)

Figure 4: Proof path for challenged index 450 in a FlexList. challenge vectorCand random valuesAaccordingly and

runs thegenMultiProof algorithm in order to get tags, file blocks, and the proof path for the challenged indices. The algorithm searches for the leaf node of each chal-lenged index and stores all nodes across the search path in the proof vector. However, we have observed that reg-ular searching for each particreg-ular node is inefficient. If we start from the root for each challenged block, there will be a lot of replicated proof nodes. In the example of Figure 4.2, if proofs were generated individually, w1,w2

and w3would be replicated 4 times, w4and w53 times,

and c32 times. To overcome this problem we save states

at each intersection node. In ouroptimalproof, only one proof node is generated for each node on any proof path.

Algorithm 4.1: searchMulti Algorithm

Input:cn,C,f irst,last,rs,P,ts Output:cn,P,ts

// Index of the challenged block is calculated according to the current sub skip list root

i=Cf irst−rs

1

// Create and put proof nodes on the search path of the challenged block to the proof vector

whileUntil challenged node is includeddo

2

p= new proof node withcn.levelandcn.r

3

// End of this branch of the proof path is when the current node reaches the challenged node

if cn.level = 0 and i ¡ cn.lengththen

4

p.setEndFlag();p.length=cn.length

5

//When an intersection is found with another branch of the proof path, it is saved to be continued again, this is crucial for the outer loop of ‘‘multi’’ algorithms

if isIntersection(cn, C, i, lasti,rs)then

6

//note that lasti becomeslasti+1 in isIntersection method

p.setInterFlag();state(cn.a f ter,lasti,

7

rs+cn.below.r)is added tots// Add a state for

cn.a f ter to continue from there later // Missing fields of the proof node are set according to the link current node follows

if (CanGoBelow(cn, i))then

8

p.hash =cn.a f ter.hash;p.rgtOrDwn=dwn

9

cn=cn.below//unless at the leaf level

10

else

11

p.hash =cn.below.hash;p.rgtOrDwn=rgt

12

// Set index and rank state values according to how many bytes at leaf nodes

are passed while following the a f ter link

i-=cn.below.r;rs+=cn.below.r;cn=cn.a f ter

13

pis added toP

14

We explain Algorithm 4.2 using Figure 5 and notations in Table 2. By taking the index array of challenged nodes as input (challenge vectorCcontains [170,320,470,660]

Algorithm 4.2: genMultiProof Algorithm

Input:C, A

Output:T,M,P

Let C= (i0, . . . ,ik) where ij is the (j+1)th challenged

index; A= (a0, . . . ,ak) where aj is the (j+1)th

random value; statem= (nodem,lastIndexm, rsm)

cn=root;rs= 0;M= 0;ts,PandTare empty; state(root, k,

1

rs) added tots

// Call searchMulti method for each challenged

block to fill the proof vector P

for i=0to kdo

2

state=ts.pop()

3

cn=searchMulti(state.node,C,i,state.end,state.rs,P,ts)

4

// Store tag of the challenged block and compute the block sum

cn.tagis added toTandM+=cn.data*ai

5

Figure 6: Proof vector for Figure 5 example. in the example), thegenMultiProofalgorithm generates the proofP, collects tags in the tag vectorT, calculates the block sumM at each step, and returns all three. We start traversing from the root (w1in our example) by

re-trieving it from the stackts at Line 3 of Algorithm 4.2.

Then, in the loop, we callsearchMulti, which returns proof nodes forw1,w2,w3andc1. The state of nodew4

was saved in the stacktsas it is thea f terof an

intersec-tion node, and theintersectionflag for proof node forw3

was set. Note that proof nodes at the intersection points store no hash value. The second iteration starts fromw4,

which is the last saved state, new proof nodes forw4,w5

andc2are added to the proof vectorP, whilec3is added

to the stack ts. The third iteration starts from c3 and

searchMulti returnsP, after adding c3 to it. Note that

w6is added to the stack. In the last iteration,w6andc4

(8)

Figure 5: Multiple blocks are challenged in a FlexList. is empty, the loop is over. Note that all proof nodes of

challenged indices have theirendflags and length values are set (Line 5 of Algorithm 4.1). WhengenMultiProof

returns, the output proof vector should be as in Figure 6. We use this vector inverifyMultiProofalgorithm.

verifyMultiProof (Algorithm 4.3):There are two steps

in the verification process: tag verification and skip list verification.

Tag verificationis done as follows: Upon receipt of the

tag vectorT and the block sumM, the client calculates

tag=

|C| ∏

i=0

Tai

i modNand accepts ifftag=gMmod N.

FlexList verificationinvolves calculation of hashes for

the proof vectorP. The hash for each proof node can be calculated in different ways as described below using the example from Figure 5 and Figure 6.

The hash calculation always has thelevelandrank val-ues stored in a proof node as its first two arguments.

• If a proof node is marked asendbutnot intersec-tion(e.g., c4, c2, and c1), this means the

correspond-ing node was challenged (to be checked against the challenged indices later), and thus its tag must ex-ist in the tag vector. We compute the correspond-ing hash value uscorrespond-ing that tag, the hash value stored in the proof node (null for c4since it has noa f ter

neighbor, the hash value of v4for c2, and the hash

value of v3 for c1), and the corresponding length

value (110 for c4, 80 for c2and c1).

• If a proof node is not marked andrgtOrDwn=rgt

or level=0 (e.g., w6, w2), this means the a f ter

neighbor of the node is included in the proof vector and the hash value of itsbelow is included in the associated proof node (if the node is at leaf level, the tag is included instead). Therefore we compute the corresponding hash value using the hash value stored in the corresponding proof node and the pre-viously calculated hash value (hash of c4is used for

w6, hash of w3is used for w2).

• If a proof node is marked asintersection andend

(e.g., c3), this means corresponding node was both

challenged (thus its tag must exist in the tag vec-tor) and it is on the proof path of another challenged node, therefore itsa f terneighbor is also included

in the proof vector. We compute the correspond-ing hash value uscorrespond-ing the correspondcorrespond-ing tag from the tag vector and the previously calculated hash value (hash of w6for c3).

• If a proof node is marked asintersectionbutnot end

(e.g., w5and w3), this means the node was not

chal-lenged but both its a f ter andbelow are included in the proof vector. Hence, we compute the corre-sponding hash value using the previously calculated two hash values (the hash values calculated for c2

and for c3, respectively, are used for w5, and the

hash values calculated for c1 and for w4,

respec-tively, are used for w3).

• If none of the above is satisfied, this means a proof node has onlyrgtOrDwn=dwn(e.g., w4and w1),

meaning thebelowneighbor of the node is included in the proof vector. Therefore we compute the cor-responding hash value using the previously calcu-lated hash value (hash of w5 is used for w4, and

hash of w2is used for w1) and the hash value stored

in the corresponding proof node.

We treat the proof vector (Figure 6) as a stack and do necessary calculations as discussed above. The calcula-tion of hashes is done in the reverse order of the proof generation in genMultiProof algorithm. Therefore, we perform the calculations in the following order: c4, c6,

c3, c2, w5,. . .until the hash value for the root (the last

element in the stack) is computed. Observe that to com-pute the hash value for w5, the hash values for c3and

c2are needed, and this reverse ordering always satisfies

these dependencies. As another example, the hash com-putation of w3needs the hash values computed for c1and

w4, but they will already be computed when it is the turn

to compute the hash of w3. Finally, we compute the

cor-responding hash values for w2and w1. When the hash

for the last proof node of the proof path is calculated, it is compared with the meta data that the client possesses (in Line 22 of Algorithm 4.3).

The checks above make sure that the nodes whose proofs were sent are indeed in the FlexList that corre-spond to the meta data at the client. But, the client also has to make sure that the server indeed proved storage of the data that she challenged, since the server may have

(9)

lost those blocks but instead may be proving storage of some other blocks at different indices. To prevent this, the verify challenge vector, which contains the start in-dices of the challenged nodes (150,300,450, and 460 in our example), is generated by the rank values included in the proof vector (in Lines 5,9,10,13,14, and 18 of Al-gorithm 4.3). With the start indices and the lengths of the challenged nodes given, we check if each challenged index is included in a node that the proof is generated for (as shown in Line 22 of Algorithm 4.3). For instance, we know that we challenged index 170, c1 starts from 150

and is of length 80. We check if 0<170−150<80. Such a check is performed for each challenged index and each proof node with anendmark.

Algorithm 4.3: verifyMultiProof Algorithm

Input:C,P,T,MetaData

Output:acceptorreject

Let P = (A0, . . . ,Ak), where Aj= (levelj, rj, hashj,

rgtOrDwnj, isInterj, isEndj, lengthj) for j=0, . . . ,k; T = (tag0, . . . ,tagn), where tagm= tag for challenged

blockm for m=0, . . . ,n;

start=n;end=n;t=n;V=0; hash=0; hashprev=0;

1

startTemp=0;thandtiare empty stacks

// Process each proof node from the end to calculate hash of the root and indices of the challenged blocks

for j=k to0do

2

if isEndjand isInterjthen

3

hash =hash(levelj,rj,tagt,hashprev,lengthj);

4

decrement(t)

updateRankSum(lengthj,V,start,end);

5

decrement(start)// Update index values of challenged blocks on the leaf level of current part of the proof path

else ifisEndjthen

6

if t6=nthen

7

hashprevis added toth

8

(start,end)is added toti

9

decrement(start);end=start

10

hash =hash(levelj,rj,tagt,hashj,lengthj);

11

decrement(t)

else ifisInterjthen

12

(startTemp,end) =ti.pop()

13

updateRankSum(rprev,V,startTemp,end)// Last

14

stored indices of challenged block are updated to rank state of the current intersection

hash =hash(levelj,rj,hashprev,th.pop())

15

else if rgtOrDwnj=rgt or levelj=0then

16

hash =hash(levelj,rj,hashj,hashprev)

17

updateRankSum(rj−rprev,V,start,end)//

18

Update indices of challenged blocks, which are on the current part of the proof path

else

19

hash =hash(levelj,rj,hashprev,hashj)

20

hashprev= hash;rprev=rj

21

//endnodes is a vector of proof nodes marked as

End in the order of appearance in P

if∀a, 0≤a≤n ,0≤Ca−Va<endnodesn−a.length OR hash

22 6 =MetaDatathen returnreject 23 returnaccept 24

4.3

Verifiable Dynamic Updates with

vari-able block size

The main purpose of the insert, remove and modify op-erations (update opop-erations) of our FlexList being em-ployed in the cloud setting is that we want the update

operations to be verifiable. The purpose of the following algorithms is to verify the update operation and compute new meta data to be stored at the client through the proof sent by the server.

verifyModify(C,P,T,M,Pnew,Tnew,Mnew,tag,data,

MetaData)→ {acceptorreject,MetaData0}(Algorithm 4.4): Theveri f yModi f yalgorithm is run at the client to approve the modification. The server sends two proof vectors for the given block to the client, one for before the modification and one for after. The client alters the last element of the old proof vector and calculates temp meta data accordingly. Later she calculates the meta data value for the new proof vector and compares it with the temp meta data. If they are the same then modification is accepted, otherwise rejected.

Algorithm 4.4: verifyModify Algorithm

Input:C,P,Pnew,T,Tnew,M,Mnew,tag,data,MetaData Output:acceptorreject,MetaData0

Let C= (i0) where i0 is the modified index; P

= (A0, . . . ,Ak), whereAj= ( levelj, rj, hashj,

rgtOrDwnj, isInterj, isEndj, lengthj)for j=0, . . . ,k; T = (tag0), wheretag0 is tag for block0 before

modification; P, T, M and Pnew, Tnew, Mnew are

the proof, tags and block sum before and after

the modification respectively; tag and data are

the new tag and data of the modified block

if !VerifyMultiProof(C, P, T , MetaData)then

1 returnreject; 2 else 3 i= size(P) - 1 4

hash =hash(Ai.level, Ai.rank - Ai.length +data.length,

5

tag, Ai.hash,data.length)

// Calculate hash values until the root of the Flexlist

NewMetaData = calculateRemainingHashes( i-1, hash,

6

data.length - Ai.length,P)

ifVerifyMultiProof(C, Pnew, Tnew, NewMetaData)then

7 Metadata= NewMetaData 8 returnaccept 9 else 10 returnreject 11

verifyInsert(C,P,T,M,Cnew,Pnew,Tnew,Mnew,tag,

data,level,MetaData)→ {acceptorreject,MetaData0} (Algorithm 4.5) is run to verify the correct insertion of a new block to the FlexList, using two proofs sent by the server. Proof vectorPis generated for the intended left neighbor of the node to be inserted and proof vector

Pnewis generated after insertion, for the node which is

in-serted. It calculates the new meta data using the proofP

as if the new node has been inserted in it. The inputs are the challenged block indices, proofs, tags, block sums, and the new block information. The output is acceptif the new root calculated is equal to the root from Pnew,

otherwisereject.

The algorithm is explained using Figure 7 as an ex-ample where a verifiable insert at index 450 occurs. The algorithm starts with the computation of the hash values for the proof node n3ashashTowerat Line 5 and v2as

hash at Line 7. Then the loop handles all proof nodes until the intersection point of the newly inserted node n3

(10)

Figure 7: Verifiable insert example. calculates the hash value for v1ashash. The second

it-eration yields a newhashTowerusing the proof node for d2. The same happens for the third iteration but using the

proof node for d1. Then the hash value for the proof node

c3is calculated ashash, and the same operation is done

for c2. The hash value for the proof node m1

(intersec-tion point) is computed by takinghashandhashTower. The rest is simple, the algorithm calculates all remaining hash values until the root. The last hash value computed is the hash of the root, which is the new meta data. If

Pnew is verified using the new meta data then the meta

data stored at the client is updated.

Algorithm 4.5: verifyInsert Algorithm

Input:C,P,T,M,Cnew,Pnew,Tnew,Mnew,tag,data,level,

MetaData

Output:acceptorreject,MetaData0

Let C= (i0)where i0 is the index of the left

neighbor; Cnew= (i1) where i1 is the index of the

new node; P = (A0, . . . ,Ak), where Aj= ( levelj, rj,

hashj, rgtOrDwnj, isInterj, isEndj,lengthj) for

j=0, . . . ,k; T = (tag0)where tag0 is for precedent

node of newly inserted node; P, T, M and Pnew,

Tnew, Mnew are the proof, tags and block sum

before and after the modification respectively;

tag, data and level are the new tag, data and level of the inserted block

if !VerifyMultiProof(C, P, T , MetaData)then

1

returnreject;

2

else

3

i = size(P) - 1; rank = Ai.length; rankTower = Ai.rank

-4

Ai.length +data.length

hashTower =hash(0, rankTower,tag, Ai.hash,

5

data.length)

if level6=0then

6

hash =hash(0, Ai.length,tag0, 0);

7

decrement(i)

8

while Ai.level6=level or (Ai.level = level and

9

!Ai.rgtOrDwn = rgt)do if Ai.rgtOrDwn = rgtthen

10

rank += Ai.rank - Ai+1.rank

11

// Ai.length is added to hash

calculation if Ai.level =0

hash =hash(Ai.level, rank, Ai.hash, hash)

12

else

13

rankTower += Ai.rank - Ai+1.rank

14

hashTower =hash(Ai.level, rankTower,

15

hashTower, Ai.hash)

decrement(i)

16

hash =hash(level, rank + rankTower, hash, hashTower)

17

NewMetaData= calculateRemainingHashes(i, hash,

18

data.length,P)

if VerifyMultiProof(Cnew, Pnew, Tnew, NewMetaData)

19 then MetaData = NewMetaData 20 returnaccept 21 returnreject 22

verifyRemove (C, P, T, M, Pnew, Tnew, Mnew,

Meta-Data)→ {acceptorreject,MetaData0}(Algorithm 4.6) is run to verify the correct removal of a block in the FlexList, using the two proofs sent by the server. Proof vectorPis generated for the left neighbor and the node to be deleted. Proof vector Pnew is generated after the

remove operation, for the former left neighbor of the re-moved node. It calculates the new meta data using the proofPas if the node has been removed. The inputs are the proofs, tags, block sums, and the new block informa-tion. The output is acceptif the new root calculated is equal to the root fromPnew, otherwisereject.

Algorithm 4.6: verifyRemove Algorithm

Input:C,P,T,M,Pnew,Tnew,Mnew,MetaData Output:acceptorreject,MetaData0

Let C= (i0,i1) where i0, i1 are the index of the left

neighbor and the removed index respectively; Cnew= (i1); P = (A0, . . . ,Ak), where Aj= (levelj, rj,

hashj, rgtOrDwnj, isInterj, isEndj, lengthj) for

j=0, . . . ,k; T = (tag0,tag1)where tag1 is tag value

for deleted node and tag0 is for its precedent

node ; P, T, Mand Pnew, Tnew, Mnew are the

proof, tags and block sum before and after modification respectively;

if !VerifyMultiProof(C, P, T , MetaData)then

1

returnreject

2

else

3

dn = size(P) - 1; i = size(P) - 2; last = dn

4

while !Ai.isEnddo

5

decrement(i)

6

rank = Adn.rank; hash =hash(0, rank, tag0, Adn.hash,

7

Adn.length)

decrement(dn)

8

if!Adn.isEnd or !Ai.isInterthen

9

decrement(i)

10

while!Adn.isEnd or !Ai.isInterdo

11

ifAi.level<Adn.level or Adn.isEndthen

12

rank += Ai.rank - Ai+1.rank

13

// Ai.length is added to hash

calculation if Ai.level =0

hash =hash(Ai.level, rank, Ai.hash, hash)

14

decrement(i)

15

else

16

rank += Adn.rank - Adn+1.rank

17

hash =hash(Adn.level, rank, hash, Adn.hash)

18

decrement(dn)

19

decrement(i)

20

NewMetaData= calculateRemainingHashes(i, hash,

21

Alast.length,P)

ifVerifyMultiProof(Cnew, Pnew, Tnew, NewMetaData)then

22 MetaData = NewMetaData 23 returnaccept 24 returnreject 25

The demonstration of the algorithm will be discussed through the example in Figure 8 where a verifiable re-move occurs at index 450. The algorithm starts by plac-ing iteratorsianddnat the position of v2(Line 6) and

(11)

Figure 8: Verifiable remove example. d4(Line 4) respectively. At Line 7, the hash value (hash)

for the node v2is computed using the hash information at

d4.dnis then updated to point at node d3at Line 8. The

loop is used to calculate the hash values for the newly added nodes in the FlexList using the hash information in the proof nodes of the deleted nodes. The hash value for v1 is computed by using hashin the first iteration.

The second and third iterations of the loop calculate the hash values for m2and m1by using hash values stored at

the proof nodes of d3and d2respectively. Then the hash

calculation is done for c3by using the hash of m1. After

the hash of c2is computed using the hash of c3, the

al-gorithm calculates the hashes until the root. The hash of the root is the new meta data. IfPnewis verified using the

new meta data then the meta data at the client is updated.

5

Analysis

5.1

Security Analysis

Note that within a proof vector, all nodes which are marked with the end flag “E” contain the length of their associated data. These values are used to check if the proof in the process of verification is the proof of the block corresponding to the challenged index. A careless implementation may not consider the authentication of the length values. To show the consequence of not au-thenticating the length values, we will use Figure 5 and Figure 6 as an example.

The scenario starts with the client challenging the server on the indices{170, 400, 500, 690} that corre-spond to nodes c1, v4, c3, and c4respectively. The server

finds out that he does not possess v4anymore, and

there-fore instead of that node, he will try to deceive the client by sending a proof for c2. The proof vector will just be

the same as the proof vector illustrated in Figure 6 with a slight change done to deceive the client. The change is done to the fourth entry from the top (the one corre-sponding to c2): Instead of the original length 80, the

server puts 105 as the length of c2. The verification

al-gorithm at the client side will accept this fake proof as follows:

• The block sum value and the tags get verified since both are prepared using genuine tags and blocks of the actual nodes.

• The client cannot realize that the data of c2counted

in the block sum is not 105 bytes, but 80 bytes in-stead. This is because the largest challenged data (the data of c4of length 110 in our example) hides

the length of the data of c2.

• Since the proof vector contains genuine nodes (though not necessarily all the challenged ones), when the client usesverifyMultiProof algorithm on the proof vector from Figure 6, the check on Line 22 (Algorithm 4.3), “hash 6= MetaData” will be passed.

• The client also checks that the proven nodes are the challenged ones by comparing the challenge indices with the reconstructed indices by “∀a, 0≤a≤ n , 0≤Ca−Va<endnodesn−a.length” (Algorithm

4.3 on Line 22). This check will also be accepted because:

– c1is claimed to start at index 150 and contain

80 bytes, and hence includes the challenged index 170 (verified as 0<170−150<80).

– c2is claimed to start at index 300 and contain 105bytes, and hence includes the challenged index 400 (verified as 0<400−300<105).

– c3is claimed to start at index 450 and contain

100 bytes, and hence includes the challenged index 500 (verified as 0<500−450<100).

– c4is claimed to start at index 640 and contain

110 bytes, and hence includes the challenged index 690 (verified as 0<690−640<110).

Solutions: There are two possible solutions. We either

may include the authenticated rank values of the right neighbors of the end nodes to the proofs, or use the length of the associated data in the hash calculation of the leaf nodes. We choose the second solution, which is authenti-cating the length values, since adding the neighbor node to the proof vector also adds a tag and a hash value, for each challenged node, to the communication cost.

All rank values, which are used in the calculation of the start indices of the challenged nodes, are used in hash calculations as well. Therefore, bothlengthandrank val-ues contribute in the calculation of the hash value of the root. To deceive the client, the adversary should fake the rank or length value of at least one of the proof nodes. If the adversary sends a verifying proof vectorPfor any node other than the challenged ones, we can break the

(12)

collision resistance of the hash function used using a sim-ple reduction.

5.2

Performance Analysis

We have developed a prototype implementation of an op-timized FlexList (on top of our opop-timized skip list and authenticated skip list implementations). We used C++ and employed some methods from the Cashlib library [20, 9]. The experiments were conducted on a 64-bit ma-chine with a 2.4GHz Intel 4 core CPU (only one core is active), 4GB main memory and 8MB L2 cache, running Ubuntu 12.10. As security parameters, we used 1024-bit RSA modulus, 80-bit random numbers, and SHA-1 hash function, overall resulting in an expected security of 80-bits. 256 512 1024 2048 4096 8192 16384 32768 65536 131072 0 50 100 150 200 250

Block Size [Bytes]

Server Time [ms]

16 MB File 160 MB File 1600 MB File

Figure 9: Server time for 460 random challenges as a function of block size (block count x10 for 160MB, x100 for 1600MB). 101 102 103 104 105 106 0 0.1 0.2 0.3 0.4 0.5 Number of blocks Time [ms] Insert VerInsert Strong VerInsert Modify VerModify Strong VerModify Remove VerRemove Strong VerRemove

Figure 10: Performance evaluation of FlexList methods and their verifiable versions.

We have tested the server computation time in the DPDP setting as a function of the block size by fixing the file size to 16MB, 160MB, and 1600MB. All our re-sults are the average of 10 runs. As shown in Figure 9, with the increase in block size, the time required for the proof generation increases, since with a higher block

size, the block sum generation takes more time. Inter-estingly though, with extremely small block sizes, the number of nodes in the FlexList become so large that it dominates the proof generation time. Since 2KB block size seemed to be optimal for various file sizes, our other tests employ 2KB blocks (except for performance gain graph).

We have also tested the basic functions of the FlexList (insert,modi f y,remove) against their verifiable versions, as shown in Figure 10. The regularinsertmethod takes more than any other method, since it needs extra time for the memory allocations. Theremovemethod takes less time than themodi f ymethod, because there is no I/O de-lay and at the end of theremovealgorithm there are less nodes that need recalculation of the hash and rank values. As expected, the complexity of the FlexList operations increases logarithmically. A weak verifiable version of an update is the proof generated before the update and the update itself. A strong verifiable version addition-ally needs the proof for the altered block after the up-date has been applied. The client not only upup-dates her meta data but also verifies if it is correctly updated at the server. The guarantee, of having same meta data both at the client and the server, is achieved with the additional server time and the communication cost. The weak ver-ifiable versions of the functions require an average over-head of 0.05 ms and the strong verifiable versions need 0.1 ms for a single run.

1 2 3 4 5 6 7 8 9 10 x 104 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2x 10 5 Number of blocks

Number of links or nodes

Optimal number of nodes Optimal number of links Standard number of nodes Standard number of links

Figure 11: The number of nodes and links used on top of leaf level nodes, before and after optimization.

Our optimization, removing unnecessary links and nodes, ends up with 50% less nodes and links on top of the leaf nodes, which are always necessary since they keep the file blocks. Figure 11 shows the number of links and nodes used before and after optimization. As FlexList is a randomized data structure, so our results are the average of 10 runs. The expected number of nodes in a regular skip list is 2n[25] (wherenrepresents the num-ber of blocks):nleaf nodes andnnon-leaf nodes. Each

(13)

non-leaf node makes any left connection below its level unnecessary as described in Section 3. So that we have

nunnecessary links: half at the leaf level and half at the non-leaf level, since in the skip list half of all nodes and links are at the leaf level. Since there aren/2 non-leaf un-necessary links, it means that there aren/2 non-leaf un-necessary nodes as well, according to unun-necessary node definition (Section 3). Hence, there are n−n/2=n/2 non-leaf necessary nodes. Since each necessary node has 2 links, in total there are 2∗n/2=nnecessary links above the leaf level. Therefore, in Figure 11, there is an overlap between the standard number of non-leaf nodes (n) and the optimal number of the non-leaf links (n).

0 50 100 150 200 250 300 350 400 450 0 0.5 1 1.5 2 2.5 3 Number of challenges Ratio

Total Gain (time) Total Gain (size) Skip List Gain (time) Skip List Gain (size)

Figure 12: Performance gain graph ([460 single proof / 1 multi proof] for 460 challenges).

The performance of our optimized implementation of the proof generation and verification mechanism is eval-uated in terms of communication and computation. We take into consideration the case where the client wishes to detect with more than 99% probability if more than a 1% of her 1600MB data is corrupted by challenging 460 blocks as in PDP and DPDP [3, 12]. In our exper-iment, we used a FlexList with 100,000 nodes, where a file block size is 16KB (same as in PDP and DPDP).

In Figure 12 we plot the ratio of the unoptimized proofs over our optimized proofs in terms of theFlexList

proofsize and computation, as a function of the

num-ber of challenged nodes. The unoptimized proofs cor-respond to proving each block separately, instead of us-ing ourgenMultiProof algorithm for all of them at once. Our multi-proof optimization results in35%

computa-tion and 60%communication gainsfor FlexList proofs.

This corresponds to proofs being up to1.5 times as fast

and2.5 times as small. We measure percentage gain

in the total size of aproof of DPDP with FlexListand computation done by the server in Figure 12. With our optimizations, we clearly see a gain of about30% and

50%for the overall computation and communication

respectively, corresponding to proofs being up to 1.35

times as fastand 2 times as small. The whole proof

roughly consists of 160KB FlexList proof, 57KB of tags, and 16KB of block sum. Thus, for 460 challenges as

sug-gested by PDP and DPDP [3, 12], we obtain a total proof size of 233KB, and the computation takes about 21ms.

1 2 3 4 5 6 7 8 9 10 x 104 0 1 2 3 4 5 6 7 8 9 10x 10 5 Number of nodes Ratio Ratio: DPDP − FlexDPDP

Figure 13: Ratio graph on DPDP and FlexDPDP. As we discussed earlier, a FlexList provides block-wise fully dynamic solution. An RBASL cannot han-dle variable block size updates well since it requires all consequent blocks (after the updated one) to be modi-fied as well. Figure 13 shows the ratio between DPDP and FlexDPDP proof generation times. It clearly demon-strates that a FlexList computes variable-size updates dramatically faster than an RBASL. In our scenario, the client randomly chooses the index for a variable block size update (this is the average over 10 random updates). A FlexList only modifies the necessary blocks with the update. An RBASL alters the contents of the necessary block and all consequent nodes as well. Moreover, the server in DPDP has to compute the new tag values of all the updated nodes. We assume that an RBASL will have to modify all consequent nodes’ contents each time for the variable block size updates.

It is very interesting to compare FlexDPDP, which is a dynamic system, with static PDP. The server in PDP computes the sum of the challenged blocks and the mul-tiplication and exponentiation of their tags. FlexDPDP server only computes the sum of the blocks and FlexList proof, but not the multiplication and exponentiation of their tags, which is the most expensive cryptographic computation of all. In such a scenario FlexDPDP server outperforms PDP server, since the multiplication of tags in PDP takes much longer than the FlexList proof gen-eration in FlexDPDP. Note that, it is possible that PDP can also be implemented by the server sending the tags to the client and the client computing the multiplication and exponentiation of the tags. If this is done in a PDP implementation, even though the proof size grows, then PDP server can respond to challenges faster than FlexD-PDP. Therefore, we realize that it is an implementation decision for PDP where to handle the multiplication and exponentiation of tags.

(14)

References

[1] Blinded for review purposes.

[2] A. Anagnostopoulos, M. T. Goodrich, and R. Tamas-sia. Persistent authenticated dictionaries and their appli-cations. InISC, 2001.

[3] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kiss-ner, Z. Peterson, and D. Song. Provable data possession at untrusted stores. InACM CCS, 2007.

[4] G. Ateniese, S. Kamara, and J. Katz. Proofs of stor-age from homomorphic identification protocols. In

ASI-ACRYPT, 2009.

[5] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik. Scalable and efficient provable data possession. In

Se-cureComm, 2008.

[6] G. D. Battista and B. Palazzi. Authenticated relational tables and authenticated skip lists. InDBSec, pages 31– 46, 2007.

[7] K. Blibech and A. Gabillon. Chronos: an authenticated dictionary based on skip lists for timestamping systems. InSWS, pages 84–90, 2005.

[8] K. Blibech and A. Gabillon. A new timestamping scheme based on skip lists. InICCSA (3), pages 395–405, 2006.

[9] Brownie cashlib cryptographic library.

http://github.com/brownie/cashlib.

[10] S. A. Crosby and D. S. Wallach. Authenticated dictio-naries: Real-world costs and trade-offs. ACM Trans. Inf.

Syst. Secur., 14(2):17, 2011.

[11] Y. Dodis, S. Vadhan, and D. Wichs. Proofs of retrievabil-ity via hardness amplification. InTCC, 2009.

[12] C. Erway, A. K¨upc¸¨u, C. Papamanthou, and R. Tamassia. Dynamic provable data possession. InACM CCS, 2009. [13] C. C. Foster. A generalization of avl trees. Commun.

ACM, 16(8):513–517, Aug. 1973.

[14] M. T. Goodrich, C. Papamanthou, R. Tamassia, and N. Triandopoulos. Athos: Efficient authentication of out-sourced file systems. InISC, pages 80–96, 2008. [15] M. T. Goodrich and R. Tamassia. Efficient

authenti-cated dictionaries with skip lists and commutative hash-ing. Technical report, Johns Hopkins Information Secu-rity Institute, 2001.

[16] M. T. Goodrich, R. Tamassia, and A. Schwerin. Imple-mentation of an authenticated dictionary with skip lists and commutative hashing. InDARPA, 2001.

[17] A. Juels and B. S. Kaliski. PORs: Proofs of retrievability for large files. InACM CCS, 2007.

[18] J. Katz and Y. Lindell. Introduction to Modern Cryptog-raphy (Chapman & Hall/Crc CryptogCryptog-raphy and Network

Security Series). Chapman & Hall/CRC, 2007.

[19] P. Maniatis and M. Baker. Authenticated append-only skip lists.Acta Mathematica, 2003.

[20] S. Meiklejohn, C. Erway, A. K¨upc¸¨u, T. Hinkle, and A. Lysyanskaya. Zkpdl: Enabling efficient implemen-tation of zero-knowledge proofs and electronic cash. In

USENIX Security, 2010.

[21] R. Merkle. A digital signature based on a conventional encryption function.LNCS, 1987.

[22] C. Papamanthou and R. Tamassia. Time and space ef-ficient algorithms for two-party authenticated data struc-tures. InICICS, pages 1–15, 2007.

[23] D. J. Polivy and R. Tamassia. Authenticating distributed data using web services and xml signatures. InIn Proc.

ACM Workshop on XML Security, 2002.

[24] W. Pugh. A skip list cookbook. Technical report, 1990. [25] W. Pugh. Skip lists: a probabilistic alternative to balanced

trees.Communications of the ACM, 1990.

[26] H. Shacham and B. Waters. Compact proofs of retriev-ability. InASIACRYPT, 2008.

[27] P. T. Stanton, B. McKeown, R. C. Burns, and G. Ateniese. Fastad: an authenticated directory for billions of objects. 2010.

[28] R. Tamassia and N. Triandopoulos. On the cost of au-thenticated data structures. InIn Proc. European Symp.

on Algorithms, volume 2832 of LNCS.

[29] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou. Enabling public veriability and data dynamics for storage security in cloud computing. InESORICS, 2009.

[30] Q. Zheng and S. Xu. Fair and dynamic proofs of retriev-ability. InCODASPY, 2011.

[31] M. Zhou, R. Zhang, W. Xie, W. Qian, and A. Zhou. Secu-rity and privacy in cloud computing: A survey. InIEEE SKG, pages 105–112, 2010.

A

FlexList

Figure 14: A FlexList example with 2 sub skip lists indi-cated.

Sub skip list: To make our FlexList algorithms easier to un-derstand, consider Figure 14. Let the search index be 250 and the current node start at the root (v1). The current node follows

itsbelowlink tov2and enters a sub skip list (big dashed

rect-angle). Now,v2is the root of this sub skip list and the searched node is still at index 250. In order to reach the searched node, the current node moves tov3, which is the root of another sub

(15)

skip list (small dashed rectangle). Now, the searched byte is at index 150 in this sub skip list. Therefore the searched index is updated accordingly. The amount to be reduced from the search index is equal to the difference between the rank values ofv2 andv3, which is equal to the rank ofbelowofv2. Whenever the current node follows ana f terlink, the search index should be updated. To finish the search, the current node follows the

a f terlink ofv3to reach the node containing index 150 in the

sub skip list with rootv3.

A.1

Methods of FlexList

FlexList is a particular way of organizing data for secure cloud storage systems. We need some basic functions to be available, such as search, modify, insert and remove. These functions are employed in the verifiable updates. All algorithms are designed to fill a stack for the possibly affected nodes. This stack is used for recalculation of rank and hash values accordingly. Details of these algorithms can be found in the full version [1]. search(i)→ {nodei,tn}is the algorithm to find a particular

byte. It takes the indexias the input, and outputs the node at indexiwith the stacktn filled with the nodes on the search

path. Any value between 0 and the size of our FlexList is valid to be searched. It is not possible for a valid index not to be found in the FlexList.

modify(i,data)→ {nodei,tn}just searches for a node by

in-dex of a byte included in the node, and updates its data, then recalculates hash values along the search path. The input con-tains the indexiand the newdata. The outputs of this algo-rithm are the modified node and stacktnfilled with nodes on

the search path.

insert(i,data)→ {tn}is run to add a new node to the skip list

with a random level by adding new nodes along the insertion path. The inputs are the indexianddata. The algorithm gener-ates a random levell, then creates the new node with the given

dataand attaches it to indexi, along with the necessary nodes until the levell. Note that this index should be the beginning index of an existing node, since inserting a new block inside a block makes no sense. In case of an addition inside a block we can do the following: search for the block including the byte where the insertion will take place, add our data in between the first and second part of the data found to obtain a new data and employmodifyalgorithm (if the new data is long, we can divide it into parts and send it as one modify and a series of inserts). As output, the algorithm returns the stacktnfilled with nodes

on the search path of the new block.

remove(i)→ {tn}is run to remove the node which starts with

the byte at indexi. As input, it takes the indexi. The algorithm detaches the node to be removed and all other nodes above it while preserving connections between the remaining nodes. As output, the algorithm returns the stacktnfilled with the nodes

on the search path of the left neighbor of the node removed.

B

Conclusion and Future Work

The security and privacy issues are significant obstacles to-wards the cloud storage adoption [31]. With the emergence of the cloud storage services, the data integrity has become one of the most important challenges. Early works have shown that

the static solutions with optimal complexity [3, 26], and the dynamic solutions with logarithmic complexity [12] are within reach. However, DPDP [12] solution is not applicable to the real life scenarios since it supports only fixed block size and therefore lacks flexibility on the data updates, while the real life updates are likely not of constant block size. We have extended their work in several ways and provided a new data structure (FlexList) and its optimized implementation for use in the cloud data storage. A FlexList efficiently supports the variable block sized dynamic provable updates, and we showed how to handle multiple proofs at once, greatly improving scalability.

As future work, we plan to further study parallelism and en-ergy efficiency, and deploy our implementation on the Planetlab as a server-client system. We also aim to extend our system to peer-to-peer settings.

Figure

Updating...

References