Proof of Ownership-based Methods - : Construct flat Fairness tree with

Step 4 : Construct flat Fairness tree with

2.4.3 Proof of Ownership-based Methods

Data deduplication is a type of single-instance data storage (data compression) techniques, which is used to remove data redundancy and duplicate copies of data to provide a cost-efficient storage (Mandagere, Zhou, Smith, & Uttamchandani, 2008; Meyer & Bolosky, 2012). Deduplication techniques are responsible to recognize a common set of bytes within or between files (known as chunks) and allow the storing of a single instance of each chunk, irrespective of the number of repetitions. In typical storage system that support data deduplication, a client needs to convince the server of having a copy of the outsourced file by sending a hash of file to the server. If the server finds this hash in the database, accepts the client’s claim and marks the client as the owner of that file; otherwise ask him to upload the entire file. However, this method is vulnerable against some security attacks, because anyone who gets the hash value is permitted to access the file (Halevi, Harnik, Pinkas, & Shulman-Peleg, 2011; Harnik, Pinkas, & Shulman-Peleg, 2010).

Transferring File (F)

Proof message Sibling paths

Verifying the proof

Prover Verifier

Dividing F into L bits by Pairwise Independent Hashing

Generating the MHT F Challenge the root of MHT the number of leaves Re generating the MHT and computing the sibling paths

Figure 2.15: Proof of Ownership Protocol (Sookhak et al., 2014)

scheme in which the data owner is able to convince the server without transferring the file. This method (that is called Proof of Ownership (POW)) is constructed on the basis of Merkle Hash Tree (MHT) and Collision-Resistant Hash (CRH) functions. Since in the POW scheme the client needs to persuade the cloud, the role of prover and verifier is reversed. When the server receives the input file(F), he maps it to L bits using Pairwise

Independent Hashing (X = hPIH(F)), and then constructs a Merkle tree based on it (R = MHT (X )). The verifier returns the root of the MHT and the number of its leaves (RMHT , nl) as the verification information. During the Proof step, the client computes

the sibling-paths of all the leaves as a proof of deduplication and sends it to the cloud. Finally, the verifier validates the sibling-paths with regard to MHT(X ). Figure2.15 shows

the core idea behind the Proof of ownership method.

Design a secure data deduplication method to provide simultaneously data auditing and deduplication in cloud computing is contradictory because security and efficiency

aspects seem to be two conflicting goals. In other words, deduplication eliminates the identical contents, while data security attempts to encrypt all the contents, and the same contents can then be encrypted by two different keys to create different ciphertexts (Mar- ques & Costa, 2011; Storer et al., 2008).

Zheng and Xu(2012) introduced the first Proof of Storage with Deduplication (POSD) method by integrating two different concepts of Proof of Data Possession(PDP) and Proof of Ownership (POW) with the purpose of providing both security and efficiency. This method consists of four steps, such as key generation, uploading, auditing and deduplication. Before uploading the file to the cloud, the honest data owner is responsible to generate two pairs of public and private keys for checking data integrity and deduplica- tion. The data owner divides the file into n block with m bits and then computes the block tags to upload to the cloud. The POSD method includes two challenge-response algorithms such as data auditing, which the data owner validate the correctness of data by sending challenge to the cloud as a prover, and deduplication, which the cloud verifies the claim of a client for having a copy of outsourced data.

The POSD scheme needs to meet the following security requirements: (1) server unforgeablity in which the server should provide a valid response to the client challenges with non-negligible probability, and (2)(k,Θ) uncheatablity in which dishonest clients are

not able to deceive the server with non-negligible probability. The validity of the POSD scheme depends on the reliability of the clients in terms of performing key generation, while this assumption is not reasonable in the cross-multiple users and the cross-domain environment of cloud computing.

Shin, Hur, and Kim(2012)showed that dishonest client is able to manipulate the key generation step to create weak keys. As a result, this method fails to fulfill the unforgeablity and uncheatablity features as security requirements and it is vulnerable to some attack scenarios such as (1) Malware Distribution: a malicious client uploads a file in

the cloud and modifies it by attaching a malware to this file. This malware is distributed among clients when they execute the deduplication step to take the ownership of the orig- inal file, and (2) Unintended content distribution network: an adversary has capability to transfer the data to the other malicious client through the cloud by using the weak key. Shin et al.(2012) improved the security of POSD scheme by minimizing the client capability to control the key generation step. The core idea behind this method, namely Improved POSD (I-POSD) is to blind the keys with random values, when the server re- ceive the data and corresponding tags.

The POSD scheme (Zheng & Xu, 2012) also suffers from the linear communication and computational cost on the client side regarding to the number of elements in each data block and the number of checking blocks during data auditing step. As a result, by increasing the number of mobile users, the communication and computational cost incur a huge overhead on the client who utilizes the resource constrained devices (e.g. Smart mobile phones) to access the cloud storage.

Yuan and Yu(2013b) proposed a new data storage auditing with deduplication capability to address the linear communication and computational cost on POSD scheme based on polynomial-based authentication tags and homomorphic linear authenticators. The communication complexity of auditing step in this method, that is called Public Cloud Auditing with Deduplication (PCAD), depends on transferring a challenge message, the aggregation tags of data blocks and the proof information which impose O(1) as a total

communication complexity on the client. However, the total communication complexity in POSD method is O(s + k) because the CSP needs to transfer k authentication tags of

the challenging blocks and s aggregated data blocks to the verifier, where s is the size of data block.

In document Dynamic remote data auditing for securing big data storage in cloud computing / Mehdi Sookhak (Page 69-73)