A Survey on Data Integrity of Cloud Storage in Cloud Computing


Mr. Vitthal Raut, Prof. Suhasini Itkar

Department of Computer Engineering, PES Modern College of Engineering, Pune, India.

[email protected], [email protected]

ABSTRACT

Cloud computing is an emerging technology that provides computing and storage services over the Internet. In cloud computing, resources such as memory, storage and processors are not physically present at the user’s location. Instead, they are located off the premises and managed by a service provider. The user accesses the resources via the Internet.

The main focus of this paper is data security and integrity in cloud computing. This survey analyses various security issues in cloud computing and the schemes used to prove data integrity. To check the integrity of data, the user can enlist the help of a third-party auditor (TPA). The TPA has experience in checking data integrity, and it also helps devices with low computational power, such as PDAs, to check the integrity of large data files. The data in the cloud should be correct, consistent, accessible and of high quality. The aim of this survey is to analyse techniques that ensure the integrity of the data and provide proof that the data is stored securely.

INDEX TERMS: Cloud computing, Data Integrity, TPA, Cloud client.

I. INTRODUCTION

Cloud computing, to put it simply, means “Internet Computing.” The Internet is commonly visualized as a cloud; hence the term “cloud computing” for computation done through the Internet. With cloud computing, users can access database resources via the Internet from anywhere, for as long as they need, without worrying about the maintenance or management of the actual resources. Besides, databases in the cloud are very dynamic and scalable.

“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”

Cloud computing has emerged from technologies including grid computing, distributed computing, parallel computing, virtualization and utility computing [1]. According to Amarnath Jasti et al. [2], virtualization optimizes application performance in a cost-effective manner, but it can also introduce security risks. In the cloud, security plays a major role because customers outsource their data and computation tasks to cloud servers, which are controlled and managed by potentially untrustworthy cloud providers.

Security in Cloud Computing

The popularity of Cloud Computing is mainly due to the fact that many enterprise applications and data are moving into cloud platforms; however, lack of security is the major barrier for cloud adoption [3].

According to a recent survey by International Data Corporation (IDC), 87.5% of respondents, ranging from IT executives to CEOs, said that security is the topmost challenge to be dealt with in every cloud service. Many of the threats are also found in existing platforms. Among them, security threats are considered to be of high risk.


© 2014, IJAFRC All Rights Reserved www.ijafrc.org

The major security aspects are Confidentiality, Integrity, Authentication, Authorization, Non-repudiation and Availability, which are explained below:

Confidentiality is the process of making sure that the data remains private, confidential and restricted from unauthorized users [4]. Encrypting the data before pushing it to the cloud is one of the most popular ways to achieve this.

Integrity is the guarantee that the data is protected from accidental or deliberate (malicious) modification. Hashing techniques, digital signatures and message authentication codes are used to preserve data integrity [5]. Integrity problems are amplified by the multi-tenancy characteristic of the cloud [6].

Authentication is the mechanism by which systems securely identify their users. Authorization determines the level of access to system resources granted to a particular authenticated user [7].

Non-repudiation is an extension of the identification and authentication service. It is used to ensure that the messages sent are properly received and that acknowledgements are sent back to the sender. In other words, it establishes two-way communication between a sender and a receiver.

Availability ensures that an organization has its full set of computing resources available and usable at all times for its legitimate users [8].

In this paper we discuss the integrity of data in cloud storage.

II. RELATED WORK

Several papers in the area of cloud computing security have been studied. Jinpeng et al. [9] propose a model to manage virtual machine images in a cloud environment in a secure manner. The advantage of this system is that access permissions are private, so that untrusted parties cannot access the system. The main drawback is that the image filters cannot be fully accurate, so the system does not eliminate the risk entirely.

Miranda and Siani [10] propose a client-based privacy manager that reduces the risk of the user’s private data being misused and also assists the cloud computing provider in conforming to privacy law. The service provider has to cooperate honestly with the privacy manager; otherwise this method is not effective.

Cong et al. [11] propose a system that uses homomorphic tokens with distributed verification of erasure-coded data. It effectively detects unauthorized data modification and corruption in a cloud environment.

Kevin Hamlen et al. [12] present a layered framework for a secure cloud. This system builds a trusted application from untrusted components.

Song et al. [13] propose data protection as a service, which offers data security and privacy on a cloud platform. These services can be provided using full-disk encryption, but this slows down data access.

In 2012, S. U. Muthunagai et al. [14] surveyed the security threats related to virtualization and proposed an efficient cloud protection system architecture that is intended to protect a guest VM from other guest VMs. However, its primary aim is limited to the security vulnerabilities in cloud infrastructure. Although there is a considerable amount of ongoing research on developing security tools, there is a need to consider the specific challenges faced by cloud computing. Since virtualization plays a major role in cloud computing, it is essential to consider the additional threats introduced by virtualization. Providing such a complete survey is the motivation for this paper.


Sravan Kumar R and Ashutosh Saxena [15] have worked to help the client obtain a “proof of integrity” of the data that the client wishes to store on cloud storage servers, with bare minimum cost and effort.

The proof of retrievability model proposed by Juels and Kaliski [16] is among the first attempts to formalize the notion of “remotely and reliably verifying data integrity without retrieving the data file”.

Some schemes provide a weaker guarantee by ensuring only storage complexity. Earlier schemes have to access the entire file on the server to check data integrity, which is not feasible for devices with limited computational power.

III. DATA INTEGRITY PROVING SCHEMES:

Juels and Kaliski [16] proposed a scheme called Proof of Retrievability (POR). A proof of retrievability lets the user verify that the data stored at remote cloud storage has not been modified by the cloud. POR for large files relies on special blocks called sentinels. The main role of the sentinels is that only a small portion of the file (F) needs to be accessed instead of the entire file. Sravan and Saxena [15] give a schematic view of a proof of retrievability based on inserting random sentinels in the data file.

3.1 Provable Data Possession (PDP):

Definition: A PDP scheme checks that a file, which consists of a collection of n blocks, is retained by a remote cloud server. The data owner processes the data file to generate some metadata, which is stored locally. The file is then sent to the server, and the owner deletes the local copy of the file. The owner verifies possession of the file using a challenge-response protocol. Clients use this technique to check, periodically, the integrity of their data stored on the cloud server. The technique thus assures the client that the server still holds the data.

a) Naive Method:

The main idea behind this method is to compare hash values. The client computes a keyed hash value h(K, F) for the file F using a key K, and subsequently sends the file F to the server.

The client can hold a collection of different keys and their hash values, so it can perform multiple checks on the file F.

Whenever the client wants to check the file, it releases a key K and sends it to the server, which is then asked to recompute the hash value based on F and K. The server replies to the client with the hash value for comparison.

Limitation

This method gives strong proof that the server holds the original file F. However, it has high overhead, because the hash must be computed over the entire file for every check, so the computation cost is very high.
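The naive method and the reason for its cost can be illustrated with a minimal sketch. HMAC-SHA256 is an assumed choice of keyed hash, and the function names (`prepare_checks`, `server_respond`, `audit`) are hypothetical; none of them come from the surveyed papers. Note that `server_respond` has to read the whole file for every audit, which is the overhead described above.

```python
import hashlib
import hmac
import os


def keyed_hash(key: bytes, data: bytes) -> bytes:
    """Keyed hash over the whole file (HMAC-SHA256 is an assumption)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def prepare_checks(file_bytes: bytes, num_checks: int) -> list[tuple[bytes, bytes]]:
    """Client: precompute (key, expected hash) pairs before uploading F."""
    checks = []
    for _ in range(num_checks):
        key = os.urandom(32)
        checks.append((key, keyed_hash(key, file_bytes)))
    return checks


def server_respond(stored_file: bytes, key: bytes) -> bytes:
    """Server: recompute the hash over the ENTIRE stored file."""
    return keyed_hash(key, stored_file)


def audit(checks: list[tuple[bytes, bytes]], stored_file: bytes) -> bool:
    """Client: release one unused key and compare the server's reply."""
    key, expected = checks.pop()
    return hmac.compare_digest(expected, server_respond(stored_file, key))


original = b"example file contents " * 100
checks = prepare_checks(original, num_checks=3)
intact_ok = audit(checks, original)                   # server kept F unchanged
tampered_ok = audit(checks, original[:-1] + b"X")     # last byte was altered
```

Each key can be used only once, because the server learns it during the audit; the number of possible checks is therefore fixed when the file is uploaded.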

b) Original Provable Data Possession

In this technique, the data is pre-processed before sending it to the cloud server.


As shown in Fig. 1, the data is tagged with tag values, or metadata, for verification at the client side.

The entire data is then sent to the server, and the metadata is stored at the client side.

This metadata is used for verification whenever the user needs it. When the user wants to check integrity, it sends a challenge to the server, and the server responds with the data. The client then compares the reply with the local metadata. A characteristic of this scheme is that it supports both encrypted data and plain data. It offers public verifiability. It is efficient because only a small portion of the file needs to be accessed to generate the proof on the server.

Limitation:

This technique is applicable only to static files (i.e., append-only files).
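The challenge-response flow described above can be sketched as a simple spot check. This is a deliberately simplified illustration: it uses one HMAC tag per block, kept by the client, rather than the homomorphic verifiable tags of the original PDP construction [17], and the block size and all function names are assumptions made for the example.

```python
import hashlib
import hmac
import random

BLOCK_SIZE = 64  # assumed block size in bytes


def split_blocks(data: bytes) -> list[bytes]:
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def tag(key: bytes, index: int, block: bytes) -> bytes:
    """Tag for one block, bound to its index so blocks cannot be swapped."""
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def make_tags(key: bytes, blocks: list[bytes]) -> list[bytes]:
    """Pre-processing: metadata that the client keeps locally."""
    return [tag(key, i, b) for i, b in enumerate(blocks)]


def challenge(num_blocks: int, sample_size: int) -> list[int]:
    """Client picks a random subset of block indices to audit."""
    return random.sample(range(num_blocks), min(sample_size, num_blocks))


def respond(server_blocks: list[bytes], indices: list[int]) -> list[bytes]:
    """Server touches only the challenged blocks, not the whole file."""
    return [server_blocks[i] for i in indices]


def verify(key: bytes, tags: list[bytes], indices: list[int], returned: list[bytes]) -> bool:
    """Client recomputes each tag and compares it with its local metadata."""
    return all(
        hmac.compare_digest(tags[i], tag(key, i, blk))
        for i, blk in zip(indices, returned)
    )


key = b"k" * 32
blocks = split_blocks(bytes(range(256)) * 4)      # 1024 bytes -> 16 blocks
tags = make_tags(key, blocks)
idx = challenge(len(blocks), sample_size=4)
ok = verify(key, tags, idx, respond(blocks, idx))

corrupted = list(blocks)
corrupted[idx[0]] = b"\x00" * BLOCK_SIZE          # damage one challenged block
bad = verify(key, tags, idx, respond(corrupted, idx))
```

Because only a sample of blocks is checked, a single audit detects corruption with some probability rather than with certainty; repeated audits with fresh random samples raise the detection probability.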

3.2 Proof of Retrievability (POR):

Juels and Kaliski [16] proposed a scheme called Proof of Retrievability. A proof of retrievability lets the user verify that the data stored at remote cloud storage has not been modified by the cloud. POR for large files relies on special blocks called sentinels. The main role of the sentinels is that only a small portion of the file (F) needs to be accessed instead of the entire file.

Fig 2-A data file with 6 blocks

In this scheme, the data is divided into a number of blocks, as shown in Figure 2. This technique uses an auditing protocol to solve the integrity problem. A client who wants to check the integrity of the outsourced data does not need to retrieve its full content. The user stores only a key, which is used to encode the file F into the encrypted file F’. The encoding procedure embeds a set of sentinel values in F’. The server stores only F’. The server does not know where the sentinel values are stored, because they are indistinguishable from regular blocks and are placed at random positions in the file F’.

The authors of [15] give a schematic view of a proof of retrievability based on inserting random sentinels in the data file. The schematic view of POR is shown in Figure 3.

Fig 3-Schematic view of a POR

In the above architecture, the user (cloud client) wants to store a file (F) on the cloud server (archive). Before storing the file in the cloud, the owner needs to encrypt the file in order to prevent unauthorized access. When the client sends a challenge to the server to check integrity, the challenge-response protocol asks the server to return a subset of the sentinels in F’. If the data has been tampered with or deleted, the sentinels are likely to be corrupted or lost, and so the server is unable to generate the proof for the original file. In this way the client can detect that the server has modified or corrupted the file.
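The sentinel idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction of [16]: sentinel values and positions are derived from the client's key, random bytes stand in for the encrypted data blocks, Python's `random.Random` is used only for brevity (it is not a cryptographic generator), and error-correcting codes are omitted. All names are hypothetical.

```python
import hashlib
import hmac
import os
import random

BLOCK_SIZE = 16  # assumed block size in bytes


def sentinel(key: bytes, s: int) -> bytes:
    """Sentinel number s: a key-derived value that looks like a random block."""
    digest = hmac.new(key, b"sentinel" + s.to_bytes(4, "big"), hashlib.sha256).digest()
    return digest[:BLOCK_SIZE]


def positions(key: bytes, total: int, num_sentinels: int) -> list[int]:
    """Secret positions of the sentinels inside the stored file F'."""
    rng = random.Random(key)  # not cryptographically secure; illustration only
    return sorted(rng.sample(range(total), num_sentinels))


def encode(key: bytes, encrypted_blocks: list[bytes], num_sentinels: int) -> list[bytes]:
    """Client: interleave sentinels with the (already encrypted) data blocks."""
    total = len(encrypted_blocks) + num_sentinels
    slots = {p: s for s, p in enumerate(positions(key, total, num_sentinels))}
    data = iter(encrypted_blocks)
    return [sentinel(key, slots[i]) if i in slots else next(data) for i in range(total)]


def audit(key: bytes, server_blocks: list[bytes], num_sentinels: int, sample_size: int) -> bool:
    """Client: ask for a subset of sentinel positions and check the values."""
    pos = positions(key, len(server_blocks), num_sentinels)
    chosen = random.sample(range(num_sentinels), sample_size)
    return all(hmac.compare_digest(server_blocks[pos[s]], sentinel(key, s)) for s in chosen)


key = b"client-secret-key"
data_blocks = [os.urandom(BLOCK_SIZE) for _ in range(20)]   # stand-in for encrypted blocks
stored = encode(key, data_blocks, num_sentinels=5)
ok = audit(key, stored, num_sentinels=5, sample_size=5)

wiped = [bytes(BLOCK_SIZE)] * len(stored)                   # server lost the file
bad = audit(key, wiped, num_sentinels=5, sample_size=5)
```

The client keeps only the key; positions and expected values are recomputed at audit time, which is why the storage burden on the client stays small.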

3.3 Steps for Proof of Retrievability:

This scheme does not involve encryption of the whole data. It encrypts only a few bits of data per data block, thus reducing the computational overhead on the client. This makes the scheme very useful for devices with limited computational power, such as PDAs and mobile phones.

Let us consider a verifier who wishes to store the file F with the archive. Let this file F consist of $n$ blocks. Initially we have to preprocess the file and create the metadata to be appended to it. Each of the $n$ data blocks has $m$ bits.

A data block looks as follows:

Fig 4: A data block of the file F with random bits


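Step 1: Generation of Metadata:

Let $g$ be a function defined as

$$g(i, j) \rightarrow \{1, \ldots, m\}, \quad i \in \{1, \ldots, n\},\ j \in \{1, \ldots, k\}$$

where $i$ indexes the $n$ blocks of the file, $j$ indexes the selected bits, and $k$ is the number of bits per block that we wish to read as metadata. For each data block, this function generates $k$ bit positions within the $m$ bits of the block. The value of $k$ is the choice of the verifier and is known only to the verifier.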

Step 2: Encrypting Metadata:

This step encrypts the metadata from each block using a suitable encryption scheme. Let $m_i$ represent the $k$ metadata bits of the $i$-th block. The metadata is encrypted using a suitable algorithm to give the modified metadata $M_i$.

Let $h$ be a function that generates a $k$-bit integer $\alpha_i$ for each block $i$.

For the metadata $m_i$ of each data block, the number $\alpha_i$ is added to get a new $k$-bit number $M_i$:

$$M_i = m_i + \alpha_i$$

Step 3: Appending of Meta data:

All the metadata bit blocks generated using the above procedure are appended to the file before it is stored at the cloud server. The file, along with the appended metadata, is archived with the cloud:

Encrypted file ($\tilde{F}$) = data file ($F$) + metadata

Step 4: Verification Phase:

Suppose the verifier wants to verify the integrity of the file $F$. It throws a challenge to the archive and asks it to respond. The challenge and the response are compared, and the verifier accepts or rejects the integrity proof. Suppose the verifier wishes to check the integrity of the $i$-th block. The verifier challenges the cloud storage server by specifying the block number $i$ and a bit number $j$ generated using the function $g$, which only the verifier knows.

The metadata sent by the cloud is decrypted using the number $\alpha_i$ (the integer produced by $h$ for block $i$), and the corresponding bit in the decrypted metadata is compared with the bit sent by the cloud. Any mismatch between the two means a loss of integrity.
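The four steps can be put together in a short sketch. It rests on several assumptions that are not in the source: $g$ and $h$ are realized with HMAC-SHA256 under a secret key, bit positions are 0-based, the addition $M_i = m_i + \alpha_i$ is taken modulo $2^k$ so that $M_i$ stays a $k$-bit number, and blocks are modelled as $m$-bit integers. The values of $m$ and $k$ and all function names are illustrative.

```python
import hashlib
import hmac

M_BITS = 64  # m: bits per data block (assumed value)
K_BITS = 8   # k: metadata bits read from each block (assumed value)


def _prf(key: bytes, *parts: int) -> int:
    """Keyed pseudo-random integer; HMAC-SHA256 is an assumed instantiation."""
    msg = b"|".join(str(p).encode() for p in parts)
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def g(key: bytes, i: int, j: int) -> int:
    """Secret position (0-based) of the j-th metadata bit in block i."""
    return _prf(key, 0, i, j) % M_BITS


def h(key: bytes, i: int) -> int:
    """Secret k-bit integer alpha_i for block i."""
    return _prf(key, 1, i) % (1 << K_BITS)


def get_bit(value: int, pos: int) -> int:
    return (value >> pos) & 1


def make_metadata(key: bytes, blocks: list[int]) -> list[int]:
    """Steps 1-2: collect m_i from each block, then M_i = (m_i + alpha_i) mod 2^k."""
    encrypted = []
    for i, block in enumerate(blocks):
        m_i = sum(get_bit(block, g(key, i, j)) << j for j in range(K_BITS))
        encrypted.append((m_i + h(key, i)) % (1 << K_BITS))
    return encrypted


def server_respond(blocks: list[int], metadata: list[int], i: int, bit_pos: int) -> tuple[int, int]:
    """Step 4, archive side: return the challenged data bit and the stored M_i."""
    return get_bit(blocks[i], bit_pos), metadata[i]


def verify(key: bytes, i: int, j: int, response: tuple[int, int]) -> bool:
    """Step 4, verifier side: undo alpha_i and compare bit j with the data bit."""
    data_bit, big_m = response
    m_i = (big_m - h(key, i)) % (1 << K_BITS)
    return get_bit(m_i, j) == data_bit


key = b"verifier-secret"
blocks = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:8], "big") for b in range(4)]
metadata = make_metadata(key, blocks)          # Step 3: appended to F and archived

i, j = 2, 3
ok = verify(key, i, j, server_respond(blocks, metadata, i, g(key, i, j)))

tampered = list(blocks)
tampered[i] ^= 1 << g(key, i, j)               # flip the challenged bit
bad = verify(key, i, j, server_respond(tampered, metadata, i, g(key, i, j)))
```

In this sketch the verifier stores only the key from which $g$ and $h$ are derived, which mirrors the small client-side storage that the scheme aims for.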

IV. CONCLUSION AND FUTURE SCOPE

Although cloud computing offers great potential to improve productivity and reduce costs, it also poses many new security risks related to cloud storage. As the cloud is mainly used for storing data, data integrity is the main concern on the client side, because after uploading data to the server the client loses control of the data. Many techniques are available in the literature, of which we have analysed Provable Data Possession (PDP) and Proof of Retrievability (POR). The scheme described in this paper helps the client obtain a proof of integrity of the data that he or she wishes to store on cloud storage servers, with bare minimum cost and effort. The scheme reduces the computational and storage overhead of the client and minimizes the computational overhead of the cloud storage server. It also minimizes the size of the proof of data integrity so as to reduce network bandwidth consumption. The client stores only two functions: the bit generator function g and the function h, which is used for encrypting the data. Hence the storage at the client is minimal compared with other schemes. However, the scheme cannot handle the case where the data needs to be changed dynamically; developing it further in this direction is a future challenge. A data integrity scheme should also be applicable to both malicious and unreliable hosts, which is another serious challenge.

V. REFERENCES

[1] Murat Kantarcioglu, Alain Bensoussan and SingRu, “Impact of Security Risks on Cloud Computing Adoption”, IEEE, 2011, pp. 670-674.

[2] Amarnath Jasti, “Security in Multi-tenancy Cloud”, IEEE, 2010.

[3] Bansidhar Joshi, A. Santhana Vijayan, Bineet Kumar Joshi, “Securing Cloud computing Environment against DDoS Attacks”, IEEE, 2011, pp. 1-5.

[4] Ramgovind S, Eloff MM and Smith E, “The Management of Security in Cloud Computing”, IEEE, 2010.

[5] Wentao Liu, “Research on Cloud Computing Security Problem and Strategy”, IEEE, 2012, pp. 1216-1219.

[6] Nitin Singh Chauhan and Ashutosh Saxena, “Energy Analysis of Security for Cloud Application”.

[7] Rohit Bhadauria, Rituparna Chaki, Nabendu Chaki and Sugata Sanyal, “A Survey on Security Issues in Cloud Computing”, IEEE, 2010.

[8] Xiaojun Yu and Qiaoyan Wen “A view about cloud data security from data life cycle”, IEEE 2010.

[9] Jinpeng Wei, Xiaolan Zhang, Glenn Ammons, Vasanth Bala, Peng Ning, “Managing Security of Virtual Machine Images in a Cloud”.

[10] Miranda Mowbray, Siani Pearson, “A Client-Based Privacy Manager for Cloud Computing”, COMSWARE '09: Proceedings of the Fourth International ICST Conference on Communication System Software and Middleware.

[11] Cong Wang, Qian Wang, Kui Ren, and Wenjing Lou, “Ensuring Data Storage Security in Cloud Computing”, 17th International workshop on Quality of Service,2009, IWQoS, Charleston, SC, USA, July 13-15, 2009, ISBN: 978-1-4244-3875-4, pp.1-9.

[12] Kevin Hamlen et al., International Journal of Information Security and Privacy, April-June 2010.

[13] Song D., Shi E., Fischer I., Shankar U., “Cloud Data Protection for the Masses”, IEEE Computer Society, Vol. 45, pp. 39-45, ISSN: 0018-9162.

[14] S. U. Muthunagai, C. D. Karthic and S. Sujatha, “Efficient Access of Cloud Resources through Virtualization Techniques”, IEEE, 2012, pp. 174-178.

[15] Sravan Kumar R and Ashutosh Saxena, “Data Integrity Proofs in Cloud Storage”, IEEE, 2011.

[16] A. Juels and B. S. Kaliski, Jr., “PORs: Proofs of Retrievability for Large Files”, in CCS ’07: Proceedings of the 14th ACM Conference on Computer and Communications Security.

[17] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, “Provable Data Possession at Untrusted Stores”, ACM Conf. Computer and Comm. Security (CCS ’07), pp. 598-609, 2007.