Security Challenges in Cloud Storage


Ginjala Srikanth Reddy

Asst. Professor Dept of CSE-KITS Khammam-A.P, India

Annamchinni Dileep Kumar

Asst. Professor Dept of CSE-KITS Khammam-A.P, India

P.Narasimha Rao

Asst. Professor Dept of CSE – SRR Khammam-A.P, India

Kuthumbaka Kavitha

Asst. Professor Dept. of C.S. S.R. & B.G.N.R. Khammam-A.P, India

Abstract—Cloud computing is the delivery of computing and storage capacity as a service to a heterogeneous community of end recipients. One of its main advantages is online storage. Data security for such a cloud service encompasses several aspects, including secure channels, access controls, and encryption. When we consider the security of data in a cloud, we must consider the security triad: confidentiality, integrity, and availability. In the cloud storage model, data is stored on multiple virtualized servers; physically, the resources span multiple servers and can even span storage sites. Cloud computing has been envisioned as the de facto solution to the rising storage costs of IT enterprises. One important concern that needs to be addressed is assuring the customer of the integrity, i.e., the correctness, of his data in the cloud. As the data is not physically accessible to the user, the cloud should provide a way for the user to check whether the integrity of his data is maintained or compromised.

In this paper we provide a scheme which gives a proof of data integrity in the cloud which the customer can employ to check the correctness of his data in the cloud. This proof can be agreed upon by both the cloud and the customer and can be incorporated in the Service level agreement (SLA). This scheme ensures that the storage at the client side is minimal which will be beneficial for thin clients.

Key Words —Cloud Computing, confidentiality, integrity, and availability.

I. INTRODUCTION

A common aspect of many cloud-based storage offerings is the reliability and availability of the service. The goal of cloud computing is to apply traditional supercomputing, or high-performance computing power, normally used by military and research facilities to perform tens of trillions of computations per second, to consumer-oriented applications such as financial portfolios, the delivery of personalized information, or immersive computer games.

To do this, cloud computing networks large groups of servers, usually built on low-cost consumer PC technology, with specialized connections to spread data-processing chores across them. This shared IT infrastructure contains large pools of systems that are linked together. Often, virtualization techniques are used to maximize the power of cloud computing. The standards for connecting the computer systems and the software needed to make cloud computing work are not fully defined at present, leaving many companies to define their own cloud computing technologies. Cloud computing systems offered by companies, such as IBM's "Blue Cloud" technologies, are based on open standards and open source software which link together computers that are used to deliver Web 2.0 capabilities like mash-ups or mobile commerce.

Cloud computing has started to gain mass appeal in corporate data centers because it enables the data center to operate like the Internet, with computing resources accessed and shared as virtual resources in a secure and scalable manner.

For small and medium-sized businesses (SMBs), the benefits of cloud computing are currently driving adoption. In the SMB sector there is often a lack of time and financial resources to purchase, deploy, and maintain an infrastructure (e.g., the software, servers, and storage). With cloud computing, small businesses can access these resources and expand or shrink services as business needs change. The common pay-as-you-go subscription model is designed to let SMBs easily add or remove services, and they typically pay only for what they use.

Cloud computing services need to address security during the transmission of sensitive data and critical applications to shared and public cloud environments. Cloud environments are scaling up to meet large data processing and storage needs. A cloud computing environment has advantages as well as disadvantages for the data security of service consumers. Data protection is the most important security issue in a cloud computing environment. The concerns include the way in which data is accessed and stored, audit requirements, compliance and notification requirements, the cost of a data breach, and damage to brand value. In the cloud storage infrastructure, regulated and sensitive data needs to be properly segregated. Securing data in the cloud with the help of digital signatures is a very new concept. In the service provider’s data center, protecting data privacy and managing compliance are critical; this is done by encrypting the data in transfer to the cloud and managing the encryption keys.


Storing user data in the cloud, despite its advantages, raises many interesting security concerns which need to be extensively investigated to make it a reliable solution to the problem of avoiding local storage of data [2]. Many problems have been discussed in the research literature, such as data authentication and integrity (i.e., how to efficiently and securely ensure that the cloud storage server returns correct and complete results in response to its clients’ queries [3]) and outsourcing encrypted data together with the associated difficult problems of querying over the encrypted domain [4]. In this paper we deal with the problem of implementing a protocol for obtaining a proof of data possession in the cloud, sometimes referred to as proof of retrievability (POR). This problem consists of obtaining and verifying a proof that the data stored by a user at a remote data storage in the cloud (called cloud storage archives or simply archives) has not been modified by the archive, thereby assuring the integrity of the data. Such proofs are very helpful in peer-to-peer storage systems, network file systems, long-term archives, web-service object stores, and database systems. Such verification systems use frequent checks on the storage archives to prevent them from misrepresenting or modifying the stored data without the consent of the data owner. These checks must allow the data owner to verify efficiently, frequently, quickly, and securely that the cloud archive is not cheating the owner. Cheating, in this context, means that the storage archive might delete or modify some of the data. It must be noted that the storage server might not be malicious; instead, it might simply be unreliable and lose or inadvertently corrupt the hosted data. The data integrity schemes to be developed need to be equally applicable to malicious and unreliable cloud storage servers.
A proof of data possession scheme does not, by itself, protect the data from corruption by the archive. It only allows detection of tampering with or deletion of a remotely located file at an unreliable cloud storage server. To ensure file robustness, other techniques such as data redundancy across multiple systems can be used. While developing proofs of data possession at untrusted cloud storage servers, we are often limited by the resources at the cloud server as well as at the client. Given that the data sizes are large and the data is stored at remote servers, accessing the entire file can be expensive in I/O costs to the storage server. Transmitting the file across the network to the client can also consume heavy bandwidth. Since growth in storage capacity has far outpaced the growth in data access speed and network bandwidth, accessing and transmitting the entire archive even occasionally greatly limits the scalability of the network resources.

Fig.2. Schematic view of a proof of retrievability based on inserting random sentinels in the data file F[3].

The problem is further complicated by the fact that the owner of the data may be a small device, like a PDA (personal digital assistant) or a mobile phone, which has limited CPU power, battery power, and communication bandwidth. Hence a data integrity proof needs to take these limitations into consideration.

Fig.3. Cloud Computing Data Security Model [1]

The scheme should be able to produce a proof without the need for the server to access the entire file or the client retrieving the entire file from the server. Also the scheme should minimize the local computation at the client as well as the bandwidth consumed at the client.

II. DATA SECURITY PROBLEM OF CLOUD COMPUTING


change or migration does not affect the services provided by the service provider. If a user needs more services, the provider can meet the user’s needs without having to be concerned with the physical hardware. However, carving virtual servers out of a logical server group brings many security problems. Traditional data center security measures sit at the edge of the hardware platform, whereas in cloud computing one physical server may host a number of virtual servers, and those virtual servers may belong to different logical server groups. There is therefore the possibility of virtual servers attacking each other, which poses many security threats. Virtual machines extend the edge of the cloud and make the network boundary disappear, which affects almost all aspects of security: traditional physical isolation and hardware-based security infrastructure cannot stop mutual attacks between virtual machines in the cloud computing environment.

B. Consistency of Data

The cloud is a dynamic environment in which the user's data is transmitted from the data center to the user's client. From the system's point of view, the user's data is changing all the time. Reading and writing data raises user authentication and permission issues. A single virtual machine may hold different users’ data, which must be strictly managed. The traditional model of access control is built at the edge of individual computers, so it is weak at controlling reads and writes among distributed computers. Traditional access control mechanisms therefore have serious shortcomings in the cloud computing environment.

C. New Technology

The concept of cloud computing is built on a new architecture comprising a variety of new technologies, such as Hadoop and HBase, which enhance the performance of cloud systems but bring in risks at the same time. In the cloud environment, users create many dynamic virtual organizations, and co-operation is usually first established through a relationship of trust between organizations rather than at the individual level. Restrictions expressed per user on the basis of a proof strategy are therefore often difficult to enforce, since interactions occur frequently among many virtual machine nodes and are dynamic and unpredictable. A cloud computing environment gives a user who "buys" resources full access to them, which also increases security risks.

III. REQUIREMENT OF SECURITY

In the following, by analyzing a widely used cloud computing technology, HDFS (Hadoop Distributed File System), we derive the data security needs of cloud computing. HDFS is a typical distributed file system architecture used in large-scale cloud computing. Its design goal is to run on commodity hardware and, owing to its roots in Google's published designs and the advantages of open source, it has been adopted as a foundation for cloud facilities. HDFS is very similar to existing distributed file systems such as GFS (Google File System); they share the same objectives of performance, availability, and stability. HDFS was initially used in the Apache Nutch web search engine and became the core of the Apache Hadoop project. HDFS uses a master/slave architecture, as shown in Figure 4. The master is called the Namenode; it manages the file system namespace and controls client access. The slave nodes are called Datanodes; each Datanode serves client read and write requests for the blocks it stores. In this storage system a file is split into blocks, and the Namenode maps these blocks to Datanodes. Although HDFS is not POSIX compatible, the file system still supports create, delete, open, close, read, write, and other operations on files.

Fig.4. HDFS Architecture

From the analysis of HDFS [1], the data security needs of cloud computing can be divided into the following points. Client authentication at login: the vast majority of cloud computing is accessed through a browser client, such as IE, so establishing the user’s identity is the primary requirement of cloud computing applications. The single point of failure in the Namenode: if the Namenode is attacked or fails, the consequences for the system are disastrous. The effectiveness and efficiency of the Namenode are therefore key to successful data protection, and enhancing the Namenode’s security is very important. Rapid recovery of data blocks and read/write rights control: a Datanode is a data storage node that may fail, so it alone cannot guarantee the availability of data. Currently each data block in HDFS has at least 3 replicas, which is HDFS’s backup strategy. As for how to ensure the safety of reading and writing data, HDFS gives no detailed explanation, so the need for rapid recovery and for fully controllable read and write operations cannot be ignored.


as access control, file encryption, such as demand for cloud computing model for data security issues must be taken into account.
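The replication strategy discussed in this section can be illustrated with a toy model. The sketch below is not HDFS code: the function names, the random placement policy, and the node names are assumptions made for illustration only (real HDFS placement also takes rack topology into account). It only shows why keeping 3 replicas of each block on distinct Datanodes lets every block survive the loss of any two Datanodes.

```python
import random

REPLICATION = 3  # the text states that each block has at least 3 replicas


def place_blocks(num_blocks, datanodes, replication=REPLICATION, seed=0):
    """Toy Namenode map: block index -> distinct Datanodes holding a replica."""
    rng = random.Random(seed)
    return {b: rng.sample(datanodes, replication) for b in range(num_blocks)}


def readable_blocks(block_map, failed):
    """Blocks that still have at least one replica on a live Datanode."""
    return {b for b, nodes in block_map.items()
            if any(n not in failed for n in nodes)}


datanodes = [f"dn{i}" for i in range(5)]          # hypothetical node names
block_map = place_blocks(num_blocks=8, datanodes=datanodes)

# With 3 distinct replicas per block, any 2 failures leave every block readable.
survivors = readable_blocks(block_map, failed={"dn0", "dn1"})
# If every Datanode fails, nothing is readable.
none_left = readable_blocks(block_map, failed=set(datanodes))
```

The block map itself lives only on the Namenode in this model, which mirrors the single point of failure noted above: losing `block_map` makes the replicas unfindable even though the data still exists.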

IV. RELATED WORK

The simplest proof of retrievability (POR) scheme can be built using a keyed hash function $h_K(F)$. In this scheme the verifier, before archiving the data file F in the cloud storage, precomputes the cryptographic hash $h_K(F)$ of F and stores this hash together with the secret key K. To check whether the integrity of the file F has been lost, the verifier releases the secret key K to the cloud archive and asks it to compute and return the value of $h_K(F)$. By storing multiple hash values for different keys, the verifier can check the integrity of the file F multiple times, each check being an independent proof. Though this scheme is very simple and easy to implement, its main drawback is the high resource cost of the implementation. On the verifier side, it involves storing as many keys as the number of checks the verifier wants to perform, as well as the hash value of the data file F under each key. Computing the hash value of even a moderately large data file can also be computationally burdensome for some clients (PDAs, mobile phones, etc.). On the archive side, each invocation of the protocol requires the archive to process the entire file F. This can be computationally burdensome for the archive even for a lightweight operation like hashing. Furthermore, each proof requires the prover to read the entire file F, a significant overhead for an archive whose intended load is only an occasional read per file, were every file to be tested frequently [5]. Ari Juels and Burton S. Kaliski Jr. proposed a scheme called proofs of retrievability for large files using “sentinels” [5]. In this scheme, unlike in the keyed-hash scheme, only a single key is needed irrespective of the size of the file or the number of files whose retrievability the verifier wants to check.
Also, the archive needs to access only a small portion of the file F, unlike in the keyed-hash scheme, which required the archive to process the entire file F for each protocol verification. The size of this small portion is in fact independent of the length of F. The schematic view of this approach is shown in Figure 2. In this scheme special blocks (called sentinels) are hidden among the other blocks in the data file F. In the setup phase, the verifier randomly embeds these sentinels among the data blocks. During the verification phase, to check the integrity of the data file F, the verifier challenges the prover (cloud archive) by specifying the positions of a collection of sentinels and asking the prover to return the associated sentinel values. If the prover has modified or deleted a substantial portion of F, then with high probability it will also have suppressed a number of sentinels. It is therefore unlikely to respond correctly to the verifier. To make the sentinels indistinguishable from the data blocks, the whole modified file is encrypted and stored at the archive. This scheme is best suited for storing encrypted files. As this scheme involves the encryption of the file F using a secret key, it becomes computationally cumbersome, especially when the data to be encrypted is large. Hence, this scheme is disadvantageous for small users with limited computational power (PDAs, mobile phones, etc.). There will also be a storage overhead [14] at the server, partly due to the newly inserted sentinels and partly due to the error-correcting codes that are inserted. Also, the client needs to store all the sentinels, which may be a storage overhead for thin clients (PDAs, low-power devices, etc.).
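As a point of comparison for the schemes discussed in this section, the simple keyed-hash check can be sketched as follows. This is a minimal illustration under stated assumptions: HMAC-SHA256 stands in for the keyed hash $h_K(F)$ (the text does not name a specific function), and the helper names `precompute_checks` and `archive_respond` are hypothetical.

```python
import hashlib
import hmac
import os


def keyed_hash(key: bytes, data: bytes) -> bytes:
    """h_K(F); HMAC-SHA256 is an assumed stand-in for the keyed hash."""
    return hmac.new(key, data, hashlib.sha256).digest()


def precompute_checks(data: bytes, num_checks: int):
    """Verifier, before upload: one (key, digest) pair per future check."""
    checks = []
    for _ in range(num_checks):
        key = os.urandom(32)
        checks.append((key, keyed_hash(key, data)))
    return checks


def archive_respond(stored: bytes, key: bytes) -> bytes:
    """Archive: has to read the whole stored file for every challenge."""
    return keyed_hash(key, stored)


file_f = b"example file contents " * 1000
checks = precompute_checks(file_f, num_checks=3)

# Each key is released once and then discarded by the verifier.
key, expected = checks.pop()
ok_intact = hmac.compare_digest(archive_respond(file_f, key), expected)

key, expected = checks.pop()
tampered = file_f[:-1] + b"X"                     # last byte altered
ok_tampered = hmac.compare_digest(archive_respond(tampered, key), expected)
```

The sketch makes the two costs named above visible: the verifier's storage grows with `num_checks`, and `archive_respond` touches every byte of the file on each challenge.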

V. OUR CONTRIBUTION

We present a scheme which does not involve the encryption of the whole data. We encrypt only a few bits of data per data block, thus reducing the computational overhead on the clients.

Fig.6. A data file F with 6 data blocks

The client storage overhead is also minimized, as the client does not store any data. Hence our scheme is well suited to thin clients. In our data integrity protocol the verifier needs to store only a single cryptographic key, irrespective of the size of the data file F, and two functions which generate a random sequence. The verifier does not store any data [13]. Before storing the file at the archive, the verifier preprocesses the file, appends some meta data to it, and stores the result at the archive. At the time of verification the verifier uses this meta data to verify the integrity of the data. It is important to note that our proof of data integrity protocol only checks the integrity of the data, i.e., whether the data has been illegally modified or deleted. It does not prevent the archive from modifying the data. To prevent such modifications or deletions, other schemes such as redundant storage can be implemented, which is outside the scope of this paper.

VI. A DATA INTEGRITY PROOF IN CLOUD BASED ON SELECTING RANDOM BITS IN DATA BLOCKS


A. Setup phase

Suppose the verifier V wishes to store the file F with the archive. Let this file F consist of n file blocks, and let each of the n data blocks contain m bits. We first preprocess the file and create metadata to be appended to it. A typical data file F which the client wishes to store in the cloud is shown in Figure 7. The initial setup phase can be described in the following steps.

1) Generation of meta-data: Let g be a function defined as follows:

$$g(i, j) \rightarrow \{1, \dots, m\}, \quad i \in \{1, \dots, n\}, \; j \in \{1, \dots, k\} \qquad (3)$$

where k is the number of bits per data block which we wish to read as meta data. The function g generates for each data block a set of k bit positions within the m bits that are in the data block. Hence g(i, j) gives the position of the jth selected bit in the ith data block. The value of k is the choice of the verifier and is a secret known only to him. Therefore for each data block we get a set of k bits, and in total for all the n blocks we get n * k bits. Let $m_i$ represent the k bits of meta data for the ith block. Figure 8 shows a data block of the file F with random bits selected using the function g.

2) Encrypting the meta data: The meta data $m_i$ of each data block is encrypted using a suitable algorithm to give new, modified meta data $M_i$. Without loss of generality we show this process using a simple XOR operation. Let h be a function which generates a k-bit integer $a_i$ for each i. This function is a secret and is known only to the verifier V.

$$h : i \rightarrow a_i, \quad a_i \in \{0, \dots, 2^k - 1\} \qquad (4)$$

For the meta data $m_i$ of each data block, the number $a_i$ is added (bitwise, i.e., XORed, in this illustration) to get a new k-bit number $M_i$.

$$M_i = m_i + a_i \qquad (5)$$

In this way we get a set of n new meta data bit blocks. The encryption method can be improved to provide still stronger protection for the verifier's data.

3) Appending of meta data: All the meta data bit blocks that are generated using the above procedure are to be concatenated together. This concatenated meta data should be appended to the file F before storing it at the cloud server. The file F along with the appended meta data, $\tilde{F}$, is archived with the cloud. Figure 4 shows the encrypted file $\tilde{F}$ after appending the meta data to the data file F.

B. Verification phase

Fig.7. A data block of the file F with random bits selected in it

Fig.7. The encrypted file which will be stored in the cloud.

Suppose the verifier V wants to verify the integrity of the file F. It throws a challenge to the archive and asks it to respond. The challenge and the response are compared, and the verifier accepts or rejects the integrity proof. Suppose the verifier wishes to check the integrity of the ith block. The verifier challenges the cloud storage server by specifying the block number i and a bit number j generated by using the function g, which only the verifier knows. The verifier also specifies the position at which the meta data corresponding to the block i is appended. This meta data is a k-bit number. Hence the cloud storage server is required to send k+1 bits for verification by the client. The meta data sent by the cloud is decrypted by using the number $a_i$, and the corresponding bit in this decrypted meta data is compared with the bit that is sent by the cloud. Any mismatch between the two would mean a loss of the integrity of the client's data at the cloud storage.
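The challenge and response exchange described in this phase can be sketched end to end. As before, this is an illustrative sketch rather than the authors' code: g and h are assumed to be derived from the secret key with HMAC-SHA256, positions are 0-based, XOR is the illustrative encryption, and the `Archive` class and all function names are hypothetical. The helper definitions are repeated so that the block runs on its own.

```python
import hashlib
import hmac


def prf(key, label, *nums):
    msg = label + b"|" + b",".join(str(n).encode() for n in nums)
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def g(key, i, k, m):
    """k distinct secret bit positions in [0, m) for block i."""
    positions, counter = [], 0
    while len(positions) < k:
        p = prf(key, b"g", i, counter) % m
        if p not in positions:
            positions.append(p)
        counter += 1
    return positions


def h(key, i, k):
    """Secret k-bit number a_i for block i."""
    return prf(key, b"h", i) % (1 << k)


def get_bit(block, pos):
    return (block[pos // 8] >> (7 - pos % 8)) & 1


def make_metadata(key, blocks, k):
    """Setup: M_i = m_i XOR a_i for every block."""
    m = len(blocks[0]) * 8
    meta = []
    for i, block in enumerate(blocks):
        m_i = 0
        for pos in g(key, i, k, m):
            m_i = (m_i << 1) | get_bit(block, pos)
        meta.append(m_i ^ h(key, i, k))
    return meta


class Archive:
    """Untrusted store holding the data blocks and the appended meta data."""

    def __init__(self, blocks, meta):
        self.blocks, self.meta = list(blocks), list(meta)

    def respond(self, i, pos):
        # k + 1 bits in total: one data bit plus the k-bit meta data M_i.
        return get_bit(self.blocks[i], pos), self.meta[i]


def verify(key, archive, i, j, k, m):
    """Challenge bit j of block i and compare it with the decrypted meta data."""
    pos = g(key, i, k, m)[j]
    data_bit, M_i = archive.respond(i, pos)
    m_i = M_i ^ h(key, i, k)                      # decrypt with a_i
    expected_bit = (m_i >> (k - 1 - j)) & 1       # j-th selected bit
    return data_bit == expected_bit


key, k, m = b"verifier-secret-key", 8, 128
blocks = [bytes([17 * i] * 16) for i in range(6)]
archive = Archive(blocks, make_metadata(key, blocks, k))
intact_ok = all(verify(key, archive, i, j, k, m)
                for i in range(6) for j in range(k))

# Simulate corruption: invert every bit of block 2 at the archive.
archive.blocks[2] = bytes(b ^ 0xFF for b in archive.blocks[2])
block2_ok = any(verify(key, archive, 2, j, k, m) for j in range(k))
```

In this sketch a single challenge covers one of the k selected bits of a block, so a change that misses all selected positions would pass; inverting the whole block, as done here, makes every challenge on that block fail.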

VII. CONCLUSION AND FUTURE WORKS


REFERENCES

[1] Dai Yuefa, Wu Bo, Gu Yaqiang, Zhang Quan, and Tang Chaojing, “Data Security Model for Cloud Computing,” Proceedings of the 2009 International Workshop on Information Security and Application.

[2] Cong Wang, Qian Wang, Kui Ren, and Wenjing Lou, “Ensuring Data Storage Security in Cloud Computing,” Trans. Storage, vol. 3, no. 2, pp. 121–158, 2008.

[3] E. Mykletun, M. Narasimha, and G. Tsudik, “Authentication and integrity in outsourced databases,” Trans. Storage, vol. 2, no. 2, pp. 107–138, 2006.

[4] D. X. Song, D. Wagner, and A. Perrig, “Practical techniques for searches on encrypted data,” in SP ’00: Proceedings of the 2000 IEEE Symposium on Security and Privacy. Washington, DC, USA: IEEE Computer Society, 2000, p. 44.

[5] A. Juels and B. S. Kaliski, Jr., “Pors: proofs of retrievability for large files,” in CCS ’07: Proceedings of the 14th ACM conference on Computer and communications security. New York, NY, USA: ACM, 2007, pp. 584–597.

[6] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, “Provable data possession at untrusted stores,” in CCS ’07: Proceedings of the 14th ACM conference on Computer and communications security. New York, NY, USA: ACM, 2007, pp. 598–609.

[7] L. M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, “A break in the clouds: towards a cloud definition,” ACM SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 50–55, 2009.

[8] S. Kamara and K. Lauter, “Cryptographic cloud storage,” in RLCPS, January 2010, LNCS. Springer, Heidelberg.

[9] A. Singhal, “Modern information retrieval: A brief overview,” IEEE Data Engineering Bulletin, vol. 24, no. 4, pp. 35–43, 2001.

[10] I. H. Witten, A. Moffat, and T. C. Bell, “Managing gigabytes: Compressing and indexing documents and images,” Morgan Kaufmann Publishing, San Francisco, May 1999.

[11] D. Song, D. Wagner, and A. Perrig, “Practical techniques for searches on encrypted data,” in Proc. of S&P, 2000.

[12] E.-J. Goh, “Secure indexes,” Cryptology ePrint Archive, 2003, http://eprint.iacr.org/2003/216.

[13] Y.-C. Chang and M. Mitzenmacher, “Privacy preserving keyword searches on remote encrypted data,” in Proc. of ACNS, 2005.

[14] R. Curtmola, J. A. Garay, S. Kamara, and R. Ostrovsky, “Searchable symmetric encryption: improved definitions and efficient constructions,” in Proc. of ACM CCS, 2006.

AUTHOR’S PROFILE

Ginjala Srikanth Reddy

Received B.Tech degree in Computer Science & Engineering from JNTU Hyderabad and M.Tech degree in Web Technologies from JNTU Hyderabad, and is currently working as an Asst. Professor, Dept. of CSE, in Khammam Institute of Technology & Sciences, Khammam. G. Srikanth Reddy may be reached at E-mail ID: [email protected]

Annamchinni Dileep Kumar

Received MCA degree in Computer Applications from KU Warangal, and is pursuing M.Tech degree in Computer Science & Engineering from JNTU Hyderabad. Annamchinni Dileep Kumar may be reached at E-mail ID: [email protected]

P. Narasimha Rao

Received MCA degree in Computer Applications from KU Warangal and M.Tech degree in Web Technologies from JNTU Hyderabad, and is currently working as an Assoc. Professor, Dept. of CSE, in SRR Engineering College, Khammam. P. Narasimha Rao may be reached at E-mail ID: [email protected]