Implementation of Authorized Deduplication:
An Approach for Secure Cloud Environment
Vivek Waghmare, Smita Kapse, Purnima Selokar
Student, Dept. of CSE, YCCE, Nagpur, India
Assistant Professor, Dept. of CSE, YCCE, Nagpur, India
Assistant Professor, Dept. of CSE, GHRIETW, Nagpur, India
ABSTRACT: Cloud computing provides individual users with practically unlimited storage space and lets them access their data anytime, anywhere. By integrating data deduplication into cloud storage, a cloud service provider can make the most of its storage space, because deduplication removes the redundant data that occurs in a cloud environment. Preserving data privacy is also an important concern, and data confidentiality must be achieved in a way that still supports deduplication. Therefore, a convergent encryption technique is used to encrypt the data before it is outsourced. This paper proposes how cloud service and storage providers can employ data deduplication without gaining access to either the user's plaintext or the user's decrypted data.
KEYWORDS: Deduplication; privacy preserving; convergent encryption.
I. INTRODUCTION
The main purpose of cloud computing is to provide services based on distributed-machine and virtual-machine technology. It offers users affordable computing and storage services, which benefits them greatly. In one recently reported case, the New York Times rented Amazon's cloud computing services and used them to convert more than 11 million stories into electronic documents for users to browse in a single day, at a total cost of around $240 [2]. As cloud computing becomes more popular and the number of users grows, security issues concerning user data in the cloud environment are emerging [3]. Users may move sensitive data into a cloud environment, so providing proper security controls to protect the integrity and privacy of user data remains a main concern [13]. Cloud computing delivers different computing services to users on demand over the Internet, such as SaaS, PaaS and IaaS [10]. Today's cloud service providers offer clients services such as large storage space and parallel computing resources at a very low price. Data deduplication is a newer technology for coping with the rapidly increasing amount of digital data in cloud storage: it identifies redundant data so that only a single unique copy is stored, and that copy then serves all authorized users [12].
A. User: An authorized user who wants to outsource a file to the cloud environment. The system authenticates the user before he or she can enter the system and upload data, and grants the user a particular set of privileges for accessing the uploaded files.
B. Public cloud: A public cloud is a storage space that stores users' files. In this system we include an authorized deduplication technique that does not allow duplicate copies of a file to be stored. Uploaded files are kept in encrypted form, and only authorized users can decrypt them.
C. Administrator: The administrator acts as a third-party auditor (TPA): it has expertise and capabilities that users do not have and is trusted to assess the reliability of the cloud storage services on the user's behalf, upon request. The administrator manages the private keys for privileges and receives file token requests from users.
If the token is present in the database, an acknowledgement is sent stating that a duplicate file was found, so the file cannot be uploaded to the cloud. If the token is not present, the token is added to the database and an acknowledgement is sent stating that no duplicate file was found, so the file can be uploaded to the cloud. At upload time, the file is encrypted using its token as the encryption key, and the encrypted file is then uploaded to the cloud. Consequently, all files stored on the public cloud are in encrypted form [1].
Fig. 1. Architecture of the System
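The duplicate-check and upload flow described above can be sketched as follows. This is a minimal in-memory illustration, not the authors' implementation: the `DedupServer` class, the `upload` function and the toy `keystream_xor` cipher (SHA-256 in counter mode, standing in for a real cipher such as AES) are hypothetical, and a Python `set` and `dict` stand in for the token database and the public cloud.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (hypothetical): XOR data with SHA-256(key || block index).

    Stands in for a real cipher such as AES; not for production use.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for index, offset in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + index.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class DedupServer:
    """Hypothetical in-memory stand-ins for the token database and public cloud."""

    def __init__(self) -> None:
        self.token_db = set()   # tokens of files already stored
        self.cloud = {}         # token -> encrypted file

    def check_and_register(self, token: str) -> bool:
        """Return True if the token is new (upload allowed), False if duplicate."""
        if token in self.token_db:
            return False
        self.token_db.add(token)   # token not present: record it
        return True


def upload(server: DedupServer, data: bytes) -> str:
    token = hashlib.sha256(data).hexdigest()   # SHA-256 token of the file
    if not server.check_and_register(token):
        return "duplicate file found: upload skipped"
    key = bytes.fromhex(token)                 # token reused as the encryption key
    server.cloud[token] = keystream_xor(key, data)
    return "duplicate file not found: file uploaded"


if __name__ == "__main__":
    server = DedupServer()
    print(upload(server, b"quarterly report"))  # duplicate file not found: file uploaded
    print(upload(server, b"quarterly report"))  # duplicate file found: upload skipped
```

Because the same value serves as both the token and the key in this flow, whoever checks tokens could also decrypt the stored file; [13] instead computes tokens with privilege keys held by a private cloud, which keeps the two values separate.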
II. RELATED WORK
In [9], the authors present a system that combines block-level deduplication with convergent encryption and uses a metadata manager for key management. The system, which is built on top of convergent encryption, achieves confidentiality and enables block-level deduplication at the same time. The authors showed that block-level deduplication is worth performing instead of file-level deduplication, because the storage-space gains are not offset by the overhead of metadata management, which is minimal. The paper mainly focuses on defining the two most important operations in cloud storage: storage and retrieval.
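To illustrate why block-level deduplication can save more space than file-level deduplication, the sketch below counts the bytes that must be stored under each policy. It is a simplified assumption, not the design of [9]: it uses fixed-size blocks, identifies blocks and files by their SHA-256 digests, and ignores the metadata overhead discussed above. The function names and the tiny block size are hypothetical choices for readability.

```python
import hashlib


def file_level_stored_bytes(files: list) -> int:
    """Bytes stored when only whole, byte-identical files are shared."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


def block_level_stored_bytes(files: list, block_size: int = 4) -> int:
    """Bytes stored when identical fixed-size blocks are shared across files."""
    unique = {}
    for f in files:
        for start in range(0, len(f), block_size):
            block = f[start:start + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


if __name__ == "__main__":
    # Two files that differ only in their last block.
    files = [b"AAAABBBBCCCC", b"AAAABBBBDDDD"]
    print(file_level_stored_bytes(files))   # 24: the files differ, so both are kept
    print(block_level_stored_bytes(files))  # 16: AAAA and BBBB are stored once
```

Real systems use blocks of kilobytes rather than 4 bytes; the per-block metadata that this sketch omits is the overhead that [9] reports to be small.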
In [13], the authors use symmetric encryption, convergent encryption and proof of ownership. They propose an authorized data deduplication technique that protects data security by including the differential privileges of users in the duplicate check. In this system, file tokens are generated by a private cloud server using private keys. The security analysis shows that the schemes are secure against the insider and outsider attacks specified in the proposed security model.
III. PROPOSED SCHEME
The terminology used in the authorized deduplication system is listed below:
A. Symmetric Encryption: An encryption technique in which the same key k is used for both encryption and decryption of data [13]. A symmetric encryption scheme consists of the three primitive functions shown below:
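KeyGen($1^{\lambda}$): the key generation algorithm, which generates a key k from the security parameter $1^{\lambda}$.
Enc(k, M): the symmetric encryption algorithm, which takes the secret key k and a message M and outputs the ciphertext C.
Dec(k, C): the symmetric decryption algorithm, which takes the secret key k and the ciphertext C and outputs the original message M [14].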
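The three primitives can be sketched in Python as follows. This is a hedged illustration only: the keystream built from SHA-256 in counter mode is a toy stand-in for a standard cipher such as AES, it provides no integrity protection, the function names `keygen`, `enc` and `dec` are hypothetical, and λ is passed as a number of bits. Because `enc` draws a fresh random nonce, two encryptions of the same message produce different ciphertexts, which is exactly the property that prevents deduplication of conventionally encrypted data.

```python
import hashlib
import secrets

NONCE_LEN = 16  # bytes of fresh randomness per encryption


def keygen(security_bits: int = 256) -> bytes:
    """KeyGen(1^λ): sample a uniformly random key of λ bits (λ a multiple of 8)."""
    return secrets.token_bytes(security_bits // 8)


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter) blocks, truncated to length."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(stream[:length])


def enc(key: bytes, message: bytes) -> bytes:
    """Enc(k, M) -> C: prepend a random nonce, then XOR M with the keystream."""
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(key, nonce, len(message))
    return nonce + bytes(m ^ s for m, s in zip(message, stream))


def dec(key: bytes, ciphertext: bytes) -> bytes:
    """Dec(k, C) -> M: the same key k regenerates the keystream and recovers M."""
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


if __name__ == "__main__":
    k = keygen(256)
    c1, c2 = enc(k, b"same file"), enc(k, b"same file")
    print(c1 != c2)    # True (with overwhelming probability)
    print(dec(k, c1))  # b'same file'
```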
B. Deduplication: A hash function can be used to compute a key for a file based only on the contents of its data. With a cryptographic hash function, different files receive different keys with overwhelming probability, while two files with the same data always receive the same key [14]. If this key is used as the index for storing files, any attempt to store multiple copies of the same file is detected immediately [13].
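A minimal sketch of this hash-indexed storage idea follows, assuming SHA-256 as the hash function and an in-memory dictionary as the store; the `ContentAddressedStore` class is hypothetical.

```python
import hashlib


class ContentAddressedStore:
    """Hypothetical store that keeps each file under the SHA-256 digest of its contents."""

    def __init__(self) -> None:
        self.blobs = {}  # hex digest -> file data

    def put(self, data: bytes) -> tuple:
        """Store data if new; return (key, is_duplicate)."""
        key = hashlib.sha256(data).hexdigest()
        if key in self.blobs:
            return key, True       # same content -> same key -> duplicate detected
        self.blobs[key] = data
        return key, False


if __name__ == "__main__":
    store = ContentAddressedStore()
    print(store.put(b"hello")[1])   # False: first copy is stored
    print(store.put(b"hello")[1])   # True: second copy is detected as a duplicate
    print(store.put(b"hello!")[1])  # False: any change in content yields a new key
    print(len(store.blobs))         # 2
```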
C. Convergent Encryption: Conventional encryption is incompatible with deduplication: two identical files encrypted with different keys produce different ciphertexts, which can no longer be shared. Convergent encryption was introduced to overcome this problem. It derives the encryption key from the hash of the plaintext's content [12], [14]. Because the key depends only on the content, two users with identical plaintexts obtain the same key and therefore identical ciphertexts. Since each file now has its own encryption key, each owner needs some mechanism to record and retrieve the keys associated with his or her data blocks [14]. Because the encryption key is generated from the plaintext itself, users do not need to agree on a key in advance. Hence, convergent encryption is well suited to deduplication in a cloud environment [13].
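The convergent encryption idea can be sketched as follows, assuming SHA-256 as the hash H so that K = H(M). The deterministic keystream built from SHA-256 is a toy stand-in for a real block cipher such as AES, and all function names are hypothetical. The essential point is that no random nonce is used, so two users encrypting the same file independently obtain the same key and the same ciphertext.

```python
import hashlib


def convergent_key(plaintext: bytes) -> bytes:
    """K = H(M): the key is derived from the file content itself (SHA-256)."""
    return hashlib.sha256(plaintext).digest()


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks; stand-in for e.g. AES."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(stream[:length])


def convergent_encrypt(plaintext: bytes):
    """Return (K, C). No random nonce is used, so equal plaintexts give equal C."""
    key = convergent_key(plaintext)
    stream = _keystream(key, len(plaintext))
    return key, bytes(p ^ s for p, s in zip(plaintext, stream))


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """The owner must keep K (key management) to recover the plaintext."""
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


if __name__ == "__main__":
    k1, c1 = convergent_encrypt(b"quarterly report")   # user A
    k2, c2 = convergent_encrypt(b"quarterly report")   # user B, no key agreement
    print(k1 == k2, c1 == c2)                           # True True
    print(convergent_decrypt(k1, c1))                   # b'quarterly report'
```

The same determinism that enables deduplication also means that anyone who can guess a file's content can recompute its ciphertext and confirm a match.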
D. Token Generation: Token generation is a function that calculates the hash value of a file. A cryptographic hash function produces a value that is, with overwhelming probability, unique to each file; the authorized deduplication system uses SHA-256 to calculate the hash value of a file. Fig. 2 shows how a token is generated for a file: a variable-length file is passed through the cryptographic hash function H, which produces a fixed-length value that serves as the token for the given file [1].
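A sketch of SHA-256 token generation using Python's standard `hashlib` module follows. Reading the file in chunks is an implementation assumption (the function name `file_token` and the 64 KiB chunk size are hypothetical); it lets a variable-length file of any size be hashed without loading it entirely into memory, while the token is always the same fixed length.

```python
import hashlib
import io


def file_token(stream, chunk_size: int = 64 * 1024) -> str:
    """Token = SHA-256 of the whole file, read in chunks so a large file need not fit in memory.

    Returns the fixed-length 256-bit digest as 64 hexadecimal characters,
    whatever the length of the input.
    """
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # In practice: with open(path, "rb") as f: token = file_token(f)
    print(file_token(io.BytesIO(b"abc")))
    # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```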
IV. EXPERIMENTAL RESULTS
A. File Size: This factor affects the time required to process a file in the authorized deduplication system. The time required for encryption and upload increases with file size, whereas the time for the other operations, such as token generation and the duplicate check, remains roughly constant.
B. Number of Stored Files: The number of files stored in the system is the main factor affecting the duplicate check. When a large number of files is stored, the time needed to check a file's token increases, because tokens are checked in a hash table with the help of linear search. Despite the linear search, the duplicate check does not reach its worst case, because the collision probability is low.
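One possible reading of "a hash table with linear search" is open addressing with linear probing, sketched below. This is an assumption, not the authors' code: the `LinearProbingTokenTable` class is hypothetical, uses a fixed capacity, and omits resizing and deletion. The number of probes per check stays close to one while collisions are rare and grows as the table fills, and the worst case (scanning the whole table) is reached only when many tokens collide.

```python
import hashlib
from typing import List, Optional, Tuple


class LinearProbingTokenTable:
    """Hypothetical token table: open addressing with linear probing.

    Fixed capacity, no resizing or deletion, to keep the sketch short.
    """

    def __init__(self, capacity: int = 1024) -> None:
        self.slots: List[Optional[str]] = [None] * capacity
        self.size = 0

    def _home_slot(self, token: str) -> int:
        # Tokens are SHA-256 hex digests, so their leading digits are already uniform.
        return int(token[:16], 16) % len(self.slots)

    def check_and_insert(self, token: str) -> Tuple[bool, int]:
        """Return (is_duplicate, probes); inserts the token if it is new."""
        n = len(self.slots)
        i = self._home_slot(token)
        for probes in range(1, n + 1):
            slot = self.slots[i]
            if slot is None:            # empty slot: token is new
                self.slots[i] = token
                self.size += 1
                return False, probes
            if slot == token:           # token already stored: duplicate
                return True, probes
            i = (i + 1) % n             # collision: scan linearly to the next slot
        raise RuntimeError("table is full and the token is not present")


if __name__ == "__main__":
    table = LinearProbingTokenTable(capacity=1024)
    tokens = [hashlib.sha256(str(i).encode()).hexdigest() for i in range(700)]
    probes = [table.check_and_insert(t)[1] for t in tokens]
    print("avg probes, first 100 inserts:", sum(probes[:100]) / 100)
    print("avg probes, last 100 inserts:", sum(probes[-100:]) / 100)
    print(table.check_and_insert(tokens[0]))  # first element True: duplicate detected
```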
Table 1 shows the time required for each individual process in the authorized deduplication system. Its columns are the file size, the duplicate checking time, the file uploading time and, in the last column, the total time required for the complete process (the sum of the duplicate checking time and the uploading time). As the table shows, the total time required to process a file increases as the file size increases. The duplicate checking time also keeps increasing as the number of stored files increases.
Table 1. Experimental Results
Sr. No.  File Size (KB)  Duplicate Checking Time (s)  File Uploading Time (s)  Total Time (s)
1        2               0.171                        4.762                    4.933
2        2               0.083                        2.647                    2.73
4        6               0.076                        4.315                    4.391
5        7               0.067                        3.939                    4.006
6        8               0.1                          3.228                    3.328
7        8               0.091                        3.742                    3.833
8        12              0.07                         3.939                    4.009
9        21              0.018                        4.13                     4.148
10       22              0.045                        2.237                    2.282
11       22              0.068                        3.804                    3.872
12       24              0.095                        3.93                     4.025
13       42              0.07                         5.02                     5.09
14       44              0.082                        5.863                    5.945
15       60              0.065                        5.4                      5.465
17       76              0.068                        7.077                    7.145
18       121             0.089                        8.828                    8.917
19       196             0.13                         11.223                   11.353
20       291             0.15                         16.491                   16.641
21       382             0.117                        17.136                   17.253
22       446             0.145                        21.939                   22.084
24       510             0.712                        24.571                   25.283
25       710             0.751                        32.795                   33.546
26       745             0.73                         36.149                   36.879
27       777             0.748                        34.792                   35.54
28       893             0.824                        39.637                   40.461
29       919             0.897                        40.336                   41.233
Table 2 shows how much time is required for complete processing when duplicate files are uploaded to the cloud environment. Its columns are the file size, the duplicate checking time, the file uploading time and, in the last column, the total time required for the complete process. If a duplicate is detected, the time required for uploading the file is saved.
Table 2. Deduplication Time
Sr. No.  File Size (KB)  Duplicate Checking Time (s)  File Uploading Time (s)  Total Time (s)
1        60              0.065                        5.4                      5.465
2        76              0.068                        7.077                    7.145
3        121             0.089                        8.828                    8.917
4        196             0.13                         11.223                   11.353
5        291             0.15                         16.491                   16.641
6        382             0.117                        17.136                   17.253
7        446             0.145                        21.939                   22.084
8        510             0.712                        24.571                   25.283
9        710             0.751                        32.795                   33.546
10       745             0.73                         36.149                   36.879
11       777             0.748                        34.792                   35.54
12       893             0.824                        39.637                   40.461
13       919             0.897                        40.336                   41.233
14       55              0.811                        14.378                   15.189
V. ANALYSIS
A. File Size: This factor affects the time required to process a file in the authorized deduplication system. The time required for encryption and upload increases with file size. Figure 3 shows that the total time required for the complete process increases as the file size increases.
Fig. 3. File Size
B. Number of Stored Files: When a large number of files is stored in the system, the time taken to check each file's token increases. As Figure 4 shows, the duplicate checking time increases as the number of stored files increases.
Fig. 4. Number of Stored Files (x-axis: Sr. No.; y-axis: Duplicate Checking Time)
C. Deduplication Ratio: The deduplication ratio for space is defined as the percentage of storage space saved by the authorized deduplication system compared with the original system. Figure 5 shows that around 30% of the data was found to be duplicate, and this duplicate data was removed by the authorized deduplication system. The bandwidth that would have been required to transfer this 30% of duplicate data was also saved. The deduplication ratio for time is defined as the percentage of time saved by the authorized deduplication system compared with the original system. If duplicate data is found, the authorized deduplication system does not allow it to be uploaded to cloud storage, so the time required for uploading is also saved. As Figure 5 shows, handling a duplicate file requires only about 2% of the total time of the complete process.
Fig. 5. Deduplication Ratio for Space (Total Uploaded Size: 70%; Duplicate Found Size: 30%) and Deduplication Ratio for Time
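The two ratios can be written as simple percentages. This sketch assumes that the space ratio divides the duplicate data by all data submitted for upload, and that the time ratio divides the duplicate checking time by the total time of a complete upload; the function names are hypothetical. The example applies the time ratio to two individual rows of Table 2.

```python
def space_saving_percent(attempted_bytes: float, duplicate_bytes: float) -> float:
    """Share of all data submitted for upload that was rejected as duplicate."""
    return 100.0 * duplicate_bytes / attempted_bytes


def time_percent(duplicate_check_s: float, complete_process_s: float) -> float:
    """Time spent on a rejected duplicate relative to a complete upload."""
    return 100.0 * duplicate_check_s / complete_process_s


if __name__ == "__main__":
    # Table 2, Sr. No. 1 (60 KB): 0.065 s duplicate check vs. 5.465 s total.
    print(round(time_percent(0.065, 5.465), 2))    # 1.19
    # Table 2, Sr. No. 13 (919 KB): 0.897 s vs. 41.233 s.
    print(round(time_percent(0.897, 41.233), 2))   # 2.18
```

For these two rows the per-file time ratio is roughly 1.2% and 2.2%, in line with the figure of about 2% reported above.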
VI. CONCLUSION AND FUTURE WORK
In this paper, we propose secure authorized deduplication with the help of a token generation mechanism. Cloud storage services offer on-demand virtualized storage resources, and customers pay only for the space they actually consume.
With the increasing demand for cloud storage and the growing amount of data stored in the cloud, data deduplication is one of the techniques used to improve storage efficiency. Data deduplication is a specialized data compression technique that eliminates duplicate copies of data in storage. In this paper, we also presented a convergent encryption mechanism that provides security for our system: files are encrypted with convergent encryption before they are stored on cloud storage. The deduplication technique therefore brings many benefits while also addressing security and privacy concerns. As the analysis shows, handling a duplicate file requires only around 2% of the total time required for a complete upload, and around 30% of the data was found to be duplicate, so the corresponding storage space was also saved. Hence, we conclude from the analysis that the authorized deduplication system saves the space, time and bandwidth required for storing data in a cloud environment.
REFERENCES
1. Smita Kapse; Vivek Waghmare, “Authorized Deduplication: An Approach for Secure Cloud Environment,” in Proceedings of the 1st International Conference on Information Security and Privacy, Nagpur, pp. 815-823, 2016.
2. Du Meng, “Data Security in Cloud Computing,” in Proceedings of the 8th International Conference on Computer Science and Education, Colombo, pp. 810-813, 2013.
3. Fawaz S. An-Anzi; Ayed A. Salman; Noby K. Jocob; Jyoti Soni, “Towards Robust, Scalable and Secure Network Storage in Cloud Computing,” in Proceedings of the 4th International Conference on Digital Information and Communication Technology and its Applications, Bangkok, pp. 51-55, 2014.
4. Shai Halevi; Danny Harnik; Benny Pinkas; Alexandra Shulman-Peleg, “Proofs of Ownership in Remote Storage Systems,” in Proceedings of the 18th ACM Conference on Computer and Communications Security, New York, pp. 491-500, 2011.
5. Waraporn Leesakul; Paul Townend; Jie Xu, “Dynamic Data Deduplication in Cloud Storage,” in Proceedings of the IEEE 8th International Symposium on Service Oriented System Engineering, Oxford, pp. 320-325, 2014.
6. Jingwei Li; Jin Li; Dongqing Xie; Zhang Cai, “Secure Auditing and Deduplicating Data in Cloud,” IEEE Transactions on Computers, pp. 1, 2014.
7. Fatema Rashid; Ali Miri; Isaac Woungang, “A Secure Data Deduplication Framework for Cloud Environment,” in Proceedings of the 10th Annual International Conference on Privacy, Security and Trust, Paris, pp. 81-87, 2012.
8. Xuexue Jin; Lingbo Wei; Mengke Yu; Nenghai Yu; Jinyuan Sun, “Anonymous Deduplication of Encrypted Data with Proof of Ownership in Cloud Storage,” in Proceedings of the 2nd IEEE/CIC International Conference on Communications, Xi’an, pp. 224-229, 2013.
9. Pasquale Puzio; Refik Molva; Melek Onen; Sergio Loureiro, “ClouDedup: Secure Deduplication with Encrypted Data for Cloud Storage,” in Proceedings of the IEEE International Conference on Cloud Computing Technology and Science, Bristol, pp. 363-370, 2013.
10. Hu Shuijing, “Data Security: the Challenges of Cloud Computing,” in Proceedings of the Sixth International Conference on Measuring Technology and Mechatronics Automation, Zhangjiajie, pp. 203-206, 2014.
11. V. Nirmala; R. K. Sivanandhan; R. Shanmuga Lakshmi, “Data Confidentiality and Integrity verification using user authentication scheme in cloud,” in Proceedings of the IEEE Conference on Green High Performance Computing, Nagercoil, pp. 1-5, 2013.
12. Paul Anderson; Le Zhang, “Fast and Secure Laptop Backups with Encrypted De-duplication,” in Proceedings of the 24th ACM International Conference on Large Installation System Administration, Berkeley, pp. 1-8, 2010.
13. Jin Li; Yan Kit Li; Xiaofeng Chen; Patrick P. C. Lee; Wenjing Lou, “A Hybrid Cloud Approach for Secure Authorized Deduplication,” IEEE Transactions on Parallel and Distributed Systems, pp. 1206-1216, 2015.
14. Jin Li; Yan Kit Li; Xiaofeng Chen; Patrick P. C. Lee; Wenjing Lou, “Secure Deduplication with Efficient and Reliable Convergent Key Management,” IEEE Transactions on Parallel and Distributed Systems, pp. 1615-1625, 2014.
15. Kirubakaran. R; Prathibhan; C Mano; Karthika, C, “A cloud based model for deduplication of large data,” in Proceedings of the IEEE International Conference on Engineering and Technology, Coimbatore, pp. 1-4, 2015.
BIOGRAPHY
Vivek Waghmare is a student in the Computer Science and Engineering Department, Yeshwantrao Chavan College of Engineering, an autonomous engineering college under Nagpur University, India.
Smita R. Kapse is an Assistant Professor in the Computer Science and Engineering Department, Yeshwantrao Chavan College of Engineering, an autonomous engineering college under Nagpur University, India.