Technology (IJRCIT), Vol. 3, Issue 2, March-2018 ISSN:2455-3743

Copy Right to GARPH Page 22

“IMPLEMENTATION OF DATA DEDUPLICATION OVER THE CLOUD”

1ANKITA SHAHU

Department of Computer Science and Engineering, Gurunanak Institute of Engineering and Technology, Nagpur, India

2TINA THAKUR

Department of Computer Science and Engineering, Gurunanak Institute of Engineering and Technology, Nagpur, India

3BHAWNA PUROHIT

Department of Computer Science and Engineering, Gurunanak Institute of Engineering and Technology, Nagpur, India

4AKSHAY PATIL

Department of Computer Science and Engineering, Gurunanak Institute of Engineering and Technology, Nagpur, India

5MAYUR ATHWANI

Department of Computer Science and Engineering, Gurunanak Institute of Engineering and Technology, Nagpur, India mayur.athwani.014@gmail.com

6PROF. AYAZ KHAN

Department of Computer Science and Engineering, Gurunanak Institute of Engineering and Technology, Nagpur, India

ABSTRACT: Cloud computing is the practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer. The most important and popular cloud service is data storage.

Data of primary importance is stored on the cloud in encrypted form to maintain its privacy, because attackers who gain access to proprietary data can cause serious damage.

In cloud storage, cloud service providers use deduplication to manage storage space, preserve privacy, and avoid redundancy: duplicate copies are eliminated and only a single copy of each file is kept. However, in big data management, deduplicating encrypted files is a tedious task. Traditional deduplication schemes cannot work on encrypted data, because identical data encrypted under different keys is stored as different content. Existing schemes for deduplicating encrypted data either lack privacy or are weak in providing it. This paper proposes a system for data deduplication over cloud storage.

Keywords: deduplication, Cloud computing, cloud service provider, redundancy

1. INTRODUCTION

Cloud computing offers different services by reorganizing resources such as storage space and providing them to users on demand. It relies on a large shared pool of resources and a complex communication network on the cloud server side, which allows it to deliver services on demand in a constant, scalable, and quick manner. This kind of infrastructure increases the system's complexity at the host end and makes reliability difficult to analyze, whereas scalability is a major concern for ensuring the availability of important services and application execution [1]. Cloud clients transfer personal or confidential data to the servers of a Cloud Service Provider (CSP) and allow it to maintain this data. The main goal is to safeguard the privacy of data owners, so data is usually stored in the cloud in encrypted and distributed form. However, this poses new challenges for cloud deduplication, which becomes vital for massive data storage and processing in the cloud. Traditional deduplication schemes cannot work on encrypted data.

2. LITERATURE REVIEW

Himshai Kambo et al. [1] proposed and implemented a secure deduplication mechanism based on content-defined chunking (CDC) and MD5. In this mechanism, a real dataset is collected and the CDC algorithm is applied to it to create variable-size chunks.

MD5 is used to compute a hash value for each chunk created by CDC, and the resulting sorted chunk file is fed into the authors' earlier model, DeDup, which yielded improvements such as higher deduplication efficiency and better security. In the CDC mechanism, masking is used to achieve a high deduplication ratio. The mechanism uses the Hadoop Distributed File System at the back end, which helps improve network bandwidth. A DeDup application provides a GUI for the chunking mechanism, making it easier to work with encrypted data on cloud storage.
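The chunk-then-hash idea described for [1] can be sketched as follows. This is a minimal illustration, not the implementation from [1]: the polynomial rolling hash stands in for the Rabin fingerprint, and the names `cdc_chunks` and `store_file` as well as the window, mask, and size limits are assumed values chosen for the example.

```python
import hashlib

WINDOW = 16                      # rolling-hash window in bytes (assumed value)
MASK = 0x3F                      # cut when the low 6 hash bits are zero (assumed)
BASE, MOD = 257, (1 << 31) - 1   # parameters of the polynomial rolling hash


def cdc_chunks(data: bytes, min_size: int = 32, max_size: int = 512) -> list:
    """Split data into variable-size chunks at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # take in the newest byte
        size = i - start + 1
        # The mask test decides boundaries from content; max_size forces a cut.
        if (size >= min_size and (h & MASK) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])    # trailing chunk
    return chunks


def store_file(index: dict, data: bytes) -> int:
    """Add a file's chunks to the index; return how many chunks were new."""
    new = 0
    for chunk in cdc_chunks(data):
        fingerprint = hashlib.md5(chunk).hexdigest()   # MD5 chunk fingerprint
        if fingerprint not in index:
            index[fingerprint] = chunk
            new += 1
    return new
```

Uploading the same file a second time through `store_file` adds no new chunks, because every MD5 fingerprint is already present in the index. MD5 is used here only because [1] names it; it is not collision resistant.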

Ankush R. Deshmukh et al. [2] proposed a system in which data is stored on the cloud in encrypted form. Before data is stored, it is checked for duplicates, so that cloud space is not wasted and the rent the user pays to CSPs is reduced. An access permission mechanism is also used, under which a user can hold decryption and deduplication permissions. When data is stored on the cloud, each data item is assigned a life span, and a self data destruction algorithm automatically deletes data whose life span has expired.
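A rough sketch of the life-span idea from [2] is given below. It is an assumption-laden illustration rather than the algorithm from [2]: the class name `ExpiringDedupStore`, the SHA-256 fingerprint, and the explicit `now` argument (used instead of a real clock so that the behaviour is reproducible) are all hypothetical, and encryption is left out for brevity.

```python
import hashlib


class ExpiringDedupStore:
    """Toy store: one copy per distinct content, each with an expiry time."""

    def __init__(self):
        self.blobs = {}   # fingerprint -> (data, expires_at)

    def put(self, data: bytes, lifespan: float, now: float):
        """Store data once; return (fingerprint, was_duplicate)."""
        fingerprint = hashlib.sha256(data).hexdigest()
        is_duplicate = fingerprint in self.blobs
        expires_at = now + lifespan
        if is_duplicate:
            # Keep the copy alive until the latest requested expiry.
            expires_at = max(expires_at, self.blobs[fingerprint][1])
        self.blobs[fingerprint] = (data, expires_at)
        return fingerprint, is_duplicate

    def purge(self, now: float) -> int:
        """Delete every item whose life span has finished; return the count."""
        expired = [fp for fp, (_, exp) in self.blobs.items() if exp <= now]
        for fp in expired:
            del self.blobs[fp]
        return len(expired)
```

In a deployment, `purge` would be triggered periodically with the current time; here the caller passes the time explicitly.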

R. Shobana et al. [3] describe deduplication as compressing data by removing duplicate copies of identical data; it is extensively used in cloud storage to save bandwidth and minimize storage space. To protect the confidentiality of sensitive data during deduplication, the convergent encryption technique is used to encrypt the data before outsourcing.

Dastagir Shaikh et al. [4] proposed a scheme to deduplicate encrypted data stored in the cloud based on ownership challenge and proxy re-encryption. It integrates cloud data deduplication with access control. The authors evaluated its performance through extensive analysis and computer simulations, and report that the results show superior efficiency and effectiveness for potential practical deployment, especially for big data deduplication in cloud storage.

G. Karthika et al. [5] note that cloud computing is an emerging technology that provides many services, including software, infrastructure, and storage. Storage as a service is a prominent service that lets users use their storage space efficiently. In cloud computing, data deduplication is used to reduce storage and bandwidth consumption by eliminating redundant data. To provide semantic security for the data, the authors propose a new scheme that separates data into encrypted and unencrypted data, which they report attains better semantic security than the existing system. Backup scheduling is proposed to improve the speed, reliability, and accessibility of the data.

Chandra Kala G et al. [6] proposed authorized data deduplication to avoid duplicate files in cloud storage. Different privileges are assigned in order to protect data privacy. A hybrid cloud architecture, a combination of public and private cloud, is used to solve the duplication problem. The convergent encryption technique is used to avoid duplicate files in cloud storage. The file [...] cloud. Hence, the confidentiality of the data is maintained. In future, files from the user's system could also be requested and uploaded without affecting the confidentiality of the data.

3. DATA DEDUPLICATION

Data deduplication is a technique used to eliminate repeated data. Deduplication techniques are generally used on cloud servers to reduce the storage space consumed. To prevent unauthorized access to the data and the creation of duplicate data on the cloud, the data is encrypted before it is stored on the cloud server. Cloud storage usually contains business-critical data and processes; hence strong security is essential to retain a trust relationship between cloud users and cloud service providers. To overcome security threats, this paper proposes multiple cloud storage: common forms of data storage, such as the files and databases of a specific user, are split and stored in different cloud storages (e.g. Cloud A and Cloud B).
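The space saving from file-level deduplication can be illustrated with a small in-memory sketch. The class `DedupStore`, its method names, and the use of SHA-256 as the fingerprint are assumptions made for this example, not part of the proposed system; encryption and the split across Cloud A and Cloud B are omitted.

```python
import hashlib


class DedupStore:
    """Toy file-level deduplication with reference counting."""

    def __init__(self):
        self.blobs = {}   # fingerprint -> content, stored only once
        self.refs = {}    # fingerprint -> number of logical files using it
        self.names = {}   # (owner, filename) -> fingerprint

    def upload(self, owner: str, filename: str, content: bytes) -> str:
        key = (owner, filename)
        if key in self.names:
            self.delete(owner, filename)        # overwrite an existing file
        fingerprint = hashlib.sha256(content).hexdigest()
        if fingerprint not in self.blobs:       # first copy: store the bytes
            self.blobs[fingerprint] = content
            self.refs[fingerprint] = 0
        self.refs[fingerprint] += 1             # later copies: just a reference
        self.names[key] = fingerprint
        return fingerprint

    def delete(self, owner: str, filename: str) -> None:
        fingerprint = self.names.pop((owner, filename))
        self.refs[fingerprint] -= 1
        if self.refs[fingerprint] == 0:         # last reference gone
            del self.blobs[fingerprint]
            del self.refs[fingerprint]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())
```

If two users upload the same 1000-byte file, `stored_bytes()` reports 1000 rather than 2000, and the content is removed only after both users delete their copy.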

4. METHODS USED IN DEDUPLICATION

The following are some basic methods used in deduplication:

A. Symmetric Encryption

Symmetric encryption uses a common secret key $k$ to encrypt and decrypt information. A symmetric encryption scheme is made up of three primary functions:

1) $\mathrm{KeyGen}_{SE}(1^{\lambda}) \rightarrow k$: the key generation algorithm, which generates $k$ using the security parameter $1^{\lambda}$;

2) $\mathrm{Enc}_{SE}(k, M) \rightarrow C$: the symmetric encryption algorithm, which takes the secret key $k$ and message $M$ and outputs the ciphertext $C$; and

3) $\mathrm{Dec}_{SE}(k, C) \rightarrow M$: the symmetric decryption algorithm, which takes the secret key $k$ and ciphertext $C$ and outputs the original message $M$.

Each user encrypts data with their own key. As a result, identical data copies produce different ciphertexts, which makes deduplication impossible.
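The three functions, and the reason per-user keys defeat deduplication, can be shown with a toy sketch. Everything here is an assumption for illustration: `keygen_se`, `enc_se`, and `dec_se` are hypothetical names, and the cipher is a SHA-256 counter keystream XORed with the message, which is deterministic and not secure. A real system would use a vetted cipher such as AES from a cryptographic library.

```python
import hashlib
import secrets


def keygen_se(security_bytes: int = 32) -> bytes:
    """KeyGen: draw a random secret key k of the requested length."""
    return secrets.token_bytes(security_bytes)


def _keystream(k: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 over key || counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(k + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def enc_se(k: bytes, m: bytes) -> bytes:
    """Enc: XOR the message with the keystream derived from k."""
    return bytes(a ^ b for a, b in zip(m, _keystream(k, len(m))))


def dec_se(k: bytes, c: bytes) -> bytes:
    """Dec: XOR is its own inverse, so decryption repeats encryption."""
    return enc_se(k, c)


# Two users encrypt the same file under their own keys.
message = b"identical file contents uploaded by two users"
k_alice, k_bob = keygen_se(), keygen_se()
c_alice, c_bob = enc_se(k_alice, message), enc_se(k_bob, message)
```

Both ciphertexts decrypt to the same message, yet `c_alice` and `c_bob` differ, so a server comparing stored ciphertexts has no way to recognize them as duplicates.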

5. PROPOSED METHOD FLOW

The proposed system contains four modules:

• Admin

• Private Cloud

• Public Cloud

• User

Admin

The admin provides personal details to register. After registration, the admin can log in; the system checks the user name and password. If they are correct, the admin is admitted; otherwise the admin must re-enter the password.

After logging in, the admin can choose a file and encrypt it. The admin then sends the encrypted file to the server, where it is saved.

Private Cloud

The private cloud acts as a proxy that allows data owners and users to securely perform duplicate checks with differential privileges. Compared with the traditional deduplication architecture in cloud computing, this is a new entity introduced to facilitate users' secure use of the cloud service. Specifically, since the computing resources on the data user/owner side are restricted and the public cloud is not fully trusted in practice, the private cloud provides the data user/owner with an execution environment and infrastructure that works as an interface between the user and the public cloud. The private keys for the privileges are managed by the private cloud, which answers file token requests from users. The interface offered by the private cloud allows users to submit files and queries to be securely stored and computed, respectively.
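One way to picture the file token requests is a keyed hash over the file tag, computed with a privilege key that never leaves the private cloud. The sketch below assumes this construction (HMAC-SHA-256); the class `PrivateCloud`, its methods, and the privilege names are hypothetical and are not taken from the paper.

```python
import hashlib
import hmac
import secrets


class PrivateCloud:
    """Toy private cloud: holds one secret key per privilege, issues tokens."""

    def __init__(self, privileges):
        self._keys = {p: secrets.token_bytes(32) for p in privileges}
        self._grants = {}   # user -> set of privileges granted to that user

    def grant(self, user: str, privilege: str) -> None:
        if privilege not in self._keys:
            raise KeyError(f"unknown privilege: {privilege}")
        self._grants.setdefault(user, set()).add(privilege)

    def file_token(self, user: str, privilege: str, file_tag: bytes) -> str:
        """Answer a token request only if the user holds the privilege."""
        if privilege not in self._grants.get(user, set()):
            raise PermissionError(f"{user} lacks privilege {privilege}")
        key = self._keys[privilege]
        return hmac.new(key, file_tag, hashlib.sha256).hexdigest()
```

Two users with the same privilege obtain the same token for the same file tag, so the public cloud can match their duplicate checks, while a user without the privilege cannot obtain the token at all.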

Public Cloud

Data owners outsource only their data storage to the public cloud, while data operations are managed in the private cloud. A new deduplication system supporting differential duplicate checks is proposed under this hybrid cloud architecture, in which the storage cloud service provider (S-CSP) resides in the public cloud. The S-CSP is an entity that provides a data storage service in the public cloud: it provides the data outsourcing service and stores data on behalf of the users. A user is only allowed to perform the duplicate check for files marked with the corresponding privileges. To reduce storage cost, the S-CSP eliminates redundant data via deduplication and keeps only unique data.
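The S-CSP side of the differential duplicate check might look roughly like the sketch below. It treats file tokens as opaque strings issued elsewhere; the class `StorageCSP`, its methods, and the token values in the usage note are assumptions for illustration only.

```python
import hashlib


class StorageCSP:
    """Toy S-CSP: stores each ciphertext once, indexed by privilege tokens."""

    def __init__(self):
        self.ciphertexts = {}   # ciphertext id -> ciphertext
        self.token_index = {}   # file token -> ciphertext id

    def duplicate_check(self, token: str) -> bool:
        """A duplicate is found only for a token registered with the file."""
        return token in self.token_index

    def upload(self, ciphertext: bytes, tokens) -> bool:
        """Register tokens for the ciphertext; return True if it was new."""
        cid = hashlib.sha256(ciphertext).hexdigest()
        is_new = cid not in self.ciphertexts
        if is_new:
            self.ciphertexts[cid] = ciphertext   # keep only unique data
        for token in tokens:
            self.token_index[token] = cid
        return is_new

    def download(self, token: str) -> bytes:
        return self.ciphertexts[self.token_index[token]]
```

For example, after `upload(ct, ["tok-engineer"])` a duplicate check with `"tok-engineer"` succeeds, while a check with a token for a privilege the file was not marked with fails even though the ciphertext is present.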

User

A user derives a convergent key from each original data copy and encrypts the data copy with the convergent key. In addition, the user also derives a tag for the data copy, such that the tag will be used to detect duplicates. Here, we assume that the tag correctness property holds, i.e., if two data copies are the same, then their tags are the same. To detect duplicates, the user first sends the tag to the server side to check if the identical copy has been already stored. Note that both the convergent key and the tag are independently derived and the tag cannot be used to deduce the convergent key and compromise data confidentiality. Both the encrypted data copy and its corresponding tag will be stored on the server side.
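The user-side steps can be sketched as follows. This is a simplified illustration under stated assumptions: the key and the tag are derived with SHA-256 under two different prefixes so that one does not reveal the other, and the cipher is a toy SHA-256 keystream XOR rather than a production cipher. The function names (`convergent_key`, `derive_tag`, `encrypt`, `decrypt`, `upload_package`) are hypothetical.

```python
import hashlib


def convergent_key(data: bytes) -> bytes:
    """Derive the key from the data itself, so equal data gives equal keys."""
    return hashlib.sha256(b"key|" + data).digest()


def derive_tag(data: bytes) -> str:
    """Derive the duplicate-detection tag separately from the key."""
    return hashlib.sha256(b"tag|" + data).hexdigest()


def _keystream(key: bytes, n: int) -> bytes:
    """Toy keystream from SHA-256 over key || counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def encrypt(key: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return encrypt(key, ciphertext)   # XOR with the same keystream inverts it


def upload_package(data: bytes):
    """Return (key kept by the user, ciphertext and tag sent to the server)."""
    key = convergent_key(data)
    return key, encrypt(key, data), derive_tag(data)
```

Because the key depends only on the content, two users who hold the same file produce the same ciphertext and the same tag, so the server can keep a single encrypted copy and detect the duplicate from the tag alone.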

6. RESULT ANALYSIS

Figure 1: User Registration

Figure 2: Owner key Authentication process

Figure 3: Cloud Server

Figure 4: Owner frame

Figure 5: Cipher key

Figure 6: Private Cloud

7. CONCLUSION

Cloud computing provides everything as a service on an on-demand basis, which offers many advantages to the consumer. Storage-as-a-Service is provided by the cloud storage provider and is used to back up user data. This paper presented the concept of cloud computing and reviewed the techniques available in it, together with an overview of existing backup scheduling approaches. The existing system concentrates on backup speed and reliability but does not provide deduplication; it achieves reliability by copying the same data to two storage nodes, and it does not provide security for user data. Deduplication is important for reducing the consumption of storage space and avoiding data redundancy.

8. REFERENCES

[1] Himshai Kambo, Bharati Sinha, "Secure data deduplication mechanism based on Rabin CDC and MD5 in cloud computing environment", 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), 2017, pp. 400-404.

[2] Ankush R. Deshmukh, R. V. Mante, P. N. Chatur, "Cloud Based Deduplication and Self Data Destruction", International Conference on Recent Trends in Electrical, Electronics and Computing Technologies (ICRTEECT), 2017, pp. 155-158.

[3] R. Shobana, K. Shantha Shalini, S. Leelavathy, V. Sridevi, "De-Duplication of Data in Cloud", International Journal Chem. Sci., 14(4), 2016.

[4] Dastagir Shaikh, Pratik Sen, Zubair Inamdar, "Avoidance of Duplication of Encrypted Big-data in Cloud Storage", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 6, Issue 4, April 2017.

[5] G. Karthika, M. P. Ruby, S. Vijalakshmi, M. Purushothaman, S. Anbarasan, "Data Deduplication: An approach for avoidance of duplicate data", International Journal of Scientific Research and Management (IJSRM), Volume 3, Issue 3, pp. 2457-2460, 2015.

[6] Chandra Kala G, Priyanka P, Sowmya C M, Paul Jasmine Rani L, Saravanan A, "Duplication Avoidance Mechanism for Storing Data in Hybrid Cloud", International Journal of Innovative Trends and Emerging Technologies, Volume 1, Special Issue 2 (ICITET 15), March 2015.

[7] Kamarthi Rekha, G. Somasekhar, S. Prem Kumar, "Secure Redundant Data Avoidance over Multi-Cloud Architecture", International Journal of Computer Engineering in Research Trends, Volume 2, Issue 8, August 2015, pp. 470-474.

[8] Bhavin Bhoi, Pravina Vyawahare, Pratik Avhad, Nikita Patil, "Data duplication avoidance in larger database", International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), 2017, pp. 1-4.

[9] Deepali Choudhari, R. W. Deshpande, "Deduplication Techniques in Storage System", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 4, Issue 11, November 2015.
