DATA HOSTING IN MULTI-CLOUD
USING FRAGMENTATION AND
DYNAMIC REPLICATION
Ms. Srimathi. R*
Information Technology, Karunya University, Coimbatore, Tamil Nadu, 641114, India
Mr. E. Bijolin Edwin
Information Technology, Karunya University, Coimbatore, Tamil Nadu, 641114, India
Abstract- Cloud computing is emerging as a new trend for data hosting and data management. Enterprises and organizations look for service availability and enhanced data reliability. Many organizations have used single cloud providers, but this has become less popular with customers because of the risk of service availability failure. Multi-cloud, in contrast, has gained attention from researchers, customers and startup companies. In this paper, division and dynamic replication of data in cloud storage is proposed. A file is divided into fragments, and the fragments are replicated over the cloud nodes. Each node stores only a single fragment, so even in the case of a successful attack no meaningful information is revealed to the attacker, and the nodes are chosen using a T-Coloring placement scheme. The approach reduces access latency and improves data availability and data access. The system also improves the data retrieval time and can reconstruct the original file.
Keywords: Data Hosting, Fragmentation, Dynamic Replication, T-Coloring
I.INTRODUCTION
Cloud computing provides access to resources over the Internet as a service, and cloud storage is emerging as a powerful paradigm for sharing information across the Internet, satisfying users' mobile data demands. Each cloud storage provider has different pricing policies and terms for end users, and cloud vendors need to upgrade their infrastructure according to changes in the environment. Among the different types of cloud services, Storage as a Service gives organizations the option to use the available storage space. One challenging area is the management of data replicas, so a separate storage space for replicas is maintained; data replication is used to increase availability in storage. Storage providers charge their users according to the space used and the bandwidth consumed. Using a single cloud carries a vendor lock-in risk, and attackers can more easily steal information, so we shift to the multi-cloud, which offers service availability and easier data management. Moreover, the amount of data loss can be minimized.
II.RELATED WORK
A literature review is an important step in any research work. The following reviews some existing techniques for fragmentation and replication.
Zhenhua Li et al (2014) studied the traffic usage efficiency of data synchronization [1], showing that data sync traffic can be effectively avoided or significantly reduced through carefully designed data sync mechanisms. The work gives cloud storage services guidance to develop more efficient, traffic-economic services and helps users pick appropriate services. The key part of these services is the data synchronization operation, which automatically maps changes in a user's local file system to the cloud through a series of network communications in a timely manner.
T. G. Papaioannou et al (2012) proposed Scalia [2], an adaptive scheme for efficient multi-cloud storage that decides the placement of data from observed access patterns, subject to optimization objectives such as storage prices. It effectively considers the repositioning of selected objects to significantly minimize the storage cost. The advantages of this system are a reduced vendor lock-in risk, increased availability, and a lower cost that the user has to pay to the cloud storage providers.
Sai Kiran M et al (2013) proposed a cost-based approach for selecting multi-cloud storage [10]. Many companies face challenges in data migration and in reducing capital expenditure. This approach gives customers an option to choose the providers among which to build a multi-cloud, and it supports decision making in cloud computing. A solo service only lets customers outsource their data, whereas a multi-cloud adds the advantage of service availability. The paper therefore gives an overview of cost, since for every storage provider cost is the main factor to reduce while keeping the service available to users.
Ruay-Shiung Chang et al (2008) proposed a dynamic data replication mechanism called Latest Access Largest Weight (LALW) [3]. The technique selects popular files for replication and calculates a suitable number of copies. Dynamic replication collects the data access history, which contains the file name, the number of requests, and a record of where each request from the end users came from. From the access frequencies of all requested files, a popular file is identified and replicated to several sites to achieve system load balance.
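The core LALW idea, that recent accesses weigh more than old ones, can be sketched as follows. This is a minimal illustration under the assumption that each older interval's weight is halved; the exact weighting in the original paper may differ, and `lalw_popular_file` is a hypothetical helper name.

```python
def lalw_popular_file(history):
    """Sketch of the LALW popularity calculation (assumed weighting).

    history: list of dicts, one per past time interval (oldest first),
    mapping file name -> access count in that interval. The latest
    interval gets the largest weight; each older interval's weight is
    halved, so recent popularity dominates.
    Returns the most popular file and the full score table.
    """
    n = len(history)
    scores = {}
    for i, interval in enumerate(history):
        weight = 0.5 ** (n - 1 - i)  # latest interval -> weight 1.0
        for fname, count in interval.items():
            scores[fname] = scores.get(fname, 0.0) + weight * count
    # the highest-scoring file is the candidate for extra replicas
    return max(scores, key=scores.get), scores
```

For example, a file accessed heavily only in the oldest interval can lose to a file accessed mostly in the latest interval, even with a lower total count, which is exactly the "latest access, largest weight" behaviour.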
M. Lei et al (2008) addressed the system-wide data availability problem assuming limited replica storage [4]. To maximize data availability, they use an online optimizer algorithm to minimize the data-missing rate. The problems are modeled in terms of an optimal solution in a static system, and a novel heuristic algorithm is proposed for data access under limited resources.
Qingsong Wei et al (2010) proposed CDRM, a cost-effective dynamic replication scheme for large-scale cloud storage systems [5]. It captures the relationship between availability and the replication factor, reduces the number of replicas recorded in the metadata, and places replicas among the distributed nodes to reduce the blocking probability and improve load balance and overall performance.
Albert Y. Zomaya et al (2015) proposed energy-efficient data replication in cloud computing [6]. Both energy efficiency and bandwidth consumption are important, in addition to the improved quality of service (QoS), since communication delays can be reduced as a result. At the component level there are two main alternatives in a data center: (a) shutting components down and (b) reducing their performance. Both methods are applicable to computing servers and network switches.
Bruno Quaresma et al (2011) proposed DepSky, a dependable and secure storage system [7] that efficiently delivers the benefits of cloud computing by combining several commercial clouds into a multi-cloud. It reduces the risks in critical cloud storage systems.
Jennifer Rexford et al (2010) proposed DONAR, a distributed system that takes over the headache of replica selection [8] and gives services a simple interface for specifying mapping policies. Existing approaches rely on central coordination or distributed heuristics; this algorithm performs replica selection by mapping nodes to clients in a simple manner.
Alain Roy et al (2012) proposed ERMS, an elastic replication management system for HDFS [9]. The system uses Condor to remove extra replicas and to increase the replication number, applies erasure codes to save storage space in the cloud, and saves network bandwidth when the data becomes cold.
Mate Shweta A et al (2015) described the CHARM scheme for hosting data [13] in the multi-cloud, which combines two functions: selecting a cloud that gives low storage cost and guaranteed availability, and triggering a transition process to redistribute the data. When data is sent to a third party, administrative control in the cloud becomes a security concern. To control the outsourcing of data, protective measures are needed to protect data in the cloud, so the division and replication concept is used to deal with both security and performance issues.
III.PROPOSED SYSTEM
Our proposed system gives users data availability, prevents data loss, and ensures data reliability. We use a fragmentation (division) method and a dynamic replication technique for data management. One area of concern is maintaining the replicas of the original file, so we use a dynamic data replication strategy to manage the replicas among the storage providers.
(a) Multi-Cloud
Several storage providers offer the end user different terms and offers. To overcome the disadvantages of the single cloud, we move towards the multi-cloud. Multi-cloud is the use of two or more cloud services to minimize data loss and improve data availability. Its main advantages are service availability and reduced data loss, whereas the loss of data is high in a single cloud. The security level, however, remains a challenging issue in multi-cloud computing, so security areas also need attention in the multi-cloud.
Fig 1. Architecture of request and response of Multi-Cloud in Storage Space
The use of cloud computing has increased in companies, as it offers many benefits in cost and availability; the main advantage of the cloud is the pay-per-use model. One of the important services offered by cloud computing is cloud data storage, in which end users do not store their data on their own servers; instead, the proxy admin stores the users' data. These services provide flexibility and scalability for data storage, and customers pay only for the amount of data they need to store for a particular period of time.
(b)Proxy Admin
Fig 2. Process of Proxy Admin with end user
(c) Data Fragmentation
Fragmentation is the splitting of data into several smaller parts. In the multi-cloud, the data is divided into several fragments and each fragment is placed on a node of the cloud. The main advantage of fragmentation is that even after a successful attack no meaningful information is retrieved, because the nodes are chosen using the T-Coloring concept. We mainly use this concept for security, so that the placed fragments cannot be located in the multi-cloud. T-Coloring generates a non-negative random number and constructs the set T containing the values from zero up to that number. Initially all nodes are assigned the open color; once a fragment is placed on a node, the color changes and the node, together with the nodes whose distance from it belongs to T, becomes closed. We mainly use this for a high security level.
Fig 3.Fragmentation of files in cloud
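The fragmentation and T-Coloring placement described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the node set, the `distance` function (hop count between two nodes), and the bound on the random number are assumptions for the sketch.

```python
import random

def fragment(data: bytes, n_fragments: int) -> list:
    """Split a byte string into n roughly equal fragments."""
    size = -(-len(data) // n_fragments)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n_fragments)]

def place_fragments(fragments, nodes, distance):
    """Place each fragment on an 'open' node chosen at random.

    After each placement, the chosen node and every open node whose
    distance from it lies in T are closed, so no node ever holds two
    fragments and no two fragments land on neighbouring nodes.
    `distance(a, b)` is an assumed helper returning the hop count
    between two nodes.
    """
    t_max = random.randint(1, 3)        # non-negative random number
    T = set(range(t_max + 1))           # T = {0, 1, ..., t_max}
    open_nodes = set(nodes)             # all nodes start 'open'
    placement = {}                      # node -> fragment index
    for frag_id, _frag in enumerate(fragments):
        if not open_nodes:
            raise RuntimeError("no open node left for fragment")
        node = random.choice(sorted(open_nodes))
        placement[node] = frag_id
        # close the chosen node (distance 0 is in T) and its T-neighbours
        open_nodes -= {m for m in open_nodes if distance(node, m) in T}
    return placement
```

Because each node stores at most one fragment and closed neighbourhoods keep fragments apart, compromising a single node reveals only one meaningless piece of the file.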
(d) Dynamic Data Replication
Data replication is the copying of the original file. Its main advantage is that if multiple copies of a block exist on different data nodes, the chance of successfully accessing a copy increases; if any one node fails, the data can still be recovered from a replica copy. Dynamic replication mainly deals with replica placement and selection. Using dynamic replication, we manage the replicas through a replica manager, and if a replica is needed we can locate it through the replica numbers. If the original file is lost due to an attack, the retrieval or reconstruction of the file is improved through this replication, and we analyze the performance of this function.
(e) Process flow diagram
Fig.4 Flow diagram of data owner
IV.PERFORMANCE ANALYSIS
We evaluate the performance based on the data retrieval time of a file. First, the prices of the storage providers, as of the year 2014, are listed below.
Table 1. Price of operations in rupees

Storage Provider       Upload   Download   Deletion of files
Amazon S3              66.93    6.69       free
Rackspace              free     free       free
Windows Azure          0.67     0.67       free
Google Cloud Storage   66.93    6.69       free
The above table is only a reference for users showing how the storage providers price their services for end users. Everything in the cloud is pay-as-you-use, so we refer to the yearly pricing strategies of the storage providers. Storage space is becoming more and more in demand for every user; how cloud storage usage has grown on a yearly basis is shown below (Fig. 5), with reference to the Internet.
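Using the Table 1 figures, the pay-as-you-use comparison can be illustrated with a small sketch. The per-operation interpretation of the prices is an assumption for this illustration; real tariffs also charge for storage space and bandwidth, and `cheapest_provider` is a hypothetical helper.

```python
# Per-operation prices in rupees, taken from Table 1 (2014 figures).
PRICES = {
    "Amazon S3":            {"upload": 66.93, "download": 6.69, "delete": 0.0},
    "Rackspace":            {"upload": 0.0,   "download": 0.0,  "delete": 0.0},
    "Windows Azure":        {"upload": 0.67,  "download": 0.67, "delete": 0.0},
    "Google Cloud Storage": {"upload": 66.93, "download": 6.69, "delete": 0.0},
}

def cheapest_provider(uploads, downloads, deletes=0):
    """Return the provider with the lowest total operation cost for
    the given workload: a simple pay-as-you-use comparison."""
    def total(provider):
        tariff = PRICES[provider]
        return (uploads * tariff["upload"]
                + downloads * tariff["download"]
                + deletes * tariff["delete"])
    return min(PRICES, key=total)
```

Such a comparison is one input to multi-cloud placement: fragments or replicas can be directed to the provider that is cheapest for the expected access pattern.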
Table 2 shows the data retrieval time against file size. For example, a 50 MB file that is retrieved in 10 minutes by the existing system is accessed in 7.5 minutes by the proposed system. The accompanying graph clearly shows the difference between the existing and proposed systems.
Fig 5. Consumption of storage in cloud
The figure depicts that the storage space used is increasing year by year; the figure is taken from (ref-14). Due to the massive amount of data created every day, the storage and maintenance of that data becomes very challenging for storage engineers, so we clearly indicate the online storage providers and their services.
V.CONCLUSION
In this paper we introduced the division of files into fragments and replica management in the multi-cloud. The proposed framework highlights the importance of the storage space for data and the management of replicas: if a node storing a fragment fails, the lost data can be retrieved only when the replicas are managed in a secure manner. We used dynamic replication for this management; the fragments are allocated in the multi-cloud, and the nodes are chosen using the T-Coloring concept. Overall, the results and the proposed work show improved data availability and controlled replication in the multi-cloud.
REFERENCES
[1] Z. Li, C. Jin, T. Xu, C. Wilson, Y. Liu, L. Cheng, Y. Liu,Y.Dai and Z.-L. Zhang, “Towards network-level efficiency for cloud storage services,” in Proc. ACM SIGCOMM Internet Meas. Conf., 2014, pp. 115–128.
[2] T. G. Papaioannou, N. Bonvin, and K. Aberer, “Scalia: An adaptive scheme for efficient multi-cloud storage,” in Proc. Int. Conf. High Perform. Comput., Netw., Storage Anal., 2012, p. 20.
[3] Ruay-Shiung Chang, “A dynamic data replication strategy using access weights in data grids,” Springer, 2008, pp. 278-294.
[4] Lei M, Vrbsky S V, Hong X. ‘An on-line replication strategy to increase availability in data grids’. Future Generation Computer Systems, 2008, 24(2). pp. 85-98.
[5] Wei Q, Veeravalli B, Gong B, Zeng L, Feng D, “CDRM: A cost-effective dynamic replication management scheme for cloud storage cluster,” in Proc. 2010 IEEE International Conference on Cluster Computing, Heraklion, Crete, Greece, Sept. 20-24, 2010, pp. 188-196.
[6] Dejene Boru, Dzmitry Kliazovich, Fabrizio Granelli, Pascal Bouvry, Albert Y. Zomaya, “Energy-efficient data replication in cloud computing datacenters,” Springer Science+Business Media New York, 2015.
[7] A. Bessani, M. Correia, B. Quaresma, F. Andre, and P. Sousa, “DepSky: Dependable and secure storage in a cloud-of-clouds,” in Proc. 6th Conf. Comput. Syst., 2011, pp. 31–46.
[8] P. Wendell, J. W. Jiang, M. J. Freedman, and J.Rexford, “Donar: Decentralized server selection for cloud services,” in Proc. ACM SIGCOMM Conf., 2010, pp. 231–242.
[9] Zhendong Cheng, Zhongzhi Luan, Alain Roy, Ning Zhang, and Gang Guan, “ERMS: An elastic replication management system for HDFS,” in Proc. IEEE International Conference on Cluster Computing, 2012, pp. 32-40.
[10] Sai Kiran M, Anusha A, Gowtham Kumar N, Praveen Kumar Rao K, “Selection of multi-cloud storage using cost based approach,” International Journal of Computer and Electronics Research, vol. 2, no. 2, 2013.
[11] A. Duminuco and E. W. Biersack, “Hierarchical codes: A flexible trade-off for erasure codes in peer-to-peer storage systems,” Peer to- Peer Netw. Appl., vol. 3, no. 1, pp. 52–66, 2010.
[12] Windows Azure Pricing Updates. [Online]. Available: http://azure.microsoft.com/en-us/updates/azure-pricing-updates/, 2014.
[13] Quanlu Zhang, Shenglong Li, Zhenhua Li, Yuanjian Xing, Zhi Yang, and Yafei Dai, “CHARM: A cost-efficient multi-cloud data hosting scheme with high availability,” IEEE Transactions on Cloud Computing, vol. 3, no. 3, July-September 2015.