International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 3, Issue 12, December 2013)

A Survey on Techniques for Third Party Auditor in Cloud Computing

Dr. M. Balaganesh¹, Dr. R. Sureshkumar², J. Venkateshan³

¹,²Professor, ³P.G. Scholar, Sembodai Rukmani Varatharajan Engineering College, Sembodai.

Abstract: We present an error-identification algorithm for trusted MHealth computing. Experiments show that our approach has the following advantages over existing methods: (1) cloud services require no upfront investment in infrastructure and transfer the responsibility for maintenance, backups, and license management to cloud providers; (2) we formulate an optimization problem that leverages the cloud domain to reduce the cost of information management in the Smart Grid (SG). We propose a cloud-based SG/MHealth information management model and present a framework that uses the cloud and a third party auditor to address the security and cost-reduction problems in cloud-based SG information storage and computation. The objective of this paper is to create a taxonomy of third party auditing techniques in the cloud environment, and to survey and compare representative examples of each class of techniques.

Keywords - Cloud Computing, MHealth, Smart Grid, Third Party Auditor.

I. INTRODUCTION

The Smart Grid (SG) is an intelligent power system that uses two-way communication, information technologies, and computational intelligence to revolutionize power generation, delivery, and consumption. Its evolution relies on the utilization and integration of advanced information technologies, which transform the energy system from an analog one to a digital one. In the vision of the SG, information plays a key role and should be managed efficiently and effectively.

One of the important trends in today's information management is outsourcing management tasks to cloud computing, which has been widely regarded as the next-generation computing and storage paradigm [5]. The concept of cloud computing is based on large data centers with massive computation and storage capacities operated by cloud providers, which deliver computing and storage services as utilities.

The overwhelming data generated in the SG due to widely deployed monitoring, metering, measurement, and control devices calls for an information management paradigm shift in large scale data processing and storage mechanisms.

Integrating the DNA of cloud computing into the SG information management makes sense for the following four reasons.

1.1 Generation Process Services

First, highly scalable computing and storage services provided by cloud providers fit well with the requirement of the information processing in the SG. This is because the resource demands for many computation and data intensive applications in the SG vary so much that a highly scalable information storage and computing platform is required. For instance, the resource demands for the electric utility vary over the time of the day, with peak operation occurring during the day and information processing needs slowing at night.

Second, the level of information integration in the SG can be effectively improved by leveraging cloud information sharing. As stated in [8], autonomous business activities often lead to "islands of information." As a result, in many cases the information in one department of an electric utility is not easily accessible by applications in other departments or organizations. However, a well-functioning SG requires the information to be widely and highly available. The property of sharing information easily and conveniently enabled by the cloud storage provides a cost-effective way to integrate these islands of information in the SG.

Third, the sophistication of the SG may lead to a highly complex information management system. For traditional electric utilities, realizing such complicated information systems may be costly or even beyond their capacity. Therefore, it would be a good option to get the information technology sector involved and outsource some tasks to the clouds, which provide cost-effective computing and storage solutions. This relieves the pain of electric utilities in the costly information system design, deployment, maintenance, and upgrade during the massive transformation to the SG.


Fourth, outsourcing information management to the clouds allows new players entering the SG market to focus on their business innovation rather than on building data centers to achieve scalability goals. It also allows these players to cut their losses if the launched products do not succeed. Although a cloud-based SG information management paradigm is promising, we are still facing many challenges.

1.2 First Challenge

The first challenge is a systematic optimization of the resources in the SG, cloud providers, and networking providers. For example, a fully functioning SG may be a large-scale or even continent-wide system, where information-generating sources (e.g., smart meters and sensors) are distributed and scattered across a large geographical area, and heterogeneous communication networks (e.g., WiMax, fiber optic networks, and powerline communications) are used for data transmission. Hence, many geographically and architecturally diverse cloud providers and networking providers may get involved. Systematically optimizing the usage of different resources in these diverse clouds and networks can reduce the overall cost of the SG information management.

1.3 Second Challenge

The second challenge is the fact that the healthy operation of the SG is dependent on the high availability and the prompt analysis of the critical information (e.g., power grid status monitoring data). Without a careful design, outsourcing information management to the clouds may bring about potential risks to the operation of the SG.

II. EXISTING PROCESS SYSTEM

This gives rise to new challenges when selecting distributed data and compute resources so that the execution of applications is time- and cost-efficient. Existing heuristic techniques select the 'best' data source for retrieving data to a compute resource and subsequently perform task-resource assignment. However, this approach to scheduling, which is based only on single-source data retrieval, may not give time-efficient schedules when: (i) tasks are interdependent on data, (ii) the average size of data processed by most tasks is large, and (iii) data transfer time exceeds task computation time by at least one order of magnitude. Krin Kumar K. et al. [5] note that a previously proposed public auditing scheme consists of four algorithms (KeyGen, SigGen, GenProof, VerifyProof). KeyGen is a key generation algorithm that is run by the user to set up the scheme.

SigGen is used by the user to generate verification metadata, which may consist of MACs, signatures, or other related information that will be used for auditing. GenProof is run by the cloud server to generate a proof of data storage correctness, while VerifyProof is run by the TPA to audit the proof from the cloud server. A public auditing system can be constructed from the auditing scheme in two phases, setup and audit; a minimal sketch of this four-algorithm flow follows.
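The following is a minimal Python sketch of the KeyGen/SigGen/GenProof/VerifyProof flow, assuming simple per-block HMAC tags purely for illustration. Real public auditing schemes use homomorphic authenticators so that the TPA can verify proofs without holding the secret key or seeing raw data; this toy version does not achieve that and only fixes the interface and the setup/audit phases.

import hashlib
import hmac
import os

def key_gen():
    # Run by the user to set up the scheme: a secret auditing key.
    return os.urandom(32)

def sig_gen(key, blocks):
    # Run by the user: verification metadata, one tag per indexed block.
    return [hmac.new(key, str(i).encode() + b, hashlib.sha256).digest()
            for i, b in enumerate(blocks)]

def gen_proof(blocks, challenged):
    # Run by the cloud server: prove possession of the challenged blocks.
    return {i: blocks[i] for i in challenged}

def verify_proof(key, tags, proof):
    # Run by the TPA: recompute and compare each challenged block's tag.
    return all(hmac.compare_digest(
                   hmac.new(key, str(i).encode() + b, hashlib.sha256).digest(),
                   tags[i])
               for i, b in proof.items())

# Setup phase (user), then audit phase (TPA challenges the server).
blocks = [b"block-0", b"block-1", b"block-2"]
key = key_gen()
tags = sig_gen(key, blocks)
assert verify_proof(key, tags, gen_proof(blocks, challenged=[0, 2]))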

Our proposed system enables privacy-preserving public auditing for cloud data storage under the aforementioned model. Our protocol design should achieve the following security and performance guarantees:

A. Public Auditability

To allow the TPA to verify the correctness of the cloud data on demand without retrieving a copy of the whole data or introducing additional online burden to the cloud users.

B. Storage Correctness

To ensure that there exists no cheating cloud server that can pass the audit from the TPA without indeed storing user data intact.

C. Privacy-Preserving

To ensure that there exists no way for the TPA to derive users' data content from the information collected during the auditing process.

D. Batch Auditing

To enable the TPA with secure and efficient auditing capability to cope with multiple auditing delegations simultaneously; a batch verification sketch follows.
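Below is a sketch of batch auditing under the toy HMAC scheme above, reusing its verify_proof() helper. Note this loop only illustrates the interface; production batch auditing aggregates the cryptographic checks themselves (e.g., via bilinear pairings) so that one equation settles many proofs at reduced cost.

def batch_verify(delegations):
    # The TPA settles several delegations (one per user) in a single pass
    # and reports one aggregate verdict.
    # `delegations` is a list of (key, tags, proof) tuples.
    return all(verify_proof(key, tags, proof)
               for key, tags, proof in delegations)

# Usage: audit two users' storage in one batch.
# ok = batch_verify([(key1, tags1, proof1), (key2, tags2, proof2)])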

Figure 1: TPA Node Process


The results also produce a visual map of the privacy sphere that can be used in approximating the sensitivity of different territories of privacy-related text.

2.1 MHealth

The point of difference of a Social Cloud is that applications can also leverage the relationship between users to deliver shared asymmetric services leading to several potential Social Cloud application scenarios:

Figure 2: MHealth Cloud Process

2.2 Social Computation Cloud & Social Storage Cloud

It is widely recognized that extensive computing power remains untapped in personal computers. The use of a Social Cloud provides an infrastructure from which users can easily contribute computing resources to friends, companies, or scientific communities. Storage is perhaps the simplest and most standardized resource for everyday users to share; it is commonly used to store, back up, share, and replicate data.

III. SURVEY WORK PROCESS

3.1 Why a Steiner tree?

Our main objective is to connect multiple data sources to a compute resource for transferring data in parallel so that we could minimize the total transfer time. While using parallel transfers could speed up the process, we also need to optimize the total network bandwidth (path of transfers). We could construct a minimum spanning tree to connect all the data sources together. But, the resulting tree is not the smallest tree. We demonstrate this in Fig. 3a by connecting arbitrary vertices using a minimum spanning tree.

The Steiner tree is distinguished from the minimum spanning tree in that we are permitted to construct or select intermediate connection points to reduce the cost of the tree. In Fig. 3b, the vertices are connected using the minimum Steiner tree approach, which has lower path length than the minimum spanning tree.

It can be shown that the optimal group minimum spanning tree is at most twice as long as the optimal group Steiner minimal tree.

Another reason to use a Steiner tree is to find the candidate vertices that have maximum connectivity with their neighboring vertices. For instance, the intermediate connection points (four added vertices) in Fig. 3 are all candidate nodes from which computing resources could be selected for task executions. The neighboring nodes could then be used as multiple data sources for transferring data to the intermediate computing node. The minimum spanning tree approach does not identify the candidate nodes by itself.
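The factor-of-two bound and the role of intermediate connection points can be checked with a small sketch. networkx's steiner_tree is the metric-closure approximation (guaranteed within twice the optimum), and the graph below, with an invented candidate intermediate vertex x, is hypothetical: routing through x beats connecting the three terminals directly.

import networkx as nx
from networkx.algorithms.approximation import steiner_tree

# Nodes are data/compute hosts; edge weights are transfer costs.
G = nx.Graph()
G.add_weighted_edges_from([
    ("d1", "x", 1), ("d2", "x", 1), ("d3", "x", 1),   # x: candidate Steiner node
    ("d1", "d2", 3), ("d2", "d3", 3), ("d1", "d3", 3),
])
terminals = ["d1", "d2", "d3"]

# MST over the terminals alone must use the direct weight-3 edges.
mst = nx.minimum_spanning_tree(G.subgraph(terminals))
print(sum(d["weight"] for _, _, d in mst.edges(data=True)))   # 6

# The Steiner approximation routes through the intermediate node x,
# which also surfaces x as a candidate compute node.
st = steiner_tree(G, terminals)
print(sum(d["weight"] for _, _, d in st.edges(data=True)))    # 3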

Figure 3: Two ways of connecting vertices that minimize total path length (Error File)

Figure 4: Error Correction and Update File

3.2 Third Party Auditor

Cong Wang et al. [1] suggest that, in order to save time, computation resources, and the related online burden of users, the proposed main scheme can be extended to support third-party auditing, where users can safely delegate integrity checking tasks to a third party auditor (TPA) and use the cloud storage services worry-free. Their work is among the first few in this field to consider distributed data storage security in cloud computing. The contribution can be summarized in the following three aspects:

1) Compared to many of its predecessors, which only provide binary results about the storage status across the distributed servers, the proposed scheme achieves the integration of storage correctness insurance and data error localization, i.e., the identification of misbehaving server(s).


2) Unlike most prior works for ensuring remote data integrity, the new scheme further supports secure and efficient dynamic operations on data blocks, including update, delete, and append (a toy sketch of these operations follows this list).

3) The experimental results demonstrate that the proposed scheme is highly efficient. Extensive security analysis shows the scheme is resilient against Byzantine failure, malicious data modification attacks, and even server-colluding attacks.
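To make the dynamic operations in point 2 concrete, here is a deliberately naive sketch that reuses sig_gen() from the toy HMAC scheme in Section II. It re-tags the whole file after every edit; schemes like the one surveyed here, or Merkle-hash-tree designs, update the authentication data far more cheaply. The class and its structure are our own illustration, not the paper's construction.

class DynamicStore:
    # Naive dynamic storage: every edit refreshes all verification metadata.

    def __init__(self, key, blocks):
        self.key = key
        self.blocks = list(blocks)
        self.tags = sig_gen(key, self.blocks)

    def update(self, i, block):
        # Replace block i in place, then refresh verification metadata.
        self.blocks[i] = block
        self.tags = sig_gen(self.key, self.blocks)

    def delete(self, i):
        # Remove block i; later indices shift, so tags must be recomputed.
        del self.blocks[i]
        self.tags = sig_gen(self.key, self.blocks)

    def append(self, block):
        # Add a block at the end and tag it.
        self.blocks.append(block)
        self.tags = sig_gen(self.key, self.blocks)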

3.3 Privacy and Remote Home Health Care

The above-described and other remote home health care technology solutions create the potential for great benefits for individuals. However, for many individuals, the home is a foundational area with the highest level of individual privacy. While some of the above applications involve no personal health information, other health care applications require the collection, use, and transmission of personal health data. Therefore, it is important to consider the privacy implications of these technologies, and to design privacy into their development and implementation.

If privacy can be taken into consideration in the development process, there is great potential that these technologies can actually increase the privacy of the individual, by providing them with greater choice and personal control over how their data is managed. Individuals would have the option of receiving care from the privacy of their own home. Further, to the extent that home health care technologies may provide the ability to proactively avoid medical complications that would require intrusive tests and provision of data, there will likely be greater privacy benefits to the use of the technology. Should these privacy benefits be realized along with the health care advantages, then these technologies will serve a clear positive-sum role in health care provision.

To continue this discussion, an understanding of the relationship between home health care applications and privacy must be created; this is addressed below.

3.4 Information Privacy Defined

Information privacy is an individual's ability to exercise control over the collection, use, disclosure, and retention of his or her personal information, including personal health information. Personal information (also known as personally identifiable information or "PII") is any information, recorded or otherwise, relating to an identifiable individual.

Almost any information, if linked to an identifiable individual, can become personal in nature, be it biographical, biological, genealogical, historical, transactional, locational, relational, computational, vocational, or reputational. The definition of personal information is quite broad in scope. The challenges for privacy and data protection are equally broad.

When considering information and communication technologies, it is important to recognize that privacy subsumes a set of protections that extend far beyond security; we call this "SmartPrivacy." Although building strong technological security features into a technology ("Privacy by Design") is vital to protecting against data breaches, they are only one of the means used to achieve information privacy. Equally important is the development of clear information practices outlining when, how, and for what purposes personal health information will be collected and used, together with the technical and physical safeguards in place. Developing supporting policies, procedures, and an overall accountable culture of privacy ensures that PII will be handled in a privacy-respectful manner by an organization and its employees, whether the PII is in electronic or paper form.

What comes before all of this, right at the outset, is ensuring that privacy is embedded early into the design of the systems involved. This is Privacy by Design.

3.5 Privacy by Design

Privacy by Design (PbD) is a concept developed by Dr. Ann Cavoukian in the mid-nineties. In brief, PbD involves embedding privacy into the design specifications of technologies. This process begins by building the principles of Fair Information Practices (FIPs, see Appendix A) into the design, operation, and management of information processing technologies and systems, and then elaborates them to the gold standard of becoming the default. While PbD has information technology as its primary area of application, it has since expanded in scope to include two other areas. In total, the three areas of application are: (1) information technology; (2) accountable business practices; and (3) physical design and networked infrastructure. The current era is one of near-exponential growth in the creation, dissemination, use, and retention of personally identifiable information.


Rather than following the conventional zero-sum mindset that pits privacy against availability or some other functionality, where privacy may only be attained at the expense of functionality, organizations recognize that a positive-sum model is far more desirable. Such a win-win scenario, whereby privacy and business interests may all be served, can and must be achieved. This positive-sum model may be achieved if privacy safeguards are proactively built into a system, at the outset. By embracing Privacy by Design, leading companies have turned their privacy problems into privacy solutions. In a world of increasingly savvy and privacy-aware individuals, an organization's approach to privacy may offer precisely the competitive advantage needed to succeed. Privacy is essential to creating an environment that fosters trusting, long-term relationships with existing customers, while attracting opportunity and facilitating the development of new ones.

IV. OPERATION OF THE PROPOSED THIRD PARTY AUDITOR (TPA) NODES: PROBLEM DEFINITION

Third Party Auditor (D, R, T, F, G, L, M): Given a set of data hosts D, a set of compute resources R, a set of tasks T, a set of files F (both input and output files of T), and a graph G that represents the data-flow dependencies between tasks in T, the TPA problem and Data-Task Scheduling Problem (DTSP) is the problem of finding an assignment of tasks to compute hosts [task-schedule = {t → r}, t ∈ T, r ∈ R] and the partial data sets to be transferred from the selected data hosts to the assigned compute host [data-set = {PD(d_i → r)} for t → r, for all d_i ∈ D, r ∈ R, i ≤ |D|] for each task, such that the total execution time (and cost) at r and the data transfer time (and cost) incurred by the transfers {PD(d_i → r)} are minimized over all tasks in T.

The preconditions are:
(i) Data files are replicated across multiple data hosts.
(ii) Each task processes more than one input data file.
(iii) Total time and cost are bounded by L and M respectively, where L signifies the deadline and M denotes the maximum money (real currency) that can be spent on executing all the tasks in T. The third precondition simply constrains the time and cost that can be spent on executing a workflow.
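Read as an optimization problem, the definition above can be written compactly as follows. The max over per-source transfer times (modeling parallel retrieval, where a task waits only for its slowest transfer) and the exact constraint forms are our interpretation of the text, not notation taken from the paper:

\begin{align*}
\min_{\substack{t \mapsto r(t),\ t \in T \\ \{PD_{d_i \to r(t)}\}}} \quad
  & \sum_{t \in T} \Big( T_{\mathrm{exec}}\big(t, r(t)\big)
    + \max_{d_i \in D} T_{\mathrm{xfer}}\big(PD_{d_i \to r(t)}\big) \Big) \\
\text{subject to} \quad
  & \bigcup_{d_i \in D} PD_{d_i \to r(t)} = F_{\mathrm{in}}(t)
    \quad \forall t \in T \quad \text{(inputs fully retrieved)} \\
  & \text{total time} \le L, \qquad \text{total cost} \le M.
\end{align*}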

Figure 1 shows an example scenario for the problem given in Definition 1, where tasks are mapped to resources and partial data are retrieved from multiple data hosts.

In the figure, tasks t1 and t2 are assigned to resources r1 and r3, respectively. In this static mapping, the figure shows partial data being retrieved in parallel from all data hosts T1, T2, T3 to the respective compute hosts where the tasks are assigned: {PD(T1 → r1), PD(T2 → r1), PD(T3 → r1)} for t1 → r1, and {PD(T1 → r3), PD(T2 → r3), PD(T3 → r3)} for t2 → r3. The objective now is to minimize the transfer time of these partial data to the assigned computing resources by transferring the right amount of data from each data source for each task in the workflow.

Third Party Auditor (TPA) Static Scheduling Heuristic

We propose an Enhanced Static Mapping Heuristic (ESMH), assuming that the scheduling system has advance information about tasks, compute and storage resources, and network statistics at the time of scheduling, prior to execution. The following information is known (the data hosts containing the files, and each task's input/output files); a Python sketch of the resulting mapping follows the listing:

Data hosts containing files (Patient MHealth)    Tasks and I/O files
T1: S1, S2, S3, S4, S5                           t1: S2, S3
T2: S1, S2, S3, S4, S5                           t2: S2, S3, S1
T3: S1, S2, S3, S4, S5                           t3: S3, S4, S5
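Below is a hedged Python sketch of a static mapping in the ESMH spirit: with full advance information, each task is greedily assigned to the compute host that minimizes its estimated parallel retrieval time. The compute hosts r1-r3, file sizes, bandwidths, and the partial replica map are invented for illustration (under the full replication of the table above, all hosts tie), and the greedy rule is our assumption about ESMH, not its published definition.

# Hypothetical inputs: file sizes (MB), replica placement, task inputs,
# and data-host-to-compute-host bandwidths (MB/s). All values invented.
FILES = {"S1": 100, "S2": 100, "S3": 200, "S4": 150, "S5": 150}
REPLICAS = {"S1": {"T1"}, "S2": {"T1", "T2"}, "S3": {"T1", "T2", "T3"},
            "S4": {"T2", "T3"}, "S5": {"T3"}}
TASK_INPUTS = {"t1": ["S2", "S3"], "t2": ["S1", "S2", "S3"],
               "t3": ["S3", "S4", "S5"]}
BW = {("T1", "r1"): 50, ("T2", "r1"): 20, ("T3", "r1"): 10,
      ("T1", "r2"): 20, ("T2", "r2"): 50, ("T3", "r2"): 20,
      ("T1", "r3"): 10, ("T2", "r3"): 20, ("T3", "r3"): 50}
COMPUTE_HOSTS = ["r1", "r2", "r3"]

def retrieval_time(task, r):
    # Parallel multi-source retrieval: each file comes from its fastest
    # replica, so the task waits only for its slowest single transfer.
    return max(FILES[f] / max(BW[(d, r)] for d in REPLICAS[f])
               for f in TASK_INPUTS[task])

def esmh_static_map():
    # Greedy static mapping: each task goes to the compute host with the
    # smallest estimated parallel data retrieval time.
    return {t: min(COMPUTE_HOSTS, key=lambda r: retrieval_time(t, r))
            for t in TASK_INPUTS}

print(esmh_static_map())  # {'t1': 'r1', 't2': 'r1', 't3': 'r3'} with these inputs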

Table 1: Third Party Auditor Notation

Symbol                  Meaning
Resource:
  D                     A set of data hosts
  R                     A set of compute resources
  T                     A set of tasks
  F                     A set of files
Process:
  PD(t, r)              Partial data set for task t on resource r
  {PD(d_i → r)}(t, r)   Data set transferred from data hosts to r for t
  L and M               Bounds on total time and total cost
  TPAFile               Error correction + update file
  Se                    Data loss


Figure 5: Third Party Auditor and task-to-resource mapping for a distributed cloud environment.

V. CONCLUSION

In this paper, we proposed two workflow scheduling approaches for error identification and cost reduction that leverage multi-source parallel data retrieval techniques. We showed that retrieving data from many resources (TPA sources) produces better transfer times, and hence a better MHealth database for data-intensive workflows, than selecting one 'best' storage resource, for both static and dynamic TPA methods. From our experimental results, we also conclude that, on average, the cost-driven approach produces more time-efficient schedules than the MHealth and TPA-enhanced static database algorithms for data-intensive workflows. This survey has reviewed typical third party auditing techniques and their trade-offs.

VI. FUTURE WORK

As part of our future work, we would like to constrain the resources and network bandwidth based on pricing (similar to the pricing models of green network cloud computing, advanced TPA, and storage clouds) and propose multi-objective (time and cost) scheduling techniques.

REFERENCES

[1] C. Wang, Q. Wang, K. Ren, N. Cao, and W. Lou, "Toward Secure and Dependable Storage Services in Cloud Computing," IEEE Transactions on Services Computing, 2012.

[2] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, "Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility," Future Generation Computer Systems, vol. 25, no. 6, pp. 599-616, 2009.

[3] X. Fang, S. Misra, G. Xue, and D. Yang, "Smart grid - the new and improved power grid: A survey," IEEE Commun. Surveys Tuts., vol. 14, no. 4, pp. 944-980, 2012.

[4] S. Sakr, A. Liu, D. M. Batista, and M. Alomari, "A survey of large scale data management approaches in cloud environments," IEEE Commun. Surveys Tuts., vol. 13, no. 3, pp. 311-336, 2011.

[5] K. Krin Kumar, K. Padmaja, and P. Radha Krishna, "Automatic Protocol Blocker for Privacy-Preserving Public Auditing in Cloud Computing," Transactions on Cloud Computing, vol. PP, no. 99, 2012.

[6] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable Data Possession at Untrusted Stores," Proc. 14th ACM Conf. on Computer and Communications Security (CCS '07), pp. 598-609, 2007.

[7] H. Shacham and B. Waters, "Compact Proofs of Retrievability," Proc. Int'l Conf. on the Theory and Application of Cryptology and Information Security (ASIACRYPT), vol. 5350, pp. 90-107, Dec. 2008.
