Sensor Data Processing in Outsourced Cloud Resources


Mr. K. S. Saravanan¹, MCA., MPhil, and S. Karthika²

¹Asst. Professor, Department of Computer Science and Applications,

²M.Phil Research Scholar, Department of Computer Science,

Vivekanandha College of Arts and Sciences for Women (Autonomous), Tiruchengode, Tamilnadu, India.

Abstract

Sensor data values are transferred to and maintained in outsourced cloud resources. Cloud and wireless sensor networks (WSNs) are integrated to construct sensor clouds. A data error detection approach is employed for fast error detection in big sensor data sets. The approach exploits the full computational potential of the cloud platform and the network features of the WSN. First, a set of sensor data error types is classified and defined. Based on that classification, the network features of a clustered WSN are introduced and analyzed to support fast error detection and location. Error detection is based on the scale-free network topology, so most detection operations can be conducted within limited temporal or spatial data blocks instead of the whole big data set, which dramatically accelerates the detection and location process. Furthermore, the detection and location tasks can be distributed to the cloud platform to fully exploit its computation power and massive storage. The sensor cloud construction process is enhanced with optimized network partitioning methods. The error detection process is improved with error correction and recovery mechanisms. Big data cleaning methods are also integrated with the system for noise reduction. Data summarization methods are applied to minimize storage and computational resource requirements.

Index Terms: Sensor Networks, Cloud Computing, Error Detection and Correction, Sensor Clouds and Data Recovery

1. Introduction

Current WSNs are deployed on land, underground and underwater. Depending on the environment, a sensor network faces different challenges and constraints. There are five types of WSNs: terrestrial WSN, underground WSN, underwater WSN, multimedia WSN and mobile WSN. Terrestrial WSNs typically consist of hundreds to thousands of inexpensive wireless sensor nodes deployed in a given area, either in an ad hoc or in a pre-planned manner. In ad hoc deployment, sensor nodes can be dropped from a plane and randomly placed into the target area. In pre-planned deployment, there are grid placement, optimal placement, 2-D and 3-D placement models. In a terrestrial WSN, reliable communication in a dense environment is very important. Terrestrial sensor nodes must be able to communicate data back to the base station effectively. While battery power is limited and may not be rechargeable, terrestrial sensor nodes can, however, be equipped with a secondary power source such as solar cells. In any case, it is important for sensor nodes to conserve energy. For a terrestrial WSN, energy can be conserved with multi-hop optimal routing, short transmission range, in-network data aggregation, eliminating data redundancy, minimizing delays and using low duty-cycle operations.
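To make the energy argument for multi-hop routing and short transmission range concrete, here is a minimal sketch using the first-order radio model that is common in the WSN literature. The model and its constants (`E_ELEC`, `EPS_AMP`) are assumptions for illustration, not values from this paper; the sketch compares the total radio energy of sending one packet over a fixed distance in one hop versus several shorter hops.

```python
# Assumed first-order radio model constants (illustrative, not from this paper)
E_ELEC = 50e-9      # J/bit consumed by transmitter or receiver electronics
EPS_AMP = 100e-12   # J/bit/m^2 consumed by the transmit amplifier (d^2 path loss)


def tx_energy(bits: int, distance_m: float) -> float:
    """Energy to transmit `bits` over `distance_m` metres."""
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2


def rx_energy(bits: int) -> float:
    """Energy to receive `bits`."""
    return E_ELEC * bits


def path_energy(bits: int, total_distance_m: float, hops: int) -> float:
    """Total radio energy to relay `bits` over `hops` equal-length hops."""
    hop_len = total_distance_m / hops
    # every hop costs one transmission; each of the (hops - 1) relays also receives once
    return hops * tx_energy(bits, hop_len) + (hops - 1) * rx_energy(bits)


if __name__ == "__main__":
    packet_bits = 2000  # one hypothetical 250-byte packet
    for hops in (1, 2, 4, 8, 16):
        print(f"{hops:2d} hops: {path_energy(packet_bits, 200.0, hops) * 1e3:.3f} mJ")
```

With these assumed constants the total drops from about 8.1 mJ for one hop to about 2.5 mJ for eight hops, then rises to about 3.6 mJ at sixteen hops, because every extra relay adds a fixed electronics cost. This is why routing has to be optimal rather than simply use as many hops as possible.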

A key differentiating element of a successful information technology (IT) is its ability to become a true, valuable and economical contributor to cyber infrastructure. “Cloud” computing embraces cyber infrastructure and builds upon decades of research in virtualization, distributed computing, “grid computing”, utility computing and, more recently, networking, web and software services [9]. It implies a service-oriented architecture, reduced information technology overhead for the end user, greater flexibility, reduced total cost of ownership, on-demand services and many other things. Component-based


substitutability, extensibility and scalability, customizability and composability. Other characteristics are also very important, including the reliability and availability of components and services, the cost of services, security, total cost of ownership and economy of scale. In this context, many categories of components can be distinguished: differentiated and undifferentiated hardware, general-purpose and specialized software and applications, real and virtual “images”, environments, no-root differentiated resources, workflow-based environments, collections of services and so on.

2. Related Work

The handiest option for handling data distributed across several datacenters is to rely on the existing cloud storage services. This approach allows transferring data between arbitrary endpoints via the cloud storage, and it is adopted by several systems to manage data movements over wide-area networks [2]. Typically, these systems are concerned neither with achieving high throughput nor with potential optimizations, let alone with offering the ability to support different data services. Our work aims to address these issues specifically. Besides storage, few cloud-provided services focus on data handling. Some of them use the geographical distribution of data to reduce the latency of data transfers. Amazon's CloudFront [3], for instance, uses a network of edge locations around the world to cache copies of static content close to users. The goal here is different from ours: this approach is meaningful when delivering large, popular objects to many end users. It lowers latency and allows high, sustained transfer rates. Similarly, [4] considered the problem of scheduling data-intensive workflows in clouds, assuming that files are replicated in multiple execution sites. These approaches can reduce the makespan of the workflows but come at the cost and overhead of replication. In contrast, we extend this approach to also exploit data access patterns and leverage a cost/performance tradeoff to allow per-file optimization of transfers.

The alternatives to the cloud offerings are transfer systems that users can choose and deploy on their own, which we generically call user-managed solutions. A number of such systems emerged in the context of the GridFTP transfer tool, initially developed for grids. In these private infrastructures, information about the network bandwidth between nodes, as well as the topology and the routing strategies, is publicly available. Using this knowledge, transfer strategies can be designed to maximize certain heuristics, or the entire network of nodes across all sites can be viewed as a flow graph and transfer scheduling can be solved using flow-based graph algorithms [5]. In public clouds, information about the network topology is not available to users. One option is to profile the performance. Even with this approach, applying a flow algorithm requires the links between all nodes to be monitored continuously. Such monitoring would incur a huge overhead and interfere with the transfer itself. Among these systems, the work most comparable to ours is Globus Online [6], which provides high-performance file transfers through intuitive Web 2.0 interfaces, with support for automatic fault recovery. Globus Online only performs file transfers between GridFTP instances, remains unaware of the environment and therefore performs its transfer optimizations mostly statically. Several extensions to GridFTP allow users to enhance transfer performance by tuning some key parameters. Still, these works focus only on optimizing some specific constraints and ignore others. This leaves users with the burden of applying the most appropriate settings effectively. In contrast, we propose a self-adaptive approach with a simple and transparent interface that does not require additional user management.
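The flow-graph view of transfer scheduling mentioned above can be illustrated with a minimal sketch. It assumes that per-link bandwidths are known, as in the private grid setting, and uses the Edmonds-Karp max-flow algorithm as one standard choice; the cited work [5] may use a different formulation. Site names and capacities are hypothetical.

```python
from __future__ import annotations

from collections import deque


def max_flow(capacity: dict[str, dict[str, float]], source: str, sink: str) -> float:
    """Edmonds-Karp: repeatedly augment along shortest (BFS) paths in the residual graph."""
    # Residual capacities, with a zero-capacity reverse edge for every link
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0.0)

    total = 0.0
    while True:
        # BFS from the source to find an augmenting path
        parent: dict[str, str | None] = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left

        # Bottleneck capacity along the path
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u

        # Push flow: decrease forward residuals, increase reverse residuals
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical site graph: link bandwidths in Gbit/s
links = {
    "A": {"B": 10.0, "C": 5.0},
    "B": {"C": 15.0, "D": 4.0},
    "C": {"D": 10.0},
}
print(max_flow(links, "A", "D"))  # 14.0
```

The result equals the capacity of the cut into site D (4 + 10 Gbit/s). In a public cloud, these capacities are exactly what the user cannot see without continuous monitoring.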


direct TCP connection between the source and destination by a multi-hop chain through some intermediate nodes. Multi-pathing [7] employs multiple independent routes to simultaneously transfer disjoint chunks of a file to its destination. These solutions come at some cost: under heavy load, per-packet latency may increase due to timeouts, and more memory is needed for the receive buffers. End-system parallelism can be exploited to improve the utilization of a single path by means of parallel streams or concurrent transfers [8]. System configuration should also be considered, since specific local constraints may introduce bottlenecks. One issue with all these techniques is that they cannot be ported to clouds, since they rely strongly on the underlying network topology, which is unknown at the user level. Traditional techniques commonly found in scientific computing, e.g., those relying on parallel file systems, are not always adequate for processing big data on clouds. Such architectures usually assume high-performance communication between computation nodes and storage nodes. This assumption does not hold in current cloud architectures, which exhibit much higher latencies between compute and storage resources within a site, and even higher ones between datacenters.
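The chunking idea behind multi-pathing can be sketched at the application level under stated assumptions: the independent routes are simulated by a placeholder (`send_over_path` is hypothetical), chunks are dealt round-robin to paths, and each chunk carries its byte offset so the receiver can reassemble the file regardless of arrival order. Transport-level multipath such as MPTCP splits a byte stream inside the transport layer instead, so this is only an illustration of the disjoint-chunk principle.

```python
from concurrent.futures import ThreadPoolExecutor

Chunk = tuple[int, bytes]  # (byte offset in the file, payload)


def split_chunks(data: bytes, chunk_size: int) -> list[Chunk]:
    """Cut `data` into disjoint, offset-tagged chunks."""
    return [(off, data[off:off + chunk_size]) for off in range(0, len(data), chunk_size)]


def assign_round_robin(chunks: list[Chunk], n_paths: int) -> list[list[Chunk]]:
    """Deal chunks to paths in turn so each path gets a disjoint subset."""
    paths: list[list[Chunk]] = [[] for _ in range(n_paths)]
    for i, chunk in enumerate(chunks):
        paths[i % n_paths].append(chunk)
    return paths


def send_over_path(path_id: int, chunks: list[Chunk]) -> list[Chunk]:
    """Hypothetical stand-in for a transfer over one independent route."""
    return list(chunks)


def multipath_transfer(data: bytes, chunk_size: int = 4, n_paths: int = 3) -> bytes:
    paths = assign_round_robin(split_chunks(data, chunk_size), n_paths)
    # Send on all paths concurrently and collect whatever each path delivers
    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        delivered = [c for batch in pool.map(send_over_path, range(n_paths), paths) for c in batch]
    # Reassemble by offset, independent of which path or order a chunk arrived on
    delivered.sort(key=lambda c: c[0])
    return b"".join(payload for _, payload in delivered)


print(multipath_transfer(b"disjoint chunks over several routes"))
```

Tagging chunks with offsets is what makes the paths independent; the receive-side buffering cost noted in the text shows up here as holding all delivered chunks until they can be sorted.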

3. Detecting Errors in Big Sensor Data on Cloud

Big data is a collection of data sets so large and complex that it becomes difficult to process with on-hand database management systems or traditional data processing applications. It reflects the progress of human cognitive processes and usually includes data sets with sizes beyond the ability of current technology, methods and theory to capture, manage and process the data within a tolerable elapsed time. Big data has the typical characteristics of five 'V's: volume, variety, velocity, veracity and value [1]. Big data sets come from many areas, including meteorology, connectomics, complex physics simulations, genomics, biological study, gene analysis and environmental research. Since the 1980s, the amount of data generated worldwide has doubled every 40 months. In 2012, 2.5 quintillion bytes of data were generated every day. Hence, processing big data has become a fundamental and critical challenge for modern society. Cloud computing provides a promising platform for big data processing, with powerful computation capability, storage, scalability, resource reuse and low cost, and it has attracted significant attention in connection with big data.
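The 40-month doubling figure can be turned into an annual growth rate with a one-line calculation; the sketch below assumes steady exponential growth between doublings.

```python
DOUBLING_MONTHS = 40  # doubling period quoted in the text


def growth_factor(months: float) -> float:
    """Multiplicative growth in data volume after `months`, assuming steady exponential growth."""
    return 2 ** (months / DOUBLING_MONTHS)


print(f"per year:   x{growth_factor(12):.3f}")   # about 23% more data each year
print(f"per decade: x{growth_factor(120):.1f}")  # three doublings
```

In other words, 2^(12/40) ≈ 1.231 per year, and a decade spans exactly three doublings, an eightfold increase.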


WSN big data error detection commonly requires powerful real-time processing and storage of massive sensor data, as well as analysis with inherently complex error models to identify and locate abnormal events. The system uses an error detection approach that exploits the massive storage, scalability and computation power of the cloud to detect errors in big data sets from sensor networks. Some work has been done on processing sensor data in the cloud, but fast detection of data errors in big data with the cloud remains challenging. In particular, how to use the computation power of the cloud to quickly find and locate errors of nodes in a WSN needs to be explored. Cloud computing, a disruptive trend at present, has a significant impact on the current IT industry and research communities. Cloud computing infrastructure is becoming popular because it provides an open, flexible, scalable and reconfigurable platform. The error detection approach is based on the classification of error types, and the defined error model triggers the error detection process. The approach is designed and developed on the cloud, using its massive data processing capability to improve error detection speed and real-time reaction. In addition, the architectural features of complex networks are analyzed so that they can be combined with cloud computing more efficiently. Complex network systems are divided into scale-free and non-scale-free types. A sensor network is a scale-free complex network system, which matches the scalability of the cloud. The error detection approach on the cloud is specifically tailored to finding errors in big data sets from sensor networks. The main contribution of the detection approach is a significant improvement in error detection time without compromising error detection accuracy.
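The abstract's idea of running detection within limited temporal or spatial blocks, rather than over the whole data set, can be sketched as follows. The paper does not list its error types or thresholds here, so the three types (out-of-range, spike, stuck value), the thresholds and the 60-second window are hypothetical; a thread pool stands in for distributing blocks to cloud workers. Each block is keyed by (cluster, time window), so blocks are independent and can be processed in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    node: str
    cluster: int   # cluster of the clustered WSN the node belongs to
    t: int         # timestamp in seconds
    value: float


# Hypothetical error model parameters (illustrative only)
VALID_RANGE = (-40.0, 85.0)  # plausible physical range of the sensed quantity
SPIKE_DELTA = 10.0           # jump relative to both neighbours that counts as a spike
STUCK_RUN = 4                # identical consecutive values that count as a stuck sensor
WINDOW_S = 60                # temporal block length in seconds


def partition(readings):
    """Split readings into independent (cluster, time-window) blocks."""
    blocks = defaultdict(list)
    for r in readings:
        blocks[(r.cluster, r.t // WINDOW_S)].append(r)
    return blocks


def classify_series(series):
    """Return (node, t, error_type) tuples for one node's time-ordered readings."""
    errors = []
    lo, hi = VALID_RANGE
    run = 1
    for i, r in enumerate(series):
        prev = series[i - 1].value if i > 0 else None
        nxt = series[i + 1].value if i + 1 < len(series) else None
        if not lo <= r.value <= hi:
            errors.append((r.node, r.t, "out_of_range"))
        elif (prev is not None and nxt is not None
              and abs(r.value - prev) > SPIKE_DELTA
              and abs(r.value - nxt) > SPIKE_DELTA
              and (r.value - prev) * (r.value - nxt) > 0):  # both neighbours on the same side
            errors.append((r.node, r.t, "spike"))
        run = run + 1 if prev is not None and r.value == prev else 1
        if run == STUCK_RUN:
            errors.append((r.node, r.t, "stuck"))
    return errors


def detect_block(block):
    """Run the error model on every node series inside one block."""
    by_node = defaultdict(list)
    for r in sorted(block, key=lambda r: (r.node, r.t)):
        by_node[r.node].append(r)
    return [e for series in by_node.values() for e in classify_series(series)]


def detect_all(readings, workers=4):
    """Detect errors block by block in parallel; return them sorted by node and time."""
    blocks = partition(readings)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(detect_block, blocks.values()))
    return sorted(e for block_errors in results for e in block_errors)


readings = ([Reading("n1", 0, t, v) for t, v in enumerate([20, 21, 50, 22, 23, 24])]
            + [Reading("n2", 1, t, v) for t, v in enumerate([30, 30, 30, 30, 100])])
print(detect_all(readings))
# [('n1', 2, 'spike'), ('n2', 3, 'stuck'), ('n2', 4, 'out_of_range')]
```

Because a block only sees its own window, an error that straddles a window boundary may be missed or misclassified; overlapping windows are one way to handle that trade-off between speed and accuracy.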

4. Problem Statement

The cloud computing environment provides resources for big data processing. Sensor networks produce huge amounts of data values, which a sensor network cannot process with its own resources. Sensor clouds are constructed to process the sensor data values using cloud resources. The sensor data values are transferred to the cloud environment for data analysis, and storage and computational resources are allocated for sensor data processing. Big data is prepared from the sensor data collections, and error detection methods are used to discover errors in the sensor data values. The following drawbacks are identified in the existing system:

• Error correction operations are not supported by the system.
• Noisy data cleaning tasks are not handled by the system.
• Data recovery operations are not supported by the system.
• Sensor cloud construction and data communication are not optimized.

5. Sensor Data Processing Framework

Wireless sensor networks are constructed to monitor the environment. Environmental information is collected by the sensor nodes, and all the data values are stored in the sensor nodes. Highly scalable data processing is not possible in the sensor network environment, so the sensor data values are collected and combined to form big data resources. Big data preparation and processing tasks are carried out with cloud resources. The sensor network and the cloud environment are integrated to form the sensor cloud architecture. The main task in the sensor cloud environment is error detection: discovering errors together with their location information. The sensor cloud construction process is improved with optimized network partitioning methods. Error detection and correction operations are integrated with the system. A big data cleaning process is applied to remove noise from the big data values. Data recovery methods are applied to regenerate the original data values. Data summary models are used to reduce storage and computational resource requirements.
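The cleaning and recovery steps named above are not specified in detail here, so the sketch below swaps in two common, simple techniques as stand-ins: linear interpolation to recover missing readings (represented as `None`) and a sliding median filter to suppress noise. Function names and the window size are hypothetical.

```python
from statistics import median


def interpolate_missing(values):
    """Data recovery: fill None gaps by linear interpolation between known neighbours.

    Leading or trailing gaps take the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out


def median_filter(values, k=3):
    """Noise cleaning: replace each value by the median of a centred window of odd size k.

    Windows are truncated at the ends of the series.
    """
    half = k // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def clean_series(values, k=3):
    """Recover missing readings first, then smooth the completed series."""
    return median_filter(interpolate_missing(values), k)


print(interpolate_missing([1.0, None, 3.0, None, None, 6.0]))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(median_filter([20, 21, 90, 22, 23]))                     # [20.5, 21, 22, 23, 22.5]
```

Recovery runs before cleaning so that the median window never has to skip gaps; a median rather than a mean is used because a single spike (90 above) does not pull the median along with it.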


environment is used to provide resources to users. Hardware and software resources are provided for computational and data sharing tasks. Cloud storage is used to store huge volumes of data values. Big data management handles highly scalable, high-volume and high-velocity data values. The sensor data values are combined to construct big data sets. The sensor cloud is built to manage the sensor data values with data processing support, and the sensor data values are maintained in the cloud environment. Error detection and correction operations are carried out in the sensor clouds, and the sensor data values are transferred to the cloud for these processing tasks. The Model based Error Detection Scheme (MEDS) is used to handle the error detection process. Error detection and correction operations are carried out with the Model based Error Detection and Correction Scheme (MEDCS). Error detection, correction and location operations are handled with cloud resources. The sensor cloud system is divided into five major modules: WSN construction, capture process, data management under cloud, error detection and correction, and query process. The WSN construction module handles the sensor deployment operations. Data sensing is performed in the capture process module. The data management in cloud module performs the data collection and update operations. The error detection and correction module performs the data processing tasks. User data access is carried out in the query process module.
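The internals of MEDS and MEDCS are not spelled out here, so the following is only a minimal sketch of the model-based idea under an assumed model: each reading is predicted as the mean of the previous `window` readings, a reading whose residual exceeds a threshold `tau` is flagged (detection, as in MEDS), and in the correcting variant the flagged reading is replaced by its prediction (detection and correction, as in MEDCS). The prediction model, `window` and `tau` are hypothetical.

```python
from statistics import fmean


def _scan(values, window, tau, correct):
    """Predict each reading from the previous `window` history values and test the residual."""
    history = list(values[:window])  # the first `window` readings are taken as-is
    errors = []
    for i in range(window, len(values)):
        prediction = fmean(history[i - window:i])
        if abs(values[i] - prediction) > tau:
            errors.append(i)
            # Correction keeps the prediction; detection-only keeps the raw value
            history.append(prediction if correct else values[i])
        else:
            history.append(values[i])
    return history, errors


def meds(values, window=3, tau=5.0):
    """Detection only: indices whose reading deviates from the model prediction."""
    return _scan(values, window, tau, correct=False)[1]


def medcs(values, window=3, tau=5.0):
    """Detection and correction: (corrected series, flagged indices)."""
    return _scan(values, window, tau, correct=True)


readings = [20, 21, 22, 40, 23, 24]
print(meds(readings))   # [3]
print(medcs(readings))  # ([20, 21, 22, 21.0, 23, 24], [3])
```

Feeding corrected values back into the history keeps one bad reading from skewing the predictions that follow it; note that the first `window` readings are never checked in this sketch.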

The WSN construction module is designed to set up the sensor nodes. The sensor nodes are deployed with user-selected parameters: sensor count, sink count, sensing type and delay parameters are collected from the user. The WSN setup form shows the parameter collection process. The node list shows the sensor nodes deployed in the network, and the coverage and energy details of all sensor nodes are listed in a separate form. The network view shows a graphical view of the sensor network. The capture process module is designed to handle the data sensing operations. Environmental information such as temperature and pressure values is captured and stored in the sensor's local storage. The node name, IP address, data value, unit and sensed time are updated into the database. The capture details are listed for the selected sensor node.
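The setup parameters and capture fields named above can be modelled as plain records. The sketch below is hypothetical: class and function names are invented, `delay_s` assumes the delay parameter is measured in seconds, and an in-memory SQLite table stands in for the system's database.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_address


@dataclass(frozen=True)
class WsnSetup:
    """User-selected deployment parameters for the WSN construction module."""
    sensor_count: int
    sink_count: int
    sensing_type: str   # e.g. "temperature" or "pressure"
    delay_s: float      # assumed: delay parameter in seconds

    def __post_init__(self):
        if self.sensor_count <= 0 or self.sink_count <= 0:
            raise ValueError("sensor and sink counts must be positive")
        if self.delay_s < 0:
            raise ValueError("delay must be non-negative")


@dataclass(frozen=True)
class CaptureRecord:
    """One sensed value as recorded by the capture process module."""
    node_name: str
    ip: str
    value: float
    unit: str
    sensed_at: datetime

    def __post_init__(self):
        ip_address(self.ip)  # raises ValueError for a malformed address


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE captures (node TEXT, ip TEXT, value REAL, unit TEXT, sensed_at TEXT)")
    return conn


def store(conn: sqlite3.Connection, rec: CaptureRecord) -> None:
    conn.execute(
        "INSERT INTO captures VALUES (?, ?, ?, ?, ?)",
        (rec.node_name, rec.ip, rec.value, rec.unit, rec.sensed_at.isoformat()),
    )


def captures_for(conn: sqlite3.Connection, node: str) -> list[tuple]:
    """Capture details for one selected sensor node, oldest first."""
    return conn.execute(
        "SELECT value, unit, sensed_at FROM captures WHERE node = ? ORDER BY sensed_at",
        (node,),
    ).fetchall()


setup = WsnSetup(sensor_count=50, sink_count=2, sensing_type="temperature", delay_s=5.0)
db = open_db()
store(db, CaptureRecord("n1", "10.0.0.1", 25.5, "C", datetime(2015, 1, 1, 12, 0)))
print(captures_for(db, "n1"))  # [(25.5, 'C', '2015-01-01T12:00:00')]
```

Timestamps are stored as ISO 8601 text so that ordering by the column matches chronological order.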

The cloud resources are used to provide storage and data processing services. The cloud resources are connected with the sensor network to form the sensor clouds. The data values are collected from the sensor nodes through the sink nodes, then received and updated into the cloud storage space. The summary view shows the results of the summarization process; the data summary lists each node and its data capture status. The optimized network partitions are constructed using the region information of the sensor nodes. The error detection and correction module is designed to handle data processing in the clouds. The Model based Error Detection Scheme (MEDS) is adopted for the fault detection process, and the error details are listed in the error detection report. The Model based Error Detection and Correction Scheme (MEDCS) is applied for the error detection and correction tasks, and the data prediction results are listed with the corrected values. The query process module is designed to handle the data request process. User queries are redirected to the cloud environment, where the query response is prepared and distributed by the cloud resources. The query results include the sensed data and the aggregated data values. The queries and their response details are updated into the log data files.
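One simple way to realise region-based partitioning and a data summary is sketched below. It assumes each node has (x, y) coordinates and uses a fixed square grid as the region definition; the paper's optimized partitioning method is not described here, so the grid, `cell_size` and the summary fields (count, min, max, mean) are hypothetical choices. Storing one summary per region instead of every raw value is what reduces storage.

```python
from collections import defaultdict
from statistics import fmean


def region_of(x: float, y: float, cell_size: float = 50.0) -> tuple[int, int]:
    """Grid cell (column, row) containing position (x, y)."""
    return (int(x // cell_size), int(y // cell_size))


def partition_nodes(positions, cell_size=50.0):
    """Group node names by region."""
    regions = defaultdict(list)
    for node, (x, y) in sorted(positions.items()):
        regions[region_of(x, y, cell_size)].append(node)
    return dict(regions)


def summarize(readings, positions, cell_size=50.0):
    """Collapse raw per-node readings into one small summary per region."""
    per_region = defaultdict(list)
    for node, values in readings.items():
        x, y = positions[node]
        per_region[region_of(x, y, cell_size)].extend(values)
    return {
        region: {"count": len(vals), "min": min(vals), "max": max(vals), "mean": fmean(vals)}
        for region, vals in per_region.items()
        if vals
    }


positions = {"n1": (10, 10), "n2": (40, 20), "n3": (120, 30)}
readings = {"n1": [20.0, 22.0], "n2": [24.0], "n3": [30.0, 34.0]}
print(partition_nodes(positions))  # {(0, 0): ['n1', 'n2'], (2, 0): ['n3']}
print(summarize(readings, positions))
```

A query about a region can then be answered from its summary without touching raw readings, at the cost of losing per-reading detail.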

6. Conclusion


fetched with reference to the cluster information. The system reduces the error detection latency and computational complexity. The error detection accuracy is also improved by the system.

REFERENCES

1. Radu Tudoran, Alexandru Costan and Gabriel Antoniu, “OverFlow: Multi-Site Aware Big Data Management for Scientific Workflows on Clouds,” IEEE Transactions on Cloud Computing, vol. X, no. X, August 2014.

2. T. Kosar, E. Arslan, B. Ross, and B. Zhang, “StorkCloud: Data transfer scheduling and optimization as a service,” in Proceedings of the 4th ACM Science Cloud '13, 2013, pp. 29–36.

3. S. Pandey and R. Buyya, “Scheduling workflow applications based on multi-source parallel data retrieval,” Comput. J., vol. 55, no. 11, pp. 1288–1308, Nov. 2012.

4. L. Ramakrishnan, C. Guok, K. Jackson, E. Kissel, D. M. Swany, and D. Agarwal, “On-demand overlay networks for large scientific data transfers,” in Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, ser. CCGRID '10, 2010.

5. T. J. Hacker, B. D. Noble, and B. D. Athey, “Adaptive data block scheduling for parallel TCP streams,” in Proc. of the 14th IEEE High Performance Distributed Computing, ser. HPDC '05, 2005.

6. W. Liu, B. Tieman, R. Kettimuthu, and I. Foster, “A data transfer framework for large-scale science experiments,” in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010, pp. 717–724.

7. C. Raiciu, C. Pluntke, S. Barre, A. Greenhalgh, D. Wischik, and M. Handley, “Data center networking with multipath TCP,” in Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks, ser. Hotnets-IX, 2010, pp. 10:1–10:6.

8. W. Liu, B. Tieman, R. Kettimuthu, and I. Foster, “A data transfer framework for large-scale science experiments,” in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010, pp. 717–724.

9. Esma Yildirim, Jangyoung Kim and Tevfik Kosar, “Application-Level Optimization of Big Data