13 Page 13-17 © MAT Journals 2019. All Rights Reserved
Big Data Approach towards Networking, Data Transfer, IOT and
Security: A Review
Anju Abraham
Assistant Professor
Department of Computer Science and Engineering, T. John Institute of Technology, Bangalore,Karnataka, India
Email: [email protected]
DOI: http://doi.org/10.5281/zenodo.3413934 Abstract
Big Data has emerged as an advanced discovery for large scale data collection. Data generated by large instruments that generates semi structured and unstructured information. This information must be analysed using big data tools and must be listed and stored in such a way that it can be used for further references and usages. Data transfer is now an essential function for various discoveries, particularly within big data environments. There are various time-constraints, performance constraint. Combing the technology of Internet devices and Big Data which helps in analyzing and processing a large number of data sets that is collected from heterogeneous sources, which in turn are characterized by weak structuring and high speed of updating. Malicious attacks that can eventually prevent network services are unavoidable. Thus, in this Big data technology, particularly machine learning, has been widely used for intrusion/anomaly detection, little has been done in networking that is developed by software.
Keywords: Anomaly detection, heterogeneous sources, information security, internet devices, malicious attacks, semi-structured data.
INTRODUCTION
Big data has been used in the various way including researches. Greater scientific gadgets produces large volume of variety information’s. Time, speed and performance are the constraints related to this especially high performance conditions which need to be dealt with. There is always an exponential growth of information that is collected from these gadgets. To fully utilize these expensive computing facilities, ultra-high-throughput data transfer capabilities will be required to move here and there as per the requirement. Scientific applications typically have explicit or implicit time constraints on data transfer. Based on the nature of these time constraints, data transfer tasks can be classified into three broad categories: (a) Real-time, (b) deadline-bound, and (c) background data transfer. Cyber-physical systems (CPSs)
are everywhere in the various industries from infrastructures such as transportation including aviation sector, railways etc. and power grid systems to automated operations in various industry to services and support in business organizations like IT sector, mutual funds and insurance. There are many attacks like, flow table saturation attack, saturation attack, Topology based poisoning attack, spoofing attack, denial-of-service attack, eavesdropping attack on network infrastructures today which not only tend to compromise the availability of networks, hosts and services and supports, but also the confidentiality and integrity of the network data. Identification and tracking of these kinds of attacks are very difficult. Big data instruments and algorithms are widely used to disallow these attacks, especially supervised way of
14 Page 13-17 © MAT Journals 2019. All Rights Reserved mechanisms, machine and deep learning
has a greater contribution with respect to intrusion detection and DoS in the field of networking and its related software.
BACKGROUD AND RELATED WORK
Data Transfer Techniques and Services Several data movement tools and technologies have been developed, such as TCP-based file transfer protocol that is grid based [7] and user datagram protocol based on user datagram technology [3]. Networks include various usage of protocols like TCP and IP and UDP for transfer of data. Network resource utilizations are limited by the congestion control algorithm of TCP and UDP protocols.
Huge data transfer causes abnormal condition of networks, for example failure of server or node, the nodes contain data node, data collection canters, name nodes poor fibres, and malfunctioning of line that transfer data. Researchers have developed several data transfer services and methodologies on tops of data transfer tools and techniques like grid based file transfer to automate bulk data transfer. Private industry has also provided several data transfer Services like Dropbox, Hightail, Akamai, and Windows Azure and cloud database like mongoDB. However, these services typically are not suitable for big science data transfer. NBS and its Challenges
NBS is a network architecture that removes the entangles of the control plane from data forwarding, thus allowing the network control to be directly programmable [1, 2]. The promise of NBS is that it allows network resources to be managed and reconfigured automatically and dynamically, offering immense performance advantages for network operations. In prior work [6, 7] designed a dynamic flow scheduling system that
adaptively scheduled the switching fabric to reduce the upcoming collision due to heavy traffic in the network that can be controlled by using various protocols. Big Data Scheduler
The big data fast scheduler implements a time constraint and condition based scheduling mechanism to schedule and allocate resources for transferring bulk data that is structured, semi structured and structured. Each resource is estimated, calculated, and converted into a rate that can be framed into data transfer tasks. The big data scheduler assigns rates for transferring data in terabytes and pet bytes in the following order of priority: real-time data transfer tasks à deadline-bound data transfer tasks and background data transfer tasks. Transfer rates are assigned to real time data transfer tasks on as per the requirement basis. The scheduler schedules and assigns resources for data transfer tasks in two modes: (a) an event-driven mode, when a data transfer task is completed; and (b) a periodic mode to periodically reschedule and reassign resources for data transfer tasks to adapt rapidly to changing run-time environments. Real-time data transfer tasks, it would be naïve to make a one shot reservation for a particular data transfer task throughout its duration because run-time environments (e.g., traffic load, network, and storage conditions) will change with time. Therefore, the scheduler runs in a periodic mode to reschedule and reassign resources for a data transfer task as it depends on the deadline and size of the data set. Based on the type of the mode that is event driven and periodic the scheduler does the following task like: Admission control, Resource tracking and calculation based on usage of resources, allocating resources based on the traffic in the network and data transfer rates, brokering the resources as per the requirement reducing and balancing load on the network, assigning the resources to satisfy end to end services and support.
15 Page 13-17 © MAT Journals 2019. All Rights Reserved The available resources are shuffled
among various tasks.
BIG DATA IN NETWORKING
The large amounts of data is collected in regular basis from sensor networks that are normally stored in local databases or public cloud storage. Data generated by sensor networks can be in many different formats such as text, images, videos and audio files. The processing of these data sets can be either in batch or real time processing. Map Reduce is widely and largely used for distributed processing, it’s not already clear whether it is suitable for sensor network data [2]. In several of this research work goal is to implement and test executions of map reduce functions applied to sensor network data. The body and subject clients through a web portal report the location of their data to the portal and select features available for it. The portal communicates with a cloud infrastructure that and performs the processing selected by the client, it then returns the results to the portal. Items such as replication factor, block size, and scalability will be tested and compared.
However, the majority of the separating systems can't deal with regular changing situation of spam sends embraced by the spammers over the time. In this way improved spam control calculations or upgrading the effectiveness of different existing information mining calculations to its fullest degree are the most extreme
necessity. A similar report is introduced on different spam sifting systems embraced based on different ascribes to discover best among all to separate the best outcomes. Hadoop has been chosen as the framework to process sensor network data in a cloud computing infrastructure because of the following reasons:
• Hadoop is a framework for processing huge amounts of unstructured data • HDFS is appropriate for a
write-once-read-many model: once data is collected, sensor network data are used normally for read operations and not for write.
• Hadoop provides high scalability and expandability.
SCENARIO ON SECURITY ATTACKS
Using a big data analytic approach ultimately involves utilizing an analytic and parallel algorithm in a distributed computing framework in order to cope with enormous amount of data that do cannot be accommodated in one machine. A variety of such frameworks exist with different computing paradigms like Map Reduce, Storm, Spark, pig Hive etc.). Case 1: Smaller Network
An attacker Keep a goal to create a forged link between switches 1-2, giving a link fabrication attack.
16 Page 13-17 © MAT Journals 2019. All Rights Reserved For ARP spoofing attack, victims hosts are
X and Z for spoofing. The attacks are indicated as thick lines for two events of discovering link attacks and circles for two events involved in spoofing attacks (Figure 1).
Case 2: Medium Network
By compromising host X, an attacker goals to create a forged link that cannot be
detected between switches 1-2, giving a link fabrication attack. At a different time slot, host Z got compromised and fake link between switches 16-17 was created. For ARP spoofing attack, X and Z are victim hosts that are prone to spoofing attacks. The attacks are indicated as thick lines for two events of link discovery attacks and circles for two events of ARP spoofing attacks (Figure 2).
Figure 2: Medium network. In the ARP spoofing attack, an attacker
pretend to spoof information and messages that is being send by different nodes of the target's node or poison and corrupt the target node's which is contained in IP/MAC addresses of all nodes in the network that is maintained in the table. The attacker can use various tool [4] to scan through the network to identify vulnerable node to compromise. This helps attacker to come to a conclusion about the compromised node. From the compromised node, the attacker sends a message to a target's node pretending to be a legitimate node in the network.
IOT IN BIG DATA ENVIRONMENT The IoT applications [3−5] and hardware devices are listed assets of integrated and connected expandable based on the changing environment that is found as devices or instruments making use of the communication protocols like UDP for transferring messages and information
passing between various communication devices like servers and clients. In this case the information exchange can take place without any human intervention and guidance. Internet devices have sets of characteristics formed in accordance with the given tasks in certain research profile. Because of the incompleteness of the database terminology formation, it is reasonable to provide the basic definitions and terms specific to this modern innovation class of information and communication technologies. The basic concept of Internet devices technology is the implementation of the paradigm. According to which "almost all Internet devices are interconnected", which transforms into the implementation of the following characteristics:
• Convergence provides the ability to process random types of data that is semi structured and unstructured by different means.
17 Page 13-17 © MAT Journals 2019. All Rights Reserved and at any time whenever it is
required.
• Connection provides means for communication with any network by any means.
• Content available from anywhere at any time without content limitation. • Calculations accessing to everyone
who knows the protocol of operation. • Collections a set of services or certain
service available to solve the random list of tasks.
There are four types of functional tasks, which solution involves the use of Big Data:
1. Data storage and management. The volume of Big Data does not allow the use of traditional relational databases. 2. Unstructured data processing and
unstructured information processing. 3. Big Data analysis is complex.
4. Scaling. CONCLUSION
Security challenges in NBS network have been addressed is based on providing the security to the network. This attacks are predicted based on machine learning techniques and algorithms. Predictions are based on the presence of attack but also attack type by a predictive model. Cloud computing, Big Data, and machine learning and supervised methods of data processing and mining as well as their accumulations are used by various IoT, instruments for their collection of data in voluminous range by making use of various integrated algorithm and mechanisms.
REFERENCES
1. Duda Oleksii, Matsiuk Oleksandr, Karpinski Mikolaj (September 2018), “Information Technologies of Internet Devices and Bigdata in the Smart
Cities Projects”, CSIT.
2. Emre Unal, Sonali Sen-Baidya and Ratti Korrn (2018), “Towards prediction of Security Attacks on Software Defined Networks: A Bigdata Analytic Approach”, IEEE International Conference on Big Data. 3. Qiming Lu, Liang Zhang, S
Sasidharan, Se-young Yu, Jim Hao Chen (2018), “Big Data Express: Toward Scheduling, Predictable and High-Performance Data Transfer”, IEEE.
4. SAR Shah, W Wu, L Zang, S Sasidhara, P De Mar, C Guok (2018), J Macauley, E Pouyoul, J Kim, S Amoeba, “Net: An SDN-enabled network service for big data science”,
J. Network and Computer
Applications, Volume 119, pp. 70−82. 5. Jim Basney, Terry Fleury, Jeff Gaynor
(September 2014), “CILogon: A Federated X.509 Certification Authority for Cyber Infrastructure Logon”, Concurrency and
Computation: Practice and
Experience, Volume 26, Issue 13, pp. 2225−2239.
6. Jain Sushant et al. (2013), “B4: Experience with a globally-deployed software defined WAN”, ACM Sigcomm Computer Communication Review, Volume 43, Issue 4.
7. Abdel Salam, AM, A El-Sisi, V, V Reddy (2016), “Mitigating ARP Spoofing Attacks in Software-Defined Networks”, Proceedings of the International conference on Computer Theory and Applications (ICCTA).
Cite this article as: Anju Abraham. (2019). Big Data Approach towards Networking, Data Transfer, IOT and Security: A Review. Journal of Web Development and Web Designing,
4(3), 13–17.