1 Page 1-10 © MAT Journals 2016. All Rights Reserved
Analysis of Real-Time Big Data: Its Applications and Challenges
Anika GuptaDepartment of Computer Science and Engineering, Punjab Technical University, Jalandhar, India E-mail: [email protected]
Abstract
Timely analytics over massive information could be a key issue for achievement in several business and repair domains. Some samples of these domains embody finance, transportation, energy, security, military, and emergency response. Many massive information applications in these domains admit quick and timely analytics supported out there information to form quality selections. This paper surveys time period massive information analytics applications and their technical challenges.
Keywords: Real-time processing, big data, data analytics
INTRODUCTION
Big data refers to the large information sets obtained from numerous sources cherish science experiments, detector systems, social network activities, telecommunications information, cameras and police investigation systems, and daily business and monetary transactions. Firms collect volumes of knowledge regarding their customers, suppliers, operations, and business transactions. Sensors that area unit embedded in roads, buildings, mobile phones and devices, vehicles, and sensible energy meters generate trillions of bytes of period info. Additionally, billions of people round the world contribute on an
everyday to the increasing size and handiness of huge information through social media sites and applications. Such information supplies a good chance to know and use it in useful applications. However, the sheer size of such information sets pause new technological challenges in terms of storage capability and management, organization, process and analysis.
With big data, businesses and organizations will learn additional regarding their businesses and operations and that they will translate that data into improved deciding processes and higher
2 Page 1-10 © MAT Journals 2016. All Rights Reserved performance. The provision of huge
information will be utilised to reinforce performance in numerous application domains comparable to attention, finance, transportation, education, security, and governmental designing. Some special forms of huge information applications have to be compelled to build period of time selections to reinforce the performance of their operations or services or to extend profits. Samples of these applications are intelligent transportation, monetary market commerce and police investigation, military operations deciding, good grids, and emergency response. These applications have to be compelled to take care of current and historical large information to perform quick analysis aiding in instant deciding. Slow or untimely information assortment and analysis resulting in delayed selections will considerably scale back the performance of the large information applications. Such applications have variety of technical challenges to supply data processing thanks to the huge information obtainable for optimum or close to optimum deciding.
BACKGROUND
The term “Big data” refers to data sets that typical management systems cannot store, manage, and analyze due their large sizes.
Big data sizes vary from many dozen terabytes to multiple petabytes. Big data applications typically request new data and gather intelligence from the information and convert that into business benefits in terms of enhancing operations and profit and reducing risks and overheads. Additionally, utilizing big data will give variety of latest kinds of advanced services that have only recently close to past been introduced or are going to be introduced in the near future. These services enhance the standard of life and facilitate cut back risks and threats.
Volume
It was estimated that around 2.5 exabytes (one billion gigabytes) of data are created from different sources each day in 2012. This number is doubling every 40 months. The expanding use of the Internet, mobile devices and smartphones contributes significantly in generating this huge data. Another source of this data is sensors and sensor systems that are embedded in our life. Different types of sensors are available in roads, building, cars, different environments, and factories. These sensors collect massive amounts of collected information. Some of these sensors are fixed such as road and environmental sensors while others are mobile such as sensors installed in smartphones and
3 Page 1-10 © MAT Journals 2016. All Rights Reserved vehicles such as cars and unmanned aerial
vehicles (UAVs).
Velocity
Huge amounts of new data are created every second and added to existing big data sets. In most big data applications, this new data needs to be included in the analysis and real-time decision making. Organizing, accessing and processing the data as it is collected to be included in the decision making in real-time applications are usually considered a complex technical challenge.
Variety
Data is usually collected from different sources and in different formats. These sources can be messages, images and videos posted to social networks, readings from sensors, business transactions, and economic and political news. Most of the sources of big data are relatively new due to the recent introduction of online social networks and smartphones.
The collected data can be structured or unstructured data. It can also have different formats. Some of the data is produced in a random fashion while some is frequently or periodically produced. In the latter case, the data can also differ in the production frequencies.
APPLICATIONS
Real-time applications dissent from regular applications in one major attribute. Period applications believe instant input and quick analysis to arrive to a choice or action among a brief and really specific time line. In several cases, if a choice cannot be created among that timeline, it becomes useless. As a result, it is necessary to form all knowledge necessary for such call accessible during a timely fashion which the analysis is completed during a quick and reliable manner. There are unit completely different period massive knowledge analytics applications and that we discuss samples of these applications during this section.
Intelligent Transportation
One of the most important applications of real-time big data analytics is to enable intelligent transportation systems. These days, different sensing technologies are available to monitor traffic conditions in big cities and crowded streets. These sensing technologies are divided into two groups: road sensors and vehicle sensors. Examples of road sensors are road monitoring cameras, road tube axie sensors, vehicle inductive loop sensors, capacitance mats, and piezoelectric axie sensors [1]. Vehicle sensors include on-board cameras, GPS systems, proximity
4 Page 1-10 © MAT Journals 2016. All Rights Reserved sensors, and speedometers. These sensors
can be associated with communication capabilities such as GSM, satellite communications, WiFi, and Bluetooth to provide real-time monitoring for different conditions such as vehicle locations, average speeds, and driving behaviors of drivers as well as road conditions. The roads and vehicular sensors can generate big data that can be utilized to provide advanced intelligent transportation services in big cities. One example of such services for end users is a real-time service that provides information about shortest-time routes to any destination from the current location based on the current traffic conditions. Such a service cannot be provided without having a full picture of all roads and traffic conditions in the city where the service is provided. While this service is very useful for regular user vehicles, it is extremely important for emergency vehicles such as fire engines, ambulances, police cars, and public utility emergency cars. Current GPS systems offer drivers best route information; however, many cannot respond quickly enough to sudden changes in traffic conditions such as accidents or roadblocks. In addition it may not be possible to provide on-time information to vehicles to clear the road efficiently for emergency vehicles [2].
Within an equivalent line alternative a lot of advanced intelligent transportation services are provided admire a service that has finish users with data concerning most energy-efficient routes from any current vehicle location to any outlined destination purpose. These best energy-efficient routes cannot be outlined with solely perceptive this traffic conditions, however, additionally the previous traffic experiences, the characteristics of energy consumptions within the vehicles, and, therefore, the drivers’ driving behaviors. All this data is required to seek out the foremost energy-efficient route to a destination. Another advanced service that may be provided for merchandise delivery drivers may be a period of time and dynamic resolution of the bagman drawback for outlining the order and routes of the delivery points specified delivery time is reduced. This delivery order and routes is dynamically modified throughout the trip because of this traffic conditions to fulfil with the target of minimizing the entire delivery time [3].
Financial Market Trading and
Surveillance
Huge amounts of economic information square measure generated each second for stock and choice trades from multiple markets, currency exchange rates, interest
5 Page 1-10 © MAT Journals 2016. All Rights Reserved rates, and trade goods costs. This
information is not solely massive, however, conjointly terribly dynamic. Corporations and organizations will use this dynamic massive information to sight opportunities and threats and to quickly react to them. Samples of these opportunities are: predicting will increase or decreases in costs of some securities before an amendment really happens. The timely reaction to such opportunities is shopping for securities before their costs increase or commercialism some before the costs drop. Several sold securities are re-purchased at lower costs later to extend profits. The sooner such choices square measure created the upper the possibilities of creating additional profit. This sort of opportunities desires sensible forecast models that admit each current and historical info. In terms of economic threats, examples embrace fraud and misappropriated activities. The timely detection of exchange exchanges’ fallacious and misappropriated activities admire market manipulation, value rigging, and unknowledgeable trading helps stop such activities quickly, therefore, it improves market performance and secures investors transactions. The money opportunities cannot be utilised and, therefore, the threats cannot be prevented unless the detection method is
incredibly quick. These money opportunities and threats is discovered by humans; but, they sometimes would like a protracted time and, therefore, the opportunities would not be obtainable any further and, therefore, the threats can become before they will stop them [4]. There square measure some machine-driven commerce systems developed recently to get and react to money opportunities. In addition, there are some period of time fraud discoverion systems developed recently to detect and forestall money threats in a very timely manner. Such systems presently have faith in smaller and restricted information sets to attain period of time performance. However, for these systems to be more practical, they have to touch upon all accessible information. This money information is typically and changes dynamically, that introduces many technical challenges for such applications. Historically, data processing and Business Intelligence Techniques are primarily utilized in the money sector to find some opportunities and threats. These sorts of techniques solely work on historical information and wish high involvement from the business users. Although, these techniques are well-tried to supply helpful info for long and middle level business coming up with for the money sector,
6 Page 1-10 © MAT Journals 2016. All Rights Reserved these techniques have terribly giant
latencies with reference to detection and response. Therefore, they cannot be effectively used for timely detection and hindrance processes for a few money systems cherish machine-driven commercialism and fraud detection systems. These systems got to touch upon information streams and should deploy in no time detection and reaction processes [5].
Crowd Control
Crowd control is important for police and emergency response teams when large events take place. Examples of such events are major sports games, concerts, parades and outdoor celebrations like New Year’s Eve or the 4th of July festivities. Being able to watch and predict crowd movements and make immediate decisions such as opening more parking lots, closing some streets to facilitate pedestrian movement, or increasing police presence in some areas becomes important. This can be done using two types of sensing and tracking technologies. Traffic monitoring through traffic sensors and vehicle tracking allows for better information about incoming traffic into the event’s area thus allowing for better control on where traffic is directed. In addition using GPS and location tracking apps on the peoples’
mobile devices allows for tracking and learning about the people and their movement and concentration patterns in the area. Using the trailing applications in real time will give associate degree correct read of wherever everyone seems to be and determine overcrowded areas. Victimisation this data, police forces will be higher distributed to own optimum management over the world and very best locations to produce their services. Given the massive variety of individuals which will be gift in one location for a happening, the number of information collected is large and needs to be organized and analyzed on the spot to create choices now. What is more, in such events the possibilities of accidents and ruinous incidents additionally increase. With correct trailing associate degree analysis systems the authorities become a lot of capable in distinguishing the placement of an accident and now designing response methods equivalent to clearing routes for emergency response vehicles, evacuating the individuals within the space and providing the required support services during a timely fashion. However, to handle this sort of deciding, it is necessary to handle amounts of information should be analyzed that not a simple task is given the present technology.
7 Page 1-10 © MAT Journals 2016. All Rights Reserved
Smart Grid
A smart grid could be a restored electrical grid system that uses info and communication technology to gather and act on offered information, reminiscent of info regarding the behaviors of suppliers and customers, in an automatic fashion to feature some values. It improves the potency, reliableness, economics, and property of the assembly and distribution of electrical power. A sensible grid uses computer-based remote with two-way communication technology between power producers and customers to extend grid potency and reliableness through system self-monitoring and feedback. This involves inserting sensible sensors and meters on production, transmission, and distribution systems additionally to customers to induce granular close to period information regarding the present power production, consumption, and faults. It implements dynamic rating models for power usage to rid peaks by applying high charges throughout peak times and lower charges throughout alternative periods. This helps avoid potential power outages thanks to high client demands. It will give customers with close to period info regarding their energy use and permit them to manage their usage supported each their desires and their reasonable costs. Client devices
reminiscent of laundry machines and water heaters is more cost effective by dominant them mechanically to work throughout lower costs. Though the sensible grid has several potential edges, it needs the gathering of big quantity of information from power procedures, transmissions, distributors, and customers. Additionally, it needs process the collected information that is taken into account massive information, in period to remit some management info to boost overall the potency, reliableness, economics, and property of the complete wattage system. Necessary applications in sensible grids reminiscent of power load prognostication, power usage analysis, grid failure detection involve massive scale information and need quick process and communication mechanisms that cannot be handled by the present business process and communication platforms created for general business usage.
CHALLENEGES
There are a number of challenges facing the design, implementation and operations of real-time big data applications. Most regular big data applications are implemented using an open-loop approach. In this approach big data for a specific domain is analyzed to obtain some new information and knowledge that can be
8 Page 1-10 © MAT Journals 2016. All Rights Reserved used to enhance the operations or
profitability of that domain. The time needed to analyze the big data and make decisions is generally very long and could significantly reduce the benefits and effectiveness of the applications. Unlike regular big data applications, real-time applications must initiate fast actions that are usually bounded by specific time frames dictated by the targeted domain. Real-time big data applications are usually implemented using a closed-loop approach in which actions are usually based on the current and previous situations. To understand the challenges of designing, implementing, and operating real-time big data applications, we need to first understand their common action steps which can significantly impact the action completion times. Action times in real-time big data applications can be divided into following steps.
Real-Time Event Transfer
All current distributed application events should be transferred in real-time to where they can be processed. These events can be transferred from their distributed sources as raw events or as filtered or aggregated events. All generated current row, filtered, and aggregated events can be transferred to a centralized processing point or to distributed intermediate processing points
for pre-processing or for further filtering and aggregation before being transferred to the main decision making unit. The centralized approach is good if the current generated events are not huge and there are no limitations on the network resources used to transfer these events. The distributed approach is good if there are huge events such that it is impossible to transfer all the generated events to a single location. Filtering and aggregation will become important in this case. This can be done at the event sources and the intermediate points using an open-loop or a closed-loop approach. In open-loop approach filtering and aggregation policies are pre-defined while in closed-loop approach filtering and aggregation policies are interactively defined based on the current events and decisions, current system and network resources, or external business policies. In both approaches, event filtering and aggregation should be done to reduce network traffic and processing time and without negatively affecting the accuracy and optimality of the decision making in the real-time big-data application.
Real-Time Analytics
This involves invocations of real-time analytical services to determine the root causes for business and operational
9 Page 1-10 © MAT Journals 2016. All Rights Reserved situations and exceptions. Real-time
analytical processing may involve single or multiple integrated analytical services. These real-time analytical services should predict the performance and assessment of the risks for changing the operations and business environment. It relies on real-time information passed from the situation discovery process and other offline stored big data such as maps, previous transactions, situations and decisions. Dealing with stored big data is usually challenging. However, it can be abstracted into smaller data sets such that it can significantly enhance the execution time of analytical processing. Providing abstracted info to reinforce time interval also can be terribly difficult. Time period analytical services have to be compelled to deploy quick algorithms that give different choices at intervals a delimited time. These choices may be best or semi-optimal because of the restricted time and resources on the market. As this half may well be terribly process intensive, high performance platforms area unit typically required.
These platforms may be equipped with special hardware to execute special algorithms or with cluster computing systems.
Real-Time Decision Making
Based on the time period analytics results it is attainable to pick the most effective possibility for up these business operations or gain and verify the foremost applicable actions for a response to the business or operational setting. Configurations of deciding are often done by process business or operational rules that are developed by domain consultants and derived from strategic choices. These rules facilitate too interactively and showing intelligence reply to evolving business and operational things. A call method checks for predefined business and operational rules to require specific business or operational actions supported the choices provided by the time period analytics modules. It can even give notice accountable people or trigger alternative processes to conduct additional actions. The foremost difficult half is in a way to outline business and operational rules that facilitate correct and timely choices creating.
CONCLUSION
Big data assortment and analysis needs long periods of your time to finish. As a result period of time applications cannot right away get pleasure from massive knowledge analytics. However, several things and applications ought to modify
10 Page 1-10 © MAT Journals 2016. All Rights Reserved massive knowledge and generate leads to
period of time. During this paper, we tend to explore the various application domains that might get pleasure from massive knowledge applications in operation in real time. Rescue operations, battle field ways and choices, money operations and control area unit a number of these application domains. To make effective real time massive knowledge applications many challenges got to be addressed including: real time event (data) transfer; real time state of affairs (exceptions) discovery; real time analytics; real time deciding and capital punishment real time responses. Understanding the character of those applications and, therefore, the challenges ahead is that the initiative to making simpler and economical applications during this domain. Ensuing step is to spot the foremost appropriate solutions to those challenges and building real time massive knowledge applications that area unit reliable and capable of meeting the $64000 time demands. The effective approaches to deal with these challenges can extremely improve the event and use of real time massive knowledge applications, that successively can give varied benefits appreciate saving lives, up the standard of life, reducing risks and enhancing profitableness..
REFERENCES
1. Available at: http://www.naturalspublis hing.com/files/published/v6910010rnl 56m.pdf.
2. Available at: http://www.ijarcce.com/u
pload/2016/february-16/IJARCCE%2015.pdf.
3. Available at: http://www.pentaho.com/ assets/pdf/CqPxTROXtCpfoLrUi4Bj.p df.
4. Available at: https://journalofbigdata.s pringeropen.com/articles/10.1186/s405 37-014-0007-7.
5. Available at: http://www.cs.unibo.it/~ montesi/CBD/Articoli/SurvyBigData2. pdf.