With the rapid growth of the Social Web, more and more Web users post and extract viewpoints about products, people, or political issues via a variety of online social media such as blogs, forums, chat rooms, and social networks. This large volume of user-contributed content opens the door to automated extraction and analysis of the sentiments or emotions expressed about the underlying entities, such as consumer products. Sentiment analysis is also referred to as opinion analysis, subjectivity analysis, or opinion mining. Sentiment analysis aims to extract subjective feelings about subjects rather than simply extracting objective facets of those subjects. Analyzing the sentiments of messages posted to social networks or online forums can generate considerable business value for organizations that aim to extract timely business intelligence about how their products or services are perceived by their customers. Other possible applications of sentiment analysis include analyzing the propaganda and activities of cybercriminal groups who pose serious threats to business or government-owned web sites.
User Behavior Analytics (UBA) uses Big Data and machine learning algorithms to assess, in near real time, the risk posed by system user activity within your organization. Why is this analysis necessary? Think about it: every day, your employees use their credentials to access the organization's systems from the company office during regular business hours. One day you are notified that an individual's credentials were used to connect to a database server and run queries that this user has never performed before. Is a database administrator running maintenance checks, or has the system been compromised? User behavior analytics can help an organization determine what normal behavior should look like within its systems and when to be cautious of unusual activity. According to the recent SANS Analytics and Intelligence Survey, only about one-third of organizations today collect user behavior monitoring data, but approximately three-fourths of respondents say they intend to start collecting this data in the future. Understandably so: user behavior analytics offers visibility into potential insider threats, shows early red flags when accounts have been compromised by external attackers, and is most useful for measuring changes in user behavior. Ultimately, the foundation of a behavior analytics program is understanding what normal behavior looks like so that irregularity in the system can be caught. Below are three key areas to focus on when establishing behavior analytics and measuring user behaviors.
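The baseline-versus-anomaly idea above can be sketched very simply. The following is a minimal illustration (not a production UBA system, and the activity numbers are made up): build a per-user baseline from historical daily activity counts, then flag a day whose z-score against that baseline is extreme.

```python
import statistics

def build_baseline(daily_counts):
    """Build a per-user baseline (mean and standard deviation) from
    historical daily activity counts, e.g. database queries per day."""
    return statistics.mean(daily_counts), statistics.stdev(daily_counts)

def is_anomalous(todays_count, baseline, threshold=3.0):
    """Flag activity whose z-score against the baseline exceeds the threshold."""
    mean, stdev = baseline
    if stdev == 0:
        return todays_count != mean
    z = abs(todays_count - mean) / stdev
    return z > threshold

# A DBA normally runs 10-14 maintenance queries per day...
history = [12, 11, 13, 12, 10, 14, 12]
baseline = build_baseline(history)

# ...then one day the same credentials run 90 queries.
print(is_anomalous(12, baseline))  # an ordinary day
print(is_anomalous(90, baseline))  # an unusual spike worth investigating
```

Real UBA products model many more signals (time of day, source host, query type), but the core design choice is the same: learn "normal" per user, then score deviation rather than matching fixed rules.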
The most appropriate quality-of-service architecture for monitoring mobile Big Data and the interrelated networks requires several source nodes and flow actions. For the quality of service to be fully integrated and adjustable to real-time flows, four main components come into play: scalable quality of service, congestion control, adaptive bandwidth management, and data call admission control. The concept of mobile cloud computing was coined after that of cloud computing. In the current information technology landscape, mobile cloud computing has attracted a lot of attention from many industry players, making it an area of interest (Pandey, Voorsluys, Niu, Khandoker & Buyya, 2012). To fully understand how mobile cloud computing works and how it relates to Big Data analytics, it is essential to have a clear overview of the architecture on which it relies. The full structure operates over several layers and components. At the user's end, the interaction that determines the quality-of-service requirements is provided by Software as a Service (SaaS); this includes Microsoft Live Mesh, the Android Play Store, and the Apple cloud. Below this is Platform as a Service (PaaS), which involves the actual engines that operate the software; examples include Google App Engine and Microsoft Azure. The next layer that helps determine and analyze quality of service in a real-time mobile computing environment is the infrastructure layer; in the mobile environment this includes services such as Amazon EC2 and S3. The final layer is the data centers that store and manage the mobile Big Data (Nguyen, Nguyen & Huh, 2013).
Background: Sentiment analysis has become ubiquitous across a variety of applications in marketing, commerce, and the public sector. This has raised a natural interest within academic research and industry in developing approaches and solutions for ubiquitous sentiment analysis. However, most academic research focuses on adopting state-of-the-art machine learning techniques for sentiment classification and elements of natural language processing for feature construction, evaluating them on benchmark datasets without paying much regard to actual application settings. In industry the focus is on developing platforms, services, and customized solutions for certain applications and for different domains. In this work we propose a generic framework for ubiquitous sentiment classification. We discuss the Rule-Based Emission Model (RBEM) algorithm that we employ for polarity detection. Results: We show with experimental results on benchmark datasets and real case studies that the proposed framework and the RBEM approach for polarity detection are indeed generic and extendable.
Real-time unstructured data refers to information that does not follow the conventional row-column database storage model. Unlike structured data, it does not fit into relational databases. It is responsible for Variety, one of the four V's of Big Data. Sources such as satellite images, sensor readings, email messages, social media, web blogs, survey results, audio, and video produce unstructured data. Organizations go beyond "basic" analytics and dive deeper into unstructured data to do things such as predictive analytics, temporal and geospatial visualization, sentiment analysis, and much more. The objective of this paper is to present a model of sentiment analysis and its various techniques. Future research directions in this field are identified based on opportunities and several open issues in Big Data analytics.
After this we have two branches: one for batch processing and another for stream processing. In batch processing we collect the whole day's incoming data; this branch works on a daily basis, meaning the analysis is performed at the end of the day. We use Hadoop for batch processing and Hive for managing data over HDFS storage. In stream processing, by contrast, we select only the data or events that are predefined as critical. Storm is used for stream processing of Big Data: whenever predefined critical data or events arrive, Storm processes them and generates an analysis report within a few seconds. This analysis covers the data generated in the last few seconds, so it is fast compared to batch processing.
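The split between the two branches can be illustrated with a small sketch (event names and shapes are invented for illustration; a real deployment would run the stream branch in Storm and the batch branch as a Hadoop/Hive job):

```python
# Incoming events for the day; some are predefined as critical.
events = [
    {"id": 1, "type": "login", "critical": False},
    {"id": 2, "type": "disk_failure", "critical": True},
    {"id": 3, "type": "page_view", "critical": False},
    {"id": 4, "type": "intrusion_alert", "critical": True},
]

def stream_branch(event):
    """Stream branch (Storm-like): react to each critical event as it arrives."""
    if event["critical"]:
        return f"ALERT within seconds: event {event['id']} ({event['type']})"
    return None

def batch_branch(all_events):
    """Batch branch (Hadoop/Hive-like): aggregate everything at end of day."""
    counts = {}
    for e in all_events:
        counts[e["type"]] = counts.get(e["type"], 0) + 1
    return counts

# The stream branch fires per event; the batch branch runs once over all events.
alerts = [msg for msg in (stream_branch(e) for e in events) if msg]
daily_report = batch_branch(events)
print(alerts)
print(daily_report)
```

The design trade-off is exactly the one described in the text: the stream branch sees only a predefined critical subset but responds in seconds, while the batch branch sees everything but only at the end of the day.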
Big Data Analytics: Big Data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. The analytical findings can lead to more effective marketing, new revenue opportunities, better customer service, improved operational efficiency, competitive advantages over rival organizations, and other business benefits. The primary goal of Big Data analytics is to help companies make more informed business decisions by enabling data scientists, predictive modelers, and other analytics professionals to analyze large volumes of transaction data, as well as other forms of data that may be untapped by conventional business intelligence (BI) programs. That could include Web server logs and Internet clickstream data, social media content and social network activity reports, text from customer emails and survey responses, mobile-phone call detail records, and machine data captured by sensors connected to the Internet of Things (Olson and Delen, 2008).
Abstract— Big Data has drawn huge attention from researchers in the information sciences and from decision makers in governments and enterprises, as there is a great deal of potential and highly useful value hidden in the huge volumes of data. Data is the new oil, but unlike oil, data can be refined repeatedly to create ever more value. A new scientific paradigm has therefore been born: data-intensive scientific discovery, also known as Big Data. The growing volume of real-time data requires new techniques and technologies to discover insight and value. In this paper we introduce the Big Data real-time analytics model as a new technique. We discuss and compare several Big Data technologies for real-time processing along with various challenges and issues in adopting Big Data. Real-time Big Data analysis based on a cloud computing approach is our future research direction.
Abstract: Big Data storage and real-time data analysis are major challenges for IT researchers. The recent massive increase in data has not been accompanied by adequate storage technology and data processing algorithms. Understanding what people think about an idea, a product, a service, or a policy is important for individuals, companies, and governments. The sentiment analysis process can be used to identify opinions expressed in text on certain subjects, and the accuracy of its results has a direct effect on decision making in both business and government. Our focus in this paper is first to identify the critical issues associated with real-time Big Data analysis and then to develop a new paradigm on the Hadoop ecosystem with real-time stream data processing to analyze Arabic tweet sentiment on Twitter. To perform real-time analytics, data collection is performed using Apache Flume, which moves and aggregates all tweets received online (near real time) to predefined locations, through channels and sinks, into the Hadoop distributed file system (HDFS). In addition, due to the serious challenges of Arabic text and speech and the high speed at which tweets arrive, we designed a complex sentiment analysis (SA) module that processes each incoming tweet in such a way that no tweet is lost without being analyzed. A sentiment analysis approach for Arabic text was also developed using multiple Hive User Defined Functions (UDFs). Finally, to guarantee varied data collection, we propose a Java MapReduce program for lexicon-based Arabic sentiment analysis, which supports n-gram search in the lexicon. Our approach was applied to determining opinions about the MERS virus in the Kingdom of Saudi Arabia via the Twitter Public Stream API, and the results are discussed.
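The lexicon-based, n-gram-aware MapReduce idea mentioned above can be sketched as follows. This is a toy illustration in Python, not the paper's Java program, and the tiny English lexicon stands in for the much larger Arabic lexicon: the map step scores one tweet by longest-match n-gram lookup and emits a (polarity, 1) pair, and the reduce step sums the pairs per polarity.

```python
# Hypothetical mini-lexicon with unigrams and a bigram. Scores: +1 / -1.
LEXICON = {("good",): 1, ("happy",): 1, ("bad",): -1, ("not", "good"): -1}
MAX_N = 2  # longest n-gram present in the lexicon

def map_tweet(tweet):
    """Map step: score one tweet via longest-match n-gram lexicon lookup,
    then emit a (polarity_label, 1) pair."""
    tokens = tweet.lower().split()
    score, i = 0, 0
    while i < len(tokens):
        for n in range(MAX_N, 0, -1):  # prefer the longest matching n-gram
            gram = tuple(tokens[i:i + n])
            if gram in LEXICON:
                score += LEXICON[gram]
                i += n
                break
        else:
            i += 1  # no n-gram starting here matched
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return (label, 1)

def reduce_counts(pairs):
    """Reduce step: sum the 1s per polarity label."""
    totals = {}
    for label, one in pairs:
        totals[label] = totals.get(label, 0) + one
    return totals

tweets = ["good news today", "this is not good", "just a report"]
totals = reduce_counts(map_tweet(t) for t in tweets)
print(totals)
```

The longest-match rule is what the n-gram search buys: "not good" is scored as one negative bigram rather than as the positive unigram "good".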
An existing approach based on fuzzy logic has been introduced for opinion mining on large-scale Twitter data (Bing and Chan, 2014); it was an attempt to mine the meaning of texts according to the sentiment of the attributes in the text. This method's performance was also tested in terms of processing-time improvement, where the MapReduce framework was used to increase the speed of scanning the texts before the multi-attribute mining. Besides fuzzy logic, a method based on the Hierarchical Dirichlet Process-Latent Dirichlet Allocation (HDP-LDA) was applied for unsupervised aspect identification in SA. This method can also automatically determine the number of aspects, distinguish factual words from opinionated words, and effectively extract the aspect-specific sentiment words. The fuzzy logic and LDA approaches successfully extracted aspects and meaning, as shown in their experimental results. However, they were tested on prepared datasets used mainly for research. In fact, real data generated on social media contains vast amounts of noise. This indicates the need for a capability to sense and identify useful messages from online media to be used as input for any strategic marketing manoeuvring.
Lumify is a free and open source tool for Big Data fusion/integration, analytics, and visualization. Its primary features include full-text search, 2D and 3D graph visualizations, automatic layouts, link analysis between graph entities, integration with mapping systems, geospatial analysis, multimedia analysis, and real-time collaboration through a set of projects or workspaces. Datawrapper is an open source platform for data visualization that helps its users generate simple, precise, and embeddable charts very quickly. Its major customers are newsrooms spread all over the world, including The Times, Fortune, Mother Jones, Bloomberg, and Twitter.
Data processing is a platform used for different types of analysis: it works on the input data, processing it and extracting proper knowledge from it. Twitter data is diverse across many fields, and tweets on multiple concepts can be utilized for various decisions. Here the problems associated with previous knowledge-extraction approaches and Twitter analysis are discussed. In much research work, processing and analysis are performed on static data sets. The existing base paper discussed static distribution and used statistical graph analysis for distance computation; its data-matching algorithm is also not very effective. This research work proposes an efficient framework for processing and analyzing massive amounts of complex stream data in real time. The framework covers real-time data fetching using the Storm framework, data processing through NLP, the PSWNSWAP algorithm for sentiment analysis (with computation time and computation cost as comparison parameters), and the St-QAP distance measure for distance optimization. The proposed St-QAP algorithm takes a brand name as input and finds propositions for it, producing efficient results on the travel-time and travel-cost parameters. The data processing technique produces efficient parameter computation with a fast, effective real-time process over a ZooKeeper server.
Abstract: Various fields such as text mining, linguistics, decision making, and natural language processing together form the basis for opinion mining, or sentiment analysis. People share their feelings, observations, and thoughts on social media, which has emerged as a powerful tool and a rapidly growing, enormous repository of real-time discussions and thoughts. In this paper, we aim to decipher the currently popular opinions and emotions from various sources, thereby contributing to the sentiment analysis domain. Text from social media, blogs, and product reviews is classified according to the sentiment it projects. We re-examine the traditional processes of sentiment extraction to accommodate the increasing complexity and number of data sources and relevant topics, while revisiting the meaning of sentiment. Working across and within numerous streams of social media, the expression of sentiment and the classification of polarity are re-examined, thereby redefining and enhancing the realm of sentiment. Numerous social media streams are analyzed to build datasets that are topical for each stream and are later polarized according to their sentiment expression. In conclusion, the motive is to define sentiment and develop tools for its analysis in real time as humans exchange ideas.
DOI: 10.4236/jdaip.2018.62004 — Journal of Data Analysis and Information Processing. Messages related to a chosen topic of interest are processed such that topic and sentiment are jointly inferred. There are many works on topic-based sentiment analysis where the models are tested in a batch setting, as listed in the reference section. While there are many works on topic-based models for batch processing systems, there are few works in the literature on topic-based models for real-time sentiment analysis on streaming data. Real-time topic sentiment analysis is imperative to meet the strict time and space constraints of efficiently processing streaming data. Wang et al. developed a system for real-time Twitter sentiment analysis of the 2012 presidential election cycle using the Twitter firehose with a statistical sentiment model and a Naive Bayes classifier on unigram features. A full suite of analytics was developed for monitoring shifts in sentiment, utilizing expert-curated rules and keywords in order to gain an accurate picture of the online political landscape in real time. However, these works in the existing literature lacked complexity in their sentiment analysis processes: their sentiment analysis models are based on simple aggregations for statistical summary, with only primitive language preprocessing techniques.
Real-time scoring — in real-time systems, scoring is triggered by events at the decision layer (by consumers at a website or by an operational arrangement through an API), and the actual communications are brokered by the integration layer. In the scoring phase, some real-time systems will use the same data model that is used in the data layer, but they will not use the same data. At this phase of the process, the deployed scoring rules are "divorced" from the data in the data layer or data mart. Note as well that at this phase, the limitations of Hadoop become apparent: Hadoop today is not well adapted for real-time scoring, although it can be used for "near real-time" applications such as scoring large tables or pre-computing scores. Newer technologies such as Cloudera's Impala are designed to advance Hadoop's real-time capabilities.
There are many models of sentiment analysis that can be adopted on various platforms. Broadly, there are two main classification methods: lexicon-based and machine learning. Many software engineering practices can be used to examine and analyze the machine learning techniques used in sentiment analysis; at the same time, it is important to follow sound programming practice while developing sustainable software. Machine learning techniques typically depend on supervised classification approaches, where the sentiment is classified under two heads (i.e., positive or negative). This methodology requires labelled data to train the classifiers. Three basic algorithms are used in the machine learning method: Naive Bayes classification, maximum entropy classification, and support vector machines. In contrast, lexicon-based techniques use a predefined list of words in which each word is associated with a specific sentiment. Lexicon-based techniques vary according to the data set for which they were created. They also involve understanding the connection between the sentiment expressed and the document in question by calculating the semantics of the words in the data set.
Given the size of text documents, feature selection is an important step in text mining due to high dimensionality and data sparsity. A data collection contains many terms, but only a small number of these normally occur in any individual document. Several sophisticated local and global methods exist for reducing document dimensionality. Local methods remove unimportant or non-informative words, while global methods apply a global dimension reduction to transform all documents identically. Popular local methods include: stemming, which reduces words to their stem; stop word removal, which removes non-informative words; and synonym lists, which identify synonyms and reduce them to a common word. Global methods include latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and nonnegative matrix factorization (NMF), which characterize documents in terms of concepts: sets of terms that represent a more complex idea discussed in a document. Finally, several techniques are available to derive information from text, such as classification, clustering, and summarization [Kha10].
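The three local methods can be chained into a small pipeline. The sketch below is illustrative only: the stop-word set and synonym list are toy examples, and the suffix stripper is a crude stand-in for a real stemmer such as Porter's.

```python
# Illustrative local dimension-reduction pipeline: stop-word removal,
# synonym folding, and crude suffix-stripping stemming.
STOP_WORDS = {"the", "a", "of", "is", "and"}
SYNONYMS = {"automobile": "car", "vehicle": "car"}

def crude_stem(word):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def reduce_terms(document):
    tokens = document.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    tokens = [SYNONYMS.get(t, t) for t in tokens]        # synonym folding
    return [crude_stem(t) for t in tokens]               # stemming

print(reduce_terms("The automobile is parking and the vehicle parked"))
```

Eight raw tokens collapse to four occurrences of just two terms, which is exactly the dimensionality reduction these local methods aim at; global methods such as LSA would then operate on the resulting term-document matrix.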
In the supervised learning approach of machine learning, models are trained using descriptive examples: inputs for which the desired output is already known. It is mainly used in applications where historical data is used to predict forthcoming data. In the unsupervised learning approach, there is no historical output data; the objective is to explore the data and extract useful information from within it.
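The contrast can be made concrete with two tiny stdlib-only sketches on made-up 1-D data: a supervised 1-nearest-neighbour classifier, where historical labels guide prediction, versus an unsupervised 2-means clustering, where structure is discovered from the data alone.

```python
def knn_predict(train, x):
    """Supervised: predict the label of the closest labelled example."""
    value, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label

def two_means(points, iters=10):
    """Unsupervised: split unlabelled 1-D points into two clusters
    (assumes both clusters stay non-empty for this toy data)."""
    c1, c2 = min(points), max(points)  # initialise centroids at the extremes
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

# Supervised: the desired outputs ("low"/"high") are already known.
history = [(1.0, "low"), (1.2, "low"), (9.0, "high"), (9.5, "high")]
print(knn_predict(history, 1.1))

# Unsupervised: same numbers, no labels; groups emerge from the data.
print(two_means([1.0, 1.2, 9.0, 9.5]))
```

The inputs are identical in both cases; what changes is whether historical outputs exist to learn from.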
The Naïve Bayes classifier is a supervised learning model that uses a statistical method for classification. Since it is a probabilistic model, it captures the uncertainties of the model by calculating probabilities. The word "naïve" refers to the simplifying assumption the algorithm makes: it classifies instances as if their features were independent of one another, whatever the data set. The Naïve Bayes algorithm works on Bayes' theorem of conditional probability, where the occurrence of one event is conditional on another event: it gives the probability of an event based on prior information about events that might be related to the current event. It is a useful learning algorithm for combining observed data with past knowledge, where such knowledge exists. Because this algorithm works with independent features, runs very fast and efficiently on large data sets, tolerates noise, and considers all possible classes, it is used for Twitter sentiment analysis to classify tweets among the possible classes, namely positive, negative, and neutral. Its main use is in text classification and in problems with multiple classes.
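A from-scratch sketch of the multinomial variant makes the mechanics visible. The training tweets below are invented, and add-one (Laplace) smoothing is assumed so that unseen words do not zero out a class:

```python
import math
from collections import Counter, defaultdict

# Toy labelled tweets (made up) covering the three polarity classes.
train = [
    ("i love this phone", "positive"),
    ("great battery and screen", "positive"),
    ("i hate the camera", "negative"),
    ("terrible battery life", "negative"),
    ("it arrived today", "neutral"),
]

class_docs = defaultdict(list)
for text, label in train:
    class_docs[label].extend(text.split())

vocab = {w for words in class_docs.values() for w in words}
priors = {c: sum(1 for _, l in train if l == c) / len(train) for c in class_docs}
counts = {c: Counter(words) for c, words in class_docs.items()}

def predict(text):
    """Pick the class maximising log P(class) + sum log P(word | class),
    with add-one smoothing over the vocabulary."""
    best, best_score = None, -math.inf
    for c in class_docs:
        total = sum(counts[c].values())
        score = math.log(priors[c])
        for w in text.split():
            score += math.log((counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

print(predict("love this battery"))
print(predict("terrible camera"))
```

Working in log space avoids numeric underflow from multiplying many small probabilities, and the per-class word counts are exactly the "past knowledge" the paragraph describes.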
The term Big Data has been in use since the 1990s. In 2012 Gartner updated its previous definition of Big Data as follows: "Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision-making, and process automation". Big Data refers to growing digital data that are difficult to manage and analyze using traditional software tools and technologies. Big Data often has a large number of samples, a large number of class labels, and very high dimensionality (many attributes). The size regarded as Big Data moves continually; in 2012 it ranged from a few dozen terabytes to many petabytes of data. Four attributes define Big Data: volume, variety, velocity, and veracity. Obviously, data volume is the primary attribute of Big Data; as the volume of Big Data increases, so do its complexity and the underlying relationships within the data. Raw data in a Big Data system is unsupervised and diverse, although it can contain a small quantity of supervised data. Many social media companies, including Facebook, Twitter, StockTwits, and LinkedIn, hold large amounts of data. As data become bigger, the Deep Learning approach becomes more important for Big Data analysis.