International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 7, Issue 11, November 2017)
Big Data Analytics to Predict Customer Behavior: A Review
Poonam G. Sawant¹, Dr. B. L. Desai²
¹Research Scholar, Savitribai Phule Pune University, Maharashtra, India
²Research Guide, Manager, L&D Big Data Academy, Capgemini, Pune, Maharashtra, India
Abstract - To date, many sophisticated predictive analytics techniques have been implemented to predict customer behavior, which cannot be observed directly, but they have not fully succeeded. Data that arrives in great variety and at high speed, referred to as "big data", exceeds the capacity of traditional data management tools and technologies. Big Data Analytics is a powerful way of extracting meaningful and valuable information from data that would otherwise be difficult to analyze, and of understanding the behavior behind it. In this paper, we compare traditional methods with Big Data technologies for predicting customer behavior. The literature review reveals that Big Data Analytics has been an emerging area of research and development in the current decade. The review draws on a variety of research papers, selected for their originality, in order to fulfill the purpose of our research.
Keywords - Customer behavior analysis, Traditional vs. Big Data, Data mining techniques and algorithms, Big Data analytics.
I. INTRODUCTION
Analysis of customer behavior is a powerful technique for maintaining relationships with customers. It is based on customer buying behavior. Predicting customer behavior is an essential but challenging task for many business organizations. Many systems have been implemented to analyze customer behavior, but further advancement is required to discover high-potential markets, because data is growing enormously into what is termed Big Data. Big Data is the term for a collection of data sets so large and complex that they become difficult to process using traditional data management tools or data analytics. Big Data brings together large amounts of structured, semi-structured and unstructured data with a mixture of data types and huge size. Storage, search, curation, distribution, transfer, analysis and visualization are the major challenges in Big Data processing, and they cannot be met using traditional methods. Big Data analytics is one of the fastest-rising technology trends and has the capability to process Big Data and explore the insights it contains. It can enable organizations to connect with customers through multiple channels by harnessing the massive volumes of new data available today. It provides organizations with greater opportunities by exposing customers' hidden behavioral patterns, and it helps to bridge the gap between what customers want to do and what they actually do. This information is useful for making business decisions and improving services, increasing operational efficiency and creating new products or markets. [1][2]
II. FORECASTING CUSTOMER BEHAVIOR
As customers move from traditional to personalized solutions, retaining them and attracting new customers become bigger challenges in the business world. Traditional techniques and commoditized products may bring in new customers, but research has shown that a personalized solution is key to retaining loyal and profitable customers. Furthermore, high levels of customer satisfaction result in a powerful competitive advantage. [3] Understanding each customer through structured data as well as unstructured information from social media and customer activities can help in understanding people's individual needs and the context of what is driving them. [4]
In the research community, a lot of work has been done to characterize customer behavior. A customer, also called a consumer, is any individual who purchases services for personal use. Customer behavior is the study of individuals or groups and the methods they use to buy products or services. [5] It deals with the various stages a customer goes through when purchasing a product or service, such as need recognition, information gathering, evaluation of alternatives and post-purchase analysis.
Many factors influence a customer's buying choice, ranging from psychological, economic and cultural trends to the social and societal environment. Identifying these influencing factors makes conditions more favorable and helps achieve the goal of customer satisfaction. To deliver successful customer-oriented services, organizations should make an effort to know their customers by understanding what customers demand and what the organization actually offers. Knowing the requirements and ways of thinking of their targeted customers, organizations can provide a more personalized experience to retain them and increase sales to maximize profit. By understanding customer behavior, many organizations have an opportunity to develop an efficient marketing strategy and effective advertising campaigns to attract and acquire customers. [6]
III. TRADITIONAL DATA ANALYTICS
Traditional data analytics systems use suitable statistical and machine learning methods to analyze large volumes of structured data and discover knowledge. The most popular systems in this category are RDBMS and data mining.
A. Relational Database Management Systems
In 1970, Database Management Systems (DBMS) were built using two main approaches: first the hierarchical data model, used to store the enormous volume of data generated by the Apollo space program, and then the network data model, intended to create a standard database and resolve some of the difficulties of the hierarchical model, such as its inability to represent complex relationships. Both models, however, had disadvantages:
1. Complex programs had to be written to answer even simple queries.
2. Data independence was minimal.
In the early 1980s, the Relational Database Management System (RDBMS) was developed for commercial use, but it was unable to handle increasingly complex data. Two new data models later emerged, the Object-Relational Database Management System (ORDBMS) and the Object-Oriented Database Management System (OODBMS), which implement the relational and object data models respectively and represent the third generation of Database Management Systems. [7]
As the volume of data keeps growing, the types of data generated by applications become richer than before. As a result, traditional relational databases are challenged to capture, store, search, transfer, analyze and visualize a variety of bulky data. They focus only on resolving the complexity of relationships within small amounts of schema-bound data. [8]
B. Data Mining Techniques and Algorithms
Traditional SQL-based database software, and customer behavior analysis conducted with time series analysis and similar techniques, are not sufficient for discovering deeper information. Data mining technology is therefore increasingly being used for customer behavior analysis. Data mining is often defined as finding hidden information in a database. Data mining technologies and techniques for recognizing and tracking patterns within data help businesses know their customers, so that they can reinforce and redefine customer relationships. [9] In recent years many researchers have used data mining techniques to analyze customer behavior. Some of this work is listed here.
H. Victor, O. Abimbola, O. Mercy, O. Esther and I. Eloho, in their publication "Customer behavior analytics and data mining", compared OLAP with data mining techniques. [11]
According to their study, OLAP enables users to easily extract and view data from different points of view and to produce summaries, but data mining techniques do better because they give insights and details about the behavior of individual customers. Data mining techniques analyze relationships and patterns in transaction data based on open-ended user queries. The authors analyzed customer behavior using three data mining techniques, namely Association Rule Mining, the Rule Induction Technique and the Apriori Algorithm, and report that they perform well.
Chris Rygielski, Jyun-Cheng Wang and David C. Yen, in their research paper entitled "Data mining techniques for customer relationship management", used data mining techniques such as Chi-square Automatic Interaction Detection (CHAID), classification, regression and Neural Networks to identify valuable customers. [12]
Arne Mauser, Ilja Bezrukov, Thomas Deselaers and Daniel Keysers (Lehrstuhl für Informatik), in their research paper titled "Predicting Customer Behavior using Naive Bayes and Maximum Entropy", tested combinations of data mining algorithms on data provided by a German mail-order company, which was split into a training set and a test set. They tested four algorithms in two combinations: the first combined Maximum Entropy with Naive Bayes classifiers, and the second combined Logistic Regression with Neural Networks and Maximum Entropy. These combinations obtained 1st and 3rd rank in the Data Mining Cup 2004 respectively. [13]
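To make the Naive Bayes component more concrete, the following is a minimal sketch of a categorical Naive Bayes classifier with Laplace smoothing. It is not the implementation used in [13]; the function names, the two features and the toy records are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            seen = value_counts[(i, label)][value]
            # Add-one smoothing keeps unseen values from zeroing the product
            score += math.log((seen + 1) / (count + len(feature_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (ordered before?, received catalogue?) -> outcome
rows = [("yes", "yes"), ("yes", "no"), ("no", "yes"),
        ("no", "no"), ("yes", "yes"), ("no", "no")]
labels = ["buy", "buy", "no_buy", "no_buy", "buy", "no_buy"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("yes", "no")))
```

The classifier treats features as conditionally independent given the class, which is what makes it cheap to train from simple counts; a real study would evaluate it on a held-out test set, as the authors did.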
P. Isakki and S. P. Rajagopalan, in their research paper titled "Analysis of Customer Behavior using Clustering and Association Rules", applied clustering and association techniques to predict customer behavior. They grouped customers with similar purchasing behavior using clustering techniques. Then, for each cluster, association rules were used to identify the products that are frequently bought together by those customers. They used the Apriori algorithm to find the best association rules from a small data set. [14]
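The association step of such a pipeline can be sketched as follows. This is a simplified Apriori-style search written for illustration, not the code from [14]; it assumes the clustering step has already produced the baskets of one customer cluster, and the basket contents, thresholds and function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search; returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: union pairs of frequent sets into candidates one item larger
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must be frequent
        current = [c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, size - 1))]
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for sufficiently strong rules."""
    for itemset, support in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                antecedent = frozenset(antecedent)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence


# Hypothetical baskets belonging to one customer cluster
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]

frequent = frequent_itemsets(baskets, min_support=2)
for antecedent, consequent, confidence in rules(frequent, min_confidence=0.9):
    print(set(antecedent), "->", set(consequent), confidence)
```

Running the same two functions once per cluster gives cluster-specific rules, which is the point of clustering first: products that co-occur only within one customer group are not diluted by the rest of the data.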
According to IBM's report "Predictive modeling techniques", around 90 percent of the data available today contains insights useful for powerful business decisions, and many predictive analytics techniques, including predictive models, neural networks (NNs), clustering, support vector machines (SVMs) and association rules, are available to convert this data into real value. All these techniques identify hidden patterns in large sets of past data and produce a predictive model. Once the predictive model is validated, it is applied to the current situation to predict the future. [10] From the survey, the most widely used approaches and techniques for customer behavior prediction are as follows.
1) Regression Algorithms: estimate the relationships among dependent and independent variables. They predict continuous variables based on other variables in the data set.
2) Association Algorithms: if/then statements that help to uncover hidden relationships between seemingly unconnected data in a relational database or other information repository.
3) Classification Algorithms: assign the items in a collection to target categories or classes.
4) Clustering Algorithms: partition a given data set into homogeneous groups based on given features, such that similar objects fall into the same group whereas dissimilar objects are kept in different groups.
5) Artificial Neural Network Algorithms: computational structures of interconnected processing elements, or nodes, arranged in a multilayered hierarchical architecture.
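As an illustration of the first category in the list above, the sketch below fits a one-variable least-squares regression line in plain Python. The variable names and the toy figures (store visits versus monthly spend) are hypothetical and do not come from any of the surveyed papers.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: visits per month (independent) and spend (dependent)
visits = [1, 2, 3, 4]
spend = [12, 14, 16, 18]

slope, intercept = fit_line(visits, spend)
predicted = slope * 5 + intercept  # predicted spend for a customer with 5 visits
print(slope, intercept, predicted)
```

The fitted line predicts a continuous value for an unseen customer, which is what distinguishes regression from classification, where the output would be a discrete category.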
It has been observed that researchers have applied these techniques and algorithms to both small and large amounts of structured data to analyze customer behavior, but have not tried them on large sets of varied data. Big data analytics is a technique developed to handle this special kind of data, so many traditional data analysis methods may not be usable for big data.
IV. BIG DATA ANALYTICS
Predictive analytics based on data mining techniques is undoubtedly efficient, but only for analyzing small amounts of structured data. As a result of digitization, customers share reviews and leave comments on various social networking sites. This generates an enormous amount of data in great variety and volume. Of this data, 10% arrives in a structured format and can be analyzed very effectively using data mining techniques. The remaining 90% arrives in an unstructured format but offers valuable insights for business and for better living. Because of its volume, velocity, variety and veracity, it is referred to as big data and cannot be handled by traditional tools.
This data is not only priceless for the banking industry but also helps in making effective business decisions by predicting customers' future behavior. Such value can be extracted using big data analytics, which is the application of advanced analytics techniques to big data: the process of analyzing big data to uncover hidden patterns, unknown correlations, current market trends, customer preferences and other useful business information. [16] [17]
A. Characteristics of Big Data
Big Data is a varied, massive and complex body of data that becomes very tedious to capture, store, process, retrieve and analyze with on-hand database management tools or traditional data processing techniques. According to Gartner, Big Data is defined in terms of Vs, as shown in the following figure.
Fig. 1: Characteristics of Big Data
(Source: http://www.dataintensity.com/characteristics-of-big-data-part-one/)
Volume is the quantity of data, ranging from terabytes (TB) to zettabytes (ZB). The following table lists the units of data in increasing order.
Sr. No.  Unit             Size
1        Bit (b)          1 or 0
2        Byte (B)         8 bits
3        Kilobyte (KB)    1000 or 2^10 bytes
4        Megabyte (MB)    1000 KB or 2^20 bytes
5        Gigabyte (GB)    1000 MB or 2^30 bytes
6        Terabyte (TB)    1000 GB or 2^40 bytes
7        Petabyte (PB)    1000 TB or 2^50 bytes
8        Exabyte (EB)     1000 PB or 2^60 bytes
9        Zettabyte (ZB)   1000 EB or 2^70 bytes
10       Yottabyte (YB)   1000 ZB or 2^80 bytes
Fig. 2: Units of Data
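Each unit in the table above has a decimal reading (a power of 1000) and a binary reading (a power of 2^10), and the two drift apart as the units grow. The short sketch below computes both and the relative gap between them; the function and variable names are illustrative only.

```python
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_sizes():
    """Return {unit: (decimal_bytes, binary_bytes)} for KB through YB."""
    return {unit: (1000 ** k, 2 ** (10 * k))
            for k, unit in enumerate(UNITS, start=1)}


for unit, (decimal_bytes, binary_bytes) in unit_sizes().items():
    # Relative amount by which the binary size exceeds the decimal size
    gap = (binary_bytes - decimal_bytes) / decimal_bytes * 100
    print(f"{unit}: decimal={decimal_bytes:.3e} "
          f"binary={binary_bytes:.3e} gap={gap:.1f}%")
```

At the kilobyte level the two readings differ by only a few percent, but the gap compounds with every step up the table, so it is worth stating which convention is meant when volumes at the terabyte scale and beyond are reported.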
Velocity refers to the speed at which data arrives, or how often it is created, and variety refers to the range of formats in which data is available: structured data from enterprise systems, semi-structured data such as XML files, and unstructured data such as text files, documents, PDFs, email bodies, images, video and audio. More recently two further Vs have been added: veracity, which describes the uncertainty of the available data, and value, which denotes the result extracted from it. [18] [19]
B. Big Data Technologies and Tools
Because data now floods in with great variety and at high speed, there is a need for faster and more efficient ways of analyzing it, since piles of unprocessed data are not enough for making decisions at the right time. Such data sets cannot easily be managed by traditional data processing applications, and storing, curating and analyzing them is a very hard problem today. New analytics and tools therefore need to be designed and implemented. Apache Hadoop is one of the efficient technologies for handling this kind of data.
It is an open source framework that allows the distributed processing of large data sets across clusters of computers using simple programming models. The framework is designed so that it can scale from a single server to thousands of machines, each of which offers local computation and storage. Similarly, NoSQL databases, which typically store data as key-value pairs and are non-relational, distributed, horizontally scalable and schema-free, are used to handle big data. Along with these technologies, many traditional data analysis techniques such as parallel computing, hashing, Bloom filters, indexes and trie trees can be applied to big data. [21]
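The programming model most commonly associated with Hadoop is MapReduce. The sketch below imitates its map, shuffle and reduce phases in a single Python process to total purchases per customer; it does not use any Hadoop API, and the record layout, partitioning and function names are assumptions made purely for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    customer_id, amount = record
    yield customer_id, amount


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(partitions):
    """Run map over every partition, then shuffle, then reduce per key."""
    mapped = (pair
              for partition in partitions
              for record in partition
              for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical purchase records, split as if stored on two machines
partitions = [
    [("c1", 20), ("c2", 5)],
    [("c1", 30), ("c3", 7)],
]

totals = run_job(partitions)
print(totals)
```

Because each map call sees only one record and each reduce call sees only one key, the work can be spread across machines that hold different partitions, which is what lets the same simple program scale from a single server to a large cluster.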
V. FUTURE WORK
To date, several researchers have used relational database management systems, which handle only schema-bound data. Many have implemented data mining techniques to predict customer behavior. The most widely used data mining approaches and techniques are Regression Algorithms, Association Algorithms, Classification Algorithms, Clustering Algorithms and Artificial Neural Network Algorithms. However, it has been observed that these techniques are used to develop predictive analytics for small amounts of structured data, and for big data they have limited scope. Big Data Analytics is an effective solution to this problem. The study shows that a lot of research work is still needed to develop techniques for predicting customer behavior. Our future work is therefore to develop a predictive model of customer behavior, to support future action plans, using big data technologies.
VI. CONCLUSION
With the growing number of data sources, a great variety of data is available to the business world today. It can provide valuable insights about customers, supporting the right decisions at the right time and a high-quality personalized customer experience. This becomes possible by understanding customer behavior, but traditional analytical tools are not able to handle and process this data. Big data analytics is the way to do this. Researchers have suggested various big data technologies for developing a big data analytics strategy.
REFERENCES
[1] Mining Big Data: Current Status, and Forecast to the Future, by Wei Fan (Huawei Noah's Ark Lab) and Albert Bifet (Yahoo! Research Barcelona), SIGKDD Explorations, Volume 14, Issue 2, 2013.
[2] Hadoop Analytics for Financial Services. http://www.clouderaworkinnovations.com/hadoop-analytics-for-financial-services.html
[3] It's Time For Personalization in Financial Services, by Matthew Lifshotz, Director of Global Business Development. https://thefinancialbrand.com/37391/bank-personalization-product-development/
[4] "Personalization: The secret ingredient to customer engagement for banks", May 6, 2016. http://www.the-future-of-commerce.com/2016/05/06/customer-engagement-banks/
[5] Consumer Buying Behaviour – A Literature Review, by Abdul Brosekhan and Dr. C. Muthu Velayutham, IOSR Journal of Business and Management (IOSR-JBM), e-ISSN: 2278-487X, p-ISSN: 2319-7668, pp. 08-16. www.iosrjournals.org
[6] Factors Influencing Consumer Behavior, by Pinki Rani, University Kurkshetr, India, Excellent Publishers, Int. J. Curr. Res. Aca. Rev. 2014; 2(9): 52-61.
[7] Performing Customer Behaviour Analysis using Big Data Analytics, by Anindita A Khade, 7th International Conference on Communication, Computing and Virtualization 2016, ScienceDirect, Procedia Computer Science 79 (2016) 986-992.
[8] Data Modeling for Big Data, by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies.
[9] Analyzing Customer Behavior Using Data Mining Techniques: Optimizing Relationships With Customer, by Aditya Kumar Gupta and Chakit Gupta.
[10] Predictive modeling techniques, by IBM.
[11] Customer behaviour analytics and data mining, by Haastrup Adeleye Victor, Oladosu Olakunle Abimbola, Okikiola Folasade Mercy, Oladiboye Olasunkanmi Esther and Ishola Patience Eloho.
[12] Data mining techniques for customer relationship management, by Chris Rygielski, Jyun-Cheng Wang and David C. Yen.
[13] Predicting Customer Behavior using Naive Bayes and Maximum Entropy – Winning the Data-Mining-Cup 2004, by Arne Mauser, Ilja Bezrukov, Thomas Deselaers and Daniel Keysers, Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen University, D-52056 Aachen, Germany.
[14] Analysis of Customer Behavior using Clustering and Association Rules, by P. Isakki alias Devi, Research Scholar, Vels University, Chennai – 117, Tamilnadu, India, and S. P. Rajagopalan, Professor of Computer Science & Engineering, GKM College of Engineering & Technology, Chennai-63, Tamilnadu, India.
[15] Customer Behavior Prediction using Artificial Neural Network.
[16] Zeroprotraining.com
[17] Big data has big implications for knowledge management, by Judith Lamont, April 2012 [Volume 21, Issue 4], KM World.
[18] Mining Big Data: Current Status, and Forecast to the Future, by Wei Fan (Huawei Noah's Ark Lab, Hong Kong) and Albert Bifet (Yahoo! Research Barcelona, Spain), 2013.
[19] Big Data Analytics: A Literature Review Paper, by Nada Elgendy and Ahmed Elragal, Department of Business Informatics & Operations, German University in Cairo (GUC), Cairo, Egypt. P. Perner (Ed.): ICDM 2014, LNAI 8557, pp. 214–227, 2014. © Springer International Publishing Switzerland 2014.
[20] Hadoop: The Definitive Guide, by Tom White, O'REILLY.
[21] Intelligent Techniques for Data Science, by Rajendra Akerkar, Priti