• No results found

Deep Explore in Big Data Analytics for Business Intelligence

N/A
N/A
Protected

Academic year: 2021

Share "Deep Explore in Big Data Analytics for Business Intelligence"

Copied!
7
0
0

Loading.... (view fulltext now)

Full text

(1)

Deep Explore in Big Data Analytics for

Business Intelligence

Ms.Divya.P *

Assistant Professor CKCET Cuddalore, INDIA E-Mail:divi.diviner@gmail.com

Mr.Murugan.R

Assistant Professor CKCET Cuddalore, INDIA E-Mail: csehod@ckcet.com

Ms.Suriya.R

Assistant Professor CKCET Cuddalore, INDIA E-Mail:suriyacse90@gmail.com Abstract: Big data analytics refers to the process of collecting, organizing and analysing large sets of data to discover patterns and other useful information. Big data analytics will help organizations to better understand the information contained within the data and will also help identify the data that is most important to the business and future business decisions. It has been embraced as a disruptive technology that will reshape business intelligence, which is a domain that relies on data analytics to gain business insights for better decision-making.The landscape of big data analytics is investigated through the lens of marketing mix framework which identify the data sources, methods and applications related to marketing perspectives namely people, product, and promotion that lay foundation for marketing intelligence. Along with this analytical methods are been discussed according to the data being collected for efficient data processing.

Keywords: Big data, big data analytics, Business intelligence, marketing mix, marketing intelligence.

1.0 INTRODUCTION

he Recent technological revolutions such as social media enable us to generate data much faster than ever before. The notion of big data and its application in business intelligence have attracted enormous attention in recent years because of its great potential in generating business impacts [12]. “Big Data” is defined as “the amount of data just beyond technology's capability to store, manage and process efficiently” .Big data can be characterized along three important dimensions, namely volume, velocity, and variety.

In marketing intelligence, which emphasizes the marketing-related aspects of business intelligence, data relevant to a company's markets is collected and processed into insights that support decision-making .Marketing intelligence has traditionally relied on market surveys to understand consumer behaviour and improve product design. For example, companies use consumer satisfaction surveys to study customer attitudes. With big data analytic technologies, key factors for strategic marketing decisions, such as customer opinions toward a product can be monitored using social media data. However, while accessibility to big data creates unprecedented opportunities for marketing intelligence, it also brings challenges to practitioners and researchers. Big data analytics is mainly concerned with three types of challenges: storage, management, and processing [21]. For

typical marketing intelligence tasks such as customer opinion mining, companies nowadays have many different ways (social media data, transactional data, survey data, sensor network data, etc.) to collect data from a variety of information sources. Based on the characteristics of collected data, different methods can be applied to discover marketing intelligence. Analysis models developed based on a single data source may only provide limited insights, leading to potentially biased business decisions. On the other hand, integrating heterogeneous information from different sources provides a holistic view of the domain and generates more accurate marketing intelligence. Unfortunately, integrating big data from multiple sources to generate marketing intelligence is not a trivial task. This prompts exploration of new methods, applications, and frameworks for effective big data management in the context of marketing intelligence. Identifying and classifying analytical processing requirements for the entire set of data elements at play is a critical requirement in the design of the next-generation data warehouse platform.

We investigate different perspectives of marketing intelligence and propose a framework to manage big data in this context. We first identify popular data sources for marketing intelligence perspectives. Then, we summarize the methods that are suitable for different data sources and marketing perspectives. We give examples of applications in different

T

(2)

perspectives. The proposed framework provides guidelines for companies to select appropriate data sources and methods for managing vital marketing intelligence to meet their strategic goals. Thus in the following contents of the paper discuss with the perspectives of marketing intelligence and methods that are suitable to different data sources.

2.0 LITERATURE REVIEW

Big data can be analyzed with the software tools commonly used as part of advanced analytics disciplines such as predictive analytics, data mining, text analytics and statistical analysis. Mainstream BI software and data visualization tools can also play a role in the analysis process. But the semi-structured and unstructured data may not fit well in traditional data warehouses based on relational databases. Furthermore, data warehouses may not be able to handle the processing demands posed by sets of big data that need to be updated frequently or even continually for example, real-time data on the performance of mobile applications. As a result, many organizations looking to collect, process and analyze big data have turned to a newer class of technologies that includes Hadoop and related tools such as YARN, Map Reduce, Spark, Hive and Pig as well as NoSQL databases[11]. Those technologies form the core of an open source software framework that supports the processing of large and diverse data sets across clustered systems.

In some cases, Hadoop clusters and NoSQL systems are being used as landing pads and staging areas for data before it gets loaded into a data warehouse for analysis, often in a summarized form that is more conducive to relational structures [17]. Increasingly though, big data vendors are pushing the concept of a Hadoop data lake that serves as the central repository for an organization's incoming streams of raw data. In such architectures, subsets of the data can then be filtered for analysis in data warehouses and analytical databases, or it can be analyzed directly in Hadoop using batch query tools, stream processing software and SQL on Hadoop technologies that run interactive, ad hoc queries written in SQL.

Potential pitfalls that can trip up organizations on big data analytics initiatives include a lack of internal analytics skills and the high cost of hiring experienced analytics professionals. The amount of information that's typically involved, and its variety, can also cause data management headaches, including data

quality and consistency issues. In addition, integrating Hadoop systems and data warehouses can be a challenge, although various vendors now offer software connectors between Hadoop and relational databases, as well as other data integration tools with big data capabilities.

3.0 ANALYTICS METHODS

Big data and its analysis can consist of multiple very different methods of analysis. Some of the methods have been utilized already for decades in business, however, through big data technology they have become more accurate and analysis is able to consider a higher variety of parameters. The different analysis methods basically focus on different information that can be found within the data. The same data may be used as basis for all different analysis methods; however the results of the analysis may show a totally different picture, even though the same data was used. Descriptive Analytics: Descriptive analytics is the use of historic data to describe mostly past events and de-scribe a given situation. In a business case, this may be the analysis of past marketing events, campaigns or demographics of a population. The major purpose is to understand and built a clearer overview of past events Diagnostic Analytics: Diagnostic analytics help to discover correlations, mostly by the use of visual tools. These correlations may trigger further investigation. As mentioned before, the results are mostly presented visually and do not aim to give a too accurate picture of a given situation. The major purpose is to unveil hidden correlations that may affect e.g. a market. This may be interesting in marketing, where diagnostic analytics can help to identify micro markets or certain patterns in customer behavior. Predictive Analytics/ Data Analytics: Predictive analytics enjoys most of the attention in the field of big data analytics. The curiosity possibly derives from the fact that predictive analytics has the possibility to predict a future scenario with a very high accuracy. Predictive analytics are often used to predict sales figures and forecasts of different kind Prescriptive Analytics: Prescriptive analysis is automated analysis of data that creates predictions and suggest decision options. This sort of analytics can mostly be found in industrial operations or inventory management. Prescriptive analytics follows clear schemes and offers a few decision options. This sort of analysis has been used for a long time already; however, with increasing amounts of data the offered options by the analysis are more comprehensive and able to consider a greater amount of parameters, thus making the information more accurate.

(3)

4.0 BIG DATA MANAGEMENT FRAMEWORK

The marketing mix framework is a well-known framework that identifies the principal components of marketing decisions, and it has dominated marketing thought, research, and practice. Initially it consist of 12 elements later it is regrouped to five elements namely product, people, price, promotion, place. In this paper, we propose a marketing mix framework to manage big data for marketing intelligence. This model classifies the research in marketing intelligence into five perspectives according to the marketing mix framework. Further, we identify common data, methods, and applications in each perspective and highlight the dominating big data characteristic with respect to each perspective. This framework provides guidelines for marketing decision-making based on big data analytics. Fig. 1 is an overview of the proposed big data management framework for marketing intelligence. First, data from various sources are retrieved and utilized to generate vital marketing intelligence. Second, a variety of analytics methods are applied to convert raw big data to actionable marketing knowledge (intelligence). Finally, both data and methods are combined to support marketing applications with respect to each perspective of the marketing mix model.

Fig.1 Big data management framework

4.1 Application

4.1.1. Customer segmentation and customer profiling

For effective marketing, it is essential to identify a specific group of customers who share similar preferences and respond to a specific marketing signal. Customer segmentation applications can help identify different communities (segments) of customers who may share similar interests. Clustering customer

groups with respect to lifecycle characteristics. Usually, various clustering and classification techniques are applied to customer segmentation and user profiling. However, customer segmentation is becoming increasingly challenging under a big data environment. The volume of call data is huge (e.g., the communication time between each pair of customers on each day), and a variety of data should be taken into account In fact, for the most fine-grained targeted marketing ,we are not talking about identifying groups of similar customers, but the “profiling” of each individual customer such that the most suitable products/services are marketed to the most appropriate individual given a steam of customer service consumption data generated in real-time [1]. Place is also an important dimension in marketing analysis. Research on place-based marketing focuses on the impact of places on marketing strategies. For example, researchers used a survey to collect customer data and study different levels of place-based marketing in the form of region of origin strategies used by wineries in their branding efforts [8].

With the widespread use of mobile technology, location-based services (LBS) can provide users personalized information in a specific location at a specific time. Location-based advertising has been proposed as an efficient marketing strategy [13]. Location is one of the most important solutions to meet consumers' need and it is a valuable source for personalized marketing information. In location-based advertising, customers can get timely advertisements or product recommendations based on their current position or predicted future position. Location-based advertising provides a new tool for companies to attract more customers and enhance brand value. One challenge for location-based advertising is how to accurately predict customers' locations. Both spatial and temporal data should be taken into consideration (temporal moving pattern mining for location-based service). We need to process a large volume of spatial and temporal data within a short time period before customers move to new locations. Thus, the “velocity” issue of big data is also one of the most challenging aspects for location-based advertising.

Researchers explored the log data in location-based social networks to uncover user profiles; these automatically discovered user profiles have the potential to be subsequently applied to location-based targeted marketing . Regression and classification methods are often utilized for location-based marketing applications. In another study, the GPS traces of individuals to uncover the location-based

(4)

dynamics of different communities. Through analyzing the dynamics of local communities, it is possible to predict their changing product/service preferences. As a result, effective marketing strategies can be developed with respect to both the place and time dynamics of a group of customers. Nevertheless, this type of application needs to deal with both the “variety” and “velocity” issues of big data. For instance, both the relational data among users in location-based social networks and GPS signals need to be analyzed to uncover the location-based dynamics of a local community. In addition, since individuals may constantly move around different places, location-based marketing applications must be able to respond quickly in order to maintain the location sensitivity with respect to the constantly moving customers.

4.1.2. Product ontology and product

reputation management

To alleviate the shortcoming of retrieving limited product reputation via survey data. An automatic framework to monitor the reputation of a variety of products by mining Web contents. Clustering and association mining techniques are among the most common methods employed to support reputation management applications. More recently a reputation management method which not only mines text-based reputation data from the Web but also considers the graphical images of products posted to the Web. Nevertheless, by the time of this writing, twenty billion images have been uploaded to Instagram.1 Given such an extraordinary size of images archived online, it is extremely challenging to analyse the sheer volume of images for product reputation management, not to mention the variety of formats of source data (e.g., text versus images). To carry out an automatic analysis of the textual comments posted to the Web for product reputation management, it is essential to develop a rich computer-based representation of product information for subsequent product reputation analysis. Recently, an automated product ontology mining method that is underpinned by latent topic modeling has been explored to build product ontologies based on textual descriptions of products extracted from online social media [24]. The automatically constructed product ontologies can be used as the basis to support product reputation management applications and other marketing intelligence applications. However, given the computational complexities involved in automated product ontology extraction from online social media, new computational methods must be developed to cope with the volume,

velocity, and variety issues of big social media data.

4.1.3. Promotional marketing analysis and recommender systems

In the increasingly competitive business environment, billions of dollars are spent on promotions each year. Thus, promotional marketing analysis has attracted a lot of attention from practitioners and researchers. Effective promotional strategies are one of the key success factors for companies to increase their sales and revenue [4]. Promotional data usually includes information about promotion types (price cut or coupons), promotion time, and purchase records during the promotional period. Early work related to promotional marketing analysis mostly focused on analyzing how different types of customers respond to different promotional strategies or how different categories of products affect the effectiveness of promotional strategies [33]. Most existing work uses regression methods to study promotions in different contexts [4].

In the big data environment, more log data becomes/is available for promotion analysis. A recent work studied WOM derived from both customer reviews and promotions. The authors found a substitute relationship between the WOM volume and coupon offerings, but a complementary relationship between WOM volume and keyword advertising. Promotional marketing analysis can also include factors from other perspectives, such as price and place. For example, enabled by mobile technologies and location-based services, companies can use customers' location information to improve their promotion strategy and select targeted customers.

To improve product awareness and promote products to potential customers, recommender systems have been widely used in the e-commerce context [15]. User rating-based collaborative filtering methods or content-based association mining methods are commonly applied to develop recommender systems. However, existing methods may not scale up to big data. For instance, given N user ratings, the general computational complexity of a collaborative filtering method is N2 [9]. Therefore, it is quite challenging to scale up existing recommender systems to cope with big data (e.g., N = t e n s o f m i l l i o n s ) and generate appropriate recommendations to potential customers in real-time as expected in e-commerce settings. This is the reason why “velocity” is one of the most challenging issues for the “promotion” perspective in the context of

(5)

marketing intelligence. There has been much research on what pricing strategies managers should follow under various situations. Traditionally, empirical research on pricing strategies uses survey data and regression methods. For example, researchers used a national mail survey to study the determinants of pricing strategies [32]. They found different pricing strategies are preferred under different marketing situations. The growth of e-commerce has made price information available on websites and researchers started using log data to study pricing strategy in e-commerce websites. For example, a recent study uses a method to estimate demand levels from sales rank and derive demand elasticity, variable costs, and the optimality of pricing choices directly from publicly available e-commerce data [17]. Based on the data derived from various log data sources, they can study the optimality of price discrimination. While regression methods are widely used for price prediction applications, association mining methods are applied to competitor analysis applications. An automated competitor analysis application does not simply identify the potential competitors of a company; it also effectively discovers the potentially competitive products and the product contexts [3]. This type of application has proven useful to facilitate the “price” aspect of the marketing mix model. However, the sheer volume of product pricing information on the Web has also posed new challenges to scale up existing applications with big data.

5.0 CONCLUSION AND FUTURE WORK

The proposed marketing mix framework for guiding research in big data management for marketing intelligence helps to achieve maximum efficient data processing. Here it is used to identify the data sources, methods, and applications in different marketing perspectives based on the analytical methods. Experienced users say it's crucial to evaluate the potential business value that big data tools offer and to keep long-term objectives in mind as you move forward. This work helps in developing the business requirements and technical specifications, as well as assessing the available options.

We further discuss the challenging issues related to big data management in the context of various marketing perspectives. Based on the framework, we highlight future research directions in big data management. Study how to select appropriate data sources for particular goals. The amount of available data is increasing. Current techniques do not allow us to process all data available in a timely manner.

Thus, data selection is a critical decision for managing marketing intelligence. How to select data that can provide the most value to business decision-making requires future research on the alignment between data and marketing intelligence goals. Investigate how to deal with the heterogeneity among data sources. For example, both customer reviews and social media data can be used to study customer opinions toward a company or a product. However, data collection and analysis methods may be different due to different structure, quality, granularity, and objective. Further, survey data and log data can also be used to study the same marketing problem. How to conduct surveys in social media and confirm the survey result with log data in social media will become an important topic in e-commerce research and applications.t

References

[1] A. Abouzeid, K. Bajda-Pawlikowski, D.J. Abadi, A. Rasin, and A. Silberschatz, “HadoopDB: An Architectural Hybrid of MapRe-duce and DBMS Technologies for

Analytical Workloads,” Proc. VLDB

Endowment, vol. 2, no. 1, pp. 922-933, 2009.

[2] C. Batini, M. Lenzerini, and S. Navathe, “A Comparative Analysis of Methodologies for Database Schema Integration,” ACM Com-putting Surveys, vol. 18, no. 4, pp. 323-364, 1986.

[3] D. Bermbach and S. Tai, “Eventual Consistency: How Soon is Eventual? An Evaluation of Amazon s3’s Consistency

Behavior,” in Proc. 6th Workshop

Middleware Serv. Oriented Computer. (MW4SOC ’11), pp. 1:1-1:6, NY, USA, 2011. [4] B. Cooper, A. Silberstein, E. Tam, R.

Ramakrishnan, and R. Sears,

“Benchmarking Cloud Serving Systems with YCSB,” Proc. First ACM Symp. Cloud Computing, pp. 143-154, 2010.

[5] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W.

Vogels, “Dynamo: Amazon’s Highly

Available Key-Value Store,” Proc. 21st ACM SIGOPS Symp. Operating Systems Princi-ples (SOSP ’07), pp. 205-220, 2007. [6] J. Dittrich, J. Quian_e-Ruiz, A. Jindal, Y.

Kargin, V. Setty, and J. Schad, “Hadoop++: Making a Yellow Elephant Run Like a Chee-tah (without it Even Noticing),” Proc. VLDB Endowment, vol. 3, no. 1/2, pp. 515-529,

(6)

2010.

[7] H. Garcia-Molina and W.J. Labio, “Efficient Snapshot Differential Algorithms for Data Warehousing,” technical report, Stanford Univ., 1996.

[8] Google Inc., “Cloud Computing-What is its Potential Value for Your Company?” White Paper, 2013.

[9] R. Huebsch, J.M. Hellerstein, N. Lanham, B.T. Loo, S. Shenker, and I. Stoica, “Querying the Internet with PIER,” Proc. 29th Int’l Conf. Very Large Data Bases, pp. 321-332, 2003.

[10] H.V. Jagadish, B.C. Ooi, K.-L. Tan, Q.H. Vu, and R. Zhang, “Speeding up Search in Peer-to-Peer Networks with a Multi-Way Tree Structure,” Proc. ACM SIGMOD Int’l Conf. Management of Data, 2006.

[11] H.V. Jagadish, B.C. Ooi, K.-L. Tan, C. Yu, and R. Zhang, “iDistance: An Adaptive B+-Tree Based Indexing Method for Nearest Neighbor Search,” ACM Trans. Database Systems, vol. 30, pp. 364-397, June 2005. [12] H.V. Jagadish, B.C. Ooi, and Q.H. Vu,

“BATON: A Balanced Tree Structure for Peer-to-Peer Networks,” Proc. 31st Int’l Conf. Very Large Data Bases (VLDB ’05), pp. 661-672, 2005.

[13] A. Lakshman and P. Malik, “Cassandra: Structured Storage Sys-tem on a P2P Network,” Proc. 28th ACM Symp. Principles of Distributed Computing (PODC ’09), p. 5, 2009.

[14] W.S. Ng, B.C. Ooi, K.-L. Tan, and A. Zhou, “PeerDB: A P2P-Based System for Distributed Data Sharing,” Proc. 19th Int’l Conf. Data Eng., pp. 633-644, 2003.

[15] Oracle Inc., “Achieving the Cloud Computing Vision,” White Paper, 2010.

[16] V. Poosala and Y.E. Ioannidis, “Selectivity Estimation without the Attribute Value Independence Assumption,” Proc. 23rd Int’l Conf. Very Large Data Bases (VLDB ’97), pp. 486-495, 1997.

[17] M.O. Rabin, “Fingerprinting by Random Polynomials,” Technical Report TR-15-81, Harvard Aiken Computational Laboratory, 1981.

[18] E. Rahm and P. Bernstein, “A Survey of

Approaches to Automatic Schema

Matching,” The VLDB J., vol. 10, no. 4, pp. 334-350, 2001.

[19] P. Rodr_ıguez-Gianolli, M. Garzetti, L. Jiang, A. Kementsietsidis, I. Kiringa, M. Masud, R.J. Miller, and J. Mylopoulos, “Data Sharing in the Hyperion Peer Database System,” Proc. Int’l Conf. Very Large Data Bases, pp. 1291-1294, 2005.

[20] Saepio Technologies Inc., “The Enterprise Marketing Manage-ment Strategy Guide,” White Paper, 2010.

[21] I.Tatarinov, Z.G. Ives, J. Madhavan, A.Y. Halevy, D. Suciu, N.N. Dalvi, X. Dong, Y. Kadiyska, G. Miklau, and P. Mork, “The Piazza Peer Data Management Project,” SIGMOD Record, vol. 32, no. 3,

[22] A. Thusoo, J. Same, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy, “HIVE: A Warehousing Solution over a Map-Reduce Framework,” Proc. VLDB Endowment, vol. 2, no. 2, pp. 1626-1629, 2009.

[23] H.T. VO, C. Chen, and B.C. Ooi, “Towards Elastic Transactional Cloud Storage with Range Query Support,” Proc. VLDB Endow-ment, vol. 3, no. 1, pp. 506-517, 2010. [24] S.Wu, S. Jiang, B.C. Ooi, and K.-L. Tan,

“Distributed Online Aggregation,” Proc. VLDB Endowment, vol. 2, no. 1, pp. 443-454, 2009.

[25] S. Wu, J. Li, B.C. Ooi, and K.-L. Tan, “Just-in-Time Query Retrieval over Partially Indexed Data on Structured P2P Overlays,”

Proc. ACM SIGMOD Int’l Conf.

Management of Data (SIGMOD ’08), 279-290, 2008

[26] S. Wu, Q.H. Vu, J. Li, and K.-L. Tan, “Adaptive Multi-Join Query Processing in PDBMS,” Proc. IEEE Int’l Conf. Data Eng. (ICDE ’09), 2009.

Author’s biography

Ms.P.Divya has obtained UG

and PG Degree in Anna University, Chennai. Her area of specialization is Image Processing and Big Data. She has published a paper International Journal and attended conferences. Presently working as Assistant Professor in CK College of Engineering and Technology, a unit of Cavin Kare, Cuddalore, Tamilnadu, India.

(7)

Dr. R. Murugan has pursued his PhD Degree in Information

and Communication

Engineering faculty at Anna University, Chennai. He has more than 17 years of teaching and research experience. He has published more than 18 papers in International/National Journals and Conferences. Currently, he is working as a Professor and Head in the Department of Computer Science & Engineering at CK College of Engineering and Technology, Cuddalore.

Ms.R.Suriya has obtained UG and PG Degree in Anna University, Chennai. Her area of specialization is Image Processing. She has published 2 papers International conferences. Presently working as Assistant Professor in CK College of Engineering and Technology, a unit of Cavin Kare, Cuddalore, Tamilnadu, India.

References

Related documents

Unit descriptor This unit describes the performance outcomes, skills and knowledge required to organise domestic and overseas business travel, including developing

Keywords: Dynamic models, Monte-Carlo (MC), Variance Reduction Technique (VRT), Antithetic Variate (AV), Control Variate (CV), Efficiency Gain (EG), Response Surface

In this example, the application of intense enforcement of parking regulations along a critical arterial roadway resulted in increasing curb-side parking capacity by reducing

Research conducted in the last decade has shown that the inclusion and involvement of patients in the management of their own health care using a patient empowerment model is

Using channels as marketing & customer service tool with support of solid customer analytics Social Media BIG DATA + CRM INFORMATION ANALYTICS 2 Integrated web intelligence

Initial Facilities has already undergone a dramatic transformation… 2009 12 stand-alone businesses Cleaning & catering Separate management teams Different finance systems

allocation across application needs, (ii) index management to facilitate indexing of data on flash, (iii) storage reclamation to handle deletions and reclamation of storage space,

For example, "From now on, let's be sure we know what time we want to leave, and if you're ready before I am, will you please just go to another room and read the paper or