Navigating Big Data business analytics



MWD Advisors is a specialist advisory firm which provides practical, independent industry insights to business analytics, process improvement and digital collaboration professionals working to drive change with the help of technology. Our approach combines flexible, pragmatic mentoring and advisory services, built on a deep industry best practice and technology research foundation.

© MWD Advisors 2013

Navigating Big Data business analytics


Helena Schwenk

A special report prepared for Actuate May 2013

This report is the third in a series and focuses principally on explaining what’s needed in the business analytics layer of a Big Data platform. For more information about how this layer relates to others in a Big Data platform please refer to the corresponding papers in this series: Navigating the Big Data infrastructure layer and Turning Big Data into Big Insights. Finally, for more information about the opportunities and challenges posed by Big Data for organisations today please refer to the first paper in the series, Unlocking the potential of Big Data.

This is a special report prepared independently for Actuate. For further information about MWD Advisors’ research and advisory services please visit




Understand the business need of Big Data

To really get to value from your Big Data you first need to understand how this new world of varied and voluminous data sources can potentially solve problems or create opportunities within your business. It requires you to not only make sense of your data by analysing it and deriving meaningful insights from it, but to be able to apply those insights in a business context in a timely and impactful way.

The promise of a Big Data platform is that it takes data in its rawest form and converts it into consumable, actionable insights


The concept of a ‘Big Data platform’ provides a technology framework for taking data in its rawest form, transforming it and putting it in a format where it can be consumed and acted upon by decision makers. Three core layers are required to support these capabilities: the lowest layer is responsible for the storage and organisation of data; the middle layer is where the analysis of that data occurs; and the upper layer is where data insights are discovered and consumed. This report focuses on the second: the business analytics layer.

Business analytics brings potential to Big Data

Business Analytic tools help bring understanding and meaning to your Big Data. Technologies such as predictive analytics, for instance, can analyse and model Big Data to help make predictions about future events, whereas visual analytics tools can help identify trends or patterns in large volumes of data more easily, and text mining and natural language processing can be used to understand sentiment and extract meaning from textual data.

Tool choices are dependent on a range of factors

What tool you choose will ultimately depend on the problem your business is trying to solve. But equally it will also need to take into account other technical factors such as the type of data being analysed (whether it’s structured or multi-structured, for example) and also the scope of analysis being performed (such as whether it involves real-time, exploratory or advanced analytic techniques). Being able to understand and map both business and IT requirements to your Business Analytic tool choices remains an important part of any Big Data initiative.



Technology cost and sophistication – driving the Big Data train

As outlined in the first report in this series, Unlocking the potential of Big Data, in spite of all the headlines and vendor rhetoric, the ability to manage growing volumes of data is not a new phenomenon for organisations today. In fact, many early adopters of Business Intelligence (BI) and data warehousing technology (especially in the retail, telecoms and financial services industries) have long been accustomed to capturing and managing large volumes of data. Yet in spite of this we still see the rise and rise of ‘Big Data’ as a seemingly relatively new concept – so what has changed?

Through their own technology innovations, web and social data-driven businesses such as Google and LinkedIn have shown us how to process Big Data sets (in their case web searches) on massively scalable storage and computing platforms using commodity hardware. Their technology expertise and success is the inspiration behind open source Big Data technologies such as Apache Hadoop and its ecosystem of tools (which we introduce in more detail in the second report of this series, Navigating the Big Data infrastructure layer). The challenge of processing certain kinds of Big Data has also driven other technology innovations related to massive parallel processing architectures, in-memory analytics, columnar databases and complex event processing platforms. All of these pieces bring more choices to organisations that want to advance their use and management of Big Data. Similarly, enhancements in predictive analytics, text mining and advanced data visualisation tools make the exploitation of Big Data more straightforward by making it easier to discover hidden or interesting patterns and insights that, in turn, can be used to enhance productivity, drive efficiencies and growth, and create a sustainable competitive advantage.

Figure 1: Drivers of broader Big Data adoption

Source: MWD Advisors

But it’s not only technology developments spurring the advancement of Big Data; as figure 1 shows, the deployment economics of technologies are equally important. In particular, the decreasing cost of storage and memory, alongside the scalability of cloud computing platforms and appliances – together with the growing influence of open source tools – brings the promise of lower cost and more affordable Big Data platforms. The opportunities of Big Data are opening up to a wider audience, as it becomes more economically feasible to exploit, manage and leverage Big Data – especially for those organisations that may have been priced out of this activity previously.



A Big Data platform has three layers

Most of the commentary around Big Data has focused on the type of data under management – whether structured or multi-structured (defined as data stored and organised in a multitude of formats, including text, video, documents, web pages, email messages, audio or social media posts, and so on), or real-time or data in-motion. However, before any decision can be made about what kind of information and technology capabilities are required to support this data there needs to be agreement and buy-in about what you want to achieve from your Big Data initiative.

At the very least it needs to be framed by a clear strategy that helps outline how data and analytics can be tied to a particular business challenge or opportunity that needs addressing. This in turn provides the starting point from which organisations can assess the technical implications of their Big Data effort, for example by examining how data can be transformed from its raw state to a point where it can be consumed and acted upon. To support this, a Big Data platform needs to provide capabilities for:

• Capturing, processing and storing data

• Exploring and applying advanced analytics techniques

• Discovering and consuming insights.

Today these activities are supported by a multitude of technology components – some of them are relatively new, while others are based on existing technologies and architectures. In figure 2 we bring these concepts together as part of an overall ‘Big Data platform’ with three layers. The lowest layer is concerned with organising and storing data; the middle layer is where the analysis of that data occurs; and the upper layer is where data insights are discovered and consumed.

Figure 2: Capabilities of the Big Data platform layers

Source: MWD Advisors

Although these capabilities aren’t necessarily new to BI and data warehousing practitioners, it’s become apparent that the old models for storing and analysing data don’t necessarily apply to all Big Data assets. Not only is the amount of data vast and potentially more time-sensitive in nature, but the variety of data to be managed can be far greater – and this is markedly changing the requirements of the technology needed. This report focuses principally on explaining what’s needed in the analytics layer of a Big Data platform. Please refer to the other papers in this series for an explanation of the other two layers.



Getting to grips with Big Data business analytics

Within the Big Data analytics layer, technologies extract value from data by exploring, modeling and analysing it. Assuming that your company has been successful in organising and storing its Big Data assets then it’s at this point that the data comes to life and organisations have the potential to unlock valuable insights within it. However, before any decision is made about what technology to use, any organisation embarking on a Big Data initiative needs to be clear about the business challenge or opportunity they are trying to address through its use, whether it’s about devising a more profitable pricing strategy, offering more sophisticated product recommendations, improving fraud detection or being able to apply more granular customer segmentation to your data.

Once this has been established you can then look towards how business analytic technology can help support these aims and objectives. What technology you use to support the analysis of Big Data, however, depends on two key factors: the type of data that is required for analysis (such as whether it’s structured or multi-structured data), and the use cases driving the analysis need. To help assimilate a picture of what technology fits where in a Big Data analytic environment, it’s worth classifying and grouping the different types of analysis that can be performed with these technologies. Our research suggests that three broad categories are prevalent:

Advanced analysis is a practice focused on applying sophisticated algorithms such as machine learning, predictive modeling or natural language processing to Big Data (either structured or multi-structured) to solve a particular business problem or maximise an opportunity. It can be performed by line-of-business users, IT users or both, and it depends on identifying a specific goal, such as predicting churn, identifying a customer’s propensity to respond or understanding consumer sentiment, before the analytics process can begin.

Real-time analysis is focused on using technology enablers such as in-memory or event stream processing engines to facilitate the rapid ingestion and/or analysis of data. The results are either served up in real time to a user (an online product recommendation, for example) or presented to business users in dashboards, where the information is used to drive decision-making.

Exploratory analysis differs from traditional BI query and reporting as it centres on exploring a complete set of less well understood data (rather than a sample) to determine what data has value, and where the hidden patterns and trends lie within it, without any constraints as to what those patterns or trends may imply. Exploratory analysis may be performed in an academic or research setting and hence requires a different mindset, one where an analyst or data scientist can be more creative in their analysis and one where they don’t always have a clear understanding of the questions they want to ask of the data.

Table 1 below provides an overview of the key technologies you should consider as part of your Big Data analytics layer. As you can see from the table, Big Data analysis encompasses a whole range of technologies and tools. Some, such as predictive analytics or SQL tools, are well established, whereas others – especially where the analysis of multi-structured data is required – shine the spotlight on a newer breed of Big Data analysis technologies such as Hadoop Hive or text analytics.



Table 1: Big Data analytics options

Big Data analysis technology: Predictive and advanced analytics
Key facts: The main goal of predictive analytics is to develop a model using a combination of sophisticated analytic algorithms, statistical models and mathematical calculations that analyse current and historical facts to make predictions about future events. Some database vendors support the execution of advanced analytics within the database (typically within SQL-based MPP databases), taking advantage of the parallel processing capabilities of the source database to speed up query processing. Today an increasing number of analytic applications are also being built on Hadoop and HDFS, either using the MapReduce paradigm in languages such as R or by utilising Apache Mahout, an open source project providing a library of scalable machine learning and data mining algorithms.

Big Data analysis technology: In-memory visual analytic tools
Key facts: Underpinned by an in-memory database, these tools support advanced users in the interactive, on-the-fly exploration and analysis of large, complex structured data sets to help pinpoint trends, segment the data set, and identify outliers and hidden patterns far more easily and often in real time.

Big Data analysis technology: Text analytics
Key facts: Text analytics applies linguistic rules and statistical methods to automatically assess, analyse and find patterns within large quantities of electronic text, such as social media posts, emails and call centre notes. The process of analysing text usually involves parsing and filtering the text, then understanding its meaning and extracting it in a structured form for use and analysis in a data store such as a data warehouse. Sentiment analysis that utilises Natural Language Processing (NLP) techniques is a growing branch of text analytics used to extract subjective information about opinions, attitudes, emotions and perspectives from text.

Big Data analysis technology: SQL
Key facts: SQL is the primary query language used by most BI and analytics tools as well as by many business analysts. While it is primarily used to query structured data, today many vendors are increasing support for querying Hadoop directly using SQL, for example by supporting a Hive interface which allows SQL to be converted to a MapReduce program and processed within Hadoop.

Big Data analysis technology: Event stream processing
Key facts: This technology detects events or patterns of events as data streams through transactional systems, networks or communications buses, before correlating and analysing the data so an appropriate action can be taken, for example to minimise risk or maximise an opportunity. Analysis occurs when the data is in motion, i.e. usually before the data is stored in a database or file system, and is often used in conjunction with other technologies such as business rules, predictive analytics and optimisation techniques to help organisations automate and guide decision-making processes, for instance around detecting fraud, managing risk, optimising pricing and strategic process improvements.
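As a rough illustration of the event stream processing entry above, the following Python sketch evaluates each event as it arrives, before anything is stored, and flags a simple pattern inside a sliding time window. The card-transaction scenario, the `detect_bursts` name and the window and threshold values are all assumptions made for this example; a real engine would also correlate across streams and combine the result with business rules or predictive models.

```python
from collections import defaultdict, deque


def detect_bursts(events, window_seconds=60, max_events=3):
    """Yield (timestamp, key) whenever a key exceeds max_events within the window.

    events: iterable of (timestamp_seconds, key) pairs in arrival order.
    """
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    for timestamp, key in events:
        window = recent[key]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_events:
            yield timestamp, key


# Hypothetical stream of card transactions: (seconds since start, card id).
stream = [(0, "card-A"), (5, "card-B"), (10, "card-A"), (20, "card-A"),
          (30, "card-A"), (200, "card-A"), (210, "card-B")]
alerts = list(detect_bursts(stream))
print(alerts)
```

Because the function is a generator, an alert is available as soon as the triggering event is read, which is the property that matters when the value of the insight depends on acting quickly.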

Mapping Big Data technologies to analytic use cases

To help explain how these analytic use cases map to your Big Data analytic technology choices, the following table takes a look at some sample Big Data applications and details what makes each technology option particularly suitable for this form of analysis. As always this should only be used as a guide, as it does not take into account other factors such as interoperability with existing tools and infrastructure, budget, and skill levels that will also naturally dictate technology choices. For a more detailed explanation of each storage component mentioned please refer to the other paper in this series, Navigating the Big Data infrastructure layer.


Example application area: Customer churn analysis
Usage scenario: Advanced analytics
Example data type: Structured data
Example technology option: Predictive data mining models that analyse transactional, behaviour, demographic and social interaction data can take advantage of the in-database analytics and parallel processing capabilities of the SQL MPP database to run and score customers to identify those that are at risk of churning.

Example application area: Marketing campaign analysis
Usage scenario: Advanced analytics
Example data type: Structured data
Example technology option: In-memory visual analytic tools can be used to analyse revenue by market, campaign, or other attributes to help improve campaigns and market segmentation as well as identifying segments in the customer base that can be used to tailor marketing messages to particular groups or markets.

Example application area: Click stream analysis
Usage scenario: Advanced analytics
Example data type: Multi-structured and structured data
Example technology option: Hadoop MapReduce programs written in R can support the parallel processing of large amounts of web log files where insights into navigation behaviour are extracted and combined with existing customer data from the warehouse to support activities such as website optimisation and conversion rate analysis.

Example application area: Product affinity analysis
Usage scenario: Advanced analytics
Example data type: Multi-structured and structured data
Example technology option: Statistical analysis methods are used to determine the relationship between different products and/or product features based around customer purchasing patterns, interaction data, and transaction data. This data can then be analysed using data visualisation tools to identify opportunities for cross-selling and up-selling, for example.

Example application area: Real-time sentiment analysis
Usage scenario: Real-time analysis
Example data type: Structured and multi-structured data
Example technology option: Event stream processing technology that combines sophisticated analytics and natural language processing technologies can be utilised to enable real-time opinion mining on millions of public tweets to gain a view into brand performance that in turn can help organisations understand target audiences and shape decision-making.

Example application area: Real-time offer management
Usage scenario: Real-time analysis
Example data type: Structured and multi-structured data
Example technology option: In-memory technology and advanced analytics tools can be used to calculate loyalty card points in real time so that when a customer enters the store, they are provided with real-time offers based on loyalty status and specific store inventory.

Example application area: On-line recommendation engine
Usage scenario: Analytics
Example data type: Multi-structured data
Example technology option: HDFS can be used to store and process huge volumes of online behaviour data and used in conjunction with Mahout’s library of machine learning algorithms (which operates on top of Hadoop) and the Pig language to recommend complementary products based on predictive analysis for cross-selling.

Example application area: Customer segmentation analysis
Usage scenario: analysis
Example data type: Structured data
Example technology option: In-memory visual analytic tools can query and analyse large amounts of structured data providing a fast and interactive way to segment customers based on behaviour, or attributes of customer data to help quickly identify potential growth or profitable customer segments.

Example application area: Drug research
Usage scenario: Exploratory
Example data type: Multi-structured data
Example technology option: Hadoop MapReduce can support the processing and interpretation of large amounts of research data. The ability to easily and economically store data in its rawest form without the need for rigid formatting means analysts can focus their efforts on building hypotheses and exploring what questions could be asked of that data.
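The click stream entry in the table above relies on the MapReduce paradigm, which can be sketched in a few lines. The example below is a single-process Python illustration of the map, shuffle and reduce steps only; it does not use Hadoop or R, and the two-field log layout and the function names are assumptions made for the sketch.

```python
from collections import defaultdict


def map_phase(log_line):
    """Map step: emit a (page, 1) pair for one web log line."""
    # Assumed log layout: "<visitor_id> <page_path>", separated by whitespace.
    visitor_id, page = log_line.split()
    yield page, 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(page, counts):
    """Reduce step: total the views for one page."""
    return page, sum(counts)


log_lines = ["v1 /home", "v2 /home", "v1 /basket", "v3 /home", "v2 /checkout"]
mapped = [pair for line in log_lines for pair in map_phase(line)]
page_views = dict(reduce_phase(page, counts)
                  for page, counts in shuffle(mapped).items())
print(page_views)
```

In a cluster setting the map and reduce functions run in parallel on different nodes and the framework performs the grouping; the per-page totals could then be joined with customer data from the warehouse, as the table describes.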



As these examples show, you will need a range of technologies and tools to satisfy your analysis needs, some of which can be supported through traditional analytic tools, whereas others will require the introduction of new analytic practices and tools, especially where the scalability, performance and analysis capabilities of existing analytic tools have run out of steam.
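To make the product affinity example from the table more concrete, the sketch below computes lift for pairs of products bought together. Lift is one common statistical measure of affinity and is chosen here as an assumption, since the report does not name a specific method; the basket data and the `pair_lift` name are likewise invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Return {(item_a, item_b): lift} for every pair of items bought together."""
    total = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeat purchases within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    lift = {}
    for (a, b), together in pair_counts.items():
        # lift = P(a and b) / (P(a) * P(b)); above 1 suggests positive affinity.
        expected = (item_counts[a] / total) * (item_counts[b] / total)
        lift[(a, b)] = (together / total) / expected
    return lift


# Hypothetical transaction data: one list of products per purchase.
baskets = [["bread", "butter"], ["bread", "butter", "jam"],
           ["bread", "milk"], ["milk", "jam"]]
scores = pair_lift(baskets)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 2))
```

Pairs with lift well above 1 are candidates for cross-selling; in practice the scores would be fed into a visualisation tool rather than printed.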

Tapping into the potential of Big Data business analytics

Although the breadth and variety of Big Data analytics options available to organisations is not in question, technology choices should only form part of the equation when it comes to assessing how you move forward with a Big Data project. To really get to grips with Big Data you first need to understand exactly how you can get value from large volumes of data, very complicated data, or very fast-moving data (or a combination of any of these) prevalent across the organisation. It’s an effort that requires organisations to improve their ‘data literacy’ by finding ways of understanding how this new world of Big Data can potentially solve problems or create opportunities in their business. What it boils down to is the need to not only make sense of data and derive meaningful insights from it, but to be able to apply those insights in a business context. As we will see in the next report, Turning Big Data into Big Insights, this is an evolving area and one in which we expect both enterprises and vendor support to develop over time.



Key considerations when planning your Big Data business analytics investment

• Big Data analytics encompasses a whole range of technologies and tools. Some, such as predictive analytics and visual analytics, are well established, whereas others – especially where the analysis of multi-structured data is required – shine a spotlight on a newer breed of emerging Big Data analysis technologies such as Hadoop MapReduce, R or Mahout. Today no one single technology platform can support the entire range of Big Data analysis use cases, so expect to extend your existing BI and data warehousing environment to incorporate these newer analytic components – an effort that will increase demands on data and application integration capabilities across a more diverse analytic environment.

• The options available for applying sophisticated advanced and specialised analytics to Big Data are growing as support for running predictive analytics and machine learning algorithms both in-database and in-Hadoop (for example by using Mahout, KNIME or R) increases. Be aware, however, that this will require you to step up your analytical practices and the type of skills employed within your analytics team.

• Processing and analysing text, such as conducting sentiment analysis on social media data, promises to open up new sources of intelligence for many organisations. It uses techniques such as natural language processing (NLP) to understand the opinions, attitudes and intent within text and is often used to understand the voice of the customer. However, no tool can fully automate this type of analysis; it still needs a human touch, one that blends the power of machines with human intelligence and looks to build, train and evolve the tool’s language and linguistic capabilities over time.

• The unconstrained nature and scalability of the Hadoop environment and its associated technologies provide an ideal platform for iterative and exploratory data analysis. For example, it can be used to support analysts and data scientists in their quest to uncover non-obvious relationships in the data, detect hidden patterns and generate new theories, hypotheses and experiments based on a full set of data rather than just a selected sample.

• Event stream processing software is a valuable technology for continuously analysing data as it is received and hence is often used for mission-critical data and decision management applications such as real-time fraud detection, sentiment analysis and risk management. However, while this technology supports streaming and analysing data in motion, consideration also needs to be given to the speed of the feedback loop – that is, the ability of a user or organisation to act on the information within an appropriate timescale – otherwise its value could be lost.

• Above all, before you embark on your Big Data analytic journey, consideration also needs to be given to the readiness of your organisation to deal with the data deluge. This, amongst other things, will involve developing the necessary skills or data ‘literacy’ across your organisation to be able to understand how to value data, its quality or validity, and how it can be utilised to make more effective, accurate and informed business decisions.
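The text analysis consideration in the list above notes that sentiment analysis cannot be fully automated. One way to picture that blend of machine and human effort is the Python sketch below, which scores posts against a small word list and routes anything without a clear signal to a review queue. This is a deliberately simplified lexicon approach rather than full natural language processing, and the word lists, labels and sample posts are all assumptions made for the example.

```python
import re

# Tiny illustrative lexicon; a real one would be far larger and domain-tuned.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "poor", "terrible", "slow"}


def score_text(text):
    """Return (label, score) where score = positive hits minus negative hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    # No clear signal either way: route to a person rather than guess.
    return "needs_review", score


# Hypothetical social media posts.
posts = ["Love the new app, great update",
         "Terrible service and slow delivery",
         "Great phone, terrible battery",
         "Arrived on Tuesday"]
results = [(post, *score_text(post)) for post in posts]
review_queue = [post for post, label, _ in results if label == "needs_review"]
print(review_queue)
```

Labels that reviewers assign to queued posts could later be used to extend the lexicon or train a model, which is one reading of evolving the tool's linguistic capabilities over time.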