Changing the face of Business Intelligence & Information Management


Prepared by

Sharon Hobart, Matthew Connock and Robert Postill © C3 Business Solutions July 2011

1300 530 335

[email protected] www.c3businesssolutions.com GPO Box 589 Melbourne VIC 3001 Australia ABN 35 122 885 465 Melbourne Sydney Canberra Brisbane Perth

White Paper

Big Data

Changing the face of Business Intelligence & Information Management

Everyone is talking ‘Big Data’…it’s the tech buzz word du jour. There is certainly merit in the discussion as Big Data will change the information landscape of the future and, for those who embrace it, will provide strong competitive advantage and insight like never before. Successful Business Intelligence projects of tomorrow will need to consider Big Data as part of their data landscape for the value that it delivers.

C3 Business Solutions is an award winning Business Intelligence and Information Management company. We are strictly vendor independent and provide impartial advice and innovative solutions to our clients.


Contents

What is Big Data?
BI & Big Data
Who Will Benefit?
The Tech
The Advantages
What It Means
Steps to Success
Beware...
Examples…
In Summary

Data is growing...

Walmart is processing one million customer transactions per hour, which equates to around 2,500 Terabytes of data each year.

Most websites generate Gigabytes of data every day, much of which is thrown away.

Scientists need to manage enormous data sets in a variety of research areas such as genomics, climate science and meteorology, life sciences, and high energy physics. CERN’s Large Hadron Collider, for example, is churning out up to 40 Terabytes per second and thus amassing data in the tens of Petabytes range.
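As a rough sanity check on the Walmart figures, the sketch below works out the average amount of data per transaction implied by one million transactions per hour and 2,500 Terabytes per year. It assumes decimal units (1 Terabyte = 10^12 bytes) and a 365-day year; the function name is purely illustrative.

```python
HOURS_PER_YEAR = 24 * 365          # assumption: a non-leap year
BYTES_PER_TERABYTE = 10 ** 12      # assumption: decimal (SI) units

def bytes_per_transaction(transactions_per_hour, terabytes_per_year):
    """Average bytes stored per transaction implied by the two figures."""
    transactions_per_year = transactions_per_hour * HOURS_PER_YEAR
    total_bytes = terabytes_per_year * BYTES_PER_TERABYTE
    return total_bytes / transactions_per_year

per_txn = bytes_per_transaction(1_000_000, 2_500)
print(f"{per_txn / 1000:.0f} KB per transaction")
```

Under these assumptions the figures imply roughly 285 KB per transaction, which suggests the yearly total covers far more than the bare sales record.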

What is Big Data?

Big Data is exactly what it says it is: extremely large data sets that need to be distributed across different servers due to their size.

Every day, organisations are producing enormous amounts of structured and unstructured information. Structured data includes trading floor financial information, line of business system data and relational database contents, while unstructured data refers to less tangible sources such as web logs, RFID tags, sensor networks, voice, video and images. Much of this today is either backed up to tape, never to be used again, or thrown away.

Big Data allows you to keep these huge data sets useable for analysis and ultimately strategic direction.


BI & Big Data

Big Data is likely to change the face of Business Intelligence into the future. Expect “Hadoop clusters” (see The Tech) to be the front line of much business intelligence work. They will enable the analysis of information that we’ve never thought to look at before, especially unstructured data that can’t feed into most data warehouses. Even the most sophisticated data warehouses store only three years of information; after that, there is simply too much for current enterprise systems to manage. Organisations that actively seek out data in their environment will use Big Data to control the volumes passing into their data warehouses.

Big Data removes the need for archiving, which not only saves the cost of tapes or paid storage but also keeps all information live relatively cheaply. Sampling is dead; assumptions can be removed from algorithms and real information can be used for analysis.

The open source software is free and is supported by both the open source community and, increasingly, major IT vendors. When coupled with commodity hardware, you can scale up as much as you like for a fraction of the cost (relatively speaking, of course).

Big Data does mean, however, that we need to think differently about the questions we ask and the data we query. Unstructured data such as web logs, clickstream paths, web content, video and images has significant value as a strategic tool.


Who Will Benefit?

Organisations with truly enormous datasets in highly competitive markets will benefit the most from Big Data technology.

Telcos

Can bring together web logs and call detail records (mobile and fixed) to perform behavioural analysis across them, reducing churn and predicting consumer behaviour to support long-term strategic thinking.

Finance

Global credit card companies (consumer behaviour, fraud analysis) and international trading exchanges (risk analysis, and detection of fraud or money laundering through pattern analysis across all trades).

Government Agencies

Analysis of long-term statistics and information to support policy decisions (call, web, transport, and intelligence data).

Global Retailers

Multi-channel consumer behaviour analysis and web content analysis.

Energy / Manufacturing

Analysis from sensor networks to forecast demand and identify areas of concern such as process inefficiency or potential fraud.

Science

Genomic analysis, high energy physics, astronomy, geology, climate science.

What if your car insurance premium was based on your specific driving habits? With GPS data feeding directly from your car, an insurer could tailor premiums to specific usage or even adjust premiums over fixed time periods based on usage (e.g. your premium varies each quarter based on where you go/how far you drive).

How about the government charging road tax based on your particular road usage? Again, GPS information could feed directly from your car into a data lake where it could be analysed.


The Tech

The technology behind Big Data gives businesses the opportunity to access and use this valuable large scale information rather than storing it to tape and forgetting about it forever. It complements your existing investment in Data Warehouse technology.

Big Data is characterised by Open Source technologies, often originating from the large web players such as Facebook, Yahoo or Google. There are many of these NoSQL (Not Only SQL) databases in circulation (see http://nosql-database.org/ for a detailed list) but by far the most widely accepted is Hadoop.

Hadoop is an open source project that originally grew out of Yahoo. It is intended to ease the complexities of performing large-scale batch operations on data and is managed within the Apache project framework.

The Hadoop project now has contributions from Yahoo, Google, Apple and Facebook, companies that arguably employ some of the smartest minds in the business.
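To make “large-scale batch operations” a little more concrete, here is a minimal single-process sketch of the map, shuffle and reduce pattern that Hadoop’s MapReduce layer spreads across a cluster. It is plain Python rather than Hadoop’s own API; the function names and the sample log lines are hypothetical, and a real job would run the map and reduce phases in parallel on many machines.

```python
from collections import defaultdict

def map_phase(line):
    """Emit (page, 1) for one log line of the assumed form '<user> <page>'."""
    _user, page = line.split()
    yield page, 1

def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

sample_log = ["alice /home", "bob /home", "alice /pricing"]  # made-up data
page_hits = run_job(sample_log)
print(page_hits)
```

Because each map call looks at one line and each reduce call looks at one key, the work can be split across as many machines as the data requires, which is the property the batch model relies on.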

Hadoop is now being embraced by the mainstream, with Cloudera having been the main provider of distributions to date. Things are, however, heating up, with some of the major BI vendors announcing support for, or solutions using, Big Data technology. EMC has just announced the EMC Greenplum Hadoop distribution. IBM’s BigInsights offering uses a Hadoop base for storage and processing and IBM InfoSphere for integration, provides specific analytic solutions, and integrates with Netezza. Teradata has partnered with Cloudera to provide integration of data from the Hadoop HDFS into Teradata. In addition, key vendors such as Informatica (via EMC), Microstrategy and


The Advantages

Scale

We can now build data lakes that enable a broader analysis of information. Big Data also keeps the information live (rather than archiving to tape which usually results in dead data).

Budget

Aside from the hardware, the cost of building a petabyte-sized data lake is minimal in terms of toolsets. Storage is now so cheap that it’s almost free. Hadoop also runs on inexpensive hardware, so we can build scale as we need it.

New Information

New types of information, in particular unstructured data, can be analysed to provide value to the business. For example, web content can be analysed to determine sentiment, which would prove very useful for military intelligence or for large eCommerce vendors.

Never Lose Data Again

Hadoop is redundant and reliable; it doesn’t stop or lose data even in the event of hardware failure as the data is replicated in multiple locations.
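The idea behind replication can be illustrated with a toy model. The sketch below keeps three copies of every data block on three different nodes and checks which blocks remain readable after some nodes fail. The names `place_blocks` and `readable_blocks`, the replication factor of three and the round-robin placement rule are all assumptions chosen for brevity; this is not the placement policy Hadoop itself uses.

```python
def place_blocks(block_ids, nodes, replicas=3):
    """Assign each block to `replicas` distinct nodes, round-robin."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = {nodes[(i + r) % len(nodes)] for r in range(replicas)}
    return placement

def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one copy on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}

nodes = ["node1", "node2", "node3", "node4", "node5"]   # hypothetical cluster
blocks = [f"block{i}" for i in range(10)]
placement = place_blocks(blocks, nodes)

# With three copies per block, losing two nodes leaves every block readable.
survivors = readable_blocks(placement, failed_nodes=["node2", "node4"])
print(len(survivors), "of", len(blocks), "blocks still readable")
```

In this model a block only becomes unreadable if every node holding a copy fails at once, which is why the number of copies matters more than the reliability of any single machine.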


What It Means

Different Skills

Organisations will need people who can manage and analyse data on a huge scale. Data Scientist roles are now appearing on job sites as organisations look for individuals who can help them understand their data. Look to the big web companies such as Google, Facebook and Amazon who are leading the way. “I keep saying that the sexy job in the next 10 years will be statisticians and I’m not kidding”, Hal Varian, Google Chief Economist.

Different Toolsets

We need to think differently about our toolsets and get comfortable with open source quickly. Most Big Data tools are open sourced.

Democratisation of Algorithms

Most algorithms you need have been written and open sourced. The benefit is in the data and the business problems you apply them to.

Architecture Will Change

Big Data works in data lakes (Hadoop clusters). These not only run analysis on mass-scale information but also become a source for data warehouses. We can now run machine learning on 100 GB+ of image data and clickstream analysis on 100 TB of data on the same platform.

Hardware will change

We will need to rely less on small numbers of large machines and look more at large numbers of commodity hardware (long-term perhaps even cloud resources).

The temporal nature of data is changing

Because volumes are increasing so rapidly, batch operations are back, and they are feeding more traditional BI technologies.


Steps to Success

The following steps will help you incorporate Big Data into your BI program successfully.

1. Gain Executive Support

Support should rest on an acceptance of the value of evidence-based strategy (i.e. executives will already be using data warehousing, and probably data mining, extensively within the organisation). Find a commercial problem, not a technical problem, to apply this to.

2. Get the Right People

You will need people who can manage large, distributed data sets and the hardware that comes with them. Next are the people who can make sense of all the data and put it into a business context. Think data scientists as opposed to existing data analysts and data miners.

3. Embrace Open Source

Traditional vendors are not the answer here. You’ll need to get comfortable quickly with open source. The innovators here are communities made up of the smartest people from the smartest companies around: Google, Yahoo, Apple, Facebook.

4. Buy capacity from small standard units

Infrastructure as a service (IaaS) vendors and cloud resources provide massive time-to-market and timeliness advantages to those organisations capable of taking advantage.

5. Find a data source you don’t use

For example, many organisations don’t derive value from their websites. What happens to web logs? Ask questions like: what’s the least popular web page, or what’s the busiest time of day for your website? You should be able to work out which ISP your customers use. Could you use that information for joint marketing?

6. Visualisation

Think about new ways of presenting data as some analysis simply won’t make sense using tables or graphics.
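The web-log questions in step 5 can be answered with very little code once the logs are kept. The sketch below assumes log lines in the widely used Common Log Format; the sample lines, addresses and the `summarise` function are made up for illustration, and a real log would need more defensive parsing.

```python
import re
from collections import Counter

# Captures the hour of the timestamp and the requested path
# from a Common Log Format line.
LOG_PATTERN = re.compile(
    r'\[\d{2}/\w{3}/\d{4}:(\d{2}):\d{2}:\d{2} [^\]]+\] "\w+ (\S+)'
)

def summarise(log_lines):
    """Return (least popular page, busiest hour) for the given log lines."""
    pages, hours = Counter(), Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that do not look like requests
        hour, path = match.groups()
        pages[path] += 1
        hours[int(hour)] += 1
    least_popular = min(pages, key=pages.get)
    busiest_hour = max(hours, key=hours.get)
    return least_popular, busiest_hour

sample = [  # hypothetical log entries
    '192.0.2.1 - - [10/Jul/2011:09:15:02 +1000] "GET /home HTTP/1.1" 200 5120',
    '192.0.2.7 - - [10/Jul/2011:09:47:30 +1000] "GET /home HTTP/1.1" 200 5120',
    '192.0.2.9 - - [10/Jul/2011:14:03:11 +1000] "GET /contact HTTP/1.1" 200 2048',
]
least_popular, busiest_hour = summarise(sample)
print(least_popular, busiest_hour)
```

On a single server this is a simple script; the same counting logic is what a batch job would apply to years of retained logs across a cluster.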


Beware...

Big Data is definitely here to stay; it will change the BI landscape and provide a valuable data resource for organisations. However, as with any new technology, there are a number of things to be aware of.

Skills are critical

Big Data is in its infancy and requires a different skill set from your existing data warehouse team. You need the right people to query the data such as data scientists (data quants) rather than traditional SQL query writers.

Don’t start from scratch

Get a Hadoop distribution from Cloudera or EMC. This will provide the basic tools you need.

PoC only at Scale

Benefits will kick in at the hundreds of Gigabytes range, not on a couple of laptops!

Manage Expectations

Big Data is good for large scale analytics and long-term strategic direction. Don’t think it will deliver monthly management reporting or that you can use it for ad-hoc queries over structured data.

Data Storage Byte Table

1000 Megabytes = 1 Gigabyte
1000 Gigabytes = 1 Terabyte
1000 Terabytes = 1 Petabyte
1000 Petabytes = 1 Exabyte
1000 Exabytes = 1 Zettabyte
1000 Zettabytes = 1 Yottabyte
1000 Yottabytes = 1 Brontobyte
1000 Brontobytes = 1 Geopbyte
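The decimal steps in the byte table lend themselves to a small helper. The sketch below formats a raw byte count in the largest unit that keeps the value at one or above. The function name is an assumption, bytes and Kilobytes are added below the table’s first row, and the list deliberately stops at Yottabytes.

```python
# Unit abbreviations in steps of 1000, following the byte table.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop once the value fits, or when the largest unit is reached.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

print(human_readable(2_500 * 10**12))   # 2,500 Terabytes
print(human_readable(10**21))           # one Zettabyte in bytes
```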

Examples…

There is a growing number of examples where Big Data is being used to make strategic decisions at some of the world’s more forward thinking organisations. A simple Google search reveals many such examples (http://wiki.apache.org/hadoop/PoweredBy).

eBay

Enables search optimisation and research on the eBay network with a 532-node cluster handling 5.3 Petabytes.

Facebook

Enables reporting/analytics and machine learning for Facebook advertising. Uses a 1,100 machine cluster (8,800 cores) with about 12 PB of raw storage.

LinkedIn

Uses Hadoop to power People You May Know using the Graph algorithm provided in MapReduce

CERN

Uses Hadoop to manage their data from Atlas and other components of the LHC, via the University of Nebraska, to search for the Higgs Boson particle, with 54 Petabytes under storage (this year alone).

Yahoo

Over 100,000 CPUs in more than 40,000 machines are running Hadoop. The biggest cluster is used to support research for Ad Systems and Web Search (total over 16 Petabytes of storage). Yahoo homepage personalisation is based on Hadoop analysis, and they have seen twice the uptake. Yahoo Mail anti-spam analysis sees 40% less spam than Hotmail (Yahoo figures).

VISA

Replaced legacy ETL subsystem with Hadoop based alternative that is more flexible, faster and cheaper.

Chase

Analyse long-term historical trade data to identify fraudulent activity and build real-time fraud prevention

The British Library

The Library is working with IBM using their Hadoop based BigSheets solution to preserve and analyse all the websites in the .uk top level domain to provide a unique view of British online activity over time; something that was simply not possible in the past.

(http://news.cnet.com/8301-13846_3-10459507-62.html

And


In Summary

There is an estimated one Zettabyte (or 1,000 Exabytes) of information currently stored worldwide and by 2030 this is predicted to increase to as much as 700 Zettabytes¹. The sheer volume of information that organisations now store, and therefore, want to access for competitive advantage and strategic decision making, means that we must rethink the way we store information. Parking data with strategic value on twenty Gigabyte tapes is not the answer. As the amount of data available continues to grow rapidly, businesses that fail to develop the skills to manage and analyse it will find themselves at a competitive disadvantage.

As more and more organisations move into statistics and data mining to set strategic direction, the need for greater insight to stay ahead of the pack will only grow.

Used properly, Big Data will help organisations manage risk better, and improve the customer experience, fundamentally changing the way information management operates.

¹ During the 2010 Hadoop World Conference, Abhishek Mehta, then a managing director at Bank of America, and now founder of Tresata, cited a Cisco Systems estimate that by 2013, 700 zettabytes (ZBs) of data would be flowing across the Internet. A zettabyte represents 1,000 exabytes (EBs), or 1 million petabytes (PBs).
