
What happens when Big Data and Master Data come together?


Academic year: 2021


(1)

Jeremy Pritchard

What happens when Big Data and Master Data come together?

Master Data Management

(2)

What is Master Data?

Master data is the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise including customers, prospects, citizens, suppliers, sites, hierarchies and chart of accounts.

Gartner

Master data is data that is shared by multiple computer systems.

The Information Difference

Master data is information that is key to the operation of a business…persistent, non-transactional data that defines a business entity for which there is, or should be, an agreed-upon view across the organisation.

Wikipedia

Master data is often one of the key assets of a company.

Microsoft

What is Master Data Management?

Master data management is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise’s official shared master data assets.

Gartner

Master Data Management comprises a set of processes, governance, policies, standards and tools that consistently defines and manages the master data.

Wikipedia

The creation of:

The Golden Record

Single Version of the Truth

(3)

Types of data in an organisation

Unstructured

Found in e-mail, white papers, magazine articles, corporate intranet portals, product specifications, marketing collateral, and PDF files

Transactional

Related to sales, deliveries, invoices, trouble tickets, claims, and other monetary and non-monetary interactions

Hierarchical

Stores the relationships between other data, such as company organisational structures or product lines

Master

Critical nouns of a business; they fall generally into the groupings: people, places and things

Metadata

Data about other data, including report definitions, column descriptions in a database, log files, connections, and configuration files

The What, Why, and How of Master Data Management – Microsoft November 2006

Understanding Master Data

Think of nouns and verbs

Bob Smith buys a widget (SKU #A1234) and ships it to his home address.

The master data elements are the nouns: people, things, and places (Bob Smith, the widget with SKU #A1234, his home address).

The transactional data elements are the verbs that describe what happens to those people, places, and things (buys, ships).
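The noun/verb split can be sketched as a small data model. This is only an illustration: the class names, field names and identifiers such as `C1` and `ADDR1` are assumptions, not part of any MDM product. The point it shows is that transactional records carry keys that refer to master records rather than copies of them.

```python
from dataclasses import dataclass


# Master data: the "nouns" (people, things, places).
@dataclass(frozen=True)
class Customer:
    customer_id: str  # hypothetical key
    name: str


@dataclass(frozen=True)
class Product:
    sku: str
    description: str


@dataclass(frozen=True)
class Address:
    address_id: str  # hypothetical key
    label: str


# Transactional data: the "verbs". Each event only references master keys.
@dataclass(frozen=True)
class Purchase:
    customer_id: str
    sku: str


@dataclass(frozen=True)
class Shipment:
    sku: str
    address_id: str


bob = Customer("C1", "Bob Smith")
widget = Product("A1234", "widget")
home = Address("ADDR1", "home address")

# "Bob Smith buys a widget and ships it to his home address"
events = [
    Purchase(customer_id=bob.customer_id, sku=widget.sku),
    Shipment(sku=widget.sku, address_id=home.address_id),
]
```

If Bob later changes his name, only the `Customer` record changes; the purchase and shipment events stay as they are.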

(4)

Deciding what Master Data should be Managed

Generally speaking, master data should meet the following requirements:

Reuse

Value

Volatility

Cardinality

Lifetime

Volatility

Any given day:

21,994 will change their address

3,112 will change their name

46,152 will change jobs

1,920 will change their address

32 will change their name

1,200 will change their telephone number

896 directorship changes will occur

96 new businesses will start

Better Information through Master Data Management – MDM as a Foundation for BI – Oracle September 2011

(5)

Master Data Management

Source systems: CRM, Marketing, ERP, WMS, Financial

Name: Bob Smith Tel: 01323-456842 DOB: Gender: Male

Name: Smith, Bob Tel: (01283)56982 DOB: 23/10/1971 Gender:

Name: B Smith Tel: (0)1323456842 DOB: 23-Oct-71 Gender: M

Name: Bob Smith Tel: 01323 456842 DOB: Gender: M

Name: B Smith Tel: 01323 456842 DOB: 23/10/71 Gender: M

Name: Bob Smith Tel: 01283 56982 DOB: 23/10/71 Gender:

Name: Bob Smith Tel: 01323 456842 DOB: 23/10/71 Gender: M
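How several inconsistent source records might collapse into one record like the last one above can be sketched in a few lines. This is a minimal illustration, not how any particular MDM tool works: it assumes the records have already been matched to the same person, and it uses a simple "most common non-empty value wins" survivorship rule. All function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime


def normalise_name(name):
    # "Smith, Bob" -> "Bob Smith"
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        return f"{first} {last}"
    return name.strip()


def normalise_phone(tel):
    # Keep digits only; restore the leading 0 if it was dropped.
    digits = re.sub(r"\D", "", tel)
    if not digits:
        return ""
    return digits if digits.startswith("0") else "0" + digits


def normalise_dob(dob):
    # Try each date format seen in the source systems.
    for fmt in ("%d/%m/%Y", "%d/%m/%y", "%d-%b-%y"):
        try:
            return datetime.strptime(dob, fmt).date().isoformat()
        except ValueError:
            continue
    return ""


def golden_record(records):
    # Standardise every field, then let the most common non-empty value survive.
    cleaned = [
        {
            "name": normalise_name(r["name"]),
            "tel": normalise_phone(r["tel"]),
            "dob": normalise_dob(r["dob"]),
            "gender": r["gender"].strip()[:1].upper(),
        }
        for r in records
    ]
    golden = {}
    for field in ("name", "tel", "dob", "gender"):
        values = [c[field] for c in cleaned if c[field]]
        golden[field] = Counter(values).most_common(1)[0][0] if values else ""
    return golden


SOURCE_RECORDS = [
    {"name": "Bob Smith", "tel": "01323-456842", "dob": "", "gender": "Male"},
    {"name": "Smith, Bob", "tel": "(01283)56982", "dob": "23/10/1971", "gender": ""},
    {"name": "B Smith", "tel": "(0)1323456842", "dob": "23-Oct-71", "gender": "M"},
]

print(golden_record(SOURCE_RECORDS))
```

Real matching and survivorship rules are usually richer (source trust rankings, recency, fuzzy matching); the voting rule here is chosen only because it is easy to follow.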

Master Data Management Architectures

Consolidated

Master is Single Version of Truth

Data Quality at Master

Updates occur at Sources

Updates propagated to Master

Coexistence

Master is Single Version of Truth

Data Quality is ongoing

Updates occur at Sources or Master

Updates propagated to other Sources

Registry

Multiple Versions of Truth

Data Quality is ongoing

Updates occur at Sources

Keys and Metadata in Registry

Updates optionally propagated to other Sources

Centralised

Master is Single Version of Truth

Data Quality at Master

Updates occur at Master

Updates propagated to Sources

(6)

The Current Landscape of MDM Systems

Aberdeen Group – April 2012

45% of survey respondents have no formal MDM system

Reported Success of MDM Programs

Information Difference – July 2012

45% of survey respondents said their projects were successful or very successful

(7)

Key Domains to be Managed

Information Difference – July 2012

The top two domains were customers and products

What about you?

Do you have a Master Data Management solution running in your organisation?

(8)

Big Data

What is Big Data?

a term applied to voluminous data objects that are variety in nature – structured, unstructured or a semi-structured, including sources internal or external to an organisation, and generated at a high degree of velocity with some level uncertainty pattern, that does not fit neatly into traditional, structured, relational data stores and requires strong sophisticated information ecosystem with high performance computing platform and analytical capabilities to capture, process, transform, discover and derive insights with some level of confidence and accuracy to provide business value within a reasonable elapsed time.

The Big Data Institute (TBDI)

high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Gartner

"Big Data" describes data sets so large and complex they are impractical to manage with traditional software tools.

Wikipedia

(9)

What is Big Data?

63% of IT and business executives are not familiar with the phrase

“Worldwide Big Data Ecosystem” – IDC 30th July 2013

Moore’s Law

Gordon Moore's 1965 paper noted that the number of components in integrated circuits had doubled every year from the invention of the integrated circuit in 1958 until 1965 and predicted that the trend would continue "for at least ten years".

The capabilities of many digital electronic devices are strongly linked to Moore's law: processing speed, memory capacity, sensors and even the number and size of pixels in digital cameras.

The number of transistors on integrated circuits doubles approximately every two years.
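The two-year doubling can be written as a simple exponential. The symbols are assumptions introduced here for illustration: $N_0$ is the transistor count in a reference year $t_0$, and $t$ is measured in years.

```latex
N(t) = N_0 \cdot 2^{(t - t_0)/2}
\qquad\Rightarrow\qquad
\frac{N(t_0 + 10)}{N_0} = 2^{5} = 32
```

Under this idealised model, a decade brings roughly a 32-fold increase. The yearly doubling observed between 1958 and 1965 would instead give $2^{10} = 1024$ over the same ten years.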

(10)


Three Dimensions of Big Data

Volume Velocity Variety

(11)

Data Size

Gigabyte = 10^9 bytes = 1,000,000,000
Terabyte = 10^12 bytes = 1,000,000,000,000
Petabyte = 10^15 bytes = 1,000,000,000,000,000
Exabyte = 10^18 bytes = 1,000,000,000,000,000,000
Zettabyte = 10^21 bytes = 1,000,000,000,000,000,000,000
Yottabyte = 10^24 bytes = 1,000,000,000,000,000,000,000,000
Brontobyte = 10^27 bytes = 1,000,000,000,000,000,000,000,000,000

(12)

Global Data

“The Big Vs of Big Data” – PROs 24th July 2013 & “The Four Vs of Big Data” – IBM – 25th July 2013

2.5 Exabytes are created in the digital universe every day

2,500,000,000,000,000,000 Bytes

2.3 Trillion Gigabytes

has been created in the last 2 years

“The Big Vs of Big Data” – PROs 24th July 2013

(13)

Global Data Volume (in Zettabytes)

“The Big Vs of Big Data” – PROs 24th July 2013 & “The Four Vs of Big Data” – IBM – 25th July 2013

2005: 0.13
2011: 1.4
2012: 2.7
2015: 8
2020: 40

1 Zettabyte = 1 trillion Gigabytes = 1 billion Terabytes
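The unit arithmetic and the growth implied by the volume figures can be checked with a few lines of Python. This is a rough sketch: it uses decimal (power-of-ten) units as on the Data Size slide, and the `annual_growth` helper is a hypothetical name for a standard compound-growth calculation.

```python
# Decimal (SI-style) byte units.
GIGABYTE = 10**9
TERABYTE = 10**12
EXABYTE = 10**18
ZETTABYTE = 10**21

# 1 Zettabyte = 1 trillion Gigabytes = 1 billion Terabytes
gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE  # 10**12
terabytes_per_zettabyte = ZETTABYTE // TERABYTE  # 10**9

# 2.5 Exabytes written out in bytes (integer arithmetic to stay exact)
daily_bytes = 25 * EXABYTE // 10

# Global data volume in Zettabytes, as listed on the slide
VOLUME_ZB = {2005: 0.13, 2011: 1.4, 2012: 2.7, 2015: 8, 2020: 40}


def annual_growth(start_year, end_year, volumes=VOLUME_ZB):
    """Compound annual growth rate between two years in the table."""
    years = end_year - start_year
    return (volumes[end_year] / volumes[start_year]) ** (1 / years) - 1


print(f"{gigabytes_per_zettabyte:,} GB per ZB")
print(f"{daily_bytes:,} bytes per day")
print(f"2012-2020 growth: {annual_growth(2012, 2020):.0%} per year")
```

Going from 2.7 to 40 Zettabytes over eight years works out to a compound growth rate of roughly 40% per year.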


(14)

Data Velocity

“The Four Vs of Big Data” – IBM – 25th July 2013

The Economist – Feb 25th 2010

IDC

By 2020, business transactions on the internet (B2B and B2C) will reach 450 billion per day

The New York Stock Exchange captures 1 TB of trade information during each trading session

Wal-Mart handles more than 1 million customer transactions every hour

“The Big Vs of Big Data” – PROs 24th July 2013

Social Media Data Velocity

950 million users generate 2.7 billion likes on Facebook per day

400 million new tweets are created by users each day

2 million Google search queries per minute

24 Petabytes of data processed per day

(15)


Variety

Purchase Transactions

Website Traffic

Rewards Programs

Twitter

Facebook

Blog content

Personal Health Monitors

Videos

Email

Logfiles

Metering Data

Clickstreams

Business Reports

Mobile Data

Location data

Sensor data

(16)

Sensor Data

“The Four Vs of Big Data” – IBM – 25th July 2013

Modern cars have close to 100 sensors that monitor items such as fuel level and tyre pressure

The Large Hadron Collider has 150 million sensors delivering data 40 million times per second

Stephen Brobst, CTO Teradata – 2010; Sverre Jarp, CTO at CERN – 6th June 2013

Within the next five years, sensor data will hit the crossover point with unstructured data generated by social media. From there, the sensor data will dominate by factors 10-to-20 times that of social media.

Boeing 737 generates 240 Terabytes of flight data during a single transatlantic flight


(17)

Traditional Data Warehousing

Labour intensive: heavy indexing, aggregations and partitioning

Hardware intensive: massive storage; big servers

Expensive and complex

More Data, More Data Sources

More Kinds of Output Needed by More Users, More Quickly

Limited Resources and Budget

Real time data

Multiple databases

External Sources

Big Data Technology Challenge

Top Big Data Challenges

The biggest challenge for survey respondents was determining how to get value from big data

(18)

Big Data Investments on the Rise

“Big Data Adoption in 2013” - Gartner 12 September 2013

64% of survey respondents are investing or planning to invest in Big Data

Types of Data Analysed

“Big Data Adoption in 2013” - Gartner 12 September 2013

The top three types of data were transactions, log data and machine or sensor data

(19)

What about you?

Do you have a Big Data project running in your organisation?

Master Data & Big Data

How can they work together?

(20)

A Mismatched Pair?

Master Data

Relatively small

Highly structured

Domain specific

Non-transactional

Trusted

Big Data

Large volumes

Potentially unstructured

High velocity

Varied

Generally transactional

Questionable trustworthiness

Don’t try putting Big Data through your MDM solution

(21)

MDM as a search index for big data

Big Data sources may contain new insights but they are often hard to identify and place quickly and cost-efficiently.

If you want to perform targeted analysis on Big Data, you need to know what you’re looking for.

MDM is used to guide big data analysis

Example – Understanding Customer Interactions

Customer Services

Customers

Social Media
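One way to picture MDM as a search index is to use identifiers held in the master record to pick the relevant items out of a much larger stream. The sketch below assumes the master customer record stores known social media handles; the record layout, customer ids, handles and posts are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical trusted master data, keyed by MDM customer id.
MASTER_CUSTOMERS = {
    "C001": {"name": "Bob Smith", "handles": {"@bobsmith71"}},
    "C002": {"name": "Jane Doe", "handles": {"@jane_doe"}},
}


def build_index(customers):
    """Map each known handle (lower-cased) to its master customer id."""
    return {
        handle.lower(): customer_id
        for customer_id, record in customers.items()
        for handle in record["handles"]
    }


def link_posts(posts, index):
    """Keep only posts written by known customers, grouped by customer id."""
    linked = defaultdict(list)
    for post in posts:
        customer_id = index.get(post["author"].lower())
        if customer_id is not None:
            linked[customer_id].append(post["text"])
    return dict(linked)


# Hypothetical slice of a social media feed.
POSTS = [
    {"author": "@BobSmith71", "text": "My widget arrived broken"},
    {"author": "@someone_else", "text": "Nice weather today"},
    {"author": "@jane_doe", "text": "Great customer service!"},
]

interactions = link_posts(POSTS, build_index(MASTER_CUSTOMERS))
print(interactions)
```

The master data stays small and trusted; the big data source is only filtered and linked against it, never loaded into the MDM hub.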

(22)

Extracting Master Data from Big Data

Augment traditional information data with dynamically derived data from Big Data sources

Distil the data down to have meaning

Enhance the “360 degree view” of MDM

Example – Is this customer a safe driver?

Hire Car

7 10

Customer

Sensor Data
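The safe-driver example can be sketched as distilling many raw sensor readings into one derived attribute on the customer record. Everything here is an assumption made for illustration: the reading fields, the scoring rule (share of readings with no speeding and no harsh braking, scaled to 0-10) and the `driver_score` attribute name.

```python
def driver_score(readings):
    """Distil raw hire-car readings into a 0-10 score (hypothetical rule)."""
    if not readings:
        return None  # nothing to derive a score from
    safe = sum(
        1
        for r in readings
        if r["speed_kmh"] <= r["limit_kmh"] and not r["harsh_brake"]
    )
    return round(10 * safe / len(readings))


def enrich(customer, readings):
    """Return a copy of the master record with the derived attribute added."""
    return {**customer, "driver_score": driver_score(readings)}


# Hypothetical telemetry: 7 calm readings, 2 speeding, 1 harsh braking event.
READINGS = (
    [{"speed_kmh": 48, "limit_kmh": 50, "harsh_brake": False}] * 7
    + [{"speed_kmh": 65, "limit_kmh": 50, "harsh_brake": False}] * 2
    + [{"speed_kmh": 30, "limit_kmh": 50, "harsh_brake": True}]
)

customer = {"customer_id": "C001", "name": "Bob Smith"}
print(enrich(customer, READINGS))
```

Only the distilled score reaches the master record; the raw readings stay in the big data store.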

(23)

Is there a connection between Big Data and MDM?

The Information Difference - September 2012

44% of survey respondents believe there is a significant connection between MDM and Big Data

Link Specifics

67% of survey respondents believe the link is from MDM to Big Data.

17% believe the link is from Big Data to MDM.

(24)

What about you?

Do you consider Social Media to be the most important Big Data to your organisation?

What about you?

Are you currently linking, or do you have any plans to link, Master Data Management and Big Data in your organisation?

(25)

Information Builders

38 years of expertise

1,350 dedicated professionals

60 offices worldwide

Tens of thousands of customers

Millions of users

Our Mission:

To provide the best software and services for business intelligence, analytics and information management

Transform data into business value

Allow every stakeholder to make better decisions

Inject valuable insight throughout your business

(26)

Information Builders

The Information Stack

Business Intelligence, Advanced Analytics, Performance Management

Integration Infrastructure, Data Integration, Universal Adapter Suite

Data Quality Management, Master Data Management, Data Governance

Information Builders

A unique and Complete Solution
