Jeremy Pritchard
What happens when Big Data and Master Data come together?
Master Data Management
What is Master Data?
Master data is the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise including customers, prospects, citizens, suppliers, sites, hierarchies and chart of accounts.
Gartner
Master data is data that is shared by multiple computer systems.
The Information Difference
Master data is information that is key to the operation of a business…persistent, non-transactional data that defines a business entity for which there is, or should be, an agreed-upon view across the organisation.
Wikipedia
Master data is often one of the key assets of a company.
Microsoft
What is Master Data Management?
Master data management is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise’s official shared master data assets.
Gartner
Master Data Management comprises a set of processes, governance, policies, standards and tools that consistently defines and manages the master data.
Wikipedia
The creation of:
The Golden Record
Single Version of the Truth
Types of data in an organisation
Unstructured
Found in e-mail, white papers, magazine articles, corporate intranet portals, product specifications, marketing collateral, and PDF files
Transactional
Related to sales, deliveries, invoices, trouble tickets, claims, and other monetary and non-monetary interactions
Hierarchical
Stores the relationships between other data, such as company organisational structures or product lines
Master
The critical nouns of a business, which fall generally into the groupings: people, places and things
Metadata
Data about other data, including: report definitions, column descriptions in a database, log files, connections, and configuration files
The What, Why, and How of Master Data Management – Microsoft November 2006
Understanding Master Data
Think of nouns and verbs
Bob Smith buys a widget (SKU #A1234) and ships it to his home address.
The master data elements are the nouns: people, things, and places.
The transactional data elements are the verbs that describe what happens to those people, places, and things.
widget (SKU #A1234)
Bob Smith home address
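The nouns-and-verbs split can be sketched as a small data model. This is only an illustration: the class names, the identifiers such as `C001` and `ADDR001`, and the purchase date are assumptions, not part of the original example.

```python
from dataclasses import dataclass
from datetime import date


# Master data: the "nouns" (people, things, places). Long-lived and shared.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str


@dataclass(frozen=True)
class Product:
    sku: str
    description: str


@dataclass(frozen=True)
class Address:
    address_id: str
    label: str


# Transactional data: the "verbs". Each event only references master records.
@dataclass(frozen=True)
class Purchase:
    customer_id: str
    sku: str
    ship_to_address_id: str
    purchased_on: date


bob = Customer(customer_id="C001", name="Bob Smith")       # hypothetical ID
widget = Product(sku="A1234", description="widget")
home = Address(address_id="ADDR001", label="Bob Smith home address")

purchase = Purchase(
    customer_id=bob.customer_id,
    sku=widget.sku,
    ship_to_address_id=home.address_id,
    purchased_on=date(2013, 1, 1),  # illustrative date, not from the slide
)
print(purchase)
```

The transaction carries only keys, so a change to Bob's name or address is made once in the master record rather than in every purchase.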
Deciding what Master Data should be Managed
Generally speaking, master data should meet the following requirements:
Reuse
Value
Volatility
Cardinality
Lifetime
Volatility
On any given day:
21,994 will change their address
3,112 will change their name
46,152 will change jobs
1,920 will change their address
32 will change their name
1,200 will change their telephone number
896 directorship changes will occur
96 new businesses will start
Better Information through Master Data Management – MDM as a Foundation for BI – Oracle September 2011
Master Data Management
CRM Marketing ERP WMS Financial
Name: Bob Smith; Tel: 01323-456842; DOB: (blank); Gender: Male
Name: Smith, Bob; Tel: (01283)56982; DOB: 23/10/1971; Gender: (blank)
Name: B Smith; Tel: (0)1323456842; DOB: 23-Oct-71; Gender: M
Name: Bob Smith; Tel: 01323 456842; DOB: (blank); Gender: M
Name: B Smith; Tel: 01323 456842; DOB: 23/10/71; Gender: M
Name: Bob Smith; Tel: 01283 56982; DOB: 23/10/71; Gender: (blank)
Name: Bob Smith; Tel: 01323 456842; DOB: 23/10/71; Gender: M
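The records above suggest a standardise-then-merge flow that ends in a single golden record. The following is a minimal sketch of that idea, not a description of any particular MDM product. It assumes the three source records have already been matched to the same person, and the formatting rules and the "most common non-empty value wins" survivorship rule are illustrative choices.

```python
import re
from collections import Counter
from datetime import datetime

# The three differently formatted source records for the same person.
RAW_RECORDS = [
    {"name": "Bob Smith", "tel": "01323-456842", "dob": "", "gender": "Male"},
    {"name": "Smith, Bob", "tel": "(01283)56982", "dob": "23/10/1971", "gender": ""},
    {"name": "B Smith", "tel": "(0)1323456842", "dob": "23-Oct-71", "gender": "M"},
]

# Date layouts seen in the sources (month names assume an English locale).
DATE_FORMATS = ("%d/%m/%Y", "%d/%m/%y", "%d-%b-%y")


def standardise_name(name):
    # "Smith, Bob" -> "Bob Smith"; other forms are left unchanged.
    if "," in name:
        surname, forename = [part.strip() for part in name.split(",", 1)]
        return f"{forename} {surname}"
    return name.strip()


def standardise_tel(tel):
    digits = re.sub(r"\D", "", tel)
    # Assumed layout: five-digit area code, a space, then the local number.
    return f"{digits[:5]} {digits[5:]}" if digits else ""


def standardise_dob(dob):
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(dob, fmt).strftime("%d/%m/%y")
        except ValueError:
            continue
    return ""  # empty or unrecognised value


def standardise_gender(gender):
    return gender.strip()[:1].upper()


def standardise(record):
    return {
        "name": standardise_name(record["name"]),
        "tel": standardise_tel(record["tel"]),
        "dob": standardise_dob(record["dob"]),
        "gender": standardise_gender(record["gender"]),
    }


def golden_record(records):
    # Survivorship rule (an assumption): most common non-empty value per field.
    golden = {}
    for field in ("name", "tel", "dob", "gender"):
        values = [r[field] for r in records if r[field]]
        golden[field] = Counter(values).most_common(1)[0][0] if values else ""
    return golden


standardised = [standardise(r) for r in RAW_RECORDS]
golden = golden_record(standardised)
print(golden)
```

With these inputs the merged result matches the final record shown above. Real implementations usually add a matching step and richer survivorship rules, such as preferring the most trusted or most recently updated source.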
Master Data Management Architectures
Consolidated
• Master is Single Version of Truth
• Data Quality at Master
• Updates occur at Sources
• Updates propagated to Master
Coexistence
• Master is Single Version of Truth
• Data Quality is ongoing
• Updates occur at Sources or Master
• Updates propagated to other Sources
Registry
• Multiple Versions of Truth
• Data Quality is ongoing
• Updates occur at Sources
• Keys and Metadata in Registry
• Updates optionally propagated to other Sources
Centralised
• Master is Single Version of Truth
• Data Quality at Master
• Updates occur at Master
• Updates propagated to Sources
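The four styles differ mainly in where updates are made and where they flow. As a rough sketch, the bullet points can be captured in a small lookup structure; the class and field names here are hypothetical and simply restate the bullets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MdmStyle:
    name: str
    single_version_of_truth: bool
    data_quality: str
    updates_occur_at: frozenset
    updates_propagated_to: str


STYLES = [
    MdmStyle("Consolidated", True, "at master",
             frozenset({"sources"}), "master"),
    MdmStyle("Coexistence", True, "ongoing",
             frozenset({"sources", "master"}), "other sources"),
    MdmStyle("Registry", False, "ongoing",
             frozenset({"sources"}), "other sources (optional)"),
    MdmStyle("Centralised", True, "at master",
             frozenset({"master"}), "sources"),
]


def styles_updated_at(location, styles=STYLES):
    """Return the names of the styles in which updates occur at `location`."""
    return [s.name for s in styles if location in s.updates_occur_at]


print(styles_updated_at("master"))   # ['Coexistence', 'Centralised']
print(styles_updated_at("sources"))  # ['Consolidated', 'Coexistence', 'Registry']
```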
The Current Landscape of MDM Systems
Aberdeen Group – April 2012
45% of survey respondents have no formal MDM system
Reported Success of MDM Programs
Information Difference – July 2012
45% of survey respondents said their projects were successful or very successful
Key Domains to be Managed
Information Difference – July 2012
The top two domains were customers and products
What about you?
Do you have a Master Data Management solution running in your organisation?
Big Data
What is Big Data?
a term applied to voluminous data objects that are variety in nature – structured, unstructured or a semi-structured, including sources internal or external to an organisation, and generated at a high degree of velocity with some level uncertainty pattern, that does not fit neatly into traditional, structured, relational data stores and requires strong sophisticated information ecosystem with high performance computing platform and analytical capabilities to capture, process, transform, discover and derive insights with some level of confidence and accuracy to provide business value within a reasonable elapsed time.
The Big Data Institute (TBDI)
high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Gartner
"Big Data" describes data sets so large and complex they are impractical to manage with traditional software tools.
Wikipedia
What is Big Data?
63% of IT and business executives are not familiar with the phrase
“Worldwide Big Data Ecosystem” – IDC 30th July 2013
Moore’s Law
The number of transistors on integrated circuits doubles approximately every two years.
Gordon Moore's 1965 paper noted that the number of components in integrated circuits had doubled every year from the invention of the integrated circuit in 1958 until 1965, and predicted that the trend would continue "for at least ten years".
The capabilities of many digital electronic devices are strongly linked to Moore's law: processing speed, memory capacity, sensors and even the number and size of pixels in digital cameras.
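The doubling rule is simple exponential growth. A minimal sketch, where the function name and the starting count of 1,000 are purely illustrative:

```python
def projected_count(initial_count, years, doubling_period_years=2.0):
    """Project a count that doubles once every `doubling_period_years`."""
    return initial_count * 2 ** (years / doubling_period_years)


# Doubling every two years: ten years means five doublings (a factor of 32).
print(projected_count(1000, 10))  # 32000.0

# Doubling every year, as observed from 1958 to 1965: seven doublings.
print(projected_count(1, 7, doubling_period_years=1.0))  # 128.0
```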
Three Dimensions of Big Data
Volume Velocity Variety
Data Size
Gigabyte: 10^9 = 1,000,000,000
Terabyte: 10^12 = 1,000,000,000,000
Petabyte: 10^15 = 1,000,000,000,000,000
Exabyte: 10^18 = 1,000,000,000,000,000,000
Zettabyte: 10^21 = 1,000,000,000,000,000,000,000
Yottabyte: 10^24 = 1,000,000,000,000,000,000,000,000
Brontobyte: 10^27 = 1,000,000,000,000,000,000,000,000,000
Global Data
“The Big Vs of Big Data” – PROs 24th July 2013 & “The Four Vs of Big Data” – IBM – 25th July 2013
2.5 Exabytes are created in the digital universe every day
2,500,000,000,000,000,000 Bytes
2.3 Trillion Gigabytes
… has been created in the last 2 years
“The Big Vs of Big Data” – PROs 24th July 2013
Global Data Volume (in Zettabytes)
“The Big Vs of Big Data” – PROs 24th July 2013 & “The Four Vs of Big Data” – IBM – 25th July 2013
2005: 0.13
2011: 1.4
2012: 2.7
2015: 8
2020: 40
1 Zettabyte = 1 trillion Gigabytes = 1 billion Terabytes
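These unit relationships are easy to check with a few lines of arithmetic. The sketch below assumes decimal (power-of-ten) units throughout; the `convert` helper is hypothetical. One thing it shows: 2.5 exabytes works out at about 2.5 billion decimal gigabytes (roughly 2.3 billion binary ones), so the "2.3 trillion gigabytes" daily figure as quoted does not follow from the byte count.

```python
# Decimal unit sizes in bytes.
BYTES_PER_UNIT = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
    "yottabyte": 10**24,
    "brontobyte": 10**27,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount of data between two decimal units."""
    return amount * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]


print(convert(1, "zettabyte", "gigabyte"))   # 1e12, i.e. 1 trillion
print(convert(1, "zettabyte", "terabyte"))   # 1e9, i.e. 1 billion
print(convert(2.5, "exabyte", "gigabyte"))   # 2.5e9, i.e. 2.5 billion
```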
Three Dimensions of Big Data
Volume Velocity Variety
Data Velocity
By 2020, business transactions on the internet (B2B and B2C) will reach 450 billion per day
The New York Stock Exchange captures 1 TB of trade information during each trading session
Wal-Mart handles more than 1 million customer transactions every hour
“The Four Vs of Big Data” – IBM – 25th July 2013
The Economist – Feb 25th 2010
IDC
“The Big Vs of Big Data” – PROs 24th July 2013
Social Media Data Velocity
950 million users generate 2.7 billion likes on Facebook per day
400 million new tweets are created by users each day
2 million Google search queries per minute
24 Petabytes of data processed per day
Three Dimensions of Big Data
Volume Velocity Variety
Variety
Purchase Transactions
Website Traffic
Rewards Programs
Blog content
Personal Health Monitors
Videos
Email
Logfiles
Metering Data
Clickstreams
Business Reports
Mobile Data
Location data
Sensor data
Sensor Data
Modern cars have close to 100 sensors that monitor items such as fuel level and tyre pressure
“The Four Vs of Big Data” – IBM – 25th July 2013
The Large Hadron Collider has 150 million sensors delivering data 40 million times per second
Sverre Jarp: CTO at CERN – 6th June 2013
Within the next five years, sensor data will hit the crossover point with unstructured data generated by social media. From there, the sensor data will dominate by factors 10-to-20 times that of social media.
Stephen Brobst, CTO Teradata - 2010
A Boeing 737 generates 240 Terabytes of flight data during a single transatlantic flight
Three Dimensions of Big Data
Volume Velocity Variety
Traditional Data Warehousing
Labour intensive: heavy indexing, aggregations and partitioning
Hardware intensive: massive storage; big servers
Expensive and complex
More Data, More Data Sources
More Kinds of Output Needed by More Users, More Quickly
Limited Resources and Budget
Real time data
Multiple databases
External Sources
Big Data Technology Challenge
Top Big Data Challenges
The biggest challenge for survey respondents was determining how to get value from big data
Big Data Investments on the Rise
“Big Data Adoption in 2013” - Gartner 12 September 2013
64% of survey respondents are investing or planning to invest in Big Data
Types of Data Analysed
“Big Data Adoption in 2013” - Gartner 12 September 2013
The top three types of data were transactions, log data and machine or sensor data
What about you?
Do you have a Big Data project running in your organisation?
Master Data & Big Data
How can they work together?
A Mismatched Pair?
Master Data
• Relatively small
• Highly structured
• Domain specific
• Non-transactional
• Trusted
Big Data
• Large volumes
• Potentially unstructured
• High velocity
• Varied
• Generally transactional
• Questionable trustworthiness
Don’t try putting Big Data through your MDM solution
MDM as a search index for big data
Big Data sources may contain new insights but they are often hard to identify and place quickly and cost-efficiently.
If you want to perform targeted analysis on Big Data, you need to know what you’re looking for.
MDM is used to guide big data analysis
Example – Understanding Customer Interactions
Customer Services
Customers
Social Media
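One way to picture MDM guiding big data analysis is to use the identifiers held in the customer master as a lookup into the interaction streams. The sketch below is hypothetical throughout: the customer records, handles, e-mail addresses and events are made-up sample data, and real matching is usually fuzzier than an exact key lookup.

```python
from collections import defaultdict

# Hypothetical master customer records with known identifiers.
MASTER_CUSTOMERS = [
    {"customer_id": "C001", "name": "Bob Smith",
     "twitter": "@bobsmith", "email": "bob.smith@example.com"},
    {"customer_id": "C002", "name": "Jane Jones",
     "twitter": "@janej", "email": "jane.jones@example.com"},
]

# Hypothetical interaction events from social media and customer services.
EVENTS = [
    {"source": "social_media", "handle": "@BobSmith",
     "text": "My widget arrived late"},
    {"source": "customer_services", "handle": "bob.smith@example.com",
     "text": "Where is my order?"},
    {"source": "social_media", "handle": "@someone_else",
     "text": "Nice weather today"},
]


def build_identifier_index(customers):
    """Map every known identifier (lower-cased) to its master customer ID."""
    index = {}
    for customer in customers:
        for key in ("twitter", "email"):
            index[customer[key].lower()] = customer["customer_id"]
    return index


def interactions_by_customer(events, index):
    """Keep only events that match a master identifier, grouped by customer."""
    grouped = defaultdict(list)
    for event in events:
        customer_id = index.get(event["handle"].lower())
        if customer_id is not None:
            grouped[customer_id].append(event)
    return dict(grouped)


index = build_identifier_index(MASTER_CUSTOMERS)
matched = interactions_by_customer(EVENTS, index)
for customer_id, events in matched.items():
    print(customer_id, [e["source"] for e in events])
```

Events that match no master identifier are dropped, which is the point: the master data tells the analysis what to look for.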
Extracting Master Data from Big Data
Augment traditional information data with dynamically derived data from Big Data sources
Distil the data down to have meaning
Enhance the “360 degree view” of MDM
Example – Is this customer a safe driver?
Hire Car
7 10
Customer
Sensor Data
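Distilling sensor data into a master data attribute might look like the sketch below. Everything in it is an assumption made for illustration: the reading fields, the speed and braking thresholds, the scoring rule and the `driver_safety_score` attribute name.

```python
def safety_score(readings, speed_limit_kph=110, harsh_brake_g=0.4):
    """Distil raw readings into a 0-10 score: the share of non-risky readings."""
    if not readings:
        return None
    risky = sum(
        1 for r in readings
        if r["speed_kph"] > speed_limit_kph or r["brake_g"] > harsh_brake_g
    )
    return round(10 * (len(readings) - risky) / len(readings))


# Hypothetical hire-car telemetry: ten readings, three of them risky.
readings = (
    [{"speed_kph": 95, "brake_g": 0.1}] * 7
    + [{"speed_kph": 130, "brake_g": 0.1}] * 2   # speeding
    + [{"speed_kph": 60, "brake_g": 0.6}]        # harsh braking
)

# Hypothetical master record, enriched with the derived attribute.
customer = {"customer_id": "C001", "name": "Bob Smith"}
customer["driver_safety_score"] = safety_score(readings)
print(customer["driver_safety_score"])  # 7
```

Only the distilled score is stored against the customer; the raw sensor readings stay in the big data store.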
Is there a connection between Big Data and MDM?
The Information Difference - September 2012
44% of survey respondents believe there is a significant connection between MDM and Big Data
Link Specifics
67% of survey respondents believe the link is from MDM to Big Data.
17% believe the link is from Big Data to MDM.
What about you?
Do you consider Social Media to be the most important Big Data to your organisation?
What about you?
Are you currently linking, or do you have any plans to link, Master Data Management and Big Data in your organisation?
Information Builders
38 years of expertise
1,350 dedicated professionals
60 offices worldwide
Tens of thousands of customers
Millions of users
Our Mission:
To provide the best software and services for business intelligence, analytics and information management
Transform data into business value
Allow every stakeholder to make better decisions
Inject valuable insight throughout your business
Information Builders
The Information Stack
Business Intelligence, Advanced Analytics, Performance Management
Integration Infrastructure, Data Integration, Universal Adapter Suite, Data Quality Management, Master Data Management, Data Governance