The Lab and The Factory

Academic year: 2021

(1)

Architecting for Big Data Management

April Reeve

(2)

“A good speech should be like a woman's skirt: long enough to cover the subject and short enough to create interest.”

(3)

April Reeve

• Twenty-five years doing data-oriented stuff

• Data Management disciplines – Data Integration, Data Governance, Data Modeling, Data Quality, Business Intelligence, Master Data Management, Data Conversion, Data Warehousing, Enterprise Content Management, Big Data Management

• Currently implementing Data Governance programs and developing Big Data Strategies for Life Sciences and Financial Services organizations

• Certifications –

– Certified Data Management Professional (DAMA)

– Certified Data Governance and Stewardship Professional (DGSP)

– Certified Business Intelligence Professional (CBIP)

– Certified in Enterprise Governance of IT (ISACA)

– Certified Information Systems Auditor (ISACA)

• Master's degree in Financial Management (financial risk management, derivatives

(4)

Agenda

Big Data

The Data Scientist environment for predictive analytics – the Lab

Operationalizing predictions – the Factory

How does it fit with legacy data management

(5)

Analytics Maturity

(6)

• Volume: data volumes approaching multiple petabytes

• Velocity: data being generated and ingested for analysis in real-time

• Variety: tabular, documents, e-mail, metering, network, video, image, audio

• Complexity: different standards, domain rules, and storage formats per data type

More than just about data volume, smart big data strategies also consider the velocity, variety, and complexity of information

[Figure: Volume, Velocity, Variety, and Complexity, with example sources Transactional Data, Documents, Smart Grid, and Images. Callouts: "New insights on customers, products, and operations" and "Contextual and location-aware delivery to any device". Source: Gartner, March 2011]

(7)

Big Data Goal:

More, Faster, Better Data for Purpose

Area        Revolution
Latency     “No time to read. In-memory is the new DB”
Enrichment  “Tagging is the new Transformation”
Query       “Federated Query is the new ETL”
Purpose     “Purposeful View is the new Master”
Analytics   “Predictive is the new Reactive”

(8)

Predictive Analytics

The Data Scientist chooses internal and external data (lots of it!) and throws it into an Analytical Sandbox

The Data Scientist identifies patterns in the data and develops predictive models of behavior involving combining historical information concerning a

(9)

What is Data Science?

Data Science refers to the scientific method:

The scientist (Data Scientist) develops a hypothesis (model of behavior)

Using a large amount of historical data and statistical
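
To make the idea of a hypothesis tested against historical data concrete, here is a minimal sketch. It is not taken from the presentation: the data is synthetic, and the names `fit_line`, `mean_abs_error`, `visits`, and `spend` are hypothetical. The hypothesis ("spend grows linearly with visits") is fitted on one part of the history and then checked on records it has not seen, against a baseline that assumes no relationship.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_abs_error(model, xs, ys):
    """Average absolute prediction error of a (slope, intercept) model."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in zip(xs, ys))


# Synthetic "historical" data (an assumption for illustration only).
rng = random.Random(0)
visits = [rng.uniform(1, 50) for _ in range(200)]
spend = [3.0 * v + 10.0 + rng.gauss(0, 2.0) for v in visits]

# Develop the hypothesis on the first 150 records, test it on the last 50.
train_x, test_x = visits[:150], visits[150:]
train_y, test_y = spend[:150], spend[150:]

model = fit_line(train_x, train_y)
baseline = (0.0, mean(train_y))  # "no relationship" hypothesis

model_error = mean_abs_error(model, test_x, test_y)
baseline_error = mean_abs_error(baseline, test_x, test_y)
print("hypothesis beats baseline:", model_error < baseline_error)
```

The hold-out split is the point of the sketch: a model is only treated as supported if it predicts history it was not fitted on better than the naive alternative.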

(10)
(11)

Leveraging Big Data for Action

The organization develops software which populates models using historical customer information and installs it into the operational reporting environment

Real time processing combines customer information

(12)

Leveraging Big Data for Action –

(13)

Big Data Analytics Architecture

• In “Big Data” management we need:

• A “Lab” or “Sandbox” environment that is very dynamic and can be used by the Data Scientists to throw in or throw away massive amounts of structured and unstructured data against which to do analysis, find patterns and insights, and develop models
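
One possible way to picture the hand-off from the Lab to a production environment is sketched below. This is an illustrative assumption, not the architecture from the slides: the churn rule, the JSON artifact format, and the names `lab_find_threshold`, `lab_export`, and `ChurnScorer` are all hypothetical. The Lab side is quick and disposable; the production side loads a versioned artifact and validates its inputs.

```python
import json
from dataclasses import dataclass
from statistics import mean


# --- Lab: quick, disposable exploration --------------------------------
def lab_find_threshold(history):
    """history: list of (days_inactive, churned) pairs (hypothetical data).

    Returns the midpoint between the average inactivity of churned
    customers and the average inactivity of retained customers.
    """
    churned = [d for d, c in history if c]
    retained = [d for d, c in history if not c]
    return (mean(churned) + mean(retained)) / 2


def lab_export(threshold):
    """Serialize the finding as a small, versioned artifact."""
    return json.dumps(
        {"model": "churn_threshold", "version": 1, "threshold": threshold}
    )


# --- Production side: rigid, validated scoring --------------------------
@dataclass(frozen=True)
class ChurnScorer:
    threshold: float

    @classmethod
    def from_json(cls, payload):
        spec = json.loads(payload)
        if spec.get("model") != "churn_threshold":
            raise ValueError("unexpected model type")
        return cls(threshold=float(spec["threshold"]))

    def predict(self, days_inactive):
        """Return True if the customer is predicted to churn."""
        if days_inactive < 0:
            raise ValueError("days_inactive must be non-negative")
        return days_inactive > self.threshold


history = [(2, False), (5, False), (8, False), (30, True), (45, True), (60, True)]
artifact = lab_export(lab_find_threshold(history))
scorer = ChurnScorer.from_json(artifact)
print(scorer.predict(40), scorer.predict(10))
```

Keeping the artifact as plain data rather than shared code lets the exploratory side change freely while the scoring side stays stable and testable.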

(14)

New Data Hubs – The Analytical Sandbox & NoSQL Data Stores

[Diagram labels: Hadoop Data Store; Analytic Sandbox; Exploratory Analytic Environment; Structured BI Reporting Environment; Data Preparation and Enrichment; ALL data fed into Hadoop Data Store]

(15)

Data Latency Spectrum

Use Case                                              Time Interval
Ultra low latency messaging                           < 100 microseconds
Extreme transaction processing                        < 1 millisecond
Streaming data analysis; no intermediate persistence  < 100 milliseconds
Real time event characterization                      < 1 second
Complex event processing; near real-time dashboards   < 30 seconds
Operational dashboard                                 < 5 minutes
Intraday analysis                                     < 2 hours
Daily rollup                                          ≤ 24 hours
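
The spectrum above can be read as a lookup from a latency budget to the fastest tier it qualifies for. A minimal sketch follows; the function name `classify_latency` and the choice of seconds as the unit are assumptions made for illustration, while the bounds and labels are copied from the table.

```python
from bisect import bisect_right

# (upper bound in seconds, use case), in ascending order, from the table.
TIERS = [
    (100e-6, "Ultra low latency messaging"),
    (1e-3, "Extreme transaction processing"),
    (100e-3, "Streaming data analysis; no intermediate persistence"),
    (1.0, "Real time event characterization"),
    (30.0, "Complex event processing; near real-time dashboards"),
    (5 * 60.0, "Operational dashboard"),
    (2 * 3600.0, "Intraday analysis"),
    (24 * 3600.0, "Daily rollup"),
]
BOUNDS = [bound for bound, _ in TIERS]


def classify_latency(seconds):
    """Return the use case whose time interval contains `seconds`.

    All bounds are strict ("<") except the last, which is inclusive ("≤").
    Returns None for latencies longer than 24 hours.
    """
    if seconds < 0:
        raise ValueError("latency must be non-negative")
    # Number of bounds <= seconds; the next bound is the first one that
    # is strictly greater, which is the tier the latency falls into.
    index = bisect_right(BOUNDS, seconds)
    if index < len(TIERS):
        return TIERS[index][1]
    if seconds == BOUNDS[-1]:  # the last tier includes its upper bound
        return TIERS[-1][1]
    return None


print(classify_latency(0.05))
print(classify_latency(45))
```

A latency exactly on a boundary, such as 1 second, falls into the next slower tier because the table uses strict upper bounds everywhere except the daily rollup.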

(16)

Considerations in Organizing People

The Lab

• “In their search for new insights, data scientists write enormous quantities of code. But it is not designed to meet commercial standards for scalability, security, and stability. You create and support commercial-grade code in the factory.”

The Factory

• “The [Factory] requires many more people with a wider variety of skill sets, a more rigid environment, and different sorts of metrics…. To be clear, creativity and experimentation are important in the factory, but you must not expect more than incremental thinking and production-oriented solutions.”

From Article

(17)
(18)

Contact Information

April Reeve

– EMC Consulting – Enterprise Information Management Practice

[email protected] • +1 (201) 396-1831

• @Datagrrl on Twitter

• Blog - http://infocus.emc.com/april_reeve/

• Book – “Managing Data in Motion – Data Integration

(19)
