
A RoadMap to Data Science. Dr. Geoffrey Malafsky CEO, Phasic Systems Inc


Academic year: 2021


(1)

Dr. Geoffrey Malafsky

CEO, Phasic Systems Inc.

www.phasicsystemsinc.com

(2)

•  Geoffrey Malafsky, Ph.D., Founder and CEO, former scientist

▫  Nanotechnology researcher (Naval Research Laboratory)

▫  Technology advisor and sleuth

–  DARPA

–  MEMS

–  Situational Awareness via real-time information fusion

–  Office of Naval Research

–  MEMS

–  Littoral sensors

–  Dept of Energy: Nanotechnology dual use

▫  Applying science to difficult data challenges as consultant,

(3)

What is Data Science?

•  Latest in long line of “hot” IT topics

▫  IT follows Neil Young: “It is better to burn out than it is to rust”

•  Data Science is different from past IT hot spots

▫  “Science” binds it to a well-structured culture, procedures, and ethics

▫  Science is fundamentally rigorous in maintaining auditable, open lineage of data collection, data rationalization, data analysis, theory comparison, adjudicating possible scenarios, and making conclusions

•  Data Science is not analytics, Business Intelligence,

(4)

Big or Small Data: It Is the Quality That Counts

Social media analysis,

“Big Beautiful Data: See Our Social Exchange from Twitter to CNN”, Kristina Farrah, 2 April 2012,

(5)

Data Science As A Form of Science Study

•  Scientific method (Encyclopædia Britannica, Inc.)

Mathematical and experimental techniques employed in the natural sciences; more specifically, techniques used in the construction and testing of scientific hypotheses. Many empirical sciences, especially the social sciences, use mathematical tools borrowed from probability theory and statistics, together with such outgrowths of these as decision theory, game theory, utility theory, and operations research. Philosophers of science have addressed general methodological problems, such as the nature of scientific explanation and the justification of induction.

(6)

Data Science From A Practitioner

•  Mike Loukides, “What is Data Science?”, 2 June 2010, http://radar.oreilly.com/

▫  Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: “here’s a lot of data, what can you make from it?”

(7)

Our Data Science Principles

•  Data Science is the field applying the scientific method to data collection, management, analysis, and reporting as a single integrated environment for general business purposes

•  Rely on well-known and practiced methods of data collection, correction, integration, pedigree tracking, quality assurance, statistical analysis, model design and testing, tabular and graphical presentation, and visible traceability of conclusions through all analysis and conclusion steps
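Pedigree tracking and visible traceability can be illustrated with a very small sketch. It is not taken from the slides: the `TrackedValue` class, the step names, and the sensor reading are all hypothetical, and a real environment would record far more (who, when, which source version).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TrackedValue:
    """A value plus the ordered list of steps that produced it (hypothetical helper)."""
    value: float
    lineage: List[str] = field(default_factory=list)

    def apply(self, step_name: str, fn: Callable[[float], float]) -> "TrackedValue":
        # Return a new object so every earlier state stays available for audit.
        return TrackedValue(fn(self.value), self.lineage + [step_name])


# Hypothetical example: a raw reading in feet, corrected and then converted.
raw = TrackedValue(100.0, ["collected: source=sensor_A"])
corrected = raw.apply("corrected: subtract offset 2.0", lambda v: v - 2.0)
metres = corrected.apply("converted: feet to metres", lambda v: v * 0.3048)

# Every conclusion can be traced back through the steps that led to it.
for step in metres.lineage:
    print(" -", step)
```

Because each step returns a new object, the raw and corrected values remain intact alongside the final one, which is the property an audit needs.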

(8)

Data Science Roadmap

•  Understand what it is and is not (ignore the cacophony of charlatans and certificate mills)

•  Identify high-value insights (not BI or reports) that your C-level executives want and can turn into action

▫  This makes Data Science applied instead of basic

•  Start small; plan a big win; find a senior management champion; don’t wait for organizational clearance (they are waiting for you to succeed or fail first); be prepared for significant resistance and civil disobedience (work around)

•  Continuously communicate that the win is a win for everyone and no one has to give up control

•  Package results in extremely pretty and informative visualizations

(9)

Foundations

•  Data collection

▫  Multi-source: warehouse, external structured sets, unstructured high volume (email, social media), images, sensors, metadata

▫  Multi-format

▫  Raw versus refined and corrected

•  Data rationalization

▫  Continuous cleaning, correcting, aligning, adjudicating

▫  Little errors grow exponentially; little garbage in → large

(10)

•  Data analysis

▫  Multi-technique: statistics, models, graphical, linear/non-linear equations

▫  Understand the scope, limits, and biases of each technique, especially statistics (be skeptical)

•  Making conclusions

▫  You will likely be wrong 80% of the time – this is a good thing

▫  Keep it to yourself until you challenge, probe, rebuke, debunk

▫  Make sure you can support every contention you make with hard facts and figures, or clear, valid analysis steps

•  Presenting results

▫  Show the main results as simply as possible

(11)

Focus on Data Rationalization

•  Most data environments are badly misaligned with semantically unknown relationships and value conflicts

•  There will never be perfect data but you cannot even start doing analysis until you control your data and understand the good, bad, and untrusted

•  Data Rationalization is the process of building and managing a continuously adaptive data environment that fuels current and future business needs for decision

(12)
(13)
(14)

The Ψ–KORS™ System Model

(15)
(16)

Example solution:

1.  Create table – title aligned to business = Garage

2.  Create vocabulary for distinct use cases (system, value analysis, business use) = (spaces, spaces.description, spaces.national, spaces.state, listingservice, ….)

3.  Define ETL logic

4.  Merge in warehouse and process in virtualization layer

5.  Change as needed
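Steps 2 to 4 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Ψ–KORS implementation: the vocabulary terms are the ones named on the slide, while the source names (`mls`, `county`), the source field names, and the functions `to_garage_row` and `merge_rows` are hypothetical. The virtualization layer is not modeled.

```python
# Vocabulary terms named on the slide; the slide's list is elided, so only
# the terms that actually appear there are included.
VOCABULARY = {"spaces", "spaces.description", "spaces.national",
              "spaces.state", "listingservice"}

# Hypothetical ETL mapping: (source system, source field) -> vocabulary term.
FIELD_MAP = {
    ("mls", "GarageSpaces"): "spaces",
    ("mls", "GarageDesc"): "spaces.description",
    ("mls", "ListingSvc"): "listingservice",
    ("county", "garage_cnt"): "spaces",
}
assert set(FIELD_MAP.values()) <= VOCABULARY


def to_garage_row(source, record):
    """Translate one source record into a row of the business-aligned Garage table."""
    row, unmapped = {}, []
    for name, value in record.items():
        term = FIELD_MAP.get((source, name))
        if term is None:
            unmapped.append(name)   # keep track of fields with no agreed meaning
        else:
            row[term] = value
    row["_source"] = source
    row["_unmapped"] = unmapped
    return row


def merge_rows(rows):
    """Merge rows from several sources, recording value conflicts instead of overwriting."""
    merged, conflicts = {}, []
    for row in rows:
        for term, value in row.items():
            if term.startswith("_"):
                continue            # skip bookkeeping keys
            if term in merged and merged[term] != value:
                conflicts.append((term, merged[term], value))
            else:
                merged[term] = value
    return merged, conflicts


mls = to_garage_row("mls", {"GarageSpaces": 2, "GarageDesc": "attached", "Agent": "x"})
county = to_garage_row("county", {"garage_cnt": 3})
merged, conflicts = merge_rows([mls, county])
print(merged)
print(conflicts)
```

Keeping the mapping in a plain table rather than in code is what makes step 5 cheap: changing the vocabulary or adding a source means editing `FIELD_MAP`, not rewriting the ETL logic. Conflicts are surfaced for adjudication rather than resolved silently.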

(17)

Summary

•  Data Science is new and exciting

•  It is an excellent career opportunity for explorers with discipline and a continuous zeal for investigation and uncovering important new insights

•  To succeed, the result must be important to a senior decision maker

▫  Get champion at beginning by making business case of big win for small investment

▫  Expect resistance and work to turn nay into yay with constant “no one loses” communication

(18)

More

•  Look for in-depth learning webinar on Data Science and Data Rationalization

•  New PSI-KORS Foundation will promulgate non-commercial use of Ψ–KORS metamodel and Corporate NoSQL

•  Contact us to bring success into your career and
