Dr. Geoffrey Malafsky
CEO, Phasic Systems Inc.
www.phasicsystemsinc.com
• Geoffrey Malafsky, Ph.D., Founder and CEO, former scientist
▫ Nanotechnology researcher (Naval Research Laboratory)
▫ Technology advisor and sleuth
DARPA
MEMS
Situational Awareness via real-time information fusion
Office of Naval Research
MEMS
Littoral sensors
Dept of Energy: Nanotechnology dual use
▫ Applying science to difficult data challenges as consultant,
What is Data Science?
• Latest in long line of “hot” IT topics
▫ IT follows Neil Young: “It is better to burn out than it is to rust”
• Data Science is different from past IT hot spots
▫ “Science” binds it to a well-structured culture, procedures, and ethics
▫ Science is fundamentally rigorous in maintaining auditable, open lineage of data collection, data rationalization, data analysis, theory comparison, adjudicating possible scenarios, and making conclusions
• Data Science is not analytics, Business Intelligence,
Big or Small Data: It Is the Quality That Counts
Social media analysis,
“Big Beautiful Data: See Our Social Exchange from Twitter to CNN”, Kristina Farrah, 2 April 2012,
Data Science As A Form of Science Study
• scientific method (Encyclopædia Britannica, Inc.)
mathematical and experimental techniques employed in the natural sciences; more specifically, techniques used in the construction and testing of scientific hypotheses. Many empirical sciences, especially the social sciences, use mathematical tools borrowed from probability theory and statistics, together with such outgrowths of these as decision theory, game theory, utility theory, and operations research. Philosophers of science have addressed general methodological problems, such as the nature of scientific explanation and the justification of induction.
Data Science From A Practitioner
• Mike Loukides, “What is Data Science?”, 2 June 2010, http://radar.oreilly.com/
▫ Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: “here’s a lot of data, what can you make from it?”
Our Data Science Principles
• Data Science is the field applying the scientific method to data collection, management, analysis, and reporting as a single integrated environment for general business purposes
• Rely on well-known and well-practiced methods of data collection, correction, integration, pedigree tracking, quality assurance, statistical analysis, model design and testing, tabular and graphical presentation, and visible traceability of conclusions through all analysis and conclusion steps
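One way to picture "pedigree tracking" and "visible traceability" is a running log that records every step applied to the data, together with a fingerprint of the result. The sketch below is only an illustration under stated assumptions: the `Lineage` class, the `fingerprint` helper, the step names, and the sample values are all hypothetical and do not come from the talk or from any Phasic Systems product.

```python
import hashlib
import json


def fingerprint(data) -> str:
    """Short, repeatable hash of JSON-serializable data (hypothetical helper)."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


class Lineage:
    """Carries data plus an auditable list of every step applied to it."""

    def __init__(self, data, origin):
        self.data = data
        self.steps = [
            {"step": "collect", "note": origin, "fingerprint": fingerprint(data)}
        ]

    def apply(self, name, func, note=""):
        """Run one step and record what was done and what the data became."""
        self.data = func(self.data)
        self.steps.append(
            {"step": name, "note": note, "fingerprint": fingerprint(self.data)}
        )
        return self


# Made-up raw readings, including whitespace and a non-numeric value.
run = Lineage([" 12", "7", "n/a", "30 "], origin="hypothetical sensor export")
run.apply("correct", lambda xs: [x.strip() for x in xs], "trim whitespace")
run.apply("filter", lambda xs: [int(x) for x in xs if x.isdigit()], "drop non-numeric")
run.apply("analyze", lambda xs: sum(xs) / len(xs), "mean of remaining values")

for entry in run.steps:
    print(entry["step"], entry["fingerprint"], entry["note"])
```

Each conclusion (here, the final mean) can then be traced back through the recorded steps to the raw collection, which is the kind of open lineage the principles above call for.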
Data Science Roadmap
• Understand what it is and is not (ignore the cacophony of charlatans
and certificate mills)
• Identify high-value insights (note: not BI or reports) that your C-level executives want and can turn into action
▫ This makes Data Science applied instead of basic
• Start small; plan a big win; find a senior management champion;
don’t wait for organizational clearance (they are waiting for you to succeed or fail first); be prepared for significant resistance and civil disobedience (work around)
• Continuously communicate that the win is a win for everyone and no
one has to give up control
• Package results in extremely pretty and informative visualizations
Foundations
• Data collection
▫ Multi-source: warehouse, external structured sets,
unstructured high volume (email, social media), images, sensors, metadata
▫ Multi-format
▫ Raw versus refined and corrected
• Data rationalization
▫ Continuous cleaning, correcting, aligning, adjudicating
▫ Little errors grow exponentially; little garbage in → large garbage out
• Data analysis
▫ Multi-technique: statistics, models, graphical, linear/non-linear equations
▫ Understand the scope, limits, and biases of each technique, especially statistics (be skeptical)
• Making conclusions
▫ You will likely be wrong 80% of the time – this is a good thing
▫ Keep it to yourself until you have challenged, probed, and tried to debunk it
▫ Make sure you can support every contention you make with hard facts and figures, or clear, valid analysis steps
• Presenting results
▫ Show the main results as simply as possible
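The "little errors grow" point under Data rationalization can be made concrete with a small calculation. The sketch assumes, purely for illustration, that each processing step independently corrupts a fixed fraction of the records it touches; the function name and the 1% rate are hypothetical, not figures from the talk. Under that assumption the share of still-correct records shrinks geometrically with the number of steps.

```python
def clean_fraction(error_rate: float, steps: int) -> float:
    """Fraction of records still correct after `steps` independent steps,
    each corrupting `error_rate` of the records (an illustrative assumption)."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return (1.0 - error_rate) ** steps


# A hypothetical 1% error per step, across pipelines of increasing length.
for steps in (1, 10, 50, 100):
    print(f"{steps:>3} steps: {clean_fraction(0.01, steps):.3f} of records still clean")
```

Even a small per-step error rate leaves well under half of the records clean after a long enough chain, which is why continuous cleaning and correcting sits ahead of analysis in the list above.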
Focus on Data Rationalization
• Most data environments are badly misaligned with
semantically unknown relationships and value conflicts
• There will never be perfect data, but you cannot even start doing analysis until you control your data and understand the good, bad, and untrusted
• Data Rationalization is the process of building and
managing a continuously adaptive data environment that fuels current and future business needs for decision
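A first practical step toward understanding "the good, bad, and untrusted" is simply finding where sources disagree. The following is a minimal sketch of value-conflict detection, not the Ψ–KORS method: the function name, the source names (`warehouse`, `crm`, `billing`), and the sample records are all hypothetical.

```python
from collections import defaultdict


def find_value_conflicts(records):
    """Group values by (key, field) and keep only the entries where sources disagree.

    `records` is an iterable of (source, key, field, value) tuples.
    Returns {(key, field): {value: [sources reporting that value]}}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for source, key, field, value in records:
        seen[(key, field)][value].append(source)
    return {kf: dict(values) for kf, values in seen.items() if len(values) > 1}


# Made-up records from three hypothetical systems.
records = [
    ("warehouse", "cust-1", "state", "VA"),
    ("crm", "cust-1", "state", "VA"),
    ("billing", "cust-1", "state", "MD"),
    ("warehouse", "cust-2", "state", "NY"),
    ("crm", "cust-2", "state", "NY"),
]

conflicts = find_value_conflicts(records)
for (key, field), values in conflicts.items():
    print(key, field, values)
```

The output lists only the disputed fields along with which source said what, leaving the adjudication itself to a person or a documented rule.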
The Ψ–KORS™ System Model
Example solution:
1. Create table – title aligned to business = Garage
2. Create vocabulary for distinct use cases (system, value analysis, business use) = (spaces, spaces.description, spaces.national, spaces.state, listingservice, ….)
3. Define ETL logic
4. Merge in warehouse and process in virtualization layer
5. Change as needed
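The steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the Ψ–KORS implementation: the vocabulary terms `spaces`, `spaces.description`, and `listingservice` come from the slide, while the source system names, their column names, the sample values, and both helper functions are hypothetical.

```python
# Maps each source system's columns onto the shared business vocabulary.
# Vocabulary terms are from the slide; systems and column names are made up.
VOCABULARY = {
    "system_a": {"num_spaces": "spaces", "space_desc": "spaces.description"},
    "system_b": {"SPACES_CT": "spaces", "LISTING_SVC": "listingservice"},
}


def to_garage_row(source, raw):
    """ETL step: rename one raw record's columns to vocabulary terms."""
    mapping = VOCABULARY[source]
    row = {term: raw[col] for col, term in mapping.items() if col in raw}
    row["_source"] = source  # keep the pedigree of the row
    return row


def merge_rows(rows):
    """Merge step: first value wins; disagreements are reported, not hidden."""
    merged, conflicts = {}, []
    for row in rows:
        for term, value in row.items():
            if term == "_source":
                continue
            if term not in merged:
                merged[term] = value
            elif merged[term] != value:
                conflicts.append((term, merged[term], value, row["_source"]))
    return merged, conflicts


a = to_garage_row("system_a", {"num_spaces": 2, "space_desc": "attached"})
b = to_garage_row("system_b", {"SPACES_CT": 3, "LISTING_SVC": "MLS"})
garage, conflicts = merge_rows([a, b])
print(garage)
print(conflicts)
```

Keeping the mapping in a plain dictionary is one way to support "change as needed": adding a source or a vocabulary term is a data edit rather than a code change. The first-value-wins rule is a placeholder for whatever adjudication policy the business actually adopts.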
Summary
• Data Science is new and exciting
• It is an excellent career opportunity for explorers with
discipline and a continuous zeal for investigation and uncovering important new insights
• To succeed, the result must be important to a senior
decision maker
▫ Get a champion at the beginning by making the business case for a big win from a small investment
▫ Expect resistance and work to turn nay into yay with constant
“no one loses” communication
More
• Look for in-depth learning webinar on Data Science and
Data Rationalization
• New PSI-KORS Foundation will promulgate
non-commercial use of Ψ–KORS metamodel and Corporate NoSQL
• Contact us to bring success into your career and