BIG DATA
DR. KLARA NELSON
THE UNIVERSITY OF TAMPA
TBTLA PRESENTATION AUGUST 14, 2014
8/14/14
WHAT IS BIG DATA?
"Volumes of data that are unusually large, or types of data that are unstructured"
Thomas Davenport, Keeping Up with the Quants, 2013, p. 6
“The emerging technologies and practices that enable the collection, processing, discovery, analysis, and storage of large volumes and disparate types of data, quickly and cost effectively.”
SAS Best Practices Team Definition http://tamaradull.com/2013/02/20/the-5-ws-what-is-big-data/
WHAT IS BIG DATA?
                   Big data               Traditional analytics
Type of data       Unstructured formats   Formatted in rows and columns
Volume of data     100 TB to PB           Tens of TB or less
Flow of data       Constant flow of data  Static pool of data
Analysis methods   Machine learning       Hypothesis-based
Primary purpose    Data-based products    Internal decision support and services
Source: Thomas Davenport, Big Data @ Work, 2014, Table 1-1, p. 4
THE 5 V'S OF BIG DATA
Volume
Data size
Variety
Many different types
Velocity
High-velocity capture, discovery, and/or analysis
Veracity
Quality / Trustworthiness
Value
01.ibm.com/software/data/bigdata/
http://www-05.ibm.com/fr/events/netezzaDM_2012/Solutions_Big_Data.pdf
TYPICAL DATA SET SIZE
Rexer Analytics (2013), "2013 Data Miner Survey - Summary Report", p. 31.
CUSTOMER TRANSACTIONS: #1 SOURCE OF LARGE DATA
Rexer Analytics (2013), "2013 Data Miner Survey - Summary Report", p. 9.
THE 5 V'S OF BIG DATA: VALUE
• Integrating ‘V’ – doing something valuable with the data, turning data into dollars
• Being able to translate massive amounts of data into real insights and realizing value from that insight
BIG DATA = BIG ROI
Healthcare
20% decrease in patient mortality by analyzing streaming patient data
Telco
92% decrease in processing time by analyzing networking and call data
Utilities
99% improved accuracy in placing power generation resources by analyzing 2.8 petabytes of untapped data
At UPS, using big data to shave ONE MILE off each DRIVER's ROUTE a day would save the firm $50 MILLION a year.
Healthcare, Telco, Utilities: http://www-01.ibm.com/software/data/bigdata/industry.html
UPS: Christian Science Monitor, Aug 12, 2013, p. 32
THE 8 MOST IN-DEMAND BIG DATA ROLES
Role                                           Average Annual Salary ($)
Visualization Tool Developers (Expert Level)   150,000 – 175,000
Hadoop Developers                              150,000 – 175,000
Data Scientists                                125,000 – 140,000
Information Architects                         113,750 – 135,350
ETL Developers                                 110,000 – 130,000
Predictive Analytics Developers                103,700 – 129,000
Data Warehouse Appliance Specialist            97,950 – 123,600
OLAP Developers                                97,900 – 115,550
http://www.computerworld.com/slideshow/detail/138836/The-8-most-in-demand-big-data-roles-#slide7 , February 17, 2014
THE BIG DATA LANDSCAPE
http://blogs-images.forbes.com/davefeinleib/files/2014/06/big-data-landscape-jul-4-2012-00111.png
WHAT IS BIG DATA TECHNOLOGY?
"Big data technology is capable of handling a lot of data. Big data handles data cheaply. Big data handles data in the form of unstructured strings of data. Big data does its searches independently. Big data is used to store and manage large amounts of data. That’s what big data is."
Bill Inmon
Source: "Big Data Technology Does Not Replace a Data Warehouse", http://www.b-eye-network.com/view/16714, January 10, 2013
TECHNOLOGIES: DATA WAREHOUSE VS. BIG DATA
Use the best tool for the job depending on the business requirements:
• Discovery of unexplored business questions
• Clean, consistent, high quality data
• Low latency, interactive reports, OLAP
• Raw unstructured data
• Analysis of preliminary data
Source: http://tamaradull.com/2013/03/20/the-5-ws-when-should-we-use-big-data-vs-data-warehousing-technologies/
WHICH DATA MINING/ANALYTIC TOOLS ARE USED?
The average data miner reports using 5 tools, but conducts 76% of their work in their primary tool.
Rexer Analytics (2013), "2013 Data Miner Survey - Summary Report", p. 31.
PREPARING STUDENTS TO WORK WITH BIG DATA
Analytics courses
• ITM 466 Business Intelligence and Analytics (Elective)
• ITM 615 Business Analytics (MBA Decision Analysis Elective)
Course topics
• Assessing analytics competencies of organizations (e.g., Davenport's DELTA)
• Analytical thinking stages
• Ethics of analytics / big data
• Data quality
• Data warehouses & other technologies
• Data mining methods
TECHNOLOGIES USED IN THE BUSINESS ANALYTICS COURSES
• SAP Business Objects
• Microsoft Excel
• Tableau Software
• SQL Server Data Tools for building analysis databases and data mining
• IBM SPSS Statistics Suite for research and analysis
• IBM SPSS Modeler for predicting future behavior (data mining)
• IBM SPSS Text Analytics for mining unstructured data sources
• IBM Digital Analytics (formerly Coremetrics Web Analytics)
DATA MINING ALGORITHMS DATA MINERS & ITM 466/615 STUDENTS ARE USING
denotes algorithms covered hands-on in ITM 466/615
Rexer Analytics (2013), "2013 Data Miner Survey - Summary Report", p. 36.
THE CHALLENGES OF BIG DATA & BIG DATA ANALYTICS
Delivering Value
"Through 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage." (Gartner)
Data
• Silos
• Quality
• Storage
Enterprise strategy
Talent
• Lack of IT/technical skills
• Lack of domain knowledge
• Lack of analytical thinking skills
Organizational culture
Technologies and tools
Big data as IT-driven projects
THE CHALLENGES OF BIG DATA AND BIG DATA ANALYTICS
Ethics
"A code of conduct to refer to in judging what is right and what is wrong" regarding the ways we
• gather data and intelligence
• use data and intelligence
• guide individual and organizational conduct through use of data and intelligence
Frank Buytendijk quotes on Analytics and Ethics from the TDWI Las Vegas 2012 World Conference
• "Are there things you shouldn't do?"
• "It seems like we are doing things because we can."
• "The key thing is that technology is answering questions that weren't even asked."
• "Tools are creating ethical issues, and we don't even have the mechanism to do something about it."