
BIG DATA

DR. KLARA NELSON

THE UNIVERSITY OF TAMPA

TBTLA PRESENTATION AUGUST 14, 2014

8/14/14

WHAT IS BIG DATA?

"Volumes of data that are unusually large, or types of data that are unstructured"

Thomas Davenport, Keeping Up with the Quants, 2013, p. 6

“The emerging technologies and practices that enable the collection, processing, discovery, analysis, and storage of large volumes and disparate types of data, quickly and cost effectively.”

SAS Best Practices Team Definition http://tamaradull.com/2013/02/20/the-5-ws-what-is-big-data/


WHAT IS BIG DATA?

Characteristic | Big data | Traditional analytics
Type of data | Unstructured formats | Formatted in rows and columns
Volume of data | 100 TB to PB | Tens of TB or less
Flow of data | Constant flow of data | Static pool of data
Analysis methods | Machine learning | Hypothesis-based
Primary purpose | Data-based products | Internal decision support and services

Source: Thomas Davenport, Big Data @ Work, 2014, Table 1-1, p. 4
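The volume row can be made concrete with a little unit arithmetic. The sketch below is only an illustration, not part of Davenport's table: it assumes decimal units (1 PB = 1000 TB) and treats 100 TB as a hard cut-off, which is a simplification of the rough size bands above. The names to_terabytes, volume_category, and BIG_DATA_MIN_TB are hypothetical.

```python
# Decimal (SI) storage units: 1 PB = 1000 TB.
TB_PER_PB = 1000

# Assumed cut-off, read from the "100 TB to PB" entry in the table.
# The table gives rough bands, not a strict threshold.
BIG_DATA_MIN_TB = 100


def to_terabytes(amount: float, unit: str) -> float:
    """Convert an amount given in TB or PB to terabytes."""
    unit = unit.upper()
    if unit == "TB":
        return amount
    if unit == "PB":
        return amount * TB_PER_PB
    raise ValueError(f"unsupported unit: {unit}")


def volume_category(amount: float, unit: str) -> str:
    """Label a data volume using the table's rough size bands."""
    terabytes = to_terabytes(amount, unit)
    if terabytes >= BIG_DATA_MIN_TB:
        return "big data"
    return "traditional analytics"


if __name__ == "__main__":
    for amount, unit in [(30, "TB"), (100, "TB"), (2, "PB")]:
        print(amount, unit, "->", volume_category(amount, unit))
```

Volume is only one of the five rows, so a label based on size alone says nothing about the type of data, its flow, the analysis methods, or the purpose.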

THE 5 V'S OF BIG DATA

Volume

Data size

Variety

Many different types

Velocity

High-velocity capture, discovery, and/or analysis

Veracity

Quality / Trustworthiness

Value

01.ibm.com/software/data/bigdata/
http://www-05.ibm.com/fr/events/netezzaDM_2012/Solutions_Big_Data.pdf

TYPICAL DATA SET SIZE

Rexer Analytics (2013), "2013 Data Miner Survey - Summary Report", p. 31.

CUSTOMER TRANSACTIONS: #1 SOURCE OF LARGE DATA

Rexer Analytics (2013), "2013 Data Miner Survey - Summary Report", p. 9.

THE 5 V'S OF BIG DATA: VALUE

• Integrating 'V': doing something valuable with the data, turning data into dollars
• Being able to translate massive amounts of data into real insights and realizing value from that insight

BIG DATA = BIG ROI

Healthcare

20% decrease in patient mortality by analyzing streaming patient data

Telco

92% decrease in processing time by analyzing networking and call data

Utilities

99% improved accuracy in placing power generation resources by analyzing 2.8 petabytes of untapped data


At UPS, using big data to shave ONE MILE off each DRIVER's ROUTE a day would save the firm $50 MILLION a year.

Healthcare, Telco, Utilities: http://www-01.ibm.com/software/data/bigdata/industry.html
UPS: Christian Science Monitor, Aug 12, 2013, p. 32

THE 8 MOST IN-DEMAND BIG DATA ROLES

Role | Average Annual Salary ($)
Visualization Tool Developers (Expert Level) | 150,000 – 175,000
Hadoop Developers | 150,000 – 175,000
Data Scientists | 125,000 – 140,000
Information Architects | 113,750 – 135,350
ETL Developers | 110,000 – 130,000
Predictive Analytics Developers | 103,700 – 129,000
Data Warehouse Appliance Specialist | 97,950 – 123,600
OLAP Developers | 97,900 – 115,550

http://www.computerworld.com/slideshow/detail/138836/The-8-most-in-demand-big-data-roles-#slide7 , February 17, 2014

THE BIG DATA LANDSCAPE

http://blogs-images.forbes.com/davefeinleib/files/2014/06/big-data-landscape-jul-4-2012-00111.png

WHAT IS BIG DATA TECHNOLOGY?

"Big data technology is capable of handling a lot of data. Big data handles data cheaply. Big data handles data in the form of unstructured strings of data. Big data does its searches independently. Big data is used to store and manage large amounts of data. That's what big data is."

Bill Inmon

Source: "Big Data Technology Does Not Replace a Data Warehouse", http://www.b-eye-network.com/view/16714, January 10, 2013

TECHNOLOGIES: DATA WAREHOUSE VS. BIG DATA

Use the best tool for the job depending on the business requirements:

Data warehouse
• Clean, consistent, high quality data
• Low latency, interactive reports, OLAP

Big data
• Discovery of unexplored business questions
• Raw unstructured data
• Analysis of preliminary data

Source: http://tamaradull.com/2013/03/20/the-5-ws-when-should-we-use-big-data-vs-data-warehousing-technologies/


The average data miner reports using 5 tools, but conducts 76% of their work in their primary tool.

WHICH DATA MINING/ANALYTIC TOOLS ARE USED?

Rexer Analytics (2013), "2013 Data Miner Survey - Summary Report", p. 31.

PREPARING STUDENTS TO WORK WITH BIG DATA

Analytics courses

• ITM 466 Business Intelligence and Analytics (Elective)
• ITM 615 Business Analytics (MBA Decision Analysis Elective)

Course topics

• Assessing analytics competencies of organizations (e.g., Davenport's DELTA)

• Analytical thinking stages
• Ethics of analytics / big data
• Data quality
• Data warehouses & other technologies
• Data mining methods

TECHNOLOGIES USED IN THE BUSINESS ANALYTICS COURSES

• SAP Business Objects
• Microsoft Excel
• Tableau Software
• SQL Server Data Tools for building analysis databases and data mining
• IBM SPSS Statistics Suite for research and analysis
• IBM SPSS Modeler for predicting future behavior (data mining)
• IBM SPSS Text Analytics for mining unstructured data sources
• IBM Digital Analytics (formerly Coremetrics Web Analytics)

Rexer Analytics (2013), "2013 Data Miner Survey - Summary Report", p. 36.

DATA MINING ALGORITHMS DATA MINERS & ITM 466/615 STUDENTS ARE USING

denotes algorithms covered hands-on in ITM 466/615

THE CHALLENGES OF BIG DATA & BIG DATA ANALYTICS

Delivering Value

"Through 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage." (Gartner)

Data
• Silos
• Quality
• Storage

Enterprise strategy

Talent
• Lack of IT/technical skills
• Lack of domain knowledge
• Lack of analytical thinking skills

Organizational culture

Technologies and tools

Big data as IT-driven projects

THE CHALLENGES OF BIG DATA AND BIG DATA ANALYTICS

Ethics

"A code of conduct to refer to in judging what is right and what is wrong" regarding the ways we
• gather data and intelligence
• use data and intelligence
• guide individual and organizational conduct through use of data and intelligence

Frank Buytendijk quotes on Analytics and Ethics from the TDWI Las Vegas 2012 World Conference

• "Are there things you shouldn't do?"
• "It seems like we are doing things because we can."
• "The key thing is that technology is answering questions that weren't even asked."
• "Tools are creating ethical issues, and we don't even have the mechanism to do something about it."

THANK YOU!
