
MB8002 week1


Academic year: 2020


Murtaza Haider


THE BIG PLAN FOR TODAY

Introductions

Who am I?

Who are you?

Why the book?

Why are you here?

Course Introduction


MY JOURNEY WITH DATA SCIENCE

Civil Engineering

Journalism

Nesbitt Burns

U of Toronto

Housing & Transportation

McGill

Infrastructure, land development &amp; logistics

Travel demand models

Ryerson

Supply chain

Housing


“This book may reduce the scarcity of data scientists, but it will certainly increase their value. It teaches many things, but most importantly it teaches how to tell a story with data.”

— Thomas H. Davenport,

Distinguished Professor, Babson College; Research Fellow, MIT; author of Competing on Analytics and Big Data @ Work

How is this book different?

1. It’s not trying to turn you into a statistician.
2. It repeats the important lessons.
3. It believes analytics are performed to tell fascinating stories.
4. It teaches you three things:


R and RStudio


WHAT IS DATA SCIENCE?

• Data science is what data scientists do.

• Who are data scientists?

• I define a data scientist as someone who finds solutions to problems by analyzing big or small data using appropriate tools and then tells stories to communicate her findings to the relevant stakeholders.

• I do not use data size as a restrictive criterion. Working with data below a certain arbitrary size threshold does not make one less of a data scientist.

• Nor is my definition of a data scientist restricted to particular analytic tools, such as machine learning.

• As long as one has a curious mind, fluency in analytics, and the ability to communicate the findings, I consider the person a data scientist.

• Harvard Business Review called data science


BUSINESS ANALYSTS/DATA SCIENTISTS

While the world is awash with large volumes of data, inexpensive computing power, and vast amounts of digital storage, the skilled workforce capable of analyzing data and interpreting it is in short supply.

A 2011 McKinsey Global Institute report suggests that “the


WHO USES DATA?

EVERYONE

Getting Started with Data Science (GSDS) is an applied text on analytics written for professionals like Chelsea Clinton, who either perform or manage analytics for small and large corporations.

• Ms. Clinton credited statistical analysis software (Stata) with helping her to “absorb information more quickly and mentally sift through and catalog it.”

• The text is equally appealing to those who would like to develop skills in analytics to pursue a career as a data analyst


TEACHING PHILOSOPHY

• Unlike academic research, industry research delivers reports that often have only three key ingredients:
    - Summary tabulations
    - Insightful graphics
    - Narrative

• A review of the reports produced by industry leaders, such as PricewaterhouseCoopers, Deloitte, and large commercial banks, revealed that most used simple analytics, i.e., summary tabulations and insightful graphics, to present data-driven findings.

• Industry reports seldom highlighted advanced statistical models or similar techniques. Instead, they focused on creative prose that told stories from data.

• GSDS recognizes that most working analysts will not be required to generate reports with advanced statistical methods, but instead will be expected to summarize data in tables and charts (graphics) and wrap these up in convincing narratives.

• Thus, GSDS extensively uses graphs and tables to
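A summary tabulation of the kind listed above can be sketched in a few lines. The Python below is only an illustration, not code from the book or the course: the records, the field names `division` and `eval`, and the helpers `summarize` and `format_table` are hypothetical, chosen to show group counts and group means laid out as a small text table.

```python
from collections import defaultdict


def summarize(records, key, value):
    """Group records (dicts) by `key`; return {group: (count, mean of `value`)}."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[value])
    # Sort groups so the table rows appear in a stable, alphabetical order.
    return {g: (len(vals), sum(vals) / len(vals)) for g, vals in sorted(groups.items())}


def format_table(summary, key, value):
    """Render the summary as a fixed-width text table: group, count, mean."""
    lines = [f"{key:<10}{'n':>4}{'mean ' + value:>12}"]
    for group, (n, avg) in summary.items():
        lines.append(f"{group:<10}{n:>4}{avg:>12.2f}")
    return "\n".join(lines)


# Hypothetical records, for illustration only.
records = [
    {"division": "lower", "eval": 3.8},
    {"division": "lower", "eval": 4.0},
    {"division": "upper", "eval": 4.2},
    {"division": "upper", "eval": 4.4},
    {"division": "upper", "eval": 4.0},
]

summary = summarize(records, "division", "eval")
print(format_table(summary, "division", "eval"))
```

The same grouping-and-averaging step is what a pivot table or a cross-tabulation menu in SPSS or R Commander performs; the table, not the code, is what ends up in the report.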


THE STORYTELLING DIFFERENTIATOR

This book is as much about storytelling as it is about analytics. I believe that a data scientist is a person who uses data and analytics to find solutions to problems, and then uses the findings to tell the most convincing and compelling story.

I believe that unless a data scientist is willing to tell the story, she will remain in a back-office job where others will use her analytics and findings to build the narrative and receive praise, and in time, promotions.

Storytelling is, in fact, the final and most important stage of analytics. Therefore, successful communication of findings to stakeholders is as


BACK TO STORYTELLING

• Storytelling is equally important to the biggest big data firm in the world.

• “Google has a very data-led culture. But we care just as much about the storytelling...”

• Lorraine Twohill, Google’s senior vice president of global marketing in 2014.

• Twohill believes “getting the storytelling right, and having the substance and the authenticity in the storytelling, is as respected internally as [is] the return and the impact.”

• “If you fail on the messaging and storytelling, all that those tools will get you are a lot of bad impressions.”

• “[T]here is one very important aspect we look for, which perhaps differentiates a data analyst from other technologists. It exponentially improves their career prospects if they can match this technical, data-geek knowledge with great communication and presentation skills.”


America’s chief data scientist, D.J. Patil.

White House Office of Science and Technology Policy.

A “data scientist is that unique blend of skills that can both unlock the


GROWING DATA PAINS

Our digital footprint has expanded rapidly over the past 10 years.

The size of the digital universe was roughly 130 billion gigabytes in 2005.


• SAP, a leader in data and analytics, reported from a survey that 92% of the responding firms in its sample had experienced a significant increase in their data holdings.

• At the same time, three-quarters of the respondents identified the need for new data science skills in their firms.

• Accenture believes that the demand for data scientists outstripped supply by 250,000 in 2015 alone.

• A similar survey of 150 executives by KPMG in 2014 found that 85% of the respondents did not know how to analyze data.

• “Most organizations are unable to connect the dots because they do not fully understand how data and analytics can transform their business.”

• Alwin Magimay, head of digital and


WHAT’S IN IT FOR YOU

• Realizing the demand sooner than other universities, North Carolina State University launched a Master’s in Analytics degree in 2007.

• Michael Rappa, director of the Institute for Advanced Analytics, told the New York Times that each one of the 84 graduates of the class of 2012 received a job offer.

• Graduates without experience earned $89,000 on average, and experienced graduates netted more than $100,000.

• Galvanize: its data science course runs for 12 weeks


• Google Flu forecasts

• Target predicting teenage pregnancies

• Another example of big data hubris dates back to 1936, when Alfred Landon, a Republican, was contesting the American presidential election against F.D. Roosevelt.

• The Literary Digest decided to survey 10 million individuals, who constituted one-fourth of the electorate, about their choice of presidential candidate.

• The Digest compiled 2.4 million responses received in the mail and claimed that Alfred Landon would win by a landslide. It predicted that Landon would receive 55% of the vote and Roosevelt 41%.

• F.D. Roosevelt won by a landslide, securing 61% of the vote.

• George Gallup, a pollster, conducted a small
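The Digest’s failure is a sampling-bias problem: a huge sample does not help if the people who reply differ systematically from the electorate. The Python sketch below illustrates this under hypothetical assumptions and is not a reconstruction of the 1936 poll. It treats the race as two-way with Roosevelt at the 61% share given above, assumes Landon supporters reply at twice the rate of Roosevelt supporters (40% vs. 20%, invented numbers), and compares that large, biased poll with a simple random sample of 3,000 voters (also an invented size). If a fraction p of voters back Roosevelt and his supporters reply with probability r_R while Landon’s reply with probability r_L, Roosevelt’s share among the replies is p*r_R / (p*r_R + (1 - p)*r_L); the helper `expected_respondent_share` computes exactly that.

```python
import random

# Roosevelt's actual 1936 vote share, as stated on the slide. The race is
# simplified to two candidates (an assumption; third parties are ignored).
TRUE_ROOSEVELT_SHARE = 0.61


def expected_respondent_share(p, reply_r, reply_l):
    """Roosevelt share among replies when Roosevelt supporters reply with
    probability reply_r and Landon supporters with probability reply_l."""
    return p * reply_r / (p * reply_r + (1 - p) * reply_l)


def biased_poll(n_contacted, reply_r, reply_l, seed=1):
    """Contact n_contacted voters; whether each one replies depends on whom
    they support. Returns (estimated Roosevelt share, number of replies)."""
    rng = random.Random(seed)
    roosevelt = landon = 0
    for _ in range(n_contacted):
        supports_roosevelt = rng.random() < TRUE_ROOSEVELT_SHARE
        reply_prob = reply_r if supports_roosevelt else reply_l
        if rng.random() < reply_prob:
            if supports_roosevelt:
                roosevelt += 1
            else:
                landon += 1
    replies = roosevelt + landon
    return roosevelt / replies, replies


def random_poll(n, seed=2):
    """Simple random sample: every sampled voter replies, so no group is over-represented."""
    rng = random.Random(seed)
    roosevelt = sum(rng.random() < TRUE_ROOSEVELT_SHARE for _ in range(n))
    return roosevelt / n


# Hypothetical response rates: Landon supporters reply twice as often.
big_share, replies = biased_poll(200_000, reply_r=0.2, reply_l=0.4)
small_share = random_poll(3_000)
print(f"biased poll: {replies} replies, Roosevelt share {big_share:.3f}")
print(f"predicted by response rates: "
      f"{expected_respondent_share(TRUE_ROOSEVELT_SHARE, 0.2, 0.4):.3f}")
print(f"random poll: 3000 replies, Roosevelt share {small_share:.3f}")
```

With these made-up response rates, the large poll settles near 44% for Roosevelt no matter how many replies it collects, while the small random sample lands within a couple of points of 61%. More data shrinks the random noise but leaves the bias untouched.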


Do attractive professors get better teaching evaluations?

Are religious individuals more or less likely to have extramarital affairs?

What motivates one to start smoking?

What determines housing prices more: lot size or the number of bedrooms?

How do teenagers and older people differ in the way they use social media?

Who is more likely to use online dating services?

Why do some people purchase iPhones and others BlackBerry phones?


Data from the University of Texas at Austin

98 instructors

463 courses

Teaching evaluation scores recorded by the students

The Beauty Panel

A panel of six students rated the professors’ beauty

Caveats

Beauty Evaluations done by a separate group

Teaching effectiveness might depend upon the following:

Knowledge of the subject matter

Eagerness and enthusiasm to transfer knowledge

The ability to make complex appear simple

Respect for learners
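One common way to frame “Do attractive professors get better teaching evaluations?” is a simple linear regression of the evaluation score on the beauty score. The Python sketch below fits that line by ordinary least squares. The five (beauty, evaluation) pairs are hypothetical and are not taken from the TeachingRatings data, and the function name `ols_fit` is illustrative. A positive slope would indicate only an association: the factors listed above (subject knowledge, enthusiasm, making the complex simple, respect for learners) are not in the model, so the slope cannot be read as the causal effect of appearance.

```python
def ols_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: centered beauty scores and course evaluation scores.
beauty = [-1.0, -0.5, 0.0, 0.5, 1.0]
evaluation = [3.7, 3.8, 4.1, 4.0, 4.4]

intercept, slope = ols_fit(beauty, evaluation)
print(f"eval = {intercept:.2f} + {slope:.2f} * beauty")
```

Because the beauty scores here are centered at zero, the intercept is simply the average evaluation; in R Commander the equivalent step is fitting a linear model of `eval` on `beauty`.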


Dataset: TeachingRatings.rda

DESCRIPTION:

Data on course evaluations, course characteristics, and professor characteristics for 463 courses for the academic years 2000-2002 at the University of Texas at Austin.

FORMAT:

A data frame containing 463 observations on 13 variables.

BEAUTY

Rating of the instructor’s physical appearance by a panel of six students, averaged across the six panelists, shifted to have a mean of zero.

EVAL


MINORITY (factor)

Does the instructor belong to a minority (non-Caucasian)?

AGE

Professor’s age.

GENDER (factor)

Instructor’s gender.

CREDITS (factor)

Is the course a single-credit elective (e.g., yoga, aerobics, dance)?

DIVISION (factor)

Is the course an upper- or lower-division course? (Lower-division courses are mainly large freshman and sophomore courses.)

NATIVE (factor)

Is the instructor a native English speaker?

TENURE (factor)

Is the instructor on tenure track?

STUDENTS

Number of students who participated in the evaluation.

ALLSTUDENTS

Number of students enrolled in the course.

PROF
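The BEAUTY variable is described above as the average of six panelists’ ratings, shifted to have a mean of zero. A minimal Python sketch of that construction, assuming each observation carries exactly six panel ratings and that the shift is taken over all observations passed in; the function name `centered_beauty` and the ratings below are hypothetical and are not drawn from the dataset.

```python
def centered_beauty(panel_ratings, n_panelists=6):
    """Average each observation's panel ratings, then subtract the overall
    mean of those averages so the resulting scores have a mean of zero."""
    averages = []
    for ratings in panel_ratings:
        if len(ratings) != n_panelists:
            raise ValueError(f"expected {n_panelists} ratings, got {len(ratings)}")
        averages.append(sum(ratings) / len(ratings))
    overall = sum(averages) / len(averages)
    return [avg - overall for avg in averages]


# Hypothetical ratings from six panelists for three instructors.
ratings = [
    [6, 5, 7, 6, 5, 7],  # average 6
    [4, 3, 4, 5, 4, 4],  # average 4
    [8, 7, 8, 9, 8, 8],  # average 8
]
scores = centered_beauty(ratings)
print(scores)
```

Centering does not change how instructors rank against each other; it only moves the zero point to the average instructor, so a positive score means “rated above average” and a regression intercept becomes the predicted outcome for an average-looking instructor.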


We will use the following tools:

SPSS

R

R Commander

RStudio

Data Scientist Workbench

Help with software

https://sites.google.com/site/statsr4us/intro/software

Learning R

https://sites.google.com/site/statsr4us/intro/software/learning-r

First Session in R Cmdr

References
