Murtaza Haider
Email: [email protected]
Welcome
Who am I?
Who are you?
Why the book?
Why are you here?
Because you love data
You want to learn a new language
Introducing Big Data
The Learning Path
“This book may reduce the scarcity of data scientists, but it will certainly increase their value.
It teaches many things, but most importantly it teaches how to tell a story with data.”
—Thomas H. Davenport,
Distinguished Professor, Babson College; Research Fellow, MIT; author of Competing on Analytics and Big Data @ Work
How is this book different?
1. It’s not trying to turn you into a statistician
2. It repeats the important lessons
3. It believes analytics are performed to tell fascinating stories
4. It teaches you three things:
Who are you? What do you know?
• In a world awash with Big Data and Analytics, businesses and institutions are increasingly competing on analytics. For this, they need professionals skilled in data/statistical analysis.
• McKinsey Global Institute estimates a shortage of hundreds of thousands of skilled data scientists.
• It’s time to say Hi to data science!
• The workshop provides hands-on training in skills necessary to be proficient in a data-centric world.
• Prerequisites: Curiosity, high-school math, prescribed book, a laptop computer, and willingness to learn R.
Our address on the web
http://tinyurl.com/r-analytics
What you need for today
Have R and RStudio installed on your device
Do good-looking people get higher salaries/promotions?
Hamermesh, Daniel S. and Amy M. Parker (2005). Beauty in the
Classroom: Instructors' Pulchritude and Putative Pedagogical
Size is the first, and at times, the only dimension that leaps out at the mention of big data.
We offered a broader definition of big data that captures its other unique and defining characteristics.
The rapid evolution and adoption of big data by industry has leapfrogged the discourse to popular outlets, forcing the academic press to catch up.
A particular distinguishing feature of this paper is its focus on analytics related to unstructured data, which constitute 95% of big data.
This paper highlights the need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats.
The heterogeneity, noise, and the massive size of structured big
Three dimensions
Volume
Variety
Velocity
Gartner, Inc. defines big data in similar terms:
“Big data is high-volume, high-velocity and high-variety information
assets that demand cost-effective, innovative forms of information
processing for enhanced insight and decision making.”
TechAmerica Foundation defines big data as follows:
“Big data is a term that describes large volumes of high velocity,
complex and variable data that require advanced techniques and
technologies to enable the capture, storage, distribution, management,
o Our digital footprint has expanded rapidly over the past 10 years.
o The size of the digital universe was roughly 130 billion gigabytes in 1995.
o By 2020, this number will swell to 40 trillion gigabytes.
o Companies will compete for hundreds of thousands, if not millions, of new workers needed to navigate the digital world.
o No wonder the prestigious Harvard Business Review called data
o A report by the McKinsey Global Institute warns of huge talent shortages for data and analytics.
o By 2018, the United States alone could face a shortage of:
  o 140,000 to 190,000 people with deep analytical skills
  o 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.
o SAP reported from a survey that 92% of the responding firms in its sample experienced a significant increase in their data holdings.
o At the same time, three-quarters identified the need for new data science skills in their firms.
o Accenture believes that the demand for data scientists may outstrip supply by 250,000 in 2015 alone.
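The digital-universe figures quoted above (roughly 130 billion gigabytes in 1995 and 40 trillion gigabytes by 2020) can be turned into a growth factor with a few lines of arithmetic. The sketch below takes the slide's numbers at face value. The variable names are illustrative, and the constant yearly growth rate is a simplifying assumption, not something the source states.

```python
# Figures quoted on the slide, taken at face value (in gigabytes).
size_1995_gb = 130e9   # roughly 130 billion gigabytes
size_2020_gb = 40e12   # 40 trillion gigabytes
years = 2020 - 1995

# How many times larger the 2020 figure is than the 1995 figure.
growth_factor = size_2020_gb / size_1995_gb

# Compound annual growth rate: the constant yearly rate that would turn
# the 1995 figure into the 2020 figure. This is an illustrative
# assumption, not a claim about how the data actually grew year by year.
annual_rate = growth_factor ** (1 / years) - 1

print(f"Growth factor: {growth_factor:.0f}x")
print(f"Implied annual growth: {annual_rate:.1%}")
```

On these figures the digital universe grows by a factor of about 308 over the period.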
1. Morning session
   1. 10:00 – 11:00
      1. Welcome and Introductions
      2. What is Big Data? (Chapters 1 and 2)
   2. 11:00 – 12:00
      1. R: Let’s learn a new language
   3. 12:00 – 12:45
      1. Lunch
2. Afternoon session
   4. 12:45 – 1:45
      1. Introducing data
      2. Data types and structures
   5. 1:45 – 2:45
      1. Serving Tables (Chapter 4)
   6. 2:45 – 3:00
      1. Break
   7. 3:00 – 4:00
      1. Graphic Details (Chapter 5)
   8. 4:00 – 4:45
      1. Regression Models: The mother of all analytics (Chapter 7)
   9. 4:45 – 5:00