Statistics Meets Big Data
Dr. Aijun Zhang
Spring 2016@HKBU
STAT3980/MATH4875 Overview
Course Title:
Selected Topics in Statistics – Statistics in Banking and Finance
Course Objective:
This course introduces senior students to statistical methods and their applications in banking and finance. Real case studies will be discussed. R/Spark/Python programming techniques will be introduced so that students can gain hands-on experience with data analytics.
Class Schedule:
Assessment
No. | Assessment Method | Weighting | Remarks
1 | Continuous Assessment | 30% | In-class assignments (about 3) to help you practice the basic concepts.
2 | Mini-project | 30% | Group project (groups of 2 to 3 students) during the second half of the course. You are expected to work independently on real datasets. Each group will deliver a written report with an oral presentation.
3 | Final Examination | 40% | Final examination to see how far you have achieved the intended learning outcomes, especially in the knowledge domain. You are expected to have a thorough understanding of some important statistical methods and machine learning techniques in banking and finance.
Course Outline
Part I: Statistics Meets Big Data
A. Statistics as Data Science
B. Exploratory Data Analysis
C. Basic Statistical Models
D. Machine Learning
E. Distributed Computing
Part II: Banking and Finance Applications
A. Quantitative Risk Management
B. Credit Scoring
C. Credit Risk Modeling
D. Rise of Model Risk Management
E. Other Miscellaneous Topics
Reference Texts
HKBU Library Online Access (3rd edition)
Part I: Statistics Meets Big Data
A. Statistics as Data Science
B. Exploratory Data Analysis
C. Basic Statistical Models
D. Machine Learning
What is Statistics?
Statistics is the science of learning from data. It covers the collection, organization, analysis, interpretation, and presentation of data, as well as the planning of data collection through the design of surveys and experiments. (See Wikipedia.)
A Brief History
• Unlike mathematics, which has a long history, statistics is said to have started around 1749.
• The term "statistics" originally designated the systematic collection of demographic and economic data by states.
• Later it broadened to cover the collection, summary, and analysis of data.
• Today, statistics is widely employed in government, business, and all the sciences.
What do statisticians do?
Job Types (What my stats friends are doing):
Financial Analyst/Quant/Programmer on Wall Street, at banks, hedge funds, etc.
Data Analyst/Statistician/Scientist at Google, Yahoo!, LinkedIn
Consultant/Data Specialist/Analyst at McKinsey, IBM
Academic roles in Universities and Research Institutes
Job Market:
NYT 2009 article: For Today's Graduate, Just One Word: Statistics
"I keep saying that the sexy job in the next 10 years will be statisticians," said Hal Varian, chief economist at Google. "And I'm not kidding."
McKinsey 2011 Report: Big data: The next frontier for competition
Best Jobs by CareerCast.com
Rank | 2011 | 2012 | 2013 | 2014 | 2015
1 | Software Engineer | Software Engineer | Actuary | Mathematician | Actuary
2 | Mathematician | Actuary | Biomedical Engineer | University Professor | Audiologist
3 | Actuary | HR Manager | Software Engineer | Statistician | Mathematician
4 | Statistician | Dental Hygienist | Audiologist | Actuary | Statistician
5 | Comp. Systems Analyst | Financial Planner | Financial Planner | Audiologist | Biomedical Engineer
6 | Meteorologist | Audiologist | Dental Hygienist | Dental Hygienist | Data Scientist
7 | Biologist | Occupational Therapist | Occupational Therapist | Software Engineer | Dental Hygienist
8 | Historian | Online Ads Manager | Optometrist | Comp. Systems Analyst | Software Engineer
9 | Audiologist | Comp. Systems Analyst | Physical Therapist | Occupational Therapist | Occupational Therapist
10 | Dental Hygienist | Mathematician | Comp. Systems Analyst | Speech Pathologist | Comp. Systems Analyst
Top 10 reasons to be a statistician
1. Statisticians are significant.
2. Estimating parameters is easier than dealing with real life.
3. I always wanted to learn the entire Greek alphabet.
4. The probability a statistics major will get a job is > .9999.
5. If I flunk out I can always transfer to Engineering.
6. We do it with confidence, frequency, and variability.
7. You never have to be right - only close.
8. We're normal and everyone else is skewed.
9. The regression line looks better than the unemployment line.
10. No one knows what we do so we are always right.
More Statistical Jokes
• There are three kinds of lies: lies, damned lies, and statistics.
• Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital.
• I asked a statistician for her phone number... and she gave me an estimate.
• Three statisticians went out hunting, and came across a large deer. The first statistician fired, but missed, by a meter to the left. The second statistician fired, but also missed, by a meter to the right. The third statistician didn't fire, but shouted in triumph, "On the average we got it!"
Statistics vs. Probability
The two topics used to be studied together; however, statistics and probability are two separate disciplines:
• Probability deals with predicting the likelihood of future events.
• Statistics deals with analysis of the frequency of past events.
• Probability is primarily a theoretical branch of mathematics, which studies the consequences of mathematical definitions.
• Statistics has evolved into an independent science, which tries to make sense of observations in the real world.
• See Wikipedia for a list of probability topics.
• See Wikipedia for a list of statistics topics.
Statistics vs. Probability
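In plain terms: probability starts from a known distribution of black and white stones in a bucket and asks what a handful drawn from it will look like (for example, how likely is it to draw a white stone, or a black one?). Statistics works in the opposite direction: from the outcomes of many handfuls drawn, it infers the distribution of black and white stones in the bucket.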
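The bucket of black and white stones can be sketched as a small simulation. The following is only an illustration, not part of the original slides: the bucket composition (30 white, 70 black), the handful size, the number of handfuls, the random seed, and the helper name draw_handful are all assumptions chosen for the example. The first half answers the probability question (composition known, what do we expect in hand?); the second half answers the statistics question (handfuls observed, what is in the bucket?).

```python
import random


def draw_handful(n_white, n_black, k, rng):
    """Draw k stones without replacement and return how many are white."""
    bucket = ["white"] * n_white + ["black"] * n_black
    return rng.sample(bucket, k).count("white")


rng = random.Random(2016)  # fixed seed (arbitrary) so the run is repeatable

# Hypothetical bucket and handful size, assumed for illustration only.
n_white, n_black, k = 30, 70, 10

# Probability: the composition is known, so we can reason forward.
p_white = n_white / (n_white + n_black)   # chance a single stone is white
expected_white = k * p_white              # expected white stones per handful

# Statistics: pretend the composition is unknown and reason backward
# from the observed handfuls to an estimate of the white fraction.
handfuls = [draw_handful(n_white, n_black, k, rng) for _ in range(2000)]
estimated_p = sum(handfuls) / (k * len(handfuls))

print(f"Known white fraction (probability view): {p_white:.3f}")
print(f"Expected white stones in a handful of {k}: {expected_white:.1f}")
print(f"Estimated white fraction from data (statistics view): {estimated_p:.3f}")
```

With many handfuls the estimate should land close to the true fraction, which is the sense in which statistics "recovers" the bucket from the draws.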
Statistics as Data Science
• Google Trends: data mining, data science, machine learning, big data (search terms compared)
• Statistics is the science of dealing with data, learning from data, and extracting meaning from data.
• Data science is more demanding. It lies at the intersection of statistics/mathematics, hacking skills, and substantive expertise; see Drew Conway's Venn diagram for a detailed explanation.
Statistical applications in diverse fields
• Statistics is used wherever data exist. Its fields of application are many and very diverse.
• John Tukey (1915 – 2000): The best thing about being a statistician is that you get to play in everyone’s backyard.
• Long list of fields of application of statistics: Actuarial Science, Agriculture, Bioinformatics, Biostatistics, Business Intelligence, Chemometrics, Clinical Trials, Communication Studies, Econometrics, Engineering, Environmetrics, Finance, Genetics, Geostatistics, Hedge Funds, Information Technology, Insurance, Management Science, Manufacturing, Marketing, Medical Statistics,
Pharmaceutics, Physics, Politics, Process Control, Psychometrics, Public Health, Quality and Productivity, Reliability, Risk