
Statistics Meets Big Data 統計遇見大數據


Academic year: 2021


(1)

Statistics Meets Big Data

統計遇見大數據

Dr. Aijun Zhang

Spring 2016@HKBU

(2)

STAT3980/MATH4875 Overview

Course Title

Selected Topics in Statistics – Statistics in Banking and Finance (銀行與金融中的統計應用)

Course Objective

This course aims to provide senior students with statistical methods and applications in banking and finance. Real case studies will be discussed. R/Spark/Python programming techniques will be introduced so that students can get some hands-on experience with data analytics.

Class Schedule

(3)

Assessment

No. | Assessment Method | Weighting | Remarks

1 | Continuous Assessment | 30% | In-class assignments (about 3 times) to help practice the basic concepts.

2 | Mini-project | 30% | Group project (2~3 students per group) during the 2nd half of the course. You are expected to work independently on real datasets. Each group will deliver a written report with an oral presentation.

3 | Final Examination | 40% | Final examination to see how far you have achieved the intended learning outcomes, especially in the knowledge domain. You are expected to have a thorough understanding of some important statistical methods and machine learning techniques in banking and finance.

(4)

Course Outline

Part I: Statistics Meets Big Data

A. Statistics as Data Science

B. Explorative Data Analysis

C. Basic Statistical Models

D. Machine Learning

E. Distributed Computing

Part II: Banking and Finance Applications

A. Quantitative Risk Management

B. Credit Scoring

C. Credit Risk Modeling

D. Rise of Model Risk Management

E. Other Miscellaneous Topics

(5)

Reference Texts

HKBU Library Online Access (3rd edition)

(6)

Part I: Statistics Meets Big Data

A. Statistics as Data Science

B. Explorative Data Analysis

C. Basic Statistical Models

D. Machine Learning

(7)

What is Statistics?

Statistics is the science of learning from data, and of the collection, organization, analysis, interpretation, and presentation of data. It also includes the planning of data collection in terms of the design of surveys and experiments. (See Wikipedia.)

(8)

A Brief History

Unlike mathematics, which has a long history, statistics is said to have started around 1749.

The term "statistics" originally designated the systematic collection of demographic and economic data by states.

Later it broadened to cover the collection, summary, and analysis of data.

Today, statistics is widely employed in government, business, and all the sciences.

(9)
(10)

What do statisticians do?

Job Types (What my stats friends are doing):

• Financial Analyst/Quant/Programmer at Wall Street firms, banks, hedge funds, etc.

• Data Analyst/Statistician/Scientist at Google, Yahoo!, LinkedIn

• Consultant/Data Specialist/Analyst at McKinsey, IBM

• Academic roles in universities and research institutes

Job Market:

NYT 2009 article: For Today’s Graduate, Just One Word: Statistics

"I keep saying that the sexy job in the next 10 years will be statisticians," said Hal Varian, chief economist at Google. "And I'm not kidding."

McKinsey 2011 Report: Big data: The next frontier for innovation, competition, and productivity

(11)

Best Jobs by CareerCast.com

Rank | 2011 | 2012 | 2013 | 2014 | 2015

1 | Software Engineer | Software Engineer | Actuary | Mathematician | Actuary

2 | Mathematician | Actuary | Biomedical Engineer | University Professor | Audiologist

3 | Actuary | HR Manager | Software Engineer | Statistician | Mathematician

4 | Statistician | Dental Hygienist | Audiologist | Actuary | Statistician

5 | Comp. Systems Analyst | Financial Planner | Financial Planner | Audiologist | Biomedical Engineer

6 | Meteorologist | Audiologist | Dental Hygienist | Dental Hygienist | Data Scientist

7 | Biologist | Occupational Therapist | Occupational Therapist | Software Engineer | Dental Hygienist

8 | Historian | Online Ads Manager | Optometrist | Comp. Systems Analyst | Software Engineer

9 | Audiologist | Comp. Systems Analyst | Physical Therapist | Occupational Therapist | Occupational Therapist

10 | Dental Hygienist | Mathematician | Comp. Systems Analyst | Speech Pathologist | Comp. Systems Analyst

(12)

Top 10 reasons to be a statistician

1. Statisticians are significant.

2. Estimating parameters is easier than dealing with real life.

3. I always wanted to learn the entire Greek alphabet.

4. The probability a statistics major will get a job is > .9999.

5. If I flunk out I can always transfer to Engineering.

6. We do it with confidence, frequency, and variability.

7. You never have to be right - only close.

8. We're normal and everyone else is skewed.

9. The regression line looks better than the unemployment line.

10. No one knows what we do so we are always right.

(13)

More Statistical Jokes

• There are three kinds of lies: lies, damned lies, and statistics.

• Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital.

• I asked a statistician for her phone number... and she gave me an estimate.

• Three statisticians went out hunting, and came across a large deer. The first statistician fired, but missed, by a meter to the left. The second statistician fired, but also missed, by a meter to the right. The third statistician didn't fire, but shouted in triumph, "On the average we got it!"

(14)

Statistics vs. Probability

The two topics used to be studied together; however, statistics and probability are two separate disciplines:

• Probability deals with predicting the likelihood of future events.

• Statistics deals with analysis of the frequency of past events.

Probability is primarily a theoretical branch of mathematics, which studies the consequences of mathematical definitions.

Statistics has evolved into an independent science, which tries to make sense of observations in the real world.

• See Wikipedia for a list of probability topics.

• See Wikipedia for a list of statistics topics.

(15)

Statistics vs. Probability

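In plain terms: probability starts from a known distribution of black and white stones in a bucket and asks what a handful drawn from it will look like (for example, how likely it is to draw a white stone or a black stone). Statistics starts from what was observed in many handfuls and infers the distribution of black and white stones in the bucket.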
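The bucket picture can be tried out in a few lines of Python. This is only a minimal sketch: the bucket contents (30 white and 70 black stones), the handful size of 5, and the function names are assumptions made up for illustration, not part of the course material. The first function takes the probability view (contents known, ask about a handful); the second takes the statistics view (contents unknown, estimate them from observed handfuls).

```python
import random
from math import comb


def prob_at_least_one_white(n_white, n_black, k):
    """Probability view: the bucket contents are known.

    Returns the exact chance that a handful of k stones, drawn without
    replacement, contains at least one white stone.
    """
    total = n_white + n_black
    # P(no white) = ways to pick k black stones / ways to pick any k stones
    return 1 - comb(n_black, k) / comb(total, k)


def estimate_white_fraction(handfuls):
    """Statistics view: the bucket contents are unknown.

    Estimates the fraction of white stones in the bucket from the
    handfuls that were actually observed.
    """
    stones = [stone for handful in handfuls for stone in handful]
    return stones.count("white") / len(stones)


# Hypothetical bucket, assumed only for this illustration.
N_WHITE, N_BLACK, HANDFUL = 30, 70, 5
bucket = ["white"] * N_WHITE + ["black"] * N_BLACK

# Probability: from the known bucket to a statement about one handful.
p = prob_at_least_one_white(N_WHITE, N_BLACK, HANDFUL)
print(f"P(at least one white in a handful of {HANDFUL}) = {p:.3f}")

# Statistics: from many observed handfuls back to the bucket.
rng = random.Random(0)  # fixed seed so the run is repeatable
handfuls = [rng.sample(bucket, HANDFUL) for _ in range(2000)]
est = estimate_white_fraction(handfuls)
print(f"Estimated white fraction = {est:.3f} (true value {N_WHITE / 100:.2f})")
```

With more handfuls the estimate tends to settle closer to the true fraction, which is the sense in which statistics "learns from data".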

(16)

Statistics as Data Science

• Google Trends: data mining, data science, machine learning, big data (search terms compared)

• Statistics is the science of dealing with data, learning from data, and extracting meaning from data.

• Data science is more demanding. It lies at the intersection of statistics/mathematics, hacking skills, and substantive expertise; see Drew Conway's Venn diagram for a detailed explanation.

(17)

Statistical applications in diverse fields

• Statistics is used wherever data exist. Its fields of application are many and very diverse.

• John Tukey (1915 – 2000): "The best thing about being a statistician is that you get to play in everyone's backyard."

• Long list of fields of application of statistics: Actuarial science, Agriculture, Bioinformatics, Biostatistics, Business Intelligence, Chemometrics, Clinical Trial, Communication Study, Econometrics, Engineering, Environmetrics, Finance, Genetics, Geostatistics, Hedge Fund, Information Technology, Insurance, Management Science, Manufacturing, Marketing, Medical Statistics, Pharmaceutics, Physics, Politics, Process Control, Psychometrics, Public Health, Quality and Productivity, Reliability, Risk
