
Introduction to Statistical Machine Learning

© 2016 Ong & Walder, Data61 | CSIRO, The Australian National University

Introduction to Statistical Machine Learning

Cheng Soon Ong & Christian Walder

Machine Learning Research Group Data61 | CSIRO

and

College of Engineering and Computer Science The Australian National University

Canberra February – June 2017


Part I


Course information

Lecture: Manning Clarke Center T6, Tuesday 4.00 pm - 5.30 pm and Wednesday 4.00 pm - 5.30 pm

Tutorial: CSIT (building 108), no tutorial this week; choose one of

N113 - Wednesday 5.30 - 7.30 pm

N111 - Thursday 11 am - 1 pm

Assignment: Warm-up 2% and two assignments, 19% each

Exam: Written exam, 60%

Info Wattle and


Junk Mail Filtering


gmail - Priority inbox

https://www.youtube.com/watch?v=5nt3gE9dGHQ


Handwritten Digit Recognition

Given handwritten ZIP codes on letters, money amounts on cheques etc.


Galaxy classification

http://www.galaxyzoo.org/#/classify

Given images of the sky from SDSS and CTIO, and crowdsourced labels


Backgammon


Go

Uses a combination of deep neural networks and tree search.


Image Denoising

1 Learn statistics over patches from many images of natural scenes

[Figure from "Learning High-Order MRF Priors of Color Images". Figure 1. Above are the filters trained on 3x3 intensity, 3x3 color, and 5x5 color, respectively (the filter with highest variance, which corresponds to a flat grey patch in all three cases, is excluded). The left column shows these filters sorted according to their eigenvalues (largest to smallest), while the right column shows them sorted by their importance after learning (α’s, smallest to largest). The three graphs on the left plot the α’s with respect to the rank of the patches in the left column. The three graphs on the right plot the α’s with respect to the rank of the patches in the right column. The fact that the plots are non-monotonic when sorted by eigenvalue demonstrates the necessity for learning the α’s by maximum-likelihood.]

50 000 patches (5×5×3) from the Berkeley Segmentation Database.


Image Denoising

2 Use this knowledge to denoise a yet unseen image

[Figure panels: original image, noise added, denoised]


Separating Audio Sources

Cocktail Party Problem (human brains may do it differently ;–)

[Figure: audio sources, microphones, and the resulting audio mixtures]

(Jaakko Särelä et. al.,


Predicting Solar Panel Output

Photovoltaics now very close to grid electricity in price

Distributed system of generators

Energy market

Great Machine Learning Problem: Predict the solar energy output (variability primarily due to clouds) for Australia

Pilot project in Canberra: Use cheap cameras to take 360° sky photos in several locations.

Learn to predict 3-D model of cloud movement.

Learn orientation and efficiency of solar panels for each house from time series of energy output.


Other applications of Machine Learning

autonomous robotics, bioinformatics,

detecting network intrusion, neuroscience,

medical diagnosis, stock market analysis, social network analysis,

traffic and infrastructure planning, ...

See more at en.wikipedia.org/wiki/Machine_


What is common to these examples?

1 Given some data (e.g. handwritten digits).

2 Possibly some extra information (e.g. which digit does this number represent).

3 Goal: Build a machine which can learn from the given data


Flavour of this course

Formalise intuitions about problems

Use language of mathematics to express models

Geometry, vectors, linear algebra for reasoning

Probabilistic models to capture uncertainty

Calculus to identify good parameters

Design and analysis of algorithms

Numerical algorithms in Python


What is Machine Learning?

Definition (First Try)

Machine learning is concerned with the design and development of algorithms that allow machines to learn.

machines? computers? HAL? to learn?

need to quantify "learning"

to improve their performance over time

Definition (Second Try)

Machine learning is concerned with the design and


What is Machine Learning?

What is the source of the improved performance? New insights by the algorithm designer?

Definition (Final Version)

Machine learning is concerned with the design and



What is Machine Learning?

Definition (Mitchell, 1998)

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.


What is the challenge?

Given only some examples.

Need to derive a relation for many more (possibly infinite) unseen examples.

Occam’s Razor (also written as Ockham’s razor, William Ockham, circa 1285 – 1349): “Plurality must never be posited without necessity”

“The simplest explanation or strategy tends to be the best one.”


Statistical Machine Learning - Some History

1960’s : symbolic AI; computers learn rules from data; analysis of the underlying statistics is seldom done. Perceptron (Rosenblatt, 1957), "Perceptrons" (Minsky and Papert, 1969)

1980’s : artificial neural networks

1990’s - 2000’s : statistical machine learning (kernel methods, decision trees, graphical models)

Why Statistical Machine Learning not earlier?

faster computers with larger memory to represent statistical models have become available

numerical methods on the desktop computer (BLAS, LAPACK, Optimisation)

found new interesting classes of algorithms (e.g. on graphs)

large amounts of data available which can be tapped into (flickr, social networks)


Supervised learning

Machine learning is about prediction

Examples/features $x_1, \dots, x_n \sim \mathcal{X}$

Labels/annotations $y_1, \dots, y_n \sim \mathcal{Y}$

Predictor $f_w(x) : \mathcal{X} \to \mathcal{Y}$

Estimate best predictor = training = learning

Given data $(x_1, y_1), \dots, (x_n, y_n)$, find a predictor $f_w(\cdot)$.

No mechanistic model of the phenomenon

There are relatively large amounts of data (examples $x$, usually in $\mathbb{R}^d$)


Strategy in this course

Estimate best predictor = training = learning

Given data $(x_1, y_1), \dots, (x_n, y_n)$, find a predictor $f_w(\cdot)$.

1 Identify the type of input $x$ and output $y$ data

2 Propose a (linear) mathematical model for $f_w$

3 Design an objective function or likelihood

4 Calculate the optimal parameter ($w$)
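The four steps above can be sketched on a toy problem. This is a minimal illustration, assuming real-valued inputs and outputs, a linear model with a bias term, and a sum-of-squares objective; the data and the helper names `fit_linear` and `predict` are made up for this example and are not taken from the course material.

```python
import numpy as np

def fit_linear(X, y):
    """Step 4: return the w that minimises the sum of squared errors."""
    # Step 2: linear model f_w(x) = w[:-1] . x + w[-1]; a column of ones gives the bias.
    Xb = np.column_stack([X, np.ones(len(X))])
    # Step 3: the objective is ||Xb w - y||^2; lstsq returns its minimiser.
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    """Apply the fitted predictor f_w to the rows of X."""
    Xb = np.column_stack([X, np.ones(len(X))])
    return Xb @ w

# Step 1: inputs x in R (one feature), outputs y in R. Toy data following y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

w = fit_linear(X, y)
y_new = predict(w, np.array([[4.0]]))
```

Because the toy data are noise-free, the fitted `w` recovers the slope and intercept used to generate them; with noisy data it would only approximate them.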


Related Fields

Artificial Intelligence - AI

Statistics

Game Theory

Neuroscience, Psychology

Data Mining


Artificial Intelligence - AI

Artificial intelligence is the intelligence of machines and the branch of computer science which aims to create it.

The field was founded on the claim that human intelligence can be so precisely described that it can be simulated by a machine.

Central areas: reasoning, knowledge, planning, learning, communication, perception and the ability to move and manipulate objects (autonomous robotics).

Philosophical questions: Can a machine have a mind and consciousness? Are there limits to how intelligent


Statistics

Descriptive Statistics: Summarize or describe a collection of data

Inferential Statistics: Draw inferences about a process taking randomness and uncertainty into account

What can be inferred from data and some modelling assumptions?


Neuroscience, Psychology

How do animals learn?

Modelling of human learning (e.g. Bayesian models of human inductive learning)

increasing interaction between Statistical Machine Learning and Neuroscience, Psychology

e.g. NIPS - Neural Information Processing Systems Conference 2009:

“Discriminative Network Models of Schizophrenia”, “Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning”,

“Canonical Time Warping for Alignment of Human Behavior”


Data Mining

searching through large data sets (databases)

goal: extracting hidden patterns from data

examples: bioinformatics, genetics, medicine


Computer Science

"What can be (efficiently) automated?" Algorithms

Data Structures


Adaptive Control Theory

consider systems with parameters changing (slowly) over time or being uncertain

Example: aircraft which changes its weight over time depending on fuel consumption (which in turn depends on the wind)


Some Basic Notation - Data

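The set of all input data is denoted as $\mathcal{X}$. For instance,

$\mathcal{X} = \{ x \mid x \text{ is an image containing a handwritten digit} \}$.

One data point with $D$ elements:

$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_D \end{pmatrix} = (x_1, \dots, x_D)^T.$$

Data matrix: A set of $N$ data points $x_i$, where $i = 1, \dots, N$,

$$X = \begin{pmatrix} x_{1,1} & \dots & x_{1,D} \\ x_{2,1} & \dots & x_{2,D} \\ \vdots & & \vdots \\ x_{N,1} & \dots & x_{N,D} \end{pmatrix} = \begin{pmatrix} x_1^T \\ \vdots \\ x_N^T \end{pmatrix}.$$

(Note: Each data point $x_i$ is a column vector, but appears as a row vector in $X$.)

If $D = 1$, $X$ is a vector of $N$ scalar data points. We write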
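The data-matrix convention (one data point per row) maps directly onto a two-dimensional NumPy array. The snippet below is only an illustration with made-up numbers; the variable names are chosen for this example.

```python
import numpy as np

# Three hypothetical data points, each with D = 2 elements.
x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 4.0])
x3 = np.array([5.0, 6.0])

# Stack the points as rows, so the data matrix X has shape (N, D) = (3, 2).
X = np.stack([x1, x2, x3])
N, D = X.shape

# Row i of X is the i-th data point (0-based in Python); X[i, j] is its j-th element.
second_point = X[1]
```

Note that Python indexes from 0, so the data point written $x_2$ in the notation above is `X[1]` in code.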


Some Basic Notation - Targets

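A target can be from a finite discrete set ('labels') or from $\mathbb{R}$.

(Note: Can extend this idea to $m$-dimensional labels and $\mathbb{R}^m$.)

Set of targets $\mathcal{T}$, e.g.

$\mathcal{T} = \{\text{one}, \text{two}, \text{three}, \text{four}, \text{five}, \text{six}, \text{seven}, \text{eight}, \text{nine}, \text{zero}\}$.

An ordered set of $N$ scalar labels

$$t = \begin{pmatrix} t_1 \\ \vdots \\ t_N \end{pmatrix}$$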


Supervised Learning

Given are pairs of data $x_i \in \mathcal{X}$ and targets $t_i \in \mathcal{T}$ in the form $(x_i, t_i)$, where $i = 1, \dots, N$.

Learn a mapping between the data $X$ and the target $t$
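One very simple way to obtain such a mapping is a nearest-neighbour rule: predict for a new input the target of the closest training point. This technique is swapped in here purely as an illustration (the slides do not prescribe it), and the data and the function name `nearest_neighbour_predict` are hypothetical.

```python
import numpy as np

def nearest_neighbour_predict(X_train, t_train, X_new):
    """Predict the target of each row of X_new as the target of its closest training point."""
    # Pairwise squared Euclidean distances, shape (len(X_new), len(X_train)).
    diffs = X_new[:, None, :] - X_train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # For every new point, pick the index of the nearest training point.
    return t_train[np.argmin(dists, axis=1)]

# Toy pairs (x_i, t_i): two groups of 2-D points with discrete labels.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
t_train = np.array(["zero", "zero", "one", "one"])

X_new = np.array([[0.2, 0.4], [5.5, 4.8]])
t_pred = nearest_neighbour_predict(X_train, t_train, X_new)
```

The "learning" here is just storing the training pairs; the models studied later in the course compress the data into parameters instead.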


Unsupervised Learning

Given only the data $x_i \in \mathcal{X}$.

Discover (=learn) some interesting structure inherent in the data.
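Clustering is one common example of discovering structure without targets. The sketch below uses k-means as an assumed illustration (the slide does not name a method); the data, the function name `kmeans`, and its parameters are made up for this example.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means: alternate between assigning points and updating cluster centres."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct data points (fancy indexing makes a copy).
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        dists = np.sum((X[:, None, :] - centres[None, :, :]) ** 2, axis=2)
        assign = np.argmin(dists, axis=1)
        # Update step: each centre moves to the mean of its assigned points.
        for j in range(k):
            if np.any(assign == j):
                centres[j] = X[assign == j].mean(axis=0)
    return centres, assign

# Unlabelled toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centres, assign = kmeans(X, k=2)
```

The cluster indices in `assign` are arbitrary; only the grouping of the points carries meaning.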


Reinforcement Learning

Example: Game playing. There is one reward at the end of the game (negative or positive).

Find suitable actions in a given environment with the goal of maximising some reward.

correct input/output pairs never presented

Reward might only come after many actions.


Reinforcement Learning

[Figure: agent loop. At each step i the agent receives observation i, chooses action i, and receives reward i.]

Exploration versus Exploitation.

Well suited for problems with a long-term versus short-term reward trade-off.
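The exploration versus exploitation trade-off can be illustrated with an epsilon-greedy strategy on a multi-armed bandit. This is a simplified setting chosen for this sketch (no observations, immediate rewards), not the general reinforcement learning problem; the arm means, the Gaussian reward noise, and the function name are assumptions.

```python
import random

def epsilon_greedy_bandit(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Play a bandit, exploring with probability epsilon and exploiting otherwise."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each action was chosen
    estimates = [0.0] * n_arms   # running average reward per action
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore: random action
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit: best estimate
        reward = rng.gauss(true_means[arm], 1.0)                  # noisy reward
        counts[arm] += 1
        # Incremental update of the average reward of the chosen action.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# Hypothetical environment with three actions; the last one is best on average.
estimates, counts = epsilon_greedy_bandit([0.0, 0.5, 2.0])
```

With `epsilon = 0` the agent can lock onto a poor action early; with `epsilon = 1` it never uses what it has learned. Values in between trade the two off.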


Other Machine Learning Types

Active Learning

The algorithm may choose which data $x_i \in \mathcal{X}$ to select next when building the model.

The order of the data is actively chosen by the algorithm at run-time.

Transduction

The algorithm is allowed to use the test data (but of course not labels!) when building a model.

Estimation with missing variables.


Training Regimes

Batch Learning

All training data $X = \{x_1, \dots, x_n\}$ and targets $t = \{t_1, \dots, t_n\}$ are given.

Learn a mapping from $x_i$ to $t_i$ which can then be applied to yet unseen data $X' = \{x'_1, \dots, x'_m\}$ to find $t' = \{t'_1, \dots, t'_m\}$.

Online Processing

Pairs of $(x_i, t_i)$ become available one at a time.

At each step, learn and refine a mapping from $x_i$ to $t_i$ which
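Online processing can be sketched as a loop that refines the parameters after each new pair. The example below assumes a linear model trained by a stochastic gradient step on the squared error; the update rule, the learning rate, and the simulated data stream are illustrative choices, not the course's prescribed method.

```python
import numpy as np

def online_update(w, x, t, learning_rate=0.05):
    """One stochastic-gradient step on the squared error (w . x - t)^2 / 2."""
    error = w @ x - t
    return w - learning_rate * error * x

# Hypothetical stream of pairs (x_i, t_i) generated by t = 2*x[0] - 1*x[1].
rng = np.random.default_rng(0)
w = np.zeros(2)
for _ in range(2000):
    x = rng.normal(size=2)          # a new input arrives
    t = 2.0 * x[0] - 1.0 * x[1]     # together with its target
    w = online_update(w, x, t)      # refine the mapping, then discard the pair
```

In contrast to batch learning, no pair is stored: the current `w` is the only state carried from one step to the next.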


Python

dynamically typed programming language (no declarations for variables)

supports object oriented, imperative, and functional programming style

many built-in data types (str, tuple, list, set, dict, ...)

packages for scientific programming (numpy, scipy, pandas, matplotlib)

easily extensible to use code written in C and C++ (or FORTRAN for that matter)

Python runs on Windows, Linux/Unix, Mac OS X, Android, Raspberry Pi


Jupyter Notebooks

Literate programming environment

Markdown cells also understand LaTeX
