Machine Learning (COMP-652)

Instructor: Doina Precup

Email: [email protected]

Grader: Pavel Dimitrov

Class web page:

http://www.cs.mcgill.ca/~dprecup/courses/ml.html

Outline

Administrative issues

What is machine learning?

Why study machine learning?

Formulating machine learning problems

Machine learning questions

Administrative issues

Class materials:

– Tom Mitchell, Machine Learning (main text)

– Additional readings: distributed in class and/or posted on the web page

– Class notes: posted on the web page

Prerequisites:

– Knowledge of a programming language (e.g. C, C++, Java, LISP, Matlab)

– Some knowledge of probabilities and statistics

– Some AI background is recommended but not required

Evaluation

Seven homework assignments (35%)

Two in-class written examinations (20%)

Project (30%)

– Reading research papers on a chosen topic

– Implementing and/or experimenting with algorithms related to the topic

– Writing a report on your findings

– Possibly a class presentation

Reading assignments (15%)

Participation in class discussions (up to 2% extra credit)

What is learning?

H. Simon: Any process by which a system improves its performance

M. Minsky: Learning is making useful changes in our minds

Michalski: Learning is constructing or modifying representations of what is being experienced

Valiant: Learning is the process of knowledge acquisition in the absence of explicit programming

Why study machine learning?

Easier to build a learning system than to hand-code a working program! E.g.:

– Robot that learns a map of the environment by wandering around it

– Programs that learn to play games by playing against themselves

Improving on existing programs, e.g.:

– Instruction scheduling and register allocation in compilers

– Combinatorial optimization problems

Discovering knowledge and patterns in high-dimensional, complex data

Why study machine learning?

Solving tasks that require a system to be adaptive, e.g.

– Speech and handwriting recognition

– “Intelligent” user interfaces

Understanding animal and human learning

– How do we learn language?

– How do we recognize faces?

Creating real AI!

“If an expert system–brilliantly designed, engineered and implemented–cannot learn not to repeat its mistakes, it is not as intelligent as a worm or a sea anemone or a kitten.” (Oliver Selfridge)

Very brief history

Studied ever since computers were invented (e.g. Samuel’s checkers player)

Coined as “machine learning” in late 70s - early 80s

Very active research field, several yearly conferences (e.g., ICML, NIPS), major journals (e.g., Machine Learning, Journal of Machine Learning Research)

The time is right to start studying in the field!

– Recent progress in algorithms and theory

– Growing flood of on-line data to be analyzed

– Computational power is available

– Growing demand for industrial applications

Related disciplines

Artificial intelligence

Probability theory and statistics

Computational complexity theory

Control theory

Information theory

Philosophy

Psychology and neurobiology

What are good machine learning tasks?

There is no human expert

E.g., DNA analysis

Humans can perform the task but cannot explain how

E.g., character recognition

Desired function changes frequently

E.g., predicting stock prices based on recent trading data

Each user needs a customized function

E.g., news filtering

Three niches for machine learning

Data mining

– Using historical data to improve decisions

E.g., medical records → medical knowledge

– Finding patterns in data

E.g., finding new star clusters

Software applications we cannot program by hand

E.g., autonomous driving, speech recognition

Self-customizing programs

E.g., newsreader that learns user interests

Typical data mining task

Data: records for Patient103 at successive times

Feature                   time=1    time=2      ...   time=n
Age                       23        23                23
FirstPregnancy            no        no                no
Anemia                    no        no                no
Diabetes                  no        YES               no
PreviousPrematureBirth    no        no                no
Ultrasound                ?         abnormal          ?
Elective C-Section        ?         no                no
Emergency C-Section       ?         ?                 Yes
...

Given:

9714 patient records, each describing a pregnancy and birth

Each patient record contains 215 features

Learn to predict: classes of future patients at high risk for Emergency Cesarean Section

Data mining result

One of 18 learned rules:

If   No previous vaginal delivery, and
     Abnormal 2nd Trimester Ultrasound, and
     Malpresentation at admission
Then Probability of Emergency C-Section is 0.6

Over training data: 26/41 = .63

Over test data: 12/20 = .60
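As a rough illustration of how such a rule is scored, the sketch below encodes the rule as a predicate over patient records and counts how many matching patients had an emergency C-section. The field names and the toy records are hypothetical and are not taken from the actual dataset; only the fractions 26/41 and 12/20 come from the slide.

```python
# Sketch only: the field names and toy records below are hypothetical.

def rule_matches(record):
    """True if a patient record satisfies all three conditions of the rule."""
    return (
        not record["previous_vaginal_delivery"]
        and record["second_trimester_ultrasound"] == "abnormal"
        and record["malpresentation_at_admission"]
    )


def rule_estimate(records):
    """Return (positives, matches): the number of rule-matching records that
    ended in an emergency C-section, and the number of rule-matching records."""
    matching = [r for r in records if rule_matches(r)]
    positives = sum(1 for r in matching if r["emergency_c_section"])
    return positives, len(matching)


def toy_record(matches_rule, emergency):
    """Build a made-up record that either satisfies the rule or does not."""
    return {
        "previous_vaginal_delivery": not matches_rule,
        "second_trimester_ultrasound": "abnormal" if matches_rule else "normal",
        "malpresentation_at_admission": matches_rule,
        "emergency_c_section": emergency,
    }


toy_records = [
    toy_record(True, True),
    toy_record(True, True),
    toy_record(True, False),
    toy_record(False, False),
]

positives, matches = rule_estimate(toy_records)
print(f"toy data: {positives}/{matches} = {positives / matches:.2f}")

# The fractions reported on the slide, recomputed:
print(f"training: 26/41 = {26 / 41:.2f}")
print(f"test:     12/20 = {12 / 20:.2f}")
```

Comparing the fraction on training data with the fraction on held-out test data, as the slide does, is what indicates whether the rule generalizes beyond the records it was learned from.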

Problems too difficult to program by hand

ALVINN [Pomerleau] drives 70 mph on highways

[Figure: ALVINN neural network. A 30x32 sensor input retina feeds 4 hidden units, which feed 30 output units spanning steering directions from Sharp Left through Straight Ahead to Sharp Right.]
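To make the layer sizes in the figure concrete (30x32 input retina, 4 hidden units, 30 output units), here is a minimal forward-pass sketch. It is not ALVINN's implementation: the sigmoid units, the random untrained weights, the ordering of the outputs, and the "pick the most active output" decoding are all assumptions made for illustration.

```python
import math
import random

N_INPUTS = 30 * 32  # sensor input retina, flattened into one vector
N_HIDDEN = 4
N_OUTPUTS = 30      # assumed ordering: index 0 = sharp left, 29 = sharp right


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """One weight row per unit; the last entry of each row is the bias.
    Weights are random here, standing in for trained values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(layer, inputs):
    """Activation of every unit in a fully connected sigmoid layer."""
    return [
        # zip stops at len(inputs), so the bias row[-1] is added separately.
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1])
        for row in layer
    ]


def steering_index(image, hidden_layer, output_layer):
    """Index of the most active output unit for one input image."""
    hidden = layer_forward(hidden_layer, image)
    outputs = layer_forward(output_layer, hidden)
    return max(range(len(outputs)), key=outputs.__getitem__)


rng = random.Random(0)
hidden_layer = make_layer(N_INPUTS, N_HIDDEN, rng)
output_layer = make_layer(N_HIDDEN, N_OUTPUTS, rng)
image = [rng.random() for _ in range(N_INPUTS)]  # stand-in for a camera frame
print("chosen steering unit:", steering_index(image, hidden_layer, output_layer))
```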

Software that customizes itself

Interactive software is everywhere (text editors, web browsers, spreadsheets, ...)

Most programs can be customized, by setting “preferences”

But this is a tedious, manual process, not accessible to all users

Better solution: watch what the user is doing, try to model the user's interests and goals, then use the model to improve the interaction

E.g., web pages that re-organize the links

What is a learning problem?

Learning = Improving with experience at some task

More precisely:

– Improve over task $T$,

– with respect to performance measure $P$,

– based on experience $E$.

E.g., learn to play checkers:

– $T$: play checkers

– $P$: % of games won in world tournament

– $E$: opportunity to play against self

Posing learning problems

Problem               Task Definition    Performance Measure    Training Experience
Speech recognition
Robot driving
Language learning

Type of training experience

Direct or indirect?

Teacher or not?

A problem: is training experience representative of performance goal?

Choose the target function

– Board → move ??

– Board → value ??

– ...

Possible definition for target function

– If $b$ is a final board state that is won, then $V(b) = 100$

– If $b$ is a final board state that is lost, then $V(b) = -100$

– If $b$ is a final board state that is drawn, then $V(b) = 0$

– If $b$ is not a final state in the game, then $V(b) = V(b')$, where $b'$ is the best final board state that can be achieved starting from $b$ and playing optimally until the end of the game.

This gives correct values, but is not operational
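To see why such a definition is correct but not operational, the sketch below evaluates it by exhaustive search on a toy game instead of checkers. Everything about the game is an assumption made for illustration: a pile of stones, players alternately remove 1 or 2, and whoever takes the last stone wins. The values +100 and -100 for won and lost final states follow the convention in Mitchell's text; this toy game has no draws.

```python
from functools import lru_cache

WON, LOST = 100, -100  # values of final states, from black's point of view


@lru_cache(maxsize=None)
def V(stones, black_to_move):
    """Value of a state under optimal play by both sides until the game ends."""
    if stones == 0:
        # Final state: the player who just moved took the last stone and won.
        return LOST if black_to_move else WON
    successor_values = [
        V(stones - take, not black_to_move)
        for take in (1, 2)
        if take <= stones
    ]
    # Black picks the best outcome for black; red picks the worst for black.
    return max(successor_values) if black_to_move else min(successor_values)


for n in range(1, 7):
    print(n, V(n, True))
```

The recursion has to reach the end of the game from every state. That is cheap for a handful of stones, but the same search over checkers boards is far too large to run for every move, which is why the following slides approximate $V$ with a learned function instead.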

Choose representation for target function

Collection of rules?

Neural network ?

Polynomial function of board features?

...

A representation for learned function

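$\hat{V}(b) = w_0 + w_1\, bp(b) + w_2\, rp(b) + w_3\, bk(b) + w_4\, rk(b) + w_5\, bt(b) + w_6\, rt(b)$

– $bp(b)$: number of black pieces on board $b$

– $rp(b)$: number of red pieces on $b$

– $bk(b)$: number of black kings on $b$

– $rk(b)$: number of red kings on $b$

– $bt(b)$: number of red pieces threatened by black (i.e., which can be taken on black’s next turn)

– $rt(b)$: number of black pieces threatened by red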
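A linear function of the six board features listed above can be sketched as a weighted sum with one weight per feature plus a constant term. The class name, field names, and example weights below are hypothetical, chosen only to show the shape of the computation; extracting the features from a real checkers board is not shown.

```python
from dataclasses import dataclass


@dataclass
class BoardFeatures:
    """The six feature counts for one board (extraction from a board omitted)."""
    black_pieces: int
    red_pieces: int
    black_kings: int
    red_kings: int
    red_threatened: int    # red pieces threatened by black
    black_threatened: int  # black pieces threatened by red

    def as_vector(self):
        # Leading 1.0 pairs with the constant weight w0.
        return [
            1.0,
            self.black_pieces,
            self.red_pieces,
            self.black_kings,
            self.red_kings,
            self.red_threatened,
            self.black_threatened,
        ]


def v_hat(weights, features):
    """Learned evaluation: w0 + w1*x1 + ... + w6*x6."""
    return sum(w * x for w, x in zip(weights, features.as_vector()))


# Made-up weights: black material counts for black, red material against.
example_weights = [0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5]

opening = BoardFeatures(12, 12, 0, 0, 0, 0)
black_ahead = BoardFeatures(3, 0, 1, 0, 0, 0)
print(v_hat(example_weights, opening))      # even material
print(v_hat(example_weights, black_ahead))  # black ahead
```

With this representation, learning the evaluation function reduces to choosing the seven numbers in the weight vector.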

Obtaining training examples

One rule for estimating training values:

$V_{train}(b) \leftarrow \hat{V}(\mathrm{Successor}(b))$

where:

– $V$: the true target function

– $\hat{V}$: the learned function

– $V_{train}(b)$: the training value
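One way to read this rule: the training value of a board is the current learned estimate of the board that follows it, while the final board of a game gets its known outcome. The sketch below assumes a game is recorded as a list of feature vectors, one per board at the learner's turns, so that each entry's successor is simply the next entry. The trace format, the shortened three-element feature vectors, and the function names are assumptions for illustration.

```python
def v_hat(weights, x):
    """Learned linear evaluation of a feature vector x (x[0] is the constant 1)."""
    return sum(w * xi for w, xi in zip(weights, x))


def training_values(trace, final_value, weights):
    """Pair each board in a game trace with its estimated training value.

    trace: feature vectors of successive boards at the learner's turns.
    final_value: known value of the last board (e.g. 100 for a win).
    """
    examples = []
    for i, x in enumerate(trace):
        if i + 1 < len(trace):
            target = v_hat(weights, trace[i + 1])  # estimate from the successor
        else:
            target = final_value                   # outcome of the game
        examples.append((x, target))
    return examples


# Hypothetical short trace with features [1, black pieces, red pieces].
trace = [[1, 3, 3], [1, 3, 2], [1, 3, 0]]
weights = [0.0, 1.0, -1.0]
for x, target in training_values(trace, 100, weights):
    print(x, target)
```

The targets for intermediate boards come from the learner's own current estimates, so they are only as good as the current weights; they improve as the weights are tuned.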

Choose weight tuning rule

Gradient descent. Do repeatedly:

1. Select a training example $b$ at random

2. Compute $error(b)$: $error(b) = V_{train}(b) - \hat{V}(b)$

3. For each board feature $f_i$, update weight $w_i$: $w_i \leftarrow w_i + c \cdot f_i \cdot error(b)$

$c$ is some small constant, say 0.1, to moderate the rate of learning
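The three steps above can be sketched as follows for a linear learned function. The function names and the toy training set are assumptions for illustration; the toy data uses a constant feature plus one small-valued feature instead of real checkers features.

```python
import random


def v_hat(weights, x):
    """Learned linear evaluation of a feature vector x."""
    return sum(w * xi for w, xi in zip(weights, x))


def lms_update(weights, x, v_train, c=0.1):
    """One weight-tuning step: move each weight in proportion to its feature
    value and to the error on this training example."""
    error = v_train - v_hat(weights, x)
    return [w + c * xi * error for w, xi in zip(weights, x)]


def train(examples, n_features, steps=5000, c=0.1, seed=0):
    """Repeatedly pick a random training example and apply the update."""
    rng = random.Random(seed)
    weights = [0.0] * n_features
    for _ in range(steps):
        x, v_train = rng.choice(examples)
        weights = lms_update(weights, x, v_train, c)
    return weights


# Toy examples generated from the known function 2 + 3*a.
examples = [([1.0, a], 2.0 + 3.0 * a) for a in (0.0, 0.5, 1.0)]
learned = train(examples, n_features=2)
print([round(w, 3) for w in learned])
```

On this noise-free toy data the weights approach the generating values 2 and 3. With raw piece counts as features, which can be as large as 12, a step size of 0.1 can overshoot, so a smaller constant or rescaled features may be needed.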

Design choices

1. Determine Type of Training Experience: games against experts, games against self, table of correct moves, ...

2. Determine Target Function: Board → move, Board → value, ...

3. Determine Representation of Learned Function: polynomial, linear function of six features, artificial neural network, ...

4. Determine Learning Algorithm: gradient descent, linear programming, ...

Completed Design

Some issues in machine learning

What algorithms can approximate functions well (and when)?

How does number of training examples influence accuracy?

How does the complexity of the hypothesis representation influence accuracy?

How does noisy data influence accuracy?

What are the theoretical limits of learnability?

How can the learner's prior knowledge help?

What clues can we get from biological learning systems?

How can systems alter their own representations?

Important application areas

Bioinformatics: sequence alignment, analyzing micro-array data, ...

Computer vision: object recognition, tracking, segmentation, active vision, ...

Robotics: state estimation, map building, decision making

Graphics: building realistic simulations

Speech: recognition, speaker identification

Financial analysis: option pricing, portfolio allocation

E-commerce: automated trading agents, data mining, spam, ...

Medicine: diagnosis, treatment, drug design,...

Computer games: building adaptive opponents

Multimedia: retrieval across diverse databases

What is the future?

Today: tip of the iceberg

– First-generation algorithms: neural nets, decision trees, regression, ...

– Applied to well-formatted databases

– Budding industry

Opportunity for tomorrow: enormous impact

– Learn across multiple databases, plus the web and news wires

– Learn by active experimentation

– Learn decisions rather than predictions

– Cumulative, lifelong learning

– Programming languages with learning embedded?
