• No results found

Machine Learning (COMP-652)

N/A
N/A
Protected

Academic year: 2021

Share "Machine Learning (COMP-652)"

Copied!
15
0
0

Loading.... (view fulltext now)

Full text

(1)

Machine Learning (COMP-652)

Instructor: Doina Precup Email: [email protected]

Grader: Pavel Dimitrov

Class web page:

http://www.cs.mcgill.ca/ dprecup/courses/ml.html

1

Outline



Administrative issues



What is machine learning?



Why study machine learning?



Formulating machine learning problems



Machine learning questions

(2)

Administrative issues



Class materials:

– Tom Mitchell, Machine Learning (main text)

– Additional readings: distributed in class and/or posted on the web page

– Class notes: posted on the web page



Prerequisites:

– Knowledge of a programming language (e.g. C, C++, Java, LISP, Matlab)

– Some knowledge of probabilities and statistics

– Some AI background is recommended but not required

3

Evaluation



Seven homework assignments (35%)



Two in-class written examinations (20%)



Project (30%)

– Reading research papers on a chosen topic

– Implementing and/or experimenting with algorithms related to the topic

– a written report on your findings – Possibly a class presentation



Reading assignments (15%)



Participation to class discussions (up to 2% extra credit)

(3)

What is learning?



H.Simon: Any process by which a system improves its performance



M.Minsky: Learning is making useful changes in our minds



Michalsky: Learning is constructing or modifying representations of what is being experienced



Valiant: Learning is the process of knowledge acquisition in the absence of explicit programming

5

Why study machine learning?



Easier to build a learning system than to hand-code a working program! E.g.:

– Robot that learns a map of the environment by wandering around it

– Programs that learn to play games by playing against themselves



Improving on existing programs, e.g

– Instruction scheduling and register allocation in compilers – Combinatorial optimization problems



Discover knowledge and patterns in highly dimensional,

complex data

(4)

Why study machine learning?



Solving tasks that require a system to be adaptive, e.g.

– Speech and handwriting recognition – “Intelligent” user interfaces



Understanding animal and human learning – How do we learn language?

– How do we recognize faces?



Creating real AI!

“If an expert system–brilliantly designed, engineered and

implemented–cannot learn not to repeat its mistakes, it is not as intelligent as a worm or a sea anemone or a kitten.” (Oliver Selfridge).

7

Very brief history



Studied ever since computers were invented (e.g. Samuel’s checkers player)



Coined as “machine learning” in late 70s - early 80s



Very active research field, several yearly conferences (e.g., ICML, NIPS), major journals (e.g., Machine Learning, Journal of Machine Learning Research)



The time is right to start studying in the field!

– Recent progress in algorithms and theory – Growing flood of on-line data to be analyzed – Computational power is available

– Growing demand for industrial applications

(5)

Related disciplines



Artificial intelligence



Probability theory and statistics



Computational complexity theory



Control theory



Information theory



Philosophy



Psychology and neurobiology

9

What are good machine learning tasks?



There is no human expert E.g., DNA analysis



Humans can perform the task but cannot explain how E.g., character recognition



Desired function changes frequently

E.g., predicting stock prices based on recent trading data



Each user needs a customized function

E.g., news filtering

(6)

Three niches for machine learning



Data mining

– Using historical data to improve decisions E.g., medical records medical knowledge – Finding patterns in data

E.g., finding new star clusters



Software applications we cannot program by hand E.g., autonomous driving, speech recognition



Self customizing programs

E.g., newsreader that learns user interests

11

Typical datamining task

Data:

Patient103 time=1 Patient103 ... Patient103

time=2 time=n

Age: 23

FirstPregnancy: no Anemia: no Diabetes: no

PreviousPrematureBirth: no

...

Elective C−Section: ? Emergency C−Section: ?

Age: 23

FirstPregnancy: no Anemia: no

PreviousPrematureBirth: no Diabetes: YES

... Emergency C−Section: ? Ultrasound: abnormal Elective C−Section: no

Age: 23

FirstPregnancy: no Anemia: no

PreviousPrematureBirth: no

...

Elective C−Section: no Ultrasound: ? Diabetes: no

Emergency C−Section: Yes Ultrasound: ?

Given:



9714 patient records, each describing a pregnancy and birth



Each patient record contains 215 features

Learn to predict: classes of future patients at high risk for

Emergency Cesarean Section

(7)

Datamining result

Patient103 time=1 Patient103 ... Patient103

time=2 time=n

Age: 23

FirstPregnancy: no Anemia: no Diabetes: no

PreviousPrematureBirth: no

...

Elective C−Section: ? Emergency C−Section: ?

Age: 23

FirstPregnancy: no Anemia: no

PreviousPrematureBirth: no Diabetes: YES

... Emergency C−Section: ? Ultrasound: abnormal Elective C−Section: no

Age: 23

FirstPregnancy: no Anemia: no

PreviousPrematureBirth: no

...

Elective C−Section: no Ultrasound: ? Diabetes: no

Emergency C−Section: Yes Ultrasound: ?

One of 18 learned rules:

If No previous vaginal delivery, and

Abnormal 2nd Trimester Ultrasound, and Malpresentation at admission

Then Probability of Emergency C-Section is 0.6 Over training data: 26/41 = .63,

Over test data: 12/20 = .60

13

Problems too difficult to program by hand

ALVINN [Pomerleau] drives 70 mph on highways

Sharp Left

Sharp Right

4 Hidden Units

30 Output Units

30x32 Sensor Input Retina Straight

Ahead

(8)

15

Software that customizes itself



Interactive software is everywhere (text editors, web browsers, spreadsheets, ...)



Most programs can be customized, by setting “preferences”



But this is a tedious, manual process, not accessible to all users



Better solution: watch what the user is doing, and try to model its interests/goals, then use the model to get better interaction



E.g., web pages that re-organize the links

(9)

What is a learning problem?

Learning = Improving with experience at some task More precisely:



Improve over task ,



with respect to performance measure



,



based on experience



. E.g. Learn to play checkers



: Play checkers

 

: % of games won in world tournament

 

: opportunity to play against self

17

Posing learning problems

Task Performance Training Definition Measure Experience

Speech recognition Robot driving

Language learning

(10)

Type of training experience



Direct or indirect?



Teacher or not?

A problem: is training experience representative of performance goal?

19

Choose the target function



     

??

 





??



...

(11)

Possible definition for target function



if



is a final board state that is won, then

  



if



is a final board state that is lost, then

    



if



is a final board state that is drawn, then

  



if



is a not a final state in the game, then

     

, where



is the best final board state that can be achieved starting from



and playing optimally until the end of the game.

This gives correct values, but is not operational

21

Choose representation for target function



Collection of rules?



Neural network ?



Polynomial function of board features?



...

(12)

A representation for learned function

 







 

 













  

 

 





 



  









 

: number of black pieces on board



  



: number of red pieces on











: number of black kings on



   



: number of red kings on











: number of red pieces threatened by black (i.e., which can be taken on black’s next turn)

   



: number of black pieces threatened by red

23

Obtaining training examples

One rule for estimating training values:

 !#"%$ 

'&

(

*),+.-#-

    



 

where:

 



: the true target function

 (





: the learned function

 /0!#"1$ 



: the training value

(13)

Choose weight tuning rule

Gradient Descent Do repeatedly:

1. Select a training example



at random 2. Compute

 





:

  



  

0 ! "1$



   (





3. For each board feature

"

, update weight

"

:

" & "  -  "    



 

-

is some small constant, say 0.1, to moderate the rate of learning

25

Design choices

Determine Target Function

Determine Representation of Learned Function Determine Type of Training Experience

Determine Learning Algorithm

Games against self Games against

experts Table of correct

moves

Linear function of six features

Artificial neural network Polynomial

Gradient descent

Board

➝ value Board

➝ move

Completed Design

...

...

Linear programming

...

...

(14)

Some issues in machine learning



What algorithms can approximate functions well (and when)?



How does number of training examples influence accuracy?



How does complexity of hypothesis representation impact it?



How does noisy data influence accuracy?



What are the theoretical limits of learnability?



How can prior knowledge of learner help?



What clues can we get from biological learning systems?



How can systems alter their own representations?

27

Important application areas



Bioinformatics: sequence alignment, analyzing micro-array data,



Computer vision: object recognition, tracking, segmentation, active vision, ...



Robotics: state estimation, map building, decision making



Graphics: building realistic simulations



Speech: recognition, speaker identification



Financial analysis: option pricing, portfolio allocation



E-commerce: automated trading agents, data mining, spam, ...



Medicine: diagnosis, treatment, drug design,...



Computer games: building adaptive opponents



Multimedia: retrieval across diverse data bases

(15)

What is the future?

Today: tip of the iceberg



First-generation algorithms: neural nets, decision trees, regression ...



Applied to well-formated database



Budding industry

Opportunity for tomorrow: enormous impact



Learn across multiple databases, plus the web and news wires



Learn by active experimentation



Learn decisions rather than predictions



Cumulative, lifelong learning



Programming languages with learning embedded?

29

References

Related documents

Three optimisations steps are performed for each user type using separate multiple-criteria deci- sion models: (1) McdMD – MDM selection model, (2) McdD – dimensions optimisation

Presenter, Domestic Violence and State Intervention in the American West and Australia, 1860- 1930, “Children and Families in Criminal Law” panel, Law and Society Association Annual

Carbotech makes a vital contribution to the development of sustainable energy systems with its operationally reliable and environmentally responsible plants for the upgrading and

Based on a primary survey of both urban and rural handloom weaver clusters in Ethiopia, one of the country’s most important rural nonfarm sectors, this paper examines the

O presente artigo apresenta uma experiência de trabalho com parteiras tradicionais realizada no estado do Tocantins, Brasil, entre 2010 e 2014, no âmbito do Projeto Diagnóstico

The details you have provided will be used by Millennium Insurance Company Limited to process your request in accordance with the Data Protection Acts and other applicable laws..

Nanohybrids of Mg/Al layered double hydroxide and long-chain (C18) unsaturated fatty acid anions: structure and sorptive properties.. Rafael Celis a,

We can envisage a mechanism for which …rms growing at a lower than mean rate (hence lying on the left side) are more responsive to macroeconomic ‡uctuations and in expansion