Machine Learning (COMP-652)
Instructor: Doina Precup Email: [email protected]
Grader: Pavel Dimitrov
Class web page:
http://www.cs.mcgill.ca/ dprecup/courses/ml.html
1
Outline
Administrative issues
What is machine learning?
Why study machine learning?
Formulating machine learning problems
Machine learning questions
Administrative issues
Class materials:
– Tom Mitchell, Machine Learning (main text)
– Additional readings: distributed in class and/or posted on the web page
– Class notes: posted on the web page
Prerequisites:
– Knowledge of a programming language (e.g. C, C++, Java, LISP, Matlab)
– Some knowledge of probabilities and statistics
– Some AI background is recommended but not required
3
Evaluation
Seven homework assignments (35%)
Two in-class written examinations (20%)
Project (30%)
– Reading research papers on a chosen topic
– Implementing and/or experimenting with algorithms related to the topic
– a written report on your findings – Possibly a class presentation
Reading assignments (15%)
Participation to class discussions (up to 2% extra credit)
What is learning?
H.Simon: Any process by which a system improves its performance
M.Minsky: Learning is making useful changes in our minds
Michalsky: Learning is constructing or modifying representations of what is being experienced
Valiant: Learning is the process of knowledge acquisition in the absence of explicit programming
5
Why study machine learning?
Easier to build a learning system than to hand-code a working program! E.g.:
– Robot that learns a map of the environment by wandering around it
– Programs that learn to play games by playing against themselves
Improving on existing programs, e.g
– Instruction scheduling and register allocation in compilers – Combinatorial optimization problems
Discover knowledge and patterns in highly dimensional,
complex data
Why study machine learning?
Solving tasks that require a system to be adaptive, e.g.
– Speech and handwriting recognition – “Intelligent” user interfaces
Understanding animal and human learning – How do we learn language?
– How do we recognize faces?
Creating real AI!
“If an expert system–brilliantly designed, engineered and
implemented–cannot learn not to repeat its mistakes, it is not as intelligent as a worm or a sea anemone or a kitten.” (Oliver Selfridge).
7
Very brief history
Studied ever since computers were invented (e.g. Samuel’s checkers player)
Coined as “machine learning” in late 70s - early 80s
Very active research field, several yearly conferences (e.g., ICML, NIPS), major journals (e.g., Machine Learning, Journal of Machine Learning Research)
The time is right to start studying in the field!
– Recent progress in algorithms and theory – Growing flood of on-line data to be analyzed – Computational power is available
– Growing demand for industrial applications
Related disciplines
Artificial intelligence
Probability theory and statistics
Computational complexity theory
Control theory
Information theory
Philosophy
Psychology and neurobiology
9
What are good machine learning tasks?
There is no human expert E.g., DNA analysis
Humans can perform the task but cannot explain how E.g., character recognition
Desired function changes frequently
E.g., predicting stock prices based on recent trading data
Each user needs a customized function
E.g., news filtering
Three niches for machine learning
Data mining
– Using historical data to improve decisions E.g., medical records medical knowledge – Finding patterns in data
E.g., finding new star clusters
Software applications we cannot program by hand E.g., autonomous driving, speech recognition
Self customizing programs
E.g., newsreader that learns user interests
11
Typical datamining task
Data:
Patient103 time=1 Patient103 ... Patient103
time=2 time=n
Age: 23
FirstPregnancy: no Anemia: no Diabetes: no
PreviousPrematureBirth: no
...
Elective C−Section: ? Emergency C−Section: ?
Age: 23
FirstPregnancy: no Anemia: no
PreviousPrematureBirth: no Diabetes: YES
... Emergency C−Section: ? Ultrasound: abnormal Elective C−Section: no
Age: 23
FirstPregnancy: no Anemia: no
PreviousPrematureBirth: no
...
Elective C−Section: no Ultrasound: ? Diabetes: no
Emergency C−Section: Yes Ultrasound: ?
Given:
9714 patient records, each describing a pregnancy and birth
Each patient record contains 215 features
Learn to predict: classes of future patients at high risk for
Emergency Cesarean Section
Datamining result
Patient103 time=1 Patient103 ... Patient103
time=2 time=n
Age: 23
FirstPregnancy: no Anemia: no Diabetes: no
PreviousPrematureBirth: no
...
Elective C−Section: ? Emergency C−Section: ?
Age: 23
FirstPregnancy: no Anemia: no
PreviousPrematureBirth: no Diabetes: YES
... Emergency C−Section: ? Ultrasound: abnormal Elective C−Section: no
Age: 23
FirstPregnancy: no Anemia: no
PreviousPrematureBirth: no
...
Elective C−Section: no Ultrasound: ? Diabetes: no
Emergency C−Section: Yes Ultrasound: ?
One of 18 learned rules:
If No previous vaginal delivery, and
Abnormal 2nd Trimester Ultrasound, and Malpresentation at admission
Then Probability of Emergency C-Section is 0.6 Over training data: 26/41 = .63,
Over test data: 12/20 = .60
13
Problems too difficult to program by hand
ALVINN [Pomerleau] drives 70 mph on highways
Sharp Left
Sharp Right
4 Hidden Units
30 Output Units
30x32 Sensor Input Retina Straight
Ahead
15
Software that customizes itself
Interactive software is everywhere (text editors, web browsers, spreadsheets, ...)
Most programs can be customized, by setting “preferences”
But this is a tedious, manual process, not accessible to all users
Better solution: watch what the user is doing, and try to model its interests/goals, then use the model to get better interaction
E.g., web pages that re-organize the links
What is a learning problem?
Learning = Improving with experience at some task More precisely:
Improve over task ,
with respect to performance measure
,
based on experience
. E.g. Learn to play checkers
: Play checkers
: % of games won in world tournament
: opportunity to play against self
17
Posing learning problems
Task Performance Training Definition Measure Experience
Speech recognition Robot driving
Language learning
Type of training experience
Direct or indirect?
Teacher or not?
A problem: is training experience representative of performance goal?
19
Choose the target function
??
??
...
Possible definition for target function
if
is a final board state that is won, then
if
is a final board state that is lost, then
if
is a final board state that is drawn, then
if
is a not a final state in the game, then
, where
is the best final board state that can be achieved starting from
and playing optimally until the end of the game.
This gives correct values, but is not operational
21
Choose representation for target function
Collection of rules?
Neural network ?
Polynomial function of board features?
...
A representation for learned function
: number of black pieces on board
: number of red pieces on
: number of black kings on
: number of red kings on
: number of red pieces threatened by black (i.e., which can be taken on black’s next turn)
: number of black pieces threatened by red
23
Obtaining training examples
One rule for estimating training values:
!#"%$
'&
(
*),+.-#-
where:
: the true target function
(
: the learned function
/0!#"1$
: the training value
Choose weight tuning rule
Gradient Descent Do repeatedly:
1. Select a training example
at random 2. Compute
:
0 ! "1$
(
3. For each board feature
", update weight
":
" & " - "
-
is some small constant, say 0.1, to moderate the rate of learning
25
Design choices
Determine Target Function
Determine Representation of Learned Function Determine Type of Training Experience
Determine Learning Algorithm
Games against self Games against
experts Table of correct
moves
Linear function of six features
Artificial neural network Polynomial
Gradient descent
Board
➝ value Board
➝ move
Completed Design
...
...
Linear programming
...
...
Some issues in machine learning
What algorithms can approximate functions well (and when)?
How does number of training examples influence accuracy?
How does complexity of hypothesis representation impact it?
How does noisy data influence accuracy?
What are the theoretical limits of learnability?
How can prior knowledge of learner help?
What clues can we get from biological learning systems?
How can systems alter their own representations?
27
Important application areas
Bioinformatics: sequence alignment, analyzing micro-array data,
Computer vision: object recognition, tracking, segmentation, active vision, ...
Robotics: state estimation, map building, decision making
Graphics: building realistic simulations
Speech: recognition, speaker identification
Financial analysis: option pricing, portfolio allocation
E-commerce: automated trading agents, data mining, spam, ...
Medicine: diagnosis, treatment, drug design,...
Computer games: building adaptive opponents
Multimedia: retrieval across diverse data bases
What is the future?
Today: tip of the iceberg
First-generation algorithms: neural nets, decision trees, regression ...
Applied to well-formated database
Budding industry
Opportunity for tomorrow: enormous impact
Learn across multiple databases, plus the web and news wires
Learn by active experimentation
Learn decisions rather than predictions
Cumulative, lifelong learning