Machine Learning
Today’s Class
•
People Involved
•
Is Machine Learning affecting us?
•
What is Machine Learning?
•
History
•
Example Systems that use Machine Learning
•
Related areas
•
What should you learn?
•
Course Outline
People Involved
•
Instructor
• Saket Anand
• Office Hours: Mondays, 1PM-2PM, Venue: A-510
• Email: [email protected]; Phone: +91 11 2690 7425; Campus Ext: 425
• Course Page: TBD
• Discussions/Submissions: Google Classroom
•
Teaching Assistants
Is Machine Learning Really Affecting Us?
•
Gmail and Google Search
• Smart Reply, Personalized Ads
•
Social Media Feeds
• Facebook, LinkedIn, Instagram
•
Music and Media Streaming
• Netflix, YouTube, Spotify
•
Maps, Navigation and Travel
• Google, Uber, Ola
•
Banking and Finance
• IVR, Fraud Protection
•
Online Shopping
Autonomous Driving
Natural and Spoken Language Processing
Conversational Agents and Translators
Optical Character Recognition and Translation
Apple Siri, Google Assistant, Microsoft Cortana
Image Analysis and Computer Vision
iPhone Xs vs. Pixel 3’s Night Sight Mode
•
Google’s panorama generation
thought a skier was a mountain!
Pitfalls and Perils of ML
•
Chinese billionaire’s face
identified as jaywalker
• Surveillance system
Pitfalls and Perils of ML
•
Claim: Machines have a better
“gaydar”
• If true: privacy breach
• If false: reinforces prejudice
•
Uber hits and kills a pedestrian in
Tempe, Arizona, US
•
“DeepFakes” used to generate fake
porn using Hollywood celebrity
facial images
Source: The Guardian
Efforts towards AI for Social Good
•
Microsoft AI for Earth
• Climate Change, Agriculture,
Biodiversity and Water
•
Google AI for Social Good
• Flood prediction
• Cardiac risk prediction
• Mapping global fishing activity
•
Wadhwani Institute for AI
• Focus on Societal problems in
India: Health, Agriculture, Education, Financial Inclusion
•
Center for AI in Society, USC
• Focus on public health, social
Source: Rolnick et al., “Tackling Climate Change with Machine Learning”, arXiv, 10 Jun. 2019 (22 authors from 16 organizations)
Vision for Wildlife
from conservation to conflict management
Individual Identification in Camera Trap Images
~31,000 images of ~1,650 individual tigers (’13-’14): over 70% of the estimated tiger population
•
Intelligent Wildlife Monitoring
• Goal: automatic indexing of image
datasets by species and individuals
• Applications: camera-trap based
population monitoring; crowd-sourced reporting of human-wildlife conflict
Saket Anand Vision / Learning
Prof. Y. V. Jhala
IIITD’s Autonomous e-Rick
Vision and Range sensors
Drive-by-Wire setup
Alexander Fell Embedded Systems
Now at SIT, Singapore
Saket Anand Vision / Learning
P. B. Sujit Robotics & Control
What is Machine Learning?
•
It is an important sub-area of AI that seeks to answer the following
question:
How can we build computer systems that automatically improve with experience?
•
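More Formally (Mitchell, 1997): a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.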
Machine Learning Tasks (T)
•
Classification or Pattern Recognition
•
Regression or Prediction
•
Density Estimation
•
Clustering
•
Synthesis or Sampling
•
Ranking
•
Recommendation Systems
Performance (P)
•
A quantitative measure to evaluate performance
• Usually Task specific
•
Classification
• Accuracy or Error Rate
• Weighted versions of accuracy (e.g., in unbalanced data)
•
Regression
• Error measure such as ‘mean squared error’
• Application dependent – many medium errors or few large errors?
•
Density Estimation
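The measures named above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names and the toy data are made up for this example, and "balanced accuracy" (the mean of per-class recall) is used as one common weighted version of accuracy, not necessarily the one used in this course.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def mean_squared_error(y_true, y_pred):
    """Average squared error; penalises large errors heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute error; every unit of error costs the same."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Unbalanced toy data (hypothetical): 9 negatives, 1 positive.
# The "model" always predicts class 0.
labels = [0] * 9 + [1]
always_zero = [0] * 10
print(accuracy(labels, always_zero))           # 0.9
print(balanced_accuracy(labels, always_zero))  # 0.5

# Two regressors with the same total absolute error (hypothetical values).
targets = [0.0, 0.0, 0.0, 0.0]
many_medium = [1.0, 1.0, 1.0, 1.0]  # four errors of size 1
few_large = [0.0, 0.0, 0.0, 4.0]    # one error of size 4
print(mean_squared_error(targets, many_medium),
      mean_squared_error(targets, few_large))   # 1.0 4.0
print(mean_absolute_error(targets, many_medium),
      mean_absolute_error(targets, few_large))  # 1.0 1.0
```

Plain accuracy rewards the trivial always-negative predictor on unbalanced data, while the class-weighted version does not. Likewise, squared error prefers many medium errors over a few large ones, whereas absolute error is indifferent between the two; which behaviour is right depends on the application.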
Experience (E)
•
Supervised Learning
• Labelled data – (Data, target value)
• Target value could be category/class labels, real value, real vector, etc.
• Classification, Regression
•
Unsupervised Learning
• Only data, no labels
• Density Estimation, Clustering
•
Semi-supervised Learning
• Some labelled data and lots of unlabelled data
• Multiple-Instance Learning
•
Reinforcement Learning
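To make the difference between the first two kinds of experience concrete, here is a minimal sketch on 1-D toy data. Everything in it is an assumption made for illustration: the data, the label names, and the choice of a nearest-class-mean classifier (supervised) and 2-means clustering (unsupervised) as the simplest representatives of each setting.

```python
def fit_class_means(xs, ys):
    """Supervised: use the labels to compute one mean per class."""
    means = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(members) / len(members)
    return means

def predict(means, x):
    """Assign x to the class whose mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))

def two_means(xs, iterations=10):
    """Unsupervised: 1-D k-means with k=2, initialised at min and max."""
    centers = [min(xs), max(xs)]
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Keep the old center if a group happens to be empty.
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers

# Hypothetical toy data: two well-separated groups.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["small", "small", "small", "large", "large", "large"]

# Supervised learning sees (data, target value) pairs.
means = fit_class_means(data, labels)
print(predict(means, 1.1))  # small

# Unsupervised learning sees only the data and must find structure itself.
print(two_means(data))
```

Both procedures end up with roughly the same two group centers here, but only the supervised one can attach a name to each group, because only it was given labels.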
Example Systems that use ML
•
Google Search
•
Google Car
•
Amazon’s recommendation system
•
Adobe’s Optical Character Recognition (OCR)
•
Facebook’s face tagging, news feed
•
Apple’s Siri, Microsoft’s Cortana, Amazon’s Echo (Speech Recognition)
•
Microsoft Kinect + Xbox
•
Autopilots in aircraft
An Incomplete History of Learning
• Turing Test (1950)
• Machines do very poorly
• Rosenblatt’s Perceptron (1960’s)
• Kickstarted the mathematical analysis of
the learning process
• Key idea behind Support Vector Machines
(SVMs) and Neural Networks
• Construction of Fundamentals of
Learning Theory (1960-70’s)
• Focus on generalization capability of
learning machines
• Performance on unseen data
• Regularization for ill-posed problems
• e.g., linear equations for ill-conditioned
matrices
• Neural Networks (1980’s)
• Connectionism
• Back-propagation [LeCun, ’86]
• CNNs, RNNs
• SVMs (1990’s)
• Margin Maximization
• Kernel Methods to handle non-linearity
• Deep Learning (>2006)
• Hinton, Bengio, LeCun at forefront
• Abstract Representations
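The "regularization for ill-posed problems" item in the timeline above can be illustrated with a tiny ill-conditioned linear system. The sketch below is an assumption-laden example, not material from the slides: the matrix, the perturbation, and the penalty weight `lam` are all made up, and Tikhonov (ridge) regularization is used as the standard textbook form of regularization.

```python
def solve_2x2(m, v):
    """Solve the 2x2 system m x = v by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [(v[0] * d - b * v[1]) / det, (a * v[1] - v[0] * c) / det]

def tikhonov_solve_2x2(m, v, lam):
    """Minimise ||m x - v||^2 + lam * ||x||^2.

    The minimiser solves the normal equations (m^T m + lam I) x = m^T v.
    """
    (a, b), (c, d) = m
    mtm = [[a * a + c * c + lam, a * b + c * d],
           [a * b + c * d, b * b + d * d + lam]]
    mtv = [a * v[0] + c * v[1], b * v[0] + d * v[1]]
    return solve_2x2(mtm, mtv)

# Nearly singular matrix: the two rows are almost identical (hypothetical).
A = [[1.0, 1.0],
     [1.0, 1.0001]]
b_clean = [2.0, 2.0001]  # exact solution is [1, 1]
b_noisy = [2.0, 2.0002]  # the same right-hand side with a 1e-4 perturbation

# Without regularization a tiny change in b moves the solution a lot.
print(solve_2x2(A, b_clean))  # close to [1, 1]
print(solve_2x2(A, b_noisy))  # close to [0, 2]

# With a small penalty the two solutions stay close to each other.
lam = 0.01  # assumed value, chosen by hand for this example
print(tikhonov_solve_2x2(A, b_clean, lam))
print(tikhonov_solve_2x2(A, b_noisy, lam))
```

The price of the stability is a small bias: the regularized solutions sit slightly below [1, 1] because the penalty shrinks them towards zero. Choosing `lam` is a trade-off between that bias and the sensitivity to noise.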
Related Areas
•
Statistical Modelling and Inference
• Regression, Hypothesis Testing, Significance Testing
•
Signal Processing (Data = Signal)
• Detection and Estimation
• Hypothesis Testing (Classification/Detection)
• Estimation (Regression – Error Minimization)
• Representation of signals – Fourier, Wavelets, etc.
• Compressed Sensing and Sparse representations
•
Optimization
What should you learn?
•
Modelling a learning problem
•
Various algorithms (techniques) for solving ML problems
•
Pitfalls while designing ML systems
• Modelling, Generalization, Regularization & Model Selection, (Hyper-)parameter tuning, Overfitting, Underfitting
•
Importance of Domain Knowledge
• Not treating ML techniques as a black box
• Simplify the learning problem by using domain knowledge
•
Engineering Tricks
• Debugging ML systems
•
Tools
Course Outline
•
Topics:
• Empirical Risk Minimization
Framework
• Linear Models for Regression and
Classification
• Linear & logistic regression
• ML in Practice
• Training, Validation and Testing
• Underfitting/Overfitting
• Hyperparameter search
• Cross-validation
• Decision Trees and Random
Decision Forests
• Support Vector Machines
• Primal and Dual Formulations
• Kernel Methods
• Neural Networks
• Loss and Optimization
• Convolutional Neural Networks
• Generative Models
• Unsupervised Learning
• Clustering – k-means, Spectral (if time permits)
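As a preview of the "ML in Practice" topics (training/validation, hyperparameter search, cross-validation), here is a minimal sketch in plain Python. It rests on several assumptions that are not from the slides: the toy data, the candidate values, the helper names, and the model itself, a 1-D linear regression through the origin with an L2 penalty `lam` as the hyperparameter.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form minimiser of sum((y - w*x)^2) + lam * w^2 (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]

def cross_validate(xs, ys, lam, k=3):
    """Average validation MSE over k folds for one hyperparameter value."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        w = fit_ridge_1d(train_x, train_y, lam)  # fit on the training part
        scores.append(mse([xs[i] for i in fold], [ys[i] for i in fold], w))
    return sum(scores) / len(scores)

# Hypothetical toy data: y is roughly 2x with small fixed perturbations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

# Hyperparameter search: try each candidate and keep the best CV score.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_scores = {lam: cross_validate(xs, ys, lam) for lam in candidates}
best_lam = min(cv_scores, key=cv_scores.get)

# Refit on all available data with the selected hyperparameter.
w = fit_ridge_1d(xs, ys, best_lam)
print(best_lam, w)
```

A very large penalty shrinks the slope towards zero and underfits, which shows up as a high validation error. In a real project a separate test set would be held out before this search and used only once, for the final performance estimate.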
Prerequisites
•
Required
• Linear Algebra
• Probability and Statistics
• Advanced Calculus (mainly, vector differentiation)
• Introduction to Programming (Python)
• In reality you would need much more than an introduction
•
Desired
• Optimization
Learning Outcomes
•
Explain the different types of learning problems along with some
techniques to solve them
•
Model real-world problems, apply different learning techniques and
quantitatively evaluate the performance
•
Identify and use advanced techniques with the help of existing
machine learning tools and libraries
•
Analyze performance of ML techniques and comment on their
Administrivia
•
Course Webpage: TBA
• Access through my page as well
•
Google Classroom Page (HW Submission/ Discussion): TBA
•
Reading Material
• Textbook & Lecture Notes (mostly from Andrew Ng’s Stanford Course) and papers
posted on website and classroom
•
Textbook:
• Understanding Machine Learning: From Theory to Algorithms, Shai Shalev-Shwartz
and Shai Ben-David, Cambridge University Press, 2014
•
References* :
• Machine Learning, Tom M. Mitchell, McGraw Hill, 1997
• Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer, 2006
• Pattern Classification, Richard Duda, Peter Hart and David Stork, 2nd ed., Wiley, 2006
Administrivia
•
Programming platform: Python
•
Grading Scheme:
• HW (4 Theory+Programming) – 30%
• Quiz (best 3/4) – 10% (No re-tests whatsoever)
• Project & demo – 25%
• You can choose topics suggested by us OR pick your own
• Mid-sem – 15%
• Final – 20%
•
Absolute Grading:
• Ample opportunity to make up marks through extra-credit questions!
Grade A A- B B- C C- D F
Plagiarism Policy
•
IIIT-Delhi policy:
https://www.iiitd.ac.in/education/resources/academic-dishonesty
•
Zero Tolerance
• Updated Institute policy will apply
• All plagiarism cases will be on record for your tenure at IIIT-Delhi
•
All code and reports will be checked for plagiarism
•
HW theory questions may be asked in exam or quiz as is
• If correct in HW and incorrect in exam, HW question will be marked zero