Introduction and Overview
Introduction to Computer Vision I
CSE 152A
• We’ll begin with some introductory
material …
• … and end with
– Syllabus
– Organizational materials
– Wait list
What is computer vision?
Add camera as input device to computer.
Computer Vision
• An interdisciplinary field that deals with how
computers can be made to gain high-level
understanding from digital images or videos
• Other common definitions:
– In computer vision, we are trying to...describe the
world that we see in one or more images and to
reconstruct its properties (Szeliski)
– Computing properties of the 3-D world from one or
more digital images (Trucco and Verri)
– The construction of explicit, meaningful description of
physical objects from images (Ballard and Brown)
– Extracting descriptions of the world from pictures or
sequences of pictures (Forsyth and Ponce)
– To make useful decisions about real physical objects
and scenes based on sensed images (Stockman and
Shapiro)
Computer Vision
• An interdisciplinary field that deals with how
computers can be made to gain high-level
understanding from digital images or videos
• Engineering perspective
– Computer vision seeks to automate tasks that the
human visual system can do
Why is this hard?
What is in this image?
1. A hand holding a man?
2. A hand holding a mirrored sphere?
3. An Escher drawing?
4. A 1935 self portrait of Escher
• Interpretations are ambiguous
• The forward problem (graphics) is well-posed
• The “inverse problem” (vision) is not
Underestimates
• “640K ought to be enough for
anybody.”
– Bill Gates, 1981
• “... in three to eight years we will
have a machine with the general
intelligence of an average human
being ... The machine will begin to
educate itself with fantastic speed. In
a few months it will be at genius
level and a few months after that its
powers will be incalculable ...”
Should Computer Vision follow from our
understanding of Human Vision?
Yes
&
No
1.
Who would ever be crazy enough to even try creating machine vision?
2.
Human vision “works”, and copying is easier than creating.
3.
Secondary benefit – in trying to mimic human vision, we learn about it.
1.
Why limit oneself to human vision when there is even greater diversity in
biological vision
2.
Why limit oneself to biological vision when there may be greater diversity in
sensing mechanism?
3.
Biological vision systems evolved to provide functions for “specific” tasks
and “specific” environments. These may differ for machine systems
4.
Implementation – hardware is different, and synthetic vision systems may use
different techniques/methodologies that are more appropriate to computational
Hermann Grid
Scan your eyes over the figure. Do you see the gray spots at the
intersections? Stare at one of them and it will disappear.
How many red X’s are there?
Type it in the chat when you
How many red X’s are there?
Type it in the chat when you
The Near Future: Ubiquitous Vision
• Digital video has become very
inexpensive.
• It’s widely embedded in cell
phones, cars, games, etc.
• 99.9% of digitized video isn’t
seen by a person.
• That doesn’t mean that only
0.1% is important!
• And there’s an enormous
amount of image and video
content on the internet…
Applications: touching your life
• Optical Character
Recognition
• Football
• Movies
• Surveillance
• HCI – hand gestures
• Aids to the blind
• Face recognition &
biometrics
• Road monitoring
• Industrial inspection
• Virtual Earth; street view
• Robotic control
• Autonomous driving
• Space: planetary
exploration, docking
• Medicine – pathology,
surgery, diagnosis
• Microscopy
• Military
• Remote Sensing
• Digital photography
• Google Goggles
• Video games
Vision to explore the world
Image from Microsoft Virtual Earth
(see also: Google Earth)
Vision to explore other worlds
• Panorama stitching
• 3D terrain modeling
• Obstacle detection
• Position tracking
• For more, read “Computer Vision on Mars“ by Matthies et al.
Vision to recognize objects
– Point & Find, Nokia
– SnapTell.com (now Amazon)
– Google Photos
– Apple Photos
Object recognition (in supermarkets)
LaneHawk by EvolutionRobotics (now part of iRobot)
“A smart camera is flush-mounted in the checkout lane, continuously watching
for items. When an item is detected and recognized, the cashier verifies the
quantity of items that were found under the basket, and continues to close the
transaction. The item can remain under the basket, and with LaneHawk, you are
assured to get paid for it… “
Amazon Go
1. Turn-style entry. Consumer
scans in with Amazon App on
smartphone
2. Consumer goes around the
store, picks up items, adds to
bag, shops like normal
Vision to look at people
• Digital cameras, phones, Facebook, Google
Photos, Snapchat etc.
Face Detection
Vision-based biometrics
“How the Afghan Girl was Identified by Her Iris Patterns” Read the
story
1984
Age 12
2002
Age 30
Login without a password…
Fingerprint scanners on
With great power comes great
The Matrix movies, ESC Entertainment, XYZRGB, NRC
Vision for entertainment:
Vision for entertainment:
motion capture
Facial
motion
capture
Vision for entertainment
• Football first down line
Vision for augmented reality
• AR Toolkit
• Blippar
• Magic Leap
Vision for augmented reality
• Text detection, localization, and translation,
then render with similar font
Vision for augmented reality
Vision for smart cars
• Mobileye
Vision-based interaction (and games)
Nintendo Wii has
camera-based IR
tracking built in.
Digimask
: put your face on a 3D avatar.
Xbox
Kinect
Vision from your perspective
Vision for medicine
Image guided surgery
Grimson et al., MIT
3D imaging
MRI, CT
Vision for Science
Molecular Reconstruction from
Vision to bridge communication
between physical and digital worlds:
Optical Character Recogntion (OCR)
Digit recognition, AT&T labs
http://www.research.att.com/~yann/
Or more recent, see blog post about
Dropbox OCR
License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition