Elements of Visual Perception

(1)

Resmi N.G.

Reference: Digital Image Processing, Rafael C. Gonzalez and Richard E. Woods

(2)

(3)

Human Eye- nearly a sphere; diameter 20mm approx. Three membranes enclose it:

Cornea and sclera – the outer cover
Choroid
Retina

Cornea – tough transparent tissue that covers the anterior surface of the eye.

Sclera – opaque membrane enclosing the remainder of the optical globe.

Choroid – directly below the sclera

(4)

Choroid coat is heavily pigmented, which helps to reduce the amount of extraneous light entering the eye and the backscatter within the optical globe.

Choroid is divided into the ciliary body and the iris diaphragm at its anterior extreme.

The diaphragm contracts and expands to control the amount of light that enters the eye.

Pupil – the central opening of the iris. Its diameter varies from approximately 2 to 8 mm.

Front of the iris – contains the visible pigment of the eye.

(5)

Lens – made up of concentric layers of fibrous cells and is suspended by fibers that attach to the ciliary body.

It contains 60-70% water, about 6% fat and a large amount of protein.

It is colored by a slightly yellow pigmentation.

Excessive clouding of lens leads to cataract resulting in poor color discrimination and loss of clear vision.

It absorbs 8% of visible light spectrum; higher absorption occurs at shorter wavelengths.

(6)

Retina –

When the eye is properly focused, light from an object outside the eye is imaged on the retina.

Pattern vision – afforded by the distribution of light receptors over the surface of the retina.

Two classes of receptors:

Cones
Rods

(7)

Cones – 6-7 million cones in each eye, located primarily in the central portion of the retina called the fovea.
- highly sensitive to color
- each cone is connected to its own nerve end
- can resolve fine detail
- muscles controlling the eye rotate the eyeball until the image of the object falls on the fovea.

(8)

Rods – 75-150 million rods distributed over the retinal surface.
- larger area of distribution
- several rods are connected to a single nerve end, which reduces the amount of detail they can resolve
- give an overall picture of the field of view
- not involved in color vision
- sensitive to low levels of illumination

Rod vision is dim-light (or scotopic) vision.

Blind spot – area of the retina without receptors.

(9)

Image formation in the eye

Principal difference between lens of eye and ordinary optical lens is that lens of eye is more flexible.

The radius of curvature of the anterior surface of the lens is greater than the radius of its posterior surface.

The shape of the lens is controlled by tension in fibres of the ciliary body.

(10)

To focus on farther objects, the controlling muscles cause lens to be relatively flattened.

To focus on nearby objects, these muscles allow the lens to become thicker.

(11)

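Focal length – distance between the center of the lens and the retina (varies from 17 mm to about 14 mm).

Example: for an object 15 m high viewed from a distance of 100 m, similar triangles give the height h of the retinal image: 15/100 = h/17, or h = 2.55 mm.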
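The ratio above can be checked with a few lines of code. This is a minimal sketch that reads 15/100 as an object 15 m high at a distance of 100 m (the units are an assumption, since the notes give only the numbers) and uses the 17 mm focal length; the function name is illustrative.

```python
def retinal_image_height_mm(object_height_m, object_distance_m, focal_length_mm=17.0):
    """Retinal image height from similar triangles: H / D = h / f."""
    return object_height_m / object_distance_m * focal_length_mm


# Assumed reading of the example: object 15 m high, 100 m away, f = 17 mm.
h = retinal_image_height_mm(15.0, 100.0)
print(f"h = {h:.2f} mm")
```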

(12)

Retinal image is focused primarily on the area of the fovea.

Perception then takes place by relative excitation of light receptors, which transform radiant energy into electrical impulses that are ultimately decoded by the brain.

(13)

Brightness Adaptation and Discrimination

Digital images are displayed as discrete set of intensities. So, ability of eye to discriminate between different intensity levels is important.

Subjective brightness (intensity as perceived by human visual system) is a logarithmic function of light intensity incident on the eye.

(14)

Visual system can adapt to large range of intensities by changing its overall sensitivity. This property is called brightness adaptation.

Total range of distinct intensity levels it can discriminate simultaneously is small.

The current sensitivity level of visual system for any given set of conditions is called brightness adaptation level.

(15)

(16)

Brightness Discrimination

Experiment to determine ability of human visual system for brightness discrimination:

An opaque glass is illuminated from behind using a light source of intensity I.

(17)

The ratio of the increment threshold to the background intensity, ∆I_c/I, is called the Weber ratio, where ∆I_c is the increment of illumination that is just discriminable against the background illumination I.

When ∆I_c/I is small, a small percentage change in intensity is discriminable, and hence there is good brightness discrimination.

When ∆I_c/I is large, a large percentage change in intensity is required, and hence there is poor brightness discrimination (at low levels of illumination).

Analogy: in a noisy environment you must shout to be heard, while a whisper works in a quiet room.
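A short sketch of how the Weber ratio is read. The intensity values below are hypothetical and chosen only for illustration; they are not measurements from the experiment.

```python
def weber_ratio(delta_i_c, background_i):
    """Weber ratio: just-discriminable increment divided by background intensity."""
    return delta_i_c / background_i


# Hypothetical values: a dim background needs a large relative change,
# a bright background only a small one.
dim = weber_ratio(delta_i_c=0.5, background_i=1.0)
bright = weber_ratio(delta_i_c=2.0, background_i=100.0)

print(f"dim background:    {dim:.2f}  (poor discrimination)")
print(f"bright background: {bright:.2f}  (good discrimination)")
```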

(18)

(19)

The brightness discrimination increases as background illumination increases.

As the eye roams about the image, a different set of incremental changes is detected at each new adaptation level.

The eye is thus capable of a much broader range of overall intensity discrimination.

(20)

Perceived brightness is not a simple function of intensity. Mach Bands:

(21)

(22)

(23)

Light and Electromagnetic Spectrum

(24)

Wavelength λ and frequency ν are related by the expression

\[ \lambda = \frac{c}{\nu} \]

where c is the speed of light.

Energy of the various components of the electromagnetic spectrum is given by

\[ E = h\nu \]

where h is Planck's constant.
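As a worked example of the two relations λ = c/ν and E = hν, the sketch below computes the frequency and photon energy for one wavelength. The 550 nm value (green light) is an assumption chosen for illustration; the constants are the exact SI values.

```python
C = 299_792_458.0       # speed of light in m/s
H = 6.62607015e-34      # Planck's constant in J*s


def frequency_hz(wavelength_m):
    """nu = c / lambda."""
    return C / wavelength_m


def photon_energy_j(wavelength_m):
    """E = h * nu."""
    return H * frequency_hz(wavelength_m)


wavelength = 550e-9     # assumed example: 550 nm, inside the visible band
nu = frequency_hz(wavelength)
energy = photon_energy_j(wavelength)
print(f"nu = {nu:.3e} Hz, E = {energy:.3e} J")
```

Shorter wavelengths give higher frequencies and therefore more energy per photon.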

(25)

Electromagnetic wave is a stream of massless particles, each traveling in a wavelike pattern and at the speed of light.

Each massless particle contains a certain amount of energy.

Each bundle of energy is called a photon.

Light is a particular type of electromagnetic radiation that can be seen and sensed by human eye.

(26)

Visible band – from violet to red (chromatic light).

The colors that we perceive in an object are determined by the nature of the light reflected from the object.

Three basic quantities describe the quality of a chromatic light source:

Radiance
Luminance
Brightness

(27)

Radiance – total amount of energy that flows from the light source (measured in Watts).

Luminance – gives a measure of the amount of energy an observer perceives from a light source (measured in lumens).

Brightness – intensity as perceived by human visual system.

(28)

- Luminance is the amount of visible light that comes to the eye from a surface.
- Illuminance is the amount of light incident on a surface.
- Reflectance is the proportion of incident light that is reflected from a surface.
- Lightness is the perceived reflectance of a surface.
- Brightness is the perceived intensity of light coming from the image itself, and is also defined as perceived luminance.

(29)

Achromatic or monochromatic light – light that is void of color, its only attribute being the intensity (ranges from black to grays to white).

(30)

Image Sensing and Acquisition

Images are generated by combination of an illumination source and reflection or absorption of energy from that source by objects to be imaged.

Three principal sensor arrangements to transform illumination energy into digital images:

Single imaging sensor
Line sensor
Array sensor

(31)

(32)

Image Acquisition Using Single Sensor

Incoming energy is converted to a voltage by the combination of input electric power and sensor material responsive to the type of energy being detected.

Response of the sensor is the output voltage waveform, which has to be digitized.

(33)

e.g., photodiode

Constructed of silicon materials

(34)

To generate 2D image using single sensor, there must be relative displacements in both x and y directions between sensor and the area to be imaged.

(35)

Another example of imaging with single sensor

Place a laser source coincident with the sensor

Moving mirrors are used to control the outgoing beam in a scanning pattern and to direct the laser signal onto the sensor.

(36)

Image Acquisition Using Sensor Strips

Sensor strip has an in-line arrangement of sensors.

The sensor strip provides imaging elements in one direction.

Motion perpendicular to the strip provides imaging in the other direction, thereby completing the 2D image.

(37)

(38)

Image Acquisition Using Sensor Arrays

(39)

Since the sensor array is 2D, a complete image can be obtained by focusing the energy onto the surface of the array.

Imaging system collects the incoming energy from an illumination source and focuses it onto an image plane.

The front end of the imaging system is a lens (if illumination is light), which projects the viewed scene onto the lens focal plane.

(40)

The sensor array coincident with the focal plane produces output proportional to the intensity of light received at each sensor.

This output is then digitized by another section of the imaging system.

(41)

A Simple Image Formation Model

Images are denoted using two-dimensional functions of the form f(x,y).

The value of f is a positive scalar quantity.

When an image is generated from a physical process, its values are proportional to energy radiated by a physical source. Hence, f(x,y) must be nonzero and finite.

(42)

The function f(x,y) is characterized by two components:

Illumination component: The amount of source illumination incident on the scene being viewed. It is denoted by i(x,y).

Reflectance component: The amount of illumination reflected by the objects in the scene. It is denoted by r(x,y).

f(x,y) is expressed as a product of these two components.

(43)

f(x,y) = i(x,y) r(x,y)

where 0 < i(x,y) < ∞

and 0 < r(x,y) < 1

r(x,y) = 0 corresponds to total absorption and r(x,y) = 1 to total reflectance.

The nature of i(x,y) is determined by the illumination source.

The nature of r(x,y) is determined by the characteristics of the imaged objects.

For images formed by transmission of the illumination through a medium (as in X-ray imaging), reflectivity is replaced by transmissivity.
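The product model can be sketched as an element-by-element multiplication of an illumination array and a reflectance array. The array values below are hypothetical, and the function name is illustrative only.

```python
def form_image(illumination, reflectance):
    """f(x,y) = i(x,y) * r(x,y), computed pixel by pixel."""
    image = []
    for i_row, r_row in zip(illumination, reflectance):
        row = []
        for i, r in zip(i_row, r_row):
            # Bounds of the model: 0 < i < infinity and 0 < r < 1.
            if not (i > 0 and 0 < r < 1):
                raise ValueError("illumination or reflectance out of range")
            row.append(i * r)
        image.append(row)
    return image


# Hypothetical scene: brighter illumination on the top row,
# a more reflective surface in the left column.
illum = [[100.0, 100.0], [50.0, 50.0]]
refl = [[0.5, 0.25], [0.5, 0.25]]
f = form_image(illum, refl)
print(f)
```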

(44)

The intensity of a monochrome image at any point (x₀,y₀) is called the gray level l of the image at that point.

l = f (x₀,y₀)

l lies in the range L_min ≤ l ≤ L_max

L_min : should be positive
L_max : should be finite

L_min = i_min r_min
L_max = i_max r_max

(45)

Image Sampling and Quantization

The output of most sensors is a continuous voltage waveform whose amplitude and spatial behaviour are related to the physical phenomenon being sensed.

This continuous sensed data has to be converted to digital form.

This involves two processes:

Sampling
Quantization

(46)

Basic Concepts

An image may be continuous with respect to the x- and y-coordinates and also in amplitude.

To convert it to digital form, the function must be sampled in both coordinates and in amplitude.

Digitizing the coordinate values is called sampling.

Digitizing the amplitude values is called quantization.

(47)

(48)

To sample the plot of amplitude values of the continuous image along AB, take equally spaced samples along AB. This set of discrete locations gives the sampled function.

The sample values still span a continuous range of gray-level values. These values must also be converted to discrete quantities (quantization) to obtain a digital image.

The gray level scale can be divided into a number of discrete levels ranging from black to white.

(49)

In the figure, one of the eight discrete gray levels is assigned to each sample.

Starting at the top of the image and carrying out this procedure line by line for the entire image will produce a two-dimensional digital image.
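The two steps for a single scan line can be sketched as follows. The intensity profile (x squared on the interval 0 to 1) and the choice of five samples are assumptions made for illustration; eight gray levels are used to match the figure description.

```python
def sample(signal, start, stop, n_samples):
    """Sampling: evaluate the continuous signal at equally spaced locations."""
    step = (stop - start) / (n_samples - 1)
    return [signal(start + k * step) for k in range(n_samples)]


def quantize(value, v_min, v_max, levels):
    """Quantization: map a value in [v_min, v_max] to one of `levels` gray levels."""
    t = (value - v_min) / (v_max - v_min)
    return max(0, min(levels - 1, int(t * levels)))


def scan_line(x):
    """Hypothetical continuous intensity profile along the line AB."""
    return x * x


samples = sample(scan_line, 0.0, 1.0, 5)                  # still continuous values
digital = [quantize(s, 0.0, 1.0, 8) for s in samples]     # discrete gray levels
print(samples)
print(digital)
```

Repeating this for every scan line of the image yields the rows of the digital image.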

(50)

Method of sampling is determined by the sensor arrangement used to generate the image.

Single sensing element combined with mechanical motion
Sampling – by selecting the number of individual mechanical increments at which the sensor is activated to collect the data.

Sensing strip
Sampling – the number of sensors in the strip limits sampling in one direction.

Sensor array
Sampling – the number of sensors in the array limits sampling in both directions.

(51)

Representing Digital Images

The result of sampling and quantization is a matrix of real numbers.

Let the image f(x,y) be sampled such that the digital image has M rows and N columns.

(52)

(53)

The complete MxN image can be represented using matrix form.

Each element of the matrix array is called an image element, picture element or pixel.

\[
f(x,y) = \begin{bmatrix}
f(0,0) & f(0,1) & \cdots & f(0,N-1) \\
f(1,0) & f(1,1) & \cdots & f(1,N-1) \\
\vdots & \vdots & & \vdots \\
f(M-1,0) & f(M-1,1) & \cdots & f(M-1,N-1)
\end{bmatrix}
\]

(54)

\[
A = \begin{bmatrix}
a_{0,0} & a_{0,1} & \cdots & a_{0,N-1} \\
a_{1,0} & a_{1,1} & \cdots & a_{1,N-1} \\
\vdots & \vdots & & \vdots \\
a_{M-1,0} & a_{M-1,1} & \cdots & a_{M-1,N-1}
\end{bmatrix}
\]

where a_{i,j} = f(x=i, y=j) = f(i,j).

The sampling process may be viewed as partitioning the xy-plane into a grid.

f(x,y) is a digital image if (x,y) are integers from Z² and f is a function that assigns a gray-level value to each distinct pair of coordinates (x,y).

(55)

The number of distinct gray levels allowed for each pixel is an integer power of 2:

L = 2^k

The range of values spanned by the gray scale is called the dynamic range of an image.

High dynamic range – high contrast image
Low dynamic range – low contrast image

The number of bits required to store a digitized image is b = M x N x k.

When M = N, b = N²k.
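The storage formula b = M x N x k, with L equal to 2 raised to the power k, can be evaluated directly. The 1024 x 1024, 8-bit case below is simply an example size; the function names are illustrative.

```python
def gray_levels(k):
    """Number of distinct gray levels for k bits per pixel: L = 2**k."""
    return 2 ** k


def storage_bits(m, n, k):
    """Bits needed to store an M x N image with k bits per pixel."""
    return m * n * k


bits = storage_bits(1024, 1024, 8)
print(gray_levels(8), "gray levels")
print(bits, "bits =", bits // 8, "bytes")
```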

(56)

Spatial and Gray-Level Resolution

Sampling determines the spatial resolution of an image, which is the smallest discernible detail in an image.

Resolution is the smallest number of discernible line pairs per unit distance. A line pair consists of a line and its adjacent space.

Resolution can also be represented using number of pixel columns (width) and number of pixel rows (height).

(57)

Resolution can also be defined as the total number of pixels in an image, given as number of megapixels.

The more pixels there are within a fixed area, the higher the resolution.

Gray-level resolution refers to the smallest discernible change in gray level.

(58)

Consider an image of size 1024 x 1024 pixels whose gray levels are represented by 8 bits.

The image can be subsampled to reduce its size.

Subsampling is done by deleting an appropriate number of rows and columns from the original image.

e.g., a 512 x 512 image can be obtained by deleting every other row and column from the 1024 x 1024 image. The number of gray levels is kept constant.
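Deleting every other row and column amounts to keeping every second sample in each direction. A minimal sketch on a small 4 x 4 array (the array contents are hypothetical; a 1024 x 1024 image is handled the same way):

```python
def subsample(image, factor=2):
    """Keep every `factor`-th row and column; gray levels are left unchanged."""
    return [row[::factor] for row in image[::factor]]


# Hypothetical 4 x 4 image whose pixel values equal their linear index.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
small = subsample(image)
print(small)
```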

(59)

(60)

(61)

The number of samples is kept constant and the number of gray levels is reduced.

(62)

The number of bits is reduced keeping spatial resolution constant.

(63)

False contouring – When the bit depth becomes insufficient to accurately represent a continuous gradation of tone, the continuous gradient appears as a series of discrete steps or bands. This is termed false contouring.
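The banding effect can be reproduced by requantizing a smooth ramp to fewer bits while leaving the number of samples unchanged. This sketch assumes an 8-bit input and keeps the output on the 0 to 255 scale; the ramp is a hypothetical test pattern.

```python
def reduce_gray_levels(image, k):
    """Requantize an 8-bit image to 2**k gray levels (1 <= k <= 8)."""
    shift = 8 - k
    # Dropping the low-order bits merges neighbouring gray levels into one.
    return [[(p >> shift) << shift for p in row] for row in image]


ramp = [list(range(0, 256, 16))]        # one row: 0, 16, ..., 240
coarse = reduce_gray_levels(ramp, 2)    # only 4 gray levels remain
print(coarse[0])
```

The smooth ramp of 16 distinct values collapses into 4 flat steps, which is what appears as false contours in a real image.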

(64)

Images can be of low detail, intermediate detail, or high detail depending on the values of N and k.

(65)

Each point in the Nk-plane represents an image having values of N and k equal to the coordinates of that point.

Isopreference curves – curves in the Nk-plane that correspond to images of equal subjective quality.

(66)

The quality of the images tends to increase as N and k are increased.

A decrease in k generally increases the apparent contrast of an image.

For images with a larger amount of detail, only a few gray levels are needed.

(67)

Aliasing and Moire Patterns

Aliasing – the distortion that results from undersampling, when the signal reconstructed from the samples is different from the original continuous signal.

Shannon Sampling Theorem – to avoid aliasing, the sampling rate should be greater than twice the highest frequency present in the signal.

(68)

Sine Wave

(69)

Sine Wave sampled 1.5 times per cycle

- results in a lower frequency wave
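The 1.5 samples-per-cycle case can be verified numerically. In this sketch the sine frequency is normalised to 1 (an assumption for convenience), so the sampling rate is 1.5 and the expected alias lies at |f - fs| = 0.5, with inverted sign.

```python
import math


def sample_sine(freq, sample_rate, n_samples):
    """Samples of sin(2*pi*freq*t) taken at t = n / sample_rate."""
    return [math.sin(2 * math.pi * freq * n / sample_rate) for n in range(n_samples)]


f = 1.0             # frequency of the original sine wave
fs = 1.5 * f        # 1.5 samples per cycle, below the required rate of 2*f
original = sample_sine(f, fs, 12)

alias_freq = abs(f - fs)    # 0.5: the lower frequency seen after sampling
alias = [-s for s in sample_sine(alias_freq, fs, 12)]

# The two sample sequences are indistinguishable.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, alias))
print("alias frequency:", alias_freq, "identical samples:", same)
```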

(70)

For an image, aliasing occurs if the resolution is too low.

To reduce the aliasing effects on an image, its high frequency components are reduced prior to sampling by blurring the image.

Moire Pattern – Interference patterns created when

(71)

(72)

Zooming and Shrinking Digital Images

Zooming – Oversampling
Shrinking – Undersampling

Zooming involves two steps:

Creation of new pixel locations
Assigning gray levels to new pixel locations

Methods for assigning gray levels:

Nearest neighbour interpolation
Pixel replication

(73)

Nearest neighbour interpolation

Size of zoomed image need not be an integer multiple of size of original image.
Fits a finer grid over the original image.
Gray level corresponding to the closest pixel in original image is assigned as the gray level of new pixel.
Expand the grid to its specified (zoomed) size to obtain the zoomed image.

Pixel replication

Special case of nearest neighbour interpolation
Size of zoomed image is an integer multiple of size of original image
Duplication of columns and rows is done the required number of times.
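Both zooming methods can be sketched with one routine. The mapping from a new pixel to its source pixel used here (integer floor of the scaled coordinate) is one common convention and is an assumption; other implementations round to the nearest centre instead.

```python
def zoom_nearest(image, new_rows, new_cols):
    """Zoom by assigning each new pixel the gray level of its source pixel."""
    rows, cols = len(image), len(image[0])
    zoomed = []
    for r in range(new_rows):
        src_r = r * rows // new_rows            # source row for new row r
        zoomed.append([image[src_r][c * cols // new_cols] for c in range(new_cols)])
    return zoomed


image = [[1, 2],
         [3, 4]]
replicated = zoom_nearest(image, 4, 4)      # integer factor: pixel replication
non_integer = zoom_nearest(image, 3, 3)     # factor 1.5: general nearest neighbour
print(replicated)
print(non_integer)
```

With an integer zoom factor every row and column is simply duplicated, which is the pixel replication special case.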

(74)

Bilinear interpolation – uses 4 nearest neighbours of a point.

(75)

Linear interpolation: a straight line is fitted between the 2 known points.

This can be understood as a weighted average, where the weights are inversely related to the distance from the end points to the unknown point.

(76)

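For end points x₀ and x₁ and an unknown point x between them, the weights are (x₁ − x)/(x₁ − x₀) and (x − x₀)/(x₁ − x₀), which are the normalized distances between the unknown point and each of the end points.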

(77)

Interpolating in x-direction

(78)

(79)

Bilinear interpolation – uses 4 nearest neighbours of a point.

The gray level assigned to the new pixel is given by v(x’,y’) = ax’ + by’ + cx’y’ + d

The coefficients are determined from the four equations in four unknowns written using the four nearest neighbours of (x’,y’).
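Rather than solving the four equations explicitly, the same result can be obtained by two linear interpolations along one direction followed by one along the other. The sketch below assumes x indexes rows and y indexes columns, and that (x, y) lies inside the image; both are conventions chosen for this example.

```python
def bilinear(image, x, y):
    """Gray level at real-valued (x, y) from its 4 nearest neighbours."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(image) - 1)        # clamp at the image border
    y1 = min(y0 + 1, len(image[0]) - 1)
    dx, dy = x - x0, y - y0
    # Interpolate along y on the two neighbouring rows, then along x.
    upper = image[x0][y0] * (1 - dy) + image[x0][y1] * dy
    lower = image[x1][y0] * (1 - dy) + image[x1][y1] * dy
    return upper * (1 - dx) + lower * dx


image = [[10, 20],
         [30, 40]]
centre = bilinear(image, 0.5, 0.5)
print(centre)
```

Expanding the products shows the result has the form ax' + by' + cx'y' + d, with the coefficients fixed by the four neighbouring gray levels.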

(80)

(81)

Shrinking

Shrinking by a non-integer factor

Expands the grid to fit over the original image.

Then performs gray-level nearest neighbour or bilinear interpolation.

(82)

Basic Relationships Between Pixels

Neighbours of a pixel

4-neighbours
diagonal-neighbours
8-neighbours

Adjacency

4-adjacency
8-adjacency
m-adjacency

(i-1,j-1) (i-1,j) (i-1,j+1)
(i,j-1) (i,j) (i,j+1)
(i+1,j-1) (i+1,j) (i+1,j+1)

(83)

Neighbours of a pixel

A pixel p at (x,y) has 4 horizontal and vertical neighbours whose coordinates are given by:

(x+1,y), (x-1,y), (x,y+1) and (x,y-1)

This set of pixels, called the 4-neighbours of p, is denoted by N₄(p).

Each pixel is a unit distance from p, and some of them may lie outside the image if p is on the border of the image.

(84)

The 4 diagonal neighbours are given by:

(x+1, y+1), (x+1, y-1), (x-1, y+1) and (x-1, y-1)

This set of pixels is denoted by N_D(p).

These points together with the 4-neighbours are called the 8-neighbours of p, denoted by N₈(p).
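The three neighbour sets translate directly into code. In this sketch a pixel is an (x, y) tuple and the sets are not clipped to the image, so border pixels may return coordinates that fall outside it; the function names are illustrative.

```python
def n4(p):
    """4-neighbours: horizontal and vertical neighbours of p."""
    x, y = p
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}


def nd(p):
    """Diagonal neighbours of p."""
    x, y = p
    return {(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)}


def n8(p):
    """8-neighbours: union of the 4-neighbours and the diagonal neighbours."""
    return n4(p) | nd(p)


print(sorted(n8((1, 1))))
```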

(85)

Adjacency, Connectivity, Regions and Boundaries

Connectivity – Two pixels are connected if they are neighbours and if their gray levels satisfy a specified criterion of similarity (e.g., if their gray levels are equal).

Adjacency – Let V be the set of gray-level values used to define adjacency.

In binary image, V={1}, for adjacency of pixels with value 1.

a) 4-adjacency: Two pixels p and q with values from V are 4-adjacent if q is in the set N₄(p).

(86)

b) 8-adjacency: Two pixels p and q with values from V are 8-adjacent if q is in the set N₈(p).

c) m-adjacency: Two pixels p and q with values from V are m-adjacent if:

q is in N₄(p), or

q is in N_D(p) and the set N₄(p)∩N₄(q) has no pixels whose values are from V.
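The m-adjacency test can be written out as below. The binary image is a small hypothetical example with V = {1}; the helper names are illustrative, and pixels are (row, column) tuples.

```python
def n4(p):
    x, y = p
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}


def nd(p):
    x, y = p
    return {(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)}


def in_v(image, p, v):
    """True if p lies inside the image and its value belongs to V."""
    x, y = p
    return 0 <= x < len(image) and 0 <= y < len(image[0]) and image[x][y] in v


def m_adjacent(image, p, q, v=frozenset({1})):
    if not (in_v(image, p, v) and in_v(image, q, v)):
        return False
    if q in n4(p):
        return True
    if q in nd(p):
        # Diagonal neighbours count only if no shared 4-neighbour has a value in V.
        return not any(in_v(image, s, v) for s in n4(p) & n4(q))
    return False


image = [[0, 1, 1],
         [0, 1, 0],
         [0, 0, 1]]
print(m_adjacent(image, (1, 1), (0, 2)))   # shared 4-neighbour (0, 1) has value 1
print(m_adjacent(image, (1, 1), (2, 2)))   # no shared 4-neighbour has value 1
```

The first pair is 8-adjacent but not m-adjacent, which is how m-adjacency removes the ambiguity of multiple 8-paths.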

(87)

(88)

Two image subsets S₁ and S₂ are adjacent if some pixel in S₁ is adjacent to some pixel in S₂.

A digital path or curve from pixel p with coordinates (x,y) to pixel q with coordinates (s,t) is a sequence of distinct pixels with coordinates (x₀,y₀), (x₁,y₁), …, (x_n,y_n) where (x₀,y₀) = (x,y), (x_n,y_n) = (s,t) and pixels (x_i,y_i) and (x_{i-1},y_{i-1}) are adjacent for 1 ≤ i ≤ n. n is the length of the path.

(89)

We call the paths 4-, 8-, or m-paths depending on the type of adjacency.

(90)

Let S be a subset of pixels in an image.

Connectivity: Two pixels p and q are said to be connected in S if there exists a path between them consisting entirely of pixels in S.

Connected Component: For any pixel p in S, the set of pixels that are connected to it in S is called a connected component of S.

Connected Set: If the set S has only one connected component, then S is called a connected set.
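A connected component can be collected with a breadth-first search from a starting pixel. This sketch assumes 8-adjacency and takes S to be the pixels whose values are in V = {1}; the image is a hypothetical example.

```python
from collections import deque


def connected_component(image, start, v=frozenset({1})):
    """Pixels 8-connected to `start` through pixels whose values are in V."""
    rows, cols = len(image), len(image[0])
    if image[start[0]][start[1]] not in v:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                q = (x + dx, y + dy)
                inside = 0 <= q[0] < rows and 0 <= q[1] < cols
                if inside and q not in seen and image[q[0]][q[1]] in v:
                    seen.add(q)
                    queue.append(q)
    return seen


image = [[1, 1, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 1]]
first = connected_component(image, (0, 0))
second = connected_component(image, (2, 3))
print(sorted(first), sorted(second))
```

Here S has two connected components, so it is not a connected set.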

(91)

Region: Let R be a subset of pixels in an image. If R is a connected set, it is called a region of the image.

Boundary: Boundary of a region R is the set of pixels in the region that have one or more neighbours that are not in R.

It forms a closed path and is a global concept.

Edge: Edges are formed from pixels with derivative values that exceed a threshold. It is based on a measure of gray-level discontinuity at a point and is a local concept.

(92)

Distance Measures

For pixels p, q and z with coordinates (x,y), (s,t) and (v,w) respectively, D is a distance function if:

a) D(p,q) ≥ 0 (D(p,q) = 0 iff p = q)
b) D(p,q) = D(q,p)
c) D(p,z) ≤ D(p,q) + D(q,z)

(93)

The Euclidean distance between p and q is defined as

D_e(p,q) = [(x-s)² + (y-t)²]^(1/2)

D₄ distance (or city-block distance) between p and q is defined as

D₄(p,q) = |x-s| + |y-t|

Pixels with D₄ = 1 are 4-neighbours of (x,y).

D₈ distance (or chessboard distance) between p and q is defined as

D₈(p,q) = max(|x-s|, |y-t|)
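The three coordinate-based distances can be compared on one pair of pixels. The D₈ definition used here, max(|x-s|, |y-t|), is the standard chessboard distance; the pixel coordinates are an arbitrary example.

```python
import math


def d_euclidean(p, q):
    """Euclidean distance: [(x-s)^2 + (y-t)^2]^(1/2)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def d4(p, q):
    """City-block distance: |x-s| + |y-t|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def d8(p, q):
    """Chessboard distance: max(|x-s|, |y-t|)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


p, q = (0, 0), (3, 4)
print(d_euclidean(p, q), d4(p, q), d8(p, q))
```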

(94)

D₄ and D₈ distances between p and q are independent of any paths that might exist between the points because the distances involve only the coordinates of the points.

(95)

[Figure: pixels labelled with their distance from the centre pixel under the Euclidean distance (2-norm), the D₄ distance (city-block distance) and the D₈ distance (chessboard distance).]

(96)

D_m distance between p and q is defined as the length of the shortest m-path between the two points.

a) V = {1}

D_m = 2

(97)

b) V= {1}

(98)

c) V= {1}

Dm = 3

(99)

d) V= {1}

(100)

Image Operations on Pixels

Images are represented as matrices. Matrix division is not defined.

Arithmetic operations including division are defined between corresponding pixels in the images involved.

(101)

Linear and Non-linear Operations

Let H be an operator whose input and output are images.

H is said to be a linear operator if, for any two images f and g and any two scalars a and b,

H(af + bg) = aH(f) + bH(g)

e.g., adding 2 images.

Non-linear operation does not obey the above condition.
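The linearity condition H(af + bg) = aH(f) + bH(g) can be tested numerically on small images. The two operators below (scaling every pixel by three, and squaring every pixel) are hypothetical examples chosen for this sketch. A single passing check does not prove linearity, but a single failing one is enough to show an operator is non-linear.

```python
def combine(f, g, a, b):
    """Pixel-wise a*f + b*g for two images of equal size."""
    return [[a * fp + b * gp for fp, gp in zip(fr, gr)] for fr, gr in zip(f, g)]


def scale_by_three(image):
    """Hypothetical linear operator: multiply every pixel by 3."""
    return [[3 * p for p in row] for row in image]


def square(image):
    """Hypothetical non-linear operator: square every pixel."""
    return [[p * p for p in row] for row in image]


def obeys_linearity(h, f, g, a, b):
    """Check H(af + bg) == aH(f) + bH(g) for one choice of f, g, a, b."""
    return h(combine(f, g, a, b)) == combine(h(f), h(g), a, b)


f = [[1, 2], [3, 4]]
g = [[5, 6], [7, 8]]
print(obeys_linearity(scale_by_three, f, g, 2, -1))
print(obeys_linearity(square, f, g, 2, -1))
```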

(102)