Chapter 3 Cross-modal perception and applications to rendering
3.1.1 The human visual system
The human eye is composed of three tissue layers. These are the sclera, the choroid and the retina (see Figure 3.1). The sclera is the exterior white cover of the eye and it has an opening at the front, which is known as the cornea. The choroid is the intermediate layer between the sclera and the retina and is responsible for the formation of the iris at the front of the eyeball. The iris is the colourful circular area of the eye that adjusts the diameter of the concentric circular opening in its middle, known as the pupil. The pupil controls the amount of light that is allowed to pass into the interior of the eye and stimulate the retina, which is located at the back of the eye.
The retina is the curved surface that covers the interior of the eye. This surface is composed of ganglion and photoreceptor cells. Ganglion cells receive visual signals from the photoreceptor cells and transfer them to the thalamus and hypothalamus for further processing. Two kinds of photoreceptor cells co-exist in the retina: the rods and the cones. Rods operate in dark ambient conditions (scotopic region) and their spatial acuity is very low. In their operating range, rods are capable of detecting small variations of light intensity. Unlike rods, cones are able to distinguish colour information and have much higher spatial acuity. There are three different types of cones. These mediate the perception of long wavelength light (red), middle wavelength light (green) and short wavelength light (blue). Cones mainly operate in brighter light conditions than the rods, beginning from the photopic region [Fer01].
The foveal region or fovea is located in the center of the retina. This region contains the highest concentration of cones in the eye and thus provides the best possible visual acuity, also known as foveal vision. The retina contains around 100 million rods, located mostly in the periphery of the retina, and around 5 million cones [Pug88].
Figure 3.1: Basic anatomy of the Human Eye. Definitions and the original image can be found in Ferwerda’s study [Fer01].
Although the human eye is a highly sophisticated sensory organ, it has specific limitations in terms of its visual acuity. These limitations arise in situations where the human visual system (HVS) cannot perceive very high resolution details, or where the temporal nature of the visual stimulus does not permit its full processing and perception. In the spatial domain, when watching an object, one degree of the viewing scene is projected across 288 µm of the retina's surface, where approximately 120 cones are located. This means that a repetitive pattern of alternating black and white stripes displayed within less than a degree of the viewing space can no longer be perceived as a striped texture but rather as a blurred uniform grey region [Nag80]. This typical example shows that the spatial resolution of the HVS is limited. Depending on the distance d of the object from the eye, the spatial resolution θ in degrees can be given as:
$$\tan\left(\frac{\theta}{2}\right) = \frac{x}{2d}, \tag{3.1}$$
where x is the height of the object in meters (see also Figure 3.1).
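As an illustration, the relation in Equation 3.1 can be solved for the angle and combined with the figures quoted above (288 µm of retina and approximately 120 cones per degree). The following is a minimal sketch under those stated figures only. The function names and the example object size and distance are hypothetical, and the cone density is treated as uniform, which is a simplification.

```python
import math

# Figures quoted in the text (approximate values).
RETINA_UM_PER_DEGREE = 288.0   # retinal extent covered by one degree
CONES_PER_DEGREE = 120.0       # cones located across that extent


def visual_angle_deg(x: float, d: float) -> float:
    """Angle theta (degrees) from Equation 3.1: tan(theta/2) = x / (2d).

    x is the object height and d the viewing distance, both in meters.
    """
    if x < 0 or d <= 0:
        raise ValueError("x must be non-negative and d must be positive")
    return math.degrees(2.0 * math.atan(x / (2.0 * d)))


def retinal_extent_um(theta_deg: float) -> float:
    """Approximate retinal extent (micrometers) covered by theta degrees."""
    return theta_deg * RETINA_UM_PER_DEGREE


def cones_spanned(theta_deg: float) -> float:
    """Approximate number of cones across theta degrees (uniform density assumed)."""
    return theta_deg * CONES_PER_DEGREE


if __name__ == "__main__":
    # Hypothetical example: a 5 cm tall object viewed from 2 m.
    theta = visual_angle_deg(x=0.05, d=2.0)
    print(f"visual angle: {theta:.3f} degrees")
    print(f"retinal extent: {retinal_extent_um(theta):.1f} um")
    print(f"cones spanned: {cones_spanned(theta):.0f}")
```

For a fixed object height, doubling the viewing distance roughly halves the angle for small angles, and with it the number of cones available to sample the object.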
The temporal sensitivity of the human eye refers to the amount of information that can be captured and perceived when watching moving objects or imagery frames over time. In this case, visual acuity is limited by the inability to process the entirety of a dynamically varying scene. Information loss due to temporal restrictions can be partially mitigated by the ability of the eyes to move (saccadic eye movements) and foveate objects at high speeds in the visual domain. According to Henderson and Pierce [HP08], the human eye is capable of performing saccadic movements three times per second when watching a scene, while the gaze fixation time at different scene locations can be at least 20 ms. During the fixation time, the eye foveates the object and extracts all its visual information (colour, shape, size, etc.). The temporal sensitivity of the human eye is 26 Hz, or 26 frames per second, when watching a video [FN96]. The impression of continuous visual stimuli is attributed to a phenomenon known as flicker fusion. According to this, the retina continues to perceive the same image for a time interval of 1/20 to 1/5 of a second after its first presentation [Rog25].
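To put the temporal figures on a common scale, the short sketch below converts the quoted rates into time intervals. It uses only the values cited in the paragraph above (26 Hz, three saccades per second, 20 ms minimum fixation). The function and constant names are hypothetical, and treating saccades as evenly spaced in time is a simplifying assumption.

```python
# Values quoted in the text.
TEMPORAL_SENSITIVITY_HZ = 26.0   # frames per second [FN96]
SACCADES_PER_SECOND = 3.0        # saccadic movements per second [HP08]
MIN_FIXATION_MS = 20.0           # minimum gaze fixation time [HP08]


def period_ms(rate_hz: float) -> float:
    """Duration of one cycle, in milliseconds, for a rate given in Hz."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1000.0 / rate_hz


def frames_per_saccade_interval() -> float:
    """Frames shown at 26 Hz between two saccades (evenly spaced saccades assumed)."""
    return TEMPORAL_SENSITIVITY_HZ / SACCADES_PER_SECOND


if __name__ == "__main__":
    frame_ms = period_ms(TEMPORAL_SENSITIVITY_HZ)
    saccade_ms = period_ms(SACCADES_PER_SECOND)
    print(f"frame period at 26 Hz: {frame_ms:.1f} ms")
    print(f"interval between saccades: {saccade_ms:.1f} ms")
    print(f"frames per saccade interval: {frames_per_saccade_interval():.2f}")
    print(f"minimum fixation shorter than one frame: {MIN_FIXATION_MS < frame_ms}")
```

Under these figures, a single frame at 26 Hz lasts longer than the shortest fixation, and several frames pass between consecutive saccades.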
The number of photoreceptor cells in the retina far exceeds the number of ganglion cells that transfer signals to the brain. The HVS resolves this imbalance by selectively restricting the amount of visual information that is sent for further processing at any given time. The distribution of cones, which are more densely packed in the foveal region, also enhances the selective processing of visual stimuli [Fer01]. When an object is foveated by the eye, vision is sharper and more details are sent through the nervous system for further processing. Peripheral vision, on the other hand, relies on photoreceptor cells that are not located in the fovea; it provides blurred visual stimulation, and less information is sent for further processing.