Predictive coding theory also suggests a mode of visual processing different from the segregated one. The theory states that perceptual, cognitive and action-oriented processing follow a single general strategy, which uses top-down predictions to minimize prediction errors (Clark, 2012). This approach suggests that neuronal selectivity to a feature is not an intrinsic property but the result of interactions across levels of a processing hierarchy (Friston, 2003). Sensory neurons, rather than encoding features per se, encode an error signal, i.e., they feed forward to hierarchically higher areas the discrepancy between the actual input and the top-down expectation (Egner et al., 2010). According to the predictive coding model, predictions are relayed via feedback connections, whereas prediction errors are conveyed via feedforward connections (Rao and Ballard, 1999). Hosoya et al. (2005) showed that retinal ganglion cells' spatio-temporal receptive fields change dynamically with the visual scene; this result is in line with the view that the raw signal carried by the receptors is transformed as early as the retina, where the first interneurons encode deviations from predicted temporal and spatial structures (Srinivasan et al., 1982). Recent fMRI studies have also shown evidence in support of predictive coding in the visual cortex (Summerfield et al., 2008; Kok et al., 2012). For instance, Kok et al. (2012) found that the amplitude of the fMRI signal in early visual cortex was smaller when the stimulus was expected: when we see something that we expect, the prediction error encoded in the brain is smaller than when we see something unexpected. This mode of processing, however, appears to be at odds with several electrophysiological studies (see Koch and Poggio, 1999 for a commentary).
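The feedback/feedforward division of labor described above can be sketched as a toy, single-layer simulation in the spirit of Rao and Ballard (1999); this is an illustrative sketch only, and the matrix sizes, weights and learning rate are assumptions, not parameters from any of the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-layer predictive-coding loop: generative weights U map a
# higher-level cause estimate r to a predicted sensory input; the
# feedforward signal is the prediction error, and r is updated to
# reduce that error. All sizes and the learning rate are illustrative.
U = rng.normal(size=(8, 3))      # feedback (generative) weights
x = rng.normal(size=8)           # sensory input
r = np.zeros(3)                  # higher-level cause estimate
lr = 0.02                        # small step size for stability

errors = []
for _ in range(200):
    prediction = U @ r           # top-down prediction via feedback
    error = x - prediction       # prediction error, fed forward
    r += lr * U.T @ error        # gradient step that reduces the error
    errors.append(float(error @ error))
```

Because the update is gradient descent on the squared prediction error, the recorded error shrinks toward the residual that the generative model cannot explain; an "expected" input (one well captured by `U @ r`) therefore yields a small error signal, mirroring the reduced responses to expected stimuli described above.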
buffered in some mental store, where it is open to the person's introspection and, hence, available for speech and action. However, in some cases an individual might display behavior suggesting that a given stimulation was well perceived, yet deny any form of conscious experience. Consider the case of "blindsight", a neurological condition that can occur following wide-ranging lesioning of the primary visual cortex. Patients suffering from blindsight usually deny consciously perceiving stimulation in the contralesional visual hemifield. Yet, when asked to guess at the color, orientation, movement, or other attributes of a contralesional stimulus in a forced-choice situation, blindsight patients respond correctly significantly more often than would be predicted by chance (Weiskrantz, 2007). Similar results can be obtained from healthy individuals using visual masking. For example, in metacontrast masking a visual stimulus is briefly flashed on a screen and followed, after a variable delay, by a trailing, non-overlapping second stimulus (the mask) (Alpern, 1953; Enns & Di Lollo, 2000). When the delay is either very long or very short, perception of the first stimulus (the target) is not or only slightly hampered. Subjects report consciously perceiving the target and can accurately discriminate its attributes in a forced-choice situation. However, at intermediate delays (around 30 ms) subjects less often report conscious perception of the target stimulus, while forced-choice discrimination performance remains high (Lau & Passingham, 2006; Schwiedrzik, Singer, & Melloni, 2011).
The cells studied here were sensitive to the presence of motion but showed no selectivity for the form of the stimulus. 25% of all visually responsive cells in area STP were classified as belonging to this class. This group of cells was further categorized into unidirectional (39%), bidirectional (4%) and pandirectional (57%) cells. Tuning to direction varied in sharpness. For most cells, the angular change in direction required to reduce the response to half-maximal was between 45 and 70 degrees. The optimal directions of cells appeared clustered around the Cartesian axes (up/down, left/right and towards/away). The response latency varied between 35.0 and 126.4 ms (mean 90.9 ms). On average, cell responses showed a transient burst of activity followed by a tonic discharge maintained for the duration of stimulation. 83% of the motion-sensitive cells lacking form selectivity responded to any stimuli moved by the experimenter, but gave no response to the sight of the animal's own limb movements. The cells remained, however, responsive to external stimulation while the monkey's own hand was moving in view. Responses to self-induced movements were recovered if the monkey brought a novel object held in its hand into view. That the response discrimination between externally- and self-induced stimulation was not caused by differences in the visual appearance of the stimuli was confirmed in the second experiment, in which the monkey was trained to rotate a handle connected to a patterned cylinder in order to generate visual motion stimulation over a fixation point. 61% of the tested cells discriminated between pattern motion generated by the monkey and by the experimenter. It was shown that the monkey's motor activity as such (turning the handle without visible cylinder rotation) did not affect the cells' spontaneous activity. There was also some indication that the discriminative mechanism uses not only (motor) corollary discharges but also proprioceptive input.
These results also provided evidence that discriminative processing in STP remains plastic to the animal's lifetime experiences. Finally, the cells were studied for their responsiveness to image motion resulting from movements of external objects and from movements of the animal's own body (self-motion). 84% of the cells responded only to visual object-motion and failed to respond to visual motion resulting from the animal's self-motion. The experiments also revealed that area STP processes visual motion mostly in observer-relative terms, i.e. in reference to the perceiver itself.
Our enquiry into a parallel contribution to the processing of forms began with a time-based magnetoencephalographic (MEG) study (Shigihara and Zeki, 2013), which showed that two forms of increasing perceptual hierarchy (lines, and rhombuses constituted from them) activate V1 and the early visual areas of prestriate cortex within the same time frame (between 27 and 44 ms). Here, we extend that study by using functional magnetic resonance imaging (fMRI) to compensate for the relatively low spatial resolution of MEG and thus better localize the activity produced in prestriate cortex. In doing so, we also enlarged the repertoire of forms viewed. In our MEG study, we had chosen lines because of the ubiquity of OS cells in areas V1–V3, and rhombuses consisting of the same lines because they are perceptually more complex. In the present study, we added angles, partly because they are intermediate in perceptual hierarchy between lines and rhombuses and partly because rhombuses have angles as constituents, which have been considered potent stimuli for cells in V2 (Hegdé and Van Essen, 2000; Ito and Komatsu, 2004), thus raising the possibility that they may activate human V2 more strongly than lines. Our general hypothesis, derived from our MEG studies, was that all three forms derived from straight lines would activate the three visual areas with which we are principally concerned equally.
A paradigm to investigate how people process overall features and small details was proposed by Navon (1977), with a task requiring the identification of hierarchically organized stimuli, i.e. an overall (global) shape constructed from the accumulation of several smaller (local) shapes. If the smaller shapes are the same as the overall shape, the stimuli are congruent (local and global shape match). If the shapes differ, the stimuli are called incongruent (mismatch between global and local shape). Navon (1977) found that identifying global shapes takes less time than identifying their local counterparts, also known as the global-to-local advantage. Furthermore, he found that reaction times in identifying local stimuli are slower if the global appearance of the stimuli does not match the local appearance (i.e. the global form is not equal to the local form). This effect, also called global-to-local interference, occurs in the identification of local elements within incongruent global stimuli, but occurs far less in the identification of global forms with incongruent local elements. Based on this, Navon (1977) named the overall pattern the global precedence effect, meaning that global features are identified earlier than their local counterparts, leading to interference when local features of incongruent stimuli have to be identified. Since then, the existence of the global precedence effect has been shown in several other studies (Bouvet, Rousset, Valdois, & Donnadieu, 2011; Goto, Wills, & Lea, 2004; Kimchi, 1992), including studies that found evidence for the effect on a physiological level (Han, Yund, & Woods, 2003; Proverbio, Minniti, & Zani, 1998). Furthermore, research has shown that the perception of global and local features of a stimulus can be affected by several factors, such as the size of the stimulus, viewing angle or spatial frequency (Baker & Braddick, 1982; Eagle & Rogers, 1997; Kimchi, 1992). Michimata et al. (1999) reported that the global precedence effect can also be mediated by changing the background color of the stimuli. They found that while global-to-local interference took place normally when stimuli were presented on a green background, it was decreased when they were presented on a red background. This led to the theory that the color red attenuates the interference of conflicting global features on the identification of local elements in hierarchically organized stimuli.
Following implantation, post-lingually deafened adult CI recipients are seen to maintain enhanced speechreading abilities (Rouger et al., 2007) and to rely heavily on visual speech information in audio-visual conditions (Desai et al., 2007; Rouger et al., 2007, 2008). This apparent use of compensatory strategies is likely due to the degraded and unreliable nature of the auditory signal provided by the CI, as well as the need for CI users to learn to match the novel auditory speech inputs onto their existing stored auditory representations and to form new associations with the corresponding visual speech cues (Strelnikov et al., 2009). Behavioural evidence in pre-lingually deaf paediatric CI users has indicated that superior speechreading abilities before implantation may benefit future auditory-only language abilities with a CI (Bergeson et al., 2005). Speechreading may provide these benefits by enabling early access to spoken language structure and the development of general linguistic skills such as phonological processing (Bergeson et al., 2005; Lachs et al., 2001). Moreover, recent behavioural and physiological evidence in early-deafened, bilaterally-implanted ferrets has demonstrated that intermodal training (i.e. using interleaved auditory and visual cues on separate trials) can in fact improve auditory-alone localisation abilities (Isaiah et al., 2014). This intermodal training was also seen to enhance neural sensitivity to sound localisation cues within the auditory cortex (Isaiah et al., 2014). This animal model thereby suggests that vision may facilitate the restoration of auditory function following cochlear implantation, likely through modifications to the auditory cortex (Isaiah et al., 2014; Isaiah and Hartley, 2015).
The human visual system devotes a significant proportion of its resources to a very small part of the visual field, the fovea. Foveal vision is crucial for natural behavior and many tasks in daily life such as reading or fine motor control. Despite its significant size, this part of cortex is rarely investigated and the limited data have resulted in competing models of the layout of the foveal confluence in primate species. Specifically, how V2 and V3 converge at the central fovea is the subject of debate in primates and has remained "terra incognita" in humans. Using high-resolution fMRI (1.2 × 1.2 × 1.2 mm³) and carefully designed visual stimuli, we sought to accurately map the human foveal confluence and hence disambiguate the competing theories. We find that V1, V2, and V3 are separable right into the center of the foveal confluence, and V1 ends as a rounded wedge with an affine mapping of the foveal singularity. The adjacent V2 and, in contrast to current concepts from macaque monkey, also V3 maps form continuous bands (~5 mm wide) around the tip of V1. This mapping results in a highly anisotropic representation of the visual field in these areas. Unexpectedly, for the centermost 0.75°, the cortical representations for both V2 and V3 are larger than that of V1, indicating that more neuronal processing power is dedicated to second-level analysis in this small but important part of the visual field.
Most recently, Kemmerer et al. (2008) demonstrated that certain brain regions were activated when participants made semantic judgments about verbs and, more interestingly, that the regions activated differed depending on the semantic content of the verbs. They used several classes of verbs in their study, namely, running verbs (e.g., run), speaking verbs (e.g., shout), hitting verbs (e.g., hit), cutting verbs (e.g., cut), and change-of-state verbs (e.g., shatter), which vary in their involvement of five distinct semantic components: action, motion, contact, change of state, and tool use. For example, running verbs involve only two components, action and motion, whereas cutting verbs involve all five. Their results showed that the action component elicited activation in the primary motor and pre-motor cortices, the motion component in the posterolateral temporal cortex, the contact component in the intraparietal sulcus and inferior parietal lobule, the change-of-state component in the ventral temporal cortex, and the tool-use component in a distributed network of temporal, parietal, and frontal regions.
input is not a face, other than to try processing it as such. Thus, the mere existence of partial responses to nonpreferred stimuli in a category-selective region of cortex does not guarantee that these responses encode any information about the category of those stimuli, or that any such information forms a critical part of the representation of those stimuli. [...] PPA permit excellent discrimination between preferred versus nonpreferred stimuli (e.g., faces-bottles and houses-bottles, respectively), we find that neither region alone permits accurate discrimination between pairs of nonpreferred stimuli (e.g., bottles-shoes). These findings indicate that the ventral visual pathway
To extend our investigation of the role of fundamental properties of gratings, we investigated visual cortical responses to isoluminant chromatic gratings of varying spatial frequency. Color perception is believed to be a higher-order cognitive process and involves additional areas of the visual cortex, including area V4 (e.g., McKeefry & Zeki, 1997; Zeki, Watson, Lueck, Friston, & Kennard, 1991) and area V8 (Hadjikhani, Liu, Dale, Cavanagh, & Tootell, 1998). In this study, we used photometrically isoluminant red/green gratings rather than the luminance-contrast gratings we used in a number of previous studies. For these gratings, chromatic contrast is high while luminance contrast is low. We can therefore test the hypothesis that gamma-band activity is related specifically to luminance-contrast processing and not to color contrast. We use the same cohort of subjects in order to facilitate the comparison between MEG cortical responses to isoluminant and luminance-contrast gratings.
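The counterphase construction behind such a grating can be sketched as follows; the luminance weights below are placeholder values, since true isoluminance is established photometrically (and often per observer), not assumed.

```python
import numpy as np

# Sketch of a nominally isoluminant red/green grating: the red and green
# channels modulate in counterphase, scaled so that their luminance-
# weighted sum is constant across the image. The weights w_r and w_g are
# assumed placeholder values, not calibrated photometric data.
size, cycles = 256, 4
x = np.linspace(0, 2 * np.pi * cycles, size)
grating = np.tile(np.sin(x), (size, 1))        # horizontal sinusoid in [-1, 1]

w_r, w_g = 0.3, 0.6                            # assumed luminance weights
red = 0.5 + 0.5 * grating                      # red rises where green falls
green = 0.5 - 0.5 * (w_r / w_g) * grating      # scaled to cancel luminance

luminance = w_r * red + w_g * green            # constant: 0.5 * (w_r + w_g)
```

The chromatic (red minus green) contrast of this image is high while its luminance modulation is, by construction, zero, which is the property the isoluminant stimuli above exploit.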
In early stages of visual processing, individual neurons respond directly only to stimuli in their classical receptive fields (CRFs) (Hubel and Wiesel, 1962). These CRFs sample the local contrast information in the input but are too small to cover visual objects at a global scale. Recent experiments show that the responses of primary visual cortical (V1) cells are significantly influenced by stimuli nearby and beyond their CRFs (Allman et al., 1985; Knierim and Van Essen, 1992; Gilbert, 1992; Kapadia et al., 1995; Sillito et al., 1995; Lamme, 1995; Zipser et al., 1996; Levitt and Lund, 1997). These contextual influences are in general suppressive and depend on the relative orientations of the stimuli within and beyond the CRF (Allman et al., 1985; Knierim and Van Essen, 1992; Sillito et al., 1995; Levitt and Lund, 1997). In particular, the response to an optimal bar in the CRF is suppressed significantly by similarly oriented bars in the surround (iso-orientation suppression; Knierim and Van Essen, 1992). The suppression is reduced when the orientations of the surround bars are random or different from the bar in the CRF (Knierim and Van Essen, 1992; Sillito et al., 1995). However, if the surround bars are aligned with the optimal bar inside the CRF to form a smooth contour, then suppression becomes facilitation (Kapadia et al., 1995). The contextual influences are apparent within 10–20 ms after the cell's initial response (Knierim and Van Essen, 1992; Kapadia et al., 1995), suggesting that mechanisms within V1 itself are responsible (see the discussion later on the different time scales observed by Zipser et al., 1996). Horizontal intra-cortical connections linking cells with non-overlapping CRFs and similar orientation preferences have been observed and hypothesized to be the neural substrate underlying these contextual influences (Gilbert and Wiesel, 1983; Rockland and Lund, 1983; Gilbert, 1992).
There have also been theoretical studies of the mechanisms and phenomena of contextual influences (e.g., Somers et al., 1995; Stemmler et al., 1995). However, insights into the computational roles of contextual influences have been limited mainly to contour or feature linking (Allman et al., 1995; Gilbert, 1992; see more references in Li, 1998a).
The electroencephalogram (EEG) was recorded continuously from 64 scalp sites at a digitization rate of 1000 Hz. Electrodes were mounted on an elastic cap (Easy Cap, FMS), with positions corresponding to the 10-10 System. Horizontal and vertical EOG was monitored by means of electrodes placed at the outer canthi of the eyes and at the superior and inferior orbits, respectively. All electrodes were referenced to Cz and re-referenced offline to linked mastoids. Impedances were kept below 5 kΩ. Electrophysiological signals were amplified using a 0.1–250-Hz bandpass filter using BrainAmp amplifiers (BrainProducts, Munich) and filtered offline with a 1–40-Hz band-pass (Butterworth zero phase, 24 dB/Oct). Prior to epoching the EEGs, an independent-component analysis (ICA), implemented in the Brain Vision Analyzer software (BrainProducts, Munich), was run to identify and back-transform components representing blinks and/or horizontal eye movements. The EEG was then epoched into 500-ms segments relative to a 200-ms baseline, which was used for baseline correction. Only trials with correct responses and without artifacts – defined as any signal exceeding ±60 μV, bursts of electromyographic activity (permitted maximal voltage steps per sampling point of 50 μV), and activity lower than 0.5 μV within intervals of 500 ms (indicating dead channels) – were selected on an individual-channel basis, prior to averaging. The PCN component was quantified by subtracting ERPs obtained at lateral posterior electrode positions PO7/PO8 ipsilateral to the side of the singleton in the search array from contralateral ERPs. PCN latencies were determined individually as the maximum negative deflection in the 150–350-ms post-stimulus time window. PCN amplitudes were calculated by averaging five sample points before and after the maximum deflection.
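The PCN scoring just described (contralateral-minus-ipsilateral difference wave, peak search in the 150–350-ms window, amplitude as the mean around the peak) can be sketched as below; the function name, the synthetic ERP and all default parameters are illustrative, not the authors' analysis code.

```python
import numpy as np

def pcn_measures(contra, ipsi, srate=1000, win=(150, 350), epoch_start=-200):
    """Sketch of the PCN scoring described above: difference wave =
    contralateral minus ipsilateral ERP; latency = time of the maximum
    negative deflection within 150-350 ms post-stimulus; amplitude =
    mean of the five sample points before and after that peak."""
    diff = np.asarray(contra, dtype=float) - np.asarray(ipsi, dtype=float)
    i0 = int((win[0] - epoch_start) * srate / 1000)   # window start index
    i1 = int((win[1] - epoch_start) * srate / 1000)   # window end index
    peak = i0 + int(np.argmin(diff[i0:i1]))           # most negative point
    latency_ms = peak * 1000 / srate + epoch_start
    amplitude = diff[peak - 5 : peak + 6].mean()      # peak +/- 5 samples
    return latency_ms, amplitude

# Illustrative check with a synthetic negativity peaking at 250 ms:
t = np.arange(-200, 500)                              # one sample per ms
contra = -2.0 * np.exp(-((t - 250) / 30.0) ** 2)
ipsi = np.zeros_like(contra)
lat, amp = pcn_measures(contra, ipsi)                 # lat == 250.0
```

At a 1000-Hz digitization rate one sample equals one millisecond, so the "five sample points before and after the maximum" window spans the 11 ms centered on the peak.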
PCN onset latencies were estimated based on Ulrich and Miller's (2001) jackknife-based scoring method, which defines the onset as the point in time at which the amplitude reaches a specific criterion relative to the pre-stimulus baseline. As suggested by Ulrich and Miller, we used 50% of the maximum amplitude as the criterion for determining the onset of stimulus-locked ERP potentials. Electrophysiological measures (latencies, onset latencies, and amplitudes of the PCN) as well as behavioral measures (reaction times, error rates) were subjected to two-way repeated-measures analyses of variance (ANOVAs) with the factors Dimension (color, orientation) and
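A jackknife onset estimate of this kind can be sketched as follows; the 50%-of-peak criterion follows the description above, while the function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fractional_peak_onset(erp, times, criterion=0.5):
    """First time point at which a negative-going ERP reaches
    `criterion` times its peak (most negative) amplitude."""
    peak = erp.min()
    return times[np.flatnonzero(erp <= criterion * peak)[0]]

def jackknife_onsets(erps, times, criterion=0.5):
    """Leave-one-subject-out onsets in the spirit of Ulrich and Miller
    (2001): score each grand average computed without subject i, then
    derive the jackknife standard error from those scores."""
    n = len(erps)
    total = erps.sum(axis=0)
    scores = np.array([
        fractional_peak_onset((total - erps[i]) / (n - 1), times, criterion)
        for i in range(n)
    ])
    se = np.sqrt((n - 1) / n * ((scores - scores.mean()) ** 2).sum())
    return scores, se

# Illustrative run: ten identical subjects, negativity peaking at 250 ms,
# so every leave-one-out average gives the same onset and se is zero.
t = np.arange(-200, 500).astype(float)
erps = np.tile(-np.exp(-((t - 250) / 40.0) ** 2), (10, 1))
scores, se = jackknife_onsets(erps, t)
```

Scoring the leave-one-out grand averages rather than single-subject ERPs is the key idea: onsets are far more stable in the low-noise subsample averages, and the inflated similarity among scores is later corrected for in the jackknife statistics.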
We obtained the regional correlation matrices within and between hemispheres for the blind and sighted subjects (Figure 2A). Structure can be seen within each matrix corresponding to increased direct correlations between visual areas with a hierarchical relationship and increased indirect correlations between the spatially distributed quarter fields of a given visual area (e.g., V2d and V2v in the left and right hemisphere). Control studies using a water phantom confirmed that the indirect correlation structure is not a property of the MRI images themselves (see Figure S1). As cortical regions that share a hierarchical, direct relationship are also adjacent in volumetric space, there is non-zero correlation between these regions attributable to image properties (Figure S1). Consequently, we do not ascribe a neural interpretation to the absolute level of the direct regional correlations measured from our human subjects, but do regard the relative level of correlation between groups as meaningful. Our desire to examine direct, hierarchical correlations free from this confound of volumetric adjacency in part motivates our study of fine-scale, retinotopic correlation discussed below.
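At base, a regional correlation matrix of this kind is the correlation of ROI-averaged time series; a minimal sketch (with simulated time series and illustrative ROI names, not the study's data) is:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated ROI-averaged time series that share a common fluctuation, so
# off-diagonal correlations are positive, loosely analogous to
# hierarchically related visual areas. ROI names, the number of time
# points and the noise level are all illustrative.
n_t = 240
shared = rng.normal(size=n_t)
rois = ["V1", "V2d", "V2v", "V3d", "V3v"]
series = np.array([shared + 0.5 * rng.normal(size=n_t) for _ in rois])

R = np.corrcoef(series)   # region-by-region correlation matrix
# diagonal is exactly 1; here off-diagonals approach 1 / (1 + 0.25) = 0.8
```

As the passage above notes, the absolute level of such correlations can be inflated by image properties for spatially adjacent regions, which is why between-group differences, rather than raw values, carry the interpretation.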
Even though the abovementioned studies have found V1-independent unconscious processing, many studies have concluded that both conscious and unconscious processing depend on V1. Some of these studies have concluded that at some SOAs it is possible to selectively interfere with conscious processing and thereby reveal unconscious processing. This, however, does not qualify as TMS-induced blindsight, because unconscious processing was disrupted at some other SOA: unconscious visual processing in that case depends on V1, but not in the same time windows as conscious processing does. Sack, van der Mark, Schuhmann, Schwarzbach, and Goebel (2009) found that when the accuracy of detecting the prime stimulus was lowered by TMS of V1, the priming effect was also impaired. Koivisto, Mäntylä, and Silvanto (2010) studied the role of V1 in motion processing. They used a set of dots coherently moving either to the right or to the left. TMS of V1 impaired both the forced-choice accuracy and the awareness of the motion. Koivisto, Railo, and Salminen-Vaparanta (2011) studied the processing of the orientation of a bar and of pointing arrows and found that both conscious and unconscious processing depended on V1. When the participants reported not being aware of the stimulus, they performed at chance level in the forced-choice task. Koivisto, Henriksson, Revonsuo, and Railo (2012) studied the role of V1 in unconscious priming. The prime was an arrow pointing to the left or to the right, and the target stimulus, which also served as a visual mask, was an arrow-shaped contour. The priming effect was found even though the participants saw only the target arrow. However, TMS of V1 impaired this unconscious priming effect. Persuh and Ro (2013) also studied unconscious priming. They found that unconscious priming depended on V1 at SOAs ranging from 5 to 25 ms and from 65 to 125 ms, and found unconscious priming at a 45-ms SOA.
They concluded that unconscious priming depends on V1 in specific temporal phases of processing. Koivisto, Lähteenmäki, Kaasinen, Parkkola, and Railo (2014) reported that TMS of V1 impaired conscious and unconscious shape discrimination. Railo, Salminen-Vaparanta, Henriksson, Revonsuo, and Koivisto (2012) studied the unconscious processing of color using the unconscious priming paradigm. They found that unconscious color priming was impaired by TMS of V1. Railo, Andersson, Kaasinen, Laine, and Koivisto (2014) studied the role of V1 in chromatic RTE. TMS of V1 suppressed the perception of the redundant chromatic stimulus and eliminated RTE.
Our results demonstrate that the distributed activity of neurons in cat primary visual cortex (area 17) contains information about previously shown images and that this information is available for a prolonged period of time. The information can be extracted by simple computer-simulated readout neurons and is available for as long as the firing rates stay elevated. These findings are related to the results obtained in macaque IT cortex, but there are also a number of differences. First, we show that information required for stimulus classification can be extracted easily from neurons at early processing stages that represent detailed and feature-based information, and under anesthesia. The results are similar to those obtained from neurons in IT that represent categorical information. Second, stimulus-specific information is readily extractable also from responses evoked by the offset of the stimulus (off-responses). This suggests that off-responses play an important role in cortical functions and, hence, should be studied more thoroughly than is usually the case. Third, by presenting sequences of stimuli, we show that the system has reliable memory for one stimulus back. Fourth, we were able to identify the response variables (neuronal code) that carry the stimulus-related information. The classifiers relied on information carried both by neuronal
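The "simple computer-simulated readout neurons" are not specified in this excerpt; as a stand-in, a nearest-centroid readout over simulated population firing-rate vectors illustrates how stimulus identity can be extracted from distributed activity (all rates, cell counts and noise below are simulated assumptions, not the recorded data).

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated population of 50 cells with stimulus-specific mean firing
# rates and Poisson spiking noise; a nearest-centroid "readout" then
# classifies which of two stimuli evoked each population response.
n_cells, n_trials = 50, 40
tuning = 10 + 2 * rng.normal(size=(2, n_cells))   # mean rate per stimulus

def population_response(stim):
    """One trial's spike-count vector for the given stimulus."""
    return rng.poisson(np.clip(tuning[stim], 0, None))

X = np.array([population_response(s) for s in (0, 1) for _ in range(n_trials)])
y = np.repeat([0, 1], n_trials)

centroids = np.array([X[y == s].mean(axis=0) for s in (0, 1)])
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
accuracy = float((dists.argmin(axis=1) == y).mean())
```

For brevity the readout is scored on its training trials; in practice a held-out split (or cross-validation, as in such classification analyses generally) would be used.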
these brain regions has only been researched more recently. One WM model, dubbed the "sensory recruitment model" (Serences et al., 2009; Lee and Baker, 2016), envisages VWM as an emergent property of sensory regions as early as V1, which specifically code for feature- and stimulus-specific information. An important characteristic of the model is the sustained representation of visual perceptual information along inferior occipito-temporal cortex, even after the perceptual stimulus has faded. Thus, the model suggests that VWM is maintained in the same posterior visual brain regions that are responsible for perceptual encoding. Early influential research attributed WM-related stimulus representations to PFC rather than visual regions (e.g., Baddeley and Hitch, 1974; Goldman-Rakic, 1990). A number of non-human primate studies reported spiking activity of single units during the delay period of WM tasks, which was taken as evidence for retained stimulus-specific information in PFC (Fuster, 1973; Funahashi et al., 1989; Goldman-Rakic, 1995). However, converging findings with human participants and functional magnetic resonance imaging (fMRI) have since offered further insights into the specific frontal and posterior contributions to WM. For example, the distributed model of working memory envisages the PFC as an area exerting top-down control over posterior sensory regions. It converges with the sensory recruitment model on the notion that posterior sensory regions carry specific representational content (Postle, 2006; D'Esposito and Postle, 2015; Lee and Baker, 2016). Key support for the distributed and sensory recruitment models comes from studies using multivoxel pattern analysis (MVPA) that could discern the representational content in relevant frontal and occipito-temporal regions. Two studies (Christophel et al., 2012; Riggall and Postle, 2012) showed that although there was a sustained BOLD-response in frontal regions throughout the delay-period of a VWM
cryptography (VC)”. The major feature of their scheme was that the secret image could be decrypted by the human visual system alone, without resort to any complex computation. Naor and Shamir's scheme hides the secret image in n distinct images called shares. The secret image can then be revealed by simply stacking together any k of the shares. Each share looks like a collection of random pixels and appears meaningless by itself; naturally, any single share, before being stacked with the others, reveals nothing about the secret image. In this way, the security of the secret image when transmitted via the Internet is effectively raised. Since Naor and Shamir published their VC scheme, many related methods have been developed and proposed. However, in addition to the meaningless shares they produce, those schemes take only binary images as secret images, which means the contents of the secret images can in most cases be nothing but text or simple black-and-white designs. It is only natural that researchers are now interested in developing new cryptography schemes that can also process secret color images, which are more complex.
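A minimal (2, 2) instance of Naor and Shamir's construction makes the stacking idea concrete. The sketch below uses the standard two-subpixel expansion; the random 16×16 "secret" and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_shares(secret):
    """(2, 2) Naor-Shamir sketch: each secret pixel (1 = black) expands
    to two subpixels per share. A white pixel gets the same random
    pattern in both shares; a black pixel gets complementary patterns,
    so stacking the transparencies (pixel-wise OR) turns both of its
    subpixels black, while a white pixel keeps one white subpixel."""
    h, w = secret.shape
    s1 = np.zeros((h, 2 * w), dtype=int)
    s2 = np.zeros((h, 2 * w), dtype=int)
    for i in range(h):
        for j in range(w):
            pattern = [0, 1] if rng.random() < 0.5 else [1, 0]
            s1[i, 2 * j:2 * j + 2] = pattern
            s2[i, 2 * j:2 * j + 2] = pattern if secret[i, j] == 0 else pattern[::-1]
    return s1, s2

secret = rng.integers(0, 2, size=(16, 16))
s1, s2 = make_shares(secret)
stacked = s1 | s2                        # simulates overlaying transparencies
# each subpixel pair of the stack sums to secret + 1, so the secret is
# recoverable by eye from the contrast between all-black and half-black pairs
recovered = stacked.reshape(16, 16, 2).sum(axis=2) - 1
```

Each share on its own is uniformly random (the pattern choice is independent of the secret), which is exactly the perfect-secrecy property of a single share described above; "decryption" needs no computation beyond physically overlaying the printed shares.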
The first three figures summarize key milestones for visual system development and illustrate compelling similarities between the timing of visual, anatomical, and neurobiological milestones in human V1. Tapping into these neurobiological mechanisms is going to be key for the next generation of treatments for visual disorders. For example, a wide range of potential new therapies has been developed in animal models for amblyopia. The treatments include everything from fine-tuning of traditional patching therapy 155,156 to