Humans have a capacity for sensing objects that are not supported by external sensory inputs, in a way that involves a reportably different experience from how reality is sensed. Such imagery involves the visual and auditory modalities, and can be used to access several content types, including objects, episodic memories, words and sounds. Temporally changing (dynamic) imagery includes visual motion (where motor imagery includes movements in peri-personal space), music, and inner speech. In addition, vision also allows static imagery for most people. We use the terms imagery, thinking and imagining synonymously. Like all brain processes, imagery can be need-driven (planning) or automatic (e.g., a song playing in one’s head without control). In all forms of imagery, the object lacks details and vividness. Visual imagery includes high level object contours without color, form or textural details, and auditory imagery does not include rich timbre.
The main questions about imagery are what its evolutionary role is, how it is mechanistically implemented in the brain, and whether non-human animals are capable of it. Here we provide novel answers to these questions.
Evolutionary role. Evolution is guided by the degree to which organisms answer their needs. Need satisfaction is managed by the N process, so the evolutionary role of a brain capacity should be explained by showing how it integrates with the N process. Imagery is clearly not an innate response, so non-automated imagery is part of the acute mode. More specifically, it commonly occurs after alerts and before focused execution, when flexible (novel) but usually not urgent responses need to be found. In other words, imagery (planning) is the non-urgent DM mode of the R process. Organisms with better response planning capacity have a clear evolutionary advantage.
Since planning is done prior to movement, and since part of the brain’s initial adaptive response to surprise is to stop on-going execution, imagery should be intimately related to movement suppression. Indeed, phasic release of SER, the Rgen promoting planning, suppresses responses (see above).
Mechanism. Since the Q process underlies the generation of external movements, and since the brain’s architecture is tailored for running it, our view is that a Q process underlies imagery as well. Imagery is a Q process that does not include low level sensory nodes active in reality mode. This occurs because such nodes are not excited by sensory input, which is normally essential for their activation in reality mode.
There are several issues that should be addressed with respect to this account. First, the external Q process is powered by BU flow generated by sensory inputs. What drives imagery, which lacks these inputs? Second, which object nodes participate in imagery and in which mode are they active? Third, what is the role of motor nodes in imagery? Fourth, transitions are normally induced by sensory inputs. What induces transitions in imagery? Fifth, which areas are involved in imagery? Are there areas dedicated only to imagery? Finally, the brain can distinguish between imagination and reality and report the difference. How is this type of awareness implemented?
Drive. Internal needs excite neurons in motivation areas via OX (see above), providing continuous frontal drive in various networks. The flow networks are strongly interconnected all over cortex, short distance connections being stronger than long distance ones. Thus, there is no reason why frontal drive would not be able to activate frontal neurons in other nodes (even response neurons if they are not very large).
We note that an important trend in brain evolution apparent in recent ancestors of humans is a significant size increase, especially in PFC, PPC, lateral temporal cortex, and the insula, including denser connectivity of prefrontal pyramidal neurons [Kaas, 2013]. The greatest difference between humans and other ape species is in PFC white matter (myelinated axons supporting strong flow) [Schoenemann et al., 2005]. These data imply that the capacity of frontal neurons to converge upon less frontal neurons and excite them effectively has greatly increased in animals having higher internal cognition capacities such as planning and tool use.
The moment at which non-urgent DM (i.e., planning) terminates is called decision time (DT). It is the equivalent of Q process competition resolution. DT arrives when a satisfactory plan has emerged, which occurs when the plan reaches an end-state goal having an appropriate valence. Amygdala BLA-cortex projections reach the DM network, providing valence-loaded BU drive to the planning Q process. During planning, the goal is activated in imagery mode (below), and its valence is accessible by cortical connections with BLA. The actual decision to move should be mediated by mPFC, the main motivation area. Indeed, the anterior midcingulate cortex (aMCC), which is strongly connected with all motor regions, is critical in volitional motor control [Hoffstaedter et al., 2014].
Object nodes. Low level sensory areas are identified in fMRI as active during imagery. For example, visual imagery extends as low as V1 [Pearson et al., 2015]. This is easily explained via internal network activity, since internal (mostly DM) neurons are mobilized by quax neurons even if they do not join the quax. Sensory inputs are only essential for activating low level nodes (object and action) in response mode, due to the relatively large size of their response neurons.
The main question related to object nodes is which nodes represent the imagined objects. There are two possible accounts. First, they could be located in high level object areas or even frontal action areas (recall that the representation of categories could be frontal). In this case, the fact that their activation is not perceived as occurring in reality would be due to their high level location, even if they are activated in reality mode⁷⁰. Alternatively, they could be located anywhere but activated in internal (most likely DM) mode.
In the first option, imagery is implemented by an ordinary Q process, and the only difference between imagery and movement quacia is that the former do not include low level object nodes in reality mode. In the second option, imagery is implemented by a Q process that is different from the external one in two ways. First, external Q transitions are triggered by the activation of nodes in reality mode, while here they would be triggered by the activation of internal network neurons. Second, external quacia do not include focused DM paths without reality network activity (DM paths must be focused during imagery, because we imagine specific objects). We discuss this issue with transitions below.
Motor nodes. There is incontrovertible imaging evidence that areas activated during motor execution are also activated during imagery, in a limb-specific manner [Guillot et al., 2014]. Moreover, execution and imagery exhibit similar ERPs [Machado et al., 2013]. As for low level object nodes, these data can be explained as DM network activity coupled with a lack of activation of the response network⁷¹. However, there is also evidence for response network activation during imagery, since imagery involves increased corticospinal excitability (albeit weaker than during execution) [Grosprêtre et al., 2016]. In all motor areas, including high level ones, response neurons project to the spinal cord. Why would response neurons be activated during imagery? And if so, why does movement not occur?
Our answer is that spinal excitability during imagery is due to gMTNs, which do not induce movement. A simple explanation of why gMTNs are active during imagery is that the DM mode primes execution circuits during movement planning to facilitate the switch from DM to execution. Thus, gMTNs can be viewed as an innate part of the prediction network⁷².
There is good (albeit indirect) evidence supporting this account. Differences between the stretch and H reflexes in imagery suggest that it involves increased gMTN activity [Aoyama and Kaneko, 2011, Jeannerod, 1995]. Mental computation yields increased excitability of spinal reflexes due to gMTNs [Rossi-Durand, 2002]. In addition, imagery involves a decrease in the steady state motor EEG beta rhythm, which has been explained via cortico-spinal-cortical loops involving proprioception [Aumann and Prut, 2015]. Such a reduction can be due to increased gMTN excitability (as in alpha oscillation decreases during increased attention, see below). Finally, there is substantial evidence for the involvement of the motor system in cognition (see embodied cognition under language above), much of it involving imagery. For example, there is increased excitability of lip muscles when reading the letter ‘P’ (whose articulation involves the lips), but not the letter ‘T’ (whose articulation does not involve the lips), with opposite results for tongue muscles [McGuigan and Dollins, 1989]. Similarly, forced relaxation of facial muscles (lip, forehead) reduces verbal rumination [Nalborczyk et al., 2017].
A simple explanation of why movements are not produced during imagery is that low level motor nodes are still not active in reality mode. Movement is initiated at decision time, when M1 and PMC accumulate enough excitation due to motivation and valence flow reaching mPFC (in particular aMCC).
⁷⁰ The reason why it is harder to imagine touch than images and sounds may be because sensory touch representations (in S1, S2) are close to the nodes representing their responses (in M1), which does not leave enough cortical space for the formation of high level nodes.
⁷¹ It could be proposed that there are motor nodes dedicated to imagery. However, we can imagine everything that we execute, and it is not reasonable that the brain has two copies of each motor node.
Transitions and node activation mode. In movement execution, goal attainment (the selection of object nodes) and transitions (the selection of the next action by attained goals) are done by sensory inputs. In imagery, object nodes are selected by flow generated by other object nodes serving as cues, in addition to flow generated by frontal task nodes. For example, when you try to find an animal whose name starts with a particular letter, the letter and the word ‘animal’ serve as cues that generate flow that triangulates frontal flow to activate the correct nodes.
The nodes representing the cues are also activated in imagery mode. Since the flow networks are internally connected, cue nodes can generate flow to activate other nodes in imagery mode regardless of whether imagery involves the internal networks or the response network. Thus, there is no inherent constraint that prevents transitions from being triggered by the activation of a node in focused DM or prediction mode.
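The cue-based selection described above (finding an animal whose name starts with a given letter) can be pictured with a toy sketch. This is only an illustration under strong assumptions: the lexicon, the equal cue weights, the activation threshold and the function names are all hypothetical, and frontal task flow is not modeled separately but is assumed to be folded into the threshold. It is not a model proposed in the text.

```python
# Toy illustration only: the lexicon, weights and threshold are hypothetical,
# not values or structures taken from the text.

LEXICON = ["dog", "deer", "desk", "cat", "door"]
ANIMALS = {"dog", "deer", "cat"}


def cue_flow(word, letter, category_members, w_letter=1.0, w_category=1.0):
    """Sum the flow a candidate node receives from the two cues."""
    flow = 0.0
    if word.startswith(letter):
        flow += w_letter      # flow from the letter cue
    if word in category_members:
        flow += w_category    # flow from the category ('animal') cue
    return flow


def activated_nodes(letter, lexicon=LEXICON, category_members=ANIMALS,
                    threshold=1.5):
    """Return the nodes whose summed cue flow reaches the threshold."""
    return [w for w in lexicon
            if cue_flow(w, letter, category_members) >= threshold]


print(activated_nodes("d"))  # ['dog', 'deer']
```

In this sketch a node receiving flow from only one cue ('desk' from the letter, 'cat' from the category) stays below threshold, while nodes on which both cues converge are activated. That is one simple reading of how converging flows could single out the correct nodes.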
Note that the BG and thalamus do not provide such a constraint. First, the prediction network projects to the STR, and even the DM network may do so. More importantly, the BG are required for adaptive quacia in order to allow cortex to recruit the thalamus, which is mainly needed in order to sustain response network execution. If imagery does not require the response network (for the imagined objects), the thalamus may not be needed either. (It could be argued that the thalamus is needed to sustain internal network activity, but this argument is answered by the first point.)
Some nodes must be active in reality mode during non-automatic imagery, in particular motivation nodes, high level task nodes, and the nodes driving gMTNs (these are driven through corticospinal projections, so their activation requires the response network). Moreover, there is clear evidence that motor imagery can involve the BG [Hétu et al., 2013]. SER, the Rgen promoting DM, has both suppressive and excitatory effects in the BG. Thus, the role of the latter may be to support imagery.
Areas. Since imagery is TD driven, it always involves frontal areas. The other areas involved depend on the type of the imagined content. Episodic memory involves the extended hippocampal system (see above) [Schacter et al., 2017], music involves the auditory cortex, motor imagery involves motor areas, and imagining needs involves valence need areas (mPFC, the insula) [Lin et al., 2015].
Relatedly, there is a network of brain regions consistently activated during relaxed internal cognition, known as the default mode network (DMN) [Raichle, 2015], whose core areas are mPFC, posterior cingulate cortex (PCC), lateral and medial temporal cortex, and the posterior inferior parietal lobule. Thus, the DMN largely overlaps with the episodic memory network. In a study using autobiographical and visuospatial planning tasks, the former engaged the DMN, while the latter engaged the dorsal attention network (dlPFC, FEF, visual motion area MT+, motor areas, and the superior parietal lobule) [Spreng et al., 2010]. Both tasks also engaged a general frontoparietal control network located between these two networks.
Being a DM and thus a longer term response, imagery should be supported by action areas that are more anterior than the areas supporting immediate movement. The leading candidate for this role is the supplementary motor area (SMA), especially its preSMA part. Indeed, these areas are strongly associated with imagery, including auditory imagery [Lima et al., 2016] and semantic memory access [Hart et al., 2013]. SMA neurons project directly to the spinal cord, and their projections and the M1 ones converge on the same motoneurons [Maier et al., 2002]. The spinal effect of SMA is weaker than that of M1, has a different nature, and is related to silent periods more than to movement [Kikuchi et al., 2012]. The SMA contributes to the early readiness potential shown in voluntary movement (i.e., to movement planning) [Jahanshahi et al., 1995]. SMA also shows stronger activation in imagery than in hallucinations [Raij and Riekki, 2012], which are a kind of imagery that stems from brain impairment rather than from the N process. In this view, gMTNs are driven by spinal projections of the SMA and preSMA response network.
Supporting our predicted connection between imagery and movement suppression, there is evidence that preSMA mediates suppression of the losing alternative during strong competition [Duque et al., 2013]. Surprises (including errors) induce global stopping via the right inferior frontal cortex (RIFC) and preSMA, while the anterior insula is involved in less immediate stopping [Wessel and Aron, 2017].
Note that many of the areas involved in imagery are medial areas (the hippocampal system, mPFC, preSMA, SMA, the insula), which accords with imagery being a need-driven DM process (recall that valence areas are generally medial).
Reporting. When a person is asked whether a recent experience involved reality or imagination, a search quax is established as in all reporting questions. To be able to answer, the quax needs to include goal nodes that are active only in case of reality or only active in case of imagination. As noted above, the insula contains the former, and preSMA may contain the latter. The insula seems a more probable candidate, because preSMA may participate only in motor imagery and because the insula supports other awareness questions.
Language LS areas continuously generate reports of ongoing actions, and most of these reports are silent. It would be efficient if these areas could mediate speech suppression. Supporting this, the right IFC is strongly associated with stopping motor actions (as noted above), so the left IFC (the language syntax area) may be able to do this too, through a sub-area specializing in suppressing speech.
Non-human animals. Imagery is an R process DM mode, and the brains of all animals are managed by an R process that includes DM. Hence, animals can be argued to be capable of thinking, unless thinking is specifically defined as imagery that uses language. There is evidence that animals imagine action scenarios in the CA1 field of the hippocampus before executing them [Pfeiffer and Foster, 2013]. More generally, sharp wave ripples (SWRs) originating at the hippocampus and present in many brain areas occur during immobile wakefulness and sleep [Buzsáki, 2015], are essential for memory formation, and are involved in navigation planning [Roumis and Frank, 2015]. SWRs involve rapid transitions between negativities and positivities and thus provide support for a Q process-like execution during internal cognition. SWRs are most prominent at event and frontal areas and least prominent in sensory areas, supporting their interpretation as planning.