And he spake this parable unto them, saying, What man of you, having an hundred sheep, if he lose one of them, doth not leave the ninety and nine in the wilderness. . . Luke 15:3–15:4
Hovik Khudaverdyan aged 6
Philosophers of mathematics find it useful to look at mathematical texts as narratives (see, for example, David Corfield [17] and Robert Thomas [89]). Indeed, even an average, run-of-the-mill mathematical paper has a multi-layered structure of complexity comparable with that of a serious novel, like War and Peace or Ulysses. The analogy is, however, much deeper; for me, its most appealing aspect is a parallelism between the development of a character in a novel or play, and specialization of an abstract mathematical structure.
However, I wish to follow the principal line of this book and to look at something smaller and simpler than a novel.
Robert Thomas aged 13
So-called mathematical folklore is virtually unknown outside professional circles: it is the corpus of small problems, examples, brainteasers, jokes, etc., not properly documented and existing mostly in oral tradition. It is a small universe on its own; but in all its diversity, one can easily notice the prominent role of fables or parables, that is, general statements (or problems) which are intentionally set in the least general terms, or illustrated by a simple, highly specialized example.
In the written tradition, there is at least one famous parable, the celebrated Pigeon-Hole Principle:
If you were to put 6 pigeons in 5 holes, then at least one hole would contain more than one pigeon.
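For readers who like to see a parable checked mechanically, the 6/5 instance is small enough to verify exhaustively. The sketch below is an illustration only and is not part of the folklore tradition described here; the function name `every_assignment_collides` is hypothetical. It enumerates every way of assigning pigeons to holes and confirms that some hole always receives at least two pigeons.

```python
from collections import Counter
from itertools import product


def every_assignment_collides(pigeons: int, holes: int) -> bool:
    """Return True if every placement of `pigeons` into `holes` puts
    at least two pigeons in some hole (brute-force check)."""
    for assignment in product(range(holes), repeat=pigeons):
        # assignment[i] is the hole chosen for pigeon i
        if max(Counter(assignment).values(), default=0) < 2:
            return False  # found a placement with no shared hole
    return True


if __name__ == "__main__":
    print(every_assignment_collides(6, 5))  # 5**6 = 15625 placements checked
    print(every_assignment_collides(5, 5))  # one pigeon per hole is possible
```

The general statement, of course, needs no enumeration; the point of the check is only that the specialized 6/5 form is a finite, concrete instance of it.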
Hans Freudenthal 1905–1990
In the Russian mathematical literature, the Pigeon-Hole Principle is known under the name of the Dirichlet Principle.¹ The name emphasizes its pedigree and status; however, the principle itself is usually formulated in terms of 6 rabbits and 5 hutches.
What has always struck me, ever since I first encountered the Pigeon-Hole Principle, is the persistence of the numbers 6 and 5 in its formulation. I felt that the choice was somehow very precise and convincing; I have sometimes seen alternative formulations, as a rule, with similar small numbers, but they somehow looked less attractive.
David Pierce aged 13
I can now see a possible explanation of the persistence of the “6/5” formulation in works on the neurophysiology of counting. Indeed, it is now an established fact that the mechanisms of perception of small ensembles of objects are very different from those used in counting larger ensembles; up to 5 objects, we have subitizing, i.e. “suddenizing”, immediate perception of the quantity which does not interfere with our ability to keep track of each individual object [167]; subitizing starts to fail at 6 objects, and, as a rule, has to be switched to counting when we have 7 or more objects.²
The cultural significance of the subitizing/counting threshold was understood early on. Comments from one of the first experimental psychologists who wrote about subitizing and a related phenomenon, short-term memory (George A. Miller [202], 1956) were interesting for being both suggestive and very cautious:
And finally, what about the magical number seven? What about the seven wonders of the world, the seven seas, the seven deadly sins, the seven daughters of Atlas in the Pleiades, the seven ages of man, the seven levels of hell, the seven primary colors, the seven notes of the musical scale,³ and the seven days of the week? What about the seven-point rating scale, the seven categories for absolute judgment, the seven objects in the span of attention, and the seven digits in the span of immediate memory? For the present I propose to withhold judgment. Perhaps there is something deep and profound behind all these sevens, something just calling out for us to discover it. But I suspect that it is only a pernicious, Pythagorean coincidence.
Unlike subitizing, in counting our attention moves from one object to another. On the other hand, and somewhat surprisingly, experimental studies (using PET, positron emission tomography scans) have failed to find differences in the neurophysiological activities of the brain in subitizing and counting [212].
Even if it turns out that subitizing and counting are both implemented by the same system of neuron circuits, they should correspond to two different modes of its activity, with some kind of a phase transition between the two. The task of mentally putting 6
4.1 Parables and fables 63
pigeons (recall, the borderline value) into 5 holes should, therefore, put the system in the critical zone. It is tempting to suggest that the criticality of the 6/5 combination may provoke the strongest response, leading to a new pattern of interaction (or interference) between neuron circuits.
Again, this is my speculative guess based on introspection, but I believe that mathematicians may like the classical “6/5” formulation of the Pigeon-Hole Principle because they hear, deep inside themselves, a subtle click made by a mathematical concept attaching itself to the neuron circuitry of their brains.
After I wrote this paragraph, I came across the following excerpt from Vandervert [238, p. 87]:
The experience of intuition “is” the feel of the entrainment, so to speak, of the neuro-algorithms of perception with the newer ontogenetic neural subcircuitry retoolments (Edelman [171]) that undergird mathematical discovery. We might speculate that the “aha” experience and exclamation occur upon recognition of the locking-in of the entrainment of the two systems of algorithms.
I am in agreement with this position; however, I prefer to express the same ideas in simpler words, leaving it to experimental neuroscientists to develop an appropriate terminology. Also, I would rather avoid the use of the word “intuition” as both excessively general and, at the same time, restricted to the process of mathematical discovery. The “locking-in” can be much more frequently found in routine everyday activities such as understanding and digesting other people’s mathematics. It can definitely be found in the act of accepting a proof. Remember Coxeter’s proof of Euler’s Theorem (Section 2.3); do you hear that click in your brain? It is likely that some mental objects have a higher degree of affinity to the hard-wired structures of human cognition and anchor themselves more easily than others, while more sophisticated ones require the mediation of mental objects which have already been interiorized. For the moment, let us treat this as no more than a metaphor for the inner working of a mathematician’s brain.
As I discuss in more detail in Section 12.5, computer science and complexity theory might provide some hints for the further development of this metaphor, for predicting or explaining why certain objects are easier to interiorize than others. It is worth mentioning that computer scientists and cognitive scientists have already started to think about abstract models of counting and subitizing. For example, a possible model of counting is discussed by da Rocha and Massad [218], who claim that such models can be constructed from the so-called Distributed Intelligent Processing Systems. Peterson and Simon [211] claim to have an executable model of subitizing of up to 4 objects (apparently, it is available for download from the Internet). But what I would really like to see is an abstract model of counting (possibly, a further development of [218] or [211]) which explains the subitizing/counting threshold (and the “6/5” formulation of the Pigeon-Hole Principle) at the “software level”, thus accounting for the indiscernibility of these two activities at the physiological, “hardware” level.
In short-term memory, we also have a threshold, 7 ± 2, similar to the subitizing threshold: people usually can memorize a 7-digit telephone number, but encounter serious difficulties with 10-digit numbers. Mathematical analogues of memory are easier to formulate than those of subitizing. Since memory is an adaptive, ever changing and dynamic system, stable patterns in dynamical systems (the words “dynamical system” are now understood as a precise mathematical term) appear to be natural candidates for mathematical phenomena whose behavior might be analogous to the behavior of human memory. Can the 7 ± 2 threshold be found in mathematical dynamical systems?
I quote, at length, from a paper by Paul Glendinning [181] who explains recent works by Kaneko aimed at exactly this elusive target: find a natural 7 ± 2 threshold in the behavior of dynamical systems [193, 194].
Kaneko’s starting point is the idea of an attractor of a dynamical system. Classically attractors are thought of as invariant sets which ‘attract’ nearby points. That is, there exists an open neighbourhood of the set such that any solution with initial conditions in this neighbourhood eventually tends to the invariant set. There are all sorts of variants on this definition, but the defining feature of the attractor is a neighbourhood on which some property of attraction holds. Twenty years ago Milnor [382] pointed out that this is a topological definition and introduced a measure-theoretic definition in which the open neighbourhood is replaced by a set of positive measure locally (or, again, a variant of this idea). The difference between the definitions is one of how to give the words ‘lots’ or ‘most’ mathematical meaning: either in a topological sense (open neighbourhoods) or in a measure-theoretic sense (positive measure). The term Milnor attractor is now used to describe an attractor which attracts a large set in measure but not in topology. The important point here is that in any neighbourhood of a Milnor attractor, there are points which move away from the attractor, and which may be in the basin of another attractor. This gives Milnor attractors an interesting property: for a topological attractor all solutions close enough to the attractor are attracted, which is a sort of stability, whilst for a Milnor attractor there are points arbitrarily close to the attractor which move away from the attractor.
Suppose now that a ‘memory’ is an attractor of a dynamical system (the brain: neurons etc). To be stable to perturbations, i.e. to be a useful memory, it is natural to ask that a memory should be a topological attractor rather than a Milnor attractor. The memory of a telephone number of $N$ digits may be represented by systems in $\mathbb{R}^N$ (although more subtle questions about information content could be explored). Kaneko [194] considers a ‘prototype’ system of globally coupled maps and shows that the proportion of points which tend to Milnor attractors and hence the proportion which correspond to poor memory increases until $N \sim 7$ and then
plateaus. In other words, in order to minimize the proportion of easily forgotten states one should keep the dimension below 7. He has a rough, but appealing, argument to support this for the systems he considers which basically comes down to a combinatorial balance between $(N-1)!$ and $2^N$, the balance being at $N \sim 5$.
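The combinatorial balance mentioned in the last sentence can be tabulated directly. The following minimal sketch assumes that the two quantities being compared are the factorial (N−1)! and the power 2^N; the function name `first_factorial_lead` is hypothetical. It illustrates only the arithmetic, not Kaneko's argument about basins of attraction.

```python
from math import factorial


def first_factorial_lead(max_n: int = 12) -> int:
    """Return the smallest N >= 1 with (N-1)! > 2**N, or -1 if none up to max_n."""
    for n in range(1, max_n + 1):
        if factorial(n - 1) > 2 ** n:
            return n
    return -1


if __name__ == "__main__":
    # Tabulate N, (N-1)! and 2^N side by side.
    for n in range(1, 10):
        print(n, factorial(n - 1), 2 ** n)
    print("first N with (N-1)! > 2^N:", first_factorial_lead())
```

For N = 5 the two quantities are 24 and 32; for N = 6 they are 120 and 64. The factorial therefore overtakes the power between 5 and 6, in line with the quoted balance at N ∼ 5.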
Shuji Ishihara and Kunihiko Kaneko [193] extend this idea to a more conventional neural net model. This has $N$ inputs $x_i(0)$, $i = 1, \dots, N$, and a feed forward mechanism which passes information through successive layers indexed by $\ell$ where
\[
x_i(\ell + 1) = \tanh\left( \frac{\beta}{\sqrt{N}} \sum_k a_{ik}(\ell)\, x_k(\ell) \right)
\]
and where $a_{ik}(\ell)$ are chosen randomly from a Gaussian distribution with standard deviation 1. The $\tanh$ function acts as a sigmoidal on-off switch for large enough $\beta$, with solutions approaching values close to $\pm 1$. If $0 < \beta < 1$ then solutions decay to zero. There is an intermediate range of $\beta$ where the behavior of the system depends significantly on $N$. If $N$ is less than about (you’ve guessed it) 7, then the output at layer $L$ assumes only a small number of distinct values and there is a clear separability of inputs (Ishihara and Kaneko work with $L = 30$, but the principle is independent of the depth of layers used). If $N$ is larger then the dynamics as a function of layer is chaotic, and small changes in the input (which we would hope should stabilize) create large differences in the output. This critical changeover in behaviour is striking, and once again suggests that there is a critical size of systems above which information becomes garbled.
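The layered map in the quotation is easy to iterate numerically. The sketch below is one plausible reading, assuming that the prefactor is β/√N, that the Gaussian weights have zero mean, and that a fresh weight matrix is drawn for every layer; the function name `propagate`, the seed and the parameter values are illustrative choices, not taken from Ishihara and Kaneko. It makes no attempt to reproduce their threshold near N ≈ 7; it only shows how an input is passed through L layers.

```python
import math
import random


def propagate(x0, beta, layers, rng):
    """Iterate x_i(l+1) = tanh(beta / sqrt(N) * sum_k a_ik(l) * x_k(l)).

    x0     : list of N input values x_i(0)
    beta   : gain of the tanh switch
    layers : number of layers L to pass through
    rng    : random.Random instance supplying the Gaussian weights a_ik(l)
    """
    n = len(x0)
    x = list(x0)
    scale = beta / math.sqrt(n)
    for _ in range(layers):
        # Assumption: a new N x N matrix of N(0, 1) weights for each layer.
        a = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
        x = [math.tanh(scale * sum(a[i][k] * x[k] for k in range(n)))
             for i in range(n)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    n, beta, depth = 5, 4.0, 30
    x0 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    print(propagate(x0, beta, depth, rng))
```

Running the function on two nearby inputs with identically seeded generators, and comparing the outputs for different N, is one way to explore the sensitivity described in the quotation.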
The subitizing and short-term memory thresholds are just two of many problems which make me yearn to see the dawn of cognitive metamathematics which would turn the “software/hardware” metaphor into a theory. I believe that these are not unrealistic expectations. The development of brain scan techniques appears to have reached a level where at least some of the ideas mentioned in this book can, with due effort, be made into experimentally refutable conjectures.
However, we shall still have only isolated experiments and observations until mathematicians start, in earnest, to develop mathematical models of mathematical cognition; this is where the true cognitive metamathematics will be born.
It is time now to return to the discussion of the 4/5 threshold in the use of plurality markers in Russian, see Section 3.2. I mentioned there an alternative explanation of their appearance: the predecessor of the Indo-European language had numerals formed from base 4 (fingers of the hand) with thumb marking the next register [434]. Numerological theories (like conspiracy theories), however, are famous for their resilience. Indeed, there still remains a possibility that phase transitions in behavior of attractors in dynamical systems would provide a uniform explanation for both the subitizing threshold 5/6 and the fact that we have 4 + 1 fingers. This phase transition can manifest itself in a variety of ways:
(a) In differentiation of cells in embryogenesis, where a 4 + 1 finger anatomy can happen to be the easiest to achieve.
(b) In a relative ease of the neural control of complex movements of a hand with 4 + 1 fingers in comparison with other designs (here, nature tends to prefer simple solutions; this has been already observed by neuroscientists [149]).
(c) In the dynamics of patterns of activation of neural paths in image processing in humans which imposes the subitizing/counting threshold.
(d) Finally, in the architecture of short-term memory and its influence on word processing in humans.
It is possible, of course, that some other mathematical theory can work in place of the theory of dynamical systems. So far, highly speculative applications of dynamical systems bring to mind an old adage:
if all you have is a hammer, everything looks like a nail.