COMPUTING RESYLLABIFICATION RATES AND INFORMATION CONTENT
2.5. Combinatorial advantages of Syllable Structure
Theoretical models of speech production often assume that nothing predictable (and therefore post-lexically computable) is stored. However, this need not be the case, as storing certain information that is (mostly) predictable might have other advantages. One efficient method of storage and retrieval is Content-Addressable Storage (CAS). While there are many theories of storage and retrieval, this section adopts a highly efficient one in order to illustrate how predefined phonological information can affect a system’s efficiency. A content-addressable storage and retrieval algorithm defines the potential addresses that a representational scheme can generate. Addressing is based on the content of the data rather than its location, which makes retrieval efficient: there is no need to search through the data serially in order to retrieve stored information. For such a system to work, however, it must have space for all potential addresses, so that no address is generated for which no space exists. In an ideal system, the potential space closely matches the addresses that are actually used. An inefficient storage system is one that generates a very large set of addresses (for which space must be allocated) and then uses only a small portion of that set.
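The idea of addressing by content can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a model of the mental lexicon: the three-symbol inventory, the two-slot frame and the helper name `address` are all hypothetical. A fixed-length sequence is read as a base-n numeral, so its address follows from its content alone, and the table must reserve a cell for every potential address whether or not it is ever used.

```python
def address(sequence, inventory):
    """Map a fixed-length symbol sequence to an integer address.

    The sequence is read as a base-n numeral (n = inventory size),
    so the address depends only on the content of the sequence.
    """
    n = len(inventory)
    index = {symbol: i for i, symbol in enumerate(inventory)}
    value = 0
    for symbol in sequence:
        value = value * n + index[symbol]
    return value


# Toy inventory and slot count (illustrative assumptions, not English data).
INVENTORY = ["p", "t", "a"]
SLOTS = 2

# Every potential address needs a cell, whether or not it is ever used.
table = [None] * (len(INVENTORY) ** SLOTS)

# Store two items; retrieval computes the address directly, with no search.
for item in (("p", "a"), ("t", "a")):
    table[address(item, INVENTORY)] = item

used = sum(cell is not None for cell in table)
print(f"{used} of {len(table)} addresses used")
```

Even in this toy case most of the allocated cells stay empty, which is the kind of mismatch between potential and used addresses that the section is concerned with.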
This section attempts to illustrate how syllable structure allows for more efficient storage and retrieval. If phonemes are unstructured, a large amount of storage is needed for potential permutations beyond those that are actually used. If there is an acquired structural constraint, however, much less storage is needed, because the constraint limits what can be stored. Using syllabically structured representations limits the combinations that the system can produce and provides a much better fit between possible lexical items and the lexical items that actually occur. This section outlines a context in which syllabic representations provide a critical function mediating between the acoustic/articulatory demands of an external signal and the addressing and storage demands of a mental dictionary.
A phoneme inventory of 44 segments yields 44^7 (approximately 3.2 × 10^11) possible sequences for a maximal syllable of 7 segments. Only a fraction of these are used in speech, and many are not acoustically or articulatorily possible. A representational system that allows addresses for a large number of linguistically impossible combinations is extremely wasteful. The system of phonological representation needs a mechanism for restricting the set of combinations to those that are articulatorily and acoustically possible and, further, to the set that could actually appear in the speaker’s language. In addition, a significant restriction of the potential space needs to be specified in advance – that is, before any experience with the language that the learner will acquire. We are not advocating a view of memory as a static entity with a very strict maximum limit. Rather, this is a thought experiment in which we attempt to illustrate the advantages of storing syllable structure if memory were organised using the principles of CAS.
We argue that the syllable, as defined in a particular language, is a framework that specifies possible combinations, and provides a mechanism for efficient mapping from acoustically distinct combinations to individual words, and from individual words to articulatorily manageable sequences that must be produced quickly in time. This is almost certainly not its only function – it provides the unit over which prosodic information is calculated, among other things – but we would like to advance the speculation that the combinatorial advantages that the syllable provides mediating between the periphery and a large memory store are not a trivial part of its function.
2.5.1. Method
As this section deals with a purely mathematical scenario intended to illustrate the efficiency of a hypothesis, it was considered sufficient to use English alone rather than Italian and Hindi as well, on the assumption that the results would hold for those languages too. An algorithm was created to identify all the monosyllabic words in the CELEX English dictionary. Another algorithm was designed to generate every combination of the possible onsets, peaks and codas in English. The onset and coda permutations were based on the data collected from the analysis of the speech corpus in the first experiment (calculating the rate of resyllabification). This provided a list of all possible syllable types of the form V, CV, CCV, CCCV, VC, VCC, CVC, CVCC, CCVC, CCVCC, CCCVC, and CCCVCC (note that V stands for both monophthongs and diphthongs, giving a maximum of 7 segments). The outputs of these two algorithms could then be combined to estimate the theoretical upper limit of the storage space necessary for English syllables and the space needed to store the syllables of monosyllabic words. We also estimated the space needed to store all possible permutations of phonemes without any combinatorial constraints. This was calculated using the following formula:
$$ n^{r} $$
where n represents the number of types to choose from and r the number of slots available. This formula was applied within the context that the seven positions for a syllable-like framework could be occupied by any phoneme independent of any syllabic constraints.
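As a worked instance, assume the intended count is the number of length-r sequences drawn with repetition from n types, which is the reading consistent with the 44 phonemes and 7 slots stated in this section. The unconstrained space then works out as follows:

```latex
$$ n^{r} = 44^{7} = 319{,}277{,}809{,}664 \approx 3.2 \times 10^{11} $$
```

Under this reading, roughly 320 billion addresses would have to be reserved before any syllabic constraint is applied.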
The comparisons can therefore be divided as follows:
All combinations that are possible for a framework of 7 slots and 44 phonemes
All possible combinations for all legal sets of onsets, peaks and codas in English
All the monosyllabic words in the CELEX English dictionary, representing all monosyllabic words that could exist in the lexicon of a native English speaker with a very large vocabulary.
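The three quantities being compared can be sketched as below. This is a hedged illustration rather than the algorithms used in the study. The inventories and the lexicon are toy placeholders, each symbol is assumed to be exactly one segment, and the hyphen as a syllable-boundary marker is an assumption made for this example, not a claim about the CELEX file format.

```python
from itertools import product

# Toy placeholders (hypothetical; NOT the English data used in the study).
PHONEMES = ["p", "t", "s", "r", "a", "i"]
ONSETS = ["", "p", "t", "s", "pr", "tr", "st"]   # "" means an empty onset
PEAKS = ["a", "i"]
CODAS = ["", "t", "s", "st"]                     # "" means an empty coda
# Toy lexicon; "-" marks a syllable boundary (an assumption of this sketch).
LEXICON = ["pat", "trist", "sa-pi", "i", "stast", "ta-ris"]


def legal_syllables(onsets, peaks, codas):
    """Return every onset + peak + coda combination as a set of strings."""
    return {o + p + c for o, p, c in product(onsets, peaks, codas)}


def monosyllables(lexicon):
    """Keep only entries that contain no syllable boundary."""
    return [word for word in lexicon if "-" not in word]


legal = legal_syllables(ONSETS, PEAKS, CODAS)

# Number of slots r = length of the longest legal syllable (one char = one segment).
slots = max(len(syllable) for syllable in legal)

# (1) Unconstrained space: any phoneme in any slot, n ** r.
unconstrained = len(PHONEMES) ** slots
# (2) Space licensed by the onset/peak/coda constraints.
constrained = len(legal)
# (3) Monosyllabic words in the lexicon that are legal syllables.
attested = [word for word in monosyllables(LEXICON) if word in legal]

print(unconstrained, constrained, len(attested))
```

With these toy values the unconstrained space is far larger than the syllabically constrained one, which in turn sits much closer to the number of attested monosyllables. The real magnitudes depend on the actual inventories and the dictionary.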
2.5.2. Results
The results show that storing syllable structure gives a much more constrained content-addressable storage space than a system without such constraints. They illustrate how a system with a large amount of potential storage can make storage and retrieval extremely inefficient.
Figure 17. Comparison between theoretical content-addressable storage needs
While this may seem a foregone conclusion, the main point is the large disparity between the results. Specifying a predefined space based on syllable constraints leaves far less storage space unused than a system without any prior specification.