
COMPUTING RESYLLABIFICATION RATES AND INFORMATION CONTENT

2.4. Comparison of Information Content

The previous section showed that resyllabification is concentrated at a small number of word edges, often in a limited set of lexical items. Resyllabifying only these contexts would save computation compared with syllabifying every word each time it is produced. While storing syllable structure saves computation, the trade-off is in storage costs: syllable structure is a further layer of information in addition to the phonemes of each word-form. How much more storage would be required? It is impossible to estimate storage costs precisely without committing to a particular representational scheme, but one approach is to quantify the minimal storage requirements under optimal conditions. A standard way of doing this is to calculate information entropy.

Since the inception of Information Theory (Shannon, 1948), this branch of applied mathematics has spread to a variety of fields, including natural language processing and statistics. A key concept within Information Theory is ‘entropy’, usually expressed as the average number of units (bits, nats, etc.) necessary for storage or transmission.

The entropy of a discrete random variable is a measure of the uncertainty associated with it. If p is the probability mass function of a random variable X, then the entropy of X can be defined as:

$$H(X) = -\sum_{x} p(x) \log_b p(x)$$

where b is the base of the logarithm, the common values for which are 2 (for bits), e (for nats) and 10 (for dits). Applied to a simple example such as the toss of a fair coin, this equation gives an information content of 1 bit, since each of the two outcomes has probability 0.5 and log2 0.5 = -1. In other words, 1 bit is enough to store whether each outcome is heads or tails, and the uncertainty about the outcome is equal to that amount.
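The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the program used in this study; the function name `entropy` and the example distributions are assumptions introduced here.

```python
import math

def entropy(probabilities, base=2):
    """Shannon entropy of a discrete distribution, in units set by `base`."""
    # Outcomes with p == 0 are skipped: they contribute nothing to the sum.
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

# A fair coin: two equally likely outcomes.
fair_coin = entropy([0.5, 0.5])

# A biased coin (illustrative values) is more predictable, so its entropy is lower.
biased_coin = entropy([0.9, 0.1])

print(fair_coin, biased_coin)
```

The fair coin comes out at 1 bit, and any biased coin at less than 1 bit, which matches the intuition that a more predictable source needs fewer bits on average.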

While English is probably not a stationary ergodic process, it is still possible to estimate an entropy rate for it. The earliest attempt to apply information theory in this manner was by its founder: Shannon (1951) devised a guessing game in which human participants guessed successive letters in a sample of English text, and arrived at an entropy of 1.3 bits per symbol (where the symbols consisted of 26 letters and a white space character). A later experiment (Cover & King, 1978), using 12 subjects and a sample of 75 letters from the same source as Shannon (Jefferson the Virginian by Dumas Malone), arrived at an estimate of 1.34 bits per letter. All these experiments studied the entropy of written English.

The information content of written language has received much attention since the inception of Information Theory. However, its principles have rarely been applied to spoken output. Here we quantified the information content required for lexical storage in three different speech production models: Dell’s spreading activation model (Dell, 1986), the LRM model’s serial phoneme representation (Levelt et al., 1999), and the LEWISS model with syllabic structure within the lexicon (Romani et al., 2011).

2.4.1. Method

The objective of this part of the project was to compare the storage costs of a model that stores syllable structure in the lexicon with those of established models that represent word forms using other methods. The English, Italian and Hindi corpora were analysed to calculate the frequency distribution of various token types as they are defined within the lexicon. These frequency distributions were then used to calculate the entropy of each model.

The Dell model differentiates phonemes according to their syllabic position. Therefore, the /p/ in pit would be an onset phoneme and is a different unit from the /p/ in tap, which would be a coda phoneme. In exchange for storing the two phonemes separately, this representation allows a transparent account of syllable-initial aspiration and other syllable-based allophones. The tokens for this model were phoneme onsets, peaks and codas.

The LRM model does not allow syllables to be located within the lexicon. This is justified as an economical way to deal with resyllabification. The word forms within the lexicon are connected to their phonological segments with their serial order encoded. Therefore, the tokens for this model were the frequency distributions of the individual segments in relation to their serial order in lexical words. This model also proposes the existence of a mental syllabary from which articulatory motor programs are retrieved. However, as lexical storage calculations involved the mental representation of word-forms as they are stored in the mental lexicon, the storage costs of a syllabary were not taken into account.

In the LEWISS model proposed by Romani et al. (2011), syllable structure is present within the mental lexicon. The tokens for this model were the structural and segmental information for each syllable. The structural information content was obtained by analysing the frequency distribution of syllable-based onsets, peaks and codas, while the segmental information looked at the frequencies of individual phonemes (44 basic phonemes).

Table 3. The Units of Representation for Each Model

Model           Representations considered for calculation
Dell model      Onset phonemes; peak phonemes; coda phonemes
LRM model       Phonemes according to serial position
LEWISS model    Phonemes; syllable structure
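The units in Table 3 can be made concrete with a small sketch that turns the same syllabified words into tokens under each model and computes an entropy from the resulting counts. Everything here is illustrative: the three-word lexicon, the phoneme symbols, the function names, and the choice of a CV skeleton as the LEWISS structural token are assumptions made for this example, not the representation or program used in the study.

```python
import math
from collections import Counter

# Hypothetical toy lexicon: each monosyllable is (onset, peak, coda) phoneme lists.
LEXICON = {
    "pit": (["p"], ["I"], ["t"]),
    "tap": (["t"], ["ae"], ["p"]),
    "spit": (["s", "p"], ["I"], ["t"]),
}

def dell_tokens(onset, peak, coda):
    # Phonemes tagged with syllabic position: onset /p/ and coda /p/ are different units.
    return ([(p, "onset") for p in onset]
            + [(p, "peak") for p in peak]
            + [(p, "coda") for p in coda])

def lrm_tokens(onset, peak, coda):
    # Phonemes tagged with their serial position in the word (1, 2, 3, ...).
    return [(p, i) for i, p in enumerate(onset + peak + coda, start=1)]

def lewiss_tokens(onset, peak, coda):
    # Untagged phonemes plus one structural token per syllable.
    # Assumption: the structure is summarised as a CV skeleton such as "CCVC".
    structure = "C" * len(onset) + "V" * len(peak) + "C" * len(coda)
    return onset + peak + coda, structure

def entropy_bits(counts):
    # Entropy (base 2) of the relative-frequency distribution given by the counts.
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

dell_counts = Counter(t for w in LEXICON.values() for t in dell_tokens(*w))
lrm_counts = Counter(t for w in LEXICON.values() for t in lrm_tokens(*w))
phoneme_counts = Counter(p for w in LEXICON.values() for p in lewiss_tokens(*w)[0])
structure_counts = Counter(lewiss_tokens(*w)[1] for w in LEXICON.values())

for name, counts in [("Dell", dell_counts), ("LRM", lrm_counts),
                     ("LEWISS phonemes", phoneme_counts),
                     ("LEWISS structure", structure_counts)]:
    print(name, round(entropy_bits(counts), 3))
```

With a lexicon this small the numbers themselves mean nothing; the point is only that the three models count different kinds of units from the same words, so they yield different frequency distributions and therefore different entropies.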

After the entropy rates for each of these scenarios were calculated, they were used to calculate the storage needs of all the monosyllabic words in a selected corpus. For English this was the CELEX dictionary (N=6707). For Italian (N=579) and Hindi (N=2621), the list of monosyllabic words was derived from their respective corpora. A program isolated all the words consisting of a single syllable and applied the entropy rate to each segment and/or other lexical information (serial position, syllable structure, etc.). Monosyllabic words were used to obtain a scaled comparison of the information requirements of the three models. A cursory glance at the information content required for a single segmental or structural unit does not provide a good comparison. As the LEWISS model requires storage of structural information (which varies from word to word), comparing only a few words is not sufficient. The overall information content of a fixed set of words defined according to some criterion (e.g., monosyllabic) provides a good comparison of information content across all three models.
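The scaling step described above can be sketched as follows, under the simplifying assumption that each segment costs one model-specific entropy rate and that any per-word information (such as one syllable-structure token) adds a fixed amount. The function name, the word list and the rates are placeholders chosen to make the arithmetic easy to follow; they are not values from this study.

```python
def total_storage_bits(words, bits_per_segment, bits_per_word_extra=0.0):
    """Sum per-word costs: every segment costs the model's entropy rate,
    plus an optional per-word term (e.g. one syllable-structure token)."""
    return sum(len(segments) * bits_per_segment + bits_per_word_extra
               for segments in words)

# Hypothetical monosyllabic words, each given as a list of segments (10 segments in all).
words = [["p", "I", "t"], ["t", "ae", "p"], ["s", "p", "I", "t"]]

# Placeholder rate for a model whose segments carry a positional tag.
tagged = total_storage_bits(words, bits_per_segment=5.0)

# Placeholder rates for untagged segments plus one structural token per word.
untagged_plus_structure = total_storage_bits(words, bits_per_segment=4.0,
                                             bits_per_word_extra=2.0)

print(tagged, untagged_plus_structure)
```

Because the per-word term is paid once per word while the per-segment rate is paid for every segment, the ranking of the models depends on the whole word list and not on the cost of any single unit, which is why a fixed set of words is needed for the comparison.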

2.4.2. Results

The CELEX dictionary contained 6707 monosyllabic words. Across these words, the storage costs for the LRM model were considerably higher than those for the Dell or LEWISS models. Although the Dell model stores separate consonants for onset and coda positions, it saves storage by not having to specify where they connect to the word (the consonants are inherently marked for position). The LRM model needs to store both the segments and their serial position, making for a higher storage cost. The LEWISS model falls between these two extremes: phonemes do not need to be stored in separate copies specific to syllabic position, but another level of information (syllable structure) needs to be stored as well.
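One way to see why a positional tag on each segment can raise storage cost is the chain rule for entropy. Writing S for the phoneme and P for whatever positional tag a model attaches to it (symbols introduced here for illustration only, not notation from the models themselves):

```latex
H(S, P) = H(S) + H(P \mid S) \ge H(S), \qquad 0 \le H(P \mid S) \le \log_b \lvert P \rvert
```

The extra term is bounded by the logarithm of the number of values the tag can take, so a three-way onset/peak/coda tag can add at most log2 3 ≈ 1.58 bits per segment, whereas a serial-position tag with more possible values has a higher ceiling. This is a general property of entropy and only suggests the direction of the difference; the actual sizes depend on the frequency distributions in each corpus.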

Figure 16. Comparison between storage requirements of speech models: (a) English; (b) Italian and Hindi.

It must be noted that the bits mentioned here are not meant to represent any unit of actual storage in the mental lexicon. Rather, they are a way to visualise and compare the storage needs of the models in terms of information content, and thereby to infer how those needs might differ in practice.

2.4.3. Discussion

The results show that the entropy required for storing structural information within the lexicon falls between that for the LRM model and that for the Dell model. Intuitively, it might appear that the LRM model uses the least storage, as only phonemes are stored, with no structural information or syllable frames. But the results show that overall it requires more storage, because the phonemes have no specification of where they fit into a word other than serialisation. The Dell model requires the least storage because its units already have their syllabic positions intrinsically assigned. However, it does not account for resyllabification and would require additional computational effort to change syllabic positions during resyllabification. Storing syllable structure requires less information than the LRM model, as the structural information is already associated with the segments and does not require further computation. The only burden would be to resyllabify the word edges before output. Moreover, as the segments do not have an intrinsic syllabic position, they can be resyllabified before phonetic transformation in less time than in the Dell model.

The results are surprising in that while LRM purports to save storage by increasing computational costs, it appears to be costly in both storage and computational requirements. On the other hand, while it may appear that the Dell model is the most inefficient in terms of storage, the separation of phonemes according to syllable position saves overall storage costs as they do not need further information to link with a morpheme or word node. The compromise seems to be the LEWISS model, which comes between the two models by storing just enough information to specify syllable structure, while not requiring the storage of phonemes according to allophonic distribution.
