2.3 Graphs of Language
2.3.2 Complex Network Growth and Models of Language Network Evolution
Having seen how language can be treated as a graph or network, we must ask how such networks evolve. Different models have been proposed for different kinds of networks; the most important ones are presented and compared here.
Random Networks (Erdős and Rényi)
In 1959, Erdős and Rényi proposed a model of random graphs. They start with n vertices and connect them at random with N edges, so that every graph with n vertices and N edges is equally probable (Erdős and Rényi, 1959, p. 290). For large, sparse graphs, such a network's degree distribution approximately follows a Poisson distribution, which gives the probability that a vertex has a particular degree. If we know that a graph has n vertices and N edges, the average number of edges a vertex has, its degree, should be A = 2N/n, because every edge contributes to the degree of two vertices. The Poisson distribution then gives the probability that a vertex has a degree of A ± x. An example of an Erdős and Rényi graph is given in Fig. 8.
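The relation between the random graph and the Poisson distribution can be checked numerically. The following is a minimal sketch, not the procedure used for Fig. 8: the function names and the parameter values (n = 1000, N = 2000) are assumptions chosen for illustration, with a sparse graph so that the Poisson approximation is reasonable. Each edge adds one to the degree of both of its endpoints, so the mean degree is 2N/n.

```python
import random
from collections import Counter
from itertools import combinations
from math import exp, factorial


def random_graph(n, N, seed=0):
    """Draw N distinct edges uniformly from all pairs of n vertices."""
    rng = random.Random(seed)
    all_pairs = list(combinations(range(n), 2))
    return rng.sample(all_pairs, N)


def degrees(n, edges):
    """Count how many edges touch each vertex."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def poisson_pmf(k, mean):
    """Probability of the value k under a Poisson distribution."""
    return exp(-mean) * mean ** k / factorial(k)


# Illustrative values (assumptions, not taken from the text).
n, N = 1000, 2000
edges = random_graph(n, N)
deg = degrees(n, edges)
mean_degree = sum(deg) / n  # equals 2N/n

# Compare the observed share of vertices per degree with the Poisson value.
observed = Counter(deg)
for k in sorted(observed):
    print(k, round(observed[k] / n, 3), round(poisson_pmf(k, mean_degree), 3))
```

The two printed columns should be close to each other but not identical, since the Poisson distribution is only an approximation for a finite graph.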
Because data on real-world networks were lacking, this model could not be tested extensively. It constructs a random network that is unlike the real-world, complex networks we have already seen and will analyze later in this thesis. The model lacks the most basic features of complex networks: neither the power-law distribution, and thus the scale-free feature, nor the small-world property of natural networks can be explained using this model (Barabási and Réka, 1999, p. 510).
Small-World Networks (Watts and Strogatz)
One feature of complex networks that the model shown above does not predict is the small-world phenomenon. Watts and Strogatz (1998) propose a model that explains how small-world networks emerge.
Figure 8: An Erdős and Rényi random graph with 100 vertices and a connectivity probability of 0.2.
Starting from a network of n vertices with k edges per vertex, Watts and Strogatz “rewire each edge at random with probability p” (Watts and Strogatz, 1998, p. 440), where p = 0 results in an ordered network (see Fig. 9(a)), and p = 1 results in an unordered network (see Fig. 9(c)). While an ordered network has a long average path length L (a large-world network) and a high clustering coefficient C, a network with 0 < p < 1 (see Fig. 9(b)) is less clustered and has a short average path length, that is, it is a small-world network. Even for “small p, each short cut has a highly nonlinear effect on L, contracting the distance not just between the pair of vertices that it connects, but between their immediate neighbourhoods” (Watts and Strogatz, 1998, p. 440). But this model still misses the power-law distribution that is typically found in complex networks, and it does not explain how such networks evolve, since it does not incorporate growth of any kind.
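The effect of the rewiring probability p on C and L can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the original procedure of Watts and Strogatz: the starting network is taken to be a ring lattice in which every vertex is joined to its k nearest neighbours (k even), a rewired edge keeps one endpoint and moves the other to a uniformly chosen vertex that is not yet a neighbour, and L is averaged over reachable pairs only. All function names and the values n = 200 and k = 4 are illustrative.

```python
import random
from collections import deque


def watts_strogatz(n, k, p, seed=0):
    """Ring lattice of n vertices with k neighbours each; rewire edges with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, k // 2 + 1):
            adj[v].add((v + d) % n)
            adj[(v + d) % n].add(v)
    for d in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + d) % n
            if w in adj[v] and rng.random() < p:
                # Keep endpoint v, move the other end to a vertex that is not yet a neighbour.
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj


def clustering(adj):
    """Mean share of a vertex's neighbour pairs that are themselves connected."""
    total = 0.0
    for nbrs in adj.values():
        deg = len(nbrs)
        if deg < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2 * links / (deg * (deg - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (breadth-first search)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


for p in (0.0, 0.1, 1.0):
    g = watts_strogatz(200, 4, p)
    print(p, round(clustering(g), 3), round(average_path_length(g), 2))
```

For p = 0 the lattice has C = 0.5 and a long L; a small p should shorten L markedly, and p = 1 should leave very little clustering.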
Preferential Attachment (Barabási and Réka)
Network models of language allow one to analyze the supposed structure of the mind and to predict and analyze the evolution of language. Barabási and Réka (1999) describe a model of network growth using preferential attachment:18 Given a network, this model predicts that “new vertices in the growing network are preferentially attached to an existing vertex with a probability proportional to the degree of such a node” (Ferrer and Solé, 2001, p. 2263). This is equivalent to the rich-get-richer principle introduced before. The model leads to a power-law distribution, which is generally seen as a natural result of the way a system, like a network, grows over time. Barabási and Réka state that this indicates that “the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems” (Barabási and Réka, 1999, p. 509).19
18 Preferential attachment applies Bayesian models, which are also used, for example, to predict the growth of the mind (cf. Tenenbaum et al., 2011). Bayesian models complement network approaches and are used widely in language models in psychology, linguistics, and medicine.
Figure 9: Watts and Strogatz graphs with n = 20 and k = 4. (a) Probability p = 0. (b) Probability p = 0.5. (c) Probability p = 1.
In formal terms, consider a network that starts with $m_0$ vertices and adds a new vertex with $m \le m_0$ edges at every time step $t$. The probability $P(k_i)$ that the new vertex will connect to an existing vertex $i$ depends on the degree $k_i$ of that vertex:
$$P(k_i) = \frac{k_i}{\sum_j k_j}.$$
This leads to a random network with $t + m_0$ vertices and $mt$ edges whose degree distribution is “following a power law with an exponent $\gamma_{\text{model}} = 2.9 \pm 0.1$” (Barabási and Réka, 1999, p. 5). An example is given in Fig. 10. Comparing this graph to an Erdős and Rényi random graph, one can see that the number of edges per vertex does not follow a Poisson distribution; instead, some vertices have a considerably higher degree than others. This also leads to clustering, as Fig. 10 shows.
This model predicts degree distributions that follow a power law, much like those observed in networks of natural language or, more generally, in scale-free, small-world networks.
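The growth rule can be sketched in a few lines. The code below is an illustration under assumptions, not the authors' implementation: the initial vertices carry no edges, so the first new vertex picks its targets uniformly (the attachment probability is undefined while all degrees are zero), and duplicate targets are simply redrawn. The function name and the parameter values are hypothetical. Drawing from a list in which every vertex appears once per incident edge yields a probability proportional to its degree.

```python
import random
from collections import Counter


def preferential_attachment(m0, m, steps, seed=0):
    """Grow a network from m0 vertices, adding one vertex with m edges per step."""
    if not 1 <= m <= m0:
        raise ValueError("need 1 <= m <= m0")
    rng = random.Random(seed)
    degree = [0] * m0
    edges = []
    endpoints = []  # each vertex appears once per incident edge
    for t in range(steps):
        new = m0 + t
        targets = set()
        while len(targets) < m:
            if endpoints:
                # Probability of picking vertex i is k_i divided by the sum of all degrees.
                targets.add(rng.choice(endpoints))
            else:
                # Assumption: uniform choice while no edges exist yet.
                targets.add(rng.randrange(m0))
        degree.append(0)
        for i in targets:
            edges.append((new, i))
            degree[i] += 1
            degree[new] += 1
            endpoints.extend((new, i))
    return degree, edges


degree, edges = preferential_attachment(m0=3, m=2, steps=1000)
print("vertices:", len(degree), "edges:", len(edges))
print("mean degree:", round(sum(degree) / len(degree), 2), "max degree:", max(degree))

# A heavy tail shows up as a few vertices with degrees far above the mean.
histogram = Counter(degree)
for k in sorted(histogram)[-5:]:
    print("degree", k, "vertices", histogram[k])
```

After t steps the network has t + m0 vertices and mt edges, which the first printed line makes visible.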
In Fig. 11, the degree distributions of the models presented above are given. In accordance with what has been stated above, neither the degree distribution shown in Fig. 11(a) nor the one in Fig. 11(b) follows a power law. Only the model of Barabási and Réka (1999), Fig. 11(c), shows a degree distribution of the kind found in most natural networks, such as social networks, and, as will be shown in later chapters, in ontologies.
19 In protein networks, for example, scale-free distribution follows from preferential attachment in the
Figure 10: A Barabási and Réka graph with 100 vertices.
Language Network Models (Steyvers and Tenenbaum)
Steyvers and Tenenbaum compare the Barabási and Réka model to findings in natural language networks20 and find that preferential attachment as proposed by Barabási and Réka (1999) does not explain the structure of semantic networks.
From a language evolution21 point of view, Steyvers and Tenenbaum argue that “[w]ords that enter the network early are expected to show higher connectivity” (Steyvers and Tenenbaum, 2005, p. 44). In addition, existing complex concepts (i.e., those with a high connectivity or degree) in a language are more likely to be differentiated over time. This means that complex and very broad terms tend to be differentiated into narrower terms. There are of course many other processes that could be discussed concerning the evolution of a natural language, but this process leads to concepts with a high connectivity (i.e., hubs or authorities) and thereby to a small-world network.
20 Unfortunately, they restrict themselves to an analysis of networks and ignore the large amount of linguistic research in the areas of language evolution and language change.
21 Steyvers and Tenenbaum’s model could just as well be applicable to individual language acquisition, even though it cannot account for all the diverse processes that happen during the acquisition or evolution of natural languages.
Figure 11: Degree distributions of Erdős and Rényi, Watts and Strogatz, and Barabási and Réka graphs. (a) Degree distribution of the Erdős and Rényi random graph in Fig. 8. (b) Degree distribution of the Watts and Strogatz graph in Fig. 9(c). (c) Degree distribution of the Barabási and Réka graph in Fig. 10.
Two models for the growth of language networks are proposed by the authors: the first explains the growth of an undirected network, while the second explains the growth of a directed network (e.g., a semantic relationship network of words such as WordNet or other language ontologies).
The first growth model can be formulated as follows: Given a fully connected network of size n that grows over time, at each time step t, when the network has n(t) vertices, a randomly chosen vertex i is differentiated by adding a new vertex with M connections (M < n) to randomly chosen vertices in the neighborhood of i. As a result, a “new vertex can be thought of as differentiating the existing node, by acquiring a similar but slightly more specific pattern of connectivity” (Steyvers and Tenenbaum, 2005, p. 57).
Now the probability that vertex $i$ is chosen has to be defined. The probability $P_i(t)$ is proportional to the connectivity of the vertex $i$ (i.e., its degree):
$$P_i(t) = \frac{k_i(t)}{\sum_{l=1}^{n(t)} k_l(t)}.$$
The degree of $i$ at time $t$, $k_i(t)$, is divided by the sum of the degrees $k_l(t)$ of all vertices $l = 1, \dots, n(t)$ that exist at time $t$.
To choose a vertex $j$ in the neighborhood $H_i$ of $i$ to which the new vertex will be connected, the probability $P_{ij}(t)$ is calculated in proportion to the utility of the corresponding node:
$$P_{ij}(t) = \frac{1}{k_i(t)}.$$
Since $H_i$ contains $k_i(t)$ vertices, this form assigns the same probability to every neighbor of $i$. The choice is repeated until $M$ vertices from $H_i$ have been chosen. Then the new vertex is connected to them. These steps are repeated until the desired network size is reached. The network produced by the model can then be compared to a real-world network of the same size.
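The steps of the first growth model can be sketched as follows. This is a minimal illustration under assumptions, not the implementation of Steyvers and Tenenbaum: the initial network is taken to be the complete graph on M + 1 vertices, so that every vertex has at least M neighbours from the start, and the M neighbours are drawn uniformly without replacement, which corresponds to repeating the choice with probability 1/k_i until M distinct vertices are found. The function name and the values M = 3 and size = 500 are hypothetical.

```python
import random


def grow_network(M, size, seed=0):
    """Grow an undirected network by differentiation until it has `size` vertices."""
    rng = random.Random(seed)
    n0 = M + 1  # assumption: complete starting graph on M + 1 vertices
    nbrs = {v: set(range(n0)) - {v} for v in range(n0)}
    while len(nbrs) < size:
        vertices = list(nbrs)
        degrees = [len(nbrs[v]) for v in vertices]
        # P_i(t): choose the vertex to differentiate in proportion to its degree.
        i = rng.choices(vertices, weights=degrees, k=1)[0]
        # P_ij(t) = 1 / k_i(t): draw M distinct neighbours of i uniformly.
        chosen = rng.sample(sorted(nbrs[i]), M)
        new = len(nbrs)
        nbrs[new] = set(chosen)
        for j in chosen:
            nbrs[j].add(new)
    return nbrs


network = grow_network(M=3, size=500)
degrees = sorted((len(n) for n in network.values()), reverse=True)
print("vertices:", len(network))
print("five largest degrees:", degrees[:5])
print("smallest degree:", degrees[-1])
```

Because degrees never decrease, every neighbourhood keeps at least M members, so the draw of M neighbours always succeeds.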
The second model of network growth results in a directed network. The process is very similar to the first model; the only difference lies in how the new node is connected. The vertices it will be connected to are still chosen with the probability $P_{ij}(t)$, but the direction of each connecting edge is chosen randomly.
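The additional step of the directed variant can be sketched separately. The snippet assumes that both directions are equally likely, which the description above does not specify; the function name and the example vertices are hypothetical.

```python
import random


def orient_randomly(undirected_edges, seed=0):
    """Turn each undirected edge (u, v) into an arc with a random direction."""
    rng = random.Random(seed)
    arcs = []
    for u, v in undirected_edges:
        # Assumption: each direction is chosen with probability 0.5.
        arcs.append((u, v) if rng.random() < 0.5 else (v, u))
    return arcs


# A new vertex 10 that has been connected to three chosen neighbours.
new_vertex = 10
chosen_neighbors = [2, 5, 7]
arcs = orient_randomly([(new_vertex, j) for j in chosen_neighbors])
print(arcs)
```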
While this model fits the examined features of language graphs, it should be mentioned that a model of the growth or evolution of a language network should also account for the loss of words (i.e., at randomly chosen time steps the network should be pruned and poorly connected concepts should be erased). A diachronic analysis of language shows not only a differentiation of words, and hence a semantic shift, but also that concepts that are poorly connected, and therefore less frequently used, cease to exist in the vocabulary of a single person or even a language community (cf. Pagel et al., 2013). This property is, in my opinion, missing in the model. Also, as Dorogovtsev and Mendes (2001) point out, because of semantic shift, existing concepts should at times be rewired to other concepts.