

5.1.2 Community Analysis of PhoNet

We modify the algorithm of Radicchi et al. [129] (henceforth MRad) to make it suitable for weighted networks and then use it to analyze the community structure of PhoNet. We chose this algorithm for two main reasons: (a) it is fast, and (b) it can be easily modified to handle weighted networks. The basis of the algorithm and our modification are as follows.

Basis: Edges that run between communities should not belong to many triangles because, to complete a triangle containing such an edge, another edge must run between the same two communities, and such inter-community edges are, by definition, rare.
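The intuition behind the basis can be illustrated with a minimal sketch for the unweighted case. The toy graph and the function name `triangles_per_edge` are hypothetical and only serve the illustration: two triangles are joined by a single bridge edge, and the bridge is the only edge that belongs to no triangle.

```python
def triangles_per_edge(edges):
    """Count, for each undirected edge, the triangles it belongs to."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    # Every common neighbour of u and v closes one triangle through (u, v).
    return {(u, v): len(neighbours[u] & neighbours[v]) for u, v in edges}


# Hypothetical toy graph: two triangles joined by the bridge edge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
counts = triangles_per_edge(toy_edges)
bridge_count = counts[("c", "d")]   # the inter-community edge
inside_count = counts[("a", "b")]   # an intra-community edge
```

In this toy case the bridge edge has a triangle count of zero while every intra-community edge has a count of one, which is the signal the unweighted algorithm relies on.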

Modification for a Weighted Network: For weighted networks, rather than considering only the triangles (loops of length three), we need to consider the weights on the edges forming these triangles. The basic idea is that if the weights on the edges forming a triangle are comparable, then the group of consonants represented by this triangle occurs together frequently, giving rise to a pattern of co-occurrence; if the weights are not comparable, there is no such pattern. To capture this property, we define the edge-strength $S$ for each edge of PhoNet as follows. Let the weight of the edge $(u, v)$, where $u, v \in V_C$, be denoted by $w_{uv}$. $S$ can be expressed as a ratio of the form

\[
S = \frac{w_{uv}}{\sqrt{\sum_{i \in V_C - \{u,v\}} (w_{ui} - w_{vi})^2}} \tag{5.1}
\]

if $\sqrt{\sum_{i \in V_C - \{u,v\}} (w_{ui} - w_{vi})^2} > 0$; otherwise $S = \infty$. The expression for $S$ indicates that the strength of connection between two nodes $u$ and $v$ depends on (i) the weight of the edge $(u, v)$ and (ii) the degree to which the weights on the edges forming triangles with $(u, v)$ are comparable. If the weights are not comparable, the denominator will be high, thus reducing the overall value of $S$. PhoNet can then be decomposed into communities by removing edges that have $S$ less than a specified threshold (say $\eta$). The entire idea is summarized in Algorithm 5.1. Figure 5.1 illustrates the process of community formation.
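The procedure described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used for PhoNet: the names `edge_strength` and `mrad_communities` and the toy weights are hypothetical, a missing edge is assumed to have weight 0, and edges are removed when $S \le \eta$, following the wording of Algorithm 5.1.

```python
import math


def edge_strength(weights, nodes, u, v):
    """Edge-strength S of Eq. 5.1; weights maps frozenset({a, b}) -> weight."""
    def w(a, b):
        # Assumption: a pair of nodes without an edge has weight 0.
        return weights.get(frozenset((a, b)), 0.0)

    spread = math.sqrt(sum((w(u, i) - w(v, i)) ** 2
                           for i in nodes if i not in (u, v)))
    return w(u, v) / spread if spread > 0 else math.inf


def mrad_communities(weights, nodes, eta):
    """Drop edges with S <= eta, then return the connected components."""
    kept = {n: set() for n in nodes}
    for edge in weights:
        u, v = tuple(edge)
        if edge_strength(weights, nodes, u, v) > eta:
            kept[u].add(v)
            kept[v].add(u)

    seen, communities = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(kept[node] - component)
        seen |= component
        communities.append(component)
    return communities


# Hypothetical toy network: a tightly knit triangle a-b-c plus a weak edge c-d.
toy_nodes = ["a", "b", "c", "d"]
toy_weights = {frozenset(("a", "b")): 10.0, frozenset(("a", "c")): 9.0,
               frozenset(("b", "c")): 10.0, frozenset(("c", "d")): 1.0}
communities = mrad_communities(toy_weights, toy_nodes, eta=1.0)
```

In the toy network the triangle edges have comparable weights and therefore high $S$, whereas the edge between c and d has a large denominator and a low $S$, so it is cut and d is left as a singleton.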

It is important to mention here that, while employing the above algorithm for community detection, we have neglected those nodes in PhoNet that correspond to consonants occurring in fewer than 5 languages¹ in UPSID. Since the frequency of occurrence of each such consonant is extremely low, the communities they form can be assumed to be statistically insignificant. Furthermore, we have also

Algorithm 5.1: The MRad algorithm

Input: PhoNet
for each edge (u, v) do
    Compute $S = w_{uv} / \sqrt{\sum_{i \in V_C - \{u,v\}} (w_{ui} - w_{vi})^2}$ if $\sqrt{\sum_{i \in V_C - \{u,v\}} (w_{ui} - w_{vi})^2} > 0$, else $S = \infty$;
end
Redefine the edge-weight for each edge (u, v) by S;
Remove edges with edge-weights less than or equal to a threshold η;
Call this new version of PhoNet, PhoNet_η;
Find the connected components in PhoNet_η;

Figure 5.1: The process of community formation

removed those nodes that correspond to consonants which have a very high frequency of occurrence. Since such consonants co-occur with almost every other consonant (by virtue of their high frequency), the edge-strength S is likely to be high for the edges that connect pairs of nodes corresponding to these high-frequency consonants. The value of S for these edges is much higher than η, so they are not removed from the network and can therefore pull the nodes of two clearly disjoint communities

Figure 5.2: The dendrogram illustrates how the retroflex community of /ɖ/, /ʈ/, /ɳ/, /ɭ/ and /ɽ/ is formed with the change in the value of η

into a single community. For instance, we have observed that since the consonants /m/ and /k/ are very frequent, the edge connecting the nodes corresponding to them has a high edge-strength. The strong link between /m/ and /k/ forces the sets of bilabial and velar consonants, which should ideally form two different communities, to merge into a single community. Hence, we have removed the nodes that correspond to consonants occurring in more than 130 languages² in UPSID (a total of 13 nodes).

Note that even these 13 nodes, which form a hub-like structure, indicate a high correlation among the features that characterize them, thus attesting to the presence of feature economy.
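The two frequency cut-offs amount to a simple pre-filtering step before community detection. The following sketch shows one way to express it; the function name `filter_consonants`, the placeholder labels, and the language counts are made up for illustration and are not UPSID data.

```python
def filter_consonants(language_counts, min_languages=5, max_languages=130):
    """Keep consonants occurring in at least 5 and at most 130 languages."""
    return {consonant for consonant, count in language_counts.items()
            if min_languages <= count <= max_languages}


def restrict_edges(weights, kept_nodes):
    """Drop every edge that touches a removed node."""
    return {edge: w for edge, w in weights.items() if edge <= kept_nodes}


# Made-up counts: c1 is too rare, c4 is too frequent, c2 and c3 are retained.
toy_counts = {"c1": 3, "c2": 40, "c3": 130, "c4": 300}
toy_weights = {frozenset(("c1", "c2")): 2, frozenset(("c2", "c3")): 25,
               frozenset(("c3", "c4")): 120}
kept_nodes = filter_consonants(toy_counts)
kept_weights = restrict_edges(toy_weights, kept_nodes)
```

Here `edge <= kept_nodes` is a subset test on the `frozenset` key, so an edge survives only if both of its endpoints do.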

We can obtain different sets of communities by varying the threshold η. As the value of η decreases, new nodes keep joining the communities and the process becomes similar to hierarchical clustering [135]. Figure 5.2 shows a dendrogram, which illustrates the formation of the community of the consonants /ɖ/, /ʈ/, /ɳ/, /ɭ/ and /ɽ/ with the change in the value of η.
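The hierarchical behaviour can be made concrete with a small sketch that sweeps η from high to low over precomputed edge-strengths and records the communities at each level. The names `components` and `sweep`, the node labels, and the strength values are hypothetical; edges are kept when their strength exceeds η, and singleton nodes are omitted from the output.

```python
def components(nodes, edges):
    """Connected components with more than one node, via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def sweep(nodes, strengths, etas):
    """Communities obtained at each threshold, from highest to lowest."""
    levels = {}
    for eta in sorted(etas, reverse=True):
        kept = [edge for edge, s in strengths.items() if s > eta]
        levels[eta] = components(nodes, kept)
    return levels


# Hypothetical edge-strengths on a four-node chain p-q-r-s.
toy_nodes = ["p", "q", "r", "s"]
toy_strengths = {("p", "q"): 3.0, ("q", "r"): 1.5, ("r", "s"): 0.4}
levels = sweep(toy_nodes, toy_strengths, [2.0, 1.0, 0.2])
```

Each lower threshold yields a community that contains the one from the level above, which is the nesting a dendrogram depicts.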

Some of the example communities obtained from our algorithm are noted in Table 5.1. In this table, the consonants in the first community are dentals, those in the second community are retroflexes, while the ones in the third are all laryngealized.

Table 5.1: Consonant communities

Community                      Features in Common
/t/, /d/, /n/                  dental
/ɖ/, /ʈ/, /ɳ/, /ɭ/, /ɽ/        retroflex
/w̰/, /j̰/, /m̰/                  laryngealized