Well-mixed population - A model for code evolution

4.4 A model for code evolution

4.5.1 Well-mixed population

In the first scenario, each agent θi perceives the output of every other agent θj with the same probability, that is, p(θi, θj) = 1/(n(n − 1)) for every i, j ∈ [1, n] with i ≠ j, where n is the population size. We consider a population of size 64, with a noise value of 0.07 for the sensory input. The conditional probabilities p(X | Y) are optimised to maximise the mutual understanding. After maximisation, we obtained I(X ; X′) = 1.67 bits. Considering only the individual sensory input of the agents in the population, their growth rate is related to I(µ ; Y) = 2.43 bits (we say that the growth rate is “related” because, as we showed in Eq. 3.20, the actual long-term growth rate is F − H(µ) + I(µ ; Y), where F is a value obtained from some fitness function). After maximising their mutual understanding, however, the agents improve their growth rate by an amount related to their environmental information, which considers their sensory input together with the messages they perceive, given by I(µ ; Y, X′ | Θ) = 2.76 bits. In Table 4.7, we analyse different noise values, showing the benefit of communication for low/high noise scenarios.
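As a rough illustration of the two ingredients used above, the following Python sketch builds the uniform interaction probabilities of a well-mixed population and computes a mutual information in bits from a joint distribution. The function names are hypothetical, the restriction to ordered pairs of distinct agents is an assumption read off the normalisation 1/(n(n − 1)), and the toy joint distributions are not the optimised p(X | Y) of this section.

```python
import math
from itertools import permutations


def pair_probabilities(n):
    """Uniform probability over ordered pairs (i, j) of distinct agents."""
    p = 1.0 / (n * (n - 1))
    return {(i, j): p for i, j in permutations(range(n), 2)}


def mutual_information(joint):
    """I(A;B) in bits, with the joint distribution given as joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    total = 0.0
    for a, row in enumerate(joint):
        for b, pab in enumerate(row):
            if pab > 0:  # terms with zero probability contribute nothing
                total += pab * math.log2(pab / (pa[a] * pb[b]))
    return total


# Population size used in the text.
pairs = pair_probabilities(64)

# Toy joints over 8 states: perfectly correlated and fully independent.
noiseless = [[1 / 8 if a == b else 0.0 for b in range(8)] for a in range(8)]
independent = [[1 / 64 for _ in range(8)] for _ in range(8)]

print(len(pairs), sum(pairs.values()))
print(mutual_information(noiseless), mutual_information(independent))
```

With 8 equiprobable states, the perfectly correlated case gives log2 8 = 3 bits, which is the ceiling for any information measure over the 8 sensory states considered here.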

are universal; and, second, agents can only distinguish 6 out of the total 8 sensory states (distinguishing all of them would certainly increase the environmental information, but not necessarily the mutual understanding). We explain why we obtain these properties below.

(a) States of Y, in grid reading order: y1 y2 y3 y4 y5 y6 y7 y8

Code   y1  y2  y3  y4  y5  y6  y7  y8   Agents
(b)    x2  x3  x4  x4  x1  x8  x7  x1   62
(c)    x2  x5  x4  x4  x1  x8  x7  x1   1
(d)    x6  x3  x4  x4  x1  x8  x7  x1   1

Figure 4.6: (a) Illustration of the sensory states in a grid. (b-d) Compact representation of a code p(X | Y). For example, x2 in code (b) is in the top left corner of the grid, and so is y1 in (a); this means that p(x2 | y1) = 1. The states of X are coloured to make clear how many states of Y a code can distinguish. Below each code we show the number of agents that adopted it. Each code induces a partition of the sensory states.

Figure 4.6 shows three types of codes, which are represented by partitioning the sensory states according to p (X | Y ). The number of states that results from the partition is the number of states an agent can distinguish from its sensory states.
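The partition induced by a deterministic code can be sketched in a few lines of Python. The encoding below assumes that the outputs in Fig. 4.6 are listed in the order y1 to y8; the names `CODES`, `partition` and `distinguishable_states` are hypothetical helpers for illustration only.

```python
# Codes (b), (c) and (d) of Fig. 4.6, listed in the order y1..y8
# (the reading order of the grid is an assumption).
CODES = {
    "b": ["x2", "x3", "x4", "x4", "x1", "x8", "x7", "x1"],
    "c": ["x2", "x5", "x4", "x4", "x1", "x8", "x7", "x1"],
    "d": ["x6", "x3", "x4", "x4", "x1", "x8", "x7", "x1"],
}
SENSORY_STATES = [f"y{k}" for k in range(1, 9)]


def partition(code):
    """Group the sensory states by the output a deterministic code assigns."""
    blocks = {}
    for y, x in zip(SENSORY_STATES, code):
        blocks.setdefault(x, []).append(y)
    return blocks


def distinguishable_states(code):
    """Number of blocks in the partition, i.e. distinct outputs in use."""
    return len(partition(code))


for name, code in CODES.items():
    print(name, distinguishable_states(code), partition(code))
```

Each of the three codes yields 6 blocks: y3 and y4 share one output, and so do y5 and y8.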

In this example, we say that the codes in the population are universal, although there are three types of codes. The reason lies in the non-semantic assumption of information theory: how we label the states of random variables is irrelevant for the computation of Shannon’s information-theoretic measures. For instance, state x5 (Fig. 4.6 (c)) denotes the same state as x3 in Fig. 4.6 (b) and (d); in the same way, state x6 (Fig. 4.6 (d)) denotes the same state as x2 in Fig. 4.6 (b) and (c). These are synonyms, because they have exactly the same correspondence (although stochastic in the case of noisy sensors) to the environmental states.

The question now is why the objective function settles in such an optimum. After all, a change in the code of any agent that let it distinguish one more state of its sensors (7 instead of the achieved 6) would increase the mutual understanding. However, here we show that the adoption of synonyms can be disadvantageous when there is a limited set of outputs X. For example, had all agents adopted the code shown in Fig. 4.6 (b), then a change in the code of any agent that distinguishes sensory states y3 and y4 by using an unused output would increase the mutual understanding. However, this is not possible since all outputs are in use, and any update would create inconsistencies that decrease the objective function.
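The claim that the three codes differ only in their labels can be checked with a small Python sketch. It relabels the outputs of each code by order of first appearance, which discards the label names and keeps only the partition. The helper names `canonical` and `synonyms` are hypothetical, and the codes are again assumed to be listed in the order y1 to y8.

```python
CODE_B = ["x2", "x3", "x4", "x4", "x1", "x8", "x7", "x1"]
CODE_C = ["x2", "x5", "x4", "x4", "x1", "x8", "x7", "x1"]
CODE_D = ["x6", "x3", "x4", "x4", "x1", "x8", "x7", "x1"]


def canonical(code):
    """Relabel outputs by order of first appearance, dropping label names."""
    labels = {}
    return tuple(labels.setdefault(x, len(labels)) for x in code)


def synonyms(code_a, code_b):
    """Pairs of different labels that two codes use for the same sensory state."""
    return sorted({(a, b) for a, b in zip(code_a, code_b) if a != b})


same_partition = canonical(CODE_B) == canonical(CODE_C) == canonical(CODE_D)
print(same_partition)
print(synonyms(CODE_B, CODE_C))  # labels that differ between (b) and (c)
print(synonyms(CODE_B, CODE_D))  # labels that differ between (b) and (d)
```

Two codes with equal canonical forms induce the same partition of the sensory states, so any measure that depends only on the joint distribution, and not on the names of the states, cannot tell them apart when each is considered on its own.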

An inconsistency would be expressing different sensory states with the same output x. For example, the output x6 is not used in code scheme (b) in Fig. 4.6, so we could use it such that p(x6 | y3) = 1. Agents with this updated code scheme would then be able to distinguish between y3 and y4. However, since there is one agent using this output to encode sensor state y1, and since the population is well-mixed, agents perceiving output

Let us note that increasing the available alphabet for choosing outputs alleviates this problem: the larger the set, the more likely it is that agents will be able to distinguish all of their sensor states. Typical solutions for a doubled alphabet (|X| = 16) show a large number of synonyms, yet agents can distinguish all of their sensor states (results not shown).
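The inconsistency described earlier, where one output ends up encoding different sensory states in different agents, can be made concrete with the following Python sketch. It collects, for every output, the set of sensory states that the output encodes anywhere in the population, before and after a hypothetical update p(x6 | y3) = 1 in one agent that uses code (b). The function `meanings` and the listing order y1 to y8 are assumptions for illustration.

```python
from collections import defaultdict

SENSORY_STATES = [f"y{k}" for k in range(1, 9)]
CODE_B = ["x2", "x3", "x4", "x4", "x1", "x8", "x7", "x1"]
CODE_C = ["x2", "x5", "x4", "x4", "x1", "x8", "x7", "x1"]
CODE_D = ["x6", "x3", "x4", "x4", "x1", "x8", "x7", "x1"]


def meanings(population):
    """Map each output to the sensory states it encodes in any agent's code."""
    table = defaultdict(set)
    for code in population:
        for y, x in zip(SENSORY_STATES, code):
            table[x].add(y)
    return dict(table)


# Population of Fig. 4.6: 62 agents with code (b), one with (c), one with (d).
population = [CODE_B] * 62 + [CODE_C, CODE_D]
before = meanings(population)

# Hypothetical update in one agent: encode y3 with x6, i.e. p(x6 | y3) = 1.
updated_b = list(CODE_B)
updated_b[2] = "x6"
after = meanings([updated_b] + population[1:])

print(sorted(before))   # outputs in use across the population
print(before["x6"], after["x6"])
```

Before the update, all 8 outputs are in use and x6 stands for y1 only. After the update, a received x6 may stand for either y1 or y3, depending on which agent sent it.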

Noise variation

Our results have shown that a well-mixed population always leads to the emergence of a universal code for communication, assuming a cooperative scenario. We now analyse whether these properties still hold when we vary the noise in the sensory input; in particular, we ask whether our original solution remains optimal. If it does, then our previous analysis is valid independently of the noise value. Let us note that the fitness landscape may change for different noise values, but our concern at this point is with the properties of the optimal values, and not with the precise details of how the global optimum is reached.

Noise      I(Y ; Y′)   I(X ; X′)   I(µ ; Y)   I(µ ; Y, X′ | Θ)
0.00007    2.99761     2.50        2.99874    2.99975
0.0007     2.98079     2.48        2.98969    2.99766
0.007      2.85492     2.37        2.92018    2.97780
0.07       2.05331     1.67        2.43756    2.76741
0.14       1.45372     1.17        2.02273    2.50325
0.28       0.68195     0.53        1.35849    1.91497
0.56       0.06954     0.05        0.43829    0.72595

Figure 4.7: Summary of further optimising the solution found in Sec. 4.5.1 while varying the noise value for the sensory input (defined in Eq. 4.3). For each noise value, I(Y ; Y′) is the resulting upper bound, and I(X ; X′) is the result of further optimising the mutual understanding. In all these cases, the original solution (obtained with a noise value of 0.07) remained the same. We also show the average environmental sensory information of the population, I(µ ; Y), for each considered noise value, and the average environmental information when sensors and messages are considered together, given by I(µ ; Y, X′ | Θ).

In Table 4.7 we show the results of further optimising the original solution (obtained with a noise value of 0.07) represented in Fig. 4.6, for different noise values. In all of them, the equilibrium point of the original solution did not change with the updated noise value.
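The benefit of communication at each noise value can be read off the reported numbers as the difference I(µ ; Y, X′ | Θ) − I(µ ; Y). The Python sketch below performs only this arithmetic on the tabulated values and checks that the optimised I(X ; X′) stays below its upper bound I(Y ; Y′); it does not rerun any optimisation, and the names `ROWS` and `communication_gain` are hypothetical.

```python
# Values copied from the table above:
# (noise, I(Y;Y'), I(X;X'), I(mu;Y), I(mu;Y,X'|Theta))
ROWS = [
    (0.00007, 2.99761, 2.50, 2.99874, 2.99975),
    (0.0007, 2.98079, 2.48, 2.98969, 2.99766),
    (0.007, 2.85492, 2.37, 2.92018, 2.97780),
    (0.07, 2.05331, 1.67, 2.43756, 2.76741),
    (0.14, 1.45372, 1.17, 2.02273, 2.50325),
    (0.28, 0.68195, 0.53, 1.35849, 1.91497),
    (0.56, 0.06954, 0.05, 0.43829, 0.72595),
]


def communication_gain(row):
    """Extra environmental information, in bits, gained from messages."""
    _, _, _, sensors_only, sensors_and_messages = row
    return sensors_and_messages - sensors_only


gains = {row[0]: round(communication_gain(row), 5) for row in ROWS}
bound_respected = all(row[2] <= row[1] for row in ROWS)

for noise, gain in gains.items():
    print(f"noise={noise:<8} gain={gain:.5f} bits")
print("I(X;X') <= I(Y;Y') in every row:", bound_respected)
```

For the original noise value of 0.07, the difference is about 0.33 bits, matching the 2.43 and 2.76 bits quoted in Sec. 4.5.1.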
