

7.2 Analysis Methodologies and Techniques

7.2.4 Meme First Expression Ordering

First expression ordering analysis is a metric selected for validating simulated meme transmission against external data. The first time an agent expresses a meme provides direct proof that the agent has learned it. While the observed data may not reveal exactly when an agent learned a meme, its first expression provides an upper bound on that time. By definition, an agent A must know the meme at some time $t_A$ before its first expression at time $T_A$, where $T_A > t_A$. As such, the order of first expressions bounds the time span during which each agent could have learned the meme. Agents can therefore be ordered by their time of first expression, as shown in Equation 7.6 (where agents are assigned subscripts according to their order of expression).

$$T_{A_1} \le T_{A_2} \le \cdots \le T_{A_N} \tag{7.6}$$

If the cumulative probability of expression is an increasing function of the time elapsed since learning a meme, then the first expression time provides information about the order in which agents learned the meme. In some circumstances, this information could be used to estimate that learning order. This approach is not used in this analysis, however, because individual factors have a considerable impact on the time between learning a meme and expressing it (if it is expressed at all).

Instead, the times of first expression are treated as an ordering that ranks a combination of two covariates: when an agent learned the meme and how motivated it was to express it. As noted in Section 6.1.2, an order of first expression for three potential memes was extracted from the hourly logs of the Stanford Prison Experiment. Similarly, each simulation run produces an ordering of when agents first performed each action. Comparing the simulated orderings against the observed orderings provides a form of external validation.
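As a minimal sketch of how such an ordering might be extracted, the Python below assumes a hypothetical event log of `(time, agent, meme)` tuples; neither this log format nor the function name `first_expression_order` comes from the original study. Agents that share the same first-expression time are grouped as ties (coarse timestamps, such as hourly log entries, can produce such ties), and agents that never express the meme are returned separately as right-censored.

```python
from collections import defaultdict

def first_expression_order(events, meme, agents):
    """Order agents by their first expression of `meme` (hypothetical log format).

    events: iterable of (time, agent, meme) tuples.
    Returns (groups, censored): `groups` lists tuples of agents tied at the same
    first-expression time, earliest first; `censored` holds agents that never
    expressed the meme.
    """
    first = {}
    for time, agent, expressed in events:
        # Keep only the earliest time at which each agent expressed this meme.
        if expressed == meme and (agent not in first or time < first[agent]):
            first[agent] = time
    by_time = defaultdict(list)
    for agent, time in first.items():
        by_time[time].append(agent)
    groups = [tuple(sorted(by_time[t])) for t in sorted(by_time)]
    censored = tuple(sorted(a for a in agents if a not in first))
    return groups, censored
```

For example, if agent B acts at hour 1, agents A and C both first act at hour 3, and D never expresses meme `"M"`, the result is `([("B",), ("A", "C")], ("D",))`.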

Inversion Count Algorithm

In principle, it is straightforward to compare two ordered series statistically. Any ordered series can be reduced to a set of ordered pairs, and the sets of ordered pairs can be compared directly, with adjustments for duplicate information. In practice, the situation for meme expression is much messier. Agents may not express a meme on a particular run, leading to ambiguities at the tail of the ordered series: there is simply no way to infer which of two agents expressed a meme first if neither expressed it. Additionally, even if the series were always complete, it would still be necessary to adjust for information duplicated through transitivity (e.g., A > B and B > C implies A > C).

As a result, an algorithm was developed to statistically analyze the distance between two ordered series while allowing for right censoring and for ties in both series. This algorithm is based on the principle of series inversions. An inversion count algorithm determines the minimum number of swaps of adjacent elements needed to turn one ordering into another. Table 7.5 displays a simple example of inversion counting. Such algorithms are frequently used to measure the distance between sequences, such as DNA chains. The inversion number of a random permutation follows an approximately normal distribution for long sequences (Margolius, 2001). For N elements, the maximum inversion count is N(N-1)/2 and the mean inversion count over random permutations is N(N-1)/4, exactly half the maximum, which provides a null-hypothesis baseline when examining inversion counts.

Table 7.5: Inversion Counting Example

                    Sequence    Inversion Tabulation
Real Sequence       [A,B,C]     -
Permutation         [C,B,A]     -
                    [B,C,A]     +1
                    [B,A,C]     +1
                    [A,B,C]     +1
Inversion Count                 3
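The sketch below illustrates the two views of the inversion count used above: counting reversed pairs directly, and counting the adjacent swaps a bubble sort makes while reordering the permutation. On [C,B,A] the bubble sort performs exactly the three swaps listed in Table 7.5. The helper `mean_inversions` enumerates every permutation to confirm the chance baseline of half the maximum. All function names are illustrative, and the O(N²) pair count is chosen for clarity; a merge-sort variant counts inversions in O(N log N) for long sequences.

```python
from itertools import combinations, permutations

def inversion_count(sequence, reference):
    """Number of pairs whose order in `sequence` disagrees with `reference`."""
    position = {item: i for i, item in enumerate(reference)}
    ranks = [position[item] for item in sequence]
    # Each (earlier, later) pair in `sequence` that is reversed in `reference` is one inversion.
    return sum(1 for earlier, later in combinations(ranks, 2) if earlier > later)

def adjacent_swaps_to_sort(sequence, reference):
    """Bubble-sort `sequence` into `reference` order and count the adjacent swaps made."""
    position = {item: i for i, item in enumerate(reference)}
    items = list(sequence)
    swaps = 0
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if position[items[i]] > position[items[i + 1]]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swaps += 1
    return swaps

def mean_inversions(n):
    """Average inversion count over all n! permutations (the chance baseline)."""
    reference = list(range(n))
    counts = [inversion_count(p, reference) for p in permutations(reference)]
    return sum(counts) / len(counts)
```

For four elements the maximum is 4·3/2 = 6 inversions, and `mean_inversions(4)` returns 3.0, half of that maximum.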

The algorithm takes advantage of these principles by calculating the inversion number needed to turn a simulation sequence into the ground-truth sequence and comparing it to the maximum number of inversions possible given the two sequences. The algorithm handles ambiguously ordered or simultaneously occurring events by ignoring inversions within that subsequence when calculating both the inversion number and the maximum inversions. This retains the property that an above-average number of inversions is more than half of the maximum possible inversions.

Table 7.6 shows the results of using the modified inversion distance algorithm on some example sequences. For both the sequence and the permutation, the second, parenthetical list represents right-censored elements (with an unknown order).

Table 7.6: Modified Inversion Count Examples

Sequence       Permutation     Inversions (I)   Max Inversions (M)   Nearness (1 - I/M)
[A,B,C,D]      [A,B,C]         0                3                    1.00
[A,B,C]        [B,C] (A)       2                3                    0.33
[A,B,C]        [B] (A,C)       1                2                    0.50
[A,B] (C,D)    [C,B,D] (A)     4                5                    0.20

These examples demonstrate some of the dynamics of the distance calculation. The first example has no inversions, as the permutation is in the correct order. Although D is absent from the permutation, it is not considered unless it is designated as censored in some way. The second example demonstrates what happens when an element is censored from the permutation. Since only one element is censored, the permutation might as well be fully observed, because its order is fully known. The third example has two right-censored elements, so there is one fewer observable pair. In this case, both the number of inversions and the maximum number of inversions are reduced by one. This reduces the distance and improves the nearness score, since the inversion between A and C in the prior example has been replaced by uncertainty. The last example demonstrates that censored elements may also appear in the ground-truth sequence. Ties are handled similarly to censored elements: they are counted in neither the inversion count nor the maximum inversion count. Appendix I notes some additional properties of the algorithm. As noted above, given a random permutation with random right-censoring (the null hypothesis), the nearness calculation approaches 0.5 on average, so a nearness above 0.5 means that a sequence is closer to the ground truth than chance.
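The modified algorithm can be sketched as a pairwise comparison in Python. This is an illustrative reconstruction, not the implementation used in the study: it assumes, as the examples in Table 7.6 indicate, that censored elements tie with one another and rank after every observed element, and that a pair tied in either ordering is counted in neither I nor M. The names `ranks`, `modified_inversions`, and `nearness` are hypothetical. The sketch reproduces all four rows of Table 7.6.

```python
from itertools import combinations

def ranks(observed, censored=()):
    """Map each element to a rank; equal ranks mean tied or jointly censored.

    `observed` lists elements from earliest to latest; a tuple entry marks
    simultaneous (tied) elements. Censored elements all share the last rank.
    """
    rank = {}
    for i, entry in enumerate(observed):
        for element in (entry if isinstance(entry, tuple) else (entry,)):
            rank[element] = i
    for element in censored:
        rank[element] = len(observed)
    return rank

def modified_inversions(truth, truth_censored, perm, perm_censored):
    """Return (I, M): inversions and maximum possible inversions between two orderings."""
    t = ranks(truth, truth_censored)
    p = ranks(perm, perm_censored)
    inversions = max_inversions = 0
    # Elements missing from either ordering (neither observed nor censored) are ignored.
    for x, y in combinations(sorted(t.keys() & p.keys()), 2):
        dt, dp = t[x] - t[y], p[x] - p[y]
        if dt == 0 or dp == 0:
            continue  # tie or joint censoring in either ordering: counted in neither I nor M
        max_inversions += 1
        if (dt > 0) != (dp > 0):
            inversions += 1
    return inversions, max_inversions

def nearness(inversions, max_inversions):
    """Normalized nearness 1 - I/M; None when no pair is comparable."""
    return None if max_inversions == 0 else 1 - inversions / max_inversions
```

For the last row of Table 7.6, `modified_inversions(["A", "B"], ("C", "D"), ["C", "B", "D"], ("A",))` returns `(4, 5)`, and `nearness(4, 5)` rounds to 0.20.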

This modified inversion number algorithm provides a useful metric for measuring the distance between an individual simulation sequence and the ground truth, while naturally normalizing this distance and adjusting for censored data and ties. It provides a way to determine which experimental cases are closer to the ground-truth data, and a way to tell whether the simulation as a whole predicts the order of first expression for each meme better than chance.
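The chance baseline of 0.5 can be checked empirically with a small Monte Carlo sketch. Here a fully observed ground truth is compared against uniformly random orderings in which each element is independently right-censored with a fixed probability. This censoring model, the default parameters, and the helper names are assumptions made for illustration only. Trials with no comparable pairs are skipped, since nearness is undefined there.

```python
import random
from itertools import combinations

def nearness_from_ranks(truth_rank, perm_rank):
    """1 - I/M over pairs ordered in both rankings; None if no pair is comparable."""
    inversions = comparable = 0
    for x, y in combinations(sorted(truth_rank.keys() & perm_rank.keys()), 2):
        dt = truth_rank[x] - truth_rank[y]
        dp = perm_rank[x] - perm_rank[y]
        if dt == 0 or dp == 0:
            continue  # tied or jointly censored: the pair carries no order information
        comparable += 1
        inversions += (dt > 0) != (dp > 0)
    return None if comparable == 0 else 1 - inversions / comparable

def random_censored_ranks(items, p_censor, rng):
    """Uniformly random order; each item is censored (tied last) with probability p_censor."""
    order = list(items)
    rng.shuffle(order)
    observed = [x for x in order if rng.random() >= p_censor]
    rank = {x: i for i, x in enumerate(observed)}
    for x in order:
        rank.setdefault(x, len(observed))  # censored items share one rank after the observed ones
    return rank

def null_mean_nearness(n_items=6, p_censor=0.3, trials=5000, seed=0):
    """Monte Carlo estimate of mean nearness for random orderings against a fixed truth."""
    rng = random.Random(seed)
    truth = {i: i for i in range(n_items)}  # fully observed ground truth, no ties
    scores = []
    for _ in range(trials):
        score = nearness_from_ranks(truth, random_censored_ranks(range(n_items), p_censor, rng))
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores)
```

Under these assumptions the estimate from `null_mean_nearness()` should land close to 0.5, so a simulation whose runs average well above 0.5 is ordering first expressions better than chance.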

Median Expression Position

A second metric for comparing simulated expression orderings with ground-truth orderings is the median position of each agent's first expression. This position is determined by calculating an agent's order within a given simulation run relative to its peers. For each agent, this generates a set of data in the form $[O_1, O_2, \ldots]$, where $O_1$ is the agent's position in the first run, $O_2$ is its position in the second run, and so on. Due to the possibility of ties, an agent may share its position with another agent on a given run. In this case, all simultaneously acting agents are assigned the average of the slots they would have occupied as a group (e.g., three agents tied for fourth would all be marked as 5, the mean of [4, 5, 6]). An agent's median expression position is then the median of its expression positions across all simulation runs.
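A minimal sketch of this tie handling (often called fractional ranking), assuming a run is given as a list of tie groups ordered from earliest to latest; `fractional_positions` is a hypothetical name.

```python
def fractional_positions(groups):
    """Assign 1-based positions; agents tied in a group share the mean of their slots."""
    positions = {}
    next_slot = 1
    for group in groups:
        slots = range(next_slot, next_slot + len(group))
        shared = sum(slots) / len(group)  # e.g. slots 4, 5, 6 -> 5.0
        for agent in group:
            positions[agent] = shared
        next_slot += len(group)
    return positions
```

With three agents tied after the first three, `fractional_positions([("A",), ("B",), ("C",), ("D", "E", "F")])` marks D, E, and F as 5.0, matching the example in the text.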

Table 7.7 displays three example orderings and their resulting median sequence. For reference, the indices of the sequences are shown in the first column. The positions in the median sequence are determined by the median position of each element. A is the first element, since its positions were (1, 1, 4). C and D share the third position, since their positions were (3, 3, 4) and (2, 3, 4), respectively. This approach generates a typical ordering of the elements that is representative of those observed in the individual sequences.

Table 7.7: Median Sequence Example

Index   Sequence 1   Sequence 2   Sequence 3   Median Sequence
1       A            B            A            A (1)
2       B            D            B            B (2)
3       C            C            D            C, D (3)
4       D            A            C
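The median sequence in Table 7.7 can be reproduced with the sketch below. It assumes the same tie-group representation for each run and, as an assumption the text does not specify, records a position of `math.inf` for any agent that did not act in a run, so that an agent missing from most runs receives a median of "Never". `median_positions` and `median_sequence` are hypothetical names.

```python
import math
from statistics import median

def median_positions(runs, agents):
    """Median 1-based first-expression position of each agent across runs.

    runs: one ordering per run, each a list of tie groups (tuples), earliest first.
    Assumption: an agent absent from a run is recorded as math.inf ("Never").
    """
    per_agent = {agent: [] for agent in agents}
    for groups in runs:
        slot, seen = 1, set()
        for group in groups:
            shared = slot + (len(group) - 1) / 2  # mean of the slots the tied group occupies
            for agent in group:
                per_agent[agent].append(shared)
                seen.add(agent)
            slot += len(group)
        for agent in agents:
            if agent not in seen:
                per_agent[agent].append(math.inf)
    return {agent: median(values) for agent, values in per_agent.items()}

def median_sequence(medians):
    """Group agents that share a median position, earliest first; inf becomes 'Never'."""
    grouped = {}
    for agent, value in medians.items():
        grouped.setdefault(value, []).append(agent)
    return [("Never" if math.isinf(v) else v, sorted(grouped[v])) for v in sorted(grouped)]
```

Feeding in the three sequences of Table 7.7 yields medians of 1, 2, 3, and 3 for A, B, C, and D, so `median_sequence` groups C and D together in third place.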

Using the median values of agents' expression positions, an expression order can be generated that indicates the typical positions at which agents first expressed a meme. This provides an alternative method for comparing the simulation orderings against the holdout data. It also provides insight into which agents typically did not express a meme, since their median expression position will be "Never." Additionally, since this produces an expression ordering, the inversion count method can be applied to the median position ordering as well.
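One way to apply the inversion count to the median ordering is to use the median positions directly as ranks: equal medians are treated as ties, and "Never" agents are treated as right-censored elements tied after every finite median. The self-contained sketch below applies the same pairwise nearness calculation (1 - I/M) described for Table 7.6 under those assumptions; `nearness_of_medians` and its parameters are hypothetical.

```python
import math
from itertools import combinations

def nearness_of_medians(medians, truth_groups, truth_censored=()):
    """Nearness (1 - I/M) of a median-position ordering against a ground-truth ordering.

    medians: agent -> median position, with math.inf meaning "Never" (right-censored).
    truth_groups: ground-truth tie groups (tuples), earliest first.
    truth_censored: ground-truth agents that never expressed; they tie after all groups.
    """
    # Give every "Never" agent one shared rank past the largest finite median.
    finite = [v for v in medians.values() if not math.isinf(v)]
    never_rank = (max(finite) if finite else 0) + 1
    sim = {a: never_rank if math.isinf(v) else v for a, v in medians.items()}
    truth = {a: i for i, group in enumerate(truth_groups) for a in group}
    truth.update({a: len(truth_groups) for a in truth_censored})
    inversions = comparable = 0
    for x, y in combinations(sorted(sim.keys() & truth.keys()), 2):
        ds, dt = sim[x] - sim[y], truth[x] - truth[y]
        if ds == 0 or dt == 0:
            continue  # tied or jointly censored in either ordering: pair is skipped
        comparable += 1
        inversions += (ds > 0) != (dt > 0)
    return 1 - inversions / comparable if comparable else None
```

Comparing the Table 7.7 medians against the ground truth [A,B,C,D] gives a nearness of 1.0, because the only disagreement (C and D sharing third place) is a tie and is skipped.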