
This technique is defined here for the first time. It grew out of ideas about the clustering of discrete genetic traits in the Kellis 2 cemetery and the potential efficacy of removing first-order effects by collapsing the distribution of graves into a simple 30x30 matrix. How this might be done was explored, but it quickly became complex. It involved a number of subjective decisions about what was required to place two graves adjacent to each other, especially around some of the internal gaps in the cemetery (see Figure 5-3 below). Removing that subjectivity required defining a specific distance for adjacency. At that point it became evident that removing the first-order effects was unnecessary, since one could simply count the number of pairs of graves within the specified radius that shared the same discrete genetic trait.

Although it was developed in the context of the Kellis 2 analysis, the technique has wider application, and in the discussion that follows the term grave is generalized to event. The count is a count of pairs of events, not of the total number of events with the trait found within the specified radius. Thus two events within the specified radius give a count of one, three events all within the radius give a count of three, four give a count of six, and so on (n events all within the radius give n(n - 1)/2 pairs). In a highly clustered set of events with the trait, the count can therefore exceed the number of events with the trait, or even the total number of events in the cemetery. The routine works through the list of events with the given trait one at a time, counting how many other events with the trait lie within the specified distance. Each pair is counted only once (i.e., A to B adds one to the total but B to A does not).
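The pair count described above can be sketched in a few lines. This is a minimal Python illustration, not the author's R routine; the function name `proximity_count`, the use of planar (x, y) coordinates in metres, and treating a distance of exactly d as "within" are all assumptions.

```python
from math import dist


def proximity_count(points, d):
    """Number of unordered pairs of trait-bearing events at most d apart.

    points: list of (x, y) coordinates (assumed planar, in metres) of the
    events that show the trait. Working through the list one event at a
    time and only looking "forward" means each pair is counted once:
    A-B adds one to the total, B-A is never examined.
    """
    count = 0
    for i, a in enumerate(points):
        for b in points[i + 1:]:
            if dist(a, b) <= d:  # assumption: "within d" includes exactly d
                count += 1
    return count


# Four events all within 2 m of one another: 4 * 3 / 2 = 6 pairs.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(proximity_count(square, 2.0))               # 6
# A distant fifth event adds no pairs.
print(proximity_count(square + [(50, 50)], 2.0))  # 6
```

Only looking forward in the list is what enforces the "A to B but not B to A" rule; no bookkeeping of already-counted pairs is needed.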

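This is defined mathematically as follows. Given a set $S$ of $n$ events $s_1, \ldots, s_n$ displaying the trait,

$$PC(d) = \frac{1}{2}\sum_{i=1}^{n} \text{no.}\left[S \in C(s_i, d)\right]$$

where $C(s_i, d)$ is a circle of radius $d$ centred at event $s_i$, and $\text{no.}\left[S \in C(s_i, d)\right]$ is the number of events in $S$, other than $s_i$ itself, that lie within radius $d$ of $s_i$.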
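Summing the neighbour counts over every event reaches each close pair twice, once from each end, which is why the sum is halved. Equivalently, PC(d) is a direct count over unordered pairs i < j. In the block below the indicator $\mathbf{1}[\cdot]$ (1 if the condition holds, 0 otherwise) is notation introduced here, and Euclidean distance is assumed; the second line works the four-event case from the text.

```latex
\begin{aligned}
PC(d) &= \frac{1}{2}\sum_{i=1}^{n} \text{no.}\left[S \in C(s_i, d)\right]
       = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \mathbf{1}\left[\lVert s_i - s_j \rVert \le d\right] \\
\text{four events, all within } d: \quad PC(d) &= \tfrac{1}{2}(3 + 3 + 3 + 3) = 6 = \binom{4}{2}
\end{aligned}
```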

As defined, $PC(d)$ can be evaluated at any distance $d$, but in practice it is usually calculated for only a small number of values of $d$ (typically 5 to 10).
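Because the count is usually wanted at several radii, one convenient approach is to compute every pairwise distance once and then count how many fall within each radius. This is a hypothetical Python sketch, not necessarily how the original R routine works; the function name `proximity_counts`, the event coordinates, and the radii are illustrative.

```python
from itertools import combinations
from math import dist


def proximity_counts(points, radii):
    """Proximity Count at each radius, computing pairwise distances once.

    points: (x, y) coordinates of the events with the trait (assumed metres).
    radii:  distances at which to evaluate the count.
    """
    pair_distances = [dist(a, b) for a, b in combinations(points, 2)]
    return {d: sum(1 for r in pair_distances if r <= d) for d in radii}


# Hypothetical trait-bearing events (coordinates in metres).
events = [(0, 0), (2, 0), (0, 4), (6, 0), (9, 9)]
print(proximity_counts(events, [3, 5, 7, 10]))  # {3: 1, 5: 4, 7: 5, 10: 7}
```

Since a pair within one radius is also within any larger radius, the counts can only stay the same or rise as d increases.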

Significance is assessed against an assumption of random labeling, using a Monte Carlo routine applied to the set of events that can be scored for either the presence or absence of the trait. Events whose condition makes it impossible to determine the presence or absence of the specific trait are excluded. In each iteration the routine randomly selects, without replacement, as many events as there are events displaying the discrete trait, and calculates the Proximity Count for that selection; this is repeated a specified number of times. The number of randomized counts greater than the actual count is divided by the number of randomizations to obtain the probability (p value) of the actual count. The procedure is therefore a one-tailed test.
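The random-labeling test described above can be sketched as follows. This is a minimal Python illustration, not the author's R code; the names `random_labeling_p` and `proximity_count`, the default of 999 randomizations, and the `seed` parameter are hypothetical. Following the text, p is the share of randomizations whose count is strictly greater than the observed count.

```python
import random
from math import dist


def proximity_count(points, d):
    """Unordered pairs of points no more than d apart (each pair counted once)."""
    return sum(
        1
        for i, a in enumerate(points)
        for b in points[i + 1:]
        if dist(a, b) <= d
    )


def random_labeling_p(coords, has_trait, d, n_sims=999, seed=None):
    """One-tailed random-labeling test of the Proximity Count at radius d.

    coords:    (x, y) of every event that could be scored for the trait;
               events that cannot be scored are left out before calling.
    has_trait: parallel list of booleans, True where the trait is present.
    Returns (observed count, p), where p is the proportion of randomizations
    whose count is strictly greater than the observed count.
    """
    rng = random.Random(seed)
    k = sum(has_trait)  # number of events displaying the trait
    observed = proximity_count([c for c, t in zip(coords, has_trait) if t], d)
    exceed = 0
    for _ in range(n_sims):
        # Relabel: draw k scorable events without replacement.
        sample = rng.sample(coords, k)
        if proximity_count(sample, d) > observed:
            exceed += 1
    return observed, exceed / n_sims


# Hypothetical example: a 10 x 10 grid of scorable graves 2 m apart, with
# the trait present in five graves bunched in one corner.
coords = [(2 * x, 2 * y) for x in range(10) for y in range(10)]
trait_at = {(0, 0), (2, 0), (0, 2), (2, 2), (4, 0)}
has_trait = [c in trait_at for c in coords]
observed, p = random_labeling_p(coords, has_trait, d=3, seed=1)
print(observed, p)  # 8 0.0
```

No five points on this grid can form more than 8 pairs within 3 m, so no randomization beats the observed count and p is 0.0. Many Monte Carlo tests instead report (number of randomized counts at least as large as the observed count + 1) / (number of randomizations + 1), which treats ties as evidence against clustering and never returns exactly zero; the sketch keeps the definition given in the text.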

In using the original single-run statistic it became evident that the distances yielding significant results can vary from trait to trait, with significant clustering occurring at different distances. For example, in one case there was significant clustering at 3 m and in another at 7 m. Consequently, the R routine was modified to perform several runs at different user-defined distances, with statistical significance calculated at each distance. In the Kellis 2 case these were set to 3, 5, 7 and 10 m. This is a global statistic and, with the addition of runs at multiple distances, could be described as a function similar to the F, G and K functions as defined in Bailey and Gatrell (1995).
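The multi-distance version can be sketched the same way. In this hypothetical Python sketch (`proximity_count_function` and `counts_at` are illustrative names), each random labeling is scored at every radius, so all distances share one set of randomizations. The text does not say whether the R routine reran the randomizations separately for each distance, so this sharing is an assumption. The default radii are the Kellis 2 distances.

```python
import random
from itertools import combinations
from math import dist


def counts_at(points, radii):
    """Proximity Count at each radius in radii, from one set of pair distances."""
    pair_d = [dist(a, b) for a, b in combinations(points, 2)]
    return [sum(1 for r in pair_d if r <= d) for d in radii]


def proximity_count_function(coords, has_trait, radii=(3, 5, 7, 10),
                             n_sims=999, seed=None):
    """Observed count and one-tailed p value at each user-defined radius.

    Each random labeling is evaluated at every radius, so all distances
    share the same randomizations (an assumption of this sketch).
    """
    rng = random.Random(seed)
    k = sum(has_trait)
    observed = counts_at([c for c, t in zip(coords, has_trait) if t], radii)
    exceed = [0] * len(radii)
    for _ in range(n_sims):
        simulated = counts_at(rng.sample(coords, k), radii)
        for j, (sim, obs) in enumerate(zip(simulated, observed)):
            if sim > obs:  # strictly greater, as in the one-tailed test
                exceed[j] += 1
    return {d: (obs, n / n_sims) for d, obs, n in zip(radii, observed, exceed)}


# Hypothetical example: a 10 x 10 grid of scorable graves 2 m apart, with
# the trait present in five graves bunched in one corner.
coords = [(2 * x, 2 * y) for x in range(10) for y in range(10)]
trait_at = {(0, 0), (2, 0), (0, 2), (2, 2), (4, 0)}
has_trait = [c in trait_at for c in coords]
for d, (count, p) in proximity_count_function(coords, has_trait, seed=1).items():
    print(f"{d} m: count = {count}, p = {p:.3f}")
```

Sharing randomizations keeps the cost to one relabeling per iteration and makes the p values at different distances directly comparable, at the price of their not being independent of one another.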

While the Proximity Count was originally created to detect clustering, in practice some data sets produced counts smaller than the vast bulk of the randomizations, giving p values such as .95. In other words, only 5% of the randomizations produced counts lower than the actual count (e.g., Frontal Grooves at 7 m; see Figure B.7). This result lies in the opposite tail of the distribution and implies that the trait is more evenly spaced than would be expected at that distance. Note that in this example Nearest Neighbour-Random Labeling also shows even spacing, but not with significance. At present, since the statistic is simply a count, no attempt has been made to distinguish between the two tails of the distribution and build that into the statistic. Doing so would require reporting something like "a count of 18 at 5 m, clustering with p = .05". For now, high p values can be taken to indicate even spacing.
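As the text notes, the statistic does not currently separate the two tails. Purely as a hypothetical illustration of what such a report might look like, the Python sketch below reads one observed count against both tails of its randomized counts; the function name `tail_summary`, the `alpha` threshold, and the wording of the labels are all assumptions, not part of the original routine.

```python
def tail_summary(observed, simulated, alpha=0.05):
    """Read an observed Proximity Count against both tails of its randomizations.

    simulated: Proximity Counts from the random-labeling runs at one distance.
    Returns a (reading, p) pair; the reading labels are illustrative only.
    """
    n = len(simulated)
    p_upper = sum(s > observed for s in simulated) / n  # the one-tailed p of the text
    p_lower = sum(s < observed for s in simulated) / n  # share of runs below the count
    if p_upper <= alpha:
        return "clustering", p_upper
    if p_lower <= alpha:
        return "even spacing", p_lower
    return "no significant departure", p_upper


# Hypothetical randomized counts 0..19: an observed count of 1 is beaten by
# 18 of them (p_upper = 0.9) and exceeds only one (p_lower = 0.05).
print(tail_summary(1, list(range(20))))  # ('even spacing', 0.05)
```

When no randomized count ties the observed one, p_upper and p_lower sum to 1, so a one-tailed p of .95 corresponds to a lower-tail value of .05.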

The statistic was originally defined in the context of the Kellis 2 cemetery analysis in Chapter 5. However, as currently coded, it is also applicable to other cases where we are examining the relative distributions of two types of events, such as those in the Chapter 4 Davidson site analyses.

The routine was implemented in the R statistical programming language.