First Order Analysis of Clusters - The Use of Point Pattern Analysis in Archaeology: Some Metho

Within the traditional field of point pattern analysis as defined by Bailey and Gatrell (1995) and O’Sullivan and Unwin (2003), there are a number of techniques for mathematically characterizing clusters, such as Nearest Neighbour and the F, G, and K functions. In archaeology I have never found any of these functions particularly useful, with the possible exception of the K function. This lack of utility may stem from an unstated assumption that the study area being characterized is a subset of a much larger, frequently ecological, niche. For example, many of the discussions in Baddeley’s (2015) Spatstat library of R routines concern the distribution of plants within a subset of a much larger ecological niche. The difference in archaeology is that we are almost invariably dealing with something that is most definitely a cluster on the landscape, and we are frequently looking at the entire cluster, so trying to prove it is a cluster merely demonstrates the obvious mathematically. What is of significant interest in archaeology, however, is whether there is structure within the overall cluster that might provide insight into the habits of the people who occupied it. For example, the Bull Brook Paleoindian site (ca. 11,000 BC) is composed of a series of discrete clusters, each of which is interpreted as a smaller-scale individual social unit within a larger aggregation site (Robinson et al. 2009). In such cases, a simplistic characterization of the overall site cluster does not shed any light on the really important issues. The analysis of the coarse-grained flake tool-making debris distribution at Davidson included in Chapter 4 is a good example of the analysis of structuring within the overall site cluster. In this case, the use of Kintigh’s Pure Locational Clustering and the ArcGIS functions of Kernel Density and a Hot Spot Analysis of a quadrat summary proved much more useful than global statistics such as the F, G and K functions.

In conducting the analysis of Davidson it became clear that some of the decisions required seemed, at first, somewhat arbitrary. Some of the options tended to give an interpretation of the data that fit my expectations, so these results were preferred.

But why is this particular result better than that one? Is it because it fits my preconceptions? Obviously, that is not a good presumption to begin with. This problem led to a definition which I have not seen articulated elsewhere and which I call resolution focus. One standard point pattern analytical technique is called density estimation; one implementation of this in ArcGIS is called Kernel Density. This function calculates the relative density at each location on the map from the points that fall within a circle around that location, with points closer to the centre of the circle weighted more heavily than points further away, and constructs density contours from the result. The main parameter entered is the density radius. Different values here tend to give what initially appear to be very different results; for example, see Figures 2-1, 2-2 and 2-3.
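The effect of the radius can be illustrated with a minimal sketch. It assumes a quartic (biweight) kernel, which is one common choice for this kind of tool; the function name `quartic_kernel_density` and the artifact coordinates are hypothetical and are not taken from the Davidson data.

```python
import math

def quartic_kernel_density(points, x, y, radius):
    """Density at (x, y): sum of quartic kernel weights of all points within radius."""
    r2 = radius ** 2
    total = 0.0
    for px, py in points:
        d2 = (px - x) ** 2 + (py - y) ** 2
        if d2 < r2:
            # Weight is highest at the centre and falls smoothly to zero at the radius.
            total += (3.0 / (math.pi * r2)) * (1.0 - d2 / r2) ** 2
    return total

# Hypothetical artifact coordinates in metres: two tight groups about 10 m apart.
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]

for radius in (3, 6, 25):
    at_group = quartic_kernel_density(points, 0.5, 0.5, radius)
    between = quartic_kernel_density(points, 5.5, 0.5, radius)
    print(f"radius {radius:>2} m: at group {at_group:.4f}, between groups {between:.4f}")
```

With a small radius the location between the two groups registers no density at all, while a large radius blends both groups into a single high-density area, which is the kind of contrast that separates the most generalized and most localized maps.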

The obvious question presenting itself here is which radius is “right”? In the density estimation literature, this concept has been thoroughly discussed under the name “bandwidth selection” (Bailey and Gatrell 1995; O’Sullivan and Unwin 2003). These texts contain diagrams not unlike these three figures. There are also general rules for the selection of bandwidth, which typically take the form “not too generalized (like Figure 2-1) and not too localized (like Figure 2-3)”. This conclusion is true in general. For instance, the density map with a bandwidth of 50 m is not particularly useful, especially at the south end where 50 m takes in a lot of offsite area. The result is that a lower density is reported than if the area considered were restricted to the site boundaries and edge effects were controlled. Similarly, a bandwidth of one metre would produce a map that would simply put a one metre circle around each artifact, with a few circles showing two or three adjacent artifacts. This would not show anything that could not be seen in a simple plot of artifacts. However, I would argue that in the middle ranges of bandwidth, different features might be better isolated at different bandwidths. Such is the case, as will be discussed, with the Davidson study.
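The edge effect described above for the 50 m bandwidth can be sketched numerically. The example below is only an illustration under stated assumptions: a hypothetical rectangular site of 20 m by 100 m, evenly spaced hypothetical artifacts, a simple count-per-area density rather than a weighted kernel, and a grid approximation of the area of the search circle that lies inside the site. None of the names or values come from the Davidson analysis.

```python
import math

def circle_area_inside_site(cx, cy, radius, site, step=0.5):
    """Approximate the part of a circle's area inside a rectangular site by counting grid cells."""
    xmin, ymin, xmax, ymax = site
    nx = int(round((xmax - xmin) / step))
    ny = int(round((ymax - ymin) / step))
    inside = 0
    for i in range(nx):
        for j in range(ny):
            x = xmin + (i + 0.5) * step  # cell centre
            y = ymin + (j + 0.5) * step
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                inside += 1
    return inside * step * step

def densities(points, cx, cy, radius, site):
    """Return (naive, edge-corrected) artifacts per square metre around (cx, cy)."""
    count = sum(1 for px, py in points
                if (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2)
    naive = count / (math.pi * radius ** 2)  # divides by the full circle
    corrected = count / circle_area_inside_site(cx, cy, radius, site)  # on-site area only
    return naive, corrected

# Hypothetical site (xmin, ymin, xmax, ymax) in metres, one artifact every 2 m.
site = (0, 0, 20, 100)
points = [(x, y) for x in range(1, 20, 2) for y in range(1, 100, 2)]

naive_south, corrected_south = densities(points, 10, 5, 50, site)
print(f"south end, 50 m radius: naive {naive_south:.3f}, corrected {corrected_south:.3f}")
```

Because most of the 50 m circle lies outside the site, the naive figure is several times lower than the corrected one, even though the artifacts are spread evenly at 0.25 per square metre.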

If density estimation were all that mattered, there would be no need to introduce the new term resolution focus. However, in the Davidson case study two other techniques were used: Kintigh’s (2015) Pure Locational Clustering and high/low clustering (Getis-Ord Gi*) of a quadrat summary. In both these cases, a similar concept applies but is not articulated. Within Pure Locational Clustering, you specify the number of clusters to be isolated from your point pattern. As with bandwidth specification in density estimation, a small number of clusters gives results consistent with a large bandwidth, and a request for many clusters gives results similar to a narrow bandwidth. Thus, the request for the number of clusters in TFQA actually functions as a resolution focus variable. In fact, as was found in the Davidson case study through a process of trial and error, density estimation and Pure Locational Clustering produce similar results when the resolution focus matches.
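How the requested number of clusters acts as a resolution setting can be sketched with a plain k-means routine, on the assumption that Pure Locational Clustering behaves like k-means applied to artifact coordinates. This is not Kintigh’s program: the farthest-point starting rule, the function name `kmeans`, and the coordinates are all hypothetical simplifications chosen to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=50):
    """Basic k-means on (x, y) tuples; returns (centres, sum of squared errors)."""
    # Deterministic start (an assumption of this sketch): first point, then
    # repeatedly the point farthest from the centres chosen so far.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[nearest].append(p)
        new_centres = []
        for c, members in enumerate(clusters):
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centres.append((mean_x, mean_y))
            else:
                new_centres.append(centres[c])  # keep the centre of an empty cluster
        if new_centres == centres:
            break
        centres = new_centres
    sse = sum(min(math.dist(p, c) ** 2 for c in centres) for p in points)
    return centres, sse

# Hypothetical coordinates in metres: groups A and B lie 6 m apart, group C is 30 m away.
group_a = [(0, 0), (1, 0), (0, 1), (1, 1)]
group_b = [(6, 0), (7, 0), (6, 1), (7, 1)]
group_c = [(30, 0), (31, 0), (30, 1), (31, 1)]
points = group_a + group_b + group_c

for k in (1, 2, 3):
    centres, sse = kmeans(points, k)
    print(k, [(round(x, 1), round(y, 1)) for x, y in centres], round(sse, 1))
```

Requesting two clusters merges the neighbouring groups A and B into one unit, much as a wide bandwidth would, while requesting three separates them.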

The other technique with similar considerations is the application of high/low clustering to a quadrat summary of the point pattern. Here the variable that influences the resolution focus is the size of the quadrats. Larger quadrats give results that look similar to Figure 2-1 and smaller quadrats yield results that are consistent with Figure 2-3.
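The role of quadrat size can be sketched as follows. The example bins points into square quadrats and computes a Getis-Ord Gi* z-score for each quadrat using the standard published form of the statistic. It assumes binary weights over each quadrat and its immediately adjacent quadrats (including diagonals); ArcGIS offers other ways of defining the neighbourhood, so this is only one possible configuration. The function names and coordinates are hypothetical.

```python
import math

def quadrat_counts(points, cell, nx, ny):
    """Count points per square quadrat of side `cell` on an nx-by-ny grid starting at (0, 0)."""
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        col = min(int(x // cell), nx - 1)
        row = min(int(y // cell), ny - 1)
        counts[row][col] += 1
    return counts

def getis_ord_gi_star(counts):
    """Gi* z-score per quadrat; assumes counts are not all equal and the grid is larger than 3x3."""
    ny, nx = len(counts), len(counts[0])
    values = [v for row in counts for v in row]
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z = [[0.0] * nx for _ in range(ny)]
    for r in range(ny):
        for c in range(nx):
            # Neighbourhood: the quadrat itself plus all adjacent quadrats, each with weight 1.
            neigh = [counts[rr][cc]
                     for rr in range(max(0, r - 1), min(ny, r + 2))
                     for cc in range(max(0, c - 1), min(nx, c + 2))]
            w = len(neigh)  # with binary weights, sum(w) equals sum(w^2)
            numerator = sum(neigh) - mean * w
            denominator = s * math.sqrt((n * w - w * w) / (n - 1))
            z[r][c] = numerator / denominator
    return z

# Hypothetical 12 m x 12 m area: one dense patch near the origin plus scattered finds.
points = [(1.2, 1.3), (1.6, 1.8), (2.1, 1.4), (2.4, 2.2), (1.8, 2.6), (2.7, 1.1),
          (8.5, 3.5), (4.5, 9.5), (10.5, 10.5), (6.5, 6.5)]

coarse = getis_ord_gi_star(quadrat_counts(points, 3, 4, 4))   # 3 m quadrats
fine = getis_ord_gi_star(quadrat_counts(points, 1, 12, 12))   # 1 m quadrats
print(round(coarse[0][0], 2), round(max(max(row) for row in fine), 2))
```

Running the same points through both quadrat sizes and mapping the two sets of z-scores gives a direct way to compare a generalized and a localized view of the same pattern.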

When it comes to selecting the resolution focus, as was required in the Davidson case study, I do not yet have any hard and fast rules for making the decision. The best approach seems to be trial and error and comparison of the results of all three methods.

Figure 2-3: KD Radius at 6 m

One final comment here relates to my initial question: which one is right? Barring the obvious extremes, I do not think there is a right answer. In reality, multiple scales of analysis may show you different things about the site, as will be shown in the Davidson case study. While it would be preferable to have a mathematical technique that shows exactly what is happening on a site, the reality is that it is ultimately the interpretation of the archaeologist doing the analysis that determines the “right” answer.

In document The Use of Point Pattern Analysis in Archaeology: Some Methods and Applications (Page 49-53)