Since the original data used to produce the site map were not available, the existing paper map was digitized into ArcGIS, preserving the original scale. While true georeferencing could not be accomplished, the relative positions of the graves and the distances between them are accurate, or at least as accurate as the scale of the original map allows. The application of spatial statistics is therefore possible, and the distances discussed are actual distances rather than relative distances. The digitizing process uncovered several errors in the original map, all involving the labeling of grave numbers. There are six instances of the same grave number appearing twice on the map; these were corrected. In the GIS, each grave is shown as a polygon, with its excavation grave number included in the attribute table. The data on the discrete traits present were available in an Excel file, with each row keyed by grave number and the columns recording the presence or absence of each trait. These data were joined to the GIS map using the GIS “Join” function, so that each grave has all of the discrete traits listed in its attribute table. This procedure facilitated the generation of maps showing the distribution of each trait. Since the initial development of the statistics did not include integration into ArcGIS, it was necessary to export the actual XY coordinates from ArcGIS so that they could be used in the statistical applications. A file of grave numbers and locations was obtained by creating a map layer that identified the centre of each grave as a point and then applying a function called XYtoASCII to this layer. The result was a text file containing the grave number and (X,Y) coordinates of each grave. The coordinates were then incorporated into the Excel spreadsheet with the discrete traits for the statistical analysis.
In considering the application of spatial statistics to the data shown in Figures 5.2 and 5.3, there are additional features that need to be addressed. First, the cemetery has not been completely excavated. Hence, the impact of edge effects must be considered, especially along the east, west and north edges of the excavated areas. Edge effects arise when a grave along the boundary of the excavated area may have more in common with graves outside the excavated area than with nearby graves inside it. Second, the four excavated tomb structures lie along the periphery of the excavated area. If, as the kinship model suggests, burials radiate out from a central family tomb, we are consequently capturing only a portion of each family burial area. Finally, in the northeast portion of the excavated area, which represents the most recent excavations, unexcavated graves are more prevalent than in other areas of the cemetery. These “data holes” may distort the spatial relationships of individuals buried in that area of the site.
The sample analyzed here has been restricted to adult males and females. As noted, the excellent preservation at K2 ensures that sex determination is virtually 100%. There are cases where some traits are present but sex cannot be determined, and these have been excluded. Also excluded is one burial, Grave 453, in the extreme southwest, as it is a spatial outlier. In addition, several burials appear in the data but not on the map, making determination of their location impossible. After these exclusions, the sample analyzed comprises 177 individuals: 107 females and 70 males.
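As an illustration only, the assembly of the analysis sample described above (grave coordinates keyed to trait records by grave number, followed by the exclusions) could be sketched in Python as follows. The column names, the inline sample rows and the function `build_sample` are hypothetical stand-ins, not the actual files or tools; the steps described in the text were carried out with ArcGIS and Excel.

```python
import csv
import io

# Hypothetical stand-in for the text file exported from the GIS:
# one row per grave centre, with grave number and X, Y coordinates.
COORDS_TXT = """grave,x,y
101,12.5,40.0
102,14.0,41.5
453,-60.0,-75.0
104,15.5,39.0
"""

# Hypothetical stand-in for the trait spreadsheet saved as CSV:
# 1 = trait present, 0 = trait absent; sex is blank when undetermined.
TRAITS_CSV = """grave,sex,trait_a,trait_b
101,F,1,0
102,M,0,0
453,F,1,1
104,,0,1
999,M,1,0
"""


def build_sample(coords_text, traits_text, outliers=("453",)):
    """Join trait rows to coordinates by grave number and apply the exclusions."""
    coords = {}
    for row in csv.DictReader(io.StringIO(coords_text)):
        if row["grave"] in coords:
            # Guards against the kind of duplicate label found on the paper map.
            raise ValueError(f"duplicate grave number: {row['grave']}")
        coords[row["grave"]] = (float(row["x"]), float(row["y"]))

    sample = []
    for row in csv.DictReader(io.StringIO(traits_text)):
        grave = row["grave"]
        if grave not in coords:
            continue  # in the data but not on the map: no location available
        if row["sex"] not in ("F", "M"):
            continue  # sex could not be determined
        if grave in outliers:
            continue  # spatial outlier
        x, y = coords[grave]
        traits = {name: int(value) for name, value in row.items()
                  if name not in ("grave", "sex")}
        sample.append({"grave": grave, "sex": row["sex"], "x": x, "y": y, **traits})
    return sample


sample = build_sample(COORDS_TXT, TRAITS_CSV)
print([record["grave"] for record in sample])
```

Keying both tables on the grave number mirrors the join described in the text, and skipping rows without a matching coordinate is one simple way to handle burials that appear in the data but not on the map.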
The determination of statistical significance involves the distribution of discrete genetic traits within the overall cemetery structure. This is clearly an examination of second-order effects (the genetic traits) within first-order effects (the cemetery structure and grave locations). As discussed in Chapter 2, the use of Complete Spatial Randomness (CSR) to compute statistical significance is not appropriate. For Kellis, the appropriate choice is an assumption of random labeling, in which significance is determined by holding the grave locations constant and randomizing the labels (discrete traits) over the cemetery. This procedure has the further advantage of markedly reducing the impact of potential edge effects.
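A minimal sketch of a random labeling test is given below, purely to make the procedure concrete. The test statistic used here (the mean nearest-neighbour distance among graves carrying a trait) is an assumption chosen for simplicity and is not necessarily the statistic applied in this study; the function names and the gridded example data are likewise hypothetical.

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbour among `points`."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)


def random_labeling_test(coords, labels, n_perm=999, seed=0):
    """One-sided Monte Carlo test for clustering of a trait under random labeling.

    Grave locations (`coords`) stay fixed; the presence/absence labels are
    shuffled over them, so the number of trait carriers never changes.
    """
    if sum(1 for label in labels if label) < 2:
        raise ValueError("need at least two graves with the trait")
    rng = random.Random(seed)
    observed = mean_nn_distance([c for c, l in zip(coords, labels) if l])
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign labels, locations untouched
        simulated = mean_nn_distance([c for c, l in zip(coords, shuffled) if l])
        if simulated <= observed:  # as clustered as, or more than, observed
            as_extreme += 1
    # The observed labeling counts as one of the possible arrangements.
    p_value = (1 + as_extreme) / (1 + n_perm)
    return observed, p_value


# Hypothetical example: 36 graves on a 6 x 6 grid, with the trait present
# only in a compact 2 x 2 block in one corner.
coords = [(float(x), float(y)) for x in range(6) for y in range(6)]
labels = [1 if (x < 2 and y < 2) else 0 for x, y in coords]
observed, p_value = random_labeling_test(coords, labels)
print(observed, p_value)
```

Because every simulated arrangement uses the same set of grave locations, gaps in the excavated area affect the observed and simulated values alike, which is one way to see why random labeling is less sensitive to edge effects than a comparison against CSR.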
The following analysis has three major sections: first, an analysis of the male/female distribution; second, an individual analysis of each of the 38 discrete genetic traits; and finally, a simultaneous analysis of a number of traits in order to identify areas of the cemetery where individuals with a similar genetic composition were located. This sequence was selected because the results of the first step could affect the second and third, and the results of the second could affect the third. Of course, reality is always more complex and, as a result, the analysis required several iterations. For example, the results of step three in fact affect our interpretation of step one and of some traits in step two. This complicates the documentation of the results and in some cases necessitates reference to analytical results that appear later in the text. This is, however, unavoidable.