sex estimation in white South Africans
5.4.1 Three-dimensional interlandmark distance validation
This study derived discriminant functions from the 3-D ILDs and compared them with the Steyn and İşcan (1) analysis of white South Africans. The purpose was twofold: 1) to validate the use of 3-D derived ILDs for sex estimation in white South Africans; and 2) to determine whether adapting traditional cranial discriminant functions for 3-D ILDs would negatively impact sex estimation accuracies. The functions derived in this investigation showed an average decline in accuracy of only 3.4% compared with those of Steyn and İşcan (1). The “Cranium” function derived from 3-D data had the largest drop in accuracy: a decline of 7.5% was noted in males, and 5.5% (to 80.2%) on average. This was expected, as the measure from basion to bregma was omitted in the current investigation due to the exclusion of the cranial vault. The “Face” function was most similar between the two investigations, with only a 1.8% drop in accuracy in the 3-D derived function. This was somewhat surprising, as two of the three ILDs used to create the current function had to be modified slightly to suit the 3-D landmarks available. These modifications were necessary to approximate the landmarks used in traditional cranial measurements.
These results demonstrate that the landmarks used in this study to approximate those in Steyn and İşcan’s study are a good match. The “Vault” function, however, showed a decline in sexing accuracy of 3%, even though the ILDs used were identical between the two studies. Its ability to assign sex in females was nearly identical between the studies (a 0.2% increase in accuracy), which suggests that the female samples in the two studies were more similar than the male samples. These differences were not statistically significant when tested using a chi-square test.
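A comparison of this kind can be sketched as a Pearson chi-square test of independence on a 2x2 table of correct and incorrect classifications per study. This is only an illustrative sketch: the counts of correct classifications below are hypothetical (only the sample sizes of 227 and 91 come from the text), the function name `chi_square_2x2` is introduced here, and the source does not state whether a continuity correction was applied, so none is used.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction).

    table = [[a, b], [c, d]]: rows are studies, columns are correct / incorrect counts.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts: 182 of 227 correct in one study, 78 of 91 correct in the other.
stat, p = chi_square_2x2([[182, 45], [78, 13]])
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

With counts like these, a p-value above the chosen significance level (commonly 0.05) would be read as no detectable difference in accuracy between the two studies.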
Finally, the sex bias between the studies (defined as the male percent accuracy minus the female percent accuracy for any given function) was compared. The average bias in the current study was 2%, compared with 5% in Steyn and İşcan’s analysis (1). The most notable bias was observed in Steyn and İşcan’s (1) “Bizygomatic” function, in which females were classified with approximately 10% greater accuracy than males (85.1% versus 75%). The same function produced a much lower bias of 2% (in favour of females) in the current study. Once again, the differences were not statistically significant. Possible reasons for the differences include the larger sample size in the current study (227 individuals versus 91 individuals), variations in the age of the specimens between the samples (♂ 63 years, ♀ 65 years in the current investigation versus ♂ 65 years, ♀ 67 years in Steyn and İşcan (1)), and the asymmetry and tooth loss corrections applied in the current study. Any of these, but especially the asymmetry and tooth loss corrections, could have resulted in a larger average bizygomatic breadth in females, and hence fewer correct classifications. Although counterintuitive, this represents a positive outcome of tooth loss correction, because the correction may have mitigated the effect of bone resorption, which may have been the cause of the smaller measures in females.
Furthermore, a low sex bias is favourable, especially in bioarchaeology, as pointed out by Walker (42), because it limits erroneous conclusions about the distribution of the sexes in burial or disaster sites.
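The sex bias defined above (male percent accuracy minus female percent accuracy) is simple to compute. The sketch below applies it to the “Bizygomatic” accuracies quoted in the text. The per-function accuracies in the second part are hypothetical, and averaging the absolute biases is an assumption, since the text does not state how the average bias was calculated.

```python
def sex_bias(male_accuracy, female_accuracy):
    """Sex bias as defined in the text: male minus female percent accuracy."""
    return male_accuracy - female_accuracy

# "Bizygomatic" accuracies reported in the text for Steyn and İşcan (1).
reported = sex_bias(male_accuracy=75.0, female_accuracy=85.1)
print(f"Bizygomatic bias: {reported:+.1f} percentage points")  # negative favours females

# Hypothetical (male, female) accuracies for three functions, for illustration only.
hypothetical = {"Cranium": (81.0, 79.5), "Vault": (78.0, 80.5), "Face": (80.0, 77.0)}
biases = {name: sex_bias(m, f) for name, (m, f) in hypothetical.items()}

# Assumption: the average bias is taken over absolute values so that
# biases in opposite directions do not cancel each other out.
mean_abs_bias = sum(abs(b) for b in biases.values()) / len(biases)
print(f"Mean absolute bias: {mean_abs_bias:.2f} percentage points")
```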
5.4.2 Traditional interlandmark distances and most accurate 3-D derived interlandmark distances
A measure of the distance between each pair of the 45 homologous fixed landmarks produced a total of 990 unique, non-repetitive ILDs. Both LDA and LR were then applied to each of these measures to determine the difference in sex estimation accuracy between them. LDA is the statistical method most commonly used by anthropologists to assess population and sex differences (6). Assumptions of the technique include, amongst others, that all variables be normally distributed, have equal covariance matrices, and be uncorrelated with one another, and that there be fewer predictor variables than the sample size (71). When large sets of variables are assessed, as in the current investigation, testing each variable for normality becomes impractical, and it therefore becomes important to consider alternative means of assessing independent variables. LR offers an ideal alternative as it is less constrained by assumptions and can be applied to nonparametric data, whereas non-normality negatively impacts the fit of an LDA model and its predictive capability, even in large samples (195). When using LDA, violation of the assumption of normality can attribute false significance to variables that are then erroneously included in discriminant functions, even with infinitely large samples. LDA is also not appropriate when the independent variables are binary, as in qualitative data (195).
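The figure of 990 ILDs quoted at the start of this subsection is the number of unordered pairs among 45 landmarks, 45 × 44 / 2. A minimal sketch of the enumeration follows. The coordinates are random placeholders rather than data from the study, and the function name `interlandmark_distances` is introduced here for illustration.

```python
import itertools
import math
import random

def interlandmark_distances(landmarks):
    """Return {(i, j): Euclidean distance} for every unordered pair of 3-D landmarks."""
    return {
        (i, j): math.dist(landmarks[i], landmarks[j])
        for i, j in itertools.combinations(range(len(landmarks)), 2)
    }

# Placeholder coordinates (random, for illustration only) for 45 landmarks.
rng = random.Random(0)
landmarks = [
    (rng.uniform(0, 200), rng.uniform(0, 200), rng.uniform(0, 200))
    for _ in range(45)
]

ilds = interlandmark_distances(landmarks)
print(len(ilds))  # 45 * 44 / 2 = 990 unique pairs
```

Using `itertools.combinations` ensures that each pair is counted once, so the distance from landmark A to B is not repeated as B to A.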
In this investigation, each individual ILD’s sex assignment potential was assessed independently and in the context of cranial subsets. Very close agreement between the methods was found in all cases, with a maximum difference of 1.76% seen for isolated measures. Concordance between the methods was highest in the basicranium, with no difference on average between the methods. The average difference was less than 0.1% for the basipalate, 0.16% for the global cranial measures, and 0.2% for the nasomaxilla. The largest average difference between LDA and LR was found in the zygomatics and orbits, both at 0.4%.
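The per-ILD comparison described above can be illustrated for a single measurement. The sketch below is not the procedure used in this investigation: the data are synthetic, the group means, standard deviations and sample sizes are hypothetical, LDA is reduced to its one-variable, equal-prior form (assignment to the nearer group mean), and LR is fitted with a hand-written Newton-Raphson routine. Accuracies are resubstitution accuracies without cross-validation.

```python
import math
import random

def lda_predict(values, mean_male, mean_female):
    """Equal-prior, pooled-variance LDA on one variable: assign to the nearer group mean."""
    return [1 if abs(v - mean_male) < abs(v - mean_female) else 0 for v in values]

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson (x should be standardised)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            z = max(-30.0, min(30.0, b0 + b1 * xi))  # clamp to avoid overflow in exp
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # entries of X'WX (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:               # stop if the Newton step is ill-conditioned
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def accuracy(predicted, actual):
    """Percentage of matching labels."""
    return 100.0 * sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Synthetic ILD values in mm (hypothetical parameters); 1 = male, 0 = female.
rng = random.Random(1)
males = [rng.gauss(130.0, 5.0) for _ in range(120)]
females = [rng.gauss(122.0, 5.0) for _ in range(107)]
x = males + females
y = [1] * len(males) + [0] * len(females)

lda_pred = lda_predict(x, sum(males) / len(males), sum(females) / len(females))

# Standardise the predictor before fitting LR so the Newton steps are well scaled.
mean_x = sum(x) / len(x)
sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / (len(x) - 1))
z_scores = [(v - mean_x) / sd_x for v in x]
b0, b1 = fit_logistic(z_scores, y)
lr_pred = [1 if b0 + b1 * v > 0 else 0 for v in z_scores]

lda_acc = accuracy(lda_pred, y)
lr_acc = accuracy(lr_pred, y)
print(f"LDA {lda_acc:.2f}%  LR {lr_acc:.2f}%  difference {abs(lda_acc - lr_acc):.2f}%")
```

Repeating this for every ILD and averaging the absolute differences within each cranial subset would give figures comparable in form to those reported above.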
LR was more accurate for all subsections except the nasomaxilla, suggesting that at least some of the variables in the current investigation may have a non-normal distribution. These differences cannot be attributed to unequal covariance matrices because Box’s M test demonstrated that all covariances were homogeneous during the discriminant analyses. The use of LR in deriving skeletal standards is becoming increasingly commonplace due to its robustness, its comparatively few underlying assumptions and the ease with which it can be applied to dichotomous variables (167,196–199). Lei and Koehly (48) conducted a comprehensive analysis of the difference in classification accuracy between LDA and LR with regard to a number of data parameters, including 1) the degree of group separation; 2) covariance equality; 3) sample size; and 4) cut score. They found a complicated interplay between these factors and prediction accuracy, and concluded that the research question being addressed may be more important than the data distribution when deciding between LDA and LR (48). Pohar and colleagues (173) demonstrated that LDA performs better when the data are normally distributed, but that the predictive power of the two methods becomes more similar as sample size increases beyond 50 specimens, because the data approach normality. LDA also performed better with more than three categorised variables, whereas LR had greater predictive power with fewer than three. Both the population group and sex to which the method is applied may also affect its accuracy, as demonstrated by Shah et al. (200). They found that LR displayed greater predictive accuracy in males, whereas LDA fared better in females, although the accuracy of LR was greater overall. Additionally, Dong et al. (62) found that while LDA had greater predictive accuracy, LR had a lower sex bias when applied to Chinese mandibles. Overall, these studies suggest that LDA performs better when the dataset is normally distributed and includes a relatively low number of samples, whereas LR is more robust because it is subject to fewer assumptions.
These findings are corroborated by our results, although our results offer only a cursory look at these multivariate statistical techniques. Nevertheless, they suggest that, in the current investigation, the difference between the two methods is sufficiently small to warrant the use of LDA for the derivation of discriminant functions. The large sample size used in the current investigation likely explains the equivalence demonstrated between LR and LDA.