
3.7 Geospatial and statistical techniques

3.7.2 Geodemographics

Geodemographics involves the analysis of behavioural and socio-economic data about individuals in the context of a particular location and local community (Harris, 2003). The term developed from a blend of two concepts, “demography” and “geography”. As Birkin and Clarke (1998) noted, “Demography is the study of population types and their dynamics therefore geodemographics may be labelled as the study of population types and their dynamics as they vary by geographical area” (p.88). Essentially, geodemographics is the analysis of people based on the characteristics of where they live (Sleight, 1997). This method was adopted for the Phase 2 study. Area classification involves identifying similarities and dissimilarities between areas by grouping together areas with similar patterns (Webber and Craig, 1978).

Area classification has its origin in geodemographics.

It is also based on the belief that individuals with similar characteristics usually reside, visit and shop in similar areas and have similar behavioural tendencies. Hence, identifying spatial patterns within a locality is a crucial step towards understanding the spatial processes and resulting spatial structures within that locality (Harris et al., 2005). Although linkages exist between people and places, these linkages are complex and multi-faceted.

Therefore, the social, demographic and economic characteristics of a place echo the ideals, preferences and consumer lifestyles of both past and present inhabitants, as well as government regulations.

According to Harris et al. (201, p. 15), “interrelationship suggests that measures of physical, social and economic properties of settlements can yield useful information about the characteristics, preferences and lifestyle choices of the populations within these settlements, because people and places are dependent on each other”. Therefore, geodemographics assumes not only that individuals in close proximity relate to each other, but also that they tend to belong to the same neighbourhood class. This does not mean that people living in the same area are identical, but that they share similar characteristics (Harris et al., 2005).

The origin of geodemographics can be traced back to Charles Booth (Rothman, 1989), as evidenced in his book published in 1889 and entitled ‘the Life and Labour of People of London’, where he grouped all houses in London into seven classes. His work on poverty is archived at the Charles Booth Online Archive at the London School of Economics (LSE, 2005). Modern geodemographics, on the other hand, has its roots in the work of Webber and Craig (1976 and 1978), which used population and key Census variables to create three national classifications. Although geodemographics lacks theoretical or statistical grounding, its use has continued to grow and it has been adopted by the private sector (CACI, ACORN, MOSAIC, CAMEO and PRIZM). More importantly, the availability and ease of obtaining Census data have played a very important role in the development of geodemographics.

A major principle that supports geodemographics is the notion in geography that objects close to each other are more likely to be similar than objects that are far apart (Tobler, 1970), although researchers have challenged the theoretical and statistical underpinning of geodemographics.

Although the method has proven useful in practice, Flowerdew and Leventhal (1998) argue that “there is no formal proof and no ‘theory of geodemographics’ either, only the concept that ‘birds of a feather flock together’” and, in addition, “the systems are used simply because they work and have become established” (Flowerdew and Leventhal 1998, p.36).

A major advantage of area classification is that it allows for targeted marketing (Harris et al., 2005). Geodemographics therefore benefits research trying to ascertain the linkages between vulnerable areas or clusters of population targeted by a particular retail fascia or group. More importantly, it can also help to uncover the location preferences of retailers because it is primarily rooted in consumer and lifestyle behaviour. In addition, a multivariate classification of neighbourhoods offers a simple and valuable summary of the characteristics of areas (Openshaw and Wymer, 1995). Yet a major criticism levelled against geodemographics is that it is highly subjective, and resultant classifications are a function of the operational decisions made during the development process (Openshaw and Gillard, 1978). Usually, the decision process in creating an area classification depends on the research area and application, and no one classification fits all. However, subjectivity is not necessarily an issue as long as decisions are critically evaluated (Openshaw and Gillard, 1978). In addition, as geodemographics lacks strong theoretical and statistical backing, there is the possibility that the classification might not reveal or provide robust evidence of the observed neighbourhood effects when subjected to the rigours of statistical analysis (Harris et al., 2007). Notwithstanding, the applicability of geodemographics to resource allocation by public sector institutions and to customer segmentation and targeting by businesses is not in question (Harris et al., 2007). Creating an area classification involves carrying out clustering analysis, which is discussed below.

3.7.2.1 Clustering analysis

Clustering involves classification of variables based on similar characteristics. Clustering is a very common technique in biological and ecological research areas and is also used for geodemographic classifications. In recent times, numerous academic domains have also adopted the methodology due to its applicability and robustness. In marketing, clustering analysis has been applied to marketing mix, customer segmentation, targeting and positioning, to name but a few. In other words, it has been applied to the concepts of product development, price discounts, advertising, sales and promotion, competitor analysis and branding strategies (Rao and Sabavala, 1981; DeSarbo et al., 1993; Moroko and Uncles, 2009).

Clustering analysis is a data exploration technique that seeks to gain information from a dataset by splitting the data into separate groups, with members of the same group having homogeneous characteristics (Jain and Dubes, 1988; Hastie et al., 2001). The resulting classifications are not mutually exclusive but, rather, fuzzy groups where the edges of each classification can overlap (Voas and Williamson, 2001). Therefore, this technique is used in this thesis for the classification of LSOAs based on SECs relating to AASR services.

The execution of clustering analysis involves a series of calculated steps (Milligan and Cooper, 1987), and omitting any step jeopardizes the accuracy of the classification. At this point, a distinction needs to be made between clustering method and clustering analysis.

Clustering method represents a step in the overall clustering process, while clustering analysis represents the sum total of all the steps taken to achieve the classification. Although these steps can be altered to fit specific applications, researchers have discussed the necessary steps involved in clustering analysis (Milligan et al., 1987; Milligan, 1996; Everitt et al., 2011). Milligan (1996) summarized seven sequential steps essential for executing a clustering analysis, with each step representing a critical decision point as follows:

Step 1. Clustering elements – This involves the selection of objects to be clustered. The selected objects should adequately reflect the principal population and provide total coverage to enable generalisation of the results to a wider population.

Step 2. Clustering variables – This refers to measurements obtained from the elements/objects to be clustered. There should be strong empirical evidence for each variable to be added to the clustering process. Irrelevant/masking variables should be avoided, otherwise they could obscure the underlying cluster in the data.

Step 3. Variable standardization – The decision on whether to standardise each of the variables must be taken appropriately. In clustering analysis, there are potentially two false assumptions that can be made by researchers: (1) it is necessary to standardise variables and (2) z-score is the most appropriate method for clustering (Milligan, 1986). Nonetheless, variable standardisation and method are at the discretion of the researcher.

Step 4. Measure of association – For clustering analysis to be executed, a dissimilarity or similarity measure must be adopted. This measure indicates the extent of closeness or separation (i.e. distance) between objects/entities to be clustered. For this step, there is no consensus or general guideline.

Step 5. Clustering method – This is a very important step in successfully executing a cluster analysis. The selection of method should be based on the perceived clustering within the data because different methods are suitable for different clustering patterns. The method should also be robust in order to detect underlying clusters.

Step 6. Number of clusters – Selecting the number of clusters is a very subjective process and the most difficult step in running a cluster analysis, especially when there is no prior knowledge of the underlying clusters. A major rule of thumb is that the final number of clusters must have relevant interpretation within the context of this study.

Step 7. Interpretation, testing and replication – Results must be interpreted based on the context of the investigation, which requires extensive knowledge of the subject area. In addition, it is necessary to ensure that re-running the clustering analysis will produce similar results. The classification can also be cross-validated against a known measure of the observed objects where possible.
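Steps 3 and 4 can be illustrated with a minimal Python sketch. It is not the procedure used in this thesis: the function names and the three-area dataset are hypothetical, range standardisation is shown only as one possible alternative to the z-score, and Euclidean distance is assumed as the dissimilarity measure purely for illustration.

```python
import math
import statistics


def z_score(values):
    """Standardise to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def range_standardise(values):
    """Rescale to the interval [0, 1] using the observed range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical data: three areas measured on two variables
# with very different scales.
income = [20000.0, 30000.0, 40000.0]   # e.g. mean household income
unemployment = [0.02, 0.10, 0.04]      # e.g. unemployment rate

raw = list(zip(income, unemployment))
scaled = list(zip(range_standardise(income),
                  range_standardise(unemployment)))

# Without standardisation the distance is driven almost entirely by income;
# after rescaling, both variables contribute on a comparable footing.
print(euclidean(raw[0], raw[1]))
print(euclidean(scaled[0], scaled[1]))
```

Comparing the two printed distances shows why the standardisation decision in Step 3 directly shapes the measure of association in Step 4: a variable recorded in large units can dominate the distance unless the variables are rescaled.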

3.7.2.2 K-means clustering method

There are numerous methods for carrying out clustering analysis, but this research utilizes the K-means clustering technique (Forgy, 1965; Hartigan and Wong, 1979). K-means is one of the most commonly used clustering algorithms (Duda et al., 2012; Harris et al., 2005). Harris et al. (2005) attribute its common usage to two major benefits: it produces cluster solutions that retain a high proportion of the variance of the initial input variables, and it creates cluster solutions relatively equal in (population) size. On the other hand, its major drawback is that the number of clusters must be specified based on the researcher’s experience, making it somewhat subjective in nature as there is no universal technique available (Xu and Wunsch, 2009). To overcome this, the process is usually repeated with different cluster numbers and the most suitable solution finally selected (Gordon, 1999). In addition, different cluster numbers can also be selected based on the results of running another cluster method (Everitt et al., 2011).

The ‘K’ represents the total number of clusters generated, which has to be specified before the algorithm is executed. K-means is a non-parametric method that adopts an iterative optimization procedure to minimize a squared-error criterion function (Duda et al., 2012). The basic principle which informs the algorithm is to move an entity from one cluster to another, with a view to minimizing the sum of squared deviations within each cluster (Aldenderfer and Blashfield, 1984). This process is repeated until a final classification is reached, i.e. when no movement/re-classification occurs between iterations, after which the means of each cluster for each input variable can be examined to determine the uniqueness of each cluster. The steps in the clustering algorithm (Everitt et al., 2011) are:

a. find and initialize a partition of the entities into ‘K’ clusters, and calculate the mean of each cluster together with the clustering criterion, i.e. the sum of squared deviations of the entities from their cluster means,

b. transfer each entity from the initial cluster to the nearest cluster and re-calculate the respective clustering criterion,

c. adopt the change which offers the best improvements in the clustering criterion, and

d. repeat steps b and c until no movement produces an improvement in the clustering criterion.
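Steps a to d can be sketched in Python as follows. This is one illustrative reading of the steps, not the software implementation used in this research. The function names and the four-point dataset are hypothetical, and three choices are assumptions: the round-robin initial partition, the rule that a cluster is never emptied, and the evaluation of every possible single transfer in step b.

```python
def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sse(points, labels, k):
    """Clustering criterion: total within-cluster sum of squared deviations."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        m = centroid(members)
        total += sum(sum((x - mx) ** 2 for x, mx in zip(p, m))
                     for p in members)
    return total


def kmeans_transfer(points, k, labels=None, tol=1e-12):
    """Greedy single-transfer K-means following steps a to d."""
    # Step a: initial partition (round-robin by default, an assumption)
    # and the initial value of the clustering criterion.
    if labels is None:
        labels = [i % k for i in range(len(points))]
    else:
        labels = list(labels)
    best = sse(points, labels, k)
    while True:
        best_move = None
        # Step b: evaluate the criterion for every single-entity transfer.
        for i, current in enumerate(labels):
            if labels.count(current) == 1:
                continue  # assumption: never empty a cluster
            for target in range(k):
                if target == current:
                    continue
                labels[i] = target
                candidate = sse(points, labels, k)
                labels[i] = current
                if candidate < best - tol:
                    best, best_move = candidate, (i, target)
        # Step d: stop when no transfer improves the criterion.
        if best_move is None:
            return labels, best
        # Step c: adopt the transfer giving the largest improvement.
        i, target = best_move
        labels[i] = target


# Hypothetical one-variable dataset with two visually obvious groups.
points = [(1.0,), (2.0,), (9.0,), (10.0,)]

# Repeat the procedure for several values of K and compare the criterion.
for k in (1, 2, 3):
    labels, criterion = kmeans_transfer(points, k)
    print(k, labels, criterion)
```

The final loop mirrors the practice of repeating the process with different cluster numbers: the criterion always falls as K increases, so the choice of K rests on where the improvement levels off and on whether the resulting clusters can be interpreted.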