3.7 Clustering analysis of set3
3.7.2 Functional clustering analysis of set3
Quantile maps
Set3 was initially divided into two categories, below and above the median, to produce a median-defined map (Figure 3.10).
Figure 3.9: Morisita index for radon measurements (green line) compared to a regular (blue line) and a random distribution (red line)
Figure 3.10: Quantile map of indoor radon for set3 using median value as threshold
The median map is a coarse generalization of the values, intended to reveal whether there are large subregions with different concentrations. Differences in spatial distribution are not evident in this representation because of the presence of extreme values defining hot spots. A map with subdivisions into 10 equal-sized categories, or deciles, was built to visualize the data distribution at a lower level of generalization (Figure 3.11a). This decile map also shows how the indoor radon level categories are mixed in space. A separate representation of only the first and last deciles (Figure 3.11b) shows some tendency for high values to form hot spots toward the NW and for lower values to concentrate in the central area.
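The class assignment behind such quantile maps can be sketched as follows. This is a minimal illustration, not the procedure actually used for set3: the function name `quantile_classes` and the synthetic log-normal concentrations are assumptions introduced only for the example.

```python
import numpy as np

def quantile_classes(values, n_classes):
    """Assign each value to one of n_classes equal-sized quantile classes (0-based)."""
    values = np.asarray(values, dtype=float)
    # Interior quantile limits, e.g. the 9 decile limits for n_classes=10.
    probs = np.arange(1, n_classes) / n_classes
    limits = np.quantile(values, probs)
    # side="left": a value equal to a limit is placed in the lower class.
    return np.searchsorted(limits, values, side="left")

rng = np.random.default_rng(0)
# Hypothetical stand-in for set3: log-normal concentrations in Bq/m3.
radon = rng.lognormal(mean=4.5, sigma=0.8, size=1000)

median_class = quantile_classes(radon, 2)    # median map: 0 = below, 1 = above
decile_class = quantile_classes(radon, 10)   # decile map
# Mask for a map showing only the first and last deciles.
extremes = (decile_class == 0) | (decile_class == 9)
print(np.bincount(decile_class, minlength=10))
```

Plotting the point coordinates colored by `decile_class`, or restricted to `extremes`, would give maps analogous to Figures 3.11a and 3.11b.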
Functional box-counting
The functional box-counting procedure was applied to set3 using the quartile limits as thresholds Tq (0, 58, 92 and 138 Bq/m3). The series of graphs and the corresponding fractal dimensions Df are presented in Figure 3.12.
Figure 3.11: Quantile maps for set3 a) considering decile thresholds and b) for first and last deciles
Figure 3.12: Functional cumulative box counting diagrams for data over quartile thresholds
The Df for the whole dataset (f(x) > 0) is approximately 1.49, while for f(x) > 138 Bq/m3, Df ≈ 1.17. The graph shows how the dimensional resolution of the points diminishes for higher thresholds. At some point, the dimensional resolution is too low to find points.
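Functional box-counting over cumulative thresholds can be sketched as below. The sketch assumes the usual estimate of Df as minus the slope of log N(δ) against log δ, where N(δ) is the number of occupied boxes of side δ. The function names, the synthetic coordinates and values, and the choice of box sizes are hypothetical and are not taken from the analysis of set3.

```python
import numpy as np

def box_count(xy, size, origin):
    """Number of square boxes of side `size` that contain at least one point."""
    cells = np.floor((xy - origin) / size).astype(int)
    return len(np.unique(cells, axis=0))

def box_counting_dimension(xy, sizes, origin=None):
    """Estimate Df as minus the slope of log N(size) versus log size."""
    if origin is None:
        origin = xy.min(axis=0)
    counts = np.array([box_count(xy, s, origin) for s in sizes])
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope

def functional_box_counting(xy, values, thresholds, sizes):
    """Df of the cumulative subsets {x : f(x) > T} for each threshold T."""
    origin = xy.min(axis=0)  # same grid for every subset
    return {t: box_counting_dimension(xy[values > t], sizes, origin)
            for t in thresholds}

rng = np.random.default_rng(1)
xy = rng.random((2000, 2)) * 20000.0            # hypothetical coordinates (m)
values = rng.lognormal(4.5, 0.8, size=2000)     # hypothetical values (Bq/m3)
sizes = np.array([10000.0, 5000.0, 2500.0, 1250.0])
df_by_threshold = functional_box_counting(xy, values, [0, 58, 92, 138], sizes)
for t, df in df_by_threshold.items():
    print(f"f(x) > {t}: {int((values > t).sum())} points, Df = {df:.2f}")
```

Because each higher threshold keeps fewer points, fewer boxes are occupied at the fine scales, which lowers the estimated Df even when the points are not spatially clustered.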
Quantile box-counting
As previously shown in Figure 3.12, the cumulative box-counting function is correlated with the number of points, and this can mask real clustering. Therefore, it is more pertinent to use quantile (i.e. equal-sized) subsets, such as the ones used in quantile mapping. For the case of quartile thresholds Tq, the subsets are defined as $Q(Tq) = \{Tq_i < f(x) < Tq_{i+1}\}$, where $Tq = \{0, 58, 92, 138\}$ Bq/m3. Figure 3.13 presents a series of fractal dimension diagrams using both the sandbox method (Figure 3.13a) and the box-counting method (Figure 3.13b) for quartile subsets from set3.
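The definition of the threshold-bounded subsets can be illustrated with a short sketch. Two details are assumptions of the example rather than statements from the text: the upper bound of each class is taken as inclusive, so that a measurement equal to a threshold is not dropped, and the last class is left open-ended above 138 Bq/m3. The function name and the eight sample values are hypothetical.

```python
import numpy as np

def threshold_subsets(values, thresholds):
    """Index arrays of the classes Tq_i < f(x) <= Tq_(i+1).

    Assumption: inclusive upper bounds and an open-ended last class.
    """
    values = np.asarray(values, dtype=float)
    upper = list(thresholds[1:]) + [np.inf]
    return [np.flatnonzero((values > lo) & (values <= hi))
            for lo, hi in zip(thresholds, upper)]

tq = [0, 58, 92, 138]                                     # quartile limits (Bq/m3)
values = np.array([12, 58, 60, 92, 100, 138, 150, 900])   # hypothetical sample
subsets = threshold_subsets(values, tq)
sizes = [len(s) for s in subsets]
print(sizes)
```

When the thresholds are true quartile limits of the data, the four subsets have (nearly) equal size, which is what removes the dependence on the number of points.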
In Figure 3.13, the raw diagrams (instead of the regression lines) offer a better visual comparison of the methods. For every subset and method, Df fluctuates around 1.5, and both methods measured a lower Df for the first and last quartiles (Q1 and Q4). The sandbox method appears to be more sensitive than the box-counting method in detecting differences. Since a lower Df indicates a more clustered point pattern, it can be concluded that more clustering is present for the lowest and highest values.
Figure 3.13: Df diagrams for the quartile subsets from set3 using a) the sandbox method and b) the box-counting method
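For comparison with box-counting, the sandbox estimate can be sketched as follows. The sketch assumes the common formulation in which M(R) is the number of points within distance R of a centre point (the centre included), averaged over all points, and Df is the slope of log M(R) against log R. The function name, the radii and the test pattern are hypothetical, and no edge correction is applied.

```python
import numpy as np

def sandbox_dimension(xy, radii):
    """Estimate Df as the slope of log <M(R)> versus log R.

    M(R) counts the points within distance R of a centre (centre included);
    the average is taken over every point used as a centre.
    Uses a full distance matrix, so it suits a few thousand points at most.
    """
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    mean_mass = np.array([(dist <= r).sum(axis=1).mean() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(mean_mass), 1)
    return slope

# Hypothetical check pattern: points evenly spaced along a line (expected Df near 1).
x = np.arange(1000, dtype=float)
line = np.column_stack([x, np.zeros_like(x)])
radii = np.array([10.0, 20.0, 40.0, 80.0])
df_line = sandbox_dimension(line, radii)
print(round(df_line, 2))
```

Applying the same function to each quartile subset, with a common set of radii, would give one Df estimate per quartile in the spirit of Figure 3.13a.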
Functional MI
Figure 3.14 presents the functional cumulative MI diagrams for subsets above thresholds from 50 to 1000 Bq/m3. As with the box-counting method in Figure 3.12, MI shows a tendency toward clustering for higher values. However, this is also due to the reduced number of points in the subsets above high thresholds (400 and 1000 Bq/m3).
Figure 3.14: Functional MI for cumulative subsets of set3
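The functional MI over cumulative thresholds can be sketched as below, assuming the standard form of the Morisita index, I = Q Σ n_i(n_i − 1) / (N(N − 1)), where Q is the number of cells, n_i the number of points in cell i and N the total number of points. The function names, the synthetic data, the intermediate thresholds (100 and 200 Bq/m3) and the cell sizes are hypothetical.

```python
import numpy as np

def morisita_index(xy, cell_size, origin, extent):
    """Morisita index for square cells of side `cell_size` over a fixed window."""
    n = len(xy)
    if n < 2:
        return float("nan")          # undefined for fewer than two points
    n_side = int(np.ceil(extent / cell_size))
    idx = np.floor((xy - origin) / cell_size).astype(int)
    idx = np.clip(idx, 0, n_side - 1)           # keep boundary points inside
    counts = np.bincount(idx[:, 0] * n_side + idx[:, 1], minlength=n_side ** 2)
    # Empty cells are part of Q: the window is fixed by the full dataset.
    return n_side ** 2 * np.sum(counts * (counts - 1)) / (n * (n - 1))

def functional_mi(xy, values, thresholds, cell_sizes):
    """MI of the cumulative subsets {x : f(x) > T} at each cell size."""
    origin = xy.min(axis=0)
    extent = (xy.max(axis=0) - origin).max()
    return {t: [morisita_index(xy[values > t], c, origin, extent)
                for c in cell_sizes]
            for t in thresholds}

rng = np.random.default_rng(2)
xy = rng.random((1500, 2)) * 20000.0           # hypothetical coordinates (m)
values = rng.lognormal(4.5, 0.8, size=1500)    # hypothetical values (Bq/m3)
thresholds = [50, 100, 200, 400, 1000]
mi = functional_mi(xy, values, thresholds, cell_sizes=[1000.0, 2000.0, 5000.0])
for t in thresholds:
    print(t, int((values > t).sum()), np.round(mi[t], 2))
```

With very few points above the highest thresholds, the index becomes unstable or undefined, which is the class-size effect noted for the 400 and 1000 Bq/m3 subsets.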
Quantile MI
To avoid the class-size effect, the use of quantiles, i.e. equal-sized subsets defined according to quantile thresholds, has been proposed. This procedure is called quantile MI (QMI). Diagrams for the quartile subsets of set3 obtained with the QMI method are presented in Figure 3.15.
Figure 3.15: QMI diagrams for quartile subsets of set3 a) over the global range and b) zoomed to cell sizes between 900 and 3000 meters
Figure 3.15 shows that MI has the same pattern of clustering as the one measured with the fractal methods. It can also be observed that MI differentiates clustering better at smaller scales, while it tends to one at larger scales (cell sizes). This is coherent in the sense that clustering is produced by the concentration of points at smaller scales. It can be said that if one set has a larger MI than another equal-sized set at the same cell size, that set is effectively more clustered. Fluctuations can occur at intermediate scales and are also captured by MI. Figure 3.15a shows the diagrams for the whole range of scales. Figure 3.15b is a zoom, with scales ranging from the average distance to half the diagonal distance of set3. This zoom allows the results of QMI to be compared with the results of the sandbox method presented in Figure 3.6b.
The tendency toward clustering is well defined and coherent with the results from the fractal methods, as Q1 and Q4 appear more clustered. In the zoomed diagram (Figure 3.15b), it is remarkable that an abrupt change in clustering occurs at a cell size of 1700 meters, something that was also detected with the sandbox method. The multifractal behavior from Figure 3.6b was also reproduced in the MI diagrams.
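A QMI computation of this kind can be sketched as follows. The example assumes coordinates rescaled to the unit square, so that a grid of k × k cells corresponds to a cell size of 1/k, and it splits the data into equal-sized subsets by rank. The function names and the synthetic pattern (a uniform background plus a small high-valued cluster) are hypothetical and serve only to show a clustered upper quartile.

```python
import numpy as np

def morisita_index(xy, n_side):
    """Morisita index on the unit square divided into n_side x n_side cells."""
    idx = np.minimum((xy * n_side).astype(int), n_side - 1)
    counts = np.bincount(idx[:, 0] * n_side + idx[:, 1], minlength=n_side ** 2)
    n = len(xy)
    return n_side ** 2 * np.sum(counts * (counts - 1)) / (n * (n - 1))

def quantile_mi(xy, values, n_quantiles, n_sides):
    """MI per equal-sized quantile subset (rows) and per grid resolution (columns)."""
    order = np.argsort(values, kind="stable")     # ranks from lowest to highest
    subsets = np.array_split(order, n_quantiles)  # equal-sized index groups
    return np.array([[morisita_index(xy[s], k) for k in n_sides]
                     for s in subsets])

rng = np.random.default_rng(7)
# Hypothetical pattern: 300 background points with low values ...
background_xy = rng.random((300, 2))
background_val = rng.random(300) * 100.0
# ... and 100 high-valued points concentrated in one corner.
cluster_xy = rng.random((100, 2)) * 0.1
cluster_val = 500.0 + rng.random(100)

xy = np.vstack([background_xy, cluster_xy])
values = np.concatenate([background_val, cluster_val])
qmi = quantile_mi(xy, values, n_quantiles=4, n_sides=[4, 8, 16])
print(np.round(qmi, 2))   # rows Q1..Q4, columns from coarse to fine cells
```

Since every subset has the same number of points, a larger MI in one row than in another at the same cell size can be read directly as stronger clustering.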
Quantile MI Profiles
In Figure 3.16, a series of profiles from the QMI quartile diagrams is presented for set3 for cell sizes ∆ of 1000, 1900 and 10000 meters. The quartile limits, as mentioned, are 0, 58, 92 and 138 Bq/m3.
It is also interesting to calculate QMI for higher thresholds. To achieve this, set3 was subdivided into 15 quantiles. The corresponding QMI profile is presented in Figure 3.17.
In the example with set3, it is worth pointing out that the profile at the average distance ∆Avr is also representative of small and medium distances (1000 and 10000 meters). The MI for Q1 and Q4 tends to be higher than for Q2 and Q3. In Figure 3.17, this tendency remains; however, clustering appears more accentuated for values above 316 Bq/m3.
Figure 3.16: Profile of the QMI diagrams for set3 for quartile limits
Figure 3.17: Profile of the QMI diagrams for set3 for 15 quantile limits and random distributions
Furthermore, the MI for randomly distributed points is included in Figure 3.17 (red line). In theory, the MI for random points takes values around one. This value can fluctuate when the number of points is low. In the case of 15 quantiles, each quantile has only 87 points and the MI fluctuates between 0.5 and 2.5 for several random sets. For quartiles, which have more points, the fluctuation of the MI is lower and it remains close to one. In any case, the differences in MI between the random distributions and the actual values are clear.
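The dependence of the random reference on the number of points can be checked with a small simulation. The sketch below assumes uniformly random points in the unit square and a 10 × 10 grid, both chosen only for illustration. The subset size of 87 comes from the text, while 326 is an approximate quartile size inferred from 15 × 87 points and should be read as an assumption. The function names are hypothetical, and the simulated ranges are not expected to reproduce the values reported for set3.

```python
import numpy as np

def morisita_index(xy, n_side):
    """Morisita index on the unit square divided into n_side x n_side cells."""
    idx = np.minimum((xy * n_side).astype(int), n_side - 1)
    counts = np.bincount(idx[:, 0] * n_side + idx[:, 1], minlength=n_side ** 2)
    n = len(xy)
    return n_side ** 2 * np.sum(counts * (counts - 1)) / (n * (n - 1))

def random_mi_spread(n_points, n_side, n_sets, rng):
    """Minimum, mean and maximum MI over n_sets uniformly random point sets."""
    mi = np.array([morisita_index(rng.random((n_points, 2)), n_side)
                   for _ in range(n_sets)])
    return mi.min(), mi.mean(), mi.max()

rng = np.random.default_rng(42)
small = random_mi_spread(87, n_side=10, n_sets=200, rng=rng)    # 15-quantile size
large = random_mi_spread(326, n_side=10, n_sets=200, rng=rng)   # assumed quartile size
print("n = 87 :", np.round(small, 2))
print("n = 326:", np.round(large, 2))
```

The mean stays near one in both cases, but the range between minimum and maximum is wider for the smaller subsets, so an envelope of this kind gives a rough reference band against which the observed QMI profile can be compared.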
It can be proposed that, statistically speaking, the most representative scale of interest for a spatial distribution is the average distance. The profile presented in Figure 3.17 is a fast characterization of QMI for a given spatial dataset and can be used to analyze functional clustering in exploratory analysis. Preferential sampling can also be depicted with the QMI method, as will be seen later.