
The statistical algorithms implemented in this section for local planar surface fitting in 3D point cloud data use two complementary statistical paradigms: diagnostic and robust statistics. Based on the FMCD and DetMCD estimators, three algorithms are proposed: (i) diagnostic PCA, (ii) robust PCA, and (iii) diagnostic robust PCA. Diagnostic and robust statistics have the same objective of fitting a model that is resilient to outliers. However, the two paradigms carry out their analysis stages in opposite order. In diagnostic statistics, the outliers are first detected and deleted, and the remainder of the data is then fitted in the classical way. In robust statistics, a model is first fitted that does justice to the majority of observations, and the outliers are then detected as the points with large deviations (e.g. residuals) from the robust fit.

For local neighbourhood based point cloud processing, data points from a local planar surface are sampled from within a local fixed radius r or within a local neighbourhood of size k. We use the well-known k Nearest Neighbourhood (kNN) searching technique (Figure 3.4a) rather than the Fixed Distance Neighbourhood (FDN) method (Figure 3.4b) because kNN is able to avoid the problem of point density variation. Point density variation is a common phenomenon, particularly in mobile laser scanning data, because of the movement of the data acquisition sensors (or vehicles) relative to the geometry of the sensors. Density varies as a function of the orientation of a surface relative to the sensor, and as a function of the path taken by the sensor or vehicle and its velocity. A further advantage is that local neighbourhoods of the same size produce local statistics (e.g. normal and curvature) of equal support.

Figure 3.4 Local neighbourhood (region) N pi for pi: (a) k nearest neighbourhood, and (b) fixed distance neighbourhood of radius r.
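The difference between the two neighbourhood definitions can be illustrated with a minimal NumPy sketch. The function names, the brute-force search, and the toy grid data are assumptions made for illustration only; a practical implementation would normally use a spatial index. The sketch shows that kNN always returns k points, whereas the size of a fixed distance neighbourhood follows the local point density.

```python
import numpy as np

def knn_neighbourhood(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    d = np.linalg.norm(points - points[i], axis=1)
    order = np.argsort(d)
    return order[order != i][:k]

def fdn_neighbourhood(points, i, r):
    """Indices of all points within distance r of points[i], excluding i itself."""
    d = np.linalg.norm(points - points[i], axis=1)
    idx = np.flatnonzero(d <= r)
    return idx[idx != i]

def grid(lo, hi, n):
    """n x n grid of points on the plane z = 0 (hypothetical toy data)."""
    xs = np.linspace(lo, hi, n)
    gx, gy = np.meshgrid(xs, xs)
    return np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])

dense = grid(0.0, 1.0, 21)    # densely sampled patch (spacing 0.05)
sparse = grid(5.0, 6.0, 5)    # sparsely sampled patch (spacing 0.25)
cloud = np.vstack([dense, sparse])

k, r = 10, 0.22
i_dense, i_sparse = 0, len(dense)   # one query point in each patch

knn_dense = knn_neighbourhood(cloud, i_dense, k)
knn_sparse = knn_neighbourhood(cloud, i_sparse, k)
fdn_dense = fdn_neighbourhood(cloud, i_dense, r)
fdn_sparse = fdn_neighbourhood(cloud, i_sparse, r)

print(len(knn_dense), len(knn_sparse))   # both equal k
print(len(fdn_dense), len(fdn_sparse))   # depends on the local density
```

With the same radius, the query point in the sparse patch can end up with too few neighbours to fit a plane, which is the situation the kNN choice avoids.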

3.4.1 Diagnostic PCA

The algorithm proposed here is inspired by diagnostic statistics, and couples the ideas of outlier diagnostics and classical PCA. First, we detect and remove outliers from the dataset, and then fit a planar surface to the cleaned data using PCA. For local planar surface fitting, we need to find the local region of a point of interest pi, as shown in Figure 3.4.

After fixing a local neighbourhood N pi, we find outliers in the neighbourhood using the robust distance (FRD or DetRD) in Eq. (3.1) or Eq. (3.2). We then fit a plane to the cleaned data using classical PCA. The best-fit plane is obtained by projecting all the inlier points onto the two Principal Components (PCs) with the highest eigenvalues. The third PC, which has the smallest eigenvalue, is the normal to the fitted plane, and the elements of the corresponding third eigenvector are the estimated plane parameters.
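The classical PCA plane fit described above can be sketched as follows. This is a minimal illustration rather than the implementation used here: the function name `pca_plane` is hypothetical, and the curvature is computed as the surface variation l3 / (l1 + l2 + l3), which is a commonly used definition that we assume for this example.

```python
import numpy as np

def pca_plane(points):
    """Fit a plane to an (n, 3) array with classical PCA.

    Returns the centroid, the unit normal (third PC), the eigenvalues in
    descending order, and the surface variation l3 / (l1 + l2 + l3).
    """
    centre = points.mean(axis=0)
    cov = np.cov(points - centre, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]        # re-sort to descending order
    evals, evecs = evals[order], evecs[:, order]
    normal = evecs[:, 2]                   # PC with the smallest eigenvalue
    curvature = evals[2] / evals.sum()     # assumed curvature definition
    return centre, normal, evals, curvature

# Noise-free points on the plane z = 2x + 3y + 1 (hypothetical test data).
rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, size=(50, 2))
pts = np.column_stack([xy, 2 * xy[:, 0] + 3 * xy[:, 1] + 1])

centre, normal, evals, curvature = pca_plane(pts)
offset = -normal @ centre                  # plane: normal . p + offset = 0
print(normal, offset, curvature)
```

The sign of the normal is arbitrary, since an eigenvector and its negative describe the same plane.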

The algorithm for diagnostic PCA based on robust distance, called RD-PCA, is described in Algorithm 3.1.

Algorithm 3.1: RD-PCA (Robust Distance based PCA)

1. Input: point cloud P, neighbourhood size k, χ2 (Chi-square) cut-off = 3.075.

2. Determine the local neighbourhood N pi for a point pi consisting of its k nearest neighbours.

3. Calculate the robust distance (FRD or DetRD) for each point in N pi.

4. Classify the points in N pi into inliers and outliers according to the respective FRD or DetRD values and the χ2 cut-off value assigned.

5. Perform PCA on the inlier matrix.

6. Arrange the three PCs associated with their respective eigenvalues.

7. Find the two PCs that have the largest eigenvalues, and fit the plane by projecting the points onto the directions of the two PCs.

8. Output: normals, eigenvalues and the necessary statistics such as curvature.

The RD-PCA algorithm can be performed in two different ways: using either FRD or DetRD as the robust distance for finding outliers in the local neighbourhood. We name the FRD based diagnostic PCA and the DetRD based diagnostic PCA FRD-PCA and DetRD-PCA, respectively.
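The flow of Algorithm 3.1 for a single neighbourhood can be sketched as below. The robust distance here is only a stand-in and is neither FRD nor DetRD: it runs concentration steps from a coordinate-wise median start and rescales the squared distances so that their median matches the median of a chi-square distribution with 3 degrees of freedom (about 2.366). The function names, the subset fraction `h_frac = 0.75`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def cstep_robust_distances(X, h, n_iter=50):
    """Stand-in robust distances (not FRD or DetRD).

    Starts from the h points closest to the coordinate-wise median and
    repeatedly refits mean and covariance on the h points closest to the fit.
    """
    centre = np.median(X, axis=0)
    d2 = ((X - centre) ** 2).sum(axis=1)            # initial Euclidean ranking
    subset = np.argsort(d2)[:h]
    for _ in range(n_iter):
        centre = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
        diff = X - centre
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)
        new_subset = np.argsort(d2)[:h]
        if set(new_subset) == set(subset):          # subset no longer changes
            break
        subset = new_subset
    d2 = d2 * (2.366 / np.median(d2))               # assumed median rescaling
    return np.sqrt(np.maximum(d2, 0.0))

def rd_pca(neigh, cutoff=3.075, h_frac=0.75):
    """Steps 3 to 8 of Algorithm 3.1 for one (k, 3) neighbourhood."""
    h = int(np.ceil(h_frac * len(neigh)))
    rd = cstep_robust_distances(neigh, h)           # Step 3
    inliers = rd <= cutoff                          # Step 4
    clean = neigh[inliers]
    evals, evecs = np.linalg.eigh(np.cov(clean, rowvar=False))   # Step 5
    normal = evecs[:, 0]          # eigh sorts ascending: smallest eigenvalue first
    return normal, evals[::-1], inliers             # eigenvalues in descending order

# Hypothetical neighbourhood: a noisy patch of the plane z = 0 plus 8 outliers.
rng = np.random.default_rng(1)
plane = np.column_stack([rng.uniform(0, 1, (40, 2)), rng.normal(0, 0.01, 40)])
outl = np.column_stack([rng.uniform(0.8, 1, 8), rng.uniform(0, 1, 8),
                        rng.uniform(2, 3, 8)])
neigh = np.vstack([plane, outl])

normal, evals, inliers = rd_pca(neigh)
print(normal, inliers.sum())
```

Replacing `cstep_robust_distances` with an FMCD or DetMCD based distance would give the FRD-PCA and DetRD-PCA variants; the remaining steps stay the same.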

3.4.2 Robust Principal Component Analysis

Robust statistics fit a model that follows the consensus of the majority of observations and then, as an extra benefit, can find the outliers that have large deviations from the robust fit. Both robust covariance matrix based methods and Projection Pursuit (PP; Friedman and Tukey, 1974) methods have limitations. The robust covariance matrix based approach may lack sufficient data to estimate a high-dimensional robust covariance matrix. In contrast, the robustness of the PP based methods depends on the robustness of the adopted estimators. Methods based solely on PP are faster, but robust covariance matrix methods combined with PP give more robust PCs than the PP methods alone (Friedman and Tukey, 1974; Li and Chen, 1985). We choose the robust PCA (RPCA) introduced by Hubert et al. (2005) because it yields accurate estimates for outlier-free datasets, produces more robust estimates for contaminated data, is able to detect exact-fit situations, is location and orthogonal invariant, and has the further advantage of outlier diagnostics and classification.

This approach uses PP to make sure that the transformed data lie in a subspace whose dimension is less than the number of observations, and then uses the robust covariance matrix based method to get the final robust PCs. In the case of 3D point cloud data we have the advantage that the data dimension (m = 3) is usually smaller than the number of points in the dataset used for fitting a plane. The RPCA algorithm can then be performed using the stages in Algorithm 3.2. We perform the DetMCD based robust PCA algorithm by plugging the DetMCD based mean vector and covariance matrix for finding outlying cases into Eq. (3.3) and into the relevant places of the RPCA algorithm. The FMCD and DetMCD versions of RPCA are called FRPCA and DetRPCA, respectively.
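The projection-pursuit outlyingness of Eq. (3.3) can be sketched for one neighbourhood as follows. Several choices here are assumptions: directions are taken through every pair of points; the univariate centre and spread come from the exact univariate MCD (the contiguous run of h sorted values with the smallest variance) rather than from FMCD or DetMCD; the denominator is taken as the square root of the MCD variance; and each point keeps the maximum value over all directions. The function names and toy data are hypothetical.

```python
import numpy as np
from itertools import combinations

def univariate_mcd(x, h):
    """Exact univariate MCD: mean and variance of the contiguous run of h
    sorted values that has the smallest variance."""
    xs = np.sort(x)
    best_var, best_mean = np.inf, 0.0
    for start in range(len(xs) - h + 1):
        window = xs[start:start + h]
        var = window.var()
        if var < best_var:
            best_var, best_mean = var, window.mean()
    return best_mean, best_var

def outlyingness(neigh, h):
    """Outlyingness of each row of a (k, 3) neighbourhood, after Eq. (3.3)."""
    k = len(neigh)
    w = np.zeros(k)
    for a, b in combinations(range(k), 2):
        v = neigh[b] - neigh[a]
        norm = np.linalg.norm(v)
        if norm < 1e-12:
            continue                      # coincident points define no direction
        proj = neigh @ (v / norm)         # projection of every point onto v
        centre, var = univariate_mcd(proj, h)
        if var < 1e-24:
            continue                      # degenerate direction, skipped in this sketch
        w = np.maximum(w, np.abs(proj - centre) / np.sqrt(var))
    return w

# Hypothetical neighbourhood: 25 points near the plane z = 0 and 3 outliers.
rng = np.random.default_rng(2)
gx, gy = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
inl = np.column_stack([gx.ravel(), gy.ravel(), rng.normal(0, 0.01, 25)])
out = np.array([[0.5, 0.5, 1.0], [0.25, 0.75, 1.2], [0.75, 0.25, 0.9]])
neigh = np.vstack([inl, out])

h = len(neigh) // 2 + 1                   # any h > k/2; chosen for this example
w = outlyingness(neigh, h)
print(np.argsort(w)[-3:])                 # indices of the three most outlying points
```

The h points with the smallest outlyingness values would then be used to build the initial robust covariance matrix in the next stage of the algorithm.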

3.4.3 Diagnostic Robust PCA

Fung (1993) pointed out that robust and diagnostic methods do not have to be competing, and that the complementary use of highly robust estimators and diagnostic measures provides a very good way to detect multiple outliers and leverage points. To see the effectiveness of using diagnostic and robust approaches at the same time, we propose the Diagnostic Robust PCA (DRPCA) algorithm, which is the combination of diagnostic and robust PCA. First, the RDs are used to find outliers in the local neighbourhood to which we want to fit a plane. Then we use RPCA to fit the plane to the cleaned data. One of the DRPCA based algorithms uses FMCD based FRD and FRPCA and is called FDRPCA; the other uses DetMCD based DetRD and DetRPCA and is called Deterministic Diagnostic Robust PCA (DetDRPCA). In DetDRPCA, we first find candidate outliers in the local neighbourhood N pi using the robust distance DetRD. Finding outliers and removing them from N pi makes the data more homogeneous. Second, we use DetMCD based robust PCA (DetRPCA) to get the required PCs and eigenvalues. The DetDRPCA method is summarized in Algorithm 3.3.

Algorithm 3.2: RPCA (Robust PCA)

1. Input: point cloud P, neighbourhood size k.

2. Determine the local neighbourhood N pi for a point pi consisting of its k nearest neighbours.

3. Process the data to make sure that the data is lying in a subspace whose dimension is at most k−1.

4. Compute the measure of outlyingness for each point in the neighbourhood by projecting all the data points onto univariate directions passing through two individual data points. The dataset is compressed to PCs defining potential directions. The value of outlyingness for a point $p_i$ is:

$$w_i = \arg\max_{v} \frac{\left| p_i v^T - c_{FMCD}(p_i v^T) \right|}{\Sigma_{FMCD}(p_i v^T)}, \quad i = 1, \ldots, k \qquad (3.3)$$

where $p_i v^T$ denotes a projection of the ith observation onto the v direction, and $c_{FMCD}$ and $\Sigma_{FMCD}$ are the FMCD based mean vector and covariance matrix on a univariate direction v.

5. Construct a robust covariance matrix $\Sigma_h$ using an assumed portion (h > k/2) of observations with the smallest outlyingness values. We use $h = \lceil 0.5 \times k \rceil$ in our algorithm.

6. Project the observations onto the d-dimensional subspace spanned by the d eigenvectors of $\Sigma_h$ with the largest eigenvalues, and compute the mean vector and covariance matrix by means of the reweighted FMCD estimator, with weights based on the robust distance of every point.

7. The eigenvectors of this covariance matrix from the reweighted observations are the final robust PCs, and the FMCD mean vector serves as a robust mean vector.

8. Arrange the three PCs associated with their respective eigenvalues.

9. Find the two PCs that have the largest eigenvalues, and fit the plane by projecting the points onto the directions of the two PCs.

10. Output: normals, eigenvalues and the necessary local statistics such as curvature.

11. Outlier detection: calculate Orthogonal Distance (OD) and Score Distance (SD) using:

$$OD_i = \| p_i - \hat{p}_i \| = \| p_i - \hat{\mu}_p - L t_i^T \|, \quad i = 1, \ldots, k \qquad (3.4)$$

where $\hat{\mu}_p$ is the robust centre of the neighbourhood, $L$ is the robust loading (PC) matrix, which contains the robust PCs as its columns, and $t_i = (p_i - \hat{\mu}_p) L$ is the ith robust score; and

$$SD_i = \sqrt{\sum_{j=1}^{d} \frac{t_{ij}^2}{l_j}}, \quad i = 1, \ldots, k \qquad (3.5)$$

where $l_j$ is the jth eigenvalue of the robust covariance matrix $\Sigma_{FMCD}$, and $t_{ij}$ is the ijth element of the score matrix:

$$T_{k,d} = (P_{k,m} - 1_k c_{FMCD}) L_{m,d}, \qquad (3.6)$$

where $P_{k,m}$ is the data matrix, $1_k$ is the column vector with all k components equal to 1, $c_{FMCD}$ is the robust centre, and $L_{m,d}$ is the matrix constructed by the robust PCs. The cut-off value for the score distance is $\sqrt{\chi^2_{d,0.975}}$, and for the orthogonal

Algorithm 3.3: DetDRPCA (DetMCD based Diagnostic Robust PCA)

1. Input: point cloud P, neighbourhood size k.

2. Determine the local neighbourhood N pi for a point pi consisting of its k nearest neighbours.

3. Calculate robust distance using DetRD for all the points in N pi.

4. Classify inliers (regular observations) and outliers based on the DetRD values.

5. Perform robust PCA using DetRPCA based on the inliers from Step 4.

6. Arrange the three PCs associated with their respective eigenvalues.

7. Find the two PCs that have the largest eigenvalues, and fit the plane by projecting the points onto the directions of the two PCs.

8. Output: normals, eigenvalues and the necessary statistics such as curvature.

9. Outlier detection: similar to Step 11 in Algorithm 3.2.

In Algorithm 3.3, if we calculate the robust distance in Step 3 using FRD, and robust PCA in Step 5 using FRPCA then we have the algorithm FDRPCA (Fast-MCD based Diagnostic Robust PCA).
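The outlier-detection step shared by Algorithms 3.2 and 3.3 (Eqs. 3.4 to 3.6) can be sketched as below. The function name and the toy inputs are hypothetical, and the robust centre, loading matrix and eigenvalues are simply passed in, so the sketch is independent of whether they come from FRPCA or DetRPCA. The example takes d = 2, which is an assumption for the planar case; for two degrees of freedom the 0.975 chi-square quantile has the closed form −2 ln(0.025).

```python
import numpy as np

def orthogonal_and_score_distances(P, centre, L, evals):
    """OD (Eq. 3.4) and SD (Eq. 3.5) for the rows of P.

    P      : (k, m) data matrix
    centre : (m,) robust centre
    L      : (m, d) loading matrix with the robust PCs as columns
    evals  : (d,) eigenvalues belonging to the columns of L
    """
    centred = P - centre
    T = centred @ L                      # score matrix, Eq. (3.6)
    residual = centred - T @ L.T         # part of each point outside the PC subspace
    od = np.linalg.norm(residual, axis=1)
    sd = np.sqrt((T ** 2 / evals).sum(axis=1))
    return od, sd

# Hypothetical inputs: the fitted plane is the xy-plane through the origin.
centre = np.zeros(3)
L = np.eye(3)[:, :2]
evals = np.array([4.0, 1.0])
P = np.array([[2.0, 1.0, 0.0],    # in the plane
              [0.0, 0.0, 3.0],    # off the plane, projects onto the centre
              [4.0, 0.0, 0.0]])   # in the plane, far along the first PC

od, sd = orthogonal_and_score_distances(P, centre, L, evals)
sd_cutoff = np.sqrt(-2.0 * np.log(0.025))   # sqrt of the 0.975 chi-square quantile, d = 2
print(od, sd, sd > sd_cutoff)
```

A point with a large OD lies far from the fitted plane, whereas a point with a large SD lies far from the robust centre within the plane.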