• No results found

GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM

N/A
N/A
Protected

Academic year: 2021

Share "GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM"

Copied!
6
0
0

Loading.... (view fulltext now)

Full text

(1)

GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L1-NORM

BARRIOT Jean-Pierre, SARRAILH Michel

BGI/CNES 18.av.E.Belin 31401 TOULOUSE Cedex 4 (France) Email: [email protected]

1/Introduction

The Bureau Gravimétrique International is managing a worldwide gravity database. These data have different origins and must be controlled to detect and eliminate outliers. Up to now, we used a prediction technique based on the L2-norm (collocation) method. We have developed a new method, using the L1-norm. We present here shortly the outlines of this method, and compare it for different test cases with the L2-method.

2/ Theory of the L1 prediction method

Self-validation is the detection of outliers in a survey from the cross-comparison of all the values of the survey.

Let g

~

the N- vector of the set of observed gravity values over a survey: g g g gN ~ . =             1 2 The N-vector g ~ ^

of the "true" (unknown) values is related to the N-vector g

~ of observed values by I gN g ~ ^ ~ ~

= +ε (1), where IN is the identity matrix of order N and ε

~ is the N-vector of errors. In a perfect world, ε ~ =0~ and then g~ g ^ ~

= . In an imperfect (our) world, ε

~ ≠0~. We have then to

solve Eq. (1) contaminated by errors.

L2- norm solution: ε

~ and g~ ^

are considered as random variables with a priori 0

~ means and respectively

σε2IN and Cov g( )

~ ^

covariances. The L2 a posteriori estimate of g

~ ^ as then g* Cov g Cov g IN g ~ ^ ~ ^ ~ ^ ~ ( ) ( ) =  +   − σε2 1 mean

(2)

and σε− − − +       2 1 1 IN Cov g ~ ^ covariance.

This is the usual least-squares collocation solution.

L1- norm solution:

From a L1-norm point of view, we select the particular g i ~ ^ ( ) , which realizes min ^ ( ) ^ ( ) i l i l j l l N j j M gg = = = =

1 1

over a set of M realizations of the N-vectors g

~ and ε ~, with g g i i i ~ ^ ( ) ~ ( ) ~ ( )

= +ε . Of course, in the real world, we have to cope with a unique realization of g

~

and ε

~ (and we know only their sum g~ ^ ), so we 2-1/ select from g ~ ^ observed M subvectors γ ~ ^ ( )i (i=1,...,M) of dimension K,

2-2/ complete through a given interpolation-extrapolation procedure the missing NK values in order to get M vectors Γ

~ ^ ( )i

of dimension N, 2-3/ select the best estimate Γ

~ ^ ( )i

of g

~

as the one which realizes min ( ) ^

i l i l l N g Γ − =

1 .

Fig. 1: Fitting a line through 3 data points. The L2 solution (a) goes through the 3 data points( , )x yi i by realizing a b i i i y a x b ,

min

( −( + ))2. For the L1 solution, the solution fulfills

a b i i i y a x b , min

−( + ), and

corresponds to one of the lines (b1, b2, b3) that joins the 3 two points subsets.

For L1 norm, there is no equivalent of covariance matrices, so if we want to have some indication about the robustness of the solution, we can only construct Monte-Carlo estimates of the errors

(3)

by adding to the observed g

~ ^

values a random vector ζ

~

of 0

~ mean and known σζ 2

variance and

infer from this perturbation the corresponding mean and variance of Γ

~ ^ ( )i

.

3/ L1 prediction method algorithm

For a given gravity station where we have to predict the gravity value:

3-1: search of all the neighbouring points, up to a given radius;

3-2: determination of the "best" plane (Fig. 2) or paraboloid (Fig. 3) approximation of the local gravity around the station, by using the gravity values of a subset of selected neighbours, in the sense of the L1 norm. The gravity value at the predicted point is excluded. As we consider only a limited number of neighbouring points, we study all the subsets of neighbours (subsets of 3 points for the "best" plane, subsets of 6 points for the "best" paraboloid), instead of considering the simplex method;

3-3: computation of the difference between observed and predicted anomaly, interpolated from the "best" L1-surface, at the location of the predicted value;

3-4: comparison with a given threshold;

3-5: rejection or validation of the gravity value. 4/ Pros and cons of the L1 norm method

4-1 Pro: no "contamination" of the neighbouring points by "bad" points (i.e. a "good" point can be flagged as false, if compared to erroneous ("bad") neighbouring points);

4-2 Pro: no need to use residual anomalies; 4-3 Con: systematic rejection of extrema;

4-4 Con: rejection of points near the edges of the map (only with paraboloid prediction);

4-5 Con: rejection only based on a threshold on the difference between observed and predicted anomaly;

4-6 Con: no error estimate of the predicted anomaly. 5/ Pros and cons of the L2 norm method

5-1 Pro: rejection based on thresholds for the difference between observed and predicted anomaly and for the standard deviation error of the predicted anomaly;

5-2 Con: robustless solution: a "good" point can be flagged as "false", if compared to "bad" neighbouring points (see 4-1);

5-3 Con: need of computing residual anomalies before prediction; 5-4 Con: rejection of extrema.

(4)

Fig. 2: Dark triangle: « best » approximating plane going through the neighbouring gravity values. Black bar: difference between the observed and the predicted anomaly on the selected point (dot).

Fig. 3: Light grey: « best » approximating paraboloid going through the neighbouring gravity values. Black bar: difference between the observed and the predicted anomaly on the selected point (dot).

6/ Future improvements of the L1 method

6-1: estimating of the error on the predicted anomaly by Monte-Carlo method (see point 2); 6-2: Replacing planar or paraboloidal approximation by collocation prediction, to take into

account the covariance function of the anomalies. This will realize a "mix" between L1 and L2 methods.

(5)

7/ Example of data validation

Fig. 4: Bouguer anomaly map: good points (cross marker), doubtful points (circle marker) are identified by prediction using a collocation technique, taking into account the local covariance function.

Fig. 5: With the collocation technique, "bad" points can "contaminate" neighbouring points. Such points must be repredicted, after flagging of the erroneous ("bad") points with the largest differences between observed and predicted anomaly (see points 4-1 and 5-2).

(6)

Fig. 6: Prediction using L1 norm and plane approximation (Fig. 2). Seven neighbouring points are selected

per predicted point. The points predicted are considered doubtful if the difference between the observed anomaly and the predicted one is larger than 7 mGals.

Fig. 7: Prediction using L1 norm and paraboloid approximation (Fig. 3). Ten neighbouring points are

selected per predicted point. The points predicted are considered doubtful if the difference between the observed anomaly and the predicted one is larger than 7 mGals.

References

Related documents