Interpolation of data - Data processing - processing and interpretation 2.1 Introduction


2.7 Data processing

2.7.2 Interpolation of data

Most data processing and data display methods require the data points to be regularly distributed, i.e. to be equally spaced. As noted in Section 2.6.3.3, data are rarely acquired in this way, so there is a need to interpolate the data into an evenly spaced network. For example, a 1D unevenly spaced dataset can be interpolated into an evenly spaced series of measurements in terms of time or distance. The interpolated data points are often called nodes. Similarly, an uneven 2D distribution of data points, acquired either randomly or along a series of approximately parallel survey lines, is usually interpolated into a regular grid network. Logically, the process is known as gridding (Fig. 2.14a) and the distance between the nodes is the cell size or grid interval.

Gridding is a very common operation in geophysical data processing, and there are various ways in which it can be done. Normally, it is assumed that spatial variations in geophysical parameters will be continuous. Somewhat counter-intuitively, interpolation schemes whose results honour the data points exactly do not usually produce the best results. This is because the data contain both signal and noise, so allowing the gridding algorithm to fit the data to within some prescribed limit helps to reduce the influence of the noise component.

Interpolation is based on an analysis of a window of data points in the vicinity of the node. The window is centred at the node, and when gridding a 2D dataset its shape must be defined. Normally it is circular, but if the data exhibit a well-developed trend direction the window may be elongated parallel to that direction, since this will ensure the trend is preserved in the interpolated data. The size of the window needs to be large enough to enclose a representative sample of measurements, although if it is too large potentially important short-wavelength variations will be lost in the ‘averaging’ of the data within it.

Figure 2.14 Gridding a 2D dataset. (a) Data points in the vicinity of the grid nodes are used to determine an interpolated value at each node. (b) Node-to-station distance-based weighting. (c) A smooth 2D function is fitted to the data and the interpolated value computed from that. In (b) and (c) the grey area represents the region that influences the interpolated value. [Figure labels: cell size, grid nodes, data points, contributing and non-contributing points, surface/function, weight/distance.]

There are two main ways of establishing the value of the parameter at a node: either statistically or using a simple mathematical function. Both methods can be applied to 1D and 2D datasets; the process as applied to gridding is shown in Figs. 2.14b and c. A key concept in gridding is minimum curvature. The human vision system perceives smoothness if the first and second derivatives of the parameter being visualised are continuous. Put simply, if the curvature (the gradient of the gradient) of a line or surface varies gradually, it is perceived as smooth. The spatial variation of the parameter being gridded can be thought of as defining a ‘topographic’ surface. To make this surface appear smooth the values of the grid nodes are adjusted so that the second derivative of the gridded surface varies smoothly, i.e. it has minimum curvature. Some types of geophysical data, e.g. gravity and magnetic fields, are smoothly varying, but there is no direct physical basis for using smoothness as the criterion for interpolation.

2.7.2.1 Statistical interpolation

The statistical approach involves calculating some form of average of the measurements in the window. The median value has the advantage of being immune to the effects of outliers (extreme values) in the data series, which are presumed to be noise. Arithmetic or geometric means can incorporate a system of weights, one for each data point scanned, which, for example, vary in inverse proportion to the distance between the node and the data point. Points closer to the grid node therefore exert a greater influence on the interpolated value than those further away. An extreme form of weighted averaging is to assign a value to the node that is equal to that of the closest data point. This is known as nearest neighbour gridding and it can be effective if the data are already very nearly regularly spaced. If this is not the case the resultant dataset can have an unacceptably ‘blocky’ appearance. All of the gridding algorithms described above are suitable for both randomly distributed and line-based data, and minimum curvature adjustment can be applied.
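As a rough illustration of the window-based averages described above, the following Python sketch estimates the value at a single node using a median, a nearest-neighbour pick or inverse-distance weighting. The function names, the circular window radius and the `power` exponent are assumptions made for this example; they are not taken from any particular gridding package.

```python
import math
from statistics import median


def window_points(node, points, radius):
    """Return (distance, value) pairs for data points inside a circular window."""
    nx, ny = node
    found = []
    for x, y, value in points:
        d = math.hypot(x - nx, y - ny)
        if d <= radius:
            found.append((d, value))
    return found


def grid_node_value(node, points, radius, method="idw", power=2.0):
    """Estimate the value at one grid node; return None if the window is empty."""
    nearby = window_points(node, points, radius)
    if not nearby:
        return None  # no data constrain this node
    if method == "median":
        return median(v for _, v in nearby)
    if method == "nearest":
        return min(nearby)[1]  # value of the closest data point
    # Inverse-distance weighting: weight = 1 / distance**power (assumed form)
    num = den = 0.0
    for d, v in nearby:
        if d == 0.0:
            return v  # node coincides with a data point
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


# Hypothetical stations as (x, y, value); the last one lies outside the window.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0), (5.0, 5.0, 1000.0)]
for method in ("idw", "median", "nearest"):
    print(method, grid_node_value((1.0, 0.0), stations, radius=3.0, method=method))
```

Repeating the call for every node of a regular grid gives a simple gridding routine. An elongated window, as used for data with a strong trend direction, would need a different distance test in `window_points`.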

One of the more sophisticated statistical interpolation methods is kriging, which is widely used in mineral-resource calculations (Davis, 1986). It is a method using weighted moving averages, where low-valued data points are increased and high values are decreased using smoothing factors or kriging coefficients (weights) dependent on both the lateral dispersion of the data points and their values. The method is not commonly used in geophysics, although it can be very useful for small datasets having an uneven distribution of data points of large dynamic range.

2.7.2.2 Function-based interpolation

An advantage of function-based interpolation methods is that particular behaviour of the measured parameter can be incorporated into the interpolation process, most commonly smoothness. By far the most common function-based interpolation methods use splines. A spline, in its original sense, is a thin strip of flexible material used in pre-computer drafting to draw smooth curves. A physical model applicable to 1D data is the drafting spline held in position with weights and distorted so that it passes smoothly through the points to be connected or interpolated. The flexible strip naturally assumes a form having minimum curvature, since elasticity tries to restore its original straightness but is prevented from doing so by the constraining weights. The curve formed is described by piecewise cubic polynomials and is known as a cubic spline.

Splining in the numerical sense is a line-fitting method that produces a smooth curve (De Boer, 2001). Many types of polynomial functions can be used as splines, but cubic polynomials have less possibility of producing spurious oscillations between the data points, a characteristic of some other functions. The cubic spline consists of a series of cubic functions each fitted to pairs of neighbouring data points. They join smoothly at their common points, where the functions have the same gradients and curvature (i.e. the same first and second derivatives). New data values are calculated in the data intervals using the respective function. 1D interpolation using cubic splines is illustrated in Fig. 2.15.
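The construction described above can be sketched in plain Python. This is a minimal example that assumes a 'natural' spline (zero curvature at both ends), which is only one of several possible end conditions; the function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Fit a natural cubic spline to strictly increasing xs; return an evaluator."""
    n = len(xs) - 1  # number of intervals, one cubic per interval
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Right-hand side of the tridiagonal system for the curvature terms c[i]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (ys[i + 1] - ys[i]) / h[i] - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal solve
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution gives the coefficients of each cubic segment
    b = [0.0] * n
    c = [0.0] * (n + 1)  # c[0] = c[n] = 0 is the natural end condition
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Pick the segment containing x, clamped to the first or last segment
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t = x - xs[j]
        return ys[j] + b[j] * t + c[j] * t * t + d[j] * t ** 3

    return evaluate


# Irregularly spaced samples resampled onto regularly spaced nodes
x_data = [0.0, 0.8, 2.1, 2.9, 4.5, 6.0]
y_data = [1.0, 1.6, 0.9, 2.4, 2.0, 1.2]
spline = natural_cubic_spline(x_data, y_data)
nodes = [0.5 * k for k in range(13)]  # 0.0, 0.5, ..., 6.0
resampled = [spline(x) for x in nodes]
```

Each segment shares its value, gradient and curvature with its neighbours at the data points, so the resampled series honours the original samples exactly. For noisy data a smoothing variant that fits to within a tolerance may be preferable.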

Figure 2.15 1D splining. Cubic splines are fitted to each pair of the irregularly spaced data points so that the gradients of connecting splines are the same at the joining point (Δ1 etc). The new values, interpolated at regularly spaced intervals, are obtained from the splines. [Figure labels: data (irregular spacing), component splines 1 to 3, final curve (regular spacing).]

The motivation for splining is a ‘pleasingly’ smooth curve. The smoothness of splines may actually be a disadvantage, since if a parameter varies abruptly the requirement for smoothness may result in spurious features infiltrating the interpolated data. Normally there is some kind of overrun which creates non-existent maxima, minima or inflections: for example a high-amplitude positive anomaly surrounded by a negative ‘moat’. There are various ways of reducing these artefacts. Unlike the cubic spline, the Akima spline uses polynomials based on the slopes of the data points local to the new interpolated point, so it copes well with abrupt variations in the data. An alternative strategy is to introduce tension into the spline (Smith and Wessel, 1990). This involves relaxing the minimum curvature property, but has the advantage of reducing overruns etc. The greater the tension, the less overrun occurs, but the less smooth is the overall interpolation.
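One common way to express the trade-off between smoothness and overrun is as a variational problem. The form below is a sketch under assumed notation (gridded surface z(x, y) and tension factor T between 0 and 1, with the squared Laplacian used as the measure of curvature); it is not quoted from the text or from any specific software.

```latex
% z(x, y): gridded surface; T: tension factor, 0 <= T <= 1 (assumed symbols)
% Functional minimised subject to the data constraints:
J[z] = \iint \Big[ (1 - T)\,\big(\nabla^2 z\big)^2 + T\,\lvert \nabla z \rvert^2 \Big] \, dx \, dy

% Away from the data points the minimiser satisfies
(1 - T)\,\nabla^4 z - T\,\nabla^2 z = 0
```

With T = 0 this reduces to the biharmonic equation of a minimum curvature surface. With T = 1 it reduces to Laplace's equation, whose solutions cannot have maxima or minima away from the data points, which is why increasing the tension suppresses overrun at the cost of smoothness.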

When the data are in the form of sub-parallel lines, gridding is possible based on successive perpendicular 1D interpolation, usually with splines. Firstly, interpolation is done along the (approximately) parallel survey lines to produce an equally spaced along-line distribution of samples. The new samples are then used in a second interpolation perpendicular to the interpolated lines, to compute new samples between adjacent lines. This is known as bi-directional gridding (Fig. 2.16). The physical analogy would be bending a sheet of flexible material so it approximates the form of the variation in the data with a smooth surface. Two-dimensional spline gridding is often used when the data points are irregularly distributed along a series of approximately parallel survey lines. It is not suitable for randomly distributed data, or line-based data where the lines have random directions.
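The two passes of bi-directional gridding can be sketched as follows. To keep the example short it makes two loudly stated simplifications: the survey lines are assumed to be exactly parallel to the x-axis, and piecewise-linear interpolation stands in for the splines that would normally be used. All names are hypothetical.

```python
import math


def interp_1d(xs, vs, x):
    """Piecewise-linear interpolation (a simple stand-in for a 1D spline)."""
    if x <= xs[0]:
        return vs[0]
    if x >= xs[-1]:
        return vs[-1]
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return vs[i] + t * (vs[i + 1] - vs[i])


def regular_nodes(start, stop, cell):
    """Evenly spaced node positions from start up to stop."""
    count = int(math.floor((stop - start) / cell + 1e-9)) + 1
    return [start + k * cell for k in range(count)]


def bidirectional_grid(lines, cell):
    """Grid line-based data; lines maps line position y to [(x, value), ...] sorted by x."""
    line_ys = sorted(lines)
    x_min = max(lines[y][0][0] for y in line_ys)   # along-line extent common to all lines
    x_max = min(lines[y][-1][0] for y in line_ys)
    x_nodes = regular_nodes(x_min, x_max, cell)
    y_nodes = regular_nodes(line_ys[0], line_ys[-1], cell)
    # Pass 1: resample every survey line at the regular x nodes
    along = {}
    for y in line_ys:
        xs = [x for x, _ in lines[y]]
        vs = [v for _, v in lines[y]]
        along[y] = [interp_1d(xs, vs, x) for x in x_nodes]
    # Pass 2: interpolate across the lines, one grid column at a time
    grid = []
    for y in y_nodes:
        row = []
        for col in range(len(x_nodes)):
            column_values = [along[ly][col] for ly in line_ys]
            row.append(interp_1d(line_ys, column_values, y))
        grid.append(row)
    return x_nodes, y_nodes, grid


# Two hypothetical survey lines, 4 units apart, with uneven station spacing
survey = {
    0.0: [(0.0, 0.0), (0.7, 1.4), (2.0, 4.0)],
    4.0: [(0.0, 12.0), (1.1, 14.2), (2.0, 16.0)],
}
x_nodes, y_nodes, grid = bidirectional_grid(survey, cell=1.0)
```

Swapping `interp_1d` for a spline evaluator would bring the sketch closer to the approach described in the text. Real survey lines are only approximately parallel, so a practical implementation would also need to handle line positions that wander.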

2.7.2.3 Interpolation parameters and artefacts

Setting the appropriate spacing between the interpolated values, i.e. the grid cell size, when interpolating data is fundamental in producing a grid that depicts the survey data with a high degree of accuracy (Fig. 2.14a). If the chosen cell size is too small, instability may occur in the algorithm resulting in artefacts (see below). On the other hand, a very large cell size will result in the loss of useful short-wavelength information and introduce spurious long wavelengths because of spatial aliasing (see Section 2.6.1). In practice, the uneven distribution of samples inevitably leads to variable degrees of spatial aliasing within the interpolated dataset.

When the data are random or form a regular grid network, the cell size is usually set at about half the nominal distance between the data points. Calculating minimum or average spacing from the data is rarely useful since the results may be affected by clusters, cf. Fig. 2.10c.

Figure 2.16 Bi-directional gridding of a dataset, comprising a series of approximately parallel survey lines, using splines. [Figure labels: field data, interpolated data points (nodes), locations of data points, 1st 1D spline, 2nd 1D spline, final grid.]

Comparing the final grid with the distribution of data points is an effective way of determining whether features of interest are properly represented, distorted or significantly aliased. The distribution of the points will strongly influence the gridded data, with most gridding-induced artefacts occurring in areas where there are fewer data points to control the gridding process. Ideally the gridding algorithm should automatically refrain from interpolating beyond some specified distance, assigning ‘dummy’ values to nodes that the data do not adequately constrain. If this is not the case then features that occur in gaps in the data, or near its edges, should be viewed with suspicion. Furthermore, features centred on a single data point, referred to as single-point anomalies, must be considered highly unreliable. For these reasons it is good practice to have a map of survey station/point locations available when analysing the data (cf. Fig. 3.18). This is also useful for recognising changes in survey specifications, as inevitably occur when datasets have been merged to form a single compilation (see Section 2.7.3). This can cause changes in the wavelengths that make up the gridded data and may cause artefacts to appear along the join of the different datasets.
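The idea of assigning dummy values to poorly constrained nodes can be illustrated with a short Python sketch. The function name, the brute-force distance search and the use of `None` as the dummy value are assumptions for this example only.

```python
import math


def blank_unconstrained(x_nodes, y_nodes, grid, stations, max_distance, dummy=None):
    """Replace node values lying farther than max_distance from every station."""
    masked = []
    for j, y in enumerate(y_nodes):
        row = []
        for i, x in enumerate(x_nodes):
            # Distance from this node to the closest data point
            nearest = min(math.hypot(x - sx, y - sy) for sx, sy in stations)
            row.append(grid[j][i] if nearest <= max_distance else dummy)
        masked.append(row)
    return masked


# One row of four nodes, with a single hypothetical station at the origin
masked = blank_unconstrained(
    x_nodes=[0.0, 1.0, 2.0, 3.0],
    y_nodes=[0.0],
    grid=[[1.0, 2.0, 3.0, 4.0]],
    stations=[(0.0, 0.0)],
    max_distance=1.5,
)
print(masked)  # [[1.0, 2.0, None, None]]
```

Displaying the blanked grid alongside the station locations makes it easier to see which features are actually supported by measurements.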

When the data to be gridded consist of parallel lines, with station spacing much smaller than the line spacing, the cell size must account for the anisotropic distribution of the measurements. It is not uncommon for the across-line sampling interval to be greater than the along-line interval by a factor of 50 or more in reconnaissance surveys, with 1:10 or 1:20 common for detailed prospect-scale surveys. A cell size based on the along-line sampling interval will create major difficulties for interpolation in the perpendicular direction, potentially creating artefacts (see below). On the other hand, choosing a cell size based on the line spacing will result in the loss of a lot of valuable information contained in the line direction. The normal compromise is to select a cell size of between 1/5 and 1/3 the line spacing. A smaller cell size can only be justified by having closer sampled data, and in particular closer survey lines.
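The rule of thumb above is simple arithmetic, shown here as a small Python sketch. The function names and the example survey dimensions (200 m line spacing, 10 m station spacing) are hypothetical.

```python
def cell_size_range_for_lines(line_spacing):
    """Return (smallest, largest) cell size: 1/5 to 1/3 of the line spacing."""
    return line_spacing / 5.0, line_spacing / 3.0


def sampling_anisotropy(line_spacing, station_spacing):
    """Ratio of across-line to along-line sampling interval."""
    return line_spacing / station_spacing


# Hypothetical detailed survey: lines 200 m apart, stations every 10 m
smallest, largest = cell_size_range_for_lines(200.0)
ratio = sampling_anisotropy(200.0, 10.0)
print(f"cell size between {smallest:.1f} m and {largest:.1f} m, anisotropy 1:{ratio:.0f}")
```

For this example the suggested cell size lies between 40 m and about 67 m, several times coarser than the 10 m station spacing, which is the compromise the text describes.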

Even with these cell sizes, interpolation in the across-line direction can be a challenge for gridding algorithms. Laterally continuous short-wavelength anomalies, as might be associated with a steeply dipping stratigraphic horizon or a dyke, can cause particular problems. The phenomenon is variously referred to as beading, boudinage, steps, step ladders, string of beads, etc. The aeromagnetic data in Fig. 2.17 illustrate the effect. Note how the individual beads have dimensions in the across-line direction equal to the line spacing, allowing this kind of artefact to be easily recognised. As the anomaly trend approaches the survey line direction, individual beads become more elongated towards this direction.

A problem with minimum curvature algorithms (see Section 2.7.2) is their tendency to produce round anomalies, often referred to as bulls-eye anomalies. This occurs because an unconstrained minimum curvature surface is a sphere. Where data points are sparse and the true shapes of features not properly defined by the data sampling, the gridding algorithm makes them circular (Fig. 2.18).

Figure 2.17 Beading in gridded data containing elongate anomalies. The example is aeromagnetic data from South Australia. (a) Close-up of beaded data showing how each bead is centred on a survey line. (b) Data gridded using the inverse-square technique with minimum curvature applied. The rectangle is the area shown in (a). (c) The same data gridded using a trend-enhancement algorithm. Data reproduced courtesy of Department of Manufacturing, Innovation, Trade, Resources and Energy, South Australia. [Figure labels: survey lines, scale bar 0 to 10 kilometres.]

Figure 2.18 Circular anomalies produced by the minimum curvature gridding illustrated in contours of IP-phase response (where IP = induced polarisation), at a constant pseudo-depth, from the Olympic Dam IOCG deposit in South Australia. Positive (A) and negative (B) circular anomalies are caused by inadequate sampling of the anomalous areas. Based on a diagram and data from Esdale et al. (2003). [Figure labels: Olympic Dam Breccia Complex, data points, scale bar in kilometres.]

Both beading and bulls-eyes are the result of inadequate sampling by the survey. The solution is more measurements, but this may be impractical. The problem can be imperfectly addressed if information about geological/anomaly trends is known or can be inferred. Some gridding algorithms include a user-defined directional bias, referred to as trend enhancement. This has the disadvantage of assuming a consistent trend across the entire survey area, but can be highly effective if this assumption is valid (Fig. 2.17c). An alternative approach is to use the data to estimate between-line values and use these in the gridding process. The most effective remedy is to measure gradients of the field in the across-line direction, which act as constraints on the permissible interpolated values and enhance across-line trends (O’Connell et al., 2005).

There is an extensive literature on gridding methods, for example Braile (1978) and Li and Götze (1999). Different methods perform better in different circumstances, but both inverse-square distance with minimum curvature and spline algorithms are widely used for geophysical data.

In document Geophysics for the Mineral Exploration Geoscientist.pdf (Page 53-57)