
Chapter 7: Arabian Sea Project Methodologies

7.7 Data Analysis

In order to explore variation within the XRF data set generated from the high-resolution scanning measurements of cores from ODP Site 721/722, a set of time-series analyses and statistical models was developed using a number of different methods. Time-series analysis aims to investigate the temporal behaviour of one or several variables, which can be random, clustered, cyclic or chaotic through time. These methods are briefly summarised in the following sections.

7.7.1 Generalised linear and additive modelling

Generalised linear models (GLMs) are distribution-driven models with a linear predictor, in which the distribution of the response variable is defined as belonging to a particular member of the exponential family, such as the Gaussian, Poisson, binomial, geometric or gamma distribution. GLMs therefore require an a priori statistical model to define the distribution of the response variable (Yee and Mitchell, 1991). GLMs do not always have sufficient flexibility to model the true regression surface adequately; this is particularly true in the natural world when analysing environmental variables.

Generalised additive models (GAMs) are a non-parametric extension of GLMs in which the sum of regression coefficients and explanatory variables in a linear regression is replaced by the sum of smoothed functions of the explanatory variables, i.e. the linear predictor is replaced by a smooth predictor (Simpson and Anderson, 2009). The models are data-driven: the complexity of the smoothing function is characterised by the number of degrees of freedom, and the resulting fitted values are not determined from an a priori model, as is the case with GLMs (Yee and Mitchell, 1991). This makes GAMs a particularly useful tool for the exploration of data, as they are able to deal with non-linear data structures.
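The difference between the two predictors can be written compactly. The notation below is illustrative rather than taken from the cited sources: g is assumed to be the link function, x_j the explanatory variables, β_j the regression coefficients and f_j the smooth functions.

```latex
% GLM: the link-transformed mean is a linear function of the explanatory variables
g\big(\mathbb{E}[y]\big) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j

% GAM: each linear term \beta_j x_j is replaced by a smooth function f_j(x_j)
g\big(\mathbb{E}[y]\big) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
```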

A generalised additive mixed model (GAMM) contains both fixed and random effects. Fixed effects are the standard representation of variables in a linear model, whereas random effects arise from additional sources of variability within the data caused by grouping structures (Simpson and Anderson, 2009). The GAMM used here is more complex and takes significantly longer (approximately 5 days) to run, but generates a more statistically accurate trendline and includes a function to explore the autocorrelation structure within the data.

The different types of models were tested to determine the best way to explore trends within the XRF elemental ratios such as Ti/Al, which is commonly adopted as a proxy for terrigenous content. The models were run using either the raw or log-transformed data. Models were run using the software R (v. 2.12.1, 2010). A gamma GLM was found to be the best way to analyse the data and plots were generated. The gamma GLM uses the same systematic fit as a normal linear model but with an additional random part which has a gamma distribution. The model was fitted with an adaptive smoother which allows for periods where there is limited structure within the data and adapts for periods where there is evidence of more rapid change. Confidence intervals (99%) and derivatives were plotted for each variable.
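As an illustration of what fitting a gamma GLM involves, the following minimal Python sketch fits a single-predictor gamma model by iteratively reweighted least squares (IRLS). It is not the R code used in this study. The log link, the function name `fit_gamma_glm_log_link` and the synthetic data are all assumptions made for the example, and no adaptive smoother is included.

```python
import math
import random


def fit_gamma_glm_log_link(x, y, n_iter=50, tol=1e-10):
    """Fit E[y] = exp(b0 + b1 * x) for positive, gamma-distributed y by IRLS.

    Assumption: log link. For a gamma response with a log link the IRLS
    weights are all equal to 1, so each iteration is an ordinary least-squares
    regression of the working response z on x.
    """
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    eta = [math.log(yi) for yi in y]  # starting values (requires y > 0)
    b0 = b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(e) for e in eta]
        # Working response: z = eta + (y - mu) * d(eta)/d(mu) = eta + (y - mu) / mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        z_mean = sum(z) / n
        new_b1 = sum((xi - x_mean) * (zi - z_mean) for xi, zi in zip(x, z)) / sxx
        new_b0 = z_mean - new_b1 * x_mean
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        if converged:
            break
    return b0, b1


# Synthetic, strictly positive data standing in for an elemental ratio
# (hypothetical values, not data from ODP Site 721/722).
random.seed(1)
shape = 50.0  # gamma shape parameter; larger values give less scatter
x = [5.0 * i / 499 for i in range(500)]
y = [random.gammavariate(shape, math.exp(0.5 + 0.2 * xi) / shape) for xi in x]

b0_hat, b1_hat = fit_gamma_glm_log_link(x, y)
print("intercept:", b0_hat, "slope:", b1_hat)
```

The estimated intercept and slope should lie close to the values used to simulate the data (0.5 and 0.2). In practice a statistical package would also report standard errors, from which confidence intervals such as those plotted here can be derived.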

7.7.2 Spectral analysis

The adoption of GLMs and GAMs allows the examination of changes and trends in the XRF data sets from ODP Site 721/722. However, in order to understand the more detailed complexities of periodicities present within the data, it is necessary to use auto-spectral and cross-spectral analysis. Auto-spectral analysis aims to describe variation within a single variable through time as a function of frequency or wavelength, using methods such as the Blackman-Tukey method, which is popular in palaeoclimate studies and uses a complex Fourier transform of an autocorrelation sequence (Trauth, 2010). One drawback of the Blackman-Tukey method (and of other spectral analysis techniques such as the Welch method) is that it requires data to be evenly spaced in time. This can be problematic in the Earth Sciences, where material is typically sampled at constant depth intervals which are unevenly spaced in time. An extra step is therefore required, in which the data are interpolated onto an evenly spaced time axis. Interpolation is commonly performed using either linear or cubic-spline interpolation, although both can introduce artefacts into the data set. An alternative method for performing spectral analysis on unevenly spaced data is the Lomb-Scargle power spectrum algorithm (Lomb, 1976; Scargle, 1982). The advantage of this technique over other methods is that it does not require data to be first interpolated at evenly spaced time intervals, thus avoiding the introduction of further uncertainty to the data. A more detailed review and discussion of the method can be found in Schulz and Stattegger (1998).
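The Lomb-Scargle calculation can be sketched in a few lines of Python. This is a minimal illustration of the classical periodogram, normalised by the sample variance, and not the MATLAB routine used in this study. The function name `lomb_scargle`, the 23-kyr test cycle, the noise level and the frequency grid are all assumptions chosen for the example.

```python
import math
import random


def lomb_scargle(t, y, freqs):
    """Classical Lomb-Scargle periodogram for unevenly spaced samples.

    t     : sample times (need not be evenly spaced)
    y     : observations at those times
    freqs : positive frequencies (cycles per unit time) to evaluate
    Returns power normalised by the sample variance of y.
    """
    n = len(t)
    y_mean = sum(y) / n
    var = sum((v - y_mean) ** 2 for v in y) / (n - 1)
    dev = [v - y_mean for v in y]
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The time offset tau makes the result independent of the time origin.
        s2 = sum(math.sin(2.0 * w * ti) for ti in t)
        c2 = sum(math.cos(2.0 * w * ti) for ti in t)
        tau = math.atan2(s2, c2) / (2.0 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc = sum(d * ci for d, ci in zip(dev, c))
        ys = sum(d * si for d, si in zip(dev, s))
        cc = sum(ci * ci for ci in c)
        ss = sum(si * si for si in s)
        power.append((yc * yc / cc + ys * ys / ss) / (2.0 * var))
    return power


# Synthetic example: 300 irregularly spaced ages over 600 kyr carrying a
# 23-kyr cycle plus Gaussian noise (hypothetical values, not study data).
random.seed(0)
ages = sorted(random.uniform(0.0, 600.0) for _ in range(300))
signal = [math.sin(2.0 * math.pi * a / 23.0) + random.gauss(0.0, 0.5) for a in ages]

# Frequency grid from 1/200 to 1/10 cycles per kyr.
freqs = [0.005 + i * (0.1 - 0.005) / 399 for i in range(400)]
power = lomb_scargle(ages, signal, freqs)

best_period = 1.0 / freqs[power.index(max(power))]
print("dominant period (kyr):", best_period)
```

No interpolation step appears anywhere in the calculation: the sines and cosines are evaluated directly at the irregular sample ages, which is the property that makes the method attractive for depth-sampled records.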

Another method of examining time-series data and identifying changes in dominant cyclicities through time is through the use of wavelet analysis. Wavelet analysis is an increasingly used technique to explore variation within palaeoclimate data by analysing local variations in time-frequency space using the wavelet transform. This enables both the analysis of the dominant modes of variability within the data and the determination of how these modes change through time (Torrence and Compo, 1998). In contrast to the Fourier transform, the wavelet transform uses base functions or wavelets which are small packets of

frequency domain is performed using a mother wavelet ψ(t). The most popular mother wavelet in Earth Sciences is the Morlet wavelet. For a more detailed explanation of wavelet analysis and its use as a tool for palaeoclimate analysis, see Torrence and Compo (1998) and Trauth (2010).
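To make the idea of time-localised spectral power concrete, the sketch below evaluates a Morlet wavelet transform by direct summation at selected times and periods. It follows the Morlet definition and normalisation given by Torrence and Compo (1998), with the non-dimensional frequency set to 6, and assumes an evenly spaced series. The function names, the test signal and the candidate periods are hypothetical choices for illustration, and this is not the MATLAB code used in this study.

```python
import cmath
import math

OMEGA0 = 6.0  # non-dimensional frequency of the Morlet wavelet (assumed value)


def morlet(eta, omega0=OMEGA0):
    """Morlet mother wavelet: a complex sinusoid inside a Gaussian envelope."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * eta) * math.exp(-eta * eta / 2.0)


def period_to_scale(period, omega0=OMEGA0):
    """Wavelet scale whose equivalent Fourier period equals `period`."""
    return period * (omega0 + math.sqrt(2.0 + omega0 ** 2)) / (4.0 * math.pi)


def wavelet_power(x, dt, index, period):
    """Wavelet power of the evenly spaced series x at one time index and period."""
    s = period_to_scale(period)
    norm = math.sqrt(dt / s)
    w = sum(
        xk * (norm * morlet((k - index) * dt / s)).conjugate()
        for k, xk in enumerate(x)
    )
    return abs(w) ** 2


def dominant_period(x, dt, index, periods):
    """Candidate period with the highest wavelet power at the given time index."""
    return max(periods, key=lambda p: wavelet_power(x, dt, index, p))


# Synthetic series sampled every 1 kyr: a 41-kyr cycle for the first 300 kyr,
# then a 100-kyr cycle (hypothetical values, not study data).
dt = 1.0
series = [
    math.sin(2.0 * math.pi * n / 41.0) if n < 300 else math.sin(2.0 * math.pi * n / 100.0)
    for n in range(600)
]

candidates = [23.0, 41.0, 100.0]
early = dominant_period(series, dt, 150, candidates)
late = dominant_period(series, dt, 450, candidates)
print("dominant period at 150 kyr:", early)
print("dominant period at 450 kyr:", late)
```

A Fourier spectrum of the whole series would show both cycles without indicating when each occurs, whereas the wavelet power identifies which cycle dominates at each point in time. A full analysis would evaluate every time step and a continuous range of scales, and would account for edge effects near the ends of the record.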

A number of different techniques for spectral analysis were performed on the Ti/Al data set from ODP Site 721/722. The Lomb-Scargle method was found to be the best technique for spectral analysis. Wavelet analysis was also performed and the resulting model plotted alongside variations in Ti/Al through time. A low-pass filter of 800 years was applied prior to analysis in order to eliminate some of the high-frequency noise within the data set without compromising the resulting power spectra. All analyses and plotting of data were performed using MATLAB version 7.10.0 (R2010a) for Windows.

7.8 Summary

Sediments from ODP Sites 721 and 722 were sampled for analysis of Plio-Pleistocene changes in parameters related to monsoon intensity. Discrete samples, taken every 10 cm, were analysed for bulk organic and inorganic isotope composition from benthic foraminifera. These were used to construct a revised oxygen isotope chronology for ODP 721/722, tied to the LR04 global benthic stack (Lisiecki and Raymo, 2005). Core halves were scanned at high resolution (2 mm) for changes in elemental composition using an XRF scanner. Data sets were then analysed using a range of time-series analysis techniques to explore variation and trends within the data.

Chapter 8:

A 600-kyr Record of Dust Transport to the