4.2 Methodology

4.2.1 Principal Component Analysis

Principal component analysis (PCA) is commonly used in fields other than geophysics. In statistics, PCA reduces the number of variables in a data set by transforming correlated variables into a smaller number of variables, or principal components, that are linearly uncorrelated with, or orthogonal to, each other. The majority of the variance in the data set is captured in the first few principal components. Thus, a very large matrix of observations may be re-represented as a significantly smaller matrix when redundancies are found within the data (Jolliffe, 2002). In other fields, PCA is known as the Karhunen-Loeve transform, the Hotelling transform, and the Eckart-Young theorem.

Here, since the horizontal channels of a seismometer record the same signal simultaneously, the data from the two channels are clearly correlated. PCA reduces the signals recorded over the two channels to a single principal component that captures most of the variance shared by both, and by simple vector analysis one may see that the principal component thus points to the direction from which the signal originates, with an ambiguity of 180 degrees. The method of using PCA to determine directionality from two horizontal vectors has been used elsewhere to determine tool orientation in downhole seismic surveys (Michaels, 2001). Eigenvector/eigenvalue analysis, of which PCA is one form, has also been used to investigate the source of volcanic tremor (Ereditato and Luongo, 1994). However, the application to the case of determining ambient signal directionality is new. PCA also offers a way to directly detect and image off-path signal bounces that contribute to the energy recorded at seismic stations. Eliminating the off-path contributions avoids contamination from phase ambiguities and allows one to focus on noise traveling between each station pair in the ground. The Matlab code written by the author to conduct PCA and implement quality control parameters is given in Appendix B.

The broadband seismometers installed at stations AKGG, AKLV, AKRB, and AKUT on Akutan are all three-component instruments. PCA uses the recordings from the north and east (horizontal) channels of each instrument. In general, PCA is set up by arranging the data recorded over the time series by the north and east channels into a matrix, $\xi$, of the form

$$\xi_n = \begin{bmatrix} E_1 & E_2 & \cdots & E_n \\ N_1 & N_2 & \cdots & N_n \end{bmatrix}^{T} \qquad (4.1)$$

where $E_n$ are the $n$ samples in the time series recorded on the east channel, $N_n$ are the $n$ samples in the time series recorded on the north channel, and $T$ is the matrix transpose. Each of the $(1, \ldots, n)$ two-dimensional elements of $\xi_n$ is, physically, a position in the east-north plane of particle motion such that each two-component (east, north) element of $\xi_n$ is a vector variable of the particle motion throughout the period of observation. To accommodate analysis over a very long time series, the matrix is analyzed over a certain number of time samples in a window as:

$$\xi_{\mathrm{win}} = \begin{bmatrix} E_1 & E_2 & \cdots & E_{\mathrm{win}} \\ N_1 & N_2 & \cdots & N_{\mathrm{win}} \end{bmatrix}^{T}, \qquad (4.2)$$

where the variable $\mathrm{win}$ is the length of the window. An idea that emerged over the course of this study concerned the possibility that some frequencies offer more directional signals than others. To investigate, each window of data was narrowband filtered with a zero-phase eighth-order Butterworth bandpass filter applied with the Matlab FILTFILT command. Center frequencies of the bandpass filter ranged from the lowest frequency of interest, 0.01 Hz, to the highest frequency of interest, 1 Hz, and the pass band extended 0.001 Hz on either side of each center frequency. The window length was set to the number of samples that covers one period (100 s) at a frequency of 0.01 Hz. This long window provides a wider aperture, allowing one to look at more closely spaced frequencies, and also ensures that each frequency may be present in each window. The functions bpfilt3.m and pca2staGen.m in Appendix B define the filters and show the implementation. The choice of a fixed window length, rather than one that depends on the frequency under investigation, and the consequences of this choice are discussed further in Chapter 6. For a general investigation of the variation in signal arrival direction with frequency and to speed runtime, a wider bandpass filter was used for the generation of Figures 4.1 and 4.3 in this section.

An advance of 200 samples between successive windows was chosen to ensure that consecutive windows overlap substantially as the analysis slides along the time series, but this advance may be increased to reduce the runtime of the codes provided in Appendix B because larger advances reduce the number of windows generated. In keeping with the decision to avoid the assumption of temporal signal stationarity, the mean of the signal was removed from each window individually instead of from the entire signal. This allows one to accommodate times with high seas or increased ship traffic.

Since PCA focuses on defining a new set of orthogonal variables, the next step in the derivation takes the definition of the scalar inner product of the north and east components as:

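$$\langle E, N \rangle = \sum_{i=1,\ldots,\mathrm{win}} E_i N_i = E^{T} N, \qquad (4.3)$$

and the mean of the east component as:

$$\bar{E}_{\mathrm{win}} = \frac{1}{\mathrm{win}} \sum_{i=1,\ldots,\mathrm{win}} E_i, \qquad (4.4)$$

and similarly, for the north component as:

$$\bar{N}_{\mathrm{win}} = \frac{1}{\mathrm{win}} \sum_{i=1,\ldots,\mathrm{win}} N_i. \qquad (4.5)$$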
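
As a rough illustration of the windowing in Equation 4.2 and the per-window mean removal in Equations 4.4 and 4.5, the following Python sketch slides a fixed-length window along two equal-length channels. It is not the author's Matlab code from Appendix B: the function name `demeaned_windows` and its arguments are hypothetical, and the narrowband filtering step is omitted.

```python
def demeaned_windows(east, north, win, advance):
    """Yield (start, e_win, n_win) for each window of `win` samples.

    The mean is removed from each window individually rather than from
    the whole record, so no temporal stationarity is assumed.
    `advance` is the number of samples between successive window starts.
    """
    if len(east) != len(north):
        raise ValueError("east and north must have the same number of samples")
    if win <= 0 or advance <= 0:
        raise ValueError("win and advance must be positive")

    # Only full-length windows are produced; a trailing partial window is dropped.
    for start in range(0, len(east) - win + 1, advance):
        e = east[start:start + win]
        n = north[start:start + win]
        e_mean = sum(e) / win  # window mean of the east channel (Eq. 4.4)
        n_mean = sum(n) / win  # window mean of the north channel (Eq. 4.5)
        yield start, [x - e_mean for x in e], [x - n_mean for x in n]
```

Calling `demeaned_windows(east, north, win, 200)` reproduces the 200-sample advance described above, so consecutive windows share `win - 200` samples whenever `win` exceeds 200.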

Then, the covariance matrix is defined by taking the outer product:

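$$\mathrm{Cov} = \begin{bmatrix} \langle (E-\bar{E}), (E-\bar{E}) \rangle & \langle (E-\bar{E}), (N-\bar{N}) \rangle \\ \langle (N-\bar{N}), (E-\bar{E}) \rangle & \langle (N-\bar{N}), (N-\bar{N}) \rangle \end{bmatrix}. \qquad (4.6)$$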
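
Equations 4.3 through 4.6 can be checked numerically with a short Python sketch. It assumes the covariance matrix is left unnormalized, as Equation 4.6 is written; dividing by the number of samples would rescale the eigenvalues but leave the eigenvectors and the eigenvalue ratio unchanged. The names `inner` and `covariance_matrix` are illustrative and do not come from Appendix B.

```python
def inner(a, b):
    """Scalar inner product of two equal-length sequences (Eq. 4.3)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x * y for x, y in zip(a, b))


def covariance_matrix(east, north):
    """Return the 2x2 matrix of Eq. 4.6 as [[c_ee, c_en], [c_ne, c_nn]].

    Assumption: no 1/win normalization is applied, matching Eq. 4.6 as written.
    """
    win = len(east)
    if win == 0 or win != len(north):
        raise ValueError("east and north must be non-empty and equal in length")

    e_bar = sum(east) / win   # Eq. 4.4
    n_bar = sum(north) / win  # Eq. 4.5
    e = [x - e_bar for x in east]
    n = [x - n_bar for x in north]

    c_en = inner(e, n)  # the matrix is symmetric, so c_ne equals c_en
    return [[inner(e, e), c_en], [c_en, inner(n, n)]]
```

For example, `covariance_matrix([1, 2, 3], [2, 4, 6])` describes perfectly linear particle motion, so the matrix it returns has a zero determinant.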

The principal component directions are the eigenvectors of the covariance matrix. The azimuthal direction for the approaching ambient seismic noise is then identified by taking the four-quadrant arctangent of the eigenvectors. The eigenvector associated with the largest eigenvalue is, physically, the long dimension of the particle motion ellipse. In the quality control component of the PCA algorithm, the ratio between the maximal and minimal radii of the ellipse traced by the particle motion is used to determine whether a data window is classified as good or bad, as noted below.
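
A minimal Python sketch of this last step, assuming the covariance matrix entries are already available, uses the closed-form eigen-decomposition of a symmetric 2x2 matrix. Two conventions here are assumptions rather than statements from the text: azimuth is measured clockwise from north and folded into the range 0 to 180 degrees to reflect the 180-degree ambiguity, and the ellipse radii are taken as proportional to the square roots of the eigenvalues. The function names are hypothetical, and no acceptance threshold is implied.

```python
import math


def principal_direction(c_ee, c_en, c_nn):
    """Eigen-analysis of the symmetric matrix [[c_ee, c_en], [c_en, c_nn]].

    Returns (azimuth_deg, lam_max, lam_min), where azimuth_deg is the
    orientation of the eigenvector belonging to the largest eigenvalue.
    Assumption: azimuth is clockwise from north, folded modulo 180 degrees.
    """
    half_trace = 0.5 * (c_ee + c_nn)
    radius = math.hypot(0.5 * (c_ee - c_nn), c_en)
    lam_max = half_trace + radius
    lam_min = half_trace - radius

    # Angle of the major axis measured from the east axis toward north.
    theta = 0.5 * math.atan2(2.0 * c_en, c_ee - c_nn)
    v_east, v_north = math.cos(theta), math.sin(theta)

    # Four-quadrant arctangent of the eigenvector components.
    azimuth_deg = math.degrees(math.atan2(v_east, v_north)) % 180.0
    return azimuth_deg, lam_max, lam_min


def axis_ratio(lam_max, lam_min):
    """Ratio of the major to minor ellipse radii.

    Assumption: radii scale with the square roots of the eigenvalues.
    Returns infinity for perfectly linear motion (lam_min <= 0).
    """
    if lam_min <= 0.0:
        return math.inf
    return math.sqrt(lam_max / lam_min)
```

A large ratio indicates strongly polarized, nearly linear particle motion, while a ratio near one indicates nearly circular motion for which the direction is poorly defined.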