Plane Wave Parameter Estimation Using GPS Estimates of Total Electron Content in a Neural Network


PLANE WAVE PARAMETER ESTIMATION USING GPS ESTIMATES OF TOTAL ELECTRON CONTENT IN A NEURAL NETWORK

BY

AARON SMITH

THESIS

Submitted in partial fulfillment of the requirements

for the degree of Master of Science in Electrical and Computer Engineering in the Graduate College of the

University of Illinois at Urbana-Champaign, 2018

Urbana, Illinois Adviser:

(3)

ABSTRACT

Global Positioning System (GPS) signals provide us with a unique opportunity to continually monitor the free electron density in the ionosphere. Physical phenomena, such as tsunamis, have been shown to create wave features in the free electron density. The parameterization of these waves is of interest to the scientific community. Here, we investigate the application of neural networks as our parameter estimator. In this study, we provide a background on the use of GPS signals, as used to quantify the total number of free electrons between a satellite and a receiver. Following this, we provide an analysis of the neural network, starting from a basic neuron, and discuss the means by which a network is able to perform both classification and regression. We then describe in detail the methodology we use to construct a network which utilizes Doppler frequency and velocity information to estimate the wave heading, wavelength, and frequency of a plane wave. After an evaluation of our simulated environment, we apply our network to GPS data captured during the 11 March 2011 Tohoku tsunami.


TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION

CHAPTER 2 BACKGROUND

2.1 Identifying Total Electron Content Using GPS Signals

2.2 Neural Networks

CHAPTER 3 SIMULATION

3.1 Generation of Training Data

3.2 The Neural Network

3.3 Simulation Evaluation

CHAPTER 4 CASE STUDY

4.1 Receiver Locations

4.2 Data Pre-processing

4.3 Results

CHAPTER 5 CONCLUSION


CHAPTER 1

INTRODUCTION

The ionosphere is composed of charged gases that alter the phase and group velocities of radio waves, such as GPS signals. This property allows us to remotely sense the Total Electron Content (TEC) along a path between a ground-based receiver and the transmitting satellite. Tsunamis have long been known to produce internal gravity waves (IGWs), which propagate obliquely upward. These IGWs were first identified using GPS following an earthquake in Peru [Artru et al., 2005]. Following this, several tsunami-generated traveling ionospheric disturbances (TIDs) have been identified using GPS observations.

An internal atmospheric gravity wave is a physical mechanism which propagates energy and momentum vertically through the atmosphere. A pioneering interpretation of this process was put forth by Hines [1960]. In this work, the author describes the propagation of internal atmospheric gravity waves, which are driven by compressive and gravitational forces. Later, Hooke [1968] formalized this interpretation using a perturbation approach, which connected atmospheric gravity waves to ion production, chemical loss, and motion of ionization.

The changes to TEC due to atmospheric gravity waves are explored in Davis [1973]. As gravity waves interact with the charged gases of the ionosphere, they form a TID. In some cases, these TIDs can be related back to the source phenomenon. In these cases, a TID is found by first identifying a significant event, such as an earthquake or a tornado, and then processing the GPS data to measure the TEC, which may contain a TID. In this work, we attempt to train a neural network to identify wave parameters in TEC data. Ideally, such a system could be used to estimate wave parameters over a large dataset, possibly illuminating source phenomena which have been overlooked in the past.


In Chapter 2, we review the concept of the TEC field. We then describe a method for measuring TEC, using two different GPS carrier frequencies and a fixed ground-based receiver. After this, we provide an introduction to the neural network. Our study of the neural network begins with a mathematical description of the neuron and a demonstration of the neuron's ability to act as a logic gate. We then build on this concept to perform linear classification and regression. After discussing the individual neuron, we extend our analysis to a neural network, where we describe the process through which a network of neurons is capable of performing non-linear estimation. Finally, we conclude with a description of the backpropagation algorithm.

In Chapter 3, we discuss a set of assumptions made in our formulation of the TID parameter estimation problem. We model the TEC field as a two-dimensional surface and a pierce point as a collector, traveling with constant velocity on this two-dimensional surface, collecting TEC samples. Due to the motion of the pierce point, the TID frequency, as observed by a pierce point, will be shifted due to the Doppler effect. Assuming the TID is a uniform plane wave, we hypothesize that, with multiple pierce points of varying velocity, we can utilize the shifts in the measured TID frequency and the pierce point velocities to identify the properties of the underlying wave. In this chapter, we discuss our method for generating simulated training data. After this, we present the neural network architecture, which was trained to solve the two-dimensional case. We conclude the chapter with an analysis of the simulated environment.

In Chapter 4, we apply our trained neural network to GPS data from March 11, 2011. On this day, an earthquake off the coast of Japan generated a tsunami which has been shown to have created TIDs that were detectable on the West Coast of the United States [Azeem et al., 2017]. In this chapter, we discuss the Continuously Operating Reference Station (CORS) network of GPS receivers and the pre-processing steps used to prepare the GPS data for input into the trained neural network. Following this, we present the results of our estimations, followed by a discussion of the outcome. We conclude in Chapter 5 with an overall summary of the study and a discussion of possible future work.


CHAPTER 2

BACKGROUND

In the following chapter, we review the concept of the TEC field and then describe the dual-frequency method for TEC determination. We then briefly discuss the history of the neural network, followed by an introduction to the neuron base elements. After this, we provide examples of classification and regression using both the neuron and a neural network and conclude with a description of the backpropagation algorithm.

2.1 Identifying Total Electron Content Using GPS Signals

Each GPS satellite contains a highly precise atomic clock which is used to encode the transmitted signals with timing data. When these time encoded signals reach a receiver on the ground, we can use that timing information to calculate the distance between the receiver and the satellite, or similarly, the phase advance of the carrier signal. If we know the fixed position of our receiver (using land surveying equipment) and we know the position of the satellite (using the satellite ephemerides), then we already know the true distance between the receiver and the satellite. The difference between the actual path length and calculated path length then tells us something about the change in the speed of the signal. The main factor in this is the ionosphere, which changes the signal speed due to refraction. In this section, we first describe the notation and geometry for TEC, and then we discuss a method for determining TEC experimentally.


2.1.1 Total Electron Content

The ionosphere is a region of the atmosphere which begins at about 50 km above the Earth's surface and extends to approximately 1000 km. This region of our atmosphere absorbs radiation from the sun, which splits molecules into a mixture of free electrons and ions. This process of ionization follows the day/night cycle, as the ionized gases begin to recombine once the UV radiation from the sun is removed.

TEC defines the total number of free electrons along a ray path and is calculated as

$$\mathrm{TEC} = \int_{l} n_e(l)\,dl \tag{2.1}$$

where $l$ is the ray path between a satellite and the receiver, and $n_e$ is the electron density at a point along that path. We note that this column integration has a cross-sectional area of 1 m$^2$. Additionally, TEC is commonly measured in TEC units (TECU), a normalizing unit where 1 TECU = $10^{16}$ electrons/m$^2$.

A common simplification of the ionosphere, made when interpreting TEC measurements obtained from GPS receivers, is to assume that the free electrons are all located on a shell at a height of $I_h$ above the Earth's surface. A ray path between the transmitting satellite and the receiver would intersect this shell at a particular location called the ionospheric pierce point (IPP). A diagram of this is shown in Figure 2.1. Assuming a uniform density in the latitudinal and longitudinal directions, the TEC would be minimized when the satellite is in the zenith direction of the receiver, as this minimizes the path length through the ionosphere. This is generally referred to as vertical TEC (VTEC).

Again assuming that the electron density is uniform in the latitudinal and longitudinal directions, we can convert TEC to VTEC using the geometric relationship:

$$\mathrm{TEC}(\zeta) = \frac{1}{\cos(\zeta')}\,\mathrm{TEC}_V \tag{2.2}$$

We use $\zeta$ to denote the zenith angle, the angle between a purely vertical vector and the vector pointing to the satellite. The elevation angle is $el = 90^\circ - \zeta$. At the pierce point, we define the zenith angle as $\zeta'$, which can differ from $\zeta$ due to the spherical nature of the geometry.
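As a minimal sketch of Equation 2.2, the conversion from slant TEC to VTEC can be written as below. The text does not give an expression for $\zeta'$; the sketch assumes the standard thin-shell geometry, in which the law of sines gives $\sin\zeta' = \frac{R_E}{R_E + I_h}\sin\zeta$. The Earth radius of 6371 km, the default shell height of 350 km, and all function names are illustrative assumptions rather than values taken from this study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (not from the text)


def pierce_point_zenith_deg(zenith_deg, shell_height_km=350.0):
    """Zenith angle zeta' at the pierce point, from the receiver zenith angle zeta.

    Assumes a thin spherical shell at shell_height_km (an illustrative default).
    """
    zeta = math.radians(zenith_deg)
    ratio = EARTH_RADIUS_KM / (EARTH_RADIUS_KM + shell_height_km)
    return math.degrees(math.asin(ratio * math.sin(zeta)))


def slant_to_vertical_tec(slant_tec, zenith_deg, shell_height_km=350.0):
    """Invert Equation 2.2: TEC_V = TEC(zeta) * cos(zeta')."""
    zeta_p = math.radians(pierce_point_zenith_deg(zenith_deg, shell_height_km))
    return slant_tec * math.cos(zeta_p)
```

For a satellite directly overhead the two quantities coincide; at lower elevations the slant value is scaled down, because the ray crosses the shell obliquely.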

Figure 2.1: A diagram of the path between a GPS satellite and a ground-based receiver. The diagram labels the ionospheric shell at height $I_h$, the zenith angles $\zeta$ and $\zeta'$, the pierce point (IPP), and the Earth's center.

2.1.2 TEC Estimation Using Dual-Frequency GPS Observations

GPS satellites use two different carrier frequencies to transmit the modulated message: L1 = 1575.42 MHz and L2 = 1227.6 MHz. In a dispersive medium, like the ionosphere, the phase and group velocities of a GPS signal will change as a function of the carrier frequency and the electron density. Here, we use $n_p$ to denote the phase refractive index. This is calculated as

$$n_p = \frac{c}{v_p} \tag{2.3}$$

where $v_p$ is the phase speed of the signal and $c$ is the speed of light in vacuum. The phase refractive index can be approximated to the first order as

$$n_p \approx 1 - \frac{40.3\, n_e}{f^2} \tag{2.4}$$

where $n_e$ is the electron density and $f$ is the frequency of the signal [Misra and Enge, 2011]. Using this refractive index, we can calculate the time it would take a signal to reach our receiver as

$$\tau = \frac{1}{c}\int_{l} n_p(l)\,dl \tag{2.5}$$

Similarly, we can calculate the time it would take a signal to reach the receiver, in the absence of the ionosphere, as

$$\tau = \frac{1}{c}\int_{l} 1\,dl \tag{2.6}$$

By combining Equations 2.1, 2.3, 2.5, and 2.6, we obtain the propagation delay caused by the ionosphere as a function of c, f, and TEC.

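$$
\begin{aligned}
\Delta\tau &= \frac{1}{c}\int_{l} n_p(l)\,dl - \frac{1}{c}\int_{l} 1\,dl \\
&= -\frac{1}{c}\int_{l} \bigl(1 - n_p(l)\bigr)\,dl \\
&= -\frac{1}{c}\int_{l} \frac{40.3\, n_e(l)}{f^2}\,dl \\
&= -\frac{40.3\,\mathrm{TEC}}{c f^2}
\end{aligned}
\tag{2.7}
$$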
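To give a sense of scale for Equation 2.7, the following sketch evaluates the phase delay at the two GPS carrier frequencies for a given TEC. The function name and the example value of 10 TECU are illustrative assumptions; the constant 40.3 and the carrier frequencies are those quoted in this section.

```python
C = 299_792_458.0   # speed of light in vacuum, m/s
F_L1 = 1575.42e6    # L1 carrier frequency, Hz
F_L2 = 1227.60e6    # L2 carrier frequency, Hz
TECU = 1.0e16       # electrons per square meter in one TEC unit


def phase_delay_seconds(tec_electrons_m2, freq_hz):
    """Equation 2.7: delta_tau = -40.3 * TEC / (c * f^2)."""
    return -40.3 * tec_electrons_m2 / (C * freq_hz ** 2)


# Illustrative value only: 10 TECU along the ray path.
delay_l1 = phase_delay_seconds(10 * TECU, F_L1)
delay_l2 = phase_delay_seconds(10 * TECU, F_L2)
range_error_l1_m = C * delay_l1  # equivalent range change on L1, meters
```

Under these assumptions the L1 range change has a magnitude of roughly 1.6 m, and the L2 value is larger by the factor $(f_{L1}/f_{L2})^2$. This frequency dependence is what the dual-frequency method exploits.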

The phase advancement of a GPS signal is a function of the distance between the satellite and the receiver, as well as a number of other factors, such as timing errors, tropospheric delay, and multipath. However, when we subtract the differences in phase between two signals transmitted at the L1 and L2 frequencies, many of the extra terms cancel out. This leaves us with

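$$
\begin{aligned}
\lambda_{L2}\phi_{L2} - \lambda_{L1}\phi_{L1} &= I_{L2} - I_{L1} + b + \lambda_{L2}N_{L2} - \lambda_{L1}N_{L1} \\
&= -\frac{40.3\,\mathrm{TEC}}{f_{L2}^2} + \frac{40.3\,\mathrm{TEC}}{f_{L1}^2} + b + \lambda_{L2}N_{L2} - \lambda_{L1}N_{L1} \\
&= 40.3\,\mathrm{TEC}\,\frac{f_{L2}^2 - f_{L1}^2}{f_{L1}^2 f_{L2}^2} + b + \lambda_{L2}N_{L2} - \lambda_{L1}N_{L1}
\end{aligned}
\tag{2.8}
$$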

where $I_q = c\,\Delta\tau_q$ and $b$ corresponds to unknown timing biases in the receiver and satellite hardware. Additionally, we have the terms $\lambda_q N_q$, where $\lambda_q$ is the wavelength of the carrier signal and $N_q$ is an unknown integer number of phase cycles between the receiver and satellite. This ambiguity in $N$ is caused by the cyclic nature of the phase signal. Solving Equation 2.8 for TEC, we obtain

$$\mathrm{TEC} = \frac{1}{40.3}\,\frac{f_{L1}^2 f_{L2}^2}{f_{L2}^2 - f_{L1}^2}\,\bigl(\lambda_{L2}(\phi_{L2} - N_{L2}) + \lambda_{L1}(N_{L1} - \phi_{L1}) - b\bigr) \tag{2.9}$$

Using this phase difference method, we are able to identify a value which is proportional to TEC, but we cannot identify the absolute value in units of TECU, due to the unknown biases and integer cycle counts.
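The relationship in Equation 2.9 can be checked numerically with a short sketch. The helper `carrier_phase_cycles` is a hypothetical forward model that builds a noise-free phase observation from a geometric range, a TEC value, and an integer ambiguity, assuming $\lambda_q\phi_q = r + I_q + \lambda_q N_q$ with all other error terms set to zero. It is not part of the processing described in this study.

```python
C = 299_792_458.0   # speed of light in vacuum, m/s
F_L1 = 1575.42e6    # L1 carrier frequency, Hz
F_L2 = 1227.60e6    # L2 carrier frequency, Hz
K = 40.3            # first-order ionospheric constant used in Equation 2.4


def carrier_phase_cycles(range_m, tec, freq_hz, n_cycles=0.0):
    """Hypothetical noise-free phase in cycles: lambda*phi = r + I + lambda*N."""
    wavelength = C / freq_hz
    iono_m = -K * tec / freq_hz ** 2          # I_q = c * delta_tau_q
    return (range_m + iono_m) / wavelength + n_cycles


def tec_from_phase(phi_l1, phi_l2, n_l1=0.0, n_l2=0.0, bias_m=0.0):
    """Equation 2.9; returns TEC in electrons per square meter."""
    lam_l1, lam_l2 = C / F_L1, C / F_L2
    scale = (F_L1 ** 2 * F_L2 ** 2) / (K * (F_L2 ** 2 - F_L1 ** 2))
    return scale * (lam_l2 * (phi_l2 - n_l2) + lam_l1 * (n_l1 - phi_l1) - bias_m)
```

When the ambiguities and bias are left at zero, as they must be in practice, the result is offset from the true TEC by a constant, but differences between successive estimates still track the change in TEC.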

A similar method, which uses the pseudo-range calculated using the L1 and L2 carrier frequencies, is able to identify TEC in an absolute sense, without the cycle ambiguity $N$ [Misra and Enge, 2011]. However, the pseudo-range method returns a TEC value that contains more noise than the carrier phase method described in Equation 2.9. To resolve this, we can identify the mean of the TEC result from the pseudo-range method and use that to scale our carrier phase values. This combination results in a clean signal with absolute TECU values [Makela, 2003].
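One possible reading of this combination step is sketched below: the smooth carrier-phase series is shifted by a constant so that its mean matches the mean of the noisier pseudo-range series over the same arc. Treating the correction as an additive offset is an assumption made here for illustration, and the function name is hypothetical; the cited reference should be consulted for the procedure actually used.

```python
def level_phase_tec(phase_tec, code_tec):
    """Shift phase-derived TEC so its mean matches the pseudo-range-derived mean.

    Both inputs are equal-length sequences sampled at the same epochs of one
    continuous satellite-receiver arc (an assumption of this sketch).
    """
    if not phase_tec or len(phase_tec) != len(code_tec):
        raise ValueError("inputs must be non-empty and of equal length")
    offset = sum(c - p for p, c in zip(phase_tec, code_tec)) / len(phase_tec)
    return [p + offset for p in phase_tec]
```

The shape of the phase-derived series is preserved, and only its absolute level is taken from the pseudo-range data.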

2.2 Neural Networks

The TEC in the ionosphere is influenced by a number of sources, such as solar activity, the day/night cycle, and geophysical phenomena. Some ground-based sources, such as tsunamis, impart traveling waves into this TEC field. By measuring the parameters of these traveling waves, we can validate atmospheric models and gain information about the source phenomena. In our experiment, we have chosen a neural network as our parameter estimator. In this section, we provide a brief history of the neural network, a description of the base neuron elements, and a description of how we can train a neural network to handle non-linear data.

2.2.1 History

The concept of a neural network draws its history from the study of neurons in the brain. In a biological neuron, a potential builds at the dendrite due to the activity of its neighbors. When the potential rises beyond a threshold, the neuron fires, sending an electrical signal down the length of the neuron.


In 1943, Warren McCulloch and Walter Pitts presented a paper describing how a network of neurons might work using electrical circuits [McCulloch and Pitts, 1943]. The concept of threshold logic motivated many future studies in both the biological processes and the mathematical representation of an artificial neuron. One such study was summarized by Donald Hebb when he introduced a learning hypothesis which described how neural pathways were strengthened with each firing and how the brain performs an unsupervised learning operation, later known as Hebbian learning [Hebb, 1949].

In the mid-1950s, computational resources were advanced enough to support the simulation of so-called Hebbian networks, followed shortly by simulations of multi-layer networks. During this time, there was significant interest in the field and a vast perceived potential for utility. However, by the 1970s, much of this optimism had faded. Minsky and Papert [1969] discuss the inability of a single-layer network to simulate the exclusive-or circuit, and the significant computational resources needed to train and execute multi-layer neural networks. Due to those issues and the general overpromising of the future capabilities of such machines, research into neural networks began to slow.

In 1975, Paul Werbos described the process of backpropagation, as applied to neural networks, in his Ph.D. thesis [Werbos, 1975]. Backpropagation is an algorithm that facilitates the training of multi-layer neural networks by allowing the individual neurons to apply gradient descent on a backward-propagated error signal which is proportional to the error for that node. Furthermore, backpropagation utilizes dynamic programming, which provides a method for reusing previously calculated errors, thereby significantly reducing the computational complexity. To this day, neural networks are still trained using this algorithm.

As computing power continued to grow according to Moore's law, we were able to utilize larger and more complex networks along with larger datasets. Today, neural networks have been successfully deployed in a wide range of research fields. In 2012, AlexNet [Krizhevsky et al., 2012], a Convolutional Neural Network (CNN), won the ImageNet image classification competition. This was the first time a neural network had beaten the expert feature-based methods in the field of image classification and was seen as a pioneering event for neural networks. Other areas of impact include planning, natural language processing, pattern recognition, automated trading, machine translation, cancer detection, and game-playing.

We are again experiencing significant optimism in the area of machine learning and artificial intelligence. If the past has taught us anything, it is that some of this optimism is probably more air than substance. However, neural networks are providing new and innovative solutions to old problems, and the application of neural networks to new domains should be investigated.

2.2.2 The Neuron

The fundamental element of a neural network is the artificial neuron. This is a computational unit that was inspired by the physical processes of the biological neuron cell. A biological neuron is a cell that takes inputs from its surroundings, with the inputs each having a different connective strength, or weight. If the neuron cell receives enough potential from its neighbors, it can then transmit a pulse of energy to its own outputs. The artificial neuron attempts to approximate this physical process and is often diagrammed in some manner similar to Figure 2.2. Each line that connects an input to a neuron has an associated weight which scales the input value. Internally, the weighted inputs are summed together with a bias and the result is passed through an activation function, σ, which becomes the value of that neuron. Mathematically, we can write the output value of the neuron as

$$h(\mathbf{x},\mathbf{w}) = \sigma\Bigl(\sum_{i} w_i x_i + b\Bigr) = \sigma(\mathbf{w}^T\mathbf{x} + b) \tag{2.10}$$

This value can then be output to other neurons, but for now we restrict our discussion to a single neuron. An expanded diagram of this process is shown in Figure 2.3.
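Equation 2.10 translates almost directly into code. The sketch below is illustrative only: the function names are assumptions, and the activation is passed in as an argument so that the same neuron can be reused with the different activation functions discussed in this chapter.

```python
import math


def neuron(x, w, b, activation):
    """Equation 2.10: weighted sum of the inputs plus a bias, passed through sigma."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)


def identity(z):
    """Linear activation: returns the weighted sum unchanged."""
    return z


def sigmoid(z):
    """Logistic activation, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))
```

With inputs `[1.0, 2.0]`, weights `[0.5, -0.25]`, and zero bias, the weighted sum is zero, so the linear neuron outputs 0 and the sigmoid neuron outputs 0.5.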

A neuron as a logic unit

Figure 2.2: Diagram of a neuron with inputs $\mathbf{x}$. In this diagram, the weight values $\mathbf{w}$ are implied by the connecting lines. There is also an assumed activation function $\sigma$, though its properties are not explicitly defined. This is shown in more detail in Figure 2.3.

To begin, we evaluate a neuron which takes in two binary inputs $[x_1, x_2]$ and uses the Heaviside step function as its activation function, defined as

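$$H(z) = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{if } z \le 0 \end{cases} \tag{2.11}$$

and we can rewrite Equation 2.10 as

$$h(\mathbf{x},\mathbf{w}) = \begin{cases} 1 & \text{if } \mathbf{w}^T\mathbf{x} + b > 0 \\ 0 & \text{if } \mathbf{w}^T\mathbf{x} + b \le 0 \end{cases} \tag{2.12}$$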

By choosing appropriate values for $\mathbf{w}$ and $b$, we can build the logic gates that underlie computation. For example, if we wanted our neuron to perform the AND operation, we could assign weight values $[1, 1]$ with $b = -1.5$. The value of the neuron would then be $h(\mathbf{x},\mathbf{w}) = H(x_1 + x_2 - 1.5)$, which equals one only when both $x_1$ and $x_2$ are equal to one, which is the AND operation. We could similarly identify appropriate values of $\mathbf{w}$ and $b$ to form the OR, NOR, and NAND functions (see Table 2.1).
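The gate construction can be verified by enumerating the four binary inputs. The following sketch uses the weights and biases listed in Table 2.1; the helper names are illustrative.

```python
def heaviside(z):
    """Equation 2.11: 1 for z > 0, otherwise 0."""
    return 1 if z > 0 else 0


def make_gate(w, b):
    """Return a two-input neuron (Equation 2.12) with fixed weights and bias."""
    def gate(x1, x2):
        return heaviside(w[0] * x1 + w[1] * x2 + b)
    return gate


# Weights and biases from Table 2.1.
GATES = {
    "AND": make_gate([1, 1], -1.5),
    "OR": make_gate([1, 1], -0.5),
    "NAND": make_gate([-1, -1], 1.5),
    "NOR": make_gate([-1, -1], 0.5),
}

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
truth_table = {
    name: [gate(x1, x2) for x1, x2 in INPUTS] for name, gate in GATES.items()
}
```

Each entry of `truth_table` lists the neuron output for the inputs (0,0), (0,1), (1,0), and (1,1), in that order, and can be compared against the corresponding column of the table.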

We can visualize this by plotting the input $\mathbf{x}$ on the Cartesian plane, where $x_1$ is plotted on the x-axis and $x_2$ on the y-axis. The region for $h(\mathbf{x},\mathbf{w}) = 1$ can be found from Equation 2.12 as $x_2 > -x_1\frac{w_1}{w_2} - \frac{b}{w_2}$ if $w_2 > 0$, and $x_2 < -x_1\frac{w_1}{w_2} - \frac{b}{w_2}$ if $w_2 < 0$. We demonstrate the operations and values in Table 2.1 and Figure 2.4.

Figure 2.3: An expanded neuron unit with an input vector $\mathbf{x}$ and weight vector $\mathbf{w}$. This figure explicitly diagrams the dot operation, the bias term $b$, and the activation function of the neuron.

Table 2.1: An example set of weights and biases that could be used to perform the logic operations AND, OR, NAND, and NOR.

            AND       OR        NAND        NOR
w           [1, 1]    [1, 1]    [-1, -1]    [-1, -1]
b           -1.5      -0.5      1.5         0.5

x1   x2     y         y         y           y
0    0      0         0         1           1
0    1      0         1         1           0
1    0      0         1         1           0
1    1      1         1         0           0

Figure 2.4: These figures plot the input space $x_1, x_2$ onto the Cartesian plane and demonstrate a neuron's ability to implement logic gate operations using the parameters described in Table 2.1. (a) An AND gate implementation. (b) An OR gate implementation. (c) A NAND gate implementation. (d) A NOR gate implementation.

From Figure 2.5, we notice that we cannot recreate the exclusive-or operation (XOR) using a single neuron, as there is no line that can separate the two classes (1/0). This suggests that a neuron is only capable of classifying vectors $\mathbf{x}$ that are linearly separable. However, since we can create an XOR gate using a combination of the operations in Table 2.1, it stands to reason that a network of neurons would be able to perform the XOR operation. We explore this further in Section 2.2.3.

A neuron as a linear classifier

In the previous section, we restricted our neuron to only take binary inputs. Here, we remove that restriction and allow $\mathbf{x} \in \mathbb{R}^F$, where $F$ is the length of our input vector $\mathbf{x}$. Using binary inputs and the Heaviside step as our activation function, $\sigma$, was convenient because it allowed us to easily identify the weight vector, $\mathbf{w}$, and bias, $b$, needed to create the appropriate decision surface. In general, we will need to learn the weights and biases, and in this section we do so using the perceptron algorithm [Freund and Schapire, 1999] and gradient descent [Kiefer, 1952].

Figure 2.5: A plot depicting the exclusive-or operation. We note that to properly separate the two classes, we would need more than one decision line.

One of the earliest models of the artificial neuron (the perceptron) was developed by Frank Rosenblatt in 1957 [Rosenblatt, 1957] and is shown in Figure 2.3. Here, we will use that neuron, but instead of a Heaviside step activation function, we will use the sgn function, defined as

$$\mathrm{sgn}(\mathbf{w}^T\mathbf{x} + b) = \begin{cases} 1 & \text{if } \mathbf{w}^T\mathbf{x} + b > 0 \\ 0 & \text{if } \mathbf{w}^T\mathbf{x} + b = 0 \\ -1 & \text{if } \mathbf{w}^T\mathbf{x} + b < 0 \end{cases} \tag{2.13}$$

which results in a neuron value of

$$h(\mathbf{x},\mathbf{w}) = \mathrm{sgn}(\mathbf{w}^T\mathbf{x} + b) \tag{2.14}$$

In Figure 2.6, we show an example two-dimensional set of data and a decision line that properly separates the two classes. Now, using the perceptron algorithm, we seek to identify a vector of weights, $\mathbf{w}$, and a bias, $b$, that separates the two classes. To do this, we define a training set $D = \{(\mathbf{x}_i, y_i) : i \in I\}$, where $\mathbf{x}_i$ is the $i$th data vector, indexed by the set $I$, and $y_i$ is the desired output label for that input. Additionally, we define $x_{i,f}$ as the $f$th element in the vector $\mathbf{x}_i$ and extend the dimensions of $\mathbf{x}_i$ by one, where $x_{i,1} = 1$. This means that we also extend the dimensions of $\mathbf{w}$ by one and identify the individual elements in $\mathbf{w}$ by $w_f$. Using this notation causes the dot product $\mathbf{w}^T\mathbf{x}_i$ to always include the term $(1) \times w_1$, which we treat as the bias term. With this notation, Equation 2.14 is simplified to

$$h(\mathbf{x},\mathbf{w})_i = \mathrm{sgn}(\mathbf{w}^T\mathbf{x}_i) \tag{2.15}$$

Finally, as this algorithm iteratively updates the weight vector $\mathbf{w}$, we will denote $\mathbf{w}'$ as the next value of the weight vector. The perceptron algorithm can then be written as

$$\mathbf{w}' = \mathbf{w} + [y_i - h(\mathbf{x},\mathbf{w})_i]\,\mathbf{x}_i \tag{2.16}$$

Figure 2.6: An example of linearly separable data with two dimensions.

The process described in Equation 2.16 is continued, repeatedly cycling through the labeled training set $D$, until we meet one of two stopping conditions. The first stopping condition is reached when the sum of the absolute differences between the updated weight vector $\mathbf{w}'$ and the previous weight vector $\mathbf{w}$ is less than some threshold $\delta$:

$$\sum_{f=1}^{F} |w'_f - w_f| < \delta \tag{2.17}$$

where $F$ is the length of the vectors $\mathbf{x}$ after we have extended the length by one to absorb the bias term. The second stopping condition is reached after a user-defined number of iterations through the training set $D$.
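A minimal sketch of the perceptron update and its two stopping conditions is given below. One interpretive assumption is made: the weight change in Equation 2.17 is accumulated over a full pass through the training set, since a single correctly classified example produces no change at all and would otherwise end training prematurely. Labels are assumed to be $-1$ or $+1$ to match the sgn activation, and the function names are illustrative.

```python
def sgn(z):
    """Equation 2.13: returns 1, 0, or -1."""
    return (z > 0) - (z < 0)


def predict(w, x):
    """Equation 2.15, with a leading 1 prepended to x to absorb the bias."""
    extended = [1.0] + list(x)
    return sgn(sum(wf * xf for wf, xf in zip(w, extended)))


def train_perceptron(data, delta=1e-9, max_epochs=100):
    """data: list of (x, y) pairs with y in {-1, +1}. Returns the weight vector."""
    w = [0.0] * (len(data[0][0]) + 1)
    for _ in range(max_epochs):           # second stopping condition
        change = 0.0
        for x, y in data:
            extended = [1.0] + list(x)
            error = y - predict(w, x)
            w_next = [wf + error * xf for wf, xf in zip(w, extended)]  # Eq. 2.16
            change += sum(abs(a - b) for a, b in zip(w_next, w))
            w = w_next
        if change < delta:                # first stopping condition (Eq. 2.17)
            break
    return w
```

For linearly separable data, such as the AND truth table relabeled with $\pm 1$, the loop terminates once an entire pass produces no weight change.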

Next, we will demonstrate a method of using the neuron to perform classification using logistic regression. We define a training set as $D = \{(\mathbf{x}_i, y_i) : i \in I\}$, where $I$ is an index set, $\mathbf{x}_i$ is an input vector, and $y_i$ is a class label. We seek to identify the probability $P(Y = y_i \mid X = \mathbf{x}_i, \mathbf{w})$. Unlike the perceptron algorithm, this method gives us a measure of the confidence in our prediction. By choosing our activation function $\sigma$ as the sigmoid function (see Figure 2.7), the output of our neuron becomes

$$h(\mathbf{x},\mathbf{w}) = \frac{1}{1 + e^{-\mathbf{w}^T\mathbf{x}}} \tag{2.18}$$

and is identical to the hypothesis used by logistic regression. Next, we assume that $P(y_i = 1 \mid \mathbf{x}_i, \mathbf{w}) = h(\mathbf{x}_i,\mathbf{w})$ and $P(y_i = 0 \mid \mathbf{x}_i, \mathbf{w}) = 1 - h(\mathbf{x}_i,\mathbf{w})$. This can be written as

$$p(y_i \mid \mathbf{x}_i, \mathbf{w}) = \bigl(h(\mathbf{x}_i,\mathbf{w})\bigr)^{y_i}\bigl(1 - h(\mathbf{x}_i,\mathbf{w})\bigr)^{1 - y_i} \tag{2.19}$$

Figure 2.7: The sigmoid function $f(x) = \frac{1}{1 + e^{-x}}$.

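If the samples in our training set are independent, then our joint probabilities can be written as the product of the individual probabilities. Therefore, our likelihood function is

$$L(\mathbf{w}) = \prod_{i} p(y_i \mid \mathbf{x}_i, \mathbf{w}) \tag{2.20}$$

and we identify the log likelihood function as

$$
\begin{aligned}
l(\mathbf{w}) = \log L(\mathbf{w}) &= \log \prod_{i} p(y_i \mid \mathbf{x}_i, \mathbf{w}) \\
&= \log \prod_{i} \bigl(h(\mathbf{x}_i,\mathbf{w})\bigr)^{y_i}\bigl(1 - h(\mathbf{x}_i,\mathbf{w})\bigr)^{1 - y_i} \\
&= \sum_{i} y_i \log h(\mathbf{x}_i,\mathbf{w}) + (1 - y_i)\log\bigl(1 - h(\mathbf{x}_i,\mathbf{w})\bigr)
\end{aligned}
\tag{2.21}
$$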

In plain words, if we input a data vector belonging to class one, we would like our hypothesis to output a value as close to one as possible. However, since there is a collection of data vectors, we need to do this for each xi and increasing the probability for one may impair another. Therefore, we instead maximize the joint probability of all the data vectors using the likelihood function. Obviously we cannot move the data vectors xi to increase this likelihood, so instead a maximization of the likelihood function is the process of identifying the best weight values w for our training data. We perform this maximization using gradient ascent

$$\mathbf{w}' = \mathbf{w} + \alpha\nabla_{\mathbf{w}}\, l(\mathbf{w}) \tag{2.22}$$

where $\alpha$ is a step size parameter. In this equation, we see that we are stepping up the gradient, with respect to the weight vector $\mathbf{w}$, of the log likelihood function, maximizing the joint probability of correctly labeling each data vector.

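To find the gradient with respect to $\mathbf{w}$, it is useful to note that the derivative of the sigmoid function $g(z)$ is $\frac{d}{dz}g(z) = g(z)(1 - g(z))$. We can simplify the derivation of the gradient by identifying this for a single partial derivative of $\mathbf{w}$, for one training example, as

$$
\begin{aligned}
\frac{\partial}{\partial w_f} l(\mathbf{w}) &= \left( y_i \frac{1}{h(\mathbf{x}_i,\mathbf{w})} - (1 - y_i)\frac{1}{1 - h(\mathbf{x}_i,\mathbf{w})} \right) \frac{\partial}{\partial w_f} h(\mathbf{x}_i,\mathbf{w}) \\
&= \left( y_i \frac{1}{h(\mathbf{x}_i,\mathbf{w})} - (1 - y_i)\frac{1}{1 - h(\mathbf{x}_i,\mathbf{w})} \right) h(\mathbf{x}_i,\mathbf{w})\bigl(1 - h(\mathbf{x}_i,\mathbf{w})\bigr) \frac{\partial}{\partial w_f} \mathbf{w}^T\mathbf{x}_i \\
&= \bigl( y_i (1 - h(\mathbf{x}_i,\mathbf{w})) - (1 - y_i)\, h(\mathbf{x}_i,\mathbf{w}) \bigr)\, x_{i,f} \\
&= \bigl( y_i - h(\mathbf{x}_i,\mathbf{w}) \bigr)\, x_{i,f}
\end{aligned}
\tag{2.23}
$$

Here, we identify the $f$th element in the $i$th training example as $x_{i,f}$, and the $f$th element in the weight vector $\mathbf{w}$ as $w_f$. This results in a weight update rule, for a single training example $i$ and a single element in $\mathbf{w}$, which is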

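$$w'_f = w_f + \alpha\bigl(y_i - h(\mathbf{x}_i,\mathbf{w})\bigr)\, x_{i,f} \tag{2.24}$$

Finally, we need a method for converting our neuron's hypothesis output into a class label for binary classification. To do this, we simply assign the label one to outputs with $P(Y = 1 \mid \mathbf{x}_i, \mathbf{w}) > \frac{1}{2}$; otherwise we assign a zero. Using the Heaviside step function of Equation 2.11, we could also write this as $H\bigl(P(Y = 1 \mid \mathbf{x}_i, \mathbf{w}) - \frac{1}{2}\bigr) = H\bigl(h(\mathbf{x}_i,\mathbf{w}) - \frac{1}{2}\bigr)$.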
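The update rule in Equation 2.24 and the thresholding step can be sketched as follows. The step size, number of passes, and function names are illustrative assumptions, and each input vector is assumed to already carry a leading 1 so that the first weight acts as the bias.

```python
import math


def hypothesis(w, x):
    """Equation 2.18: sigmoid of the dot product w^T x."""
    z = sum(wf * xf for wf, xf in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(w, data):
    """Equation 2.21, summed over all (x, y) pairs with y in {0, 1}."""
    total = 0.0
    for x, y in data:
        h = hypothesis(w, x)
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return total


def train_logistic(data, alpha=0.1, epochs=200):
    """Gradient ascent using the per-example update of Equation 2.24."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            error = y - hypothesis(w, x)
            w = [wf + alpha * error * xf for wf, xf in zip(w, x)]
    return w


def classify(w, x):
    """Assign label 1 when P(Y = 1 | x, w) exceeds one half, otherwise 0."""
    return 1 if hypothesis(w, x) > 0.5 else 0
```

Comparing `log_likelihood` before and after training gives a simple check that the updates are moving up the gradient rather than down it.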

A neuron as a linear regressor

If we assume that there is a linear combination of our input data $\mathbf{x}$ that describes a function $h(\mathbf{x},\mathbf{w})$ of interest, then the neuron in Figure 2.3 will be able to perform this operation by using a linear activation function where $\sigma(\mathbf{w}^T\mathbf{x}) = \mathbf{w}^T\mathbf{x}$. Here, we continue with the notation that $x_1 = 1$ for all $\mathbf{x}$, such that $w_1$ becomes our bias term. Then, our neuron output is

$$h(\mathbf{x},\mathbf{w}) = \sigma\Bigl(\sum_{f=1}^{F} w_f x_f\Bigr) = \sigma(\mathbf{w}^T\mathbf{x}) = \mathbf{w}^T\mathbf{x} \tag{2.25}$$

where $F$ is the number of elements in our data vectors $\mathbf{x}$, after we appended $x_1 = 1$.

Provided we have a set of training data, $D = \{(\mathbf{x}_i \in \mathbb{R}^F, y_i \in \mathbb{R}) : i \in I\}$, we introduce a cost function and attempt to minimize that cost such that the hypothesis output of the neuron converges to the training value $y_i$. There are many cost functions that could be used, but we demonstrate the least-squares cost function as

$$C(\mathbf{w}) = \frac{1}{2}\sum_{i} \bigl(h(\mathbf{x}_i,\mathbf{w}) - y_i\bigr)^2 \tag{2.26}$$

which will lead to the ordinary least-squares regression model.

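Similar to our approach in the classification problem, we will take advantage of the differentiability of our hypothesis by utilizing gradient descent. Here we seek to minimize a cost, as opposed to maximizing a likelihood as we did in classification. The update to our weights, using gradient descent, is

$$\mathbf{w}' = \mathbf{w} - \alpha\nabla_{\mathbf{w}}\, C(\mathbf{w}) \tag{2.27}$$

where $\alpha$ is a step size parameter, and we seek to move down the gradient with respect to $\mathbf{w}$, thereby minimizing the cost over our training data. The partial derivative of the cost with respect to a single weight element $w_f$ can be found as

$$
\begin{aligned}
\frac{\partial}{\partial w_f} C(\mathbf{w}) &= \frac{\partial}{\partial w_f} \frac{1}{2}\bigl(h(\mathbf{x},\mathbf{w}) - y\bigr)^2 \\
&= \bigl(h(\mathbf{x},\mathbf{w}) - y\bigr) \frac{\partial}{\partial w_f} \bigl(h(\mathbf{x},\mathbf{w}) - y\bigr) \\
&= \bigl(h(\mathbf{x},\mathbf{w}) - y\bigr) \frac{\partial}{\partial w_f} \bigl(\mathbf{w}^T\mathbf{x} - y\bigr) \\
&= \bigl(h(\mathbf{x},\mathbf{w}) - y\bigr)\, x_f
\end{aligned}
\tag{2.28}
$$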

where we show the derivation for a single element in the vector $\mathbf{w}$, for a single training example.

Combining Equations 2.27 and 2.28, we update the elements of our weight vector $\mathbf{w}$ as

$$w'_f = w_f + \alpha\bigl(y_i - h(\mathbf{x}_i,\mathbf{w})\bigr)\, x_{i,f} \tag{2.29}$$

We note that this is only for a single weight element, for a single training example. To identify all of the weight elements, we would need to iterate the procedure for each weight element. If we want to extend this to multiple training examples, we could do so in a few ways. The first would be to sum the component errors over all training examples before each update and to terminate after the weight vector has converged (commonly called batch gradient descent). This has the benefit of a guaranteed minimum over the training set, but requires us to save the results from each training example, which may not be feasible depending on the training set size. The second method is to update the weights after each training example (stochastic gradient descent). This removes the memory issue, but also means that we will likely not converge to an absolute minimum over the whole training set. The third option is to combine the two methods and to perform weight updates after a subset of the training data has been consumed (mini-batch gradient descent).
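The three update schedules can share a single training loop in which only the number of examples consumed per update changes. The sketch below is illustrative: `batch_size`, `alpha`, `epochs`, and the function names are assumptions, and each input vector is assumed to begin with a 1 for the bias term.

```python
def predict(w, x):
    """Equation 2.25: linear neuron output w^T x."""
    return sum(wf * xf for wf, xf in zip(w, x))


def fit_linear(data, alpha=0.05, epochs=500, batch_size=1):
    """Least-squares regression by gradient descent (Equation 2.29).

    batch_size = len(data) sums the errors over the whole training set,
    batch_size = 1 updates after every example, and values in between
    update after each subset of the data.
    """
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            step = [0.0] * len(w)
            for x, y in data[start:start + batch_size]:
                error = y - predict(w, x)
                for f, xf in enumerate(x):
                    step[f] += error * xf
            w = [wf + alpha * sf for wf, sf in zip(w, step)]
    return w
```

On noise-free data generated from $y = 1 + 2x$, all three schedules should recover weights close to $[1, 2]$ with these settings; with noisy data, the per-example schedule would generally keep fluctuating around the minimum rather than settling on it.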

2.2.3 Non-Linear Data

In the previous section we discussed a few different methods that could be used to perform linear classification and regression, using a single neuron. If the data that we are working on is not linear in nature, those methods will provide poor results. In this section, we discuss the additional step of networking neurons together to form a neural network.

To start, we will revisit the example of the XOR gate implementation from Section 2.2.2. We again assume that we are building a logic model where our inputs $x_1$ and $x_2$ can only take the values 0 or 1. As was shown in Figure 2.5, there is no line that could separate the classes correctly. However, as shown in Figure 2.8, we can build an XOR gate by combining other gate elements. In Figure 2.9 we diagram the connections of an equivalent neural network and list the parameters of those neurons in Table 2.2. Additionally, we set the activation functions $\sigma$ of each of the three neurons to the Heaviside step function (see Equation 2.11). This process transforms the original input space $x_1, x_2$ to a new space defined by the outputs $h^1_1, h^1_2$. In this new feature space, we find that the XOR operation is linearly separable by the output neuron $y$, as shown in Figure 2.10.
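The three-neuron network can be evaluated directly to confirm both the XOR output and the hidden-layer feature space. The sketch below uses the weights and biases from Table 2.2; the function names are illustrative.

```python
def heaviside(z):
    """Equation 2.11: 1 for z > 0, otherwise 0."""
    return 1 if z > 0 else 0


def neuron(x, w, b):
    """Single neuron with a Heaviside step activation."""
    return heaviside(sum(wi * xi for wi, xi in zip(w, x)) + b)


def xor_network(x1, x2):
    """Return the hidden-layer outputs and the network output for one input."""
    h1 = neuron((x1, x2), (1, 1), -0.5)    # hidden neuron 1: OR
    h2 = neuron((x1, x2), (-1, -1), 1.5)   # hidden neuron 2: NAND
    y = neuron((h1, h2), (1, 1), -1.5)     # output neuron: AND
    return (h1, h2), y
```

The inputs (0,1) and (1,0) both map to the hidden point (1,1), while (0,0) and (1,1) map to (0,1) and (1,0) respectively, so a single line in the hidden space separates the two classes.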

Figure 2.8: An example method of combining gates to create an XOR operation.

Figure 2.9: A diagram of the neurons used to implement the XOR operation. The inputs $x_1, x_2$ feed a hidden layer with neurons $h^1_1, h^1_2$, which feeds the output neuron $y$.

This example with the XOR operation demonstrates the value of connecting multiple neurons together in a network. By doing this, we are able to transform a non-linear input space into a feature space in which the classes are linearly separable, using a hidden layer of neurons. The problem with this structure is that we need a method of identifying the weights needed for this transformation. In the example of the XOR gate, we could easily identify the weights and biases needed due to the simplicity of the input data, the size of the network, and the analogy to logic gates. In general, neural networks can be of arbitrary size, as shown in Figure 2.11.

Table 2.2: This table lists the parameters used to perform the XOR operation, corresponding to Figure 2.9. Each neuron utilized the Heaviside step activation function.

Neuron    w1    w2    b       Operation
h^1_1      1     1    -0.5    OR
h^1_2     -1    -1     1.5    NAND
y          1     1    -1.5    AND

Figure 2.10: The feature space of our hidden layer from Figure 2.9, where $x'_1$, $x'_2$ are the outputs of $h^1_1$ and $h^1_2$, respectively. We have added the coordinates in the original ($x_1$, $x_2$) space which map to these points: (0,1), (1,0), (0,0), and (1,1). We note that in this space, the XOR operation can be classified linearly.
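The XOR network can be checked numerically with a few lines of Python. The sketch below uses the weights and biases from Table 2.2; the function names are illustrative only, and the step function is assumed to return 1 for strictly positive input. That convention does not matter here, because no weighted sum is exactly zero for binary inputs.

```python
def heaviside(s):
    """Heaviside step: 1 for positive input, 0 otherwise (assumed convention)."""
    return 1 if s > 0 else 0


def neuron(x1, x2, w1, w2, b):
    """Single neuron: step activation applied to the weighted sum plus bias."""
    return heaviside(w1 * x1 + w2 * x2 + b)


def xor_network(x1, x2):
    """Two hidden neurons (OR, NAND) feeding one output neuron (AND)."""
    h1 = neuron(x1, x2, 1, 1, -0.5)     # hidden neuron h^1_1: OR
    h2 = neuron(x1, x2, -1, -1, 1.5)    # hidden neuron h^1_2: NAND
    return neuron(h1, h2, 1, 1, -1.5)   # output neuron y: AND


# Evaluate the network on all four binary input pairs.
truth_table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
print(truth_table)
```

The hidden layer maps the inputs (0,1) and (1,0) to the same point (1,1) in the feature space, which is why a single output neuron can then separate the classes.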

In Section 2.2.2 we presented various methods of weight optimization using the perceptron algorithm and gradient ascent/descent. However, when the neurons are networked together, those methods can no longer be applied directly. The issue is that any change in the weights of the neurons in earlier layers will affect the output of the neurons in later layers, which means that there is no longer a linear relationship between changes in the weights and a change in the output. This was a major issue with neural networks until 1975, when Paul Werbos outlined a method of training neural networks called backpropagation.

The backpropagation algorithm, as its name implies, is a technique which propagates errors in the output backward through the network. This is done by iteratively computing gradients for each layer and repeatedly applying the chain rule through all possible network paths. Furthermore, by applying the principles of dynamic programming, we reuse intermediate results when calculating the gradients of the layers. This is of particular importance as the number of paths through a network grows exponentially with the number of layers.

Figure 2.11: Diagram of a fully connected neural network. In this figure, each hidden layer has a width of J, but this could be extended such that J is a vector of hidden layer widths, with each layer possibly having a different width. (Layers: input layer $x_1, \dots, x_F$; hidden layers 1 through L, each with neurons $h^l_1, \dots, h^l_J$; output layer $y_1, \dots, y_P$.)

As one might expect, a full derivation of the backpropagation algorithm is notationally lengthy, but the steps involved are fairly straightforward. The first step in this algorithm is to input a training example $x_i$ into the network via the input layer. For each neuron in the network, and for each training example, we store the value of the weighted input $s^{y_i}_j = w^T x$ and the neuron output $z^{y_i}_j = \sigma(s^{y_i}_j)$. Here, the vector $x$ in $w^T x$ may not correspond to the input training vector, as only the first layer of nodes will directly interact with the input vector $x$ (see Figure 2.11). Next, we use the $s^{y_i}_j$ values to calculate error signals via a gradient for each node, for each training example. Finally, we use the error signals, the values $z^{y_i}_j$, and gradient descent to identify the value by which we update each individual weight.
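The three steps above (store the weighted inputs and outputs, compute error signals, update the weights) can be illustrated on a very small network. The following is a minimal sketch, not the network used in this work: it assumes one hidden layer with a tanh activation, a linear output, a squared-error loss, and a single hypothetical training example. All names and sizes are illustrative.

```python
import numpy as np


def forward(x, params):
    """Forward pass; returns the weighted inputs s and outputs z of each layer."""
    W1, b1, W2, b2 = params
    s1 = W1 @ x + b1      # weighted inputs of the hidden layer
    z1 = np.tanh(s1)      # hidden-layer outputs
    s2 = W2 @ z1 + b2     # weighted inputs of the output layer
    z2 = s2               # linear output activation
    return s1, z1, s2, z2


def loss(x, y, params):
    """Half the squared error between the network output and the label."""
    return 0.5 * np.sum((forward(x, params)[3] - y) ** 2)


def backward(x, y, params):
    """Error signals propagated backward, then gradients for every weight."""
    W1, b1, W2, b2 = params
    s1, z1, s2, z2 = forward(x, params)
    delta2 = z2 - y                             # output-layer error signal
    delta1 = (W2.T @ delta2) * (1.0 - z1 ** 2)  # chain rule through tanh
    return [np.outer(delta1, x), delta1, np.outer(delta2, z1), delta2]


def gradient_descent_step(x, y, params, lr=0.05):
    """One plain gradient descent update of all weights and biases."""
    return [p - lr * g for p, g in zip(params, backward(x, y, params))]


rng = np.random.default_rng(0)
params0 = [rng.normal(size=(3, 2)), np.zeros(3), rng.normal(size=(1, 3)), np.zeros(1)]
x, y = np.array([0.5, -1.0]), np.array([0.25])  # hypothetical training example

params = params0
loss_before = loss(x, y, params)
for _ in range(100):
    params = gradient_descent_step(x, y, params)
loss_after = loss(x, y, params)
print(loss_before, loss_after)
```

Note that `delta1` reuses `delta2` rather than recomputing the output error for every path, which is the reuse of intermediate results mentioned in the text.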

In our experiment, we trained the network with a variant of gradient descent called Adaptive Moment Estimation (Adam) [Kingma and Ba, 2014], with the gradients computed by backpropagation. This method builds on standard gradient descent by storing exponentially decaying averages of past gradients and squared gradients. This gives our movements down the gradients a sense of momentum and friction, which causes the algorithm to prefer flat minima on the error surface. This is useful because gradient descent is subject to local minima and this algorithm seeks to address that issue.
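For reference, a single Adam update can be sketched as follows. The update equations and the default values (step size 0.001, decay rates 0.9 and 0.999, epsilon 1e-8) are those given by Kingma and Ba; the quadratic test function, the larger step size used in the loop, and all variable names are illustrative assumptions rather than the training setup of this work.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad        # decaying average of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # decaying average of squared gradients
    m_hat = m / (1.0 - beta1 ** t)              # bias correction for the zero initialization
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Illustrative problem: minimize sum(theta**2), whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
history = [theta.copy()]
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t, lr=0.01)
    history.append(theta.copy())
print(history[1], theta)
```

On the first step the bias-corrected ratio is the sign of the gradient, so each parameter moves by approximately the step size regardless of the gradient magnitude.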


CHAPTER 3

SIMULATION

Our goal is to design a system which performs wave parameter estimation using a neural network. As discussed in Chapter 2, we require a method which determines the network’s weight values. This process is performed using backpropagation, which uses a training set of example inputs and labels. Often, training sets do not exist for real-world problems, as this would require an already working solution, or the employment of human beings who manually label each example of real data. Instead, we rely on a simulated dataset, which would ideally match the form of the real-world problem.

In Chapter 2, we discussed the TEC field and the waves which propagate in this field. These waves are the natural phenomena we wish to identify. In order to estimate the parameters of these waves, we need to collect a set of data that is correlated with those parameters. In a blind approach, we might naively collect any piece of data we can obtain, e.g., the temperature, the time, the location of the planets, the number of soccer games played in Oregon on a particular day, etc. Obviously, not all information is equally valuable to our problem, and by introducing uncorrelated or weakly correlated data, we increase the complexity of our problem without aiding in the solution.

We hypothesize that a minimal dataset would only require the GPS-derived TEC data and the velocity information of the pierce points. If we assume that each of the pierce points is collecting samples from a uniform plane wave, then it seems possible that, by analyzing the shift in the measured TID frequency from multiple pierce points, we could identify the waveheading, wavelength, and frequency of the underlying wave.

In Figure 3.1, we diagram this process. We begin with the assumption that a uniform plane wave exists in the TEC field and that we can sample that field using GPS signals. In the feature extraction step, we assume that a transformation of the raw TEC samples would reduce the problem complexity. This transformation is the Fast Fourier Transform (FFT) of the TEC samples. In this step, we also add the velocity information of the pierce points. Finally, we input the resulting feature vector into the neural network, which outputs an estimation of the TEC wave parameters. In the remainder of this chapter, we begin by discussing the methods used to generate our training set, followed by a description of the neural network which calculates our parameter estimations, and conclude with an evaluation of the training results.

Figure 3.1: Architecture used in TEC wave parameter estimation. (Blocks, in order: TEC Field $(\hat{k}_x, \hat{k}_y, \lambda, f)$; Sample Collection; Feature Extraction; Estimation $(\hat{k}'_x, \hat{k}'_y, \lambda', f')$.)

3.1 Generation of Training Data

We start by describing the assumptions placed on our simulated TEC field. First, we assume that it is appropriate to model the ionosphere as an infinitely thin shell, containing all the free electrons along a path between the receiver and a satellite. This is a common assumption used when calculating TEC samples from GPS signals and is discussed in Chapter 2. Using this shell model of the ionosphere, we claim that any small region of this shell can be approximated as a two-dimensional surface, with errors to this assumption increasing as the size of the surface increases. Furthermore, we assume that a TID is present in the ionosphere and that it can be modeled as a traveling wave, on this two-dimensional surface, with constant parameters over the sample region. Finally, we assume that our TEC samples contain noise and that the noise can be modeled as a white Gaussian noise process. Mathematically, this is

$$\mathrm{TEC}(\mathbf{r}, t) = \cos(\mathbf{k} \cdot \mathbf{r} - \omega t + \phi) + n(t) \tag{3.1}$$

where $\mathbf{r} = [x, y]$, $\mathbf{k} = \frac{2\pi}{\lambda} [\hat{k}_x, \hat{k}_y]^T$, $\omega = 2\pi f$ is the angular frequency, $t$ is the sample time in seconds, $\phi$ is a constant phase offset, $\lambda$ and $f$ are the wavelength and frequency of the TEC wave, and $n(t)$ is the Gaussian noise sample.
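Equation 3.1 translates almost directly into code. The sketch below is illustrative only: the function name, the SI units, and the SNR definition (mean power of a unit-amplitude cosine, 0.5, divided by the noise variance) are assumptions, since the text does not state how the SNR is defined.

```python
import numpy as np


def tec_samples(x, y, t, k_hat, wavelength_m, freq_hz, phase, snr_db, rng):
    """Noisy and noise-free plane-wave samples following Equation 3.1."""
    k = (2.0 * np.pi / wavelength_m) * np.asarray(k_hat)   # wave vector [rad/m]
    omega = 2.0 * np.pi * freq_hz                          # angular frequency [rad/s]
    clean = np.cos(k[0] * x + k[1] * y - omega * t + phase)
    # Assumed SNR definition: signal power 0.5 divided by the noise variance.
    noise_var = 0.5 / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_var), size=np.shape(clean))
    return clean + noise, clean


rng = np.random.default_rng(0)
t = 30.0 * np.arange(60)        # 60 samples, 30 s apart
x = 100.0 * t                   # hypothetical pierce point moving at 100 m/s along x
y = np.zeros_like(t)
noisy, clean = tec_samples(x, y, t, [1.0, 0.0], 200e3, 2e-3, 0.0, 9.0, rng)
print(noisy[:3], clean[:3])
```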

For each training example, we randomly select TEC wave parameters. We uniformly select a waveheading $\theta$ in the range [0, 2π]. This $\theta$ is then decomposed into $\hat{k}_x$, $\hat{k}_y$ as

$$[\hat{k}_x, \hat{k}_y] = [\mathrm{Re}(e^{j\theta}), \mathrm{Im}(e^{j\theta})] \tag{3.2}$$

where Re(·) and Im(·) denote the real and imaginary parts. The wavelength parameter $\lambda$ is selected uniformly from the range [100, 400] km. The frequency term $f$ is selected uniformly from the range [0.5, 4.5] mHz. The constant phase offset $\phi$ is selected uniformly over the range [0, 2π]. Finally, for each example the noise variance is fixed such that the resulting signal-to-noise ratio (SNR) is uniformly random in the range [6, 12] dB.
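The parameter selection described above can be sketched as a single sampling function. The ranges are the ones stated in the text; the function name, the dictionary keys, and the choice of NumPy's random generator are illustrative assumptions.

```python
import numpy as np


def sample_wave_parameters(rng):
    """Draw one set of TEC wave parameters from the stated uniform ranges."""
    theta = rng.uniform(0.0, 2.0 * np.pi)       # waveheading [rad]
    heading = np.exp(1j * theta)                # Equation 3.2
    return {
        "k_hat": np.array([heading.real, heading.imag]),
        "wavelength_km": rng.uniform(100.0, 400.0),
        "freq_mhz": rng.uniform(0.5, 4.5),
        "phase": rng.uniform(0.0, 2.0 * np.pi),
        "snr_db": rng.uniform(6.0, 12.0),
    }


params = sample_wave_parameters(np.random.default_rng(1))
print(params)
```

Because the heading components come from a point on the unit circle, the pair always has unit norm, even though each component on its own is not uniformly distributed.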

Next, we discuss the process we follow to generate TEC samples. For each training example, we construct a random TEC field. We move a pierce point across this field, according to its initial position and velocity, sampling N times. We repeat this for the M pierce points, resulting in M × N TEC samples. Next, we perform Root Mean Square (RMS) normalization by scaling the TEC values as

$$x_{RMS} = x \sqrt{\frac{N}{\sum_{i=0}^{N-1} x_i^2}} \tag{3.3}$$

where N is the length of the TEC vector for a single pierce point. Each of the M pierce points has an associated velocity, with fixed components $v_x$, $v_y$. For a given pierce point, the velocity components are selected uniformly over the annulus shown in Figure 3.2. Once the pierce point velocity is selected, an initial position is chosen randomly. Using this location as a starting point, a position vector is generated using the pierce point velocity and the time vector $t = T[0, 1, \dots, (N-1)]$, where N is the number of samples and T = 30 seconds.
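The RMS normalization of Equation 3.3 and the construction of the position vector can be sketched as follows. The function names, the starting position, and the example velocity are hypothetical; only the sample period of 30 seconds and the form of the normalization come from the text.

```python
import numpy as np


def rms_normalize(x):
    """Scale a sample vector so that its root mean square equals one (Equation 3.3)."""
    x = np.asarray(x, dtype=float)
    return x * np.sqrt(x.size / np.sum(x ** 2))


def pierce_point_positions(p0, v, n_samples, period_s=30.0):
    """Positions of a constant-velocity pierce point at times t = T[0, 1, ..., N-1]."""
    t = period_s * np.arange(n_samples)
    positions = np.asarray(p0, dtype=float)[None, :] + t[:, None] * np.asarray(v, dtype=float)[None, :]
    return positions, t


# Hypothetical pierce point starting at the origin with velocity (150, -50) m/s.
positions, t = pierce_point_positions([0.0, 0.0], [150.0, -50.0], 60)
normalized = rms_normalize(3.0 * np.cos(2.0 * np.pi * 2e-3 * t))
print(positions[:2], np.sqrt(np.mean(normalized ** 2)))
```

After normalization the amplitude of the original samples no longer matters, so the network sees inputs on a consistent scale.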

Figure 3.2: A 2D histogram of pierce point velocities used in simulation. (Panel title: Pierce Point Velocity Distribution; axes: $v_x$ [m/s], $v_y$ [m/s]; color scale: counts.)

The next part of the data generation process involves the construction of a feature vector, which corresponds to a single training example, consisting of a TEC field with fixed parameters. We choose to supply the neural network with the FFT of the TEC samples, instead of the TEC samples directly, due to the prevalence of the Doppler information in the FFT output. If we assume that the TEC wave field has a single frequency component, then the sampled TEC values, once converted into frequency space, will show a frequency component which has been Doppler shifted as a function of the difference in the velocity vectors between the TEC wave and the pierce point. Therefore, for a given pierce point, we generate the magnitude of the FFT of the TEC samples. Additionally, we remove the negative frequency components. We do not lose any information doing this, as the TEC data is real valued and its spectrum is therefore Hermitian symmetric. We append the velocity $\mathbf{v} = [v_x, v_y]$ of the pierce point to this vector, such that the network is provided with the direction, speed, and Doppler effects for a given pierce point. We repeat this process for each of the M pierce points, resulting in a feature vector as

$$F = [\mathbf{v}^{pp}_m, |X|^{pp}_m : m \in \{0, \dots, M-1\}] \tag{3.4}$$

where $|X|^{pp}_m$ is the magnitude of the non-negative frequency terms in the FFT of the TEC samples for the $m$th pierce point in the set of $M$ pierce points. Assuming that we use an even-valued $N$, the length of our feature vector $F$ is

$$\mathrm{len}(F) = M\left(2 + \frac{N}{2} + 1\right) \tag{3.5}$$

For each training feature vector $F_i$, we have a corresponding solution vector $y_i$, consisting of the wave parameters $y = [\hat{k}_x, \hat{k}_y, \lambda, f]$.
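The feature vector of Equation 3.4 can be assembled with a real-input FFT, which returns only the N/2 + 1 non-negative frequency bins for even N. The following is a sketch under assumptions: the function name and the random placeholder inputs are illustrative, and the per-pierce-point ordering (velocity first, then spectrum magnitude) follows the order written in Equation 3.4.

```python
import numpy as np


def feature_vector(tec, velocities):
    """Build F from (M, N) TEC samples and (M, 2) pierce point velocities."""
    tec = np.asarray(tec, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    magnitude = np.abs(np.fft.rfft(tec, axis=1))   # (M, N/2 + 1) non-negative bins
    # Concatenate [vx, vy, |X|] per pierce point, then flatten across pierce points.
    return np.concatenate([velocities, magnitude], axis=1).ravel()


M, N = 4, 60
rng = np.random.default_rng(0)
tec = rng.normal(size=(M, N))                 # placeholder samples, not real TEC data
vel = rng.uniform(-200.0, 200.0, size=(M, 2))
F = feature_vector(tec, vel)
print(F.size)
```

With M = 4 and N = 60, the length matches Equation 3.5: 4(2 + 30 + 1) = 132.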

Given the parameters we have established in this section, we can now verify that we have not violated the Nyquist criterion. We begin by calculating the Doppler shifted TID frequency experienced by a pierce point. We assume the source of the TID is stationary, the velocity of the pierce point is constant, and the pierce point is sufficiently far from the source that the phase fronts of the plane wave are parallel over the collection area. Given this, we calculate the Doppler frequency as

$$f_d = f\left(1 - \frac{\|\mathbf{v}_{pp}\|}{\|\mathbf{c}\|} \cos(\theta)\right) \tag{3.6}$$

where $f_d$ is the Doppler frequency, $f$ is the frequency of the TID, $\mathbf{v}_{pp}$ is the pierce point velocity vector, $\mathbf{c} = \hat{k} \lambda f$ is the TID phase velocity vector, and $\theta$ is the angle between the two vectors. In the one-dimensional case, where the pierce point is traveling in the negative direction of the TID wave ($\theta = \pi$, with $\|\mathbf{c}\| = \lambda f$), this simplifies to

$$f_d = f + \frac{v_{pp}}{\lambda} \tag{3.7}$$

where $v_{pp}$ is the speed of the pierce point and $\lambda$ is the wavelength of the TID wave. Therefore, from the perspective of the pierce points, the largest possible frequency will occur when the TEC wave frequency $f$ is maximized (4.5 mHz), the velocity of the pierce point is maximized (200 m/s), and the wavelength of the TEC wave is minimized (100 km). This results in a Doppler shifted frequency of 4.5 mHz + (200 m/s)/(100 km) = 6.5 mHz. The Nyquist criterion states that we must sample at a rate of at least twice the highest frequency, $f_{Nyquist}$ = 13 mHz, which is less than our sampling rate of 1/(30 s) = 33.33 mHz.
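The worst-case check can be reproduced numerically from Equation 3.6. This is a small sketch with illustrative function and variable names; the numerical values are the ones quoted in the text.

```python
import math


def doppler_frequency(f_hz, v_pp, wavelength_m, theta):
    """Doppler shifted frequency seen by a pierce point (Equation 3.6)."""
    phase_speed = wavelength_m * f_hz   # magnitude of the TID phase velocity [m/s]
    return f_hz * (1.0 - (v_pp / phase_speed) * math.cos(theta))


# Worst case: highest frequency, fastest pierce point, shortest wavelength,
# with the pierce point moving directly against the wave (theta = pi).
f_max = doppler_frequency(4.5e-3, 200.0, 100e3, math.pi)
f_nyquist = 2.0 * f_max
f_sample = 1.0 / 30.0
print(f_max * 1e3, f_nyquist * 1e3, f_sample * 1e3)   # all in mHz
```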

3.2 The Neural Network

The input layer of our neural network is populated by a feature vector $F$, which has a length of $M(2 + \frac{N}{2} + 1)$. Following this, we use seven hidden layers, each containing 200 neurons, followed by an output layer of width 4, corresponding to the estimate vector $y$. As described in Chapter 2, each neuron applies an activation function σ to the dot product of the weight vector and the neuron’s input vector. Our model utilizes the Rectified Linear Unit (ReLU) activation function in the hidden layers. The ReLU function is defined as

$$f(x) = \begin{cases} 0, & x < 0 \\ x, & x \geq 0 \end{cases} \tag{3.8}$$

As we seek to estimate numbers in $\mathbb{R}$, the neurons in our final layer use a linear activation function, which was described in Chapter 2. The total number of trainable parameters is calculated as

$$p = (I + 1)J + (J + 1)J(L - 1) + (J + 1)O \tag{3.9}$$

where $I$ is the length of the input layer, $J$ is the hidden layer width, $L$ is the number of hidden layers, and $O$ is the width of the output layer. We note that the +1 terms in Equation 3.9 are due to the neuron bias values. From this, we see that there are 268,604 trainable parameters when M = 4 and N = 60, and 253,004 trainable parameters when M = 3 and N = 30. These do not vary much, as we did not change the width of the hidden layers as a function of the feature vector length.
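The two parameter counts quoted above follow directly from Equations 3.5 and 3.9, as the short check below illustrates. The function names are illustrative; the layer sizes are those stated in the text.

```python
def feature_length(m, n):
    """Length of the feature vector for M pierce points and even N (Equation 3.5)."""
    return m * (2 + n // 2 + 1)


def trainable_parameters(input_len, hidden_width=200, hidden_layers=7, output_width=4):
    """Number of weights and biases in the fully connected network (Equation 3.9)."""
    first = (input_len + 1) * hidden_width                            # input to hidden layer 1
    middle = (hidden_width + 1) * hidden_width * (hidden_layers - 1)  # hidden to hidden
    last = (hidden_width + 1) * output_width                          # last hidden to output
    return first + middle + last


p_large = trainable_parameters(feature_length(4, 60))
p_small = trainable_parameters(feature_length(3, 30))
print(p_large, p_small)
```

The hidden-to-hidden term contributes 241,200 parameters in both cases, which is why the total changes so little with the feature vector length.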

The training algorithm we use is the Adam algorithm, which was described in Chapter 2. We use the learning parameters from the original paper [Kingma and Ba, 2014]. In Section 3.1, we describe the data generation process. Part of this process includes saving the TEC wave parameters in a solution vector $y_i = [\hat{k}_x, \hat{k}_y, \lambda, f]$ for each training example. Our loss function sums the mean squared error (MSE) of each output node. This creates an issue, as the wavelength parameter is measured in kilometers, while our frequency term is measured in mHz. This means that, for equal percentage errors, the MSE of the wavelength parameter will have a larger value than that of the frequency parameter. Due to this, we linearly map the wavelength and frequency solutions to the same range as the waveheading components $\hat{k}_x$, $\hat{k}_y$, which are in [-1, 1].
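This linear rescaling, formalized in Equations 3.10 and 3.11, can be sketched with one generic function that serves as both the forward map and its inverse. The function names are illustrative; the wavelength and frequency ranges are the simulation ranges from Section 3.1.

```python
def rescale(q, a_min, a_max, b_min=-1.0, b_max=1.0):
    """Linearly map q from [a_min, a_max] to [b_min, b_max]."""
    return (q - a_min) * (b_max - b_min) / (a_max - a_min) + b_min


def to_unit_range(q, a_min, a_max):
    """Forward map g: absolute range to [-1, 1]."""
    return rescale(q, a_min, a_max, -1.0, 1.0)


def from_unit_range(p, a_min, a_max):
    """Inverse map: [-1, 1] back to the absolute range."""
    return rescale(p, -1.0, 1.0, a_min, a_max)


wavelength_unit = to_unit_range(250.0, 100.0, 400.0)    # midpoint of [100, 400] km
frequency_unit = to_unit_range(2.97, 0.5, 4.5)          # frequency in mHz
frequency_back = from_unit_range(frequency_unit, 0.5, 4.5)
print(wavelength_unit, frequency_unit, frequency_back)
```

The inverse is the same formula with the roles of the two ranges exchanged, so a single helper avoids keeping two expressions in sync.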

To formalize, our transformation $g$ is a mapping from a range $\mathcal{A}$ to a range $\mathcal{B}$, or $g : \mathcal{A} \mapsto \mathcal{B}$. If $\mathcal{A}$ is defined on $[\mathcal{A}_{min}, \mathcal{A}_{max}]$, $\mathcal{B}$ is defined on $[\mathcal{B}_{min}, \mathcal{B}_{max}]$, and we wish to transform $q \in \mathcal{A}$ to $p \in \mathcal{B}$, then our transformation is

$$p = (q - \mathcal{A}_{min}) \frac{\mathcal{B}_{max} - \mathcal{B}_{min}}{\mathcal{A}_{max} - \mathcal{A}_{min}} + \mathcal{B}_{min} \tag{3.10}$$

After training is complete, we use the neural network to predict the wave parameters from new feature vectors. As the network is trained to output values in [-1, 1] for all four wave parameters, we need to apply the inverse of $g$ to map the values back to their absolute ranges as

$$q = (p - \mathcal{B}_{min}) \frac{\mathcal{A}_{max} - \mathcal{A}_{min}}{\mathcal{B}_{max} - \mathcal{B}_{min}} + \mathcal{A}_{min} \tag{3.11}$$

To avoid storing large training files, feature vectors and the corresponding solution vectors are generated during the training process. To train the neural network, a batch of 200 training examples is constructed. Then, for each ($F$, $y$) pair in the batch, the feature vector is propagated forward through the network, resulting in an estimation vector $h$. The difference between the $y$ and $h$ vectors is squared, and we calculate the mean of those squared errors. From each batch, an error gradient is calculated and the weights are updated using the backpropagation algorithm. This process continues until the network fails to improve its estimations for four consecutive epochs (a collection of 4000 batches). Finally, the trained neural network, or model, is saved to file.

3.3 Simulation Evaluation

To evaluate the simulation, we start by walking through the steps of a single example. The first step is the generation of a TEC field that contains a TID with fixed parameters $[\hat{k}_x, \hat{k}_y, \lambda, f]$. Figure 3.3 shows an example noisy TEC field, at a particular time instance $t[i]$.

After this, M pierce points are generated. Each pierce point has a random velocity vector, which is shown in Figure 3.4. We highlight the differences in velocities between the M pierce points and compare them with the velocity vector of the plane wave. Using the pierce point velocity vectors, we generate M position vectors and sample the TEC field at those locations, using the time vector $t = T[0, 1, \dots, N-1]$. The resulting M RMS normalized TEC sample vectors are shown in Figure 3.5.

Next, we demonstrate the FFT portion of feature conversion. As described in Section 3.1, we have only retained the non-negative bins associated with the magnitude of the FFT. In Figure 3.6, we can see that the frequency of the TEC samples has been Doppler shifted due to variations in pierce point velocities, with respect to the plane wave velocity.

Figure 3.3: Example TEC field generated during simulation. (Panel title: TEC Field, Y = [-0.95, 0.313, 133.2 km, 2.97 mHz]; axes: x [km], y [km].)

Figure 3.4: Example velocity vectors of the M pierce points (black), compared with the velocity vector of the TEC plane wave (blue). (Panels: Ipp0 through Ipp3; axes: $v_x$ [m/s], $v_y$ [m/s].)

Figure 3.5: Example RMS normalized TEC samples corresponding to the M pierce points. (Panels: Ipp0 through Ipp3; vertical axis: RMS normalized TEC.)

Figure 3.6: Example |X| vectors. Here we have taken the magnitude of the FFT of the TEC samples for each of the M satellites and retained the non-negative frequency components of those signals. We note the Doppler shifted frequency components due to each pierce point having a unique velocity. (Panels: Ipp0 through Ipp3; axes: f [mHz], |X[k]|.)

Next, we turn to evaluating the distributions of the feature and solution vectors created by the example generating function used in training and evaluation. In Figure 3.7, we look at the distributions of pierce point velocities and |X| vectors. In Figure 3.7, we have separated the $v_x$ and $v_y$ distributions, but the joint distribution of these variables creates the uniformly sampled annulus shown in Figure 3.2. From Figure 3.7, we note that a sample period of T = 60 seconds may also have been appropriate, which would have reduced the computational complexity.

The next set of distributions we investigate are the distributions associated with the TEC plane wave, as shown in Figure 3.8. Here, we see that $\hat{k}_x$ and $\hat{k}_y$ take on the expected distribution of the x and y components of a unit circle which is sampled uniformly with respect to $\theta$. The distributions of the wavelength and frequency components show the expected uniform distributions over the ranges [100, 400] km and [0.5, 4.5] mHz, respectively.

After reviewing the distributions of the elements in the feature vectors and the parameters of the simulated TID, we evaluate the output of the neural network. To start, we generate 4,000 random examples and generate estimations using the trained neural network. In Figure 3.9, we show those results in four panels, with each panel representing one of the parameters. For each example, we plot the true TEC wave parameter on the x-axis and the estimated parameter value on the y-axis. A perfect estimation of the wave parameters would be represented as a diagonal line, which is shown in blue. From this plot we can see that the neural network struggles to correctly estimate wavelengths above 350 km. Additionally, it is clear that the relative variance of the wavelength parameter is larger than that of the frequency parameter.

An issue with the plot shown in Figure 3.9 is that we lose the conditional information on frequency and wavelength. For example, we cannot tell if the variance of the wavelength estimation changes as a function of the TEC wave’s frequency. In response to this, we extend the estimation analysis by evaluating the four statistics in Table 3.1, which are functions of the TEC plane wave’s wavelength and frequency. For each statistic, we follow the steps described in Section 3.1, with the sole exception being that we fix the wavelength and frequency so that we can run 500 iterations of each wavelength/frequency pair. We note that the waveheading is still randomized for each example.

Figure 3.7: Distributions of feature vector elements [$v_x$, $v_y$, |X|]. (Panels: counts versus $v_x$ [m/s]; counts versus $v_y$ [m/s]; E[|X[k]|] versus f [mHz].)

Figure 3.8: (caption not recovered from the source; panels: sample counts versus $\hat{k}_x$, $\hat{k}_y$, λ [km], and f [mHz].)

Figure 3.9: Scatter plots representing the true wave parameter vs. the predicted wave parameter for $[\hat{k}_x, \hat{k}_y, \lambda, f]$. Here we have used 4,000 independent examples with N = 60 TEC samples and M = 4 pierce points. We have superimposed the line $Y_i = Y'_i$ for clarity.

We start with the waveheading parameters. For the $\hat{k}_x$, $\hat{k}_y$ parameters, we only display the MSE statistics. We found the mean and variance of these parameters to be independent of the TEC wave’s wavelength and frequency. The MPE statistic was not useful for these parameters because the denominator in the MPE calculation contains values near zero, as these parameters range over [-1, 1]. The MSE of the waveheading parameters is shown in Figure 3.10. In these plots, we can see that the waveheading estimations have similar error values, with both being maximum when wavelength and frequency are minimum. We believe this is due to pierce points having speeds larger than the phase speed of the wave. In general, we see that the MSE for the waveheading parameters is less than 0.1 for phase speeds (λf) above 100 m/s.

Table 3.1: A summary of the metrics used to evaluate the neural network estimations in Figures 3.10-3.12.

Metric: Description

Estimation Mean: This statistic is calculated in the absolute range of the variable. This is an approximation of $E[g(\hat{Y}_i) \mid \lambda = \lambda, f = f]$, where $g$ is the mapping described in Equations 3.10-3.11, which transforms the outputs to their absolute ranges.

Estimation Variance: This statistic is calculated on the variables while in the range [-1, 1] so that we can meaningfully compare the results between different variables. This is an approximation of $\mathrm{Var}(\hat{Y}_i \mid \lambda = \lambda, f = f)$.

Mean Percentage Error: We calculate Mean Percentage Error (MPE) on the variables in their absolute range. We do this because in the normalized range of [-1, 1], we would be dividing by numbers near zero. We calculate this as $\frac{100}{A} \sum_{n=1}^{A} \frac{\hat{Y}_i[n] - Y_i[n]}{Y_i[n]}$.

Mean Squared Error: As with the variance metric, we calculate this while the estimated variables are in the range [-1, 1]. It is calculated as $\frac{1}{A} \sum_{n=1}^{A} (\hat{Y}_i[n] - Y_i[n])^2$.
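The MPE and MSE entries of Table 3.1 can be sketched as follows. The three wavelength estimates are made-up numbers used only to exercise the formulas, and the function names are illustrative; the [100, 400] km range used for normalization is the simulated wavelength range.

```python
import numpy as np


def mean_percentage_error(estimates, truth):
    """MPE in percent, computed on values in their absolute range."""
    estimates = np.asarray(estimates, dtype=float)
    return 100.0 * np.mean((estimates - truth) / truth)


def mean_squared_error(estimates, truth):
    """MSE, intended for values already mapped to [-1, 1]."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean((estimates - truth) ** 2)


def to_unit_range(q, q_min, q_max):
    """Linear map from [q_min, q_max] to [-1, 1] (Equation 3.10)."""
    return (np.asarray(q, dtype=float) - q_min) * 2.0 / (q_max - q_min) - 1.0


# Hypothetical wavelength estimates [km] for a true wavelength of 300 km.
true_km = 300.0
est_km = np.array([310.0, 290.0, 330.0])
mpe = mean_percentage_error(est_km, true_km)
mse = mean_squared_error(to_unit_range(est_km, 100.0, 400.0),
                         to_unit_range(true_km, 100.0, 400.0))
print(mpe, mse)
```

Because the MPE keeps the sign of each error, it exposes estimator bias, while positive and negative errors of equal size cancel; the MSE does not cancel and therefore also reflects the spread.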

In Figure 3.11, we analyze the estimation of the wavelength parameter. In the mean plot, we see a gradient which is almost independent of frequency. We note, however, that for low frequencies, the wavelength parameter is underestimated. We can also see that the wavelength is underestimated for wavelengths near 400 km, which is also visible in Figure 3.9. The variance is maximized when the speed of the TID is minimum, and we also note that the wavelength variance is much larger, in general, than that of the frequency estimations. The MPE clearly identifies the negative bias of the estimator for large wavelengths. There is also a positive bias for short wavelengths, but this is much more frequency dependent than the bias for large wavelengths. The MSE plot shows that we struggle to correctly estimate wavelength when the frequency is near 0.5 mHz, though this is most dramatic at the edges of the wavelength range. In the MSE plot we can also see the impact of the negative bias at large wavelengths.

Figure 3.10: MSE of the estimated $\hat{k}_x$ and $\hat{k}_y$ parameters. Each statistic is calculated on 500 samples, for a given wavelength and frequency. (Panels: $\hat{k}_x$ MSE, $\hat{k}_y$ MSE; axes: f [mHz], λ [km].)

Figure 3.11: Simulation evaluation of wavelength estimation, using the statistics in Table 3.1. (Panels: λ Mean, λ Variance, λ MPE, λ MSE; axes: f [mHz], λ [km].)

In Figure 3.12, we analyze the frequency estimation. In the plot of mean estimates, we see a gradient which is nearly independent of wavelength. The variance plot shows us that the estimator is most variable at short wavelengths. We do not see the same issue with slow moving waves as we did in the wavelength variance plot in Figure 3.11. We note that the variance of the frequency estimation is nearly an order of magnitude smaller than that of the wavelength estimation. The MPE plot shows us that the frequency estimation has little bias, with MPE < 10%, except at low frequencies, where slow, low frequency waves have a positive bias of approximately 70%. This is also noticeable in Figure 3.9, where our estimates tend to cluster above the line at low frequencies. The MSE plot for frequency follows that of the variance plot, with the addition of the bias seen in the MPE plot at low frequencies.

Figure 3.12: Simulation evaluation of frequency estimation, using the statistics in Table 3.1. (Panels: f Mean, f Variance, f MPE, f MSE; axes: f [mHz], λ [km].)

In Figures 3.10-3.12, we presented statistics calculated on distributions of estimated parameters. Though this is useful in describing the estimator, it hides the shape of the distributions within each bin. In Figure 3.13, we fix the simulated TID to have a wavelength of 300 km and a frequency of 2 mHz. We then generate 25,000 example TIDs, randomizing the waveheading for each example, and plot a histogram of the estimated parameters. First, we notice that the distributions of the estimated $\hat{k}_x$ and $\hat{k}_y$ appear as noisy versions of the expected distributions (see Figure 3.8). On the wavelength and frequency distributions, we have superimposed vertical lines at the truth locations. We notice that the wavelength and frequency distributions are non-Gaussian and that the relative variance of the wavelength parameter is significantly larger than that of the frequency parameter. The average estimated wavelength is 308.6 km, and the average frequency estimation is 2.04 mHz.

Figure 3.13: The distribution of estimated parameters on a wave with fixed wavelength and frequency. 25,000 trials are estimated. The waveheading is randomized for each trial, and the distribution of $\hat{k}_x$ and $\hat{k}_y$ would be expected to match the distribution shown in Figure 3.8. We have marked the true wavelength and frequency as a vertical line. (Panels: counts versus $\hat{k}'_x$, $\hat{k}'_y$, λ' [km], and f' [mHz].)

Up to this point, we have assumed that a TID is present, with an SNR between 6 and 12 dB. In real data, there will be periods where this is not the case. It was originally hoped that we could detect a TID by observing the variance of the output, with the assumption that the estimated parameters would vary more or less uniformly in the presence of pure noise. To test this, for each trial, we randomly generated velocity vectors for M pierce points in the same manner as described in Section 3.1, but instead of sampling a TEC field, we generate TEC samples from a Gaussian white noise vector which is RMS normalized. Figure 3.14 shows the results of this noise input into the neural network using 5,000 trials. Unfortunately, the output of the neural network is not uniformly random in the presence of white noise. This means that we cannot simply observe the variance of the output as our detection scheme.

Going one step further, we evaluate the prediction distributions over a range of SNR values, showing the progression from a high SNR state to a low SNR state. We provide this analysis in Figure 3.15, where we evaluate the distributions at the SNR values of [15, 10, 5, 0, -5, -10, -15] dB.

Figure 3.14: The distribution of estimated parameters when TEC samples are replaced by a Gaussian white noise vector. 25,000 trials were estimated. The pierce point velocities used in the feature vectors were generated as described in Section 3.1. (Panels: counts versus $\hat{k}'_x$, $\hat{k}'_y$, λ' [km], and f' [mHz].)

Figure 3.15: The progression of wavelength and frequency estimation distribution as SNR decreases from 15 to -15 dB. We use the wavelength and frequency parameters as in Figure 3.13 and superimpose those values as a vertical line on each plot. (Panels: SNR of 15, 10, 5, 0, -5, -10, and -15 dB; counts versus λ' [km] and f' [mHz].)


CHAPTER 4

CASE STUDY

On March 11, 2011, at 05:46 UTC, a magnitude 9.0 earthquake occurred off the northeast coast of Honshu, Japan. This earthquake created a tsunami which traversed the Pacific Ocean, reaching the West Coast of the United States in approximately 10 hours [Dunbar et al., 2011]. The tsunami wave has been shown to have created a TID which was detectable in TEC data obtained using GPS signals [Azeem et al., 2017]. In this chapter, we attempt to identify this event using the trained neural network described in Chapter 3. We begin by discussing our data, which was obtained from the Continuously Operating Reference Station (CORS) dataset. We then describe the pre-processing steps performed on that data and, finally, present our results.

4.1 Receiver Locations

The CORS network is a set of GPS receivers with fixed and known positions that are continuously capturing and storing GPS data. From this network of receivers, we obtain the pseudo-range and carrier-phase information needed to calculate TEC, as described in Chapter 2, with a sample period of 30 seconds. In this chapter, we analyze two receiver sets labeled “West” and “Southwest.” The “West” receiver set is the set of all CORS receivers which are west of -102° longitude and contains 392 receivers. The “Southwest” receiver set is a set of nine CORS receivers near San Francisco, California. Additionally, we provide an analysis of an individual receiver in the “Southwest” receiver set. The receiver is identified as “P198” in the CORS database. We display the receiver locations for each of these sets in Figure 4.1.


Figure 4.1: (a) The locations of the 392 receivers in the “West” receiver set. (b) The locations of the 9 receivers in the “Southwest” receiver set. We identify the position of receiver “P198” in the “Southwest” receiver set, which we also inspect individually.

4.2 Data Pre-processing

To begin, pseudo-range and carrier-phase information is gathered from the CORS system, and TEC is calculated for each (receiver, satellite) pair. The TEC data is then bandpass filtered, with a passband covering periods of [5, 25] minutes. This process is completed for samples between 10:00 and 23:00 UTC on March 11, 2011, the day of the Tohoku tsunami. A control set is gathered for comparison, using samples between 10:00 and 23:00 UTC on March 10, 2011.
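The filtering step can be sketched as follows. The filter design is not specified in this section, so the sketch assumes a simple FFT-domain mask that keeps only components with periods between 5 and 25 minutes. The function name `bandpass_by_period` and its parameters are hypothetical, and the 30-second sample period is the one stated in Section 4.1.

```python
import numpy as np

def bandpass_by_period(tec, sample_period_s=30.0,
                       min_period_s=5 * 60.0, max_period_s=25 * 60.0):
    """Keep only spectral components whose period lies in [min, max].

    Assumption: an ideal (brick-wall) FFT mask stands in for whatever
    bandpass filter was actually used on the TEC series.
    """
    tec = np.asarray(tec, dtype=float)
    spectrum = np.fft.rfft(tec)
    freqs = np.fft.rfftfreq(tec.size, d=sample_period_s)  # in Hz
    # A period band [5, 25] min maps to a frequency band [1/25, 1/5] 1/min.
    keep = (freqs >= 1.0 / max_period_s) & (freqs <= 1.0 / min_period_s)
    spectrum[~keep] = 0.0  # also removes the DC (mean) term
    return np.fft.irfft(spectrum, n=tec.size)

# Usage: 13 hours of 30-second samples, as in the 10:00 to 23:00 UTC window.
t = np.arange(1560) * 30.0
wave = 0.5 * np.sin(2 * np.pi * t / 600.0)           # 10-minute period, kept
trend = 5.0 + 2.0 * np.sin(2 * np.pi * t / 3600.0)   # mean + 60-minute period
filtered = bandpass_by_period(wave + trend)
```

A brick-wall mask can ring at the ends of a real record, so a tapered design may be preferable in practice; the mask is used here only because it makes the passband definition explicit.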

As described in Chapter 3, we generate an estimation for the TID wave parameters, $[\hat{k}_x, \hat{k}_y, \lambda, f]$, using a single receiver and M satellites. Each estimation corresponds to a specific receiver and a time index. To generate an estimation, we first construct a feature vector which requires N TEC samples from each of the M pierce points and the pierce point velocity components, $(v_x, v_y)$, for each of the M pierce points. From the individual TEC vectors, we calculate the magnitude of the FFT, keeping only the non-negative frequency bins.
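A minimal sketch of the feature vector construction is given below. The ordering of the entries (all spectra first, then all velocities) is an assumption, as is the function name `build_feature_vector`; any normalization applied in Chapter 3 is omitted because its details are not restated in this section.

```python
import numpy as np

def build_feature_vector(tec, velocities):
    """Concatenate FFT magnitudes and pierce point velocities.

    tec:        array of shape (M, N), one TEC series per pierce point
    velocities: array of shape (M, 2), one (vx, vy) pair per pierce point
    The layout of the returned vector is an assumed convention.
    """
    tec = np.asarray(tec, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    if tec.ndim != 2 or velocities.shape != (tec.shape[0], 2):
        raise ValueError("expected tec of shape (M, N) and velocities of shape (M, 2)")
    # rfft returns only the non-negative frequency bins: N // 2 + 1 per series.
    magnitudes = np.abs(np.fft.rfft(tec, axis=1))
    return np.concatenate([magnitudes.ravel(), velocities.ravel()])

# Usage with M = 3 pierce points and N = 60 samples:
features = build_feature_vector(np.ones((3, 60)), np.zeros((3, 2)))
# 3 * (60 // 2 + 1) magnitude entries plus 3 * 2 velocity entries.
```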

The positional information of the pierce points is initially in (latitude, longitude) coordinates. We transform the latitude and longitude of a pierce point to a two-dimensional (x, y) plane. The x coordinate is defined as

$$x = a\,(\mathrm{lon}_{\mathrm{Ipp}} - \mathrm{lon}_{\mathrm{rx}})\cos(\mathrm{lat}_{\mathrm{Ipp}}) \tag{4.1}$$

where $(\mathrm{lat}_{\mathrm{rx}}, \mathrm{lon}_{\mathrm{rx}})$ is the location of the receiver, in radians, $a$ is the distance from the center of the earth to the height of the pierce point, and $(\mathrm{lat}_{\mathrm{Ipp}}, \mathrm{lon}_{\mathrm{Ipp}})$ is the pierce point location, in radians. The y coordinate is calculated as

$$y = a\,(\mathrm{lat}_{\mathrm{Ipp}} - \mathrm{lat}_{\mathrm{rx}}) \tag{4.2}$$
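Equations (4.1) and (4.2) can be sketched in code as follows. The mean earth radius and the 350 km shell height used to form $a$ are illustrative assumptions, not values taken from this chapter, and the function name `pierce_point_to_xy` is hypothetical. Inputs are taken in degrees and converted to radians.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean earth radius
SHELL_HEIGHT_M = 350_000.0    # assumed ionosphere shell height

def pierce_point_to_xy(lat_ipp_deg, lon_ipp_deg, lat_rx_deg, lon_rx_deg,
                       a=EARTH_RADIUS_M + SHELL_HEIGHT_M):
    """Map a pierce point to local (x, y) meters relative to the receiver."""
    lat_ipp = math.radians(lat_ipp_deg)
    lon_ipp = math.radians(lon_ipp_deg)
    lat_rx = math.radians(lat_rx_deg)
    lon_rx = math.radians(lon_rx_deg)
    x = a * (lon_ipp - lon_rx) * math.cos(lat_ipp)  # Equation (4.1)
    y = a * (lat_ipp - lat_rx)                      # Equation (4.2)
    return x, y

# Usage: a pierce point one degree north of a receiver near San Francisco.
x, y = pierce_point_to_xy(38.0, -122.0, 37.0, -122.0)
```

The mapping is a local approximation: east-west distances are scaled by the cosine of the pierce point latitude, so it is most accurate for pierce points close to the receiver.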

Next, we compute the pierce point velocities as

$$\mathbf{v}_i = \frac{1}{T}\left[(x_i - x_{i-1}),\ (y_i - y_{i-1})\right] \tag{4.3}$$

where $x_i$ and $y_i$ are the x and y positions of a pierce point at time index $i$, and $T$ is the sample period. We choose the mean of the velocity vector over the collection period as the $(v_x, v_y)$ values for a pierce point during an estimation.
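The velocity calculation of Equation (4.3), followed by the mean over the collection period, can be sketched as below. The function name `mean_pierce_point_velocity` is hypothetical, and the default sample period is the 30 seconds stated in Section 4.1.

```python
def mean_pierce_point_velocity(xs, ys, sample_period_s=30.0):
    """Return the mean (vx, vy) of first-difference velocities, in m/s."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length tracks with at least 2 samples")
    # Equation (4.3): one velocity per consecutive pair of positions.
    vxs = [(xs[i] - xs[i - 1]) / sample_period_s for i in range(1, len(xs))]
    vys = [(ys[i] - ys[i - 1]) / sample_period_s for i in range(1, len(ys))]
    return sum(vxs) / len(vxs), sum(vys) / len(vys)

# Usage: a pierce point moving 300 m east and 150 m south per sample.
vx, vy = mean_pierce_point_velocity([0.0, 300.0, 600.0, 900.0],
                                    [0.0, -150.0, -300.0, -450.0])
```

Because the differences telescope, the mean velocity depends only on the first and last positions, which is consistent with treating the velocity as constant over the collection period.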

Our neural network is trained under the assumption that the velocity of the pierce point is constant during the collection period. In reality, from the perspective of the receiver, the change in pierce point velocity is maximized at low elevation angles and minimized near zenith. Therefore, to match our training simulation, it is preferable to use pierce points with high elevation angles. Additionally, by using large elevation angles, we sample a smaller area of the ionosphere, which increases the likelihood that the parameters of a TID are constant over that region. A fixed elevation angle inscribes a circle on the ionosphere shell, forming a cap. By choosing the largest possible elevation angle, we minimize the radius of this circle. However, the elevation angle is constrained by the requirement that we maintain at least M pierce points above the specified elevation angle for the duration of the N sample collection period. In Figure 4.2, we plot the number of receivers which maintain at least M = 3 pierce points, with an elevation angle above 45°, during the collection of N = 60 samples. This figure shows that an elevation angle of 45° results in a large number of unusable receivers at any given time. We also note that we cannot use M = 4 pierce points during an estimation, as this would further restrict the number of usable receivers. For these reasons, for the remainder of our analysis of the Tohoku event, we use an elevation angle threshold of 35°, M = 3, and N = 60.

[Figure 4.2 plot: West Receiver Set, March 11, 2011, 45° elevation angle; x-axis: time (HH:MM UTC); y-axis: receiver count]

Figure 4.2: Using the “West” receiver set, M = 3, and N = 60, we plot the number of receivers with at least M pierce points above an elevation angle of 45°, during the collection of N samples. We note that, at this elevation angle, there is a significant number of receivers with an insufficient number of pierce points. Requiring four pierce points would further reduce the number of usable receivers for a given time instance.
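The usability check behind Figure 4.2 can be sketched as follows, assuming each receiver supplies one elevation-angle track (in degrees) per visible satellite over the N sample collection period. The function names and the input layout are hypothetical.

```python
def count_usable_pierce_points(elevation_tracks_deg, threshold_deg=35.0):
    """Count pierce points that stay above the threshold for every sample."""
    return sum(
        1 for track in elevation_tracks_deg
        if all(elevation > threshold_deg for elevation in track)
    )

def receiver_is_usable(elevation_tracks_deg, m_required=3, threshold_deg=35.0):
    """A receiver is usable when at least M pierce points stay above the mask."""
    return count_usable_pierce_points(elevation_tracks_deg, threshold_deg) >= m_required

# Usage: four satellites seen by one receiver (shortened tracks for brevity).
tracks = [
    [50.0, 52.0, 55.0],
    [40.0, 41.0, 39.0],
    [36.0, 37.0, 38.0],
    [30.0, 44.0, 60.0],  # dips below 35 degrees, so it is rejected
]
usable_at_35 = receiver_is_usable(tracks, m_required=3, threshold_deg=35.0)
usable_at_45 = receiver_is_usable(tracks, m_required=3, threshold_deg=45.0)
```

In this example the receiver is usable with the 35° threshold but not with the 45° threshold, which mirrors the trade-off described above.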

The neural network is not designed to detect the presence of a TID. As such, we require a TID detection test. Our TID detection process is similar to the method used in [Hernández-Pajares et al., 2006]. To detect a TID, we begin by calculating the FFT of the TEC signals, from each of the M pierce points, without RMS normalization. Once the FFT operation has been performed, we threshold the magnitude of the positive frequency bins using a value of 0.2 TECU. If any of the M TEC signals contains a mode above this threshold, we declare that a TID is present.
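The detection test can be sketched as below. The FFT scaling is not stated here, so the sketch assumes the magnitudes are scaled by 2/N, which makes a sinusoid of amplitude A (in TECU) produce a peak of approximately A at its frequency bin. The function name `tid_present` is hypothetical.

```python
import numpy as np

def tid_present(tec, threshold_tecu=0.2):
    """Return True if any pierce point has a spectral mode above threshold.

    tec: array of shape (M, N) of bandpass filtered TEC, in TECU.
    Assumption: magnitudes are scaled by 2/N so that they read as
    sinusoid amplitudes (the Nyquist bin is doubled by this scaling).
    """
    tec = np.asarray(tec, dtype=float)
    n = tec.shape[-1]
    amplitude = 2.0 * np.abs(np.fft.rfft(tec, axis=-1)) / n
    positive_bins = amplitude[..., 1:]  # drop the DC bin
    return bool(np.any(positive_bins > threshold_tecu))

# Usage with M = 3 and N = 60: one pierce point carries a 0.5 TECU wave.
t = np.arange(60)
quiet = 0.05 * np.sin(2 * np.pi * 6 * t / 60)
strong = 0.5 * np.sin(2 * np.pi * 6 * t / 60)
detected = tid_present(np.vstack([quiet, strong, quiet]))
```

Under a different FFT normalization, the numerical value of the 0.2 TECU threshold would need to be rescaled accordingly.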

4.3 Results

To begin, we display the results of the TID detection process on the “West” receiver set, for both March 10 (control day) and March 11 (event day). For each time index, we partition the receivers into four states. The first state is the “<M” state, which counts the receivers that do not have at least M = 3 pierce points remaining above an elevation angle of 35° for the N = 60 sample collection period. After this, we count the receivers with either invalid TEC values or invalid pierce point locations during the collection period. The receivers which contain invalid data over the collection period are counted in the “NaN” receiver state. Next, on the remaining receivers, we perform the TID detection test. Receivers are counted in the “TID” or “No TID” state as a result of this test. We partition the receivers into these four states and present this as a stacked plot, for both the control day (March 10) and the event day (March 11), in Figure 4.3.
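The four-state partition for a single time index can be sketched as follows. Each receiver is summarized by a hypothetical tuple of (number of pierce points passing the elevation mask, whether invalid data is present, whether the detection test fired); the function names are also hypothetical. The checks are applied in the order given in the text.

```python
from collections import Counter

STATES = ("<M", "NaN", "TID", "No TID")

def receiver_state(n_valid_pierce_points, has_invalid_data, tid_detected,
                   m_required=3):
    """Assign one receiver to exactly one of the four states."""
    if n_valid_pierce_points < m_required:
        return "<M"
    if has_invalid_data:
        return "NaN"
    return "TID" if tid_detected else "No TID"

def count_states(receivers, m_required=3):
    """Count receivers per state for one time index (one stacked-plot column)."""
    counts = Counter(
        receiver_state(n_valid, invalid, detected, m_required=m_required)
        for n_valid, invalid, detected in receivers
    )
    return {state: counts.get(state, 0) for state in STATES}

# Usage: five receivers at one time index.
receivers = [
    (2, False, False),  # too few pierce points
    (3, True, False),   # invalid TEC or pierce point location
    (4, False, True),
    (3, False, False),
    (5, False, True),
]
column = count_states(receivers)
```

Because every receiver lands in exactly one state, the four counts always sum to the size of the receiver set, which is what allows them to be drawn as a stacked plot.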

Given that GPS satellites orbit once in approximately 12 hours, the similarities in the “<M” states between March 10 and March 11 are expected, as they are caused by the positions of the satellites. On our control day (March 10), we see that fewer than 20 receivers entered the “TID” state at any given time between [10:00, 23:00] UTC. On the day of the tsunami (March 11), we see more than 200 receivers in the “TID” state between [15:45, 17:15] UTC, which corresponds to the tsunami arrival time as reported in Dunbar et al. [2011].

Next, we analyze the distributions of the estimations which result from the “West” receiver set for March 10 and March 11, 2011. We show these distributions in Figure 4.4. We note that Figure 4.4 does not utilize the TID detection test; instead, we simply count the binned estimations for all receivers and all time samples.

