Random variables and signals - Filtering and System Identification

After studying this chapter you will be able to

• define random variables and signals;

• describe a random variable by the cumulative distribution function and by the probability density function;

• compute the expected value, mean, variance, standard deviation, correlation, and covariance of a random variable;

• define a Gaussian random signal;

• define independent and identically distributed (IID) signals;

• describe the concepts of stationarity, wide-sense stationarity, and ergodicity;

• compute the power spectrum and the cross-spectrum;

• relate the input and output spectra of an LTI system;

• describe the stochastic properties of linear least-squares estimates and weighted linear least-squares estimates;

• solve the stochastic linear least-squares problem; and

• describe the concepts of unbiased, minimum-variance, and maximum-likelihood estimates.

4.1 Introduction

In Chapter 3 the response of an LTI system to various deterministic signals, such as a step input, was considered. A characteristic of a deterministic signal or sequence is that it can be reproduced exactly. On the other hand, a random signal, or a sequence of random variables, cannot be exactly reproduced. The randomness or unpredictability of the value of a certain variable in a modeling context generally arises from the limitations of the modeler in predicting a measured value by applying the "laws of Nature." These limitations can be a consequence of the limits of scientific knowledge or of the desire of the modeler to work with models of low complexity. Measurements, in particular, introduce an unpredictable part because of their finite accuracy.

There are excellent textbooks that give a formal treatment of random signals and of the filtering of such signals by deterministic systems, such as Leon-Garcia (1994) and Grimmett and Stirzaker (1983). In this chapter we briefly review the statistical concepts needed to understand the signal-analysis problems treated in later chapters.

The chapter is organized as follows. In Section 4.2 we review elementary concepts from probability theory that are used to characterize a random variable. Only continuous-valued random variables are considered. In Section 4.3 the concept and properties of random signals are discussed. The study of random signals in the frequency domain through power spectra is the topic of Section 4.4. Section 4.5 concludes the chapter with an analysis of the properties of linear least-squares estimates in a stochastic setting. Throughout this chapter the adjectives "random" and "stochastic" will both be used to indicate non-determinism.

4.2 Description of a random variable

The deterministic property is an ideal mathematical concept, since in real-life situations signals and the behavior of systems are often not exactly predictable. An example of an unpredictable signal is the acceleration measured on the wheel axis of a compact car. Figure 4.1 displays three sequences of the recorded acceleration during a particular time interval when a car is driving at constant speed on different test tracks.

The nondeterministic nature of these time records stems from the fact that there is no prescribed formula to generate such a time record synthetically for the same or a different road surface. A consequence of this nondeterministic nature is that the recording of the acceleration will be different when it is measured for a different period in time with the same sensor mounted at the same location, while the car is driving at the same speed over the same road segment. Artificial generation of acceleration signals like the ones in Figure 4.1 may be of interest in a road simulator that simulates a car driving over a particular road segment for an arbitrary length of time. Since these signals are nondeterministic, generating exactly the same signals is not possible.

[Figure 4.1: three panels, labeled highway, pothole, and cobblestone; horizontal axis: number of samples (0 to 2000); vertical axis: recorded acceleration (tick marks from −10 to 10).]

Fig. 4.1. Real-life recordings of measurements of the accelerations on the rear wheel axis of a compact-size car driving on three different road surfaces. From top to bottom: highway, road with a pothole, and cobblestone road.

However, we might not be interested in the exact reproduction of the recorded acceleration sequence. For example, in evaluating the durability properties of a new prototype vehicle using the road simulator, we need only a time sequence that has "similar features" to the original acceleration signals. An example of such a feature is the sample mean of all the 2000 samples of each time record in Figure 4.1. Let the acceleration sequence at the top of Figure 4.1 be denoted by $x(k)$, with $k = 0, 1, 2, \ldots$ The sample mean $m_x$ is then defined as

$$m_x = \frac{1}{2000} \sum_{k=0}^{1999} x(k). \qquad (4.1)$$

For that purpose it is of interest first to determine features from time records acquired in real life and then develop tools that can generate signals that possess the same features. Such tools are built upon notions and instruments introduced in statistics.
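As a minimal sketch of (4.1), the snippet below computes the sample mean of a 2000-sample record. The measured accelerations of Figure 4.1 are not available here, so the record `x` is a hypothetical synthetic stand-in (zero-mean Gaussian noise generated with NumPy); only the averaging step corresponds to (4.1).

```python
import numpy as np

# Hypothetical stand-in for one acceleration record x(k), k = 0, ..., 1999.
# (The measured data of Figure 4.1 is not reproduced here.)
rng = np.random.default_rng(seed=0)
N = 2000
x = rng.normal(loc=0.0, scale=3.0, size=N)

# Sample mean (4.1): m_x = (1/2000) * sum_{k=0}^{1999} x(k)
m_x = x.sum() / N

# np.mean computes the same quantity
assert np.isclose(m_x, np.mean(x))
print(f"sample mean m_x = {m_x:.4f}")
```

A signal generator for a road simulator would then be tuned so that its output reproduces such features, rather than the record itself.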

4.2.1 Experiments and events

The unpredictable variation of a variable, such as the height of a boy at the age of 12, is generally an indication of the randomness of that variable.

The qualitative value of an object, such as its color or taste, or the quantitative value of a variable, such as its magnitude, as determined in an experiment is called the outcome of that experiment. A random experiment is an experiment in which the outcome varies in an unpredictable manner. The set S of all possible outcomes is called the sample space, and a subset of S is called an event.

Example 4.1 (Random experiment) The height of a boy turning 12 years old is a variable that cannot be predicted beforehand. For a particular boy in this age group, we can measure his height; the sample space is R⁺. The measured height lying in a prescribed interval of R⁺ constitutes an event.

The outcome of an experiment can also refer to a particular qualitative feature, for example, the color of a ball. If we add a rule that assigns a real number to each element of the qualitative sample space, the number assigned by this rule is called a random variable.

Example 4.2 (Random variable) This year’s students who take the exam of this course can pass (P) or fail (F). We design an experiment in which we arbitrarily (randomly) select three students from the group of students who take the exam this year. The sample space that corresponds to this experiment is

S = {PPP, PPF, PFP, PFF, FPP, FPF, FFP, FFF}.

The number of passes in each set of three students is a random variable: it assigns a number to each outcome s ∈ S and can be described as "the number of students who pass the exam, out of a group of three arbitrarily selected students who take the course this year."
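The mapping from outcomes to numbers in Example 4.2 can be written out directly. This small sketch enumerates the sample space with `itertools.product` and defines the random variable as a Python function; the names `S`, `number_of_passes`, and `X` are illustrative only.

```python
from itertools import product

# Sample space S of Example 4.2: pass (P) or fail (F) for three selected students
S = ["".join(outcome) for outcome in product("PF", repeat=3)]

def number_of_passes(s: str) -> int:
    """The random variable: assigns to outcome s the number of passes in it."""
    return s.count("P")

# Value of the random variable for every outcome in S
X = {s: number_of_passes(s) for s in S}
for s, value in X.items():
    print(s, "->", value)
```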

4.2.2 The probability model

Suppose that we throw three dice at a time and do this n times. Let $N_i(n)$, for $i = 1, 2, \ldots, 6$, be the number of times the outcome is a die with i spots. Then the relative frequency that the outcome is i is defined as

$$f_i(n) = \frac{N_i(n)}{n}. \qquad (4.2)$$

[Figure 4.2: relative frequency (vertical axis, 0.1 to 0.3) against the number of throws (horizontal axis, 60 to 120).]

Fig. 4.2. The relative frequency of the number of eyes when throwing three similar dice; for one (solid line), three (dashed line), and five (dotted line) spots.

The limit

$$\lim_{n \to \infty} f_i(n) = p_i,$$

if it exists, is called the probability of the outcome i.

Example 4.3 (Relative frequencies) The above experiment of throwing three dice is done 120 times by a child's fair hand. The relative frequency defined in (4.2) is plotted in Figure 4.2 for respectively one, three, and five spots, that is i = 1, 3, and 5. It is clear that, for i = 3, the relative frequency approaches 1/6 ≈ 0.167; however, for the other values of i this is not the case. This may be a sign that either one of the dice used is "not perfect" or the child's hand is not so fair after all.
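To see how the relative frequencies (4.2) settle as n grows, the sketch below simulates throws of a single fair die with NumPy's random generator. The single simulated die is a simplifying assumption; Example 4.3 uses three dice thrown by hand, which is exactly where an unfair die or hand would show up as a deviation from 1/6.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def relative_frequencies(n: int) -> np.ndarray:
    """Return f_i(n) = N_i(n)/n for i = 1..6 after n throws of one fair die."""
    throws = rng.integers(low=1, high=7, size=n)   # values 1..6 (high is exclusive)
    counts = np.bincount(throws, minlength=7)[1:]  # N_i(n) for i = 1..6
    return counts / n

for n in (120, 12_000, 1_200_000):
    print(n, np.round(relative_frequencies(n), 3))
```

With n = 120, deviations of a few hundredths from 1/6 are common even for a perfect die, so a single short run is weak evidence of unfairness.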

Formally, let s be the outcome of a random experiment and let X be the corresponding random variable, with sample spaces S and $S_X$, respectively. A probability law for this random variable is a rule that assigns to each event E a nonnegative number Pr[E], called the probability of E, that satisfies the following axioms:

(i) $\Pr[E] \ge 0$, for every event $E \subseteq S_X$.

(ii) $\Pr[S_X] = 1$, for the certain event $S_X$.

(iii) For any two mutually exclusive events $E_1$ and $E_2$, $\Pr[E_1 \cup E_2] = \Pr[E_1] + \Pr[E_2]$.

[Figure 4.3: Venn diagram of events A and B, with the three regions labeled A ∩ B^c, A ∩ B, and A^c ∩ B.]

Fig. 4.3. The decomposition of event A ∪ B into three disjoint events.

These axioms allow us to derive the probability of an event from the already-defined probabilities of other events. This is illustrated in the following example.

Example 4.4 (Derivation of probabilities) If the probabilities of events A, B, and their intersection A ∩ B are defined, then we can find the probability of A ∪ B as

Pr[A ∪ B] = Pr[A] + Pr[B] − Pr[A ∩ B].

To see this, we decompose A ∪ B into three disjoint sets, as displayed in Figure 4.3. In this figure each set represents an event. Let $A^c$ denote the complement of A, that is, S without A, so that $A \cup A^c = S$, and let $B^c$ denote the complement of B. We then have

$$\begin{aligned}
\Pr[A \cup B] &= \Pr[A \cap B^c] + \Pr[A^c \cap B] + \Pr[A \cap B], \\
\Pr[A] &= \Pr[A \cap B^c] + \Pr[A \cap B], \\
\Pr[B] &= \Pr[A^c \cap B] + \Pr[A \cap B].
\end{aligned}$$

Adding the last two relations and using the first one gives $\Pr[A] + \Pr[B] = \Pr[A \cup B] + \Pr[A \cap B]$, which is the desired expression for $\Pr[A \cup B]$.
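The identity of Example 4.4 can be checked by counting on a small finite sample space. This sketch assumes, purely for illustration, a single fair die with equally likely outcomes, A = "even number of spots" and B = "more than three spots", and uses exact fractions so that the comparison is exact.

```python
from fractions import Fraction

S = set(range(1, 7))                 # outcomes of one fair die (assumption)
A = {s for s in S if s % 2 == 0}     # even number of spots: {2, 4, 6}
B = {s for s in S if s > 3}          # more than three spots: {4, 5, 6}

def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)  # 2/3 2/3
```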

The probabilities of the events associated with a random variable are an important means to characterize its behavior. The empirical way to determine probabilities via relative frequencies is cumbersome, as illustrated in Example 4.3. A more systematic approach based on counting methods can be used, as illustrated next.

Example 4.5 (Derivation of probabilities based on counting methods) An urn contains four balls numbered 1 to 4. We select two balls in succession without putting the selected balls back into the urn.

We are interested in the probability of selecting a pair of balls for which the number of the first selected ball is smaller than or equal to that of the second.

The total number of distinct ordered pairs is 4 · 3 = 12. Since the first ball is not put back, the two numbers always differ, and only six of these ordered pairs have a first ball with a number smaller than that of the second one; thus the probability of the event is 6/12 = 1/2.
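The counting argument of Example 4.5 can be reproduced by enumerating the equally likely ordered pairs. For comparison, the sketch also counts the case in which the first ball is put back before the second draw (16 ordered pairs, so equal numbers become possible); treating all ordered pairs as equally likely is the assumption behind both counts.

```python
from fractions import Fraction
from itertools import permutations, product

balls = [1, 2, 3, 4]

# Without replacement: ordered pairs of distinct balls (4 * 3 = 12 pairs)
pairs_without = list(permutations(balls, 2))
favourable_without = [p for p in pairs_without if p[0] <= p[1]]
p_without = Fraction(len(favourable_without), len(pairs_without))

# With replacement: 4 * 4 = 16 ordered pairs, equality is now possible
pairs_with = list(product(balls, repeat=2))
favourable_with = [p for p in pairs_with if p[0] <= p[1]]
p_with = Fraction(len(favourable_with), len(pairs_with))

print(p_without, p_with)  # 1/2 5/8
```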

In the above example, the probability would change if the first selected ball were put back into the urn before the second ball is selected. So the probability of an event B may be conditioned on another event A that has already happened. This conditional probability is denoted by Pr[B|A] and, provided that Pr[A] > 0, it is given by

$$\Pr[B|A] = \frac{\Pr[A \cap B]}{\Pr[A]}. \qquad (4.3)$$

If the two events are independent, then the probability of B is not affected by whether or not we know that event A has happened. In that case, we have Pr[B|A] = Pr[B] and, from (4.3), Pr[A ∩ B] = Pr[A] Pr[B].
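A counting sketch of (4.3) and of the independence condition. It assumes two fair dice thrown together, with all 36 ordered outcomes equally likely; A ("first die even") and B ("second die shows six") are illustrative choices of events that do not influence each other.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely ordered outcomes (assumption for illustration)
S = list(product(range(1, 7), repeat=2))

def prob(event) -> Fraction:
    """Probability of the event {s in S : event(s)} under equally likely outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def A(s):
    return s[0] % 2 == 0          # first die shows an even number

def B(s):
    return s[1] == 6              # second die shows six

def A_and_B(s):
    return A(s) and B(s)

p_B_given_A = prob(A_and_B) / prob(A)   # conditional probability (4.3)
print(p_B_given_A, prob(B))             # 1/6 1/6
```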

Instead of assigning probabilities by counting methods or deriving them from basic axioms via notions from set theory, a formal way to assign probabilities is via the cumulative distribution function. In this chapter we will consider such functions only for random variables that take continuous values. Similar concepts exist for discrete random variables (Leon-Garcia, 1994).

Definition 4.1 (Cumulative distribution function) The cumulative distribution function (CDF) FX(α) of a random variable X yields the probability of the event {X ≤ α}, which is denoted by

$$F_X(\alpha) = \Pr[X \le \alpha], \quad -\infty < \alpha < \infty.$$

The axioms of probability imply that the CDF has the following properties (Leon-Garcia, 1994).

(i) $0 \le F_X(\alpha) \le 1$.

(ii) $\lim_{\alpha \to \infty} F_X(\alpha) = 1$.

(iii) $\lim_{\alpha \to -\infty} F_X(\alpha) = 0$.

(iv) $F_X(\alpha)$ is a nondecreasing function of α: $F_X(\alpha) \le F_X(\beta)$ for $\alpha < \beta$.

(v) The probability of the event {α < X ≤ β} is given by $\Pr[\alpha < X \le \beta] = F_X(\beta) - F_X(\alpha)$.

(vi) The probability of the event {X > α} is $\Pr[X > \alpha] = 1 - F_X(\alpha)$.

Exercise 4.1 on page 122 requests a proof of the above properties.

The cumulative distribution function is a piecewise-continuous function that may contain jumps.
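The empirical CDF of a finite set of samples, that is, the fraction of samples not exceeding α, is a step function with a jump at every sample, and it satisfies properties (i), (iv), and (v) exactly. The sketch below assumes, for illustration, 1000 draws from a standard Gaussian distribution; `empirical_cdf` is a hypothetical helper, not a library function.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
samples = np.sort(rng.normal(size=1000))   # hypothetical data, sorted once

def empirical_cdf(alpha):
    """Fraction of samples <= alpha: an estimate of F_X(alpha) = Pr[X <= alpha]."""
    return np.searchsorted(samples, alpha, side="right") / samples.size

grid = np.linspace(-4.0, 4.0, 81)
F = empirical_cdf(grid)

# Properties (i) and (iv): values in [0, 1] and nondecreasing
assert np.all((F >= 0) & (F <= 1)) and np.all(np.diff(F) >= 0)

# Property (v): Pr[a < X <= b] = F_X(b) - F_X(a)
a, b = -1.0, 1.0
direct = np.mean((samples > a) & (samples <= b))
assert np.isclose(direct, empirical_cdf(b) - empirical_cdf(a))
```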

Another, more frequently used, characterization of a random variable is the probability density function (PDF).

Definition 4.2 (Probability density function) The probability density function (PDF) $f_X(\alpha)$ of a random variable X, if it exists, is equal to the derivative of the cumulative distribution function $F_X(\alpha)$, which is denoted by

$$f_X(\alpha) = \frac{dF_X(\alpha)}{d\alpha}.$$

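The CDF can be obtained by integrating the PDF:

$$F_X(\alpha) = \int_{-\infty}^{\alpha} f_X(\beta)\, d\beta.$$

The PDF has the property $f_X(\alpha) \ge 0$ and

$$\int_{-\infty}^{\infty} f_X(\alpha)\, d\alpha = 1.$$

We can derive the probability of the event {a < X ≤ b} by using

$$\Pr[a < X \le b] = \int_{a}^{b} f_X(\alpha)\, d\alpha.$$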
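A numerical check of these relations. It assumes, for illustration, an exponential random variable with rate λ = 2, whose PDF is $f_X(\alpha) = \lambda e^{-\lambda\alpha}$ for α ≥ 0 (and 0 otherwise) and whose CDF is $F_X(\alpha) = 1 - e^{-\lambda\alpha}$ for α ≥ 0. The integrals are approximated with a hand-written trapezoidal rule.

```python
import numpy as np

lam = 2.0  # assumed rate parameter

def pdf(alpha):
    """Exponential PDF: lam * exp(-lam * alpha) for alpha >= 0, else 0."""
    return np.where(alpha >= 0, lam * np.exp(-lam * np.clip(alpha, 0.0, None)), 0.0)

def cdf(alpha):
    """Exponential CDF: 1 - exp(-lam * alpha) for alpha >= 0, else 0."""
    return np.where(alpha >= 0, 1.0 - np.exp(-lam * np.clip(alpha, 0.0, None)), 0.0)

def trapezoid(y, x):
    """Trapezoidal rule for the integral of samples y over the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Normalization: the PDF is zero for alpha < 0, and the tail beyond 20 is negligible
x_all = np.linspace(0.0, 20.0, 200_001)
total = trapezoid(pdf(x_all), x_all)

# Pr[a < X <= b] as the integral of f_X from a to b, versus F_X(b) - F_X(a)
a, b = 0.25, 1.5
x_ab = np.linspace(a, b, 10_001)
prob_ab = trapezoid(pdf(x_ab), x_ab)
print(total, prob_ab, cdf(b) - cdf(a))
```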

4.2.3 Linear functions of a random variable

Consider the definition of a random variable Y in terms of another random variable X as

Y = aX + b,

where a ∈ R is a positive constant and b ∈ R. Let X have a CDF, denoted by FX(α), and a PDF, denoted by fX(α). We are going to determine the CDF and PDF of the random variable Y .

The event {Y ≤ β} is equivalent to the event {aX + b ≤ β}. Since a > 0, the event {aX + b ≤ β} can also be written as {X ≤ (β − b)/a}, and thus

$$F_Y(\beta) = \Pr\left[X \le \frac{\beta - b}{a}\right] = F_X\left(\frac{\beta - b}{a}\right).$$

Using the chain rule for differentiation, the PDF of the random variable Y is equal to

$$f_Y(\beta) = \frac{1}{a} f_X\left(\frac{\beta - b}{a}\right).$$
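A Monte Carlo sketch of $F_Y(\beta) = F_X((\beta - b)/a)$. It assumes, for illustration, that X is uniformly distributed on [0, 1], so that $F_X(\alpha) = \min(\max(\alpha, 0), 1)$, and that a = 2 and b = 1; the fraction of simulated values Y ≤ β then serves as the estimate of $F_Y(\beta)$.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
a, b = 2.0, 1.0                       # a > 0, as required in the derivation
X = rng.uniform(0.0, 1.0, size=200_000)
Y = a * X + b

def F_X(alpha):
    """CDF of the uniform distribution on [0, 1]."""
    return np.clip(alpha, 0.0, 1.0)

for beta in (1.5, 2.0, 2.8):
    empirical = np.mean(Y <= beta)    # estimate of F_Y(beta)
    predicted = F_X((beta - b) / a)
    print(beta, round(empirical, 3), predicted)
```

For this choice, $f_Y(\beta) = \tfrac{1}{2}$ on [1, 3]: the scaling by a stretches the support and lowers the density by the factor 1/a.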

4.2.4 The expected value of a random variable

The CDF and PDF fully specify the behavior of a random variable in the sense that they determine the probabilities of events corresponding to that random variable. Since these functions cannot be determined experimentally in a trivial way, in many engineering problems the specification of the behavior of a random variable is restricted to its expected value or to the expected value of a function of this random variable.

Definition 4.3 (Expected value) The expected value of a random variable X is given by

$$E[X] = \int_{-\infty}^{\infty} \alpha f_X(\alpha)\, d\alpha.$$

The expected value is often called the mean of a random variable or the first-order moment. Higher-order moments of a random variable can also be obtained.

Definition 4.4 The nth-order moment of a random variable X is given by

$$E[X^n] = \int_{-\infty}^{\infty} \alpha^n f_X(\alpha)\, d\alpha.$$

A useful quantity related to the second-order moment of a random variable is the variance.

Definition 4.5 (Variance) The variance of a random variable X is given by

$$\mathrm{var}[X] = E\left[(X - E[X])^2\right].$$

Sometimes the standard deviation is used, which equals the square root of the variance:

$$\mathrm{std}[X] = \mathrm{var}[X]^{1/2}.$$

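Expanding the square and using the linearity of the expectation, together with the fact that E[X] is a constant, the expression for the variance can be simplified as follows:

$$\begin{aligned}
\mathrm{var}[X] &= E\left[X^2 - 2E[X]X + E[X]^2\right] \\
&= E[X^2] - 2E[X]E[X] + E[X]^2 \\
&= E[X^2] - E[X]^2.
\end{aligned}$$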

This shows that, for a zero-mean random variable (E[X] = 0), the variance equals its second-order moment E[X²].
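The identity var[X] = E[X²] − E[X]² also holds exactly when the expectations are replaced by averages over N draws, which gives a quick sanity check. The sketch assumes an illustrative Gaussian sample with mean 2 and standard deviation 0.5.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
x = rng.normal(loc=2.0, scale=0.5, size=100_000)   # illustrative sample

first_moment = np.mean(x)                       # estimate of E[X]
second_moment = np.mean(x**2)                   # estimate of E[X^2]
variance = np.mean((x - first_moment) ** 2)     # estimate of E[(X - E[X])^2]

# var[X] = E[X^2] - E[X]^2 holds exactly for these sample averages
assert np.isclose(variance, second_moment - first_moment**2)
std = np.sqrt(variance)                         # std[X] = var[X]^(1/2)
print(first_moment, variance, std)
```

In floating point, the form E[(X − E[X])²] is usually preferred because subtracting two nearly equal moments can lose precision when the mean is large compared with the spread.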

4.2.5 Gaussian random variables

Many natural phenomena involve a random variable X that is the consequence of a large number of events that have occurred on a minuscule level. An example of such a phenomenon is measurement noise due to the thermal movement of electrons. When the random variable X is the sum of a large number of random variables, then, under very general conditions, the central limit theorem (Grimmett and Stirzaker, 1983) implies that the distribution of X, suitably normalized, approaches that of a Gaussian random variable.
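A rough illustration of this limit behavior, assuming that each "minuscule" contribution is uniformly distributed on [0, 1]. The sum of 12 such variables has mean 6 and variance 12 · (1/12) = 1, and the fraction of sums within one standard deviation of the mean comes out close to the Gaussian value of about 0.683.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
n_terms, n_draws = 12, 200_000

# Each row is one realization of X = U_1 + ... + U_12, with U_i ~ uniform[0, 1]
X = rng.uniform(size=(n_draws, n_terms)).sum(axis=1)

mean, var = X.mean(), X.var()
within_one_sigma = np.mean(np.abs(X - 6.0) < 1.0)   # Gaussian value: about 0.683
print(round(mean, 3), round(var, 3), round(within_one_sigma, 3))
```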

Definition 4.6 A Gaussian random variable X is a random variable that has the following probability density function:

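$$f_X(\alpha) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\alpha - m)^2}{2\sigma^2}\right), \quad -\infty < \alpha < \infty,$$

where $m \in \mathbb{R}$ and $\sigma \in \mathbb{R}^+$.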

Gaussian random variables are sometimes also called normal random variables. A graph of the PDF is given in Figure 4.4.

[Figure 4.4: $f_X(\alpha)$ plotted against α for α from −5 to 5.]

Fig. 4.4. The probability density function fX(α) of a Gaussian random variable for m = 0, σ = 0.5 (solid line) and for m = 0, σ = 1.5 (dashed line).

The PDF of a Gaussian random variable is completely specified by the two constants m and σ. These constants can be obtained as

E[X] = m, var[X] = σ².

You are asked to prove this result in Exercise 4.2 on page 122. Since the PDF of a Gaussian random variable is fully specified by m and σ, the following specific notation is introduced to indicate a Gaussian random variable X with mean m and variance σ²:

X ∼ (m, σ²). (4.4)
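A numerical sketch of E[X] = m and var[X] = σ². It evaluates the PDF of Definition 4.6 on a fine grid and approximates the integrals of Definitions 4.3 and 4.5 by Riemann sums. The values m = 1 and σ ∈ {0.5, 1.5} are illustrative, and the grid spanning ±12σ around m is an assumption that makes the truncated tails negligible.

```python
import numpy as np

def gaussian_pdf(alpha, m, sigma):
    """Gaussian PDF of Definition 4.6 with mean m and standard deviation sigma."""
    return np.exp(-((alpha - m) ** 2) / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def moments(m, sigma, n_points=400_001):
    """Riemann-sum approximations of the total mass, E[X], and var[X]."""
    alpha = np.linspace(m - 12 * sigma, m + 12 * sigma, n_points)
    d_alpha = alpha[1] - alpha[0]
    f = gaussian_pdf(alpha, m, sigma)
    total = np.sum(f) * d_alpha                       # should be 1
    mean = np.sum(alpha * f) * d_alpha                # E[X]
    var = np.sum((alpha - mean) ** 2 * f) * d_alpha   # var[X]
    return total, mean, var

for sigma in (0.5, 1.5):
    print(sigma, moments(m=1.0, sigma=sigma))
```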

4.2.6 Multiple random variables

It often occurs in practice that, in a single engineering problem, several random variables are measured at the same time. This may be an indication that these random variables are related. The probability of events that involve the joint behavior of multiple random variables is described by the joint cumulative distribution function or the joint probability density function.

Definition 4.7 (Joint cumulative distribution function) The joint cumulative distribution function of two random variables $X_1$ and $X_2$ is defined as

$$F_{X_1,X_2}(\alpha_1, \alpha_2) = \Pr[X_1 \le \alpha_1 \text{ and } X_2 \le \alpha_2].$$

When the joint CDF of two random variables is differentiable, we can define the joint probability density function as

$$f_{X_1,X_2}(\alpha_1, \alpha_2) = \frac{\partial^2}{\partial \alpha_1\, \partial \alpha_2} F_{X_1,X_2}(\alpha_1, \alpha_2).$$

With the definition of the joint PDF of two random variables, the expectation of functions of two random variables can be defined as well.

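Two relevant expectations are the correlation and the covariance of two random variables. The correlation of two random variables $X_1$ and $X_2$ is

$$R_{X_1,X_2} = E[X_1 X_2] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \alpha_1 \alpha_2 f_{X_1,X_2}(\alpha_1, \alpha_2)\, d\alpha_1\, d\alpha_2.$$

Let $m_{X_1} = E[X_1]$ and $m_{X_2} = E[X_2]$ denote the means of the random variables $X_1$ and $X_2$, respectively. Then the covariance of the two random variables $X_1$ and $X_2$ is

$$\begin{aligned}
C_{X_1,X_2} &= E[(X_1 - m_{X_1})(X_2 - m_{X_2})] \\
&= R_{X_1,X_2} - m_{X_1} m_{X_2}.
\end{aligned}$$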
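Replacing the expectations by sample averages gives estimates of $R_{X_1,X_2}$ and $C_{X_1,X_2}$, and for these averages the identity $C = R - m_{X_1} m_{X_2}$ holds exactly. The pair below, $X_2 = 0.8 X_1 + $ noise with nonzero means, is a hypothetical example for which $C_{X_1,X_2} = 0.8$ and $R_{X_1,X_2} = 0.8 + 1 \cdot 2.8 = 3.6$.

```python
import numpy as np

rng = np.random.default_rng(seed=6)
N = 100_000
x1 = rng.normal(loc=1.0, scale=1.0, size=N)
x2 = 0.8 * x1 + rng.normal(loc=2.0, scale=0.5, size=N)   # correlated with x1

m1, m2 = x1.mean(), x2.mean()
R12 = np.mean(x1 * x2)                  # correlation E[X1 X2]
C12 = np.mean((x1 - m1) * (x2 - m2))    # covariance E[(X1 - m1)(X2 - m2)]

assert np.isclose(C12, R12 - m1 * m2)
print(R12, C12)
```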

On the basis of the above definitions for two random variables, we can define the important notions of independent, uncorrelated, and orthogonal random variables.

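Definition 4.8 Two random variables $X_1$ and $X_2$ are independent if

$$f_{X_1,X_2}(\alpha_1, \alpha_2) = f_{X_1}(\alpha_1) f_{X_2}(\alpha_2),$$

where the marginal PDFs are given by

$$f_{X_1}(\alpha_1) = \int_{-\infty}^{\infty} f_{X_1,X_2}(\alpha_1, \alpha_2)\, d\alpha_2, \qquad f_{X_2}(\alpha_2) = \int_{-\infty}^{\infty} f_{X_1,X_2}(\alpha_1, \alpha_2)\, d\alpha_1.$$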

Definition 4.9 Two random variables $X_1$ and $X_2$ are uncorrelated if

$$E[X_1 X_2] = E[X_1] E[X_2].$$

This definition can also be written as

$$R_{X_1,X_2} = m_{X_1} m_{X_2}.$$

Therefore, since $C_{X_1,X_2} = R_{X_1,X_2} - m_{X_1} m_{X_2}$, the covariance of uncorrelated random variables $X_1$ and $X_2$ equals zero. Note that their correlation $R_{X_1,X_2}$ can still be nonzero. Exercise 4.3 on page 122 requests you to show that the variance of the sum of two uncorrelated random variables equals the sum of the variances of the individual random variables.
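A sampling sketch of the remark that uncorrelated random variables can still have a nonzero correlation. It assumes two independently generated (hence uncorrelated) variables with illustrative means 2 and 3: the estimated covariance comes out close to zero, while the estimated correlation comes out close to $m_{X_1} m_{X_2} = 6$.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
N = 200_000
x1 = rng.normal(loc=2.0, scale=1.0, size=N)   # independent draws,
x2 = rng.normal(loc=3.0, scale=1.0, size=N)   # hence uncorrelated

R12 = np.mean(x1 * x2)                               # close to 2 * 3 = 6
C12 = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))   # close to 0
print(round(R12, 3), round(C12, 3))
```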

Definition 4.10 Two random variables X1 and X2 are orthogonal if E[X1X2] = 0.

Zero-mean random variables are orthogonal when they are uncorrelated. However, orthogonal random variables are not necessarily uncorrelated: if $E[X_1 X_2] = 0$ but $m_{X_1} m_{X_2} \neq 0$, then $C_{X_1,X_2} = -m_{X_1} m_{X_2} \neq 0$.

The presentation for the case of two random variables can be extended to the vector case. Let X be a vector with entries $X_i$ for $i = 1, 2, \ldots, n$ that jointly have a Gaussian distribution with mean

$$m_X = \begin{bmatrix} E[X_1] \\ \vdots \\ E[X_n] \end{bmatrix}$$

and covariance matrix

$$C_X = \begin{bmatrix}
C_{X_1,X_1} & C_{X_1,X_2} & \cdots & C_{X_1,X_n} \\
C_{X_2,X_1} & C_{X_2,X_2} & \cdots & C_{X_2,X_n} \\
\vdots & \vdots & \ddots & \vdots \\
C_{X_n,X_1} & C_{X_n,X_2} & \cdots & C_{X_n,X_n}
\end{bmatrix}.$$

Then, assuming that $C_X$ is invertible, the joint probability density function is given by

$$f_X(\alpha) = f_{X_1,X_2,\ldots,X_n}(\alpha_1, \alpha_2, \ldots, \alpha_n) = \frac{1}{(2\pi)^{n/2} \det(C_X)^{1/2}} \exp\left(-\frac{1}{2} (\alpha - m_X)^T C_X^{-1} (\alpha - m_X)\right), \qquad (4.5)$$

where α is a vector with entries $\alpha_i$, $i = 1, 2, \ldots, n$. A linear transformation of a Gaussian random vector preserves Gaussianity (Leon-Garcia, 1994). Let A be an invertible matrix in $\mathbb{R}^{n \times n}$ and let the random vectors X and Y, with entries $X_i$ and $Y_i$ for $i = 1, 2, \ldots, n$, be related by

Y = AX.

Then, if the entries of X are jointly Gaussian-distributed random variables, the entries of the vector Y are again jointly Gaussian random variables, with mean $A m_X$ and covariance matrix $A C_X A^T$.
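The sketch below implements (4.5) with NumPy and performs two sanity checks. First, with a diagonal $C_X$ the joint PDF factors into a product of scalar Gaussian PDFs. Second, for Y = AX the sample mean and covariance of Y approach $A m_X$ and $A C_X A^T$. The function name `gaussian_pdf_vec` and all numerical values are illustrative assumptions; the Gaussianity of Y itself is not tested here.

```python
import numpy as np

def gaussian_pdf_vec(alpha, m, C):
    """Joint Gaussian PDF (4.5) for mean vector m and nonsingular covariance C."""
    n = m.size
    d = alpha - m
    quad = d @ np.linalg.solve(C, d)      # (alpha - m)^T C^{-1} (alpha - m)
    norm = (2.0 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(C))
    return np.exp(-0.5 * quad) / norm

# Check 1: with a diagonal C_X, (4.5) factors into a product of scalar PDFs
m = np.array([1.0, -2.0])
C_diag = np.diag([0.25, 4.0])
alpha = np.array([1.3, -1.0])
scalar = [np.exp(-(a - mi) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
          for a, mi, s2 in zip(alpha, m, np.diag(C_diag))]
assert np.isclose(gaussian_pdf_vec(alpha, m, C_diag), np.prod(scalar))

# Check 2: Y = A X has mean A m_X and covariance A C_X A^T
rng = np.random.default_rng(seed=8)
C_X = np.array([[1.0, 0.5], [0.5, 2.0]])
A = np.array([[2.0, 0.0], [1.0, 1.0]])              # invertible
X = rng.multivariate_normal(m, C_X, size=200_000)   # rows are realizations of X
Y = X @ A.T                                         # each row equals (A x)^T
print(Y.mean(axis=0), A @ m)
print(np.cov(Y, rowvar=False), A @ C_X @ A.T)
```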

4.3 Random signals

A random signal or a stochastic process arises on measuring a random variable at particular time instances, such as the acceleration on the wheel axis of a car. Such discrete-time records were displayed in Figure 4.1 on page 89. In the example of Figure 4.1, we record a different sequence each time (each run) we drive the same car under equal circumstances (that is, with the same driver, over the same road segment, at the same speed, during a time interval of equal length, etc.). These records are called realizations of that stochastic process. The collection of realizations of a random signal is called the ensemble of discrete-time signals.

Let the time sequence of the acceleration on the wheel axis during the $\xi_j$th run be denoted by

$$\{x(k, \xi_j)\}_{k=0}^{N-1}.$$

Then the kth sample $x(k, \xi_j)$ of each run is a random variable, denoted briefly by $x(k)$, that can be characterized by its cumulative distribution function

$$F_{x(k)}(\alpha) = \Pr[x(k) \le \alpha],$$

and, assuming that this function is differentiable, its probability density function equals

$$f_{x(k)}(\alpha) = \frac{\partial}{\partial \alpha} F_{x(k)}(\alpha).$$

For two different time instants k1 and k2, we can characterize the two random variables x(k1) and x(k2) by their joint CDF or PDF.

For a fixed value j the sequence {x(k, ξj)} is called a realization of a random signal. The family of time sequences {x(k, ξ)} is called a random signal or a stochastic process.
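To make the ensemble picture concrete, the sketch below stores M realizations of length N as the rows of an array: a fixed row is one realization, and a fixed column holds the random variable x(k) sampled across runs. The first-order autoregressive model x(k+1) = 0.9 x(k) + e(k), with standard Gaussian e(k) and x(0) = 0, is a hypothetical choice for generating realizations; it is not a model of the acceleration data of Figure 4.1.

```python
import numpy as np

rng = np.random.default_rng(seed=9)
M, N = 5000, 200            # number of runs (realizations) and samples per run

# ensemble[j, k] plays the role of x(k, xi_j)
ensemble = np.zeros((M, N))
e = rng.normal(size=(M, N))
for k in range(N - 1):
    ensemble[:, k + 1] = 0.9 * ensemble[:, k] + e[:, k]

realization = ensemble[0, :]     # one realization: fixed j, all k
x_k = ensemble[:, 150]           # the random variable x(150): fixed k, all runs

def cdf_estimate(samples, alpha):
    """Relative frequency of {x(k) <= alpha} over the ensemble."""
    return np.mean(samples <= alpha)

print(cdf_estimate(x_k, 0.0), cdf_estimate(x_k, 2.0))
```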

4.3.1 Expectations of random signals

Each entry of the discrete-time vector random signal {x(k, ξ)}, with x(k, ξ) ∈ Rⁿ for a fixed k, is a random variable. When we indicate this sequence for brevity by the time sequence x(k), the mean is also a time sequence and is given by

mx(k) = E[x(k)].

On the basis of the joint probability density function of the two random variables x(k) and x(ℓ), the auto-covariance (matrix) function is defined as

$$C_x(k, \ell) = E\left[\big(x(k) - m_x(k)\big)\big(x(\ell) - m_x(\ell)\big)^T\right].$$

Note that $C_x(k, k) = \mathrm{var}[x(k)]$. The auto-correlation function of x(k) is defined as

$$R_x(k, \ell) = E\left[x(k) x(\ell)^T\right].$$

Considering two random signals x(k) and y(k), the cross-covariance function is defined as

$$C_{xy}(k, \ell) = E\left[\big(x(k) - m_x(k)\big)\big(y(\ell) - m_y(\ell)\big)^T\right].$$

The cross-correlation function is defined as

$$R_{xy}(k, \ell) = E\left[x(k) y(\ell)^T\right].$$

Following Definitions 4.9 and 4.10, the random signals x(k) and y(k) are uncorrelated if

$$C_{xy}(k, \ell) = 0 \quad \text{for all } k, \ell,$$

and orthogonal if

$$R_{xy}(k, \ell) = 0 \quad \text{for all } k, \ell.$$
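Ensemble averages, taken over runs at fixed time instants, give estimates of these functions. The sketch assumes two hypothetical scalar signals: x(k+1) = 0.9 x(k) + e(k) with x(0) = 0, and y(k) an independent white Gaussian sequence. For this model $C_x(40, 45) = 0.9^5 \sum_{i=0}^{39} 0.81^i \approx 3.1$, whereas $C_{xy}(40, 45) = 0$ because the two signals are uncorrelated. The helper `ensemble_cov` is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(seed=10)
M, N = 20_000, 50   # runs and samples per run (illustrative sizes)

# Hypothetical scalar signals, one realization per row
x = np.zeros((M, N))
e = rng.normal(size=(M, N))
for k in range(N - 1):
    x[:, k + 1] = 0.9 * x[:, k] + e[:, k]
y = rng.normal(size=(M, N))          # independent of x, hence uncorrelated

def ensemble_cov(u, v, k, ell):
    """Ensemble estimate of E[(u(k) - m_u(k)) (v(ell) - m_v(ell))]."""
    du = u[:, k] - u[:, k].mean()
    dv = v[:, ell] - v[:, ell].mean()
    return np.mean(du * dv)

m_x = x.mean(axis=0)                      # estimate of the mean sequence m_x(k)
C_x_40_45 = ensemble_cov(x, x, 40, 45)    # auto-covariance C_x(40, 45)
C_xy_40_45 = ensemble_cov(x, y, 40, 45)   # cross-covariance C_xy(40, 45)
print(C_x_40_45, C_xy_40_45)
```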

4.3.2 Important classes of random signals

4.3.2.1 Gaussian random signals

Definition 4.11 A discrete-time random signal x(k) is a Gaussian random signal if every collection of a finite number of samples of this random signal is jointly Gaussian.
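As an illustration of Definition 4.11, a finite collection of samples x(0), ..., x(n−1) of a zero-mean Gaussian signal can be generated by applying a linear transformation L to a vector of independent standard Gaussian variables, with L taken from the Cholesky factorization $C = LL^T$ of the desired covariance matrix. This relies on the fact from Section 4.2.6 that linear transformations preserve Gaussianity. The covariance $C_x(k, \ell) = 0.8^{|k - \ell|}$ is a hypothetical choice.

```python
import numpy as np

n = 5                                       # number of time instants considered
k = np.arange(n)
C = 0.8 ** np.abs(k[:, None] - k[None, :])  # assumed C_x(k, l) = 0.8^|k - l|
L = np.linalg.cholesky(C)                   # C = L L^T, L lower triangular

rng = np.random.default_rng(seed=11)
w = rng.standard_normal(size=(n, 100_000))  # columns: independent N(0, I) vectors
samples = L @ w                             # columns: jointly Gaussian (x(0), ..., x(n-1))

C_hat = samples @ samples.T / w.shape[1]    # sample covariance (the mean is zero)
print(np.round(C_hat, 2))
```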

Let Cx(k, ℓ) denote the auto-covariance function of the random signal x(k). Then, according to Definition 4.11, the probability density function
