properties are stated without proof; you should be able to show them easily. Note especially the last property: the constant in front of the variance on the right-hand side is squared. This occurs because the variance is the expectation of a squared quantity. Test yourself on the use of these properties. Suppose the variance of X is equal to 1 and Y=2X+3. What is the variance of Y?⁵
The last formula involving variance that will be mentioned is given below:
\[ \mathrm{Var}[X] = E\{X^2\} - m_X^2 \qquad (4.17) \]
This states that the variance can be found by computing the second moment and subtracting the squared mean. Frequently, this method for computing the variance is much easier than using formula (4.16) directly.
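As a quick numerical illustration, the two routes to the variance can be compared on sample data. This is only a sketch: sample averages stand in for the expectation, the Gaussian test data and the function names are assumptions made for the illustration, and the divisor is the sample size n (not n − 1) so that the identity holds exactly.

```python
import random

def variance_direct(samples):
    """Variance from the definition: average squared deviation from the mean."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / n

def variance_from_moments(samples):
    """Variance as second moment minus squared mean, as in (4.17)."""
    n = len(samples)
    mean = sum(samples) / n
    second_moment = sum(x * x for x in samples) / n
    return second_moment - mean ** 2

random.seed(0)  # fixed seed so the run is repeatable
xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # illustrative data only

print("direct definition  :", variance_direct(xs))
print("moments, as (4.17) :", variance_from_moments(xs))
```

The two printed values should agree to within floating-point rounding. The second route needs only running sums of x and x², which is one reason it is often the more convenient one.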
The proof of (4.17) is a good example of considering E{·} as a linear operator and using the properties of expectation listed in Table 4.1. We begin with (4.16) and expand the argument algebraically:
\[ \mathrm{Var}[X] = E\{(X - m_X)^2\} = E\{X^2 - 2 m_X X + m_X^2\} \]
Since the expectation is a linear operator, we can distribute the expectation over the three terms and write
\[ \mathrm{Var}[X] = E\{X^2\} - E\{2 m_X X\} + E\{m_X^2\} = E\{X^2\} - 2 m_X E\{X\} + m_X^2 \]
where in the last step we have used some of the properties in Table 4.1. Finally, since $E\{X\} = m_X$, the middle term on the far right is equal to $-2m_X^2$. Therefore, by adding up terms, (4.17) follows.
5 Answer: $\mathrm{Var}[Y] = 2^2 \cdot \mathrm{Var}[X] = 4$.
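An example involving a discrete random variable illustrates the use of (4.17).
Example 4.5: A Bernoulli random variable has the PMF shown in Fig. 3.5 of Chapter 3; it takes the value 1 with probability p and the value 0 with probability 1 − p.
The mean is computed as follows:
\[ E\{K\} = \sum_{k=0}^{1} k\, f_K[k] = 0 \cdot (1 - p) + 1 \cdot p = p \]
The variance is computed by first computing the second moment:
\[ E\{K^2\} = \sum_{k=0}^{1} k^2 f_K[k] = 0^2 \cdot (1 - p) + 1^2 \cdot p = p \]
Then (4.17) is applied to compute the variance:
\[ \mathrm{Var}[K] = E\{K^2\} - m_K^2 = p - p^2 = p(1 - p) \]
(You can check this result by computing from (4.16) directly.)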
□
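The Bernoulli calculation can be reproduced numerically. The sketch below assumes the PMF places probability p on the value 1 and 1 − p on the value 0; the value p = 0.3 and the helper name `moments_from_pmf` are hypothetical choices for the illustration.

```python
def moments_from_pmf(pmf):
    """Return (mean, second moment, variance) for a PMF given as {value: probability}."""
    mean = sum(k * prob for k, prob in pmf.items())
    second_moment = sum(k ** 2 * prob for k, prob in pmf.items())
    variance = second_moment - mean ** 2  # second moment minus squared mean, as in (4.17)
    return mean, second_moment, variance

p = 0.3  # hypothetical parameter value
bernoulli_pmf = {0: 1 - p, 1: p}

mean, second_moment, variance = moments_from_pmf(bernoulli_pmf)

# Cross-check against the definition: expected squared deviation from the mean.
variance_direct = sum((k - mean) ** 2 * prob for k, prob in bernoulli_pmf.items())

print("mean          :", mean)
print("second moment :", second_moment)
print("variance      :", variance, "(direct:", variance_direct, ")")
```

Because k² = k for k ∈ {0, 1}, the mean and the second moment coincide, which is why the variance collapses to p(1 − p).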
4.2.3 Some higher-order moments
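Some normalized quantities related to the third- and fourth-order moments are sometimes used in the analysis of non-Gaussian random variables. These quantities are called the skewness and kurtosis and are defined by
\[ \alpha_3 = \frac{E\{(X - m_X)^3\}}{\sigma_X^3} \qquad \text{and} \qquad \alpha_4 = \frac{E\{(X - m_X)^4\}}{\sigma_X^4} - 3 \qquad (4.18) \]
where $\sigma_X$ is the standard deviation of X. With these definitions, the skewness and kurtosis of a Gaussian random variable are identically 0.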
The skewness can be used to measure the asymmetry of a distribution. PDFs or PMFs that are symmetric about their mean have zero skewness. The kurtosis is sometimes used to measure the deviation of a distribution from the Gaussian PDF. In other words, distributions that are in this sense “closer to Gaussian” have kurtosis values closer to zero. The two quantities $\alpha_3$ and $\alpha_4$ belong to a more general family of higher-order statistics known as cumulants [7, 8]. These quantities have recently found use in various areas of signal processing (see, e.g., [9]). Some problems on computing skewness and kurtosis are included at the end of this chapter (see Prob. 4.20).
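A small sketch of how skewness and kurtosis might be computed for a discrete distribution follows. It assumes the normalized definitions in which the third central moment is divided by σ³ and the fourth central moment is divided by σ⁴ with 3 subtracted, so that a Gaussian gives zero for both. The two PMFs and the function name are hypothetical examples.

```python
def skewness_kurtosis(pmf):
    """Return (alpha3, alpha4) for a PMF given as {value: probability}.

    Assumed definitions: alpha3 = mu3 / sigma**3 and alpha4 = mu4 / sigma**4 - 3,
    where mu_n denotes the n-th central moment.
    """
    mean = sum(k * prob for k, prob in pmf.items())

    def central_moment(n):
        return sum((k - mean) ** n * prob for k, prob in pmf.items())

    sigma = central_moment(2) ** 0.5
    alpha3 = central_moment(3) / sigma ** 3
    alpha4 = central_moment(4) / sigma ** 4 - 3.0
    return alpha3, alpha4

symmetric_pmf = {k: 1 / 6 for k in range(1, 7)}  # symmetric about its mean
lopsided_pmf = {0: 0.7, 1: 0.2, 2: 0.1}          # long tail to the right

print("symmetric:", skewness_kurtosis(symmetric_pmf))
print("lopsided :", skewness_kurtosis(lopsided_pmf))
```

The symmetric PMF should report a skewness of zero up to rounding, while the lopsided one reports a positive skewness because its tail extends to the right of the mean.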
4.3 Generating Functions
In the analysis of signals and linear systems, Fourier, Laplace, and z-transforms are essential to show properties of the signal or system that are not readily apparent in the time domain. For similar reasons formal transforms applied to the PDF or PMF can be useful tools in problems involving probability and random variables. One especially important use of the transforms is in generating moments of the distribution; hence the name “generating functions.” This section provides a short introduction to these transforms or “generating functions” for continuous and discrete random variables.
4.3.1 The moment generating function
The moment generating function (MGF) corresponding to a random variable X is defined by
\[ M_X(s) = E\{e^{sX}\} = \int_{-\infty}^{\infty} f_X(x)\, e^{sx}\, dx \qquad (4.19) \]
where s is a complex variable taking on values such that the integral converges.⁶ These values of s define a region of the complex plane called the “region of convergence.” The definition (4.19) provides two interpretations, both of which are useful. The MGF can be thought of as either (1) the expected value of $e^{sX}$ or (2) the Laplace transform of the PDF.⁷ The MGF is most commonly used for continuous random variables; however, it can be applied to mixed or discrete random variables as long as impulses are allowed in the PDF.
When the MGF is evaluated at s=jω (i.e., on the imaginary axis of the complex plane), the result is the Fourier transform:
\[ M_X(j\omega) = \int_{-\infty}^{\infty} f_X(x)\, e^{j\omega x}\, dx \]
This quantity, which is called the characteristic function, is often used instead of the MGF to avoid discussion of the region of convergence and some subtle mathematical difficulties that can arise when we stray off the s=jω axis. Notice that on the jω axis, there is absolute convergence for the integral:
\[ \int_{-\infty}^{\infty} \left| f_X(x)\, e^{j\omega x} \right| dx = \int_{-\infty}^{\infty} f_X(x) \left| e^{j\omega x} \right| dx = \int_{-\infty}^{\infty} f_X(x)\, dx = 1 \]
Therefore the characteristic function always exists and the MGF always converges at least on the s=jω axis.
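The bound behind this argument can be observed numerically. The sketch below approximates the characteristic function of an exponential density (zero for x < 0) with a trapezoidal sum and prints its magnitude at a few frequencies. The rate value, the truncation point of the integral, and the function names are assumptions made for the illustration.

```python
import cmath
import math

def characteristic_function(pdf, omega, upper, steps=20_000):
    """Trapezoidal estimate of the integral of pdf(x) * exp(j*omega*x) over [0, upper]."""
    dx = upper / steps
    total = 0.5 * (pdf(0.0) + pdf(upper) * cmath.exp(1j * omega * upper))
    for i in range(1, steps):
        x = i * dx
        total += pdf(x) * cmath.exp(1j * omega * x)
    return total * dx

lam = 2.0  # hypothetical rate parameter

def exponential_pdf(x):
    """Exponential density for x >= 0 (the density is zero for x < 0)."""
    return lam * math.exp(-lam * x)

magnitudes = {}
for omega in (0.0, 1.0, 5.0, 20.0):
    phi = characteristic_function(exponential_pdf, omega, upper=20.0)
    magnitudes[omega] = abs(phi)
    print(f"omega = {omega:5.1f}   |M_X(j omega)| = {abs(phi):.6f}")
```

Up to the small error of the numerical integration, the magnitude equals 1 at ω = 0 and does not exceed 1 elsewhere, as expected from the fact that the integrand is bounded in magnitude by the PDF itself.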
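The ability to derive moments from the MGF is apparent when you expand the term $e^{sX}$ in a power series and take the expectation term by term:⁸
\[ M_X(s) = E\{e^{sX}\} = E\left\{1 + sX + \frac{s^2 X^2}{2!} + \frac{s^3 X^3}{3!} + \cdots\right\} = 1 + s\,E\{X\} + \frac{s^2}{2!}E\{X^2\} + \frac{s^3}{3!}E\{X^3\} + \cdots \]
The moments $E\{X^n\}$ appear as coefficients of the expansion in s.
Given the moment generating function $M_X(s)$, the moments can be derived by taking derivatives with respect to s and evaluating the result at s=0.⁹ To see how this works, we can use the last equation to write
\[ \left.\frac{dM_X(s)}{ds}\right|_{s=0} = \left[E\{X\} + s\,E\{X^2\} + \frac{s^2}{2!}E\{X^3\} + \cdots\right]_{s=0} = E\{X\} \]
Then by taking the derivative once again:
\[ \left.\frac{d^2M_X(s)}{ds^2}\right|_{s=0} = \left[E\{X^2\} + s\,E\{X^3\} + \cdots\right]_{s=0} = E\{X^2\} \]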
6 Lately it is common to define the MGF for only real values of s (e.g., [10]). Although this simplifies the mathematical discussion, it leads to cases where the MGF may not exist.
7 Notice that the transform definition (4.19) uses s rather than the −s that is common in electrical engineering. Hence any arguments involving the left and right half planes have to be reversed.
8 This expansion is possible in the region of convergence since there are no poles at s=0 [11].
9 While the derivative of a function of a complex variable may not always exist, we assume here that the derivatives exist in our region of interest.
Generalizing this result produces the formula
\[ E\{X^n\} = \left.\frac{d^n M_X(s)}{ds^n}\right|_{s=0} \qquad (4.20) \]
The following examples illustrate the calculation of the MGF and the use of (4.20) to compute the mean and variance.
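Example 4.6: Refer to the exponential random variable defined in Example 4.4. The PDF is given by
\[ f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases} \]
The MGF is computed by applying (4.19):
\[ M_X(s) = \int_{0}^{\infty} \lambda e^{-\lambda x} e^{sx}\, dx = \lambda \int_{0}^{\infty} e^{-(\lambda - s)x}\, dx = \frac{\lambda}{\lambda - s} \]
The integral exists as long as Re[s]<λ. This inequality defines the region of convergence.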
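The MGF integral for the exponential density can be checked numerically against the closed form λ/(λ − s) that it evaluates to inside the region of convergence. The sketch below restricts s to real values for simplicity; the rate value, the truncation point, and the function names are assumptions made for the illustration.

```python
import math

def mgf_numeric(s, lam, steps=100_000):
    """Trapezoidal estimate of the integral of lam*exp(-lam*x)*exp(s*x) over x >= 0.

    Valid only for real s < lam; the infinite upper limit is truncated where
    the integrand has decayed by a factor of roughly exp(-50).
    """
    if s >= lam:
        raise ValueError("s lies outside the region of convergence (need s < lam)")
    upper = 50.0 / (lam - s)
    dx = upper / steps

    def integrand(x):
        return lam * math.exp(-(lam - s) * x)

    total = 0.5 * (integrand(0.0) + integrand(upper))
    for i in range(1, steps):
        total += integrand(i * dx)
    return total * dx

def mgf_closed_form(s, lam):
    """Closed-form MGF of the exponential density for real s < lam."""
    return lam / (lam - s)

lam = 2.0  # hypothetical rate parameter
for s in (-1.0, 0.0, 1.0, 1.5):
    print(f"s = {s:5.2f}   numeric = {mgf_numeric(s, lam):.6f}   closed form = {mgf_closed_form(s, lam):.6f}")
```

As s approaches λ from below, the integrand decays more and more slowly and the MGF grows without bound, which is the numerical signature of the boundary of the region of convergence.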
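Example 4.7: To compute the mean of the exponential random variable from its MGF, apply (4.20) with n=1:
\[ E\{X\} = \left.\frac{dM_X(s)}{ds}\right|_{s=0} = \left.\frac{\lambda}{(\lambda - s)^2}\right|_{s=0} = \frac{1}{\lambda} \]
To compute the variance, first compute the second moment using (4.20):
\[ E\{X^2\} = \left.\frac{d^2M_X(s)}{ds^2}\right|_{s=0} = \left.\frac{2\lambda}{(\lambda - s)^3}\right|_{s=0} = \frac{2}{\lambda^2} \]
Then use (4.17) to write
\[ \mathrm{Var}[X] = E\{X^2\} - m_X^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2} \]
These results agree with those of Example 4.4.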
□
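The derivative-at-zero recipe can also be tried numerically. The sketch below approximates the first two derivatives of the exponential MGF λ/(λ − s) at s = 0 with central finite differences; this numerical differentiation is a stand-in for the analytical derivatives, and the step size, rate value, and function names are assumptions made for the illustration.

```python
def mgf_exponential(s, lam):
    """MGF of the exponential density, lam / (lam - s), for real s < lam."""
    return lam / (lam - s)

def first_derivative_at_zero(func, h=1e-3):
    """Central-difference estimate of func'(0)."""
    return (func(h) - func(-h)) / (2 * h)

def second_derivative_at_zero(func, h=1e-3):
    """Central-difference estimate of func''(0)."""
    return (func(h) - 2 * func(0.0) + func(-h)) / h ** 2

lam = 2.0  # hypothetical rate parameter

def M(s):
    return mgf_exponential(s, lam)

mean = first_derivative_at_zero(M)            # estimate of E{X}
second_moment = second_derivative_at_zero(M)  # estimate of E{X^2}
variance = second_moment - mean ** 2          # second moment minus squared mean

print("mean          :", mean, " (1/lam     =", 1 / lam, ")")
print("second moment :", second_moment, " (2/lam**2 =", 2 / lam ** 2, ")")
print("variance      :", variance, " (1/lam**2 =", 1 / lam ** 2, ")")
```

The estimates agree with 1/λ, 2/λ², and 1/λ² only to within the finite-difference error, so small discrepancies in the trailing digits are expected.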
4.3.2 The probability generating function
The moments for a discrete integer-valued random variable K are more easily dealt with by using the probability generating function (PGF) defined by
\[ G_K(z) = E\{z^K\} = \sum_{k=-\infty}^{\infty} f_K[k]\, z^k \qquad (4.21) \]
where z is a complex variable in the region of convergence (i.e., the region where the infinite sum converges). Again, two interpretations are equally valid; the PGF can be thought of as either the expectation of $z^K$ or the z-transform of the PMF.¹⁰ The name probability generating function comes from the fact that if $f_K[k]=0$ for k<0 then
\[ G_K(z) = f_K[0] + f_K[1]\, z + f_K[2]\, z^2 + \cdots \]
From this expansion it is easy to show that
\[ \left.\frac{1}{k!}\frac{d^k G_K(z)}{dz^k}\right|_{z=0} = f_K[k] = \Pr[K = k] \qquad (4.22) \]
Our interest in the PGF, however, is more in generating moments than it is in generating probabilities. For this, it is not necessary to require that $f_K[k]=0$ for k<0. Rather we can deal with the full two-sided transform defined in (4.21).
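To see the “probability generating” property at work, the sketch below stores a one-sided PGF as a list of polynomial coefficients, differentiates it k times, evaluates the result at z = 0, and divides by k!. The example PMF and the function names are hypothetical, and the approach assumes a PMF with finitely many nonzero values at k ≥ 0.

```python
from math import factorial

def poly_derivative(coeffs):
    """Coefficients of the derivative of sum(coeffs[k] * z**k)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def poly_eval(coeffs, z):
    """Evaluate sum(coeffs[k] * z**k) at the point z."""
    return sum(c * z ** k for k, c in enumerate(coeffs))

def probability_from_pgf(pgf_coeffs, k):
    """Recover Pr[K = k] as (1/k!) times the k-th derivative of the PGF at z = 0."""
    derivative = list(pgf_coeffs)
    for _ in range(k):
        derivative = poly_derivative(derivative)
    return poly_eval(derivative, 0.0) / factorial(k)

# Hypothetical PMF on k = 0, 1, 2, 3; these values are also the PGF coefficients.
pmf = [0.1, 0.2, 0.3, 0.4]

recovered = [probability_from_pgf(pmf, k) for k in range(len(pmf))]
print("recovered probabilities:", recovered)
print("PGF evaluated at z = 1 :", poly_eval(pmf, 1.0))
```

Each recovered value matches the corresponding PMF entry up to rounding, and the PGF evaluated at z = 1 returns the total probability.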
The method for generating moments can be seen clearly by using the first form of the definition in (4.21), i.e.,
\[ G_K(z) = E\{z^K\} \]
The derivative of this expression is¹¹
\[ \frac{dG_K(z)}{dz} = E\{K z^{K-1}\} \]
If this is evaluated at z=1, the term $z^{K-1}$ goes away and leaves the formula
\[ E\{K\} = \left.\frac{dG_K(z)}{dz}\right|_{z=1} \qquad (4.23) \]
This is the mean of the discrete random variable. To generate higher-order moments, we repeat the process. For example,
\[ \left.\frac{d^2G_K(z)}{dz^2}\right|_{z=1} = \left. E\{K(K-1)\, z^{K-2}\} \right|_{z=1} = E\{K^2\} - E\{K\} \]
While this result is not as “clean” as the corresponding result for the MGF, we can use the last two equations to express the second moment as
\[ E\{K^2\} = \left[\frac{d^2G_K(z)}{dz^2} + \frac{dG_K(z)}{dz}\right]_{z=1} \qquad (4.24) \]
Table 4.3 summarizes the results for computing the first four moments of a discrete random variable using the PGF. The primes in the table denote derivatives.
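The first two moment formulas can be checked on a small PMF. The sketch below evaluates the derivatives of the PGF at z = 1 term by term (each term of the sum contributes k(k − 1)···(k − n + 1) times its probability) and compares the results with moments computed directly from the PMF. The example PMF and the function name are hypothetical.

```python
def pgf_derivative_at_one(pmf, order):
    """order-th derivative of G(z) = sum_k pmf[k] * z**k, evaluated at z = 1.

    Differentiating z**k a total of `order` times and setting z = 1 leaves the
    factor k * (k - 1) * ... * (k - order + 1).
    """
    total = 0.0
    for k, prob in pmf.items():
        term = prob
        for j in range(order):
            term *= (k - j)
        total += term
    return total

pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # hypothetical PMF

g1 = pgf_derivative_at_one(pmf, 1)  # G'(1)
g2 = pgf_derivative_at_one(pmf, 2)  # G''(1)

mean_from_pgf = g1                   # as in (4.23)
second_moment_from_pgf = g2 + g1     # as in (4.24)

mean_direct = sum(k * prob for k, prob in pmf.items())
second_moment_direct = sum(k ** 2 * prob for k, prob in pmf.items())

print("mean          :", mean_from_pgf, "vs", mean_direct)
print("second moment :", second_moment_from_pgf, "vs", second_moment_direct)
```

The same term-by-term evaluation works when the PMF has mass at negative integers, which is consistent with using the two-sided transform for moment calculations.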