

Figure 2.8 Digital communication

2.5.2 Measuring information and coding

The study of information theory is fundamental to understanding the trade-offs in the design of efficient communication systems. The basic theory was developed by Shannon [2, 3, 4] and others in the late 1940s and 1950s and provides fundamental results about what a given communication system can or cannot do. This section provides just a taste of the results, which are based on a knowledge of basic probability.

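Consider the digital communication system depicted in Fig. 2.8 and let the events $A_1$ and $A_2$ represent the transmission of two codes representing the symbols 0 and 1. To be specific, assume that the two symbols are sent over the channel with the following probabilities: $\Pr[A_1] = 1/8$ and $\Pr[A_2] = 7/8$. The information associated with the event $A_i$ is defined as

$$I(A_i) = \log \frac{1}{\Pr[A_i]} = -\log \Pr[A_i]$$

The logarithm here is taken with respect to the base 2 and the resulting information is expressed in bits.² The information for each of the two symbols is thus

$$I(A_1) = -\log_2 \tfrac{1}{8} = 3 \text{ bits}, \qquad I(A_2) = -\log_2 \tfrac{7}{8} = 0.193 \text{ bits}$$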

Observe that events with lower probability have higher information. This corresponds to intuition. Someone telling you about an event that almost always happens provides little information. On the other hand, someone telling you about the occurrence of a very rare event provides you with much more information. The news media works on this principle in deciding what news to report and thus tries to maximize information.

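The average information³ $H$ is given by the weighted sum

$$H = \sum_{i=1}^{2} \Pr[A_i]\, I(A_i) = \tfrac{1}{8}(3) + \tfrac{7}{8}(0.193) = 0.544 \text{ bits}$$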

Notice that this average information is less than one bit, although it is not possible to transmit two symbols with less than one binary digit (bit).
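The two quantities above can be checked numerically. The following is a minimal sketch, assuming the symbol probabilities are 1/8 and 7/8 (values consistent with the 3-bit information of the symbol 0 and the 0.544-bit average quoted in this section); the function names `information` and `average_information` are illustrative and not part of the text.

```python
import math

def information(p):
    """Information, in bits, of an event with probability p (0 < p <= 1)."""
    return -math.log2(p)

def average_information(probs):
    """Weighted sum of the information of each event (the source entropy)."""
    return sum(p * information(p) for p in probs)

# Assumed symbol probabilities: Pr[A1] = 1/8 (symbol 0), Pr[A2] = 7/8 (symbol 1)
probs = [1 / 8, 7 / 8]

info = [information(p) for p in probs]
H = average_information(probs)

print("information per symbol (bits):", [round(i, 3) for i in info])
print("average information H (bits): ", round(H, 3))
```

The rarer symbol carries more information, but it contributes to the average only in proportion to how often it occurs, which is why H falls below one bit.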

² Other less common choices for the base of the logarithm are 10, in which case the units of information are Hartleys, and e, in which case the units are called nats.

³ Average information is also known as the source entropy and is discussed further in Chapter 4.

Now consider the following scheme. Starting anywhere in the sequence, group together two consecutive bits and assign this pair to one of four possible codewords. Let the corresponding events (or codewords) be denoted by $B_j$ for $j = 1, 2, 3, 4$ as shown in the table below and assume that two consecutive symbols are independent.

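| codeword | symbols | probability | information (bits) |
|----------|---------|-------------|--------------------|
| B1       | 00      | 1/64        | 6                  |
| B2       | 01      | 7/64        | 3.193              |
| B3       | 10      | 7/64        | 3.193              |
| B4       | 11      | 49/64       | 0.386              |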

Notice that since the probabilities of two consecutive symbols multiply, the corresponding information adds. For example, the symbol 0 by itself has information of 3 bits, while the pair of symbols 00 shown in the table has information of 6 bits. The average information for this scheme is given by the weighted sum

$$H = \sum_{j=1}^{4} \Pr[B_j]\, I(B_j) = \tfrac{1}{64}(6) + \tfrac{7}{64}(3.193) + \tfrac{7}{64}(3.193) + \tfrac{49}{64}(0.386) \approx 1.087 \text{ bits}$$

The average information per bit is still 1.087/2=0.544.

The previous analysis does not result in any practical savings since the average information H is still more than one bit, and therefore in a binary communication system it will still require a minimum of two bits to send the four codewords. A continuation of this procedure using larger groups of binary symbols mapped to codewords does, however, lead to some efficiency. The table below lists the average information with increasing numbers of symbols per codeword.

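| No. symbols | Avg. Information | Avg. Inf. / bit |
|-------------|------------------|-----------------|
| 2           | 1.087            | 0.544           |
| 3           | 1.631            | 0.544           |
| 4           | 2.174            | 0.544           |
| 5           | 2.718            | 0.544           |
| 8           | 4.349            | 0.544           |
| 10          | 5.436            | 0.544           |
| 12          | 6.523            | 0.544           |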
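The entries of this table can be regenerated by enumerating the codewords for each group size. The sketch below assumes independent symbols with Pr[0] = 1/8 and groups codewords by their number of zeros; `block_entropy` is a hypothetical helper name, and `math.comb` requires Python 3.8 or later.

```python
import math

def block_entropy(n, p0=1 / 8):
    """Average information (bits) of a codeword of n independent symbols.

    Codewords with the same number of zeros k share the same probability,
    and there are C(n, k) of them, so the sum runs over k only.
    """
    total = 0.0
    for k in range(n + 1):
        prob = p0 ** k * (1 - p0) ** (n - k)
        total += math.comb(n, k) * prob * -math.log2(prob)
    return total

for n in (2, 3, 4, 5, 8, 10, 12):
    h = block_entropy(n)
    # math.ceil(h) is the smallest whole number of binary digits that is
    # not less than the average information of the group
    print(n, round(h, 3), round(h / n, 3), math.ceil(h))
```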

From this table, it is seen that when three symbols are grouped together, the average information is 1.631 bits. It would therefore seem that only two binary digits should be theoretically required to transmit the codewords, since the information I=1.631 is less than 2 bits. Likewise, when 12 symbols are grouped together, it should require no more than 7 binary digits on average to code the message (I=6.523<7). The question of how to achieve such efficiency in practice has led to various coding algorithms such as Huffman coding and Shannon-Fano coding. The basic idea is to use variable-length codes and assign fewer binary digits to codewords that occur more frequently. This reduces the average number of bits that are needed to transmit the message. The example below illustrates the technique using the Shannon-Fano algorithm.

Example 2.14: It is desired to code the message “ELECTRICAL ENGINEERING” in an efficient manner using Shannon-Fano coding. The probabilities of the letters (excluding the space) are represented by their relative frequency of occurrence in the message. The letters {E, L, C, T, R, I, A, N, G} are thus assigned the probabilities {5/21, 2/21, 2/21, 1/21, 2/21, 3/21, 1/21, 3/21, 2/21}.

The letters are first arranged in order of decreasing probability; any ties are broken arbitrarily. The letters are then partitioned into two groups of approximately equal probability (as closely as possible). This is indicated by the partition labeled 1. Those letters in the first group are assigned a codeword beginning with 0 while those in the second group are assigned a codeword beginning with 1.

Within each group, this procedure is repeated recursively to determine the second, third, and fourth digit of the codeword.

The final result is as shown below:

An inherent and necessary property (for decoding) of any variable-length coding scheme is that no codeword is a prefix of any longer codeword. Thus, for example, upon finding the sequence 011, we can uniquely determine that this sequence corresponds to the letter N, since there is no codeword of length 4 that has 011 as its first three binary digits.

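Now consider the efficiency achieved by this coding scheme. The average number of bits used in coding of the message is the sum of the lengths of the codewords weighted by the probability of the codeword. For this example, the average length is given by (see final figure)

$$\bar{L} = \sum_{k} \Pr[k]\, \ell_k = \frac{64}{21} \approx 3.05 \text{ bits}$$

where $\ell_k$ is the number of binary digits in the codeword for letter $k$.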

On the other hand, if a fixed-length coding scheme were used then the length of each codeword would be 4 bits. (Since there are nine letters, three bits are insufficient and four bits are needed to code each letter.) Thus the variable-length coding scheme, which is based on estimating the average information in the message, reduces the communication traffic by about 24%.
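The procedure of Example 2.14 can be sketched in code. This is one possible implementation, not necessarily the exact partitioning shown in the example's figures: ties in the ordering are kept in order of first appearance, and ties between equally balanced partitions are resolved toward the later split point. Both choices are assumptions made here, since the text allows ties to be broken arbitrarily. The names `shannon_fano` and `is_prefix_free` are illustrative.

```python
from collections import Counter

def shannon_fano(freqs):
    """Return {symbol: codeword} for a mapping of symbol frequencies."""
    # Decreasing frequency; the sort is stable, so tied symbols keep
    # their order of first appearance (an arbitrary tie-break).
    items = sorted(freqs.items(), key=lambda kv: -kv[1])
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) < 2:
            return
        total = sum(f for _, f in group)
        running, best_i, best_diff = 0, 1, None
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)  # imbalance between the two parts
            # "<=" resolves equally balanced splits toward the later point
            if best_diff is None or diff <= best_diff:
                best_i, best_diff = i, diff
        for sym, _ in group[:best_i]:
            codes[sym] += "0"
        for sym, _ in group[best_i:]:
            codes[sym] += "1"
        split(group[:best_i])
        split(group[best_i:])

    split(items)
    return codes

def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    words = list(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)

message = "ELECTRICAL ENGINEERING".replace(" ", "")
freqs = Counter(message)
codes = shannon_fano(freqs)

# Average codeword length, weighted by relative frequency in the message
avg_len = sum(freqs[s] * len(codes[s]) for s in codes) / len(message)

for sym in sorted(codes, key=lambda s: (len(codes[s]), codes[s])):
    print(sym, freqs[sym], codes[sym])
print("prefix-free:", is_prefix_free(codes.values()))
print("average length:", round(avg_len, 3), "bits")
print("saving vs. 4-bit fixed length:", round(1 - avg_len / 4, 3))
```

With these tie-breaking choices the letter N receives the codeword 011, as in the example, and the average length is 64/21, or about 3.05 bits. Other tie-breaking choices can give different individual codewords.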

2.6 Summary

The study of probability deals with the occurrence of random “events.” Such events occur as outcomes or collections of outcomes from an experiment. The complete set of outcomes from an experiment comprises the sample space. These outcomes must be mutually exclusive and collectively exhaustive and be of finest grain in representing the conditions of the experiment. Events are defined in the sample space.

The algebra of events is a form of set algebra that provides rules for describing arbitrarily complex combinations of events in an unambiguous way. Venn diagrams are useful as a complementary geometric method for depicting relationships among events.

Probability is a number between 0 and 1 assigned to an event. Several rules and formulae allow you to compute the probabilities of intersections, unions, and other more complicated expressions in the algebra of events when you know the probabilities of the other events in the expression. An important special rule applies to independent events: to compute the probability of the combined event, we simply multiply the individual probabilities.

Conditional probability is necessary when events are not independent. The formulas developed in this case provide means for computing joint probabilities of sets of events that depend on each other in some way. These conditional probabilities are also useful in developing a “tree diagram” from the facts given in a problem, and an associated representation of the sample space. Bayes’ rule is an especially important use of conditional probability since it allows you to “work backward” and compute the probability of unknown events from related observed events. Bayes’ rule forms the basis for methods of “statistical inference” and finds much use (as we shall see later) in engineering problems.

A number of examples and applications of the theory are discussed in this chapter. These are important to see how the theory applies in specific situations. The applications also illustrate some well-established models, such as the binary communication channel, which are important to electrical and computer engineering.

References

[1] Alvin W. Drake. Fundamentals of Applied Probability Theory. McGraw-Hill, New York, 1967.

[2] Claude E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–422, July 1948. (See also [4].)

[3] Claude E. Shannon. A mathematical theory of communication (concluded). Bell System Technical Journal, 27(4):623–656, October 1948. (See also [4].)

[4] Claude E. Shannon and Warren Weaver. The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL, 1963.

Problems