Chapter 1 Introduction

(1)

Chapter 1 Introduction

1. Shannon’s Information Theory 2. Source Coding theorem

3. Channel Coding Theory

4. Information Capacity Theorem 5. Introduction to Error Control

Coding

Appendix A : Historical Notes

(2)

1. Shannon's Information Theory

The theory provides answers to two fundamental questions (among others):

(a)What is the irreducible complexity below which a signal cannot be compressed?

(b)What is the ultimate transmission rate for reliable communication over a noisy channel?

2. Source Coding Theorem (Shannon's first theorem)

The theorem can be stated as follows:

Given a discrete memoryless source of entropy ^H ^(S ⁾ , the average code-word length ^L for any distortionless source coding is bounded as

)

(S

H

L ≥

(3)

§ This theorem provides the mathematical tool for assessing data compaction, i.e.

lossless data compression, of data generated by a discrete memoryless source.

§ The entropy of a source is a function of the probabilities of the source symbols that constitute the alphabet of the source.

§ Entropy of Discrete Memoryless Source Assume that the source output is modeled as a discrete random variable,

S

, which takes on symbols from a fixed finite alphabet

} { s

₀

, s

₁

, , s

_K₋₁

= L

S

With probabilities

∑

=

−

=

^K^-¹

0 k

k k

k

) p , k 0,1,2, , K 1 p 1 s

P(S L with

Define the amount of information gain

after observing the event ^S ⁼ ^s

^k

as the

logarithmic function

(4)

bits p ) ( 1 log )

I(s

k 2

k

=

the entropy of the source is defined as the mean of ^I ⁽ ^s

^k

⁾ over source alphabet

S given by

∑

−

=

−

=

1 K

0

k k

2 k 1 K

0 k

k k

k

p log 1

p

s I p

s I E H

bits )

( ) (

)]

( [ )

(S

§ The entropy is a measure of the average information content per source symbol.

§ The source coding theorem is also

known as the "noiseless coding

theorem" in the sense that it establishes

the condition for error-free encoding to

be possible.

(5)

3. Channel Coding Theorem (Shannon's 2

^nd

theorem)

§ The channel coding theorem for a discrete memoryless channel is stated in two parts as follows:

(a)Let a discrete memoryless source with an alphabet

^S

have entropy ^H ^(S ⁾ and produce symbols once every ^T

^S

seconds. Let a discrete memoryless channel have capacity ^C and be used once every

TC

seconds. Then if

C

S

T

C T

H (S ) ≤

There exists a coding scheme for

which the source output can be

transmitted over the channel and be

reconstructed with an arbitrarily

small probability of error.

(6)

(b) Conversely, if

C

S

T

C T

H (S ) >

It is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error.

§ The theorem specifies the channel capacity ^C as a fundamental limit on the rate at which the transmission of reliable error-free message can take place over a discrete memoryless channel.

§ Mutual Information

Channel

x y

X

alphabet alphabet Y

Channel input x (selected from ^X ) with entropy H . ^X ⁼ ^{ ^x

⁰

^, ^x

¹

^,..., ^x

^J⁻¹

^}

Channel output y (selected from

^Y

) ,

} ,...,

,

{

₀ ₁ ₋₁

= y y y

_K

Y

How can we measure the uncertainty

about x after observing y?

(7)

Define the conditional entropy of X selected from ^X , given that

^Y ⁼ ^y^k

as

∑

⁻

=

= ^J ¹

0

j j k

2 k

j

k P x y

log 1 y

x P y

Y

H ]

)

| [ (

)

| ( )

| ( X

The mean of entropy ^{H X} ⁽ ^| ^y

^k

⁾ over the output alphabet

^Y

is given by

∑ ∑

∑

=

−

=

−

=

1 - K

0 k

1 J

0

j j k

2 k

j 1

K

0 k

k k

y x P log 1 y

, x P

y P y Y H

H

)]

| [ (

) (

) ( )

| ( )

Y

|

(X X

The conditional entropy represents the amount of uncertainty remaining the channel input after the channel output has been observed.

The difference ^H ⁽ ^X ⁾ ⁻ ^H ⁽ ^X ^| ^Y ⁾ is called the mutual information, which represents the uncertainty about the channel input that is resolved by observing the channel output.

Denoting the mutual information by

)

; ( X Y

I , we may thus write

)

| ( )

( )

;

( X Y H X H X Y

I = −

Note that ^I ⁽ ^X ^; ^Y ⁾ ⁼ ^I ⁽ ^Y ^; ^X ⁾ .

(8)

§ Channel Capacity

The mutual information can be expressed by

∑∑

=

−

=

^K^-¹

0 k

1 J

0

j k

k j 2

k

j

P y

y x log P

y , x P

I ]

) (

)

| [ (

) (

)

; ( X Y

The input probability distribution

)}

(

{ P x

_j

is obviously independent of the channel. We can then maximize ^I ⁽ ^X ^; ^Y ⁾ with respect to

^{^P⁽^x^j^)}

Hence we define the channel capacity

C as

)

;

)}

(

{

max I X Y

C

xj

=

p

The channel capacity ^C is measured in bits per channel use.

§ The Channel Coding Theorem is also

known as “noisy coding theorem”.

(9)

4. Information Capacity Theorem

(also known as Shannon-Hartley law or Shannon's 3

^rd

theorem)

It can be stated as follows:

The information capacity of a continuous channel of bandwidth ^B Hz, perturbed by additive white Gaussian noise of power spectral density ^N ₂

⁰

and limited in bandwidth to ^B , is given by

d bits/secon

)

( N B

1 P log B C

0

2

+

=

where ^P is the average transmitted power.

This theorem implies that, for given

average transmitted power ^P and

channel bandwidth ^B , we can transmit

information at the rate ^C bits per

second, with arbitrarily small

probability of error by employing

sufficiently complex encoding systems.

(10)

Shannon limit:

For an ideal system that transmits data at a rate ^R

^b

⁼ ^C

Thus ^P ⁼ ^E

^b

^• ^R

^b

⁼ ^E

^b

^• ^C Then ^C _B ^log

²

⁽ ¹ _N ^E ^C _B ⁾

o b

• +

=

B N c

E

^C^B

o

b

= 2 − 1

∴

For infinite bandwidth, the ^{E /}

^b

^N

^o

approaches the limiting value

N dB E N

E

o b o B

b

) lim ( ) ln 2 0 . 693 1 . 6

( = = = = −

∞

∞ →

This value is called Shannon limit.

(11)

Appendix A. Historical Notes

§ The seeds of Shannon's information theory appeared first in a Bell Labs classified memorandum dated 1 September 1945, “A Mathematical Theory of Cryptography” (revised form was published in BSTJ, 1949.)

§ Shannon's work on cryptography problem at Bell Labs brought him into contact with two scholars, R.V.L.

Hartley and H. Nyquist, who were major consultants to the cryptography research.

§ Nyquist (1924) has shown that a certain bandwidth was necessary in order to send telegraph signals at a definite rate.

§ Hartley (1928) had attempted to

quantify of information as the logarithm

of the number of possible message built

from a pool of symbols.

(12)

§ Shannon was aware of Norbert Wiener's work on cybernetics (he cited Wiener's book in his two 1948 articles on information theory). Wiener had recognized that the communication of information was a problem in statistics.

§ Shannon had taken a mathematics course from Norbert Wiener when he was studying at MIT, and Shannon had access to the "Yellow Peril" report (Wiener's book before publication) while he worked on cryptographic research.

Major Publications of Shannon

§ A Mathematical Theory of Communication, I, II, published in BSTJ, 1948, 379-423, 623-656.

§ Communication Theory of Secrecy

Systems, published in BSTJ, 1949,

656-715.

(13)

§ Paperback edition of "The Mathematics Theory of Communication" was published by University of Illinois Press, 1963. 32,000 copies have been sold up to 1964.

§ In a review of the accomplishments of Bell Labs. in communication science, Millman in his edited book (1984), “A history of engineering and science in the Bell system: Communication Science (1925-1980)” says: “Probably the most specular development in communication mathematics to take place at Bell Laboratories was the formulation in the 1940's of information theory by C. E.

Shannon.”

§ In his 1948 articles on information

theory, Shannon credits J.W. Tukey (a

professor at Princeton Univ.) with

suggesting the word "bit" as a measure

of information.

(14)

§ In 1984 J.L. Massey wrote a paper

“Information Theory: the Copernican System of Communications.” Published in IEEE Communications Magazine, vol.

22, No. 12, pp. 26-28.

The Shannon’s theory of information is the scientific basis of communications in the same sense that the Copernicus’

heliocentric theory is the scientific basis of astronomy.

§ IEEE Transaction on Information Theory Oct. 1998. Vol.44 No.6.

Commemorate Issue (1948-1998)

(15)

5. Introduction to Error Control Coding 5.1 Purpose of Error Control Coding

§ In data communications, coding is used for controlling transmission errors induced by channel noise or other impairments, such as fading and interference, so that error-free communication can be achieved.

§ In data storage system, coding is used for controlling storage errors (during retrieval) caused by storage medium defects, dust particles and radiation so that error-free storage can be achieved.

Information source

Source encoder

Channel encoder

Modulation (writing unit)

Transmission channel (storage medium)

Demodulator (read unit) Channel

decoder

Destination Source

decoder

noise interference

(16)

5.2 Coding Principle

§ Coding is achieved by adding properly designed redundant digits (bits) to each message. These redundant digits (bits) are used for detecting and/or correcting transmission (or storage) errors.

5.3 Types of Codings

§ Block and Convolutional coding.

§ Block coding:

A message of k digits is mapped into a structured sequence of n digits, called a codeword.

k digits n digits

rate code n:

k

length code

: n

length message

: k

The mapping operation is called encoding. Each encoding operation is independent of the past encoding, i.e.

memoryless. The collection of all

codeword is called a "block code".

(17)

§ Convolutional Coding:

« An information sequence is divided into (short) blocks of k digits each.

Each k-digit message is encoded into an n-digit coded block. The n-digit coded block depends not only the corresponding k-digit message block but also on m (

^≥¹

) previous message blocks. That is, the encoder has memory of order m.

« The encoder has k inputs and n outputs.

« An information is encoded into a coded sequence. The collection of all possible code sequences is called an (n, k, m) convolutional code.

« Normally, ¹ ₂ ^≤ _≤ ^k _n _≤ ^≤ ⁸ ₉ _n ^k ^: ^code ^rate

(18)

5.4 Type of Errors and Channels

§ Random errors and burst errors.

Type of Channels

§ Random error channels:

Deep space channel, satellite channels, line of sight transmission channel, etc.

§ Burst error channels:

Radio links, terrestrial microwave links,

wire and cable transmission channels,

etc.

(19)

5.5 Decoding

Suppose a codeword corresponding to a message is transmitted over a noisy channel.

§ Let ^r be the received sequence. Based on ^r , the encoding rules and the noise characteristics of the channel, the receiver (or decoder) makes a decision which message was actually transmitted.

This decision making operation is called

“decoding”. The device which performing the decoding operation is called a “decoder”.

§ Two types of decoding:

Hard-decision decoding and soft-decision decoding.

§ Hard-decision decoding:

When binary coding is used, the

modulator has only binary inputs. If

binary demodulator output quantization

(20)

is used, the decoder has only binary inputs. In this case, the demodulator is said to make hard decisions.

Decoding based on hard decision made by the demodulator is called “hard decision decoding”.

§ Soft-decision decoding:

If the output of demodulator consists of more than two quantization levels or is left unquantized, the demodulator is said to make soft decisions.

Decoding based on soft decision made by demodulator is called soft-decision decoding.

§ Hard-decision decoding is much easier

to implement than soft-decision

decoding. However, soft-decision

decoding offers much better

performance.

(21)

5.6 Some Channel Models

§ Binary Symmetric Channel (BSC)

1-p

p p

0 0

1 1

Input Output

p: transition probability

When BPSK modulation is used over the AWGN channel with optimum coherent detection and binary output quantization, the bit-error probability for equally likely signal is given by

) (

N

0

Q 2E p =

∫

^{∞ −}

≡

x

2 y

dy 2p e

Q(x) 1

2

where

E : bit energy

2 N

₀

: power spectral density of AWGN

(22)

§ Binary-input, 8-ary Output Discrete Channel

0

1

Input Output

2 3 4 5 6 7

5.7 Optimum Decoding

§ Suppose the codeword ^c corresponding to a certain message

^m

is transmitted. Let ^r be the corresponding output of the demodulator.

§ The decoder produces an estimate ^m ^ˆ of the message based on ^r .

§ An optimum decoding rule is that minimize the probability of a decoding error. That is, ^P ⁽ ^c ^ˆ ^≠ ^c ^| ^r ⁾ is minimized.

Or, equivalently, maximizing ^P ⁽ ^c ^ˆ ⁼ ^c ^| ^r ⁾ .

(23)

§ The decoding error is minimized for a given ^r by choosing ^cˆ to be a codeword ^c that maximizes

) (

) ( ) ) (

( P r

c P c

| r r P

| c

P =

That is, ^cˆ is chosen to be the most likely codeword, given that ^r is received.

5.8 Maximum a posteriori decoding (MAP decoding)

§ In general, we have

) (

) ( ) ) (

( P r

c P c

| r r P

| c

P =

If knowledge or an estimation of

^P^(c⁾

is

used for decoding, the technique is

called MAP decoding.

(24)

5.9 Maximum Likelihood Decoding (MLD)

§ Suppose all the messages are equally likely. An optimum decoding can be done as follows:

For every codeword ^c

^j

, compute the conditional probability

) (r |c_j P

The codeword ^c

^j

with the largest conditional probability is chosen as the estimate ^cˆ for the transmitted codeword ^c .

This decoding rule is called the

Maximum Likelihood decoding

(MLD).

(25)

§ MLD for a BSC

Let ^a ⁼ ⁽ ^a

¹

^, ^a

²

^, ^L ^, ^a

ⁿ

⁾ and ^b ⁼ ⁽ ^b

¹

^, ^b

²

^, ^L ^, ^b

ⁿ

⁾ be two binary sequences of ⁿ components.

The Hamming distance between

^a

and

b , denoted as ^d

^H

⁽ ^a ^, ^b ⁾ , is defined as the number of places where

^a

and ^b differ.

In coding for a BSC, every codeword and every received sequence are binary sequences.

Suppose some codeword is transmitted and the received sequence is

) ( r

₁

, r

₂

, , r

_n

r = L

For a codeword

^cⁱ

, the conditional probability ^P ⁽ ^r ^| ^c

ⁱ

⁾ is

) ( )

(

( )

)

( r | c

_i

p

^d^H ^r^,^cⁱ

1 - p

ⁿ^-^d^H ^r^,^cⁱ

P =

For ^p ^< ₂ ¹ ,

^P⁽^r ^|^cⁱ⁾

is a monotonically decreasing function of ^d

^H

⁽ ^r ^, ^c

ⁱ

⁾ .

Then ^P ⁽ ^r ^| ^c

ⁱ

⁾ ^> ^P ⁽ ^r ^| ^c

^j

⁾ if and only if

) (

)

(

_i _H _j

H

r , c d r , c

d <

(26)

The MLD is completed by the following steps:

(i) Compute ^d

^H

⁽ ^r ^, ^c

ⁱ

⁾ for all ^c

ⁱ

(ii) ^c

ⁱ

is taken as the transmitted codeword if ^d

^H

⁽ ^r ^, ^c

ⁱ

⁾ ^< ^d

^H

⁽ ^r ^, ^c

^j

⁾ for ^j ^≠ ⁱ

(iii) Decode

^cⁱ

into message

^mⁱ

That is, the received vector ^r is decoded into the closest codeword.

This is also called the minimum distance

(nearest neighbor) decoding.

(27)

Chapter 1 Introduction