Complexity concepts

II.2 Computational complexity measurement

II.2.3 Information based complexity measures

II.2.3.1 Shannon’s entropy based measures

Beginning with Shannon’s fundamental entropy measure, the role of information theory was at first very narrow. It was a subset of communication theory whose main purpose was to answer two fundamental questions:

1. What is the ultimate data compression that can be applied to a signal?

2. What is the ultimate transmission rate of signals on a wire?

However, the mathematical techniques pioneered by Shannon developed so fruitfully that they were applied in various fields of investigation, including the theory of complexity.

Entropy may be used to measure the complexity of classification tasks because, according to the theory, it characterizes probabilistic models.

It is defined as follows.

Let x be a discrete variable that may take l possible values {x_1, ..., x_l}, indexed by i = 1, ..., l, where l is the maximal number of possible values. A probability p_i is assigned to the i-th value x_i of x. Shannon’s entropy then measures complexity as follows:

$$H(x) = -\sum_{i=1}^{l} p_i \log_2 p_i$$

The entropy varies from 0 to log2(l). A convention is needed for the cases where p_i = 0: the corresponding term p_i log2(p_i) is set equal to 0. For a uniform distribution the entropy is maximal, H(x) = log2(l), so the greater the number of possible states, the greater the maximal entropy. In the classical works of information theory, entropy signifies the average amount of information required to select observations by categories (Krippendorff 1986).

Entropy may be standardized to range from 0 to 1 by dividing it by its maximum, log2(l). This makes it easier to compare the amount of disorder of two systems when one system has more possible states than the other.
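As an illustration, the entropy and its standardized form can be sketched in a few lines of Python. This is a minimal sketch, not part of the original work: the function names `shannon_entropy` and `normalized_entropy` and the three sample distributions are assumptions chosen for the example, and terms with zero probability are simply skipped so that they contribute 0.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution.

    Terms with p == 0 are skipped, i.e. they contribute 0 by convention.
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)

def normalized_entropy(probs):
    """Entropy divided by its maximum log2(l), where l = len(probs)."""
    l = len(probs)
    if l < 2:
        return 0.0  # a single-state variable carries no uncertainty
    return shannon_entropy(probs) / math.log2(l)

# Hypothetical distributions over l = 4 states.
uniform = [0.25, 0.25, 0.25, 0.25]  # maximal disorder
skewed = [0.7, 0.1, 0.1, 0.1]       # one state dominates
certain = [1.0, 0.0, 0.0, 0.0]      # the state is known in advance

for name, dist in [("uniform", uniform), ("skewed", skewed), ("certain", certain)]:
    print(name, shannon_entropy(dist), normalized_entropy(dist))
```

The uniform case reaches the maximum log2(4) = 2 bits, the certain case gives 0, and reordering the entries of any list leaves the result unchanged up to floating-point rounding.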

A nice property of Shannon’s entropy is that the categories of a variable may be permuted without changing its value; only the relative frequencies matter. This is why this measure is said to be content-free. It does not make any assumptions about the distribution of the data; it thus belongs to the nonparametric family of statistical methods.

tel-00481367, version 1 - 6 May 2010

Entropy is interpreted in many different ways. It is a measure of the uncertainty tied to the observed system: the lower the entropy, the easier it is to predict the system's state, and conversely. It may also be interpreted as a measure of the disorder of a system or, in a very similar fashion, as a measure of its variability. Here again, the lower the entropy, the more orderly the system, and conversely. An important note is that we must take into account the source or analysis tool that provides the distribution p_i, which describes the system in terms of the variables mentioned.

Shannon's entropy works very well for describing the order, uncertainty or variability of a single variable, but when we deal with more than one variable, the joint entropy described below is used.

There are various entropies when considering two or more variables together: the joint entropy, the mutual information and the conditional entropy.

When considering two discrete variables x and y at the same time, it is possible to measure the degree of uncertainty or information associated with them jointly. This is called the joint entropy, H(x,y). If x and y may take at most l1 and l2 possible values respectively, with joint probabilities p_ij, the joint entropy is computed as:

$$H(x,y) = -\sum_{i=1}^{l_1} \sum_{j=1}^{l_2} p_{ij} \log_2 p_{ij}$$

where p_ij represents the probability of being classified in both category i of variable x and category j of variable y. The joint entropy varies from a theoretical 0, or in practice from max{H(x),H(y)}, to log2(l1)+log2(l2). The relation between the individual entropies and their joint entropy is given by:

$$H(x,y) \le H(x) + H(y)$$

It expresses the fact that the joint entropy never exceeds the sum of the individual entropies. Let us underline that equality in equation III.3 holds only when the two variables are independent. In addition, despite the similar notation, joint entropy should not be confused with cross entropy.
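The behaviour of the joint entropy can be checked numerically. The following is a minimal sketch under stated assumptions: the joint distribution is given as a nested list with `table[i][j]` standing for p_ij, and the helper names and the two sample tables are hypothetical, introduced only for this example.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def marginals(table):
    """Marginal distributions of x (rows) and y (columns)."""
    p_x = [sum(row) for row in table]
    p_y = [sum(col) for col in zip(*table)]
    return p_x, p_y

def joint_entropy(table):
    """Entropy of the joint distribution, table[i][j] = p_ij."""
    return entropy(p for row in table for p in row)

# Hypothetical 2 x 2 joint distributions with identical marginals (0.5, 0.5).
independent = [[0.25, 0.25],
               [0.25, 0.25]]   # p_ij = p_i * p_j in every cell
dependent = [[0.4, 0.1],
             [0.1, 0.4]]       # x and y tend to agree

for name, table in [("independent", independent), ("dependent", dependent)]:
    p_x, p_y = marginals(table)
    print(name, joint_entropy(table), entropy(p_x) + entropy(p_y))
```

For the independent table the two printed values coincide, while for the dependent table the joint entropy is strictly smaller than the sum of the individual entropies.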

Information can be measured not only for two variables taken as a whole, but also for one variable given knowledge of the other. This is called the conditional entropy.

This type of entropy relies on conditional probabilities, also called transition probabilities. Suppose we want the conditional probability of state j of variable y given state i of variable x; this is written p(y|x). It is different from the joint probability p(x,y); the two are related by p(y|x) = p(x,y) / p(x).

The relationship between conditional entropy and joint entropy is as follows:

$$H(y|x) = H(x,y) - H(x)$$

The conditional entropy measures the uncertainty that remains about a variable once the state of the other variable is known. The lower the conditional entropy, the better an observer can predict the state of a variable, knowing the state of the other variable.

Contrary to the joint entropy, conditional entropy is not a symmetrical measure: in general, H(y|x) ≠ H(x|y). Conditioning on one variable or the other does not give the same result, because each variable has its own entropy, H(x) and H(y).
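The asymmetry can be made concrete with a small numerical sketch. It relies on the standard identities H(y|x) = H(x,y) - H(x) and H(x|y) = H(x,y) - H(y); the function name `conditional_entropies` and the sample table, in which `table[i][j]` stands for p_ij, are assumptions made for illustration only.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def conditional_entropies(table):
    """Return (H(y|x), H(x|y)) for a joint table with table[i][j] = p_ij."""
    p_x = [sum(row) for row in table]
    p_y = [sum(col) for col in zip(*table)]
    h_xy = entropy(p for row in table for p in row)
    return h_xy - entropy(p_x), h_xy - entropy(p_y)

# Hypothetical joint distribution: x has 2 states, y has 3 states.
# Knowing y pins down x exactly, but knowing x = 0 leaves two options for y.
table = [[0.25, 0.25, 0.0],
         [0.0, 0.0, 0.5]]

h_y_given_x, h_x_given_y = conditional_entropies(table)
print(h_y_given_x, h_x_given_y)  # prints 0.5 0.0
```

Here half a bit of uncertainty about y remains once x is known, whereas no uncertainty about x remains once y is known.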

The conditional entropy is the information particular to one variable, while the joint entropy is the total information carried by the two variables taken together.

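Another commonly used Shannon-based measure is mutual information. It measures the information shared by variables, that is, the quantity of information an observer finds in common between two (or more) variables. For two variables, the formula is given as

$$I(x;y) = \sum_{i=1}^{l_1} \sum_{j=1}^{l_2} p_{ij} \log_2 \frac{p_{ij}}{p_i \, p_j}$$

where p_i and p_j denote the marginal probabilities of category i of x and category j of y.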

There are many ways of expressing this formula. It may be expressed as a relation between the individual entropies and the joint entropy: the mutual information is the sum of the individual entropies minus the joint entropy:

$$I(x;y) = H(x) + H(y) - H(x,y)$$

In the case of three variables, the equation becomes:

$$I(x;y;z) = H(x) + H(y) + H(z) - H(x,y,z)$$

When two variables are independent, the sum of their individual entropies is equal to the joint entropy, so the mutual information is equal to zero.
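Both ways of obtaining the mutual information, and its vanishing under independence, can be verified with a short sketch. The function names and the two sample tables below are hypothetical; `table[i][j]` is assumed to hold the joint probability p_ij, and the marginals are computed from the table itself.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def mutual_information(table):
    """Direct double sum of p_ij * log2(p_ij / (p_i * p_j))."""
    p_x = [sum(row) for row in table]
    p_y = [sum(col) for col in zip(*table)]
    return sum(
        p * math.log2(p / (p_x[i] * p_y[j]))
        for i, row in enumerate(table)
        for j, p in enumerate(row)
        if p > 0
    )

def mutual_information_from_entropies(table):
    """Sum of the individual entropies minus the joint entropy."""
    p_x = [sum(row) for row in table]
    p_y = [sum(col) for col in zip(*table)]
    h_xy = entropy(p for row in table for p in row)
    return entropy(p_x) + entropy(p_y) - h_xy

# Hypothetical joint distributions.
dependent = [[0.4, 0.1],
             [0.1, 0.4]]
independent = [[0.18, 0.12],
               [0.42, 0.28]]   # outer product of (0.3, 0.7) and (0.6, 0.4)

for name, table in [("dependent", dependent), ("independent", independent)]:
    print(name, mutual_information(table), mutual_information_from_entropies(table))
```

The two functions agree up to floating-point rounding, and the independent table yields a value that is zero within that rounding.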

Mutual information is therefore the best measure of the proximity between variables. It was shown in (Lemay 1999) that mutual information is related to the likelihood ratio Λ by the following, where l here denotes the number of observations:

$$-2 \ln \Lambda = 2\, l \ln(2)\, I(x;y) \approx 1.3863\, l\, I(x;y)$$

This is an important fact, since it links information theory with the statistical use of probability theory. The greater the mutual information, the more similar the two variables are.
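The link between mutual information and the likelihood ratio can be checked numerically. The sketch below assumes that Λ is the likelihood ratio for the hypothesis of independence in a two-way contingency table, so that -2 ln Λ is the usual G statistic; this reading, the function names and the sample counts are illustrative assumptions and are not taken from (Lemay 1999).

```python
import math

def table_totals(counts):
    """Grand total, row totals and column totals of a contingency table."""
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    return sum(row_tot), row_tot, col_tot

def mutual_information_bits(counts):
    """Mutual information in bits estimated from observed counts."""
    n, row_tot, col_tot = table_totals(counts)
    return sum(
        (c / n) * math.log2(c * n / (row_tot[i] * col_tot[j]))
        for i, row in enumerate(counts)
        for j, c in enumerate(row)
        if c > 0
    )

def g_statistic(counts):
    """G = 2 * sum(O * ln(O / E)) with E = row total * column total / n."""
    n, row_tot, col_tot = table_totals(counts)
    return 2.0 * sum(
        c * math.log(c * n / (row_tot[i] * col_tot[j]))
        for i, row in enumerate(counts)
        for j, c in enumerate(row)
        if c > 0
    )

# Hypothetical contingency table of 100 observations.
counts = [[40, 10],
          [10, 40]]
n = 100

g = g_statistic(counts)
scaled_mi = 2.0 * n * math.log(2) * mutual_information_bits(counts)
print(g, scaled_mi)
```

The two printed numbers agree up to rounding: the factor 2 ln(2) ≈ 1.3863 only converts bits to natural-logarithm units and applies the usual doubling of the log-likelihood ratio.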

In document: Thesis presented for the degree of Doctor of the Université Paris-Est. Specialty: Computer Science. (Pages 67-70)