Kolmogorov complexity and data compression

$K(y) < (1-\varepsilon)|y|$, and let $A_n$ denote the event $x_n \in S$. Then by Lemma 8.4.2,

$$P(A_n) \le \sum_{y \in S} 2^{-n} < 2^{(1-\varepsilon)n+1} \cdot 2^{-n} = 2^{1-\varepsilon n},$$

and hence the sum $\sum_{n=1}^{\infty} P(A_n)$ is convergent. But then the Borel-Cantelli Lemma of probability theory implies that, with probability 1, only finitely many of the events $A_n$ occur. But this just means that $K(x_n)/n \to 1$. □

8.5 Kolmogorov complexity and data compression

Let $L \subseteq \Sigma_0^*$ be a recursive language and suppose that we want to find a short program, a “code”, only for the words in $L$. For each word $x$ in $L$, we are thus looking for a program $f(x) \in \{0,1\}^*$ printing it. We call the function $f : L \to \Sigma^*$ a Kolmogorov code of $L$. The conciseness of the code is the function

$$\eta(n) = \max\{\, |f(x)| : x \in L,\ |x| \le n \,\}.$$

We can easily get a lower bound on the conciseness of any Kolmogorov code of any language. Let $L_n$ denote the set of words of $L$ of length at most $n$. Then obviously,

$$\eta(n) \ge \log_2 |L_n|.$$

We call this estimate the information theoretical lower bound.
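One way to make the counting behind this estimate explicit is sketched below. It uses only the fact that distinct words of $L_n$ must receive distinct codes. The inequality it yields agrees with the stated bound up to an additive constant.

```latex
% There are 2^0 + 2^1 + ... + 2^k = 2^{k+1} - 1 binary words of length at most k.
% A code f is injective on L_n and maps it into words of length at most \eta(n), so
\[
  |L_n| \;\le\; \sum_{j=0}^{\eta(n)} 2^j \;=\; 2^{\eta(n)+1} - 1
  \quad\Longrightarrow\quad
  \eta(n) \;\ge\; \log_2\bigl(|L_n| + 1\bigr) - 1 .
\]
```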

This lower bound is sharp (to within an additive constant). We can code every word $x$ in $L$ simply by telling its serial number in the increasing ordering. If the word $x$ of length $n$ is the $t$-th element then this requires $\log_2 t \le \log_2 |L_n|$ bits, plus a constant number of additional bits (the program for taking the elements of $\Sigma^*$ in lexicographic order, checking their membership in $L$ and printing the $t$-th one).
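The serial-number code can be sketched as follows. This is a minimal illustration, not an efficient procedure: it enumerates all words, so it runs in exponential time. The names `words`, `encode`, `decode` and the sample language of palindromes are assumptions made for the example. The enumeration is by length first and lexicographically within each length, so that the words of $L_n$ receive the smallest serial numbers. Serial numbers start at 0 here.

```python
from itertools import count, product

def words(alphabet):
    """Yield all words over the alphabet, shorter words first, then lexicographically."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

def encode(x, member, alphabet="01"):
    """Binary serial number of x among the words accepted by `member`.
    Assumes member(x) is true; otherwise the loop never terminates."""
    t = 0
    for w in words(alphabet):
        if member(w):
            if w == x:
                return format(t, "b")
            t += 1

def decode(code, member, alphabet="01"):
    """Return the word of the language whose serial number is `code`.
    Assumes the language has more than int(code, 2) words."""
    t = int(code, 2)
    for w in words(alphabet):
        if member(w):
            if t == 0:
                return w
            t -= 1

def is_palindrome(w):
    """Hypothetical example of a decidable language: binary palindromes."""
    return w == w[::-1]
```

For instance, the palindromes in this order begin with the empty word, 0, 1, 00, 11, so `encode("11", is_palindrome)` yields the binary form of 4.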

We arrive at more interesting questions if we stipulate that the code from the word and, conversely, the word from the code should be polynomially computable. In other words: we are looking for a language $L'$ and two polynomially computable functions

$$f : L \to L', \qquad g : L' \to L$$

with $g \circ f = \mathrm{id}_L$ for which, for every $x$ in $L$, the code length $|f(x)|$ is “short” compared to $|x|$. Such a pair of functions is called a polynomial-time code. (Instead of the polynomial time bound we could, of course, consider other complexity restrictions.)

We present some examples where a polynomial-time code approaches the information-theoretical bound.

Example 8.5.1 In the proof of Lemma 8.4.4, for the coding of the 0-1 sequences of length $n$ with exactly $m$ 1’s, we used the simple coding in which the code of a sequence is the number giving its place in the lexicographic ordering. We will show that this coding is polynomial.

Let us view each 0-1 sequence as the obvious code of a subset of the $n$-element set $\{n-1, n-2, \ldots, 0\}$. Each such set can be written as $\{a_1, \ldots, a_m\}$ with $a_1 > a_2 > \cdots > a_m$. Then the set $\{b_1, \ldots, b_m\}$ precedes the set $\{a_1, \ldots, a_m\}$ lexicographically if and only if there is an $i$ such that $b_i < a_i$ while $a_j = b_j$ holds for all $j < i$. Let $\{a_1, \ldots, a_m\}$ be the lexicographically $t$-th set. Then, for a fixed $i$, the number of subsets $\{b_1, \ldots, b_m\}$ with this property is exactly $\binom{a_i}{m-i+1}$. Summing this for all $i$ we find that

$$t = 1 + \binom{a_1}{m} + \binom{a_2}{m-1} + \cdots + \binom{a_m}{1}. \tag{8.4}$$
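Formula (8.4) and its inversion can be sketched in a few lines. The function names `rank` and `unrank` are assumptions made for this illustration. `unrank` finds each $a_i$ greedily by binary search, choosing the greatest value whose binomial coefficient does not exceed the remaining amount. Sets are given as strictly decreasing lists.

```python
from math import comb

def rank(a):
    """Serial number t (starting at 1) of the set a_1 > a_2 > ... > a_m, by formula (8.4)."""
    m = len(a)
    # With 0-based index i, the term for a[i] is C(a[i], m - i).
    return 1 + sum(comb(a_i, m - i) for i, a_i in enumerate(a))

def unrank(t, n, m):
    """Inverse of rank: the t-th m-element subset of {0, ..., n-1}, for 1 <= t <= C(n, m)."""
    remainder = t - 1
    upper = n - 1                 # each chosen element is smaller than the previous one
    a = []
    for k in range(m, 0, -1):
        lo, hi = k - 1, upper     # C(k-1, k) = 0, so lo always satisfies the condition
        while lo < hi:            # binary search for the greatest x with C(x, k) <= remainder
            mid = (lo + hi + 1) // 2
            if comb(mid, k) <= remainder:
                lo = mid
            else:
                hi = mid - 1
        a.append(lo)
        remainder -= comb(lo, k)
        upper = lo - 1
    return a
```

For $n = 4$, $m = 2$ the six subsets come out in the order $\{1,0\}, \{2,0\}, \{2,1\}, \{3,0\}, \{3,1\}, \{3,2\}$, matching the lexicographic order of the sequences 0011, 0101, 0110, 1001, 1010, 1100.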

Given $a_1, \ldots, a_m$, the value of $t$ is easily computable in time polynomial in $n$. Conversely, if $t \le \binom{n}{m}$ is given then $t$ is easy to write in the above form: first we find, using binary search, the greatest natural number $a_1$ with $\binom{a_1}{m} \le t-1$, then the greatest number $a_2$ with $\binom{a_2}{m-1} \le t-1-\binom{a_1}{m}$, etc. We do this for $m$ steps. The numbers obtained this way satisfy $a_1 > a_2 > \cdots$; indeed, according to the definition of $a_1$ we have

$$\binom{a_1+1}{m} = \binom{a_1}{m} + \binom{a_1}{m-1} > t-1, \quad\text{and therefore}\quad \binom{a_1}{m-1} > t-1-\binom{a_1}{m},$$

implying $a_1 > a_2$. It follows similarly that $a_2 > a_3 > \cdots > a_m \ge 0$ and that there is no “remainder” after $m$ steps, i.e., that (8.4) holds. It can therefore be determined in polynomial time which subset is lexicographically the $t$-th.

Example 8.5.2 Consider trees, given by their adjacency matrices (but any other “reasonable” representation would also do). In such representations, the vertices of the tree have a given order, which we can also express by saying that the vertices of the tree are labeled by numbers from 0 to $n-1$. We consider two trees equal if whenever the nodes $i, j$ are connected in the first one they are also connected in the second one and vice versa (so, if we renumber the nodes of the tree then we may arrive at a different tree). Such trees are called labeled trees. Let us first see what the information-theoretical lower bound gives us, i.e., how many trees there are. The following classical result, called Cayley’s Theorem, applies here:

Theorem 8.5.1 The number of $n$-node labeled trees is $n^{n-2}$.

Consequently, by the information-theoretical lower bound, for any encoding of trees some $n$-node tree needs a code with length at least $\lceil \log(n^{n-2}) \rceil = \lceil (n-2)\log n \rceil$. But can this lower bound be achieved by a polynomial-time computable code?

(a) Coding trees by their adjacency matrices takes $n^2$ bits.

(b) We fare better if we specify each tree by enumerating its edges. Then we must give a “name” to each vertex; since there are $n$ vertices we can give to each one a 0-1 sequence of length $\lceil \log n \rceil$ as its name. We specify each edge by its two endnodes. In this way, the enumeration of the edges takes about $2(n-1)\log_2 n$ bits.

(c) We can save a factor of 2 in (b) if we distinguish a root in the tree, say the node 0, and we specify the tree by the sequence $(\alpha(1), \ldots, \alpha(n-1))$ in which $\alpha(i)$ is the first interior node on the path from node $i$ to the root (the “father” of $i$). This is $(n-1)\lceil \log n \rceil$ bits, which is already nearly optimal.

(d) There is, however, a procedure, the so-called Prüfer code, that sets up a bijection between the $n$-node labeled trees and the sequences of length $n-2$ of the numbers $0, \ldots, n-1$. (Thereby it also proves Cayley’s theorem.) Each such sequence can be considered the expression of a natural number in the base $n$ number system; in this way, we assign a “serial number” between 0 and $n^{n-2}-1$ to each $n$-node labeled tree. Expressing these serial numbers in the base two number system, we get a coding in which the code of each tree has length at most $\lceil (n-2)\log n \rceil$.
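To compare the four encodings numerically, the bit counts of (a) through (d) can be tabulated with a short helper. The function name `code_lengths` and the dictionary keys are assumptions made for this sketch. Vertex names are taken to have exactly $\lceil \log_2 n \rceil$ bits, and all values are computed with integer arithmetic to avoid rounding errors.

```python
def code_lengths(n):
    """Bit counts of the encodings (a)-(d) of labeled trees on n >= 2 nodes."""
    name_bits = (n - 1).bit_length()              # equals ceil(log2 n) for n >= 1
    return {
        "a_adjacency_matrix": n * n,
        "b_edge_list": 2 * (n - 1) * name_bits,
        "c_father_list": (n - 1) * name_bits,
        # ceil(log2(n^(n-2))): bits needed for serial numbers 0 .. n^(n-2) - 1
        "d_prufer": (n ** (n - 2) - 1).bit_length(),
    }
```

For $n = 1024$ this gives 1048576, 20460, 10230 and 10220 bits respectively, so (c) already comes within $\log_2 n$ bits of (d).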

The Prüfer code can be considered a refinement of the procedure (c). The idea is that we order the edges $[i, \alpha(i)]$ not by the magnitude of $i$ but a little differently. Let us define the permutation $(i_1, \ldots, i_n)$ as follows: let $i_1$ be the smallest endnode (leaf) of the tree; if $i_1, \ldots, i_k$ are already defined then let $i_{k+1}$ be the smallest endnode of the graph remaining after deleting the nodes $i_1, \ldots, i_k$. (We do not consider the root 0 an endnode.) Let $i_n = 0$. With the $i_k$’s thus defined, let us consider the sequence $(\alpha(i_1), \ldots, \alpha(i_{n-1}))$. The last element of this is 0 (the “father” of the node $i_{n-1}$ can only be $i_n$), so it is not interesting. We call the remaining sequence $(\alpha(i_1), \ldots, \alpha(i_{n-2}))$ the Prüfer code of the tree.
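The definition can be followed literally in code. The sketch below assumes the tree is given as a dictionary mapping each node $0, \ldots, n-1$ to the set of its neighbors. The function name `prufer_code` and this input format are choices made for the illustration. It repeatedly deletes the smallest leaf other than the root 0 and records its father. This direct version takes quadratic time, which is still polynomial.

```python
def prufer_code(adj):
    """Prüfer code (root 0 is never treated as a leaf) of a tree given as node -> neighbor set."""
    n = len(adj)
    adj = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    code = []
    for _ in range(n - 2):
        # i_k: the smallest endnode of the remaining tree, excluding the root 0
        leaf = min(v for v in adj if v != 0 and len(adj[v]) == 1)
        (father,) = adj[leaf]                     # a leaf has exactly one neighbor
        code.append(father)
        adj[father].discard(leaf)
        del adj[leaf]
    return code
```

For the path 0-1-2-3 the leaves are removed in the order 3, 2, giving the code (2, 1); for the star with center 0 and leaves 1, 2, 3 the code is (0, 0).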

Claim 8.5.2 The Prüfer code of a tree determines the tree.

For this, it is enough to see that the Prüfer code determines the sequence $i_1, \ldots, i_n$; then we know all the edges of the tree (the pairs $[i, \alpha(i)]$).

The node $i_1$ is the smallest endnode of the tree; hence to determine $i_1$, it is enough to figure out the endnodes from the Prüfer code. But this is obvious: the endnodes are exactly those that are not the “fathers” of other nodes, i.e., the ones that do not occur among the numbers $\alpha(i_1), \ldots, \alpha(i_{n-1}), 0$. The node $i_1$ is therefore uniquely determined.

Assume that we know already that the Prüfer code uniquely determines $i_1, \ldots, i_{k-1}$. It follows similarly to the above that $i_k$ is the smallest number occurring neither among $i_1, \ldots, i_{k-1}$ nor among $\alpha(i_k), \ldots, \alpha(i_{n-1})$. So $i_k$ is also uniquely determined.

Claim 8.5.3 Every sequence $(b_1, \ldots, b_{n-2})$, where $0 \le b_i \le n-1$, occurs as the Prüfer code of some tree.

Using the idea of the proof above, let $b_{n-1} = 0$ and let us define the permutation $i_1, \ldots, i_n$ by the recursion that $i_k$ is the smallest number occurring neither among $i_1, \ldots, i_{k-1}$ nor among $b_k, \ldots, b_{n-1}$, where $1 \le k \le n-1$; and let $i_n = 0$. Connect $i_k$ with $b_k$ for all $1 \le k \le n-1$ and let $\gamma(i_k) = b_k$. In this way, we obtain a graph $G$ with $n-1$ edges on the nodes $0, \ldots, n-1$. This graph is connected, since for every $i$ the node $\gamma(i)$ comes later in the sequence $i_1, \ldots, i_n$ than $i$, and therefore the sequence $i, \gamma(i), \gamma(\gamma(i)), \ldots$ is a path connecting $i$ to the node 0. But then $G$ is a connected graph with $n-1$ edges, therefore it is a tree. That the sequence $(b_1, \ldots, b_{n-2})$ is the Prüfer code of $G$ is obvious from the construction.
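The recursion in this proof translates directly into a decoding procedure. The following is a minimal sketch, assuming nodes are numbered $0, \ldots, n-1$ with root 0. The function name `tree_from_prufer` is an assumption, and the straightforward search for the smallest admissible number makes it quadratic rather than optimized.

```python
def tree_from_prufer(b):
    """Edges (i_k, b_k) of the tree on nodes 0..n-1 whose Prüfer code is b, where n = len(b) + 2."""
    n = len(b) + 2
    seq = list(b) + [0]                  # append b_{n-1} = 0
    used = set()                         # the nodes i_1, ..., i_{k-1} chosen so far
    edges = []
    for k in range(n - 1):
        later = set(seq[k:])             # the numbers b_k, ..., b_{n-1}
        i_k = min(v for v in range(n) if v not in used and v not in later)
        used.add(i_k)
        edges.append((i_k, seq[k]))
    return edges
```

Decoding (2, 1) gives the edges [3, 2], [2, 1], [1, 0], i.e. the path 0-1-2-3.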

Remark 8.5.1 An exact correspondence like the Prüfer code has other advantages besides optimal Kolmogorov coding. Suppose that our task is to write a program for a randomized Turing machine that outputs a random labeled tree of size $n$ in such a way that all trees occur with the same probability. The Prüfer code gives an efficient algorithm for this. We just have to generate randomly a sequence $b_1, \ldots, b_{n-2}$, which is easy, and then decode from it the tree by the above algorithm.
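A sketch of this sampling procedure follows. The function name `random_labeled_tree` and the `rng` parameter are assumptions for the illustration. Instead of searching for the smallest admissible node at each step, the decoder keeps those nodes in a heap. This is an implementation choice that produces the same tree as the recursion of Claim 8.5.3 in $O(n \log n)$ steps.

```python
import heapq
import random

def random_labeled_tree(n, rng=random):
    """Edges (node, father) of a uniformly random labeled tree on the nodes 0..n-1."""
    # A uniformly random Prüfer code, followed by b_{n-1} = 0.
    seq = [rng.randrange(n) for _ in range(n - 2)] + [0]
    remaining = [0] * n                  # occurrences of each node in the unread part of seq
    for b in seq:
        remaining[b] += 1
    # Nodes not occurring in seq are the endnodes; a sorted list is already a valid heap.
    leaves = [v for v in range(n) if remaining[v] == 0]
    edges = []
    for b in seq[: n - 1]:
        leaf = heapq.heappop(leaves)     # smallest node neither used nor occurring later
        edges.append((leaf, b))
        remaining[b] -= 1
        if remaining[b] == 0:            # b no longer occurs later, so it becomes an endnode
            heapq.heappush(leaves, b)
    return edges
```

Since the code is chosen uniformly among the $n^{n-2}$ sequences and decoding is a bijection, every labeled tree is produced with probability $n^{-(n-2)}$. Passing `random.Random(seed)` as `rng` makes a run reproducible.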

Example 8.5.3 Consider now the unlabeled trees. These can be defined as the equivalence classes of labeled trees where two labeled trees are considered equivalent if they are isomorphic, i.e., by a suitable relabeling, they become the same labeled tree. We assume that we represent each equivalence class by one of its elements, i.e., by a labeled tree (it is not interesting now, by which one). Since each unlabeled tree can be labeled in at most $n!$ ways (its labelings are not necessarily all different as labeled trees!), the number of unlabeled trees is at least $n^{n-2}/n! \ge 2^{n-2}$ for all sufficiently large $n$. (According to a difficult result of George Pólya, the number of $n$-node unlabeled trees is asymptotically $c_1 c_2^n n^{-5/2}$, where $c_1$ and $c_2$ are constants defined in a certain complicated way.) The information-theoretical lower bound is therefore at least $n-2$.

On the other hand, we can use the following coding procedure. Consider an $n$-node tree $F$. Walk through $F$ by the “depth-first search” rule: let $x_0$ be the node labeled 0 and define the nodes $x_1, x_2, \ldots$ as follows: if $x_i$ has a neighbor that does not occur yet in the sequence then let $x_{i+1}$ be the smallest one among these. If it has none and $x_i \ne x_0$ then let $x_{i+1}$ be the neighbor of $x_i$ on the path leading from $x_i$ to $x_0$. Finally, if $x_i = x_0$ and every neighbor of $x_0$ has already occurred in the sequence then we stop.

It is easy to see that for the sequence thus defined, every edge occurs among the pairs $[x_i, x_{i+1}]$; moreover, it occurs once in both directions. It follows that the length of the sequence is exactly $2n-1$. Let now $\varepsilon_i = 1$ if $x_{i+1}$ is farther from the root $x_0$ than $x_i$, and $\varepsilon_i = 0$ otherwise. It is easy to understand that the sequence $\varepsilon_0 \varepsilon_1 \cdots \varepsilon_{2n-3}$ determines the tree uniquely; passing through the sequence, we can draw the graph and construct the sequence $x_1, \ldots, x_i$ of nodes step by step. In step $(i+1)$, if $\varepsilon_i = 1$ then we take a new node (this will be $x_{i+1}$) and connect it with $x_i$; if $\varepsilon_i = 0$ then let $x_{i+1}$ be the neighbor of $x_i$ in the “direction” of $x_0$.
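Both directions of this coding can be sketched briefly. The names `adjacency`, `dfs_code` and `tree_from_code` are assumptions made for the example. Trees are passed as dictionaries from nodes to neighbor sets, and the decoder numbers new nodes in the order in which they are created, which is one of many possible labelings.

```python
def adjacency(edges):
    """Build a node -> neighbor-set dictionary from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def dfs_code(adj):
    """0-1 code of the depth-first walk from node 0: 1 = step to a new node, 0 = step back."""
    parent = {0: None}                   # visited nodes with their neighbor toward the root
    bits, x = [], 0
    while True:
        new = [v for v in adj.get(x, ()) if v not in parent]
        if new:                          # move to the smallest unvisited neighbor
            v = min(new)
            parent[v] = x
            bits.append(1)
            x = v
        elif x != 0:                     # nothing new here: step back toward the root
            bits.append(0)
            x = parent[x]
        else:                            # back at the root with nothing new: stop
            return bits

def tree_from_code(bits):
    """Rebuild a tree from a valid code; new nodes are numbered 1, 2, ... as they appear."""
    parent = {0: None}
    edges, x = [], 0
    for bit in bits:
        if bit == 1:
            v = len(parent)              # the next unused label
            parent[v] = x
            edges.append((x, v))
            x = v
        else:
            x = parent[x]
    return edges
```

The path 0-1-2 has code 1100, while the same unlabeled tree labeled 1-0-2 has code 1010, which illustrates that the code depends on the chosen labeling.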

Remark 8.5.2 1. With this coding, the code assigned to a tree depends on the labeling but it does not determine it uniquely (it only determines the unlabeled tree uniquely).

2. The coding is not bijective: not every 0-1 sequence will be the code of an unlabeled tree. We can notice that

(a) There are as many 1’s as 0’s in each code;

(b) In every starting segment of every code, there are at least as many 1’s as 0’s (the difference between the number of 1’s and the number of 0’s among the first $i$ numbers gives the distance of the node $x_i$ from the node 0).

It is easy to see that for each 0-1 sequence having the properties (a)-(b), there is a labeled tree whose code it is. It is not certain, however, that this tree, as an unlabeled tree, is given with just this labeling (this depends on which unlabeled trees are represented by which of their labelings). Therefore the code does not even use all the words with properties (a)-(b).

3. The number of 0-1 sequences of length $2n-2$ having properties (a)-(b) is, according to a well-known combinatorial theorem, $\frac{1}{n}\binom{2n-2}{n-1}$ (the so-called Catalan number). We can formulate a tree notion to which the sequences with properties (a)-(b) correspond exactly: these are the rooted planar trees, which are drawn without intersection into the plane in such a way that their distinguished vertex (their root) is on the left edge of the page. This drawing defines an ordering among the “sons” (neighbors farther from the root) “from the top to the bottom”; the drawing is characterized by these orderings. The above described coding can also be done in rooted planar trees and creates a bijection between them and the sequences with the properties (a)-(b).
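The count in item 3 can be checked by brute force for small $n$. The helper names `is_tree_code` and `count_codes` are assumptions for this sketch. It enumerates all 0-1 sequences of length $2n-2$ and keeps those with properties (a) and (b).

```python
from itertools import product
from math import comb

def is_tree_code(bits):
    """Check (a) equally many 1's and 0's, and (b) no prefix with more 0's than 1's."""
    height = 0                           # (number of 1's) - (number of 0's) so far
    for bit in bits:
        height += 1 if bit else -1
        if height < 0:                   # property (b) is violated
            return False
    return height == 0                   # property (a)

def count_codes(n):
    """Number of 0-1 sequences of length 2n-2 with properties (a) and (b)."""
    return sum(is_tree_code(s) for s in product((0, 1), repeat=2 * n - 2))
```

Comparing `count_codes(n)` with `comb(2 * n - 2, n - 1) // n` for small $n$ confirms the formula on those cases; the enumeration is exponential, so it is only practical for small $n$.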

Chapter 9

Pseudo-random numbers
