Spanning trees of a graph - Graphs: an introduction

Graphs: an introduction

5.3 Spanning trees of a graph

A spanning tree is one of the basic graph constructions:

5.3.1 Definition. Let G = (V, E) be a graph. An arbitrary tree of the form (V, E′), where E′ ⊆ E, is called a spanning tree of the graph G. So a spanning tree is a subgraph of G that is a tree and contains all vertices of G.

Obviously, a spanning tree may only exist for a connected graph G.

It is not difficult to show that every connected graph has a spanning tree. We prove it by giving two (fast) algorithms for finding a spanning tree of a given connected graph. In the subsequent sections we will need variants of these algorithms, so let us study them carefully.

5.3.2 Algorithm (Spanning tree). Let G = (V, E) be a graph with n vertices and m edges. We order the edges of G arbitrarily into a sequence (e_1, e_2, ..., e_m). The algorithm successively constructs sets of edges E_0, E_1, ... ⊆ E.

We let E_0 = ∅. If the set E_{i−1} has already been found, the set E_i is computed as follows:

E_i = E_{i−1} ∪ {e_i}   if the graph (V, E_{i−1} ∪ {e_i}) has no cycle,
E_i = E_{i−1}           otherwise.

The algorithm stops either if E_i already has n − 1 edges or if i = m, i.e. all edges of the graph G have been considered. Let E_t denote the set for which the algorithm has stopped, and let T be the graph (V, E_t).
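As an illustration, Algorithm 5.3.2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a tuned implementation: the vertices are assumed to be numbered 0, 1, ..., n − 1, the function name `spanning_forest_edges` is hypothetical, and the cycle test is done naively by searching the graph formed by the edges chosen so far (an edge closes a cycle exactly when its endpoints are already joined by a path of chosen edges).

```python
def spanning_forest_edges(n, edges):
    """Process the edges in the given order, keeping each edge unless it closes a cycle.

    Assumption: vertices are 0, 1, ..., n-1 and `edges` is a list of pairs (u, v).
    """
    adjacency = {v: [] for v in range(n)}  # the graph (V, E_{i-1}) built so far
    chosen = []                            # the current edge set E_i

    def joined(source, target):
        # Depth-first search restricted to the edges chosen so far.
        stack, seen = [source], {source}
        while stack:
            u = stack.pop()
            if u == target:
                return True
            for w in adjacency[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False

    for u, v in edges:
        if len(chosen) == n - 1:
            break  # E_i already has n - 1 edges, so the algorithm stops
        if not joined(u, v):  # adding {u, v} would not create a cycle
            chosen.append((u, v))
            adjacency[u].append(v)
            adjacency[v].append(u)
    return chosen
```

For the edge sequence (0,1), (1,2), (2,0), (2,3), (3,0) on four vertices, the sketch keeps the first two edges, rejects (2,0), keeps (2,3), and then stops because n − 1 = 3 edges have been selected.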

5.3.3 Proposition (Correctness of Algorithm 5.3.2). If Algorithm 5.3.2 produces a graph T with n − 1 edges then T is a spanning tree of G. If T has k < n − 1 edges then G is a disconnected graph with n − k components.

Proof. According to the way the sets E_i are created, the graph T contains no cycle. If k = |E(T)| = n − 1 then T is a tree according to Exercise 5.1.2, and hence it is a spanning tree of the graph G. If k < n − 1, then T is a disconnected graph whose every component is a tree (such a graph is called a forest). It is easy to see that it has n − k components: a component with n_j vertices is a tree with n_j − 1 edges, so a forest with c components on n vertices has n − c edges in total, and k = n − c gives c = n − k.

We prove that the vertex sets of the components of the graph T coincide with the vertex sets of the components of the graph G. For contradiction, suppose this is not true, and let x and y be vertices lying in the same component of G but in distinct components of T. Let C denote the component of T containing the vertex x, and consider some path (x = x_0, e_1, x_1, e_2, ..., e_ℓ, x_ℓ = y) from x to y in the graph G, as in the following picture:

[Figure: the path from x to y in G, with the vertex x_i and the edge e marked.]

Let i be the last index for which x_i is contained in the component C. Obviously i < ℓ, and hence x_{i+1} ∉ C. The edge e = {x_i, x_{i+1}} thus does not belong to the graph T, and so it had to form a cycle with some edges already selected into T at some stage of the algorithm. Therefore the graph T + e also contains a cycle, but this is impossible as e connects two distinct components of T. This provides the desired contradiction. □

Complexity of the algorithm. We have just shown that Algorithm 5.3.2 always computes what it is supposed to compute, i.e. a spanning tree of the input graph. But if we really needed to find spanning trees for some large graphs, should we choose this algorithm and spend our time programming it, or our money by buying some existing code?

To answer such a question is no simple matter, and algorithms are compared according to different, and often contradictory, criteria. For instance, it is important to consider the clarity and simplicity of the algorithm (a complicated or obscure algorithm easily leads to programming errors), the robustness (how do rounding errors or small changes in the input data influence the correctness of the output?), memory requirements, and so on. Perhaps the most common measure of complexity of an algorithm is its time complexity, which means the number of elementary operations (such as additions, multiplications, comparisons of two numbers, etc.) the algorithm needs for solving the input problem. Most often the worst-case complexity is considered, i.e. the number of operations needed to solve the worst possible problem, one expressly chosen to make the algorithm slow, for a given size of input.

For computing a spanning tree, the input size can be measured as the number of vertices plus the number of edges of the input graph. Instead of “worst-case time complexity” we will speak briefly of “complexity”, since we do not discuss other types of complexity.

The complexity of an algorithm can seldom be determined precisely. In order that we could even think of doing it, we would have to determine exactly what the allowed primitive operations are (so, in principle, we would restrict ourselves to a specific computer), and also we would have to describe the algorithm in the smallest details including various routine steps; that is, essentially look at a concrete program. Even if we did both these things, determining the precise complexity is quite laborious even for very simple algorithms. For these reasons, the complexity of algorithms is only analyzed asymptotically in most cases. We could thus say that some algorithm has complexity O(n^{3/2}), another one O(n log n), and so on (here n is a parameter measuring the size of the input).

For a real assessment of algorithms, it is usually necessary to complement such a theoretical analysis by testing the algorithm for various input data on a particular computer. For example, if the asymptotic analysis yields complexity O(n²) for one algorithm and O(n log⁴n) for another then the second algorithm looks clearly better at first sight because the function n log⁴n grows much more slowly than n². But if the exact complexity of the first algorithm were, say, n² − 5n and of the second one 20n(log₂n)⁴, the superiority of the second algorithm would only show for n larger than about 5 · 10⁶, and such a superiority is quite illusory from a practical point of view.
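The crossover point in this example can be checked numerically. The following is a small sketch, assuming the two hypothetical cost functions n² − 5n and 20n(log₂n)⁴ from the paragraph above; the function names and the search range from 10⁶ to 10⁷ are choices made here for illustration. On that range the difference of the two costs changes sign only once, so a binary search suffices.

```python
import math

def first_cost(n):
    # Hypothetical exact cost of the O(n^2) algorithm.
    return n * n - 5 * n

def second_cost(n):
    # Hypothetical exact cost of the O(n log^4 n) algorithm.
    return 20 * n * math.log2(n) ** 4

def crossover(lo=10**6, hi=10**7):
    """Smallest n in [lo, hi] for which the second algorithm is cheaper.

    Assumption: the second algorithm is slower at `lo`, faster at `hi`,
    and the comparison flips only once in between.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if second_cost(mid) < first_cost(mid):
            hi = mid        # the second algorithm already wins at mid
        else:
            lo = mid + 1    # the first algorithm is still at least as cheap
    return lo

if __name__ == "__main__":
    print(crossover())
```

The value found lies a little below 5 · 10⁶, in line with the rough figure quoted in the text.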

Let us try to estimate the asymptotic complexity of Algorithm 5.3.2.

We have described the algorithm on a “high level”, however. This doesn’t refer to a prestigious social position but to the fact that we have used, for instance, a test of whether a given set of edges contains a cycle, which cannot be considered an elementary operation even with a very liberal approach. The complexity of the algorithm will thus depend on our ability to realize such a complex operation by elementary operations.

For our Algorithm 5.3.2, we may note that it is not necessary to store all the edge sets E_i, and that all of them can be represented by a single variable (say, a list of edges) which successively takes values E_0, E_1, ....

The only significant question is how to test efficiently whether adding a new edge e_i creates a cycle or not. Here is a crucial observation: a cycle arises if and only if the vertices of the edge e_i belong to the same connected component of the graph (V, E_{i−1}). Hence we need to solve the following problem:

5.3.4 Problem (UNION–FIND problem). Let V = {1, 2, ..., n} be a set of vertices. Initially, the set V is partitioned into 1-element equivalence classes; that is, no distinct vertices are considered equivalent. Design an algorithm which maintains an equivalence relation on V (in other words, a partition of V into classes) in a suitable data structure, in such a way that the following two types of operations can be executed efficiently:

(i) (UNION) Make two given nonequivalent vertices i, j ∈ V equivalent, i.e. replace the two classes containing them by their union.

(ii) (Equivalence testing: FIND) Given two vertices i, j ∈ V, decide whether they are currently equivalent.

A new request for an operation is input to the algorithm only after it has executed the previous operation.

Our Algorithm 5.3.2 for finding a spanning tree needs at most n − 1 operations UNION and at most m operations FIND.

We describe a quite simple solution of Problem 5.3.4. In the beginning, we assign distinct marks to the vertices of V, say the marks 1, 2, ..., n. During the computation, the marks will always be assigned so that two vertices are equivalent if and only if they have the same mark. Thus, equivalence testing (FIND) is a trivial comparison of marks.

For replacing two classes by their union, we have to change the marks for the elements of one of the classes. So, if the elements of each class are also stored in a list, the time needed for the mark-changing operation is proportional to the size of the class whose marks are changed.

For a very rough estimate of the running time, we can say that no class has more than n elements, so a single UNION operation never needs more than O(n) time. For n − 1 UNION operations and m FIND operations we thus get the bound O(n² + m). One inconspicuous improvement is to maintain also the size of each class and to change marks always for the smaller class. For such an algorithm, one can show a much better total bound: O(n log n + m) (Exercise 1). The best known solution of Problem 5.3.4, due to Tarjan, needs time at most O(nα(n) + m) for m FIND and n − 1 UNION operations (see e.g. Aho, Hopcroft, and Ullman [11]), where α(n) is a certain function of n. We do not give the definition of α(n) here; we only remark that α(n) does grow to infinity as n → ∞ but extremely slowly, much more slowly than functions like log log n, log log log n, etc. For practical purposes, the solution described above (with re-marking the smaller class) may be fully satisfactory.
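The mark-based solution with re-marking of the smaller class might be sketched as follows. This is only an illustrative sketch: the class name `MarkedClasses`, the function name `spanning_tree_by_marks`, and the numbering of vertices as 0, 1, ..., n − 1 (rather than 1, ..., n) are assumptions made here. Storing each class as a list also makes the class sizes available, so no separate size counter is kept.

```python
class MarkedClasses:
    """Equivalence classes on {0, ..., n-1}, stored as marks plus member lists."""

    def __init__(self, n):
        self.mark = list(range(n))              # mark[v] identifies the class of v
        self.members = [[v] for v in range(n)]  # members[c] lists the class with mark c

    def equivalent(self, i, j):
        # FIND: two vertices are equivalent exactly when their marks agree.
        return self.mark[i] == self.mark[j]

    def union(self, i, j):
        # UNION: merge the two classes by re-marking the smaller one.
        a, b = self.mark[i], self.mark[j]
        if a == b:
            return  # already equivalent, nothing to do
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a  # make sure b is the mark of the smaller class
        for v in self.members[b]:
            self.mark[v] = a
        self.members[a].extend(self.members[b])
        self.members[b] = []


def spanning_tree_by_marks(n, edges):
    """Algorithm 5.3.2 with the cycle test replaced by a comparison of marks."""
    classes = MarkedClasses(n)
    tree = []
    for u, v in edges:
        if len(tree) == n - 1:
            break
        if not classes.equivalent(u, v):  # endpoints in different components
            classes.union(u, v)
            tree.append((u, v))
    return tree
```

Each edge costs one FIND, and each accepted edge one UNION, matching the counts of at most m FIND and at most n − 1 UNION operations stated above.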

Let us present one more algorithm for spanning trees, perhaps even a simpler one.

5.3.5 Algorithm (Growing a spanning tree). Let a given graph G = (V, E) have n vertices and m edges. We will successively construct sets V_0, V_1, V_2, ... ⊆ V of vertices and sets E_0, E_1, E_2, ... ⊆ E of edges. We let E_0 = ∅ and V_0 = {v}, where v is an arbitrary vertex.

Having already constructed V_{i−1} and E_{i−1}, we find an edge e_i = {x_i, y_i} ∈ E(G) such that x_i ∈ V_{i−1} and y_i ∈ V \ V_{i−1}, and we set V_i = V_{i−1} ∪ {y_i}, E_i = E_{i−1} ∪ {e_i}. If no such edge exists, the algorithm finishes and outputs the graph constructed so far, T = (V_t, E_t).

5.3.6 Proposition (Correctness of Algorithm 5.3.5). If the algorithm finishes with a graph T with n vertices, then T is a spanning tree of G. Otherwise G is a disconnected graph and T is a spanning tree of the component of G containing the initial vertex v.

Proof. The graph T is a tree because it is connected and has the right number of edges and vertices. If T has n vertices, it is a spanning tree, so let us assume that T has n̄ < n vertices. It remains to show that V(T) is the vertex set of a component of G.

Let us suppose the contrary: let there be an x ∈ V(T) and a y ∉ V(T) connected by a path in the graph G. As in the proof of Proposition 5.3.3, we find an edge e = {x_j, y_j} ∈ E(G) on this path such that x_j ∈ V(T) and y_j ∈ V \ V(T). The algorithm could thus have added the edge e and the vertex y_j to the tree, and should not have finished with the tree T. This contradiction concludes the proof. □

Remark. The details of the algorithm just considered can be designed in such a way that the running time is O(n + m) (see Exercise 2).
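One possible arrangement of Algorithm 5.3.5 is sketched below; it is only one of several ways to fill in the details and is not taken from the text. The assumptions are that the graph is given as adjacency lists (a dictionary mapping each vertex to a list of its neighbours) and that the hypothetical function `grow_spanning_tree` keeps a list of reached vertices whose neighbours still have to be scanned. Every adjacency list is scanned once, so the work is proportional to n + m provided set membership tests take constant time.

```python
def grow_spanning_tree(adjacency, start):
    """Grow a tree from `start`; return (vertex set, edge list) of the tree.

    Assumption: `adjacency` maps every vertex to a list of its neighbours.
    """
    reached = {start}   # the current vertex set V_i
    tree_edges = []     # the current edge set E_i
    pending = [start]   # reached vertices whose neighbours are not yet scanned
    while pending:
        x = pending.pop()
        for y in adjacency[x]:
            if y not in reached:
                # The edge {x, y} has x in V_{i-1} and y outside it.
                reached.add(y)
                tree_edges.append((x, y))
                pending.append(y)
    return reached, tree_edges
```

If the returned vertex set has fewer than n elements, the graph is disconnected and the result is a spanning tree of the component containing the start vertex, as in Proposition 5.3.6.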

Exercises

1. Prove that if Problem 5.3.4 is solved by the described method (always changing the marks for the smaller class), then the total complexity of n − 1 UNION operations is at most O(n log n).

2. *,CS Design the details of Algorithm 5.3.5 in such a way that the running time is O(n + m) in the worst case. (This may require some knowledge of simple list-like data structures.)

3. From Exercise 4.4.7, we recall that a Hamiltonian cycle in a graph G is a cycle containing all vertices of G. For a graph G and a natural number k ≥ 1, define the graph G^(k) as the graph with vertex set V(G) and two (distinct) vertices connected by an edge if and only if their distance in G is at most k.

(a) * Prove that for each tree T, the graph T^(3) has a Hamiltonian cycle.

(b) Using (a), conclude that G^(3) has a Hamiltonian cycle for any connected graph G.

(c) Find a connected graph G such that G^(2) has no Hamiltonian cycle.

In document Invitation to Discrete Mathematics (Page 184-188)