
Before concluding this chapter, we point out some of the approaches from the literature that generalize cliques. The discussion in the previous section comparing the clique relaxations from SNA has already drawn attention to the structural properties that need to be studied when evaluating a clique relaxation. It should now be apparent to the reader that any model relaxing a clique, while relaxing one particular aspect, must still guarantee several important properties. We review some of the existing approaches in this context.

Consider a simple, undirected, edge-weighted graph G = (V, E) with a weight function w : E → R+. Define the weight of a subset of edges E' as $w(E') = \sum_{e \in E'} w(e)$. Given a positive integer k < n, the heaviest k-subgraph problem (HkS) is to find a subgraph G' = (V', E') such that |V'| = k and w(E') is maximized. When the edge weights are unity, it is referred to as the heaviest unweighted k-subgraph problem (HUkS). In the literature, HkS is also known as the densest k-subgraph problem, while HUkS is called the maximum edge subgraph problem. A different but closely related problem is the densest subgraph problem (DS), which is to choose a subgraph G' = (V', E') such that the density, defined as $w(E')/|V'|$, is maximized. This problem can be solved in polynomial time using maximum flow algorithms [96]. In contrast, HkS and HUkS are NP-complete by reduction from Clique [77]. Hence, approximation algorithms for these problems have been developed [131, 16, 90]. For our purposes of relaxing a clique, however, the HUkS and DS models have obvious drawbacks. In the DS model, there is no guarantee on the size of the resulting V', since the objective is only to maximize the density as defined above. In the HkS/HUkS models, the resulting G' can be disconnected and can include vertices of low degree. Furthermore, the size of the resulting set is fixed at k. Nevertheless, these models have been used effectively in facility location [166].

The concept of a quasi-clique is used to define a clique relaxation employed in data mining of massive call graphs [5, 7] (see Section II.1 for definitions). A graph G = (V, E) is said to be γ-dense if $|E| \ge \gamma \binom{|V|}{2}$. A γ-clique (quasi-clique) is a subset of vertices S such that G[S] is connected and γ-dense. A γ-clique represents a clique when γ = 1 and is a relaxation when 0 ≤ γ < 1. The maximum γ-clique problem can then be defined as the problem of finding a largest γ-clique. In this case, although we find a largest connected subgraph with density no smaller than a fixed number γ, we are still given no guarantee on how robustly it is connected or on the degrees of its vertices. This is because, unless γ is sufficiently high, a large clique sharing one vertex with a path could also meet the constraints, yet it can be disconnected easily and the vertices on the path have very low degree. High values of γ, on the other hand, could result in smaller γ-cliques. Furthermore, as γ is reduced, a sudden jump in the size of the γ-clique would be observed in power law graphs when the giant component becomes “dense enough”. When using this approach, it is recommended that an appropriate choice of γ be made by proper tuning.
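The clique-plus-path situation described above can be checked numerically. The following is a minimal sketch, assuming a graph stored as a dictionary of neighbor sets; the helper names (`induced_edge_count`, `is_connected`, `is_gamma_clique`, `clique_plus_path`) and the sizes q = 10, p = 5 are illustrative choices and do not come from [5, 7].

```python
from itertools import combinations


def induced_edge_count(adj, S):
    """Number of edges of G[S]; adj maps each vertex to its set of neighbors."""
    S = set(S)
    return sum(1 for u, v in combinations(sorted(S), 2) if v in adj[u])


def is_connected(adj, S):
    """Check connectivity of G[S] with a depth-first search restricted to S."""
    S = set(S)
    if not S:
        return False
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & S:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == S


def is_gamma_clique(adj, S, gamma):
    """True if G[S] is connected and has at least gamma * C(|S|, 2) edges."""
    s = len(set(S))
    enough_edges = induced_edge_count(adj, S) >= gamma * s * (s - 1) / 2
    return is_connected(adj, S) and enough_edges


def clique_plus_path(q, p):
    """Hypothetical test graph: a clique on 0..q-1 and a path
    q-1, q, ..., q+p-1 that shares only vertex q-1 with the clique."""
    adj = {v: set() for v in range(q + p)}
    for u, v in combinations(range(q), 2):
        adj[u].add(v)
        adj[v].add(u)
    for v in range(q - 1, q + p - 1):
        adj[v].add(v + 1)
        adj[v + 1].add(v)
    return adj


adj = clique_plus_path(q=10, p=5)
S = set(adj)                                   # all 15 vertices
edges = induced_edge_count(adj, S)             # 45 clique edges + 5 path edges
min_degree = min(len(adj[v] & S) for v in S)   # the end of the path has degree 1
```

In this instance the whole vertex set has 50 of the 105 possible edges, so it qualifies as a γ-clique for γ = 0.45 even though its minimum degree is 1 and removing the single shared vertex disconnects it.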

Remark 4. Note that this “jump” would also be observed while using k-cliques and k-clubs due to the small world phenomenon observed in power law graphs. The increase in k-plex number with k is more controlled in power law graphs until the condition for diameter-2 is violated by the maximum k-plex found. In this context the following relationship should be taken into account:

\[ \omega(G) \le \omega_k(G) \le \bar{\omega}_k(G) \le \tilde{\omega}_k(G). \]

The inequality $\bar{\omega}_k(G) \le \tilde{\omega}_k(G)$ always holds by definition. Whenever $\omega_k(G) > 2k - 2$, the maximum k-plex is also a 2-club, as stated in Theorem 7.

More recently, a related model was proposed for relaxing independent sets in [175], where an upper bound is placed on the number of edges in the induced subgraph. A generalized vertex packing GVP-k is a subset of vertices I such that there are at most k edges in the induced subgraph G[I]. When k = 0, it represents an independent set, and it is a relaxation for k ≥ 1. This approach, used to model problems in air-traffic control and national airspace planning, is studied using polyhedral methods in [175]. This model is close to the co-k-plex model proposed in Section III.2 for relaxing independent sets. Note that every GVP-k is a co-(k + 1)-plex, but not vice versa.
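The relationship between the two models can be illustrated on a small instance. The sketch below assumes the usual definition of a co-k-plex (every vertex has at most k − 1 neighbors in the induced subgraph), which is given in Section III.2 and not restated here; the function names and the example graph are hypothetical. The implication itself is immediate: if G[I] has at most k edges, no vertex of I can have more than k neighbors in G[I].

```python
from itertools import combinations


def induced_degrees(adj, S):
    """Degree of each vertex of S inside the induced subgraph G[S]."""
    S = set(S)
    return {v: len(adj[v] & S) for v in S}


def is_gvp(adj, S, k):
    """GVP-k: the induced subgraph G[S] has at most k edges."""
    return sum(induced_degrees(adj, S).values()) // 2 <= k


def is_co_k_plex(adj, S, k):
    """Assumed definition: every vertex has at most k - 1 neighbors in G[S]."""
    return all(d <= k - 1 for d in induced_degrees(adj, S).values())


# Hypothetical example: two disjoint edges 0-1 and 2-3.
adj = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
S = {0, 1, 2, 3}

# Exhaustive check on this small graph that GVP-k implies co-(k+1)-plex.
implication_holds = all(
    is_co_k_plex(adj, T, k + 1)
    for r in range(len(adj) + 1)
    for T in combinations(adj, r)
    for k in range(4)
    if is_gvp(adj, T, k)
)
```

Here S is a co-2-plex, since every vertex has exactly one neighbor in G[S], but it is not a GVP-1 because G[S] contains two edges. This is one way to see that the converse fails.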

The clustering problem on gene co-expression networks is studied in [66]. The authors also point out some of the issues raised in Section II.1 regarding the restrictive and impractical nature of cliques when studying erroneous data. A model robust enough to deal with this situation is necessary, and the k-plex is ideal for such purposes. In fact, the authors of [66] propose the use of paracliques, which are close to the 2-plex model. Starting with a set P initialized to some maximum clique C, the authors find new vertices that have at least g neighbors in P. These new vertices are then added to P, and the process is repeated until P can no longer be enlarged. The authors report successful results with g = |C| − 1, where g is called the glom factor. However, the final P need not be a 2-plex, since each vertex is allowed at most one non-neighbor in a 2-plex. For the paraclique P, each iteration only ensures that new vertices have at least |C| − 1 neighbors in the current set; hence a vertex could have more than one non-neighbor in the final P produced by the algorithm. Nevertheless, the notion of allowing one non-neighbor is clearly in the same spirit as the 2-plex approach.

subgraph with minimum degree g. Although the approach of finding a maximum g-core is only of limited use (as we will see in Chapter IX), the approach here is effective because g is chosen to be ω(G) − 1 and we start building around a maximum clique. Finding a maximum k-plex for k = 2, 3, as well as clustering using 2-plexes or 3-plexes, can provide an interesting alternative to paracliques.
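The paraclique procedure described above can be sketched as follows. This is only an illustration of the description in the text, not the implementation of [66]: the starting clique is passed in by the caller rather than computed, all qualifying vertices are added in one batch per iteration, and the function names and the example graph are hypothetical. The k-plex test assumes the usual definition, in which every vertex of S has at least |S| − k neighbors in G[S].

```python
def paraclique(adj, clique, g):
    """Grow P from a starting clique by repeatedly adding every outside
    vertex that has at least g neighbors in the current P."""
    P = set(clique)
    while True:
        new = {v for v in adj if v not in P and len(adj[v] & P) >= g}
        if not new:
            return P
        P |= new


def is_k_plex(adj, S, k):
    """Assumed definition: each vertex of S has at least |S| - k neighbors in G[S]."""
    S = set(S)
    return all(len(adj[v] & S) >= len(S) - k for v in S)


def make_graph(edges):
    """Build a dictionary of neighbor sets from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Hypothetical instance: C = {0, 1, 2, 3} is a maximum clique, and
# vertices 4 and 5 are each adjacent to 0, 1 and 2 only.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 0), (4, 1), (4, 2), (5, 0), (5, 1), (5, 2)]
adj = make_graph(edges)
C = {0, 1, 2, 3}
P = paraclique(adj, C, g=len(C) - 1)   # glom factor g = |C| - 1 = 3
```

On this instance both vertices 4 and 5 are absorbed in the first iteration, so the final P contains all six vertices. Vertex 4 then has two non-neighbors in P (vertices 3 and 5), which means P is a 3-plex but not a 2-plex.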

It should be noted that our discussion pointing out drawbacks in the existing models for relaxing cliques, including k-cliques and k-clubs, is not meant to undermine their usefulness. In fact, all these approaches have been applied successfully to real-life data. In applications where reachability is the only consideration, the k-clique and k-club models are obviously the most appropriate. In this section, we have only tried to emphasize the fact that not all clique relaxations are “created equal” and that one must carefully evaluate the guarantees provided in the context of the application under consideration. The best model for cohesive subgroups is the one that mines the most useful information out of the given data. What counts as “most useful information” often depends on the application, and there is rarely a consensus on this issue even among area experts. Our recommendation is that whenever sound structural guarantees, as provided by the clique definition, are necessary, the k-plex model should be considered as a meaningful alternative.

CHAPTER IV

COMPUTATIONAL COMPLEXITY

In Chapter II, we introduced the concepts of computational complexity and their usefulness in studying the tractability of CO problems. The optimization problems defined in Chapter III lead to several interesting questions regarding their tractability. Clearly, for arbitrary k, the problems are at least as hard as the maximum clique problem, since any algorithm that can solve the optimization problems for arbitrary k can solve the maximum clique problem. But allowing k to be arbitrary does not convey anything specific about the complexity of these problems. In other words: what is the complexity of solving the maximum 2-clique, 2-club and 2-plex problems? This is not answered by our observation that the problems are hard to solve when k is “arbitrary”. Furthermore, when k = n or n − 1, all three optimization problems are solved easily. For example, when k = n or n − 1, the maximum k-clique and k-club problems reduce to finding the largest connected component in the given graph, which is solvable in polynomial time. On the other hand, the maximum k-plex problem is trivially solved when k = n, since the graph itself is a k-plex. Similarly, when k = n − 1, the trivial answer to the maximum k-plex problem is the graph itself, minus an isolated vertex if one exists. There seems to be a trend in the tractability of the problems as k goes from 1 to n, with the problems becoming easier towards the end.
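The easy cases at the upper end of the range can be made concrete. The following sketch assumes a graph on n ≥ 2 vertices stored as a dictionary of neighbor sets, and the usual k-plex definition (every vertex of S has at least |S| − k neighbors in G[S]); the function names and the example graph are illustrative only.

```python
def connected_components(adj):
    """Return the connected components of the graph as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        comps.append(comp)
    return comps


def largest_component(adj):
    """Answer to the maximum k-clique and k-club problems when k >= n - 1."""
    return max(connected_components(adj), key=len)


def max_n_minus_1_plex(adj):
    """Maximum k-plex for k = n - 1: V itself, minus one isolated vertex if any."""
    S = set(adj)
    isolated = [v for v in adj if not adj[v]]
    if isolated:
        S.discard(isolated[0])
    return S


def is_k_plex(adj, S, k):
    """Assumed definition: each vertex of S has at least |S| - k neighbors in G[S]."""
    S = set(S)
    return all(len(adj[v] & S) >= len(S) - k for v in S)


# Hypothetical graph: a triangle 0-1-2, an edge 3-4, and an isolated vertex 5.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}, 5: set()}
n = len(adj)
best_plex = max_n_minus_1_plex(adj)
```

With k = n − 1, the whole vertex set requires every vertex to have at least one neighbor, which fails only for isolated vertices; after dropping one of them the degree requirement falls to zero and is met trivially. Both routines run in time linear in the size of the graph.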

We need to focus on two aspects with regard to complexity in order to obtain meaningful results: first, the tractability of the problems when k is a fixed positive integer, and second, how the transition in complexity occurs when k is neither arbitrary nor fixed, but viewed in relation to a meaningful graph parameter. In the case of k-cliques and k-clubs, we consider k in relation to the diameter of the instance, and in the case of k-plexes, we consider k in relation to the minimum degree in the graph.

Parts of this chapter are reprinted with permission from Balasundaram, B., Butenko, S., Trukhanov, S.: Novel approaches for analyzing biological networks. Journal of Combinatorial Optimization 10(1), 23–39 (2005) © Springer.

This chapter establishes NP-completeness results that show the problems are hard for every fixed positive integer k. Furthermore, transition results on restricted graph classes are also established. Recall that in order to prove NP-completeness of a problem P, we need to provide a polynomial-time reduction from a known NP-complete problem to P and show that P is in NP. The reductions for our problems are from the decision version of the maximum clique problem, stated as follows.

Clique: Given a graph G = (V, E) and a positive integer c, does there exist a clique of size at least c in G?
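Membership in NP means that a proposed solution can be checked in polynomial time. For the Clique problem above, a minimal sketch of such a certificate check is given below, together with an exponential-time brute-force decision procedure for comparison. The function names and the example graph are hypothetical, and the graph is assumed to be stored as a dictionary of neighbor sets.

```python
from itertools import combinations


def is_clique(adj, S):
    """True if every pair of distinct vertices in S is adjacent."""
    return all(v in adj[u] for u, v in combinations(S, 2))


def verify_clique_certificate(adj, c, S):
    """Polynomial-time check that the vertex set S certifies a 'yes' answer:
    S lies in V, has at least c vertices, and induces a clique."""
    S = set(S)
    return S <= set(adj) and len(S) >= c and is_clique(adj, S)


def clique_decision_bruteforce(adj, c):
    """Exponential-time reference procedure: try every c-subset of V.
    Checking subsets of size exactly c suffices, because every subset
    of a clique is itself a clique."""
    return any(is_clique(adj, S) for S in combinations(adj, c))


# Hypothetical instance: the 4-cycle 0-1-2-3 with the chord 0-2.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
```

The verifier inspects at most |S|(|S| − 1)/2 vertex pairs, whereas the brute-force procedure may examine every c-subset of V; this gap between checking and finding a solution is what the NP-completeness results in this chapter are about.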