
4.3 Data reduction rules and partial kernelization

4.3.3 Exploiting >2/3-majorities

As shown in the previous subsection, for s < 3/4 we cannot provide a linear problem kernel with respect to the “number of dirty pairs” by adapting the 3/4-Majority Rule.

The following three lemmas establish the basis for an alternative polynomial-time data reduction rule to obtain fixed-parameter tractability with respect to the “number of dirty pairs”. Note that the following results will not improve the fixed-parameter tractability results with respect to the “average KT-distance”. However, the “number of dirty pairs” according to the ≥s-majority for values of s < 3/4 provides a “stronger” parameterization than the “average KT-distance” in the sense that it might allow for smaller parameter values for the same instance. To this end, recall that the number of dirty pairs according to the ≥3/4-majority can be much higher than the number of dirty pairs according to the >2/3-majority.

We state the result for the >2/3-majority. Clearly, it holds for any ≥s-majority with s > 2/3. The basic idea is to consider an order that is induced by the >2/3-majorities of the nondirty pairs and then to show that a dirty candidate can only “influence” the positions of nondirty candidates that are not “too far away” from it in this order. Then, it is safe to remove nondirty candidates that cannot be influenced by any dirty candidate. In the following, let $D$ denote the set of dirty candidates and $n_d$ denote the number of dirty pairs according to the >2/3-majority in an election.
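The notions of dirty pair and dirty candidate used here can be made concrete with a minimal sketch. It assumes that an election is given as a list of votes, each a list of candidates from best to worst, and that a pair is dirty when neither order is supported by more than 2/3 of the votes. The function names `pairwise_support`, `dirty_pairs`, and `dirty_candidates` are hypothetical and chosen for illustration only.

```python
from fractions import Fraction
from itertools import combinations


def pairwise_support(votes):
    """Map each ordered pair (a, b) to the number of votes ranking a before b."""
    support = {}
    for vote in votes:
        # combinations() keeps list order, so a precedes b in this vote.
        for a, b in combinations(vote, 2):
            support[(a, b)] = support.get((a, b), 0) + 1
    return support


def dirty_pairs(votes, threshold=Fraction(2, 3)):
    """Return the pairs for which neither order has a > threshold majority."""
    n = len(votes)
    support = pairwise_support(votes)

    def wins(a, b):
        # a >_{2/3} b: strictly more than threshold * n votes rank a before b.
        return support.get((a, b), 0) > threshold * n

    candidates = list(votes[0])
    return [(a, b) for a, b in combinations(candidates, 2)
            if not wins(a, b) and not wins(b, a)]


def dirty_candidates(votes, threshold=Fraction(2, 3)):
    """A candidate is dirty if it belongs to at least one dirty pair."""
    return {c for pair in dirty_pairs(votes, threshold) for c in pair}
```

With three votes a > b > c, a > b > c, and a > c > b, only the pair {b, c} is dirty, so $n_d = 1$ and $D = \{b, c\}$.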

Lemma 4.3. For an election containing $n_d$ dirty pairs, in every Kemeny consensus at most $n_d$ nondirty pairs are not ordered according to their >2/3-majorities.

Proof. For an election $(V, C)$ with $n_d$ dirty pairs, let $l$ be a preference list with $P := \{\{c, c'\} \mid c > c' \text{ in } l \text{ and } c' >_{2/3} c\}$ and $|P| > n_d$. We show that $l$ cannot be optimal.

Let $l_{2/3}$ denote a preference list with $c > c'$ for all pairs with $c >_{2/3} c'$ in which the remaining (dirty) pairs are ordered arbitrarily. First, we show that such an order exists. Due to Proposition 4.1, all nondirty candidates can be ordered according to the >2/3-majority order. Analogously, one can show that every dirty candidate can be ordered according to the >2/3-majority with respect to all nondirty candidates and that two dirty candidates that form a nondirty pair do not violate transitivity if ordered according to the >2/3-majority of this pair. Since the remaining dirty pairs can be ordered arbitrarily, they can be ordered without violating transitivity as well.

We show that $\mathrm{score}(l) > \mathrm{score}(l_{2/3})$. Let CP denote the set of all candidate pairs of $C$, that is, $\mathrm{CP} := \{\{c, c'\} : c, c' \in C, c \neq c'\}$, and let DP denote the set of all dirty pairs in $(V, C)$. Then, $\mathrm{score}(l)$ and $\mathrm{score}(l_{2/3})$ can be decomposed into partial scores depending on the candidate pairs of $P$, DP, and $\mathrm{CP} \setminus (\mathrm{DP} \cup P)$, that is,

\[ \mathrm{score}(l) = s_l(P) + s_l(\mathrm{DP}) + s_l(\mathrm{CP} \setminus (\mathrm{DP} \cup P)). \]

Now, consider $\mathrm{score}(l) - \mathrm{score}(l_{2/3})$. Since all pairs $p \in \mathrm{CP} \setminus (\mathrm{DP} \cup P)$ are ordered according to the >2/3-majority in $l$ and in $l_{2/3}$, the partial scores for them are equal.

The partial score for every nondirty pair is more than $2n/3$ if it is not ordered according to the >2/3-majority, and less than $n/3$ otherwise. Together with the fact that for a dirty pair the difference of the partial scores of the two possible orders is at most $n/3$, one obtains

\[ s_l(\mathrm{DP}) - s_{l_{2/3}}(\mathrm{DP}) \geq -|\mathrm{DP}| \cdot n/3 \quad \text{and} \quad s_l(P) - s_{l_{2/3}}(P) > |P| \cdot n/3. \]

Since $|P| > |\mathrm{DP}|$, it follows that $\mathrm{score}(l) - \mathrm{score}(l_{2/3}) > n/3 > 0$. Thus, $l$ cannot be optimal.
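The final step of the proof can be spelled out as a single chain of inequalities. This is only a restatement of the argument above, assuming that $n$ denotes the number of votes and using that $|P|$ and $|\mathrm{DP}|$ are integers, so $|P| > |\mathrm{DP}|$ gives $|P| - |\mathrm{DP}| \geq 1$.

```latex
\begin{align*}
\mathrm{score}(l) - \mathrm{score}(l_{2/3})
  &= \bigl(s_l(P) - s_{l_{2/3}}(P)\bigr)
   + \bigl(s_l(\mathrm{DP}) - s_{l_{2/3}}(\mathrm{DP})\bigr)
   + 0 \\
  &> |P| \cdot \frac{n}{3} - |\mathrm{DP}| \cdot \frac{n}{3} \\
  &= \bigl(|P| - |\mathrm{DP}|\bigr) \cdot \frac{n}{3} \\
  &\geq \frac{n}{3} > 0 .
\end{align*}
```

The zero term is the contribution of the pairs in $\mathrm{CP} \setminus (\mathrm{DP} \cup P)$, which are ordered identically in both lists.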

In the following, we show that the bound on the number of “incorrectly” ordered nondirty pairs from Lemma 4.3 can be used to fix the relative order of two candidates forming a nondirty pair. For this, it will be useful to have a concept of distance of candidates with respect to the order induced by the >2/3-majority. For an election $(V, C)$ and a nondirty pair $\{c, c'\}$, define

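\[
\mathrm{dist}(c, c') :=
\begin{cases}
|\{b \in C : b \text{ is nondirty and } c >_{2/3} b >_{2/3} c'\}| & \text{if } c >_{2/3} c', \\
|\{b \in C : b \text{ is nondirty and } c' >_{2/3} b >_{2/3} c\}| & \text{if } c' >_{2/3} c.
\end{cases}
\]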
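The distance definition translates directly into a short function. The following is a minimal sketch under assumptions: the >2/3-majority relation is passed in as a predicate `beats(a, b)` (true iff a is preferred to b in more than 2/3 of the votes), and the nondirty candidates are passed as an iterable. Both parameter names and the function name `dist` are illustrative, not taken from any implementation.

```python
def dist(c, c_prime, nondirty, beats):
    """Number of nondirty candidates strictly between c and c_prime
    in the order induced by the >2/3-majority.

    nondirty: iterable of all nondirty candidates (assumed precomputed).
    beats(a, b): True iff a >_{2/3} b (assumed precomputed relation).
    """
    if beats(c_prime, c):
        # Normalise so that c is the candidate preferred by the majority.
        c, c_prime = c_prime, c
    elif not beats(c, c_prime):
        raise ValueError("dist is only defined for nondirty pairs")
    # Count nondirty b with c >_{2/3} b >_{2/3} c_prime.
    return sum(1 for b in nondirty if beats(c, b) and beats(b, c_prime))
```

For example, if five nondirty candidates 1, ..., 5 are ordered by the >2/3-majority as 1 > 2 > ... > 5, then dist(1, 5) = dist(5, 1) = 3 and dist(2, 3) = 0.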

Lemma 4.4. Let $(V, C)$ be an election and let $\{c, c'\}$ be a nondirty pair. If $\mathrm{dist}(c, c') \geq n_d$, then in every Kemeny consensus $c > c'$ iff $c >_{2/3} c'$.

Proof. Let $l$ be a preference list such that there is a nondirty pair $\{c, c'\}$ with $c' > c$ in $l$, $c >_{2/3} c'$, and $\mathrm{dist}(c, c') \geq n_d$. We show that $l$ cannot be a Kemeny consensus.

Since $\mathrm{dist}(c, c') \geq n_d$, there is a set $E$ of at least $n_d$ nondirty candidates with $c >_{2/3} e >_{2/3} c'$ for $e \in E$. Since $c' > c$ in $l$, no candidate $e \in E$ can be ordered according to the >2/3-majority with respect to both $c$ and $c'$ in $l$. Hence, there are at least $n_d$ pairs formed by a candidate from $E$ and $c$ or $c'$ that are not ordered according to the >2/3-majority in $l$, which, together with the pair $\{c, c'\}$, give more than $n_d$ nondirty pairs that are not ordered according to the >2/3-majority.

This contradicts Lemma 4.3 and, thus, l cannot be optimal.

Finally, the next lemma enables us to fix the position in a Kemeny consensus for a nondirty candidate that has a sufficiently large distance to all dirty candidates.

Lemma 4.5. If for a nondirty candidate $c$ it holds that $\mathrm{dist}(c, c_d) > 2n_d$ for all dirty candidates $c_d \in D$, then $c$ is ordered according to the >2/3-majority with respect to all candidates from $C$ in every Kemeny consensus.

Proof. Assume that there is a nondirty candidate $c$ with $\mathrm{dist}(c, c_d) > 2n_d$ for all $c_d \in D$ and that there is a preference list $l$ with $e > c$ for a candidate $e$ with $c >_{2/3} e$. Then, we show that $l$ cannot be optimal.

Since $\mathrm{dist}(c, c_d) > 2n_d$ for all dirty candidates $c_d \in D$, it follows from Lemma 4.4 that all dirty candidates must be ordered according to the >2/3-majority with respect to $c$. Thus, $e$ must be a nondirty candidate. Due to Lemma 4.4, we may further assume $\mathrm{dist}(e, c) < n_d$, since otherwise $l$ cannot be optimal. Since for all $c_d \in D$ one has $\mathrm{dist}(c, c_d) > 2n_d$, it follows from $\mathrm{dist}(e, c) < n_d$ that $\mathrm{dist}(e, c_d) > n_d$ for all $c_d \in D$ as well. Thus, in a Kemeny consensus, $e$ must be ordered according to the >2/3-majority with respect to all dirty candidates due to Lemma 4.4. For a candidate $c_d \in D$ one has $c >_{2/3} c_d$ iff $e >_{2/3} c_d$ since for all $c_d \in D$ one has $\mathrm{dist}(c, c_d) > 2n_d$ and $\mathrm{dist}(e, c) < n_d$. Hence, there is no dirty candidate $c_d \in D$ with $e > c_d > c$ in $l$, that is, all candidates $f_i$, $i = 1, \ldots, s$, with $e > f_1 > \cdots > f_s > c$ in $l$ must be nondirty. Then, analogously to the proof of Proposition 4.1, one can show that ordering $c, e, f_1, \ldots, f_s$ according to the >2/3-majority gives a consensus with score less than the score of $l$. Thus, $l$ cannot be optimal.

The correctness of the following data reduction rule follows directly from Lemma 4.5. It is not hard to verify that it can be carried out in $O(n \cdot m^2)$ time.

Rule 4.2. For an election with $n_d$ dirty pairs, let $c$ be a nondirty candidate with $\mathrm{dist}(c, c_d) > 2n_d$ for all $c_d \in D$. Let $C_l := \{c' \in C : c' >_{2/3} c\}$ and $C_r := \{c' \in C : c >_{2/3} c'\}$. Delete $c$ and reorder every vote such that $C_l > C_r$ and the order of the candidates within $C_l$ and $C_r$ remains unchanged.
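The vote transformation performed by Rule 4.2 can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes that the caller has already verified the precondition of the rule (c is nondirty and far enough from every dirty candidate), that votes are lists from best to worst, and that `beats(a, b)` is a hypothetical predicate for the >2/3-majority relation.

```python
def apply_rule_4_2(votes, c, beats):
    """Delete candidate c and move C_l in front of C_r in every vote.

    Assumes the precondition of Rule 4.2 has been checked by the caller.
    beats(a, b): True iff a >_{2/3} b (assumed precomputed relation).
    """
    others = [x for x in votes[0] if x != c]
    # C_l: candidates preferred to c by the >2/3-majority. Since c is
    # nondirty, every other candidate belongs to C_l or to C_r.
    left = {x for x in others if beats(x, c)}
    reduced = []
    for vote in votes:
        # List comprehensions preserve the relative order inside each part.
        front = [x for x in vote if x in left]
        back = [x for x in vote if x != c and x not in left]
        reduced.append(front + back)
    return reduced
```

For instance, with $C_l = \{a, b\}$ and $C_r = \{d, e\}$, the vote d > a > c > e > b becomes a > b > d > e.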

In the following, we show that after exhaustively applying Rule 4.2, the number of nondirty candidates is bounded by a function quadratic in the “number of dirty pairs”.

Theorem 4.3. For s > 2/3, Kemeny Score admits a partial kernel with at most $2n_d + 8n_d^2$ candidates where $n_d$ denotes the number of dirty pairs.

Proof. An instance with $n_d$ dirty pairs has at most $2n_d$ dirty candidates. For every nondirty candidate $c$ not deleted after exhaustively applying Rule 4.2, there must be a dirty candidate $c_d$ with $\mathrm{dist}(c, c_d) \leq 2n_d$. Thus, for every dirty candidate there can be at most $4n_d$ nondirty candidates that are not deleted. It follows that, in total, there can be at most $2n_d \cdot 4n_d$ nondirty candidates left. The theorem follows.
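The counting in the proof can be summarized in one line; the numerical instance $n_d = 3$ is chosen purely for illustration.

```latex
\[
\underbrace{2 n_d}_{\text{dirty candidates}}
+ \underbrace{2 n_d \cdot 4 n_d}_{\text{remaining nondirty candidates}}
= 2 n_d + 8 n_d^2,
\qquad \text{e.g. } n_d = 3:\; 6 + 72 = 78 .
\]
```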

4.4 Conclusion

We conclude this chapter with an overview of the provided results and a short discussion of the applicability of the introduced framework to Kemeny Score with Ties, and finally state some open questions deriving directly from our results.

Overview of the results. Our results are summarized in Table 4.2. We identified a concept of dirtiness leading to some observations of structural properties of a Kemeny consensus (see Table 4.1, Section 4.2). These observations provided the basis for the identification of a polynomial-time solvable special case and data reduction rules resulting in partial kernelization results. The new concept of partial kernelization may significantly ease the task to develop provably effective data reduction rules for multidimensional problems where it seems difficult to provide “full” kernel results.

For Kemeny Score, this would include the development of data reduction rules that provably decrease the number of votes.

Finally, note that due to the dependencies of $d_a$ and $d$ as discussed in Section 3.8, our results directly transfer to $d$.


Table 4.2: Partial kernelization results for Kemeny Score. The term dirty refers to the s-majority according to the respective values of s. The number of dirty pairs is $n_d^s$ and $d_a$ denotes the average KT-distance. An instance is nondirty if it does not contain any dirty pair.

| value of s | results |
|---|---|
| 1/2 ≤ s ≤ 2/3 | - |
| 2/3 < s < 3/4 | polynomial-time solvability for nondirty instances (Proposition 4.1); quadratic partial kernel wrt. $n_d^s$ (Theorem 4.3) |
| 3/4 ≤ s ≤ 1 | polynomial-time solvability for nondirty instances (Proposition 4.1); linear partial kernel wrt. $d_a$ and wrt. $n_d^s$ (Theorem 4.1) |

Kemeny Score with Ties. First parameterized complexity results for Kemeny Score with Ties with respect to several parameterizations have been discussed in Section 3.7. The question of fixed-parameter tractability of Kemeny Score with Ties with respect to the “average KT-distance” has been left open. This question can be answered positively since the new method for partial kernelization introduced in Section 4.1 also applies to Kemeny Score with Ties [25]. To this end, we extend the definition of dirtiness as follows. A pair of candidates $a, b$ is dirty if neither $a >_s b$ nor $a =_s b$ nor $a <_s b$ according to a ≥s-majority, where one has $a =_s b$ if $a = b$ in at least $sn$ votes. Using analogous but more laborious proofs as in this chapter, one can show the following results [25].

• A Kemeny Score with Ties instance without dirty pairs is solvable in polynomial time.

• Kemeny Score with Ties admits a quadratic partial kernel with respect to the “average KT-distance” as well as with respect to the “number of dirty pairs”.

Open Problems. The results presented in this chapter lead to several concrete questions.

• Despite the negative results from Theorem 4.2, there is still room for improving the >2/3-majority based results. In particular, is there a linear partial kernel with respect to the ≥s-majority for any s < 3/4? A natural step in answering this question seems to be to investigate whether for two nondirty candidates $a, b$, there must be a Kemeny consensus with $a > b$ if $a \geq_s b$.

• A challenging task of theoretical interest concerns the development of classical problem kernels also bounding the number of votes for Kemeny Score with and without ties.

• We introduced the new structural parameters “number of dirty candidates” and “number of dirty pairs”. The investigation of further fixed-parameter algorithms with respect to these parameterizations is clearly of interest. This is especially motivated by the observation that there are instances in which the “dirtiness” parameters assume small values whereas the parameters “number of candidates”, “average/maximum KT-distance” and “average distance from the Kemeny consensus” can be arbitrarily large. For example, consider the election consisting of the vote $a_1 > a_2 > \cdots > a_m$ and three identical votes defined as $a_m > a_{m-1} > \cdots > a_1$. There is no dirty pair according to the ≥3/4-majority but the values of the other three parameters grow at least linearly in $m$.

• For Kemeny Score with Ties, the only existing results are based on the >2/3-majority and yield a quadratic partial kernel with respect to the average KT-distance [25]. It seems very promising that a linear partial kernel can be obtained analogously to the case without ties by using the ≥3/4-majority.
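The example election from the open problems above (one vote $a_1 > \cdots > a_m$ and three reversed votes) can be checked numerically. The sketch below is illustrative only and rests on two assumptions: the average KT-distance is taken as the mean Kendall tau distance over all unordered pairs of distinct votes (the normalization used elsewhere in this work may differ), and all function names are hypothetical.

```python
from itertools import combinations


def kt_distance(u, v):
    """Kendall tau distance: number of candidate pairs ordered differently."""
    pos = {c: i for i, c in enumerate(v)}
    return sum(1 for a, b in combinations(u, 2) if pos[a] > pos[b])


def average_kt_distance(votes):
    """Mean KT-distance over unordered pairs of votes (assumed definition)."""
    pairs = list(combinations(votes, 2))
    return sum(kt_distance(u, v) for u, v in pairs) / len(pairs)


def count_dirty_pairs(votes, num, den):
    """Count pairs where neither order reaches a >= num/den majority."""
    n = len(votes)
    positions = [{c: i for i, c in enumerate(v)} for v in votes]
    dirty = 0
    for a, b in combinations(votes[0], 2):
        a_first = sum(1 for pos in positions if pos[a] < pos[b])
        # Integer arithmetic avoids floating-point issues at the threshold.
        if a_first * den < num * n and (n - a_first) * den < num * n:
            dirty += 1
    return dirty


def example_election(m):
    """One vote a1 > ... > am and three identical reversed votes."""
    forward = [f"a{i}" for i in range(1, m + 1)]
    return [forward] + [forward[::-1] for _ in range(3)]
```

Under these assumptions, the number of dirty pairs according to the ≥3/4-majority stays 0 for every m, while the average KT-distance equals m(m − 1)/4, for example 5.0 for m = 5.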

Finally, we stress that partial kernelization might be of interest for many NP-hard problems defined on elections. For example, Conitzer [57] uses a different notion of similarity to efficiently compute the closely related Slater rankings. Using a concept of similar candidates, he identifies efficiently solvable special cases, yielding a powerful preprocessing technique for computing Slater rankings. It is interesting to investigate if the concept of (partial) kernelization might be used to provide some performance guarantee of the corresponding reduction rules.

The following chapter provides experimental results showing the usefulness of data reduction for the computation of a Kemeny consensus.

Chapter 5

Experimental results for Kemeny

We investigated the practical value of fixed-parameter algorithms for computing optimal Kemeny rankings. Our main focus was on data reduction rules leading to partial kernelization as described in Chapter 4. To this end, we implemented and extended the 3/4-Majority Rule introduced in Subsection 4.3.1. In addition, we implemented the search tree algorithm from Section 3.4, the dynamic programming algorithm showing fixed-parameter tractability with respect to the number of candidates (Section 3.3) as well as an ILP-based algorithm used in previous experimental work [59, 185]. We showed that the data reduction rules allow for the computation of Kemeny rankings of instances that cannot be solved by the other implemented algorithms without data reduction.

Combining our data reduction with the other implemented algorithms, we provide encouraging results in experiments with real-world data arising in web search and sport competitions. We often achieve provably optimal rankings with small running times, for example, a few seconds or even milliseconds for instances with about 100–150 candidates. An essential property of our data reduction algorithm is that it can break instances into several subinstances to be handled independently, that is, the relative order between the candidates in two different subinstances in a Kemeny ranking is already determined. This also means that for many of the instances which could not be completely solved, we were still able to compute “partial rankings” of the top and bottom ranked candidates. For example, for a large instance based on rankings of about 1300 mathematicians according to their impact in the world wide web, we could not compute a complete Kemeny ranking but still provide a “partial” ranking of the best 31 mathematicians.

In our experiments, we are not only interested in the decision problem Kemeny Score but also want to compute a corresponding Kemeny ranking. Hence, we deal with the following NP-hard optimization problem.

Rank aggregation
Input: An election (V, C).
Task: Find a Kemeny ranking of (V, C).
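To make the optimization problem concrete, the following sketch computes a Kemeny ranking by exhaustive search over all permutations. It is not one of the algorithms implemented for the experiments (search tree, dynamic programming, ILP); it only illustrates the problem definition and is feasible for a handful of candidates at most. The function names are hypothetical.

```python
from itertools import combinations, permutations


def kt_distance(u, v):
    """Kendall tau distance: number of candidate pairs ordered differently."""
    pos = {c: i for i, c in enumerate(v)}
    return sum(1 for a, b in combinations(u, 2) if pos[a] > pos[b])


def kemeny_score(ranking, votes):
    """Sum of KT-distances between a ranking and all votes."""
    return sum(kt_distance(ranking, vote) for vote in votes)


def kemeny_ranking_bruteforce(votes):
    """Return (ranking, score) minimising the Kemeny score.

    Tries all m! permutations, so it is for illustration only.
    """
    best = min(permutations(votes[0]), key=lambda r: kemeny_score(r, votes))
    return list(best), kemeny_score(best, votes)
```

For the votes a > b > c, a > b > c, and b > a > c, the unique Kemeny ranking is a > b > c with score 1.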

Our algorithms for Rank aggregation are implemented in C++ and the source code and test data are available under the GPL Version 3 license. In the following two sections, we first provide more details on the implemented algorithms and then describe our experimental results.

Input: An election $(V, C)$.
Output: A minimal subset $C' \subseteq C$ with $c' \geq_{1/2} c$ for every $c' \in C'$ and every $c \in C \setminus C'$.

For every candidate $c \in C$
   Start with $M_c := \{c\}$.
      Repeat until $M_c$ remains unchanged
         If there is a candidate $c' \in M_c$ and a candidate $c'' \in C \setminus M_c$ with $c'' >_{1/2} c'$,
         then add $c''$ to $M_c$.
Return: an $M_c$ such that $|M_c| \leq |M_{c'}|$ for every $c' \in C \setminus \{c\}$.

Figure 5.1: Strategy to find winning subsets.
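The strategy of Figure 5.1 can be sketched in a few lines. This is an illustrative reading, not the C++ implementation: it assumes that the closure step adds an outside candidate whenever it is preferred to some member of the current set in strictly more than half of the votes, and that votes are lists from best to worst. The function name `winning_subset` is hypothetical.

```python
def winning_subset(votes):
    """Smallest closure M_c over all start candidates c (cf. Figure 5.1)."""
    n = len(votes)
    candidates = list(votes[0])
    positions = [{c: i for i, c in enumerate(v)} for v in votes]

    def strictly_beats(a, b):
        # a >_{1/2} b: more than half of the votes rank a before b
        # (assumed reading of the closure condition).
        return 2 * sum(1 for pos in positions if pos[a] < pos[b]) > n

    best = None
    for c in candidates:
        m_c = {c}
        changed = True
        while changed:  # repeat until M_c remains unchanged
            changed = False
            for outsider in candidates:
                if outsider in m_c:
                    continue
                if any(strictly_beats(outsider, inside) for inside in m_c):
                    m_c.add(outsider)
                    changed = True
        if best is None or len(m_c) < len(best):
            best = m_c
    return best
```

If one candidate is preferred to every other candidate by a strict majority, the result is the singleton containing it; for a majority cycle over all candidates, the result is the full candidate set.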

5.1 Implemented algorithms

In this section, we describe the algorithms realized in our software package. We distinguish between data reduction rules and other “solution algorithms”.