

4.5.2 Candidate Keyword Selection

Recall that the best candidate keyword set W′, which maximizes the cardinality of B_p, has to be determined for p.l = ℓ (Line 8.19 of Algorithm 8), where ℓ is the location at the top of the priority queue in that iteration. As this problem is NP-hard, an approximation algorithm is developed first. An exact method that uses several pruning strategies is also presented, which can serve as a naive baseline.

4.5.2.1 Approximate Algorithm

The candidate keyword selection problem was shown to be NP-hard in Lemma 2 using a reduction from the Maximum Coverage (MC) problem. For the MC problem, a greedy (1 − 1/e) ≈ 0.632 approximation algorithm exists. In the MC problem, the input is a collection of sets S = {S_1, S_2, . . . , S_m} and a number n. In each step, the greedy algorithm chooses the set that contains the largest number of uncovered elements, until exactly n sets are selected. Feige [38] showed that this greedy algorithm is the best possible polynomial-time approximation algorithm for the MC problem (unless P = NP). Inspired by this algorithm, we propose an approximate algorithm to select the candidate keywords when p.l = ℓ (Line 8.19 of Algorithm 8). However, some preprocessing is required before the greedy algorithm can be applied, as discussed next.
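The greedy rule for MC can be sketched in a few lines of Python. This is a generic illustration of the textbook greedy algorithm rather than code from this work; the function name `greedy_max_coverage` is hypothetical, and breaking ties by the lowest set index is an arbitrary choice.

```python
from typing import Hashable, Sequence


def greedy_max_coverage(sets: Sequence[set[Hashable]], n: int) -> list[int]:
    """Greedy (1 - 1/e)-approximation for Maximum Coverage.

    Picks exactly min(n, len(sets)) sets; in each step the set that covers
    the most still-uncovered elements is chosen (ties -> lowest index).
    """
    covered: set[Hashable] = set()
    chosen: list[int] = []
    for _ in range(min(n, len(sets))):
        best_idx, best_gain = -1, -1
        for i, s in enumerate(sets):
            if i in chosen:
                continue
            gain = len(s - covered)  # elements this set would newly cover
            if gain > best_gain:
                best_idx, best_gain = i, gain
        chosen.append(best_idx)
        covered |= sets[best_idx]
    return chosen


# Toy instance: the first and third sets together cover all six elements.
example = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
picked = greedy_max_coverage(example, 2)  # [0, 2]
```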

Preprocessing. For each w ∈ W, we generate a list LW_w of the users that can be in B_p according to an upper-bound estimation, where p.d = W′ and w ∈ W′. As the set B↑_ℓ is already generated based on this upper bound, only the users in B↑_ℓ need to be considered in this step. Let W↑_{w,u} be the set of the ω highest-weighted keywords from W ∩ u.d such that w ∈ W↑_{w,u}. When p.d = W↑_{w,u} and p.l = ℓ, a user u can be in B_p if CS(p, u) ≥ R_k(u). Such users are included in the corresponding list LW_w, for each w ∈ W.
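A minimal sketch of this preprocessing step, assuming each user's description u.d is available as a keyword-to-weight mapping and that CS(p, u) for p.l = ℓ is supplied as a callable. The names `build_keyword_lists`, `Scorer`, `toy_cs`, and `rk` are hypothetical, and `toy_cs` simply sums keyword weights, which is not the spatial-textual score used in this work. The sketch relies on the observation that the ω highest-weighted keywords of W ∩ u.d that include w are w itself plus the ω − 1 highest-weighted remaining keywords.

```python
from typing import Callable, Hashable

# Hypothetical input format: user id -> {keyword: weight} (u.d with weights).
UserKeywords = dict[Hashable, dict[str, float]]
# Hypothetical scorer standing in for CS(p, u) with p.l fixed to ℓ and p.d = keywords.
Scorer = Callable[[frozenset[str], Hashable], float]


def build_keyword_lists(W: set[str], upper: set[Hashable], user_kw: UserKeywords,
                        omega: int, cs: Scorer,
                        rk: dict[Hashable, float]) -> dict[str, set[Hashable]]:
    """Build LW_w for every w in W, considering only users in `upper` (B↑_ℓ)."""
    lw: dict[str, set[Hashable]] = {w: set() for w in W}
    for u in upper:
        weights = user_kw[u]
        shared = [kw for kw in weights if kw in W]  # W ∩ u.d
        for w in shared:
            # W↑_{w,u}: w plus the (ω - 1) highest-weighted other keywords of
            # W ∩ u.d, i.e. the heaviest ω-subset of W ∩ u.d that contains w.
            others = sorted((kw for kw in shared if kw != w),
                            key=lambda kw: weights[kw], reverse=True)
            w_up = frozenset([w, *others[:omega - 1]])
            if cs(w_up, u) >= rk[u]:  # u can be in B_p when p.d = W↑_{w,u}
                lw[w].add(u)
    return lw


# Toy usage with made-up weights and thresholds R_k(u).
user_kw = {"u1": {"a": 0.9, "b": 0.5},
           "u2": {"c": 1.0},
           "u3": {"a": 0.2, "b": 0.1, "c": 0.05}}
rk = {"u1": 1.2, "u2": 0.5, "u3": 0.28}


def toy_cs(keywords, u):
    # Not the real CS: just the sum of u's weights for the given keywords.
    return sum(user_kw[u].get(k, 0.0) for k in keywords)


lw = build_keyword_lists({"a", "b", "c"}, {"u1", "u2", "u3"}, user_kw, 2, toy_cs, rk)
# lw == {"a": {"u1", "u3"}, "b": {"u1", "u3"}, "c": {"u2"}}
```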

Approximating the best candidate keyword set. Recall that in the MC problem, given a collection of sets S = {S_1, S_2, . . . , S_m} and a number n, the objective is to find a subset S′ ⊆ S such that |S′| ≤ n and the number of elements covered by S′, |⋃_{S_i ∈ S′} S_i|, is maximized. In our case, the collection of sets is {LW_w : w ∈ W} and the number n is ω. The greedy MC approach is applied to find a set W′ of ω candidate keywords for which |⋃_{w ∈ W′} LW_w| is approximately maximized. This set W′ is returned as the best candidate keyword set for the location ℓ.
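Once the lists LW_w are built, keyword selection is the MC greedy rule applied to those lists with n = ω. The sketch below (hypothetical names `select_keywords_greedy` and `select_keywords_exhaustive`) also enumerates every ω-combination on a tiny, made-up instance to show that the greedy answer can be suboptimal while still meeting the (1 − 1/e) bound.

```python
import math
from itertools import combinations
from typing import Hashable


def select_keywords_greedy(lw: dict[str, set[Hashable]],
                           omega: int) -> tuple[list[str], set[Hashable]]:
    """Greedy MC over the lists LW_w: returns (W′, users covered by W′)."""
    remaining = dict(lw)
    chosen: list[str] = []
    covered: set[Hashable] = set()
    while remaining and len(chosen) < omega:
        # Keyword whose list adds the most users not covered so far.
        w = max(remaining, key=lambda k: len(remaining[k] - covered))
        chosen.append(w)
        covered |= remaining.pop(w)
    return chosen, covered


def select_keywords_exhaustive(lw: dict[str, set[Hashable]],
                               omega: int) -> tuple[list[str], set[Hashable]]:
    """Checks every ω-combination; only feasible for tiny inputs."""
    def cover(combo):
        return set().union(*(lw[w] for w in combo))
    best = max(combinations(lw, min(omega, len(lw))), key=lambda c: len(cover(c)))
    return list(best), cover(best)


# Made-up lists on which greedy is not optimal.
lw = {"x": {1, 2, 3, 4}, "y": {1, 2, 5}, "z": {3, 4, 6}}
greedy_kw, greedy_cov = select_keywords_greedy(lw, 2)   # ["x", "y"], 5 users
exact_kw, exact_cov = select_keywords_exhaustive(lw, 2)  # ["y", "z"], 6 users
assert len(greedy_cov) >= (1 - 1 / math.e) * len(exact_cov)
```

Greedy first takes x because it has the largest list, after which y and z each add only one new user; the exhaustive search finds that y and z together cover all six.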

4.5.2.2 Exact Algorithm

The number of candidate keywords can be small in some applications. Moreover, the search space can be pruned using several strategies when selecting the candidate keyword set. This motivates us to develop an exact algorithm for selecting the best keyword set W′ in a MaxST query. The pseudocode is presented in Algorithm 9, and the pruning techniques are explained below.

• Pruning users: According to the definition of CS↑(ℓ, u), only the users in B↑_ℓ can have p as a top-k object when p.l = ℓ. Hence, only the users in B↑_ℓ need to be considered.

• Pruning candidate keywords: Let W_U be the union of the text descriptions of the users in B↑_ℓ (Line 9.3). Only the candidate keywords that are contained in at least one of those descriptions, W ∩ W_U, need to be considered.

• Let M be the set of combinations of ω keywords from W ∩ W_U. For a keyword combination m ∈ M, only the users u with m ∩ u.d ≠ ∅ are processed.

• Early termination: If |W ∩ W_U| ≤ ω, then W ∩ W_U is the only possible candidate keyword set. The process therefore terminates and returns W ∩ W_U as the best candidate keyword set for ℓ, as shown in Lines 9.6-9.7.

• If the lower-bound relevance satisfies CS↓(ℓ, u) ≥ R_k(u), then u is included in B_p′, where p′.l = ℓ, without computing CS(p′, u) (Lines …-9.15). If the cardinality of B_p′ is greater than that of the current best keyword combination, the current best is updated (Lines 9.19-9.21).

ALGORITHM 9: Exact method to select candidate keywords

9.1  Input: A set of users U, a candidate location ℓ ∈ L with the maximum |B↑_ℓ| in the current iteration, a set of candidate keywords W, the number of keywords to select ω, and the number of top relevant objects k.
9.2  Output: The optimal set of ω keywords from W for the location ℓ.
9.3  W_U ← ⋃_{u ∈ B↑_ℓ} u.d
9.4  W′ ← ∅
9.5  best ← 0
9.6  if |W ∩ W_U| ≤ ω then
9.7      W′ ← W ∩ W_U
9.8  else
9.9      M ← combinations of ω keywords from W ∩ W_U
9.10     p′.l ← ℓ
9.11     for each m ∈ M do
9.12         p′.d ← m; B_p′ ← ∅
9.13         for each u ∈ B↑_ℓ do
9.14             if CS↓(ℓ, u) ≥ R_k(u) then
9.15                 B_p′ ← B_p′ ∪ {u}
9.16             else if m ∩ u.d ≠ ∅ then
9.17                 if CS(p′, u) ≥ R_k(u) then
9.18                     B_p′ ← B_p′ ∪ {u}
9.19         if |B_p′| > best then
9.20             W′ ← m
9.21             best ← |B_p′|
9.22 return W′
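Algorithm 9 translates fairly directly into Python. The sketch below assumes that B↑_ℓ is given as a mapping from each user to u.d, and that CS(p′, u) and CS↓(ℓ, u) are supplied as callables; `exact_keyword_selection`, `toy_cs`, `no_lower_bound`, and the toy weights are hypothetical stand-ins, not the scoring functions defined in this work. B_p′ is rebuilt from scratch for every combination m.

```python
from itertools import combinations
from typing import Callable, Hashable


def exact_keyword_selection(
        upper: dict[Hashable, frozenset[str]],             # B↑_ℓ: user -> u.d
        W: set[str], omega: int,
        cs: Callable[[frozenset[str], Hashable], float],  # CS(p′, u) with p′.l = ℓ
        cs_lower: Callable[[Hashable], float],             # CS↓(ℓ, u)
        rk: dict[Hashable, float]) -> frozenset[str]:      # R_k(u)
    w_u = frozenset().union(*upper.values())        # Line 9.3: W_U
    useful = frozenset(W & w_u)                     # prune keywords no user has
    if len(useful) <= omega:                        # Lines 9.6-9.7: early exit
        return useful
    best_set: frozenset[str] = frozenset()
    best = 0
    for combo in combinations(sorted(useful), omega):  # Line 9.9: the set M
        m = frozenset(combo)                         # p′.d = m
        b_p = set()                                  # B_p′, reset for each m
        for u, ud in upper.items():
            if cs_lower(u) >= rk[u]:                 # lower bound already suffices
                b_p.add(u)
            elif m & ud and cs(m, u) >= rk[u]:       # exact score only on overlap
                b_p.add(u)
        if len(b_p) > best:                          # Lines 9.19-9.21
            best_set, best = m, len(b_p)
    return best_set


# Toy usage with made-up weights and thresholds.
weights = {"u1": {"a": 0.9, "b": 0.5},
           "u2": {"c": 1.0},
           "u3": {"a": 0.2, "b": 0.1, "c": 0.05}}
upper = {u: frozenset(kw) for u, kw in weights.items()}
rk = {"u1": 1.2, "u2": 0.5, "u3": 0.28}


def toy_cs(m, u):
    # Not the real CS: sum of u's weights for the keywords in m.
    return sum(weights[u].get(k, 0.0) for k in m)


def no_lower_bound(u):
    return 0.0


best_kw = exact_keyword_selection(upper, {"a", "b", "c", "d"}, 2,
                                  toy_cs, no_lower_bound, rk)
# best_kw == frozenset({"a", "b"}), which covers u1 and u3
```

Because CS↓(ℓ, u) does not depend on m, the users that pass the lower-bound test could be collected once before the loop over M; the sketch keeps the check inside the loop to mirror the pseudocode line by line.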