In this section we present an elegant algorithm for mining disjunctive rules from uncertain databases [58]. This algorithm can be specialized to generate disjunctive rules from data without uncertainties as well. Before presenting details of our algorithm we define precisely what a disjunctive rule is.
Any rule of the form π β π1 ππ π2 ππ β― ππ ππβ1, where π, π1, π2, β― , and ππβ1 are pairwise
disjoint itemsets (k being any integer equal to 2 or more) is what we refer to as a disjunctive rule. Just for simplicity of exposition, in the rest of this paper we focus on disjunctive rules where each of these k itemsets consists of a single item. We refer to any such rule as a k- disjunctive rule. Our algorithm is generic and can be readily extended to the general case.
Consider a k-disjunctive rule: π β π1 ππ π2 ππ β― ππ ππβ1. This rule suggests that if the
item π occurs in a transaction then one or more of the items π1, π2, β― , and ππβ1 are also
likely to occur (with enough support and confidence). Clearly, if the rule π β π has enough support, then the rule π β π ππ π also will have enough support even if the rule π β π has zero support. If the rule π β π does not have at least some minimum support, then the rule π β π ππ π may not be interesting even if the rule has sufficient support. Keeping this mind, we require that, for the rule π β π1 ππ π2 ππ β― ππ ππβ1 to be interesting, each of the rules
π β ππ, for 1 β€ π β€ (π β 1), have some minimum support. We introduce two support parameters ππππ π’π1 and ππππ π’π2. The rule π β π1 ππ π2 ππ β― ππ ππβ1 must have a
minimum support of ππππ π’π2 and each rule π β ππ must have a minimum support of 1
46
identify pairs of items that have enough expected support. In the second phase we utilize these pairs to generate k-disjunctive rules.
A. The first phase:Let πππ1ππ π’π be the minimum expected support that is enforced
between any a pair of associated items and πππ2ππ π’π be the minimum expected support
that is required between an item π, and the set of items {π1, π2, β― , ππβ1}, for the rule π β
π1 ππ π2 ππ β― ππ ππβ1 to be interesting. Then,
πππ1ππ π’π = π Γ ππππ π’π1 πππ2ππ π’π = π Γ ππππ π’π2
We scan through the database to generate all possible pairs of items, with an expected support of β₯ πππ1ππ π’π. Specifically, for each pair of items (a, b), we calculate its expected support as:
ππ π’π({π, π}) = β ππ(π) Γ ππ(π)
π
π=1
Where ππ(π₯) is the probability that the transaction ti has item π₯ (for any item π₯). A pair (π, π) is frequent if and only if: ππ π’π(π, π) β₯ πππ1ππ π’π. Let πΉ2 stand for the set of all frequent
pairs.
B. The second phase: We utilize πΉ2 to generate all the k-disjunctive rules as
follows. Let π be any item. Let {π1, π2, β― , ππβ1} be any (π β 1) βitemset (from πΌ β {π}) such that each pair (π, πj) is frequent (πππ 1 β€ π β€ (π β 1)) and also βπβ1π=1ππ π’π(π, ππ) β₯
47
Example 4.1
:
Consider the uncertain transaction database ππ·π΅ shown in table 4.1.Let ππππ π’π1 = 0.01 , and ππππ π’π2 = 0.1. Let number of transactions π = 2, and π = 4. πππ1ππ π’π = π Γ ππππ π’π1 = 0.01 Γ 2 = 0.02, πππ2ππ π’π = π Γ ππππ π’π2 = 0.1 Γ 2 = 0.2
Table 4.1 uncertain transaction database TID Transactions
ππ a (0.1) b (0.2) c (0.5) d (0.9) ππ a (0.8) b (0.4) d (0.3) g (0.1)
A. First Phase: Identification of frequent pairs
ππ π’π(π, π) = βπ ππ(π) Γ ππ(π)
π=1 = (0.1 Γ 0.2) + (0.8 Γ 0.4) = 0.34 > πππ1ππ π’π.
ππ π’π(π, π) = (0.1 Γ 0.5) = 0.05 > πππ1ππ π’π.
ππ π’π(π, π) = (0.1 Γ 0.9) + (0.8 Γ 0.3) = 0.33 > πππ1ππ π’π. In a similar manner, we
realize that the following pairs are also frequent: (π, π), (π, π), (π, π), (π, π), (π, π), and (π, π).
B. Second Phase: Rules Generation
Consider the generation of rules in which π is the antecedent. We know that there are 3 items in the consequent. Thus there are 4 possibilities for the consequent, namely, {π, π, π}, {π, π, π}, {π, π, π}, and {π, π, π}. For each of these possibilities we check if there is enough support. Calculate the expected support for each of the above 4 possibilities.
ππ π’π (π, {π, π, π}) = ππ π’π(π, π) + ππ π’π(π, π) + ππ π’π(π, π) = 0.72 . Since ππ π’π (π, {π, π, π}) > πππ2ππ π’π, we output the rule: π β π ππ π ππ π.
ππ π’π (π, {π, π, π}) = ππ π’π(π, π) + ππ π’π(π, π) + ππ π’π(π, π) = 0.47 . Since ππ π’π (π, {π, π, π}) > πππ2ππ π’π, we output the rule: π β π ππ π ππ π.
ππ π’π (π, {π, π, π}) = ππ π’π(π, π) + ππ π’π(π, π) + ππ π’π(π, π) = 0.75 . Since ππ π’π (π, {π, π, π}) > πππ2ππ π’π, we output the rule: π β π ππ π ππ π.
48
ππ π’π (π, {π, π, π}) = ππ π’π(π, π) + ππ π’π(π, π) + ππ π’π(π, π) = 0.46 . Since ππ π’π (π, {π, π, π}) > πππ2ππ π’π, we output the rule: π β π ππ π ππ π.
In a similar manner we can generate all the disjunctive rules for which the antecedent is π, π, π, ππ π. We provide below a pseudo code for our algorithm.
4.2.1 DRMUD Algorithm
Our proposed algorithm as follows; π°ππππ:
- A transaction set π with |π| = π, - A set πΌ of items in the transactions, - ππππ π’π1, ππππ π’π2,
- Size π of Rules,
Output:
All the k-disjunctive rules of the form: (π β π1 ππ π2 ππ β― ππ ππβ1).
Algorithm:
Step 1: Pre-Processing
πππ1ππ π’π = π Γ ππππ π’π1 πππ2ππ π’π = π Γ ππππ π’π2
Step 2: /*Find the set πΉ2 of frequent pairs of items with a support of β₯
min1esup*/
for every i β I do
for every j β I and ( jβ i) do
49
if (ππ π’π(π, π) β₯ πππ1ππ π’π), store the frequent pair (i,j) in F2 together with esup(i, j)
end for end for
Step 3: Generate the k-disjunctive rules
for every a β I do
for every (k-1) subset S of I-{a} do
Let S= {π1, π2, β― , ππβ1};
Support=0; for 1 β€ jβ€ k-1 do
ππ’πππππ‘ += ππ π’π(π, ππ);
/* esup values are in F2 */
endFor if ππ’πππππ‘ β₯ πππ2ππ π’π Output π β π1 ππ π2 ππ β― ππ ππβ1 endIf endFor endFor
Note: In Step 3, while considering the set S= {π1, π2, β― , ππβ1}, if there is any π (1 β€ π β€ π β 1) such that ππ π’π(π, ππ) < πππ1ππ π’π, then the set π is not considered.
50