• No results found

Disjunctive Rules Miner from Uncertain Databases (DRMUD)

In this section we present an elegant algorithm for mining disjunctive rules from uncertain databases [58]. This algorithm can be specialized to generate disjunctive rules from data without uncertainties as well. Before presenting details of our algorithm we define precisely what a disjunctive rule is.

Any rule of the form 𝑋 β‡’ π‘Œ1 π‘œπ‘Ÿ π‘Œ2 π‘œπ‘Ÿ β‹― π‘œπ‘Ÿ π‘Œπ‘˜βˆ’1, where 𝑋, π‘Œ1, π‘Œ2, β‹― , and π‘Œπ‘˜βˆ’1 are pairwise

disjoint itemsets (k being any integer equal to 2 or more) is what we refer to as a disjunctive rule. Just for simplicity of exposition, in the rest of this paper we focus on disjunctive rules where each of these k itemsets consists of a single item. We refer to any such rule as a k- disjunctive rule. Our algorithm is generic and can be readily extended to the general case.

Consider a k-disjunctive rule: π‘Ž β‡’ 𝑏1 π‘œπ‘Ÿ 𝑏2 π‘œπ‘Ÿ β‹― π‘œπ‘Ÿ π‘π‘˜βˆ’1. This rule suggests that if the

item π‘Ž occurs in a transaction then one or more of the items 𝑏1, 𝑏2, β‹― , and π‘π‘˜βˆ’1 are also

likely to occur (with enough support and confidence). Clearly, if the rule π‘Ž β‡’ 𝑏 has enough support, then the rule π‘Ž β‡’ 𝑏 π‘œπ‘Ÿ 𝑐 also will have enough support even if the rule π‘Ž β‡’ 𝑐 has zero support. If the rule π‘Ž β‡’ 𝑐 does not have at least some minimum support, then the rule π‘Ž β‡’ 𝑏 π‘œπ‘Ÿ 𝑐 may not be interesting even if the rule has sufficient support. Keeping this mind, we require that, for the rule π‘Ž β‡’ 𝑏1 π‘œπ‘Ÿ 𝑏2 π‘œπ‘Ÿ β‹― π‘œπ‘Ÿ π‘π‘˜βˆ’1 to be interesting, each of the rules

π‘Ž β‡’ 𝑏𝑗, for 1 ≀ 𝑗 ≀ (π‘˜ βˆ’ 1), have some minimum support. We introduce two support parameters π‘šπ‘–π‘›π‘ π‘’π‘1 and π‘šπ‘–π‘›π‘ π‘’π‘2. The rule π‘Ž β‡’ 𝑏1 π‘œπ‘Ÿ 𝑏2 π‘œπ‘Ÿ β‹― π‘œπ‘Ÿ π‘π‘˜βˆ’1 must have a

minimum support of π‘šπ‘–π‘›π‘ π‘’π‘2 and each rule π‘Ž β‡’ 𝑏𝑗 must have a minimum support of 1

46

identify pairs of items that have enough expected support. In the second phase we utilize these pairs to generate k-disjunctive rules.

A. The first phase:Let π‘šπ‘–π‘›1𝑒𝑠𝑒𝑝 be the minimum expected support that is enforced

between any a pair of associated items and π‘šπ‘–π‘›2𝑒𝑠𝑒𝑝 be the minimum expected support

that is required between an item π‘Ž, and the set of items {𝑏1, 𝑏2, β‹― , π‘π‘˜βˆ’1}, for the rule π‘Ž β‡’

𝑏1 π‘œπ‘Ÿ 𝑏2 π‘œπ‘Ÿ β‹― π‘œπ‘Ÿ π‘π‘˜βˆ’1 to be interesting. Then,

π‘šπ‘–π‘›1𝑒𝑠𝑒𝑝 = 𝑁 Γ— π‘šπ‘–π‘›π‘ π‘’π‘1 π‘šπ‘–π‘›2𝑒𝑠𝑒𝑝 = 𝑁 Γ— π‘šπ‘–π‘›π‘ π‘’π‘2

We scan through the database to generate all possible pairs of items, with an expected support of β‰₯ π‘šπ‘–π‘›1𝑒𝑠𝑒𝑝. Specifically, for each pair of items (a, b), we calculate its expected support as:

𝑒𝑠𝑒𝑝({π‘Ž, 𝑏}) = βˆ‘ 𝑝𝑖(π‘Ž) Γ— 𝑝𝑖(𝑏)

𝑁

𝑖=1

Where 𝑝𝑖(π‘₯) is the probability that the transaction ti has item π‘₯ (for any item π‘₯). A pair (π‘Ž, 𝑏) is frequent if and only if: 𝑒𝑠𝑒𝑝(π‘Ž, 𝑏) β‰₯ π‘šπ‘–π‘›1𝑒𝑠𝑒𝑝. Let 𝐹2 stand for the set of all frequent

pairs.

B. The second phase: We utilize 𝐹2 to generate all the k-disjunctive rules as

follows. Let π‘Ž be any item. Let {𝑏1, 𝑏2, β‹― , π‘π‘˜βˆ’1} be any (π‘˜ βˆ’ 1) βˆ’itemset (from 𝐼 βˆ’ {π‘Ž}) such that each pair (π‘Ž, 𝑏j) is frequent (π‘“π‘œπ‘Ÿ 1 ≀ 𝑗 ≀ (π‘˜ βˆ’ 1)) and also βˆ‘π‘˜βˆ’1𝑗=1𝑒𝑠𝑒𝑝(π‘Ž, 𝑏𝑗) β‰₯

47

Example 4.1

:

Consider the uncertain transaction database π‘ˆπ·π΅ shown in table 4.1.

Let π‘šπ‘–π‘›π‘ π‘’π‘1 = 0.01 , and π‘šπ‘–π‘›π‘ π‘’π‘2 = 0.1. Let number of transactions 𝑁 = 2, and π‘˜ = 4. π‘šπ‘–π‘›1𝑒𝑠𝑒𝑝 = 𝑁 Γ— π‘šπ‘–π‘›π‘ π‘’π‘1 = 0.01 Γ— 2 = 0.02, π‘šπ‘–π‘›2𝑒𝑠𝑒𝑝 = 𝑁 Γ— π‘šπ‘–π‘›π‘ π‘’π‘2 = 0.1 Γ— 2 = 0.2

Table 4.1 uncertain transaction database TID Transactions

π’•πŸ a (0.1) b (0.2) c (0.5) d (0.9) π’•πŸ a (0.8) b (0.4) d (0.3) g (0.1)

A. First Phase: Identification of frequent pairs

𝑒𝑠𝑒𝑝(π‘Ž, 𝑏) = βˆ‘π‘ 𝑝𝑖(π‘Ž) Γ— 𝑝𝑖(𝑏)

𝑖=1 = (0.1 Γ— 0.2) + (0.8 Γ— 0.4) = 0.34 > π‘šπ‘–π‘›1𝑒𝑠𝑒𝑝.

𝑒𝑠𝑒𝑝(π‘Ž, 𝑐) = (0.1 Γ— 0.5) = 0.05 > π‘šπ‘–π‘›1𝑒𝑠𝑒𝑝.

𝑒𝑠𝑒𝑝(π‘Ž, 𝑑) = (0.1 Γ— 0.9) + (0.8 Γ— 0.3) = 0.33 > π‘šπ‘–π‘›1𝑒𝑠𝑒𝑝. In a similar manner, we

realize that the following pairs are also frequent: (π‘Ž, 𝑔), (𝑏, 𝑐), (𝑏, 𝑑), (𝑏, 𝑔), (𝑐, 𝑑), and (𝑑, 𝑔).

B. Second Phase: Rules Generation

Consider the generation of rules in which π‘Ž is the antecedent. We know that there are 3 items in the consequent. Thus there are 4 possibilities for the consequent, namely, {𝑏, 𝑐, 𝑑}, {𝑏, 𝑐, 𝑔}, {𝑏, 𝑑, 𝑔}, and {𝑐, 𝑑, 𝑔}. For each of these possibilities we check if there is enough support. Calculate the expected support for each of the above 4 possibilities.

𝑒𝑠𝑒𝑝 (π‘Ž, {𝑏, 𝑐, 𝑑}) = 𝑒𝑠𝑒𝑝(π‘Ž, 𝑏) + 𝑒𝑠𝑒𝑝(π‘Ž, 𝑐) + 𝑒𝑠𝑒𝑝(π‘Ž, 𝑑) = 0.72 . Since 𝑒𝑠𝑒𝑝 (π‘Ž, {𝑏, 𝑐, 𝑑}) > π‘šπ‘–π‘›2𝑒𝑠𝑒𝑝, we output the rule: π‘Ž β‡’ 𝑏 π‘œπ‘Ÿ 𝑐 π‘œπ‘Ÿ 𝑑.

𝑒𝑠𝑒𝑝 (π‘Ž, {𝑏, 𝑐, 𝑔}) = 𝑒𝑠𝑒𝑝(π‘Ž, 𝑏) + 𝑒𝑠𝑒𝑝(π‘Ž, 𝑐) + 𝑒𝑠𝑒𝑝(π‘Ž, 𝑔) = 0.47 . Since 𝑒𝑠𝑒𝑝 (π‘Ž, {𝑏, 𝑐, 𝑔}) > π‘šπ‘–π‘›2𝑒𝑠𝑒𝑝, we output the rule: π‘Ž β‡’ 𝑏 π‘œπ‘Ÿ 𝑐 π‘œπ‘Ÿ 𝑔.

𝑒𝑠𝑒𝑝 (π‘Ž, {𝑏, 𝑑, 𝑔}) = 𝑒𝑠𝑒𝑝(π‘Ž, 𝑏) + 𝑒𝑠𝑒𝑝(π‘Ž, 𝑑) + 𝑒𝑠𝑒𝑝(π‘Ž, 𝑔) = 0.75 . Since 𝑒𝑠𝑒𝑝 (π‘Ž, {𝑏, 𝑑, 𝑔}) > π‘šπ‘–π‘›2𝑒𝑠𝑒𝑝, we output the rule: π‘Ž β‡’ 𝑏 π‘œπ‘Ÿ 𝑑 π‘œπ‘Ÿ 𝑔.

48

𝑒𝑠𝑒𝑝 (π‘Ž, {𝑐, 𝑑, 𝑔}) = 𝑒𝑠𝑒𝑝(π‘Ž, 𝑐) + 𝑒𝑠𝑒𝑝(π‘Ž, 𝑑) + 𝑒𝑠𝑒𝑝(π‘Ž, 𝑔) = 0.46 . Since 𝑒𝑠𝑒𝑝 (π‘Ž, {𝑐, 𝑑, 𝑔}) > π‘šπ‘–π‘›2𝑒𝑠𝑒𝑝, we output the rule: π‘Ž β‡’ 𝑐 π‘œπ‘Ÿ 𝑑 π‘œπ‘Ÿ 𝑔.

In a similar manner we can generate all the disjunctive rules for which the antecedent is 𝑏, 𝑐, 𝑑, π‘œπ‘Ÿ 𝑔. We provide below a pseudo code for our algorithm.

4.2.1 DRMUD Algorithm

Our proposed algorithm as follows; 𝑰𝒏𝒑𝒖𝒕:

- A transaction set 𝑇 with |𝑇| = 𝑁, - A set 𝐼 of items in the transactions, - π‘šπ‘–π‘›π‘ π‘’π‘1, π‘šπ‘–π‘›π‘ π‘’π‘2,

- Size π‘˜ of Rules,

Output:

All the k-disjunctive rules of the form: (π‘Ž β‡’ 𝑏1 π‘œπ‘Ÿ 𝑏2 π‘œπ‘Ÿ β‹― π‘œπ‘Ÿ π‘π‘˜βˆ’1).

Algorithm:

Step 1: Pre-Processing

π‘šπ‘–π‘›1𝑒𝑠𝑒𝑝 = 𝑁 Γ— π‘šπ‘–π‘›π‘ π‘’π‘1 π‘šπ‘–π‘›2𝑒𝑠𝑒𝑝 = 𝑁 Γ— π‘šπ‘–π‘›π‘ π‘’π‘2

Step 2: /*Find the set 𝐹2 of frequent pairs of items with a support of β‰₯

min1esup*/

for every i ∈ I do

for every j ∈ I and ( jβ‰  i) do

49

if (𝑒𝑠𝑒𝑝(𝑖, 𝑗) β‰₯ π‘šπ‘–π‘›1𝑒𝑠𝑒𝑝), store the frequent pair (i,j) in F2 together with esup(i, j)

end for end for

Step 3: Generate the k-disjunctive rules

for every a ∈ I do

for every (k-1) subset S of I-{a} do

Let S= {𝑏1, 𝑏2, β‹― , π‘π‘˜βˆ’1};

Support=0; for 1 ≀ j≀ k-1 do

π‘†π‘’π‘π‘π‘œπ‘Ÿπ‘‘ += 𝑒𝑠𝑒𝑝(π‘Ž, 𝑏𝑗);

/* esup values are in F2 */

endFor if π‘†π‘’π‘π‘π‘œπ‘Ÿπ‘‘ β‰₯ π‘šπ‘–π‘›2𝑒𝑠𝑒𝑝 Output π‘Ž β‡’ 𝑏1 π‘œπ‘Ÿ 𝑏2 π‘œπ‘Ÿ β‹― π‘œπ‘Ÿ π‘π‘˜βˆ’1 endIf endFor endFor

Note: In Step 3, while considering the set S= {𝑏1, 𝑏2, β‹― , π‘π‘˜βˆ’1}, if there is any 𝑗 (1 ≀ 𝑗 ≀ π‘˜ βˆ’ 1) such that 𝑒𝑠𝑒𝑝(π‘Ž, 𝑏𝑗) < π‘šπ‘–π‘›1𝑒𝑠𝑒𝑝, then the set 𝑆 is not considered.

50

Related documents