
Mining Association Rules on Grid Platforms


Academic year: 2021


(1)

UNIVERSITY OF TUNIS EL MANAR

FACULTY OF SCIENCES OF TUNIS

Mining Association Rules on Grid Platforms

Raja Tlili

Yahya Slimani

(2)

Plan

Introduction

Association rules

The need for parallel computing

Workload balancing: Problem description

Workload balancing in association rule mining algorithms

Workload balancing in Grid computing


(3)

Introduction (1)

Data vs Knowledge

Databases

Data: involved

Knowledge: hidden

Knowledge is more important than data

Decision making

To increase revenues and reduce costs

(4)

Introduction (2)

What is data mining

Extracting knowledge from a large volume of data

Non-trivial

Implicit

Previously unknown

(5)

Association rules (1)


The use of knowledge

(6)

Association rules (2)

Finding the rule A → B with support >= minsup and confidence >= minconf

support, s: probability that a transaction contains {A, B}

confidence, c: conditional probability that a transaction containing A also contains B

[Figure: Venn diagram with three regions: clients buying milk, clients buying both, clients buying sugar]

Transaction   Items
T1            A B C D E F G H I
T2            ..
T3            ..
T4            ..
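
The two measures defined on this slide can be computed directly by counting transactions. The following is a minimal sketch, not the authors' code: the function names `support` and `confidence` and the toy `baskets` data are assumptions made for illustration only.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[Transaction],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimate of P(consequent | antecedent) = support(A u B) / support(A)."""
    a = frozenset(antecedent)
    s_a = support(transactions, a)
    if s_a == 0.0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(transactions, a | frozenset(consequent)) / s_a


# Hypothetical toy data (not taken from the slides).
baskets = [
    frozenset({"milk", "sugar"}),
    frozenset({"milk"}),
    frozenset({"milk", "sugar", "bread"}),
    frozenset({"sugar"}),
]
s = support(baskets, {"milk", "sugar"})       # 2 of the 4 baskets
c = confidence(baskets, {"milk"}, {"sugar"})  # 2 of the 3 milk baskets
```

With these toy baskets, the rule milk → sugar would be kept only if the user-chosen minsup is at most 0.5 and minconf is at most 2/3.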

(7)

Extracting association rules: how?

The support and confidence thresholds (MinSup, MinConf) are fixed by the user

Objective: finding all association rules that satisfy both MinSup and MinConf

Problem decomposition

1. Finding all frequent itemsets (support >= MinSup)
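
Step 1 is commonly solved level by level, in the style of the classic Apriori algorithm. The sketch below is an illustration under that assumption; the slides do not say which algorithm the authors parallelize, and the name `frequent_itemsets` and the toy database `db` are hypothetical. Note the one full scan of the database per level, which is what makes the problem I/O intensive.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_itemsets(transactions: List[FrozenSet[str]],
                      min_sup: float) -> Dict[FrozenSet[str], float]:
    """Level-wise search for all itemsets whose support is >= min_sup."""
    n = len(transactions)
    result: Dict[FrozenSet[str], float] = {}
    # Level 1 candidates: every single item seen in the database.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([i]) for i in items]
    k = 1
    while candidates:
        # One scan of the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_sup}
        result.update(level)
        # Join step: unions of two frequent k-itemsets that have k + 1 items.
        k += 1
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k - 1)-subsets are frequent.
        candidates = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return result


# Hypothetical toy database.
db = [
    frozenset("ABC"),
    frozenset("AB"),
    frozenset("AC"),
    frozenset("BC"),
    frozenset("ABC"),
]
found = frequent_itemsets(db, min_sup=0.6)
```

On this toy database the three single items and the three pairs are frequent, while {A, B, C} appears in only 2 of 5 transactions and is discarded.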

(8)

The need for parallel computing

Databases to be mined are often very large (in GB and TB)

Transactional databases have to be scanned repeatedly (iteratively)

The need for fast algorithms for discovering association rules

(9)

Workload balancing

Main challenges facing parallelism

Synchronisation & communication minimization

Finding good data layout & data decomposition

Workload balancing

Disk I/O minimization

(10)

Load balancing: Problem description

Workload balancing is the assignment of work to processors in a way that maximizes application performance

Minimizing

(11)

Causes of load imbalance

Homogeneous environment: even if we partition the DB equally, imbalance would still occur due to the differences in data correlation.

Heterogeneous platforms: have different processor capacities and network speeds.

(12)

Related work

The majority of current approaches use static load balancing based on finding some intelligent way of partitioning the database

(13)

Proposed Load Balancing Approach: Characteristics

[Figure: taxonomy of load balancing policies. Node labels: Static, Dynamic; Reassignment, Centralized, Distributed; One-time, Dynamic; Local, Global; Adaptive, Non-Adaptive; Cooperative, Non-Cooperative]

(14)

Proposed Load Balancing Approach: Goals

Improving the efficiency and the scalability of ARM algorithms under Grid platforms:

Exploiting parallelism at various levels;

considering the particular features of the target platform

(15)

Proposed load balancing model

Let $G = (S_1, S_2, \dots, S_T)$

$S_i = (M_i, Coord(S_i), Mem_i, Stor_i, Band_i)$

$M_i$: total number of clusters in $S_i$

$Coord(S_i)$: coordinator node of the site $S_i$

$Mem_i$: memory size

$Stor_i$: capacity of the storage subsystem

$Band_i$: bandwidth size of the network

$Cl_{ij}$: cluster $j$ of $S_i$

$Coord(Cl_{ij})$: cluster coordinator

[Figure labels: Network, BD1, BD3, coordinator]

$Mem_i = \sum_{j=1}^{NN_i} Mem_{ij}$, $Stor_i = \sum_{j=1}^{NN_i} Stor_{ij}$

$nd_{ijk}$: node $k$ of $Cl_{ij}$
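
The grid, site, cluster and node hierarchy of this model can be written down as a small data structure. This is a sketch only: the class and field names, the units (GB, Mbit/s) and the choice to obtain a site's memory and storage by summing over its nodes are assumptions, not details stated on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:                      # nd_ijk: node k of cluster Cl_ij
    name: str
    mem_gb: float                # assumed unit
    stor_gb: float               # assumed unit


@dataclass
class Cluster:                   # Cl_ij: cluster j of site S_i
    coordinator: str             # Coord(Cl_ij)
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Site:                      # S_i
    coordinator: str             # Coord(S_i)
    band_mbps: float             # Band_i, assumed unit
    clusters: List[Cluster] = field(default_factory=list)

    @property
    def m(self) -> int:
        """M_i: total number of clusters in the site."""
        return len(self.clusters)

    @property
    def mem(self) -> float:
        """Mem_i, assumed here to be the sum over all nodes of the site."""
        return sum(n.mem_gb for c in self.clusters for n in c.nodes)

    @property
    def stor(self) -> float:
        """Stor_i, assumed here to be the sum over all nodes of the site."""
        return sum(n.stor_gb for c in self.clusters for n in c.nodes)


# Hypothetical site with two clusters.
site = Site(
    coordinator="coord-s1",
    band_mbps=1000.0,
    clusters=[
        Cluster("coord-c11", [Node("nd111", 4.0, 80.0), Node("nd112", 4.0, 80.0)]),
        Cluster("coord-c12", [Node("nd121", 8.0, 160.0)]),
    ],
)
grid = [site]                    # G = (S_1, ..., S_T), here with T = 1
```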
(16)

Load balancing strategy: (1) Before execution

[Figure: the database DB is split into DB Partition 1, DB Partition 2, ..., DB Partition n, processed at sites S1, S2, ..., Sn connected through the network]

(17)

Load balancing strategy: (1) Before execution

Steps:

Step I: K=1

• Partitioning the database D between sites according to their

• Every processor has its local database

[Figure labels: D, Coord(S_i), sites S1, S2, S3, processors P0, P1, P2, P3]
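
The initial partitioning can be sketched as a proportional split. The criterion used to size each site's share is cut off in the source, so the sketch below simply takes one non-negative weight per site as input; the function name `partition_by_weight` and the example weights are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_by_weight(db: Sequence[T], weights: Sequence[float]) -> List[List[T]]:
    """Split `db` into contiguous parts whose sizes follow `weights`."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    parts: List[List[T]] = []
    start = 0
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if i == len(weights) - 1:
            end = len(db)  # last site takes the remainder, so nothing is lost
        else:
            end = round(len(db) * cumulative / total)
        parts.append(list(db[start:end]))
        start = end
    return parts


# Hypothetical example: 100 transaction ids, the second site weighted twice as much.
transactions = list(range(100))
parts = partition_by_weight(transactions, [1.0, 2.0, 1.0])
sizes = [len(p) for p in parts]
```

Cumulative rounding is used so that the part sizes always add up to the size of the database.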

(18)

Load balancing strategy: (2) During execution

From the intra-site level

[Figure labels: Network, State Vector]

(19)

Load balancing strategy: (2) During execution

From the Grid level

the coordinators of different sites periodically

[Figure labels: Network, Global State Information]

(20)

Load balancing strategy: (2) During execution

Intra Site Candidates Migration

[Figure: candidate itemsets {A,B,C,..} …]

(21)

Load balancing strategy: (2) During execution

Inter Site Transactions Migration

[Figure: transactions sent over the network, for example T: A,B,C,I,J; T: D,E,F,H,I,K; T: D,F,H,I,H,J; ...; T: C,F,J,L,M]

(22)

Load balancing strategy :

(2) During execution

The coordinator sends the migration plan to all processing nodes and instructs them to reallocate the workload.

The previously mentioned process is periodically invoked. Coordinators check the work load
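
One way a coordinator might turn the collected load values into a migration plan is a greedy transfer from the most loaded to the least loaded nodes. This is a sketch under assumptions, not the authors' policy: the function name `migration_plan`, the `tolerance` parameter and the notion of an abstract load "unit" (a candidate itemset or a transaction, depending on the level) are hypothetical.

```python
from typing import Dict, List, Tuple


def migration_plan(loads: Dict[str, int],
                   tolerance: int = 0) -> List[Tuple[str, str, int]]:
    """Return (source, destination, units) moves that bring loads toward the mean."""
    if not loads:
        return []
    target = sum(loads.values()) // len(loads)
    # Nodes above the target give work away, nodes below it receive work.
    surplus = {n: l - target for n, l in loads.items() if l - target > tolerance}
    deficit = {n: target - l for n, l in loads.items() if target - l > tolerance}
    plan: List[Tuple[str, str, int]] = []
    for src in sorted(surplus, key=surplus.get, reverse=True):
        for dst in sorted(deficit, key=deficit.get, reverse=True):
            if surplus[src] == 0:
                break
            units = min(surplus[src], deficit[dst])
            if units > 0:
                plan.append((src, dst, units))
                surplus[src] -= units
                deficit[dst] -= units
    return plan


# Hypothetical load values reported by four nodes.
loads = {"n0": 90, "n1": 30, "n2": 60, "n3": 20}
plan = migration_plan(loads)

# Applying the plan to a copy shows the effect on the reported loads.
after = dict(loads)
for src, dst, units in plan:
    after[src] -= units
    after[dst] += units
```

Because the process is invoked periodically, a plan computed from one snapshot does not need to be perfect; later rounds can correct what remains.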

(23)

Experimental results

Experimentation under a Grid computing environment: Grid’5000

A grid made up of 5000 CPUs distributed over 9 sites: Lille, Rennes, Orsay, Nancy, Lyon, Bordeaux, Grenoble, Toulouse, Sophia.

(24)

Experimental results

Database    Database size   Number of transactions   Number of items   Average transaction size
DB100T13M   100 MB          1 300 000                4000              25

[Figure (b) DB100T13M: runtime (sec) for three runs: sequential, parallel (//) without load balancing, parallel (//) with load balancing]

2 sites

Each site contains 2 clusters

16 computational nodes: 3 nodes/cluster 1, 2 nodes/cluster 2,

(25)

Experimental results

There is not a fixed optimal number of processors that could be used for execution. The number of processors used should be proportional to the size of data sets to be mined.

The easiest way to determine

(26)

Conclusion and future works

Association rule mining algorithms are simple to state, but they are computationally and I/O intensive (performance problem).

Parallel & distributed computing is essential for providing scalable mining solutions, and can play an important role in improving performance.

The dynamic nature of association rule mining algorithms causes

(27)

Conclusion and future works

We developed a distributed dynamic load balancing strategy under a Grid computing environment.

Experiments showed that our strategy succeeded in reducing the execution time of iterative association rule mining algorithms (good distribution of workload among the processors of the Grid).

Work migration has long been known in « task scheduling »: adapting it to ARM algorithms.

Executing ARM algorithms under Grid platforms and obtaining good results, even with the various synchronization phases.

(28)

UNIVERSITY OF TUNIS EL MANAR

FACULTY OF SCIENCES OF TUNIS
