
Data Mining and Business Intelligence


Academic year: 2021


(2)

Data Mining and Business Intelligence

Layers, from top to bottom (the potential to support business decisions increases toward the top):

- Decision Making
- Data Presentation: Visualization Techniques
- Data Mining: Information Discovery
- Data Exploration: Statistical Summary, Querying, and Reporting
- Data Preprocessing/Integration, Data Warehouses
- Data Sources

Roles, from the top layer down: End User, Business Analyst, Data Analyst, DBA

(3)

Matrices & Methods in MDA

[Figure: MDA strategy flowchart. Labels on the YES/NO decision nodes include Structured Matrix, By row, Symmetric Analysis, and Analysis in the variable space. The methods are divided into Explicative Analyses and Exploratory Analyses and by variable type (Categorical, Ordinal, Quantitative). Method groups shown: Discriminant Analysis and Segmentation; PLS Regression, Conjoint Analysis, and Non-Symmetric Correspondence Analysis; Canonical Correlation, Multiple Correspondence Analysis, and 3-Way Analysis; Cluster Analysis and Multidimensional Scaling; Principal Component Analysis and Correspondence Analysis.]

(4)

Correspondence analysis (AFC) of the first-round results of the presidential election in the arrondissements of Paris (May 2002).

(5)

Decision Trees

Decision trees are the main outcome of a segmentation procedure.

They represent a learning technique for solving classification and forecasting problems.

Graphically, a decision tree may be seen as an upside-down tree.

[Figure: tree diagram with a root node at the top, internal nodes below it, and leaf nodes at the ends of the branches.]

- Explanatory purpose: explain the response variable from the set of predictors
- Decisional purpose

(6)

Example: Referendum on the European Constitution (Binary Response Y)

Vote for European Constitution | Sex | Age Class | Political Affiliation | Last Degree | Confidence in the future
--- | --- | --- | --- | --- | ---
Yes | Female | 25-34 | PS | Bac+3/4 | Confident+
Yes | Male | 60+ | PS | < Bac | Confident-
Yes | Female | 35-44 | UMP | Bac+3/4 | Don't know
Yes | Male | 45-59 | PS | Bac | Confident++
Yes | Female | 35-44 | UMP | Bac+5/Grande école | Confident++
Yes | Male | 25-34 | UMP | Bac | Confident+
Yes | Female | 25-34 | UMP | Bac | Confident+
Yes | Male | 35-44 | PS | Bac+5/Grande école | Confident+
Yes | Female | 35-44 | UDF | No degree | Confident+
Yes | Male | 45-59 | UDF | < Bac | Confident--
Yes | Male | 25-34 | UMP | Bac+5/Grande école | Confident+
Yes | Male | 60+ | UMP | < Bac | Confident+
Yes | Female | 35-44 | PS | < Bac | Confident+
Yes | Male | 18-24 | UMP | Bac+3/4 | Confident-
Yes | Female | 35-44 | PS | Bac+2 | Confident-
Yes | Female | 18-24 | Verts | Bac | Confident++
Yes | Female | 60+ | UMP | < Bac | Confident+
Yes | Male | 35-44 | PS | Bac+2 | Confident+

(7)

Building the second level in the tree

(8)

Building the third level in the tree
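The slides build the tree one level at a time. As a rough illustration of how a single level could be chosen, the sketch below picks the categorical predictor whose split leaves the least mixed groups, then repeats the choice inside one of the resulting groups. The Gini impurity criterion, the function names, and the six respondents are all assumptions made for illustration; the slides do not say which splitting criterion was used, and the data are not taken from the survey.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure group)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(rows, predictor, response):
    """Weighted Gini impurity after splitting rows on one categorical predictor."""
    groups = {}
    for row in rows:
        groups.setdefault(row[predictor], []).append(row[response])
    n = len(rows)
    return sum(len(group) / n * gini(group) for group in groups.values())

def best_split(rows, predictors, response):
    """Predictor whose split gives the lowest weighted impurity."""
    return min(predictors, key=lambda p: split_impurity(rows, p, response))

# Hypothetical respondents, invented for this sketch.
rows = [
    {"sex": "F", "party": "PS", "vote": "Yes"},
    {"sex": "M", "party": "PS", "vote": "Yes"},
    {"sex": "F", "party": "UMP", "vote": "No"},
    {"sex": "M", "party": "UMP", "vote": "No"},
    {"sex": "F", "party": "UDF", "vote": "Yes"},
    {"sex": "M", "party": "UDF", "vote": "No"},
]

# First level: compare all predictors on the full sample.
first_level = best_split(rows, ["sex", "party"], "vote")

# Second level: repeat inside a group that is still mixed, using the remaining predictors.
udf_rows = [row for row in rows if row["party"] == "UDF"]
second_level = best_split(udf_rows, ["sex"], "vote")

print(first_level, second_level)
```

Each further level repeats the same comparison inside the groups produced by the previous split, stopping when a group is pure or too small to split.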

(9)

Extracting association rules from decision trees

- The knowledge represented in a decision tree may also be represented in terms of “IF → THEN” rules.
- For each path from the root to a terminal node, an association rule may be defined.

IF age = “<=30” AND student = “no” THEN buys_computer = “no”

IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”

IF age = “31…40” THEN buys_computer = “yes”

IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “yes”

IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “no”
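The path-to-rule idea can be sketched in a few lines. The nested-dictionary tree and the function name `extract_rules` are assumptions made for illustration, and the tree places the credit_rating test on the age “>40” branch, on the assumption that each rule must follow a single root-to-leaf path.

```python
def extract_rules(tree, target, conditions=()):
    """Yield one IF-THEN rule per root-to-leaf path of a nested-dict tree."""
    if not isinstance(tree, dict):
        # Leaf: `tree` holds the predicted class for this path.
        body = " AND ".join(f'{attr} = "{value}"' for attr, value in conditions)
        yield f'IF {body} THEN {target} = "{tree}"'
        return
    # Internal node: a single attribute mapped to its branches.
    (attribute, branches), = tree.items()
    for value, subtree in branches.items():
        yield from extract_rules(subtree, target, conditions + ((attribute, value),))

# Hypothetical encoding of the buys_computer tree.
tree = {
    "age": {
        "<=30": {"student": {"no": "no", "yes": "yes"}},
        "31…40": "yes",
        ">40": {"credit_rating": {"excellent": "yes", "fair": "no"}},
    }
}

rules = list(extract_rules(tree, "buys_computer"))
for rule in rules:
    print(rule)
```

The number of rules equals the number of terminal nodes, here five.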

(10)

Partial Least Squares (PLS) Path Modeling for:

- Causal Network of Relationships
- Multi-block Analysis

(11)

Path model describing a network of causal relationships for Customer Satisfaction

[Figure: path diagram linking the latent variables Image, Customer Expectation, Perceived quality, Perceived value, Customer satisfaction, Complaints, and Loyalty.]

(12)

Measurement Instrument for the Mobile Phone Industry

Examples of latent and manifest variables

Customer expectation

a) Expectations for the overall quality of “your mobile phone provider” at the moment you became a customer of this provider.

b) Expectations for “your mobile phone provider” to provide products and services to meet your personal needs.

c) How often did you expect that things could go wrong at “your mobile phone provider”?

Customer satisfaction

a) Overall satisfaction

b) Fulfilment of expectations

c) How well do you think “your mobile phone provider” compares with your ideal mobile phone provider?

(13)

Customer loyalty

a) If you needed to choose a new mobile phone provider, how likely is it that you would choose “your provider” again?

b) Let us now suppose that other mobile phone providers decide to lower fees and prices, but “your mobile phone provider” stays at the same level as today. At which level of difference (in %) would you choose another phone provider?

c) If a friend or colleague asks you for advice, how likely is it that you would recommend “your mobile phone provider”?

And so on for the other latent variables...

(14)

ECSI Model in the XLSTAT-PLSPM software

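[Figure: ECSI model diagram. Latent variables and the manifest variables shown for each:]

- Image: IMAG1, IMAG2, IMAG3, IMAG4, IMAG5
- Expectation: CUEX1, CUEX2, CUEX3
- Perceived Quality: PERQ1, PERQ2, PERQ3
- Perceived Value: PERV1, PERV2
- Satisfaction: CUSA1, CUSA2, CUSA3
- Loyalty: CUSL1, CUSL2, CUSL3
- Complaints: CUSCO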

(15)

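ECSI Path model for a “Mobile phone provider”

[Figure: path diagram linking Image, Customer Expectation, Perceived quality, Perceived value, Customer satisfaction, Complaint, and Loyalty. Values extracted from the diagram, each followed by its accompanying value in parentheses: .492 (7.67), .544 (10.71), .066 (1.10), .037 (1.14), .153 (3.07), .211 (2.54), .541 (6.93), .543 (8.62), .201 (3.59), .468 (5.18), .540 (11.08), .049 (1.11). R2 values shown: .242, .296, .335, .672, .432, .292. The assignment of values to arrows is not recoverable from the extracted text.]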

(16)

Latent Variable Computation

Example: Customer Satisfaction Index

\[
\mathrm{CSI} = \frac{0.0158 \times \mathrm{C\_sat1} + 0.0231 \times \mathrm{C\_sat2} + 0.0264 \times \mathrm{C\_sat3}}{0.0158 + 0.0231 + 0.0264}
\]
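The index is a weighted average of the three satisfaction items, with the weights normalized by their sum. A minimal sketch follows; the function name and the respondent's item scores are hypothetical, and the scores are assumed to lie on a 0 to 100 scale.

```python
def customer_satisfaction_index(scores, weights):
    """Weighted average of manifest-variable scores, normalized by the sum of weights."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Weights of C_sat1, C_sat2, C_sat3 as given in the formula.
weights = [0.0158, 0.0231, 0.0264]

# Hypothetical item scores for one respondent (assumed 0-100 scale).
scores = [80.0, 70.0, 75.0]

csi = customer_satisfaction_index(scores, weights)
print(round(csi, 2))
```

Because the weights are normalized, the index always falls between the smallest and the largest item score.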

Mean and standard deviation of the latent variables

Latent variable | N | Minimum | Maximum | Mean | Std. Deviation
--- | --- | --- | --- | --- | ---
IMAGE | 250 | 26.49 | 100.00 | 72.6878 | 13.7660
CUSTOMER EXPECTATION | 250 | 25.85 | 100.00 | 72.3198 | 14.1259
PERCEIVED QUALITY | 250 | 23.95 | 100.00 | 74.5765 | 14.2573
PERCEIVED VALUE | 250 | .00 | 100.00 | 61.5887 | 20.5987
CUSTOMER SATISFACTION | 250 | 23.68 | 100.00 | 71.2876 | 15.3417
COMPLAINT | 250 | .00 | 100.00 | 67.4704 | 25.2684
LOYALTY | 250 | 1.29 | 100.00 | 69.1757 | 21.2668

Explanatory Variables for Customer Satisfaction

Variable | $\hat{\beta}_j$ | Correlation | Contribution to $R^2$ (%)
--- | --- | --- | ---
Image | .153 | .671 | 15.28
Expectation | .037 | .481 | 2.67
Perceived Value | .200 | .604 | 17.98
Perceived Quality | .544 | .791 | 64.07
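The contribution column appears consistent with the standard decomposition for standardized variables, in which R2 equals the sum over predictors of the coefficient times its correlation with the response. The sketch below recomputes the shares from the published coefficients and correlations; the variable names are illustrative, and small differences from the table come from rounding of the published values.

```python
# Standardized coefficients and correlations with Customer Satisfaction,
# as listed for the four explanatory latent variables.
beta = {"Image": 0.153, "Expectation": 0.037,
        "Perceived Value": 0.200, "Perceived Quality": 0.544}
corr = {"Image": 0.671, "Expectation": 0.481,
        "Perceived Value": 0.604, "Perceived Quality": 0.791}

# With standardized variables, R^2 is the sum of beta_j * cor_j.
r2 = sum(beta[name] * corr[name] for name in beta)

# Share of R^2 attributed to each explanatory variable, in percent.
contribution = {name: 100 * beta[name] * corr[name] / r2 for name in beta}

for name, share in contribution.items():
    print(f"{name}: {share:.2f}%")
print(f"R2 = {r2:.3f}")
```

The recomputed R2 agrees with the value .672 reported for Customer Satisfaction on the path diagram.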

(17)

PLS1 regression: an overview of the algorithm

Step 1: Search for $m$ orthogonal components $t_h = X a_h$ that are as correlated as possible with $y$ and that explain their own group (the $X$ variables) as well as possible. The number $m$ is obtained by cross-validation.

Step 2: Regression of $y$ on the $m$ components $t_h$.
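The two steps can be sketched with NumPy. This is a minimal illustration that assumes the common NIPALS-style deflation scheme, which the slides do not name; the function name, the simulated data, and the fixed choice m = 2 are hypothetical, and the cross-validation used to choose m is not shown.

```python
import numpy as np

def pls1_components(X, y, m):
    """Return (T, A, c) for an m-component PLS1 fit on centered data.

    T: n x m matrix of orthogonal components t_h
    A: k x m matrix of weights such that T = Xc @ A (Xc is the centered X)
    c: regression coefficients of the centered y on the components
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    Xh = Xc.copy()                       # deflated X used at step h
    W, P, T = [], [], []
    for _ in range(m):
        w = Xh.T @ yc                    # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xh @ w                       # component t_h
        p = Xh.T @ t / (t @ t)           # loadings of X on t_h
        Xh = Xh - np.outer(t, p)         # remove what t_h explains in X
        W.append(w)
        P.append(p)
        T.append(t)
    W, P, T = np.column_stack(W), np.column_stack(P), np.column_stack(T)
    A = W @ np.linalg.inv(P.T @ W)       # weights expressed on the original Xc
    # Step 2: the components are orthogonal, so each coefficient is a simple ratio.
    c = (T.T @ yc) / np.sum(T * T, axis=0)
    return T, A, c

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))             # hypothetical predictors
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -1.0]) + 0.1 * rng.normal(size=40)

T, A, c = pls1_components(X, y, m=2)
print(np.round(T.T @ T, 6))              # off-diagonal entries are numerically zero
```

Deflating X after each component is what makes the successive components orthogonal, which in turn lets step 2 reduce to one simple regression per component.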

(18)

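Objective of step 1 of PLS regression

[Figure: scatter plots of the observations with axes X1, X2 and y, showing the directions labeled CPX1 (presumably the first principal component of X) and t1 (the first PLS component).]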

(19)

PLS1 Regression: Next Steps

- Similarly, we proceed for the next components.
- Finally, the m-component PLS regression model:

\[
\begin{aligned}
y &= c_1 t_1 + c_2 t_2 + \dots + c_m t_m + \text{Residual} \\
  &= c_1 X a_1 + c_2 X a_2 + \dots + c_m X a_m + \text{Residual} \\
  &= X (c_1 a_1 + c_2 a_2 + \dots + c_m a_m) + \text{Residual} \\
  &= b_1 x_1 + b_2 x_2 + \dots + b_k x_k + \text{Residual}
\end{aligned}
\]
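The chain of equalities in the PLS regression model says that regressing on the components is the same as applying a single coefficient vector b = c_1 a_1 + ... + c_m a_m to the original variables. A short numerical check of that identity follows; the matrices and coefficients are arbitrary hypothetical values, not PLS estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))       # hypothetical data: n = 6 observations, k = 3 variables
A = rng.normal(size=(3, 2))       # hypothetical weight vectors a_1, a_2 stored as columns
c = np.array([0.5, -1.2])         # hypothetical coefficients c_1, c_2

T = X @ A                         # components t_h = X a_h
b = A @ c                         # b = c_1 a_1 + c_2 a_2

fit_on_components = T @ c         # c_1 t_1 + c_2 t_2
fit_on_variables = X @ b          # b_1 x_1 + ... + b_k x_k

print(np.allclose(fit_on_components, fit_on_variables))
```

This is why a PLS model with m components can be reported as an ordinary regression equation in the k original variables.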

(20)

Wine data (Asselin, Morlat & Pagès)

Six wines shown, with their appellation (3 Appellations) and soil (4 Soils). The sensory descriptors form the X block; Global quality is the response y.

Variable | 2el | 1cha | 1fon | 1vau | t1 | t2
--- | --- | --- | --- | --- | --- | ---
Appellation | Saumur | Saumur | Bourgueil | Chinon | Saumur | Saumur
Soil | 1 | 1 | 1 | 3 | 4 | 4
Smell intensity at rest | 3.07 | 2.96 | 2.86 | 2.81 | 3.70 | 3.71
Aromatic quality at rest | 3.00 | 2.82 | 2.93 | 2.59 | 3.19 | 2.93
Fruity note at rest | 2.71 | 2.38 | 2.56 | 2.42 | 2.83 | 2.52
Floral note at rest | 2.28 | 2.28 | 1.96 | 1.91 | 1.83 | 2.04
Spicy note at rest | 1.96 | 1.68 | 2.08 | 2.16 | 2.38 | 2.67
Visual intensity | 4.32 | 3.22 | 3.54 | 2.89 | 4.32 | 4.32
Shading (orange to purple) | 4.00 | 3.00 | 3.39 | 2.79 | 4.00 | 4.11
Surface impression | 3.27 | 2.81 | 3.00 | 2.54 | 3.33 | 3.26
Smell intensity after shaking | 3.41 | 3.37 | 3.25 | 3.16 | 3.74 | 3.73
Smell quality after shaking | 3.31 | 3.00 | 2.93 | 2.88 | 3.08 | 2.88
Fruity note after shaking | 2.88 | 2.56 | 2.77 | 2.39 | 2.83 | 2.60
Floral note after shaking | 2.32 | 2.44 | 2.19 | 2.08 | 1.77 | 2.08
Spicy note after shaking | 1.84 | 1.74 | 2.25 | 2.17 | 2.44 | 2.61
Vegetable note after shaking | 2.00 | 2.00 | 1.75 | 2.30 | 2.29 | 2.17
Phenolic note after shaking | 1.65 | 1.38 | 1.25 | 1.48 | 1.57 | 1.65
Aromatic intensity in mouth | 3.26 | 2.96 | 3.08 | 2.54 | 3.44 | 3.10
Aromatic persistence in mouth | 3.26 | 2.96 | 3.08 | 2.54 | 3.44 | 3.10
Aromatic quality in mouth | 3.26 | 2.96 | 3.08 | 2.54 | 3.44 | 3.10
Intensity of attack | 2.96 | 3.04 | 3.22 | 2.70 | 2.96 | 3.33
Acidity | 2.11 | 2.11 | 2.18 | 3.18 | 2.41 | 2.57
Astringency | 2.43 | 2.18 | 2.25 | 2.18 | 2.64 | 2.67
Alcohol | 2.50 | 2.65 | 2.64 | 2.50 | 2.96 | 2.70
Balance (Acid., Astr., Alco.) | 3.25 | 2.93 | 3.32 | 2.33 | 2.57 | 2.77
Mellowness | 2.73 | 2.50 | 2.68 | 1.68 | 2.07 | 2.31
Bitterness | 1.93 | 1.93 | 2.00 | 1.96 | 2.22 | 2.67
Ending intensity in mouth | 2.86 | 2.89 | 3.07 | 2.46 | 3.04 | 3.33
Harmony | 3.14 | 2.96 | 3.14 | 2.04 | 2.74 | 3.00
Global quality (y) | 3.39 | 3.21 | 3.54 | 2.46 | 2.64 | 2.85

(21)


Hierarchical PLS model for wine data

(22)

Variable loading plot (w*, c)

[Figure: loading plot of w*c[2] against w*c[1] for the wine sensory descriptors and Global quality, with variables flagged as Positive, Negative, or Non significant. Produced with SIMCA-P 10.5, 28/08/2004.]

(23)

Explanatory methods

Several response variables, several explanatory variables: PLS regression

Methods by type of response variable Y and type of explanatory variables X1, X2, …, Xk:

Quantitative Y
- Quantitative X: Multiple regression
- Qualitative X: Analysis of variance
- Mixed X: Analysis of covariance

Qualitative Y
- Quantitative X: Logistic regression; Segmentation; Factorial discriminant analysis; Bayesian factorial analysis
- Qualitative X: Logistic regression; Segmentation; Factorial discriminant analysis
- Mixed X: Logistic regression; Segmentation; Factorial discriminant analysis

(24)

Descriptive methods

Visualization methods, by type of the variables X1, X2, …, Xk:
- Quantitative: Principal component analysis (PCA)
- Qualitative: Multiple correspondence analysis (MCA)
- Mixed: PCA; MCA; Optimal coding

Clustering methods:
- Agglomerative hierarchical clustering (observations or variables)
- Dynamic clusters method (nuées dynamiques)

(25)

Forecasting methods

Time series analysis
- Identification of a trend and seasonal factors
- Identification of outliers

Forecasting
- Smoothing methods (short series)

(26)

“A drop of water in the ocean…
Do not underestimate it,
The ocean is made of nothing but drops of water…”

Photo taken from the book “Rendons à César …”
