• No results found

Data mining for prediction

N/A
N/A
Protected

Academic year: 2021

Share "Data mining for prediction"

Copied!
33
0
0

Loading.... (view fulltext now)

Full text

(1)

Data

mining for prediction

Prof. Gianluca Bontempi

Département d’Informatique

Faculté de Sciences

ULB Université Libre de Bruxelles

(2)

Outline

• Extracting knowledge from

observations.

• Introduction to Data Mining (What, Why,

and How?)

• Our techniques for data mining.

• Prediction from data: applications.

• What can we do for you?

(3)

Knowledge from observations

Extracting knowledge from observations is a

major issue in applied sciences.

•qualitative description of phenomena,

•theoretical understanding of laws underlying nature, •prediction of future observations.

(4)
(5)

What is data-mining?

• Search for valuable information in large

volumes of data.

• Exploration & analysis, by automatic or

semi-automatic means, of large

quantities of data in order to discover

meaningful patterns & rules.

(6)

Why Mine Data?

(for businesspersons)

• Lots of data has been and is being collected

and warehoused.

• computing has become affordable.

• competitive pressure is strong

– Provide better, customized services for an edge. – Information is becoming product in its own right.

• « convert data into business intelligence »

• « make management decision making based

on facts not intuition »

• « get closer to the customers »

• « gain competitive advantage ».

(7)
(8)

Why mine data?

(for scientists)

• Data collected and stored at enormous speeds

(Gbyte/hour)

– remote sensor on a satellite – telescope scanning the skies

– microarrays generating gene expression data

– scientific simulations generating terabytes of data

• Large dimensionality (thousands of variables).

• Data mining for data reduction.

– cataloging, classifying, segmenting data

(9)

What does data-mining?

Description models

– Find human-interpretable patterns that describe the data.

Prediction models

– Use some variables to predict unknown or future values of other variables.

– Examples: classification, regression, clustering, time series prediction.

From [Fayyad, et.al.] Advances in Knowledge Discovery and

(10)

Example of descriptive measures

Think to difference between this two types

of information:

1. the whole historical record of your bank transactions

2. the evaluation if you are a good or bad client (in your bank’s perspective)

(11)

Prediction task

PHENOMENON MODEL input output prediction error

•Finite amount of noisy observations.

•No a priori knowledge of the phenomenon.

(12)
(13)

Prediction modeling issues

• What are the

relevant factors (inputs &

outputs)

in the data?

• Which kind of

correlation (

or

model)

exists

between input and outputs? (linear,

nonlinear,… )

• Is this correlation

stationary

or

time varying

?

• Which

level of accuracy

in our prediction can

(14)

Model discovery

MODEL GENERATION MODEL VALIDATION PARAMETRIC IDENTIFICATION MODEL DELIVERY
(15)

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 Target function

Nonlinear relationship

input output
(16)

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 Training set

Observations

output
(17)

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

Global modeling

input output
(18)

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

Prediction with global models

query

(19)

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5

Local modeling

(20)

Prediction with local models

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 query query query
(21)

Our competences in modeling

• Conventional statistic techniques

• Neural networks techniques

• Regression trees

• Local modeling techniques

• Neuro-fuzzy techniques

(22)
(23)

Financial prediction

Task: predict the future trends of the financial series.

daily stock market index

0 5000 10000 15000 20000 25000 30000 10/01/94 10/05/94 10/09/94 10/01/95 10/05/95 10/09/95 10/01/96 10/05/96 10/09/96 10/01/97 10/05/97 10/09/97 10/01/98 10/05/98 10/09/98 10/01/99 10/05/99 10/09/99 MIB

Goal: automatic trading system to anticipate the fluctuations of the market.

(24)

Car matriculations in Belgium 300000 320000 340000 360000 380000 400000 420000 440000 460000 480000 500000 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998

Economic variables

Task: predict how many cars will be matriculated next year. Goal: support the marketing campaign of a car dealer.

(25)

Modeling of industrial plants

Task: predict the flow stress of the steel plate as a function of the chemical and physical properties of the material.

Rolling steel mill

Goal: cope with different types of metals, reduce the production time and improve final quality.

(26)

Control

Task: model the dynamics of the plant on the basis of accessible information.

Waste water treatment plant

Goal: control the level of water pollutants.

(27)

Environmental problems

Task: predicting the biological state (e.g. density of algae communities) as a function of chemicals.

Goal: make automatic the analysis of the state of the river by

monitoring chemical concentrations.

(28)

Performance modeling of

embedded systems

Task: model the performance of software tasks on different hardware platforms on the basis of profiling information.

Embedded microprocessor

Goal: select the hardware architecture that better meets the product requirements.

(29)

Multimedia quality of service

Task: model the time/energy requirements of a multimedia application

Multimedia application

Goal: find the best trade-off quality/performance

(30)

Chaotic time series prediction

0 100 200 300 400 500 600 700 800 900 1000 0 50 100 150 200 250 300
(31)

Our prediction

(32)

Awards in international

competitions

• Data analysis competition:

awarded as a runner-up among 21 participants at the 1999 CoIL International Competition on Protecting rivers and streams by monitoring chemical concentrations and algae communities.

• Time series competition:

ranked second among 17 participants to the International

Competition on Time Series organized by the

International Workshop on Advanced Black-box techniques for nonlinear modeling in Leuven, Belgium

(33)

What can we do for you?

• Take and analyze

your most valuable

data (in electronic form) .

• Understand

if some relevant information

is hidden within them.

• Choose, using our techniques

, the best

model for your data

.

• Make

(or providing you tools for doing)

predictions

on the basis of your historical

References

Related documents

In addition, just as other jurisdictions have recognized the merit for keeping enterprise service offerings and specialized services such as virtual private networks outside

It would be expected as densities of turban snails within the central Californian kelp forest increase from zero to moderate densities/grazing intensities, Macrocystis fronds

The main topic of this paper was the definition of housing submarkets in terms of its conceptualization and empirical delineation. A new framework and methods

reviewing the teacher information as well as analyzing the data of the participants during the focus groups, this study showed that teachers with more experience have a higher

Taking the envelope in hand, the performer tears it open and APPARENTLY READS ALOUD THE TOTAL OF FOUR FIGURES FROM OFF THE CARD OR PAPER, but he is REALLY READING TO HIMSELF THE

This provision shall also apply to the skinsuit worn by the leader on which spaces are also reserved for use by riders/teams on the lower part (shorts), as described in the “UCI

European Journal of Clinical Microbiology & Infectious Diseases: official publication of 26 the European Society of Clinical Microbiology, 21(8), 613-616. Invasive

In developed economies, the bulk of OIAs reviewed were development finance institutions or investment guarantee schemes, while in developing and transition economies, the