
What Are They Thinking? With Oracle Application Express and Oracle Data Miner


(1)

What Are They Thinking? With Oracle Application Express and Oracle Data Miner

Brendan Tierney

Roel Hartman

(2)

Agenda

Who are we

The Scenario

Graphs & Charts in APEX - Live Demo

Oracle Data Miner & DBA tasks

Including Oracle Data Mining in APEX

(3)

Brendan Tierney

Currently:
▪ Lecturer
▪ DBA
▪ Data Mining Consultant
▪ BI & Data Architect
▪ Trainer

Working with Oracle products since 1992/1993
▪ Oracle version 5 up to 11g
▪ Oracle Reports (RPT), ReportWriter I, RPT, …
▪ Forms 2.3…
▪ Oracle Data Miner since 2005

Data Warehousing since 1997

Data Mining since 1998

Analytics since 1993

Available in eBook

(4)
(5)

The Scenario

But? Is there an alternative?

(6)

The Scenario

We have a number of products

We got the opinions from Amazon (star rating)

Can we use Data Mining to predict opinions?

Can we build interactive dashboards in the DB?

Data Mining & Interactive Dashboards with APEX

(7)

APEX - POOR MAN’S BI TOOL

(8)
(9)
(10)

DEMO

(11)

ORACLE ADVANCED ANALYTICS

(12)
(13)

Technique / Algorithms / Applicability

Classification
▪ Logistic Regression (GLM): Classical Statistical Technique
▪ Decision Trees: Popular / Rules / Transparency
▪ Naïve Bayes: Embedded
▪ Support Vector Machine: Wide / Narrow Data / Text

Regression
▪ Multiple Regression: Classical Statistical Technique
▪ Support Vector Machine: Wide / Narrow Data / Text

Anomaly Detection
▪ One Class SVM: Lack Examples

Attribute Importance
▪ Algorithm: Minimum Description Length
▪ Applicability: Attribute Reduction, Identify Useful Data, Reduce Data Noise

Association Rules
▪ Algorithm: Apriori
▪ Applicability: Market Basket Analysis, Link Analysis

Clustering
▪ Algorithms: Enhanced K-Means, O-Cluster, Expectation Maximization
▪ Applicability: Product Grouping, Text Mining, Gene and Protein Analysis

Feature Extraction
▪ Algorithms: Non-Negative Matrix Factorization, Principal Components Analysis, Singular Value Decomposition
▪ Applicability: Text Analysis, Feature Reduction

(14)

Oracle Data Mining

▪ PL/SQL Package
– DBMS_DATA_MINING
– DBMS_DATA_MINING_TRANSFORM
– DBMS_PREDICTIVE_ANALYTICS

▪ SQL Functions
– PREDICTION
– PREDICTION_PROBABILITY
– PREDICTION_BOUNDS
– PREDICTION_COST
– PREDICTION_DETAILS
– PREDICTION_SET
– CLUSTER_ID
– CLUSTER_DETAILS
– CLUSTER_DISTANCE
– CLUSTER_PROBABILITY
– CLUSTER_SET
– FEATURE_ID
– FEATURE_DETAILS
– FEATURE_SET
– FEATURE_VALUE

▪ 12c – Predictive Queries
– aka Dynamic Queries
– Transitive dynamic Data Mining models
– Can scale to many 100+ models all in one statement

(15)

Statistical Functions in Oracle

All of these are FREE with the Database

These are often forgotten about

(16)

Text Mining in Oracle

Natural language processing (Oracle Text)
– It deals with the actual text element. It transforms it into a format that the machine can use.

Artificial intelligence / Machine Learning (Oracle Data Mining)
▪ It uses the information given by the NLP and uses a lot of maths to determine whether something is negative or positive.
▪ All done in Oracle Data Miner (using Oracle Text)
▪ Allows Data Analysts to do this
▪ Isolated from the underlying complexity

(17)

How is it done with Oracle Text & Oracle Advanced Analytics

[Diagram: Product Review -> Human Labelling -> Tokenization -> Stop Word -> Punctuation -> Text Ready for DM -> Machine Learning Algorithms -> Evaluation -> Model. New Product Reviews -> Model -> Sentiment]
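The flow in the diagram (labelled reviews in, a model out, sentiment predicted for new reviews) can be illustrated outside the database with a small sketch. This is not the Oracle implementation: it is a toy Naïve Bayes classifier in plain Python, and the sample reviews, labels and function names are all made up for illustration. The deck does not say which algorithm was used for the sentiment model.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lower-case the text and keep runs of word characters as tokens.
    return re.findall(r"\w+", text.lower())


def train(labelled_reviews):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labelled_reviews:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids log(0).
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical human-labelled reviews (the "Human Labelling" step).
reviews = [
    ("great product works well", "positive"),
    ("love it great value", "positive"),
    ("terrible waste of money", "negative"),
    ("broke quickly very poor", "negative"),
]
model = train(reviews)
print(predict(model, "great value"))
print(predict(model, "poor and terrible"))
```

In the database the equivalent steps are handled by Oracle Text and Oracle Data Mining; the sketch only shows the shape of the train-then-score cycle.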

(18)
(19)

Tokenization

Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens.

The list of tokens becomes input for further processing such as parsing or text mining.

Tokens are separated by whitespace characters, such as a space or line break, or by punctuation characters.

Punctuation and whitespace may or may not be included in the resulting list of tokens.

Example: Today 28 Sept we are at OUF Sunday.
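As a rough illustration of the definition above, here is a minimal tokenizer in Python applied to the example sentence. It is a sketch, not how Oracle Text tokenizes; the function name and the `keep_punctuation` flag are assumptions made for this example.

```python
import re


def tokenize(text, keep_punctuation=False):
    """Split text into tokens on whitespace and punctuation."""
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches one punctuation character as its own token.
    pattern = r"\w+|[^\w\s]" if keep_punctuation else r"\w+"
    return re.findall(pattern, text)


sentence = "Today 28 Sept we are at OUF Sunday."
print(tokenize(sentence))
# ['Today', '28', 'Sept', 'we', 'are', 'at', 'OUF', 'Sunday']
print(tokenize(sentence, keep_punctuation=True))
# ['Today', '28', 'Sept', 'we', 'are', 'at', 'OUF', 'Sunday', '.']
```

The flag mirrors the point that punctuation may or may not be kept in the resulting token list.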

(20)

Stop Words

For analyzing Twitter we can include hash tags, e.g. #OOW14

(21)

Stop Words

Tokens: Today, 28, Sept, we, are, at, OUF, Sunday, .
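A short sketch of stop word removal on the token list from the example sentence. The stop word set here is a small made-up sample, not Oracle Text's default stoplist, and the hash tag pattern is one possible way to keep tags such as #OOW14 as single tokens.

```python
import re

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"we", "are", "at", "the", "a", "an", "is", "to", "of"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lower-cased form is a stop word."""
    return [t for t in tokens if t.lower() not in stop_words]


def tokenize_tweet(text):
    # An optional leading '#' keeps hash tags together as one token.
    return re.findall(r"#?\w+", text)


tokens = ["Today", "28", "Sept", "we", "are", "at", "OUF", "Sunday", "."]
print(remove_stop_words(tokens))
# ['Today', '28', 'Sept', 'OUF', 'Sunday', '.']

print(remove_stop_words(tokenize_tweet("We are at #OOW14")))
# ['#OOW14']
```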

(22)

Punctuations

Characters that are defined as punctuations are removed from a token before text indexing

. , : ; ‘ @ ~ # { } [ ] + = - _ ( ) * & ^ % $ £ € “ ! ` ¬ ¦ \ | / ?
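The removal step can be sketched in Python with a translation table built from the characters listed above. The function name is an assumption, and this is only an illustration of the idea, not Oracle Text's lexer.

```python
# Punctuation characters as listed on the slide.
PUNCTUATION = ".,:;‘@~#{}[]+=-_()*&^%$£€“!`¬¦\\|/?"

# A table that maps every punctuation character to "delete".
STRIP_TABLE = str.maketrans("", "", PUNCTUATION)


def strip_punctuation(tokens):
    """Remove punctuation characters from each token; drop empty results."""
    cleaned = (t.translate(STRIP_TABLE) for t in tokens)
    return [t for t in cleaned if t]


print(strip_punctuation(["Sunday.", "#OOW14", ".", "we"]))
# ['Sunday', 'OOW14', 'we']
```

Because "#" is in the list, hash tags lose their marker here; to keep tags such as #OOW14 intact for Twitter analysis, "#" would have to be left out of the punctuation set.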


(23)
(24)
(25)
(26)
(27)
(28)
(29)
(30)

Using your Sentiment Analyzer

Add the view to the Physical layer of the BI Repository

Then add it to the Business Model layer

[Figure: BI Repository layers. Labels visible: CASE_ID, PRED, PROBABILITY, CUSTOMER_SENTIMENT, SENTIMENT_V, CUSTOMER, TRANSACTIONS, CUSTOMERTRANSACTIONS, TITLE, NAME, STATUS, SEX, AGE, RATING, LOCATION, DEFAULTS, REGION]

(31)

The models are first class objects in the DB

Just like calling any other function

They are fast

Built a model on 550,000 records in 2 minutes

Scored 1.2M records in 52 seconds (on a mid-spec development server)

▪ >80M records per hour without using the Parallel Option

33 lines of SQL code to build and implement a Sentiment Classifier
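The ">80M records per hour" figure follows from the scoring numbers quoted above, assuming the rate stays constant. A quick check in Python (variable names are illustrative):

```python
records_scored = 1_200_000   # 1.2M records
seconds_taken = 52

per_second = records_scored / seconds_taken
per_hour = per_second * 3600

print(f"{per_second:,.0f} records per second")  # 23,077 records per second
print(f"{per_hour:,.0f} records per hour")      # 83,076,923 records per hour
```

So 1.2M records in 52 seconds extrapolates to roughly 83M records per hour, consistent with the slide's claim.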

(32)

DEMO

- ADDING ADVANCED ANALYTICS TO APEX GRAPHS

(33)
(34)

DEMO

- Create a visualisation of your model

- Dashboard

- Use your model for workflow decisions

(35)

APEX - SMART MAN’S BI TOOL

[Slide graphic: "POOR" is replaced by "SMART"; the "+" and "=" joined images that did not survive extraction]

(36)

Brendan Tierney

Roel Hartman

[email protected]  

@brendantierney

[email protected]  

@roelh
