
Cost-Sensitive Decision Making

Scoring and Deployment 6-9

<Attribute name="EDUCATION" actualValue="&lt; Bach." weight=".059" rank="3"/>

<Attribute name="Y_BOX_GAMES" actualValue="0" weight=".059" rank="4"/>

<Attribute name="COUNTRY_NAME" actualValue="United States of America" weight=".059" rank="5"/>

</Details>

Cost-Sensitive Decision Making

Costs are user-specified numbers that bias classification. The algorithm uses positive numbers to penalize more expensive outcomes over less expensive outcomes. Higher numbers indicate higher costs. The algorithm uses negative numbers to favor more beneficial outcomes over less beneficial outcomes. Lower negative numbers indicate higher benefits.

All classification algorithms can use costs for scoring. You can specify the costs in a cost matrix table, or you can specify the costs inline when scoring. If you specify costs inline and the model also has an associated cost matrix, only the inline costs are used. The PREDICTION, PREDICTION_SET, and PREDICTION_COST functions support costs.

Only the Decision Tree algorithm can use costs to bias the model build. To create a Decision Tree model with costs, create a cost matrix table and provide its name in the CLAS_COST_TABLE_NAME setting for the model. If you specify costs when building the model, the cost matrix used to create the model is also used when scoring. To use a different cost matrix table for scoring, first remove the existing cost matrix table, then add the new one.

A sample cost matrix table is shown in Table 6–1. The cost matrix specifies costs for a binary target. The matrix indicates that the algorithm should treat a misclassified 0 as twice as costly as a misclassified 1.
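One way to see how such a matrix biases a prediction is to compute the expected cost of each candidate prediction from the class probabilities and pick the cheapest. The Python sketch below is an illustration only, not Oracle's implementation: the expected-cost decision rule, the function names, and the sample probabilities are assumptions made for the example. Only the cost values come from the sample cost matrix.

```python
# Cost values from the sample cost matrix, keyed by (actual, predicted).
COSTS = {(0, 0): 0, (0, 1): 2, (1, 0): 1, (1, 1): 0}


def expected_cost(probabilities, predicted, costs=COSTS):
    """Expected cost of predicting `predicted`, given P(actual) for each class."""
    return sum(p * costs[(actual, predicted)] for actual, p in probabilities.items())


def cheapest_prediction(probabilities, costs=COSTS):
    """Return the class with the lowest expected cost (assumed decision rule)."""
    return min(probabilities, key=lambda c: expected_cost(probabilities, c, costs))


# Hypothetical case: the model assigns class 1 a probability of 0.6.
probs = {0: 0.4, 1: 0.6}
print(expected_cost(probs, 0))     # cost of predicting 0: 0.6 * 1
print(expected_cost(probs, 1))     # cost of predicting 1: 0.4 * 2
print(cheapest_prediction(probs))  # 0, although 1 is the more probable class
```

Under this assumed rule and these costs, 1 is predicted only when its probability exceeds 2/3, because predicting 1 wrongly costs twice as much as predicting 0 wrongly.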

Table 6–1 Sample Cost Matrix

ACTUAL_TARGET_VALUE  PREDICTED_TARGET_VALUE  COST
0                    0                       0
0                    1                       2
1                    0                       1
1                    1                       0

Example 6–13 Sample Queries With Costs

The table nbmodel_costs contains the cost matrix described in Table 6–1.

SELECT * FROM nbmodel_costs;

ACTUAL_TARGET_VALUE PREDICTED_TARGET_VALUE       COST
------------------- ---------------------- ----------
                  0                      0          0
                  0                      1          2
                  1                      0          1
                  1                      1          0

The following statement associates the cost matrix with a Naive Bayes model called nbmodel.

BEGIN
  dbms_data_mining.add_cost_matrix('nbmodel', 'nbmodel_costs');
END;
/

The following query takes the cost matrix into account when scoring mining_data_apply_v. The output is restricted to those rows where a prediction of 1 is less costly than a prediction of 0.

SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
  FROM mining_data_apply_v
  WHERE PREDICTION (nbmodel COST MODEL
                    USING cust_marital_status, education, household_size) = 1
  GROUP BY cust_gender
  ORDER BY cust_gender;

C        CNT    AVG_AGE
- ---------- ----------
F         25         38
M        208         43

You can specify costs inline when you invoke the scoring function. If you specify costs inline and the model also has an associated cost matrix, only the inline costs are used. The same query is shown below with different costs specified inline. Instead of the "2" shown in the cost matrix table (Table 6–1), "10" is specified in the inline costs.

SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
  FROM mining_data_apply_v
  WHERE PREDICTION (nbmodel COST (0,1) VALUES ((0, 10), (1, 0))
                    USING cust_marital_status, education, household_size) = 1
  GROUP BY cust_gender
  ORDER BY cust_gender;

C        CNT    AVG_AGE
- ---------- ----------
F         74         39
M        581         43

The same query based on probability instead of costs is shown below.

SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
  FROM mining_data_apply_v
  WHERE PREDICTION (nbmodel
                    USING cust_marital_status, education, household_size) = 1
  GROUP BY cust_gender
  ORDER BY cust_gender;

C        CNT    AVG_AGE
- ---------- ----------
F         73         39
M        577         44

See Also: Example 1–1, "Predict Best Candidates for an Affinity Card"

DBMS_DATA_MINING.Apply


The APPLY procedure in DBMS_DATA_MINING is a batch apply operation that writes the results of scoring directly to a table. The columns in the table depend on the mining function.

Scoring with APPLY generates the same results as scoring with the SQL scoring functions. Classification produces a prediction and a probability for each case; clustering produces a cluster ID and a probability for each case, and so on. The difference lies in the way that scoring results are captured and the mechanisms that can be used for retrieving them.

APPLY creates an output table with the columns shown in Table 6–2.

Since APPLY output is stored separately from the scoring data, it must be joined to the scoring data to support queries that include the scored rows. Thus any model that will be used with APPLY must have a case ID.

A case ID is not required for models that will be applied with SQL scoring functions. Likewise, storage and joins are not required, since scoring results are generated and consumed in real time within a SQL query.
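The role of the case ID can be pictured with a small Python sketch that joins APPLY-style output rows back to the scoring data. The customer IDs and scores are taken from the output of Example 6–14. The `age` values and the variable names are hypothetical, and in a database the same step would be an ordinary SQL join on the case ID column.

```python
# Rows as APPLY writes them for anomaly detection:
# (case_id, prediction, probability).
apply_output = [
    (101798, 1, 0.567389309),
    (101798, 0, 0.432610691),
    (102276, 1, 0.564922469),
    (102276, 0, 0.435077531),
]

# Scoring data keyed by the same case ID (attribute values are hypothetical).
scoring_data = {
    101798: {"age": 41},
    102276: {"age": 35},
}

# Join each scored row back to its source row through the case ID.
joined = [
    {"cust_id": case_id, "prediction": pred, "probability": prob,
     **scoring_data[case_id]}
    for case_id, pred, prob in apply_output
    if case_id in scoring_data
]

for row in joined:
    print(row)
```

Without a case ID there would be no key on which to match a scored row to the case that produced it.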

Example 6–14 illustrates anomaly detection with APPLY. The query of the APPLY output table returns the first ten rows in the table. Each customer has a probability of being typical (1) and a probability of being anomalous (0).

Example 6–14 Anomaly Detection with DBMS_DATA_MINING.APPLY

EXEC dbms_data_mining.apply('SVMO_SH_Clas_sample', 'svmo_sh_sample_prepared', 'cust_id', 'one_class_output');

SELECT * FROM one_class_output WHERE rownum < 11;

   CUST_ID PREDICTION PROBABILITY
---------- ---------- -----------
    101798          1  .567389309
    101798          0  .432610691
    102276          1  .564922469
    102276          0  .435077531
    102404          1   .51213544
    102404          0   .48786456
    101891          1  .563474346
    101891          0  .436525654
    102815          0  .500663683
    102815          1  .499336317

Table 6–2 APPLY Output Table

Mining Function      Output Columns
classification       CASE_ID  PREDICTION  PROBABILITY
regression           CASE_ID  PREDICTION
anomaly detection    CASE_ID  PREDICTION  PROBABILITY
clustering           CASE_ID  CLUSTER_ID  PROBABILITY
feature extraction   CASE_ID  FEATURE_ID  MATCH_QUALITY
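Because the APPLY output holds one row per case and candidate prediction, a consumer usually has to reduce it to a single prediction per case. The Python sketch below does this for the ten rows returned in Example 6–14 by keeping the most probable prediction for each customer. The variable names are illustrative, and choosing by highest probability is an assumption about how the rows would be used rather than something APPLY does itself.

```python
# The ten rows returned in Example 6-14: (cust_id, prediction, probability).
rows = [
    (101798, 1, 0.567389309), (101798, 0, 0.432610691),
    (102276, 1, 0.564922469), (102276, 0, 0.435077531),
    (102404, 1, 0.51213544),  (102404, 0, 0.48786456),
    (101891, 1, 0.563474346), (101891, 0, 0.436525654),
    (102815, 0, 0.500663683), (102815, 1, 0.499336317),
]

# Keep the most probable (prediction, probability) pair for each customer.
best = {}
for cust_id, prediction, probability in rows:
    if cust_id not in best or probability > best[cust_id][1]:
        best[cust_id] = (prediction, probability)

# A prediction of 0 marks the customer as anomalous.
anomalies = [cust_id for cust_id, (prediction, _) in best.items() if prediction == 0]
print(anomalies)  # [102815]
```

Of the five customers in these rows, only 102815 is more likely anomalous than typical, and only by a narrow margin.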

See Also: DBMS_DATA_MINING.APPLY in Oracle Database PL/SQL Packages and Types Reference

7


Mining Unstructured Text
