Oracle Advanced Analytics - Business Intelligence and Data Warehousing

1.2 Business Intelligence and Data Warehousing

1.2.1 Oracle Advanced Analytics

The following sections describe new Oracle Advanced Analytics features.

1.2.1.1 Decision Tree Mining Text Data

The Decision Tree algorithm now supports nested data and can be used for text mining.

Decision Tree is popular because of its transparency and prevalence. It is therefore important to enable the algorithm to handle unstructured data.

See Also:

Oracle Database JDBC Developer's Guide for details

1.2.1.2 Expectation Maximization (EM) Clustering and Density Estimation

In Release 11g, Oracle Data Mining offered two clustering algorithms. However, these algorithms did not easily integrate data coming from different domains (for example, structured and unstructured data). Expectation Maximization (EM) is a probabilistic clustering algorithm that creates a density model of the data. The density model allows for an improved approach to combining data originating in different domains. Each domain can be modeled by distributions appropriate for the domain. The distribution parameters are optimized to provide the most likely joint distribution of the data. Given EM's probabilistic nature, its cluster assignment probabilities are more reliable than those produced by the current Oracle Data Mining algorithms. The EM algorithm also automatically determines the optimal number of clusters needed to model the data.

In bringing analytics to applications, Oracle Data Mining provides different types of clustering capabilities currently being used by multiple applications. While the current capabilities solve a range of problems, an additional method is needed that can effectively combine data from different domains, such as sales transactions and customer demographics, or structured and unstructured (for example, text) data, as well as help answer queries involving range and equality predicates. Expectation Maximization can address all of these requirements.
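The idea of a probabilistic density model with soft cluster assignments can be illustrated with a minimal sketch. The Python code below is not Oracle's implementation: it assumes one-dimensional data, Gaussian components, and a fixed number of clusters (two), whereas the feature described above determines the number of clusters automatically. All function and variable names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    # Crude initialization: start the two means at the extremes of the data.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: probability that each point belongs to each component.
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate the parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a degenerate component
            weights[k] = nk / len(data)
    return means, variances, weights, resp


# Hypothetical sample: two well-separated groups of values.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights, resp = em_two_gaussians(data)
```

Each entry of `resp` holds the cluster assignment probabilities for one record, which is the kind of output the paragraph above refers to.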

1.2.1.3 Feature Extraction Using Singular Value Decomposition

Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) are powerful feature extraction methods that use orthogonal linear projections to capture the underlying variance of the data. This property is extremely useful for reducing the dimensionality of high-dimensional data and for supporting meaningful data visualization. Text mining is one of the domains where SVD projections have found wide application.

PCA can be viewed as a special scoring method under the SVD algorithm. It produces projections that are scaled with the data variance. Projections of this type are sometimes preferable in feature extraction to the standard non-scaled SVD projections.

In bringing analytics to applications, Oracle Data Mining provides powerful feature extraction capabilities that can be used in many contexts, including special handling of unstructured data and of large numerical data sets such as those from sensors (for example, Radio Frequency Identification (RFID)) and time series.

While Oracle Data Mining already provides a basic feature extraction capability, additional feature extraction methods capable of scaling to large data sizes (both rows and attributes) and allowing greater compression of the data are necessary to support many applications.
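To make the notion of a variance-capturing linear projection concrete, the following Python sketch finds the first principal direction of a small data set by power iteration on the covariance matrix. This is a generic textbook technique, not the algorithm Oracle Data Mining uses, and the function name and sample data are hypothetical. The direction it returns is the leading right singular vector of the centered data matrix.

```python
import math


def top_principal_component(rows, iterations=100):
    """Return (direction, variance_captured, projections) for the first principal component."""
    n = len(rows)
    d = len(rows[0])
    # Center each column so the projection captures variance rather than the mean.
    col_means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - col_means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data along the direction v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    # One extracted feature per row: the coordinate along v.
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, variance, projections


# Hypothetical two-attribute data lying close to the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0], [5.0, 10.1]]
direction, variance, projections = top_principal_component(rows)
```

In this example the two original attributes are replaced by a single projected feature that retains nearly all of the variance, which is the dimensionality reduction described above.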

1.2.1.4 Feature Selection and Creation for Generalized Linear Models (GLM)

Feature selection reduces the number of predictors used by a model. This allows for smaller, faster-scoring, and more meaningful Generalized Linear Models (GLM).

Feature generation allows the creation of GLM models that use non-linear terms (up to cubic terms). This produces more powerful, transparent models.
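As an illustration of what non-linear terms up to cubic can look like, the Python sketch below expands a set of predictors into all product terms of degree one to three. How Oracle Data Mining actually generates and selects candidate terms is not described here, so the expansion strategy, the function names, and the predictor names (`age`, `income`) are assumptions made for the example.

```python
from itertools import combinations_with_replacement


def polynomial_terms(names, max_degree=3):
    """List every product of predictors with total degree 1..max_degree."""
    terms = []
    for degree in range(1, max_degree + 1):
        for combo in combinations_with_replacement(names, degree):
            terms.append(combo)
    return terms


def evaluate_term(term, row):
    """Multiply together the row values named in the term."""
    value = 1.0
    for name in term:
        value *= row[name]
    return value


# Hypothetical predictors and a single input row.
terms = polynomial_terms(["age", "income"])
row = {"age": 2.0, "income": 3.0}
expanded = {"*".join(t): evaluate_term(t, row) for t in terms}
```

Two predictors already yield nine candidate terms, which suggests why feature generation is paired with feature selection: the candidate set grows quickly with the number of attributes.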

See Also:

Oracle Data Mining Concepts for details

In bringing analytics to applications, Oracle Data Mining continuously strives to address the competing goals of high accuracy and transparency (the ability to explain predictions).

Some of the techniques (for example, GLM) used most often by applications provide great transparency at the expense of lower accuracy than less transparent methods. There is a need for a transparent, highly accurate, and scalable method capable of handling thousands of attributes efficiently. This can be achieved by adding feature selection and creation to GLM.

1.2.1.5 Native Double in Data Mining Functions

Support has been added for native double types (BINARY_DOUBLE and BINARY_FLOAT) in Oracle Data Mining functions.

Mining model deployment (scoring) performance is critical because scoring runs on the majority of production data, both in batch and in real time. Through C performance analysis, Oracle has identified that the cost of scoring can be dominated by type coercion between Oracle NUMBER and double, rather than by the model itself. Removing this overhead leads to much faster scoring.

1.2.1.6 Native SQL Support for Row Pattern Matching

The MATCH_RECOGNIZE clause enables native SQL queries to match specified patterns in sequences of rows.

Row pattern matching in native SQL improves application and development productivity and query efficiency for row sequence analysis. The syntax incorporates regular expressions and full conditional logic, enabling precise and flexible pattern definition. Whatever the domain (for example, financial market prices, internet clicks, or security sensor output), applications analyzing row sequences can benefit from MATCH_RECOGNIZE.
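The combination of per-row conditions and a regular expression over those conditions can be sketched outside the database. The Python code below is not MATCH_RECOGNIZE syntax and does not call Oracle. It mimics the idea by labeling each row as a down, up, or flat move relative to the previous row, then using a regular expression to find runs of falling prices followed by rising prices. The function names and the price series are hypothetical.

```python
import re


def classify_rows(prices):
    """Label each move between consecutive rows as D (down), U (up), or F (flat)."""
    labels = []
    for prev, cur in zip(prices, prices[1:]):
        labels.append("D" if cur < prev else "U" if cur > prev else "F")
    return "".join(labels)


def find_v_shapes(prices):
    """Return (start_row, bottom_row, end_row) for each fall-then-rise sequence."""
    labels = classify_rows(prices)
    matches = []
    for m in re.finditer(r"(D+)(U+)", labels):
        # labels[i] describes the move from row i to row i + 1, so the end of a
        # run of labels is also the index of the last row that run reaches.
        start_row = m.start(1)
        bottom_row = m.end(1)
        end_row = m.end(2)
        matches.append((start_row, bottom_row, end_row))
    return matches


# Hypothetical ordered price series (one value per row).
prices = [10, 9, 8, 9, 11, 11, 7, 6, 8]
v_shapes = find_v_shapes(prices)
```

Here the row conditions play the role of pattern variables and the regular expression plays the role of the pattern. In the database, both are declared in a single SQL clause and evaluated by the SQL engine.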

1.2.1.7 Native Text Support

Native support for text mining in Oracle Data Mining has been added in this release. This change embeds some text processing in Oracle Data Mining, enabling simpler and more performant deployment.

1.2.1.8 On-the-Fly Models

On-the-fly models (called predictive queries in the Oracle Data Miner GUI workflow SQLDEV extension) are transient data mining models that are formed as part of analytic clauses. They represent a simpler form of mining which is tightly integrated with the SQL language and engine. Moreover, they introduce the concept of partitioned models without the overhead of the persistence of many models.

See Also:

Oracle Data Mining Concepts for details

See Also:

Oracle Data Mining User's Guide for details

See Also:

Oracle Database Data Warehousing Guide for details

Applications need to build models per partitioned segment, and this approach addresses that with a transient model.
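The notion of one transient model per partition can be sketched as follows. This Python example is only an analogy for the behavior described above, not the SQL feature itself. It assumes a simple least-squares line as the model, assumes that every partition has at least one row with a known target, and uses hypothetical names throughout. The per-partition models exist only for the duration of the call and are never stored.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def predict_per_partition(rows):
    """rows: (partition_key, x, y_or_None) tuples. Returns one prediction per row."""
    training = defaultdict(list)
    for key, x, y in rows:
        if y is not None:
            training[key].append((x, y))
    # One transient model per partition; none of them outlives this call.
    models = {key: fit_line(points) for key, points in training.items()}
    return [models[key][0] + models[key][1] * x for key, x, _ in rows]


# Hypothetical rows: the third row of each region has an unknown target.
rows = [
    ("east", 1.0, 2.0), ("east", 2.0, 4.0), ("east", 3.0, None),
    ("west", 1.0, 10.0), ("west", 2.0, 9.0), ("west", 3.0, None),
]
predictions = predict_per_partition(rows)
```

Each region is scored by a model built only from that region's rows, and nothing has to be created, named, or dropped afterward.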

1.2.1.9 Prediction Details and Cluster Functions

Prediction detail support for Decision Tree algorithms has been added in this release. Cluster distance and cluster details functions have also been added.

Applications require details that explain the reasons behind a prediction. In addition, certain applications need to find the record closest to the cluster center, and the CLUSTER_PROBABILITY function cannot provide that information.
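Finding the record closest to a cluster center is a distance computation rather than a probability computation. The Python sketch below illustrates this with Euclidean distance to the centroid of a single cluster. It does not reproduce how the database functions compute distance, and the function names and sample points are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    d = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(d)]


def closest_to_center(points):
    """Return (index_of_closest_point, list_of_distances_to_centroid)."""
    center = centroid(points)
    distances = [math.dist(p, center) for p in points]
    best = min(range(len(points)), key=distances.__getitem__)
    return best, distances


# Hypothetical cluster: four corner points and one point near the middle.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.1, 0.9)]
best, distances = closest_to_center(points)
```

Ranking records by distance in this way identifies the most representative member of a cluster.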

In document Oracle Database-12c-New Features Guide (Page 35-38)