Monitoring: Models are monitored to control and check their performance and to ensure that the desired results are obtained (Fhyzics Business Consultants).

Figure 2.2.1: Predictive Analytics Process

2.2.2 Predictive Analytic Models

The term predictive analytics is often used to mean predictive models. Increasingly, however, it also describes related analytic disciplines used to improve customer decisions. Since different forms of predictive analytics tackle slightly different customer decisions, they are commonly used together. There are three main types of predictive analytics, discussed below.

Predictive Models:

Predictive models analyze past performance to estimate the likelihood that a customer will exhibit a specific behavior in the future. This category also encompasses models that detect subtle data patterns to answer questions about customer behavior, such as fraud detection models. Operational processes often include predictive models that are activated during live transactions. The models analyze historical and transactional data to isolate

providers. These analyses weigh the relationship between hundreds of data elements to isolate each customer’s risk or potential, which guides the action on that customer.

Descriptive Models:

Unlike predictive models, which predict a single customer behavior, descriptive models identify many different relationships between customers and products. They encode relationships in data in a way that is often used to classify customers or prospects into groups. For example, a descriptive model may categorize customers into groups with different buying patterns. Descriptive modeling tools can also be used to develop further models that simulate huge numbers of individualized agents and make predictions.

Decision Models:

Decision models predict the outcomes of complex decisions in much the same way that predictive models predict customer behavior. By mapping the relationships between all the elements of a decision, they predict what will happen if a given action is taken. These models can be used in optimization, maximizing certain results while minimizing others. Decision models are generally used to develop decision logic, or a set of business rules, that will produce the desired action for every customer or condition (FICO).

2.2.3 Predictive Analytic Models Applications and Techniques

Predictive analytics is useful in many applications and has had a great impact on some of them in recent years. These applications include CRM, health care, collection analytics, cross-selling, fraud detection, risk management, direct marketing and underwriting. As a result, predictive analytics finds wide use in telecom, insurance, banking, marketing, financial services, retail, travel, health care, pharmaceuticals, oil and gas, and a host of other industries where organizations make decisions based on data.

The idea behind any predictive model is to create a mapping function between a set of input data fields and a target variable. How this is done depends on the size and complexity of the data set, but there are a number of tested predictive modelling techniques that can be used across a variety of applications.

Support Vector Machines (SVMs):

SVMs are linear models that differ from other linear models in using a margin-based loss function. An SVM is a supervised machine learning technique that works well on smaller datasets and recognizes patterns that can be used for classification and regression analysis. For classification, it uses a hyperplane to divide a dataset into two distinct classes. Often, the data must first be mapped into a higher-dimensional space to obtain the widest possible margin between the hyperplane and the two classes. Although results on small datasets are accurate, training time on larger datasets is usually a limiting factor. SVMs are well suited to specific classification tasks such as facial and image recognition.
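The margin-based loss mentioned above can be illustrated with a minimal sketch of a linear SVM trained by subgradient descent on the regularized hinge loss. This is only one simple way to fit such a model, not the method of any particular library. The function names, the toy data and the hyperparameter values are assumptions made for illustration.

```python
def train_linear_svm(points, labels, epochs=200, learning_rate=0.01, reg=0.01):
    """Fit weights w and bias b by subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):  # each label y must be +1 or -1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wi - learning_rate * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Safely classified: only the regularizer shrinks the weights.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def svm_predict(w, b, x):
    """Return the class (+1 or -1) on whose side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data with two features per point.
points = [(-2.0, -1.0), (-1.0, -2.0), (-1.5, -1.5), (2.0, 1.0), (1.0, 2.0), (1.5, 1.5)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
predictions = [svm_predict(w, b, x) for x in points]
print(predictions)
```

The sketch handles only the linear case. Data that is not linearly separable would need a kernel or an explicit feature mapping, which is omitted here.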

Decision Trees:

Decision trees work by recursively partitioning data into smaller subsets. At each new branch of the decision tree, the data are split further until a classification or decision is made.

Decision trees are computationally cheap and easy to understand and interpret, which is why they are often used. However, they are prone to overfitting and usually need to be pruned: a tree may fit the training data well yet classify poorly in the real world. Boosted decision trees use additive boosting techniques to combine the outcomes of weaker decision trees into a weighted prediction. The most common tree-based algorithms in use today are Random Forests and Boosted Trees.
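The recursive partitioning described above can be sketched as follows. This is a minimal illustration that assumes numeric features, Gini impurity as the split criterion, and a depth limit as a crude stand-in for pruning. The function names and the toy data (age and income mapped to a spending group) are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) pair that lowers impurity most, or None."""
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively until a node is pure, cannot be split, or is too deep."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    feature, threshold = split
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }


def classify(tree, row):
    """Follow the branches from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical toy data: (age, income in thousands) -> spending group.
rows = [(25, 30), (30, 35), (35, 80), (40, 90), (50, 40), (55, 95)]
labels = ["low", "low", "high", "high", "low", "high"]

tree = build_tree(rows, labels)
print(tree)
print(classify(tree, (45, 85)))
```

On this data a single split on income separates the two groups, so the tree stops after one level. Lowering max_depth is the simplest way to trade fit on the training data against the overfitting risk noted above.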

Naïve Bayes:

Naïve Bayes is a group of algorithms that use Bayes' theorem to calculate probabilities. This predictive modelling technique classifies items on the assumption that each variable is independent of the others. A Bayesian recommendation system, for example, builds a probabilistic model of personalized item recommendations, drawing on previous user behavior. Naïve Bayes models are generally easy to build and interpret, and they produce results very quickly.
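As a minimal sketch of how the independence assumption is used in practice, the following example classifies short texts with a multinomial Naïve Bayes model and add-one (Laplace) smoothing. The function names and the four training messages are hypothetical, and the smoothing choice is an assumption rather than something the text prescribes.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def nb_predict(model, text):
    """Return the class with the highest log posterior for the given text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, doc_count in class_counts.items():
        log_prob = math.log(doc_count / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # Independence assumption: per-word probabilities are multiplied,
            # which becomes a sum of logs. Add-one smoothing avoids log(0).
            log_prob += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        scores[label] = log_prob
    return max(scores, key=scores.get)


# Hypothetical training messages.
documents = [
    "win cash prize now",
    "cheap prize offer now",
    "meeting agenda for monday",
    "project report for review",
]
labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(documents, labels)
print(nb_predict(model, "win a cheap prize"))
print(nb_predict(model, "agenda for the project review"))
```

Training is a single counting pass over the data, which is why models of this kind are quick to build.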

However, its naivety comes from the assumption that every variable is independent, which is often not the case. Simple text classification tasks such as spam detection and sentiment [...]

[...] detect patterns in data: an input layer, a hidden layer and an output layer. They aim to use the mathematical functions in the hidden layer to produce an output free of noise. Neural networks can be used in both classification and regression and are great at handling a huge amount of non-linear data. They are mainly used for predictions about time series data such as weather data or economic trends (Redpiexe).

Linear regression:

Linear regression is one of the best-known modeling techniques and is among the first topics people pick up when learning predictive modeling. In this technique, the dependent variable is continuous, the independent variables can be continuous or discrete, and the regression line is linear. In linear regression, we predict the value of one variable from the value of another. The variable we are predicting is called the target variable, and the variable we base our predictions on is called the predictor variable.

The purpose of this technique is to find optimal weights for the attributes by minimizing the error between the real and the predicted values on the training instances. As long as the data set contains more instances than attributes, this is easily done using the least squares method.

Linear regression is intuitive and easy to understand, but it does not handle non-numerical attributes well and cannot model more complex nonlinear problems.
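For the simplest case of one predictor variable, the least squares fit has a closed-form solution, sketched below. The function name and the five data points are hypothetical. With several attributes the same idea applies, but the weights are usually found with matrix methods.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for one predictor and one target."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying close to the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 6.0 + intercept  # prediction for a new predictor value
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

The fitted slope and intercept are the weights that minimize the sum of squared differences between the real and the predicted target values.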

Principal Component Analysis:

The purpose of principal component analysis is to derive a small number of independent linear combinations of a group of variables that retain as much of the information in the original variables as possible.
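As a minimal sketch, the first principal component can be found as the leading eigenvector of the covariance matrix. The example below uses power iteration for this, chosen only because it needs no external library. The function names, the toy data and the iteration count are assumptions for illustration.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(rows, iterations=100):
    """Approximate the leading eigenvector of the covariance matrix."""
    cov = covariance_matrix(rows)
    d = len(cov)
    # Starting guess; assumed not to be orthogonal to the leading component.
    vector = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in product))
        vector = [v / norm for v in product]  # keep the vector at unit length
    return vector


# Hypothetical data in which the two variables rise together.
rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]

component = first_principal_component(rows)
print([round(v, 3) for v in component])
```

Because the two variables here are almost perfectly correlated, a single linear combination that weights them about equally retains nearly all of the variation in the data.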

In document Crime statistical analysis and predictive policing: the case study of Volos (Page 40-45)