Graphical models versus neural networks

Statistical data mining

5.6 Graphical models

5.6.3 Graphical models versus neural networks

We have seen that constructing a statistical model is a long and conceptually complex process that requires the formulation of a series of formal hypotheses. On the other hand, a statistical model allows us to make predictions and simulate scenarios on the basis of explicit rules that are easily scalable – rules that can be generalised to different data. In Chapter 4 we saw how computationally intensive techniques require a lighter analytical structure, allowing us to find valuable information rapidly from large volumes of data. Their disadvantages are low transparency (the rules they learn are hard to inspect) and low scalability (those rules do not generalise readily to different data). Here is a brief comparison to help underline the different concepts. We shall compare neural networks and graphical models; they can be seen as rather general examples of computational methods and statistical methods, respectively.

The nodes of a graphical model represent random variables, whereas in neural networks they are computational units, not necessarily random. In a graphical model an edge represents a probabilistic conditional dependency between the corresponding pair of random variables, whereas in a neural network an edge describes a functional relation between the corresponding nodes. Graphical models are usually constructed in three phases: (a) the qualitative phase establishes the conditional independence relationships among the random variables; (b) the probabilistic phase associates the graph with a vector of random variables having a Markovian distribution with respect to the graph; (c) the quantitative phase assigns the parameters (if known) that characterise the distribution in (b). Neural networks are constructed in three similar phases: (a) the qualitative phase establishes the organisation of the layers and the relationships among them; (b) the functional phase specifies the functional relationships between the layers; (c) the quantitative phase fixes the weights (if known) associated with the connections among the different nodes.
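The parallel between the two three-phase constructions can be made concrete with a small sketch. It is only an illustration under stated assumptions: the chain graph A -> B -> C, the names parents, cpt, joint, prob_c1_given, layer_sizes and forward, and every probability and weight are invented here and do not come from the text.

```python
import math

# ---- Graphical model: a chain A -> B -> C over binary variables ----
# (a) Qualitative phase: the graph, stored as each node's parents.
parents = {"A": (), "B": ("A",), "C": ("B",)}

# (b) Probabilistic phase: a joint distribution that is Markov with respect
#     to the graph, here p(a, b, c) = p(a) p(b | a) p(c | b).
# (c) Quantitative phase: the parameters. Each entry is P(node = 1 | parents);
#     the numbers are made up for illustration.
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.7},
    "C": {(0,): 0.1, (1,): 0.6},
}


def joint(assignment):
    """Joint probability of a full 0/1 assignment, using the factorisation."""
    prob = 1.0
    for node, node_parents in parents.items():
        p_one = cpt[node][tuple(assignment[p] for p in node_parents)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


def prob_c1_given(a, b):
    """P(C = 1 | A = a, B = b), computed from the joint distribution."""
    num = joint({"A": a, "B": b, "C": 1})
    den = num + joint({"A": a, "B": b, "C": 0})
    return num / den


# ---- Neural network: 2 inputs, 2 hidden units, 1 output ----
# (a) Qualitative phase: the organisation of the layers.
layer_sizes = (2, 2, 1)


# (b) Functional phase: the function linking one layer to the next.
def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# (c) Quantitative phase: the weights (made-up values).
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]
hidden_bias = [0.0, 0.1]
output_weights = [[1.2, -0.7]]
output_bias = [0.05]


def layer(inputs, weights, bias):
    """Apply one fully connected logistic layer."""
    return [
        logistic(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def forward(x):
    """Deterministic output of the network for input x."""
    return layer(layer(x, hidden_weights, hidden_bias), output_weights, output_bias)


if __name__ == "__main__":
    # The missing edge A - C shows up as conditional independence given B.
    print(prob_c1_given(0, 1), prob_c1_given(1, 1))
    print(forward([1.0, 0.0]))
```

In the graphical model the absent edge between A and C has a probabilistic meaning: P(C = 1 | A, B) does not change with A. In the network the edges only say which units feed which functions, and the output is a deterministic function of the input.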

I believe that these two methodologies can be used in a complementary way. Taking a graphical model and introducing latent variables – variables that are not observed – confers two extra advantages. First, it allows us to represent a multilayer perceptron as a graphical model, so we can take formal statistical methods valid for graphical models and use them on neural networks (e.g. confidence intervals, rejection regions, deviance comparisons). Second, the use of a neural network in a preliminary phase could help to reduce the structural complexity of graphical models, reducing the number of variables and edges present, and doing it in a more computationally efficient way. Adding latent variables to graphical models, corresponding to purely computational units, allows us to enrich the model with non-linear components, as occurs with neural networks. For more on the role of latent variables in graphical models, see Cox and Wermuth (1996).
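One possible reading of the first point, given as a sketch and not as the author's own formulation, treats the hidden units of a multilayer perceptron with a single hidden layer as latent nodes placed between the inputs and the response. All symbols are assumptions introduced for illustration: x_1, ..., x_p are the inputs, h_1, ..., h_q the hidden (latent) units, y the response, w_{ij} and v_j the weights, and g and f the activation functions.

```latex
\begin{align}
h_j &= g\Big(\sum_{i=1}^{p} w_{ij}\, x_i\Big), \qquad j = 1, \dots, q, \\
E(y \mid h_1, \dots, h_q) &= f\Big(\sum_{j=1}^{q} v_j\, h_j\Big), \\
p(y \mid x, h) &= p(y \mid h), \quad \text{that is, } y \perp x \mid h .
\end{align}
```

Under this reading the network corresponds to a directed graph x -> h -> y in which the inputs affect the response only through the latent layer, and it is this conditional independence statement that allows the network to be handled with the methods of graphical models.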

5.7 Further reading

In this chapter we have reviewed the main statistical models for data mining applications. Their common feature is the presence of probabilistic modelling. This makes the results much easier to interpret but it may slow down the implementation and elaboration phases. I have tried to give an overview of the relevant literature.

We began with methods for modelling uncertainty and inference; there are many textbooks on this. One to consult is Mood, Graybill and Boes (1991); another is Azzalini (1992), which takes more of a modelling viewpoint. Non-parametric models are distribution-free, as they do not require heavy preliminary assumptions. They may be very useful, especially in an exploratory context. For a review of non-parametric methods see Gibbons and Chakraborti (1992). Semi-parametric models, based on mixture models, can provide a powerful probabilistic approach to cluster analysis. For an introductory treatment from a data mining viewpoint, see Hastie, Tibshirani and Friedman (2001).

Introduction of the Gaussian distribution allows us to bring regression methods into the field of normal linear models, and therefore to correlate the least squares method with measures of sample variability, as well as to provide thresholds for evaluating goodness of fit. For an introduction to the normal linear model, consult Mood, Graybill and Boes (1991) or a classic econometrics text such as Greene (1999). It is possible to develop the normal linear model into generalised linear models. For an introduction consult the original article of Nelder and Wedderburn (1972) and the books of Dobson (1990), McCullagh and Nelder (1989) and Agresti (1990).
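As a reminder of how least squares connects to measures of sample variability under the normal linear model, the following minimal sketch fits a simple regression and reports the residual variance, the standard error of the slope and R-squared. The function name fit_simple_linear and the data are hypothetical and chosen only for illustration.

```python
import math


def fit_simple_linear(x, y):
    """Least squares fit of y = intercept + slope * x with basic variability measures."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares and the unbiased estimate of the error variance
    # (n - 2 degrees of freedom, because two parameters are estimated).
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)

    return {
        "slope": slope,
        "intercept": intercept,
        "sigma2": sigma2,
        "se_slope": math.sqrt(sigma2 / sxx),  # standard error of the slope
        "r2": 1.0 - rss / syy,                # goodness of fit
    }


if __name__ == "__main__":
    # Made-up observations, roughly following y = 2x.
    x_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_obs = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(fit_simple_linear(x_obs, y_obs))
```

Under the Gaussian assumption the ratio of the slope to its standard error follows a Student t distribution with n - 2 degrees of freedom when the true slope is zero, which is what supplies the thresholds mentioned above.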

Log-linear models are an important class of generalised linear models. They are symmetric models and are mainly used to obtain the associative structure among categorical variables, whose observations are classified in multiple contingency tables. Graphical log-linear models are particularly useful for data interpretation. For an introduction to log-linear models, look at the earlier texts or at Christensen (1997). For graphical log-linear models it is better to consult texts on graphical models, for example Whittaker (1990).
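The simplest log-linear model for a two-way contingency table is the independence model, whose fitted counts have the closed form (row total x column total) / grand total. The sketch below computes these fitted counts and the deviance G2 against the saturated model. The function name independence_fit and the example table are hypothetical.

```python
import math


def independence_fit(table):
    """Fit the independence log-linear model to a two-way table of counts.

    Returns the fitted counts, the deviance G2 and its degrees of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    # Fitted counts under independence: m_ij = n_i+ * n_+j / n.
    fitted = [[r * c / total for c in col_totals] for r in row_totals]

    # Deviance G2 = 2 * sum n_ij * log(n_ij / m_ij); empty cells contribute 0.
    g2 = 2.0 * sum(
        n * math.log(n / m)
        for obs_row, fit_row in zip(table, fitted)
        for n, m in zip(obs_row, fit_row)
        if n > 0
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return fitted, g2, df


if __name__ == "__main__":
    # Made-up 2 x 2 table of counts.
    counts = [[25, 15], [10, 30]]
    fitted_counts, deviance, dof = independence_fit(counts)
    print(fitted_counts, deviance, dof)
```

A large G2 relative to a chi-squared distribution with the stated degrees of freedom suggests that the two variables are associated, so the interaction term is needed in the log-linear model.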

We introduced the concept of conditional independence (and dependence); graphical representation of conditional independence relationships allowed us to take what we saw for graphical log-linear models and generalise it to a wider class of statistical models, known as graphical models. Graphical models are very general statistical models for data mining. In particular, they can adapt to different analytical objectives, from predicting multivariate response variables (recursive models) to finding associative structure (symmetric models), in the presence of both qualitative and quantitative variables. For an introduction to graphical models, consult Edwards (1995), Whittaker (1990) or Lauritzen (1996). For directed graphical models, also known as probabilistic expert systems, see Cowell et al. (1999) or Jensen (1996).

In document Applied Data Mining Statistical Methods for Business and Industry Giudici P (2003) pdf (Page 198-200)
