
CHAPTER V EXPERIMENTAL DESIGN

5.4 Expert Bayesian Network

Having established, through Cramér's V measure of association, which attributes depend on each other, we constructed the Bayesian Network with a specified structure. Running the conditional independence test on the arcs illustrated in Figure 5 showed that all the parent-child pairs of the NEBN_FS are indeed dependent. The maximum p-value of the test detailed in Section 5.3 over all the arcs was less than 2.2 × 10⁻¹⁶. We therefore reject the null hypothesis that each child node is independent of a given parent conditional on its other parents, as there is strong evidence that they are dependent. After verifying that the hill-climbing algorithm constructed reasonable arcs, we examined these arcs and the heatmap in Figure 6 to specify the Expert Bayesian Network's structure. After considering many combinations and adjusting the arcs to match our assumptions, we finalized the structure of its DAG, as illustrated in Figure 7.
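The per-arc check described above can be sketched in Python. This is a minimal sketch that assumes the test in Section 5.3 is a G² (log-likelihood ratio) test of conditional independence on discrete data with an asymptotic chi-square reference distribution. That is one common choice; bnlearn's mutual-information test uses a statistic proportional to G². The data layout (a list of dicts) and the helper names `chi2_sf`, `g2_ci_test` and `arc_p_values` are hypothetical and are not the thesis's code. The chi-square tail is computed with the Numerical Recipes series/continued-fraction evaluation of the incomplete gamma function so that only the standard library is needed; `scipy.stats.chi2.sf` would do the same job.

```python
import math
from collections import Counter


def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for X ~ chi-square(k).

    Evaluates the regularized upper incomplete gamma Q(k/2, x/2) with the
    series / continued-fraction pair from Numerical Recipes.
    """
    if x <= 0:
        return 1.0
    a, z = k / 2.0, x / 2.0
    prefactor = math.exp(-z + a * math.log(z) - math.lgamma(a))
    if z < a + 1.0:
        # Series for the lower function P(a, z); return Q = 1 - P.
        term = total = 1.0 / a
        denom = a
        for _ in range(10_000):
            denom += 1.0
            term *= z / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * prefactor
    # Continued fraction (modified Lentz) for Q(a, z) directly.
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return prefactor * h


def g2_ci_test(rows, x, y, given=()):
    """G^2 test of H0: X is independent of Y given the variables in `given`.

    `rows` is a list of dicts mapping variable names to discrete values.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    given = tuple(given)

    def z_of(r):
        return tuple(r[g] for g in given)

    n_xyz = Counter((r[x], r[y], z_of(r)) for r in rows)
    n_xz = Counter((r[x], z_of(r)) for r in rows)
    n_yz = Counter((r[y], z_of(r)) for r in rows)
    n_z = Counter(z_of(r) for r in rows)
    g2 = 2.0 * sum(
        n * math.log(n * n_z[z] / (n_xz[(xv, z)] * n_yz[(yv, z)]))
        for (xv, yv, z), n in n_xyz.items()
    )

    def levels(v):
        return len({r[v] for r in rows})

    df = (levels(x) - 1) * (levels(y) - 1)
    for g in given:
        df *= levels(g)
    p_value = chi2_sf(g2, df) if df > 0 else 1.0
    return g2, df, p_value


def arc_p_values(rows, parents):
    """p-value for every arc parent -> child, conditioning on the child's other parents."""
    return {
        (p, child): g2_ci_test(rows, child, p, [q for q in pars if q != p])[2]
        for child, pars in parents.items()
        for p in pars
    }


# Hypothetical toy data: B copies A, C is unrelated to both.
rows = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1) for _ in range(25)]
print(max(arc_p_values(rows, {"B": ["A"], "C": []}).values()))  # far below 0.05
```

Taking the maximum over `arc_p_values` mirrors the "maximum p-value over all the arcs" reported above: if even the largest p-value is tiny, every arc rejects the null hypothesis.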

Figure 7: Directed Acyclic Graph of the Expert Bayesian Network

5.4.2 Initial Comparison to NEBN_FS

First, we performed the same conditional independence test on all arcs in the Expert Bayesian Network, and every arc gave a p-value less than 2.2 × 10⁻¹⁶. Therefore, as with NEBN_FS, we reject the null hypothesis that the node at the head of each arc is independent of the parent at its tail given the states of its other parent nodes. Comparing the arcs of the Non-Expert Bayesian Network using feature selection, whose structure was learned with the hill-climbing algorithm (shown in Figure 5), with the arcs of our Expert Bayesian Network (shown in Figure 7) reveals some similarities but also several major differences.

Some arcs appear in both networks but are reversed in the Expert Bayesian Network, as is the case with the arcs from DEFRTG to FGM_OPP, eFG% to TS%, FGM to PSUTP, and FGM_OPP to PSUTP. We reversed these arcs based on domain knowledge of how the attributes depend on each other. The arc from eFG%_OPP to TS%_OPP kept the same direction.

A major difference between the two structures results from the decision to remove all the arcs from MINUTES PLAYED to the other variables and to add an arc from MINUTES PLAYED to TOTAL POINTS in the Expert Bayesian Network. Adding the latter arc is unsurprising, since we want TOTAL POINTS to depend directly on the point in the game (the end of the quarter) at which we place our wager; the removal of the others, however, is open to question. Although one of the primary reasons for including MINUTES PLAYED in our final set of features was that other attributes depend on it, the statistical tests showed that arcs from MINUTES PLAYED to these attributes are not needed, because these attributes are independent of MINUTES PLAYED given their new parents. One final, unorthodox choice in specifying the structure was that, instead of having the points at the end of each quarter depend on PSUTP, we made TOTAL POINTS (the total points scored by both teams at the end of the game) depend on PSUTP and made the quarter-end points depend on TOTAL POINTS. We did this because of how we compute the probabilities that guide our decision-making: we need the full probability distribution of TOTAL POINTS, not just a prediction of its value. This was the only change to the network's structure driven by the purpose of the BN rather than by the data.
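The comparison of the two structures amounts to classifying arcs as shared, reversed, or unique to one network, which can be sketched with Python sets. This is a minimal sketch under stated assumptions: each DAG is a set of (parent, child) tuples, the arc sets below are a hypothetical excerpt containing only arcs named in the text (the full structures of Figures 5 and 7 are not reproduced), and the direction written in the text ("from DEFRTG to FGM_OPP", and so on) is assumed to be the Non-Expert network's. The helper `compare_arcs` is hypothetical.

```python
def compare_arcs(first, second):
    """Classify the arcs of two DAGs given as sets of (parent, child) tuples."""
    shared = first & second
    # Arcs of `first` whose reverse appears in `second`.
    reversed_in_second = {(u, v) for (u, v) in first if (v, u) in second}
    only_first = {(u, v) for (u, v) in first if (u, v) not in second and (v, u) not in second}
    only_second = {(u, v) for (u, v) in second if (u, v) not in first and (v, u) not in first}
    return shared, reversed_in_second, only_first, only_second


# Hypothetical excerpt of the Non-Expert (hill-climbing) network.
non_expert = {
    ("DEFRTG", "FGM_OPP"),
    ("eFG%", "TS%"),
    ("FGM", "PSUTP"),
    ("FGM_OPP", "PSUTP"),
    ("eFG%_OPP", "TS%_OPP"),
}
# Hypothetical excerpt of the Expert network: four arcs reversed, one kept,
# and the new MINUTES PLAYED -> TOTAL POINTS arc added.
expert = {
    ("FGM_OPP", "DEFRTG"),
    ("TS%", "eFG%"),
    ("PSUTP", "FGM"),
    ("PSUTP", "FGM_OPP"),
    ("eFG%_OPP", "TS%_OPP"),
    ("MINUTES PLAYED", "TOTAL POINTS"),
}

shared, reversed_arcs, only_non_expert, only_expert = compare_arcs(non_expert, expert)
print(sorted(shared))       # [('eFG%_OPP', 'TS%_OPP')]
print(sorted(only_expert))  # [('MINUTES PLAYED', 'TOTAL POINTS')]
```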

Because the BIC was used to validate feature selection, we computed it for both networks to see which one is better supported by the data. The BIC values for both networks are given in Table 7. The BIC of the Non-Expert Bayesian Network is greater than that of the Expert Bayesian Network, which indicates that the former is a better model for our data. This is unsurprising: because the Non-Expert Bayesian Network was constructed by an algorithm that maximizes this score, we expect it to perform better on this metric.

Table 7: BIC Comparison of Bayesian Networks
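To show what the BIC comparison rewards, here is a minimal Python sketch of a decomposable BIC for a discrete network on the log-likelihood scale used by bnlearn, where higher is better: the sum over nodes of log L minus (d/2) log n. Assumptions: the data are a list of dicts, each node's conditional probability table is estimated by maximum likelihood, and parameters are counted from the observed levels only (bnlearn counts all factor levels, so figures can differ). The helpers `node_loglik` and `bic` and the toy data are hypothetical, not the thesis's code or data.

```python
import math
from collections import Counter


def node_loglik(rows, node, parents):
    """Max-likelihood log-likelihood of one node given its parents, and its free-parameter count."""
    parents = tuple(parents)
    joint = Counter((tuple(r[p] for p in parents), r[node]) for r in rows)
    configs = Counter(tuple(r[p] for p in parents) for r in rows)
    loglik = sum(n * math.log(n / configs[cfg]) for (cfg, _), n in joint.items())
    # (levels - 1) free probabilities for every parent configuration.
    n_params = len({r[node] for r in rows}) - 1
    for p in parents:
        n_params *= len({r[p] for r in rows})
    return loglik, n_params


def bic(rows, dag):
    """Network BIC on the log-likelihood scale: sum over nodes of log L - (d/2) log n."""
    log_n = math.log(len(rows))
    score = 0.0
    for node, parents in dag.items():
        loglik, n_params = node_loglik(rows, node, parents)
        score += loglik - 0.5 * n_params * log_n
    return score


# Hypothetical toy data: B copies A, so the arc A -> B should be rewarded.
rows = [{"A": a, "B": a} for a in (0, 1) for _ in range(50)]
with_arc = bic(rows, {"A": [], "B": ["A"]})
without_arc = bic(rows, {"A": [], "B": []})
print(with_arc > without_arc)  # True
```

Because the score is a sum of per-node terms, a search such as hill climbing can evaluate a single arc change by rescoring only the affected child node, which is one reason the algorithm tends to come out ahead on this metric.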

Although the BIC score of the Non-Expert Bayesian Network is greater than that of the network whose structure we specified, BIC is only one measure for comparing the networks. Moreover, the BIC score considers all the attributes in the data set and how likely we are to predict all of them given a new, unseen data set (Scutari & Denis, 2014). For our purpose, we are interested only in how well we can predict one of those attributes, TOTAL POINTS, on an unseen set. If we use a measure that was not used to build either network, and is therefore unbiased with respect to both, we see that our Expert Bayesian Network does a better job of predicting TOTAL POINTS. Using the same package (bnlearn) as before, with the same rescaling factor of -2 relative to the classical definition, we computed the Akaike Information Criterion (AIC) according to Eq. (5.3).

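$$\mathrm{AIC} = \log \hat{L} - d \qquad (5.3)$$

where \hat{L} is the maximized likelihood of the data and d is the number of free parameters.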
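The node-level comparison can be made concrete with a short Python sketch that scores a single node under two candidate parent sets. It assumes the rescaled AIC takes the bnlearn form log L minus d, so the classical AIC (2d - 2 log L) equals -2 times this value and higher is better, applied to one node's conditional distribution. The binned `TOTAL POINTS` and `PSUTP` values, the counts, and the helper `node_aic` are hypothetical, not the thesis's data or code.

```python
import math
from collections import Counter


def node_aic(rows, node, parents):
    """AIC of one node's CPT on the rescaled scale: log L_node - d_node (higher is better)."""
    parents = tuple(parents)
    joint = Counter((tuple(r[p] for p in parents), r[node]) for r in rows)
    configs = Counter(tuple(r[p] for p in parents) for r in rows)
    loglik = sum(n * math.log(n / configs[cfg]) for (cfg, _), n in joint.items())
    # (levels - 1) free probabilities for every parent configuration.
    n_params = len({r[node] for r in rows}) - 1
    for p in parents:
        n_params *= len({r[p] for r in rows})
    return loglik - n_params


# Hypothetical binned data: TOTAL POINTS agrees with PSUTP in 80 of 100 games.
counts = {("low", "low"): 40, ("low", "high"): 10, ("high", "low"): 10, ("high", "high"): 40}
rows = [
    {"PSUTP": psutp, "TOTAL POINTS": total}
    for (psutp, total), n in counts.items()
    for _ in range(n)
]
print(node_aic(rows, "TOTAL POINTS", ["PSUTP"]))  # about -52.04
print(node_aic(rows, "TOTAL POINTS", []))         # about -70.31
```

Only the class node's term is compared, so arcs elsewhere in either network do not affect the result, which matches the idea of judging the networks on TOTAL POINTS alone.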

Table 8 shows that, under this unbiased measure applied to the TOTAL POINTS node, the Expert Bayesian Network is a better predictor of this attribute than the Non-Expert Bayesian Network. Although the Non-Expert Bayesian Network yields a more favorable overall BIC score, the Expert Bayesian Network yields a more favorable (higher) AIC score for the attribute in question. While the improvement in AIC for the class attribute is encouraging, the real question is whether either network can provide information for profitable betting.

Table 8: AIC Comparison on the Class Node

In document A dynamic Bayesian network to predict the total points scored in national basketball association games (Page 57-61)