8.3 Experimental Evaluations of DMMAL Framework

8.3.3 Evaluation of Execution Speed

We conducted a further set of experiments to compare the execution speed of the three algorithms on the three datasets in Table 8.1. Specifically, we compared the learning speeds of Weka, GeNIe, and our framework using the GA on the El-Nino and Banking datasets, which were stored remotely on secondary storage. Because Weka and GeNIe each use a single processor, we first learned models from the massive datasets with one symmetric processor (or actuator). We then repeated the experiments, distributing and concurrently increasing the number of configurable CPT actuators while recording the learning time. Figures 8.5 and 8.6 depict the effectiveness of our framework with concurrent CPT actuators. The results show that using the Genetic algorithm and increasing the number of CPT actuators in our framework makes the learning process faster.
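The idea of several actuators working concurrently on data that stays on secondary storage can be illustrated with a minimal sketch. It is not the DMMAL implementation: the function names (`count_slice`, `learn_cpt`, `time_learning`), the two-column CSV layout, and the use of a thread pool are all assumptions made for illustration. Each worker streams its own slice of rows from disk, and the partial counts are merged into one conditional probability table. In CPython, threads illustrate the partition-and-merge structure only; a real speed-up on CPU-bound counting would need separate processes or machines.

```python
import csv
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_slice(path, start, stop):
    """Stream rows [start, stop) from disk and count (parent, child) pairs."""
    counts = Counter()
    with open(path, newline="") as handle:
        for index, (parent, child) in enumerate(csv.reader(handle)):
            if index >= stop:
                break
            if index >= start:
                counts[(parent, child)] += 1
    return counts


def learn_cpt(path, n_rows, n_actuators):
    """Estimate P(child | parent) using n_actuators concurrent workers."""
    step = max(1, -(-n_rows // n_actuators))  # ceiling division
    slices = [(lo, min(lo + step, n_rows)) for lo in range(0, n_rows, step)]
    with ThreadPoolExecutor(max_workers=n_actuators) as pool:
        partials = pool.map(lambda s: count_slice(path, *s), slices)
        joint = sum(partials, Counter())  # merge the partial counts
    parent_totals = Counter()
    for (parent, _child), n in joint.items():
        parent_totals[parent] += n
    return {pair: n / parent_totals[pair[0]] for pair, n in joint.items()}


def time_learning(path, n_rows, actuator_counts):
    """Record the learning time (seconds) for each number of actuators."""
    timings = {}
    for n in actuator_counts:
        started = time.perf_counter()
        learn_cpt(path, n_rows, n)
        timings[n] = time.perf_counter() - started
    return timings
```

Only one row per worker is held in memory at a time, and the learned table is the same for any number of workers; only the elapsed time changes.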

Figure 8.5: Increasing the number of CPT actuators on the El-Nino dataset reduces the learning time relative to Weka and GeNIe, whose learning processes were aborted.

University of Cape Town

Figure 8.6: Increasing the number of CPT actuators on the Banking dataset reduces the learning time relative to Weka and GeNIe, whose learning processes were aborted.

In contrast with DMMAL, Figures 8.5 and 8.6 show that the learning time of Weka and GeNIe increases exponentially, without apparent bound. Their modelling processes were eventually aborted because of the limited memory, as shown in Table 8.1. Comparing the learning speed at the highest (best) number of CPT actuators with that of the single processor used by Weka and GeNIe, within the same limited memory allocation, the GA using the framework performs remarkably better than Weka and GeNIe. Table 8.1 and Figures 8.5 and 8.6 thus show that, with the framework, GA modelling succeeds and is faster than Weka and GeNIe when learning models from each of the three datasets.

A performance pattern similar to that of Figures 8.5 and 8.6 emerged when the old model was adapted to new observations using the adaptive operator in Figure 8.4. Following the cross-validation method [8], 20% of each dataset was selected at random as new observations and used to update the old knowledge (or model). As in Figures 8.5 and 8.6, the learning time decreased as the number of CPT actuators increased. Such an improvement in speed must, however, be balanced against the limited memory, which is evaluated in the next subsection. Complete scalable learning of this kind optimizes Bayesian networks and makes them reliably available for intelligent systems.
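The update step described above can be sketched as follows. This is a simplified stand-in rather than the adaptive operator of Figure 8.4, whose details are not given in this section. It assumes that the old knowledge is a table of (parent, child) counts and that adaptation amounts to folding the new observations into those counts. The names `split_new_observations` and `update_counts` are hypothetical.

```python
import random
from collections import Counter


def split_new_observations(records, fraction=0.2, seed=0):
    """Randomly hold out `fraction` of the records as new observations."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[cut:], shuffled[:cut]  # (old data, new observations)


def update_counts(old_counts, new_records):
    """Fold the new observations into the existing counts (assumed update rule)."""
    updated = Counter(old_counts)
    updated.update(new_records)
    return updated


# Usage: learn from the 80%, then adapt the counts with the held-out 20%.
records = [("rain", "wet")] * 6 + [("sun", "dry")] * 4
old_data, new_observations = split_new_observations(records)
adapted = update_counts(Counter(old_data), new_observations)
```

Under this assumption the adapted counts equal the counts obtained from the full dataset, so the old model does not need to be relearned from scratch.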

8.3.4 Evaluation of Memory Consumption

The results of experiments 1 to 4 show that users who do not have access to a networked environment, or who cannot afford a suitable networked machine, can still safely learn models from massive datasets on a machine with limited memory using our DMMAL framework.

We conducted experiments similar to experiments 1 to 3 above to evaluate the memory management capability of our framework on the El-Nino and Banking datasets. The results are depicted in Figures 8.7 and 8.8. As discussed in our introduction, most conventional learning methods load the entire dataset into memory, which can lead to memory failure when the dataset becomes massive. A client workstation that receives processing loads from a server is likewise susceptible to memory failure if most of the available memory is already occupied or the processing takes too long to complete. For the Weka and GeNIe learning processes, Figures 8.7 and 8.8 show that varying the number of CPT actuators does not improve memory usage, because all the records are loaded into memory at once. Table 8.1 details the megabytes of memory occupied, which eventually results in a halt state.

Figures 8.7 and 8.8 show that the GA using our framework successfully managed the same limited memory by concurrently exploiting secondary storage resources at remote locations (e.g. the hard disk of a machine or of workstations). Compared with Weka and GeNIe learning, the fastest learning process of the configurable actuators used only 37.9 megabytes and 37.2 megabytes of memory in Figures 8.7 and 8.8 respectively. Although memory usage increases slightly as the number of CPT actuators increases, Figures 8.7 and 8.8 show that our framework reduces memory usage to a minimum acceptable level. For example, DMMAL saves 35.76% and 41.51% of the limited memory, and avoids the crash, as compared with Weka learning in Figures 8.7 and 8.8 respectively.
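The percentage savings quoted above can be read as a relative difference between a baseline memory usage and the memory used by DMMAL. The sketch below assumes that reading. The baseline figures it derives (about 59.0 and 63.6 megabytes) are back-computed from the quoted percentages and are not values reported in this section; Table 8.1 remains the authoritative source. The function names are hypothetical.

```python
def percent_saved(baseline_mb, used_mb):
    """Relative memory saving, as a percentage of the baseline usage."""
    return (baseline_mb - used_mb) / baseline_mb * 100.0


def implied_baseline(used_mb, saved_percent):
    """Baseline usage implied by a quoted saving (an inference, not a measurement)."""
    return used_mb / (1.0 - saved_percent / 100.0)


# El-Nino (Figure 8.7): 37.9 MB used, 35.76% saved -> baseline of about 59.0 MB.
el_nino_baseline = round(implied_baseline(37.9, 35.76), 1)
# Banking (Figure 8.8): 37.2 MB used, 41.51% saved -> baseline of about 63.6 MB.
banking_baseline = round(implied_baseline(37.2, 41.51), 1)
```

Substituting the implied baselines back into `percent_saved` reproduces 35.76% and 41.51% to two decimal places, which suggests that the quoted savings follow this formula.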

Figure 8.7: Concurrent distribution of actuators on the El-Nino dataset reduces memory usage relative to Weka and GeNIe, whose learning processes exhaust the memory and crash.

Figure 8.8: Concurrent distribution of actuators on the Banking dataset reduces memory usage relative to Weka and GeNIe, whose learning processes exhaust the memory and crash.

Table 8.3 shows the average percentage of memory minimized by our framework, as an improvement over the Weka and GeNIe learning processes in terms of memory management; the percentage differences are computed from Table 8.1 for the best (highest) number of actuators. The overall average memory minimized is 43.76%, as shown in Table 8.3. This once again supports our claim that users cannot afford to trade off time against space in real-life Bayesian learning. These scalability results, obtained with our economical framework on a single machine, suggest that performance would improve further if the framework were deployed on environments with additional physical processors or more suitable networks. The framework therefore significantly benefits existing work on learning approaches by optimizing learning and making Bayesian networks reliably available for intelligent systems.

Table 8.3: Average percentage of memory minimized by DMMAL relative to the memory usage of other learning methods.

[...] network models from massive datasets, as an alternative optimization solution to the computational intensity (or NP-hard) problems arising in intelligent systems. Experimental results revealed that our framework is an economically scalable solution to the problem, as it does not require purchasing expensive hardware. The results in Figures 8.5 to 8.8 support the claim that using the framework with our Genetic algorithm leads to faster emergence of models from massive datasets, without memory failure, as compared with conventional algorithms such as those of Weka and GeNIe. This result confirms the benefit of concurrently distributing the configurable CPT actuators, together with the other efficient components of DMMAL.

Our framework was subjected to a number of rigorous scalability evaluations to show that models can emerge from massive datasets by balancing space against time, which mitigates memory failures and computational time problems and therefore optimizes network learning. One of its greatest competitive advantages is its capability to continue modelling from the point where execution stopped if the learning process is accidentally suspended, for example by a power failure. This is possible because the CPT actuators act on massive datasets that reside on the hard drive. This capability is particularly valuable in developing countries.
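One way such a resume capability could work is sketched below. It assumes that progress is periodically written to a checkpoint file next to the data on disk; the source does not describe the actual DMMAL mechanism, so the names `save_checkpoint` and `learn_with_checkpoint`, the JSON checkpoint format, and the `fail_at` parameter (used only to simulate a power failure) are hypothetical.

```python
import json
import os
from collections import Counter


def save_checkpoint(checkpoint_path, next_row, counts):
    """Write progress to a temporary file, then swap it in with one replace."""
    temp_path = checkpoint_path + ".tmp"
    with open(temp_path, "w") as handle:
        json.dump({"next_row": next_row, "counts": dict(counts)}, handle)
    os.replace(temp_path, checkpoint_path)


def learn_with_checkpoint(data_path, checkpoint_path, batch_size=1000, fail_at=None):
    """Count values line by line, resuming from the last checkpoint if one exists."""
    next_row, counts = 0, Counter()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            state = json.load(handle)
        next_row, counts = state["next_row"], Counter(state["counts"])
    with open(data_path) as handle:
        for index, line in enumerate(handle):
            if index < next_row:
                continue  # already reflected in the checkpointed counts
            if fail_at is not None and index == fail_at:
                raise RuntimeError("simulated power failure")
            counts[line.strip()] += 1
            next_row = index + 1
            if next_row % batch_size == 0:
                save_checkpoint(checkpoint_path, next_row, counts)
    save_checkpoint(checkpoint_path, next_row, counts)
    return counts
```

Rows processed after the last checkpoint are lost on failure and simply counted again on restart, so the final counts match those of an uninterrupted run.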

We have shown through experimentation that limited memory can be managed dynamically, with faster execution, by concurrently distributing configurable CPT actuators to remote resources (secondary storage or workstations). We have also presented qualitative experimental results, using representative network structures, that demonstrate a significant optimization of Bayesian models and make these models available for reasoning by intelligent systems.

This study shows that the framework has the potential to become a more powerful scalable solution to the computational problems raised in various research efforts. The results presented in this chapter motivated the successful application of DMMAL components in [73] [89].

If Bayesian model researchers and intelligent-system engineers integrate the framework into their developments, further solutions to computational intensity can be expected. We have a vested interest in applying our framework to computational intensity problems, which are also challenges in diverse research fields and industries.
