NBM with Four Dependent Features Trimmed

Chapter 4 Results

4.4 NBM Analysis Results

4.4.8 NBM with Four Dependent Features Trimmed

Trimming all four features produced the final model, which gave mixed accuracy results, as was the case for the previous models. The results are shown in Table 31.

[Confusion matrix: NBM 18, all data minus Core, SameEvent and SysType; axes Actual and Prediction over intervals 5 to 1; cell values were not recoverable.]

Global Accuracy = (TP+TN)/(TP+TN+FP+FN)

Table 31 - NBM 18 Trim Core, SameEvent, SysType & ACAT Features

4.4.9 NBM Accuracy Measures Comparisons

Table 32 shows the various measures of accuracy that are relevant to classification models as explained below:

1. A brief description of the features that define the NBM is in columns 1 and 2.

2. The Prediction and Global Accuracies are shown in columns 3 and 4 to indicate the overall performance of the NBM.

Table 32 - Model Accuracy Summary

[Confusion matrix: NBM 20, all data minus Core, SameEvent, SysType and ACAT; axes Actual and Prediction over intervals 5 to 1 (IDI); cell values were not recoverable.]

NBM  Prediction  Global    IDI-1      IDI-1   IDI-1  IDI-2      IDI-2   IDI-2
     Accuracy    Accuracy  Precision  Recall  F1     Precision  Recall  F1
1    0.600       0.878     0.757      0.841   0.797  0.612      0.750   0.674
2    0.533       0.883     0.839      0.825   0.832  0.612      0.750   0.674
3    0.533       0.905     0.846      0.873   0.859  0.674      0.775   0.721
4    0.729       0.893     0.825      0.825   0.825  0.667      0.750   0.706
5    0.706       0.893     0.812      0.889   0.848  0.682      0.750   0.714
6    0.533       0.888     0.831      0.857   0.844  0.674      0.775   0.721
7    0.533       0.881     0.848      0.889   0.868  0.674      0.775   0.721
8    0.667       0.898     0.828      0.803   0.835  0.638      0.769   0.698
9    0.600       0.902     0.852      0.825   0.839  0.674      0.775   0.721
10   0.533       0.902     0.836      0.889   0.862  0.738      0.775   0.756
11   0.533       0.908     0.825      0.825   0.825  0.660      0.775   0.713
12   0.533       0.902     0.809      0.873   0.840  0.732      0.750   0.741
13   0.600       0.885     0.828      0.841   0.835  0.674      0.756   0.731
14   0.667       0.885     0.820      0.794   0.806  0.625      0.750   0.682
15   0.600       0.895     0.831      0.857   0.844  0.667      0.750   0.706
16   0.667       0.885     0.810      0.794   0.806  0.659      0.725   0.690
17   0.600       0.892     0.852      0.825   0.839  0.667      0.750   0.706
18   0.600       0.888     0.800      0.825   0.813  0.659      0.725   0.690

** N/A is due to division by zero; however, because this interval has the fewest instances, it does not affect the F1 Scores of the most accurate intervals.

*Note: Model NBM4 & higher result from feature removal based on Independence testing showing feature

a. The Prediction Accuracy is the typical measure of the number of correct predictions divided by the total number of predictions, which, as indicated earlier, is based on a set of 15 randomly selected use cases.

b. The Global Accuracy is the more accurate measure for classification models, because it averages the accuracy of 10 random samples of prediction sets and provides the Confusion Matrix to show the performance of the model within each interval class.

3. The remaining columns provide the precision and recall for the two intervals of interest, IDI-1 and IDI-2. Rather than considering the performance of all of the intervals, the comparison focuses on IDI-1 and IDI-2 because together they account for 85.9% of the data (relative frequency shown in Table 7). The interval precision and recall calculations are important because they show which intervals are predicted more accurately and allow the calculation of the F1 Score.

4. Because intervals IDI-1 and IDI-2 have the highest relative frequency, their F1 Scores were used to identify NBM 10 as having the best set of features to predict the schedule delay.
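The measures listed above can be illustrated with a minimal Python sketch. It assumes a confusion matrix whose rows are actual intervals and whose columns are predicted intervals. The 3x3 counts and the function names are hypothetical and are not taken from the dissertation's data, and the one-vs-rest reading of TP, TN, FP and FN for each interval is an assumption about how the accuracy formula is applied per interval.

```python
def one_vs_rest_counts(cm, k):
    """Return (TP, FP, FN, TN) for class index k; rows = actual, columns = predicted."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # actual k, predicted as something else
    fp = sum(row[k] for row in cm) - tp     # predicted k, actually something else
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; None when it is undefined."""
    if precision is None or recall is None or precision + recall == 0:
        return None
    return 2 * precision * recall / (precision + recall)


def interval_metrics(cm, k):
    """Accuracy, precision, recall and F1 for one interval (class index k)."""
    tp, fp, fn, tn = one_vs_rest_counts(cm, k)
    precision = tp / (tp + fp) if tp + fp else None   # None marks division by zero
    recall = tp / (tp + fn) if tp + fn else None
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Hypothetical counts for three intervals (not the dissertation's data).
HYPOTHETICAL_CM = [
    [50, 5, 0],
    [8, 30, 0],
    [1, 1, 0],
]

for k in range(len(HYPOTHETICAL_CM)):
    print(k, interval_metrics(HYPOTHETICAL_CM, k))

# Precision and recall reported in Table 32 for the recommended model.
print(round(f1_score(0.836, 0.889), 3))  # 0.862 (IDI-1)
print(round(f1_score(0.738, 0.775), 3))  # 0.756 (IDI-2)
```

Returning None when a denominator is zero mirrors the N/A entries noted under Table 32: an interval that is never predicted has no defined precision, and therefore no F1 Score.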

The following analysis provides additional assessment of the accuracy measures.

No model has the most accurate prediction for every interval. The prior sections provide model assessments that show the discrepancy between Global and Prediction Set Accuracy. As shown in Figure 17, the Global Accuracy measure is relatively stable, while the Prediction Set Accuracy is more sensitive to the set of features. Using the Prediction Set Accuracy measure, Model 4 would be the choice, but the Global Accuracy measure would select Model 10. This disagreement is why the F1 Score is the recommended accuracy measure.

Figure 17 - NBM Prediction & Global Accuracy Comparison

The Confusion Matrix for each NBM was used to calculate the Precision, Recall and the F1 Score for each interval to make the final determination of the recommended NBM. As discussed earlier, IDI-1 and IDI-2 intervals were used to select the most accurate NBM since the majority of the data is in these two intervals.

Figure 18 provides a graphical comparison of the F1 Score for each of the NBMs.

Using the F1 Score as the criterion, Model 10 is the recommended NBM for schedule delay prediction, with F1 Scores for intervals IDI-1 and IDI-2 of 86.2% and 75.6%, respectively.
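This selection can be reproduced from the Table 32 values with a short Python sketch. It assumes that the rows of Table 32 are listed in model order (row 1 is NBM 1, and so on) and that the two interval F1 Scores are combined by a simple unweighted mean. The text does not state the exact combination rule, so that rule and the function names are illustrative assumptions.

```python
# (Prediction Accuracy, IDI-1 F1, IDI-2 F1) for each row of Table 32.
# Assumption: row i of the table corresponds to NBM i.
ROWS = [
    (0.600, 0.797, 0.674), (0.533, 0.832, 0.674), (0.533, 0.859, 0.721),
    (0.729, 0.825, 0.706), (0.706, 0.848, 0.714), (0.533, 0.844, 0.721),
    (0.533, 0.868, 0.721), (0.667, 0.835, 0.698), (0.600, 0.839, 0.721),
    (0.533, 0.862, 0.756), (0.533, 0.825, 0.713), (0.533, 0.840, 0.741),
    (0.600, 0.835, 0.731), (0.667, 0.806, 0.682), (0.600, 0.844, 0.706),
    (0.667, 0.806, 0.690), (0.600, 0.839, 0.706), (0.600, 0.813, 0.690),
]


def best_model(rows, score):
    """Return the 1-based model number whose row maximizes score(row)."""
    best_index = max(range(len(rows)), key=lambda i: score(rows[i]))
    return best_index + 1


def prediction_accuracy(row):
    return row[0]


def mean_f1(row):
    # Unweighted mean of the IDI-1 and IDI-2 F1 Scores (illustrative assumption).
    return (row[1] + row[2]) / 2


print(best_model(ROWS, prediction_accuracy))  # 4
print(best_model(ROWS, mean_f1))              # 10
```

Under these assumptions, Prediction Set Accuracy alone points to Model 4, while the combined F1 Score for the two most frequent intervals points to Model 10, which matches the choices described in the text.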

[Chart: x-axis NBM 1 to 18; y-axis 0.500 to 0.950; series: Prediction Accuracy and Global Accuracy]

Figure 18 - F1 Score Comparison (IDI-1 & IDI-2)

4.5 Contribution Analysis for Final Set of Features

The most accurate model based on F1 Score is NBM 10 with the following 8 features: ErrorCat, SysGroup, OrgDepend, SysDepend, SysType, Core, Severity and PriorEvent. The Contribution Analysis for these features is provided in the following Pareto Chart (Figure 19).

Figure 19 – Pareto Chart of Contribution Analysis

[Pareto chart: features ErrorCat, SysGroup, OrgDepend, SysDepend, SysType, Core, Severity and PriorEvent; series: Feature Contribution to Delay and Cumulative Contribution]

The Contribution Analysis shows that, as established in the Data Preprocessing phase, both technical and non-technical factors are responsible for SWI delay. As Figure 19 indicates, ErrorCat at 25% has the highest impact on the time to resolve an error, followed by SysGroup (13%), OrgDepend (12.5%), SysDepend (12.2%), SysType (11%), Core (10%), Severity (9.3%) and PriorEvent (8.2%). Also notable is that 80% of the time to resolve an error is attributed to five features: ErrorCat, SysGroup, OrgDepend, SysDepend and SysType. Only one of these features was captured in the initial error reports; the remaining four were mined from the data as known SWI challenges. These features link to the following SWI Challenges and Questions (Table 33).

Table 33 – Contribution of SWI Challenges

SWI Challenges                Questions Used to Populate                           Learning Database Column
(Literature Survey Results)   Learning Database                                    and Node in NBM (Features)
Technical Risk                4. What is the type of error?                        CEC
SoS Complexity                10. Is the system a SOS?                             SysGroup
Independent Management        3. Does the error impact more than 1 organization?   OrgDepend
System Interdependencies      1. Does the error impact more than 1 system?         SysDepend
System Interdependencies      2. Is the system in the command post?                SysType
SoS Complexity                9. Is the system a core system?                      Core
Technical Risk                5. Does the error impair a critical task?            Severity
Non-Technical Risk            6. Does the system have errors in prior events?      PriorEvent

4.6 Summary of Findings

The results of the NBM development presented in this dissertation indicate that historical data can provide accurate schedule prediction when feature selection is used to determine the most important features. The prediction and contribution analysis can be used to support decisions that enable systems engineers and managers to realign resources or shift schedules to meet their priorities, as well as to support risk mitigation decisions. Essentially, the features developed for this research are accurate for predictions of IDI-1, IDI-2, IDI-3 and IDI-5, which cover errors that are resolved within 92 days or that take more than 176 days to resolve. However, the NBM is less accurate in predicting IDI-4, which covers errors that take between 92 and 176 days to resolve. Fortunately, this range does not occur frequently (1.49% of the time; see Table 10). Additional study is required to fully investigate this outcome.

4.6.1 Feature Selection Had Mixed Results on Accuracy

Model accuracy was assessed by developing models from different sets of features.

Initially, features based on the historical US Army error reports were used to determine the baseline model accuracy. These same features were subsequently used with the addition of the external feature (ACAT), which produced a slight increase in Global Accuracy and a slight decrease in Prediction Set Accuracy. Finally, a third model with the first two sets of features and the addition of the seven features from SWI Challenges determined from the literature survey resulted in increased Global Accuracy but no change in Prediction Set Accuracy. This difference between Global and Prediction Set Accuracy continues through each of the model variations assessed. Removal of features based on Independence testing provided further variation in these accuracy measures. However, to fully assess the impact of removing features, the Precision, Sensitivity and F1 Score measures were the final determinant of the most accurate model.

NBM-10, with eight features, was the most accurate model, with 90.2% Global Accuracy and 53.3% Prediction Set Accuracy. This accuracy is comparable to that of other similar models (Bielza, 2014) and exceeds that of other methods for SW development resource estimation, which include schedule estimation and average 39% (Boehm & Valerdi, 2011).

4.6.2 Implications to Technical Impacts on Delay

Four of the eight features primarily quantify the impact that interdependencies have on schedule delay and were not included in the original error reports. SysGroup, SysDepend, OrgDepend and SysType all capture different aspects of the interdependencies that define complex systems, represented here as SWI Challenges. Based on this dissertation research, these features contribute 48% of the schedule delay. Because these errors are often not revealed through system-level testing, they create challenges that generally require stakeholder and system owner communications and resources to troubleshoot and resolve. When the impact of these interdependencies is not considered in the original or updated schedule estimation, almost half of the time required to resolve an error is omitted, which is likely responsible for the underestimation of the time required to resolve an error.
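The combined share of the interdependency features can be checked with a minimal Python sketch. The percentages are the per-feature contributions quoted in Section 4.5; the dictionary layout and the function name are hypothetical conveniences, not part of the dissertation's tooling.

```python
# Contribution percentages as quoted in Section 4.5 for the eight NBM 10 features.
CONTRIBUTION = {
    "ErrorCat": 25.0,
    "SysGroup": 13.0,
    "OrgDepend": 12.5,
    "SysDepend": 12.2,
    "SysType": 11.0,
    "Core": 10.0,
    "Severity": 9.3,
    "PriorEvent": 8.2,
}

# The four features this section identifies as capturing interdependencies.
INTERDEPENDENCY_FEATURES = ("SysGroup", "SysDepend", "OrgDepend", "SysType")


def group_share(features, contribution=CONTRIBUTION):
    """Sum the quoted contribution percentages for a group of features."""
    return sum(contribution[name] for name in features)


share = group_share(INTERDEPENDENCY_FEATURES)
print(f"{share:.1f}%")  # 48.7%
```

The sum of the quoted values is consistent with the roughly 48% attributed to these four features above.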

4.6.3 Implications to Organizational Impacts on Delay

The NBM contribution analysis also provides an objective analysis of the impact that organizational dependencies (OrgDepend) have on the prediction of schedule delay caused by integration errors. OrgDepend was mined from the data based on those errors that had at least two systems and two organizations that were responsible for resolving the error.

According to the contribution analysis, OrgDepend ranks third in its impact on the integration schedule delay. While the majority of the features are primarily technical, the ranking of this organizational feature shows its importance in understanding the causal factors that create the schedule delay.


Chapter 5 – Conclusions

This dissertation set out to develop an NBM to predict the schedule delay created by SWI errors. Error reports from three Army SWI events were used as the data source. The results are promising and show that SWI challenges can drive features that accurately predict the schedule delay. Previous chapters presented the key findings, which include the determination of key features for the NBM based on accuracy and the contribution each feature makes to the model prediction.

First, the features for an NBM that best predict the SWI delay created by errors during the SWI phase of development were determined. The approach to developing the features relied on a Literature Survey of SWI challenges documented in journal articles and conference papers published in the past 10 years. The challenges that were most prevalent in 30 articles were summarized and used to drive data mining activities that extracted the evidence of these challenges from the original error reports. The resulting SWI challenges include System Interdependencies, Independent Management, SW Integration Risks (Technical and Non-Technical) and Complexity. These challenges, as reflected in the Army error reports, were represented as 10 features that were further trimmed through feature engineering.

Feature engineering was an important aspect of model development that supported the removal of features based on Independence analysis results. Research has shown that multiple dependencies between features can reduce model accuracy, which was confirmed by this research. To reduce this effect, two features were trimmed because of their dependency with multiple features, which improved accuracy. The remaining eight features, which produced the most accurate model, comprise seven technical categories and one organizational category.

The research also determined the contribution each of the features adds to the prediction along with the model accuracy. Based on the features in the most accurate model, the Error Category is shown to have the most impact in predicting the schedule delay created by SWI errors and accounts for approximately 24% of the estimate. While this is significant, a more accurate prediction results from the combined impact of the remaining features. The NBM global accuracy was 90% based on the resulting confusion matrix for NBM-9.

Based on the data set used for this research, it is reasonable for managers to prioritize their efforts on implementing technical fixes that reduce the specific error categories. However, it is also worth noting that 46% of the schedule delay is due to non-technical factors, which may offer opportunities to reduce schedule delay by establishing methods and processes that facilitate stakeholder communications. Further study is needed to offer more specific recommendations.

The prior probabilities, as shown earlier, also provide some insights into the data: approximately 86% of the errors are corrected within three months, but the remaining 14% can take up to 324 days. The interdependencies that create these vastly different outcomes warrant further investigation, which is outside the scope of this study.

The more immediate concern for the decision maker is how to adjust the schedule to accommodate the prediction or decide to modify the intended capability by eliminating a system that cannot be integrated within time or funding constraints.

5.1 Importance of this Research

This research validated the role of historical records in feature selection for NBM development that traditionally relies on expert judgment. The results are a clear indication of the ability of the NBM to use historical data to develop features that provide accurate schedule prediction. This prediction can then be used to support decisions that enable systems engineers and managers to realign resources or shift schedules to meet their priorities as well as support risk mitigation decisions.

Additionally, this research developed and validated a process to use a survey of known SWI Challenges to define features for NBM. The literature survey process is relevant to legacy systems that may lack access to expert judgment for problem resolution. This process also supports prediction in the SWI phase that traditionally has a lack of expertise and relies primarily on the judgment of system level experts that generally do not have the ability to predict the impact of interdependencies on schedule.

Finally, this research provides the evidence-based contribution that each feature provides to the schedule delay. Understanding this contribution is important because it allows managers to allocate resources accordingly and emphasize solutions earlier in the SW development phase that may allow reduction or avoidance of types of errors that show a high schedule impact.

5.2 Limitations of this Research

The example used in this paper is based on Army SW integration events, and therefore the results are specific to this set of systems. However, a similar methodology may support SW integration for a different set of systems by analyzing historical data to build a prediction model such as the NBM. Also, the accuracy achieved for this research still results in 20 percent of the SWI errors not being accurately predicted. Most of these errors occur infrequently in the dataset, so their patterns were not well represented. None of these limitations is significant; on a case-by-case basis, the circumstances that create these types of errors may rely more heavily on expert judgment while records continue to be accumulated in the dataset. For the NBM, every error is added to the database for the next prediction, which further establishes the pattern and results in improved model predictions.
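The idea that each new error record refines the next prediction can be sketched with a small count-based naive Bayes classifier in Python. This is an illustrative sketch under stated assumptions, not the dissertation's implementation: the class name, the Laplace smoothing parameter `alpha`, and the feature codes in the usage example are all hypothetical.

```python
from collections import Counter, defaultdict


class CountingNaiveBayes:
    """Categorical naive Bayes that keeps raw counts so records can be added over time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing (assumption)
        self.class_counts = Counter()               # label -> number of records
        self.value_counts = defaultdict(Counter)    # (feature, label) -> value counts
        self.feature_values = defaultdict(set)      # feature -> distinct values seen

    def add(self, features, label):
        """Add one resolved error record; later predictions use the updated counts."""
        self.class_counts[label] += 1
        for name, value in features.items():
            self.value_counts[(name, label)][value] += 1
            self.feature_values[name].add(value)

    def posterior(self, features):
        """Return P(label | features), assuming features are independent given the label."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            p = n / total                           # prior probability of the label
            for name, value in features.items():
                k = len(self.feature_values[name]) or 1
                count = self.value_counts[(name, label)][value]
                p *= (count + self.alpha) / (n + self.alpha * k)
            scores[label] = p
        norm = sum(scores.values())
        return {label: p / norm for label, p in scores.items()}


# Hypothetical records: feature codes and interval labels are made up for illustration.
model = CountingNaiveBayes()
model.add({"ErrorCat": 3, "SysGroup": 1}, "IDI-2")
model.add({"ErrorCat": 1, "SysGroup": 0}, "IDI-1")
model.add({"ErrorCat": 1, "SysGroup": 0}, "IDI-1")

query = {"ErrorCat": 3, "SysGroup": 1}
before = model.posterior(query)["IDI-2"]

# A new record with the same pattern is added to the database.
model.add({"ErrorCat": 3, "SysGroup": 1}, "IDI-2")
after = model.posterior(query)["IDI-2"]

print(before < after)  # True
```

Storing counts rather than fitted probabilities is the design choice that makes the update cheap: adding a record only increments a few counters, and the probabilities are recomputed at prediction time.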

5.3 Future Work

As discussed in the Limitations, further refinement of the model calls for additional analysis of the outliers in the data that create extreme delay, which is recommended for future work. The uncertainty represented by these error reports requires that an additional set of constraints be assessed by the manager to supplement the model with additional nodes or with additional states for one or more of the nodes. These incidents are primarily System of Systems (System Type 1) and Error Category 3 and likely introduce an additional set of integration challenges that are not readily apparent in the available variables that represent the data (evidence nodes). Further analysis is recommended to determine the best way to adjust the model to include additional nodes or additional states for selected nodes. The model can also be expanded to (1) consider the impact that unexpected architecture changes resulting from the SW update or modification have on the integration delays, (2) assess the impact of additional discretization techniques on accuracy, and (3) enhance training for the model by increasing the number of samples.

References

Baldwin, W. C., & Sauser, B. (2009). Modeling the characteristics of system of systems. Paper presented at the System of Systems Engineering, 2009. SoSE 2009. IEEE International Conference On, 1-6.

Basili, V. R., Briand, L. C., & Melo, W. L. (1996). A validation of object-oriented design metrics as quality indicators. IEEE Transactions on Software Engineering, 22(10), 751-761.

Bielza, C. (2014). Discrete bayesian network classifiers: A survey. ACM Computing Surveys, 47(1), 1.

Boehm, B., & Valerdi, R. (2011). Impact of software resource estimation research on practice: A preliminary report on achievements, synergies, and challenges. 33rd International Conference on Software Engineering (ICSE), Waikiki, Honolulu, HI USA. 1057-1065.

Cantot, P., & Luzeaux, D. (2011). Simulation and modeling of systems of systems [electronic resource]. London : Hoboken, N.J.: ISTE ; Wiley.

Chittister, C. G., & Haimes, Y. Y. (1996). Systems integration via software risk management. IEEE Transactions on Systems, Man & Cybernetics, Part A (Systems & Humans), 26(5), 521-32. doi:10.1109/3468.531900

Dahmann, J. (2015). The state of systems of systems engineering knowledge sources. Paper presented at the System of Systems Engineering Conference (SoSE), 2015 10th, 475-479. doi:10.1109/SYSOSE.2015.7151979


Dahmann, J., Lane, J., Rebovich, G., & Lowry, R. (2010). Systems of systems test and evaluation challenges. Paper presented at the System of Systems Engineering (SoSE), 2010 5th International Conference On, 1-6. doi:10.1109/SYSOSE.2010.5543979

Dahmann, J., Rebovich, G., Lane, J., Lowry, R., & Baldwin, K. (2012). An implemented view of systems engineering for systems of systems. Aerospace and Electronic Systems Magazine, IEEE, 27(5), 11-16. doi:10.1109/MAES.2012.6226689

Davendralingam, N., & Kenley, C. R. (2013). A mechanism design framework for the acquisition of independently managed systems of systems. Paper presented at the System of Systems Engineering (SoSE), 2013 8th International Conference On, 171-176. doi:10.1109/SYSoSE.2013.6575262

Davis, J. J., & Foo, E. (2016). Automated feature engineering for HTTP tunnel detection. Computers & Security, 59, 166-185.

Dillon, R. L., Paté-Cornell, M. E., & Guikema, S. D. (2005). Optimal use of budget reserves to minimize technical and management failure risks during complex project development. Engineering Management, IEEE Transactions On, 52(3), 382-395. doi:10.1109/TEM.2005.850733

DiMario, M., Cloutier, R., & Verma, D. (2008). Applying frameworks to manage SoS architecture. Engineering Management Journal, 12(4), 18-23.

DOD directive 5000.1 (2015). Washington DC: Government Printing Office.

Domingos, P., & Pazzani, M. (1997). On the optimality of the simple bayesian classifier under zero-one loss. Machine Learning, 29(2-3), 103-130.


Felder, W. (2012). The elephant in the mist: What we don't know about the design, development, test and management of complex systems. Journal of Aerospace Operations, 1(4), 317-327.

Ferreira, S., Faezipour, M., & Corley, H. W. (2013). Defining and addressing the risk of undesirable emergent properties. Paper presented at the Systems Conference (SysCon), 2013 IEEE International, 836-840. doi:10.1109/SysCon.2013.6549981

Flores, M. J., Gámez, J. A., Martínez, A. M., & Puerta, J. M. (2011). Handling numeric attributes when comparing bayesian network classifiers: Does the discretization method matter? Applied Intelligence, 34(3), 372-385.

Frese, R., & Sauter, V. (2014). Improving your odds for software project success. Engineering Management Review, IEEE, 42(4), 125-131.

Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29(2-3), 131-163.

Gandhi, S. J., Gorod, A., & Sauser, B. (2011). A systemic approach to managing risks of SoS. Paper presented at the Systems Conference (SysCon), 2011 IEEE International, 412-416. doi:10.1109/SYSCON.2011.5929045

Gupta, A., Mehrotra, K. G., & Mohan, C. (2010). A clustering-based discretization for supervised learning. Statistics & Probability Letters, 80(9–10), 816-824.

Holte, R. (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11(1), 63.

Houston, D. (2014). A generalized duration forecasting model of test-and-fix cycles. Journal of Software: Evolution and Process, 26(10), 877-889.


Jain, R. (2010). A framework for end-to-end approach to systems integration. International Journal of Industrial and Systems Engineering, 5(1), 79-109.

Jain, R., Chandrasekaran, A., Elias, G., & Cloutier, R. (2008). Exploring the impact of systems architecture and systems requirements on systems integration complexity. IEEE Systems Journal, 2(2), 209.

Kodama, M. (2011). Knowledge integration dynamics [electronic resource] : Developing strategic innovation capability. Singapore: World Scientific.

Kupervasser, O. (2014). The mysterious optimality of naive bayes: Estimation of the probability in the system of "classifiers". Pattern Recognition and Image Analysis, 24(1), 1-10.

Lamb, C. T., & Rhodes, D. H. (2009). Collaborative systems thinking: Uncovering the rules of team-level systems thinking. Paper presented at the Systems Conference, 2009 3rd Annual IEEE, 413-418. doi:10.1109/SYSTEMS.2009.4815837

Langford, G. O. (2012). Engineering systems integration: Theory, metrics, and methods (1st ed.). Hoboken: Taylor and Francis.

Larose, D. T. (2014). Discovering knowledge in data: An introduction to data mining (2nd ed.). Somerset: Wiley.

Little, T. (2006). Schedule estimation and uncertainty surrounding the cone of uncertainty. IEEE Software, 23(3), 48-54.

Liu, H., Hussain, F., Chew Lim Tan, & Dash, M. (2002). Discretization: An enabling technique. Data Mining and Knowledge Discovery, 6(4), 393-423.


López-Martín, A., Chavoya, A., & Meda-Campaña, M. E. (2015). A fuzzy logic model for

In document Predicting Schedule Delay Caused By Errors During Software Integration. by Kelly Dula Alexander (Page 105-0)