Learning Database - NBM Features Developed from SWI Challenges

Chapter 4 Results

4.3 NBM Features Developed from SWI Challenges

4.3.2 Learning Database

The Learning Database was developed in Excel by mapping the three data sources and transforming the 134 Army records according to the responses to the questions that represent each of the predictor nodes. Each row represents one of the error reports, so the database consists of eleven columns and 134 rows. Columns 1 through 10 contain the predictor node data (the responses to the questions), and Column 11 contains the one of the five discretized IDI values that corresponds to the actual data in the error report. The columns represent the features developed from SWI challenges. A key step in knowledge integration is the transformation of the original error reports into the features that populate the Learning Database.

A sample of the records in the Learning Database is shown in Table 9. The numbers in the cells represent True=1 or False=2 responses to the questions in Table 8.

The entire Learning Database is provided in Appendix B.
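The transformation of one error report into a Learning Database row can be sketched as follows. This is a minimal illustration, assuming the column order of the Table 9 header, the 1 = True and 2 = False coding described above, and that every predictor is treated as binary. The names `COLUMNS` and `encode_record` are hypothetical and are not taken from the dissertation's Excel workbook.

```python
# Column order assumed from the Table 9 header; the eleventh value is the IDI class.
COLUMNS = ["Severity", "ErrorCat", "ACAT", "SysDepend", "OrgDepend",
           "SameEvent", "PriorEvent", "SysGroup", "SysType", "Core"]


def encode_record(answers, idi_class):
    """Return an 11-value row: ten predictor codes followed by the IDI class.

    answers   -- dict mapping each feature name to True or False
    idi_class -- discretized IDI interval, an integer from 1 to 5
    """
    missing = [c for c in COLUMNS if c not in answers]
    if missing:
        raise ValueError(f"missing answers for: {missing}")
    if idi_class not in (1, 2, 3, 4, 5):
        raise ValueError("idi_class must be between 1 and 5")
    # Coding used in the Learning Database: True = 1, False = 2.
    return [1 if answers[c] else 2 for c in COLUMNS] + [idi_class]


# Hypothetical answers chosen so that the result matches the sample row in Table 9.
sample = {c: True for c in COLUMNS}
sample.update(Severity=False, ACAT=False, SysDepend=False, OrgDepend=False)
row = encode_record(sample, idi_class=1)
print(row)
```

Validating each row at encoding time keeps a missing answer from silently turning into a False response in the database.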

Table 9 - Sample of Learning Database Entries

| Severity | ErrorCat | ACAT | SysDepend | OrgDepend | SameEvent | PriorEvent | SysGroup | SysType | Core | IDI (Days) |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |

The distribution of the training and prediction set data, compared with the entire database for each target node category, is provided in Table 10. The class frequencies in the two sets are similar to those of the entire data set, which indicates that both are representative samples of the entire Learning Database. The training and prediction sets were randomly selected and were used to determine the accuracy. Further discussion of these data sets, along with their distribution for each predictor node (feature), is provided in 4.4.1.1 and 4.4.1.2.

Table 10 - Dataset Summary

| Target (IDI) | Entire Database: Count | Entire Database: % | Training Set: Count | Training Set: % | Prediction Set: Count | Prediction Set: % |
|---|---|---|---|---|---|---|
| IDI-1 | 73 | 54.48% | 63 | 52.94% | 8 | 53.33% |
| IDI-2 | 42 | 31.34% | 40 | 33.61% | 4 | 26.67% |
| IDI-3 | 11 | 8.21% | 10 | 8.40% | 2 | 13.33% |
| IDI-4 | 2 | 1.49% | 1 | 0.84% | 0 | 0.00% |
| IDI-5 | 6 | 4.48% | 5 | 4.20% | 1 | 6.67% |
| Total | 134 | 100.00% | 119 | 100.00% | 15 | 100.00% |

The training data set of 119 records was randomly selected as approximately 90% of the 134 records in the Learning Database. The same training set was used for each Global Accuracy measure. The frequency count and the prior probability distribution of each of the 10 features are provided in Table 11. The prediction data set consists of 15 records and is shown in Table 12. This data set is used to compute the Prediction Set Accuracy for each of the models, as discussed in paragraph 4.5 below.
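The random selection of the two sets and the relative frequencies of Table 10 can be sketched as follows. This is an illustration only: the seed, the function names `split_records` and `class_distribution`, and the use of Python's `random` module are assumptions, and the split produced here will not reproduce the dissertation's actual training and prediction sets.

```python
import random
from collections import Counter


def split_records(records, n_train, seed=0):
    """Shuffle a copy of the records and cut it into training and prediction sets."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def class_distribution(labels):
    """Return {class: (count, percent of total)} for a list of target labels."""
    counts = Counter(labels)
    total = len(labels)
    return {k: (counts[k], round(100.0 * counts[k] / total, 2)) for k in sorted(counts)}


# Target labels rebuilt from the entire-database counts reported in Table 10.
labels = (["IDI-1"] * 73 + ["IDI-2"] * 42 + ["IDI-3"] * 11
          + ["IDI-4"] * 2 + ["IDI-5"] * 6)

train, prediction = split_records(labels, n_train=119, seed=0)
full_dist = class_distribution(labels)
print(full_dist)
print(class_distribution(train))
print(class_distribution(prediction))
```

Comparing the three printed distributions is the same representativeness check that Table 10 supports. With only 15 prediction records, a single record shifts a class share by almost seven percentage points.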

Table 11 - Training Data Set Distribution

Table 12 - Prediction Data Set Distribution

(Columns of Tables 11 and 12: Feature, Categories, Frequency, %)

4.4 NBM Analysis Results

NBM versions were assessed to determine the impact the features have on the accuracy of the model. Prediction Set Accuracy and Global Accuracy were calculated for each of the resulting NBM versions. Prediction Set Accuracy is the traditional measure to determine the performance of a model while the Global Accuracy has particular significance to classification models such as the NBM developed in this dissertation research. The ratio of correct predictions to total predictions based on a random set of data is the Prediction Set Accuracy.

The Global Accuracy was developed from the Confusion Matrix that resulted from 10-fold cross-validation. The Confusion Matrix shows the performance of the NBM relative to each of the intervals. Each of the resulting models and its associated features, along with the Global Accuracy and the Prediction Set Accuracy, is shown in 4.4.1 through 4.4.8. The resulting NBMs with their features, accuracy measures and confusion matrices are shown in Tables 13 through 33.
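The two accuracy measures can be sketched as follows. The Prediction Set Accuracy follows the ratio given above. For the Global Accuracy, the sketch applies the formula that annotates the confusion matrices in this chapter, (TP+TN)/(TP+TN+FP+FN), to each interval in a one-vs-rest fashion and then averages over the intervals. That averaging step is an assumption about how the multi-class value was obtained. The function names and the small confusion matrix are hypothetical.

```python
def prediction_set_accuracy(actual, predicted):
    """Ratio of correct predictions to total predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def global_accuracy(confusion):
    """Mean one-vs-rest accuracy over all classes of a square confusion matrix.

    confusion[i][j] is the number of records of actual class i predicted as class j.
    """
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    per_class = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(k)) - tp
        tn = n - tp - fn - fp
        per_class.append((tp + tn) / (tp + tn + fp + fn))
    return sum(per_class) / k


# Eight correct predictions out of 15 records.
actual = [1] * 8 + [2] * 7
predicted = [1] * 15
psa = prediction_set_accuracy(actual, predicted)

# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
toy = [[8, 2, 0],
       [1, 6, 1],
       [0, 1, 1]]
ga = global_accuracy(toy)
print(round(psa, 3), round(ga, 3))
```

Under this reading the Global Accuracy credits true negatives for every interval, which is one reason it can sit well above the plain ratio of correct predictions on the same data.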

4.4.1 NBM 1: Army Data Features (Severity and ErrorCat)

Model 1 as shown in Table 13 was developed using the two data elements recorded in the original error reports: Severity and ErrorCat. The resulting NBM used the training data set with only Severity and ErrorCat data at the percentages represented in Table 11.

Table 13 – NBM 1 Severity & ErrorCat
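A model of this kind can be sketched with a small categorical Naive Bayes classifier. The sketch below is not the tool used in the dissertation. The class name `NaiveBayes`, the Laplace smoothing parameter `alpha`, and the six training rows are all assumptions made for illustration. Only the two feature names and the 1/2 coding come from the text.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with Laplace smoothing (alpha is an assumption)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # Distinct values observed for each feature column.
        self.values = [sorted({r[j] for r in rows}) for j in range(len(rows[0]))]
        # (feature index, value, class) -> number of training rows.
        self.counts = defaultdict(int)
        for r, y in zip(rows, labels):
            for j, v in enumerate(r):
                self.counts[(j, v, y)] += 1
        return self

    def log_posterior(self, row):
        """Unnormalized log posterior of each class for one row."""
        scores = {}
        for c in self.classes:
            lp = math.log(self.class_counts[c] / self.n)  # prior
            for j, v in enumerate(row):
                num = self.counts[(j, v, c)] + self.alpha
                den = self.class_counts[c] + self.alpha * len(self.values[j])
                lp += math.log(num / den)  # conditional likelihood
            scores[c] = lp
        return scores

    def predict(self, row):
        scores = self.log_posterior(row)
        return max(scores, key=scores.get)


# Hypothetical training rows: (Severity, ErrorCat), coded 1 = True, 2 = False.
rows = [(1, 1), (1, 1), (1, 2), (2, 2), (2, 2), (2, 1)]
labels = ["IDI-1", "IDI-1", "IDI-1", "IDI-2", "IDI-2", "IDI-2"]

model = NaiveBayes().fit(rows, labels)
print(model.predict((1, 1)), model.predict((2, 2)))
```

Adding or trimming a feature only changes the tuples passed to `fit` and `predict`, which is what makes the feature-set comparisons in the following sections inexpensive to run.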

4.4.2 NBM 2: Severity, ErrorCat and ACAT

Model 2 was developed using the Model 1 data elements with the addition of the ACAT. The resulting model analysis with features Severity, ErrorCat and ACAT along with Global and Prediction Accuracy is shown in Table 14.

Table 14 - NBM 2 - Accuracy Results With External Data (ACAT)

[Confusion matrix: Army Data & ACAT]

4.4.3 NBM 3: All Features (including SWI Challenges)

Model 3 was developed using the Model 2 data elements with the addition of the seven features based on the SWI challenges. The resulting model with 10 features and the resulting accuracy analysis are shown in Table 15. Based on Global Accuracy, Model 3 is the most accurate; based on Prediction Set Accuracy, however, Model 1 is the most accurate. This pattern, in which the Prediction Set Accuracy and the Global Accuracy do not recommend the same model, continues throughout the NBM development. To resolve this matter, the final recommendation is made based on additional accuracy measures that assess the individual interval accuracy, as discussed in 4.4.9.

Table 15 - NBM 3 - All Features (including SWI Challenges)

4.4.4 Independence Test Results

The Chi-Square test for Independence (Table 16) shows that the following features have a dependency with at least one other feature: Core, SysType, ACAT and SameEvent. These features were trimmed individually and jointly to determine the impact on model accuracy. The test was conducted at alpha = .05 with H0 = Features are Independent and HA = Features are Dependent.
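The independence test for one pair of features can be sketched as follows. This is a minimal Pearson chi-square computation, assuming both features are binary (so the table is 2 x 2 with one degree of freedom) and that no continuity correction is applied. The dissertation does not state these details. The critical value 3.841 is the standard chi-square value for alpha = .05 and one degree of freedom. The function names and the toy columns are hypothetical.

```python
from collections import Counter

# Chi-square critical value for alpha = .05 and 1 degree of freedom (2 x 2 table).
CRITICAL_VALUE_DF1 = 3.841


def chi_square_statistic(xs, ys):
    """Pearson chi-square statistic for two equally long lists of category codes."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    stat = 0.0
    for x in row_totals:
        for y in col_totals:
            expected = row_totals[x] * col_totals[y] / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def is_dependent(xs, ys):
    """Reject H0 (features independent) when the statistic exceeds the critical value."""
    return chi_square_statistic(xs, ys) > CRITICAL_VALUE_DF1


# Hypothetical columns: the first pair is perfectly associated, the second is balanced.
associated_a = [1] * 10 + [2] * 10
associated_b = [1] * 10 + [2] * 10
balanced_a = [1, 1, 2, 2] * 5
balanced_b = [1, 2, 1, 2] * 5

print(chi_square_statistic(associated_a, associated_b), is_dependent(associated_a, associated_b))
print(chi_square_statistic(balanced_a, balanced_b), is_dependent(balanced_a, balanced_b))
```

Running the test over every pair of the ten features yields a lower-triangular matrix of reject and do-not-reject decisions of the kind summarized in Table 16.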

Table 16 - Chi Squared Test for Independence Results

| FEATURES | Severity | ErrorCat | Oversight | SysDepend | OrgDepend | PriorEvent | SameEvent | SysGroup | SysType | Core |
|---|---|---|---|---|---|---|---|---|---|---|
| Severity | | | | | | | | | | |
| ErrorCat | DNR | | | | | | | | | |
| ACAT | DNR | R | | | | | | | | |
| SysDepend | DNR | DNR | DNR | | | | | | | |
| OrgDepend | DNR | DNR | DNR | DNR | | | | | | |
| PriorEvent | DNR | DNR | DNR | DNR | DNR | | | | | |
| SameEvent | DNR | DNR | DNR | DNR | DNR | R | | | | |
| SysGroup | DNR | DNR | DNR | DNR | DNR | DNR | DNR | | | |
| SysType | DNR | DNR | R | R | DNR | DNR | DNR | R | | |
| Core | R | DNR | DNR | DNR | DNR | R | R | R | R | |

DNR = Do Not Reject the Null (features are independent). R = Reject the Null (a dependency exists). H0: Features are Independent. HA: Features are Dependent.

4.4.5 NBMs With Dependent Variables Trimmed

Trimming each feature with dependencies one at a time resulted in Models 4, 5, 6 and 7, as shown in Tables 17 through 20. Again, the Global and Prediction Set Accuracy measures differ, with Global Accuracy being consistently higher, but trimming the Core feature results in the most accurate model to this point.

Table 17 - NBM 4 - Trim Core Feature

[Confusion matrix: NBM3 minus Core]

Table 18 - NBM 5 – Trim SysType Feature

[Confusion matrix: NBM3 minus SysType (Actual vs. Prediction)]

Table 19 - NBM 6 - Trim SameEvent Feature

Table 20 - NBM 7 - Trim ACAT Feature

4.4.6 NBMs With Sets of Two Dependent Variables Trimmed

Models 8 through 13 were developed by trimming sets of two features to determine the impact on accuracy. The sets of features were selected based on the Chi-Squared test for Independence, which indicated which features had a dependency with other features. The resulting models show that NBM 10, which results from trimming the features ACAT and SameEvent, has the highest Global Accuracy measure and the lowest Prediction Set Accuracy. This difference in accuracy measures is similar to the previous sections and, as previously stated, an assessment of the model behavior at each interval shows which model provides the best accuracy at the IDI level. Tables 21 through 26 show the resulting models.

[Confusion matrix: NBM3 minus ACAT; surviving fragment: 1 56 10 45 7]

Table 21 - NBM 8 – Trim Core and SysType Features

Table 22 - NBM 9 - Trim Core & SameEvent Features

Table 23 - NBM 10 - Trim ACAT & SameEvent Features

Table 24 - NBM 11 - Trim SameEvent & SysType Features

Table 25 - NBM 12 - Trim ACAT & SysType Features

Table 26 - NBM 13 - Trim Core & ACAT Features

The confusion matrices for these models are annotated with: Global Accuracy = (TP+TN)/(TP+TN+FP+FN)

4.4.7 NBMs with Sets of Three Features Trimmed

Additional models were developed by trimming features in groups of three. The features that were trimmed are those that show a dependency with other features, based on the results of the Chi-Squared test. The resulting model accuracy results are shown in Tables 27 through 30.

Table 27 - NBM 14 Trim Core, ACAT, & SysType Features

Table 28 - NBM 15 Trim ACAT, SameEvent, & SysType Features

Table 29 - NBM 16 Trim Core, SameEvent, & SysType Features

Table 30 - NBM 17 Trim Core, SameEvent & ACAT Features

4.4.8 NBM with Four Dependent Features Trimmed

Trimming all four dependent features produced the final model, which again gave mixed accuracy results, as with the previous models. The results are shown in Table 31.

Table 31 - NBM 18 Trim Core, SameEvent, SysType & ACAT Features
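The trimming scheme of Sections 4.4.5 through 4.4.8 can be enumerated as follows. This sketch only lists the candidate feature sets. It assumes the ten feature names from the Learning Database header and the four dependent features named in 4.4.4. The function name `trimmed_feature_sets` is hypothetical, and the order in which the subsets are generated is not claimed to match the NBM numbering used in the tables.

```python
from itertools import combinations

ALL_FEATURES = ["Severity", "ErrorCat", "ACAT", "SysDepend", "OrgDepend",
                "SameEvent", "PriorEvent", "SysGroup", "SysType", "Core"]
DEPENDENT = ["Core", "SysType", "SameEvent", "ACAT"]


def trimmed_feature_sets(all_features, dependent):
    """Return (trimmed, kept) pairs for every non-empty subset of the dependent features."""
    result = []
    for size in range(1, len(dependent) + 1):
        for trimmed in combinations(dependent, size):
            kept = [f for f in all_features if f not in trimmed]
            result.append((trimmed, kept))
    return result


variants = trimmed_feature_sets(ALL_FEATURES, DEPENDENT)
by_size = {}
for trimmed, kept in variants:
    by_size[len(trimmed)] = by_size.get(len(trimmed), 0) + 1

print(len(variants), by_size)
for trimmed, kept in variants:
    print("trim", ", ".join(trimmed), "-> keep", len(kept), "features")
```

The enumeration gives 4 single-feature, 6 two-feature, 4 three-feature and 1 four-feature variants, which accounts for the 15 trimmed models NBM 4 through NBM 18.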

4.4.9 NBM Accuracy Measures Comparisons

Table 32 shows the various measures of accuracy that are relevant to classification models as explained below:

1. A brief description of the features that define the NBM is in columns 1 and 2.

2. The Prediction and Global Accuracies are shown in columns 3 and 4 to indicate the overall performance of the NBM.

a. The Prediction Accuracy is the typical measure of the correct predictions divided by the total number of predictions, which, as indicated earlier, is based on a set of 15 randomly selected use cases.

b. The Global Accuracy is the more accurate measure for classification models, because it averages the accuracy of 10 random samples of prediction sets and provides the Confusion Matrix to show the performance of the model within each interval class.

3. The remaining columns provide the precision and recall for the two intervals of interest, IDI-1 and IDI-2. Rather than consider the performance of all of the intervals, the analysis focuses on IDI-1 and IDI-2, which combine for 85.9% of the data (relative frequency shown in Table 7). The interval precision and recall calculations are important because they show which intervals are more accurate and allow the calculation of the F1 Score.

4. Because intervals IDI-1 and IDI-2 have the highest relative frequency, their F1 Scores were used to identify NBM 10 as the best set of features to predict the schedule delay.

Table 32 - Model Accuracy Summary

| NBM | Features | Prediction Accuracy | Global Accuracy | IDI-1 Precision | IDI-1 Recall | IDI-1 F1 | IDI-2 Precision | IDI-2 Recall | IDI-2 F1 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Severity, ErrorCat | 0.600 | 0.878 | 0.757 | 0.841 | 0.797 | 0.612 | 0.750 | 0.674 |
| 2 | Severity, ErrorCat, ACAT | 0.533 | 0.883 | 0.839 | 0.825 | 0.832 | 0.612 | 0.750 | 0.674 |
| 3 | All features | 0.533 | 0.905 | 0.846 | 0.873 | 0.859 | 0.674 | 0.775 | 0.721 |
| 4 | Trim Core | 0.729 | 0.893 | 0.825 | 0.825 | 0.825 | 0.667 | 0.750 | 0.706 |
| 5 | Trim SysType | 0.706 | 0.893 | 0.812 | 0.889 | 0.848 | 0.682 | 0.750 | 0.714 |
| 6 | Trim SameEvent | 0.533 | 0.888 | 0.831 | 0.857 | 0.844 | 0.674 | 0.775 | 0.721 |
| 7 | Trim ACAT | 0.533 | 0.881 | 0.848 | 0.889 | 0.868 | 0.674 | 0.775 | 0.721 |
| 8 | Trim Core, SysType | 0.667 | 0.898 | 0.828 | 0.803 | 0.835 | 0.638 | 0.769 | 0.698 |
| 9 | Trim Core, SameEvent | 0.600 | 0.902 | 0.852 | 0.825 | 0.839 | 0.674 | 0.775 | 0.721 |
| 10 | Trim ACAT, SameEvent | 0.533 | 0.902 | 0.836 | 0.889 | 0.862 | 0.738 | 0.775 | 0.756 |
| 11 | Trim SameEvent, SysType | 0.533 | 0.908 | 0.825 | 0.825 | 0.825 | 0.660 | 0.775 | 0.713 |
| 12 | Trim ACAT, SysType | 0.533 | 0.902 | 0.809 | 0.873 | 0.840 | 0.732 | 0.750 | 0.741 |
| 13 | Trim Core, ACAT | 0.600 | 0.885 | 0.828 | 0.841 | 0.835 | 0.674 | 0.756 | 0.731 |
| 14 | Trim Core, ACAT, SysType | 0.667 | 0.885 | 0.820 | 0.794 | 0.806 | 0.625 | 0.750 | 0.682 |
| 15 | Trim ACAT, SameEvent, SysType | 0.600 | 0.895 | 0.831 | 0.857 | 0.844 | 0.667 | 0.750 | 0.706 |
| 16 | Trim Core, SameEvent, SysType | 0.667 | 0.885 | 0.810 | 0.794 | 0.806 | 0.659 | 0.725 | 0.690 |
| 17 | Trim Core, SameEvent, ACAT | 0.600 | 0.892 | 0.852 | 0.825 | 0.839 | 0.667 | 0.750 | 0.706 |
| 18 | Trim Core, SameEvent, SysType, ACAT | 0.600 | 0.888 | 0.800 | 0.825 | 0.813 | 0.659 | 0.725 | 0.690 |

** N/A is due to division by zero; however, since this interval has the least number of instances, it does not affect the most accurate interval F1 Score.

*Note: Model NBM4 & higher result from feature removal based on Independence testing showing feature
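The interval-level measures described in items 3 and 4 can be sketched as follows. Precision, recall and the F1 Score follow their standard definitions. The function names and the toy confusion matrix are hypothetical. Returning `None` when a denominator is zero is one assumed way of representing the N/A entries caused by division by zero. The check at the end uses the precision and recall values reported for NBM 10 in Table 32.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; None when both are zero."""
    if precision + recall == 0:
        return None
    return 2 * precision * recall / (precision + recall)


def interval_scores(confusion, k):
    """Precision, recall and F1 for class k of a square confusion matrix.

    confusion[i][j] is the number of records of actual class i predicted as class j.
    A measure is None when its denominator is zero.
    """
    tp = confusion[k][k]
    predicted_k = sum(row[k] for row in confusion)  # column total
    actual_k = sum(confusion[k])                    # row total
    precision = tp / predicted_k if predicted_k else None
    recall = tp / actual_k if actual_k else None
    if precision is None or recall is None:
        return precision, recall, None
    return precision, recall, f1_score(precision, recall)


# NBM 10 values from Table 32: (precision, recall) for IDI-1 and IDI-2.
f1_idi1 = round(f1_score(0.836, 0.889), 3)
f1_idi2 = round(f1_score(0.738, 0.775), 3)
print(f1_idi1, f1_idi2)

# Hypothetical 2-class matrix in which class 0 never occurs in the actual data.
print(interval_scores([[0, 0], [1, 1]], 0))
```

The two printed F1 values agree with the 86.2% and 75.6% reported for NBM 10.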

The following analysis provides additional assessment of the accuracy measures.

No model has the most accurate prediction for every interval. The prior sections provide model assessments that show the discrepancy between Global and Prediction Set Accuracy. However, as shown in Figure 17, the Global Accuracy measure is relatively stable, while the Prediction Set Accuracy is more sensitive to the set of features. Using the Prediction Set Accuracy measure, Model 4 would be the choice, but the Global Accuracy measure would select Model 10. These results show why the F1 Score is the recommended accuracy measure.

Figure 17 - NBM Prediction & Global Accuracy Comparison

[Figure 17 axes: NBM 1 through 18 on the horizontal axis; accuracy from 0.500 to 0.950 on the vertical axis; series: Prediction Accuracy and Global Accuracy]

The Confusion Matrix for each NBM was used to calculate the Precision, Recall and the F1 Score for each interval to make the final determination of the recommended NBM. As discussed earlier, the IDI-1 and IDI-2 intervals were used to select the most accurate NBM since the majority of the data is in these two intervals.

Figure 18 provides a graphical comparison of the F1 Score for each of the NBMs. Using the F1 Score as the criterion, Model 10 is the recommended NBM for schedule delay prediction, with F1 Scores for intervals IDI-1 and IDI-2 of 86.2% and 75.6%, respectively.

Figure 18 - F1 Score Comparison (IDI-1 & IDI-2)

4.5 Contribution Analysis for Final Set of Features

The most accurate model based on F1 Score is NBM 10 with the following 8 features: ErrorCat, SysGroup, OrgDepend, SysDepend, SysType, Core, Severity and PriorEvent. The Contribution Analysis for these features is provided in the following Pareto Chart (Figure 19).

Figure 19 – Pareto Chart of Contribution Analysis

[Figure 19 axes: features ErrorCat, SysGroup, OrgDepend, SysDepend, SysType, Core, Severity and PriorEvent on the horizontal axis; series: Feature Contribution to Delay and Cumulative Contribution]

Contribution Analysis shows that, as established in the Data Preprocessing phase, both technical and non-technical factors are responsible for SWI delay. As Figure 19 indicates, ErrorCat at 25% has the highest impact on the time to resolve an error, followed by SysGroup (13%), OrgDepend (12.5%), SysDepend (12.2%), SysType (11%), Core (10%), Severity (9.3%) and PriorEvent (8.2%). Also notable is that 80% of the time to resolve an error is caused by five features: ErrorCat, SysGroup, OrgDepend, SysDepend and SysType. Only one of these features was captured in the initial error reports; the remaining four were mined from the data as known SWI challenges. These features link to the following SWI Challenges and Questions (Table 33).
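The cumulative series of a Pareto chart can be computed from the individual contributions, as sketched below. The input values are the rounded percentages quoted in the paragraph above. Because they are rounded they do not sum to exactly 100, so the sketch normalizes by their total, which is an assumption. The values underlying the chart itself may differ, and the function name `pareto` is hypothetical.

```python
# Contribution percentages as quoted in the text (rounded values).
contributions = [("ErrorCat", 25.0), ("SysGroup", 13.0), ("OrgDepend", 12.5),
                 ("SysDepend", 12.2), ("SysType", 11.0), ("Core", 10.0),
                 ("Severity", 9.3), ("PriorEvent", 8.2)]


def pareto(items):
    """Sort by contribution and attach the cumulative share of the total (in percent)."""
    ordered = sorted(items, key=lambda kv: kv[1], reverse=True)
    total = sum(value for _, value in ordered)
    rows, running = [], 0.0
    for name, value in ordered:
        running += value
        rows.append((name, value, round(100.0 * running / total, 1)))
    return rows


rows = pareto(contributions)
for name, value, cumulative in rows:
    print(f"{name:<11}{value:>6.1f}{cumulative:>8.1f}")
```

Reading down the cumulative column shows how many of the highest-ranked features are needed to reach a given share of the delay.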

Table 33 – Contribution of SWI Challenges

| SWI Challenges (Literature Survey Results) | Questions Used to Populate Learning Database | Learning Database Column and Node in NBM (Features) |
|---|---|---|
| Technical Risk | 4. What is the type of error? | CEC |
| SoS Complexity | 10. Is the system a SOS? | SysGroup |
| Independent Management | 3. Does the error impact more than 1 organization? | OrgDepend |
| System Interdependencies | 1. Does the error impact more than 1 system? | SysDepend |
| System Interdependencies | 2. Is the system in the command post? | SysType |
| SoS Complexity | 9. Is the system a core system? | Core |
| Technical Risk | 5. Does the error impair a critical task? | Severity |
| Non-Technical Risk | 6. Does the system have errors in prior events? | PriorEvent |

4.6 Summary of Findings

The results of the NBM development presented in this dissertation indicate that historical data can be used to provide accurate schedule prediction when feature selection is used to determine the most important features. The prediction and contribution analysis can be used to support decisions that enable systems engineers and managers to realign resources or shift schedules to meet their priorities, as well as to support risk mitigation decisions. Essentially, the features developed for this research are accurate for predictions of IDI-1, IDI-2, IDI-3 and IDI-5, which include errors that are resolved within 92 days or that take more than 176 days to resolve. However, the NBM is less accurate in predicting IDI-4, which includes errors that take between 92 and 176 days to resolve. Fortunately, this range does not occur frequently (1.49% of the time – see Table 10). Additional study is required to fully investigate this outcome.

4.6.1 Feature Selection Had Mixed Results on Accuracy

Model accuracy was assessed by using different sets of features to develop different models. Initially, features based on the historical US Army error reports were used to determine the baseline model accuracy. These same features were subsequently used with the addition of the external feature (ACAT), which showed a slight increase in Global Accuracy and a slight decrease in Prediction Set Accuracy. Finally, a third model, with the first two sets of features and the addition of the seven features from the SWI Challenges determined from the literature survey, resulted in increased Global Accuracy but no change in the Prediction Set Accuracy. This difference between Global and Prediction Set Accuracy continues through each of the model variations assessed. Removal of features based on Independence testing provided further variation in these accuracy measures. However, to fully assess the impact of removing features, the Precision, Sensitivity and F1 Score measures were the final determinant of the most accurate model.

NBM 10 with eight features was the most accurate, with 90.2% Global Accuracy and 53.3% Prediction Set Accuracy. This accuracy is comparable to that of other similar NBM models (Bielza, 2014) but exceeds that of other methods for SW development resource estimation, which include schedule estimation and average 39% (Boehm & Valerdi, 2011).

4.6.2 Implications to Technical Impacts on Delay

Four of the eight features primarily quantify the impact that interdependencies have on schedule delay and were not included in the original error reports. SysGroup, SysDepend, OrgDepend and SysType all define different aspects of the interdependencies that characterize complex systems and are represented as SWI Challenges. Based on this dissertation research, these features contribute 48% of the schedule delay. Because these errors often are not revealed through system level testing, they create challenges that generally require stakeholder and system owner communications and resources to troubleshoot and resolve. When the impact of these interdependencies is not considered in the original or updated schedule estimation, almost half of the time required to resolve an error is omitted, which is likely responsible for the underestimation of the time required to resolve an error.

4.6.3 Implications to Organizational Impacts on Delay

The NBM contribution analysis also provides an objective analysis of the impact that organizational dependencies (OrgDepend) have on the prediction of schedule delay caused by integration errors. OrgDepend was mined from the data based on those errors that had at least two systems and two organizations that were responsible for resolving the error.

According to the contribution analysis, OrgDepend is third when ranked according to its impact on the integration schedule delay. While the majority of the features are primarily technical, the ranking of the organizational impacts shows their importance in understanding the causal factors that create the schedule delay.


Chapter 5 – Conclusions

This dissertation set out to develop an NBM to predict the schedule delay created by SWI errors. Error reports from three Army SWI events were used as the data source. The results are promising and show that SWI challenges can drive features that accurately predict the schedule delay. Previous chapters in this dissertation research presented the key findings, which include the determination of key features for the NBM based on accuracy and the contribution each feature has on the model prediction.

First, the features for a NBM that best predict the SWI delay created by errors during the SWI phase of development were determined. The approach to develop the features relied on a Literature Survey of SWI challenges documented through journal articles and conference papers published in the past 10 years. The features that were most prevalent in 30 articles were summarized and used to drive data mining activities that extracted the evidence of these challenges from the original error reports. The resulting SWI challenges include: System Interdependencies, Independent Management, SW Integration Risks (Technical and Non Technical) and Complexity. These challenges as reflected in the Army error reports were represented as 10 features that were further trimmed through feature engineering.

Feature engineering was an important aspect of model development that supported the removal of features based on Independence analysis results. Research has

In document Predicting Schedule Delay Caused By Errors During Software Integration. by Kelly Dula Alexander (Page 92-0)