Predicting Schedule Delay Caused By Errors During Software Integration
by Kelly Dula Alexander
B.S. in Industrial Engineering, December 1983, North Carolina State University
M.S. in Engineering Management, May 1996, George Washington University
M.S. in National Resource Strategy, June 2006, National Defense University
A Dissertation submitted to The Faculty of
The School of Engineering and Applied Science of the George Washington University
in partial fulfillment of the requirements for the degree of Doctor of Philosophy
January 19, 2018
Dissertation directed by
Thomas Mazzuchi
Professor of Engineering Management and Systems Engineering & of Decision Sciences
Shahram Sarkani
Professor of Engineering Management and Systems Engineering
The School of Engineering and Applied Science of The George Washington University certifies that Kelly Dula Alexander has passed the Final Examination for the degree of Doctor of Philosophy as of October 24, 2017. This is the final and approved form of the dissertation.
Predicting Schedule Delay Caused By Errors During Software Integration
Kelly Dula Alexander
Dissertation Research Committee:
Thomas Mazzuchi, Professor of Engineering Management and Systems Engineering & of Decision Sciences, Dissertation Co-Director
Shahram Sarkani, Professor of Engineering Management and Systems Engineering, Dissertation Co-Director
Bereket Tanju, Professorial Lecturer of Engineering Management and Systems Engineering, Committee Member
Timothy Eveleigh, Professorial Lecturer of Engineering Management and Systems Engineering, Committee Member
Amir Etemadi, Assistant Professor of Engineering and Applied Science, Committee Member
© Copyright 2017 by Kelly D. Alexander
All rights reserved
Dedication
This dissertation is dedicated to my mother, Gertrude Elaine Dula, and my father, Charles Barber Dula, who are both deceased. Their belief in the power of education served as a guiding light that encouraged me to never stop learning. I further dedicate this work to the previous and current generations of Dulas who persevered and achieved the seemingly impossible through education.
Acknowledgments
This dissertation would not be possible without the love and encouragement of my family and friends. First and foremost, I thank God for putting a fire inside of me to pursue and attain this esteemed degree. I would be remiss if I did not thank my GWU cohort members for their encouragement and optimism, which were truly inspirational. A special thank you goes out to my baby sister, Caroline Dula, for her editing skills that were patiently applied in response to each request I made. Also, a heartfelt “thank you” goes to my dearest friends Pastor Julia Henson and Reverend Melodie Boone for their prayers and guidance. Finally, thank you to my daughter, my “special girl”, Nicole Alexander, for her steadfast belief that her mom can do anything!
Abstract of Dissertation
Predicting Schedule Delay Caused By Errors During Software Integration
This research was conducted to evaluate and quantify the impact that complexity has on schedule, based on errors created by organizational and system dependencies during software integration testing. The combination of individually developed and managed systems and their interdependencies results in a complex system (or System of Systems) that is difficult to integrate within the original schedule. During the integration event, the requirement that multiple systems provide a holistic capability gives rise to technical and non-technical integration challenges that managers are often unprepared to resolve. These challenges produce multiple errors that require significant collaboration and coordination during the problem resolution cycle, which can result in significant schedule delay; however, little existing research quantifies the delay. The inability of engineering managers to accurately predict schedule impact after a software integration test error increases schedule risk because the schedule delay is underestimated. This research provides engineering managers with a simple approach to support decisions by incorporating evidence-based recommendations for schedule impact that consider both legacy and current data. Knowledge Integration is used to merge known software integration challenges with error reports from US Army software integration test events to define Naïve Bayes Model features. The data from the US Army software integration events was used to quantify the schedule delay, and the analysis concluded that errors that have a dependency (either technical or non-technical) on another system or organization create longer schedule delays than errors created by standalone systems.
Additionally, the research concludes that software integration challenges represented as features can be quantified to successfully predict the number of days required to resolve an error. The software integration challenges documented in the past ten years of research fall into the following categories: System Interdependencies, Independent Management, Technical Risk, Non-Technical Risk and System of System Complexity. The combined impact of these challenges, quantified as features, results in a model that predicts schedule delay caused by a software integration error with 90% Global Accuracy, a significant improvement over the 39% to 79% accuracy reported as the average for the original software resource schedule estimation in software development.
Table of Contents
Dedication ... iv
Acknowledgments ... v
Abstract of Dissertation ... vi
List of Figures ... xiii
List of Tables ... xiv
Chapter 1: Introduction ... 1
1.1 Background and Approach ... 5
1.2 System Integration and System of System Integration ... 7
1.3 Problem Definition ... 7
1.3.1 Schedule Risk. ... 9
1.3.2 Bottlenecks in SW Integration ... 9
1.4 Purpose and Significance Of Dissertation Research ... 10
1.5 Gaps in Schedule Prediction Models ... 11
1.6 Research Scope and Limitations ... 13
1.7 Summary of Dissertation Organization ... 13
Chapter 2 – Literature Review ... 15
2.1 SWI Environment ... 16
2.2 Existing SW Development Estimation and Prediction Models ... 17
2.2.1 Fuzzy Logic Model Using Function Points ... 18
2.2.2 Probabilistic Models ... 18
2.2.3 Parametric Models ... 20
2.2.4 Simulation Models ... 20
2.3 Systems Thinking ... 21
2.4 Knowledge Integration ... 23
2.4.1 Knowledge In A SoS Environment ... 24
2.4.2 Feature Engineering and Feature Selection ... 24
2.5 SW Integration Challenges Based on Literature Survey ... 26
2.5.1 System Interdependencies ... 28
2.5.2 Independent Management (Autonomy) ... 29
2.5.3 Software Integration (SWI) Risks ... 30
2.5.3.1 Technical Risk ... 31
2.5.3.2 Non-Technical Risk ... 31
2.5.4 Complexity ... 33
2.6 Bayesian Probabilities ... 34
2.7 NBM Theoretical Framework ... 35
2.8 NBM for Classification ... 36
2.9 Model Development Process ... 37
2.9.1 Business Understanding ... 38
2.9.2 Data Understanding ... 39
2.9.3 Data Preparation ... 39
2.9.4 Modeling (Naïve Bayes Model Development) ... 40
2.9.4.1 Discretization to Improve Accuracy ... 40
2.9.4.2 NBM Analysis ... 41
2.10 NBM Cautions And Limitations for Prediction ... 41
2.11 Literature Review Summary ... 43
Chapter 3 Method ... 44
3.1 Data Collection ... 44
3.1.1 Error Reports ... 45
3.1.2 Literature Survey ... 46
3.1.3 External Data Source ... 48
3.2 Preprocessing the Dataset: ... 48
3.3 Knowledge Integration ... 51
3.3.1 Software Used to Build Model and Conduct Analysis ... 51
3.3.2 Learning Database ... 52
3.3.3 Node (Feature) Description ... 53
3.3.3.1 Severity ... 53
3.3.3.2 Error Category ... 54
3.3.3.3 Acquisition Category ... 54
3.3.3.4 System and Organizational Dependency ... 55
3.3.3.5 Multiple Event Errors ... 55
3.3.3.6 Type of System ... 56
3.4 NBM Development ... 56
3.4.1 Target Node ... 57
3.4.2 Predictor (Evidence) Nodes ... 59
3.5 Analysis ... 60
3.5.1. Model Node Values ... 60
3.5.2. Feature Engineering and Feature Selection. ... 61
3.5.3 Model Analysis ... 63
3.5.3.1. Prediction Set Accuracy ... 64
3.5.3.2. Global Accuracy ... 64
3.5.3.3. Model Comparison ... 65
3.5.3.4 Contribution Analysis ... 66
3.6 Methods Summary ... 66
Chapter 4 Results ... 68
4.1 Data Preprocessing Results ... 68
4.1.1 Statistical Analysis and Hypothesis Testing ... 70
4.1.2 Preprocessing Results Summary ... 73
4.2 Target Node ... 74
4.3 NBM Features Developed from SWI Challenges ... 75
4.3.1. Questions Developed From SWI Challenges ... 77
4.3.2 Learning Database ... 77
4.4 NBM Analysis Results ... 80
4.4.1 NBM 1: Army Data Features (Severity and ErrorCat) ... 80
4.4.2 NBM 2: Severity, ErrorCat and ACAT ... 81
4.4.3 NBM 3: All Features (including SWI Challenges) ... 82
4.4.4 Independence Test Results ... 82
4.4.5 NBMs With Dependent Variables Trimmed ... 83
4.4.6 NBMs With Sets of Two Dependent Variables Trimmed ... 85
4.4.7 NBMs with Sets of Three Features Trimmed ... 88
4.4.8 NBM with Four Dependent Features Trimmed ... 90
4.4.9 NBM Accuracy Measures Comparisons ... 91
4.5. Contribution Analysis for Final Set of Features ... 95
4.6 Summary of Findings ... 97
4.6.1 Feature Selection Had Mixed Results on Accuracy ... 97
4.6.2 Implications to Technical Impacts on Delay ... 98
4.6.3 Implications to Organizational Impacts on Delay ... 98
Chapter 5 – Conclusions ... 100
5.1 Importance of this Research ... 102
5.2 Limitations of this Research ... 102
5.3 Future Work ... 103
References ... 104
Appendix A – Literature Survey Data ... 114
Appendix B – Feature Contingency Tables (Entire Database) ... 115
Appendix C – Example Prior and Posterior Probability Calculations ... 116
List of Figures
Figure 1- SWI Environment As A Complex System ... 3
Figure 2 - SWI Environment Relationships to NBM ... 17
Figure 3 - SWI Environment in Systems Thinking ... 22
Figure 4 - Feature Engineering Cycle ... 25
Figure 5 - CRISP Model Development Process ... 38
Figure 6 - Framework for NBM Development Process ... 45
Figure 7 - Role of Commercial SW in NBM Development ... 52
Figure 8 - SWI Challenges Transformed to Features ... 53
Figure 9 - Learning Database Relationship to NBM Nodes ... 57
Figure 10 - Example of K Means Clusters ... 58
Figure 11 - Maximum Posterior Probability (MPP) ... 61
Figure 12 - Feature Selection Process ... 62
Figure 13 - Preprocessing Data Categories ... 69
Figure 14 - Probability Plot of Schedule Delay (Original Data) ... 71
Figure 15 - Johnson Transformation of Schedule Delay ... 72
Figure 16 - K-Means Clusters of Target Node Data ... 75
Figure 17 - NBM Prediction & Global Accuracy Comparison ... 94
Figure 18 - F1 Score Comparison (IDI-1 & IDI-2) ... 95
Figure 19 – Pareto Chart of Contribution Analysis ... 95
List of Tables
Table 1 - Literature Survey Challenges Summary ... 27
Table 2 - Summary of Models & Data Sources ... 66
Table 3 - Characterization of the Delay Based on Error Reports ... 70
Table 4 - Characterization of the Delay By Groups Based on Error Reports ... 71
Table 5 - Characterization of Transformed Data (Days to Resolve Errors) ... 72
Table 6 - Hypothesis Test of Mean Time to Resolve Error Summary ... 73
Table 7 - Discretized Target Node Intervals ... 75
Table 8 - SWI Challenges Relationship to Learning Database ... 76
Table 9 - Sample of Learning Database Entries ... 78
Table 10 - Dataset Summary ... 78
Table 11 - Training Data Set Distribution ... 79
Table 12 - Prediction Data Set Distribution ... 79
Table 13 - NBM-1 Severity & ErrorCat ... 81
Table 14 - NBM 2 - Accuracy Results With External Data (ACAT) ... 81
Table 15 - NBM 3 - All Features (including SWI Challenges) ... 82
Table 16 - Chi Squared Test for Independence Results ... 83
Table 17 - NBM 4 - Trim Core Feature ... 83
Table 18 - NBM 5 – Trim SysType Feature ... 84
Table 19 - NBM 6 - Trim SameEvent Feature ... 84
Table 20 - NBM 7 - Trim ACAT Feature ... 85
Table 21 - NBM 8 – Trim Core and SysType Features ... 86
Table 22 - Trim Core & SameEvent Features ... 86
Table 23 - NBM 10 - Trim ACAT & SameEvent Features ... 87
Table 24 - NBM 11 - Trim SameEvent & SysType Features ... 87
Table 25 - NBM 12 - Trim ACAT & SysType Features ... 88
Table 26 - NBM 13 - Trim Core & ACAT Features ... 88
Table 27 - NBM 14 Trim Core, ACAT, & SysType Features ... 89
Table 28 - NBM 15 Trim ACAT, SameEvent, & SysType Features ... 89
Table 29 - NBM 16 Trim Core, SameEvent, & SysType Features ... 90
Table 30 - NBM 17 Trim Core, SameEvent & ACAT Features ... 90
Table 31 - NBM 18 Trim Core, SameEvent, SysType & ACAT Features ... 91
Table 32 - Model Accuracy Summary ... 91
Table 33 – Contribution of SWI Challenges ... 96
Chapter 1: Introduction
An important event during Software (SW) Development is the SW Integration (SWI) phase, which often occurs as the final phase of the life cycle prior to deployment. The SWI phase brings multiple SW systems (embedded and stand-alone) together to provide an integrated capability, which is similar to other complex system integration events (Lu, Chang, Yang, Zhao, & Chen, 2010). Van Moll and Ammerlaan (2008) define two distinct categories of integration: (1) incremental integration is a phased approach with multiple integration activities throughout software development, while (2) non-incremental integration occurs as a singular event at the end of software development. Big bang integration (van Moll & Ammerlaan, 2008) is a non-incremental method that occurs at the end of the SW development cycle as a combined test and integration phase.
The SW development life cycle is aligned with the systems development life cycle process that includes a series of phases and steps to develop and maintain a SW product.
While a particular developer may use different processes and steps, the SW development life cycle typically includes planning, design, development, build, testing/integration, and deployment of an information system. Because big bang integration occurs at the end of the software development life cycle, any delay can create significant schedule risk.
As systems continue to be more reliant on SW to provide required functionality, the integration of independently developed SW has become common. A study by van Moll and Ammerlaan (2008) concluded that 50% of development project leaders see system integration as a problematic or extremely difficult phase, while other researchers (Mendoza, Pérez, & Grimán, 2006) had similar findings and concluded that
integration is a “complex technological task.” Big bang integration (van Moll & Ammerlaan, 2008) executes SWI at the end of the SW development cycle as a combined test and integration phase. The integration of these SW systems includes legacy systems that were often developed to be stand-alone, and this can result in multiple integration errors. Due to the non-linear relationships and the nature of the interactions between systems, standard predictive models are inadequate for prediction during this phase. The time required to resolve errors resulting from this complex environment is difficult to predict. Historically, schedule estimation for overall SW development has been inaccurate, and the SWI phase in particular is difficult to estimate. The schedule inaccuracy creates unanticipated delays that cause schedule uncertainty and increase schedule risk in the SWI phase of SW development.
The Problem Resolution Cycle (PRC) begins when an error interrupts integration of a specific SW system or systems, and further integration of that SW cannot be fully achieved until the error is resolved. The PRC and the schedule delay it creates are a key focus of this dissertation. The PRC includes the identification of the error, the systems impacted, and the collaboration required in a SWI environment to develop the consensus path forward. The SWI environment includes systems, processes and people with relationships, interactions and interdependencies that form a complex system of systems (Cantot & Luzeaux, 2011), as shown in Figure 1.
For the purposes of the research outlined in this dissertation, the System of Systems (SoS) environment is defined as “multiple, heterogeneous, distributed systems that can (and do) operate independently but can also assemble in networks and collaborate to achieve a goal” (Mane & DeLaurentis, 2009).
Figure 1- SWI Environment As A Complex System
This definition is consistent with the SWI phase of SW development. According to Sousa-Poza and Kovacic (2008), a System of Systems (SoS) environment like the PRC creates a “complex situation” that requires decisions based on “a number of possibilities.”
Essentially, the SoS represents multiple integrated systems that combine to deliver one set of capabilities. When the PRC occurs during the SoS integration, it experiences many of the SoS challenges caused by the involvement of multiple systems, organizations and people.
The PRC is the process, within the boundaries of the SWI environment, that is activated when an error occurs; it is a formal or informal set of activities undertaken to resolve the error. The PRC includes identification of the source of the error (system and organization), the severity of the error, and most importantly, who is responsible for error resolution. A team composed of the tester, Project Manager, Technical Subject Matter Experts and the user generally participates in the PRC. The
importance of the PRC is frequently underestimated, and managers are often not prepared to reliably predict the extent of the schedule delay. A lack of tools to support management decisions during this critical phase is an additional source of risk. Engineering Managers are faced with increasingly complex environments where traditional predictive tools are inadequate. Data Science is an emerging field that relies on machine learning through probabilistic modeling techniques to support prediction in environments with uncertainty, such as the SWI phase. The Naïve Bayes Model (NBM) is one method that uses data science to model probabilistic environments. This dissertation research focused on development of a model to reduce schedule risk during SWI by predicting the schedule delay created by the errors that are routinely found during the SWI phase. Foundational to this research is the premise that an NBM can estimate the impact SWI errors have on schedule. The NBM relies on Bayesian probabilities assigned to relevant features that provide the basis for multiple outcomes expressed as the likelihood of their occurrence.
Real-world data was used to validate the approach for feature selection and the resulting model. The NBM will predict the number of days the SWI event is delayed for a given error based on the features in the model. The predicted integration delay is one of multiple factors that support the decision process. This dissertation research does not include the decision process or other considerations a manager will weigh when determining the fate of a project. Instead, the NBM developed for this research demonstrates how to quantify the schedule delay caused by an error during the SWI phase of SW development.
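The probabilistic basis described above can be written compactly. The following is a sketch in standard Naïve Bayes notation, not a formula quoted from this dissertation; the symbols $D$, $d_k$, $f_i$, and $n$ are introduced here for illustration only. Let $D$ be the schedule delay expressed as one of several delay intervals $d_k$, and let $f_1, \dots, f_n$ be the feature values observed for an error. Assuming the features are conditionally independent given the delay interval (the "naïve" assumption), the posterior probability of each interval is:

```latex
P(D = d_k \mid f_1, \dots, f_n)
  = \frac{P(D = d_k)\,\prod_{i=1}^{n} P(f_i \mid D = d_k)}
         {\sum_{j} P(D = d_j)\,\prod_{i=1}^{n} P(f_i \mid D = d_j)},
\qquad
\hat{d} = \arg\max_{k}\; P(D = d_k \mid f_1, \dots, f_n)
```

The predicted delay $\hat{d}$ is the interval with the largest posterior probability, and the full set of posteriors expresses the likelihood of each alternative outcome.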
1.1 Background and Approach
Historically, schedule estimation for SW development has been inaccurate even though it is critical to project success. Given the uncertainty in estimating the schedule for SW development, research (Senesi, Javernick, & Molenaar, 2015) has shown that probabilistic analysis offers an opportunity to better manage project schedules. Errors from a specific integration event can appear to be unique; however, several researchers have documented common challenges that routinely impact SWI, resulting in errors that create schedule delays. An SWI error is defined as an unexpected outcome that creates a pause in the integration phase of SW development. Often the raw data elements in the error reports generated in this phase contain basic information but do not explicitly represent the interrelationships that are a critical element of the SWI phase. Knowledge Integration can be defined as the process of combining core diverse knowledge internal and external to the normal organizational construct (Kodama, 2011).
Understanding Knowledge Integration is essential for building features that reflect the SWI dependencies and, in combination with feature engineering, provides a deliberate process to transform raw data into knowledge (Davis & Foo, 2016) during pre-processing that can support a predictive model. Similarly, Systems Thinking provides a holistic approach to considering all factors that impact the SWI event.
Despite the importance and uncertainty of SW integration, few tools are available to accurately estimate schedule delays during integration after errors occur. Generally, SW development resource estimation is used to determine the schedule for the entire SW project, which includes the SW Integration phase. However, the existing tools are known to be inaccurate, and in fact evidence from surveys shows the
schedule accuracy rate at 39% (Boehm & Valerdi, 2011). Many factors contribute to this inability to accurately predict the schedule, and several recent researchers have proposed alternative methods that offer increased accuracy. This research proposes a Naïve Bayes Model (NBM) as an accurate predictor of schedule delay caused by a SWI error.
The NBM developed for this research provides a probabilistic schedule delay estimation. The method used to develop the model provides a simple approach to incorporate evidence-based schedule delay prediction to support the decision process.
The CRISP five-step method for model development with modifications to suit this research is used to create the NBM from existing historical data. The research method included data collection from three sources, knowledge integration to develop a learning database, and finally development of the NBM. The initial phases of the model
development used a literature survey of common SWI challenges to develop the features for the model. To determine the set of features that result in the most accurate model, analysis based on Independence testing was used to trim features.
Recent data from prior US Army SWI events was used to validate the approach for model development. The SWI event includes systems that make up the U.S. Army’s modernized communications network, which includes networked capabilities based on legacy software (SW) upgrades as well as new SW-intensive systems. Prior to deployment, significant integration effort is required to assess and certify the readiness of the integrated capability. The error reports resulting from the SWI event were mined based on the SWI challenges identified in the Literature Survey, and the results were used to develop the features that form the basis of the NBM.
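The independence testing mentioned above can be illustrated with a Pearson chi-squared test on a contingency table of two candidate features. The sketch below is a minimal illustration, not the analysis performed in this research: the function names, the 0.05 significance level, and the example counts are all assumptions made for the example, and it assumes every row and column of the table has a non-zero total.

```python
# Critical values of the chi-squared distribution at the 0.05 level,
# indexed by degrees of freedom (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_squared(table):
    """Return the Pearson chi-squared statistic and degrees of freedom.

    `table` is a list of rows; table[i][j] is the number of error reports
    with value i of one feature and value j of the other.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the two features were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def is_dependent(table, critical_values=CRITICAL_05):
    """True when independence is rejected at the assumed 0.05 level."""
    stat, dof = chi_squared(table)
    return stat > critical_values[dof]


# Hypothetical counts for two binary features (illustrative only).
strongly_related = [[30, 10], [10, 30]]
unrelated = [[20, 20], [20, 20]]
print(chi_squared(strongly_related), is_dependent(strongly_related))
print(chi_squared(unrelated), is_dependent(unrelated))
```

A pair of features flagged as dependent in this way would be a candidate for trimming, since a Naïve Bayes model assumes its features are conditionally independent.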
1.2 System Integration and System of System Integration
First, it is important to clarify the relationship among the terms system integration (SI), system of systems integration (SoSI), and software integration (SWI).
For this research, SI is defined by Loutchkina, Jain, Nguyen, and Nesterov (2014) as “the process where fully engineered components and subsystems are linked to each other and made to perform as a unified functional entity.” SoSI is similar to SI, with the distinction that a SoS consists of fully capable systems that each have a specified task and that, when integrated, result in a larger system that delivers unique capabilities that do not reside in any individual system (Vaneman, 2016).
SoSI brings increased complexity because fully functional systems, rather than components and subsystems, are being integrated. These definitions are appropriate for the Army Software Integration event that provided the data used to validate the NBM in this dissertation. The distinction between these definitions is important because each of these activities (SI, SWI, and SoSI) faces similar challenges, which are discussed later in this research as SWI challenges. For the remainder of this study, SWI will be used to define the SoS environment that is the basis for this research. SI Test (SIT) is also used interchangeably with SI. According to Summers (2013), SI test is where managers generally find and fix. Because SI and SIT typically occur together, they will also be considered equivalent.
1.3 Problem Definition
Errors that occur during SWI make it difficult to predict and manage the schedule during this phase of SW development. Research shows that traditional SW Development modeling using engineering data is difficult because of complex non-linear relationships
and outliers (Mori, Tamura, & Kakui, 2013). In spite of this difficulty, researchers have validated the criticality of schedule estimation to software development project success (Frese & Sauter, 2014; Steindl & Mottok, 2012b; Vasantrao, 2012) and recommended various techniques to manage the schedule. Frese and Sauter (2014) surveyed 60 projects, showed schedule control to be a top-three factor in project success, and argued that monitoring the schedule provides an early indicator of problems. Other researchers (Trammell, Moulton, & Madnick, 2016) showed that programmatic factors such as funding instability can make it difficult to reliably predict the schedule impact and offered a dynamic model as a means to assess the impact that funding uncertainty creates in SW Development. These works propose new tools to improve the understanding engineering managers need to facilitate decision-making across the life cycle but do not specifically target the uniqueness of the SWI environment. Schedule volatility is known to be responsible for cost overruns and indicates a lack of management controls. A survey of SW development projects (Little, 2006) concluded that schedules are updated on average eight times during SW development as the impact of uncertainty remained throughout the project, even though the expert judgment of Project Managers (PMs) with an average of 20 years of experience was used to predict the schedule. This heavy dependency on the experience of the PM can be a problem when legacy systems are involved and knowledgeable personnel are no longer with the program. Ideally, individual SW errors are caught well before integration is attempted; however, emergent behaviors in the SoS environment can produce new failures that require a team approach to understand and resolve the error. The resulting SoS environment creates “complex stakeholder relationships” (Davendralingam & Kenley, 2013) due to the technical and
management challenges (Dillon, Paté-Cornell, & Guikema, 2005) that continuously evolve (Davendralingam & Kenley, 2013). Following is a discussion of other research that is intended to highlight the concerns with schedule delay to include increased schedule risk and the impact of bottlenecks that occur during SWI.
1.3.1 Schedule Risk.
Research indicates that 40% of schedule risk is in the integration phase (Steindl & Mottok, 2012b). Vasantrao (2012) acknowledges the importance of schedule estimation and proposes an uncertainty matrix that identifies the source, type and nature of known uncertainty that creates schedule risk. The uncertainty is addressed through a framework based on the work breakdown structure that recommends a five-step modeling process to define the dependencies as the source of the uncertainty (Vasantrao, 2012). As the project evolves and knowledge is gained, the model is recalibrated to show the decrease in uncertainty (Vasantrao, 2012). The paper provided an early look at the framework, but no data was presented to validate the methodology. One concern with this study is that other research demonstrates the uncertainty may not decrease throughout the SWI phase (Little, 2006). However, the Vasantrao (2012) study does recognize the important role dependencies have on schedule. Both studies (Little, 2006; Vasantrao, 2012) recognize the value of using new data to increase knowledge and update prediction models accordingly.
1.3.2 Bottlenecks in SW Integration
Another concern with the inability to predict schedule during SWI is the impact bottlenecks have on deployment of the capability. Petersen, Roos, Nyström, and Runeson (2014) researched causes of bottlenecks in a SoS environment and
recommended a process to improve the integration schedule. This research (Petersen et al., 2014) does not provide the means to resolve the bottleneck. Steindl and Mottok (2012) go further and show how the integration process can be optimized, resulting in an improved integration schedule. This research (Steindl & Mottok, 2012a) concludes that the SW integration schedule is dependent on test complexity and test effort. Steindl and Mottok (2012) based test effort on the number of dependencies between components and the number of test stubs required to test them and showed the potential for a reduction in complexity.
Further, the study concludes that when solved as a multi-objective optimization problem (reduce the test complexity and test effort), test complexity was further reduced (Steindl & Mottok, 2012b). One concern with these conclusions is that the results are based on solving one use case and the paper is not clear on explicitly relating the reduced complexity to the schedule.
These research efforts share an emphasis on schedule: they treat the SW Integration process as a major risk to schedule and acknowledge the impact that dependencies have on schedule, but they do not define the dependencies. While these researchers offer techniques to understand, manage or predict schedule, none of them offers a simple means to quantify the schedule risk through the individual errors that typically create the delay, which is the focus of this research.
1.4 Purpose and Significance Of Dissertation Research
The intent of this research is to develop a NBM to predict the schedule delay created by integration errors as an important tool that supports the decision process. In spite of the uniqueness of each SWI event, historical data can be mined for indicators of common SWI challenges that can then be used as predictor nodes for an accurate model.
To support model development, a survey of current literature was used to document common challenges as the basis to derive features for an NBM that will predict the schedule delay. NBM is chosen because it has been shown to be robust, is easy to develop even with small data samples, and provides a quick analysis tool to predict the likelihood of specified outcomes (Friedman, Geiger, & Goldszmidt, 1997). Research (Senesi, Javernick, & Molenaar, 2015) has shown that probabilistic analysis offers an opportunity to “better manage project schedules.” The model will predict a range of days the software integration (SWI) event is delayed for a given error based on the features in the model, which are defined to represent SWI Challenges documented in the past ten years of research. The predicted integration delay is one of multiple factors that support the decision process. This dissertation does not include the decision process, but rather demonstrates how an accurate assessment of the impact an integration error has on the schedule can be achieved.
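To make the idea of predicting a delay range from error features concrete, the following is a minimal Naïve Bayes sketch in plain Python. It is an illustration under stated assumptions and not the model built in this research, which was developed with commercial software: the function names, the use of Laplace smoothing, the "Dependency" feature label, the "short" and "long" delay classes, and the six toy error reports are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nbm(records, target):
    """Count class and per-feature value frequencies from labelled records."""
    class_counts = Counter(r[target] for r in records)
    value_counts = defaultdict(Counter)  # (feature, class) -> value counts
    feature_values = defaultdict(set)    # feature -> observed values
    for r in records:
        for feature, value in r.items():
            if feature == target:
                continue
            value_counts[(feature, r[target])][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values


def posterior(model, evidence):
    """Posterior probability of each delay class given observed features."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    log_scores = {}
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)  # prior
        for feature, value in evidence.items():
            count = value_counts[(feature, cls)][value]
            k = len(feature_values[feature])
            # Laplace smoothing (an assumption) avoids zero probabilities.
            score += math.log((count + 1) / (n_cls + k))
        log_scores[cls] = score
    peak = max(log_scores.values())
    weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}


def predict(model, evidence):
    """Return the delay class with the maximum posterior probability."""
    post = posterior(model, evidence)
    return max(post, key=post.get)


# Hypothetical error reports (toy data, for illustration only).
reports = [
    {"Severity": "high", "Dependency": "yes", "Delay": "long"},
    {"Severity": "high", "Dependency": "yes", "Delay": "long"},
    {"Severity": "low", "Dependency": "yes", "Delay": "long"},
    {"Severity": "low", "Dependency": "no", "Delay": "short"},
    {"Severity": "low", "Dependency": "no", "Delay": "short"},
    {"Severity": "high", "Dependency": "no", "Delay": "short"},
]
model = train_nbm(reports, target="Delay")
print(predict(model, {"Severity": "high", "Dependency": "yes"}))
```

Working in log space and normalizing at the end keeps the computation stable when many features are multiplied together; the design otherwise follows the textbook counting approach to Naïve Bayes with categorical features.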
Additionally, this dissertation research focused on determining the contribution each feature of the NBM makes to the prediction. The contribution ranking of the features provides the opportunity to focus limited resources on the areas that have the most impact on the schedule delay and potentially avoid the delay altogether in future events.
1.5 Gaps in Schedule Prediction Models
Historically, SW Development resource estimation models have often been inaccurate and have required multiple updates that are made without a reliable predictive tool. Accurate schedule prediction is limited, and Boehm and Valerdi (2011) estimate the accuracy
of the initial resource estimates to be as low as 39% when compared to the actual schedule performance.
Most existing SW prediction models are used to estimate errors for the individual system, which is important to managing the development of that system. One study (Mende, Koschke, & Peleska, 2011) used case study analysis to assess the usefulness of SW defect prediction models that focus on identifying errors in avionics components and concluded that 50% of the component/system errors were not identified by the model.
While system level models have a role in schedule estimation, they lack the ability to fully address the integration of several systems and the impact the resulting integration errors have on schedule delay.
Few schedule estimation models target the SWI phase (Madni & Sievers, 2014). This may be due to the limited amount of data available to assess SWI, since data regarding errors is often proprietary. While it is promising that new modeling approaches for managing complexity are emerging, as discussed further in Section 2.2, most of the current research does not provide data or enough use cases to assess the accuracy or validity of the prediction model. Additionally, these models require significant amounts of data and do not offer a simple approach that users understand and are more apt to use. A 2013 survey of 50 countries concluded that predicting schedule is the main concern of managers (Lopez-Martin, 2015). As discussed in Section 2.2, there is a gap in existing models with respect to providing a predictive tool to adjust schedule during SWI or other phases of SW development based on relevant factors. Further discussion in Section 2.11 will show that the NBM is a viable option to fill the gap as an accurate schedule estimation tool for the SWI phase.
1.6 Research Scope and Limitations
This study focused on a novel approach to develop features for a NBM that accurately predicts schedule delay. The research targeted three key objectives.
1. Develop and validate the accuracy of the NBM to predict the schedule integration delay created by SWI errors.
2. Develop the NBM features through a Literature survey of recent SWI Challenges.
3. Determine the contribution of each feature to the predicted delay based on the most accurate model.
The assumptions and limitations of this dissertation research are provided as:
1. Each error report is considered an independent use case.
2. The Army SWI approach that produced the error reports is a “big bang” approach executed at the end of the SW development cycle; phased integration approaches are not included.
3. This study does not address the decision process, including whether the predicted delay is acceptable to the manager.
4. The data used to support this study has been sanitized so that systems are anonymous; however, the facts for the actual data elements were not modified.
1.7 Summary of Dissertation Organization
The remainder of this research is presented with the following organizational structure:
Chapter 2 provides the Literature Review, which includes a review of 30 sources of SWI Challenges that provide insights into the uncertainty found during SI and SOSI. Additionally, a review of literature regarding current models related to SW development, with a focus on schedule, is included. Other topics in the review include a discussion of Systems Thinking and Knowledge Integration as fundamental concepts for defining the SWI Challenges that serve as features for the NBM. The remainder of the review focuses on defining and assessing the NBM theoretical framework and the use of the Cross-Industry Standard Process for Data Mining (CRISP-DM) as the basis for model development.
Chapter 3 is the Research Method, which includes data collection, development of a Learning database, and the NBM development. Key to the NBM development is the use of feature engineering to select the NBM features that result in the most accurate model.
A discussion of the different measures of accuracy includes the prediction set and global accuracy. The confusion matrix and other measures that are unique to classification models are used to assess the accuracy on the most relevant integration delay intervals (IDI).
Chapter 4 includes the results of NBM variations based on the feature selection process. This chapter also provides the accuracy assessment of each NBM and the contribution of each feature in the most accurate model. Finally, Chapter 5 includes the Conclusions of the research and recommended Future Work.
Chapter 2 – Literature Review
This dissertation research focused on Software Integration (SWI) testing activities and the impact that errors routinely found during this phase of system development have on schedule. The literature review presents a body of relevant work from the past ten years that focuses on three important areas: the SWI environment, SWI challenges, and NBM development, with discussion of factors important to an NBM. These focus areas define the difficulties with schedule prediction in the SWI environment, including the challenges inherent in a SoS environment.
The literature review begins with a discussion of the SWI environment (Section 2.1) and the difficulties it presents to managers. Next, a review of current SW development estimation models (Section 2.2) is provided, with a focus on efforts that recognize uncertainty and on the differing approaches for increasing accuracy in predicting the schedule during SW development. The models reviewed include fuzzy logic models (FLM), parametric models (PM), simulation models, and probabilistic models, as representative of the uncertainty that is prevalent in SW development estimation. The roles of Systems Thinking (Section 2.3) and Knowledge Integration (Section 2.4) are reviewed as concepts that support understanding prediction during SWI. A survey of journal articles is used to determine the SWI challenges (Section 2.5) routinely experienced during SWI, with the intent of developing NBM features. Finally, literature that defines the important aspects of Bayesian probabilities (Section 2.8) and the NBM development process (Section 2.9) is discussed.
2.1 SWI Environment
The SWI environment is the culmination of the SW Development phase, where multiple systems that are required to operate as a unified capability are integrated and tested. The SWI environment includes the software systems and the associated hardware designated for integration, the project managers and system engineers, the testers with their associated test equipment, and the user. Generally, legacy and new systems participate in the integration event. According to VanMoll (2008), a survey of over 60 complex projects showed that 50% of development project leaders see system integration as “problematic or an extremely difficult phase.” SWI typically occurs at the end of the SW Development phase, and some estimate that it utilizes 40% of the resources required for SW Development (Steindl & Mottok, 2012b). Many of these resources are utilized to support root cause analysis and problem resolution when an error occurs.
Often the systems engineers and project managers or system owners are collaborating for the first time, resulting in a PRC that is frequently underestimated or approached by managers who are not well equipped to predict the extent and impact the errors will have on the schedule. This dissertation researches how managers can represent the SWI environment challenges as NBM features to predict schedule impact after an error occurs.
The general proposition is that SWI challenges: (1) create errors that cause schedule delays that can be predicted by an NBM and (2) enable feature selection through knowledge integration. The relationships between SWI challenges, SWI errors, and the development of an NBM are depicted in Figure 2.
Figure 2 - SWI Environment Relationships to NBM
2.2 Existing SW Development Estimation and Prediction Models
Researchers who studied SW development resource estimation models, including schedule models, generally focused on understanding and mitigating the role of uncertainty. These studies use different approaches to define the causes of uncertainty and propose different prediction techniques to improve traditional estimation models.
Most of these models focus on individual software development, where most of the expertise and system understanding is found (Madni & Sievers, 2014).
However, these SW development prediction tools do not recognize the unique challenges that occur during the integration phase. Many linear models are limited in their accuracy because of the nonlinear relationships as well as the multiple dynamic and interrelated dependencies within a SOS. However, there are models that are intended to provide better accuracy in complex environments such as SW integration. Representative models are reviewed below to show current approaches to prediction in the SW development environment.
[Figure 2 diagram labels: SW Integration Challenges, SW Integration Errors, Schedule Delay, Naïve Bayes Model, Feature Selection, Knowledge Integration; relationship labels: create, result in, estimated by, enable, through, supports]
2.2.1 Fuzzy Logic Model Using Function Points
One category of prediction models uses function points (FP) to improve schedule prediction in complex environments. López-Martín, Chavoya and Meda-Campaña (2015) proposed an FLM using adjusted function points (AFP) as an independent variable to estimate new project size with the intent to improve the development schedule prediction. The AFP is a composite variable that includes nineteen independent variables, among them internal logical files, external inquiries, and fourteen characteristics that are specific to the project (López-Martín et al., 2015). The resulting mathematical model was validated using 20 SW projects as use cases, based on their mean absolute residuals (MAR, the mean of the absolute differences between actual and predicted schedule). The research concluded that the FLM, at 33%, is more accurate than simple linear regression (SLR), at 29%. A similar assessment of FPs calibrated by FLM (Kaushik, 2013) also reported small but promising improvements over traditional SW development estimation. Kaushik (2013) used artificial data to reach these conclusions.
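To make the MAR comparison concrete, the following minimal Python sketch computes the mean absolute residual for two sets of predictions. The function name and the schedule values are hypothetical and chosen only for illustration; they are not data from the cited studies, and the model labels do not correspond to the FLM or SLR results reported above.

```python
from statistics import mean


def mean_absolute_residual(actual, predicted):
    """Return the mean of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical schedule durations in months (illustrative only).
actual_months = [10.0, 14.0, 8.0, 20.0]
model_a_months = [9.0, 16.0, 8.0, 17.0]
model_b_months = [12.0, 11.0, 10.0, 15.0]

mar_a = mean_absolute_residual(actual_months, model_a_months)
mar_b = mean_absolute_residual(actual_months, model_b_months)

# A smaller MAR means the predictions sit closer to the actual schedule.
print(f"MAR model A: {mar_a:.2f} months, MAR model B: {mar_b:.2f} months")
```

Because the MAR is an error measure, the model with the lower value is the one whose predictions are closer, on average, to the actual schedule.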
2.2.2 Probabilistic Models
Bayesian Belief Networks (BBN) are also used for schedule prediction to quantify schedule risk. A BBN is a directed acyclic graph (DAG) method for assessing the relationships between elements in a probabilistic manner. Each BBN node can have multiple parents and multiple child nodes, resulting in a complex set of relationships. For this reason, BBNs rely heavily on expert judgment, surveys, and interviews to establish the variables and their relationships that are essential to the accuracy of the BBN (Luu, Kim, Tuan & Ogunlana, 2009). BBNs also require a significant amount of data domain expertise regarding multiple relationships. A 2014 survey (Misirli & Bener, 2014) shows that most BBNs lack data and that the majority depend solely on expert knowledge; it described determination of the conditional probabilities as time consuming and difficult. The same survey shows BBNs used for SW engineering primarily focused on system level fault prediction (Misirli & Bener, 2014). No percentage was provided for schedule prediction explicitly; however, 27% of survey respondents reported SW Engineering Management as the prediction subject, which includes schedule and other resource estimation factors.
The NBM offers a simpler approach: it is a special case of the BBN that has one parent (target) and multiple child nodes (features or predictors), while retaining the accuracy of the BBN.
The NBM looks only at the dependencies between the target and predictors, which simplifies the model and the understanding required to analyze the results. Also, there is evidence that NBMs can be successfully constructed from data and need not rely on the expert judgment that is typically used to create decision tools (Loutchkina et al., 2014). Because of its simplicity, the NBM is widely used in different environments. Defect prediction models provide an example of NBMs that are aimed at software component reliability (Steindl & Mottok, 2012a) during the SW development cycle and do not specifically focus on errors created during the SW integration and test phase (Mende et al., 2011).
Further discussion on NBM and the value of Bayesian probabilities in predicting in complex environments is provided in Sections 2.6 through 2.9.
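The one-parent, multiple-child structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model developed in this dissertation: the feature names (severity, acat), the delay classes (short, long), the example records, and the use of add-one (Laplace) smoothing are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(records, target_key):
    """Count target classes and per-class feature values from categorical records."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (feature, class) -> Counter of values
    feature_values = defaultdict(set)    # feature -> set of observed values
    for rec in records:
        cls = rec[target_key]
        class_counts[cls] += 1
        for feat, val in rec.items():
            if feat == target_key:
                continue
            value_counts[(feat, cls)][val] += 1
            feature_values[feat].add(val)
    return class_counts, value_counts, feature_values


def predict_posteriors(model, evidence, alpha=1.0):
    """Return P(class | evidence), treating features as independent given the class."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior probability of the target class
        for feat, val in evidence.items():
            k = len(feature_values[feat])
            # Smoothed conditional probability P(feature = val | class); alpha is an assumption.
            score *= (value_counts[(feat, cls)][val] + alpha) / (n_cls + alpha * k)
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical error reports (illustrative only, not study data).
records = [
    {"severity": "high", "acat": "I", "delay": "long"},
    {"severity": "high", "acat": "II", "delay": "long"},
    {"severity": "low", "acat": "I", "delay": "short"},
    {"severity": "low", "acat": "II", "delay": "short"},
    {"severity": "high", "acat": "I", "delay": "long"},
    {"severity": "low", "acat": "II", "delay": "short"},
]

model = fit_naive_bayes(records, target_key="delay")
posteriors = predict_posteriors(model, {"severity": "high", "acat": "I"})
print(posteriors)
```

Only the dependencies between the target and each feature are estimated, which is why the model can be built directly from counts in historical data.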
2.2.3 Parametric Models
Parametric models with modifications to accommodate the complexity that most system integration venues now experience have shown promise. Loutchkina et al. (2014) provide a parametric model (PM) that estimates the technical risk that occurs during integration. This model combines a parametric model with a Bayesian Belief Network (BBN) to understand and predict the potential risks a system will face for a given integration environment. Because historical data were unavailable to provide the node probabilities, the researchers (Loutchkina et al., 2014) chose to use PMs to provide the conditional probabilities required for the BBN. This approach (Loutchkina et al., 2014) focuses solely on the technical risk and does not discuss the potential schedule impact that is related to the interdependencies that occur during SWI, nor are the project and organizational related factors considered. The BBN represents the probabilistic relationships between 69 nodes, while the PMs define risks specific to a project. One project use case was used to evaluate the process for the modeling concept, with no accuracy measure provided. Additional discussion on BBNs is provided in Section 2.2.2 above.
2.2.4 Simulation Models
A review of research on simulation models to support schedule estimation shows a focus on process. Mizell and Malone (2007) provide a method to assess the uncertainty throughout the lifecycle of the project by focusing on the three parameters that are key to the initial SW cost estimation (which includes the schedule). The outcome presented by Mizell and Malone (2007) provides the manager with an understanding of the magnitude of the uncertainty at each phase of the development, including integration. This information will help the manager make better estimates, as Mizell and Malone (2007) showed through the assessment of the model on two projects. However, there is no evidence that this initial estimate is sufficient to support schedule adjustment during SW integration.
In other research, Houston (2014) proposed a method using discrete event simulation to forecast the duration of test-fix-test (TFT) cycles during SW development. The model required inputs regarding the constraints of the test environment, such as the number of test beds and the number of errors in the queue, and modeled the rework cycle given these constraints. The TFT work is focused on process and technical parameters, while the proposed NBM in this dissertation approaches schedule prediction through machine learning of historical data and does not explicitly consider the process. The TFT discrete simulation can also be considered complementary to the proposed NBM since both are attempting to bring attention and analysis to this important phase of SW development. The accuracy of the TFT model ranged from 49-79%; however, the independent variables used for the TFT model are not applicable to the SWI environment.
2.3 Systems Thinking
Systems thinking provides the mental model and insights into the holistic environment (Lamb & Rhodes, 2009) that supports understanding the SWI environment.
The SWI environment is important to support development of variables that contribute to the delay created by the iterative nature of the PRC. Systems Thinking contributes to the recognition that solutions in complex environments are influenced by factors that include technical (HW/SW System) and organizational factors (Madni & Sievers, 2014).
As shown in Figure 3, these relationships and interdependencies are exhibited as the SWI Challenges that are discussed in Section 2.5.
Figure 3 - SWI Environment in Systems Thinking
Felder (2012) attributes the increased system complexity experienced in the aerospace industry to integrated systems capabilities that necessitate collaboration as key to systems thinking. Lamb and Rhodes (2009) also discuss collaborative systems thinking as an essential component of resolving and managing SWI due to the multiple dynamics that are at work in complex environments where interdependent systems are required to meet the user’s integrated capabilities requirements (Lamb, 2010). Lamb (2010) concludes that complexity has both organizational and technical implications that must be considered to solve critical issues and errors. During the Literature Survey, as the data was extracted from existing problem reports, attention was given to both types of variables as potential causal factors that contribute to the integration delay.
[Figure 3 diagram labels: Systems Thinking, SWI Environment, HW/SW Systems, Project Management, Subject Matter Expert, Testers, System Integration Factors, Influences]
Langford (2012) advocates for considering systems thinking during system integration as the way to bring context to SI data. By viewing SWI through the context of systems thinking, the feedback loop between the integration event and the problem resolution cycle provides insights into the factors that contribute to the schedule delay. Balancing this “paradoxical nature” of the SoS is rooted in SoS thinking (B. J. Sauser, Reilly, & Shenhar, 2009). SoS thinking is defined as a derivation of systems thinking that offers a methodology that considers the holistic rather than the individual (B. Sauser & Boardman, 2008).
According to researchers (Felder, 2012; Lamb & Rhodes, 2009; Langford, 2012), systems thinking provides the rationale for defining the boundaries of system integration to include the organizational influences beyond the physical integration event. By looking at the expanded environment, the PRC must include the technical product and relevant organizational and project management factors. Allowing for both acknowledges the complexity reflected in the combination of factors that create the delay.
2.4 Knowledge Integration
For this dissertation, knowledge integration is used to demonstrate the combined impact of sources that are used to define the features for the NBM. Knowledge integration is the “combination of existing knowledge to create new knowledge” (Sargis-Roussel & Deltour, 2012). For system integration, knowledge discovery is a key component of data preprocessing to leverage various sources that support understanding influences in the SoS that impact schedule. The accuracy of the NBM, similar to other models, is heavily dependent on these features (Strumbelj & Kononenko, 2014), which represent the independent variables in the NBM.
2.4.1 Knowledge In A SoS Environment
Knowledge is often dispersed in a SoS environment (Patnayakuni, Rai, & Tiwana, 2007) and thus requires merging disparate knowledge sources to accumulate actionable information. In order to transform the raw data from the Army SW Integration event into features that can be used for prediction, the SWI challenges presented in a focused literature survey were used to extract information that maps to one or more SWI challenges. Information sources external to the event were also included as features to further enable the model, which aligns with the systems thinking that recommends expanding the system boundaries to more accurately reflect (Rebovich, 2006) those influences on the prediction. Knowledge integration for this research relied on the literature survey to establish SW integration challenges that are then used to elicit knowledge from historical data and external sources to create the features for the NBM.
Knowledge integration also includes the intent to institutionalize the newly created knowledge, which for this research resulted in a learning database for the model.
The combined knowledge from the literature survey, raw data preprocessing, and external information sources was used to create the training database. The external data consist of the acquisition category of the system with the error, which serves as an indicator of the programmatic influences on the delay. The acquisition category is assigned by the acquisition executive and provides the level of oversight for the program (DOD directive 5000.1, 2015).
2.4.2 Feature Engineering and Feature Selection
Feature Engineering (FE) is a critical preprocessing step in Knowledge Discovery and Data Mining (KDDM). The feature engineering for this research is a cycle that results in feature selection that improves the accuracy of the model based on analysis thresholds set for features, as shown in Figure 4. Research has shown that additional nodes increase the complexity of the NBM but do not necessarily increase accuracy (Holte, 1993).
Figure 4 - Feature Engineering Cycle
[Figure 4 diagram labels: Features, Independence Test, Decision (Trim?), Model Accuracy]
While there is emerging work that investigates automating this process, FE is still primarily manual. The intent of FE is to transform raw data into derived features that are meaningful to the domain being modeled (Davis & Foo, 2016). FE typically requires domain knowledge to initiate the features, followed by a process to trim features based on a constraint regarding the strength of the contribution each feature makes toward the prediction. Mapping the data to knowledge sources provides context to the features. In this dissertation, subsets of the features were used to show their impact on model accuracy, with the intent to determine the minimum number of features that provide the most accurate model. NBM accuracy is heavily dependent on the features, so significant effort was spent on this process. As stated earlier, the NBM assumes independence between features; however, Domingos (1997) showed the model could still be effective in spite of not meeting this assumption. Other researchers recommend that variables that show dependence with multiple features be trimmed to determine if the accuracy improves. For this dissertation, a combined manual and SW driven process was used. To improve accuracy, the G-test for Independence was used to trim features that were determined to have a dependency with more than one other feature. The G-test assesses whether two variables are independent.
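As an illustration of the pairwise independence check described above, the following minimal Python sketch computes the G statistic for a contingency table of two categorical features. The function names, the example counts, and the 0.05 significance level are assumptions introduced for illustration; they are not taken from the study's data or tooling. The p-value helper applies only to a 2x2 table (one degree of freedom).

```python
import math


def g_test_statistic(table):
    """Return (G, degrees of freedom) for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute zero to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g, dof


def p_value_one_dof(g):
    """Upper-tail chi-square probability, valid for one degree of freedom only."""
    return math.erfc(math.sqrt(g / 2.0))


# Hypothetical counts of two binary features across error reports (illustrative only).
independent_table = [[10, 20], [20, 40]]
dependent_table = [[30, 10], [10, 30]]

g_ind, dof_ind = g_test_statistic(independent_table)
g_dep, dof_dep = g_test_statistic(dependent_table)
p_dep = p_value_one_dof(g_dep)

# Assumed decision rule: flag the pair as dependent when p < 0.05
# (equivalently G > 3.841 for one degree of freedom).
print(f"G={g_ind:.3f} (independent case), G={g_dep:.3f}, p={p_dep:.6f} (dependent case)")
```

Under this sketch, a feature would become a candidate for trimming when it is flagged as dependent with more than one other feature, which mirrors the rule stated in the text.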
2.5 SW Integration Challenges Based on Literature Survey
Understanding the SWI challenge is an essential step in developing a model to represent the important attributes that impact the prediction of schedule delay. While individual errors from any one event can appear to be unique, several researchers have documented categories or types of errors routinely experienced during SI that represent common challenges. This research has identified common contributors to problems or risks in the SWI phase. To document these challenges, a literature survey of recent research was conducted to define SW Integration challenges that potentially impact the integration schedule. Each of the articles included in the survey is discussed below with respect to the SWI challenges presented in the article. The relevant literature was limited to peer reviewed journal articles and conference papers based on the results of online journal searches. Each search focused on the period from 2006-2016. The original searches for SW integration challenges resulted in few relevant articles that focused on SW integration in a System of Systems environment. Therefore, subsequent searches were expanded to include the following:
• SW Integration challenges in a SoS environment
• SW Integration testing challenges in a SoS environment
• SW Resource Estimation (Focused on SW integration phase)
• SW Schedule Estimation (Focused on SW integration phase)
The results of the literature survey were used to reveal SWI challenges that are commonly discussed in research with the intent to quantify their impact on schedule delay. The definition of each challenge identified was aligned with data from the Army SWI events to determine the frequency, prior and posterior probabilities required to define and predict the schedule impact of an error. A summary of the survey findings is provided in Table 1.
Table 1 - Literature Survey Challenges Summary

SWI Challenge                                             Frequency of Mention in 30 Sources*
Systems Interdependencies                                 60%
Independent Management                                    57%
Technical Risks                                           33%
Non-Technical Risks (Project Mgmt, Business, Resources)   37%
SoS/Complexity                                            73%

*Note: Each journal article can include multiple SWI challenges.

The challenges are all presumed to contribute to schedule risk (Gandhi, Gorod, & Sauser, 2011) and are also considered characteristics of a SoS (B. Sauser & Boardman, 2008). The SWI Challenges include System Interdependencies, Independent Management, Technical Risk (Quality & Emergence), Non-Technical Risk (Project Management related), and SoS Complexity. Each of the SWI challenges found during the Literature Survey is discussed further in section 2.7.
2.5.1 System Interdependencies
This challenge is a reflection of the physical or technical interfaces that create a dependency between systems. Mane and DeLaurentis (2009) demonstrate that the interdependencies between systems increase the time to complete integration. Rather than empirical case data, the researchers used simulated scenarios in U.S. Department of Defense acquisition to model the hypothesis and reach their conclusions.
Other authors recognize the role of system interdependencies in SoS performance. A key characteristic of the SoS is emergent behavior that is holistic and cannot be exacted from an individual system (Dahmann, Rebovich, Lane, Lowry, & Baldwin, 2012). Emergent behavior is good when it results in a desirable, planned outcome; however, when the outcome is unexpected and unwanted, the behavior results in an error (Ferreira, Faezipour, & Corley, 2013). The interoperability challenge created by information exchanges, data sharing, or dynamic connections required to complete a thread (Felder, 2012) is dependent on the quality of the interfaces (B. Sauser, Reilly, & Shenhar, 2015), which rely on coordinated configuration management of the individual systems (Felder, 2012). System upgrades that are at different phases of the lifecycle and the fact that systems are being developed in parallel (Petersen et al., 2014) are also problematic when resolving integration errors. Additionally, individual system optimization does not ensure SoS optimization (Jain, Chandrasekaran, Elias, & Cloutier, 2008), which relies on understanding the interdependencies and the individual system contribution. The collateral impact of these interdependencies creates further difficulties in problem resolution because the “cause and effect” relationships are not easily discernible at an individual system level (Jain, 2010).
The totality of the challenges created by interdependencies is the source of “bottlenecks” (Petersen et al., 2014) in the SW development phase. Modeling these interdependencies (Loutchkina et al., 2014) is offered as a means to understand these impacts and risks in the SoS environment. Also, representing the SoS capability through architecture (DiMario, Cloutier, & Verma, 2008) is recommended to fully understand the complex interrelationships.
2.5.2 Independent Management (Autonomy)
Understanding the system interdependencies is critical to the PRC; however, execution and implementation of a process makes the role of management just as essential. Independent management of the component systems is another challenge of the SoS. Many of the same concerns with the physical interdependencies established within the SoS are equally problematic with the independent management processes during the SW integration phase.
Research indicates that the traditional SE management process is insufficient to be successfully applied to the SoS environment (Baldwin & Sauser, 2009). The functional independence with separate chains of command adds to the process complexity (Felder, 2012). During the PRC, the collaboration among stakeholders needed to troubleshoot and determine the source of the error is often managed without an authoritative leader (Dahmann, Lane, Rebovich, & Lowry, 2010). Often, processes such as risk management are implemented at the individual system level but are not adequately managed at the SoS level (Gandhi et al., 2011). Individual system errors can be triggered by a different system, which can result in the modification of yet another system. The time to collaborate within and across management lines of authority to resolve an error often extends the PRC due to the lack of “span of control” (Mane & DeLaurentis, 2009) at the SoS level. Mane and DeLaurentis (2009) also concluded that the extent of the “direct or indirect” span of control managed at the SoS level is an even larger contributor to the time required for integration. This work further concluded that the variation in time to integrate is related to the span of control, with more control taking less time to integrate (Mane & DeLaurentis, 2009). It is reasonable to expect that this trend still holds when an error occurs; however, only hypothetical proof was provided. Many times, these management challenges are not resolved prior to the integration event (van Moll & Ammerlaan, 2008), resulting in increased schedule delay when an error occurs. Understanding the correlation and co-dependency of the systems with the SoS is key to managing the environment (Davendralingam & Kenley, 2013; B. J. Sauser et al., 2009) regardless of the role of the individual management chains.
2.5.3 Software Integration (SWI) Risks
Research on areas of SW risk (Chittister & Haimes, 1996) includes systems and organizational factors and further recognizes the contribution of technical and non-technical risks. The researchers also acknowledge the overlap of these considerations with factors that include “temporal (life cycle), leadership, environment, acquisition process, quality and technology”. By acknowledging these risks, there is the opportunity to understand and quantify their impact on schedule. Sections 2.5.3.1 and 2.5.3.2 below describe these risks and research on how they can be identified and defined.
2.5.3.1 Technical Risk
The technical risks associated with the SoS are based on uncertain emergent behaviors and can also be traced to the quality of the constituent systems (Ferreira et al., 2013). Technical risks are those that are directly traced to the system HW or SW. These risks are susceptible to emergent behavior. While emergent behavior is a desirable feature of the SoS when predicted and controlled, the opposite is true when the behavior is undesired (Ferreira et al., 2013). Both of these behaviors are a result of the interactions of the systems, but only those that are undesired properties (Ferreira et al., 2013) present a risk to the SoS. The risk process can anticipate and plan for expected risks, but unexpected risks represent the challenge to the PRC that requires collaboration and combined effort to resolve. With the existing data available for this research, emergence can be mined within the categories of data collected, such as the type, severity and frequency of errors. Many of the technical risks found during integration testing reflect on the quality of the affected system. Each of these categories is also an indicator of system quality, although no further specificity is available in the data.
2.5.3.2 Non-Technical Risk
This category is used to discuss research on what is traditionally considered project management, which primarily focuses on business and resource management, with an emphasis on the activities that likely impact schedule. While these factors are external to the specific ongoing integration event, they may constrain the PRC resolution options available to systems managers. The need for cross functional and cross organizational collaboration (Thamhain, 2013b) is a factor in the time needed to conduct root cause analysis, including the allocation of required personnel resources. The lack of SI processes that support management in this phase is indicative of the continued emphasis solely on the physical challenges (Jain, 2010).
Alignment of stakeholder objectives and project management, including funding, requires agreed-upon governance (Dahmann, 2015) that is often lacking in the SoS environment. Implementing this alignment must also support the programmatic milestones approved for the program's acquisition category (ACAT), and the process differs depending on the ACAT as defined in DoD Directive 5000 (2015). This categorization creates varying levels of oversight that can have an overall impact on the PRC. According to Thamhain (2013b):
“The involvement of many people, processes, and technologies spanning different organizations, support groups, subcontractors, vendors, government agencies... compounds the level of uncertainty and distributes risk over a wide area…. often creating surprises with potentially devastating consequences.”
In spite of these challenges and risks, Project Management as traditionally executed does not provide tools and processes intended for the SoS environment and its uncertainty (Thamhain, 2013b). Through surveys of prior U.S. Department of Defense aerospace projects, Davendralingam and Kenley (2013) measured the complex relationships among these stakeholders to determine the impact of specified risk categories on cost and schedule. The survey concluded that “more complex stakeholder relationships demonstrate significantly higher schedule delay” (Davendralingam & Kenley, 2013).
2.5.4 Complexity
Since the 1990s, researchers have recognized and tried to quantify SW complexity (Basili, Briand, & Melo, 1996) by focusing on individual and internal influences specifically related to the technical aspects of a system. However, there is no single way to define or measure complexity (Maurer, Schneller, & Omer, 2014). Current research shows that the uncertainty associated with complex environments creates a nonlinear challenge and results in increased program risk (Madni & Sievers, 2014), which is considered responsible for multiple failed projects in the public and private sectors. Multiple technical and non-technical factors create the complexity experienced in the SWI environment. Complexity can also include programmatic or organizational influences as non-technical risk factors. SI should consider “lifecycle, architecture, process, interfaces, enterprise, product and data” (Madni & Sievers, 2014) to be among the critical factors that bring complexity to SI. Research on the rationale for the continued underestimation of SI effort, particularly with regard to schedule, indicates the significant role of complexity in environments such as SW integration. Similar research by Thamhain (2013b) identifies organizational factors as the key to success within a complex environment and further argues that integrating cross-organizational resources is a source of uncertainty. Sauser (2008) concludes that this complexity is driven by a convergence of individual organizations, unique and sometimes overlapping capabilities, separately controlled organizations and a schedule that is never executed as originally planned. The combined effects of these challenges provide the characteristics of complexity even if there is no consensus definition.