Chapter 2 – Literature Review
2.2 Existing SW Development Estimation and Prediction Models
Researchers studying SW development resource estimation models, including schedule models, have generally focused on understanding and mitigating the role of uncertainty. These studies use different approaches to define the causes of uncertainty and propose different prediction techniques to improve traditional estimation models.
Most of these models focus on individual software development efforts, where most of the expertise and system understanding is found (Madni & Sievers, 2014).
However, these SW development prediction tools do not recognize the unique challenges that occur during the integration phase. Many linear models are limited in their accuracy because of the nonlinear relationships and the multiple dynamic, interrelated dependencies within a system of systems (SOS). Nevertheless, some models are intended to provide better accuracy in complex environments such as SW integration. Representative models are reviewed below to show current approaches to prediction in the SW development environment.
2.2.1 Fuzzy Logic Model Using Function Points
One category of prediction models uses function points (FP) to improve schedule prediction in complex environments. López-Martín, Chavoya and Meda-Campaña (2015) proposed a fuzzy logic model (FLM) that uses adjusted function points (AFP) as the independent variable to estimate new project size, with the intent of improving development schedule prediction. The AFP is a composite variable built from nineteen independent variables, which include internal logical files, external inquiries, and fourteen project-specific characteristics (López-Martín et al., 2015). The resulting mathematical model was validated on 20 SW projects using the mean absolute residual (MAR), the mean of the absolute differences between actual and predicted schedule. The research concluded that the FLM was more accurate, at 33%, than simple linear regression (SLR), at 29%. A similar assessment of FPs calibrated by an FLM (Kaushik, 2013) also reported small but promising improvements over traditional SW development estimation, although Kaushik (2013) used artificial data to reach these conclusions.
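As a hedged illustration of the accuracy measure used above, the following Python sketch computes the mean absolute residual for two sets of schedule predictions. The schedule values, the variable names, and the function name are assumptions made for illustration only; they are not data or code from López-Martín et al. (2015).

```python
def mean_absolute_residual(actual, predicted):
    """Return the mean of |actual - predicted| over all projects."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical schedules in months (illustrative values only).
actual = [10.0, 14.0, 8.0, 20.0]
model_a = [9.0, 15.0, 8.0, 18.0]   # predictions from one hypothetical model
model_b = [12.0, 11.0, 9.0, 16.0]  # predictions from another hypothetical model

mar_a = mean_absolute_residual(actual, model_a)  # (1 + 1 + 0 + 2) / 4
mar_b = mean_absolute_residual(actual, model_b)  # (2 + 3 + 1 + 4) / 4
```

A lower MAR indicates predictions that are closer to the actual schedules on average, so in this invented example the first model would be preferred.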
2.2.2 Probabilistic Models
Bayesian Belief Networks (BBN) are also used in schedule prediction to quantify schedule risk. A BBN is a directed acyclic graph (DAG) model that represents the relationships between elements probabilistically. Each BBN node can have multiple parent and multiple child nodes, resulting in a complex set of relationships. For this reason, BBNs rely heavily on expert judgment, surveys, and interviews to establish the variables and relationships that are essential to the accuracy of the BBN (Luu, Kim, Tuan & Ogunlana, 2009). BBNs also require a significant amount of data and domain expertise regarding these multiple relationships. A survey by Misirli and Bener (2014) shows that most BBNs lack data and that the majority depend solely on expert knowledge, with the determination of the conditional probabilities described as time consuming and difficult. The same survey shows that BBNs used for SW engineering focused primarily on system-level fault prediction (Misirli & Bener, 2014). No percentage was provided for schedule prediction explicitly; however, 27% of survey respondents reported SW Engineering Management, which includes schedule and other resource estimation factors, as the prediction subject.
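To make the role of the conditional probabilities concrete, the sketch below builds a three-node BBN with two parent nodes and one child node and queries it by enumeration. The node names and every probability are hypothetical values chosen for illustration; they are not taken from any of the cited studies.

```python
from itertools import product

# Hypothetical priors for two binary parent nodes.
p_volatile = {True: 0.3, False: 0.7}  # requirements are volatile
p_novice = {True: 0.4, False: 0.6}    # team is inexperienced

# Hypothetical conditional probability table:
# P(schedule slip = True | volatile, novice).
p_slip_given = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.1,
}


def joint(volatile, novice, slip):
    """Joint probability from the DAG factorization P(V) * P(N) * P(S | V, N)."""
    p_s = p_slip_given[(volatile, novice)]
    return p_volatile[volatile] * p_novice[novice] * (p_s if slip else 1.0 - p_s)


# Marginal probability of a slip, summing over both parents.
p_slip = sum(joint(v, n, True) for v, n in product([True, False], repeat=2))

# Diagnostic query via Bayes' rule: P(volatile | slip observed).
p_volatile_given_slip = sum(joint(True, n, True) for n in [True, False]) / p_slip
```

Even this small network needs four conditional probabilities for the child node. With binary nodes, a child with k parents needs 2^k such entries, which is one reason eliciting them from experts becomes time consuming.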
The naive Bayes model (NBM) offers a simpler approach. It is a special case of a BBN that has one parent node (the target) and multiple child nodes (the features or predictors), while retaining the accuracy of the BBN.
The NBM looks only at the dependencies between the target and the predictors, which simplifies the model and the understanding required to analyze the results. There is also evidence that NBMs can be constructed successfully from data rather than relying on the expert judgment typically used to create decision tools (Loutchkina et al., 2014). Because of its simplicity, the NBM is widely used in different environments. Defect prediction models are one example: these NBMs are aimed at software component reliability during the SW development cycle (Steindl & Mottok, 2012a) and do not specifically focus on errors created during the SW integration and test phase (Mende et al., 2011).
Further discussion of the NBM and the value of Bayesian probabilities for prediction in complex environments is provided in Sections 2.6 through 2.9.
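The one-target, many-predictors structure described above corresponds to a naive Bayes classifier, and the claim that such a model can be built from data alone can be sketched as follows. This is a minimal illustration with add-one smoothing, not the model developed in this dissertation; the feature names, the class labels, and the six training rows are invented.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def posterior(model, row):
    """Return P(label | row) for each label, using add-one smoothing."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(label)
        for i, value in enumerate(row):
            # Smoothed likelihood P(feature i = value | label).
            score *= (value_counts[(i, label)][value] + 1) / (n + len(values_seen[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical history: (open defect level, number of interfaces) -> outcome.
rows = [("high", "many"), ("high", "few"), ("low", "many"),
        ("low", "few"), ("high", "many"), ("low", "few")]
labels = ["late", "late", "on_time", "on_time", "late", "on_time"]

model = train(rows, labels)
result = posterior(model, ("high", "many"))
```

Every probability in this sketch comes from counting the training rows, so no expert-elicited conditional probability table is required. The cost is the assumption that the predictors are independent given the target.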
2.2.3 Parametric Models
Parametric models modified to accommodate the complexity that most system integration environments now experience have shown promise. Loutchkina et al. (2014) provide a parametric model (PM) that estimates the technical risk that occurs during integration. This approach combines PMs with a Bayesian Belief Network (BBN) to understand and predict the potential risks a system will face in a given integration environment. Because historical data were unavailable, the researchers used PMs to supply the conditional probabilities required for the BBN nodes (Loutchkina et al., 2014). This approach focuses solely on technical risk; it does not discuss the potential schedule impact of the interdependencies that occur during SW integration (SWI), nor does it consider project and organizational factors. The BBN represents the probabilistic relationships among 69 nodes, while the PMs define risks specific to a project. One project use case was used to evaluate the process for the
modeling concept, with no accuracy measure provided. BBNs are discussed in Section 2.2.2 above.
2.2.4 Simulation Models
A review of research on simulation models that support schedule estimation shows a focus on process. Mizell and Malone (2007) provide a method to assess uncertainty throughout the project lifecycle by focusing on the three parameters that are key to the initial SW cost estimate, which includes the schedule. The outcome gives the manager an understanding of the magnitude of the uncertainty at each phase of development, including integration. This information helps the manager make better estimates, as Mizell and Malone (2007) showed by assessing the model on two projects. However, there is no evidence that this initial estimate is sufficient to support schedule adjustment during SW integration.
In other research, Houston (2014) proposed a method that uses discrete event simulation to forecast the duration of test-fix-test (TFT) cycles during SW development. The model required inputs describing the constraints of the test environment, such as the number of test beds and the number of errors in the queue, and modeled the rework cycle given these constraints. The TFT work focuses on process and technical parameters, whereas the NBM proposed in this dissertation approaches schedule prediction through machine learning on historical data and does not explicitly consider the process. The TFT discrete event simulation can also be considered complementary to the proposed NBM, since both attempt to bring attention and analysis to this important phase of SW development. The accuracy of the TFT model ranged from 49% to 79%; however, the independent variables used for the TFT model are not applicable to the SWI environment.
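The test-fix-test rework cycle described above can be illustrated with a small discrete event simulation. The sketch below is not Houston's (2014) model. It assumes exponentially distributed cycle times and a fixed probability that a retest fails, and the function name, parameter names, and example values are all hypothetical.

```python
import heapq
import random


def simulate_tft(num_defects, num_test_beds, mean_cycle_days, p_retest_fails, seed=0):
    """Return the simulated days until every queued defect passes its retest."""
    if num_test_beds < 1:
        raise ValueError("at least one test bed is required")
    if not 0.0 <= p_retest_fails < 1.0:
        raise ValueError("p_retest_fails must be in [0, 1)")
    rng = random.Random(seed)
    waiting = num_defects  # defects queued for a free test bed
    busy = []              # min-heap of (finish_time, passed) per occupied bed
    clock = 0.0
    while waiting or busy:
        # Start a fix-and-retest cycle on every free test bed.
        while waiting and len(busy) < num_test_beds:
            duration = rng.expovariate(1.0 / mean_cycle_days)
            passed = rng.random() >= p_retest_fails
            heapq.heappush(busy, (clock + duration, passed))
            waiting -= 1
        # Advance the clock to the next completed cycle.
        clock, passed = heapq.heappop(busy)
        if not passed:
            waiting += 1   # a failed retest re-enters the queue as rework
    return clock


# Hypothetical scenario: 20 queued errors, 2 test beds, 3-day mean cycle.
days = simulate_tft(num_defects=20, num_test_beds=2,
                    mean_cycle_days=3.0, p_retest_fails=0.2, seed=1)
```

Running the function over many seeds would give a distribution of completion times rather than a single forecast, which is the usual way such a simulation is used to express schedule uncertainty.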