2nd Annual Conference on Systems Engineering Research, April 2004, Los Angeles, CA.
Lessons Learned From Collecting Systems Engineering Data
Ricardo Valerdi
Center for Software Engineering University of Southern California 941 W. 37th Place, SAL Room 330
Los Angeles, CA 90089 [email protected]
John E. Rieff
Raytheon
Intelligence and Information Systems P.O. Box 660023
Dallas, TX 75266-0023 [email protected]
Garry J. Roedler
Lockheed Martin
Integrated Systems and Solutions P.O. Box 8048, Building 100, Room U8253
Philadelphia, PA 19101 [email protected]
Marilee J. Wheaton
The Aerospace Corporation 2350 E. El Segundo Blvd.
El Segundo, CA 90245 [email protected]
Abstract
The emergence of the Capability Maturity Model Integration (CMMI) as the de facto process capability standard highlights the importance of the integration of the Systems Engineering function with other engineering disciplines. Estimating Systems Engineering effort, however, is currently not a formalized activity. The Center for Software Engineering at the University of Southern California, in conjunction with its Corporate Affiliates and INCOSE, has been working towards formalizing Systems Engineering cost estimation and developing a parametric cost estimation model. Corporate Affiliates provide historical data from completed projects which play a pivotal role in the development of cost estimation models and help determine the relevant parameters that drive project cost.
While companies have Systems Engineering processes that conform to EIA/ANSI 632 and ISO/IEC 15288, the individual cost accounting efforts follow dramatically different approaches. In order to develop a cost model that is relevant across different domains, there needs to be some convergence in the costing approaches and associated data collection so that meaningful cost estimating relationships can be developed.
To help facilitate the introduction of the Constructive Systems Engineering Cost Model (COSYSMO), three organizations have mapped their cost accounting approaches to a similar framework. This paper is a collection of lessons learned from the data collection activities and the model development process that will help create a model for improved Systems Engineering cost estimation. It will describe the data collection requirements for COSYSMO, the challenges encountered in this effort, and the data collection approach adopted by the collaborative team to overcome the challenges. These challenges include verifying and refining the data, appropriate analysis and use of the data for model calibration, and safeguarding the data.
Background
Parametric Models. Parametric models in engineering management serve as valuable tools for engineers and project managers to estimate engineering effort. Developing these estimates requires a strong understanding of the factors that affect engineering effort. Industry and academia have teamed up in the past to develop one of the most popular software development models in the world: the Constructive Cost Model (COCOMO). The Center for Software Engineering (CSE) at USC is building on this work to develop a parametric model to estimate systems engineering effort. The Constructive Systems Engineering Cost Model (COSYSMO) represents the latest collaboration between CSE and its Corporate Affiliates, which include Raytheon, Lockheed Martin, and The Aerospace Corporation, among others.
USC CSE Corporate Affiliate Program. In the Spring of 2002 a working group was formed to develop the initial version of COSYSMO and identify possible sources of data to use for calibration of the model. Since that time over a dozen CSE Affiliate organizations have joined the working group and have participated in various working group meetings to refine the model. The diverse experience of the working group members includes but is not limited to space systems hardware, information technology, radars, satellite ground stations, and military aircraft. This broad scope helps ensure that the model is robust enough to address multiple areas.
The typical involvement of Affiliate companies is twofold. First, each company provides a group of systems engineering experts to rate the model drivers through the use of a wideband Delphi survey. This exercise allows for expert judgement to be captured and included in the model. An added source of support has been the INCOSE Measurement Working Group. The members of INCOSE have provided valuable feedback that has greatly improved the model. Second, the Affiliate companies provide historical project data for the COSYSMO calibration to validate the model parameters. This ensures that the Cost Estimating Relationships (CERs) in the model are appropriately weighted according to the data received from completed projects.
Need for SE data to calibrate COSYSMO. Industrial participation in the development of COSYSMO is key to the usefulness and relevance of the model. The four size drivers defined by the COSYSMO working group are listed in Table 1. Each driver has a corresponding item that can provide the necessary data for the calibration. The # of System Requirements, for example, can be derived from the system specification document. For complete definitions of the size drivers and effort multipliers included in the model see Valerdi et al. (2003).
Table 1: COSYSMO Size Drivers
Driver Name                   Data Item
# of System Requirements      Counted from system specification
# of Interfaces               Counted from interface control document(s)
# of Operational Scenarios    Counted from test cases or use cases
# of Critical Algorithms      Counted from system spec or mode description docs
The initial industry calibration is essential to understanding the model’s robustness, establishing initial relationships between parameters and outcomes, and determining the validity of drivers.
However, each organization using COSYSMO will need to perform a local calibration. Counting the # of Interfaces may be a very different activity for two distinct organizations. Through the industry calibration, the working group can establish the values for various scale factors for each driver. This might not be possible or feasible from a local calibration due to the size of the calibration data set and the narrow scope of a single organization’s project database. The industry data can also identify elements or features of the model that need refinement, for instance, the definition of the # of Operational Scenarios size driver. Organizations have different approaches in identifying scenarios and how to count them. Obtaining data from multiple sources may also identify new drivers that need to be included in future revisions of the model.
An important reason for an industry-level calibration is the acceptance of the model for cost estimation by the Defense Contract Audit Agency (DCAA). Even though each organization needs to prove the local calibration matches the local organization's productivity and trends, the industry calibration shows DCAA the model meets the expectations and standards of the Systems Engineering industry.
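The difference between an industry calibration and a local calibration can be illustrated with a small sketch. It assumes a generic power-law cost estimating relationship, effort = A * size^E, which this paper does not itself state; the function name, the exponent value, and the project figures are hypothetical. The exponent is treated as fixed (as if supplied by an industry calibration), and only the constant A is refit from one organization's completed projects.

```python
import math

def calibrate_local_constant(projects, exponent):
    """Fit A in effort = A * size**exponent by least squares in log space.

    projects: list of (size, actual_effort_hours) pairs from completed projects.
    exponent: scale exponent taken as given (assumed to come from an
              industry-level calibration; not a published COSYSMO value).
    """
    if not projects:
        raise ValueError("at least one completed project is required")
    # In log space: log(effort) = log(A) + exponent * log(size).
    # With the slope fixed, the least-squares estimate of log(A)
    # is the mean of the residuals.
    residuals = [math.log(effort) - exponent * math.log(size)
                 for size, effort in projects]
    return math.exp(sum(residuals) / len(residuals))

# Hypothetical local data: (size count, systems engineering hours).
local_projects = [(100, 2000.0), (200, 4000.0), (400, 8000.0)]
local_a = calibrate_local_constant(local_projects, exponent=1.0)
print(round(local_a, 2))
```

With only a handful of local projects, fitting a single constant is about as much as the data can support, which is consistent with the point above that scale factors are better established from the larger industry data set.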
Lessons Learned
The following is a collection of lessons learned that have been identified by the working group during the data collection activity for the COSYSMO model. Some lessons are generalizable to the model building process and others are more specific to particular features of COSYSMO.
Scope of the model. During the development of the COSYSMO model, considerable emphasis has been placed on the scope of activities the model would cover. It was recognized early on that what comprised systems engineering activities varied extensively across organizations and projects. The key to collecting consistent data across disparate organizations is to clearly define the content in a Work Breakdown Structure (WBS). Mappings also need to be established between the organization’s or project’s WBS elements and the COSYSMO standard WBS elements. The standardized WBS then becomes the framework for discussion of what is included and what is not included for a particular data collection point. The standardized WBS is also used to determine the scope of what is included when using the model for estimating.
Lesson #1: A standardized WBS and dictionary provide the foundation for decisions on what is within the scope of the model for both data collection and for estimating.
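The mapping between an organization's WBS elements and a standardized WBS can be sketched as a simple lookup followed by a roll-up of charged hours. The element names, the hours, and the function name below are hypothetical and are not taken from the COSYSMO standard WBS; the sketch only shows how an explicit mapping makes scope decisions visible.

```python
from collections import defaultdict

# Hypothetical mapping from one organization's WBS elements to
# standardized WBS elements; None marks work outside the model's scope.
WBS_MAPPING = {
    "1.2.1 Requirements Analysis": "Requirements",
    "1.2.2 Spec Development": "Requirements",
    "1.3.1 Architecture Trades": "Architecture",
    "1.4.1 Software Coding": None,
}

def roll_up_hours(charged_hours, mapping):
    """Sum an organization's charged hours by standardized WBS element.

    Returns (hours_by_standard_element, out_of_scope_hours). An element
    missing from the mapping raises an error, so that scope decisions
    are made explicitly rather than by omission.
    """
    totals = defaultdict(float)
    out_of_scope = 0.0
    for element, hours in charged_hours.items():
        if element not in mapping:
            raise KeyError(f"no mapping defined for WBS element: {element}")
        target = mapping[element]
        if target is None:
            out_of_scope += hours
        else:
            totals[target] += hours
    return dict(totals), out_of_scope

# Hypothetical hours charged on one completed project.
project_hours = {
    "1.2.1 Requirements Analysis": 1000.0,
    "1.2.2 Spec Development": 500.0,
    "1.3.1 Architecture Trades": 800.0,
    "1.4.1 Software Coding": 3000.0,
}
in_scope, excluded = roll_up_hours(project_hours, WBS_MAPPING)
print(in_scope, excluded)
```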
Types of projects needed for data collection effort. The breadth of domains represented by the Affiliate members encompasses a wealth of high technology systems across both commercial and defense industries. In order for the COSYSMO model to accurately estimate systems engineering costs, it is imperative to collect sufficient data across multiple domains to calibrate and validate the model. Projects identified for collection must be completed projects and have both the cost and the technical data elements available. In addition, there needs to be access to project domain experts to clarify discrepancies or inconsistencies in the data collected.
Another attribute for selecting projects for data collection is that projects identified must have sufficient granularity in the cost data to allow mapping to the systems engineering cost elements. If the aggregation of the cost elements is such that the systems engineering effort cannot be separately discerned, then that project cannot be used for calibrating the model.
A final consideration is that the cost data and the technical data need to be synchronized for the data collection point. That is, the data collected for the systems engineering cost elements needs to cover the same content as the data collected for the size drivers and effort multipliers representing the systems engineering effort for a specific project.
Lesson #2: Careful examination of potential projects is necessary to ensure completeness, consistency and accuracy across all required data collection items for the project.
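The selection criteria described above can be expressed as a screening checklist. The following is a minimal sketch; the class, field names, and messages are hypothetical, and in practice each flag would be set by someone who has examined the project's records.

```python
from dataclasses import dataclass

@dataclass
class CandidateProject:
    """Screening facts about a project; all field names are hypothetical."""
    name: str
    is_completed: bool
    has_cost_data: bool
    has_technical_data: bool
    se_effort_separable: bool          # cost granularity isolates SE effort
    cost_and_technical_aligned: bool   # both data sets cover the same content
    domain_expert_available: bool

def screening_issues(project):
    """Return the criteria the project fails; an empty list means usable."""
    checks = [
        (project.is_completed, "project is not completed"),
        (project.has_cost_data, "cost data is unavailable"),
        (project.has_technical_data, "technical data is unavailable"),
        (project.se_effort_separable,
         "systems engineering effort cannot be separated from aggregated costs"),
        (project.cost_and_technical_aligned,
         "cost data and technical data do not cover the same content"),
        (project.domain_expert_available,
         "no domain expert is available to resolve discrepancies"),
    ]
    return [message for passed, message in checks if not passed]

# Hypothetical candidate whose cost data is too aggregated to use.
candidate = CandidateProject("Project A", True, True, True, False, True, True)
for issue in screening_issues(candidate):
    print(f"{candidate.name}: {issue}")
```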
Size drivers. The size drivers for the COSYSMO model as identified in Table 1 represent the necessary project level parameters that define the technical scope, or “size”, of the systems engineering effort. These parameters in many instances have not been collected in the past for completed projects. Deriving the quantities associated with these sizing parameters requires access to project technical documentation such as the system specifications and interface control documents. In addition, access to project systems engineering personnel familiar with these documents is key to determining accurate counts for these size drivers.
Lesson #3: The collection of the size driver parameters requires access to project technical documentation as well as project systems engineering staff that can help interpret the content.
Effort Multipliers. The COSYSMO effort multipliers are identified in Table 2. Collection of the listed data items provides the qualitative assessment of the impact of the significant parameters that drive the cost of systems engineering on a specific project.
Collecting these parameters on a completed project requires that they be assessed considering the project as a whole, and not solely with the hindsight that comes at the end of a project. For example, at the end of a project the requirements understanding is high; however, this effort multiplier must be rated taking into account the level of requirements understanding throughout the entire project in order to properly account for the entire systems engineering effort associated with the requirements.
Table 2: COSYSMO Effort Multipliers
Driver Name                                   Data Item
Requirements Understanding                    Subjective assessment of the system requirements
Architecture Understanding                    Subjective assessment of the system architecture
Level of Service Requirements                 Subjective difficulty of satisfying the key performance parameters
Migration Complexity                          Influence of legacy system (if any)
Technology Risk                               Maturity, readiness, and obsolescence of technology
Documentation to Match Life Cycle Needs       Breadth and depth of required documentation
# and Diversity of Installations/Platforms    Sites, installations, operating environment, and diverse platforms
# of Recursive Levels in the Design           Number of levels of the Work Breakdown Structure
Stakeholder team cohesion                     Subjective assessment of all stakeholders
Personnel/team capability                     Subjective assessment of the team’s intellectual capability
Personnel experience/continuity               Subjective assessment of staff consistency
Process capability                            CMMI level or equivalent rating
Multisite coordination                        Location of stakeholders and coordination barriers
Tool support                                  Subjective assessment of SE
Lesson #4: The rating of effort multiplier parameters for a completed project requires an assessment from the total project perspective.
SE hours across life cycle stages. The discrepancies between the data reporting structures of different companies have added some complexity to collecting data for the model. Organizations do similar tasks differently mainly because of the corporate culture that develops at local sites. Even projects within the same organization are performed differently due to customer requirements. The same holds for SE cost estimation and data collection. Major companies that develop complex systems also employ unique life cycle models and, as a result, divide their SE work into diverse categories and subcategories. In fact, there are various standards (EIA/ANSI 632, ISO/IEC 15288, and IEEE 1220) for a systems engineering process, each covering a different level of depth and breadth of the processes and life cycle. Even though these standards are similar, they lead to unique implementations by the various companies.
Lesson #5: Agree on a standardized set of life cycle stages for the model despite the different processes used by Affiliate companies.
Data collection form. Some CSE Affiliates have already established their own data collection process. Aside from the standard information being sought by the COSYSMO model, they have anticipated the need for additional data. Other Affiliates have just recently begun collecting systems engineering data. The data collection form for COSYSMO has to accommodate both of these extremes. As such, it must be detailed enough to gather systems engineering effort information for the parameters listed in Tables 1 and 2 as well as cost accounting information for systems engineering tasks for a standard WBS. At the same time it must be useful to Affiliates that only collect high level systems engineering and program management data.
Lesson #6: The data collection form must be easy to understand and flexible enough to accommodate organizations with different levels of detail so that they can contribute data and use the model.
Definition. It is important to provide clear, complete definitions of all drivers and associated data to ensure that all constituents who will provide data interpret them in the same way. The absence of clear definitions will result in different types of data being provided. Thus good definitions will help ensure that the right data is collected and can be used in the model development.
For the COSYSMO project, the working group reviewed and revised the driver and data definitions multiple times, continuing to examine their clarity and completeness. Each meeting included some new representatives, which provided a fresh view. This prevented arriving at a false conclusion that the goal had been met when the team had actually become too familiar with the definitions. During the meetings all changes were documented after achieving consensus of the team members present. The resulting revised definitions were then provided to all constituents for further review. This multi-pass review process has proven to be effective.
Lesson #7: Spending more time on improving the driver definitions has ensured consistent interpretation and improved the model’s validity.
Significance vs. Data Availability. Prior to collecting the data, the data items should be analyzed for their significance in providing the desired insight to allow derivation of the cost estimation relationships. The objective is to identify the data that will provide relevant and reliable insight for the drivers in the model yet is feasible and cost-effective to collect.
The significance for the COSYSMO project was evaluated first through expert consensus and Delphi analysis techniques to arrive at a set of drivers and associated data that would likely be meaningful for determining the cost estimating relationships. However, data collection feasibility and cost effectiveness must also be considered. Data that provides the exact insight desired is not the right choice if it cannot be collected in a cost-effective manner (or at all). The data must be analyzed for its availability during the normal course of performing the activities of the life cycle processes. It should be readily available and relatively easy to capture while performing the processes, in order to ensure cost-effective data capture and absence of error from deriving it after-the-fact.
For the COSYSMO project, a data collection feasibility survey was constructed prior to the actual data collection. The objective of the survey was to determine whether the desired data could easily be collected by a large number of organizations. The premise was that the organizations would most likely only expend resources a single time to collect and provide the data. Therefore, it was important to ensure the set of data items was feasible to collect from the start. Any data item that was identified as not available to a significant number of organizations would be a candidate for replacement by another data item that could provide similar insight. If no other data item existed for the driver, then the driver would be retired or replaced. It is better to use the second choice and get the data than to insist on the first choice and receive nothing. Figure 1 shows an excerpt of the data survey form.
Driver Name                            Data Item                                              Have it   Can get it   Forget it
Requirements Understanding             Subjective assessment of reqs                          □         □            □
Level of Service Requirements          Difficulty of satisfying KPPs                          □         □            □
Technology Risk                        Maturity, readiness, and obsolescence of technology    □         □            □
# of Recursive Levels in the Design    Number of levels of the WBS                            □         □            □
Stakeholder team cohesion              Subjective assessment of all stakeholders              □         □            □
Process capability                     CMMI level or equivalent                               □         □            □
Multisite coordination                 Location of stakeholders                               □         □            □
Figure 1. Data Collection Survey Excerpt.
Lesson #8: If no data can be collected for a particular driver then that driver cannot be used because its influence on systems engineering effort cannot be validated.
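Tallying the feasibility survey responses can be sketched as follows. The answer codes, the 30 percent threshold, and the sample responses are hypothetical; the text above says only that an item unavailable to "a significant number" of organizations becomes a candidate for replacement, so the cutoff is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical codes for the three survey columns.
VALID_ANSWERS = {"have", "can_get", "forget"}

def replacement_candidates(responses, threshold=0.3):
    """Return data items whose share of 'forget' answers reaches the threshold.

    responses: dict mapping a data item to the list of answers received,
               one per responding organization.
    threshold: assumed cutoff for "a significant number" of organizations.
    """
    candidates = []
    for item, answers in responses.items():
        unknown = set(answers) - VALID_ANSWERS
        if unknown:
            raise ValueError(f"unexpected answers for {item}: {sorted(unknown)}")
        if not answers:
            continue  # no responses yet, so nothing can be concluded
        counts = Counter(answers)
        if counts["forget"] / len(answers) >= threshold:
            candidates.append(item)
    return candidates

# Hypothetical responses from four organizations.
survey = {
    "CMMI level or equivalent": ["have", "have", "can_get", "have"],
    "Number of levels of the WBS": ["forget", "can_get", "forget", "have"],
}
print(replacement_candidates(survey))
```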
Influence of data on the drivers and statistical significance. After the data is collected, verified and normalized, it is analyzed to understand the quantitative relationship between the drivers. Obviously, verified, accurate data will provide better understanding of the relationships. The results of the analysis are used to determine which drivers are the most sensitive and are independent of other drivers. These drivers are more critical to the development of the estimation model. The drivers that are found to be insignificant are considered for elimination.
As is the case when analyzing any data, there need to be enough data points to provide statistical significance and raise the confidence level in the results.
Lesson #9: Historical data can help determine which drivers should be kept in the model and which should be discarded.
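One simple way to check whether drivers are independent of one another is to compute pairwise correlations across the collected projects. The sketch below uses the Pearson correlation coefficient; the choice of technique, the 0.9 threshold, and the data values are assumptions for illustration and are not the working group's stated analysis method. It does not test statistical significance, which would require more data points than shown here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

def correlated_driver_pairs(drivers, threshold=0.9):
    """Return driver pairs whose absolute correlation reaches the threshold.

    drivers: dict mapping a driver name to its values, one per project.
    """
    names = sorted(drivers)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(pearson(drivers[first], drivers[second])) >= threshold:
                pairs.append((first, second))
    return pairs

# Hypothetical driver counts from five completed projects.
drivers = {
    "requirements": [100, 200, 300, 400, 500],
    "interfaces": [10, 20, 30, 40, 50],
    "algorithms": [5, 1, 4, 2, 3],
}
print(correlated_driver_pairs(drivers))
```

A pair flagged this way suggests that the two drivers may be capturing the same underlying effect, which makes one of them a candidate for closer review.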
Data safeguarding procedures. Data collection activities at CSE trace their heritage to the original data collection effort by Dr. Barry Boehm in the early 1980s for the original COCOMO model. Since then the CSE has maintained a solid track record of protecting the data it receives from Affiliates and ensuring confidentiality. The project leads for the COSYSMO working group have successfully obtained data for the model by signing non-disclosure agreements with Affiliates and communicating the data management and storage procedures in place at CSE.
Lesson #10: Establishing non-disclosure agreements early on in the process enables the data sharing and collaboration to easily take place.
Buy-in from constituents. The project has spent significant energy cultivating buy-in from its constituents. The constituents of the COSYSMO project include contractors, systems acquisition authorities, the International Council on Systems Engineering (INCOSE) Measurement Working Group, the Systems Engineering Center of Excellence, and the USC CSE Corporate Affiliates. Without developing the buy-in of the constituents, there is little to no chance to obtain the necessary data for the model development. Buy-in has been primarily a function of need, relevance, and confidence in the team developing the model.
Showing the constituents that the model is targeted at addressing one of their needs has been essential. Since no SE cost model currently exists in the technical community, it was fairly easy to show that an important need would be addressed with a successful model. Both contractors and the customer community can benefit from having a common parametric model based on actual industry data to aid development and validation of SE cost estimates.
In addition to addressing a need, it was necessary to demonstrate to the constituents that the results would be relevant to their work. This resulted in changes to the scope and planning of the project. These changes in scope included coverage of more of the life cycle and SE tasks performed. As a result of the changes, the data collection and analysis effort is more complex.
The confidence in the model development team has grown by getting many of the constituents involved and/or incorporating their input. In addition, periodic reviews of the driver and data definitions and progress of the model have been conducted with the constituents.
A prototype version of the model has also been developed to facilitate the participation of the end user community. See Boehm et al. (2003) for more information on the Excel implementation of COSYSMO.
Lesson #11: The success of the model hinges on the support from the end-user community.
Conclusion
Our experience during the development of the COSYSMO model has led to several valuable lessons learned in the areas of parametric model building and systems engineering effort estimation. While the nature and kind of available systems engineering data is non-standardized, there is a wealth of information that can be harvested to develop a useful model that can help us better understand the drivers of systems engineering. Moreover, we believe that COSYSMO will help advance the field of systems engineering research by providing a quantifiable method for measuring systems engineering effort and therefore providing a foundation for CMMI process improvement.
References
Valerdi, R., Boehm, B., Reifer, D., “COSYSMO: A Constructive Systems Engineering Cost Model Coming of Age.” Proceedings of the 13th Annual International INCOSE Symposium (Crystal City, VA, July 2003).
Boehm, B., Rieff, J., Thomas, G., Valerdi, R., “COSYSMO Tutorial.” 13th Annual INCOSE Symposium (Crystal City, VA, July 2003).
Biographies
Ricardo Valerdi
Ricardo is a Research Assistant at the Center for Software Engineering and a PhD student at the University of Southern California in the Industrial and Systems Engineering department. He earned his bachelor’s degree in Electrical Engineering from the University of San Diego and his master’s from the University of Southern California. Ricardo is currently a Member of the Technical Staff at the Aerospace Corporation in the Economic & Market Analysis Center. Previously, he worked as a Systems Engineer at Motorola and at General Instrument Corporation.
John E. Rieff
John is the Deputy Director of Systems Engineering for the Garland site of Raytheon's Intelligence and Information Systems business area. He is one of the co-authors of the Raytheon Enterprise Architecture Process (REAP). John has been employed by E-Systems (now Raytheon) from 1986 to the present. He was previously employed by Texas Instruments and Rockwell International. John received his Bachelor of Science degrees from Iowa State University and graduate and post-graduate degrees from Iowa State University, University of Texas and the University of Iowa.
Garry J. Roedler
Garry Roedler is a Principal Systems Engineer at Lockheed Martin Integrated Systems and Solutions. He is currently the Engineering Process Manager, responsible for strategic planning of technology needs, process technology development and infusion, and process maintenance and improvement of engineering processes. Garry has 23 years of experience in engineering, measurement, and teaching and holds degrees in mathematics education and mechanical engineering from Temple University. Other work includes roles on the Corporate Systems Engineering Subcouncil; INCOSE Technical Board and Committees; INCOSE Delaware Valley Chapter; US Technical Advisory Group for ISO software and systems engineering process standards; and IEEE Standards Committee.
Marilee J. Wheaton
Marilee J. Wheaton is currently the General Manager of the Computer Systems Division at the Aerospace Corporation. She has a B.A. in Mathematics from California Lutheran University and an M.S. in Systems Engineering from the University of Southern California. She is an Associate Fellow of the AIAA and a member of the AIAA Technical Committee on Economics. She is also a Fellow and Life Member of the Society of Women Engineers and a Past President of SWE's Los Angeles Section.