Collecting and Retaining Data
4.3 Retaining Data
Data is like garbage. You better be sure what you’re going to do with it before you collect it.
Mark Twain (Apocryphal?)
Retaining data inherently involves creating and using one or more databases to organize and save the data for later use. Depending on the nature of your measurement activities, this may be a reasonably simple task or a very complex and technically demanding one. In either case, it is important to give serious consideration to the data retention system that will be employed. For example, while hard-copy forms may suffice for some data collection purposes, experience has shown that paper forms are often inadequate for retaining and aggregating measured results.
A personal computer database system and a full-featured spreadsheet program may be sufficient for retaining and analyzing data for many processes. However, the size and complexity of the retention system will increase significantly if there is a need to support multiple projects or multiple organizations, or if you are using the measurement results for multiple purposes. The length of time you must retain the data can also influence your choice of a database system. For example, data for process management will often be retained and used well beyond the duration of individual projects.
A project management database, if it exists, may well serve as the basis for retaining process measurements for process management. This is something to be considered seriously before undertaking to develop a separate process management database, as there are often many overlaps between data collected for managing projects and data collected for managing processes.
You should consider the issues listed below when planning a process management database.
Database Planning Issues
Measurement Definitions
• The desire to standardize measures for data retention and analysis may conflict with software process tailoring or with the legitimate needs of projects to define and collect data in ways that address issues that are important to them. It may be unwise or impossible (or require added effort) to insist that all projects use the same measurement definitions.
• One alternative to standardizing measurement definitions is to permit freedom of definition (perhaps within prescribed limits), but to require standardized reporting via standardized formats for the definitions of the measures and measurement processes used.
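One way to picture the standardized-reporting alternative is a fixed record that every project fills in to describe its measures, however it chooses to define them. The sketch below is a minimal Python illustration under that assumption; the class `MeasureDefinition`, its field names, and the helpers `missing_fields` and `to_definition` are hypothetical, not a prescribed format.

```python
from dataclasses import dataclass, fields


# Hypothetical reporting format: projects may define measures their own way,
# but each definition must be described using these same fields.
@dataclass(frozen=True)
class MeasureDefinition:
    name: str               # e.g. "defects found in inspection"
    entity: str             # what is measured (product, process, resource)
    attribute: str          # which property of the entity is measured
    unit: str               # unit of measure
    counting_rules: str     # inclusions/exclusions that make counts repeatable
    collection_method: str  # how, and with what tool, the data are collected
    project: str            # which project supplied the definition


def missing_fields(raw: dict) -> list:
    """Return the names of required reporting fields that are absent or blank."""
    return [f.name for f in fields(MeasureDefinition)
            if not str(raw.get(f.name) or "").strip()]


def to_definition(raw: dict) -> MeasureDefinition:
    """Accept a project's definition only if it follows the reporting format."""
    missing = missing_fields(raw)
    if missing:
        raise ValueError(f"definition is missing: {', '.join(missing)}")
    return MeasureDefinition(**{f.name: raw[f.name] for f in fields(MeasureDefinition)})
```

Two projects could submit different counting rules for "defects" and both be accepted; what is standardized is the description, which is what lets later analysts tell the two definitions apart.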
Multiple Databases
• Do differing user needs, responsibilities, or levels of management require separate databases? If the answer is yes, a monolithic “software process database” is an unlikely choice, and the following issues arise:
- How many databases?
- Who will operate them, and where?
- How will the databases be coordinated? (This involves addressing issues of concurrency, consistency, and propagation of data corrections and updates.)
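Propagating corrections is the part of coordination that most easily goes wrong. A minimal sketch, assuming one database owns an ordered log of corrections and the others replay it, shows how sequence numbers make replays harmless and give a simple consistency check. `Correction`, `Replica`, and `consistent` are hypothetical names; a real arrangement would also have to resolve conflicting edits made at more than one site.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Correction:
    seq: int          # position in the owning database's correction log
    record_id: str
    new_value: float


@dataclass
class Replica:
    name: str
    records: dict = field(default_factory=dict)
    applied_seq: int = 0  # highest correction already applied here

    def apply(self, log: list) -> int:
        """Apply, in order, every correction this replica has not yet seen."""
        count = 0
        for c in sorted(log, key=lambda c: c.seq):
            if c.seq <= self.applied_seq:
                continue  # already applied, so replaying the log is harmless
            self.records[c.record_id] = c.new_value
            self.applied_seq = c.seq
            count += 1
        return count


def consistent(a: Replica, b: Replica) -> bool:
    """Two databases agree once they hold identical record values."""
    return a.records == b.records
```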
Database Design Goals (Recommendations)
• Capture and retain definitions and context descriptions, not just direct measurement data.
• Tie measured values to measurement definitions, rules, practices, and tools used.
• Tie measured values to the entities and attributes measured.
• Tie measured values to the contexts and environments in which they were collected (product, environment, and process descriptors; process and project status; time and place measured; method of measurement; and so forth).
• Accommodate process tailoring (by recording descriptions of process specializations, tailorings, and other differences among processes).
• Accommodate evolving measurement definitions and process descriptions.
• Address linking to, accessing, and coordinating with other databases, such as those used for time and cost reporting, cost estimating, configuration management, quality assurance, personnel, and so forth.
• Avoid storing indirect measures (such as defect densities and rates of change) that can be computed by users from directly measured results. There are three reasons for this advice:
- Storing computed values introduces redundancies in databases that are difficult to keep synchronized. When the underlying data change, indirect measures may not get recomputed.
- If only the results of computations are stored, essential information easily becomes lost.
- Other people may want to compute alternative indirect measures or compute them differently, so the underlying values will need to be retained anyway.
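The design goals above can be sketched as a small relational schema. In this Python/SQLite illustration (all table and column names, and the sample values, are hypothetical), each measured value is tied to a versioned definition, to the entity measured, and to a context record; only direct measures are stored, and defect density is computed by a query when it is needed.

```python
import sqlite3

SCHEMA = """
CREATE TABLE definition (           -- how a measure is defined; versions accumulate
    def_id      INTEGER PRIMARY KEY,
    measure     TEXT NOT NULL,
    version     INTEGER NOT NULL,
    rules       TEXT NOT NULL,       -- counting rules, practices, tools used
    UNIQUE (measure, version)
);
CREATE TABLE context (              -- where, when, and under what process
    ctx_id      INTEGER PRIMARY KEY,
    project     TEXT NOT NULL,
    process     TEXT NOT NULL,       -- process name, including any tailoring
    measured_on TEXT NOT NULL
);
CREATE TABLE measurement (          -- direct measures only
    entity      TEXT NOT NULL,       -- the thing measured, e.g. a module
    def_id      INTEGER NOT NULL REFERENCES definition(def_id),
    ctx_id      INTEGER NOT NULL REFERENCES context(ctx_id),
    value       REAL NOT NULL
);
"""

# Indirect measure computed on demand: defects per thousand source lines.
DENSITY_QUERY = """
SELECT d.entity, d.value / (s.value / 1000.0) AS defects_per_ksloc
FROM measurement AS d
JOIN definition AS dd ON dd.def_id = d.def_id AND dd.measure = 'defects'
JOIN measurement AS s ON s.entity = d.entity AND s.ctx_id = d.ctx_id
JOIN definition AS sd ON sd.def_id = s.def_id AND sd.measure = 'sloc'
ORDER BY d.entity
"""


def build(conn: sqlite3.Connection) -> None:
    """Create the schema and load a few hypothetical direct measurements."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO definition VALUES (?, ?, ?, ?)", [
        (1, "defects", 1, "count inspection defects; exclude duplicates"),
        (2, "sloc", 1, "physical lines, excluding comments and blanks"),
    ])
    conn.execute("INSERT INTO context VALUES "
                 "(1, 'Project A', 'inspection (tailored)', '2024-01-15')")
    conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", [
        ("module_x", 1, 1, 6), ("module_x", 2, 1, 2000),
        ("module_y", 1, 1, 3), ("module_y", 2, 1, 1500),
    ])
```

Because density is never stored, correcting a size value changes the computed density automatically; there is no redundant copy to fall out of step.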
Logistical and Timeline Issues
• What are the media and mechanisms for moving data from the point of measurement to the database?
• How fast is the process from measurement to data entry? Will the data be timely and up to date?
• What are the provisions for coordinating the database with automated measurement tools? Can these provisions be automated?
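Timeliness can be monitored if each record carries both the time it was measured and the time it was entered. A minimal Python sketch, assuming those two timestamps exist; the seven-day threshold and the name `stale_entries` are illustrative assumptions, since what counts as timely depends on how the data will be used.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; each organization sets its own.
MAX_ENTRY_LAG = timedelta(days=7)


def stale_entries(records, max_lag=MAX_ENTRY_LAG):
    """Return ids of records entered more than max_lag after they were measured.

    Each record is a (record_id, measured_at, entered_at) tuple of datetimes.
    """
    return [rid for rid, measured_at, entered_at in records
            if entered_at - measured_at > max_lag]
```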
Rules and Policy Issues
• What are your privacy objectives?
• What are your proprietary data objectives?
• What are your data access objectives?
• What are your retention objectives?
• What are your archiving objectives?
Database Operation Issues
Once you have settled the database planning issues, you should document your operational procedures in detail. This includes identifying
• who will enter and maintain the data
• who can access the data
• levels of access (for example, you may not want certain financial data to be available to everyone who has access to staff-hour time records)
• where the data will be retained
• the tools you will use, including the editing and retrieval mechanisms
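Levels of access can be expressed as a mapping from roles to the categories of data each role may read or change. The Python sketch below assumes hypothetical roles and categories; it mirrors the example in the text by letting analysts see staff-hour records but not financial data, and it denies access to any role not listed.

```python
from enum import Enum


class Category(Enum):
    STAFF_HOURS = "staff-hour time records"
    FINANCIAL = "financial data"
    PROCESS = "process measurements"


# Hypothetical role assignments; each organization defines its own.
READ_ACCESS = {
    "analyst": {Category.PROCESS, Category.STAFF_HOURS},
    "project_manager": {Category.PROCESS, Category.STAFF_HOURS, Category.FINANCIAL},
    "data_entry": {Category.STAFF_HOURS},
}
WRITE_ACCESS = {
    "data_entry": {Category.STAFF_HOURS, Category.PROCESS},
}


def can_read(role: str, category: Category) -> bool:
    """Unknown roles get an empty set, so access is denied by default."""
    return category in READ_ACCESS.get(role, set())


def can_write(role: str, category: Category) -> bool:
    return category in WRITE_ACCESS.get(role, set())
```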
Database Management Issues
Some additional database management issues that you should address are listed below.
None of these issues are unique to software process management. We provide the list here to serve as a checklist and a reminder that the methods you use to retain and access data play a significant role in the success of any measurement activity.
Operating the Database
- responsibilities
- practices & tools for preventing simultaneous editing
- practices & tools for preventing contamination & corruption of previously verified data
- tools & training to support browsing, searching, retrieval, and display
- backup practices (frequency & off-site storage)
- funding
- training
Access
- Who is permitted to enter or change data?
- Who is permitted to access data?
- Who grants authority to access or change data?
- Who enforces access and change authority?
- What tools & practices will be used to support controlled access?
Entering Data
- responsibilities
- coordinating data entry with data verification
- backup practices (frequency & off-site storage)
- Who is responsible?
- funding
Privacy (e.g., personnel data & personal performance data)
- Is there a need to isolate protected data from public data?
- Who grants authority to access protected data?
- Who enforces access rules?
- What tools & practices will be used to protect privacy?
Protecting Proprietary Data
- Who designates proprietary data?
- How is proprietary data identified?
- What tools & practices are used to protect proprietary information?
- Who can authorize access to or use of proprietary data?
- What are the ground rules for authorizing access?
Security (provisions for handling classified information)
- Is there a need for security procedures?
- Is multilevel security a requirement?
- What tools & practices will be used to protect security?
System Design
- hardware selection
- software selection
- database design (structure)
- communications among databases
- maintenance
- evolution
- operational support
- funding
- training