System Test

There are two forms of system testing, directed and random. Most testing groups use directed tests, which test hardware or software features, or follow a strict test sequence. Directed tests seek specific results and are well defined.

The System Integration Group performed many directed tests on the Model 400 system. Some of these tests were done to satisfy the requirements of external regulatory agencies, or internal Digital development standards. Other directed tests include system DVT tests, which are discussed in more detail later in this paper.

Many aspects of complex systems cannot be adequately tested in a directed fashion. For example, an operating system and a processor can operate in a nearly infinite number of states. It is impossible to design a series of tests to verify each of these states.

Digital Technical Journal Vol. 2 No. 2, Spring 1990

Random tests exercise the system in more comprehensive ways than directed tests. They do not seek specific results. Instead, random tests attempt to push the system into as many different states as possible, as quickly as possible. Greater test coverage results from these tests, but problem diagnosis and isolation are more difficult.

Because random testing does not look for specific results, it is effective only if done for extended periods of time. Even if identical test scripts are run repeatedly, system activity becomes unpredictable over time, due to events such as network activity or disk fragmentation. This unpredictability is important because it means more system states are being exercised.

The System Integration Group developed a random test package, called the Systems Integration Test Package (SITP), to test the VAX 6000 Model 400 system. This package consists of a comprehensive collection of test programs and a script-driven mechanism that controls their execution. SITP is diverse and flexible. The test programs were obtained from many sources. The System Integration Group also wrote some test programs to exercise specific aspects of the Model 400 system that were not fully exercised by other tests.

The test programs used with SITP are high level. Each high-level test uses many lower level functions within the system. Many of these programs are run together, with varying test parameters and run times. The programs are self-checking. If an action does not complete properly, the program notes the error immediately. The program does not attempt to identify the cause of an error; rather, it gathers as much information about the error as possible. This information is later examined by a test engineer.
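The behavior described above can be illustrated with a minimal sketch. This is not SITP code; the names `checksum_test` and `run_script`, the record fields, and the use of Python are all assumptions made for illustration. The sketch shows a self-checking test being run with randomly chosen parameters, and an error being recorded with its context rather than diagnosed.

```python
import random
import time
import traceback

def checksum_test(size):
    """Hypothetical self-checking test: compute a result two ways and compare."""
    data = [random.randrange(256) for _ in range(size)]
    expected = sum(data)
    actual = sum(sorted(data))  # same values in a different order
    if actual != expected:
        raise AssertionError(f"sum mismatch: {actual} != {expected}")

def run_script(tests, iterations, seed):
    """Run randomly chosen tests with randomly chosen parameters.

    tests is a list of (name, function, parameter_choices) tuples.
    Returns a list of error records. No attempt is made to find the
    cause of an error; each record only captures context for later review.
    """
    rng = random.Random(seed)
    errors = []
    for i in range(iterations):
        name, func, param_choices = rng.choice(tests)
        param = rng.choice(param_choices)
        try:
            func(param)
        except Exception as exc:
            errors.append({
                "iteration": i,
                "test": name,
                "param": param,
                "seed": seed,
                "time": time.time(),
                "error": repr(exc),
                "traceback": traceback.format_exc(),
            })
    return errors

# Example script: one test, three possible workload sizes (all hypothetical).
tests = [("checksum", checksum_test, [10, 1000, 10000])]
errors = run_script(tests, iterations=20, seed=1990)
```

Recording the seed and parameters with each error is a design choice of this sketch: it gives whoever reviews the records a starting point for reproducing the failure.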

SITP is easy to use, restarts automatically after system crashes or power failures, and includes monitoring tools. Periodic reports, with details about system activity and error log data, are generated by the test package. With this information, the test engineer can gauge the effectiveness of the tests and adjust them as necessary. The test engineer can also control and monitor tests on many different machines, whether they are located locally or remotely.

A number of SITP scripts were developed to provide different workloads for testing the Model 400 system. Each set of scripts emphasized a different type of system activity. Some were compute intensive, some I/O intensive, and some stressed parallel and multiprocessing activity. The scripts were modified to suit system configurations as needed.

SITP and the test scripts were installed on all the Model 400 system prototypes in the system integration lab. Tests under the control of SITP were run on the prototypes as prototypes were available. Because the prototypes were heavily used during daytime hours for various debugging tasks, SITP tests were run overnight and on weekends. The test scripts were designed to run for a specific number of hours and then stop. The prototype was then available for the next user. This procedure allowed otherwise idle prototype hours to be used in system testing and ensured a clean shutdown of the tests. In this way, test data could be retrieved without interference from other prototype users.
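A run that stops itself after a fixed time budget can be sketched as follows. The function name `run_for` and its parameters are hypothetical, and nothing here reflects how SITP actually implemented its timing. The sketch only shows the general pattern: check the elapsed time between test steps, and stop at a step boundary so that results are complete when the run ends.

```python
import time

def run_for(budget_seconds, step, clock=time.monotonic):
    """Call step() repeatedly until the time budget is used up.

    The budget is checked only between steps, so a step is never
    interrupted part-way. Returns the number of completed steps.
    """
    start = clock()
    completed = 0
    while clock() - start < budget_seconds:
        step()
        completed += 1
    return completed
```

An overnight run of 16 hours, for example, would pass `budget_seconds=16 * 3600`. The `clock` parameter exists only so that the loop can be exercised with a fake clock instead of waiting in real time.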

SITP was used on the earliest Model 400 system prototypes and was continually used throughout the qualification period, as prototype time was available. Scripts were tailored to cause test concentration in specific areas and were modified as necessary to suit various prototype configurations. Typical SITP runs would last for 16 or 19 hours (overnight), or 58 or 60 hours (over weekends). Processor, memory, and I/O configurations varied from run to run, and depended on test needs and equipment availability.

The overall results from system testing were very positive. Over 6700 CPU hours were accumulated on various prototypes and configurations. Many errors were encountered during this period, but most were due to SITP bugs (SITP was still under development for most of this period) or to errors in setting up test scripts. Hardware errors occurred in peripheral devices, principally disks and communications devices, and were corrected as they occurred.

Of the more serious problems found, one was a hardware problem that would cause a system hang. The problem was identified as a bug in a bus interface chip on the CPU module, which was operating in an untested mode. It was resolved by modifying the system console to ensure that this mode was never used. An error was found in the VMS machine check handler, which was corrected in a subsequent release of the VMS operating system.

Five other serious bugs were found in the new CPU modules. Although none of these bugs were found by the System Integration Group's testing, each took time to investigate, resolve, and test the fixes. As a result, there was less time available on prototype machines for other testing. Two of these bugs were fixed by modifications to the CPU module. The other three required changes to the processor chip. As corrected processor chips became available, SITP was used to ensure the fixes had not introduced further bugs.

It is interesting to note that four of these five problems occurred in system areas not simulated during hardware design. Of these four, two occurred in the handling of external system events. One was in system reset handling. The other was in handling "control/P" interrupts. Control/P is the standard method an operator uses to get the attention of the system console on VAX systems. Two bugs were caused by interactions between the new processor and other system components. These interactions were not simulated during hardware design. The fifth bug was not found during simulation because of a deficiency in a simulation test tool.

Reliability Confidence Test

To accumulate uninterrupted run-time on the Model 400 system, five identically configured systems were set up in an isolated area. The machines were isolated to protect them from outside interference while the confidence test was running.

The purpose of this test was to determine the actual reliability parameters of the Model 400 system hardware and to compare the results to the system's actual reliability requirements. A secondary goal of the test was to determine the long-term system reliability, both for the hardware and operating system software.

The duration of the test was planned for six weeks, which was sufficient to show the hardware reliability. Once this six-week period was over, we planned to continue to run the machines in the same environment with the same workload for as long as possible to accumulate further system run-time.
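To illustrate how accumulated run-time relates to demonstrated reliability, the following sketch computes a one-sided lower confidence bound on mean time between failures (MTBF) for a time-terminated test. The paper does not say which statistical method was used; the constant-failure-rate (exponential) model, the 90 percent confidence level, the assumption that all five machines run continuously, and the function names are all assumptions made here for illustration.

```python
import math

def poisson_cdf(r, mean):
    """P(N <= r) for a Poisson variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, r + 1):
        term *= mean / k
        total += term
    return total

def mtbf_lower_bound(total_hours, failures, confidence):
    """One-sided lower confidence bound on MTBF for a time-terminated test.

    Assumes a constant failure rate. Finds, by bisection, the expected
    failure count at which seeing `failures` or fewer has probability
    1 - confidence, then converts that count to an MTBF.
    """
    target = 1.0 - confidence
    lo, hi = 0.0, 1.0
    while poisson_cdf(failures, hi) > target:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if poisson_cdf(failures, mid) > target:
            lo = mid
        else:
            hi = mid
    return total_hours / hi

# Assumed workload: five machines, six weeks, running around the clock.
hours = 5 * 6 * 7 * 24
bound_no_failures = mtbf_lower_bound(hours, failures=0, confidence=0.90)
bound_one_failure = mtbf_lower_bound(hours, failures=1, confidence=0.90)
```

Under these assumptions the five machines accumulate 5040 hours, and a failure-free run supports an MTBF lower bound of roughly 2,190 hours at 90 percent confidence. Each observed failure lowers the bound, which is one reason a longer test period allows a stronger reliability claim.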

The test started at the beginning of May 1989, when enough CPU modules became available to populate the five machines. The formal test period ended two months later, in late June. Three of the machines continued to run for two and a half months, until mid-September. Of the other two machines, one machine was needed for other purposes, and another's CPU modules were removed to change the configurations of the remaining three.

The systems ran identical SITP scripts that concentrated on exercising the new CPUs. Tests included compute-intensive programs, programs that explicitly tested various aspects of the new
