HPGAST: High Performance GA-based Sequential circuits Test generation on Beowulf PC-Cluster


Tepakorn Siriwan and Pradondet Nilagupta, Department of Computer Engineering, Kasetsart University, 50 Pahonyothin Rd., Lardyao, Jatujak, Bangkok 10900, Thailand

Phone: (+662) 9428555 ext. 1403, 1404; Fax: (+662) 5796245; E-mail: [email protected], [email protected]

Abstract

This paper deals with high-performance Automated Test Pattern Generation (ATPG) for sequential circuits under the single stuck-at fault model. We present HPGAST, a parallel genetic algorithm running on a Beowulf PC-Cluster. HPGAST is a parallel version of an existing GA-based ATPG built from existing tools: the PGAPack parallel genetic algorithm library, which evolves candidate test vectors, and the HOPE fault simulator, which computes the fitness of each candidate test vector. HPGAST is tested on ISCAS89 benchmark circuits running on PIRUN, a 72-node PC-Cluster. The experimental results show the highest fault coverage for mutation probabilities of 0.2 and 0.3. As the number of processors increases from 2 to 4, 8, 16 and 32, the speedup increases; from 32 to 64 processors, however, the speedup mostly decreases. Larger circuits achieve higher speedups than smaller circuits.

1. Introduction

The objective of Automated Test Pattern Generation (ATPG) is to find a test sequence that, when applied to the circuit, enables testers to distinguish between the correct circuit and any circuit with a modeled fault. The effectiveness of a test sequence is measured by the fault coverage achieved for that fault model and by the number of generated vectors, which is directly proportional to test application time. ATPG for combinational circuits is relatively easy: all inputs of the combinational part of the circuit (primary inputs and state variables) can be assigned arbitrary values, and the fault effect is observable on any output (circuit outputs and state variables). Test generation for sequential circuits is more complex because the state lines cannot be directly controlled or observed. Simulation-based techniques handle complex component types more easily than deterministic techniques do. In a simulation-based approach, the fault simulator processes the circuit in the forward direction only, no backtracking is required, and various fault models can be accommodated.

The basic principles of GAs were first laid down by Holland [1]. Our goal in this work is to implement simulation-based test generation on the genetic algorithm framework described by Goldberg [2]. A GA maintains a population of strings, or individuals, each of which is a candidate solution represented as a string (chromosome) of elements (genes). Each individual is assigned a fitness value given by a fitness function. The population is initialized with random strings, and the evolutionary operators of selection, crossover, and mutation are used to generate an entirely new population from the existing one.


Parallel GAs are particularly easy to implement and promise substantial gains in performance, so there has been extensive research in this field. The simplest way to parallelize a GA is global parallelization: there is only one population, as in the serial GA, but the evaluation of individuals and the genetic operators are explicitly parallelized. This method is relatively easy to implement, and a significant speedup can be expected if the communication cost does not dominate the computation cost [3]. On a distributed computer, the population can be stored on one processor. This "master" processor is responsible for sending individuals to the other processors (the "slaves") for evaluation, collecting the results, and applying the genetic operators to produce the next generation.

Beowulf clusters originated in the Beowulf Project at NASA [4,5]. A Beowulf PC-Cluster is a cluster of Linux PCs, that is, a supercomputing-class system built from commodity PCs and the Linux operating system, and it is now one of the most widely adopted systems in the high-performance computing research community. In this paper, we propose a prototype named HPGAST (High Performance GA-based Sequential circuit Test generation on Beowulf PC-Cluster). Our work uses PGAPack [6], a parallel genetic algorithm library, to evolve candidate test vectors, the HOPE [7] fault simulator to compute the fitness of each candidate test vector, and a Beowulf PC-cluster to improve the speedup of the ATPG system.

The next section reviews related work on sequential and parallel genetic algorithms for sequential circuit ATPG. The design and implementation of HPGAST are then described, followed by experimental results on the ISCAS89 [8] sequential benchmark circuits. Finally, conclusions are presented.

2. Related work

GAs were first used as a framework for simulation-based test generation in [9,10]. The CRIS test generator [9] uses a logic simulator to evaluate candidate test sequences and a heuristic crossover scheme to incorporate problem-specific knowledge. The resulting test sets often had lower fault coverage. In a newer version of CRIS [11], fault simulation was used to evaluate candidate tests after the easy-to-test faults had been detected. Fault coverage improved for many circuits, but execution time also increased.

GATEST is a genetic algorithm framework for sequential circuit test generation [12,13]. GATEST is organized in two parts: in the first, the GA generates single test vectors, each chosen to increase the value of the test sequence already generated; in the second, the GA generates test sequences. Various GA parameters were studied, including alphabet size, fitness function, generation gap, population size, and mutation rate. The best results were obtained with tournament selection without replacement and uniform crossover. The authors recommend a population size of 16 or 32 to reduce execution time. Non-overlapping populations gave the highest fault coverage.

DIGATE is organized in three phases [14]. The first phase selects as the target fault the one with the maximum activity so far, the second phase aims at activating the target fault, and the third phase looks for a sequence that makes the target fault observable at the circuit Primary Outputs (POs). The main innovation in DIGATE is the pre-computed distinguishing sequence, which propagates a fault effect from a single flip-flop to the POs.

GATTO is a GA-based test generator for large sequential circuits [15,16]. GATTO targeted a single fault at a time, and the approach was later extended to target 64 faults simultaneously [16]. Its fitness function is defined similarly to that of CRIS, but the meanings of the three phases are different; moreover, it optimizes the whole test sequence.


GATTO+ is an enhanced version of GATTO in terms of test length minimization and fault excitation [17].

Parallel genetic algorithms were first applied to sequential circuit test generation in a distributed version of GATTO [18], which uses the computational power of a workstation network. This version implements the distributed genetic algorithm with the PVM library for message passing and process spawning. A master process executes the kernel of the overall algorithm, while a slave process is activated on a remote workstation each time a sequence must be fault-simulated. Several fault simulation processes thus work in parallel in many phases of the algorithm, while communication and synchronization points are reduced. Scalability is good for a small number of slaves but poor for a large number, since the master becomes a bottleneck and the slaves are often idle.

ProperGATEST consists of three parallel genetic algorithms [19] built on the ProperCAD II library [20]. The first algorithm is a parallel version of the sequential algorithm that produces the same result as the sequential algorithm. In the second algorithm, each processor executes the sequential genetic algorithm with a different seed, and migration is used to share information between processors. The third algorithm is a subpopulation-based version of the second, in which subpopulations are distributed across processors and information is migrated from one processor to another. The first algorithm provided significant speedup without degrading the quality of the results. The second algorithm improved the quality of the results and is highly scalable. The third algorithm reduces the workload on the processors and exploits the benefits of its randomized migration strategy.

2.1 HOPE Modification

HOPE is a fault simulator for synchronous sequential circuits [7]. It was developed in the Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University. It uses the parallel fault simulation technique together with several heuristics that reduce parallel fault simulation time. HOPE is based on an earlier fault simulator, PROOFS, and adds three new techniques that substantially speed up parallel fault simulation: first, reducing the number of faults simulated in parallel by mapping non-stem faults to stem faults; second, a new fault injection method; and third, a combination of static and dynamic fault ordering.

We use HOPE to evaluate the fitness of each candidate test. During fault simulation, the good and faulty circuit states are updated after each vector is simulated, so we store these states before a candidate test is applied and restore them afterwards. For each candidate test we record the number of faults detected and the number of faults whose effects propagate to flip-flops.
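The store/evaluate/restore step can be illustrated with a toy model. This is a minimal sketch, not HOPE's interface: it assumes a hypothetical circuit model in which the good circuit and each faulty circuit are Python functions mapping (state, input vector) to (next state, outputs). Because the states are immutable tuples, scoring a candidate never modifies the caller's saved states, which plays the role of storing and restoring them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Bits = Tuple[int, ...]
# Hypothetical circuit model (not HOPE's API): (state, vector) -> (next_state, outputs).
StepFn = Callable[[Bits, Bits], Tuple[Bits, Bits]]

@dataclass
class CandidateScore:
    detected: int    # faults whose outputs differ from the good outputs
    propagated: int  # faults whose next state differs from the good next state

def score_candidate(good: StepFn, faulty: List[StepFn], good_state: Bits,
                    faulty_states: List[Bits], vector: Bits) -> CandidateScore:
    """Score one candidate vector against the saved good and faulty states.

    The saved states are immutable tuples and are never reassigned here, so
    they are effectively restored once the candidate has been evaluated.
    """
    g_next, g_out = good(good_state, vector)
    detected = propagated = 0
    for step, state in zip(faulty, faulty_states):
        f_next, f_out = step(state, vector)
        if f_out != g_out:
            detected += 1
        if f_next != g_next:
            propagated += 1
    return CandidateScore(detected, propagated)
```

Only the vector the GA finally selects would be simulated "for real", advancing the committed states.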

2.2 PGAPack

PGAPack is a public-domain software package developed by researchers in the Mathematics and Computer Science (MCS) Division at Argonne National Laboratory [6]. It is a parallel genetic algorithm library intended to provide most of the capabilities desired in a genetic algorithm package in an integrated, seamless, and portable manner. It supports parallel and sequential implementations of the single-population global model (GM), with the parallel version based on MPI message passing. The parallel implementation uses a master/slave algorithm: the master process executes all steps of the genetic algorithm except the function evaluations, which are executed by the slave processes.


The parallel implementation of the GM produces the same result as the sequential implementation, usually faster. If two processes are used, both the master process and the slave process compute function evaluations. If more than two processes are used, the master is responsible for bookkeeping only, and the slaves execute the function evaluations. The speedup is limited by the number of function evaluations that can be executed in parallel, which depends on the population size and the number of new strings created in each generation.
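The limit described above can be made concrete with a back-of-the-envelope model. This is a sketch under stated assumptions, not a measurement of HPGAST: every new string is assumed to cost the same evaluation time, communication is treated as free, and the two-process/many-process rule is taken from the text above. The n new strings of a generation then take ceil(n / workers) evaluation rounds instead of n.

```python
import math

def evaluating_processes(processes: int) -> int:
    """Processes that run function evaluations, following the rule in the text."""
    if processes <= 2:
        return processes      # 1: sequential run; 2: master and slave both evaluate
    return processes - 1      # >2: the master only does bookkeeping

def ideal_speedup(new_strings: int, processes: int) -> float:
    """Idealized speedup of the evaluation step of one generation.

    Assumes equal evaluation cost per string and zero communication cost.
    """
    workers = evaluating_processes(processes)
    rounds = math.ceil(new_strings / workers)   # evaluation rounds per generation
    return new_strings / rounds
```

For example, with a hypothetical 10 new strings per generation, `ideal_speedup(10, 11)` and `ideal_speedup(10, 21)` both give 10.0: once every new string has its own slave, additional processes stay idle.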

2.3 PIRUN Beowulf PC-Cluster [21,22]

We implemented our parallel ATPG on the PIRUN (Pile of Inexpensive and Redundant Universal Nodes) Beowulf PC-Cluster, which belongs to the SPMD class of parallel computers. Both NFS and message-passing traffic can be configured to pass through its full-duplex routes, depending on the application being run.

PIRUN has three main types of nodes:

• CSN (Computing Service Nodes): nodes that users log on to in order to run their work. There are 72 diskless CSNs, each a 500 MHz Pentium III with 128 MB of memory.

• FSN (File Server Nodes): serve as a central file system for the CSNs. There are 3 FSNs, each a 500 MHz Pentium III Xeon with 512 MB of memory and 54 GB of Ultra2 SCSI hard disks in RAID (6 x 9 GB). The FSNs have 162 GB of disk space in total.

• SMN (System Management Node): a 500 MHz Pentium III, the same as a CSN but with a local hard disk; it is used for management purposes.

PIRUN is interconnected by a full-duplex 100 Mbps Ethernet switch for the message-passing network and a 100 Mbps Ethernet hub for NFS. It runs Red Hat Linux 6.1 as the operating system and MPICH 1.1.2 for parallel programming support.

3. Design and Implementation

Considering the structure of a sequential GA-based ATPG system, we follow the master-slave programming model to implement the parallel genetic algorithm. We use the PGAPack parallel genetic algorithm library to generate candidate test vectors and HOPE, a synchronous sequential circuit fault simulator, for fault simulation.

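3.1 The Genetic Algorithm for ATPG

Figure 1: Individual Test Vector (flowchart from Begin to End: fitness evaluation and state update, repeated while progress is made)

Figure 2: HPGAST system model (a master running the GA and adding vectors to the test sequence, with a HOPE fault simulator on each slave)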


HPGAST test generation is illustrated in Figures 1 and 2. Test vectors are generated repeatedly until no more progress is made. Each test vector is generated by the GA starting from a random initial population. The HOPE sequential circuit fault simulator evaluates the fitness of each candidate test vector, and the best vector evolved in any generation is selected and added to the test sequence.

Our work focuses on speeding up execution by running the fault simulator on each slave processor: candidate test vectors are sent to the slave processors and evaluated in parallel.

HPGAST exploits the computational power of a Beowulf PC-cluster by distributing the fault simulation task among the available processors. A master process handles the genetic operations and the overall algorithm, while a slave process is activated each time a fault simulation is needed. Communication and synchronization time are thus reduced.

The master process does the following tasks:

• Performs all file-system I/O: reads the netlist and stores the generated test sequence.

• Spawns several slave processes.

• Initially distributes a copy of the netlist, in internal format, and the fault list to each slave process.

• Executes the algorithm: sends individuals to the slave processors, waits for their fitness values, and updates the global data model.

Each slave process runs a fault simulator, with both the netlist and the fault list stored in its local memory. The slaves compute fitness values, return the results to the master, and wait for a new job.
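The master/slave division of work above can be sketched with Python's `concurrent.futures`, using worker threads as stand-ins for the MPI slave processes. This is an illustration under assumptions, not the PGAPack/MPI code HPGAST uses: `simulate_fitness` is a hypothetical placeholder for a HOPE run, the netlist name is made up, and the netlist and fault list are handed to each worker once through the pool initializer to mirror the one-time distribution step.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]

# Per-worker storage: each worker thread stands in for one slave's local memory.
_slave = threading.local()

def slave_init(netlist: str, faults: Tuple[str, ...]) -> None:
    """Runs once in each worker: keep a private copy of netlist and fault list."""
    _slave.netlist = netlist
    _slave.faults = list(faults)

def simulate_fitness(vector: Vector) -> float:
    """Hypothetical placeholder for one HOPE fault-simulation run.

    A real slave would fault-simulate `vector` on its own netlist copy; this
    toy version just scales the number of 1 bits by the fault count.
    """
    return float(sum(vector) * len(_slave.faults))

def master_evaluate(population: Sequence[Vector], netlist: str,
                    faults: Tuple[str, ...], n_slaves: int) -> List[float]:
    """Master side: distribute the data once, farm out individuals, gather fitness.

    For brevity the pool lives for one call; a real master would keep the
    slaves alive across all generations.
    """
    with ThreadPoolExecutor(max_workers=n_slaves, initializer=slave_init,
                            initargs=(netlist, faults)) as pool:
        # map() returns results in the same order as the population.
        return list(pool.map(simulate_fitness, population))
```

Keeping the netlist and fault list resident on each worker means that only the short candidate vectors and their fitness values travel between master and slaves.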

3.2 Problem encoding

A binary encoding is used for each individual test vector: each character of a chromosome is mapped to a primary input, as shown in Figure 3.

Figure 3: Problem Encoding (each bit of a GA individual, e.g. 1 0 1 0, is applied to one primary input of the sequential circuit)
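The mapping in Figure 3 amounts to reading a chromosome bit by bit onto the primary inputs. The sketch below assumes the chromosome is a list of 0/1 genes in the same order as the circuit's primary-input list; the input names used in the example are hypothetical.

```python
import random
from typing import Dict, List

def decode(chromosome: List[int], primary_inputs: List[str]) -> Dict[str, int]:
    """Map gene i of a binary chromosome to primary input i."""
    if len(chromosome) != len(primary_inputs):
        raise ValueError("one gene per primary input is required")
    if any(g not in (0, 1) for g in chromosome):
        raise ValueError("binary encoding: genes must be 0 or 1")
    return dict(zip(primary_inputs, chromosome))

def random_individual(n_inputs: int, rng: random.Random) -> List[int]:
    """Random individual, as used for the GA's random initial population."""
    return [rng.randint(0, 1) for _ in range(n_inputs)]
```

For the individual in Figure 3, `decode([1, 0, 1, 0], ["G0", "G1", "G2", "G3"])` returns `{"G0": 1, "G1": 0, "G2": 1, "G3": 0}`.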

3.3 Fitness function

The fitness of a candidate test is calculated using the fitness functions of GATEST [12,13], one for each phase:

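Phase 1:

$$F_1 = \#\text{flip-flops set} + \frac{\#\text{flip-flops changed}}{\#\text{flip-flops}} \tag{1}$$

Phase 2:

$$F_2 = \#\text{faults detected} + \frac{\#\text{faults propagated to flip-flops}}{(\#\text{faults})(\#\text{flip-flops})} \tag{2}$$

Phase 3:

$$F_3 = \#\text{faults detected} + \frac{\#\text{faults propagated to flip-flops}}{(\#\text{faults})(\#\text{flip-flops})} + \frac{2\,(\#\text{good and faulty circuit activity})}{(\#\text{faults})(\#\text{circuit nodes})} \tag{3}$$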
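The three fitness functions translate directly into code. The sketch assumes the counts are already available from a fault-simulation pass; the argument names are ours, and the Phase 3 activity term keeps the factor of 2 that appears in Eq. (3). Since at most #faults faults can propagate, the second term of Eq. (2) is at most 1/#flip-flops, so it mainly separates vectors that detect the same number of faults.

```python
def fitness_phase1(ffs_set: int, ffs_changed: int, n_ffs: int) -> float:
    """Eq. (1): flip-flops set to a known value plus fraction of flip-flops changed."""
    return ffs_set + ffs_changed / n_ffs

def fitness_phase2(detected: int, propagated: int, n_faults: int, n_ffs: int) -> float:
    """Eq. (2): faults detected, with propagation to flip-flops as a tie-breaker."""
    return detected + propagated / (n_faults * n_ffs)

def fitness_phase3(detected: int, propagated: int, activity: int,
                   n_faults: int, n_ffs: int, n_nodes: int) -> float:
    """Eq. (3): the Phase 2 fitness plus good/faulty circuit activity."""
    return (fitness_phase2(detected, propagated, n_faults, n_ffs)
            + 2 * activity / (n_faults * n_nodes))
```

For instance, with 8 faults and 4 flip-flops, a vector detecting 6 faults always outscores one detecting 5 faults, even if all 8 faults of the latter reach flip-flops.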


The objective of Phase 1 is to initialize the flip-flops. The fitness of a candidate vector is therefore the number of flip-flops set to a known (zero or one) state. We also include the fraction of flip-flops whose values changed since the previous time frame, to differentiate test vectors that set the same number of flip-flops.

The test generator switches to Phase 2 when all flip-flops are set. In this phase, test vectors are generated to maximize the number of faults detected, and the fitness of a candidate vector indicates the number of faults it detects. To differentiate vectors that detect the same number of faults, we include the number of fault effects propagated to flip-flops in the fitness function, scaled down by the number of faults simulated and the number of flip-flops.

When a test vector cannot detect any additional fault, the test generator switches to Phase 3 and begins counting noncontributing test vectors. The objective of this phase is to find hard-to-detect faults. We add the good and faulty circuit activity levels to the two measures used in Phase 2. If a test vector that detects any fault is found before the number of noncontributing vectors reaches the progress limit, the test generator goes back to Phase 2 and the noncontributing vector count is reset to zero.
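The phase-switching rules above form a small state machine. A minimal sketch, assuming the controller is told after each generated vector whether all flip-flops are now initialized and how many new faults that vector detected. The value of the progress limit is not given in the text, so it is a parameter here, and counting the vector that triggers the switch to Phase 3 as the first noncontributing one is our assumption.

```python
from dataclasses import dataclass

@dataclass
class PhaseController:
    progress_limit: int       # noncontributing vectors allowed in Phase 3
    phase: int = 1
    noncontributing: int = 0
    done: bool = False

    def update(self, all_ffs_set: bool, new_detections: int) -> None:
        """Apply the phase-switching rules after one test vector is generated."""
        if self.phase == 1:
            if all_ffs_set:
                self.phase = 2                 # flip-flops initialized
        elif self.phase == 2:
            if new_detections == 0:
                self.phase = 3                 # no additional fault: hunt hard faults
                self.noncontributing = 1       # assumption: this vector counts
        else:
            if new_detections > 0:
                self.phase = 2                 # progress made: back to Phase 2
                self.noncontributing = 0
            else:
                self.noncontributing += 1
                if self.noncontributing >= self.progress_limit:
                    self.done = True           # no more progress: stop generating
```

Test generation would keep calling `update` with the outcome of each new vector until `done` becomes true.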

3.4 GA parameters

Various GA parameters are important in achieving good results. Given a sufficient population size and number of generations, a test vector can be found, but execution time is directly proportional to both parameters. By default, we generate test vectors with a random seed for the GA, a population size of 32, a maximum of 600 generations, tournament selection without replacement, and uniform crossover. We use a crossover probability of one; i.e., two individuals are always crossed to generate two new individuals. Mutation is used to prevent the loss of key characters at the various string positions. We varied the mutation probability from 0.1 to 1.0 in steps of 0.1 to find the best mutation probability for each circuit.
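The default operators named above can be sketched for binary strings. These are textbook versions written for illustration, not PGAPack's implementation: tournament selection without replacement is done as binary tournaments over shuffled passes of the population, uniform crossover swaps each gene with probability 0.5, and mutation flips each bit independently with probability `p_mut`. Applying the mutation probability per bit, and replacing the whole population each generation, are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Genome = List[int]

def tournament_without_replacement(fitness: Sequence[float],
                                   rng: random.Random) -> List[int]:
    """Binary tournaments over shuffled passes; returns len(fitness) parent indices.

    Each pass shuffles all indices and pairs them off, so every individual
    competes once per pass (selection without replacement).
    """
    n = len(fitness)
    if n < 2:
        raise ValueError("need at least two individuals")
    winners: List[int] = []
    while len(winners) < n:
        order = list(range(n))
        rng.shuffle(order)
        for a, b in zip(order[0::2], order[1::2]):
            winners.append(a if fitness[a] >= fitness[b] else b)
    return winners[:n]

def uniform_crossover(p1: Genome, p2: Genome,
                      rng: random.Random) -> Tuple[Genome, Genome]:
    """Swap each gene between the two children with probability 0.5."""
    c1, c2 = p1[:], p2[:]
    for i in range(len(c1)):
        if rng.random() < 0.5:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2

def mutate(genome: Genome, p_mut: float, rng: random.Random) -> Genome:
    """Flip each bit independently with probability p_mut (assumed per-bit rate)."""
    return [1 - g if rng.random() < p_mut else g for g in genome]

def next_generation(pop: List[Genome], fitness: Sequence[float],
                    p_mut: float, rng: random.Random) -> List[Genome]:
    """Crossover probability 1 with full (non-overlapping) replacement.

    Assumes an even population size, such as the default of 32.
    """
    parents = tournament_without_replacement(fitness, rng)
    children: List[Genome] = []
    for i in range(0, len(parents) - 1, 2):
        c1, c2 = uniform_crossover(pop[parents[i]], pop[parents[i + 1]], rng)
        children += [mutate(c1, p_mut, rng), mutate(c2, p_mut, rng)]
    return children
```

With binary tournaments run in two shuffled passes, the best individual wins exactly twice and the worst never wins, which keeps selection pressure moderate at a population size of 32.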

4. Experimental Results

HPGAST was implemented around the HOPE sequential circuit fault simulator [7] and the PGAPack parallel genetic algorithm library [6]. Tests were generated for the ISCAS89 [8] sequential benchmark circuits on the PIRUN Beowulf PC-Cluster [21,22].

In the single-processor experiment, we used HPGAST to generate test patterns on a single processor. We fixed the number of generations at 600 to limit the execution time, took the position of the last detecting vector as the test sequence length, and ran each circuit five times. The results in Table 1 are averages of the five runs, with a new random GA seed for each run.

We also investigated the effect of mutation probability on fault coverage. Table 1 shows results averaged over five runs for the various mutation rates used during test generation, with tournament selection without replacement and uniform crossover. The highest fault coverage is obtained with a mutation probability of 0.2 for S386 and S526, and of 0.3 for S298 and S641.

Table 2 shows the characteristics of the ISCAS89 benchmark circuits and the results of HPGAST. The numbers of PIs and gates shown exclude POs and fan-outs. The numbers of testable faults are taken from [23]. PIs is the number of primary inputs of the circuit, Gate is the number of gates in the circuit, Seq. Depth is the sequential depth of the circuit as reported by the fault simulator, and Faults is the number of collapsed faults. The number of faults detected and the vector length of GATEST [13] are included for comparison, and the highest fault coverage achieved is highlighted in bold.


Table 1: Various Mutation Probability Results (each cell: #Fault Detect / Vector Length)

Mutation Prob.   S298        S386        S526        S641
0.1              243 / 129   282 / 232   426 / 470   400 / 87
0.2              256 / 321   300 / 213   427 / 456   400 / 124
0.3              264 / 478   296 / 176   379 / 526   404 / 124
0.4              263 / 263   291 / 248   346 / 482   400 / 102
0.5              260 / 157   297 / 383   421 / 505   399 / 125
0.6              248 / 121   292 / 342    72 / 35    398 / 126
0.7              255 / 497   292 / 361    66 / 30    399 / 91
0.8              251 / 494   281 / 267    73 / 135   397 / 115
0.9              260 / 135   289 / 421    68 / 489   397 / 129
1.0              205 / 257   167 / 19     80 / 490   288 / 24

Table 2: HPGAST Results

                                            GATEST                       HPGAST
Circuit  PIs  Gate  Seq. Depth  Faults      #Faults Detect  Vector Length   #Faults Detect  Vector Length
S298     3    119   9           308         264             161             264             478
S386     7    159   11          384         295             154             300             213
S526     3    193   9           555         416             281             427             456
S641     35   379   74          467         404             139             404             124

In the parallel experiment, we increased the number of processors through 2, 4, 8, 16, 32 and 64. We fixed the number of generations at 600 to compare execution times across processor counts. Table 3 shows the average execution time in minutes over five runs.

The fault coverage and test vector lengths of the generated test sequences are mostly the same as in the single-processor experiment, because the parallel implementation of the GM produces the same result as the sequential implementation.

Table 3 shows the execution times of HPGAST for the ISCAS89 sequential benchmark circuits with parallel processing. As the number of processors increases from 2 to 4, 8, 16 and 32, the execution times decrease (S298 is unchanged between 2 and 4 processors). From 32 to 64 processors, the execution times increase for S298 and S641, stay the same for S386, and decrease only for S526.

Table 3: HPGAST parallel execution time (minutes) vs. number of processors

Circuit   1       2       4      8      16     32     64
S298      9.47    4.70    4.70   2.02   1.43   1.07   1.20
S386      3.53    1.88    1.80   0.95   0.72   0.33   0.33
S526      21.73   18.53   9.30   3.68   2.65   1.73   1.53
S641      31.55   19.32   7.42   3.12   2.17   1.28   1.33


The speedup is calculated as

$$\text{Speedup} = \frac{\text{Sequential execution time}}{\text{Parallel execution time}} \tag{4}$$
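As a check on Eq. (4), the sketch below recomputes speedups from the S298 execution times in Table 3. Because the tabulated times are rounded to two decimals, the recomputed ratios can differ from Table 4 in the last digit (for example, 9.47/1.07 rounds to 8.85, against 8.87 in Table 4).

```python
from typing import Dict

# S298 average execution times in minutes, copied from Table 3.
S298_MINUTES: Dict[int, float] = {1: 9.47, 2: 4.70, 4: 4.70, 8: 2.02,
                                  16: 1.43, 32: 1.07, 64: 1.20}

def speedups(times: Dict[int, float]) -> Dict[int, float]:
    """Eq. (4): sequential time divided by parallel time, per processor count."""
    t_seq = times[1]
    return {p: round(t_seq / t, 2) for p, t in times.items()}
```

`speedups(S298_MINUTES)` reproduces the drop from 32 to 64 processors (8.85 versus 7.89) reported for S298.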

Table 4: Speedup of HPGAST vs. number of processors

Circuit   1   2      4      8       16      32      64
S298      1   2.01   2.01   4.69    6.61    8.87    7.89
S386      1   1.88   1.96   3.72    4.93    10.61   10.61
S526      1   1.17   2.34   5.90    8.20    12.54   14.18
S641      1   1.63   4.25   10.12   14.56   24.59   23.67

Table 4 shows the speedup of HPGAST on the ISCAS89 benchmark circuits for 2, 4, 8, 16, 32 and 64 processors. In the parallel runs, the faults detected and the vector lengths are mostly the same as in the single-processor runs, while the speedup increases.

As the number of processors increases from 2 to 4, 8, 16 and 32, the speedup increases. From 32 to 64 processors, however, the speedup decreases for S298 and S641, is unchanged for S386, and increases only for S526. The speedup for the benchmark circuits from 2 to 64 processors is shown in Figure 4. Larger circuits achieve higher speedups than smaller ones: the maximum speedup grows from 8.87 for S298 (119 gates) to 24.59 for S641 (379 gates).

Figure 4: Speedup of HPGAST (speedup vs. number of processors for S298, S386, S526 and S641)

5. Conclusions

The HPGAST test generator was developed for sequential circuit test generation in the PIRUN Beowulf PC-cluster environment.

In the single-processor experiment, HPGAST achieves the highest fault coverage with mutation probabilities of 0.2 and 0.3 on the sampled ISCAS89 sequential benchmark circuits. In the parallel experiment, the speedup increases as the number of processors increases from 2 to 4, 8, 16 and 32, but from 32 to 64 processors the speedup mostly decreases. The speedups measured from 2 to 64 processors show that larger circuits achieve higher speedups than smaller circuits.

In future work, we will continue to improve ATPG performance in terms of fault coverage, test vector length, and execution time, and will report these and other results in more detail.

References

[1] J.H. Holland, “Adaptation in Natural and Artificial Systems”, MIT Press, 1975.

[2] D.E. Goldberg, “Genetic Algorithms in Search, Optimization, and Machine Learning”, Addison-Wesley, 1989.

[3] E. Cantu-Paz, “A Survey of Parallel Genetic Algorithms”, IlliGAL Report, 1997.

[4] T. Sterling, D. J. Becker, D. Savarese, J. E. Dorband, U. A. Ranawake, and C. E. Paker, “Beowulf: A Parallel Workstation for Scientific Computation”, Proc. ICPP, 1995.

[5] D. Ridge, T. Sterling, D. J. Becker, and P. Merkey, “Beowulf: A Parallel Work Station for Scientific Computation”, Proc. IEEE Aerospace Conf., 1997.

[6] D. Levine, “User Guide to the PGAPack Parallel Genetic Algorithm Library”, Argonne National Laboratory, Jan. 1996.

[7] H. K. Lee and D. S. Ha, “HOPE: An Efficient Parallel Fault Simulator for Synchronous Sequential Circuits”, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, Sep. 1996, 1048-1058.

[8] F. Brglez, D. Bryan, and K. Kozminski, “Combinational profiles of sequential benchmark circuits”, Proc. Int. Symp. Circuit System, May 1989, 1929-1934.

[9] D. G. Saab, Y. G. Saab, and J. A. Abraham, “CRIS: A test cultivation program for sequential VLSI circuits”, Proc. Int. Conf. Computer-Aided Design, Nov. 1992, 216-219.

[10] M. Srinivas and L. M. Patnaik, “A simulation-based test generation scheme using genetic algorithms”, Proc. Int. Conf. VLSI Design, Jan. 1993, 132-135.

[11] D. G. Saab, Y. G. Saab, and J. A. Abraham, “Automatic test vector cultivation for sequential VLSI circuits using genetic algorithms”, IEEE Trans. Computer-Aided Design, vol. 15, Oct. 1996, 1278-1285.

[12] E. M. Rudnick, J. H. Patel, G. S. Greenstein, and T. M. Niermann, “Sequential Circuit Test Generation in a Genetic Algorithm Framework”, Proc. ACM/IEEE Design Automation Conf., Jun. 1994, 698-704.


[13] E. M. Rudnick, J. H. Patel, G. S. Greenstein, and T. M. Niermann, “A Genetic Algorithm Framework for Test Generation”, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, Sep. 1997, 1034-1044.

[14] M. S. Hsiao, E. M. Rudnick, and J. H. Patel, “Automatic Test Generation using Genetically-Engineered Distinguishing Sequences”, Proc. IEEE VLSI Test Symp., 1996, 216-223.

[15] P. Prinetto, M. Rebaudengo, and M. Sonza Reorda, “An automatic test pattern generator for large sequential circuits based on genetic algorithms”, Proc. Int. Test Conf., Oct. 1994, 240-249.

[16] F. Corno, P. Prinetto, M. Rebaudengo, and M. Sonza Reorda, “GATTO: A genetic algorithm for automatic test pattern generation for large synchronous sequential circuits”, IEEE Trans. Computer-Aided Design, vol. 15, Aug. 1996, 991-1000.

[17] F. Corno, P. Prinetto, M. Rebaudengo, M. S. Reorda, and R. Mosca, “Advanced Techniques for GA-based sequential ATPGs”, IEEE Design & Test Conf., Mar. 1996.

[18] P. Prinetto, M. Rebaudengo, M.S. Reorda, and E. Veiluva, “GATTO: an Intelligent Tool for Automatic Test Pattern Generation For Digital Circuits”, IEEE Int. Conf. on Tools with Artificial Intelligence, Nov. 1994.

[19] D. Krishnaswamy, M. S. Hsiao, V. Saxena, E. M. Rudnick, J. H. Patel, and P. Banerjee, “Parallel Genetic Algorithms for Simulation-Based Sequential Circuit Test Generation”, IEEE VLSI Design Conf., 1997, 475-481.

[20] S. Parkes, J. A. Chandy, and P. Banerjee, “A library-based approach to portable, parallel, object-oriented programming: Interface, implementation and application”, Proc. Supercomputing’94, 1994, 69-78.

[21] P. Uthayopas, S. Sanguanpong, and Y. Poovarawan, “Building a Large Beowulf Cluster System: PIRUN Experience”, Proc. of the 4th ANSCSE, Mar. 2000.

[22] P. Uthayopas, S. Sanguanpong, and Y. Poovarawan, “Building a Large Scale Internet Superserver for Academic Services with Linux Cluster Technology”, International Workshop on Asia Pacific Advanced Network and Its Application (IWS-2000), Tsukuba, Japan, Feb. 2000, 65-71.

[23] J.A. Waicukauski, P.A. Shupe, D. J. Giramma, and A. Matin, “ATPG for ultra-large structured designs”, Proc. Int. Test Conf., Sep. 1990, 44-51.