A DoE/RSM-based Strategy for an Efficient
Design Space Exploration targeted to CMPs
Gianluca Palermo, Cristina Silvano, Vittorio Zaccaria
Politecnico di Milano - Dipartimento di Elettronica e Informazione
E-mail:{gpalermo, silvano, zaccaria}@elet.polimi.it
Abstract. Application-specific MPSoCs are usually designed by using a platform-based approach, where a wide range of customizable parameters must be tuned to find the best trade-offs in terms of the selected figures of merit (such as energy, delay and area). This optimization phase is called Design Space Exploration (DSE) and it generally consists of a Multi-Objective Optimization (MOO) problem with multiple constraints. In this paper, an efficient DSE methodology for application-specific MPSoCs is presented. The methodology is efficient because it determines a suitable set of candidate architectures with as few system simulations as possible, combining Design of Experiments (DoE) and Response Surface Modeling (RSM) strategies.
1 Introduction
Customizable MPSoCs supported by parallel programming represent an emerging computing paradigm for application-specific processors. In fact, they represent the best compromise in terms of a stable hardware platform which is software programmable, thus customizable, upgradeable and extensible. In this sense, the MPSoC paradigm minimizes the risk of missing the time-to-market deadline while allowing for greater efficiency due to architecture customization and software compilation techniques.
For these architectures, the platform-based design approach [1] is widely used to design application-specific architectures that meet time-to-market constraints. In this scenario, configurable simulation models are used to accurately tune the on-chip architectures and to meet the target application requirements in terms of performance, battery lifetime and area. The Design Space Exploration (DSE) phase is used to tune the configurable system parameters, and it generally consists of a multi-objective optimization problem. The DSE problem requires exploring a large design space spanned by several parameters at the system and micro-architectural levels. Although several heuristic techniques have been proposed to address this problem so far, they are all characterized by low efficiency in identifying the Pareto front of feasible solutions. Evolutionary or sensitivity-based algorithms are among the most notable state-of-the-art techniques [2–4].
2 An application-specific DSE methodology
In this paper, we present an application-specific design space exploration strategy leveraging Design of Experiments (DoE) and Response Surface Modeling (RSM) techniques. Once the objective functions associated with the system have been identified, the proposed methodology allows the efficient identification of an approximate Pareto set of candidate architectures by evaluating as few system configurations as possible. This is a notable achievement since, nowadays, evaluating the objective function f(x) of a single system configuration x (be it performance or power consumption) means hours or days of simulation under a realistic workload for complex SoCs.
DESIGN OF EXPERIMENTS. The term Design of Experiments (DoE) [5] identifies the planning of an information-gathering experimentation campaign in which a set of variable parameters can be tuned. In this paper, we define an experiment as an actual simulation of the target system. The reason for using DoEs is that very often the designer is interested in the effects of tuning some parameters on the system response. Design of experiments is a discipline with very broad application across the natural and social sciences, and it encompasses a set of techniques whose main goal is the screening and analysis of the system behavior with a small number of simulations. Each DoE plan differs in the layout of the selected design points in the design space. Although several designs of experiments have been proposed in the literature so far, we use here the most traditional ones, which we leverage in the construction of our efficient design space exploration methodology:
– Random. In this case, design space configurations are picked at random according to a Probability Density Function (PDF). In our methodology, we use a uniformly distributed PDF.
– Full factorial. In statistics, a factorial experiment [5] is an experiment whose design consists of two or more parameters, each with discrete possible values or "levels", and whose experimental units take on all possible combinations of these levels across all such parameters. Such an experiment allows studying the effects of each parameter on the response variable, as well as the effects of interactions between parameters on the response variable. In this paper, we consider a 2-level full factorial DoE, where the only levels considered are the minimum and maximum for each parameter.
– Central composite design. A Central Composite Design [5] is an experimental design specifically targeted to the construction of response surfaces of the second order (quadratic) without requiring a three-level factorial DoE.
– Box-Behnken. The Box-Behnken design [5] is suitable for quadratic models where parameter combinations are at the center of the edges of the process space, plus a design with all the parameters at the center. The primary advantage is that the parameter combinations avoid extreme values taken at the same time (in contrast with the central composite design).
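As an illustration of the first two plans in the list above, the following minimal Python sketch generates a uniform random DoE and a 2-level full factorial DoE. The parameter names and value lists in SPACE are hypothetical and chosen only for the example; they are not the design space used in the paper.

```python
import itertools
import random

# Hypothetical parameter space (names and levels are illustrative only).
SPACE = {
    "processors": [2, 4, 8, 16],
    "issue_width": [1, 2, 4, 8],
    "l1d_size_kb": [2, 4, 8, 16],
}

def random_doe(space, n, seed=0):
    """Pick n configurations, drawing every parameter level uniformly."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n)]

def full_factorial_2level(space):
    """All combinations of the minimum and maximum level of each parameter."""
    names = list(space)
    extremes = [(min(space[k]), max(space[k])) for k in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*extremes)]

plan = full_factorial_2level(SPACE)
print(len(plan))  # 2**3 = 8 configurations for three parameters
```

Note that the random plan may contain repeated configurations; a practical implementation would probably deduplicate them before simulating.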
RESPONSE SURFACE METHODS. Response Surface Modeling techniques allow determining an analytical dependence between several design parameters and one or more response variables. The working principle of RSM is to use a set of simulations generated by a DoE in order to obtain a response model. A typical RSM flow involves a training phase, in which known data (the training set) is used to identify the RSM configuration, and a prediction phase, in which the RSM is used to forecast the unknown system response. RSMs are an effective tool for analytically predicting the behavior of the system platform without resorting to a system simulation; they represent the core of the presented methodology. The RSM models used in the presented methodology are:
– Linear regression. Linear regression is a regression method that models a linear relationship between a dependent response function $f$ and some independent variables $x_i$, $i = 1, \dots, p$, plus a random term $\varepsilon$. In this work we apply regression by also taking into account the interactions between parameters as well as the quadratic behavior with respect to a single parameter.
– Shepard's interpolation. Shepard's technique is a well-known method for multivariate interpolation. It is also called the inverse distance weighting (IDW) method because the value of the response function at unknown points is the sum of the values of the response function at known points, weighted with the inverse of the distance.
– Artificial Neural Networks. Artificial neural networks (ANNs) [6] represent a powerful and flexible method for generalized non-linear regression. The ANN approximation function $f$ is defined, recursively, as a function of other, linearly combined functions $f_i$:

\[ f(x) = \Theta\left( \sum_i w_i f_i(x) \right) \qquad (1) \]

The function $\Theta$ is called the activation function, while the coefficients $w_i$ are called weights. The functions $f_i$ can be recursively defined as in Equation 1 in order to create a layered structure.
– Radial Basis Functions. Radial basis functions (RBF) are a widely used interpolation/approximation model [7]. The interpolation function is built on a set of training configurations $x_j$ as follows:

\[ f(x) = \sum_{j=1}^{n} \lambda_j \, \phi(\lVert x - x_j \rVert) \qquad (2) \]

where $\phi$ is a scalar distance function, $\lambda_j$ are the weights of the RBF and $n$ is the number of samples in the training set.
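To make the training/prediction split concrete, here is a minimal sketch of Shepard's interpolation from the list above. It assumes Euclidean distance, normalized weights and a distance exponent of 2; none of these choices is stated in the text, and the training data are made up for the example.

```python
import math

def shepard_predict(x, training, power=2.0):
    """Inverse-distance-weighted prediction of the response at point x.

    training: list of (point, response) pairs, i.e. the known simulations.
    power: exponent applied to the distance (assumed value, not from the paper).
    """
    num = 0.0
    den = 0.0
    for xj, fj in training:
        d = math.dist(x, xj)
        if d == 0.0:
            return fj  # x is a training point: return the known response
        w = 1.0 / d ** power
        num += w * fj
        den += w
    return num / den  # weights are normalized so that they sum to 1

# Made-up training set: three simulated configurations and their responses.
training = [((0.0, 0.0), 1.0), ((1.0, 0.0), 3.0), ((0.0, 1.0), 5.0)]
print(shepard_predict((0.0, 0.0), training))  # 1.0, a known point
print(shepard_predict((0.5, 0.5), training))  # a prediction at an unknown point
```

Here "training" amounts to storing the known points; the other models in the list (regression, ANN, RBF) would instead fit coefficients or weights at that stage.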
THE PROPOSED DESIGN FLOW. The proposed strategy is called Response Surface-based Pareto Iterative Refinement (ReSPIR). It is based on the iterative refinement of the approximate Pareto set by using the predictions given by the RSM model. The methodology is parametric in terms of the DoE and RSM techniques, as well as the maximum number of simulations to be run (see Algorithm 1).
Initially (step 2), the DoE plan is used to pick the set of initial configurations corresponding to the plan of simulations to be run. This step provides an initial coarse view of the target design space by running the simulations to obtain the actual measurements f associated with F0. In the successive steps, F0 represents the archive containing significant information about all the architectural configurations simulated so far.
At the first iteration, provided that the maxnsim value is greater than the DoE size, the condition in step 5 is met and the while loop body is entered. The RSM technique (step 7) is thus trained with the current archive F0. The response surface model generates a prediction archive R0, which is then filtered for Pareto configurations in step 8. Next (step 9), the simulations associated with the Pareto set R1 are run; the result is put into the intermediate archive F1 and the coverage of F1 with respect to F0 is computed (step 10). The loop repeats until the coverage value reaches 0 or the number of simulations has reached the maximum maxnsim. In the case of reiteration, the freshly generated Pareto points in F1 are merged with F0 to improve the prediction accuracy of the RSM. Finally (step 12), F0 is Pareto filtered by pruning all the non-feasible configurations; the resulting archive is the approximate solution to our Design Space Exploration problem.

Algorithm 1 The RSM-Supported Iterative Pareto Refinement Design Space Exploration Flow
Require: DOE, RSM, maxnsim
 1: nsim = 0
 2: Generate and run the simulations from DOE. Update nsim accordingly. Put results into F0.
 3: cov = 100%
 4: F1 = {}
 5: while (cov > 0) ∧ (nsim < maxnsim) do
 6:   F0 = F0 ∪ F1
 7:   Train RSM with the content of F0 and compute a prediction R0, ∀x ∈ X
 8:   R1 = Ψ/(R0)
 9:   Generate and run the simulations associated with the configurations in R1. Update nsim accordingly. Put results into F1.
10:   cov = χ(F1, F0)
11: end while
12: return Ψ(F0) by pruning non-feasible configurations.
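The flow of Algorithm 1 can be sketched in Python as follows. This is only an illustration under several assumptions that are not taken from the paper: the simulator is replaced by a toy two-objective function, the RSM is a simple inverse-distance-weighted predictor, feasibility constraints are not modeled, configurations already in the archive are not simulated again, the last batch F1 is merged before the final filtering, and the coverage χ(F1, F0) is assumed to be the share of freshly simulated points that are not dominated by the current front of F0.

```python
import itertools
import math

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(u <= v for u, v in zip(a, b)) and any(u < v for u, v in zip(a, b))

def pareto(archive):
    """Keep the non-dominated entries of a {configuration: objectives} archive."""
    return {x: f for x, f in archive.items()
            if not any(dominates(g, f) for g in archive.values())}

def idw_predict(x, archive):
    """Toy RSM (assumption): inverse-distance-weighted guess of each objective."""
    if x in archive:
        return archive[x]
    w = {xj: 1.0 / math.dist(x, xj) ** 2 for xj in archive}
    tot = sum(w.values())
    n_obj = len(next(iter(archive.values())))
    return tuple(sum(w[xj] * archive[xj][k] for xj in archive) / tot
                 for k in range(n_obj))

def respir(space, simulate, doe, max_nsim):
    f0 = {x: simulate(x) for x in doe}                   # step 2
    nsim, cov, f1 = len(f0), 1.0, {}                     # steps 1, 3, 4
    while cov > 0 and nsim < max_nsim:                   # step 5
        f0.update(f1)                                    # step 6
        r0 = {x: idw_predict(x, f0) for x in space}      # step 7
        r1 = pareto(r0)                                  # step 8
        # Step 9, skipping known configurations and respecting the budget.
        new = [x for x in r1 if x not in f0][: max_nsim - nsim]
        f1 = {x: simulate(x) for x in new}
        nsim += len(f1)
        # Step 10 with the assumed coverage definition described above.
        front = pareto(f0).values()
        cov = (sum(not any(dominates(g, f) for g in front) for f in f1.values())
               / len(f1)) if f1 else 0.0
    f0.update(f1)  # sketch-only choice: keep the last simulated batch
    return pareto(f0), nsim                              # step 12

# Toy problem: an 8x8 grid and a made-up (delay, energy) objective pair.
space = list(itertools.product(range(1, 9), repeat=2))

def simulate(x):
    return (10.0 / x[0] + 0.5 * x[1], 1.0 * x[0] + 4.0 / x[1])

doe = [(1, 1), (1, 8), (8, 1), (8, 8)]  # 2-level full factorial corners
front, nsim = respir(space, simulate, doe, max_nsim=24)
print(nsim, sorted(front))
```

Swapping idw_predict or doe for another model or plan leaves the loop unchanged, which mirrors the statement that the methodology is parametric in the DoE and RSM techniques.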
3 Validation of ReSPIR
To validate the presented ReSPIR methodology, we applied it to the customization of a symmetric shared-memory multiprocessor architecture for the execution of a set of standard benchmarks derived from the SPLASH-2 suite. We focused our analysis on the architectural parameters listed in Table 1, which constitute a design space consisting of |X| = 2^17 alternative configurations. To carry out the system metrics evaluation (execution time and energy consumption), we leveraged the Sesc [8] simulation tool, a fast simulator for chip-multiprocessor architectures that is able to provide energy and performance results for a given application. Within Sesc, the energy consumption computation is supported by the CACTI [9] and WATTCH [10] models.
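The size of the design space can be checked with a short calculation. Table 1 only reports the minimum and maximum of each parameter, so the sketch below assumes that every parameter takes only power-of-two values between those bounds; under that assumption the product of the level counts is 2^17 = 131072 configurations. The dictionary keys are illustrative names, not identifiers from the paper.

```python
import math

# (min, max) per parameter as in Table 1; cache sizes in KB.
# Assumption: each parameter takes only power-of-two values within its range.
RANGES = {
    "processors": (2, 16),
    "issue_width": (1, 8),
    "l1i_size": (2, 16),
    "l1d_size": (2, 16),
    "l2_size": (32, 256),
    "l1i_assoc": (1, 8),
    "l1d_assoc": (1, 8),
    "l2_assoc": (1, 8),
    "block_size": (16, 32),
}

def levels(lo, hi):
    """Number of power-of-two values in the closed range [lo, hi]."""
    return int(math.log2(hi // lo)) + 1

size = math.prod(levels(lo, hi) for lo, hi in RANGES.values())
print(size, size == 2 ** 17)  # 131072 True
```

Eight parameters contribute four levels each and the block size contributes two, hence 4^8 x 2 = 2^17.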
To give a fair comparison of ReSPIR with other state-of-the-art heuristics, we introduce a Multi-Objective Simulated Annealing (MOSA) derived from [11] and a Multi-Objective Genetic Algorithm (NSGA-II) derived from [12]. Each of these heuristics is parametrized in terms of variables such as the initial population, the number of iteration steps, the set of permutation probabilities and other algorithm-specific parameters. Generally, calibrating these parameters is a very difficult task which depends strongly on the problem domain; moreover, since the heuristics are inherently random², each combination of heuristic variables should be evaluated more than once in order to infer a more general trend. The number of runs for each algorithm is set such that the actual performance of the algorithm (in terms of ADRS, Average Distance from Reference Set) reaches an average asymptotic value. Practically speaking, this resulted in more than a hundred evaluations for each heuristic.

2 This is true also for ReSPIR whenever we use a random DoE or a neural network.

Table 1. Design space for the shared-memory multi-processor platform

Parameter                    Min.   Max.
# Processors                 2      16
Processor issue width        1      8
L1 instruction cache size    2K     16K
L1 data cache size           2K     16K
L2 private cache size        32K    256K
L1 instruction cache assoc.  1w     8w
L1 data cache assoc.         1w     8w
L2 private cache assoc.      1w     8w
I/D/L2 block size            16     32
We underline that we are focused on obtaining a good approximation of the exact Pareto set (ADRS ≤ 1%) by executing as few simulations as possible, i.e., by simulating less than 3.5% of the entire design space. As a consequence, each strategy has been run by considering an upper bound on the number of simulations which is 3.5% of the entire design space; the resulting Pareto front has been validated against the reference, exact Pareto front of the target architecture³.
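The text does not spell out the ADRS formula. The sketch below assumes the definition commonly used in the design space exploration literature: for every point of the reference (exact) Pareto set, take the closest point of the approximate set, where closeness is the largest relative worsening over all objectives (floored at zero), and average over the reference set. The two small fronts are made-up data for the example.

```python
def adrs(reference, approx):
    """Average Distance from Reference Set (assumed definition).

    Both arguments are lists of objective tuples to be minimized, with
    strictly positive reference values so the relative error is defined.
    """
    def delta(a_ref, b):
        # Largest relative excess of b over the reference point, at least 0.
        return max(0.0, *((fb - fa) / fa for fa, fb in zip(a_ref, b)))

    return sum(min(delta(a, b) for b in approx) for a in reference) / len(reference)

# Made-up fronts: two of the three reference points are missed by 10%.
exact = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
approx = [(1.0, 4.0), (2.2, 2.0), (4.0, 1.1)]
print(adrs(exact, exact))   # 0.0
print(adrs(exact, approx))  # roughly 0.067, i.e. about 6.7%
```

With this definition, the target ADRS ≤ 1% corresponds to adrs(...) <= 0.01.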
Concerning ReSPIR, we focus on the overall performance by presenting a collapsed view for all the combinations of DoEs and RSMs, without breaking out the actual algorithm performance for each DoE and RSM. This is because what we want to demonstrate is the quality of the presented ReSPIR exploration strategy and not the quality of a particular DoE or RSM.
Figures 1(a), 1(b) and 1(c) show the average ADRS of the approximated Pareto fronts with respect to the exact Pareto front, as the fraction of the design space analyzed varies. The figures also show the estimated standard deviation of the ADRS. MOSA is the worst heuristic in terms of average ADRS, starting from 18% at 1% of the design space and decreasing to 10% at 2.5% of the design space. NSGA-II and ReSPIR reach, respectively, ∼5% and ∼2.5% for the same percentage of design space. MOSA also lags behind NSGA-II in terms of standard deviation, reaching 4% at the upper bound of the explored design space, where NSGA-II and ReSPIR obtain around 0.5%.
4 Conclusions
In this paper, we presented ReSPIR, a design space exploration methodology that leverages the traditional DoE paradigm and RSM techniques combined with a powerful way of considering customized application constraints. The design of experiments phase generates an initial plan of experiments, which is used to create a coarse view of the target design space; then a set of response surface extraction techniques is used to identify non-feasible configurations and refine the Pareto configurations. This process is repeated iteratively until a target criterion, e.g. the number of simulations, is satisfied.
3 The reference Pareto front has been computed with a full-search algorithm, thus it is the exact Pareto front.
Fig. 1. Average ADRS (with standard deviation) and percentage of the design space analyzed by (a) MOSA, (b) NSGA-II and (c) ReSPIR. [Plots not reproduced: each panel shows "mean(ADRS)" and "std(ADRS)" on the vertical ADRS axis against the percentage of design space analyzed, from 0.01 to 0.04.]
References
1. K. Keutzer, S. Malik, A. R. Newton, J. Rabaey, and A. Sangiovanni-Vincentelli. System level design: Orthogonalization of concerns and platform-based design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(12):1523–1543, December 2000.
2. Gianluca Palermo, Cristina Silvano, and Vittorio Zaccaria. Multi-objective design space exploration of embedded system. Journal of Embedded Computing, 1(3):305–316, 2006.
3. Giuseppe Ascia, Vincenzo Catania, Alessandro G. Di Nuovo, Maurizio Palesi, and Davide Patti. Efficient design space exploration for application specific systems-on-a-chip. Journal of Systems Architecture, 53(10):733–750, 2007.
4. Giovanni Beltrame, Dario Bruschi, Donatella Sciuto, and Cristina Silvano. Decision-theoretic exploration of multiprocessor platforms. In Proceedings of CODES+ISSS: International Conference on Hardware-Software Codesign and System Synthesis, pages 205–210, 2006.
5. T. J. Santner, B. Williams, and W. Notz. The Design and Analysis of Computer Experiments. Springer-Verlag, 2003.
6. C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 2002.
7. M. J. D. Powell. The theory of radial basis functions. In Advances in Numerical Analysis II: Wavelets, Subdivision, and Radial Basis Functions, W. Light (ed.), pages 105–210. Oxford University Press, 1992.
8. Jose Renau, Basilio Fraguela, James Tuck, Wei Liu, Milos Prvulovic, Luis Ceze, Smruti Sarangi, Paul Sack, Karin Strauss, and Pablo Montesinos. SESC simulator, January 2005. http://sesc.sourceforge.net.
9. S. Wilton and N. Jouppi. CACTI: An Enhanced Cache Access and Cycle Time Model. IEEE Journal of Solid-State Circuits, volume 31, pages 677–688, 1996.
10. David Brooks, Vivek Tiwari, and Margaret Martonosi. Wattch: a framework for architectural-level power analysis and optimizations. In Proceedings ISCA 2000: International Symposium on Computer Architecture, pages 83–94, 2000.
11. P. Czyżak and A. Jaszkiewicz. Pareto simulated annealing - a metaheuristic technique for multiple-objective combinatorial optimisation. Journal of Multi-Criteria Decision Analysis, 7:34–47, April 1998.
12. K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan. A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II. Proceedings of the Parallel Problem Solving from Nature VI Conference, pages 849–858, 2000.