8. CONCLUSIONS AND FUTURE WORK
8.1 Conclusions
This dissertation presents a multifrequency methodology for generating purely asynchronous designs, as well as designs that combine asynchronous and synchronous blocks, using synchronous computer-aided design (CAD) tools and flows together with a unifying timing representation called relative timing (RT). The methodology is divided into two parts. The first part addresses the characterization of asynchronous design templates and describes the generation of the constraints required to enable the use of synchronous CAD tools and flows on these designs. The second part uses these constraints to generate a working asynchronous/multifrequency circuit using the synchronous CAD tools. Custom algorithms required to enable the asynchronous template characterization and its automation are also presented.
The existing design flow for asynchronous circuits includes good synthesis algorithms, but the addition of reset to these circuits is an important manual step that needs to be automated. This work contributes an algorithm that adds a reset signal based on power/performance optimization. The theory for finding reset addition candidates based on the topology of the circuit, with and without the inputs of the circuit being defined, is derived. Optimizations based on logical effort that reduce the power and performance penalty of reset addition for any design are also presented. The benefits of this algorithm are demonstrated by reductions of 21 percent in area and 24 percent in energy/token for the asynchronous FIFO controllers, as compared to reset addition done by Petrify. Average reductions of 14 percent in area and 12 percent in energy/token are seen for the three asynchronous benchmark circuits.
The key to deriving the maximum benefits from the synchronous CAD tools and flows is to use their timing driven optimization and sizing algorithms. Because asynchronous circuits are cyclic, sequential circuits, their timing graphs have to be represented as directed acyclic graphs (DAGs) before these algorithms can be applied. This work presents an algorithm that automatically generates the cycle cut constraints needed to represent the timing graphs of asynchronous circuits as DAGs. The algorithm keeps the timing paths from being cut, thus enabling the use of static timing analysis (STA) and the timing driven optimization and sizing algorithms of the synchronous CAD tools to optimize asynchronous circuits. Because the timing paths are preserved, the RT constraints remain applicable and a functioning circuit can be derived. The algorithm also allows full control over the delays required to generate the best asynchronous circuits. The circuits generated by using this cycle cutting algorithm are 1/3 the size, consume 1/3 the energy and are 50 percent faster than those derived by the synchronous CAD tools.
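The idea of cutting cycles while leaving constrained timing paths intact can be illustrated with a minimal sketch. It is not the algorithm developed in this work: it assumes a simple greedy strategy (find one cycle, cut one unprotected edge on it, repeat), and the names find_cycle, cut_cycles and protected are hypothetical, introduced only for this illustration. Edges listed in protected stand in for timing arcs that lie on RT constrained paths and therefore must not be cut.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[Edge]]:
    """Return the edges of one directed cycle, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    color = {n: WHITE for n in nodes}
    stack: List[str] = []  # nodes on the current depth-first search path

    def visit(u: str) -> Optional[List[Edge]]:
        color[u] = GREY
        stack.append(u)
        for v in graph.get(u, []):
            if color[v] == GREY:
                # Back edge u -> v closes the cycle v ... u -> v.
                path = stack[stack.index(v):] + [v]
                return list(zip(path, path[1:]))
            if color[v] == WHITE:
                found = visit(v)
                if found:
                    return found
        stack.pop()
        color[u] = BLACK
        return None

    for n in sorted(nodes):  # sorted only to make the result deterministic
        if color[n] == WHITE:
            found = visit(n)
            if found:
                return found
    return None


def cut_cycles(graph: Dict[str, List[str]], protected: Set[Edge]) -> List[Edge]:
    """Greedily cut unprotected edges until the graph is a DAG.

    Returns the list of cut edges. Raises ValueError if some cycle
    consists only of protected edges (assumed policy for this sketch).
    """
    g = {u: list(vs) for u, vs in graph.items()}  # work on a copy
    cuts: List[Edge] = []
    while True:
        cycle = find_cycle(g)
        if cycle is None:
            return cuts
        candidates = [e for e in cycle if e not in protected]
        if not candidates:
            raise ValueError("cycle contains only protected edges")
        u, v = candidates[-1]
        g[u].remove(v)
        cuts.append((u, v))
```

For a toy graph with a feedback loop between gates a and b, protecting the forward arc (a, b) forces the cut onto the feedback arc (b, a). A production flow would also have to weigh which cut gives the timing engine the most useful paths, which this sketch does not attempt.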
The characterization of asynchronous circuits and the constraint generation are applied to derive a set of asynchronous templates. A family of 4-phase handshake protocols with data valid at the falling edge of request, also known as late data validity protocols, is characterized for area, latency and energy. The tabulated results for the late data validity protocols enable the quick selection of the best asynchronous handshake protocol for any application.
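As a small illustration of the late data validity property, the sketch below checks a recorded event trace against two conditions: the events follow the usual 4-phase order (request rises, acknowledge rises, request falls, acknowledge falls), and the data is valid whenever request falls. The trace format, the event labels and the function name check_late_validity are assumptions made for this example; the protocol family characterized in this work differs in further details that the sketch does not model.

```python
from typing import List, Tuple

# Assumed event labels for one 4-phase (return-to-zero) handshake cycle.
PHASES = ["req+", "ack+", "req-", "ack-"]


def check_late_validity(trace: List[Tuple[str, bool]]) -> bool:
    """Check a trace of (event, data_valid) pairs.

    Returns True if the events repeat in 4-phase order and the data
    is valid at every falling edge of request ("req-").
    """
    for i, (event, data_valid) in enumerate(trace):
        if event != PHASES[i % 4]:
            return False  # handshake order violated
        if event == "req-" and not data_valid:
            return False  # data not valid at the falling edge of request
    return True
```

Data validity at the other three events is deliberately ignored, since the only requirement taken from the text is validity at the falling edge of request.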
General templates for steering data and control information were identified, and circuit structures were implemented to characterize them. A simple toy example and four different FIFO structures were generated and automated to characterize these templates. A detailed comparison of the area, latency and energy of these designs is presented. These automated flows for characterizing and benchmarking new circuit templates assist in the quick development and comparison of new designs, thus facilitating the design, analysis and optimization required for deriving better circuits.
The application of the novel RT based methodology to large and complex designs is shown in a few case studies. A 225k gate 64-point FFT circuit is designed and compared to a synchronous equivalent. The benefit of asynchronous designs for multifrequency applications is demonstrated with the asynchronous FFT circuit showing a benefit of 2.4×, 2.4× and 3.2× in terms of area, energy and throughput, respectively, over its synchronous counterpart.
The applicability of this methodology to designs that combine synchronous and asynchronous circuits is explored. A subset of the open core protocol (OCP) is implemented and the domain interface (DI) circuit concept is extended to circuits consisting of asynchronous circuit blocks. Detailed circuit implementations of the DI for asynchronous to synchronous domain crossing and vice versa are developed, and these designs are compared against a purely synchronous design, a purely asynchronous design, and a synchronous design with two different asynchronous clock domains. The purely asynchronous design has 3× the performance and approximately 1/9 the energy of the clocked design. The GALS design also demonstrated almost 4× the throughput at less than 1/5 the energy per transaction.
The utility and impact of this research work can be summed up as follows: This methodology enables the industry to transition from purely synchronous design approaches to asynchronous designs by exploring various asynchronous circuit design styles. It also allows the circuit designers to choose the best circuit solution for any specification. Thus, this work not only allows designers to create better designs, but it also opens up a host of optimization and algorithmic approaches that can be explored.