8.1 PHYSICAL DESIGN FLOW

8.1.1 Timing-Driven Partitioning

Timing-driven partitioning refers to the partitioning process performed during the placement and routing steps of the conventional physical design flow of integrated circuits. The objective of conventional timing-driven partitioning is to generate circuit placements that are more likely to meet a particular timing budget. Path-based and net-based partitioners [1] are the two most widely used kinds of partitioners in current state-of-the-art physical design. Both are used to limit the lengths of selected critical paths in a circuit. Restricting the analysis to a selected set of paths significantly reduces the processing time for partitioning (and static timing analysis) while generally preserving the accuracy of the analyses.

In clock skew scheduling, the local data paths in an entire circuit (or circuit partition) are equally important and analyzed together. Thus, an alternative partitioning approach is proposed in this work using selection criteria that lead to partitions which are amenable to clock skew scheduling. Traditional path-based and net-based timing-driven partitioning methods are not used. Instead, a hypergraph partitioning tool is used to generate partitions that are amenable to clock skew scheduling and easily implementable with the rotary clocking technology. Principally, timing-driven partitioning is performed within the proposed design methodology subject to the following considerations:

1. To construct the logic network partitions that will be synchronized by individual ROA rings of the rotary clocking technology.

2. To enable the completion of path enumeration on large scale circuits.

3. To enable the completion of clock skew scheduling algorithms on large scale circuits.

The first of the three factors listed above is directly related to the implementation of the rotary clocking technology. If clock tree synthesis is performed completely independently of logic synthesis, the assignment of synchronous components to individual ROA rings can be inefficient for physical implementation. As discussed in Section 7.3, a relatively balanced distribution of clock phases is necessary for the quality of synchronization with a rotary clock signal. An unbalanced loading of synchronous components on ROA rings may also cause hot spots in the circuit or significantly increase the clock load on one side of the chip compared to another (thereby causing performance degradation). To prevent such negative effects, where a fraction of the registers must be connected to ROA rings that are not in close proximity to the register location, logic synthesis and clock tree synthesis need to be performed interdependently. The partitioning procedure presented here achieves this goal by generating balanced logic partitions, each to be synchronized by one ROA ring. Advantageously, the clock phases at the synchronous components within each partition are well distributed after clock skew scheduling is applied to the logic partitions (see Figure 25 on page 63).
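The two quantities that matter in the discussion above, the number of nets cut by a partition and the balance of components across partitions, can be illustrated with a minimal Python sketch. The function `evaluate_partition`, the register names, and the toy netlist are hypothetical and chosen only for illustration; they are not part of the proposed methodology or of any particular hypergraph partitioning tool.

```python
def evaluate_partition(nets, assignment, num_parts):
    """Return (cut_nets, part_sizes, imbalance) for a hypergraph partition.

    nets       -- list of hyperedges, each a tuple of component names
    assignment -- dict mapping component name -> partition index
    imbalance  -- (largest partition / ideal size) - 1; 0.0 is perfectly balanced
    """
    sizes = [0] * num_parts
    for part in assignment.values():
        sizes[part] += 1
    # A net is cut when its components span more than one partition.
    cut = sum(1 for net in nets if len({assignment[c] for c in net}) > 1)
    ideal = len(assignment) / num_parts
    imbalance = max(sizes) / ideal - 1.0
    return cut, sizes, imbalance


# Hypothetical netlist: eight registers connected by six nets.
nets = [("r0", "r1", "r2"), ("r2", "r3"), ("r3", "r4"),
        ("r4", "r5", "r6"), ("r6", "r7"), ("r1", "r7")]

# Two candidate assignments of registers to two rings (partitions 0 and 1).
balanced = {f"r{i}": (0 if i < 4 else 1) for i in range(8)}
unbalanced = {f"r{i}": (0 if i < 6 else 1) for i in range(8)}

balanced_result = evaluate_partition(nets, balanced, 2)
unbalanced_result = evaluate_partition(nets, unbalanced, 2)
print(balanced_result)
print(unbalanced_result)
```

In this toy case both assignments cut the same number of nets, yet the second one loads one ring with three times as many registers as the other. A cut-size objective alone would not distinguish them, which is why a balance constraint is needed alongside it.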

The second and third factors that drive the timing-driven partitioning process are related to the design and analysis methodologies of large-scale circuits. Although discussed here within the context of rotary clock synchronization, the partitioning procedures presented in this dissertation can also be applied to circuits synchronized with traditional clocking technologies. From a CAD perspective, the general applicability of the partitioning procedure to improving the scalability of clock skew scheduling (independent of the particular clocking technology) is discussed next.

As reported earlier in Chapter 4, the scalability of clock skew scheduling is an important criterion for its widespread acceptance in mainstream design. Most industrial-strength timing tools and circuit designers that implement variations of clock skew scheduling apply them only to certain portions of the circuit, without analyzing the circuit in its entirety. Analyzing the entire circuit in order to implement a full-scale application of clock skew scheduling can be computationally intensive for very large-scale circuits. The two main obstacles to applying clock skew scheduling to the entire circuit are path enumeration and the run times of the LP model problems.

With increased logic depth and complexity in state-of-the-art integrated circuits, path enumeration becomes highly costly and can become intractable. In practice, hierarchical timing models [71] are used in order to simplify the enumeration of paths in the circuit. In a flattened circuit, however, path enumeration cannot always be completed within reasonable time and computational resources. Partitioning, as proposed in this work, remedies this shortcoming. Generally, very long paths are split with a cut, and a level-sensitive latch in the transparent phase is inserted on the cut. The transparent-phase latch has no effect on the functionality of the circuit because the data signal immediately propagates through the latch. This latch, however, simplifies path enumeration by shortening the logic depth of the original path.
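The effect of a cut on path enumeration can be illustrated with a small Python sketch. It counts paths rather than enumerating them, as a proxy for enumeration cost, and it assumes a hypothetical worst-case circuit made of reconvergent "diamond" stages in series. The helper names `count_paths` and `diamond_chain` and the node naming are illustrative assumptions, not part of the proposed methodology.

```python
from functools import lru_cache


def count_paths(graph, sources, sinks):
    """Count distinct source-to-sink paths in a DAG given as {node: [successors]}."""
    sinks = set(sinks)

    @lru_cache(maxsize=None)
    def paths_from(node):
        if node in sinks:
            return 1  # a path ends at the first sink it reaches
        return sum(paths_from(nxt) for nxt in graph.get(node, ()))

    return sum(paths_from(s) for s in sources)


def diamond_chain(stages):
    """Hypothetical circuit: `stages` two-way reconvergent stages in series,
    from node 'n0' to node 'n<stages>'."""
    graph = {}
    for i in range(stages):
        a, b = f"a{i}", f"b{i}"
        graph[f"n{i}"] = [a, b]
        graph[a] = [f"n{i + 1}"]
        graph[b] = [f"n{i + 1}"]
    return graph


stages = 20
g = diamond_chain(stages)
first, last = "n0", f"n{stages}"

# Without a cut: every path runs from the first node to the last.
full = count_paths(g, [first], [last])

# With a cut at the midpoint: the inserted latch acts as the end point of the
# first half and the start point of the second half.
mid = f"n{stages // 2}"
split = count_paths(g, [first], [mid]) + count_paths(g, [mid], [last])

print(full, split)
```

In this constructed example the path count grows exponentially with logic depth, so halving the depth with a single cut replaces one exponential term with the sum of two much smaller ones. Real circuits are less regular, but the same mechanism explains why shortening logic depth makes enumeration tractable.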

Once path enumeration is complete, the LP problem for the application of clock skew scheduling is formulated as described in Chapter 4. The LP problems generated for an integrated circuit with millions of paths and hundreds of thousands or more synchronous components can be very large. The run times of such large LP problems are usually acceptable within the typically long IC design cycles (up to a few days with industrial-strength LP solvers and common computing resources). However, very large models might not be solvable at all within the memory limits of common computing resources. In several industry applications, for instance, LP model problems for the clock skew scheduling of large-scale circuits are observed to exceed the practical limits of standard industrial-strength computing resources (e.g., 4 gigabytes of memory for 32-bit systems) [50].
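The formulation of Chapter 4 is not reproduced in this section. To make the size argument concrete, the sketch below assumes the widely used clock-period-minimization form of the clock skew scheduling LP; the symbols ($t_i$ for the clock delay to register $R_i$, $T$ for the clock period, $D_{\max}^{i,f}$ and $D_{\min}^{i,f}$ for the maximum and minimum propagation delays of the local data path $R_i \to R_f$, and $\delta_S$, $\delta_H$ for the setup and hold times) are assumptions and may differ from the notation used elsewhere in this dissertation.

```latex
\begin{align*}
\min\ & T \\
\text{s.t.}\ & t_i + D_{\max}^{i,f} + \delta_S \le t_f + T
  && \text{(setup, one per local data path } R_i \to R_f\text{)} \\
& t_i + D_{\min}^{i,f} \ge t_f + \delta_H
  && \text{(hold, one per local data path } R_i \to R_f\text{)}
\end{align*}
```

Under this assumed form, a circuit with $r$ registers and $p$ local data paths yields $r + 1$ variables and $2p$ constraints, so the constraint count grows directly with the number of enumerated paths. Partitioning bounds both $r$ and $p$ for each LP instance.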

In document Advanced Timing and Synchronization Methodologies for Digital VLSI Integrated Circuits (Page 140-142)