

7.5 TPlace

7.5.1 Wire Length Estimation for Nets in Static Circuits

In this section we partly repeat the description from Section 5.2.1 to clearly highlight the difference from the wire length estimation for tuneable circuits, which is explained in the next section.

In the case of a conventional static circuit, a legal routing solution contains a disjoint set of routing resources for each net. A wirelength-driven placer estimates the total wire length as the sum of the estimated wire lengths of all nets. The wire length of a net is estimated as the half-perimeter of its bounding box (HPWL), weighted by a factor that depends on the number of terminals of the net:

$$C_{wl} = \sum_{n \in \mathrm{nets}} q\big(\#\mathrm{terminals}(n)\big) \cdot \mathrm{HPWL}(n) \tag{7.2}$$

The factor q(.) is taken from [29]. It is equal to 1 for nets with up to three terminals and slowly grows to 2.79 for nets with 50 terminals.
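Equation (7.2) can be sketched in a few lines of Python. This is a minimal sketch, assuming each net is given as a list of (x, y) terminal coordinates; all function names are hypothetical. The actual q(.) table from [29] is not reproduced: `q_placeholder` only matches the two values quoted above (1 for up to three terminals, 2.79 at 50 terminals) and interpolates linearly in between, which is an assumption made purely for illustration.

```python
# Minimal sketch of Equation (7.2): Cwl = sum over nets of q(#terminals) * HPWL.
# Assumption: q_placeholder is NOT the table from [29]; it only reproduces the
# two values quoted in the text and interpolates linearly in between.
Point = tuple[float, float]


def hpwl(terminals: list[Point]) -> float:
    """Half-perimeter of the bounding box enclosing all terminals of a net."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def q_placeholder(num_terminals: int) -> float:
    """Illustrative stand-in for q(.); clamped at 2.79 above 50 terminals (arbitrary)."""
    if num_terminals <= 3:
        return 1.0
    if num_terminals >= 50:
        return 2.79
    return 1.0 + (num_terminals - 3) * (2.79 - 1.0) / (50 - 3)


def static_wirelength(nets: list[list[Point]]) -> float:
    """Cwl of a static circuit; the terminals of each net are assumed distinct."""
    return sum(q_placeholder(len(net)) * hpwl(net) for net in nets)


if __name__ == "__main__":
    nets = [[(0, 0), (2, 3)], [(1, 1), (4, 1), (2, 5)]]
    print(static_wirelength(nets))  # both nets have <= 3 terminals: 5 + 7 = 12.0
```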

To evaluate the estimation, the circuits in the Toronto 20 benchmark suite [15], the 20 largest circuits of the MCNC benchmark suite, were placed with the wirelength-driven placer and routed with the breadth-first router in VPR 4.30 with default settings. The channel width was set 20% above the minimum channel width to allow a relaxed routing, as recommended in [17]. To evaluate the conventional estimation, we calculate the correlation between the estimated and the actual routing cost of the placed benchmark circuits. The resulting correlation coefficient is 0.9705. Later, this correlation coefficient will be compared with the correlation coefficient of the newly proposed estimation method for tuneable circuits to evaluate the quality of the proposed estimation.
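The text does not state which correlation coefficient is used; the sketch below assumes Pearson's r, computed over per-circuit pairs of (estimated cost, post-route routing cost). The numbers in the usage are hypothetical and are not the benchmark results reported above.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-circuit costs, not measured data.
    estimated = [120.0, 340.0, 95.0, 410.0]
    routed = [130.0, 360.0, 100.0, 450.0]
    print(round(pearson(estimated, routed), 4))
```

A value close to 1 means that the estimate ranks and scales placements in nearly the same way as the router, which is what matters for a placer cost function.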

7.5.2 Wire Length Estimation for Tuneable Circuits

This section describes how the routing resource usage of a tuneable circuit can be estimated. The estimate must not be more expensive to compute than the estimate for nets in static circuits, because it is evaluated in the kernel of the simulated annealing algorithm. The time needed to compute it should therefore be kept to the very minimum.

Figure 7.9: Sharing of routing resources between connections with the same source or sink. Shared resources are annotated with a dotted shape. (a) Sharing of routing resources between connections with the same source. (b) Sharing of routing resources between connections (TCONs) with the same sink.

A new wirelength estimation method is necessary because in tuneable circuits, a routing solution does not contain a disjoint set of routing resources for each TCON. There are two sharing mechanisms, which Figure 7.9 demonstrates on a small example routing solution. TCONs can legally share resources with other TCONs if they carry the same signal or if they are not active at the same time. The first mechanism, TCONs carrying the same signal, is easy to recognise, because such TCONs are driven by the same source. The second mechanism, TCONs that are not active at the same time, is harder to recognise, because the connection conditions of the TCONs have to be compared. After this comparison, each TCON t has an associated set of TCONs that may share resources with t. However, some of the TCONs in that set can be far away from t and are not worth considering as sharing candidates. For example, consider two connections that are not active at the same time, where the terminals of one connection lie on the opposite side of the FPGA from those of the other. These two connections will most likely not share resources, even though sharing is allowed. The most interesting TCONs in the set are those that have the same sink as t. They are easy to identify, and they are forced to overlap because they all have to reach the same sink anyway. To simplify the problem, we therefore consider only overlap between TCONs with the same source or TCONs with the same sink.
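Under this simplification, the TCON pairs that may share resources can be enumerated from sources and sinks alone, without comparing connection conditions. A minimal sketch, assuming a hypothetical representation in which a TCON is a (source, sink) pair of block names:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical representation: a TCON is a (source block, sink block) pair.
Tcon = tuple[str, str]


def sharing_candidates(tcons: list[Tcon]) -> set[frozenset[Tcon]]:
    """Unordered TCON pairs allowed to share resources in the simplified model.

    Same source: the TCONs carry the same signal.
    Same sink: the TCONs cannot be active at the same time.
    Other mutually exclusive pairs (e.g. far apart on the FPGA) are ignored."""
    by_source: dict[str, list[Tcon]] = defaultdict(list)
    by_sink: dict[str, list[Tcon]] = defaultdict(list)
    for tcon in tcons:
        by_source[tcon[0]].append(tcon)
        by_sink[tcon[1]].append(tcon)
    pairs: set[frozenset[Tcon]] = set()
    for group in [*by_source.values(), *by_sink.values()]:
        pairs.update(frozenset(p) for p in combinations(group, 2))
    return pairs


if __name__ == "__main__":
    tcons = [("S1", "X"), ("S1", "Y"), ("S2", "X"), ("S3", "X")]
    for pair in sorted(sorted(p) for p in sharing_candidates(tcons)):
        print(pair)
```

In this hypothetical circuit, the only same-source pair is the two TCONs of S1, and the three TCONs ending at X form three same-sink pairs; a pair such as (S1, Y) and (S2, X) is never considered.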

The estimation is not as straightforward as for static circuits. If we use the same estimation method as for static circuits, the second sharing mechanism is neglected and the wirelength is systematically overestimated. Consider the toy example in Figure 7.10a. The conventional estimation method treats all connections driven by the same source as one collection and tracks the bounding boxes of these collections during placement. The estimate for the toy example is 10 wires, whereas the post-route solution contains only 9 wires. The number of bounding boxes tracked during placement equals the number of sources in the circuit.
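The conventional estimate applied to a tuneable circuit can be sketched as follows: every source collection gets one bounding box, and the estimate is the sum of their half-perimeters. This assumes hypothetical (source, sink) TCON pairs and made-up block coordinates, and it omits the q(.) factor for brevity. The usage circuit has three sources and two sinks like the toy example, but it is not the circuit of Figure 7.10.

```python
from collections import defaultdict

# Hypothetical representation: a TCON is a (source block, sink block) pair,
# and block positions are integer (x, y) grid coordinates.
Tcon = tuple[str, str]
Pos = dict[str, tuple[int, int]]


def hpwl(blocks: set[str], pos: Pos) -> int:
    """Half-perimeter of the bounding box around the given blocks."""
    xs = [pos[b][0] for b in blocks]
    ys = [pos[b][1] for b in blocks]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def conventional_estimate(tcons: list[Tcon], pos: Pos) -> tuple[int, int]:
    """One bounding box per source collection (q(.) omitted).

    Returns (estimated wirelength, number of tracked bounding boxes)."""
    by_source: dict[str, set[str]] = defaultdict(set)
    for src, snk in tcons:
        by_source[src].update((src, snk))
    return sum(hpwl(b, pos) for b in by_source.values()), len(by_source)


if __name__ == "__main__":
    tcons = [("S1", "X"), ("S1", "Y"), ("S2", "X"), ("S3", "X")]
    pos = {"S1": (0, 0), "S2": (0, 2), "S3": (1, 3), "X": (2, 1), "Y": (2, 0)}
    print(conventional_estimate(tcons, pos))  # (9, 3)
```

The three TCONs ending at X each contribute their own overlapping route to X, which is exactly the sharing this estimate cannot see.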

To achieve a better estimate, all connections ending at the same sink could also be considered a collection. Each connection would then be part of a source collection and a sink collection. The estimation method would then track the bounding boxes of both the source and the sink collections, and the total wirelength estimate is half the sum of all bounding box estimates. As can be seen in Figure 7.10b, the estimated wirelength is 9.5, which is slightly better than the conventional estimate. However, the number of bounding boxes that need to be tracked during placement equals the number of sources plus the number of sinks in the circuit, which is typically at least twice the number tracked by the conventional estimation. The toy circuit has three sources and two sinks, so five bounding boxes need to be tracked.
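The variant that tracks both source and sink collections can be sketched in the same style: every TCON contributes to one source box and one sink box, so the sum is halved. The TCON representation, coordinates and function names are hypothetical, and q(.) is again omitted.

```python
from collections import defaultdict

# Hypothetical representation: a TCON is a (source block, sink block) pair.
Tcon = tuple[str, str]
Pos = dict[str, tuple[int, int]]


def hpwl(blocks: set[str], pos: Pos) -> int:
    """Half-perimeter of the bounding box around the given blocks."""
    xs = [pos[b][0] for b in blocks]
    ys = [pos[b][1] for b in blocks]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def dual_estimate(tcons: list[Tcon], pos: Pos) -> tuple[float, int]:
    """Track source and sink collections; each TCON is counted twice, so halve.

    Returns (estimated wirelength, number of tracked bounding boxes)."""
    by_source: dict[str, set[str]] = defaultdict(set)
    by_sink: dict[str, set[str]] = defaultdict(set)
    for src, snk in tcons:
        by_source[src].update((src, snk))
        by_sink[snk].update((src, snk))
    boxes = [*by_source.values(), *by_sink.values()]
    return 0.5 * sum(hpwl(b, pos) for b in boxes), len(boxes)


if __name__ == "__main__":
    tcons = [("S1", "X"), ("S1", "Y"), ("S2", "X"), ("S3", "X")]
    pos = {"S1": (0, 0), "S2": (0, 2), "S3": (1, 3), "X": (2, 1), "Y": (2, 0)}
    print(dual_estimate(tcons, pos))  # (8.0, 5)
```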

To reduce the number of bounding boxes that need to be tracked and to improve the accuracy even further, we propose to partition the TCONs of the tuneable circuit according to the dominant resource sharing mechanism. After partitioning, each TCON belongs to exactly one collection; the number of collections is typically comparable to that of the conventional estimation, and the accuracy is typically better. For the toy example in Figure 7.10c, the partition-based estimation correctly predicts the number of wires while needing only 2 bounding boxes. The improvement is even more pronounced for tuneable circuits with larger fanout source collections and/or larger fanin sink collections.

We define the dominant resource sharing mechanism as the sharing mechanism that is used by the largest number of connections. Consider, for example, the TCON that connects blocks A and B in the toy circuit depicted in Figure 7.10c. The TCON is driven by a source with a fanout of two (two connections in the circuit are driven by this source). On the other hand, the sink of the TCON is also used by two other connections, so three connections share this sink. The dominant resource sharing mechanism for this TCON is therefore sharing between TCONs with the same sink.

Figure 7.10: The wirelength estimation methods for tuneable circuits applied to a simple toy circuit and the post-route solution. (a) Conventional: Cwl,est = 10, #BB = 3. (b) Both sharing mechanisms: Cwl,est = 9.5, #BB = 5. (c) Partitioning according to the dominant sharing mechanism: Cwl,est = 9, #BB = 2. (d) Post-route solution: Cwl = 9.

Listing 7.4: Pseudocode of the greedy multi-iteration algorithm used to partition the TCONs

while (candidateParts.size > 0):
    int maxCardinality = 0
    Set maxPart = null
    /* Find the set with the maximum number of TCONs (ties: larger bounding box) */
    for each Set s in candidateParts do:
        if (s.size > maxCardinality || (s.size == maxCardinality && BB(s) > BB(maxPart))):
            maxCardinality = s.size
            maxPart = s
    partitioning.add(maxPart)
    /* Remove each TCON of maxPart from its other set */
    for each TCON t in maxPart do:
        Set source = candidateParts.getSourceSet(t)
        Set sink = candidateParts.getSinkSet(t)
        if (maxPart == source) sink.remove(t)
        else source.remove(t)
    /* The chosen set and any set emptied above are no longer candidates */
    candidateParts.remove(maxPart)
    candidateParts.removeEmptySets()

To partition the TCONs according to the dominant resource sharing mechanism, we propose a greedy multi-iteration algorithm, whose pseudocode is given in Listing 7.4. Initially, the TCONs that share the same sink form a sink set and the TCONs that share the same source form a source set, so each TCON is initially part of two sets: a source set and a sink set. All the sets are added to a candidateParts collection. Each iteration of the partitioning algorithm consists of two steps. First, the largest set in the candidateParts collection is greedily selected and added to the partition, with ties broken in favour of the set with the larger bounding box. Second, each TCON in the selected set is removed from the other set it belongs to. The loop iterates until all connections are covered by the chosen sets, so that in the end every TCON belongs to exactly one set.
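A runnable sketch of the greedy partitioning, assuming that TCONs are hypothetical (source, sink) pairs of block names and that block positions are given as a dictionary. It follows the two steps described above and breaks ties in favour of the larger bounding box; unlike an efficient placer implementation, it recomputes bounding boxes in every iteration.

```python
# Hypothetical representation: a TCON is a (source block, sink block) pair,
# block positions are (x, y) grid coordinates, and a candidate set is keyed
# by ("source", block) or ("sink", block).
Tcon = tuple[str, str]
Pos = dict[str, tuple[int, int]]
SetKey = tuple[str, str]


def bb(tcon_set: set[Tcon], pos: Pos) -> int:
    """Half-perimeter of the bounding box of all blocks touched by the set."""
    blocks = {block for tcon in tcon_set for block in tcon}
    xs = [pos[b][0] for b in blocks]
    ys = [pos[b][1] for b in blocks]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def partition_tcons(tcons: list[Tcon], pos: Pos) -> list[tuple[SetKey, set[Tcon]]]:
    """Greedy partitioning: every TCON ends up in exactly one source or sink set."""
    # Initially every TCON is in one source set and one sink set.
    candidates: dict[SetKey, set[Tcon]] = {}
    for tcon in tcons:
        candidates.setdefault(("source", tcon[0]), set()).add(tcon)
        candidates.setdefault(("sink", tcon[1]), set()).add(tcon)

    partitioning: list[tuple[SetKey, set[Tcon]]] = []
    while candidates:
        # Step 1: largest set first; ties go to the larger bounding box.
        key = max(candidates, key=lambda k: (len(candidates[k]), bb(candidates[k], pos)))
        chosen = candidates.pop(key)
        partitioning.append((key, chosen))
        # Step 2: remove each chosen TCON from the other set it belongs to.
        for src, snk in chosen:
            other = ("sink", snk) if key[0] == "source" else ("source", src)
            candidates[other].discard((src, snk))
        # Sets emptied in step 2 cover no TCON any more and are dropped.
        candidates = {k: s for k, s in candidates.items() if s}
    return partitioning


if __name__ == "__main__":
    tcons = [("S1", "X"), ("S1", "Y"), ("S2", "X"), ("S3", "X")]
    pos = {"S1": (0, 0), "S2": (0, 2), "S3": (1, 3), "X": (2, 1), "Y": (2, 0)}
    for key, members in partition_tcons(tcons, pos):
        print(key, sorted(members))
```

With the hypothetical input in the usage, the sink set of X (three TCONs) is chosen first; this empties the source sets of S2 and S3, and the remaining TCON (S1, Y) ends up in the source set of S1, so only two bounding boxes remain.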

The total wire length of the tuneable circuit is then estimated as shown in Equation (7.3). The estimate is the sum of the estimated wire lengths of each set in the partition, where the wire length of a set is estimated as the half-perimeter of its bounding box weighted by a factor that depends on the number of terminals of the set.

Figure 7.11: A tool flow for the compilation of multi-mode circuits that increases the routing similarity between modes. The tuneable circuits and the corresponding placement produced by this tool flow can be used to evaluate routing cost estimations for tuneable circuits. (Stages shown in the figure: HDL design of each mode, Synthesis, Technology Mapping, Pack & Place, Combine Modes, Placed Tuneable Circuits, TRoute, Parameterized Configuration.)

$$C_{wl} = \sum_{s \in \mathrm{partition}} q\big(\#\mathrm{terminals}(s)\big) \cdot \mathrm{HPWL}(s) \tag{7.3}$$

The same weighting factors q(.) are used as in the conventional estimation of a net, see Equation (7.2). For source sets this is straightforward, because they are equivalent to nets in a static circuit, but the estimation can also be generalised to sink sets: the factor q(.) does not depend on whether a set is a source set or a sink set. This generalisation is justified because the goal of the bounding box estimation first proposed in [29] is to estimate the cost of a minimum Steiner tree. Routing a sink set or a source set is in fact the same problem: finding the shortest interconnect for a given set of terminals, which is the minimum Steiner tree problem.
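Equation (7.3), with a single q(.) for both kinds of sets, can be sketched as follows. The number of terminals of a set is taken to be the number of distinct blocks its TCONs touch, which is an assumption. `q_placeholder` is an illustrative stand-in that only matches the two values quoted for [29] (1 up to three terminals, 2.79 at 50 terminals) with linear interpolation in between; it is not the actual table. All names and coordinates are hypothetical.

```python
# Hypothetical representation: a TCON is a (source block, sink block) pair.
Tcon = tuple[str, str]
Pos = dict[str, tuple[int, int]]


def q_placeholder(num_terminals: int) -> float:
    """Illustrative stand-in for q(.), NOT the table of [29]: 1.0 up to three
    terminals, 2.79 at 50 terminals, linear in between, clamped above 50."""
    if num_terminals <= 3:
        return 1.0
    if num_terminals >= 50:
        return 2.79
    return 1.0 + (num_terminals - 3) * (2.79 - 1.0) / (50 - 3)


def partition_wirelength(partition: list[set[Tcon]], pos: Pos) -> float:
    """Equation (7.3): the same q(.) is applied to source sets and sink sets."""
    total = 0.0
    for tcon_set in partition:
        # Terminals of a set: every distinct block touched by its TCONs.
        terminals = {block for tcon in tcon_set for block in tcon}
        xs = [pos[b][0] for b in terminals]
        ys = [pos[b][1] for b in terminals]
        half_perimeter = (max(xs) - min(xs)) + (max(ys) - min(ys))
        total += q_placeholder(len(terminals)) * half_perimeter
    return total


if __name__ == "__main__":
    pos = {"A": (0, 0), "B": (3, 4), "C": (1, 1), "D": (5, 0)}
    sink_set = {("C", "B"), ("D", "B")}    # two TCONs ending at B
    source_set = {("B", "C"), ("B", "D")}  # two TCONs starting at B
    # Same terminals and geometry, so a sink set and a source set cost the same.
    print(partition_wirelength([sink_set], pos), partition_wirelength([source_set], pos))
    print(partition_wirelength([{("A", "B")}, sink_set], pos))  # 7 + 8 = 15.0
```

Because the cost only depends on the terminal positions, swapping the direction of every TCON in a set leaves its estimate unchanged, which mirrors the Steiner tree argument above.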