    CASE ← ‘LIST_EMPTY’;
} else {
    for each D(v[i]) ∈ L[v] {
        if (sf ∈ D(v[i]) == VALID && rv > r ∈ D(v[i]) && tv > t ∈ D(v[i])) {
            // this new candidate is dominated.
            CASE ← ‘NEW_DOMINATED’;
            break; // exit “for each D(v[i]) ∈ L[v]”
        }
        elsif ((sf ∈ D(v[i]) == VALID && rv < r ∈ D(v[i]) && tv < t ∈ D(v[i])) || (sf ∈ D(v[i]) == VALID && t ∈ D(v[i]) > estimated_delay)) {
            // existing candidate-dataset is dominated, so mark invalid.
            sf ∈ D(v[i]) ← NON_VALID;
            CASE ← ‘OLD_DOMINATED’;
            break; // exit “for each D(v[i]) ∈ L[v]”
        }
        else {
            // neither new, nor old candidate-datasets dominate.
            CASE ← ‘NONE_DOMINATED’;
        }
    } // end “for each D(v[i]) ∈ L[v]”
}

// #PART_2: MANIPULATE V-LIST & PRIORITY QUEUE.
if ( CASE == ‘LIST_EMPTY’ ) {
    D(v[0]) = {u, k, e, rv, tv, VALID} // D(v[k]) = {u, uk, e, r, t, sf}
    L[v] ← L[v] ∪ D(v[0])
    INSERT(Q, D(v[0]), t ∈ D(v[0]))
}

Figure 3.12: Function InsertCandidate ( ) (continued)

elsif ( CASE == ‘NEW_DOMINATED’ ) {
    // do nothing.
}
elsif ( CASE == ‘OLD_DOMINATED’ ) {
    for each D(v[i]) ∈ L[v] {
        if ( sf ∈ D(v[i]) == NON_VALID ) {
            // get the first invalid candidate-dataset and overwrite it.
            // it’s ok to just leave the rest of the invalid candidate-datasets.
            D(v[i]) ← {u, k, e, rv, tv, VALID}
            DECREASE-KEY(Q, D(v[i]), t ∈ D(v[i]))
            break; // exit “for each D(v[i]) ∈ L[v]”
        }
    }
}
elsif ( CASE == ‘NONE_DOMINATED’ ) {
    // append to the v-list.
    i = Length[L[v]] + 1
    D(v[i]) = {u, k, e, rv, tv, VALID}
    L[v] ← L[v] ∪ D(v[i])
    INSERT(Q, D(v[i]), t ∈ D(v[i]))
}

// #PART_3: UPDATE estimated_delay IF NECESSARY.
if (v == z) { // reach the destination
    if (estimated_delay > tv + rv*Cz) {
        estimated_delay ← tv + rv*Cz // update the value
        estimated_end_candidate ← D(v[i]) // remember this candidate
    }
}

Figure 3.13: Simultaneous Maze Routing and Buffer Insertion (S-RABI)

S-RABI(G, B, W, s, z) {
    for (each vertex v ∈ V[G]) {
        L[v] ← NIL
    }
    estimated_delay ← ∞
    D(s[0]) = {NIL, NIL, NIL, Rs, 0, VALID} // D(v[k]) = {u, uk, e, r, t, sf}
    L[s] ← L[s] ∪ D(s[0])
    INSERT(Q, D(s[0]), t ∈ D(s[0])) // INSERT(Q, identifier, key)
    do {
        do {
            (D(u[k]), t ∈ D(u[k])) ← EXTRACT-MIN(Q)
        } (while sf ∈ D(u[k]) == NON_VALID)
        if (estimated_delay > t ∈ D(u[k])) {
            for (each vertex v ∈ Adj[u]) {
                if (v ∈ OW[G]’) { // if v is not wire-obstacle.
                    for each w ∈ W {
                        (rv, tv) ← Cost(r ∈ D(u[k]), t ∈ D(u[k]), w[i])
                        if (tv < estimated_delay)
                            { InsertCandidate(D(u[k]), v, rv, tv, w[i], L[v]) }
                        if (v ∈ OB[G]’) { // if v is not buffer-obstacle.
                            for each b ∈ B {
                                (rv, tv) ← Cost(r ∈ D(u[k]), t ∈ D(u[k]), b[i])
                                if (tv < estimated_delay)
                                    { InsertCandidate(D(u[k]), v, rv, tv, b[i], L[v]) }
                            }
                        } // end buffer trials
                    }
                } // end wire trials
            } // end all adjacent-vertices
        }
    } (while Q ≠ Ø)

Let us now explain the working of the S-RABI algorithm given in Figure 3.13. Initially, the priority queue Q is empty. The algorithm begins by initializing the v-list of every vertex to empty (L[v] ← NIL). The estimated source-to-destination delay is initially set to infinity (estimated_delay ← ∞); as we shall see later, this parameter helps to limit the size of Q.

Starting from source s, a candidate-dataset is created: D(s[0]) = {NIL, NIL, NIL, Rs, 0, VALID}. This is the first candidate-dataset at vertex s, hence the index k = 0, i.e. D(v=s[k=0]). No vertex precedes s (i.e. u = NIL), so there is no reference to a candidate-dataset index at a preceding vertex (i.e. uk = NIL) and no interconnect prior to s (i.e. e = NIL). The driving-resistance at the source is Rs (i.e. r = Rs), and the propagated-delay prior to s is zero (i.e. t = 0). This candidate-dataset is added to the v-list at source s, i.e. L[s] ← L[s] ∪ D(s[0]). It is then inserted into Q with its propagated-delay as the priority-value and (a pointer to) the candidate-dataset as the identifier. The identifier dereferences to the location of the candidate-dataset, i.e. the vertex to which it belongs and the index at which it resides in the v-list of that vertex.
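The candidate-dataset and its identifier can be illustrated with a minimal Python sketch. The names Candidate and make_source, the use of None for NIL, the vertex label "s", the value of Rs and the use of the heapq module as Q are all assumptions made for illustration; they are not part of the original algorithm description.

```python
import heapq
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    """Hypothetical container for D(v[k]) = {u, uk, e, r, t, sf}."""
    u: Optional[str]     # preceding vertex (None plays the role of NIL)
    uk: Optional[int]    # index of the parent candidate in L[u]
    e: Optional[str]     # interconnect (wire or buffer) used to reach v
    r: float             # propagated driving-resistance
    t: float             # propagated-delay
    valid: bool = True   # status flag sf

def make_source(v_lists, queue, s, Rs):
    """Create D(s[0]), add it to L[s], and insert (key, identifier) into Q."""
    cand = Candidate(None, None, None, Rs, 0.0)
    v_lists.setdefault(s, []).append(cand)
    k = len(v_lists[s]) - 1
    # The identifier is (vertex, index in the v-list of that vertex).
    heapq.heappush(queue, (cand.t, (s, k)))
    return cand

v_lists, queue = {}, []
make_source(v_lists, queue, "s", Rs=100.0)   # Rs value is arbitrary
key, (vertex, k) = queue[0]                  # top-priority element of Q
source = v_lists[vertex][k]                  # dereference the identifier
```

Storing the pair (vertex, index) in the queue, rather than the dataset itself, mirrors the description of the identifier as a reference into the v-list.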

Consider now the program loop in which the graph traversal is performed. The top-priority element is extracted from Q. Its identifier gives the origin of the candidate-dataset: it was created at vertex u with index k in the v-list of u, i.e. D(u[k]). Each adjacent-vertex v is then scanned; if v is not in a wire-obstacle region, an available wire-size w[i] is picked, and the propagated-resistance rv and propagated-delay tv are computed by the function Cost( ). The pair (rv, tv) is then used in the dominance check in the InsertCandidate( ) function.

The InsertCandidate( ) function can be explained in three parts, namely #PART_1, #PART_2 and #PART_3. In #PART_1, the context state of the vertex is determined. If vertex v has not been visited yet, indicated by an empty v-list at v (i.e. L[v] = NIL), the context state is set to ‘LIST_EMPTY’. Otherwise, v has been visited before and there must be at least one candidate-dataset in its v-list, so each existing candidate-dataset is considered in turn. If the new candidate with (rv, tv) is dominated by any existing candidate-dataset, the context state is set to ‘NEW_DOMINATED’, meaning that the new one is dominated. Else, if the new candidate dominates an existing candidate-dataset, the context state is set to ‘OLD_DOMINATED’ and that existing candidate-dataset is set to NON_VALID. Note also that an existing candidate-dataset in the list is set to NON_VALID if its propagated-delay has exceeded the value of estimated_delay, i.e. t ∈ D(v[i]) > estimated_delay. Once the context state is identified as either ‘NEW_DOMINATED’ or ‘OLD_DOMINATED’, #PART_1 is exited immediately. Lastly, if, for all candidates in the v-list, neither the new candidate nor any existing candidate dominates the other, the context state is set to ‘NONE_DOMINATED’.
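The context-state decision of #PART_1 can be sketched in Python as follows. The names Candidate and classify are hypothetical, only the fields r, t and sf are modelled, and the comparison against estimated_delay follows the wording "has exceeded"; this is an illustrative sketch rather than the implementation used in this work.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical reduced candidate-dataset: only r, t and sf are kept."""
    r: float
    t: float
    valid: bool = True

def classify(v_list, rv, tv, estimated_delay):
    """Return the context state for a new candidate (rv, tv) at vertex v."""
    if not v_list:
        return "LIST_EMPTY"
    for d in v_list:
        if d.valid and rv > d.r and tv > d.t:
            return "NEW_DOMINATED"        # the new candidate is dominated
        if d.valid and ((rv < d.r and tv < d.t) or d.t > estimated_delay):
            d.valid = False               # mark the existing one NON_VALID
            return "OLD_DOMINATED"
    return "NONE_DOMINATED"               # no dominance in either direction

INF = float("inf")
existing = [Candidate(r=10.0, t=5.0)]
case_a = classify(existing, 12.0, 6.0, INF)   # worse in both r and t
case_b = classify(existing, 8.0, 6.0, INF)    # better r, worse t
case_c = classify(existing, 8.0, 4.0, INF)    # better in both r and t
case_d = classify([], 8.0, 4.0, INF)          # vertex not visited yet
```

Returning from inside the loop corresponds to the early exit of #PART_1 once either dominated state has been identified.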

Next, in #PART_2, a specific action is taken for the context state determined in #PART_1. If the context state is ‘LIST_EMPTY’, the new candidate is inserted into the v-list: the candidate-dataset is created, D(v[0]) = {u, k, e, rv, tv, VALID}, added to the v-list, L[v] ← L[v] ∪ D(v[0]), and inserted into the priority queue Q. Else, if the context state is ‘NEW_DOMINATED’, nothing is done, which means that the new candidate is discarded. Else, if the context state is ‘OLD_DOMINATED’, either the new candidate has dominated one of the existing candidate-datasets or there is an invalid candidate-dataset whose propagated-delay has exceeded estimated_delay. Here, the NON_VALID candidate-dataset is overwritten with the parameters of the new candidate, D(v[i]) ← {u, k, e, rv, tv, VALID}, and DECREASE-KEY is invoked to update its key in Q. Lastly, if the context state is ‘NONE_DOMINATED’, neither the new candidate nor any existing one dominates. A new candidate-dataset is hence created, D(v[i]) = {u, k, e, rv, tv, VALID}, appended to the list, L[v] ← L[v] ∪ D(v[i]), and inserted into Q.
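The #PART_2 actions can be sketched in Python under some simplifying assumptions. Here Q is modelled as a plain dictionary that maps an identifier (vertex, index) to its key, so that INSERT and DECREASE-KEY both reduce to an assignment and EXTRACT-MIN is a linear scan. The function names apply_case and extract_min, the dictionary-based candidate and the 0-based indexing are hypothetical and chosen only to keep the sketch short.

```python
def apply_case(case, v_list, queue, v, new):
    """Apply the #PART_2 action; queue maps identifier (v, i) -> key."""
    if case == "NEW_DOMINATED":
        return None                        # discard the new candidate
    if case == "OLD_DOMINATED":
        for i, d in enumerate(v_list):
            if not d["valid"]:
                v_list[i] = new            # overwrite the first invalid slot
                queue[(v, i)] = new["t"]   # stands in for DECREASE-KEY
                return i
        return None
    # LIST_EMPTY and NONE_DOMINATED both append and insert.
    v_list.append(new)
    i = len(v_list) - 1                    # 0-based index in this sketch
    queue[(v, i)] = new["t"]               # stands in for INSERT
    return i

def extract_min(queue):
    """Remove and return (identifier, key) with the smallest key."""
    ident = min(queue, key=queue.get)
    return ident, queue.pop(ident)

L_v, Q = [], {}
first = apply_case("LIST_EMPTY", L_v, Q, "v", {"r": 10.0, "t": 5.0, "valid": True})
L_v[0]["valid"] = False                    # as #PART_1 does on OLD_DOMINATED
slot = apply_case("OLD_DOMINATED", L_v, Q, "v", {"r": 8.0, "t": 4.0, "valid": True})
```

Overwriting the invalid slot keeps the v-list from growing, which is why only the key of the existing queue entry needs to change.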

Next, in #PART_3, if the vertex being visited is the destination z (i.e. if v = z), one possible value of the source-to-destination delay is obtained. It is computed using the formula tv + rv*Cz, where Cz represents the load-capacitance of the destination (sink). In multi-weighted routing, however, as long as not all vertices have been visited and not all possible interconnect-types have been tried, it is not certain that this source-to-destination delay is the exact minimum delay. Therefore, it is called the “estimated delay”, and the candidate which gives this estimated_delay is remembered as “estimated_end_candidate”. The estimated_delay parameter is powerful. In S-RABI, if the tv returned from Cost( ) is not less than estimated_delay, the interconnect-candidate is dropped immediately because it can never give a delay better than estimated_delay. The use of estimated_delay eliminates unnecessary expansion of tentative search results, thus lightening the load on the priority queue Q. Without estimated_delay, the search space could grow exponentially and the problem would become intractable.
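The role of estimated_delay can be shown with a small Python sketch. The function names update_estimate and worth_inserting, the identifier tuple and all numerical values are assumptions chosen for illustration (units are arbitrary); only the formula tv + rv*Cz and the comparison against estimated_delay come from the description above.

```python
def update_estimate(best, candidate_id, rv, tv, Cz):
    """best is the pair (estimated_delay, estimated_end_candidate)."""
    sink_delay = tv + rv * Cz              # delay including the sink load
    if best[0] > sink_delay:
        return (sink_delay, candidate_id)  # tighter estimate found
    return best

def worth_inserting(tv, estimated_delay):
    """A candidate is kept only if it can still beat the current estimate."""
    return tv < estimated_delay

best = (float("inf"), None)
best = update_estimate(best, ("z", 0), rv=100.0, tv=50.0, Cz=0.5)
best = update_estimate(best, ("z", 1), rv=200.0, tv=40.0, Cz=0.5)
keep_fast = worth_inserting(80.0, best[0])
keep_slow = worth_inserting(120.0, best[0])
```

In this example the first candidate at z gives 50 + 100 × 0.5 = 100, the second gives 40 + 200 × 0.5 = 140 and is therefore ignored, and any later candidate with tv of 100 or more is dropped before it reaches Q.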

When all wire candidates have been considered, if v is not in a buffer-obstacle region, an available buffer-choice b[i] is picked, the propagated (rv, tv) is computed by Cost( ), and dominance is checked in InsertCandidate( ). This is repeated for the other buffer-choices. The process repeats for all vertices with all possible wire-sizes and buffer-choices, with estimated_delay being tightened frequently, until the priority queue Q is empty. When Q is empty, there is no other possible route, and estimated_delay is the exact minimum source-to-destination delay based on the Elmore delay model. This exact minimum-delay path can be traced back by dereferencing {u, uk, e} ∈ D(v[k]) from the estimated_end_candidate at vertex z backward to s.
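The trace-back step can be sketched in Python as follows. The function name trace_back, the dictionary layout of the v-lists, the vertex labels and the interconnect names w1, w2 and b1 are hypothetical; the sketch only assumes that each candidate-dataset stores {u, uk, e} and that the source carries NIL (here None) in those fields.

```python
def trace_back(v_lists, end_id):
    """Follow {u, uk, e} from the end candidate back to the source."""
    path = []
    vertex, k = end_id
    while vertex is not None:
        cand = v_lists[vertex][k]
        path.append((vertex, cand["e"]))   # vertex and interconnect used
        vertex, k = cand["u"], cand["uk"]  # step to the parent candidate
    path.reverse()                         # report the path from s to z
    return path

v_lists = {
    "s": [{"u": None, "uk": None, "e": None}],
    "a": [{"u": "s", "uk": 0, "e": "w1"},
          {"u": "s", "uk": 0, "e": "w2"}],
    "z": [{"u": "a", "uk": 1, "e": "b1"}],
}
route = trace_back(v_lists, ("z", 0))
```

Because each candidate-dataset records the index uk of its parent, the walk selects the correct entry at a vertex that holds several candidates, as at vertex a in this example.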

A numerical example that illustrates the detailed working of S-RABI is given in Appendix C.

3.4 Summary

This chapter has explained the S-RABI algorithm and the Insertion Sort priority queue in detail. The next chapter presents the algorithmic modifications to S-RABI that are needed to benefit from a hardware priority queue, which provides only the INSERT and EXTRACT functions. It also presents the modifications to Insertion Sort needed for a high-speed hardware priority queue implementation.

In document Graph processing hardware accelerator for shortest path algorithms in nanometer very large-scale integration interconnect routing (Page 73-79)