Global instruction scheduling works on bounded, contiguous regions of basic blocks, called scheduling regions. These regions constitute the scheduling scopes within which the global scheduler can rearrange and parallelize instructions; they should be chosen as large as possible to provide maximal opportunities for the extraction of instruction-level parallelism. Scheduling regions are often—initially also in this work—required to be acyclic (free of loops). Apart from such scheduler-specific restrictions, they are naturally limited by procedure boundaries since code motion between procedures is not considered viable. The notion “scheduling region” is often used with two different meanings:
• It denotes the input the scheduler receives on an invocation during code generation.
• It can also refer to subregions formed by the scheduler itself: some schedulers partition the given scheduling region further into smaller regions and pass these to a scheduler proper in order to be scheduled. Such schedulers exhibit a two-level structure: region formation and region scheduling.
The following definitions describe several formal representations of scheduling regions as they are used for scheduling. Some details are adapted to the Itanium processor architecture:
Definition 3.2.1 (Control Flow Graph) A control flow graph (CFG) of a scheduling region is an acyclic digraph $G_C = (V, E_C, V_{entry}, V_{exit})$ with the region’s instructions as nodes. The edges represent possible control flow between the instructions and are marked by predicate registers.
If an edge $(a, b) \in E_C$ is marked by p, then instruction b is executed after a if and only if p has the value true; b is then said to be control dependent on a. $V_{entry}$ contains the nodes without predecessors (entry points) and $V_{exit}$ the nodes without successors (exit points). ✷
In this thesis, assembly instructions form the nodes of the CFG; however, this and the following representations can also be instantiated on a source or intermediate level. Conditional branches are represented not by nodes but only by edges in the graph.
Definition 3.2.2 (Basic Block) A basic block in a CFG is a path of maximal length where no inner node has more than one successor or predecessor. ✷
Those instructions that are nodes on a path of Def. 3.2.2 are said to be contained in the basic block embodied by the path. The block is then also called the source block of these instructions.
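As an illustration of Def. 3.2.2, the following Python sketch partitions the nodes of an acyclic CFG into basic blocks using the usual leader-based construction: a block starts at an entry node, at a join, or at a successor of a split, and is extended as long as the current node has exactly one successor and that successor has exactly one predecessor. The representation (a dictionary `succ` mapping every node, including exit nodes, to the list of its direct successors) and the function name `basic_blocks` are hypothetical; edge predicates are ignored because they do not affect block boundaries.

```python
from collections import defaultdict


def basic_blocks(succ):
    """Partition the nodes of an acyclic CFG into basic blocks.

    succ: hypothetical representation, node -> list of direct successors;
    every node (also exit nodes) must appear as a key.
    """
    pred = defaultdict(list)
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    def is_leader(v):
        # A block starts at an entry node, at a join, or directly after a split.
        ps = pred[v]
        return len(ps) != 1 or len(succ[ps[0]]) > 1

    blocks = []
    for v in succ:
        if not is_leader(v):
            continue
        block = [v]
        # Extend the chain while the last node has exactly one successor
        # and that successor has exactly one predecessor.
        while len(succ[block[-1]]) == 1:
            nxt = succ[block[-1]][0]
            if len(pred[nxt]) != 1:
                break
            block.append(nxt)
        blocks.append(block)
    return blocks
```

For the CFG `{1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}`, in which the conditional branch after node 2 is represented by the two edges leaving it, the sketch yields the blocks `[1, 2]`, `[3]`, `[4]` and `[5, 6]`.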
The control flow inside a basic block is simple: whenever a basic block is reached during program execution, all of its instructions are executed. To focus on the more dynamic control flow between basic blocks, we can regard the blocks as nodes of a new graph, the basic block graph:
Definition 3.2.3 (Basic Block Graph) A basic block graph (BBG) of a scheduling region is an acyclic digraph $G_B = (B, E_B, B_{entry}, B_{exit})$ with the region’s basic blocks as nodes. The edges $E_B \subseteq B \times B$ represent possible control flow between the basic blocks and are marked by predicate registers analogously to Def. 3.2.1. $B_{entry} \subseteq B$ contains the nodes without predecessors (entry blocks) and $B_{exit} \subseteq B$ the nodes without successors (exit blocks). ✷
Definition 3.2.4 (Control Flow Paths) Paths in $G_C$ and $G_B$ are called control flow paths. They are said to be complete if they start at a node in $V_{entry}$ or $B_{entry}$ and end at a node in $V_{exit}$ or $B_{exit}$. In the context of a BBG, we denote by $C$ the set of all complete control flow paths and by $C(A) \subseteq C$ the subset of those paths that pass through block A. Complete control flow paths are also referred to as program paths. ✷
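The sets C and C(A) of Def. 3.2.4 can be made concrete with a small Python sketch that enumerates all complete control flow paths of an acyclic BBG by depth-first search. The successor-dictionary representation and the names `complete_paths` and `paths_through` are assumptions for illustration. The number of program paths can grow exponentially with the number of splits, so explicit enumeration like this is only practical for small regions.

```python
def complete_paths(succ):
    """Yield every complete control flow path of an acyclic BBG as a tuple.

    succ: hypothetical representation, block -> list of direct successors.
    Entry blocks have no predecessors, exit blocks have no successors.
    """
    has_pred = {b for bs in succ.values() for b in bs}
    entries = [b for b in succ if b not in has_pred]

    def extend(path):
        last = path[-1]
        if not succ[last]:  # reached an exit block: the path is complete
            yield tuple(path)
            return
        for nxt in succ[last]:
            yield from extend(path + [nxt])

    for e in entries:
        yield from extend([e])


def paths_through(paths, block):
    """C(A): the subset of the given complete paths that pass through `block`."""
    return [p for p in paths if block in p]
```

For the diamond-shaped BBG `{"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}`, the program paths are `("A", "B", "D")` and `("A", "C", "D")`, and C(B) contains only the first of them.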
In both the CFG and the BBG, nodes with more than one predecessor are called joins and nodes with more than one successor are called splits. Edges from a split to a join are called JS edges [BMM00]. JS edges can complicate scheduling algorithms because no block exists that is executed exactly when such an edge is taken, so code cannot be placed on the edge itself. They can be removed by inserting a new block (called a JS block) between the split and the join; the JS edge is then replaced by two new edges, one from the split to the JS block and one from there to the join.
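The JS-edge removal just described can be sketched in Python as a graph transformation on a successor dictionary. The function name `split_js_edges` and the tuple naming scheme `("JS", split, join)` for the inserted blocks are hypothetical, and edge predicates are not modeled.

```python
def split_js_edges(succ):
    """Return a copy of `succ` in which every split -> join edge is routed
    through a newly inserted JS block. The input graph is left unchanged.

    succ: hypothetical representation, node -> list of direct successors.
    """
    npred = {}
    for vs in succ.values():
        for v in vs:
            npred[v] = npred.get(v, 0) + 1

    result = {u: list(vs) for u, vs in succ.items()}
    for u, vs in succ.items():
        if len(vs) < 2:  # u is not a split
            continue
        for i, v in enumerate(vs):
            if npred.get(v, 0) > 1:  # v is a join, so (u, v) is a JS edge
                js = ("JS", u, v)    # hypothetical name for the JS block
                result[u][i] = js    # new edge: split -> JS block
                result[js] = [v]     # new edge: JS block -> join
    return result
```

In `{"A": ["B", "D"], "B": ["D"], "D": []}`, the edge from the split A to the join D is a JS edge; after the transformation, A branches to B and to the new JS block, which falls through to D.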
In the context of basic block graphs, we often use the notions of (direct) successors and predecessors as known from graphs in general: for instance, we say that a block A is a predecessor of block B, written $A \prec B$, if there exists a nonempty control flow path from A to B. If $(A, B) \in E_B$, A is called a direct predecessor of B. The predecessor relation imposes a (strict) partial order on the set of basic blocks B.
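A minimal Python sketch of the predecessor relation A ≺ B: because the BBG is acyclic, the set of all (not necessarily direct) predecessors of every block can be accumulated in a single pass over the blocks in topological order, here obtained with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The function name `predecessor_sets` and the successor-dictionary representation are assumptions.

```python
from graphlib import TopologicalSorter


def predecessor_sets(succ):
    """Map each block B to {A | A ≺ B}, i.e. every block from which a
    nonempty control flow path leads to B. The BBG must be acyclic.

    succ: hypothetical representation, block -> list of direct successors.
    """
    direct = {b: [] for b in succ}  # block -> its direct predecessors
    for a, bs in succ.items():
        for b in bs:
            direct[b].append(a)

    preds_of = {b: set() for b in succ}
    # TopologicalSorter expects node -> predecessors and yields every block
    # after all of its predecessors, so preds_of[a] is final when used.
    for b in TopologicalSorter(direct).static_order():
        for a in direct[b]:
            preds_of[b] |= {a} | preds_of[a]
    return preds_of
```

A ≺ B then holds exactly if `A in preds[B]`; since the graph is acyclic, no block is its own predecessor.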
Definition 3.2.5 (Dominance, Postdominance) We define for two nodes a and b in an acyclic digraph (typically a CFG or BBG):
• a dominates b (a dom b) if every path from an entry node to b passes through a.
• b postdominates a (b pdom a) if every path from a to an exit node passes through b.
• a and b are control equivalent if a dominates b and b postdominates a (or vice versa).
Each node dominates and postdominates itself. We extend this definition to node sets as follows:
Let a be a node and S a subset of nodes. Then
• a is dominated by S, denoted by $a \in D^{-1}_{+}(S)$, if every path from an entry node to a passes through (at least one node of) S.
• a is postdominated by S, denoted by $a \in P^{-1}_{+}(S)$, if every path from a to an exit node passes through (at least one node of) S. ✷
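In an acyclic digraph, Def. 3.2.5 can be tested directly by reachability: a is dominated by S exactly if a cannot be reached from any entry node along a path that avoids all nodes of S (if a ∈ S, the condition holds trivially). Postdominance is the same test on the reversed graph, starting from the exit nodes, and dominance by a single node is the special case of a one-element set. The following Python sketch implements these tests; all names and the successor-dictionary representation are hypothetical, and a real scheduler would more typically precompute dominator trees instead of answering each query by a graph search.

```python
def _reachable_avoiding(succ, starts, avoid):
    """Nodes reachable from `starts` via paths that never enter `avoid`."""
    seen, stack = set(), [s for s in starts if s not in avoid]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(w for w in succ[v] if w not in avoid)
    return seen


def _reverse(succ):
    pred = {v: [] for v in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    return pred


def dominated_by_set(succ, a, S):
    """Every path from an entry node to a passes through some node of S."""
    entries = [v for v, ps in _reverse(succ).items() if not ps]
    return a not in _reachable_avoiding(succ, entries, set(S))


def postdominated_by_set(succ, a, S):
    """Every path from a to an exit node passes through some node of S."""
    exits = [v for v, ws in succ.items() if not ws]
    return a not in _reachable_avoiding(_reverse(succ), exits, set(S))


def dominates(succ, a, b):
    """a dominates b."""
    return dominated_by_set(succ, b, {a})


def postdominates(succ, b, a):
    """b postdominates a."""
    return postdominated_by_set(succ, a, {b})


def control_equivalent(succ, a, b):
    return ((dominates(succ, a, b) and postdominates(succ, b, a)) or
            (dominates(succ, b, a) and postdominates(succ, a, b)))
```

In the diamond A → {B, C} → D, A dominates D and D postdominates A, so A and D are control equivalent; neither B nor C dominates D on its own, but the set {B, C} does.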
Control flow properties can also be described by control dependence graphs or program dependence graphs [SS02], but these concepts are neither used nor presented here. Instead, we introduce another essential graph, which describes the data dependences between instructions:
Definition 3.2.6 (Global Data Dependence Graph) Let an acyclic control flow graph $G_C = (V, E_C, V_{entry}, V_{exit})$ be given. The corresponding global data dependence graph (DDG) $G_D = (V, E_D)$ is an acyclic digraph that contains an edge from node m to n if
• n is data dependent on m with respect to a storage resource (as defined in Sec. 1.3) and
• there exists a control flow path in $G_C$ from m to n whose inner nodes contain no definition of this storage resource, and in the case of a WAW dependence also no use (this condition is here referred to as the exclusion criterion).
The data dependence edges can be partitioned according to the involved dependence and resource types: $E_D = E_D^{RAW} \cup E_D^{WAR} \cup E_D^{WAW}$ (true, anti, and output dependences) and $E_D = E_D^{reg} \cup E_D^{mem}$ (register and memory dependences), respectively. A (total) latency $w_{mn}$ in cycles is associated with each data dependence edge; this value is often also interpreted as the length of the edge. ✷
On the Itanium architecture, all intra-group (WAR and memory) dependences have latency zero since the dependent instructions may appear in the same instruction group (though only in an order that complies with the dependence). WAW register dependences, however, cause stalls on the Itanium 2 processor similarly to RAW dependences and are thus assigned the same latencies.
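The following Python sketch computes the register part of the DDG from Def. 3.2.6 on an instruction-level CFG. It applies the exclusion criterion by searching for a CFG path from m to n whose inner nodes avoid every definition of the register (and, for WAW, also every use). Latencies follow the Itanium conventions described in the preceding paragraph: WAR edges get latency zero and WAW edges the same latency as RAW edges. The `Instr` record, its single per-instruction `latency` field, and the function names are hypothetical simplifications; memory dependences are omitted, and the all-pairs search is only meant for small examples.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    defs: set = field(default_factory=set)  # registers written
    uses: set = field(default_factory=set)  # registers read
    latency: int = 1                         # hypothetical result latency (cycles)


def _path_avoiding(succ, m, n, blocked):
    """True if some CFG path m -> ... -> n has no inner node in `blocked`."""
    stack, seen = list(succ[m]), set()
    while stack:
        v = stack.pop()
        if v == n:
            return True
        if v in seen or v in blocked:
            continue
        seen.add(v)
        stack.extend(succ[v])
    return False


def register_ddg(succ, instrs):
    """Register dependences as (m, n, kind, latency) tuples."""
    regs = set().union(*(i.defs | i.uses for i in instrs.values()))
    writers = {r: {v for v, i in instrs.items() if r in i.defs} for r in regs}
    readers = {r: {v for v, i in instrs.items() if r in i.uses} for r in regs}
    edges = []
    for m, im in instrs.items():
        for n, jn in instrs.items():
            if m == n:
                continue
            for r in regs:
                candidates = []
                if r in im.defs and r in jn.uses:  # true dependence
                    candidates.append(("RAW", im.latency, writers[r]))
                if r in im.uses and r in jn.defs:  # anti dependence: latency 0
                    candidates.append(("WAR", 0, writers[r]))
                if r in im.defs and r in jn.defs:  # output dependence: as RAW
                    candidates.append(("WAW", im.latency, writers[r] | readers[r]))
                for kind, lat, blocked in candidates:
                    # Exclusion criterion: no intervening definition (WAW: nor use).
                    if _path_avoiding(succ, m, n, blocked):
                        edges.append((m, n, kind, lat))
    return edges
```

For the straight-line sequence 1: `r1 = ...` (latency 2), 2: `r2 = r1`, 3: `r1 = ...`, 4: `... = r1`, the sketch yields RAW(1, 2), WAR(2, 3) and RAW(3, 4). The WAW dependence between instructions 1 and 3 is excluded because the intervening use in instruction 2 already orders them.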
The existence of a DDG edge implies the existence of a (possibly empty) control flow path between the source blocks of the two instructions. Hence, acyclic control flow graphs always yield acyclic data dependence graphs. An instruction is called a DDG predecessor of another instruction if there is a nonempty path from the former to the latter in the DDG. The order on the instructions defined by the data dependences is transitive; thus, a data dependence edge can be regarded as redundant if it is already implied by a sequence of other edges. We can assume that a given DDG has no such redundant edges:
Definition 3.2.7 (Minimal DDG) A data dependence graph $G_D$ is called minimal if for no edge $(m, n) \in E_D$ there exists a path in $G_D$ from m to n that does not contain the edge (m, n) and has length greater than or equal to $w_{mn}$. ✷
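Redundant edges in the sense of Def. 3.2.7 can be removed with a longest-path test: an edge (m, n) is dropped if the remaining graph still contains a path from m to n whose length is at least its latency. The Python sketch below removes edges one at a time and always tests against the current edge set, so that two edges that imply each other are not both deleted. The names and the `(m, n, w)` edge-list representation are assumptions, and the recursive longest-path search is only intended for small graphs.

```python
def longest_path(edges, src, dst):
    """Length of the longest src -> dst path in the DAG `edges`, or None."""
    out = {}
    for m, n, w in edges:
        out.setdefault(m, []).append((n, w))
    memo = {}

    def best(v):
        # Longest distance from v to dst; None if dst is unreachable from v.
        if v == dst:
            return 0
        if v not in memo:
            cands = []
            for n, w in out.get(v, []):
                d = best(n)
                if d is not None:
                    cands.append(w + d)
            memo[v] = max(cands) if cands else None
        return memo[v]

    return best(src)


def minimize_ddg(edges):
    """Drop redundant DDG edges; edges is a list of (m, n, w) tuples."""
    kept = list(edges)
    for e in edges:
        m, n, w = e
        rest = list(kept)
        rest.remove(e)  # consider only paths that avoid this very edge
        length = longest_path(rest, m, n)
        if length is not None and length >= w:
            kept = rest  # implied by a path at least as long: redundant
    return kept
```

For the edges (1, 2, 2), (2, 3, 0), (1, 3, 1), (3, 4, 1) and (1, 4, 5), the edge (1, 3, 1) is redundant because the path 1 → 2 → 3 has length 2, whereas (1, 4, 5) is kept: the only alternative path 1 → 2 → 3 → 4 has length 3 < 5.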
The DDG is sufficient to describe all feasible orders of the instructions contained in a basic block. It therefore renders dispensable the CFG, which gives for each basic block only a linear instruction sequence (one possible order). In other words, the DDG extracts from this linear sequence the information that is relevant for scheduling.
Thus, given a function $s: V \to B$ that yields the source block of each instruction, the BBG and the DDG together form an (almost) complete description of a global scheduling problem instance. For the example routine from Alg. 2 in Sec. 2.1.2.1, both representations are shown in Fig. A.1 and Fig. A.2 in the appendix.
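A scheduling problem instance of this kind could be bundled as in the following hypothetical Python container, which holds the BBG, the DDG edges, and the source-block map s. Its `check` method tests the property stated earlier that every DDG edge connects two source blocks joined by a (possibly empty) control flow path. The class and method names are assumptions for illustration and not necessarily the representation used in this thesis.

```python
from dataclasses import dataclass


@dataclass
class SchedulingInstance:
    bbg: dict     # block -> list of direct successor blocks
    ddg: list     # data dependence edges as (m, n, latency) tuples
    source: dict  # instruction -> source block, i.e. the map s

    def reaches(self, a, b):
        """Is there a (possibly empty) control flow path from block a to b?"""
        stack, seen = [a], set()
        while stack:
            x = stack.pop()
            if x == b:
                return True
            if x not in seen:
                seen.add(x)
                stack.extend(self.bbg[x])
        return False

    def check(self):
        """Every DDG edge must connect control-flow-ordered source blocks."""
        return all(self.reaches(self.source[m], self.source[n])
                   for m, n, _ in self.ddg)
```

With the BBG `{"A": ["B", "C"], "B": [], "C": []}` and instructions 1, 2, 3 in blocks A, B, C, the edges (1, 2) and (1, 3) pass the check, whereas an edge (2, 3) would not, since no control flow path leads from B to C.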