5.9 Rewire Target Analysis

5.9.4 Application in Partial CFG Linearization

The Partial CFG Linearization phase transforms the CFG such that all disjoint paths that are executed by some instances are always executed (Section 6.3). These paths are described by means of div causing blocks and rewire targets. Implicitly, this means that optional blocks (blocks that are not rewire targets) do not always have to be executed: Edges that target such blocks can be retained, and the CFG still exhibits some of the original structure.

A conservative, complete linearization of the CFG can thus be forced by assuming that every conditional branch is varying: Each block with multiple outgoing edges then is a div causing block, which in turn makes each of its successor blocks and each block with multiple incoming edges a rewire target. CFG linearization then has no choice but to linearize the entire CFG.
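The conservative marking described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the CFG is given as a successor map, and the helper name `conservative_marks` is hypothetical and not part of the presented analysis.

```python
from typing import Dict, List, Set, Tuple


def conservative_marks(succs: Dict[str, List[str]]) -> Tuple[Set[str], Set[str]]:
    """Treat every conditional branch as varying (hypothetical helper).

    Returns (div_causing, rewire_targets) for a CFG given as a successor map.
    """
    # Count incoming edges per block.
    num_preds: Dict[str, int] = {b: 0 for b in succs}
    for block, successors in succs.items():
        for s in successors:
            num_preds[s] = num_preds.get(s, 0) + 1

    # Every block with more than one outgoing edge becomes div causing.
    div_causing = {b for b, ss in succs.items() if len(ss) > 1}

    # Successors of div causing blocks and all join points become rewire targets.
    rewire: Set[str] = set()
    for b in div_causing:
        rewire.update(succs[b])
    rewire.update(b for b, n in num_preds.items() if n > 1)
    return div_causing, rewire


# A simple diamond: a branches to b and c, which join in d.
cfg = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
div_causing, rewire = conservative_marks(cfg)
```

In this diamond, every block except the entry ends up as a rewire target, so no optional block remains and the whole CFG would have to be linearized.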

Note that this does not only affect linearization, but also the precision of the Vectorization Analysis: Since all branches are considered varying, every φ-function also has to be considered varying, which may result in less efficient code.

Figure 5.13: Complex examples with nested loops, loops with multiple exits, and exits that leave multiple loops at once. In the second graph, i is rewire_e because criterion 4 is fulfilled for the outer loop (disjoint paths e → g → h → c → d → i and e → f → j → l), albeit not for the inner. Partial linearizations

6 Whole-Function Vectorization

In this chapter, we present the main transformation phases of the Whole-Function Vectorization algorithm: Mask Generation, Select Generation, Partial CFG Linearization, and Instruction Vectorization.

6.1 Mask Generation

As already mentioned, control flow may diverge because a condition might be true for some scalar instances and false for others. Consequently, all code has to be executed. The explicit transfer of control is modeled by masks on control flow edges. A mask is a vector of truth values of size W. If the mask of a CFG edge a → b is set to true at position i, this means that the i-th instance of the code took the branch from a to b. Thus, the mask denotes which elements in a vector contain valid data on the corresponding control flow edge.
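As a small illustration of this definition, the following Python sketch models a mask as a list of W truth values. The width of 4, the concrete mask, and the data vector `x` are assumptions chosen only for the example.

```python
W = 4  # vectorization width; the value 4 is chosen only for illustration

# Mask of the CFG edge a -> b: instances 0 and 3 took this branch.
edge_mask_a_to_b = [True, False, False, True]

# A vector value computed along that edge, one element per instance.
x = [10, 20, 30, 40]

# Indices of the instances that took the edge.
took_edge = [i for i in range(W) if edge_mask_a_to_b[i]]

# Only elements at active positions hold valid data on this edge;
# None marks the positions whose content must not be used.
valid = [x[i] if edge_mask_a_to_b[i] else None for i in range(W)]
```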

Algorithm 1: Pseudo-code for the main mask generation function.
Input: A CFG in SSA form.
Output: Mask information for every basic block and loop exit.
begin
    foreach B ∈ return blocks do
        createMasks(B);
    end
    foreach L ∈ loops do
        createLoopExitMasks(L);
    end
end

Algorithm 1 shows how masks are generated. The presented pseudo-code generates a graph where each node represents a mask. Code generation boils down to a straightforward depth-first traversal of the graph.
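To make the idea of a mask graph with depth-first code generation concrete, here is a minimal Python sketch. The `Mask` class, the `emit` function, and the textual output format are assumptions for illustration only. The sketch handles acyclic graphs and leaves out the PHI nodes that loops introduce.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Mask:
    op: str                                   # "VALUE", "AND", "OR", "NEG"
    operands: List["Mask"] = field(default_factory=list)
    value: str = ""                           # only used by VALUE nodes


def emit(mask: Mask, names: Dict[int, str], code: List[str]) -> str:
    """Post-order depth-first traversal: operands first, then the node."""
    key = id(mask)
    if key in names:                          # shared node was already emitted
        return names[key]
    if mask.op == "VALUE":                    # leaves refer to existing values
        names[key] = mask.value
        return mask.value
    args = [emit(o, names, code) for o in mask.operands]
    name = f"m{len(code)}"
    code.append(f"{name} = {mask.op}({', '.join(args)})")
    names[key] = name
    return name


# Mask graph for a block c reached via a -> c (true edge) and a -> b -> c.
m_a = Mask("VALUE", value="m_a")
cond = Mask("VALUE", value="cond")
m_ab = Mask("AND", [m_a, Mask("NEG", [cond])])
m_ac = Mask("AND", [m_a, cond])
m_c = Mask("OR", [m_ac, m_ab])                # b branches unconditionally to c

code: List[str] = []
emit(m_c, {}, code)
```

Because shared nodes are looked up in `names` before being emitted, each mask operation is generated once even if several edges refer to it.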

Function createMasks(Block B)
begin
    if B already has masks then
        return;
    end
    if B is loop header then
        createMasks(preheader);
    else
        foreach P ∈ predecessors do
            createMasks(P);
        end
    end
    createEntryMask(B);
    createExitMasks(B);
    if B is loop header then
        createMasks(latch);
        if loop is divergent then
            Mask latchMask ← ExitMasks[latch→header];
            EntryMasks[B].blocks.push(latch);
            EntryMasks[B].values.push(latchMask);
        end
    end
end

The edge masks implicitly define entry masks on blocks (Function createEntryMask): The entry mask of a block that is not a loop header is either true for by all blocks or the disjunction of the masks of all incoming edges. For divergent loops, the mask of a loop header is a φ-function with incoming values from the loop’s preheader and latch. Otherwise, the mask is always the one coming from the preheader. This is because as long as the loop iterates, all instances that were active upon entry of the loop remain active.
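The following Python sketch evaluates these entry mask rules on concrete truth vectors. The function names and the per-instance trip counts are assumptions for illustration. The φ-function of a divergent loop header is modeled simply by carrying the latch edge mask into the next iteration.

```python
from typing import List

Mask = List[bool]


def entry_mask(incoming: List[Mask], by_all: bool, width: int) -> Mask:
    """Entry mask of a block that is not a loop header."""
    if by_all:
        return [True] * width                 # executed by all instances
    result = [False] * width
    for edge in incoming:                     # disjunction of incoming edge masks
        result = [r or e for r, e in zip(result, edge)]
    return result


def header_masks(preheader_mask: Mask, trip_counts: List[int]) -> List[Mask]:
    """Header entry masks of a divergent loop, one per iteration.

    trip_counts[i] is the (assumed) number of iterations of instance i.
    """
    mask = list(preheader_mask)               # phi: first value comes from the preheader
    masks = []
    iteration = 0
    while any(mask):
        masks.append(mask)
        iteration += 1
        # phi: later values come from the latch -> header edge, i.e. the
        # instances that are still active and iterate again.
        mask = [m and iteration < n for m, n in zip(mask, trip_counts)]
    return masks


joined = entry_mask([[True, False, False, False],
                     [False, True, False, True]], by_all=False, width=4)
per_iteration = header_masks([True] * 4, [1, 3, 2, 3])
```

In a non-divergent loop all instances have the same trip count, so every entry of `per_iteration` would equal the preheader mask, which matches the rule that the header then simply reuses it.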

[Figure 6.1: CFG with three blocks a, b, and c. Edges: a → b (false), a → c (true), b → c.]

Block a:  m_a ← ...;  x1 ← ...;  cond ← ...;  m_a→b ← m_a ∧ ¬cond;  m_a→c ← m_a ∧ cond;  br cond, c, b
Block b:  m_b ← m_a→b;  x2 ← ...;  m_b→c ← m_b
Block c:  m_c ← m_a→c ∨ m_b→c;  x3 ← phi(x1, x2);  ... ← x3

Figure 6.1: Edge and block entry masks. m_a, m_b, and m_c are the entry masks of the corresponding blocks a, b, and c. m_a→b, m_a→c, and m_b→c are the block exit masks connected to the edges a → b, a → c, and b → c.

The masks of the control flow edges that leave a block are given by the block entry mask and a potential conditional (Function createExitMasks). Note that this does not apply to edges to rewire loop exit blocks, which are discussed in the next paragraph. If a block exits with an unconditional branch, the mask of its single exit-edge is equal to the entry mask. If the block ends with a varying conditional branch, the exit mask of the “true edge” of the block is the conjunction of its entry mask and the branch condition. The exit mask of the “false edge” is the conjunction of the entry mask and the negated branch condition. For the “true edge” of a uniform conditional branch, a select returns the entry mask of the block if the condition is met, otherwise it returns false. For the corresponding “false edge,” an inverse select is used. This scheme extends naturally to blocks with more than two outgoing edges, e.g. due to a switch statement: The comparison of each case value to the switch value is the condition of that edge. Figure 6.1 shows an example with three basic blocks a, b, and c with corresponding block entry masks (m_a, ...) and edge masks (m_a→b, ...).
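The exit mask rules can be sketched in Python as follows. The function names are hypothetical, masks are modeled as plain lists of truth values, and the switch variant only covers explicit case values since the text does not describe a default edge.

```python
from typing import Dict, List, Tuple

Mask = List[bool]


def exit_masks_varying(entry: Mask, cond: Mask) -> Tuple[Mask, Mask]:
    """Varying branch: conjunction with the (negated) condition."""
    true_edge = [m and c for m, c in zip(entry, cond)]
    false_edge = [m and not c for m, c in zip(entry, cond)]
    return true_edge, false_edge


def exit_masks_uniform(entry: Mask, cond: bool) -> Tuple[Mask, Mask]:
    """Uniform branch: select between the entry mask and false."""
    none = [False] * len(entry)
    true_edge = list(entry) if cond else none     # select(cond, entry, false)
    false_edge = none if cond else list(entry)    # inverse select
    return true_edge, false_edge


def exit_masks_switch(entry: Mask, value: List[int], cases: List[int]) -> Dict[int, Mask]:
    """Switch: the edge condition is the comparison with each case value."""
    return {c: [m and v == c for m, v in zip(entry, value)] for c in cases}


entry = [True, True, False, True]
varying = exit_masks_varying(entry, [True, False, True, False])
uniform = exit_masks_uniform(entry, True)
switch = exit_masks_switch(entry, [1, 2, 1, 1], [1, 2])
```

Instance 2 is inactive on entry and therefore stays inactive on every outgoing edge, regardless of the value of its condition.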

Function createEntryMask(Block B)
begin
    if B is entry block then
        if function has mask argument then
            EntryMasks[B] ← Mask(VALUE, mask argument);
        else
            EntryMasks[B] ← Mask(VALUE, true);
        end
        return;
    end
    if B is by all then
        EntryMasks[B] ← Mask(VALUE, true);
        return;
    end
    if has unique predecessor P then
        EntryMasks[B] ← ExitMasks[P→B];
        return;
    end
    if B is header of loop with preheader P then
        Mask loopEntryMask ← ExitMasks[P→B];
        if loop is divergent then
            EntryMasks[B] ← Mask(PHI);
            EntryMasks[B].blocks.push(P);
            EntryMasks[B].values.push(loopEntryMask);
        else
            EntryMasks[B] ← ExitMasks[P→B];
        end
        return;
    end
    if B is blend then
        Mask entryMask ← Mask(OR);
        foreach P ∈ predecessors do
            entryMask.push(ExitMasks[P→B]);
        end
        EntryMasks[B] ← entryMask;
    else
        Mask entryMask ← Mask(PHI);
        foreach P ∈ predecessors do
            Mask predMask ← ExitMasks[P→B];
            entryMask.blocks.push(P);
            entryMask.values.push(predMask);
        end
        EntryMasks[B] ← entryMask;
    end
end

Function createExitMasks(Block B)
begin
    if no successors then
        return;
    end
    if has unique successor S then
        ExitMasks[B→S] ← EntryMasks[B];
        return;
    end
    foreach S ∈ successors do
        // C is the condition of edge B→S,
        // e.g. Mask(NEG, C) for false edge of cond. branch.
        if exit condition C is uniform then
            ExitMasks[B→S] ← Mask(SELECT);
            ExitMasks[B→S].cond ← C;
            ExitMasks[B→S].trueVal ← EntryMasks[B];
            ExitMasks[B→S].falseVal ← Mask(VALUE, false);
        else
            ExitMasks[B→S] ← Mask(AND);
            ExitMasks[B→S].push(EntryMasks[B]);
            ExitMasks[B→S].push(Mask(VALUE, C));
        end
    end
end

The analyses presented in Chapter 5 enable various optimizations here. First, if the analysis finds that a block is always executed by all instances (by all), the mask is set to true. Second, at the end of regions with a single entry and exit block, the mask can be reset to the one of the entry block. However, there is a trade-off involved: The mask has to be kept alive for the entire region, which can result in worse performance than recomputing the mask with a disjunction. Third, optional blocks always use the mask of their only predecessor or a phi with the incoming masks if the block has multiple incoming edges. This is because all instances that were active in the executed predecessor will also be active in the optional block (and none from a different direction, or the block would have been marked rewire). Otherwise, disjunctions would be generated, introducing some performance overhead compared to the original, scalar function. Also, the incoming mask of a block with a uniform branch and only optional successor blocks is used for both outgoing edges. Without the Rewire Target Analysis, the mask would have to be updated with the comparison result first. This implies that on paths with only optional blocks, all edges have the same mask as the first block, without requiring mask update operations. In the rightmost CFG of Figure 5.12, blocks c and d can both use the entry mask of block b instead of performing conjunction-operations with the (negated) branch condition in b, and block f can use the same mask instead of the disjunction of both incoming masks.
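The following Python sketch illustrates why reusing the entry mask of b is safe in such a situation. It assumes a diamond b → {c, d} → f with a uniform branch in b and only optional blocks, which is one reading of the example since Figure 5.12 is not reproduced here. The function `diamond_masks` is hypothetical and computes the masks the way they would be generated without rewire information.

```python
from typing import List, Tuple

Mask = List[bool]


def diamond_masks(m_b: Mask, cond: bool) -> Tuple[Mask, Mask, Mask]:
    """Masks for b -> {c, d} -> f computed without rewire information."""
    none = [False] * len(m_b)
    m_c = list(m_b) if cond else none          # select on the true edge
    m_d = none if cond else list(m_b)          # inverse select on the false edge
    m_f = [x or y for x, y in zip(m_c, m_d)]   # disjunction at the join
    return m_c, m_d, m_f


m_b = [True, False, True, True]
for cond in (True, False):
    m_c, m_d, m_f = diamond_masks(m_b, cond)
    # With a uniform branch only one of c and d is executed at run time.
    executed = m_c if cond else m_d
    assert executed == m_b                     # the executed block sees exactly m_b
    assert m_f == m_b                          # the join sees exactly m_b as well
```

Since the block that is actually executed and the join block always observe the entry mask of b, the select and disjunction operations can be dropped without changing any mask that is used.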