3.6 Extended standard

In this section we sketch a proposal to extend the existing C standard with the functionality we have implemented in GCC. We start with the general assumptions we had in mind.

3.6.1 General principle

• Vectors can be of size 2^n bytes, n > 0. We want to make vector types extensible for future architectures. The 2^n property comes from observing the sizes of the SIMD registers and considering the sizes of the built-in types. Implementation-wise, operations on large vectors can be implemented using smaller vectors or scalars.

• We aim at supporting the intersection of the various instruction sets in the C language. For instructions outside the intersection, we examine whether they can be expressed as a combination of instructions from the intersection. For example, some architectures define multiply-add as a single instruction. We are not going to include multiply-add as a C primitive, as it can be expressed as a combination of multiplication and addition. On the other hand, it would be much more difficult to express the rotation operation without a designated primitive operation.

• We assume that compilers can perform complicated pattern matching to recognise the cases when several operations can be replaced by a single one on given hardware.

• Large vectors should be implemented with the largest hardware vector available on a given target or by scalar operations. Exact choices can be dictated by a cost model used for auto-vectorisation.

• A compiler shall provide a flag that reports the cases in which a vector operation was implemented by scalar operations. Without such a flag a user would have to inspect the generated code, and would still be unable to tell whether that code is optimal or whether the fallback implementation has been used.

• The math library has to be extended with vector variants of all the existing operations. Semantically, an operation on a vector type is an application of the scalar operation to all vector components. In this way we would be able to support hardware-implemented mathematical functions for SIMD vectors.

• We should support estimated mathematical operations. Some of the instruction sets define estimated mathematical operations (like estimated square root) on vector types. These instructions are normally faster, but less precise. In order to support them we propose to extend the math library by introducing estimated variants for all the operations. Semantically, an estimated operation is mapped to the corresponding hardware instruction where one exists; otherwise it is an alias for the normal operation.
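The multiply-add principle above can be illustrated with the vector extension that GCC and Clang already provide. This is a minimal sketch, not part of the proposal: the type name `v4sf` and the helper `mul_add` are hypothetical, and whether the expression is turned into a single fused instruction depends on the compiler and the target.

```c
/* GCC/Clang vector extension: four floats in one 16-byte vector. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: multiply-add written with the two primitive
   operations only. A compiler targeting hardware with a single
   multiply-add instruction may pattern-match this expression. */
static inline v4sf mul_add(v4sf a, v4sf b, v4sf c)
{
    return a * b + c;   /* component-wise: a[i] * b[i] + c[i] */
}
```

No dedicated primitive is needed at the language level; the choice between one instruction and two is left to the code generator.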

Now we are going to describe extensions to the standard, identify the sections of ISO/IEC 9899:2011 [64] which would need altering and give a rationale regarding each operation.

3.6.2 Vector types

We introduce a notion of a new derived type [64, 6.2.5] which we are going to call a vector type. We do this by duplicating the passage about array types with some added restrictions:

A vector type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. The element type shall be complete whenever the vector type is specified. Element types must be evaluated either to integral type or a floating point type. Vector types are characterized by their element type and by the number of elements in the vector. The number of elements must be a compile time constant. The length of the vector type (the number of elements multiplied by the size of the element type) must be a power of two. A vector type is said to be derived from its element type, and if its element type is T, the number of elements is N, the vector type is sometimes called “vector of N T”.

A syntax for vector types has to be chosen and should be described in the new subsection of [64, 6.7.2].

vector-type:
    vector ( vector-size , type-specifier )

vector-size:
    constant-expression

A vector-size multiplied by the size of type-specifier must evaluate to 2^k, where k is an integer and k > 0.

An element type of the vector type shall be either integer or floating point.

The way we define a vector type is very similar to OpenCL; however, we allow arbitrarily long vectors, and we do not restrict element types to the base types: enums and user-defined types are also allowed, provided they evaluate to floating-point or integral types.
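As an illustration, the proposed `vector (N, T)` syntax can be approximated today with the `vector_size` attribute of GCC and Clang, which takes the vector length in bytes. The macro and helper names below are hypothetical, and the sketch assumes a 4-byte `int` and an 8-byte `double`.

```c
#include <stddef.h>

/* Hypothetical macro emulating the proposed vector (N, T) syntax on top of
   the GCC/Clang vector_size attribute, whose argument is a size in bytes. */
#define vector(n, type) __attribute__((vector_size((n) * sizeof(type)))) type

typedef vector(4, int)    v4si;   /* 4 * 4 = 16 bytes, assuming 4-byte int */
typedef vector(2, double) v2df;   /* 2 * 8 = 16 bytes, assuming 8-byte double */

/* The proposal requires the total length to be a power of two. */
#define IS_POWER_OF_TWO(x) ((x) != 0 && (((x) & ((x) - 1)) == 0))

_Static_assert(IS_POWER_OF_TWO(sizeof(v4si)), "vector length must be 2^k");
_Static_assert(IS_POWER_OF_TWO(sizeof(v2df)), "vector length must be 2^k");
```

The static assertions restate the power-of-two constraint as a compile-time check; an implementation of the proposal would enforce it in the type checker instead.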

3.6.3 Declaration and initialisation

Now we need to define how the values of the new type can be constructed. We are going to borrow the syntax described in the Compound literals section [64, 6.5.2.5], allowing one to initialise an object of a vector type as if it were an array of known size. For example:

    vector (4, int) x = {1, 2, 3, 4};
    vector (4, int) y = {1};

    f ((vector (4, int)) {5, 6, 7, 8});

According to [64, 6.5.2.5], [64, 6.7.9] this syntax should be supported automatically, as vector types are complete.
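The same initialisation forms can be tried with the existing GCC/Clang vector extension. The following is a sketch under that assumption; `v4si`, `sum4` and `init_demo` are hypothetical names, and the zero-filling of omitted components follows the usual rule for aggregate initialisers.

```c
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: sums the four components of a vector argument. */
static int sum4(v4si v)
{
    return v[0] + v[1] + v[2] + v[3];
}

/* Exercises the three initialisation forms discussed in the text. */
static int init_demo(void)
{
    v4si x = {1, 2, 3, 4};   /* all four components given */
    v4si y = {1};            /* remaining components are zero-initialised */

    /* A compound literal constructs an unnamed vector in place. */
    return sum4(x) + sum4(y) + sum4((v4si){5, 6, 7, 8});
}
```

Here `x` contributes 10, `y` contributes 1 and the compound literal contributes 26.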

If we want to allow the following syntax for vectors:

    vector (4, int) z = {[2] = 42};

we would have to adjust the Initialization section [64, 6.7.9] paragraph 6 like this:

The type of the entity to be initialized shall be an array of unknown size or a complete object type that is not a variable length array type. If a designator has the form

[ constant-expression ]

then the current object (defined below) shall have array or vector type and the expression shall be an integer constant expression. If the array is of unknown size, any nonnegative value is valid.

3.6.4 Vector subscript

Vectors can be subscripted as if the vector were an array with the same number of elements and base type. Adding this formally would require a new subsection in [64, 6.5.2]:

Vector subscripting

Constraints

The first expression shall have type “vector of type”, the second expression (inside square brackets) shall have integer type, and the result has type “type”.

Semantics

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of a vector object. The definition of the subscript operator [] is that E1[E2] designates the E2-th element of E1 (counting from zero).
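Subscripting of this kind is already available in the GCC/Clang vector extension, which makes it easy to sketch. The helper names below are hypothetical; treating a subscripted vector as an lvalue is the behaviour of the existing extension and is assumed here rather than taken from the proposed wording.

```c
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: returns the largest component of v. */
static int max_component(v4si v)
{
    int best = v[0];
    for (int i = 1; i < 4; i++)     /* the index may be a run-time value */
        if (v[i] > best)
            best = v[i];
    return best;
}

/* Hypothetical helper: returns a copy of v with component i set to value. */
static v4si with_component(v4si v, int i, int value)
{
    v[i] = value;                   /* assumed: v[i] is a modifiable lvalue */
    return v;
}
```

Because vectors are passed by value, `with_component` leaves the caller's vector unchanged.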

3.6.5 Arithmetic operations

The vector operations we are going to support can be divided into five groups:

• Arithmetic operations (+, -, *, /, %, unary +, unary -);

• Bitwise operations (&, |, ^, ~);

• Comparison operations (>, <, ==, <=, >=, !=);

• Vector shifts (>>, <<); and

• Vector shuffle (shuffle (v1, v2, mask)).

All the binary operations mentioned above, except shuffle, operate on vectors of floating-point types or of integral types of the same signedness, with the same number of elements. The semantics of a vector operation is the component-wise application of the corresponding scalar operation to all the elements of the vector.

Any vector comparison operation returns a mask (a vector of signed integers), where false is represented by the value 0 and true by the value -1 (all bits set), and the size of the mask element type is the same as the size of the operand’s element type.
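The all-bits-set encoding of true is what makes masks directly usable with the bitwise operations. As a sketch using the existing GCC/Clang vector extension (the helper `vmax` is a hypothetical name), a component-wise maximum can be built from one comparison and three bitwise operations:

```c
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: component-wise maximum built from a comparison mask. */
static v4si vmax(v4si a, v4si b)
{
    v4si mask = a > b;                 /* -1 where a[i] > b[i], else 0 */
    return (a & mask) | (b & ~mask);   /* pick a[i] or b[i] per component */
}
```

Had true been represented by 1, the selection would need an extra negation or multiplication per component.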

Vector shuffle is defined similarly to the OpenCL shuffle and shuffle2 functions:

Vector shuffling is available using functions shuffle (vec, mask) and shuffle (vec0, vec1, mask). Both functions construct a permutation of elements from one or two vectors and return a vector of the same type as the input vector(s). The mask is an integral vector with the same width (W) and element count (N) as the output vector.

The elements of the input vectors are numbered in memory ordering of vec0 beginning at 0 and vec1 beginning at N. The elements of the mask are considered modulo N in the single-operand case and modulo 2N in the two-operand case.

Consider the following example,

    typedef vector (4, int) v4si;

    v4si a = {1, 2, 3, 4};
    v4si b = {5, 6, 7, 8};
    v4si mask1 = {0, 1, 1, 3};
    v4si mask2 = {0, 4, 2, 5};
    v4si res;

    /* res is {1,2,2,4} */
    res = shuffle (a, mask1);

    /* res is {1,5,3,6} */
    res = shuffle (a, b, mask2);
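The indexing rule can be made explicit with a scalar reference implementation over plain arrays. This is only a sketch of the semantics described above, not of how a compiler would implement the operation; the names `shuffle1` and `shuffle2` are hypothetical, and mask elements are converted to unsigned before the reduction, which for a power-of-two N keeps only the low bits of a negative index.

```c
#define N 4u   /* element count of the example vectors */

/* Scalar reference for shuffle (vec, mask): out[i] = vec[mask[i] mod N]. */
static void shuffle1(const int vec[N], const int mask[N], int out[N])
{
    for (unsigned i = 0; i < N; i++)
        out[i] = vec[(unsigned)mask[i] % N];
}

/* Scalar reference for shuffle (vec0, vec1, mask): after reduction modulo
   2N, indices 0..N-1 select from vec0 and N..2N-1 select from vec1. */
static void shuffle2(const int vec0[N], const int vec1[N],
                     const int mask[N], int out[N])
{
    for (unsigned i = 0; i < N; i++) {
        unsigned idx = (unsigned)mask[i] % (2u * N);
        out[i] = idx < N ? vec0[idx] : vec1[idx - N];
    }
}
```

Applied to the vectors of the example, `shuffle1` with `mask1` yields {1, 2, 2, 4} and `shuffle2` with `mask2` yields {1, 5, 3, 6}.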

In order to include changes to +,- in the standard, the following fixes are needed in [64, 6.5.6]:

For scalar addition, either both operands shall have arithmetic type, or one operand shall be a pointer to a complete object type and the other shall have integer type. (Incrementing is equivalent to adding 1.) For vector addition, either both operands shall have arithmetic vector type with equal number of elements and signedness, or one operand shall have pointer vector type and the other shall have integer vector type, assuming that the number of elements in both operands is the same.

Fixes for all the other operations are similar.

3.6.6 Future extensions

For conversions we do not need to add anything new. Initialisation by means of compound literals (6.5.2.5) would automatically propagate standard conversions to the vector components and cast operation (6.5.4) would take care of reinterpreting bytes of data as a given type.

3.6.7 Things to consider

Here is a list of features that are not in the proposed standard but might be added later:

• Horizontal reduction operations. A number of architectures e.g. SSE2, SSE3 support reductions over vectors (often called horizontal operations). The way we propose to deal with them is by recognising patterns in the program. That gives a good potential for using a target architecture more effectively, as the patterns can be found in ordinary programs and replaced with corresponding vector operations.

• Printf/Scanf extensions. AltiVec proposes convenient extensions for printing out vectors and reading in vectors. Currently one would need to implement vector IO by hand.

• Mixed scalar/vector operations. OpenCL supports arithmetic operations where one operand is a vector and the other is a scalar, in which case the scalar operand is promoted to a vector. This appears to be pure syntactic sugar; whether there is a good use case for it remains an open question.

• Scatter/Gather operations. Some of the ISAs allow loads and stores from/to non-contiguous memory. Normally they are not very efficient, so it might be a good idea to discourage people from using them.

• Predication. The newest architectures, e.g. MIC, support masked execution of standard instructions. For example, we may add only those elements of two vectors that are marked as true in the corresponding mask. Such instructions are usually called “predicated instructions” and are used in the branches of vectorised conditionals.
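Two of the items above can be sketched with the operations already in the proposal, again using the existing GCC/Clang vector extension. The helper names are hypothetical. The first function is the kind of ordinary loop a compiler would have to recognise as a horizontal reduction; the second emulates a predicated addition with a comparison-style mask, and says nothing about how hardware predication would be exposed.

```c
typedef int v4si __attribute__((vector_size(16)));

/* Horizontal reduction written as an ordinary loop; the proposal relies on
   the compiler recognising such patterns instead of adding a primitive. */
static int horizontal_sum(v4si v)
{
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += v[i];
    return sum;
}

/* Predicated addition emulated with a mask: components where mask is -1
   receive a[i] + b[i], the others keep a[i]. */
static v4si masked_add(v4si a, v4si b, v4si mask)
{
    return a + (b & mask);
}
```

The emulation always computes the full-width addition, whereas a predicated instruction applies the operation only to the selected components.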

In document Data layout types : a type-based approach to automatic data layout transformations for improved SIMD vectorisation (Page 56-61)