3.7 LLBMC’s Intermediate Logic Representation
3.7.5 Memory
Memory-related operations are considerably more complex than most other operations. This is partly because low-level memory operations require knowledge about the target architecture, including data layout constraints (e.g. alignment) and the architecture's native bitwidth:$^{29}$
Definition 3.32 (Pointer bitwidth). The function $\mathit{pointerwidth}_a$ stands for the pointer width on architecture $a$.
For example, for the x86 architecture, $\mathit{pointerwidth}_{x86} = 32$. This definition leads immediately to the following extension of the definition of a bitwidth:
Definition 3.33 (Bitwidth). The operator $|N|_a$ stands for the width of sort $N$ on architecture $a$.
• For all integer sorts $I$, $|I|_a = |I|$,
• for all pointer sorts $P$, $|P|_a = \mathit{pointerwidth}_a$,
• for all aggregate sorts $A$ and $S$, $|A|_a$ and $|S|_a$ are defined as the sum of the bitwidths of all elements.
Note that this does not yet take padding or alignment into account, so the bitwidth can be seen as the minimum number of bits required to store a value.
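Definition 3.33 can be read as a simple recursion over sorts. The following is a minimal sketch under stated assumptions: the classes `Int`, `Ptr`, `Array`, and `Struct` and the table `POINTER_WIDTH` are hypothetical names introduced only for illustration, and the 64-bit entry is an assumption that does not come from the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Int:
    bits: int  # integer sort, e.g. Int(32) for i32


@dataclass(frozen=True)
class Ptr:
    pointee: object  # the pointee sort does not influence the width


@dataclass(frozen=True)
class Array:
    elem: object  # element sort
    count: int    # number of elements


@dataclass(frozen=True)
class Struct:
    members: Tuple[object, ...]  # member sorts in declaration order


# Hypothetical architecture table; only the x86 value is taken from the text.
POINTER_WIDTH = {"x86": 32, "x86_64": 64}


def bitwidth(sort, arch):
    """Unpadded bitwidth |N|_a: no alignment or padding is considered."""
    if isinstance(sort, Int):
        return sort.bits
    if isinstance(sort, Ptr):
        return POINTER_WIDTH[arch]
    if isinstance(sort, Array):
        return sort.count * bitwidth(sort.elem, arch)
    if isinstance(sort, Struct):
        return sum(bitwidth(member, arch) for member in sort.members)
    raise TypeError(f"unknown sort: {sort!r}")
```

For a structure with an i8, an i32, and a pointer member, this yields 8 + 32 + 32 = 72 bits on x86, which is smaller than the padded size a compiler would typically allocate.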
Symbol : Signature                                        Interpretation
$\mathit{load}^b_V : M \times V^* \to V$                  Big-endian load
$\mathit{load}^l_V : M \times V^* \to V$                  Little-endian load
$\mathit{store}^b_V : M \times V^* \times V \to M$        Big-endian store
$\mathit{store}^l_V : M \times V^* \times V \to M$        Little-endian store
Table 3.22: Memory accessing ILR functions
ILR's memory accessing functions are listed in table 3.22. LLBMC treats endianness not as a property of the target architecture but as a property of each load or store operation. For this reason, load and store each come in two variants: $\mathit{load}^b$ and $\mathit{store}^b$ for big-endian memory accesses and $\mathit{load}^l$ and $\mathit{store}^l$ for little-endian memory accesses. This approach allows reasoning about big-endian, little-endian, mixed-endian$^{30}$, and bi-endian$^{31}$ architectures in a single formula, which is useful when comparing programs compiled for different architectures. We will write $\mathit{load}$ instead of $\mathit{load}^b$ or $\mathit{load}^l$ if something holds for either endianness, e.g. when referring to loading and storing of a sort with a bitwidth less than or equal to eight.
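To illustrate endianness as a property of the individual access rather than of the memory, the following sketch reads the same four bytes with both conventions. It relies on Python's built-in `int.from_bytes`; the function names `load_b` and `load_l` are chosen here for illustration and are not LLBMC's implementation.

```python
def load_b(memory, address, nbytes):
    """Big-endian load: the byte at the lowest address is most significant."""
    return int.from_bytes(memory[address:address + nbytes], "big")


def load_l(memory, address, nbytes):
    """Little-endian load: the byte at the lowest address is least significant."""
    return int.from_bytes(memory[address:address + nbytes], "little")


# One memory image, interpreted differently by each access.
memory = bytes([0x12, 0x34, 0x56, 0x78])

as_big = load_b(memory, 0, 4)     # 0x12345678
as_little = load_l(memory, 0, 4)  # 0x78563412

# For a single byte both variants agree, so plain "load" is unambiguous.
single_b = load_b(memory, 2, 1)
single_l = load_l(memory, 2, 1)
```

Because the choice is made per access, a single formula can contain both kinds of loads over the same memory.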
As already mentioned above, only loading and storing of LLVM's simple value types (integers and pointers) is supported. Aggregate sorts are only ever used for deriving pointer sorts and for pointer arithmetic in the gep function introduced below. While LLVM allows virtual registers containing integers of any bitwidth, a byte in memory always contains exactly 8 bits. This is relevant when accessing memory, e.g. because of endianness, and when doing pointer arithmetic, e.g. due to alignment constraints. Single-byte loading and storing is inspired by McCarthy's theory of arrays (see section 2.1.4):
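$$\forall m, p_1, p_2, v \quad p_1 = p_2 \to \mathit{load}_{i8}(\mathit{store}_{i8}(m, p_1, v), p_2) = v \tag{3.13a}$$
$$\forall m, p_1, p_2, v \quad p_1 \neq p_2 \to \mathit{load}_{i8}(\mathit{store}_{i8}(m, p_1, v), p_2) = \mathit{select}_{i8}(m, p_2) \tag{3.13b}$$
30 Mixed-endian architectures have different endianness for different bitwidths.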
Extensionality of arrays is defined as usual:
$$\forall m_1, m_2 \quad m_1 = m_2 \leftrightarrow \forall p \;\, \mathit{load}_{i8}(m_1, p) = \mathit{load}_{i8}(m_2, p) \tag{3.14}$$
Like LLVM itself, LLBMC allows loading and storing not only of single bytes but also of integers and pointers with arbitrary bitwidths. The semantics of multi-byte loading and storing is mapped to that of multiple single-byte loads and stores.
Note that in LLVM, a $\mathit{load}_I$ with $|I| \not\equiv 0 \pmod{8}$ is undefined if the memory at this location was not written using a store of the same type. Similarly, the values of the extra bits for a store with a bitwidth smaller than eight are unspecified. Because LLVM-IR programs generated from C programs do not store values with a bitwidth that is not a multiple of eight, we will spend the minimum amount of effort to handle these cases.
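The single-byte axioms (3.13a) and (3.13b) and extensionality (3.14) can be checked on a small concrete model. The sketch below assumes a finite memory of `SIZE` bytes represented as an immutable tuple, so that a store returns a new memory instead of modifying the old one. The finite size and all names are assumptions made for illustration; in the logic itself, memory is not restricted in this way.

```python
SIZE = 8  # assumed size of the toy address space, in bytes


def store_i8(m, p, v):
    """Return a new memory that equals m except for byte p, which holds v."""
    return m[:p] + (v & 0xFF,) + m[p + 1:]


def load_i8(m, p):
    """Read the byte stored at address p."""
    return m[p]


def check_read_over_write(m, values=(0x00, 0x01, 0x7F, 0xFF)):
    """Check both read-over-write cases for all address pairs in the toy memory."""
    for p1 in range(SIZE):
        for p2 in range(SIZE):
            for v in values:
                written = store_i8(m, p1, v)
                # Same address: the stored value is read back.
                # Different address: the old content is still visible.
                expected = v if p1 == p2 else load_i8(m, p2)
                if load_i8(written, p2) != expected:
                    return False
    return True


def extensionally_equal(m1, m2):
    """Two memories are equal iff they agree on every single-byte load."""
    return all(load_i8(m1, p) == load_i8(m2, p) for p in range(SIZE))
```

In this model, tuple equality and `extensionally_equal` coincide, which mirrors the role of the extensionality axiom.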
The semantics of constant-sized, multi-byte reads can be derived from single-byte reads by concatenating the result of a single-byte read and a second, complementary read:
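$$\forall m, p \quad |I| > 8 \to \mathit{load}^b_I(m, p) = \mathit{concat}_{i8,I}(\mathit{load}_{i8}(m, p), \mathit{load}^b_{I_1}(m, p + 1)) \tag{3.15}$$
$$\forall m, p \quad |I| > 8 \to \mathit{load}^l_I(m, p) = \mathit{concat}_{I,i8}(\mathit{load}^l_{I_1}(m, p + 1), \mathit{load}_{i8}(m, p)) \tag{3.16}$$
Loads smaller than a single byte are realized by truncating the result of loading a whole byte: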
$$\forall m, p \quad |I| < 8 \to \mathit{load}_I(m, p) = \mathit{trunc}_{i8,I}(\mathit{load}_{i8}(m, p)) \tag{3.17}$$
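The recursive structure of (3.15) to (3.17) can be sketched as follows. The sketch assumes that $I_1$ denotes the integer sort that is eight bits narrower than $I$, which the text above does not spell out, and it models concatenation by shifting the more significant part to the left. The function names are illustrative only.

```python
def load_i8(m, p):
    """Single-byte load from a memory given as a sequence of byte values."""
    return m[p]


def load_b(m, p, bits):
    """Big-endian load of a `bits`-wide integer, following the shape of (3.15)."""
    if bits < 8:
        return load_i8(m, p) & ((1 << bits) - 1)  # truncation as in (3.17)
    if bits == 8:
        return load_i8(m, p)
    rest = bits - 8  # assumed width of the complementary load
    # The byte at p becomes the most significant part of the result.
    return (load_i8(m, p) << rest) | load_b(m, p + 1, rest)


def load_l(m, p, bits):
    """Little-endian load of a `bits`-wide integer, following the shape of (3.16)."""
    if bits < 8:
        return load_i8(m, p) & ((1 << bits) - 1)  # truncation as in (3.17)
    if bits == 8:
        return load_i8(m, p)
    # The byte at p becomes the least significant part of the result.
    return (load_l(m, p + 1, bits - 8) << 8) | load_i8(m, p)
```

For widths that are multiples of eight, the two functions agree with a conventional big-endian and little-endian interpretation of the same bytes.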
Similarly, the effects of a constant-sized, multi-byte store can be split into a single-byte store and a second, complementary store:
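\begin{align}
\forall m, p, x \quad \mathit{store}^b_I(m, p, x) = \mathit{store}^b_{I_2}(&\mathit{store}_{i8}(m, p, \mathit{extract}_{I,|I|-8,|I|-1}(x)), \tag{3.18}\\
&p + 1, \mathit{extract}_{I,0,|I|-9}(x)) \tag{3.19}\\
\forall m, p, x \quad \mathit{store}^l_I(m, p, x) = \mathit{store}^l_{I_2}(&\mathit{store}_{i8}(m, p, \mathit{extract}_{I,0,7}(x)), \tag{3.20}\\
&p + 1, \mathit{extract}_{I,8,|I|-1}(x)) \tag{3.21}
\end{align}
Storing of integers with a bitwidth smaller than eight is handled using zero extension: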
$$\forall m, p, x \quad |I| < 8 \to \mathit{store}^b_I(m, p, x) = \mathit{store}^b_{i8}(m, p, \mathit{extu}_{I,i8}(x)) \tag{3.22}$$
Loading and storing of pointers can be mapped to the loading and storing of integers of the same size:
$$\forall m, p_1, p_2 \quad |I| = |P|_a \to \mathit{store}_P(m, p_1, p_2) = \mathit{store}_I(m, p_1, \mathit{ptrtoint}_{P,I}(p_2)) \tag{3.23a}$$
$$\forall m, p_1 \quad |I| = |P|_a \to \mathit{load}_P(m, p_1) = \mathit{inttoptr}_{I,P}(\mathit{load}_I(m, p_1)) \tag{3.23b}$$
Pointer arithmetic is heavily dependent on the architecture's data layout rules.
In order to handle pointer arithmetic via LLVM's getelementptr, a constant symbol is required for each native sort that represents the space in memory taken up by an array element of this sort. This is not necessarily equal to the size of the object itself, because it also has to take the alignment of the second element in the array into account. The numerical values of these constants can be retrieved from the LLVM libraries.
Definition 3.34 (Allocation bitwidth). The function $\mathit{allocwidth}^a_N$ indicates the number of bits required to allocate sufficient space for an object of sort $N$ on architecture $a$ so that another such object can be placed after the first object with appropriate alignment.
And similarly:
Definition 3.35 (Offset). The function $\mathit{offset}^a_{S,n}$ is the offset in bytes of the $n$th element in the structure sort $S$.
The $\mathit{offset}^a_{S,n}$ function is the counterpart to GNU CC's macro offsetof(s, f), which returns the offset of the field named f in the structure s.
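The difference between the unpadded bitwidth and the allocation bitwidth can be observed with Python's `ctypes` module, which exposes the data layout of the host platform's C ABI. This is only an analogy: LLBMC obtains these constants from the LLVM libraries for the target architecture, not from `ctypes`, and the helper names `allocwidth` and `offset` are chosen here for illustration.

```python
import ctypes


class Sample(ctypes.Structure):
    # A one-byte member followed by a four-byte member; most ABIs insert
    # padding between the two so that `value` is suitably aligned.
    _fields_ = [("tag", ctypes.c_uint8), ("value", ctypes.c_uint32)]


def allocwidth(ctype):
    """Bits occupied by one array element of this type, including padding."""
    return 8 * ctypes.sizeof(ctype)


def offset(struct_type, n):
    """Offset in bytes of the n-th member of a ctypes structure type."""
    name = struct_type._fields_[n][0]
    return getattr(struct_type, name).offset


unpadded_bits = 8 + 32                 # sum of the member bitwidths
padded_bits = allocwidth(Sample)       # platform dependent, at least 40
value_offset = offset(Sample, 1)       # platform dependent
```

On common platforms the padded width is 64 bits and the second member sits at byte offset 4, but both values depend on the host ABI and should not be relied upon.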
Symbol : Signature                                                              Interpretation
$\mathit{add}_{P,I} : P \times I \to P$                                         Pointer addition
$\mathit{sub}_{P,I} : P \times I \to P$                                         Pointer subtraction
$\mathit{gep}^a_{P_1,I_1,\ldots,I_n,P_2} : P_1 \times I^n \to P_2$              Pointer arithmetic
Table 3.23: Pointer arithmetic ILR functions
The functions $\mathit{add}_{P,I}$ and $\mathit{sub}_{P,I}$ are convenience functions for sort-correct pointer arithmetic operations. They convert a pointer to an integer, perform integer arithmetic, and convert the resulting integer back to the initial pointer sort.
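$$\forall p, x \quad \mathit{add}_{P,I}(p, x) = \mathit{inttoptr}_{I,P}(\mathit{add}_I(\mathit{ptrtoint}_{P,I}(p), x)) \tag{3.24a}$$
$$\forall p, x \quad \mathit{sub}_{P,I}(p, x) = \mathit{inttoptr}_{I,P}(\mathit{sub}_I(\mathit{ptrtoint}_{P,I}(p), x)) \tag{3.24b}$$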
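The round trip through integers in (3.24a) and (3.24b) can be sketched with pointers modelled as pairs of a sort name and an address. The pair representation, the 32-bit pointer width, and the wrap-around behaviour of the integer operations are assumptions of this sketch.

```python
POINTER_BITS = 32                 # assumed pointer width, as for x86 above
MASK = (1 << POINTER_BITS) - 1    # keeps values within POINTER_BITS bits


def ptrtoint(pointer):
    """Drop the sort and keep the address as an integer."""
    _sort, address = pointer
    return address


def inttoptr(value, sort):
    """Attach a pointer sort to an integer address."""
    return (sort, value & MASK)


def add_int(a, b):
    return (a + b) & MASK  # fixed-width addition with wrap-around


def sub_int(a, b):
    return (a - b) & MASK  # fixed-width subtraction with wrap-around


def add_ptr(pointer, x):
    """Pointer addition in the shape of (3.24a); the pointer sort is preserved."""
    return inttoptr(add_int(ptrtoint(pointer), x), pointer[0])


def sub_ptr(pointer, x):
    """Pointer subtraction in the shape of (3.24b); the pointer sort is preserved."""
    return inttoptr(sub_int(ptrtoint(pointer), x), pointer[0])
```

Note that these functions add raw integers; scaling by element size is left to gep.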
gep is easily the most complex function in all of ILR. We will reason about gep as a variadic function, but in reality, for any program p, a largest number n can be found so that the variadic gep can be replaced by a sufficiently large series of gep functions with arities 1 to n.
The gep function takes as arguments a base pointer and a sequence of indices. A gep with an empty list of indices is a no-op and returns the base pointer itself. For a gep with at least one index argument, the first index indexes the pointer itself. A gep with two or more index arguments is only valid if the base pointer points to an aggregate sort; the second index operand then indexes this array or structure sort.
Validity is extended analogously to gep functions with more index arguments.
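$$\mathit{gep}^a_{P_1,P_1}(p) = p \tag{3.25a}$$
$$\mathit{gep}^a_{N*,I,P_2}(p, i) = \mathit{add}_{N*,I}(p, \mathit{mul}_I(i, (\mathit{allocwidth}^a_N)_I)) \tag{3.25b}$$
$$\mathit{gep}^a_{A*,I,I,N*}(p, i_1, i_2) = \mathit{add}_{N*,I}(\mathit{bitcast}_{A*,N*}(\mathit{add}_{A*,I}(p, \mathit{mul}_I(i_1, (\mathit{allocwidth}^a_A)_I))), \mathit{mul}_I(i_2, (\mathit{allocwidth}^a_N)_I)) \tag{3.25c}$$
$$\mathit{gep}^a_{S*,I,I,N*}(p, i_1, i_2) = \mathit{add}_{N*,I}(\mathit{bitcast}_{S*,N*}(\mathit{add}_{S*,I}(p, \mathit{mul}_I(i_1, (\mathit{allocwidth}^a_S)_I))), (\mathit{offset}^a_{S,i_2})_I) \tag{3.25d}$$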
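The cases (3.25b) to (3.25d) can be made concrete with a small layout model. The sketch below makes several assumptions that do not come from the text: sorts are encoded as tuples, integers are aligned to their own size, structures are padded in the style of common C ABIs, and allocation sizes are expressed in bytes so that they can be added directly to byte addresses. LLBMC takes the real constants from the LLVM libraries.

```python
POINTER_BITS = 32               # assumed pointer width
MASK = (1 << POINTER_BITS) - 1

I8 = ("int", 8)
I32 = ("int", 32)


def align_of(sort):
    """Assumed alignment in bytes: natural alignment for integers."""
    if sort[0] == "int":
        return sort[1] // 8
    if sort[0] == "array":
        return align_of(sort[1])
    return max(align_of(member) for member in sort[1])  # struct


def round_up(n, alignment):
    return (n + alignment - 1) // alignment * alignment


def struct_layout(struct):
    """Return the member offsets and the padded size of a struct, in bytes."""
    member_offsets, position = [], 0
    for member in struct[1]:
        position = round_up(position, align_of(member))
        member_offsets.append(position)
        position += alloc_bytes(member)
    return member_offsets, round_up(position, align_of(struct))


def alloc_bytes(sort):
    """Space taken by one array element of this sort, in bytes."""
    if sort[0] == "int":
        return sort[1] // 8
    if sort[0] == "array":
        return sort[2] * alloc_bytes(sort[1])
    return struct_layout(sort)[1]


def gep_single(p, i, sort):
    """One index, as in (3.25b): step over i objects of the pointee sort."""
    return (p + i * alloc_bytes(sort)) & MASK


def gep_array(p, i1, i2, array):
    """Two indices into an array sort, as in (3.25c)."""
    base = gep_single(p, i1, array)
    return (base + i2 * alloc_bytes(array[1])) & MASK


def gep_struct(p, i1, i2, struct):
    """Two indices into a structure sort, as in (3.25d)."""
    base = gep_single(p, i1, struct)
    return (base + struct_layout(struct)[0][i2]) & MASK
```

With `("struct", [I8, I32])` the assumed layout places the second member at offset 4 and pads the structure to 8 bytes, so `gep_struct(1000, 2, 1, ...)` skips two whole structures and then four more bytes.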
Just like a large getelementptr is split into multiple smaller ones in section 3.4, this can also be done with its ILR counterpart gep:
$$\forall p, i_1, \ldots, i_n \quad \mathit{gep}^a_{P,I_1,I_2,I_3,\ldots,I_n,P_2}(p, i_1, i_2, i_3, \ldots, i_n) = \mathit{gep}^a_{P_1,I,I_3,\ldots,I_n,P_2}(\mathit{gep}^a_{P,I_1,I_2,P_1}(p, i_1, i_2), (0)_I, i_3, \ldots, i_n) \tag{3.26}$$
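The splitting rule (3.26) can be tried out on a variadic address computation. The sketch below uses Python's `ctypes` types as a stand-in for ILR sorts and therefore reflects the host platform's data layout rather than LLVM's. The function `gep` and the example types are hypothetical and serve only to illustrate the equation.

```python
import ctypes


class Inner(ctypes.Structure):
    _fields_ = [("tag", ctypes.c_uint8), ("values", ctypes.c_uint32 * 4)]


Outer = Inner * 3  # array of three structures


def gep(address, ctype, indices):
    """Variadic address computation; returns the new address and pointee type."""
    if not indices:
        return address, ctype  # empty index list: no-op
    # The first index steps over whole objects of the pointee type.
    address += indices[0] * ctypes.sizeof(ctype)
    # Every further index descends into the current aggregate type.
    for index in indices[1:]:
        if issubclass(ctype, ctypes.Array):
            ctype = ctype._type_
            address += index * ctypes.sizeof(ctype)
        elif issubclass(ctype, ctypes.Structure):
            name, member_type = ctype._fields_[index][:2]
            address += getattr(ctype, name).offset
            ctype = member_type
        else:
            raise TypeError("cannot index into a non-aggregate type")
    return address, ctype


# Left-hand side of (3.26): one gep with four indices.
whole = gep(1000, Outer, [0, 2, 1, 3])

# Right-hand side: a two-index gep followed by a gep whose first index is 0.
middle_address, middle_type = gep(1000, Outer, [0, 2])
split = gep(middle_address, middle_type, [0, 1, 3])
```

Both computations reach the same address and the same element type, because the leading zero index of the second gep contributes no offset.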