In a study of arithmetic circuits we see that an integer adder could be made from 32 adders of individual bits. Each of these would give as output both the one-bit sum of the two inputs and the “carry” bit sent to the next adder. This reminds us that each bit adder actually has three inputs: the two input bits to be added together and the carry bit from the lower-order adder. The 0-bit adder, however, has no lower-order adder attached, so its carry input is free. When subtracting rather than adding, one can set that input wire to one to complete the final step of adding one to the 1’s complement.
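The chain of bit adders described above can be imitated in software. The following is only a sketch in C, not a description of any particular circuit; the function names full_adder, ripple_add and ripple_sub are our own, chosen for illustration.

```c
#include <stdint.h>

/* One-bit adder with three inputs: bits a and b, plus the carry from
   the lower-order adder. Returns the one-bit sum and stores the carry
   for the next adder in *carry_out. (Hypothetical helper name.) */
unsigned full_adder(unsigned a, unsigned b, unsigned carry_in,
                    unsigned *carry_out)
{
    unsigned sum = a ^ b ^ carry_in;
    *carry_out = (a & b) | (a & carry_in) | (b & carry_in);
    return sum;
}

/* Chain 32 one-bit adders. carry_in is the otherwise unused carry
   input of the 0-bit adder. The carry out of bit 31 is discarded. */
uint32_t ripple_add(uint32_t x, uint32_t y, unsigned carry_in)
{
    uint32_t result = 0;
    unsigned carry = carry_in & 1u;
    for (int i = 0; i < 32; i++) {
        unsigned a = (unsigned)((x >> i) & 1u);
        unsigned b = (unsigned)((y >> i) & 1u);
        unsigned s = full_adder(a, b, carry, &carry);
        result |= (uint32_t)s << i;
    }
    return result;
}

/* x - y computed as x + (1's complement of y) + 1, where the "+ 1"
   is supplied through the carry input of the 0-bit adder. */
uint32_t ripple_sub(uint32_t x, uint32_t y)
{
    return ripple_add(x, ~y, 1u);
}
```

With these definitions, ripple_sub(7, 5) yields 2, and ripple_sub(5, 7) yields 0xFFFFFFFE, which is the 2’s complement representation of −2.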
B.6
Bitwise Operations
In addition to arithmetic operations, almost all computers include some operations which act on the individual bits in bytes. The logical functions AND, OR, XOR and NOT used in C/C++/Java are one type of example. We will use the AND operation with a predetermined set of bits on (equal to one) to pick out the values of those certain bits from a byte or group of bytes. Thus, 0x000000FF, when used as one of the two 32-bit operands to AND, will give a result equal to exactly what is in the lowest order eight bits of the other operand. We call a pattern such as 0x000000FF a mask when used in this way.
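As a small illustration of masking, here is a sketch in C. The names LOW_BYTE_MASK, low_byte and apply_mask are assumptions made for this example only.

```c
#include <stdint.h>

/* Mask with only the lowest order eight bits on. */
#define LOW_BYTE_MASK 0x000000FFu

/* Pick out the lowest order eight bits of a 32-bit word. */
uint32_t low_byte(uint32_t word)
{
    return word & LOW_BYTE_MASK;
}

/* General form: keep only the bits of word that are on in mask.
   All other bits of the result are zero. */
uint32_t apply_mask(uint32_t word, uint32_t mask)
{
    return word & mask;
}
```

For instance, low_byte(0x12345678) gives 0x00000078, while apply_mask(0x12345678, 0x0000FF00) gives 0x00005600. In the second case the selected bits stay in their original positions.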
In a similar way, we may use the OR operation to turn on certain bits. For example, the mask 0x80000000 would turn on the highest order bit of the other 32-bit operand of OR, giving us a negative number no matter what the sign was originally. (Note that in 2’s complement notation it would not be the negative of the original number.)
Another group of bitwise operations consists of the shift operations. Again, almost all computers have such operations. There are two types of shifts: to the right and to the left. Let us consider the right shifts first (as applied to 32-bit operands). The srl (Shift Right Logical) operation usually takes as one of its operands the count of how far each bit is shifted to the right, while zeros are inserted on the left end. Bits that are shifted off the end to the right are lost (they are sometimes said to fall into the “bit bucket”). For example, a series of one-bit right shifts of 0x01010101 would give 0x00808080, then 0x00404040, then 0x00202020, then 0x00101010, then 0x00080808, etc. It is an exercise to contrast these right shifts with division by two.
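Both the OR mask and the logical right shift can be tried out in C. This is a sketch only; the names set_high_bit and srl are ours (srl simply echoes the mnemonic used in the text). It relies on the fact that in C the >> operator applied to an unsigned operand inserts zeros on the left.

```c
#include <stdint.h>

/* Mask with only the highest order bit on. */
#define HIGH_BIT_MASK 0x80000000u

/* Turn on the highest order bit, leaving all other bits unchanged. */
uint32_t set_high_bit(uint32_t word)
{
    return word | HIGH_BIT_MASK;
}

/* Logical right shift: zeros enter on the left, and bits shifted off
   the right end are lost. count must be in the range 0..31. */
uint32_t srl(uint32_t word, unsigned count)
{
    return word >> count;
}
```

Here set_high_bit(0x00000001) gives 0x80000001, which as a 2’s complement integer is −2147483647 and not −1. Likewise srl(0x01010101, 1) gives 0x00808080 and srl(0x01010101, 5) gives 0x00080808, matching the series above.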
A problem arises if we consider the result of starting with 0x80808080. Then the very first right shift would give us 0x40404040. That’s fine if we are only considering the bits but not if we look at the values as two’s complement integers! The first is negative while the second is positive! This shift is no longer the same as division by two as mentioned above. Because of this, there is another right shift operation, sra (Shift Right Arithmetic). There is no difference if the high-order bit is zero (a non-negative number), but when the high-order bit is one (a negative number), ones are inserted on the left instead of zeros. For example, the 0x80808080 would give 0xC0404040, then 0xE0202020, then 0xF0101010, then 0xF8080808, etc. It is an exercise to contrast these arithmetic right shifts with division by two.
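An arithmetic right shift can be sketched in C as well. Because the C standard leaves the result of >> on a negative signed value implementation-defined, this sketch builds the operation from unsigned operations only. The function name sra is ours, echoing the mnemonic in the text.

```c
#include <stdint.h>

/* Arithmetic right shift on a 32-bit pattern. count must be in the
   range 0..31. If the high-order bit is one, ones are inserted on
   the left; otherwise zeros are inserted, exactly as in srl. */
uint32_t sra(uint32_t word, unsigned count)
{
    uint32_t shifted = word >> count;      /* zeros enter on the left */
    if (word & UINT32_C(0x80000000)) {
        /* Turn on the high-order bits that the shift vacated. */
        shifted |= (uint32_t)~(UINT32_C(0xFFFFFFFF) >> count);
    }
    return shifted;
}
```

With this definition, sra(0x80808080, 1) gives 0xC0404040 and sra(0x80808080, 4) gives 0xF8080808, while sra(0x01010101, 1) gives 0x00808080, the same as the logical shift.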
For left shifts, there is no choice but to bring in zeros from the right, so there is no difference between sll (Shift Left Logical) and sla (Shift Left Arithmetic); a given machine may or may not provide two different mnemonics for this operation. It is an exercise to contrast this left shift with multiplication by two. Note, however, that there may be overflow just as with actual multiplication by two! Thus 0x40404040 would shift into 0x80808080 and change from positive to negative. In some machines such a change is noted in some way and can be checked for in software; in others, it is ignored.
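The sign change on a left shift can be checked in software. The following C sketch assumes a machine that does not note the change itself; the names sll and shift_left_changes_sign are ours, for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Left shift: zeros enter on the right, and bits shifted off the
   left end are lost. count must be in the range 0..31. */
uint32_t sll(uint32_t word, unsigned count)
{
    return word << count;
}

/* Returns true when a one-bit left shift changes the high-order
   (sign) bit of the pattern. */
bool shift_left_changes_sign(uint32_t word)
{
    uint32_t shifted = word << 1;
    return ((word ^ shifted) & UINT32_C(0x80000000)) != 0;
}
```

For example, sll(0x40404040, 1) gives 0x80808080, and shift_left_changes_sign(0x40404040) is true, whereas shift_left_changes_sign(0x00000003) is false.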
Projects
1. Prove that the two’s complement of the two’s complement is the original number (Law of Double Negation).
2. Explain why the sum of any positive 2’s complement integer and any negative 2’s complement integer gives the correct value (no overflow).
3. Justify all the comments regarding shift operations. In particular, compare them to multiplying or dividing by two.
4. There are also “rotate” operations which shift left or right but, instead of losing the bits which go off one side, have the bits appear on the other side. They are
C
Matrix Multiplication (R.F.I.)
In Chapter 14 we saw the basic elements of the VFPv2, the floating point subarchitecture of the ARMv6. In this Appendix, I (R.F.I.) will implement a floating point matrix multiply using the VFPv2.
Disclaimer: I advise you against using the code in this Appendix in commercial-grade projects unless you fully review it for both correctness and precision.
C.1
Matrix multiply
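Given two vectors $v$ and $w$ of rank $r$, where
$v = \langle v_0, v_1, \cdots, v_{r-1} \rangle$ and $w = \langle w_0, w_1, \cdots, w_{r-1} \rangle$, we define the dot product of $v$ by $w$ as the scalar
$v \bullet w = v_0 \times w_0 + v_1 \times w_1 + \cdots + v_{r-1} \times w_{r-1}$.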
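The definition of the dot product translates directly into a loop. This is a minimal sketch in C; the function name dot_product and its parameter names are assumptions for illustration, not code from the VFPv2 implementation.

```c
#include <stddef.h>

/* Dot product of two vectors v and w, each holding r elements:
   v[0]*w[0] + v[1]*w[1] + ... + v[r-1]*w[r-1]. */
float dot_product(const float *v, const float *w, size_t r)
{
    float sum = 0.0f;
    for (size_t i = 0; i < r; i++)
        sum += v[i] * w[i];
    return sum;
}
```

For v = <1, 2, 3> and w = <4, 5, 6> this gives 1×4 + 2×5 + 3×6 = 32.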
We can multiply a matrix A of n rows and m columns (n×m) by a matrix B of m rows and p columns (m×p). The result is a matrix of n rows and p columns (n×p). Matrix multiplication may seem complicated but actually it is not. Every element in the result matrix is just the dot product (defined in the paragraph above) of the corresponding row of the matrix A by the corresponding column of the matrix B (that is why there must be as many columns in A as there are rows in B). (See Figure C-1 following.) A straightforward implementation of the matrix multiplication in C is as follows.
float A[N][M]; // N rows of M columns each row
float B[M][P]; // M rows of P columns each row
// Result
float C[N][P]; // N rows of P columns each row

for (int i = 0; i < N; i++) // for each row of the result
{
  for (int j = 0; j < P; j++) // and for each column
  {
    C[i][j] = 0; // Initialize to zero
    // Now calculate the dot product of the row by the column
    for (int k = 0; k < M; k++)
      C[i][j] += A[i][k] * B[k][j];
  }
}
Figure C-1
In order to simplify the example, we will assume that both matrices A and B are square matrices of size N×N. This simplifies the algorithm just a bit.
float A[N][N];
float B[N][N];
// Result
float C[N][N];

for (int i = 0; i < N; i++)
{
  for (int j = 0; j < N; j++)
  {
    C[i][j] = 0;
    for (int k = 0; k < N; k++)
      C[i][j] += A[i][k] * B[k][j];
  }
}
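To try the square version on concrete numbers, the loops can be wrapped in a function. This is only a sketch: the function name matmul is hypothetical, and the value N = 2 is chosen just to keep the worked example small.

```c
#define N 2 /* assumed size, for illustration only */

/* C = A * B for square N x N matrices, using the same three nested
   loops as the code above. */
void matmul(float A[N][N], float B[N][N], float C[N][N])
{
    for (int i = 0; i < N; i++)
    {
        for (int j = 0; j < N; j++)
        {
            C[i][j] = 0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    }
}
```

With A = {{1, 2}, {3, 4}} and B = {{5, 6}, {7, 8}}, the result is C = {{19, 22}, {43, 50}}. For example, C[0][0] is the dot product of the first row of A by the first column of B: 1×5 + 2×7 = 19.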
Matrix multiplication is an important operation used in many areas. For instance, in computer graphics it is usually performed on 3×3 and 4×4 matrices representing 3D geometry. So we will try to make a reasonably fast version of it (we do not aim at getting the best one, though).