Introduction to TMS320C55x Digital Signal Processor
2.6 TMS320C55x Instruction Set
We briefly introduced the TMS320C55x instructions and assembly syntax in Section 2.3.5. In this section, we introduce more instructions that are useful for DSP applications. In general, the C55x instruction set can be divided into four categories: arithmetic instructions, logic and bit manipulation instructions, load and store (move) instructions, and program flow control instructions.
2.6.1 Arithmetic Instructions
Instructions that perform addition (ADD), subtraction (SUB), and multiplication (MPY) are arithmetic instructions. Most arithmetic operations can be executed conditionally.
Combining these basic arithmetic operations produces another powerful subset of instructions, such as the multiply-accumulate (MAC) and multiply-subtract (MAS) instructions. The C55x also supports extended-precision arithmetic, such as add-with-carry, subtract-with-borrow, and signed/signed, signed/unsigned, and unsigned/unsigned arithmetic instructions. In the following example, the multiply instruction mpym multiplies the data values pointed to by AR1 and CDP and stores the product in accumulator AC0. After the multiplication, AR1 is incremented by one and CDP is decremented by one.
Example 2.10: Instruction
mpym *AR1+, *CDP-, AC0

                 Before instruction    After instruction
AC0              FF FFFF FF00          00 0000 0020
FRCT             0                     0
AR1              02E0                  02E1
CDP              0400                  03FF
Data memory
  0x2E0          0002                  0002
  0x400          0010                  0010
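To make the arithmetic in Example 2.10 concrete, the following C function is a minimal sketch of what mpym computes. It is not TI code. The name mpym_model and the 40-bit masking are our own modeling choices. We assume signed 16-bit operands and that setting FRCT adds a one-bit left shift for fractional (Q15) products. Pointer updates are not modeled.

```c
#include <stdint.h>

/* 40-bit accumulator, stored in the low bits of a uint64_t */
#define ACC_MASK 0xFFFFFFFFFFULL

/* Sketch of the mpym arithmetic (hypothetical helper, not TI code).
 * x, y : signed 16-bit operands (*AR1 and *CDP in Example 2.10)
 * frct : value of the FRCT bit; 1 adds a one-bit left shift (Q15 products)
 * Returns the 40-bit two's-complement accumulator contents. */
static uint64_t mpym_model(int16_t x, int16_t y, int frct)
{
    int64_t product = (int64_t)x * (int64_t)y;  /* exact 32-bit product */
    if (frct)
        product *= 2;                           /* fractional-mode shift */
    return (uint64_t)product & ACC_MASK;        /* keep bits 39:0 */
}
```

With the values of Example 2.10, mpym_model(0x0002, 0x0010, 0) returns 0x20, which matches AC0 after the instruction.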
In the next example, the macmr40 instruction uses AR1 and AR2 as data pointers and performs a multiply-accumulate operation. At the same time, the instruction also carries out the following operations:
1. The keyword 'r' produces a rounded result in the high portion of accumulator AC3. After rounding, the lower portion of the accumulator, AC3(15:0), is cleared.
TMS320C55X INSTRUCTION SET 63
2. 40-bit overflow detection is enabled by the keyword '40'. If overflow is detected, the result in accumulator AC3 is saturated to its 40-bit maximum (or, for a negative overflow, minimum) value.
3. The option 'T3 = *AR1+' loads the data pointed to by AR1 into the temporary register T3 for later use.
4. Finally, AR1 and AR2 are incremented by one to point to the next data locations in memory.
Example 2.11: Instruction
macmr40 T3 = *AR1+, *AR2+, AC3

                 Before instruction    After instruction
AC3              00 0000 0020          00 235B 0000
FRCT             1                     1
T3               FFF0                  3456
AR1              0200                  0201
AR2              0380                  0381
Data memory
  0x200          3456                  3456
  0x380          5678                  5678
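The steps of macmr40 listed above can be traced in C. This is a hedged sketch and macmr40_model is a hypothetical name. It assumes that FRCT = 1 doubles the product, that rounding adds 2^15 and then clears bits 15:0, and that saturation is enabled. Applying saturation after rounding is also our assumption.

```c
#include <stdint.h>

#define ACC40_MAX ((int64_t)0x7FFFFFFFFF)    /* largest 40-bit value */
#define ACC40_MIN (-(int64_t)0x8000000000)   /* most negative 40-bit value */

/* Sketch of macmr40 arithmetic (hypothetical helper, not TI code).
 * Assumptions: FRCT = 1 doubles the product, rounding adds 2^15 and
 * clears bits 15:0, saturation is enabled and applied after rounding. */
static int64_t macmr40_model(int64_t acc, int16_t x, int16_t y, int frct)
{
    int64_t product = (int64_t)x * (int64_t)y;
    if (frct)
        product *= 2;                  /* fractional mode */
    int64_t sum = acc + product;       /* multiply-accumulate */
    sum += 0x8000;                     /* 'r': add 2^15 ... */
    sum &= ~(int64_t)0xFFFF;           /* ... and clear bits 15:0 */
    if (sum > ACC40_MAX)               /* '40': saturate at 40 bits */
        sum = ACC40_MAX;
    else if (sum < ACC40_MIN)
        sum = ACC40_MIN;
    return sum;
}
```

When called with AC3 = 0x20, *AR1 = 0x3456, *AR2 = 0x5678, and FRCT = 1, the function returns 0x235B0000. This is the value shown for AC3 in Example 2.11.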
2.6.2 Logic and Bit Manipulation Instructions
Logic operation instructions such as AND, OR, NOT, and XOR (exclusive-OR) on data values are widely used in program decision-making and execution flow control. They also appear in many applications, such as error correction coding in data communications. For example, the instruction and #0xf, AC0 clears all bits in accumulator AC0 except the four least significant bits.
Example 2.12: Instruction and #0xf, AC0

                 Before instruction    After instruction
AC0              00 1234 5678          00 0000 0008
Bit manipulation instructions act on an individual bit or a pair of bits of a register or a data memory location. These instructions include bit clear, bit set, and bit test operations on a specified bit (or pair of bits). Bit manipulation instructions are often used together with logic operations to support decision-making processes. In the following example, the bit clear instruction clears the carry bit (bit 11) of status register ST0.
Example 2.13: Instruction bclr #11, ST0

                 Before instruction    After instruction
ST0              0800                  0000
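The logic and bit operations in Examples 2.12 and 2.13 map directly onto C bitwise operators. The following is a small sketch. The helper names are hypothetical, and treating the and constant as zero-extended is our assumption.

```c
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFULL   /* 40-bit accumulator */

/* and #k, ACx, assuming the 16-bit constant is zero-extended */
static uint64_t and_acc(uint64_t acc, uint16_t k)
{
    return acc & (uint64_t)k & ACC_MASK;
}

/* Bit clear, bit set, and bit test on a 16-bit register such as ST0 */
static uint16_t bit_clear(uint16_t reg, unsigned bit)
{
    return (uint16_t)(reg & ~(1u << bit));
}

static uint16_t bit_set(uint16_t reg, unsigned bit)
{
    return (uint16_t)(reg | (1u << bit));
}

static int bit_test(uint16_t reg, unsigned bit)
{
    return (reg >> bit) & 1u;
}
```

For example, and_acc(0x0012345678, 0xF) gives 0x8, and bit_clear(0x0800, 11) gives 0x0000. Both results match the tables above.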
2.6.3 Move Instruction
The move instruction copies data values from register to register, from memory to memory, from register to memory, or from memory to register. For example, to load a constant into the upper portion of accumulator AC1 and zero out its lower portion, we can use the instruction mov #k<<#16, AC1. The constant k is first shifted left by 16 bits and then loaded into the upper portion of the accumulator, AC1(31:16), while the lower portion, AC1(15:0), is zero filled. The 16-bit constant that follows the # can be any signed number.
Example 2.14: Instruction mov #5<<#16, AC1

                 Before instruction    After instruction
AC1              00 0011 0800          00 0005 0000
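The following C function sketches mov #k<<#16, AC1. It assumes that the 16-bit constant is sign-extended into the guard bits (39:32). The function name is hypothetical.

```c
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFULL   /* 40-bit accumulator */

/* Sketch of mov #k<<#16, ACx (hypothetical helper).
 * Assumes k is sign-extended into the guard bits (39:32). */
static uint64_t mov_k_shl16(int16_t k)
{
    int64_t value = (int64_t)k * 65536;   /* k << 16, defined for negative k */
    return (uint64_t)value & ACC_MASK;    /* bits 15:0 are zero */
}
```

mov_k_shl16(5) returns 0x0000050000, as in Example 2.14. A negative constant such as -1 gives 0xFFFFFF0000, with the sign carried into the upper bits.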
A more complicated instruction, shown in Example 2.15, completes several operations in one clock cycle. These operations are listed after the example.
Example 2.15: Instruction
mov uns(rnd(HI(saturate(AC0<<T2)))), *AR1+

                 Before instruction    After instruction
AC0              00 0FAB 8678          00 0FAB 8678
AR1              0x100                 0x101
T2               0x2                   0x2
Data memory
  0x100          1234                  3EAE
1. A copy of the unsigned data content in AC0 is shifted to the left by the number of bits specified in the temporary register T2. AC0 itself is not modified.
2. The upper portion (bits 31:16) of the shifted value is rounded.
3. The value may be saturated if the left shift or the rounding process causes it to overflow.
4. The final result after left shifting, rounding, and possible saturation is stored in the data memory location pointed to by AR1.
5. Pointer AR1 is automatically incremented by 1.
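The following C sketch traces the steps above and reproduces the value stored in Example 2.15. The function name is hypothetical. The sketch assumes saturation to the signed 32-bit range and handles only small non-negative shift counts. It does not model the uns() qualifier, so it illustrates the data flow rather than giving a bit-exact model.

```c
#include <stdint.h>

/* Sketch of mov rnd(HI(saturate(AC0 << T2))), *AR1+ (hypothetical helper).
 * Assumptions: 0 <= shift < 24 so the product fits in 64 bits (negative
 * T2 values are not modeled), saturation is to the signed 32-bit range,
 * and uns() is not modeled.
 * Returns the 16-bit word written to memory; ac0 itself is unchanged. */
static uint16_t store_hi_rnd_sat(int64_t ac0, unsigned shift)
{
    int64_t value = ac0 * ((int64_t)1 << shift);  /* step 1: left shift */
    value += 0x8000;                              /* step 2: round bits 31:16 */
    if (value > INT32_MAX)                        /* step 3: saturate */
        value = INT32_MAX;
    else if (value < INT32_MIN)
        value = INT32_MIN;
    return (uint16_t)((uint32_t)value >> 16);     /* step 4: high word */
}
```

store_hi_rnd_sat(0x0FAB8678, 2) returns 0x3EAE. The shift gives 0x3EAE19E0, and rounding adds 0x8000 before bits 31:16 are taken. Step 5, the AR1 increment, is left to the caller.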
2.6.4 Program Flow Control Instructions
The program flow control instructions are used to control the execution flow of the program, including branching (B), subroutine call (CALL), loop operation (RPTB), return to caller (RET), etc. All these instructions can be executed either conditionally or unconditionally. For example,
callcc my_routine, TC1
is a conditional instruction that calls the subroutine my_routine only if the test control bit TC1 of status register ST0 is set. Conditional branch (BCC) and conditional return (RETCC) can be used to control the program flow according to certain conditions.
Conditional execution comes in two forms: full conditional execution (xcc) and partial conditional execution (xccpart). In the following example, the conditional execution instruction tests the TC1 bit. If TC1 is set, the instruction mov *AR1+, AC0 is executed, and both AC0 and AR1 are updated. If the condition is false, neither AC0 nor AR1 is changed. The xcc instruction allows the conditional execution of one instruction or of two instructions executed in parallel. The label is used for readability, especially when two parallel instructions are used.
Example 2.16: Instruction
xcc label, TC1
mov *AR1+, AC0
label

                 Condition true (TC1 = 1)          Condition false (TC1 = 0)
                 Before          After             Before          After
AC0              00 0000 0000    00 0000 55AA      00 0000 0000    00 0000 0000
AR1              0x100           0x101             0x100           0x100
Data memory
  0x100          55AA            55AA              55AA            55AA

In addition to conditional execution, the C55x also provides partial conditional execution of an instruction. An example of partial conditional execution is given as follows:
Example 2.17: Instruction
xccpart label, TC1
mov *AR1+, AC0
label

                 Condition true (TC1 = 1)          Condition false (TC1 = 0)
                 Before          After             Before          After
AC0              00 0000 0000    00 0000 55AA      00 0000 0000    00 0000 0000
AR1              0x100           0x101             0x100           0x101
Data memory
  0x100          55AA            55AA              55AA            55AA

When the condition is true, both AR1 and AC0 are updated. However, if the condition is false, the execute phase of the pipeline is not carried out for the conditional instruction. The address pointer AR1 is updated in the read phase of the pipeline, before the execute phase. As a result, AR1 is updated whether or not the condition is true, while accumulator AC0, which would be written in the execute phase, remains unchanged. That is, the instruction is only partially executed.
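The difference between xcc and xccpart in Examples 2.16 and 2.17 can be captured with a toy C model. cpu_state, cond_mov, and run_example are hypothetical names. As the text explains, the model assumes that the AR1 post-increment happens before the execute phase and that the AC0 load happens in the execute phase.

```c
#include <stdint.h>

/* Toy machine state for Examples 2.16 and 2.17 (hypothetical names). */
typedef struct {
    int64_t  ac0;          /* accumulator AC0 */
    uint16_t ar1;          /* auxiliary register AR1 */
    uint16_t mem[0x200];   /* word-addressed data memory */
} cpu_state;

/* Models "mov *AR1+, AC0" under xcc (partial = 0) or xccpart (partial = 1).
 * Assumption from the text: AR1 is updated before the execute phase,
 * AC0 is written in the execute phase. */
static void cond_mov(cpu_state *s, int tc1, int partial)
{
    if (!tc1 && !partial)
        return;                          /* xcc, condition false: no change */
    uint16_t addr = s->ar1;
    s->ar1 = (uint16_t)(s->ar1 + 1);     /* pointer update (early phase) */
    if (tc1)
        s->ac0 = (int16_t)s->mem[addr];  /* execute phase (sign-extended) */
}

/* Runs one example with the initial values shown in the tables. */
static cpu_state run_example(int tc1, int partial)
{
    cpu_state s = {0};
    s.ar1 = 0x100;
    s.mem[0x100] = 0x55AA;
    cond_mov(&s, tc1, partial);
    return s;
}
```

run_example(0, 0) leaves AR1 at 0x100, as xcc does. run_example(0, 1) advances AR1 to 0x101 but leaves AC0 at zero, which reproduces the partial execution of xccpart.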
Many real-time DSP applications, such as filtering, require repeated execution of some instructions. These arithmetic operations may be located inside nested loops. If the inner loop contains only a few data processing instructions, loop control can account for a very high percentage of the execution time. The loop control instructions test and update the loop counter(s) and pointer(s) and branch back to the beginning of the loop. Together they impose a heavy overhead on the processor. To minimize the loop overhead, the C55x includes built-in hardware for zero-overhead loop operations.
The single-repeat instruction (RPT) repeats the following single-cycle instruction or two single-cycle instructions that are executed in parallel. For example,
rpt #N-1        ; Repeat next instruction N times
instruction_A
The number N-1 is loaded into the single-repeat counter (RPTC) by the RPT instruction. The following instruction_A is then executed N times.
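The counting rule for rpt can be checked with a small C model. For instruction_A we assume a hypothetical add #k, AC0. RPTC is modeled as a 16-bit down-counter, and rpt_add is our own name.

```c
#include <stdint.h>

/* Sketch of
 *     rpt #count
 *     add #k, AC0      ; hypothetical instruction_A
 * RPTC is modeled as a 16-bit down-counter (hypothetical helper). */
static int64_t rpt_add(int64_t ac0, uint16_t count, int16_t k)
{
    uint16_t rptc = count;     /* rpt loads RPTC with count (N-1) */
    for (;;) {
        ac0 += k;              /* one execution of instruction_A */
        if (rptc == 0)         /* counter exhausted: leave the repeat */
            break;
        rptc--;                /* otherwise decrement and repeat */
    }
    return ac0;
}
```

rpt_add(0, 9, 3) returns 30 because, with RPTC loaded with 9 (N-1 for N = 10), the add executes 10 times.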
The block-repeat instruction (RPTB) forms a loop that repeats a block of instructions.
It supports nested loops, with an inner loop placed inside an outer loop. Block-repeat loops use the block-repeat counters BRC0 and BRC1. For example,
    mov  #N-1, BRC0        ; Repeat outer loop N times
    mov  #M-1, BRC1        ; Repeat inner loop M times
    rptb outloop-1         ; Repeat outer loop up to outloop
    mpy  *AR1, *CDP, AC0
    mpy  *AR2, *CDP, AC1
    rptb inloop-1          ; Repeat inner loop up to inloop
    mac  *AR1, *CDP, AC0
    mac  *AR2, *CDP, AC1
inloop                     ; End of inner loop
    mov  AC0, *AR3         ; Save result in AC0
    mov  AC1, *AR4         ; Save result in AC1
outloop                    ; End of outer loop
The above example uses two repeat instructions to control a nested repetitive operation. The block-repeat structure
rptb label_name-1
(more instructions ...)
label_name
executes the block of instructions between the rptb instruction and the end label label_name. The size of a block-repeat loop is limited to 64 Kbytes of code. Because of the pipeline scheme, a block-repeat loop must take at least two cycles. The maximum number of times that a loop can be repeated is 65 536 (2^16), because the block-repeat counters are 16 bits wide.
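The iteration counts of the nested block-repeat example can be checked with a C model that mimics two 16-bit counters loaded with N-1 and M-1. The names are hypothetical. Restoring the inner count at the start of each outer pass is our assumption about how the inner loop restarts; the C55x keeps a saved copy of BRC1 in BRS1 for this purpose.

```c
#include <stdint.h>

/* How often each part of the nested loop body executes. */
typedef struct {
    uint32_t outer_passes;   /* mpy pair and result stores */
    uint32_t inner_passes;   /* mac pair */
} loop_counts;

/* Hypothetical model of the nested rptb example. BRC0 (outer) and
 * BRC1 (inner) are 16-bit counters holding iterations - 1. */
static loop_counts nested_rptb(uint16_t brc0_init, uint16_t brc1_init)
{
    loop_counts c = {0, 0};
    uint16_t brc0 = brc0_init;
    for (;;) {                          /* rptb outloop-1 */
        c.outer_passes++;
        uint16_t brc1 = brc1_init;      /* assumption: inner count restored */
        for (;;) {                      /* rptb inloop-1 */
            c.inner_passes++;
            if (brc1 == 0)
                break;
            brc1--;
        }
        if (brc0 == 0)
            break;
        brc0--;
    }
    return c;
}
```

With N = 3 and M = 4, nested_rptb(2, 3) reports 3 outer passes and 12 inner passes. Loading a counter with 0xFFFF gives the 65 536-iteration maximum mentioned above.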