TMS320C55x Instruction Set - Introduction to TMS320C55x Digital Signal Processor


2.6 TMS320C55x Instruction Set

We briefly introduced the TMS320C55x instructions and assembly syntax in Section 2.3.5. In this section, we will introduce more useful instructions for DSP applications. In general, we can divide the C55x instruction set into four categories: arithmetic instructions, logic and bit manipulation instructions, load and store (move) instructions, and program flow control instructions.

2.6.1 Arithmetic Instructions

Instructions used to perform addition (ADD), subtraction (SUB), and multiplication (MPY) are arithmetic instructions. Most arithmetic operations can be executed conditionally.

The combination of these basic arithmetic operations produces another powerful subset of instructions, such as the multiply-accumulation (MAC) and multiply-subtraction (MAS) instructions. The C55x also supports extended-precision arithmetic such as add-with-carry, subtract-with-borrow, signed/signed, signed/unsigned, and unsigned/unsigned arithmetic instructions. In the following example, the multiplication instruction, mpym, multiplies the data pointed to by AR1 and CDP, and the product is stored in the accumulator AC0. After the multiplication, both pointers are updated: AR1 is incremented and CDP is decremented.

Example 2.10: Instruction

    mpym *AR1+, *CDP-, AC0

                   Before instruction    After instruction
    AC0            FF FFFF FF00          00 0000 0020
    FRCT           0                     0
    AR1            02E0                  02E1
    CDP            0400                  03FF
    Data memory
    0x2E0          0002                  0002
    0x400          0010                  0010
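The arithmetic of Example 2.10 can be checked with a small behavioral model. The Python sketch below is not C55x code: the function name mpym_model and the dictionary used as data memory are illustrative assumptions, and the post-increment of AR1 and post-decrement of CDP are inferred from the before/after register values in the example.

```python
MASK40 = (1 << 40) - 1  # accumulators are modeled as 40-bit values


def s16(word):
    """Interpret a 16-bit memory word as a signed integer."""
    return word - 0x10000 if word & 0x8000 else word


def mpym_model(mem, ar1, cdp, frct=0):
    """Hypothetical model of a memory-by-memory multiply into AC0.

    Returns (AC0, new AR1, new CDP). The pointer updates (+1, -1) are an
    assumption taken from the before/after values of Example 2.10.
    """
    product = s16(mem[ar1]) * s16(mem[cdp])
    if frct:                      # fractional mode shifts the product left by one
        product <<= 1
    return product & MASK40, (ar1 + 1) & 0xFFFF, (cdp - 1) & 0xFFFF


mem = {0x2E0: 0x0002, 0x400: 0x0010}
ac0, ar1, cdp = mpym_model(mem, 0x2E0, 0x400, frct=0)
print(f"AC0={ac0:010X} AR1={ar1:04X} CDP={cdp:04X}")
```

The previous content of AC0 does not appear in the model because a multiply overwrites the accumulator rather than adding to it.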

In the next example, the macmr40 instruction uses AR1 and AR2 as data pointers and performs multiplication-accumulation. At the same time, the instruction also carries out the following operations:

1. The key word `r' produces a rounded result in the high portion of the accumulator AC3. After rounding, the lower portion of AC3(15:0) is cleared.


2. 40-bit overflow detection is enabled by the key word `40'. If overflow is detected, the result in accumulator AC3 will be saturated to its 40-bit maximum value.

3. The option `T3 = *AR1+' loads the data pointed at by AR1 into the temporary register T3 for later use.

4. Finally, AR1 and AR2 are incremented by one to point to the next data location in memory space.

Example 2.11: Instruction

    macmr40 T3 = *AR1+, *AR2+, AC3

                   Before instruction    After instruction
    AC3            00 0000 0020          00 235B 0000
    FRCT           1                     1
    T3             FFF0                  3456
    AR1            0200                  0201
    AR2            0380                  0381
    Data memory
    0x200          3456                  3456
    0x380          5678                  5678
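The result in Example 2.11 can be reproduced step by step. The following Python sketch is a simplified model, not a description of the hardware: the helper names are hypothetical, rounding is assumed to add 0x8000 before the low word is cleared, and the order of rounding, saturation, and clearing is an assumption made for illustration.

```python
MASK40 = (1 << 40) - 1
MAX40 = (1 << 39) - 1      # largest signed 40-bit value
MIN40 = -(1 << 39)         # smallest signed 40-bit value


def s16(word):
    """Interpret a 16-bit word as a signed integer."""
    return word - 0x10000 if word & 0x8000 else word


def s40(value):
    """Interpret a 40-bit accumulator value as a signed integer."""
    return value - (1 << 40) if value & (1 << 39) else value


def macmr40_model(ac, x, y, frct):
    """Hypothetical model: multiply, accumulate, round, saturate to 40 bits."""
    product = s16(x) * s16(y)
    if frct:                                # FRCT = 1 shifts the product left by one
        product <<= 1
    total = s40(ac) + product + 0x8000      # 'r': round (assumed add-0x8000 rule)
    total = max(MIN40, min(MAX40, total))   # '40': saturate on 40-bit overflow
    total &= ~0xFFFF                        # clear the low word after rounding
    return total & MASK40


mem = {0x200: 0x3456, 0x380: 0x5678}
state = {"AC3": 0x0000000020, "FRCT": 1, "T3": 0xFFF0, "AR1": 0x200, "AR2": 0x380}

x, y = mem[state["AR1"]], mem[state["AR2"]]
state["AC3"] = macmr40_model(state["AC3"], x, y, state["FRCT"])
state["T3"] = x            # T3 receives the operand read through AR1
state["AR1"] += 1          # both pointers move to the next data location
state["AR2"] += 1
print({k: hex(v) for k, v in state.items()})
```

With FRCT set, the product 0x3456 x 0x5678 is doubled to 0x235AD8A0. Adding the old accumulator value 0x20 and the rounding constant gives 0x235B58C0, and clearing the low word leaves 0x235B0000, as shown in the example.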

2.6.2 Logic and Bit Manipulation Instructions

Logic operation instructions such as AND, OR, NOT, and XOR (exclusive-OR) on data values are widely used in program decision-making and execution flow control. They are also found in many applications such as error correction coding in data communications. For example, the instruction and #0xf, AC0 clears all bits in the accumulator AC0 except the four least significant bits.

Example 2.12: Instruction

    and #0xf, AC0

                   Before instruction    After instruction
    AC0            00 1234 5678          00 0000 0008

The bit manipulation instructions act on an individual bit or a pair of bits of a register or data memory. These instructions consist of bit clear, bit set, and bit test on a specified bit (or a pair of bits). The bit manipulation instructions are often used together with logic operations to support decision-making processes. In the following example, the bit clear instruction clears the carry bit (bit 11) of the status register ST0.

Example 2.13: Instruction

    bclr #11, ST0

                   Before instruction    After instruction
    ST0            0800                  0000
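The masking and bit operations in Examples 2.12 and 2.13 correspond to ordinary bitwise arithmetic. The Python sketch below uses hypothetical helper names (and_imm, bit_clear, bit_set, bit_test) and treats the status register as a plain 16-bit value; it only illustrates the arithmetic and does not model any side effects on status flags.

```python
def and_imm(acc, mask):
    """AND an accumulator value with a constant mask."""
    return acc & mask


def bit_clear(reg, bit):
    """Clear one bit of a 16-bit register value."""
    return reg & ~(1 << bit) & 0xFFFF


def bit_set(reg, bit):
    """Set one bit of a 16-bit register value."""
    return (reg | (1 << bit)) & 0xFFFF


def bit_test(reg, bit):
    """Return 1 if the given bit is set, otherwise 0."""
    return (reg >> bit) & 1


ac0 = and_imm(0x0012345678, 0xF)   # keeps only the four least significant bits
st0 = bit_clear(0x0800, 11)        # bit 11 is the carry bit in Example 2.13
print(f"AC0={ac0:010X} ST0={st0:04X}")
```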

2.6.3 Move Instruction

The move instruction is used to copy data values between registers, between memory locations, from register to memory, or from memory to register. For example, to initialize the upper portion of the accumulator, AC1(31:16), with a constant and zero out the lower portion, AC1(15:0), we can use the instruction mov #k<<16, AC1. The constant k is first shifted left by 16 bits and then loaded into AC1(31:16), while AC1(15:0) is zero filled. The 16-bit constant that follows the # can be any signed number.

Example 2.14: Instruction

    mov #5<<16, AC1

                   Before instruction    After instruction
    AC1            00 0011 0800          00 0005 0000

A more complicated instruction completes the following several operations in one clock cycle:

Example 2.15: Instruction

    mov uns(rnd(HI(saturate(AC0<<T2)))), *AR1+

                   Before instruction    After instruction
    AC0            00 0FAB 8678          00 0FAB 8678
    AR1            0x100                 0x101
    T2             0x2                   0x2
    Data memory
    0x100          1234                  3EAE

1. The unsigned data content in AC0 is shifted to the left according to the content in the temporary register T2.

2. The upper portion of the AC0(31:16) is rounded.

3. The data value in AC0 may be saturated if the left-shift or the rounding process causes the result in AC0 to overflow.


4. The final result after left shifting, rounding, and possible saturation is stored into the data memory location pointed at by AR1.

5. Pointer AR1 is automatically incremented by 1.
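The five steps above can be traced numerically for the values of Example 2.15. The Python sketch below is an illustrative model only: the function name is hypothetical, the accumulator is treated as an unsigned value, rounding is assumed to add 0x8000, and the unsigned saturation limit of 0xFFFFFFFF is an assumption rather than a statement about the hardware.

```python
UNSIGNED_MAX32 = 0xFFFFFFFF   # assumed saturation limit for unsigned results


def shift_round_store_model(ac0, t2, mem, ar1):
    """Hypothetical model of: shift AC0 left by T2, saturate, round, store HI.

    Returns the updated pointer; AC0 itself is left unchanged, matching the
    before/after values of the example.
    """
    shifted = ac0 << t2                              # step 1: shift left by T2
    rounded = min(shifted, UNSIGNED_MAX32) + 0x8000  # step 2: round the high word
    rounded = min(rounded, UNSIGNED_MAX32)           # step 3: saturate on overflow
    mem[ar1] = (rounded >> 16) & 0xFFFF              # step 4: store bits 31:16
    return ar1 + 1                                   # step 5: post-increment AR1


mem = {0x100: 0x1234}
ar1 = shift_round_store_model(0x000FAB8678, 0x2, mem, 0x100)
print(f"mem[0x100]={mem[0x100]:04X} AR1={ar1:#x}")
```

Shifting 0x0FAB8678 left by two bits gives 0x3EAE19E0. Rounding adds 0x8000 to produce 0x3EAE99E0, so the high word written to memory is 0x3EAE.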

2.6.4 Program Flow Control Instructions

The program flow control instructions are used to control the execution flow of the program, including branching (B), subroutine call (CALL), loop operation (RPTB), return to caller (RET), etc. All these instructions can be either conditionally or unconditionally executed. For example,

callcc my_routine, TC1

is the conditional instruction that will call the subroutine my_routine only if the test control bit TC1 of the status register ST0 is set. Conditional branch (BCC) and conditional return (RETCC) can be used to control the program flow according to certain conditions.

The conditional execution instruction, xcc, supports either conditional execution or partial conditional execution. In the following example, the conditional execution instruction tests the TC1 bit. If TC1 is set, the instruction mov *AR1+, AC0 will be executed, and both AC0 and AR1 are updated. If the condition is false, AC0 and AR1 will not be changed. The xcc instruction allows for the conditional execution of one instruction or two parallel instructions. The label is used for readability, especially when two parallel instructions are used.

Example 2.16: Instruction

    xcc label, TC1
    mov *AR1+, AC0
    label

                   TC1 = 1                         TC1 = 0
                   Before         After            Before         After
    AC0            00 0000 0000   00 0000 55AA     00 0000 0000   00 0000 0000
    AR1            0x100          0x101            0x100          0x100
    Data memory
    0x100          55AA           55AA             55AA           55AA

In addition to conditional execution, the C55x also provides the capability of partially conditional execution of an instruction. An example of partial conditional execution is given as follows:

Example 2.17: Instruction

    xccpart label, TC1
    mov *AR1+, AC0
    label

                   TC1 = 1                         TC1 = 0
                   Before         After            Before         After
    AC0            00 0000 0000   00 0000 55AA     00 0000 0000   00 0000 0000
    AR1            0x100          0x101            0x100          0x101
    Data memory
    0x100          55AA           55AA             55AA           55AA

When the condition is true, both AR1 and AC0 will be updated. However, if the condition is false, the execution phase of the pipeline will not be carried out. Since the first operand (the address pointer AR1) is updated in the read phase of the pipeline, AR1 will be updated whether or not the condition is true, while the accumulator AC0 remains unchanged because the execution phase is skipped. That is, the instruction is only partially executed.
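The difference between Examples 2.16 and 2.17 can be summarized in a few lines of code. The Python sketch below is a behavioral model under stated assumptions: the function name conditional_load and the dictionary-based register state are hypothetical, and the pipeline is reduced to two steps, a pointer update and an accumulator write.

```python
def conditional_load(state, mem, partial):
    """Hypothetical model of a conditionally executed load through AR1.

    partial=False models full conditional execution: nothing changes when
    TC1 is 0. partial=True models partial conditional execution: the pointer
    update still happens, but the accumulator write is skipped when TC1 is 0.
    """
    new_state = dict(state)
    address = state["AR1"]
    if state["TC1"] or partial:
        new_state["AR1"] = address + 1      # pointer update (early pipeline phase)
    if state["TC1"]:
        new_state["AC0"] = mem[address]     # accumulator write (execute phase)
    return new_state


mem = {0x100: 0x55AA}
for partial in (False, True):
    for tc1 in (1, 0):
        before = {"TC1": tc1, "AC0": 0, "AR1": 0x100}
        after = conditional_load(before, mem, partial)
        kind = "partial" if partial else "full"
        print(f"{kind:7s} TC1={tc1} AC0={after['AC0']:#06x} AR1={after['AR1']:#x}")
```

The four printed cases correspond to the four before/after columns of the two examples. Only the case of partial execution with TC1 = 0 ends with AR1 advanced and AC0 unchanged.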

Many real-time DSP applications, such as filtering, require repeated execution of some instructions. These arithmetic operations may be located inside nested loops. If the number of data processing instructions in the inner loop is small, the percentage of overhead for loop control may be very high. The loop control instructions, which test and update the loop counter(s) and pointer(s) and branch back to the beginning of the loop, impose a heavy overhead on the processor. To minimize the loop overhead, the C55x includes built-in hardware for zero-overhead loop operations.

The single-repeat instruction (RPT) repeats the following single-cycle instruction or two single-cycle instructions that are executed in parallel. For example,

    rpt #N-1            ; Repeat next instruction N times
    instruction_A

The number N-1 is loaded into the single-repeat counter (RPTC) by the RPT instruction. The following instruction_A will be executed N times.
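The relation between the counter value and the number of executions can be illustrated with a short model. The Python sketch below is an assumption-based illustration, not hardware behavior in detail: the function name rpt_model is hypothetical, and the counter is simply decremented after each execution until it reaches zero.

```python
def rpt_model(count_minus_one, instruction):
    """Hypothetical model of a single-repeat loop.

    The counter starts at N-1; the instruction runs once for every counter
    value from N-1 down to 0, giving N executions in total.
    """
    rptc = count_minus_one
    executions = 0
    while True:
        instruction()
        executions += 1
        if rptc == 0:
            break
        rptc -= 1
    return executions


samples = []
n = 5
times = rpt_model(n - 1, lambda: samples.append(len(samples)))
print(times, samples)
```

Loading N-1 rather than N means that a counter value of 0 still executes the instruction once.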

The block-repeat instruction (RPTB) forms a loop that repeats a block of instructions. It supports a nested loop with an inner loop being placed inside an outer loop. The block-repeat loops use the block-repeat counters BRC0 and BRC1. For example,

    mov #N-1, BRC0      ; Repeat outer loop N times
    mov #M-1, BRC1      ; Repeat inner loop M times
    rptb outloop-1      ; Repeat outer loop up to outloop
    mpy *AR1, *CDP, AC0
    mpy *AR2, *CDP, AC1
    rptb inloop-1       ; Repeat inner loop up to inloop
    mac *AR1, *CDP, AC0
    mac *AR2, *CDP, AC1
    inloop              ; End of inner loop
    mov AC0, *AR3       ; Save result in AC0
    mov AC1, *AR4       ; Save result in AC1
    outloop             ; End of outer loop

The above example uses two repeat instructions to control a nested repetitive operation. The block-repeat structure

    rptb label_name-1
    (more instructions ...)
    label_name

executes a block of instructions between the rptb instruction and the end label label_name. The code inside a block-repeat loop is limited to 64 Kbytes. Because of the pipeline scheme, a block-repeat loop must take at least two cycles. The maximum number of times that a loop can be repeated is limited to 65 536 (2¹⁶) because of the 16-bit block-repeat counters.
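The iteration counts of a nested block-repeat loop follow directly from the counter values. The Python sketch below is a simplified model with hypothetical names: it assumes the inner counter restarts from M-1 on every pass of the outer loop, and it counts passes only, without modeling the instructions inside the loops.

```python
COUNTER_BITS = 16   # block-repeat counters are 16 bits wide


def nested_repeat_model(n, m):
    """Hypothetical model of an outer loop (N passes) around an inner loop (M passes).

    Returns (outer_passes, inner_passes). Each counter is loaded with its
    pass count minus one and counts down to zero.
    """
    outer_passes = inner_passes = 0
    brc0 = n - 1
    while True:
        outer_passes += 1
        brc1 = m - 1                 # assumed: inner counter restarts each outer pass
        while True:
            inner_passes += 1
            if brc1 == 0:
                break
            brc1 -= 1
        if brc0 == 0:
            break
        brc0 -= 1
    return outer_passes, inner_passes


max_passes = (1 << COUNTER_BITS)     # counter value 0xFFFF gives 65 536 passes
print(nested_repeat_model(3, 4), max_passes)
```

For N = 3 and M = 4 the inner block runs 12 times in total. A 16-bit counter holds at most 0xFFFF, which is why a single loop can repeat at most 65 536 times.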
