
Execution Datapath

In document Intel IXP2800 Network Processor (Page 43-48)


1. Read Local Memory memory location pointed to by LM_ADDR

2.3.7 Execution Datapath

The Execution Datapath can take one or two operands, perform an operation, and optionally write back a result. The sources and destinations can be GPRs, Transfer registers, Next Neighbor registers, and Local Memory. The operations are shifts, add/subtract, logicals, multiply, byte align, and find first one bit.

2.3.7.1 Byte Align

The datapath provides a mechanism to move data from source register(s) to any destination register(s) with byte aligning. Byte aligning takes four consecutive bytes from two concatenated 32-bit values (8 bytes), starting at any of four byte boundaries (0, 1, 2, 3) and based on the endian type (which is defined in the instruction opcode), as shown in Example 5. Four of the eight bytes always come from a temporary register that holds the A or B operand from the previous cycle; the other four come from the B or A operand of the current Byte Align instruction.

The operation is described below, using the block diagram in Figure 6. The alignment is controlled by the two LSBs of the BYTE_INDEX Local CSR.

Table 6. Align Value and Shift Amount

Align Value             Right Shift Amount (Number of Bits, Decimal)
(in Byte_Index[1:0])    Little-Endian    Big-Endian

0                       0                32
1                       8                24
2                       16               16
3                       24               8
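The shift amounts in Table 6 follow a simple pattern: 8 times the align value for little-endian, and 32 minus that for big-endian. The following is a minimal Python sketch of a single alignment step. It assumes the first value occupies the upper 32 bits of the 64-bit concatenation and that the result is the low 32 bits after the right shift. The function and parameter names are illustrative only and are not part of the instruction set.

```python
def shift_amount(byte_index, big_endian):
    """Right-shift amount in bits, following Table 6."""
    align = byte_index & 0x3  # only BYTE_INDEX[1:0] controls alignment
    return 32 - 8 * align if big_endian else 8 * align


def byte_align(upper, lower, byte_index, big_endian):
    """Concatenate two 32-bit values, shift right, keep the low 32 bits.

    Assumption: 'upper' is the high half of the 64-bit concatenation.
    """
    concat = ((upper & 0xFFFFFFFF) << 32) | (lower & 0xFFFFFFFF)
    return (concat >> shift_amount(byte_index, big_endian)) & 0xFFFFFFFF
```

With an align value of 2, both modes shift by 16 bits, so the result is the low two bytes of the upper value followed by the high two bytes of the lower value.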

Example 10 shows a big-endian align sequence of instructions and the value of the various operands. Table 7 shows the data in the registers for this example. The value in BYTE_INDEX[1:0] CSR (which controls the shift amount) for this example is 2.

Figure 6. Byte-Align Block Diagram

[Block diagram not reproduced. Labeled elements: Prev_A, Prev_B, A_Operand, B_Operand, Byte_Index, Shift, Result.]

Table 7. Register Contents for Example 10

Register    Byte 3 [31:24]    Byte 2 [23:16]    Byte 1 [15:8]    Byte 0 [7:0]

0           0                 1                 2                3
1           4                 5                 6                7
2           8                 9                 A                B
3           C                 D                 E                F

Example 10. Big-Endian Align

Instruction                 Prev B    A Operand    B Operand    Result

Byte_align_be[--, r0]       --        --           0123         --
Byte_align_be[dest1, r1]    0123      0123         4567         2345
Byte_align_be[dest2, r2]    4567      4567         89AB         6789
Byte_align_be[dest3, r3]    89AB      89AB         CDEF         ABCD

NOTE: A Operand comes from Prev_B register during byte_align_be instructions.

Example 11 shows a little-endian sequence of instructions and the value of the various operands.

Table 8 shows the data in the registers for this example. The value in BYTE_INDEX[1:0] CSR (which controls the shift amount) for this example is 2.

As the examples show, byte aligning “n” words takes “n+1” cycles due to the first instruction needed to start the operation.

Another mode of operation is to use the T_INDEX register with post-increment, to select the source registers. T_INDEX operation is described later in this chapter.
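The instruction sequences in Example 10 and Example 11 can be modeled in a few lines of Python. This is a sketch under stated assumptions, not a description of the hardware: a single variable stands in for the temporary (Prev) register, the first instruction only primes it, and the name byte_align_stream is hypothetical. The register values are those of Table 7 and Table 8, with BYTE_INDEX[1:0] equal to 2.

```python
def byte_align_stream(words, byte_index, big_endian):
    """Model a run of Byte_align_be or Byte_align_le instructions.

    words: 32-bit source values, one per instruction (r0, r1, ...).
    Returns one result per instruction after the first; the first
    instruction only loads the temporary (Prev) register.
    """
    align = byte_index & 0x3
    shift = 32 - 8 * align if big_endian else 8 * align
    prev = None  # stands in for Prev_B (big-endian) or Prev_A (little-endian)
    results = []
    for word in words:
        word &= 0xFFFFFFFF
        if prev is not None:
            # Big-endian: A operand is Prev_B, B operand is the new register.
            # Little-endian: A operand is the new register, B operand is Prev_A.
            a_op, b_op = (prev, word) if big_endian else (word, prev)
            results.append((((a_op << 32) | b_op) >> shift) & 0xFFFFFFFF)
        prev = word
    return results


# Register contents from Table 7 (big-endian) and Table 8 (little-endian).
be = byte_align_stream([0x00010203, 0x04050607, 0x08090A0B, 0x0C0D0E0F], 2, True)
le = byte_align_stream([0x03020100, 0x07060504, 0x0B0A0908, 0x0F0E0D0C], 2, False)
```

Four instructions yield three aligned words in each case, which matches the "n+1" cycle count: be holds 0x02030405, 0x06070809, 0x0A0B0C0D and le holds 0x05040302, 0x09080706, 0x0D0C0B0A.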

2.3.7.2 CAM

The block diagram in Figure 7 is used to explain the CAM operation.

The CAM has 16 entries. Each entry stores a 32-bit value, which can be compared against a source operand by instruction:

CAM_Lookup[dest_reg, source_reg]

All entries are compared in parallel, and the result of the lookup is a 9-bit value that is written into the specified destination register in bits 11:3, with all other bits of the register 0 (the choice of bits 11:3 is explained below). The result can also optionally be written into either of the LM_Addr registers (see below in this section for details).

The 9-bit result consists of four State bits (dest_reg[11:8]), concatenated with a 1-bit Hit/Miss indication (dest_reg[7]), concatenated with 4-bit entry number (dest_reg[6:3]). All other bits of dest_reg are written with 0. Possible results of the lookup are:

miss (0) — lookup value is not in CAM, entry number is Least Recently Used entry (which can be used as a suggested entry to replace), and State bits are 0000.

hit (1) — lookup value is in CAM, entry number is entry that has matched; State bits are the value from the entry that has matched.
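The bit layout of the lookup result can be unpacked with shifts and masks. The following Python sketch decodes the fields exactly as laid out above (State in bits 11:8, Hit/Miss in bit 7, entry number in bits 6:3). The LookupResult type and the decode_lookup name are illustrative, not part of any Intel tool or library.

```python
from typing import NamedTuple


class LookupResult(NamedTuple):
    state: int   # dest_reg[11:8], 0000 on a miss
    hit: bool    # dest_reg[7]
    entry: int   # dest_reg[6:3]: matching entry on a hit, LRU entry on a miss


def decode_lookup(dest_reg):
    """Split the value written by CAM_Lookup into its three fields."""
    return LookupResult(
        state=(dest_reg >> 8) & 0xF,
        hit=bool((dest_reg >> 7) & 0x1),
        entry=(dest_reg >> 3) & 0xF,
    )


# 0x3C8 = state 3, hit, entry 9; 0x028 = miss with entry 5 as the LRU entry.
hit_result = decode_lookup(0x3C8)
miss_result = decode_lookup(0x028)
```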

Table 8. Register Contents for Example 11

Register    Byte 3 [31:24]    Byte 2 [23:16]    Byte 1 [15:8]    Byte 0 [7:0]

0           3                 2                 1                0
1           7                 6                 5                4
2           B                 A                 9                8
3           F                 E                 D                C

Example 11. Little-Endian Align

Instruction                 A Operand    B Operand    Prev A    Result

Byte_align_le[--, r0]       3210         --           --        --
Byte_align_le[dest1, r1]    7654         3210         3210      5432
Byte_align_le[dest2, r2]    BA98         7654         7654      9876
Byte_align_le[dest3, r3]    FEDC         BA98         BA98      DCBA

NOTE: B Operand comes from Prev_A register during byte_align_le instructions.

Note: The State bits are data associated with the entry and are used only by software. They do not imply ownership of the entry by any Context. The hardware handles the State bits as follows:

The value is set by software (when the entry is loaded, or when it is changed in an already loaded entry).

The value is read out on a lookup that hits, and is used as part of the status written into the destination register.

The value can be read out separately (normally only for diagnostics or debug).

The LRU (Least Recently Used) Logic maintains a time-ordered list of CAM entry usage. When an entry is loaded, or matches on a lookup, it is marked as MRU (Most Recently Used). Note that a lookup that misses does not modify the LRU list.

The CAM is loaded by instruction:

CAM_Write[entry_reg, source_reg, state_value]

The value in the register specified by source_reg is put into the Tag field of the entry specified by entry_reg. The value for the State bits of the entry is specified in the instruction as state_value.

Figure 7. CAM Block Diagram

[Block diagram not reproduced. Labeled elements: Lookup Value (from A port), Tag and State fields for each entry, Match logic for each entry, Status and LRU Logic, Lookup Status (to Dest Reg).]

Lookup Status fields shown in the figure:

State    Status      Entry Number

0000     Miss (0)    LRU Entry
State    Hit (1)     Hit Entry

The value in the State bits for an entry can be written, without modifying the Tag, by instruction:

CAM_Write_State[entry_reg, state_value]

Note: CAM_Write_State does not modify the LRU list.

One possible way to use the result of a lookup is to dispatch to the proper code using instruction:

jump[register, label#],defer [3]

where the register holds the result of the lookup. The State bits can be used to differentiate cases where the data associated with the CAM entry is in flight, or is pending a change, etc. Because the lookup result was loaded into bits[11:3] of the destination register, the jump destinations are spaced eight instructions apart. This is a balance between giving enough space for many applications to complete their task without having to jump to another region, versus consuming too much Control Store. Another way to use the lookup result is to branch on just the hit/miss bit, and use the entry number as a base pointer into a block of Local Memory.
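The eight-instruction spacing follows directly from the result occupying bits 11:3. The Python sketch below computes the offset for a given lookup outcome. It assumes that the jump instruction adds the register value to the address of label#, which the text implies but does not state outright; dispatch_offset is a hypothetical helper name.

```python
def dispatch_offset(state, hit, entry):
    """Offset, in instructions, from label# for one lookup outcome.

    Assumption: jump[register, label#] branches to label# plus the
    register value, so the lookup result itself is the offset.
    """
    return ((state & 0xF) << 8) | (int(bool(hit)) << 7) | ((entry & 0xF) << 3)


# Miss outcomes (state 0000) for LRU entries 0..15.
miss_offsets = [dispatch_offset(0, False, e) for e in range(16)]
# Hit outcomes with state 0000 for entries 0..15.
hit_offsets = [dispatch_offset(0, True, e) for e in range(16)]
```

Because the low three bits are always 0, consecutive entry numbers land eight instructions apart; the hit/miss bit adds 128 and each State increment adds 256.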

When enabled, the CAM lookup result is loaded into LM_Addr as follows:

LM_Addr[5:0] = 0 ([1:0] are read-only bits)

LM_Addr[9:6] = lookup result [6:3] (entry number)

LM_Addr[11:10] = constant specified in instruction

This function is useful when the CAM is used as a cache, and each entry is associated with a block of data in Local Memory. Note that the latency from when CAM_Lookup executes until the LM_Addr is loaded is the same as when LM_Addr is written by a Local_CSR_Wr instruction.
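The mapping from lookup result to LM_Addr can be written as a short bit-field computation. This Python sketch follows the field assignment given above; the function name lm_addr_from_lookup and its parameter names are hypothetical.

```python
def lm_addr_from_lookup(lookup_result, constant):
    """Build the LM_Addr value loaded on a CAM lookup.

    lookup_result: value produced by CAM_Lookup (entry number in bits 6:3).
    constant: 2-bit value specified in the instruction.
    """
    entry = (lookup_result >> 3) & 0xF
    # LM_Addr[11:10] = constant, LM_Addr[9:6] = entry number, LM_Addr[5:0] = 0
    return ((constant & 0x3) << 10) | (entry << 6)


# Entry 9 with constant 2 selects the block starting at 0xA40.
addr = lm_addr_from_lookup(0x3C8, 2)
```

Since LM_Addr[5:0] is 0, consecutive entries are 64 addresses apart, and the 2-bit constant selects one of four regions of 16 such blocks.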

The Tag and State bits for a given entry can be read by instructions:

CAM_Read_Tag[dest_reg, entry_reg]

CAM_Read_State[dest_reg, entry_reg]

CAM_Read_Tag writes the Tag value of the specified entry into the destination register. CAM_Read_State writes the State bits of the specified entry into bits [11:8] of dest_reg, with all other bits 0. Reading the tag is useful when an entry needs to be evicted to make room for a new value: the lookup of the new value results in a miss, with the LRU entry number returned as a result of the miss. The CAM_Read_Tag instruction can then be used to find the value that was stored in that entry. An alternative would be to keep the tag value in a GPR. These two instructions can also be used by debug and diagnostic software. Neither instruction modifies the state of the LRU pointer.

Note: The following rules must be adhered to when using the CAM.

1. The CAM is not reset by Microengine reset. Software must either do a CAM_Clear prior to using the CAM, to initialize the LRU and clear the tags to 0, or explicitly write all entries with CAM_Write.

2. No two tags can be written to have the same value. If this rule is violated, the result of a lookup that matches that value is unpredictable, and the LRU state is unpredictable.

3. The value 0x00000000 can be used as a valid lookup value. However, note that the CAM_Clear instruction puts 0x00000000 into all tags. To avoid violating rule 2 after doing CAM_Clear, it is necessary to write all entries to unique values prior to doing a lookup of 0x00000000.

An algorithm for debug software to find out the contents of the CAM is shown in Example 12.

The CAM can be cleared with the CAM_Clear instruction. This instruction simultaneously writes 0x00000000 to the tag of every entry, clears all the State bits, and puts the LRU into an initial state (where entry 0 is LRU, ..., entry 15 is MRU).
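The CAM behavior described in this section can be summarized as a small software model. The Python sketch below is an illustration only: the class CamModel and its method names are hypothetical, it is not the debug algorithm of Example 12, and where the hardware result is unpredictable (duplicate tags) the model simply returns the first match.

```python
class CamModel:
    """Software model of the 16-entry CAM with LRU tracking."""

    ENTRIES = 16

    def __init__(self):
        self.clear()

    def clear(self):
        """CAM_Clear: zero tags and State bits; entry 0 is LRU, 15 is MRU."""
        self.tags = [0] * self.ENTRIES
        self.states = [0] * self.ENTRIES
        self.lru_order = list(range(self.ENTRIES))  # first = LRU, last = MRU

    def _mark_mru(self, entry):
        self.lru_order.remove(entry)
        self.lru_order.append(entry)

    def write(self, entry, tag, state):
        """CAM_Write: load Tag and State; a loaded entry becomes MRU."""
        self.tags[entry] = tag & 0xFFFFFFFF
        self.states[entry] = state & 0xF
        self._mark_mru(entry)

    def write_state(self, entry, state):
        """CAM_Write_State: change State only; LRU list is not modified."""
        self.states[entry] = state & 0xF

    def lookup(self, value):
        """CAM_Lookup: return the status in bits 11:3 of the result."""
        value &= 0xFFFFFFFF
        if value in self.tags:
            entry = self.tags.index(value)  # first match (model choice)
            self._mark_mru(entry)
            return (self.states[entry] << 8) | (1 << 7) | (entry << 3)
        # Miss: State 0000, hit bit 0, LRU entry number; LRU list unchanged.
        return self.lru_order[0] << 3

    def read_tag(self, entry):
        """CAM_Read_Tag: return the Tag; LRU list is not modified."""
        return self.tags[entry]

    def read_state(self, entry):
        """CAM_Read_State: State in bits 11:8; LRU list is not modified."""
        return self.states[entry] << 8


# Usage: fill with unique tags, hit, miss, then evict the LRU entry.
cam = CamModel()
for i in range(CamModel.ENTRIES):
    cam.write(i, 0x1000 + i, 0)
hit = cam.lookup(0x1003)                 # hit on entry 3
miss = cam.lookup(0xDEAD)                # miss, suggests LRU entry 0
victim = (miss >> 3) & 0xF
evicted_tag = cam.read_tag(victim)       # value being replaced
cam.write(victim, 0xDEAD, 1)
rehit = cam.lookup(0xDEAD)               # hit on entry 0 with State 1
next_miss = cam.lookup(0xBEEF)           # entry 1 is now the LRU entry
```

The usage lines follow the eviction flow described earlier: a miss returns the LRU entry number, CAM_Read_Tag recovers the value stored there, and CAM_Write installs the new tag and marks the entry MRU.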
