CAM and TCAM. TCAM programming

CAM and TCAM

When accessing a conventional RAM (Random Access Memory), a memory address has to be specified in order to obtain the value stored at that address. When accessing a CAM (Content Addressable Memory), on the other hand, a key has to be specified in order to obtain the memory address where that key, if it exists, is stored.
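The difference between the two access styles can be illustrated with a minimal Python sketch. It is only a software analogy: the names `ram`, `ram_read` and `cam_search` are hypothetical, the MAC addresses are made-up sample data, and the loop imitates the result of a CAM search, not its parallel hardware timing.

```python
# Toy model: RAM maps address -> value, CAM maps key -> address.
# The stored MAC addresses are made-up sample data.
ram = ["00:1a:2b:3c:4d:5e", "00:1a:2b:3c:4d:5f", "00:1a:2b:3c:4d:60"]


def ram_read(address):
    """RAM access: the caller supplies an address and gets the stored value."""
    return ram[address]


def cam_search(key):
    """CAM access: the caller supplies a key and gets the address holding it.

    Real CAM hardware compares the key against every row at once;
    this loop only reproduces the outcome of that comparison.
    """
    for address, stored in enumerate(ram):
        if stored == key:
            return address
    return None  # the key is not stored anywhere


print(ram_read(1))                      # value stored at address 1
print(cam_search("00:1a:2b:3c:4d:60"))  # address where the key is stored
```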

The advantage of searching for a key via CAM, instead of implementing a search algorithm over RAM, is that CAM relies on specialized hardware that completes the access and search process in a fixed and very small amount of time, on the order of a few nanoseconds. The drawbacks, compared to conventional RAM, include higher cost, greater electric power consumption and, consequently, more heat production.

CAM normally allows only two states, 0 (false) and 1 (true), and in that case it is referred to as a binary CAM. It is normally used to store data to be searched for in 'exact match' mode, such as values in a MAC address table.

Ternary Content Addressable Memory (TCAM), a more complex kind of CAM, instead supports searches with three possible states: 0 (false), 1 (true) and X (don't care). The "X" state is implemented through the addition of a bitmask, thus allowing a TCAM to store, for example, the pattern "110XX010", which matches each of the four keys "11000010", "11001010", "11010010" and "11011010".
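As a rough illustration of how the "X" state and the bitmask relate, the following Python sketch expands a ternary pattern into the keys it matches and converts it into a value/mask pair. The function names are hypothetical, and the convention that a mask bit set to 1 means "care" is the one used later in this article.

```python
from itertools import product


def expand_pattern(pattern):
    """Return every binary key matched by a ternary pattern of '0', '1', 'X'."""
    # Each 'X' can take both values; fixed bits offer a single choice.
    choices = ["01" if c == "X" else c for c in pattern]
    return ["".join(bits) for bits in product(*choices)]


def to_value_mask(pattern):
    """Split a ternary pattern into (value, mask); mask bit 1 means 'care'."""
    value = int(pattern.replace("X", "0"), 2)
    mask = int("".join("0" if c == "X" else "1" for c in pattern), 2)
    return value, mask


def matches(key, value, mask):
    """A key matches when it agrees with the value on every 'care' bit."""
    return (key & mask) == value


keys = expand_pattern("110XX010")
value, mask = to_value_mask("110XX010")
print(keys)
print(all(matches(int(k, 2), value, mask) for k in keys))
```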

Due to its flexible structure a TCAM can deal with complex data according to the VMR (Value, Mask, Result) model, where the Value refers to the pattern that can be searched through a key, the Mask is a bitmask associated with the pattern and the Result is what is returned if the lookup is successful.
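The VMR model can be sketched in a few lines of Python. This is an assumption-laden software analogy rather than a description of any real device: the `VMR` tuple, the `lookup` function and the "permit"/"deny" results are hypothetical names chosen for illustration, and the sequential loop stands in for what hardware does in a single parallel step.

```python
from typing import NamedTuple


class VMR(NamedTuple):
    value: int   # pattern to be matched
    mask: int    # 1 = "care" bit, 0 = "don't care" bit
    result: str  # what the lookup returns on a match


def lookup(table, key):
    """Return the result of the first entry whose 'care' bits equal the key's."""
    for entry in table:
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.result
    return None  # no entry matched


# Sample table: the pattern 110XX010 from the text, then a catch-all entry.
table = [
    VMR(value=0b11000010, mask=0b11100111, result="permit"),
    VMR(value=0b00000000, mask=0b00000000, result="deny"),
]

print(lookup(table, 0b11010010))  # falls inside 110XX010
print(lookup(table, 0b00000001))  # only the catch-all matches
```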

CAMs and TCAMs are used in many contexts: in microprocessor TLBs (Translation Lookaside Buffers), in n-way associative caches, in various specialized ASICs (Application Specific Integrated Circuits) found in data compressors and Intrusion Prevention Systems, and in high-performance switches and routers, where they enable line-rate L2/L3 packet forwarding and the handling of security ACLs and QoS at wire-speed.

TCAM programming

A TCAM can be programmed in many ways but a very common one, used daily by network administrators, is via CLI (Command Line Interface).

For example, if we insert the following security ACL entry in a Cisco Catalyst 3750-X switch

permit tcp 10.10.100.0 0.0.0.255 eq 1111 172.31.255.0 0.0.0.255 eq 3389

and we dump the content of the associated TCAM, using the command

sh platform tcam table acl

we can begin to analyse what has been returned.

=============================================================================
Hardware Merge Table:
Index   Hardware Merge Entry Map(SM HM)   ARD
---
mask->  000F1FFF_00800000_00000000-00000000_00000000_FFFFFF00
6/2025  00060001_00800000_00000000-00000000_00000000_AC1FFF00
mask->  FFFFFF00_C0000000_FFFFFFFF
6/2025  0A0A6400_80000000_0D3D0457   00 02 02 03000000

At first glance we can identify some key elements: index, mask and value. The index represents the position of the ACL entry in the TCAM, the mask marks the match areas ("care" and "don't care") for a "pattern match" lookup, and the value represents the bits composing the ACL entry according to the TCAM hardware structure.

If we pair the fields of the textual ACL with their corresponding pattern/mask representations in the TCAM:

permit tcp 10.10.100.0 0.0.0.255 eq 1111 172.31.255.0 0.0.0.255 eq 3389

mask->  000F1FFF_00800000_00000000-00000000_00000000_FFFFFF00
6/2025  00060001_00800000_00000000-00000000_00000000_AC1FFF00
mask->  FFFFFF00_C0000000_FFFFFFFF
6/2025  0A0A6400_80000000_0D3D0457   00 02 02 03000000

• Mask/Value for source IP address
FFFFFF00 -> 255.255.255.0
0A0A6400 -> 10.10.100.0

• Mask/Value for destination IP address
FFFFFF00 -> 255.255.255.0
AC1FFF00 -> 172.31.255.0

• Mask/Value for source port
FFFF -> 255.255
0457 -> 1111

• Mask/Value for destination port
FFFF -> 255.255
0D3D -> 3389

we can see that the masks associated with the source and destination ports have 16 of 16 bits set to 1, while the masks associated with the IP addresses have 24 of 32 bits set to 1. Bearing in mind that IOS expresses the netmasks of ACLs in inverse bit notation, we can see how the bits set to 0 in the TCAM mask apply the 'X' state, also known as "don't care", to the corresponding bits of the field value. In other words, during "pattern match" searches only the "care" bits, those whose mask bit is set to 1, are taken into account. In this way, the ACL matches all keys whose source fields are in the IP address range "10.10.100.X" with port 1111 and whose destination fields are in the IP address range "172.31.255.X" with port 3389.
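The hexadecimal fields quoted above can be decoded with a short Python sketch. It only converts the mask/value strings taken from the text; it does not parse the full TCAM row, whose layout is hardware specific. The helper names `decode_ip_field`, `decode_port` and `field_matches` are hypothetical.

```python
import ipaddress


def decode_ip_field(value_hex, mask_hex):
    """Return (address, TCAM mask, IOS wildcard mask) as dotted-quad strings."""
    value = int(value_hex, 16)
    mask = int(mask_hex, 16)
    wildcard = ~mask & 0xFFFFFFFF  # IOS inverse notation: invert every bit
    return (
        str(ipaddress.IPv4Address(value)),
        str(ipaddress.IPv4Address(mask)),
        str(ipaddress.IPv4Address(wildcard)),
    )


def decode_port(value_hex):
    """Convert a 16-bit hexadecimal port field to its decimal port number."""
    return int(value_hex, 16)


def field_matches(key, value, mask):
    """Only the 'care' bits (mask bit = 1) take part in the comparison."""
    return (key & mask) == value


print(decode_ip_field("0A0A6400", "FFFFFF00"))  # source IP field
print(decode_ip_field("AC1FFF00", "FFFFFF00"))  # destination IP field
print(decode_port("0457"), decode_port("0D3D"))  # source and destination ports
```

For instance, `field_matches(int(ipaddress.IPv4Address("10.10.100.77")), 0x0A0A6400, 0xFFFFFF00)` is true, because the last octet falls entirely under "don't care" bits.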

Newly inserted values are usually placed at the bottom of the TCAM. However, when programmed values are the same as existing ones but the associated masks differ, their TCAM indexes must be arranged top to bottom by descending mask value. As a result, "first match wins" searches within ACLs or routing tables are both logically consistent and hardware accelerated.
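Why the ordering matters can be shown with a small Python sketch. It uses overlapping routing-style entries as sample data (the prefixes and the "via A/B/C" results are invented for illustration), and the `program` and `first_match` names are hypothetical. Sorting by descending mask makes the most specific entry the first one to match.

```python
# Sample entries as (value, mask, result); prefixes and results are made up.
entries = [
    (0x0A000000, 0xFF000000, "via A"),  # 10.0.0.0/8
    (0x0A0A0000, 0xFFFF0000, "via B"),  # 10.10.0.0/16
    (0x0A0A6400, 0xFFFFFF00, "via C"),  # 10.10.100.0/24
]


def program(entries):
    """Arrange the entries top to bottom by descending mask value."""
    return sorted(entries, key=lambda e: e[1], reverse=True)


def first_match(table, key):
    """Return the result of the first entry matching on its 'care' bits."""
    for value, mask, result in table:
        if (key & mask) == value:
            return result
    return None


key = 0x0A0A6405  # 10.10.100.5
print(first_match(entries, key))           # unordered table: broadest entry wins
print(first_match(program(entries), key))  # ordered table: most specific wins
```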

Some more complex examples

As another example of TCAM usage, we chose a particularly interesting case: expressing an inequality through VMR patterns. To simplify the calculations but not the concept, the example is based on the comparison between two quantities that can each be expressed in 8 bits.

We will show how, given the inequality:

x != y

it will be possible to get a set of m couples (tMask, tValue), with m defined as "TCAM expansion factor", which would represent the inequality in the TCAM. Subsequently, it will be shown how, based on this set, an ASIC (Application-Specific Integrated Circuit) will be able, "at wire-speed", to assess if this inequality is fulfilled by the key input x.

Case 1: x > y

In order to get the set of m couples of type tCouple=(tMask, tValue) that represents the inequality x > y, an iterative algorithm could be as follows:

a. We assign the variables offset=0 and count=0

b. We scan the bits of y from the least significant to the most significant, using offset as a pointer. When the bit of y at position offset equals 0, we go on to step c; the expansion ends when offset > 7 (remember that the row size of our sample TCAM is 8 bits)

c. We assign the variable tMask=0 and set its 8 - offset (read: eight minus offset) most significant bits to 1

d. We assign the variable tValue=y, set its bit at position offset to 1 and all the less significant bits to 0

e. We assign the variable tCount=count, and the couple tCouple_tCount=(tMask, tValue) is stored in the TCAM at index tCount

f. We increase the count variable, move offset to the next bit and return to point b
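Steps a to f can be sketched in Python as follows. This is one possible reading of the algorithm, not a vendor implementation: the function name `expand_gt` and the `width` parameter are hypothetical, and the loop index plays the role of offset while the list position plays the role of count.

```python
def expand_gt(y, width=8):
    """Return the (tMask, tValue) couples that together match every x > y."""
    full = (1 << width) - 1
    couples = []
    for offset in range(width):          # step b: scan from the LSB upwards
        if (y >> offset) & 1:            # bit is 1: nothing to emit, skip it
            continue
        low = (1 << offset) - 1          # the bits below the offset position
        t_mask = full & ~low             # step c: top (width - offset) bits set
        t_value = (y | (1 << offset)) & t_mask  # step d: set bit, clear lower
        couples.append((t_mask, t_value))       # step e: store at next index
    return couples


for t_mask, t_value in expand_gt(34):
    print(f"0x{t_mask:02X} 0x{t_value:02X}")
```

A brute-force check over all 8-bit values of x and y confirms that a key matches at least one couple exactly when x > y.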

Let us see now how to get the "TCAM expansion" of an inequality using the previous algorithm applied to the following example:

x > 34

Binary representation of y=34 is: y=00100010

a. offset=0, count=0

b. y=00100010, bit of y in position 0 is 0

c. tMask = 11111111 (set 8 - 0 bits to 1 from left)

d. tValue = 00100011 (y with bit in position 0 set to 1)

e. (11111111, 00100011) is the element tCount=0 to be inserted into the TCAM

f. count=1 and we return to point b

We continue with another step of expansion (the bit of y in position 1 is 1, so it is skipped):

b. y=00100010, bit of y in position 2 equals 0

c. tMask = 11111100 (set 8 - 2 bits to 1 from left, two bits set to zero to follow)

d. tValue = 00100100 (y with bit in position 2 set to 1, the remaining less significant bits set to 0)

We will try now to understand the meaning of the algorithm, by explaining the steps performed in the previous part.

Let us start from the decimal value y=34, in binary y=00100010. At the first iteration, from points b and c we obtain:

00100010 (y)
11111111 (tMask)

By focusing on the first bit of y that, from right to left, is equal to 0, we can ascertain that if we change its value from 0 to 1 (point d), we obtain:

00100011 > 00100010, i.e. 35 > 34, which fulfils the inequality.

By expressing the couple:

(tMask,tValue) = (11111111, 00100011)

in hexadecimal, we get:

tCouple0 = (0xFF, 0x23).

At the second iteration, from points b and c we obtain:

00100010 (y)
11111100 (tMask)

By focusing on the second bit of y that, from right to left, is equal to 0, we can ascertain that if we change its value from 0 to 1 (point d), we obtain:

001001** > 001000** for each value of * in [0..1]

By expressing the couple:

(tMask,tValue) = (11111100, 00100100)

in hexadecimal, we get:

tCouple1 = (0xFC, 0x24)

At this point the switch's TCAM will be loaded with the couples:

tCouple0 = (0xFF, 0x23)

tCouple1 = (0xFC, 0x24)

We will leave to the reader the pleasure of continuing the iterations and verifying that, at the end of the operation, the switch's TCAM is loaded, for the inequality x > 34, with the following values, arranged by descending tMask:

tCouple    tMask  tValue
---
tCouple0   0xFF   0x23
tCouple1   0xFC   0x24
tCouple2   0xF8   0x28
tCouple3   0xF0   0x30
tCouple4   0xC0   0x40
tCouple5   0x80   0x80

From the process used we can also see that the number of tCouples, i.e. the expansion factor of the greater-than inequality x > y, is equal to the number of bits set to 0 in y.
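To see what each couple contributes, the following Python sketch computes the range of keys covered by every row of the x > 34 table. It assumes, as in this example, that each mask is a run of leading 1 bits; the names `COUPLES_GT_34` and `covered_range` are hypothetical.

```python
# The six couples for x > 34, copied from the table in the text.
COUPLES_GT_34 = [
    (0xFF, 0x23), (0xFC, 0x24), (0xF8, 0x28),
    (0xF0, 0x30), (0xC0, 0x40), (0x80, 0x80),
]


def covered_range(t_mask, t_value, width=8):
    """Return (lowest, highest) key matched by a couple with a prefix mask."""
    dont_care = ~t_mask & ((1 << width) - 1)
    return t_value, t_value | dont_care


ranges = [covered_range(m, v) for m, v in COUPLES_GT_34]
print(ranges)

# Number of 0 bits in the 8-bit representation of y = 34.
zero_bits = 8 - bin(34).count("1")
print(zero_bits, len(COUPLES_GT_34))
```

The ranges are contiguous and disjoint: together they cover 35 to 255, and the number of couples equals the number of 0 bits in y.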

Case 2: x < y

In order to get the set of m couples of type tCouple=(tMask, tValue) that represents the inequality x < y, an iterative algorithm could be as follows:

a. We assign the variables offset=0 and count=0

b. We scan the bits of y from the least significant to the most significant, using offset as a pointer. When the bit of y at position offset equals 1, we go on to step c; the expansion ends when offset > 7 (remember that the row size of our sample TCAM is 8 bits)

c. We assign the variable tMask=0 and set its 8 - offset (read: eight minus offset) most significant bits to 1

d. We assign the variable tValue=y, set its bit at position offset to 0 and all the less significant bits to 0

e. We assign the variable tCount=count, and the couple tCouple_tCount=(tMask, tValue) is stored in the TCAM at index tCount

f. We increase the count variable, move offset to the next bit and return to point b
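The less-than variant differs from the previous case only in which bits trigger a couple and in how that bit is rewritten. A minimal Python sketch, under the same assumptions as before (the name `expand_lt` and the `width` parameter are hypothetical):

```python
def expand_lt(y, width=8):
    """Return the (tMask, tValue) couples that together match every x < y."""
    full = (1 << width) - 1
    couples = []
    for offset in range(width):          # step b: scan from the LSB upwards
        if ((y >> offset) & 1) == 0:     # bit is 0: nothing to emit, skip it
            continue
        low = (1 << offset) - 1          # the bits below the offset position
        t_mask = full & ~low             # step c: top (width - offset) bits set
        t_value = y & t_mask & ~(1 << offset)  # step d: clear bit and lower bits
        couples.append((t_mask, t_value))      # step e: store at next index
    return couples


for t_mask, t_value in expand_lt(34):
    print(f"0x{t_mask:02X} 0x{t_value:02X}")
```

As with the greater-than case, an exhaustive check over all 8-bit x and y shows that a key matches some couple exactly when x < y.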

Let us see now how to get the "TCAM expansion" of an inequality using the previous algorithm applied to the following example:

x < 34

Binary representation of y=34 is: y=00100010

a. offset=0, count=0

b. y=00100010, bit of y in position 0 is 0 and is skipped; bit of y in position 1 is 1

c. tMask = 11111110 (set 8 - 1 bits to 1 from left, one bit set to zero to follow)

d. tValue = 00100000 (y with bit in position 1 set to 0)

At the first iteration, from points b and c we obtain:

00100010 (y)
11111110 (tMask)

By focusing on the first bit from the right that is equal to 1, we can ascertain that if we change its value from 1 to 0 (point d), we obtain:

0010000* < 0010001* for each value of * in [0..1]

By expressing the couple:

(tMask,tValue) = (11111110, 00100000)

in hexadecimal, we get:

tCouple0 = (0xFE, 0x20)

At this point the switch's TCAM will be loaded with the couple:

tCouple0 = (0xFE, 0x20)

We will leave to the reader the pleasure of continuing the iterations and verifying that, at the end of the operation, the switch's TCAM is loaded, for the inequality x < 34, with the following values, arranged by descending tMask:

tCouple    tMask  tValue
---
tCouple0   0xFE   0x20
tCouple1   0xE0   0x00

From the process used we can also see that the number of tCouples, i.e. the expansion factor of the less-than inequality x < y, is equal to the number of bits set to 1 in y.
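Putting the two cases together gives a representation of the original inequality x != y, read here as "x < y or x > y". The following Python sketch checks this for y = 34 using the two tables from the text; the names `LT_34`, `GT_34`, `NEQ_34` and `matches_any` are hypothetical.

```python
# Couples copied from the two tables in the text.
LT_34 = [(0xFE, 0x20), (0xE0, 0x00)]
GT_34 = [
    (0xFF, 0x23), (0xFC, 0x24), (0xF8, 0x28),
    (0xF0, 0x30), (0xC0, 0x40), (0x80, 0x80),
]

# Assumption: x != 34 is represented by the union of both sets.
NEQ_34 = LT_34 + GT_34


def matches_any(x, couples):
    """True when x agrees with at least one couple on its 'care' bits."""
    return any((x & t_mask) == t_value for t_mask, t_value in couples)


print(len(NEQ_34))
print(matches_any(34, NEQ_34), matches_any(33, NEQ_34), matches_any(35, NEQ_34))
```

Since every bit of y is either 0 or 1, the two expansion factors add up to the row width, so under this reading an 8-bit x != y always needs 8 couples.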

Finally, we will show how an ASIC can use this set of ordered couples (tMask, tValue) in the TCAM to check whether a key fulfils the chosen inequality. As a matter of fact, by using the following algorithm, which is very simple to implement in hardware:

a. Initialize count to 0 and tCoupleNum to the total number of tCouples in the TCAM

b. If count >= tCoupleNum, the inequality is not met

c. If (x AND tMask_count) = tValue_count, where AND is the bitwise AND, the inequality is met

d. Increase count and return to point b

inequality checks turn out to be easy to perform in an ASIC at wire-speed.
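Steps a to d translate almost line by line into Python. The sketch below is a sequential software rendering for illustration only; the function name `inequality_met` is hypothetical, and the sample couples are those of the x > 34 table.

```python
def inequality_met(x, couples):
    """Walk the couples in TCAM order and report whether x matches any row."""
    count = 0                       # step a
    t_couple_num = len(couples)     # step a
    while count < t_couple_num:     # step b: stop when the rows are exhausted
        t_mask, t_value = couples[count]
        if (x & t_mask) == t_value:  # step c: bitwise AND, then compare
            return True
        count += 1                  # step d
    return False                    # step b: no row matched


# Couples for x > 34, copied from the table in the text.
couples_gt_34 = [
    (0xFF, 0x23), (0xFC, 0x24), (0xF8, 0x28),
    (0xF0, 0x30), (0xC0, 0x40), (0x80, 0x80),
]

for x in (0, 34, 35, 200):
    print(x, inequality_met(x, couples_gt_34))
```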

Conclusions

The subject of CAM and TCAM is far from exhausted, and this article is only an introduction, accompanied by examples and pointers for educational purposes. Readers who wish to examine the topic of "TCAM utilization" will find how Logical Operation Units (LOUs), specialized hardware registers that support Layer 4 Operations (L4Ops), have been added to the traditional TCAM and implemented, in different ways, in the ASICs (Cisco has a lot of documentation on the subject, particularly for the Catalyst 6500 family of modular switches). LOUs avoid the "Factor m" expansion for inequality operators like lt, gt, neq, and range. Unfortunately, LOUs themselves are extremely limited in number and a lot more expensive to implement than normal TCAMs. Once this type of resource is used up, the "Factor m" expansion becomes inevitable.

To conclude, when the TCAM resources are also used up, the rows that no longer fit in the TCAM are handled by the device's CPU, and it is exactly at that moment that the Admin's trouble begins!