Packet processing flow - On the design of efficient caching systems

5.3 H2C design

5.3.4 Packet processing flow

In this section we describe the packet processing flow of H2C operating as the content store of a CCN/NDN router. As already mentioned in Chapter 2, CCN/NDN uses two types of packets: Interest and Data. The first is a request that a client uses to fetch a specific packet-sized chunk of data. The second is the data itself, which satisfies the corresponding request. Both packets contain the name of the chunk (i.e., content object name plus packet identifier) they refer to.

First, packets are extracted in batches from the NIC's hardware queues by Network I/O cores. Each Network I/O core that receives a batch of Interest and Data packets dispatches them to different processing cores according to the hash value of their segment identifiers (i.e., $\lfloor \text{chunk id}/N \rfloor$, where $N$ is the number of chunks in a segment), so that packets belonging to the same segment are always assigned to the same processing core. As already mentioned, this task could be handled directly by NICs if their RSS functionality supports it or if content object identifier information is encoded in the transport-layer destination port.
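The dispatch rule can be illustrated with a minimal sketch. It assumes that the segment identifier combines the content object name with the segment index and that CRC32 is used as the hash; both choices, as well as the names `segment_index` and `target_core`, are illustrative and are not taken from the H2C prototype.

```python
import zlib


def segment_index(chunk_id: int, chunks_per_segment: int) -> int:
    """Index of the segment a chunk belongs to: floor(chunk_id / N)."""
    return chunk_id // chunks_per_segment


def target_core(object_name: str, chunk_id: int,
                chunks_per_segment: int, num_cores: int) -> int:
    """Pick a processing core from the hash of the segment identifier.

    Assumption: the segment identifier is "<object name>/<segment index>"
    and CRC32 stands in for whatever hash the real system uses.
    """
    seg = segment_index(chunk_id, chunks_per_segment)
    key = f"{object_name}/{seg}".encode()
    return zlib.crc32(key) % num_cores
```

With 8 chunks per segment, chunks 0 to 7 of the same object all map to the same core, so no state for a segment has to be shared across processing cores.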

Processing cores receive batches of packets and process them according to their type.

If the received packet is an Interest, a lookup in the segment hash table is performed using the segment identifier hash value already computed. If the element is not present in the segment hash table, the PIT and FIB are looked up. If the element is present, the processing core checks whether the packet is stored in the first-level (DRAM) or second-level (SSD) cache using the DRAM packet bitmap and the SSD flag. If the packet is stored in DRAM, the corresponding Data packet is prepared and sent back to the requester, and the request counter field is incremented. The segment entry is also moved to the top of the DRAM replacement queue. If the packet is stored in SSD, the request packet is passed to an SSD I/O core, which fetches the entire segment to which the requested chunk belongs from the SSD and stores it in free DRAM packet buffers. Once the segment has been moved to the first-level cache, the SSD I/O core notifies the processing core, which creates a new DRAM map entry and prepares the Data packet for transmission. When a chunk is copied from SSD to DRAM, the associated DRAM and SSD segment map entries are moved to the top of the DRAM and SSD replacement queues respectively. Notice that, motivated by the findings of Sec. 5.2.3, segments copied from the second to the first level of the cache are not removed from the SSD.
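The decision taken for each Interest can be summarised by the following sketch. The `SegmentEntry` fields mirror the DRAM packet bitmap, SSD flag and request counter named in the text, but their types and the returned labels are hypothetical. The sketch also assumes that an entry holding the chunk neither in DRAM nor on SSD is treated as a miss, a case the text does not spell out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SegmentEntry:
    dram_bitmap: int = 0      # bit i set => chunk i of the segment is in DRAM
    ssd_flag: bool = False    # True => the whole segment is stored on SSD
    request_counter: int = 0  # hits served for this segment


def handle_interest(table: Dict[int, SegmentEntry],
                    seg_hash: int, offset: int) -> str:
    """Classify an Interest for chunk `offset` of segment `seg_hash`."""
    entry = table.get(seg_hash)
    if entry is None:
        return "pit_fib"               # content store miss
    if entry.dram_bitmap & (1 << offset):
        entry.request_counter += 1     # DRAM hit: count the request
        return "dram_hit"
    if entry.ssd_flag:
        return "ssd_fetch"             # hand over to an SSD I/O core
    return "pit_fib"                   # assumption: partial segment, chunk absent
```

Moving the segment to the top of the replacement queues and preparing the Data packet are left out to keep the lookup logic in focus.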

If the received packet is a Data packet and it is not already in the first- or second-level cache of H2C, a segment hash table entry is created, and the corresponding DRAM segment map entry is created and inserted at the top of the DRAM replacement queue by the target processing core. The data is then sent to the requester.
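A minimal sketch of this insertion path is given below. It assumes that the replacement queue is an ordered map whose first key is the top of the queue, that hash table entries are plain dictionaries, and that a set SSD flag means every chunk of the segment is already cached; `handle_data` and its arguments are hypothetical names.

```python
from collections import OrderedDict


def handle_data(table: dict, dram_queue: OrderedDict,
                seg_id: int, offset: int, payload: bytes) -> bool:
    """Cache a Data packet in DRAM; return True if it was newly stored.

    Forwarding the packet to the requester happens regardless of the
    return value and is not modelled here.
    """
    entry = table.setdefault(seg_id, {"dram_bitmap": 0, "ssd_flag": False})
    if entry["ssd_flag"] or entry["dram_bitmap"] & (1 << offset):
        return False                            # already cached
    buffers = dram_queue.setdefault(seg_id, {})  # DRAM segment map entry
    buffers[offset] = payload
    entry["dram_bitmap"] |= 1 << offset
    dram_queue.move_to_end(seg_id, last=False)   # move to top of the queue
    return True
```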

During these packet processing operations, segment eviction from the DRAM or SSD cache may be required to make space available for new chunks received from the network or copied from SSD. When the DRAM cache is full, the segment at the tail of the DRAM replacement queue is evaluated. If the value of its associated request counter R is greater than or equal to a predefined threshold value Rmin, the segment is demoted to the SSD cache and placed at the top of the SSD replacement queue; otherwise it is discarded. As discussed in Sec. 5.2.2, setting a threshold Rmin = 2 (counting the initial insertion as a hit) is sufficient to achieve a considerable reduction in SSD writes without affecting the SSD cache hit ratio. However, as shown in Sec. 5.4.5, a greater threshold value may also be used to further increase read throughput, although at the cost of reducing the cache hit ratio. It should also be noted that a segment is discarded if it is not full, i.e., if not all of its chunks have been received. This ensures that space in the SSD is well utilised and avoids read and write amplification. As a side effect, though, it penalises content objects smaller than a segment, as they will never be inserted in the SSD.
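The demotion rule applied at the tail of the DRAM replacement queue can be sketched as follows. The queues are modelled as ordered maps whose first key is the top and whose last key is the tail; `DramSegment`, `evict_dram_tail` and the returned labels are illustrative names rather than parts of the prototype.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class DramSegment:
    request_counter: int  # R, counting the initial insertion as a hit
    chunks_present: int   # number of chunks of the segment held in DRAM


def evict_dram_tail(dram: OrderedDict, ssd: OrderedDict,
                    r_min: int, chunks_per_segment: int) -> str:
    """Evict the tail of the DRAM queue, demoting it to SSD if it qualifies."""
    seg_id, seg = dram.popitem(last=True)          # tail of the DRAM queue
    is_full = seg.chunks_present == chunks_per_segment
    if is_full and seg.request_counter >= r_min:
        ssd[seg_id] = seg
        ssd.move_to_end(seg_id, last=False)        # top of the SSD queue
        return "demoted"
    return "discarded"                             # partial or rarely requested
```

With Rmin = 2, a segment that was inserted once and never requested again is dropped without ever causing an SSD write.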

When the SSD cache is full, the element at the tail of the SSD replacement queue is removed from the queue. It is also removed from the segment hash table if it is not stored in the DRAM replacement queue.

It should be noted that, to reduce processing overhead, an entry of the segment hash table is deleted simply by setting the Active field to 0. Similarly, a DRAM segment map is deleted by removing it from the DRAM replacement queue, returning all packet buffers to the pool and setting the DRAM packet bitmap of the associated hash table entry to 0. Finally, an SSD segment map is deleted by removing it from the SSD replacement queue, returning the segment to the SSD segment pool and setting the SSD flag of the associated hash table entry to 0.
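SSD eviction and the lazy deletion of hash table entries can be sketched together. The sketch assumes that the SSD replacement queue maps a segment identifier to an SSD slot number, with the last key acting as the tail, and that the free SSD segment pool is a list; `HashEntry`, `delete_ssd_map` and `evict_ssd_tail` are hypothetical names.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class HashEntry:
    active: int = 1       # 0 marks the entry as deleted (lazy deletion)
    dram_bitmap: int = 0  # non-zero while any chunk is held in DRAM
    ssd_flag: int = 0     # 1 while the segment is held on SSD


def delete_ssd_map(seg_id, entry: HashEntry,
                   ssd_queue: OrderedDict, ssd_pool: list) -> None:
    """Drop an SSD segment map and give its slot back to the pool."""
    slot = ssd_queue.pop(seg_id)
    ssd_pool.append(slot)
    entry.ssd_flag = 0


def evict_ssd_tail(table: dict, ssd_queue: OrderedDict, ssd_pool: list):
    """Evict the tail of the SSD queue; deactivate the hash entry if unused."""
    seg_id = next(reversed(ssd_queue))   # last key = tail of the queue
    entry = table[seg_id]
    delete_ssd_map(seg_id, entry, ssd_queue, ssd_pool)
    if entry.dram_bitmap == 0:           # not in DRAM either
        entry.active = 0                 # lazy delete: only clear Active
    return seg_id
```

Clearing a flag instead of unlinking the entry keeps deletion at constant cost; the slot can then be reused by a later insertion.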

5.4 Performance evaluation

After describing the design of H2C, we now present an evaluation of its performance using trace-driven experiments.

We first investigate SSD performance in isolation and then use the insights gathered to fine-tune SSD configuration parameters and evaluate overall H2C performance. It should be noted that as our focus is on H2C performance, we do not analyse PIT and FIB lookup operations (which our prototype is capable of) that are performed after a content store miss.
