5.4 Transmission of Ethernet Frames over Extoll
5.4.5 Multicast Routing
In its most generic form, multicast networking is a type of group communication, where a data transmission is addressed to a group of network nodes simultaneously in a one-to-many or many-to-many fashion. IP multicast [155] is the IP-specific implementation of the multicast networking paradigm. It enables the sending of IP datagrams to a group of interested receivers by using specially reserved multicast address blocks in IPv4 and IPv6. Broadcasting is a special case of multicast networking, which distributes a message in a one-to-all manner.
5.4.5.1 Technical Overview
IP multicast is a real-time communication technique for sending IP datagrams in a one-to-many or many-to-many distribution over an IP infrastructure. In general, a packet is sent only once, and the network nodes (typically network switches and routers) replicate and forward the packet to reach multiple receivers. To send and receive multicast messages, senders and receivers use IP multicast group addresses. Senders use the group address as the IP destination address, while receivers use it to notify the network that they are interested in receiving packets sent to the respective multicast group. Typically, receivers join a group by utilizing the Internet Group Management Protocol (IGMP). Once receivers have joined a group, a multicast distribution tree is constructed for it.
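The roles of sender and receiver described above can be illustrated with the standard sockets API. The following is a minimal sketch, not part of the Extoll software stack: the group address 239.1.2.3 and the port are arbitrary assumptions, and the helper names are hypothetical. Setting IP_ADD_MEMBERSHIP is the point at which the host's IP stack announces the group membership via IGMP.

```python
import socket


def is_ipv4_multicast(addr: str) -> bool:
    """Return True if addr lies in the IPv4 multicast block 224.0.0.0/4."""
    first_octet = socket.inet_aton(addr)[0]
    return 224 <= first_octet <= 239


def build_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq structure: group address followed by the local
    interface address (0.0.0.0 lets the kernel choose the interface)."""
    if not is_ipv4_multicast(group):
        raise ValueError(f"{group} is not an IPv4 multicast address")
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(group: str, port: int) -> socket.socket:
    """Receiver side: bind to the port and join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group makes the IP stack report the membership via IGMP.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    return sock


def send_to_group(group: str, port: int, payload: bytes) -> None:
    """Sender side: the group address is simply the IP destination address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # A TTL of 1 keeps the datagram on the local network segment.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(payload, (group, port))
```

A receiver would call open_receiver("239.1.2.3", 5000) and then recvfrom() on the returned socket, while any sender calls send_to_group() with the same group and port. Note that the sender does not need to join the group.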
As explained in the previous section, unicast packets are delivered to a target node by setting a specific Ethernet MAC address. Broadcasts are delivered by using the broadcast MAC address, which is FF:FF:FF:FF:FF:FF. For IPv4, IP multicast packets are delivered by using the reserved MAC address range between 01:00:5E:00:00:00 and 01:00:5E:7F:FF:FF. Note that the multicast bit is set in the first octet of these MAC addresses. In the case of IPv6 multicast packets, the Ethernet MAC address is derived by taking the four low-order octets of the IPv6 address and performing a bitwise OR with the MAC address 33:33:00:00:00:00. For example, FF02:DEAD:BEEF::1:3 would be mapped to the MAC address 33:33:00:01:00:03.
5.4.5.2 Extoll Multicast Groups and Routing
The VELO functional unit is currently the only unit that can issue multicast messages from the software side. When assembling the software descriptor for a VELO packet, the multicast bit must be set to 1. The target node ID is then interpreted as the multicast group ID. The minimal data granularity of the Extoll network protocol is a cell, which is 64 bits in size. As mentioned before, the first cell of an Extoll network packet is the SOP cell. Its payload contains information about the packet, including the multicast bit, routing information (adaptive/deterministic), the traffic class, the node ID (16 bits), the target functional unit, and the VPID. The node ID is split into two parts, which results from dividing the Extoll cluster into N segments, each M nodes large. Otherwise, each routing table would require a RAM with 65,536 entries, which is too large to handle. Currently, N is set to 64 and M is set to 1024. The table for the segments is called the global routing table, and the table for the nodes inside these segments is called the local routing table.
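The effect of the two-level split on table size can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes that the segment index occupies the upper 6 bits of the 16-bit node ID and the node index within the segment the lower 10 bits; the text fixes only N = 64 and M = 1024, not the actual bit layout.

```python
NODE_ID_BITS = 16
NUM_SEGMENTS = 64          # N
NODES_PER_SEGMENT = 1024   # M

# Bits needed to index a node inside a segment and a segment inside the cluster.
LOCAL_BITS = (NODES_PER_SEGMENT - 1).bit_length()   # 10
SEGMENT_BITS = (NUM_SEGMENTS - 1).bit_length()      # 6
assert LOCAL_BITS + SEGMENT_BITS == NODE_ID_BITS


def split_node_id(node_id: int) -> tuple:
    """Split a 16-bit node ID into (segment index, local index).

    Assumption: the segment index is stored in the upper bits.
    """
    if not 0 <= node_id < (1 << NODE_ID_BITS):
        raise ValueError("node ID must fit into 16 bits")
    return node_id >> LOCAL_BITS, node_id & (NODES_PER_SEGMENT - 1)


def flat_table_entries() -> int:
    """Entries needed by a single table indexed by the full node ID."""
    return 1 << NODE_ID_BITS


def two_level_table_entries() -> int:
    """Entries needed by the global plus the local routing table."""
    return NUM_SEGMENTS + NODES_PER_SEGMENT


print(split_node_id(0x0C05))       # (3, 5)
print(flat_table_entries())        # 65536
print(two_level_table_entries())   # 1088
```

Under this assumption, node ID 0x0C05 addresses node 5 in segment 3, and the global and local tables together hold 1,088 entries instead of the 65,536 a flat table would need.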
The Extoll on-chip network switch (crossbar) connects the functional units through link ports with the network ports of the NIC, and provides hardware support for efficient multicast networking, especially for broadcasts. The routing in the network layer relies on table-based routing. Each crossbar inport has its own global, local, and multicast routing tables, but the entries in these tables are identical across all crossbar inports. The multicast routing table can distinguish up to 64 Extoll multicast groups and specifies where a multicast packet is to be forwarded. In general, the routing tables are written to the Extoll register file for every crossbar inport at network configuration time, when the EMP initializes the network.
5.4.5.3 IP Multicasts and Broadcasts over Extoll
IP multicast routing requires a network interface to be configured to listen to all link layer IP multicast group addresses. For an Ethernet interface, this is achieved by turning on the promiscuous multicast mode on the interface. When the promiscuous mode is enabled, the MAC filtering is disabled on the interface and all packets received are sent to the CPU regardless of the destination address of the packet.
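To make the notion of link layer IP multicast group addresses concrete, the following sketch derives them from IP group addresses using the mapping rules given in Section 5.4.5.1 and models the MAC filter of an Ethernet interface. It is a simplified model under stated assumptions, not driver code: the function names and the promiscuous_multicast flag are hypothetical, and the IPv4 case assumes the usual mapping of the 23 low-order address bits, which matches the reserved range 01:00:5E:00:00:00 to 01:00:5E:7F:FF:FF.

```python
import ipaddress

BROADCAST_MAC = "FF:FF:FF:FF:FF:FF"


def multicast_mac(group: str) -> str:
    """Map an IPv4 or IPv6 multicast group address to its Ethernet MAC address."""
    ip = ipaddress.ip_address(group)
    if not ip.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    if ip.version == 4:
        # The 23 low-order bits are placed into the 01:00:5E:00:00:00 block.
        mac = 0x01005E000000 | (int(ip) & 0x7FFFFF)
    else:
        # The four low-order octets are ORed with 33:33:00:00:00:00.
        mac = 0x333300000000 | (int(ip) & 0xFFFFFFFF)
    return ":".join(f"{octet:02X}" for octet in mac.to_bytes(6, "big"))


def accepts(dst_mac: str, own_mac: str, subscribed: set,
            promiscuous_multicast: bool = False) -> bool:
    """Model of a MAC filter: decide whether a frame is passed to the CPU."""
    dst = dst_mac.upper()
    if dst == own_mac.upper() or dst == BROADCAST_MAC:
        return True
    # The multicast bit is the least significant bit of the first octet.
    is_group_address = bool(int(dst[:2], 16) & 0x01)
    return is_group_address and (promiscuous_multicast or dst in subscribed)


print(multicast_mac("FF02:DEAD:BEEF::1:3"))   # 33:33:00:01:00:03
print(multicast_mac("224.0.0.1"))             # 01:00:5E:00:00:01
```

With promiscuous_multicast set, the model accepts every frame whose destination has the multicast bit set, which is what a multicast router needs. Without it, only explicitly subscribed group addresses pass the filter.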
The Extoll NIC is not an Ethernet NIC and utilizes the Extoll network protocol to transmit packets. Therefore, it does not provide any hardware support for the promiscuous multicast mode. Extoll can only route packets according to their network descriptors and the corresponding routing table entries. Consequently, all multicast and broadcast support consists of the correct configuration of the routing tables on every Extoll NIC and a software layer that identifies IP multicast group and broadcast addresses and encapsulates the corresponding packets in VELO packets with the multicast bit set. By default, the EMP configures multicast group ID 63 on every Extoll NIC so that a packet sent to this ID is broadcast to every node in the Extoll network. On the software side, the software identifies a broadcast by matching the destination against the broadcast Ethernet MAC address and encapsulates the corresponding Ethernet frame in VELO multicast messages. Apart from broadcast, EXT-Eth currently does not support IP multicast. In theory, it is possible to define multiple Extoll multicast groups, but the software would need to keep track of group members and configure the routing tables accordingly.
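The decision made by the software layer on the transmit path can be sketched as follows. This is an illustrative model, not the EXT-Eth implementation: the VeloTarget type, the select_target function, and the mac_to_node lookup table are hypothetical names introduced here, and how unicast MAC addresses are actually resolved to Extoll node IDs is not modeled. Only the broadcast MAC address, the multicast bit, and the default group ID 63 are taken from the text.

```python
from dataclasses import dataclass
from typing import Mapping

BROADCAST_MAC = bytes([0xFF] * 6)
BROADCAST_GROUP_ID = 63   # default broadcast group configured by the EMP
ETH_HEADER_LEN = 14       # destination MAC, source MAC, EtherType


@dataclass(frozen=True)
class VeloTarget:
    """Hypothetical subset of a VELO software descriptor."""
    multicast: bool   # value of the multicast bit
    target: int       # node ID, or multicast group ID if multicast is set


def select_target(frame: bytes, mac_to_node: Mapping[bytes, int]) -> VeloTarget:
    """Choose the VELO target for an outgoing Ethernet frame."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame is shorter than an Ethernet header")
    dst = frame[:6]
    if dst == BROADCAST_MAC:
        # Broadcast: multicast bit set, target interpreted as group ID 63.
        return VeloTarget(multicast=True, target=BROADCAST_GROUP_ID)
    if dst[0] & 0x01:
        # Any other group address would need its own Extoll multicast group.
        raise NotImplementedError("IP multicast is not supported, only broadcast")
    # Unicast: look up the node ID (hypothetical table, see lead-in).
    return VeloTarget(multicast=False, target=mac_to_node[dst])
```

Supporting further groups would amount to replacing the NotImplementedError branch with a lookup from group MAC address to Extoll multicast group ID, backed by matching routing table entries on every NIC.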
5 RDMA-Accelerated TCP/IP Communication
Figure 5.15: Path of an incoming packet in NAPI mode. (Diagram not reproduced. Labels recoverable from the figure: Extoll NIC with VELO and RMA, receive DMA ring (VELO) with message slots, hardware interrupt, EXN interrupt handler, netif_rx_schedule(), per-CPU poll queue, raised soft IRQ, net_rx_action, dev->poll(), ring buffer with packet descriptors, packet (skb), refill via alloc_skb(), initiate RMA GET, copy payload to skb, higher layer processing.)