4.4 In-Line Measurement Technique: The Prototype
4.4.3 Measurement modules as Linux Dynamically Loadable Kernel Modules (LKM)
4.4.3.7 Dealing with Interface and Path Maximum Transmission Unit (MTU) Issues
An inherent part of the in-line measurement technique is the insertion of additional data into appropriately selected IPv6 datagrams in the form of destination extension header options. A fundamental prerequisite for encapsulating the in-line measurement destination options within a datagram is that the additional data must not cause the overall packet size to exceed the Maximum Transmission Unit (MTU) of the first-hop link or of the (instrumented) path of the packet. If the size of an IPv6 packet exceeds the link or path MTU, it is split into multiple fragments according to the protocol specification. There are several reasons why this should be avoided, and fragmentation is discouraged for applications that can adjust their packets to the (path) MTU [DeHi98]. Packet fragmentation causes additional processing overhead that might influence a system’s networking performance at both the node performing the fragmentation and the node performing the re-assembly. Moreover, because the probability of losing a given fragment is nonzero, increasing the number of fragments decreases the probability that the overall IPv6 datagram will arrive, and at the same time it also decreases throughput due to the replication of the unfragmentable part (network headers) in every single fragment [Come00, DeHi98, Mill00]. It has also been reported that fragments have been exploited in several denial-of-service attacks and have thus become a concern for Internet service providers [ClMT98]. All these facts suggest that, if the operation of the in-line measurement modules created fragmented packets, there would be an increased probability that the instrumented traffic (fragments) would not elicit the same network response as the rest of the traffic, in addition to the overall performance degradation that might be incurred on the nodes involved in the measurement process.
151 In the case where packets are first checked against the sampling scheme, the parent population with respect to sampling is the overall IPv6 network traffic seen by the measurement module, which can be specified using a wildcard filter.
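The two fragmentation costs described above (lower delivery probability and replicated headers) can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not part of the prototype: fragments are assumed to be lost independently with the same probability, the unfragmentable part is assumed to consist only of the 40-byte fixed IPv6 header, and every function name is hypothetical.

```python
import math

IPV6_HEADER = 40      # fixed IPv6 header, assumed to be the whole unfragmentable part
FRAGMENT_HEADER = 8   # IPv6 Fragment extension header carried by every fragment


def fragment_count(payload_len, mtu):
    """Number of packets needed to carry payload_len bytes over a link with the given MTU."""
    if payload_len + IPV6_HEADER <= mtu:
        return 1  # fits unfragmented, no Fragment header needed
    # Every fragment except the last must carry a multiple of 8 octets.
    per_fragment = (mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    return math.ceil(payload_len / per_fragment)


def delivery_probability(n_fragments, loss_rate):
    """Probability that all fragments arrive, assuming independent losses."""
    return (1.0 - loss_rate) ** n_fragments


def header_overhead(payload_len, mtu):
    """Total header bytes sent on the wire for one datagram."""
    n = fragment_count(payload_len, mtu)
    if n == 1:
        return IPV6_HEADER
    return n * (IPV6_HEADER + FRAGMENT_HEADER)
```

Under these assumptions, a 1484-byte payload sent over a link with a 1500-byte MTU needs two fragments and 96 bytes of headers instead of 40, and the datagram is lost if either fragment is lost.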
In addition, unlike IPv4, fragmentation in IPv6 is performed only by the originator of the traffic, and not by any nodes along the packet’s delivery path. At the same time, as discussed in section 3.9, the in-line measurement technique can be gracefully deployed to instrument end-to-end as well as intermediate Internet paths, and the prototype implementation has been designed to facilitate both of these operational scenarios. Hence, if measurement instrumentation deployed between two (or more) intermediate nodes along a packet’s delivery path resulted in fragmentation, this would break end-to-end IPv6 compatibility.
It is recommended that IPv6 nodes implement Path MTU (PMTU) discovery to determine the minimum MTU along a packet’s delivery path, and so avoid fragmentation while maintaining high utilisation of the network resources (footnote 152). This process begins at an originator node, which first assumes that the PMTU is equal to the MTU of the first hop, and then transmits an adequately-sized packet to check whether an ICMPv6 Packet Too Big message will be received. If no such message is received, then the entire path has a minimum MTU equal to the node’s first-hop link MTU. Otherwise, the process is repeated with the originator transmitting a packet of size equal to the (reduced) value returned in the MTU field of the ICMPv6 message. The process continues until no Packet Too Big message is returned, at which time the PMTU has been discovered [Mill00]. It is recommended that the PMTU discovery process for a given Internet path should be repeated relatively infrequently (footnote 153).
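The discovery loop described above can be sketched as a small simulation. This is an illustrative model only, not the kernel's implementation: the path is assumed to be given as a list of per-hop link MTUs, the first link that cannot carry a probe is assumed to answer with a Packet Too Big message reporting its own MTU, and the function name is hypothetical.

```python
def discover_pmtu(link_mtus):
    """Simulate PMTU discovery over a path described by its per-hop link MTUs.

    Returns the discovered PMTU and the number of probe packets sent.
    """
    if not link_mtus:
        raise ValueError("path must contain at least one link")

    pmtu = link_mtus[0]   # initial assumption: PMTU equals the first-hop MTU
    probes = 0
    while True:
        probes += 1
        # The first link too small for the probe reports its MTU
        # (stands in for the ICMPv6 Packet Too Big message).
        reported = next((mtu for mtu in link_mtus if mtu < pmtu), None)
        if reported is None:
            return pmtu, probes   # probe crossed the whole path
        pmtu = reported           # retry with the reduced size
```

For a hypothetical path with link MTUs of 1500, 1480, 1500 and 1280 bytes, the sketch sends three probes (1500, 1480 and 1280 bytes) before settling on a PMTU of 1280.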
152 Minimal IPv6 implementations may omit PMTU discovery and be restricted to sending packets no larger than 1280 octets, which is the minimum link MTU for IPv6 [DeHi98].
153 A decrease in PMTU can be discovered almost immediately once a large enough packet is sent over that path. When a PMTU value has not been decreased for some time (on the order of ten minutes), the PMTU estimate should be set to the MTU of the first-hop link, which will cause the entire PMTU discovery process to take place again [McDM96].
In Linux, the discovered PMTU value is stored in the corresponding destination cache entry, which is in turn linked to the socket buffer that manages each associated datagram (Figure 4-4) [WePR05]. After a packet passes the sampling and filtering checks and before it is instrumented, each in-line measurement source module examines whether the addition of the measurement data would result in the overall packet size exceeding the first-hop link MTU or the path MTU. If either of these conditions evaluates to true, then the measurement options are not created and the packet is forwarded without being instrumented.
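The size check performed by the source module can be sketched as follows. This is a simplified user-space model, not the LKM code: the `Packet` class and the `can_instrument` function are hypothetical names, and `option_len` is assumed to be the full size of the destination options extension header that would be inserted, including any padding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    length: int                      # total packet size in bytes, IPv6 header included
    link_mtu: int                    # MTU of the first-hop (outgoing) link
    path_mtu: Optional[int] = None   # PMTU from the destination cache, if discovered


def can_instrument(pkt: Packet, option_len: int) -> bool:
    """Return True if adding option_len bytes keeps the packet within both MTUs."""
    limit = pkt.link_mtu
    if pkt.path_mtu is not None:
        limit = min(limit, pkt.path_mtu)
    return pkt.length + option_len <= limit
```

A packet for which `can_instrument` returns False would be forwarded unchanged, which matches the behaviour described above.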
An alternative implementation of the in-line measurement technique that would be more integrated with systems’ protocol stacks would normally communicate its space requirements to the packetisation layers through some kernel variable, and would enforce space reservation for the in-line measurement options, as happens for the transport and network layer protocol headers. However, due to the self-contained nature of this particular prototype implementation, as well as the place in the final output function (footnote 154) of the IPv6 instance where a source measurement module operates (sections 4.4.2.2 and 4.4.3.1), such space requirements are not communicated to the kernel, and the source LKM does not instrument packets if doing so could cause an MTU violation and/or fragmentation. This limitation might not be a problem for the majority of packets in the Internet, since it has been reported that almost 75% of packets are smaller than 552 bytes (footnote 155). However, the same study showed that over half of the bytes are carried in packets of size 1500 bytes or larger [ClMT98]. This is particularly true of bulk TCP flows, which try to maximise network utilisation and application throughput by sending their data in segments as large as possible without requiring fragmentation along an Internet path, whereas flows that exhibit interactivity properties and/or are sensitive to burst drops and delay variation use small frames.
154 By the time the IP6_OUTPUT_PACKETS hook passes a datagram to the source measurement LKM, the packetisation layers have already computed the space available to application-level data.
155 Although the nature of Internet traffic is changing due to the introduction and increasing popularity of new application-traffic flows, such as streaming media, peer-to-peer and online gaming, the figures of packet size distributions seem to persist. In general, there is a slight increase of UDP traffic whose popular packet sizes are on the order of a few hundred bytes (e.g. media streaming applications have been reported to generate 821 and 825-byte packets) [FrML03, FoKM04].
TCP uses one of its options to let both ends of a connection agree on the maximum segment they can transfer without creating packets whose total size will exceed the link or path MTU. The Maximum Segment Size (MSS) option is exchanged between two end-systems during the TCP connection establishment process using the three-way handshake, as shown in Figure 4-10. Each end-system advertises its MSS value in its first synchronisation (SYN) message, indicating the maximum segment it is willing to accept. Of course, not all segments across a connection have to be of the same size; however, bulk transfer connections will attempt to perform packetisation to create MSS-sized segments (footnote 156).
Figure 4-10: Sequence of Messages in TCP three-way Handshake
At the time of development, the advertised MSS of a system was statically computed within the Linux IPv6 instance to equal the first-hop link MTU (or the PMTU; see footnote 157) minus the standard size of the IPv6 and TCP headers. This could prove restrictive for in-line measurement modules wishing to instrument bulk transfer flows with measurement data, since most of their segments would be of a prohibitive size. For the purposes of the prototype implementation, this issue has been circumvented by configuring the source measurement module to decrease the MSS value advertised in TCP SYN packets, if necessary. If the relevant module parameter is set, the LKM examines the TCP SYN packets that satisfy the sampling and filtering criteria, and it replaces the MSS value found in the transport header options field with one that reserves space for the corresponding in-line measurement header (footnote 158). This solution obviously breaks the strict layering of the Internet protocols and would not be advisable for commercial products. However, the issue of properly reserving space for IPv6 extension headers and options should be considered as IPv6 implementations evolve.
Furthermore, altering the TCP MSS value to accommodate space for in-line measurement headers is useful only as long as it is deployed in both directions of a TCP connection. Decreasing the MSS advertised by a system in this manner would only force the other end of the connection to clamp the segments it sends to this value. It would not do anything to prevent the system itself from sending segments that leave no space for the insertion of the in-line measurement headers, unless an adequate MSS value has been advertised by the other end as well. Edge-to-edge deployment of the in-line measurement technique over a network of adequate MTU might circumvent this problem altogether. However, it needs to be guaranteed that upon exit from the network border, the measurement headers will be removed from the packet by an appropriate processing entity.
156 Studies of TCP performance over wide area networks have shown that TCP throughput is directly proportional to the MSS [MaSM97]. This is one of the reasons for which an increase of the MTU sizes in the Internet has been suggested [Math05, Dyks99].
157 During TCP connection establishment the PMTU Discovery process for the corresponding Internet path might not have been completed. If the PMTU value is available, then it is used to compute the MSS, since it will always be less than or equal to the first-hop link MTU.
158 A similar mechanism to specify a user-configurable MSS value for TCP connections (mainly to circumvent PMTU discovery issues) is provided by a Netfilter/iptables module [Hube05]. However, at the time of development, it was only implemented for the IPv4 instance of the Linux kernel.
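The MSS arithmetic and the rewriting of the MSS option in a SYN segment can be sketched as follows. This is a user-space illustration under stated assumptions, not the prototype's kernel code: it assumes the standard 40-byte IPv6 header and a 20-byte TCP header without options, operates on the raw TCP option bytes only, and uses hypothetical function names. The parameter `reserve` stands for the size of the in-line measurement header.

```python
import struct

IPV6_HEADER = 40   # fixed IPv6 header size in bytes
TCP_HEADER = 20    # TCP header size without options
TCP_OPT_EOL, TCP_OPT_NOP, TCP_OPT_MSS = 0, 1, 2


def default_mss(mtu: int) -> int:
    """MSS derived from the link MTU (or PMTU) minus the standard headers."""
    return mtu - IPV6_HEADER - TCP_HEADER


def clamp_mss(options: bytes, mtu: int, reserve: int) -> bytes:
    """Return a copy of the TCP options with the MSS lowered, if necessary,
    so that `reserve` extra bytes still fit within `mtu`."""
    target = default_mss(mtu) - reserve
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCP_OPT_EOL:
            break                      # end of option list
        if kind == TCP_OPT_NOP:
            i += 1                     # single-byte padding option
            continue
        if i + 1 >= len(out):
            break                      # truncated option, stop parsing
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break                      # malformed option, stop parsing
        if kind == TCP_OPT_MSS and length == 4:
            (mss,) = struct.unpack_from("!H", out, i + 2)
            struct.pack_into("!H", out, i + 2, min(mss, target))
        i += length
    return bytes(out)
```

With a 1500-byte MTU and a hypothetical 24-byte measurement header, an advertised MSS of 1440 would be rewritten to 1416, while an MSS that is already smaller would be left untouched. Because the TCP checksum covers the header options, a real implementation would also have to update the checksum after changing the value.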