Chapter 4 A real world trace study
4.4 Program output
Originally, TAM had the following flow output:
• Start time (seconds)
• Start time (milliseconds)
• Stop time (seconds)
• Stop time (milliseconds)
• Source IP address
• Source Port
• Destination IP address
• Destination Port
• Number of bytes transferred based on the IP packet length
• Number of packets transferred
• IP protocol
• Recognized application for the flow.
After our enhancements to TAM, the following fields are appended to its output:
• Length of the payload of the packet
• Number of packets in the flow that carry a non-zero payload
• Payload length calculated using TCP sequence numbers (valid only for TCP flows; zero otherwise)
• Direction of the flow (valid only in identified data flows; zero otherwise)
• Flow ID, which is either the PDP context ID or the RADIUS ID, depending on the interface.
Moreover, in identified flows only, the following three fields are appended to the output:
• IMSI
• APN
• MSISDN
Since the new TAM output requires a new flow structure in TAM that is used only in the Gn and Gi interfaces, the output is displayed only when these interfaces are used.
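The enhanced record layout listed above can be sketched as a small data structure. This is only an illustration in Python, not part of TAM: the class name `TamFlow`, the attribute names, the field order (taken from the order of the lists above), and the assumption that the numeric fields are decimal integers are all hypothetical. The source does not specify the field separator, so the sketch takes fields that have already been split.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TamFlow:
    """Hypothetical view of one unidirectional TAM flow record."""
    start_s: int
    start_ms: int
    stop_s: int
    stop_ms: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    ip_bytes: int          # bytes based on the IP packet length
    packets: int
    ip_proto: str
    application: str
    payload_bytes: int     # sum of per-packet payload lengths
    payload_packets: int   # packets carrying a non-zero payload
    tcp_seq_payload: int   # zero for non-TCP flows
    direction: int         # zero for unidentified flows
    flow_id: str           # PDP context ID or RADIUS ID
    imsi: Optional[str] = None    # the last three exist only in identified flows
    apn: Optional[str] = None
    msisdn: Optional[str] = None

    @classmethod
    def from_fields(cls, values):
        """Build a record from already-split string fields (17 or 20 of them)."""
        if len(values) not in (17, 20):
            raise ValueError(f"expected 17 or 20 fields, got {len(values)}")
        specs = fields(cls)
        converted = [int(raw) if spec.type is int else raw
                     for spec, raw in zip(specs, values)]
        return cls(*converted)

    def duration(self):
        """Flow duration in seconds, assuming the millisecond fields hold
        the sub-second part of each timestamp (an assumption)."""
        return (self.stop_s - self.start_s) + (self.stop_ms - self.start_ms) / 1000
```

A record with 17 fields is treated as an unidentified flow, and its `imsi`, `apn`, and `msisdn` attributes stay `None`.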
To process the TAM output, Perl and shell scripts were considered. Perl was superior in both performance and capabilities, so it was selected. Gnuplot is used to produce the graphs presented in Section 4.6.
Since the final output must contain a single flow for both directions, the TAM output is processed to aggregate the two directions into one. The Perl script that aggregates the unidirectional TAM flows uses the flow ID and the direction of each flow to aggregate the flows consistently. However, in unidentified flows the direction and flow ID cannot be used. In this case, the flows still consist of bidirectional traffic, but the direction of the traffic is not known.
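A minimal sketch of this aggregation step follows. It is written in Python for illustration and is not the Perl script itself. The record layout `Flow`, the direction encoding (`UPLINK = 1`, `DOWNLINK = 2`, zero for unidentified flows), and the choice to orient unidentified flows by whichever unidirectional flow starts first are assumptions.

```python
from collections import namedtuple

# Hypothetical, reduced record layout; the real TAM output carries more fields.
Flow = namedtuple("Flow", "start src sport dst dport proto nbytes direction flow_id")

UPLINK, DOWNLINK = 1, 2   # assumed encoding; 0 means the flow is unidentified


def aggregate(flows):
    """Merge unidirectional flows into bidirectional ones.

    Returns {key: {"first", "second", "fwd", "rev"}} where, for identified
    flows, "first" is the user endpoint and "fwd" counts uplink bytes.
    """
    merged = {}
    for f in sorted(flows, key=lambda f: f.start):
        a, b = (f.src, f.sport), (f.dst, f.dport)
        if f.flow_id and f.direction in (UPLINK, DOWNLINK):
            # Identified flow: the direction tells us which end is the user.
            first, second = (a, b) if f.direction == UPLINK else (b, a)
            key = (f.flow_id, first, second, f.proto)
        else:
            # Unidentified flow: order the endpoints so that A->B and B->A
            # share one key; orientation follows the flow that started first.
            first, second = a, b
            key = (0, *sorted((a, b)), f.proto)
        rec = merged.setdefault(
            key, {"first": first, "second": second, "fwd": 0, "rev": 0}
        )
        forward = a == rec["first"]
        rec["fwd" if forward else "rev"] += f.nbytes
    return merged


example = [
    Flow(10.0, "10.0.0.1", 4000, "1.2.3.4", 80, 6, 500, UPLINK, 7),
    Flow(10.2, "1.2.3.4", 80, "10.0.0.1", 4000, 6, 9000, DOWNLINK, 7),
    Flow(11.0, "5.5.5.5", 53, "6.6.6.6", 1234, 17, 120, 0, 0),
    Flow(10.5, "6.6.6.6", 1234, "5.5.5.5", 53, 17, 60, 0, 0),
]
merged = aggregate(example)
```

In the example, the two identified flows collapse into one record oriented as user to server, while the two unidentified flows collapse into one record whose "forward" side is simply the flow that started first, which reflects that the true direction is not known.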
The script first sorts the TAM output and then operates similarly to TAM, keeping the flows in a hash table and flushing the expired flows to a file. To detect expired flows, the timeout value used in TAM is required as a script parameter. The script output is also sorted. Moreover, the script can aggregate flows using a larger timeout value than the one used in TAM, which is a quick way to see how the output is affected by the timeout value. This ability was checked by taking a TAM output produced with a timeout of 64 seconds, re-aggregating it with the script to a timeout of 120 seconds, and comparing the result to a TAM output produced with a timeout of 120 seconds. The results were similar, with the same number of flows and some small differences:
• the script cannot examine the TCP sequence numbers so in TCP flows it is expected to miss retransmitted TCP traffic
• internally TAM uses a function in order to keep the application of the two unidirectional flows consistent, while the script does not consider this and as a result the application of some flows may be different26
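The timeout-driven hash table described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the Perl script: the record layout `(key, start, stop, nbytes)`, the function name `merge_with_timeout`, and the rule that a flow expires when the gap since its last stop time strictly exceeds the timeout are all hypothetical.

```python
def merge_with_timeout(records, timeout):
    """Merge flow records that share a key and are separated by at most
    `timeout` seconds of inactivity.

    records: iterable of (key, start, stop, nbytes) tuples.
    Returns a list of (key, start, stop, nbytes) sorted by start time.
    """
    table = {}   # key -> [start, stop, nbytes] for flows still active
    done = []    # expired flows, i.e. what the script would write to a file
    for key, start, stop, nbytes in sorted(records, key=lambda r: r[1]):
        # Flush every flow that has been idle for longer than the timeout.
        # A linear scan keeps the sketch short; it is not efficient.
        expired = [k for k, e in table.items() if start - e[1] > timeout]
        for k in expired:
            done.append((k, *table.pop(k)))
        entry = table.get(key)
        if entry is None:
            table[key] = [start, stop, nbytes]
        else:
            entry[1] = max(entry[1], stop)
            entry[2] += nbytes
    # End of input: everything still in the table is flushed.
    done.extend((k, *e) for k, e in table.items())
    done.sort(key=lambda r: r[1])
    return done


records = [("A", 0, 10, 100), ("A", 90, 95, 50), ("B", 5, 6, 10)]
flows_64 = merge_with_timeout(records, 64)
flows_120 = merge_with_timeout(records, 120)
```

With the sample records, the two "A" records are 80 seconds apart, so they remain separate flows with a 64-second timeout and merge into one flow with a 120-second timeout. This mirrors how rerunning the script with a larger timeout changes the number of flows.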
26 The application field may also differ when the script uses the same timeout value as TAM. The reason is that a bidirectional flow may consist of more than one unidirectional flow (the flow expired in only one direction in TAM), and one of them may have a different application, e.g. because it was too short.

The script outputs the bidirectional flows in a consistent format, as follows (note that the script output is slightly different for unidentified flows):
• Start time (seconds)
• Start time (milliseconds)
• Stop time (seconds)
• Stop time (milliseconds)
• in identified flows
  ◦ user IP address
  ◦ user Port
  ◦ server IP address
  ◦ server Port
  ◦ Downlink payload length calculated via the IP length
  ◦ Downlink number of packets
  ◦ Downlink payload length calculated via the payload length of each packet
  ◦ Downlink number of packets that have a non-zero payload
  ◦ Downlink payload length calculated using TCP sequence numbers
  ◦ Uplink payload length calculated via the IP length
  ◦ Uplink number of packets
  ◦ Uplink payload length calculated via the payload length of each packet
  ◦ Uplink number of packets that have a non-zero payload
  ◦ Uplink payload length calculated using TCP sequence numbers
• in unidentified flows
  ◦ source IP address of the packet that started the flow
  ◦ source Port of the packet that started the flow
  ◦ destination IP address of the packet that started the flow
  ◦ destination Port of the packet that started the flow
  ◦ Src to Dest payload length calculated via the IP length
  ◦ Src to Dest number of packets
  ◦ Src to Dest payload length calculated via the payload length of each packet
  ◦ Src to Dest number of packets that have a non-zero payload
  ◦ Src to Dest payload length calculated using TCP sequence numbers
  ◦ Dest to Src payload length calculated via the IP length
  ◦ Dest to Src number of packets
  ◦ Dest to Src payload length calculated via the payload length of each packet
  ◦ Dest to Src number of packets that have a non-zero payload
  ◦ Dest to Src payload length calculated using TCP sequence numbers
• IP protocol
• Recognized application for the flow
• Flow ID
• IMSI
• APN
• MSISDN
(in unidentified flows the latter three fields are empty)
The flow ID is necessary when processing the output of TAM in order to find the traffic transferred per PDP context or RADIUS session.
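As an illustration of this use of the flow ID, the sketch below sums downlink and uplink bytes per flow ID. It is a Python example under assumptions: the function name `traffic_per_flow_id` and the reduced input layout `(flow_id, downlink_bytes, uplink_bytes)` are hypothetical, and which of the payload-length fields to sum is left to the caller.

```python
from collections import defaultdict


def traffic_per_flow_id(flows):
    """Sum downlink and uplink bytes per flow ID.

    flows: iterable of (flow_id, downlink_bytes, uplink_bytes) tuples,
    one per bidirectional flow.
    Returns {flow_id: (total_downlink, total_uplink)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for flow_id, downlink_bytes, uplink_bytes in flows:
        totals[flow_id][0] += downlink_bytes
        totals[flow_id][1] += uplink_bytes
    return {fid: tuple(t) for fid, t in totals.items()}


per_context = traffic_per_flow_id([
    ("ctx-1", 9000, 500),
    ("ctx-2", 100, 40),
    ("ctx-1", 1000, 200),
])
```

Each key of the result corresponds to one PDP context or RADIUS session, depending on the interface the trace was captured on.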