Chapter 4 A real world trace study
4.5 Verification – Comparison of programs
4.5.1 CFlow
CFlow was an experimental tool operating on the Gn interface that aggregated packets into flows, supported the reassembly of IP packets, analyzed the GTP protocol, and associated the IMSI with the data flows. Control packets were not included in the program output. The program detected the user application via a payload evaluation approach.
Comparing the results of TAM with those of CFlow on Trace3 showed that TAM's default timeout of 64 seconds was a main cause of the differences. CFlow used a larger timeout value and, as a result, reported fewer flows than TAM, each with a longer duration. We were later informed that the CFlow timeout value is 2 minutes. The comparison results below are therefore based on a TAM timeout value of 120 seconds.
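The effect of the timeout on the flow count can be illustrated with a minimal sketch. It is not the implementation of TAM or CFlow; the function name `split_into_flows` and the packet timestamps are hypothetical, and the only assumption is that a flow ends when no packet arrives within the idle timeout.

```python
def split_into_flows(timestamps, idle_timeout):
    """Group the packet timestamps (in seconds) of one connection into flows.

    A new flow starts whenever the gap to the previous packet exceeds
    idle_timeout. Returns a list of (start, stop) pairs.
    """
    flows = []
    for ts in sorted(timestamps):
        if flows and ts - flows[-1][1] <= idle_timeout:
            flows[-1] = (flows[-1][0], ts)  # extend the current flow
        else:
            flows.append((ts, ts))          # gap too long: start a new flow
    return flows


# Hypothetical arrival times containing a 90-second quiet period.
packets = [0, 10, 20, 110, 115]

flows_64 = split_into_flows(packets, 64)    # 64-second timeout
flows_120 = split_into_flows(packets, 120)  # 120-second timeout

print(len(flows_64), flows_64)    # 2 [(0, 20), (110, 115)]
print(len(flows_120), flows_120)  # 1 [(0, 115)]
```

With the shorter timeout the same packets yield two short flows, while the longer timeout yields one flow of longer duration, which matches the direction of the difference observed between the two programs.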
The program performed similarly to TAM but it had some fundamental differences:
• determined the packet payload solely from the IP length field.
• determined the packet count by counting all packets seen.
• reassembled IP fragments, whereas TAM skipped them.
• supported only TCP and UDP (no ICMP or other protocols following IP).
• skipped control traffic.
• did not show the APN and MSISDN fields.
• differed in many flow application labels, mainly because the two programs use different application names (e.g. TAM: http, CFlow: web). It is not easy to judge which program was correct here; for more information see Table 5. However, CFlow probably uses port-based application recognition together with payload inspection, since the program classified a flow carrying no payload as email (the flow was on TCP port 993, which is assigned to IMAP over TLS). In this case TAM classified the flow as NonPayload.
• sometimes differed in the millisecond fields of the flow start and stop times, perhaps because the program also counted the times of IP fragments, which TAM skips.
• differed in the payload bytes, probably because TAM skips fragments. In one flow, though, CFlow showed one packet fewer than TAM. This is considered a case of unsuccessful IP fragment reassembly due to missing fragments.
• was missing some flows that were present in TAM. The reason is unknown.
• identified some flows that had traffic before the control packets could be detected (flows identified from SGSN response messages), so it perhaps maintained additional state information and made multiple passes over the data.
• completely missed the identification of one flow. TAM split that flow into two flows so that the latter one could be associated with the IMSI.
• did not make exactly clear how it displayed the flow information, particularly the order of the IP addresses and ports and the traffic direction. TAM is strict about how it displays the IP addresses: for unidentified flows the IP address that started the transmission of data is listed first, while for identified flows the first IP address is always the user IP. For traffic statistics, TAM orders the traffic of unidentified flows according to which IP address appears first, and for identified flows it always shows the amount of downlink traffic first. CFlow does not keep a constant presentation of IP addresses: sometimes the user address appears first and sometimes second, for both identified and unidentified flows. As a result, it may be difficult to calculate the downlink and uplink traffic of a flow when the two directions are not known.
• keeps the data in one flow even if the flow contains multiple PDP contexts. TAM splits such a flow into one flow per PDP context.
• TAM has 10000 more identified flows, although the PDP context splitting described in the previous case could explain this. When only TCP and UDP data messages are considered and the application field and the two millisecond fields are ignored, the outputs differ in only 40000 identified flows out of a total of about 700000.
• TAM finds 1300 IMSIs that do not appear in the CFlow output, and CFlow finds 140 IMSIs that do not appear in the TAM output. In one case studied, the programs identified a flow with different IMSIs; examining the actual data showed that TAM was correct.
• Sometimes CFlow identifies a flow with 2 IMSIs. This occurs when two users communicate with each other. In this case TAM splits the flow into two flows, each dedicated to the traffic of one user.
• There are cases where the IMSI identification differs between the two programs. At least 4 such cases were detected. Examination of the actual data showed that TAM had selected the correct IMSI to classify the flow. It is unknown why CFlow displayed another (wrong) IMSI for these flows.
• In one case where a user talks to itself (it is unknown why, but it happened), CFlow shows 1 flow whereas TAM splits the flow into two flows.
• A positive difference for CFlow is that it detects the WAP protocol. TAM does not, because WAP is found mostly in GPRS data and TAM targets application recognition for the Internet.
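Two steps of the comparison described in the list above can be sketched in code: bringing both outputs to a common endpoint order, and matching flows while ignoring the application label and the millisecond fields. This is only a sketch under assumptions. The record layout `Flow`, its field names, the helper names `comparison_key` and `compare`, and all sample values are hypothetical and do not reproduce the actual output format of either program.

```python
from collections import namedtuple

# Hypothetical flow record; start_s is the start time in whole seconds,
# so the millisecond part is already dropped.
Flow = namedtuple("Flow", "ip_a port_a ip_b port_b proto start_s app")


def comparison_key(flow):
    """Build a key that ignores endpoint order and the application label."""
    ends = sorted([(flow.ip_a, flow.port_a), (flow.ip_b, flow.port_b)])
    return (ends[0], ends[1], flow.proto, flow.start_s)


def compare(tam_flows, cflow_flows):
    """Return the keys seen only in TAM and the keys seen only in CFlow."""
    tam = {comparison_key(f) for f in tam_flows}
    cfl = {comparison_key(f) for f in cflow_flows}
    return tam - cfl, cfl - tam


# Hypothetical sample data: the first flow appears in both outputs, but
# with swapped endpoints and a different application name.
tam_out = [
    Flow("10.0.0.1", 1025, "192.0.2.7", 80, "TCP", 100, "http"),
    Flow("10.0.0.2", 1030, "192.0.2.9", 993, "TCP", 200, "NonPayload"),
]
cflow_out = [
    Flow("192.0.2.7", 80, "10.0.0.1", 1025, "TCP", 100, "web"),
]

only_tam, only_cflow = compare(tam_out, cflow_out)
print(len(only_tam), len(only_cflow))  # 1 0
```

Sorting the two endpoints removes the dependence on which address a program prints first, so a flow listed as user-to-server in one output and server-to-user in the other still matches. The sketch does not recover the uplink and downlink direction, which requires knowing which endpoint is the user.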