PACKET CAPTURES

Data Sources

I am starting the discussion of data sources at the very bottom of the network stack. A network packet is physically received by the network interface. From there, it is passed to the operating system. The network driver in the operating system is responsible for decoding the information and extracting the link-layer headers (e.g., the Ethernet header). From there, the packet is analyzed layer by layer to pass it up in the network stack from one protocol handler to the next. Packet captures are recorded right when the network packet is handed from the network interface to the operating system.

There are multiple benefits to this approach as opposed to intercepting a packet higher up in the network stack. First of all, we get all the data the host sees. There is no layer in the path that could filter or discard anything. We get the complete packet, including the entire payload. The biggest disadvantage is that no higher-layer intelligence is applied to the traffic to interpret it. This means that we will not know, based on looking at the network traffic, how the destination application interpreted the packet. We can only make an educated guess. The second disadvantage is that the amount of data can be very large, especially if we collect traffic at a chokepoint in the network.

6 AfterGlow (http://afterglow.sourceforge.net) ships with such a parser; it can be found in the parser directory and is called tcpdump2csv.pl.

Various tools can be used to collect network traffic. The two most common ones are Wireshark⁷ and tcpdump.⁸ These tools listen on the network interface and display the traffic. Both tools take the raw network traffic and analyze the entire packet to decode the individual network protocols. They then display the individual header options and fields in a more or less human-readable form rather than the original binary format. Wireshark provides a graphical user interface to explore the network packets. It also ships with a command-line tool called tshark. Tcpdump is a command-line-only tool. Although Wireshark's protocol-decoding capabilities are slightly better than those of tcpdump, I find myself using tcpdump more often.

Commonly, network traffic needs to be recorded for later analysis. The most common format for packet captures is the PCAP format. Most sniffers (or network traffic analysis tools) can read this binary format.
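To give a feel for what "binary format" means here, the following is a minimal sketch that reads the 24-byte global header at the start of a classic PCAP file. The function name parse_pcap_global_header is made up for this illustration. The sketch assumes the classic microsecond-resolution format (magic number 0xa1b2c3d4) and does not handle the nanosecond variant or pcapng files.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap, microsecond timestamps


def parse_pcap_global_header(data: bytes) -> dict:
    """Decode the 24-byte global header of a classic PCAP file.

    Hypothetical helper for illustration; only the classic format is handled.
    """
    if len(data) < 24:
        raise ValueError("a PCAP global header is 24 bytes long")

    # The writer stores the magic number in its native byte order,
    # so we try both orders to find out how to read the other fields.
    for endian, name in (("<", "little"), (">", "big")):
        magic = struct.unpack(endian + "I", data[:4])[0]
        if magic == PCAP_MAGIC:
            break
    else:
        raise ValueError("not a classic PCAP file")

    # magic, version major/minor, timezone offset, timestamp accuracy,
    # snapshot length, link-layer type
    _, major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
        endian + "IHHiIII", data[:24]
    )
    return {
        "byte_order": name,
        "version": (major, minor),
        "snaplen": snaplen,    # maximum number of bytes stored per packet
        "linktype": linktype,  # 1 means Ethernet
    }


if __name__ == "__main__":
    # Build a header in memory instead of reading a real capture file.
    header = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    print(parse_pcap_global_header(header))
```

The snaplen field records the capture length that was in effect when the file was written, so it is a quick way to check whether a capture contains full packets or only truncated ones.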

When using tcpdump, remember a few important things:

• Change the default capture length from 68 to 0. This will capture the entire packet and not just the first 68 bytes. Those 68 bytes are generally enough to read the Ethernet, IP, and TCP/UDP headers, but a lot of times you want more than just that. The command to execute is tcpdump -s 0.

• Disable name resolution to make the capture faster. The parameter to use is tcpdump -nn. This will turn off host as well as port resolution.

• Make your output line buffered. This means that tcpdump will output each line on the console as soon as network traffic is recorded, instead of waiting for its internal buffer to fill up. This can be done by running tcpdump -l.
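The three options from the list above can be combined into a single command line. The following sketch only assembles the argument list; it does not start a capture, which would normally require elevated privileges. The helper name build_tcpdump_command and the interface name eth0 are assumptions made for this example. The -i (interface) and -w (write to file) options are standard tcpdump options that the text has not introduced.

```python
import shlex


def build_tcpdump_command(interface=None, pcap_file=None):
    """Return a tcpdump argument list with the three recommended options.

    Hypothetical helper for illustration; it builds the command only.
    """
    cmd = [
        "tcpdump",
        "-s", "0",  # capture entire packets, not only the first bytes
        "-nn",      # no host name and no port name resolution
        "-l",       # line-buffered output, so lines show up immediately
    ]
    if interface is not None:
        cmd += ["-i", interface]   # listen on a specific interface
    if pcap_file is not None:
        cmd += ["-w", pcap_file]   # write raw packets to a PCAP file
    return cmd


if __name__ == "__main__":
    # "eth0" is an assumed interface name; adjust it to the local system.
    print(shlex.join(build_tcpdump_command(interface="eth0")))
```

The resulting list can be handed to subprocess.Popen to read the decoded lines as they arrive. When -w is used, tcpdump writes binary PCAP data instead of text, so -l has no visible effect.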

7 www.wireshark.org
8 www.tcpdump.org

What is the actual data contained in packet captures that is of interest for visualization and analysis? The following list shows the typical types of information that you can extract from packet captures and their meaning. The circled numbers mark where each item appears in the sample capture that follows the list:

• Timestamp ①: The time the packet was recorded.

• IP addresses ②: The addresses show the communication endpoints that generated the traffic.

• Ports ③: Network ports help identify what service is used on the network.

• TCP flags ④: The flags can be used to verify what stage a connection is in. Often, looking at the combination of flags can identify simple attacks on the transport layer.

• Ethernet addresses ⑤: Ethernet addresses reveal the setup of the local network.

• Packet size ⑥: Packet size indicates the total size of the packet that was transmitted.

A sample packet capture, recorded with tcpdump, looks like this:

① 18:57:35.926420 ⑤ 00:0f:1f:57:f9:ef > ⑤ 00:50:f2:cd:ce:04, ethertype IPv4 (0x0800), length ⑥ 62: ② 192.168.2.38.③ 445 > ② 192.168.2.37.③ 4467: ④ S 2672924111:2672924111(0) ④ ack 1052151846 win 64240 <mss 1460,nop,nop,sackOK>
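To use these fields in a visualization, the text output has to be turned into structured records. The following is a minimal sketch of such a parser, applied to the sample capture above with the callout numbers removed. The regular expression and the function name parse_tcpdump_line are assumptions for this illustration. They cover only this one output layout (IPv4 over Ethernet with link-level headers printed) and are not a general tcpdump parser.

```python
import re

# The sample record from the text, without the callout numbers.
SAMPLE = (
    "18:57:35.926420 00:0f:1f:57:f9:ef > 00:50:f2:cd:ce:04, "
    "ethertype IPv4 (0x0800), length 62: "
    "192.168.2.38.445 > 192.168.2.37.4467: "
    "S 2672924111:2672924111(0) ack 1052151846 win 64240 "
    "<mss 1460,nop,nop,sackOK>"
)

# Assumed layout: time, MAC > MAC, ethertype, length, ip.port > ip.port, flags
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}), "
    r"ethertype IPv4 \(0x0800\), length (?P<length>\d+): "
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)\.(?P<src_port>\d+) > "
    r"(?P<dst_ip>\d+\.\d+\.\d+\.\d+)\.(?P<dst_port>\d+): "
    r"(?P<flags>[A-Z.]+) "
)


def parse_tcpdump_line(line):
    """Return the fields of one tcpdump line as a dict, or None if no match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    for key in ("length", "src_port", "dst_port"):
        record[key] = int(record[key])
    # In this output style, an acknowledgment is printed as "ack <number>".
    record["has_ack"] = " ack " in line
    return record


if __name__ == "__main__":
    for field, value in parse_tcpdump_line(SAMPLE).items():
        print(f"{field}: {value}")
```

The flag S together with an ack number indicates the second step of the TCP handshake, which is the kind of flag combination the list above refers to.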

Sometimes it is interesting to dig deeper into the packets and extract some other fields. This is especially true when analyzing higher-level protocols inside the TCP packets. You could potentially even extract user names. Wireshark, for example, extracts user names from instant messenger traffic. Be careful when you are doing your visualizations based on network traffic. You will run into the source/destination confusion that I mentioned earlier.

For traffic analysis, I tend to use tshark rather than tcpdump because of its more advanced protocol analysis. Tshark interprets a fair number of application layer protocols, such as instant messenger protocols, as you can see in this sample capture:

0.000000 192.168.0.3 -> 255.255.255.255 UDP Source port: 4905 Destination port: 4905

1.561313 192.168.0.12 -> 207.46.108.72 MSNMS XFR 13 SB

1.595912 207.46.108.72 -> 192.168.0.12 MSNMS XFR 13 SB 207.46.27.163:1863 CKI 11999922.22226123.33471199

1.596378 192.168.0.12 -> 207.46.108.72 TCP 51830 > 1863 [ACK] Seq=11 Ack=62 Win=65535 Len=0 TSV=614503137 TSER=8828236

1.968203 192.168.0.12 -> 207.46.27.163 TCP 52055 > 1863 [SYN] Seq=0 Len=0 MSS=1460 WS=3 TSV=614503140 TSER=0

2.003898 207.46.27.163 -> 192.168.0.12 TCP 1863 > 52055 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=0 TSV=0 TSER=0

2.003980 192.168.0.12 -> 207.46.27.163 TCP 52055 > 1863 [ACK] Seq=1 Ack=1 Win=524280 Len=0 TSV=614503141 TSER=0

2.004403 192.168.0.12 -> 207.46.27.163 MSNMS USR 1 [email protected] 1111111111.77777777.6666699

2.992735 192.168.0.12 -> 207.46.27.163 MSNMS [TCP Retransmission] USR 1 [email protected] 1111111111.77777777.6666699

As you can see in this example, tshark explicitly calls out the users communicating over instant messenger. Tcpdump does not contain that level of information.
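Because tshark prints one summary line per packet, with the decoded protocol in a fixed position, the output is easy to turn into records for a graph. The following sketch parses a few of the lines from the sample capture and counts protocols and source/destination pairs. The regular expression and the helper name parse_summary are assumptions for this example, based only on the layout shown above (time, source, arrow, destination, protocol, info); other tshark versions or column settings may print a different layout.

```python
import re
from collections import Counter

# A few lines taken from the sample capture in the text.
SAMPLE = """\
0.000000 192.168.0.3 -> 255.255.255.255 UDP Source port: 4905 Destination port: 4905
1.561313 192.168.0.12 -> 207.46.108.72 MSNMS XFR 13 SB
1.596378 192.168.0.12 -> 207.46.108.72 TCP 51830 > 1863 [ACK] Seq=11 Ack=62 Win=65535 Len=0 TSV=614503137 TSER=8828236
1.968203 192.168.0.12 -> 207.46.27.163 TCP 52055 > 1863 [SYN] Seq=0 Len=0 MSS=1460 WS=3 TSV=614503140 TSER=0
"""

# Assumed column layout: time, source, "->", destination, protocol, info
SUMMARY_RE = re.compile(
    r"^(?P<time>\d+\.\d+) (?P<src>\S+) -> (?P<dst>\S+) "
    r"(?P<proto>\S+) (?P<info>.*)$"
)


def parse_summary(text):
    """Parse tshark summary lines into a list of dicts; skip other lines."""
    records = []
    for line in text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match is None:
            continue
        record = match.groupdict()
        record["time"] = float(record["time"])
        records.append(record)
    return records


if __name__ == "__main__":
    records = parse_summary(SAMPLE)
    # Which protocols did tshark identify, and who talked to whom?
    print(Counter(r["proto"] for r in records))
    print(Counter((r["src"], r["dst"]) for r in records))
```

The source/destination pairs are the natural input for a link graph, and the protocol column shows where tshark recognized an application layer protocol such as MSNMS instead of plain TCP.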

Network captures prove useful for a lot of network-level analysis. However, as soon as applications need to be analyzed, the packet captures do not provide the necessary application logic to reproduce application behavior. In those cases, you should consider other sources.

TRAFFIC FLOWS

One layer above packet captures in the data stack we find traffic flows. By moving up the stack, we lose some of the information that was available on lower levels. Traffic flows are captured on routers or switches and contain information only up to Layer 4 (transport layer) of the network stack. This means that we don’t have any application layer information available. It is primarily routers that record traffic flows. However, sometimes people set up hosts to do the same. The different router vendors designed their own protocols or formats to record traffic flows. Cisco calls it NetFlow,⁹ the IETF standardized the concept of NetFlow as IPFIX,¹⁰ yet another version of traffic flows is sFlow,¹¹ a.k.a. RFC 3176, and the version of traffic flows that Juniper supports is called cFlow.¹² All the formats are fairly similar. They mainly differ in the transport used to get the flows from the routers to a central collector.

9 www.cisco.com/go/netflow
10 www.ietf.org/html.charters/ipfix-charter.html
11 www.sflow.org

Any one of the flow protocols can be used to collect traffic information and analyze it. Traffic flows record the following attributes:

• Timestamp ①: The time the flow was recorded.

• IP addresses ②: The addresses representing the endpoints of the observed communications.

• Ports ③: Network ports help identify the services that were used in the observed communications.

• Layer 3 protocol ④: The protocol used on the network layer. Generally, this will be IP.

• Class of service: The priority assigned to the flow.

• Network interfaces ⑤: The network interfaces through which the traffic enters and leaves the device.

• Autonomous systems (ASes): In some cases, the AS¹³ of the endpoints of the observed communication can be recorded.

• Next hop: The Layer 3 address of the next hop to which the traffic is forwarded.

• Number of bytes ⑥ and packets ⑦: The size and number of Layer 3 packets in the flow.

• TCP flags: Accumulation of TCP flags observed in the flow.

Routers can generally be instrumented to record most of this information. It is not always possible to collect all of it. Mainly it is the AS, the next hop, and the TCP flags that are not always available. A sample record, collected with nfdump (see the sidebar for more information about nfdump), looks like this (the circled numbers correspond to the attributes listed above):

① 2005-10-22 23:02:53.967 0.000 ④ TCP ② 10.0.0.2:③ 40060 -> ② 10.0.0.1:③ 23 ⑦ 1 ⑥ 60 1 ⑤ 0 ⑤ 1
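A record like the one above can be split into named fields with a few lines of code. The following sketch works on the sample record with the callout numbers removed. It assumes that the garbled arrow in the record is "->" and that the fields are separated by whitespace. The field names and the function name parse_nfdump_record are chosen for this illustration. The one column that carries no callout in the sample is kept under the name "unlabeled" because the text does not say what it means.

```python
# The sample record from the text, without callout numbers.
# Assumption: the arrow between the two endpoints is "->".
SAMPLE = "2005-10-22 23:02:53.967 0.000 TCP 10.0.0.2:40060 -> 10.0.0.1:23 1 60 1 0 1"


def split_endpoint(endpoint):
    """Split 'address:port' into (address, port)."""
    address, port = endpoint.rsplit(":", 1)
    return address, int(port)


def parse_nfdump_record(line):
    """Parse one flow record in the layout of the sample above.

    Hypothetical helper; real nfdump output depends on the chosen format.
    """
    tokens = line.split()
    if len(tokens) != 12 or tokens[5] != "->":
        raise ValueError("unexpected record layout")
    src_ip, src_port = split_endpoint(tokens[4])
    dst_ip, dst_port = split_endpoint(tokens[6])
    return {
        "timestamp": tokens[0] + " " + tokens[1],
        "duration": float(tokens[2]),
        "protocol": tokens[3],
        "src_ip": src_ip,
        "src_port": src_port,
        "dst_ip": dst_ip,
        "dst_port": dst_port,
        "packets": int(tokens[7]),
        "bytes": int(tokens[8]),
        "unlabeled": int(tokens[9]),  # not annotated in the text
        "interfaces": (int(tokens[10]), int(tokens[11])),
    }


if __name__ == "__main__":
    for field, value in parse_nfdump_record(SAMPLE).items():
        print(f"{field}: {value}")
```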

You might notice that there is some interesting new information that we did not get from packet captures, namely the AS, the next hop of a packet’s path through the network, and the number of packets in a flow. Each router decides, based on its routing table, what interface and next hop to forward a packet to. This information can prove useful when you are trying to plot network traffic paths and to understand the general network topology.
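To make the relationship between packets and flows concrete, the following is a simplified sketch of how individual packets collapse into flow records. Packets that share protocol, addresses, and ports are grouped; their sizes are summed, the packets are counted, and the TCP flags are accumulated with a bitwise OR. This is an illustration of the concept only and not how any particular router implements it. Real exporters also end flows based on timeouts, which is ignored here. The endpoints and flags in the demo packets follow the tshark sample shown earlier, but the packet sizes are made-up values.

```python
# TCP flag bits as defined in the TCP header
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


def aggregate_flows(packets):
    """Group packets into unidirectional flows keyed by the 5-tuple."""
    flows = {}
    for pkt in packets:
        key = (pkt["proto"], pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"])
        flow = flows.setdefault(key, {
            "first_seen": pkt["time"],
            "last_seen": pkt["time"],
            "packets": 0,
            "bytes": 0,
            "tcp_flags": 0,
        })
        flow["first_seen"] = min(flow["first_seen"], pkt["time"])
        flow["last_seen"] = max(flow["last_seen"], pkt["time"])
        flow["packets"] += 1
        flow["bytes"] += pkt["size"]
        flow["tcp_flags"] |= pkt["flags"]  # accumulate all flags seen
    return flows


# Three handshake packets; the "size" values are hypothetical.
PACKETS = [
    {"time": 1.968203, "proto": "TCP", "src": "192.168.0.12", "sport": 52055,
     "dst": "207.46.27.163", "dport": 1863, "size": 64, "flags": SYN},
    {"time": 2.003898, "proto": "TCP", "src": "207.46.27.163", "sport": 1863,
     "dst": "192.168.0.12", "dport": 52055, "size": 60, "flags": SYN | ACK},
    {"time": 2.003980, "proto": "TCP", "src": "192.168.0.12", "sport": 52055,
     "dst": "207.46.27.163", "dport": 1863, "size": 52, "flags": ACK},
]

if __name__ == "__main__":
    for key, flow in aggregate_flows(PACKETS).items():
        print(key, flow)
```

The three packets end up in two flow records, one per direction. The payload and the exact order of the packets are gone, which is the information loss described at the beginning of this section.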
