In addition to passive eavesdropping attacks, RTP is also vulnerable to active attacks. The attacks described here assume an attacker who can sniff the network, using something like Wireshark, and then execute active attacks, such as voice injection, against VoIP endpoints supporting RTP. Injection attacks allow malicious entities to inject audio into existing VoIP telephone calls. For example, an attacker could inject an audio file that says “Sell at 118” into a call between two stockbrokers discussing insider trading information.
There are a few ways to inject voice communication between two VoIP endpoints. We’ll discuss two methods: audio insertion and audio replacement. Both methods involve manipulation of the timestamp, sequence number, and SSRC of an RTP packet.
Audio Insertion
The session information between two VoIP endpoints is controlled by a 32-bit synchronization source (SSRC) identifier as well as a 16-bit sequence number and a 32-bit timestamp. The SSRC is a randomly chosen number intended to keep any two endpoints from using the same identifier within one RTP session; the likelihood of a collision is low. However, because the session information is sent in cleartext,
Me dia : R T P Se cur it y 83
attackers can view it over the network. Also, because most vendor VoIP products do not truly randomize any of the values, it is possible to inject RTP packets from a spoofed source. The sequential values allow attackers to predict each state-controlling field, which opens the door for injection attacks.
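To make the cleartext point concrete, the following is a minimal sketch of reading the three session fields out of a captured RTP packet. It assumes the 12-byte fixed header layout from RFC 3550 and ignores CSRC entries and header extensions; the names `RtpHeader` and `parse_rtp_header` are hypothetical, and the sample packet is built locally from values shown in Figure 4-10 rather than captured from a network.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    """Fields of the 12-byte RTP fixed header that matter for this section."""
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header (assumed layout: RFC 3550, no CSRC list)."""
    if len(packet) < 12:
        raise ValueError("an RTP fixed header is 12 bytes long")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=byte0 >> 6,          # top two bits of the first byte
        payload_type=byte1 & 0x7F,   # low seven bits of the second byte
        sequence=sequence,
        timestamp=timestamp,
        ssrc=ssrc,
    )


# Illustrative packet assembled from the values in Figure 4-10, followed by
# 160 placeholder payload bytes. Nothing here is encrypted or authenticated.
sample = struct.pack("!BBHII", 0x80, 0, 6153, 790462029, 909524487) + b"\xff" * 160
header = parse_rtp_header(sample)
print(header)
```

Nothing beyond the standard library is needed to recover these values, which is why the lack of encryption matters more than any weakness in the numbers themselves.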
NOTE Injection techniques were introduced in a tool called Hunt (available from http://packetstormsecurity.org/sniffers/hunt/hunt-1.5bin.tgz), which would inject session information to hijack telnet connections.
RTP sessions are also vulnerable to injection attacks because the packets do not use random information for session management, in addition to the problem that the information is sent in cleartext. For example, for a given RTP session, the timestamp usually starts with 0 and increments by the number of samples carried in each packet (e.g., 160 for 20 ms of 8 kHz G.711 audio); the sequence number starts with 0 and increments by 1; and the SSRC is usually a static value for the session and a function of time. All three of these values are predictable, static, or both. An attacker who is able to sniff the network can create packets with the correct timestamp, sequence number, and SSRC, ensuring that each value increases as the current session expects (the sequence number usually by one).
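The predictability described above comes down to simple arithmetic. The sketch below assumes a stream carrying 160 samples per packet (20 ms of 8 kHz G.711) and shows how the next header values follow from the last observed ones. The function names are hypothetical, and the code only computes and packs header bytes; it does not send anything.

```python
import struct

# Assumption: 8 kHz audio sent in 20 ms packets, so 160 samples per packet.
SAMPLES_PER_PACKET = 160


def next_rtp_fields(sequence: int, timestamp: int,
                    step: int = SAMPLES_PER_PACKET) -> tuple:
    """Return the (sequence, timestamp) pair expected on the next packet.

    The sequence number is 16 bits wide and the timestamp 32 bits wide,
    so both wrap around when they overflow.
    """
    return (sequence + 1) & 0xFFFF, (timestamp + step) & 0xFFFFFFFF


def build_rtp_header(sequence: int, timestamp: int, ssrc: int,
                     payload_type: int = 0) -> bytes:
    """Pack a 12-byte RTP fixed header (version 2, no padding/extension/CSRC)."""
    return struct.pack("!BBHII", 0x80, payload_type & 0x7F,
                       sequence, timestamp, ssrc)


# Starting from the fourth packet shown in Figure 4-10, the predicted
# values match the fifth packet in the same figure.
seq, ts = next_rtp_fields(6156, 790462509)
header = build_rtp_header(seq, ts, 909524487)
print(seq, ts, header.hex())
```

The SSRC is passed through unchanged because it stays constant for the life of the session, which leaves the sequence number and timestamp as the only values that need to be tracked.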
Once the attacker has predicted the correct information, he or she will be able to inject packets (audio) into an existing VoIP conversation. The ability to gather the correct information for the timestamp, sequence, and SSRC can be quite easy because all of the information traverses the network in cleartext. An attacker can simply sniff the network, read the required information for the attack, and inject new audio packets. Furthermore, because the information is not random, a tool can be written to automate the process and thus require little effort on the part of the attacker.
Figure 4-10 shows an example of the RTP injection process. Notice that the attacker’s SSRC number is the same as that of its target, and its sequence number and timestamp are in sync with the legitimate session, making the endpoint assume that the attacker’s packets are part of the real session.
Figure 4-10: RTP injection. (1) Sonia and Kusum exchange RTP packets in an established session. (2) The attacker injects RTP packets carrying the attacker’s audio. All packets shown carry SSRC 909524487, with sequence numbers 6153 through 6157 and timestamps 790462029 through 790462669.
84 Cha pt er 4
Complete the following steps to inject an audio file into an existing VoIP conversation.
1. Download RTPInject (written by Zane Lackey and Alex Garbutt) from
http://www.isecpartners.com/tools.html.
2. Follow the Readme.txt file for usage on a Windows machine. For the Linux version, RTPInject depends on the following packages, which are preinstalled on most modern Linux systems, such as Ubuntu, Red Hat, and the BackTrack Live CD (RTPInject must be run with root privileges):
Python 2.4 or higher
GTK 2.8 or higher
PyGTK 2.8 or higher
3. Install the pypcap library included with RTPInject by using the following commands:
bash# tar zxvf pypcap-1.1.tar.gz
bash# cd pypcap-1.1
bash# make all
bash# make install (*note: this step must be performed as root)
4. Install the dpkt library included with RTPInject by using the following commands:
bash# tar zxvf dpkt-1.6.tar.gz
bash# cd dpkt-1.6
bash# make install
5. Perform a man-in-the-middle attack on the network (if necessary) using dsniff (Linux) or Cain & Abel (Windows), as described earlier in this chapter, in order to capture all RTP streams in the local subnet.
6. Launch RTPInject using the following command:
bash# python rtpinject.py
7. Once RTPInject is loaded, its primary screen shows three fields: the Source field, the Destination field, and the Voice Codec field. See Figure 4-11 for the details of the injection. The Source field is auto-populated as RTPInject detects RTP streams on the network. When a new IP address appears in the Source field, click the IP address to show the destination VoIP phone and the voice codec being used in the stream.
Figure 4-11: RTPInject main window
8. RTPInject automatically transcodes the provided .wav file into the correct codec. Because RTPInject displays the voice codec in use, the user can also create the audio file directly in that codec. Using Windows Sound Recorder or SoX for Linux, create an audio file in the format shown by RTPInject, such as A-Law, u-Law, GSM, G.723, PCM, PCMA, or PCMU.
a. Open Windows Sound Recorder (Start > Programs > Accessories > Entertainment > Sound Recorder).
b. Click the Record button, record the audio file, and then click the Stop button.
c. Select File > Save As.
d. Click Change. Under Format, select the codec that was displayed in RTPInject. See Figure 4-12. Both Windows Sound Recorder and SoX on Linux can transcode audio to most of the common codecs.
Figure 4-12: Windows Sound Recorder codec
e. Click OK and then Save.
9. Once this audio file has been created, click the folder button on RTPInject and navigate to the location of the file recorded in Step 8. See Figure 4-13.
Figure 4-13: Select dialog
10. With the RTP stream and audio file selected, click the Inject button. RTPInject injects the selected audio file to the destination host in the RTP stream. See Figure 4-14.
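Step 8 depends on the audio being in the codec that RTPInject reports. As an illustration of what that transcoding involves for u-Law (PCMU), here is a minimal sketch of the commonly used G.711 u-law companding routine applied to 16-bit signed PCM samples. The function names are hypothetical, and this is not taken from RTPInject, whose own transcoding code is not shown in the text.

```python
BIAS = 0x84    # 132, added before the segment search in the usual G.711 routine
CLIP = 32635   # largest magnitude that can be biased without overflowing 15 bits


def linear_to_ulaw(sample: int) -> int:
    """Compress one 16-bit signed PCM sample into one u-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS

    # Find the segment (exponent): the position of the highest set bit,
    # searched downward from bit 14.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1

    # The four bits just below the leading bit become the mantissa.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # u-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def encode_pcm16(samples) -> bytes:
    """Encode an iterable of 16-bit signed samples as u-law bytes."""
    return bytes(linear_to_ulaw(s) for s in samples)


# 20 ms of silence at 8 kHz becomes 160 bytes, one RTP payload's worth.
payload = encode_pcm16([0] * 160)
print(len(payload), hex(payload[0]))
```

Each 16-bit sample becomes a single byte, so one second of 8 kHz audio occupies 8,000 bytes after encoding.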
Audio Replacement
As mentioned previously, the session information between two VoIP endpoints is controlled by the SSRC, sequence number, and timestamp. Unlike the audio insertion attack, the audio replacement attack does not add audio to an existing phone conversation but replaces the existing audio during a call. For example, if two trusted endpoints are holding a phone conversation, an attacker can replace the legitimate audio information with the attacker’s own information. Instead of hearing the communication from either source, the endpoints would be listening to what the attacker chooses. Audio replacement would be highly damaging in cases where many endpoints are listening to a single source, such as company conference calls.
In order to replace the existing audio stream, the attacker needs to send RTP packets with a higher sequence number and timestamp, but using the same SSRC information. The target will then see two streams of RTP packets sharing a single SSRC number, one from the legitimate endpoint and one from the attacker. However, when the endpoint sees that the attacker’s packets have a higher timestamp and sequence number, it will assume that the attacker’s packets are the most current and thus continue on with the attacker’s information. The higher sequence number and timestamp on the attacker’s packets make the legitimate endpoint’s packets look old and outdated. Old and outdated packets are discarded by the target in favor of the most recent information on the network, which in this case has been provided by the attacker.
This technique allows the attacker’s packets to look current while the endpoint’s packets look old and invalid. As a result, the target receives the packet information from the attacker and plays the rogue audio, which can be whatever the attacker wishes to play. For this attack to occur, the attacker’s sequence number and timestamp must always be higher than those of the real endpoint.
Figure 4-15 shows an example of the RTP replacement process. Notice that the attacker’s SSRC number is the same as that of its target, but its sequence number and timestamp are much higher than in the legitimate session. This forces the receiving endpoint to assume that the legitimate phone’s packets are old.
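The reasoning behind audio replacement can be illustrated with a toy model of a receiver that plays only packets newer than the highest sequence number it has seen. This is a deliberately simplified sketch: the class `NaiveReceiver` is hypothetical, it ignores 16-bit sequence wraparound and timestamps, and real endpoints vary in how their jitter buffers treat large sequence jumps.

```python
class NaiveReceiver:
    """Toy endpoint that treats any packet at or below the highest
    sequence number seen so far as old and discards it."""

    def __init__(self):
        self.highest_seq = None
        self.played = []  # sources whose audio was actually played

    def receive(self, source: str, seq: int) -> bool:
        """Return True if the packet is played, False if it is discarded."""
        if self.highest_seq is not None and seq <= self.highest_seq:
            return False  # looks stale next to what was already accepted
        self.highest_seq = seq
        self.played.append(source)
        return True


# Both senders use the same SSRC, so the receiver cannot tell them apart;
# only the sequence numbers differ. The attacker's values are illustrative.
arrivals = [
    ("legitimate", 6153),
    ("legitimate", 6154),
    ("attacker", 7000),    # much higher sequence number
    ("legitimate", 6155),  # now appears old and is dropped
    ("attacker", 7001),
    ("legitimate", 6156),  # dropped
    ("attacker", 7002),
]

receiver = NaiveReceiver()
results = [receiver.receive(source, seq) for source, seq in arrivals]
print(receiver.played)
```

In this model, once the first attacker packet is accepted, every later legitimate packet is discarded, which matches the behavior the text attributes to the target as long as the attacker's values stay ahead.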