Interactive Connectivity Establishment - WebRTC: APIs and RTCWEB Protocols of the HTML5 Real-Ti

Interactive Connectivity Establishment, or ICE, is a standardized protocol for hole punching. It uses STUN and TURN to help endpoints establish connectivity. The basic steps in ICE are shown in Figure 3.10.

1) Gather Candidate Transport Addresses
2) Exchange Candidates over Signaling Channel
3) Perform Connectivity Checks
4) Choose Selected Pair and Begin Media
5) Send Keepalives

If either side detects a change in IP address in use,

Figure 3.10 High Level ICE Call Flow

The following sections will cover these steps.

3.4.1 Gather Candidate Transport Addresses

The first step is to gather candidate transport addresses. A candidate address is an IP address and port where media might be able to be received for a Peer Connection. These addresses must be gathered at the time of the call; in many cases they cannot be gathered ahead of time. In the example of Figure 3.10, ICE Agent A begins gathering candidate addresses as soon as the user at A initiates a Peer Connection with B. ICE Agent B begins gathering candidate addresses as soon as the Peer Connection request from A is received in the signaling channel.

There are four types of candidate addresses, shown in Table 3.1. A host candidate address is an address obtained through the operating system and represents an actual address on a network interface card (NIC). If the ICE Agent is behind a NAT, this address will be a private IP address and not be routable outside the subnet. The next two candidate addresses are known as reflexive addresses since they represent addresses that are reflected back to the ICE Agent by a STUN check, as if the client were looking in a mirror through the STUN server to learn its actual IP address. A server reflexive candidate is an address learned from a response to a STUN check sent to a STUN server. If the ICE Agent is behind a NAT, this address will be the outside address of the outermost NAT. That is, there could be multiple layers of NAT between the ICE Agent and the STUN server, but this check only allows discovery of the last NAT before the STUN server.

Table 3.1 ICE Candidate Address Types

A peer reflexive candidate is an address learned from a received STUN check sent by the other ICE Agent (a peer). This type of candidate address is not exchanged over the signaling channel but is discovered during the STUN connectivity checks of Step 3.

A relayed candidate is an address of a media relay. Usually this is obtained using the TURN protocol. The transport address obtained using a TURN allocate request is a relayed candidate.
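The four candidate types can be represented as a small data structure. The sketch below is illustrative only: it assumes the SDP candidate attribute syntax and the type tokens (host, srflx, prflx, relay) of RFC 5245, which are not spelled out in this section, and the names Candidate and parse_candidate are hypothetical. The addresses in the usage note are documentation examples.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CandidateType(Enum):
    HOST = "host"               # address read from a local network interface
    SERVER_REFLEXIVE = "srflx"  # address reflected back by a STUN server
    PEER_REFLEXIVE = "prflx"    # address learned from the peer's STUN checks
    RELAYED = "relay"           # address allocated on a media relay (TURN)


@dataclass(frozen=True)
class Candidate:
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    kind: CandidateType
    related_address: Optional[str] = None
    related_port: Optional[int] = None


def parse_candidate(line: str) -> Candidate:
    """Parse an attribute of the assumed form
    candidate:<foundation> <component> <transport> <priority>
    <address> <port> typ <type> [raddr <addr> rport <port>]"""
    if line.startswith("a="):
        line = line[2:]
    if not line.startswith("candidate:"):
        raise ValueError("not a candidate attribute")
    fields = line[len("candidate:"):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("malformed candidate attribute")
    # Any remaining tokens are name/value pairs such as raddr and rport.
    extras = dict(zip(fields[8::2], fields[9::2]))
    rport = extras.get("rport")
    return Candidate(
        foundation=fields[0],
        component=int(fields[1]),
        transport=fields[2].upper(),
        priority=int(fields[3]),
        address=fields[4],
        port=int(fields[5]),
        kind=CandidateType(fields[7]),
        related_address=extras.get("raddr"),
        related_port=int(rport) if rport is not None else None,
    )
```

For example, parse_candidate("candidate:2 1 UDP 1694498815 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998") yields a server reflexive candidate whose related address is the private host address behind the NAT.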

Browsers are configured with the STUN and TURN servers used in this gathering candidates step. This is done using STUN URIs [draft-nandakumar-rtcweb-stun-uri] and TURN URIs [draft-petithuguenin-behave-turn-uris] in the ICE Servers object. Note that access to a TURN server means having STUN server functionality as well.
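To make the URI forms concrete, here is a minimal parsing sketch. It assumes the syntax of the cited drafts as later published in RFC 7064 and RFC 7065: a stun, stuns, turn, or turns scheme, a host, an optional port, and for TURN an optional transport parameter, with default ports 3478 (plain) and 5349 (secure). The names IceServerUri and parse_ice_server_uri and the example.org hosts are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Default ports assumed from the STUN/TURN URI specifications.
DEFAULT_PORTS = {"stun": 3478, "stuns": 5349, "turn": 3478, "turns": 5349}


@dataclass(frozen=True)
class IceServerUri:
    scheme: str
    host: str
    port: int
    transport: Optional[str]  # only meaningful for turn: and turns: URIs


def parse_ice_server_uri(uri: str) -> IceServerUri:
    scheme, sep, rest = uri.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme in {uri!r}")

    # Split off an optional "?transport=udp|tcp" query (TURN only).
    rest, _, query = rest.partition("?")
    transport = None
    if query:
        key, _, value = query.partition("=")
        if key != "transport" or not scheme.startswith("turn"):
            raise ValueError(f"unexpected query in {uri!r}")
        transport = value.lower()

    # A bare host, or a bracketed IPv6 literal without a port, has no ":port".
    if rest.endswith("]") or ":" not in rest:
        host, port_text = rest, ""
    else:
        host, _, port_text = rest.rpartition(":")
    if not host:
        raise ValueError(f"missing host in {uri!r}")

    port = int(port_text) if port_text else DEFAULT_PORTS[scheme]
    return IceServerUri(scheme, host, port, transport)
```

A browser application would pass such URI strings, along with any TURN credentials, in its ICE Servers configuration; the parser above only illustrates what each string encodes.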

It is important to note that just learning public IP address candidates using STUN is not enough on its own to traverse the NATs. NATs are complicated and vary widely in operation between networks and service providers. As a result, the full functionality of ICE is needed to ensure NAT traversal.

3.4.2 Exchange of Candidates

The second step is the exchange of candidate addresses between the browsers over the signaling channel, as described in Chapter 4. The candidates are first ordered or prioritized. In general, the highest priority are host candidates, followed by reflexive addresses, followed lastly by relayed candidates. If there is a preference between IPv4 and IPv6, this can be expressed by different priority settings. Candidates are associated with a particular media stream in SDP. The default behavior with WebRTC is to multiplex all media, including voice, video, and data, over the same transport address. As such, a single set of candidates is all that is needed.

3.4.3 STUN Connectivity Checks

The ICE Agents begin connectivity checks as soon as they have sent and received the candidates. In Figure 3.10, for Agent A, this is when the SDP answer is received from Agent B. For Agent B, this is when the SDP answer is sent to Agent A. During this phase, the ICE Agents generate STUN responses to any STUN connectivity requests they receive from their peer that pass authentication.

The first step is to pair candidates based on IP address type (IPv4 or IPv6) and other factors. The purpose of pairing and generating foundations for each pair is to reduce the number of connectivity checks performed to minimize the time needed to obtain a working candidate.
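The prioritization of Section 3.4.2 and the pairing step can be sketched together. This is a simplified illustration, not a full implementation: it assumes the candidate and pair priority formulas and the recommended type preferences (host 126, peer reflexive 110, server reflexive 100, relayed 0) from RFC 5245, none of which are given in this section, and it omits foundations and pruning. All names are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from itertools import product

# Recommended type preferences, assumed from RFC 5245.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


@dataclass(frozen=True)
class Candidate:
    kind: str                      # "host", "srflx", "prflx", or "relay"
    address: str
    port: int
    component: int = 1
    local_preference: int = 65535  # lower it to de-prefer an interface or IP family

    @property
    def priority(self) -> int:
        # 2^24 * type preference + 2^8 * local preference + (256 - component ID)
        return ((TYPE_PREFERENCE[self.kind] << 24)
                + (self.local_preference << 8)
                + (256 - self.component))


def pair_priority(controlling: int, controlled: int) -> int:
    # 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def form_pairs(local, remote, local_is_controlling=True):
    """Pair candidates of the same component and IP version, highest priority first."""
    pairs = []
    for loc, rem in product(local, remote):
        if loc.component != rem.component:
            continue
        if ip_address(loc.address).version != ip_address(rem.address).version:
            continue  # never pair an IPv4 candidate with an IPv6 candidate
        if local_is_controlling:
            g, d = loc.priority, rem.priority
        else:
            g, d = rem.priority, loc.priority
        pairs.append((pair_priority(g, d), loc, rem))
    pairs.sort(key=lambda entry: entry[0], reverse=True)
    return pairs
```

With these preferences, a host-to-host pair sorts ahead of any pair that involves a relayed candidate, which matches the ordering described in Section 3.4.2.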

Peer reflexive candidate addresses can be discovered during this step and are automatically paired as they are discovered. A candidate pair can be in one of five connectivity check states, as shown in Figure 3.11.

Queued candidate pairs start in the “frozen” state (a joke on the name of the protocol) – a holding state until the checks are ready to be performed. When the ICE connectivity check algorithm determines that a check should be performed, it is “unfrozen” and moves to the “waiting” state. A pair could stay in the waiting state due to pacing considerations for the checks, so that a flood of packets is not sent at once. When the pacing allows for the check to be made, the state moves to “in-progress” when the STUN connectivity check is sent to the other peer. If a response comes back, the state moves to “succeeded”, while if the check times out without a response, it moves to “failed”.

Figure 3.11 ICE Connectivity Check State Machine
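The state machine described above can be written as a small transition table. This is a sketch for illustration; the event names (unfreeze, send_check, response, timeout) are hypothetical labels for the transitions in the text, not protocol terms.

```python
from enum import Enum


class PairState(Enum):
    FROZEN = "frozen"
    WAITING = "waiting"
    IN_PROGRESS = "in-progress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# (current state, event) -> next state
TRANSITIONS = {
    (PairState.FROZEN, "unfreeze"): PairState.WAITING,         # check algorithm selects the pair
    (PairState.WAITING, "send_check"): PairState.IN_PROGRESS,  # pacing allows the STUN check
    (PairState.IN_PROGRESS, "response"): PairState.SUCCEEDED,  # STUN response came back
    (PairState.IN_PROGRESS, "timeout"): PairState.FAILED,      # no response before timeout
}


def advance(state: PairState, event: str) -> PairState:
    """Return the next state, or raise if the event is not valid in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.value!r}") from None
```

A pair therefore always passes through "waiting" and "in-progress" before it reaches "succeeded" or "failed"; calling advance(PairState.FROZEN, "response") raises an error.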

There is an optimization of ICE known as Trickle ICE (see Section 8.5.1) where instead of all of the candidates being provided at the start of ICE processing, ICE is started with a minimal set, and additional candidates are added, or trickled in, as processing continues. These new candidates are paired and queued, and go through the same steps as Figure 3.11.

3.4.4 Choose Selected Pair and Begin Media

The connectivity checks continue until either all possible checks have completed (all have moved from the “frozen” state to either “succeeded” or “failed”) or one pair has been chosen. Choosing a pair is done by the controlling ICE Agent. The ICE protocol has an algorithm to choose which browser is the Controlling ICE Agent and which is the Controlled ICE Agent. The Controlled ICE Agent learns that the other ICE Agent has chosen a candidate pair when a STUN connectivity check is received with an attribute indicating that this pair is to be used. The Controlled ICE Agent then replies to the connectivity check echoing that the pair will be used. Media is now sent by both browsers using the chosen candidate pair.
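The selection step can be sketched as follows. This is a simplification under stated assumptions: the controlling agent is assumed to pick the highest-priority pair that has succeeded, and the attribute that signals the choice is assumed to be the USE-CANDIDATE attribute of RFC 5245, which this section does not name. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckedPair:
    local: str
    remote: str
    priority: int
    state: str            # "frozen", "waiting", "in-progress", "succeeded", "failed"
    nominated: bool = False


def choose_selected_pair(pairs: List[CheckedPair], is_controlling: bool) -> Optional[CheckedPair]:
    """Controlling side: nominate the best pair that has succeeded, if any."""
    if not is_controlling:
        return None  # only the controlling agent chooses
    succeeded = [p for p in pairs if p.state == "succeeded"]
    if not succeeded:
        return None
    best = max(succeeded, key=lambda p: p.priority)
    best.nominated = True  # the next check on this pair would carry USE-CANDIDATE
    return best


def on_check_received(pair: CheckedPair, use_candidate: bool) -> bool:
    """Controlled side: accept the nomination carried by an incoming check."""
    if use_candidate and pair.state == "succeeded":
        pair.nominated = True
    return pair.nominated
```

Once both sides have marked the same pair as nominated, media flows over that pair.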

To ensure that NAT mappings and filter rules do not time out during the media session, ICE continues to send STUN connectivity checks at 15-second intervals over the candidate pairs in use. This ensures that packets are sent even when media is on hold or otherwise not being sent. If the media session is still active, the other ICE Agent generates a STUN response. The receipt of this STUN response by the ICE Agent that sent the check is taken as an indication that media can continue to be sent. If the STUN response is not received, ICE restarts, per the next section. Note that this behavior is not defined in the original ICE protocol specification, but is defined in the ICE extension of Section 8.5.5.
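The keepalive logic can be sketched as a small timer object. The 15-second interval comes from the text; everything else is an assumption made for illustration, in particular treating a response still missing when the next check is due as the failure condition. The class name KeepaliveMonitor and its methods are hypothetical, and time is passed in explicitly so that the logic is easy to follow and test.

```python
class KeepaliveMonitor:
    """Tracks keepalive checks for the candidate pair in use."""

    def __init__(self, interval: float = 15.0, now: float = 0.0):
        self.interval = interval
        self.next_check_at = now + interval
        self.awaiting_response = False

    def tick(self, now: float) -> str:
        """Return "idle", "send_check", or "restart" for the current time."""
        if now < self.next_check_at:
            return "idle"
        if self.awaiting_response:
            # The previous check was never answered (assumed timeout rule).
            return "restart"
        self.awaiting_response = True
        self.next_check_at = now + self.interval
        return "send_check"

    def on_response(self) -> None:
        """Record the STUN response to the outstanding check."""
        self.awaiting_response = False
```

The caller sends a STUN check whenever tick returns "send_check" and begins an ICE restart when it returns "restart".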

3.4.6 ICE Restart

An ICE restart is triggered if either ICE Agent detects a change in the base transport address. Recall that the base address is the transport address that was used to generate the candidate pair in use. This will cause the ICE Agent to go back to Step 1, gather candidates, and send those candidates in an SDP offer to the other ICE Agent. That will cause the other ICE Agent to also go back to Step 1, and the whole process will repeat.

Note that this will occur if a browser page involved in an active Peer Connection is reloaded by a user. This is sometimes described as “rehydration,” and is an active area of discussion in the standards on how to best handle this situation.

In document WebRTC: APIs and RTCWEB Protocols of the HTML5 Real-Time Web. Alan B. Johnston. Daniel C. Burnett. Second Edition June C: Digital Codex LLC ØØØ1 (Page 56-61)