
Mid-dialog requests

Once a dialog is established, many situations will require some control information to be transmitted in the middle of the call. For a real-time communication application, the most frequent uses of mid-dialog requests are:

• Transmission of DTMF information.
• Renegotiation of media streams.
• Redirection of media streams.

3.3.2.3.1 Transmission of DTMF and flash-hook information

Dual-Tone Multi-Frequency (DTMF) signals are those generated by modern analogue phones when you press one of the keys. Older rotary phones instead generate a series of short interruptions in the current loop through the phone, corresponding to the digit dialed. A single short interruption of this kind is called a flash-hook; it is frequently used to control some class 5 features of the phone line, such as three-way calling.

The original SIP specification was focused on PC-based IP telephony, and its treatment of DTMF and flash-hook transmission was oversimplified for complex real-world telephony applications. The problem emerged quickly: without DTMF, you cannot use your answering machine, a prepaid telephony service, or most call centers, as these systems frequently rely on DTMF tones to get information from you.

3.3.2.3.1.1 The issues

(a) Telephony signals and low-bitrate voice coders

Low-bitrate voice coders (in practice anything lower than 32 kbit/s) usually cannot reliably transport DTMF tones. The reason is that these tones are composed of a mix of two pure frequencies that are almost impossible to find in the human voice. Many low-bitrate voice coders work by modeling a set of basic human speech components and transmitting only the model parameters to the other side, which makes it impossible to reproduce pure frequencies exactly. For this reason most of these coders also degrade music significantly, as they are designed for voice only.

DTMF signals that have been encoded and decoded using such low-bitrate coders will not be accurately recognized by DTMF-driven automatic systems (e.g., it will be almost impossible to enter a credit card number using DTMF, since at least one of the 16 or more digits will be misinterpreted).
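The credit card example can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that each digit is misdetected independently with the same probability; the per-digit error rates used are hypothetical and are not measurements of any particular coder.

```python
def p_at_least_one_error(per_digit_error: float, digits: int = 16) -> float:
    """Probability that at least one digit of the sequence is misdetected,
    assuming independent errors with the same per-digit probability."""
    return 1.0 - (1.0 - per_digit_error) ** digits


# Hypothetical per-digit error rates, chosen only to show the trend.
for p in (0.01, 0.05, 0.10):
    print(f"per-digit error {p:.0%}: 16-digit entry fails {p_at_least_one_error(p):.1%}")
```

Even a modest per-digit error rate compounds quickly over a 16-digit sequence, which is why a detector that is "mostly right" is not good enough for this kind of service.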

Obviously, flash-hook, which is not a sound, is not transported using traditional voice-coding systems.

(b) DTMF-driven call control services

The key advantage of VoIP over all other telephony techniques is the ability to control a phone call without ever being in the voice path, allowing the building of softswitches, as opposed to the traditional telephony systems which require a dedicated hardware-based switching matrix to route the media stream.

As an example, a traditional prepaid card system would connect the call from caller A, establish the media connection with A to get the PIN code and desired destination B for the call, then it would call B and continue to relay the media stream for the entire duration of the call (Figure 3.17). Some systems may optimize this slightly by using intelligent network commands to instruct an intelligent network telephony switch in the path of the call, the Service Switching Function (SSF), to make the call to B, but this SSF device will also route the media streams between A and B for the entire duration of the call. In such a traditional telephony system, two media streams of 64 kbit/s are established through the call control function for the entire duration of the call.

A properly designed VoIP network-based prepaid telephony server would establish a media stream with caller A only during the initial phase of the service, in order to get the PIN code and the destination of the call. The server would then call B but instruct A and B to exchange media streams directly over the IP network (see ‘Redirection of media streams’ in Section 3.3.2.3.2.5). The server is then no longer in the media path of the call.

[Figure 3.17 diagram: two panels, ‘Traditional TDM system’ and ‘VoIP system’, each showing caller A, callee B and the prepaid server, with the signalling and media stream paths after the call connects.]

Figure 3.17 VoIP avoids media tromboning: example of a prepaid application.

The issue is that many such DTMF-driven call control services will still need to receive DTMF information, even when the media stream is passing directly between caller A and callee B. In most prepaid telephony services it is possible to stop the current call by pressing the ‘#’ key, and get the opportunity to make another call to a new destination C without having to re-enter the PIN code. This requires DTMF information to be available on the call control link.

3.3.2.3.1.2 RFC 2833

A quick fix to the first issue (i.e., the transmission of DTMF and other events for communications using low-bitrate coders) was presented in RFC 2833 (RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals), published in May 2000. RFC 2833 requires edge media devices to implement DTMF detection algorithms for all the media streams they generate. This is trivial for an IP phone because it obviously receives information about which keypad key is pressed, but for VoIP gateways connected to the PSTN it requires running DTMF detection algorithms on the G.711 stream received from the PSTN.

The idea is to send the DTMF information in the RTP stream as a named event, not as an audio-encoded signal. If the resulting RTP stream is received at another PSTN gateway, the PSTN gateway has enough information to regenerate the DTMF information as a waveform. The transmission of DTMF events in the RTP stream, with the same sequence number and timestamp reference as the rest of the RTP stream, allows perfect synchronization of the DTMF and media information, and avoids the possible duplication of DTMF signals at a VoIP gateway (one received directly from the media stream, the other received later in RFC 2833 encoded form).

Figure 3.18 shows the format of a telephony event encoded in an RTP packet. Such events should be generated as soon as a tone of more than 50 ms is detected. Each tone packet is sent three times for redundancy purposes, with the RTP sequence number incremented, while all other fields remain identical. Very short tones can be encoded in a single packet (by setting the ‘end’ bit). Longer tones can be sent either by continuously sending tone packets with a shorter duration until the tone stops, or by forming two packets: one signaling the beginning of the tone, one signaling the end of the tone. This prevents the sender from having to wait until the end of the tone to send the tone packet, which would obviously involve an unacceptable delay.

The volume is a value in negative dBm0 (e.g., the value 20 denotes a volume of −20 dBm0). The possible range is between 0 dBm0 and −63 dBm0, but values lower than −55 dBm0 should be rejected. The counter can encode durations up to 8 s if the timestamp unit is 1/8000 s, which is more than enough for most uses. A DTMF tone should always be longer than 40 ms in order to be properly recognized by in-band detectors.
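The field sizes described above can be checked with a small sketch that packs a telephone-event payload and computes the longest duration the 16-bit counter can express. The function and constant names are illustrative assumptions, not part of any library; the bit layout follows the RFC 2833 payload fields (event code, end bit, reserved bit, 6-bit volume, 16-bit duration).

```python
import struct

SAMPLE_RATE_HZ = 8000          # timestamp unit of 1/8000 s, as in the text
MAX_DURATION_UNITS = 0xFFFF    # the duration counter is 16 bits wide


def encode_event(event: int, end: bool, volume: int, duration: int) -> bytes:
    """Pack one telephone-event payload (4 bytes, network byte order)."""
    if not 0 <= event <= 255:
        raise ValueError("event code must fit in 8 bits")
    if not 0 <= volume <= 63:
        raise ValueError("volume is 0..63, meaning 0 to -63 dBm0")
    if not 0 <= duration <= MAX_DURATION_UNITS:
        raise ValueError("duration must fit in 16 bits")
    flags = (0x80 if end else 0x00) | volume   # reserved bit (0x40) stays 0
    return struct.pack("!BBH", event, flags, duration)


def max_duration_seconds(rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Longest duration the 16-bit counter can express at this clock rate."""
    return MAX_DURATION_UNITS / rate_hz


# Digit '1', end bit set, -10 dBm0, 800 timestamp units (100 ms at 8000 Hz).
print(encode_event(event=1, end=True, volume=10, duration=800).hex())  # 018a0320
print(max_duration_seconds())                                          # 8.191875
```

The second value is where the figure of roughly 8 s quoted above comes from: 65 535 timestamp units at 1/8000 s each.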

As SIP uses the Session Description Protocol (SDP) to declare which type of media encoding is used, it was necessary to add an SDP payload format to declare which types of events a receiver can understand.

The following ‘m’ line can be used for receiving telephone events:

m=audio 44143 RTP/AVP 110
a=rtpmap:110 telephone-event/8000
a=recvonly

[Figure 3.18: format of a telephony event in an RTP packet. RTP header: dynamic payload type, marker bit (beginning of a new event), timestamp of the beginning of the event. RFC 2833 payload: event code, end bit, reserved, volume (dBm0), duration (timestamp unit).]

Event     Code
0 to 9    0..9
*         10
#         11
A to D    12..15
Flash     16

In addition the fmtp specifier can be used to detail which events can be received. The format is:

a=fmtp:<format> <list of values>

For instance, a receiver with dynamic payload type 100 that understands all the events in Figure 3.18 except A, B, C, and D would declare this using:

a=fmtp:100 0-11,16

In fact, all implementations are required to handle events 0 to 15, so the fmtp line is optional.
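A receiver of such an SDP description has to expand the event list into individual event codes. The following is a minimal sketch of that step, assuming the list is a comma-separated sequence of single values and ranges as in the example above; the function names are hypothetical.

```python
from typing import Set, Tuple


def parse_event_list(value: str) -> Set[int]:
    """Expand a list such as '0-11,16' into a set of event codes."""
    events: Set[int] = set()
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            events.update(range(int(first), int(last) + 1))
        else:
            events.add(int(part))
    return events


def parse_fmtp(line: str) -> Tuple[int, Set[int]]:
    """Split an 'a=fmtp:<format> <list of values>' line into its two parts."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute line")
    fmt, _, values = line[len(prefix):].partition(" ")
    return int(fmt), parse_event_list(values)


payload_type, events = parse_fmtp("a=fmtp:100 0-11,16")
print(payload_type, sorted(events))
```

With the example line, the result contains the digits, ‘*’, ‘#’ and flash (16), but none of the codes 12 to 15 used for A to D.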

RFC 2833 also describes another format where tones are sent as a series of frequency, amplitude modulation, volume, and duration parameters.

One of the advantages of signaling DTMF information as an event is that all waveform analysis would then be performed by edge devices, making IP-based interactive voice response servers much easier to implement.

RFC 2833 is an interesting discussion of telephony signals in a VoIP network, which does solve the problem of transmitting DTMF and other signals in simple class 4 VoIP networks (these networks only route phone calls, without performing any complex service). It is a comprehensive reference to all types of tones and signals found on current networks, including DTMF, modem and fax tones, special information tones, etc.

RFC 2833 can also in principle solve the more complex problem of DTMF and call control, because any intermediary proxy can add its own ‘m=’ line requesting receipt of telephony events at a specific IP address, in which case a ‘c=’ line must be present in the media section of the SDP right after the ‘m=’ line (see footnote 13), while all other media streams are directed to the target user agent. However, many SIP user agent implementations have overlooked this requirement, and are unable to send media and telephony events to different destinations, let alone duplicate the events for transmission to multiple destinations. In practice, it is currently not possible to implement a reliable prepaid system with RFC 2833 without routing the RTP stream.

RFC 2833 also caused some confusion as it leaves the implementer free to transmit DTMF simultaneously through the media stream and in event-encoded form, or to mute the media stream for the duration of the DTMF signal sent in event form. Only the latter is safe: in many complex call flows the synchronization information may be lost, and the simultaneous transmission of DTMF in the regular media stream and as an event may then cause duplicates.

Note that in H.323 it is mandatory to transmit all DTMF information out-of-band using the H.245 signaling channel. The audio signal is muted for the duration of the event transmitted out of band. However, some vendors can disable this mechanism and use RFC 2833 instead. This was introduced to allow some interworking between H.323 networks and SIP networks implementing RFC 2833 and lacking any signaling channel DTMF transmission function. This is not a good solution, however, and the use of one of the methods described in Section 3.3.2.3.1.3 for signaling channel DTMF tone transmission in SIP networks is recommended.

13 For servers which can handle multiple simultaneous calls, this also requires the allocation of a

The current weaknesses of RFC 2833 could be addressed by a more formal specification of how SIP should handle telephony events in complex call control applications, how events should be duplicated and sent to multiple destinations, and when the DTMF tones should be removed from the audio stream. Another issue is feature overlap of SIP application servers, as RFC 2833 possibly enables several call control devices on the signaling path to request telephony events, which may cause several of them to take incompatible actions simultaneously. This clarification work will probably be done in future revisions of SIP, but for now most vendors who face these issues have decided to solve the problem by using new SIP messages, as described below.

3.3.2.3.1.3 Alternatives to RFC 2833

VoIP is a technology breakthrough for the design of value-added services for telephone networks. The possibility of controlling calls without routing the media stream greatly enhances the density and scalability of application servers. It also decreases the cost of such servers, as many functions now do not require specialized telephony hardware and can run on standard computer platforms. Last but not least, the services are cheaper to operate, because the application servers no longer need to be located close to end-users, and therefore most services can be implemented using a single point of presence.

The implications of this paradigm shift are just beginning to be fully understood, and as expected many VoIP devices have been designed with the old TDM model in mind, assuming all servers that control the call are also controlling the media stream.

A properly designed VoIP edge device (gateway, IP phone) should be able to send all information that is possibly of interest to an application server over the signaling link, because only this link is guaranteed to reach all application servers. This is mandatory in H.323, using the H.245 channel.

Unfortunately, at the time of writing, there was no agreed standard way of doing this in SIP. Most network VoIP gateway vendors faced the problem and solved it using their own methods, some of which are described below. Interestingly, most SIP phones seem to be using only RFC 2833, most of the time without the ability to create separate UDP connections for telephony events, and some even send the DTMF tones in-band without any form of coding. Unfortunately, this is a showstopper to any attempt to implement large scale class 5 services using such SIP phones, as many services (interactive voice response, prepaid, call centers, etc.) need access to the DTMF information and would require the application servers to route the media streams.

The methods used by VoIP gateway vendors to send events on the signaling link in SIP roughly fall in two categories:

• The use of the new INFO mid-dialog request defined in RFC 2976 to carry the telephone event. Some vendors use one of the MIME types defined by RFC 2833 (audio/telephone-event and audio/tone MIME types), other vendors use encodings derived from H.323 or MGCP.

Unfortunately, since there is no common agreement on the exact encodings, there is no interoperability between the various SIP implementations, and most existing SIP networks use a single gateway vendor to overcome this problem. Some proxies are capable of understanding several formats and converting between them. The following sections describe the encoding used by some popular SIP gateway vendors.

(a) Cisco

Cisco uses a combination of the general INFO message defined in RFC 2976 (other methods are available as well, e.g., RFC 2833), and the SUBSCRIBE/NOTIFY method. Cisco implemented DTMF transport according to an Internet draft (draft-mahy-sip-signaled-digits-00, ‘Signaled Digits in SIP’).

A SIP device can instruct a Cisco gateway to send a DTMF tone by sending it an INFO message formatted as follows:

INFO sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP 192.168.0.1
From: [email protected]
To: [email protected]
Call-ID: [email protected]
CSeq: 20 INFO
Content-Type: application/dtmf-relay
Content-Length: 22

Signal=9
Duration=250

The duration is in milliseconds. The gateway will confirm the receipt of this indication by responding to the SIP INFO message with a 200 OK response:

SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.0.1
From: [email protected]
To: [email protected]
Call-ID: [email protected]
CSeq: 20 INFO
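The body of such an INFO request is simple enough to build by hand. The sketch below assembles the two body lines and the matching Content-Type and Content-Length headers; it assumes the body lines are separated by CRLF with no trailing line ending, which is what makes the length come out at 22 bytes as in the message above. The helper names are hypothetical, and the rest of the SIP request (Via, From, To, Call-ID, CSeq) is left to the SIP stack.

```python
VALID_SIGNALS = "0123456789*#ABCD"


def dtmf_relay_body(signal: str, duration_ms: int) -> str:
    """Build the application/dtmf-relay body for one key press."""
    if len(signal) != 1 or signal not in VALID_SIGNALS:
        raise ValueError("signal must be a single DTMF key")
    if duration_ms <= 0:
        raise ValueError("duration must be a positive number of milliseconds")
    return f"Signal={signal}\r\nDuration={duration_ms}"


def dtmf_relay_entity(signal: str, duration_ms: int) -> str:
    """Return the entity headers, a blank line, and the body."""
    body = dtmf_relay_body(signal, duration_ms)
    headers = [
        "Content-Type: application/dtmf-relay",
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + body


print(dtmf_relay_entity("9", 250))
```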

In order to be notified of DTMF events from a Cisco gateway, a SIP application must first request to receive the DTMF events using the SUBSCRIBE mechanism. The advantage of using the SUBSCRIBE mechanism is that, when the call is processed by a chain of proxies, any SIP application server can request DTMF notification at any time during a call. The problem is that many application servers can react simultaneously and take incompatible actions:

SUBSCRIBE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP 213.56.166.173:5060
From: <sip:[email protected];user=phone>
To: <sip:[email protected]>
Call-ID: 6CD8C67B-C0A011D3-806DB047-[email protected]
CSeq: 1 SUBSCRIBE
Contact: <sip:[email protected]>
Expires: 3600
Events: telephone-event;duration=2000
User-Agent: NetCentrex IN Stack
Content-Length: 0

SIP/2.0 200 OK
Via: SIP/2.0/UDP 213.56.166.173:5060
From: <sip:[email protected];user=phone>
To: <sip:[email protected]>;tag=A1E07C4-694
Date: Sun, 02 Jan 2000 23:08:57 GMT
Call-ID: 6CD8C67B-C0A011D3-806DB047-[email protected]
Server: Cisco-SIPGateway/IOS-12.x
Content-Length: 0
CSeq: 1 SUBSCRIBE
Expires: 3600
Contact: <sip:[email protected]:5060;user=phone>

If one of the requested events is received from the gateway, a SIP NOTIFY message with a representation of the signaled digits is sent to the requesting application server. In the following sample, the ‘9’ key is pressed:14

NOTIFY sip:[email protected]:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.110.239:5060
From: <sip:[email protected]>;tag=A1E07C4-694
To: <sip:[email protected];user=phone>
Date: Sun, 02 Jan 2000 23:08:57 GMT
Call-ID: 6CD8C67B-C0A011D3-806DB047-[email protected]
User-Agent: Cisco-SIPGateway/IOS-12.x
Max-Forwards: 6
Timestamp: 946854581
CSeq: 102 NOTIFY
Event: telephone-event;rate=1000
Contact: <sip:[email protected]:5060;user=phone>
Content-Length: 10
Content-Type: audio/telephone-event

0x0980010E

SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.110.239:5060
From: <sip:[email protected]>;tag=A1E07C4-694
To: <sip:[email protected];user=phone>
Call-ID: 6CD8C67B-C0A011D3-806DB047-[email protected]
CSeq: 102 NOTIFY
Server: NetCentrex IN Stack
Content-Length: 0
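The NOTIFY body 0x0980010E can be read as the four bytes of an RFC 2833 telephone-event payload written in hexadecimal. The sketch below decodes it on that assumption, and further assumes that the rate parameter of the Event header gives the clock rate of the duration field in Hz; both readings are inferred from the sample rather than taken from the draft, and the function name is hypothetical.

```python
EVENT_NAMES = list("0123456789*#ABCD") + ["flash"]   # event codes 0..16


def decode_notify_body(body: str, rate_hz: int = 1000) -> dict:
    """Decode a hex-encoded telephone-event payload such as '0x0980010E'."""
    text = body.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    raw = bytes.fromhex(text)
    if len(raw) != 4:
        raise ValueError("a telephone-event payload is exactly 4 bytes")
    event, flags = raw[0], raw[1]
    duration = int.from_bytes(raw[2:4], "big")
    return {
        "event": event,
        "key": EVENT_NAMES[event] if event < len(EVENT_NAMES) else None,
        "end": bool(flags & 0x80),            # end-of-event bit
        "volume_dbm0": -(flags & 0x3F),       # 6-bit magnitude, negated
        "duration_ms": duration * 1000 / rate_hz,
    }


print(decode_notify_body("0x0980010E"))
```

Read this way, the sample decodes to event code 9 with the end bit set and a duration of 270 units, which is consistent with the ‘9’ key being pressed.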

(b) Nuera

Nuera also uses SIP INFO messages, encapsulating an MGCP-like syntax. A message body containing MGCP event information will be formatted as follows:

Content-Type: application/mgcp-event
Content-Length: <length of payload>

<MGCP event information>

An application requiring DTMF out-of-band information must request it using an MGCP notification request, embedded in an INFO message:

INFO sip:10.0.0.157 SIP/2.0
Via: SIP/2.0/UDP 10.0.0.168:5060
Route: NUERA-ID<sip:216.188.94.117>
From: 1003<sip:[email protected]>
To: NUERA-ID<sip:216.188.94.117;user=phone>;tag=216.188.94.117-eg101483118153
Call-ID: tac12320020227093301525205-[email protected]
CSeq: 1 INFO
User-Agent: NetCentrex IN Stack
Content-Type: application/mgcp-event
Content-Length: 15

R: [0-9*#](N)

SIP/2.0 200 OK
Record-Route: <sip:10.0.0.157;maddr=10.0.0.157>
From: 1003<sip:[email protected]>
To: NUERA-ID<sip:216.188.94.117;user=phone>;tag=216.188.94.117-eg101483118153
Call-ID: tac12320020227093301525205-[email protected]
CSeq: 1 INFO
Content-Length: 0

If one of the requested events is received from the gateway, a SIP INFO message with an MGCP event message body containing this observed event is sent to the application server. In the following sample, the ‘∗’ key is pressed:

INFO sip:[email protected];maddr=10.0.0.157 SIP/2.0
Route: <sip:[email protected]>
To: "1003" <sip:[email protected]>
From: "NUERA-ID" <sip:216.188.94.117;user=phone>;tag=216.188.94.117-eg101483118153
Via: SIP/2.0/UDP 216.188.94.117:5060
Via: SIP/2.0/UDP 216.188.94.117:5061
Call-ID: tac12320020227093301525205-[email protected]
CSeq: 2 INFO
Content-Type: application/mgcp-event; version=1.0
Content-Transfer-Encoding: text
Content-Length: 5

O:*

SIP/2.0 200 OK
Via: SIP/2.0/UDP 216.188.94.117:5060
Via: SIP/2.0/UDP 216.188.94.117:5061
From: NUERA-ID<sip:216.188.94.117;user=phone>;tag=216.188.94.117-eg101483118153
To: 1003<sip:[email protected]>
Call-ID: tac12320020227093301525205-[email protected]
CSeq: 2 INFO
Server: NetCentrex IN Stack
Content-Length: 0
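On the application server side, the MGCP-like bodies shown above reduce to short ‘parameter: value’ lines, with R carrying the requested events and O the observed event, as in MGCP. The following sketch splits such a body into a dictionary; it assumes one parameter per line and does not attempt to interpret the MGCP event syntax itself. The function name is hypothetical.

```python
def parse_mgcp_event_body(body: str) -> dict:
    """Split an application/mgcp-event body into {parameter: value}."""
    params = {}
    for line in body.splitlines():
        if ":" not in line:
            continue                      # ignore blank or malformed lines
        name, _, value = line.partition(":")
        params[name.strip().upper()] = value.strip()
    return params


request = parse_mgcp_event_body("R: [0-9*#](N)\r\n")
report = parse_mgcp_event_body("O:*\r\n")
print(request)   # requested events, from the subscription INFO
print(report)    # observed event, from the gateway's INFO
```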

(c) Sonus

Sonus offers two mechanisms: DTMF relay and DTMF trigger.

In DTMF relay, a mechanism similar to the signal and signal-update methods from H.245 is used for precise control of DTMF detection and generation. The signal parameter indicates the detected DTMF tone, the duration parameter indicates the total duration of the tone if known or an initial estimate of the tone duration, and signal-update subsequently updates the estimate of the total duration. The Content-Type header is set to ‘application/dtmf-relay’. In the following example a DTMF key is pressed for 250 ms:

INFO sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP 192.168.0.1
From: [email protected]
To: [email protected]
Call-ID: [email protected]
CSeq: 20 INFO
Content-Type: application/dtmf-relay
Content-Length: 22

Signal=9
Duration=250

The server is expected to confirm the receipt of this indication by responding to the SIP